
Big Data Project Seeks to Autocomplete Software Code

Binary code illustration (Digitalgov.gov)

5 November 2014. A new project led by computer scientists at Rice University in Houston aims to apply big data analytics and data mining so software developers can generate code much as search engines anticipate or correct search terms as they are typed. The four-year, $11 million initiative is funded by the Defense Advanced Research Projects Agency, or Darpa, and includes researchers from University of Texas-Austin, University of Wisconsin-Madison, and software code-quality company GrammaTech in Ithaca, New York.

Darpa is supporting this research as part of its Mining and Understanding Software Enclaves, or Muse, program, which the agency hopes will change the way software is written. For Muse, Darpa seeks to engage a wide range of capabilities, including programming languages, program analysis, theorem proving and verification, testing, compilers, software engineering, machine learning, databases, statistics, and systems from many domains. The program’s Web site notes it “intends to emphasize creating and leveraging open source technology.”

A central feature of Muse is a public compendium of open source software representing hundreds of billions of lines of code. Accompanying this body of software would be a specification mining engine that harnesses big data analytics to populate and maintain inferences about software properties, behavior, and vulnerabilities. From this base, Darpa expects Muse to yield new ways of automatically constructing and repairing complex code.

The system created by the Rice project is known as Pliny, named after Pliny the Elder, the Roman author credited with writing the first encyclopedia. Pliny will be designed to read the first lines of code, then recommend the rest of the code. In addition, Pliny would test the code for bugs and security vulnerabilities. “You can think of this as autocomplete for code, but in a far more sophisticated way,” says Rice’s Vivek Sarkar in a university statement. Sarkar is chair of Rice’s computer science department and principal investigator on the project.
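For illustration only, the short Python sketch below captures the “autocomplete for code” idea in its simplest form: match the opening lines of a routine against a small corpus of complete snippets and suggest the closest one as a continuation. The corpus, tokenizer, and overlap measure are invented for this example and are not part of Pliny or Muse.

```python
# Illustrative sketch only, not the Pliny system. It shows the general
# "autocomplete for code" idea: compare a partial code fragment against
# a corpus of complete snippets and suggest how the rest might look.

# A toy corpus of complete snippets, standing in for the large body of
# open source code the Muse program envisions.
CORPUS = [
    "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "def write_file(path, data):\n    with open(path, 'w') as f:\n        f.write(data)",
]

def tokens(code):
    """Very rough tokenization: strip common punctuation, split on whitespace."""
    for ch in "():,":
        code = code.replace(ch, " ")
    return set(code.split())

def suggest_completion(prefix):
    """Return the corpus snippet whose tokens best overlap the partial code."""
    prefix_tokens = tokens(prefix)
    return max(CORPUS, key=lambda snippet: len(prefix_tokens & tokens(snippet)))

if __name__ == "__main__":
    partial = "def read_file(path):"
    print(suggest_completion(partial))  # prints the closest complete snippet
```

A real system would mine a far larger corpus and reason about program behavior rather than surface tokens; the sketch only shows the retrieve-and-suggest pattern the quote describes.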

The Rice team expects to employ Bayesian statistics, which apply mathematical principles to calculate conditional probabilities. In Bayesian models, probabilities are continually refined as new evidence is introduced, creating new conditions and outcomes. Chris Jermaine, a fellow Rice computer scientist and co-investigator on the project, says, “Much like today’s spell-correction algorithms, it will deliver the most probable solution first, but programmers will be able to cycle through possible solutions if the first answer is incorrect.”
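As a rough illustration of that reasoning, and not Rice’s actual model, the sketch below assigns equal prior probabilities to a few hypothetical completions, applies Bayes’ rule when new evidence arrives (here, a token the programmer types next), and lists the candidates most probable first, so the programmer could cycle through the alternatives. All candidate names and likelihood numbers are made up for the example.

```python
# Hedged illustration of Bayesian updating for ranking candidate code
# completions; the candidates and likelihoods are invented for this example.

def normalize(dist):
    """Scale a dictionary of scores so its values sum to 1."""
    total = sum(dist.values())
    return {k: v / total for k, v in dist.items()}

def bayes_update(prior, likelihood):
    """Bayes' rule: posterior(c) is proportional to prior(c) * P(evidence | c)."""
    posterior = {c: prior[c] * likelihood.get(c, 1e-9) for c in prior}
    return normalize(posterior)

# Equal prior belief in three hypothetical completions of a partial function.
prior = normalize({"open/read/close": 1.0, "open/write/close": 1.0, "list directory": 1.0})

# New evidence: the programmer types the token "read". A real system would
# estimate these likelihoods from mined open source code.
likelihood = {"open/read/close": 0.8, "open/write/close": 0.1, "list directory": 0.05}

posterior = bayes_update(prior, likelihood)

# Present candidates most probable first, echoing the spell-correction analogy.
for candidate, p in sorted(posterior.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{candidate}: {p:.2f}")
```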

Sarkar tells more about the Pliny project in the following video.


*     *     *
