MIT invents software repair system CodePhage

MIT computer scientists have invented a new system that can automatically detect and fix serious security vulnerabilities in software by importing code from other, more secure programs.

The CodePhage system detects dangerous bugs in software and then repairs them by importing security checks from software with similar specifications, even if that software is written in a completely different programming language.

Even better, the system can borrow this functionality without access to the source code of the other programs, so their source code does not have to be shared.

"We have tons of source code available in open-source repositories, millions of projects, and a lot of these projects implement similar specifications," said Stelios Sidiroglou-Douskos, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) who led the development of CodePhage.

"Even though that might not be the core functionality of the program, they frequently have subcomponents that share functionality across a large number of projects."

The MIT researchers' tests found that CodePhage was able to repair serious security vulnerabilities in seven common open-source programs, taking between two and 10 minutes per repair and importing functionality from between two and four donor programs each.

How CodePhage works

Software programs can crash when a specific command is given, such as to open a particular file, or when specific data is fed into the program. These commands and data are known as input.

CodePhage works by taking two inputs: one that causes the program to crash and one that works just fine. It then sees how the donor program it is borrowing code from responds to each.

The system first analyses how the donor program deals with the input that works fine. If the program has been written in a secure way, it will perform various checks, such as checking the size of the input.

A real world analogy for this would be when you try to put a big file onto a USB memory stick, and the computer alerts you that the file is too big before trying to transfer it.

CodePhage monitors what sort of checks the donor program performs and records a string of symbols to describe them, then looks at how the donor program responds to the input that is known to cause crashes.
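
As a rough illustration of this recording step, the Python sketch below runs a toy donor program on a working input and a crash-inducing input and logs each check it evaluates as a short string. Everything in it is an assumption made for illustration: the `donor_parse` function, the `MAX_WIDTH` limit and the text format of the log are hypothetical, and the real system observes compiled donor programs rather than instrumented Python.

```python
from typing import List

# Hypothetical limit enforced by the toy donor; not taken from CodePhage.
MAX_WIDTH = 1024


def donor_parse(width: int, trace: List[str]) -> bool:
    """Toy donor: validates one input field and logs every check it evaluates."""
    trace.append(f"width > 0 -> {width > 0}")
    if width <= 0:
        return False
    trace.append(f"width <= {MAX_WIDTH} -> {width <= MAX_WIDTH}")
    if width > MAX_WIDTH:
        return False
    return True


def record_checks(width: int) -> List[str]:
    """Run the donor on one input and return the checks it performed."""
    trace: List[str] = []
    donor_parse(width, trace)
    return trace


# One input that works fine and one that is known to crash the other program.
benign_trace = record_checks(640)
crash_trace = record_checks(1_000_000)

# Checks whose outcome differs between the two runs are candidates to import.
differing = [c for b, c in zip(benign_trace, crash_trace) if b != c]
print(differing)
```

In this toy run the only check whose outcome changes between the two inputs is the upper bound on `width`, which makes it the natural candidate to copy into the program being repaired.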

Copying security check functions

If the system notices that the donor program performs additional or different checks on the crash-inducing input, CodePhage knows that the program it is repairing is missing one of those security checks. It then translates the symbolic expression describing the check into the programming language used by the program it is trying to fix.
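
The translation step can be pictured with a small Python sketch. It assumes, purely for illustration, that a check is stored as a text expression over named input fields and that a mapping from those fields to the recipient program's own variables is already known. The names `translate_check`, `img->w` and `img->h`, the limit of 65536 and the generated C-style guard are all hypothetical and not taken from the researchers' implementation.

```python
import re
from typing import Dict


def translate_check(expression: str, field_to_variable: Dict[str, str]) -> str:
    """Rewrite a check over input fields as a check over recipient variables."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(0)
        # Names without a known mapping are left unchanged.
        return field_to_variable.get(name, name)

    # Identifiers start with a letter or underscore, so numeric literals are skipped.
    return re.sub(r"[A-Za-z_]\w*", substitute, expression)


# Hypothetical symbolic check recorded from a donor, expressed over input fields.
donor_check = "width * height <= 65536"

# Hypothetical mapping from input fields to variables in the program being fixed.
mapping = {"width": "img->w", "height": "img->h"}

recipient_check = translate_check(donor_check, mapping)

# Wrap the translated condition in a guard written in the recipient's language (C here).
guard = f"if (!({recipient_check})) {{ exit(-1); }}"
print(guard)
```

The point of the sketch is only that the check is described independently of either program's code, which is what lets it move between programs written in different languages.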

The additional code is added to the source code of the program being fixed, and then the system tests the crash-inducing input on the program again, to see if the new code has fixed the problem.

If the crash persists, CodePhage keeps studying the functionality of the donor program and creating additional checks until the buggy program is fixed.
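
This insert-and-retest loop can be sketched in a few lines of Python. The sketch is a simplification under stated assumptions: the buggy `recipient` function, the list of candidate checks and the helper names `with_check`, `crashes`, `accepts` and `repair` are all hypothetical, and a check is applied here by wrapping the function rather than by editing source code as the article describes.

```python
from typing import Callable, List, Optional, Tuple

BUFFER_SIZE = 8  # hypothetical fixed buffer size in the toy recipient

Program = Callable[[List[int]], int]
Check = Callable[[List[int]], bool]


def recipient(data: List[int]) -> int:
    """Toy buggy program: copies input into a fixed buffer with no size check."""
    buffer = [0] * BUFFER_SIZE
    for i, value in enumerate(data):
        buffer[i] = value  # raises IndexError when len(data) > BUFFER_SIZE
    return sum(buffer)


def with_check(program: Program, check: Check) -> Program:
    """Return a patched program that rejects input failing the imported check."""

    def patched(data: List[int]) -> int:
        if not check(data):
            raise ValueError("input rejected by imported check")
        return program(data)

    return patched


def crashes(program: Program, data: List[int]) -> bool:
    """A clean rejection (ValueError) is not a crash; any other exception is."""
    try:
        program(data)
    except ValueError:
        return False
    except Exception:
        return True
    return False


def accepts(program: Program, data: List[int]) -> bool:
    """True if the program processes the input without raising anything."""
    try:
        program(data)
    except Exception:
        return False
    return True


def repair(program: Program, candidates: List[Tuple[str, Check]],
           benign: List[int], crashing: List[int]) -> Optional[Tuple[str, Program]]:
    """Try candidate checks in turn until one stops the crash and keeps the benign input working."""
    for name, check in candidates:
        patched = with_check(program, check)
        if not crashes(patched, crashing) and accepts(patched, benign):
            return name, patched
    return None


candidates = [
    ("len(data) > 0", lambda d: len(d) > 0),
    ("len(data) <= 8", lambda d: len(d) <= BUFFER_SIZE),
]
result = repair(recipient, candidates, benign=[1, 2, 3], crashing=list(range(20)))
print(result[0] if result else "no repair found")
```

Here the first candidate check does not stop the crash, so the loop moves on to the second, which rejects the oversized input while still letting the working input through.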

The researchers say that over time, CodePhage will be able to fix more and more bugs as it learns more specifications.

"The longer-term vision is that you never have to write a piece of code that somebody else has written before," said co-author MIT professor of computer science and engineering Martin Rinard.

"The system finds that piece of code and automatically puts it together with whatever pieces of code you need to make your program work."

More information is available in the research paper titled "Automatic Error Elimination by Horizontal Code Transfer across Multiple Applications", which was presented in June at the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation in Portland, Oregon.