Milk: MIT develops programming language to speed up parallel computing for big data

Computer scientists in the US have developed a new programming language for tackling the complexities of big data. In the researchers' tests on several common algorithms, programs written in it ran about four times faster than versions written in existing languages.

Researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a language called Milk that allows application developers to manage memory more efficiently in programs that need to deal with data scattered across large datasets.

At the moment, a computer chip manages its memory on the assumption that if a program needs a chunk of data stored at a particular memory location, it will very likely also need the neighbouring chunks of data.

Unfortunately, big data doesn't work this way. Big data processing is meant to glean useful insights from enormous, often unstructured datasets, and to do so computers have to collect, store and process data, forming connections between widely scattered data points and detecting patterns among them.

This puts a strain on computer chips and slows down big data algorithms, because the chip ends up going back to main memory for one data item at a time.
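To make the cost of that assumption concrete, here is a minimal sketch of a toy cache that holds a single line at a time. The line size, item size and access patterns are illustrative assumptions, not figures from the MIT work, and real caches are far more elaborate.

```python
LINE_SIZE = 64   # bytes fetched from main memory in one go (assumed, typical value)
ITEM_SIZE = 8    # bytes per data item (assumed)


def line_fetches(indices):
    """Count trips to main memory for a toy cache that holds one line at a time."""
    fetches = 0
    current_line = None
    for i in indices:
        line = (i * ITEM_SIZE) // LINE_SIZE   # which line this item lives in
        if line != current_line:              # miss: a new line must be fetched
            fetches += 1
            current_line = line
    return fetches


# The same 64 items, visited in two different orders.
sequential = list(range(64))                    # neighbours, one after another
scattered = [(i * 37) % 64 for i in range(64)]  # jumps around, as big data tends to

print(line_fetches(sequential), line_fetches(scattered))
```

In this model the sequential pass needs 8 fetches, because each fetched line serves eight neighbouring items. The scattered pass needs 64, one for every item, even though it touches exactly the same data.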

Batch processing using multi-core computer processors

Instead of requesting one data item at a time, a Milk program batches its requests. The language adds a few commands to OpenMP, an extension of languages such as C and Fortran that makes it easier to write code for multicore processors.

In a multicore chip, each processor (also known as a "core") has its own cache, which is a small, local, high-speed memory bank. The idea is for developers to use Milk to add a few lines of code around any instruction that requests data.

Rather than fetching the data immediately, a Milk program takes the address of the requested item and adds it to a list kept for the requesting core. Once every core's list is long enough, the cores pool their lists, and Milk groups together the addresses that are close to each other in memory.

The grouped addresses are then redistributed to the cores, so that each core requests from memory only the data items it needs, and the data can be retrieved much more quickly and efficiently.
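The steps above can be sketched roughly as follows. This is a sequential toy model written for illustration, not Milk's actual implementation or syntax: the function names, the batch size and the round-robin assignment of requests to cores are all assumptions.

```python
def pool_and_redistribute(per_core_lists):
    """Pool every core's deferred indices, sort them so that neighbours sit
    together, and hand each core one contiguous slice."""
    pooled = sorted(i for lst in per_core_lists for i in lst)
    n = len(per_core_lists)
    chunk = -(-len(pooled) // n)   # ceiling division
    return [pooled[k * chunk:(k + 1) * chunk] for k in range(n)]


def deferred_sum(data, indices, num_cores=4, batch_size=8):
    """Sum data[i] for each i in indices, deferring and regrouping the accesses.

    The "cores" are simulated one after another; no real parallelism is used.
    """
    per_core = [[] for _ in range(num_cores)]
    total = 0
    for pos, idx in enumerate(indices):
        per_core[pos % num_cores].append(idx)   # defer: record the address only
        if all(len(lst) >= batch_size for lst in per_core):
            for work in pool_and_redistribute(per_core):
                total += sum(data[i] for i in work)   # now touch memory, in order
            per_core = [[] for _ in range(num_cores)]
    # Drain whatever is still waiting on the lists.
    for work in pool_and_redistribute(per_core):
        total += sum(data[i] for i in work)
    return total


data = list(range(100))
indices = [(i * 37) % 100 for i in range(100)]   # scattered access order
print(deferred_sum(data, indices))
```

The result is the same as summing the items in their original scattered order. What changes is the order in which memory is touched: inside each batch, every simulated core walks a sorted, contiguous range of addresses.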

"Many important applications today are data-intensive, but unfortunately, the growing gap in performance between memory and CPU means they do not fully utilise current hardware," said Matei Zaharia, an assistant professor of computer science at Stanford University.

"Milk helps to address this gap by optimizing memory access in common programming constructs. The work combines detailed knowledge about the design of memory controllers with knowledge about compilers to implement good optimisations for current hardware."

The researchers are presenting a paper on the Milk language at the 25th International Conference on Parallel Architectures and Compilation Techniques in Haifa, Israel, held from 11 to 15 September 2016.