A commonly held view of hedge funds is of secretive organisations that jealously guard the tools that make them money. Contrary to this is the trend among certain firms to open source their software and invite collaboration from the developer community.
Among the firms that have blazed a trail in open sourcing this sort of technology are AQR, which kick-started the Pandas library project, and Man AHL, which has open-sourced its Arctic data storage system.
Arctic powers Man AHL's vast financial market data store and is built on top of the open-source NoSQL database MongoDB. The Arctic codebase was made available on GitHub back in 2015.
Gary Collier, Co-CTO of Man AHL, is taking part in a session on open source technology at Newsweek/IBT's forthcoming Data Science in Capital Markets event in London, alongside Wes McKinney, who created Pandas.
Collier, who holds a Masters in Theoretical Physics from Cambridge, has been a technology hobbyist since the age of 11. He started building software commercially at age 21, and has been in a technology management role for the past few years at Man AHL, which has about 60 technologists. These technologists cover a range of front office duties which include data capture and distribution platforms, continual evolution of Man AHL's in-house Python quantitative research platform, and the production trading platform which trades silently and automatically 24 hours a day. In addition to building the tools and platform, around half of the technology team work directly alongside the quantitative researchers on model development.
Collier remembers the days when there was a lot of suspicion from corporates about even using open source software. But over time there has been a rising tide of open source code consumption and it has become vital to engage with the developer community in this way – not least because an open source ethos attracts the brightest developer talent to the fund.
"Open sourcing things is really one way of raising a virtual flag above the office which is saying, the ideals that you as a brilliant developer hold dear – community engagement, openness, collaboration – are also ideals that our business holds dear," said Collier.
"In contrast to some in the quant trading space, internally Man AHL is somewhat different. It's not an organisation comprised of secretive silos, where an individual won't talk to the individual next to them for fear of giving away their secret sauce.
"So the external presentation that we are giving with community engagement and openness also reflects the internals of Man AHL where the environment is highly collaborative and research is shared among members of the team, with technologists and quantitative researchers working together on the same code base."
The American data scientist Wes McKinney pioneered the open sourcing of hedge fund software at AQR Capital Management. The code he began there became the Pandas package for data analysis in the Python programming language.
"Wes has been very influential with his Pandas library and he has been really a pillar figure in the Python data science community," said Collier. "And if we take a few steps back, we see a whole ecosystem of open source databases, messaging systems, analytics libraries, operating systems – even that's just a very small part of the tidal wave of progression of open source as a whole."
While Pandas is all about handling frames of data in Python, Arctic is a storage layer designed to persist that type of data very efficiently. "It allows people who want to obtain that type of data and perhaps perform parallel computation across a cluster, a very efficient means for them to extract that and perform analysis, run back-tests, risk models etc," said Collier.
"Just like the apps that we see on our phones nowadays and the websites we visit, complex data analysis systems are often constructed by taking together small, high quality libraries that do a particular thing and do it well, and joining them up in innovative ways.
"So if you think about the kind of functionality that Pandas gives you and the functionality that Arctic gives you, and then you say okay, we are going to layer Arctic on top of MongoDB, which is – an open source database.
"Pandas on its own gives you the ability to crunch frames of data to perform analysis on them, but it's not designed as a means of storing, say, 20 years' worth of tick data that correspond to maybe two trillion data points.
"So that's three freely available things there that immediately bootstrap you to do data science and analysis work that would be completely unthinkable a few years ago; you would be spending a lot of money on proprietary closed-source packages."
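To make the division of labour Collier describes a little more concrete, here is a minimal sketch of the Pandas side of the stack. It is an illustration under stated assumptions, not Man AHL's code: the tick data is synthetic, the symbol name `EXAMPLE` is made up, and the `store` dictionary is only a hypothetical in-memory stand-in for a storage layer such as Arctic, which in practice would persist the frame to MongoDB.

```python
import numpy as np
import pandas as pd

# Synthetic tick data: one trade per second for ten minutes (made-up prices).
rng = np.random.default_rng(seed=0)
index = pd.date_range("2016-01-04 09:00:00", periods=600, freq="s")
ticks = pd.DataFrame(
    {
        "price": 100 + rng.normal(0, 0.05, size=600).cumsum(),
        "size": rng.integers(1, 100, size=600),
    },
    index=index,
)

# Hypothetical stand-in for a storage layer: a plain dict keyed by symbol.
# A real store would write the frame to a database rather than keep it in memory.
store = {}
store["EXAMPLE"] = ticks

# Read the frame back and analyse it with pandas:
# one-minute open/high/low/close bars plus traded volume.
frame = store["EXAMPLE"]
bars = frame["price"].resample("1min").ohlc()
bars["volume"] = frame["size"].resample("1min").sum()

print(bars.head())
```

The point of the sketch is the separation of concerns: the storage layer only has to hand back a frame for a given symbol, and everything after that is ordinary Pandas analysis.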