Facebook data center: big data is big business (Photo: Jonathan Nackstrand/AFP/Getty Images)

Obviously data-driven investment managers are not going to divulge the secret signals that form the basis of their alpha strategies. But when something is not part of your main business it can help to open source the code, which can then be improved.

These days open sourcing software is a trend that even large hedge funds, such as AHL and AQR in the US, are taking part in.

Saeed Amen, CEO and founder of Cuemacro, is enthusiastic about open source within the big data arena. He has spent over a decade developing algorithmic trading strategies at places like Lehman Brothers and Nomura, and at a number of large hedge funds. He will be joining a panel of other seasoned infrastructure experts at the forthcoming IBT Media Artificial Intelligence and Data Science in Capital Markets event.

Amen said: "From my perspective, working for a small business as opposed to a big bank, I have found it quite enlightening because you don't need to own infrastructure any more, you can just log onto Amazon Web Services; you can easily just get a server and it's something that's available very quickly to do.

"If I think back maybe 10 or 15 years ago, there simply weren't as many tools out there if you wanted to do analysis of markets. Say, for example, you wanted to produce really nice charts to visualise your output; it was a lot more difficult to do that from a programmatic perspective – of course you have things like Excel. But these days there are so many different possibilities for visualisation that I don't have to spend a lot of time actually coding that stuff up. It's already made my life a lot quicker and easier. So to some extent it's helping to level the playing field."

He concedes that hedge funds are always going to have an advantage because they can employ banks of extremely smart people to work on data science challenges full time. But to some extent the playing field is levelling.

"For obvious reasons, particular trading signals from hedge funds and similar organisations remain proprietary because part of their edge is in how they compute that.

"But at the same time I have observed that hedge funds have actually started to open source some parts of their software. Not the signal bit; kind of more the infrastructure-related parts to it."

Examples include AHL, which has open sourced some of its database software for storing time series. There is also AQR in the US, which helped to kick-start "pandas", a Python library for working with time series that began as an internal project before the fund announced it would be open sourced.

"I think there are moves to do some open sourcing, but it's a very particular area where people also see the benefit of doing that. For example, if you open source something which is not the main part of your business, having other people look at your code is actually quite beneficial; they might improve it.

"It's a two way thing. But at the same time you can still retain what you feel is your competitive advantage and keep that in a closed source manner."

Amen spends a lot of time today looking for new and exciting data sources. He was ahead of the curve in this respect, starting to look at what could be called alternative data sources back in 2008. While he was at Lehman he wrote a paper on using Google search data for trading currencies – something which had not been considered by many people at that time.

Over the past few years he has developed this, doing a project with textual data analytics provider RavenPack that looked at how news data can be used to trade FX and bond futures. More recently he did a project for the big financial website Investopedia.

"They [Investopedia] have got their own proprietary data set. So every time you look up something on Google, or Bing or whatever search engine you use, let's say you look up short selling, if that ends up taking you to a link to Investopedia, they then actually collect that data.

"So what they have done is actually created an index based upon anxiety terms like 'short selling', which could be related to investor panic. So basically they can set up a time series of specific search terms that have been used to get access to their website and they have aggregated those terms together in an index.

"My project was basically to try to see how I can extract value from that index from a trading perspective. It got a lot of interested feedback. I wrote a research paper on it."
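To make the idea of aggregating search terms into an index concrete, here is a minimal Python sketch. It is not Investopedia's method, which the article does not describe: the equal-weighted average of standardised series is an assumption, the function names `zscores` and `build_index` are hypothetical, and the visit counts are made up purely for illustration.

```python
from statistics import mean, pstdev


def zscores(series):
    """Standardise one series to mean 0 and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A flat series carries no information, so treat it as all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def build_index(term_counts):
    """Equal-weighted average of per-term z-scores, one value per date.

    Assumption: every term's series covers the same dates in the same order.
    """
    lengths = {len(s) for s in term_counts.values()}
    if len(lengths) != 1:
        raise ValueError("all term series must cover the same dates")
    standardised = [zscores(s) for s in term_counts.values()]
    # zip(*...) regroups the values by date so each date is averaged across terms.
    return [mean(day) for day in zip(*standardised)]


# Hypothetical daily counts of visits arriving via each search term.
term_counts = {
    "short selling": [10, 12, 11, 30, 28],
    "margin call": [5, 5, 6, 15, 14],
    "bankruptcy": [20, 19, 21, 40, 42],
}
index = build_index(term_counts)
```

Standardising each term first stops the most heavily searched term from dominating the index. A real study would then test whether moves in such an index carry any information about later market moves.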

But just looking at alternative datasets will not necessarily unlock any value, added Amen. A robust hypothesis is needed to begin with.

"Let's say I'm looking at the dollar – what sort of factors would move the dollar? From that I would actually expand and say if this certain factor moves the dollar, say interest rates, what potentially would be an influence on interest rates," he said.