It makes sense for large technology companies like Google and Microsoft to open source AI and machine learning solutions because they have overlapping vertical interests in providing vast cloud services. These come into play when a certain machine learning library becomes popular and users deploy it on the cloud. It is less clear why financial services companies, which play a much more directly correlated zero-sum game, would open up code that they paid their engineering teams to create.
It's interesting that hedge funds, traditionally thought to be the most secretive of financial institutions, have been proactive in pushing an open source software agenda. AQR Capital Management was probably patient zero when it came to opening up its code around data storage – and this move, shepherded by software engineer Wes McKinney, kickstarted the popular Pandas library project. McKinney has now returned to open source work at Two Sigma. We have also seen open source data storage offerings coming out of Man AHL in the form of Arctic.
Taking part in a panel on open source infrastructure, McKinney said investment in an open source project yields dividends later: data storage underlies other verticals, and when other people use the software and build libraries on top of it, that makes in-house systems more compatible.
McKinney pointed to the problems of scaling data science. He said the myriad difficulties that emerge when curating the many types of data available, plus the inefficiencies encountered when programming against APIs and other barriers to individual productivity, are comparable to death by 1,000 cuts.
"With Pandas we really obsessed about how to allow the cleaning of data in a systematic way," he said.
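As a rough illustration of what systematic data cleaning can look like in Pandas, here is a minimal sketch. The column names, the made-up values and the `clean_prices` helper are hypothetical and were not shown at the event.

```python
import pandas as pd

# Hypothetical raw data with common defects: a duplicated row,
# a missing value, and numbers stored as strings.
raw = pd.DataFrame(
    {
        "date": ["2020-01-03", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-06"],
        "close": [None, "100.0", "101.5", "101.5", "99.0"],
    }
)


def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning steps, in the same order, to any raw frame."""
    out = df.drop_duplicates(subset="date").copy()  # drop repeated dates
    out["date"] = pd.to_datetime(out["date"])  # parse strings to timestamps
    out["close"] = pd.to_numeric(out["close"], errors="coerce")  # strings to floats
    out = out.set_index("date").sort_index()  # chronological order
    out["close"] = out["close"].ffill()  # carry the last known value forward
    return out


clean = clean_prices(raw)
print(clean)
```

Keeping the steps in one function means every dataset goes through an identical, repeatable pipeline rather than ad hoc fixes. Forward-filling is only one possible choice for gaps.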
Saeed Amen, founder and CEO, Cuemacro, demonstrated some of the Python code he has uploaded to GitHub: finmarketpy, chartpy and findatapy. This included a nice animated graph depicting how the market assimilated the build-up to and aftermath of last year's referendum, which he referred to as "Brexit the movie".
Also on the panel was Man AHL CTO Gary Collier, who was the architect of the Arctic open source project. "It's like nailing colours to the mast of the organisation. In the war for the technology talent there is a high correlation between the best developers and open source. Also, functionality is key; it's the way you wire up your code with your data sources. There is a lot less intrinsic value in the actual code than some people think."
Chairing the panel, Paul Bilokon, founder and CEO, Thalesians, asked how open source works in academia compared with industry.
Viral Shah, CEO and founder, Julia Computing Inc, said that in academia it's normal to go on forums and ask questions, whereas within industry there's always the fear you might be giving away your IP. "It's a delicate balance. For industry the focus is on robustness and stability, whereas in academia it's all about experimentation."
McKinney said you can mitigate these tensions to some degree with testing standards and contribution guidelines, especially if you are changing something that might affect an industry user. "In the Apache Software Foundation a key thing is the set of principles that ensures decisions are made on consensus," he said. "Open source can be killed by back room decision making. You must do it in public. This also makes it difficult for some vendor to come and start to exert influence."
The panel were asked how best to sustain an open source community. Where software is the core business, projects often aim for a consultancy business model, but in other cases engineering time dedicated by a company can provide food and water for an open source project.
McKinney feels passionate about this. He said: "I feel a compulsion not to let open source projects die. But without sponsorship it can become hard to sustain. So when commercials ask me how they can help, I say sponsor an individual – to triage issues, do patches; that goes a long way.
"There are hidden costs of being overburdened with maintenance and as a project succeeds you can become a victim of its success. The project exceeds its bandwidth and the cost is innovation."
Later in the day a fintech panel was chaired by Dr Tristan Fletcher, formerly of Thought Machine. He started by pointing out that the FCA received 40 applications for robo-advisors this year, and asked whether robo-advice was expected to live up to its promise to disrupt the wealth sector.
Hazel Moore OBE, co-founder and chair of First Capital, said yes because humans can no longer process the scale of data available. Regarding robo-advice uptake, she said: "For individuals it is still a fraction of a percentage point, and right now it's pretty simplistic: fit your profile with a set of funds."
Richard Craib, CEO, Numerai, said there was perhaps some confusion between intelligence and automation where robo-advice was concerned. "Is there training data, that you would use in an AI?"
Mike Baliman, founder, London Fintech Podcast, questioned the current buzzword – "regtech". He pointed out that there is a certain class of problems, for example pensions mis-selling, where it would be impossible to predict what would come about after the fact in the likes of the Pensions Review or PPI.
Hazel Moore argued that there are some obvious candidate regtech applications, such as KYC or real-time fraud detection, but agreed it was not a panacea.
Finally, the panel turned to the question of hype and expectation. Craib said he gets worried by over-ebullient VCs. "They'll say to you – 'I understand this at a high level', which means they don't actually understand it."
Dr Fletcher said the main offenders are the media. "They drive this hype cycle. It's as if they are warning us about overcrowding on Mars – we are not even there yet."
Presumably those same journalists are closely followed by conference organisers.