Individual consumers are producing the most valuable naturally occurring resource on the planet – data.
Much like oil can be refined into gas, plastics, and many other useful resources, data can be refined into what is most generally referred to as "intelligence". These new intelligence tools are incredibly valuable, useful for taking data that is easy to acquire (such as today's stock price) and transforming it into data that is difficult to acquire (such as tomorrow's stock price).
The supply chain for the production of these intelligence tools starts with the producers of data, and ends with an intelligence that predicts something in the real world. However, this supply chain has one fundamental limitation. In order for these intelligences to be created, the data presently needs to be copied into a centralised location so that a machine learning model can be trained based on the data.
For example, machine learning startups working in healthcare will buy patient data, such as MRI scans, and aggregate it in one location, where a data scientist trains a model that is then sold back to the hospital system. This reflects a broader trend in entrepreneurship wherein firms acquire data, train a model and sell the use of that model in the form of an application.
This fundamentally technological limitation (the need to centralise data to train a machine learning model) has had a significant impact on the control structures of the supply chain: data and intelligence are owned centrally by the supply chain owners (ML startups), not by the producers of data (individuals).
OpenMined, which combines artificial intelligence with homomorphic encryption, multi-party computation and blockchain, is one of a handful of new projects (see also Numerai and Ocean) working at the technological cutting edge to solve this problem, out of a desire to impact the social one.
The project has two goals. First, OpenMined seeks to allow large groups of individuals to train intelligence without ever needing to reveal the contents of their data. Second, it seeks to allow individuals to retain ownership of (and access to cashflows from) the downstream intelligence trained on their data.
Andrew Trask, a leader in the OpenMined community, has been looking at ways to decouple ownership of the data and ownership of the machine learning model (the intelligence). Earlier this year, he blogged about how to train a neural network using homomorphic encryption – a special type of encryption that allows someone to modify the encrypted information in specific ways without being able to read it – thus protecting the data and fundamentally altering the AI ownership power balance.
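To make the idea of modifying encrypted information "in specific ways" concrete, here is a minimal sketch using the Paillier cryptosystem, a well-known additively homomorphic scheme. It is chosen purely for illustration: the article does not say which scheme Trask or OpenMined uses, and the function names and the tiny primes below are assumptions that offer no real security.

```python
import math
import random

def keygen(p=1789, q=1861):
    # Toy primes (an assumption for readability); real keys use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, valid because the generator is g = n + 1
    return n, (lam, mu)   # public key n, private key (lam, mu)

def encrypt(n, m):
    n2 = n * n
    while True:
        # random is NOT cryptographically secure; acceptable only in a toy example.
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the hidden plaintexts.
    return (c1 * c2) % (n * n)

def scale_encrypted(n, c, k):
    # Raising a ciphertext to a plaintext power k multiplies the hidden plaintext by k.
    return pow(c, k, n * n)

n, priv = keygen()
enc_a = encrypt(n, 20)
enc_b = encrypt(n, 22)
enc_sum = add_encrypted(n, enc_a, enc_b)   # computed without the private key
enc_scaled = scale_encrypted(n, enc_a, 3)  # likewise
print(decrypt(n, priv, enc_sum))     # 42
print(decrypt(n, priv, enc_scaled))  # 60
```

Whoever holds only the public key `n` can add and scale the hidden values, but only the holder of the private key can read the results. Additions and multiplications by known constants are the operations a linear model needs.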
Trask said: "The interesting thing about hiding the model in these various different ways is that it gives me the ability to take a model, encrypt it, and send it to you. You can then make this model 'smarter' by training it on your data locally and then send the model back to me. In this way, you can make my model smarter without me ever seeing your data and without you ever seeing (the decrypted version of) my model. It is upon this core piece of functionality that OpenMined is facilitating a marketplace for intelligence that protects privacy."
There are a few rather complex moving parts in the OpenMined model, but where it leads is simple and unprecedented. This combination of technologies will allow anyone to rent access to their data with the potential to create a revenue stream in return. "Now for the first time we are within spitting distance of private machine learning being real, where people could actually reap that level of revenue without ever sacrificing anything (privacy). This is because OpenMined is working toward something fundamentally different than other offerings," said Trask.
The core difference between OpenMined and other projects in the space is nuanced but profound. Other projects are either marketplaces for data (Ocean), which require participants to give up control of their data, or they are marketplaces for models (Numerai, SingularityNet) which capitalise on the idea that an intelligence useful for one party will also be useful for another.
OpenMined is instead what's called a "Gradient Marketplace", wherein individuals can trade intelligence in its purest and most liquid form: gradients, the numerical updates a model derives from training data. Because these gradients are protected via encryption, the marketplace facilitates the construction of very powerful, general AIs without requiring the loss of privacy.
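As a rough illustration of what trading gradients rather than data could look like, the sketch below has each participant compute a gradient on data that never leaves their machine, while the model owner only averages the gradients it receives. This is not OpenMined's code: the function names and toy datasets are hypothetical, the model is a one-weight linear regression, and the encryption layer described in the article is omitted for brevity.

```python
def local_gradient(weights, features, targets):
    """Gradient of mean squared error for a linear model, computed
    entirely on one participant's private data."""
    n = len(features)
    grad = [0.0] * len(weights)
    for x, y in zip(features, targets):
        prediction = sum(w * xi for w, xi in zip(weights, x))
        error = prediction - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * error * xi / n
    return grad

def apply_gradients(weights, gradients, lr):
    """Model owner's step: average the submitted gradients and descend."""
    k = len(gradients)
    averaged = [sum(g[j] for g in gradients) / k for j in range(len(weights))]
    return [w - lr * a for w, a in zip(weights, averaged)]

# Hypothetical private datasets, both following the rule y = 2 * x.
participants = [
    ([[1.0], [2.0]], [2.0, 4.0]),
    ([[3.0]], [6.0]),
]

weights = [0.0]
for _ in range(200):
    # Only gradients cross the wire; raw features and targets stay local.
    grads = [local_gradient(weights, X, y) for X, y in participants]
    weights = apply_gradients(weights, grads, lr=0.05)

print(weights)  # the single weight ends up very close to 2.0
```

The owner's model learns the underlying rule even though it never sees a single data point, only the gradients.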
A new kind of internet
"I think that it's a big enough idea to power a new kind of internet and a new level of expectation of consumers from the products that they want to be able to receive."
Trask said he would love for OpenMined to facilitate a universal basic income-style revenue stream, where people can be compensated for this immensely valuable natural resource they are generating without ever having to give up their privacy. He said the truth is that right now people don't know that their data is that valuable and they don't have the ability to capitalise on it.
"That's one of the things that OpenMined exists to change. The biggest hurdle is not technology, it's actually convincing consumers to do it, to change their ways, to aggregate their data into one spot, and train a model."
There has been some tailwind with healthcare in the US, where every individual consumer has a right to demand access to their data from any healthcare provider. This data becomes significantly more valuable as more information about the same person is aggregated. For instance, your location data can be combined with what food you are eating and whether you are sick or healthy; the more of a 360-degree view, the more amazing things you can do with machine learning, said Trask.
"The problem is that no hospital network wants to give up their most precious asset to another hospital network. What they lack is the technological expertise to aggregate it, and the platform with which to take advantage of it."
In theory, using things like homomorphic encryption or multi-party computation, we can protect the model and send it out into the wild. This paves the way for a whole new wave of products that were previously too personal for anyone to even dabble in, such as training a classifier to predict things like the onset or recurrence of mental illness, or the likelihood an individual might attempt suicide.
An essential tool has presented itself in the blockchain, a type of immutable, shared database with no central governing entity which offers a decentralised way of storing value and providing incentive. In this case, the vision is decentralised ownership of data and intelligence; by pushing the control of the primary resource for AI into a decentralised format, the benefits will propagate in a decentralised manner.
"It's interesting," mused Trask, "we didn't start by looking into blockchain. It surprised a lot of people; like we are not looking to ICO, we are not doing any of the kind of normal blockchain things."
"The core technology for us is machine learning. But blockchain is our way of avoiding an awkward situation. We are interested in decentralisation because we are attempting to aggregate the most valuable digital asset ever.
"We need to implement hard structures that ensure it is owned and leveraged in a decentralised manner, so that every individual person is sort of voting for their own benefit, and maybe of their local community.
"So perhaps 50 people can train a model that does something useful and a whole other marketplace can come; the blockchain can host it and remunerate the individual who owns that model in a fully transparent way."
In terms of blockchain flavours, OpenMined is planning to dual-release its alpha version on both the Ethereum and Tendermint testnets. The platform requires fast transaction throughput, and has formed a useful partnership with Tendermint and its proof-of-stake implementation. OpenMined is using the Ethermint protocol to run Ethereum code (Solidity) on the proof-of-stake Tendermint blockchain.
On the subject of performance, homomorphic encryption is known to be computationally onerous.
Trask said: "We are looking at two technologies – homomorphic encryption and multi-party computation; each better in different scenarios.
"Homomorphic encryption is great if you've got small models; most data scientists use models that are actually pretty small on structured data, like linear models, SVMs, logistic regression, that kind of thing.
"However when you get to the big models, it's a lot more feasible to use something called multi-party computation, which basically trades off the computational complexity for network overhead. We know certain things about our participants and that allows us to have a really low number of people who are doing the sharing in multiparty computation, which allows it to be extremely performant."
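The trade-off Trask describes can be seen in additive secret sharing, one common building block of multi-party computation. In the sketch below, a value is split into random-looking shares held by a small number of parties. Each party does only cheap arithmetic on its own shares, but the shares have to be sent around the network. The modulus, party count and function names are assumptions for illustration, not details of OpenMined's protocol.

```python
import random

PRIME = 2**61 - 1  # public modulus; every share is a number below it

def share(secret, n_parties):
    """Split a secret into n_parties additive shares that sum to it mod PRIME."""
    # random is NOT cryptographically secure; acceptable only in a toy example.
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def add_shared(a_shares, b_shares):
    """Each party adds its two shares locally; no party learns either secret."""
    return [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]

def reconstruct(shares):
    """Only when all shares are combined does the value become visible."""
    return sum(shares) % PRIME

a_shares = share(20, n_parties=3)
b_shares = share(22, n_parties=3)
sum_shares = add_shared(a_shares, b_shares)
print(reconstruct(sum_shares))  # 42
```

Any single share is indistinguishable from a random number. Keeping the number of parties low, as Trask suggests, limits how many messages must cross the network.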
Andrew Trask, founder of the OpenMined project, will be talking about homomorphic encryption and deep learning at Newsweek's AI and data science conference in New York.