Big data is big business. Take, for example, the type of data that credit card companies have at their disposal. Access to Visa's database would essentially provide a real time window on the entire economy.
This raises interesting questions about how, when and if certain data should be used. Visa makes billions and billions from its core business and while it could doubtless make more money from its data, it's probably not worth the risk.
Tammer Kamel, founder and CEO of Quandl, a technology company that provides a wide range of data sets to financial institutions, said Visa is rightly paranoid about the sort of headlines that imprudent monetisation of transaction data could generate.
"Imagine a headline saying: 'Visa selling your information to greedy hedge funds on Wall Street'. They know there is money to be made by extracting insights from the data, but the risk/return profile is scary. Sure they can maybe make an extra $50m, $100m or even half a billion dollars, but that's still small compared to their core revenue and going through that process puts their reputation at risk. This is something these guys are wrestling with right now."
Quandl has over 100,000 users, including many data scientists from banks and hedge funds, and over the years Kamel has gathered plenty of insight into the data business. He also pointed out that card providers and payment processors are paid a lot of money by major retailers like Walmart. If card providers were to turn around and let the outside world infer how sales were going at Walmart, that would be a huge betrayal.
"Wall Street would love to see all the credit card transactions that are happening at Walmart and work out how Walmart is doing. But if Visa ever released that, Walmart would be livid," he said.
Quandl provides well-structured data, the sort analysts like, via its API, so the data is available independently of other software or hardware such as Bloomberg terminals. This means users can easily run their own machine learning analytics on it. Quandl's data sets cover everything from alcohol consumption per capita in Albania to building permits across the US.
"We have a big data set that is essentially all the building permit data in the US, broken down by regions and cities and counties, and the sort of building permits people are applying for – is it residential, is it commercial, are they putting in elevators, are they putting in pools.
"A nice, big, hairy data set, and if you analyse it you can learn all kinds of interesting things about what's happening in the economy right now, because of course construction is a very important part of the economy."
Financial institutions are constantly searching for new alternative data sources; once the rest of the market gets hold of something exotic that works, it quickly gets priced in and will no longer generate alpha. One extremely nascent, big and potentially very hairy flow of data could be captured from the Internet of Things (IoT), which promises billions of extra devices connected to the internet by 2020, endlessly "talking" to each other.
Regarding IoT data, Kamel said he is "super-intrigued" and watchful, but again he cited ownership issues. For example, in the Midwest many farms have smart tractors or sensors in the ground feeding back data about the quality of their soil.
"The sensors send data to a server, and that server does some analysis and goes back to the farmer and says, you need to use this kind of fertiliser in these areas and at these quantities to improve your soil quality and meet your yields.
"The point is, these companies are getting this data from all over the Midwest and implicitly they know the quality of all the fields that are growing corn this year. This means they have a good sense of what the corn yield will be like this year – which of course Wall Street investors would love to know.
"It's a beautiful IoT data set, but extracting the value is a challenge because who owns that data? Technically it's the farmers' private information."
In a situation like this, the party sourcing the data would need to work through practical hurdles, such as how to compensate all the farmers. "I think with IoT data, while it seems so exciting and definitely has all this potential, there are challenges with actually bringing it to market."
Kamel explained there are some clever ways to address issues of ownership and privacy. One of the data sets Quandl sources comes from email, based on a partnership it has with a technology company. "That essentially gives us access to three million people's emails. The company gives away its inbox management software, and in return people opt-in to allow this company to essentially read their emails. Now that might sound a bit evil at first, but what this company does is very responsible; they anonymise and aggregate everything."
Users of this data don't care about any one person's email, only about how many of these three million people are shopping at eBay, Best Buy or Amazon, how much they are spending and what they are buying.
"You pull that kind of information out and you can actually draw really interesting conclusions about consumer spending patterns and indeed about how Amazon is doing, right now," said Kamel.
With respect to individuals' privacy, this anonymised processing is a much less invasive use of email data than the type of analytics performed by Google, which reads all your emails to figure out who you are and what you like, so that you can be personally followed around the internet and shown adverts.
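To make the "anonymise and aggregate" idea described above concrete, here is a minimal Python sketch. It is not the partner company's actual pipeline, which the article does not describe. The function name `anonymise_and_aggregate`, the record layout, the salted hash and the `min_users` threshold are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def anonymise_and_aggregate(receipts, salt, min_users=2):
    """Hypothetical sketch: turn per-person receipts into per-merchant totals.

    receipts: iterable of (email, merchant, amount) tuples (assumed layout).
    salt: secret string mixed into the hash so tokens cannot be looked up.
    min_users: merchants with fewer distinct shoppers are dropped, so a
               single person's spending never appears in the output.
    """
    total_spend = defaultdict(float)
    shoppers = defaultdict(set)

    for email, merchant, amount in receipts:
        # Replace the address with a salted hash; only the token is kept,
        # and only to count distinct shoppers per merchant.
        token = hashlib.sha256((salt + email).encode("utf-8")).hexdigest()
        shoppers[merchant].add(token)
        total_spend[merchant] += amount

    # Only aggregate figures leave this function; no tokens or emails.
    return {
        merchant: {
            "shoppers": len(shoppers[merchant]),
            "total_spend": round(total_spend[merchant], 2),
        }
        for merchant in total_spend
        if len(shoppers[merchant]) >= min_users
    }


if __name__ == "__main__":
    sample = [
        ("a@example.com", "Amazon", 20.0),
        ("b@example.com", "Amazon", 30.5),
        ("a@example.com", "eBay", 5.0),
    ]
    print(anonymise_and_aggregate(sample, salt="demo-salt"))
```

In this sketch, the analyst only ever sees merchant-level counts and totals. The minimum-shopper threshold is one simple way to keep small groups from revealing an individual; a real system would likely need stronger safeguards.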