Quandl is a professional supplier of financial, economic and alternative data to a select clientele of hedge funds, asset managers and investment banks. A potentially benchmark-beating trading strategy is only as good as its data, and as computing power grows, many hedge funds and asset managers are scoping out a wide array of alternative data.
Alternative data coveted by investors might include: satellite imagery; shipping and supply chain metrics; GPS data, which could relate to shopping habits or to commercial vehicles and vessels; data used within or generated by the sensors of connected cars; and emailed receipts from ecommerce transactions.
Quandl has expertise in all of the above. Asked what's currently hot on the alternative data front, CEO Tammer Kamel says there's pretty much always interesting new stuff coming from the web.
He said: "A firm we are working with is watching sites like Instagram to measure product popularity and brand sentiment. So let's say both Nike and Adidas are showing off their new running shoe on Instagram, you can kind of look at how that community is responding, with number of likes and shares and all this kind of thing, and it becomes a proxy for the success of a particular marketing campaign that one of these firms are doing.
"Another neat one is a partnership with a firm that builds the technology that powers certain travel websites - actually many travel websites - and as a by-product of that, these guys have 'lookthrough' into the transactions that are occurring on these websites, specifically with respect to hotel bookings.
"This translates to a fairly accurate measure of how various hotel chains are doing, kind of in real time. We can see exactly how many bookings they're getting; how does that compare to last year, how does that compare to their competitors - and of course that's proving valuable for investors in that particular domain."
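The two comparisons Kamel describes, against last year and against competitors, come down to simple arithmetic once bookings are counted per chain. A minimal sketch follows; the chain names and booking counts are made up for illustration, and the function names are hypothetical rather than anything Quandl or its partner is known to use.

```python
def yoy_growth(current, previous):
    """Fractional year-over-year change per chain.

    Chains with no (or zero) prior-year bookings are skipped,
    because the ratio is undefined for them.
    """
    return {
        chain: (current[chain] - previous[chain]) / previous[chain]
        for chain in current
        if previous.get(chain)
    }


def market_share(bookings):
    """Each chain's share of total bookings in the same period."""
    total = sum(bookings.values())
    if total == 0:
        return {}
    return {chain: n / total for chain, n in bookings.items()}


# Hypothetical weekly booking counts, this year and the same week last year.
this_year = {"Chain A": 120, "Chain B": 90}
last_year = {"Chain A": 100, "Chain B": 100}

growth = yoy_growth(this_year, last_year)
share = market_share(this_year)
```

In this made-up example Chain A is up on last year while Chain B is down, and the share figures show how the two compare with each other in the current period.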
Firms considering monetising the "exhaust data" they create as a by-product of their core business might be concerned about privacy and the regulations around divulging personal information. However, while hedge funds may be hungry for data, they are interested only in its aggregated form; they don't care what individuals are up to, which spares them the creepiness associated with the likes of Google.
Kamel said: "One nice thing about our customer base is that there's really no temptation to include personal information in data we give our customers. Our customers are really not interested in any individual person's behaviour or what they are doing or what they like or what they don't like.
"At the end of the day they have to decide whether they should buy a particular stock and they want to know what the 18-35 year-old demographic is doing with respect to their product. It's aggregates that matter to these people; they couldn't give a damn what one individual is doing.
"I contrast this with stuff you hear about PPI, and in respect of the marketing world and the ad-tech world, where of course it's so invasive. People feel kind of icky and creepy because Google knows exactly what brand of underwear they are buying."
There's plenty of heavy lifting involved in transforming data assets into quantified, actionable insights for investors. And some datasets are gnarlier than others. "In terms of gnarliness, the most is probably event driven location data, when you get these random observations about this person or more specifically this telephone was at this location at this time," said Kamel.
"You have to figure out what's the margin of error on that location, what's near that location, because ultimately you want to use this stuff to figure out how many people are shopping at Walmart or something.
"This means a lot of work has to be done on the raw data. For example, you have to superimpose a map; just because I know a phone was at a certain GPS location, I have to figure where that really is and what is the margin of error."
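One way to picture the step Kamel describes is a geofence test that takes each ping's margin of error into account. The sketch below is an illustration only, not Quandl's method: it assumes each ping carries a device ID, a latitude, a longitude and an accuracy radius in metres, and it models a store as a circle around a centre point. The field names (`device_id`, `accuracy_m`, `radius_m`) and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def classify_ping(ping, store):
    """Return 'inside', 'outside' or 'uncertain' for one ping and one store."""
    d = haversine_m(ping["lat"], ping["lon"], store["lat"], store["lon"])
    if d + ping["accuracy_m"] <= store["radius_m"]:
        return "inside"     # the whole error circle lies within the store
    if d - ping["accuracy_m"] > store["radius_m"]:
        return "outside"    # the whole error circle lies beyond the store
    return "uncertain"      # the error circle straddles the boundary


def count_devices_inside(pings, store):
    """Number of distinct devices with at least one confidently-inside ping."""
    return len({p["device_id"] for p in pings
                if classify_ping(p, store) == "inside"})


# Hypothetical store footprint and pings.
store = {"lat": 0.0, "lon": 0.0, "radius_m": 100.0}
pings = [
    {"device_id": "a", "lat": 0.0, "lon": 0.0, "accuracy_m": 10.0},
    {"device_id": "a", "lat": 0.0005, "lon": 0.0, "accuracy_m": 20.0},
    {"device_id": "b", "lat": 0.0, "lon": 0.001, "accuracy_m": 5.0},
    {"device_id": "c", "lat": 0.0, "lon": 0.001, "accuracy_m": 50.0},
]
visitors = count_devices_inside(pings, store)
```

Counting only the confidently-inside pings gives a conservative estimate. How to treat the "uncertain" ones, and how to handle real building outlines rather than circles, is where much of the heavy lifting would lie.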
Kamel said a typical example of messy, noisy or erroneous data is the automatic identification system (AIS), which is used to track ships at sea.
"You have all these commercial vessels sending out pings to satellites about where they are and where they are going. But they are notoriously sloppy in the way they do it, which creates all kinds of cleaning problems. The information comes from the bridge of the ship where the captain types it in, or doesn't type it in, or types it in incorrectly, or types it in deceptively."
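A small part of that cleaning problem can be sketched in code. The example below is a simplified illustration under stated assumptions, not a description of Quandl's pipeline: it assumes one vessel's pings have already been parsed into dictionaries with a timestamp in seconds (`ts`), a latitude and a longitude, and it drops positions that are out of range or that would require an implausible speed to reach from the last accepted position. The 50-knot ceiling is an arbitrary illustrative threshold.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # mean Earth radius, metres
KNOTS_PER_MPS = 1.943844       # knots per metre-per-second


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def clean_track(pings, max_speed_knots=50.0):
    """Return one vessel's plausible pings in time order."""
    kept = []
    for p in sorted(pings, key=lambda q: q["ts"]):
        # Reject coordinates that cannot be real positions.
        if not (-90 <= p["lat"] <= 90 and -180 <= p["lon"] <= 180):
            continue
        if kept:
            prev = kept[-1]
            dt = p["ts"] - prev["ts"]
            if dt <= 0:
                continue  # duplicate timestamp
            dist = haversine_m(prev["lat"], prev["lon"], p["lat"], p["lon"])
            if dist / dt * KNOTS_PER_MPS > max_speed_knots:
                continue  # jump too large for the elapsed time
        kept.append(p)
    return kept


# Hypothetical track with one impossible jump and one invalid latitude.
track = [
    {"ts": 0, "lat": 0.0, "lon": 0.0},
    {"ts": 600, "lat": 0.0, "lon": 0.01},
    {"ts": 1200, "lat": 10.0, "lon": 10.0},
    {"ts": 1800, "lat": 91.0, "lon": 0.0},
    {"ts": 2400, "lat": 0.0, "lon": 0.02},
]
cleaned = clean_track(track)
```

This filter trusts the first valid ping, so a bad opening position would cause good later ones to be rejected. It also covers only position data, not the manually entered fields Kamel mentions, such as a destination typed in wrongly or deceptively.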