The default setting for financial firms today is to hold on to data, like something from Hoarders. Firms have difficulty parting with in-house data because they don't know what is OK to delete. To be on the safe side, especially from a compliance perspective, they just save everything. Given the availability of ever cheaper computing power and storage, the direct cost of following such a strategy is fairly low.
This means internal data accumulates fast these days. It spans everything from instant messages sent via Bloomberg, Slack, Symphony or Skype to legal documents such as business contracts, litigation or regulatory filings. It could also be internal analyst newsletters, investment notes, surveys or reports, not forgetting email, where you are likely to have some valuable discussions buried in the morass of your inbox.
Big data analytics company RavenPack helps firms deal with their data hoarding issues. Specialising in text, RavenPack turns large unstructured datasets into structured, machine-readable content. This creates data that can be easily analysed, manipulated and deployed in financial applications focused on alpha creation, risk management, compliance and so on. RavenPack's clients include some of the most sophisticated quantitative hedge funds and asset managers in the world.
Peter Hafez, Chief Data Scientist at RavenPack, said: "Today, some data-driven hedge funds are evaluating more than 1,000 new datasets per year. The main challenge for them is to figure out what datasets to really dig deep into and focus on. They look for easy ways to structure new content, to identify which datasets hold the most promise when it comes to alpha generation. They're always looking for ways to optimise resources that will help them consume even more data than before."
RavenPack normalises internal content, using semantic intelligence to structure information. "To help make sense of it all, it's useful to apply an ontology as part of the process. This can help you better understand what information is relevant, for instance to a company. Is the company or one of its subsidiaries part of a corporate action, are people talking about their products, or is there perhaps news about its suppliers, customers, or competitors?" explains Hafez.
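To make the idea of an ontology concrete, here is a minimal sketch of how related names might be tied back to a single company. It is not RavenPack's implementation: the company names, the relationship labels and the helper functions `build_alias_index` and `find_links` are all hypothetical, and the naive substring matching stands in for the far more careful entity detection a real system would need.

```python
# Hypothetical toy ontology: each relationship type lists names linked to one company.
ONTOLOGY = {
    "Acme Corp": {
        "subsidiary": ["Acme Logistics"],
        "product": ["RoadRunner X"],
        "supplier": ["Bolt Metals"],
        "competitor": ["Globex"],
    },
}


def build_alias_index(ontology):
    """Map each lower-cased name to the (company, relationship) it belongs to."""
    index = {}
    for company, relations in ontology.items():
        index[company.lower()] = (company, "self")
        for relationship, names in relations.items():
            for name in names:
                index[name.lower()] = (company, relationship)
    return index


def find_links(text, index):
    """Return sorted (company, relationship, alias) triples for names found in text.

    Assumption: plain case-insensitive substring matching, for illustration only.
    """
    lowered = text.lower()
    return sorted(
        (company, relationship, alias)
        for alias, (company, relationship) in index.items()
        if alias in lowered
    )


index = build_alias_index(ONTOLOGY)
note = "Bolt Metals warned of delays affecting the RoadRunner X launch."
for company, relationship, alias in find_links(note, index):
    print(f"{company}: mentioned via {relationship} '{alias}'")
```

In this sketch a note that never names the company itself is still linked to it, because it mentions one of its suppliers and one of its products.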
However, it is not enough simply to detect or understand the links between entities. "You also need to understand the context." RavenPack works with an elaborate event taxonomy that is used to detect and classify thousands of actionable events, including anything from lawsuits or product recalls to supply disruptions or civil unrest. "Context is key! Understanding the novelty and relevance of an event can help you make stronger return predictions. Other important features include the sentiment and temporal nature of the event, i.e. is the information positive or negative; is the event taking place now, as opposed to in the future or in the past?"
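The dimensions Hafez lists can be pictured as fields on a structured event record. The sketch below is an illustration only: the `Event` class, the `Tense` values, the score ranges and the thresholds in `actionable` are assumptions made for this example, not RavenPack's taxonomy or data schema.

```python
from dataclasses import dataclass
from enum import Enum


class Tense(Enum):
    """Temporal nature of an event (hypothetical labels)."""
    PAST = "past"
    PRESENT = "present"
    FUTURE = "future"


@dataclass(frozen=True)
class Event:
    entity: str       # company the event is about
    category: str     # e.g. "lawsuit", "product-recall" (illustrative labels)
    sentiment: float  # assumed range: -1.0 (negative) to 1.0 (positive)
    novelty: float    # assumed range: 0.0 (stale) to 1.0 (first report)
    relevance: float  # assumed range: 0.0 (passing mention) to 1.0 (main subject)
    tense: Tense


def actionable(events, min_novelty=0.7, min_relevance=0.8):
    """Keep events that are new, clearly about the entity, and happening now."""
    return [
        e for e in events
        if e.novelty >= min_novelty
        and e.relevance >= min_relevance
        and e.tense is Tense.PRESENT
    ]


events = [
    Event("Acme Corp", "product-recall", -0.7, 0.9, 0.95, Tense.PRESENT),
    Event("Acme Corp", "lawsuit", -0.4, 0.2, 0.9, Tense.PRESENT),
    Event("Globex", "supply-disruption", -0.6, 0.8, 0.85, Tense.FUTURE),
]
for e in actionable(events):
    print(e.entity, e.category, e.sentiment)
```

Here the lawsuit is dropped as old news and the supply disruption as not yet under way, leaving only the product recall.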
Once these dimensions are in place, data held in-house is much easier to search with a view to extracting value from it. It may be that a firm's internal data diverges from what external data is saying, and this could provide a competitive advantage. "These days, hedge funds and trading operations are searching out some obscure and unique dataset and taking advantage of that, but it's easy to overlook valuable information you may already have in house", notes Hafez.
"We have had clients come to us, discretionary traders, who have portfolios of 20 or 30 companies. What really bugs them when they do their analysis is thinking there may be some information they are missing and it's actually sitting right there in the inbox," he said.
"They know there is a lot of potentially valuable content there like analyst reports coming in. They just haven't had time to look at it all. Instead, they turn to RavenPack for help on what reports or what emails they should read before making a particular set of trades. Being able to highlight the ten most important emails can create a contrarian view to the trade, potentially making or saving the client millions."