Data scientists can provide firms with access to a magical world of machine learning and all that it promises. But what demarcates this hallowed realm of statistical analysis and its proponents? This question was posed to a panel of experts in the field at Money 20/20 Europe in Copenhagen.
Jay van Zyle of Innosect, moderating, asked whether a data scientist is now just a computer person who went on a statistics course, or conversely a stats person who went on a computer programming course – or is it an entirely new discipline?
Marco Bressan, chief data scientist, BBVA, provided an elegant analogy to illustrate the plight of the data scientist: "One simple way is to look at data scientists doing machine learning as farmers. While traditional software developers you could look at more like manufacturers.
"Traditional software developers would put modules together, and that would come out as one machine. While farmers would put in some kind of seeds with no particular knowledge of what would come out of it – and the soil on which data scientists put those seeds is basically the data. So if traditional software was having some kind of software built to output data, machine learning would be putting data into some kind of software that outputs another piece of software."
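Bressan's analogy can be sketched in a few lines of toy code (the scenario and function names here are mine, purely illustrative): a "manufactured" rule is written by hand, while a "farmed" rule is grown from data – and the output of the learning step is itself a new piece of software.

```python
# Hand-built "manufactured" rule: a developer hard-codes the threshold.
def manufactured_rule(amount):
    return "flag" if amount > 1000 else "ok"

# "Farmed" rule: learn the threshold from labelled examples (the soil).
def learn_rule(examples):
    # examples: list of (amount, label) pairs; put the threshold midway
    # between the largest "ok" amount and the smallest "flag" amount.
    oks = [a for a, lbl in examples if lbl == "ok"]
    flags = [a for a, lbl in examples if lbl == "flag"]
    threshold = (max(oks) + min(flags)) / 2
    # the output is itself a piece of software: a new function
    return lambda amount: "flag" if amount > threshold else "ok"

data = [(10, "ok"), (450, "ok"), (900, "ok"), (1500, "flag"), (5000, "flag")]
learned_rule = learn_rule(data)
```

The point of the sketch is only the shape of the workflow: nobody typed the learned threshold in; it came out of the data.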
Extending that conceit, Nuno Sebastião, CEO, Feedzai, said there have definitely been lots of construction guys trying to be farmers, and a lot of people retraining themselves in machine learning who six months ago may have been a database administrator or some such.
He added: "It's not the Holy Grail, the answer to all your problems. There are a number of tasks it can be applied to; risk is the one I addressed, but there are others – marketing comes to mind. Today you can have a ton of models running at the same time, each of them working on a subset of data and making a contribution towards a decision."
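The "ton of models, each contributing towards a decision" pattern Sebastião describes is essentially an ensemble. A minimal sketch, with entirely made-up scorers and weights (this is not Feedzai's actual system), might look like:

```python
# Each small model scores one aspect of a transaction on a 0-1 risk scale.
def device_model(txn):
    return 0.9 if txn["new_device"] else 0.1      # device-history score

def velocity_model(txn):
    return min(txn["txns_last_hour"] / 10, 1.0)   # spending-velocity score

def geo_model(txn):
    return 0.8 if txn["country_mismatch"] else 0.2  # geography score

def decide(txn, threshold=0.5):
    # combine the individual contributions into one decision
    scores = [device_model(txn), velocity_model(txn), geo_model(txn)]
    risk = sum(scores) / len(scores)              # simple average vote
    return "review" if risk > threshold else "approve"

txn = {"new_device": True, "txns_last_hour": 8, "country_mismatch": False}
```

In production such ensembles typically use learned models and learned combination weights rather than hand-set rules, but the structure – many narrow scorers feeding one decision – is the same.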
But Sebastião warned of the complexity involved in implementing machine learning and meeting the challenges of an "omnidata" structure. "You might be buying something online then returning it in a physical store, or vice-versa, using two different credit cards, one for the purchase and the other one for the chargeback – whatever. How do you make sense of it? Historically that would have been impossible, but today we can.
"It's not so much that there are new machine learning techniques (some of the algos we are using have been around for 40, 50 years); it's more about the way that you can process and manage all of that data – which until recently only a few organisations could make sense of."
Amy Lenander, chief marketing officer, Capital One UK, said: "I think about machine learning as statistics on steroids. We use many of the same principles that you would have used 20 years ago in statistics, we just have better computing power to throw behind it. The key rules of statistics still apply: your data needs to be clean, so that you can actually get great insights out of it."
Martina King, CEO, Featurespace, pointed to an adaptive element of self-learning that has moved us beyond rule-based systems. "Particularly with the payments and banking world you have seen working systems that have been rules based. Somebody will write a rule, we will take a slice of data, we will learn something about that and we will apply that to the whole of the rest of the dataset.
"But our datasets are so gigantic now that taking a thin slice of it and applying that to everything is not necessarily giving us the right results."
King said a benefit of adaptive systems is that models don't degrade over time, "which is really quite revolutionary, especially in the payments world; people tell us constantly that their models are degrading".
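The contrast King draws can be illustrated with a deliberately tiny sketch (my own, not Featurespace's method): a static rule keeps its original threshold forever, while an adaptive model updates its estimate of "normal" with every new observation, so it follows drift in the data instead of degrading.

```python
class StaticRule:
    """A fixed rule: degrades as spending patterns drift."""
    def __init__(self, threshold):
        self.threshold = threshold

    def is_anomalous(self, amount):
        return amount > self.threshold

class AdaptiveModel:
    """A self-learning baseline: updates on every observation."""
    def __init__(self, mean, alpha=0.1):
        self.mean = mean            # running estimate of typical spend
        self.alpha = alpha          # learning rate

    def is_anomalous(self, amount):
        anomalous = amount > 2 * self.mean
        # update the estimate with every observation (self-learning)
        self.mean = (1 - self.alpha) * self.mean + self.alpha * amount
        return anomalous

static = StaticRule(threshold=200)
adaptive = AdaptiveModel(mean=100)
# as typical spend drifts gradually upward, the static rule starts
# flagging normal behaviour, while the adaptive model keeps up
for amount in [110, 130, 160, 190, 220]:
    static_flag = static.is_anomalous(amount)
    adaptive_flag = adaptive.is_anomalous(amount)
```

Real adaptive systems are far richer than an exponentially weighted mean, but the mechanism – the model re-estimating itself on live data – is what stops it degrading over time.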
King also cited the example of machine learning algorithms being used in visual recognition to detect the emergence of cancer from skin lesions. She said the maths is there, provided your prior data is good. "If your dataset gives you a really good history then you can start to make good predictions on that. The issue is the quality of data, and financial services data is, in my opinion, massively advanced over other sectors in this respect."
On the subject of errors that typically might occur when implementing machine learning systems, Capital One's Lenander said banks must not be afraid to experiment. "I love learning from failures," she said. "If I think back to one of our first attempts at implementing a tree-based model – I'm going back like 10 years or so – I think we neglected to consider what it would really take to implement, so we built a fantastically powerful model. I say fantastically powerful; it was a little bit better than what we could have built with a logistic regression.
"We were very excited about the tree-based model, but it's a lot of code that you need to implement it. Whereas a simple model that you would use in the past, you could write in one equation, it would be very clear. But this is hundreds, if not thousands, of lines of code.
"And particularly 10 years ago our systems were bursting at the seams trying to use that sort of model. We used it and we were able to implement it but there were problems that came with that because we had to maintain it; we had to try to explain to customers, to other stakeholders in the business, why we might be declining them for credit.
"It's a lot easier to do that when you have a model with 10-20 variables, than it is when you have a model with various computations of hundreds of variables.
"So a key lesson is always to make sure you can actually implement your insight. We have evolved to a point now where we realise we can get almost all the power from machine learning by using it for insight generation, but actually implemented in some more traditional ways."
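Lenander's closing lesson – mine the complex model for insight, ship something simple and explainable – can be sketched as follows. The variables and coefficients below are invented for illustration (they are not Capital One's): imagine an offline tree model has told us which few variables carry most of the signal, and we then deploy a logistic regression on just those, so the live scorer really is "one equation".

```python
import math

# Coefficients for the handful of variables the offline analysis
# identified as important (all values here are hypothetical).
COEFFICIENTS = {
    "intercept": -1.2,
    "utilisation": 2.5,        # fraction of credit limit in use
    "missed_payments": 0.8,    # count in the last 12 months
    "years_on_file": -0.15,    # longer history lowers risk
}

def default_probability(applicant):
    # one equation: the logistic function over a weighted sum
    z = COEFFICIENTS["intercept"]
    for name, coef in COEFFICIENTS.items():
        if name != "intercept":
            z += coef * applicant[name]
    return 1 / (1 + math.exp(-z))

def explain(applicant):
    # each variable's contribution is just coefficient * value, which is
    # what makes a decline decision easy to justify to a customer
    return {name: coef * applicant[name]
            for name, coef in COEFFICIENTS.items() if name != "intercept"}

applicant = {"utilisation": 0.9, "missed_payments": 2, "years_on_file": 4}
```

The trade-off is exactly the one Lenander describes: a gradient-boosted tree over the same data might score slightly better, but this scorer can be maintained, audited, and explained with a dozen lines.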