When we scan our loyalty card or order the latest gadget from Amazon, few of us give more than a second's thought to what happens to the data we're freely handing over. We might assume that our personal details will be stored in a giant database, gathering digital dust (so to speak) until the business decides to use them for marketing purposes.
Were this the case, then the right to be forgotten – as outlined in the GDPR – would be a relatively straightforward matter. Unfortunately for most businesses, personal and sensitive customer data is rarely so neatly packaged, filed, and referenced.
As a result, there's a growing schism between the businesses that hold customer data, the rules of the GDPR, and the people who wish to be forgotten.
The futility of securing the "data diaspora"
Data is the fuel of modern business, and it's used to run pretty much every business system one can imagine. So, while it's comforting to think that your user name, credit card details and address are stored in a single database for safekeeping, in reality, your personal details are often copied, duplicated, and used in various places. In fact, 90 per cent of all organizational data is contained in "non-production environments" in places like test and development, reporting, analytics, backup and a range of other applications. Different data types flow through disparate systems, making it nearly impossible for a company to identify the exact data that has been tagged to be "forgotten" by customers.
For example, your personal data may be in a test database for a new mobile marketing app that is being developed by a third party. To facilitate this work, a copy of the database may be uploaded into a cloud service where it is visible to the development team, creating a risk of leakage. The development team may make copies of the data and transfer them, all without the knowledge of the data controller. All of these activities are covered by the definition of processing that must be protected within the GDPR.
In short, our personal data is dispersed in any number of separate business systems, spread across clouds, vendors, and geography. This "data diaspora" means that organisations and their data controllers (as defined under the GDPR) may not have full control or even visibility of that information, who is using it, who can access it, and what happens to it after it has served its purpose.
Managing the information in all these silos is an enormous challenge, especially as most companies lack a single view of their non-production data. Governing the information you hold and control yourself is one thing; but once data leaves the source, the source administrator often no longer has control or even visibility of that data, who is using it and where it goes afterwards.
With the GDPR deadline now only weeks away, what can businesses do to regain control over their own data diaspora, ensure they can effectively remove every trace of a consumer's personal information and, perhaps most importantly, abide by the letter of the new regulations?
Pseudonymisation and data masking
The drafters of the GDPR are no fools: they appreciate the data management challenges facing enterprises, have no interest in seeing any business fail, and understand the key technologies that can protect citizens' sensitive data.
That is why the regulation explicitly recognises pseudonymisation as an approach to data protection. Pseudonymisation involves processing personal data so that it can no longer be attributed to a specific individual without additional information, while still retaining the relationships that make it useful for development and testing.
The de facto standard for pseudonymisation is data masking, which replaces sensitive information with fictitious, yet realistic data. While alternative protections such as encryption are vulnerable to identity breaches, insider threats or compromised decryption keys, data masking simply replaces any information that could conceivably be of value to hackers or other criminals. In short, even if an intruder accesses your innermost systems, there's nothing to steal.
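To make the idea concrete, here is a minimal Python sketch of masking that swaps real values for fictitious but realistic ones. It is an illustration under stated assumptions, not how any particular product works: the key, the name lists, the field names and every function name are hypothetical. It uses a keyed hash so that the same real value always maps to the same fake value, which is one way of retaining relationships between tables.

```python
import hashlib
import hmac

# Hypothetical secret for illustration only. A real deployment would generate
# it securely and restrict (or destroy) it so masked values cannot be re-derived.
MASKING_KEY = b"example-key-not-for-production"

# Small illustrative pools of fictitious names (assumed, not from any product).
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Casey", "Morgan", "Taylor"]
LAST_NAMES = ["Reed", "Hale", "Frost", "Lane", "Cole", "Marsh"]


def _digest(value: str) -> bytes:
    """Keyed hash of the real value; the same input always gives the same bytes."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_name(real_name: str) -> str:
    """Pick a fictitious first and last name based on the keyed hash."""
    d = _digest(real_name)
    return f"{FIRST_NAMES[d[0] % len(FIRST_NAMES)]} {LAST_NAMES[d[1] % len(LAST_NAMES)]}"


def mask_email(real_email: str) -> str:
    """Produce a realistic-looking address on a reserved example domain."""
    return f"user{_digest(real_email).hex()[:10]}@example.com"


def mask_card(card_number: str) -> str:
    """Replace each digit with a derived digit, keeping spaces and dashes in place."""
    d = _digest(card_number)
    out = []
    i = 0
    for ch in card_number:
        if ch.isdigit():
            out.append(str(d[i % len(d)] % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


# Which (hypothetical) fields are treated as sensitive, and how each is masked.
MASKERS = {"name": mask_name, "email": mask_email, "card": mask_card}


def mask_record(record: dict) -> dict:
    """Mask sensitive fields; leave non-identifying fields untouched."""
    return {k: MASKERS[k](v) if k in MASKERS else v for k, v in record.items()}


customer = {
    "name": "Jane Smith",
    "email": "jane.smith@corp.test",
    "card": "4111 1111 1111 1111",
    "plan": "gold",
}
print(mask_record(customer))
```

Because the mapping is deterministic, a customer who appears in both an orders table and a support table gets the same fictitious identity in each, so joins in test and reporting systems still work. The trade-off is that anyone holding the key could recompute the mapping, so in this sketch the key itself would need the same care as the original data.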
While data masking provides organisations with a tool that fits key challenges emerging from the GDPR, businesses must apply it with a "data first" approach grounded in the notion that data changes and moves over time, in many different places. Specifically, businesses will be most effective in achieving pseudonymisation through masking if they address three key questions about their data: its location, governance, and delivery.
The first step is to understand exactly where customer data resides in the organisation. Businesses then need to ensure that they have appropriate governance procedures in place, which means establishing full oversight, standardisation and control over how data is moved and manipulated between different systems and business units. Finally, the only way to apply the "right to be forgotten" to sprawling data pools is to centralise and automate the delivery of data. If a single control system delivers the same data to multiple destinations, the right to be forgotten can be applied at the source without worrying about downstream copies.
In the world of non-production this can now be done using next-generation data management platforms. They continuously collect data from production environments and then deliver that data to non-production systems such as development, test, reporting and analytics. If there is a request to remove someone's details, an organisation need only remove them from the production source, knowing that they will be automatically removed from downstream copies at the next daily or hourly refresh.
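The refresh model described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of any vendor's platform: the `DataPlatform` class, its methods and the in-memory dictionaries standing in for databases are all hypothetical. The point it shows is that when every non-production copy is rebuilt from the production source, deleting a person at the source removes them everywhere at the next refresh.

```python
from typing import Dict

Record = Dict[str, str]
Table = Dict[int, Record]  # customer id -> record


def mask(customer_id: int, record: Record) -> Record:
    """Placeholder masking step: overwrite identifying fields with fictitious values."""
    masked = dict(record)
    masked["name"] = f"Customer {customer_id}"
    masked["email"] = f"customer{customer_id}@example.com"
    return masked


def refresh(production: Table) -> Table:
    """Rebuild a non-production copy from scratch from the current production data."""
    return {cid: mask(cid, rec) for cid, rec in production.items()}


class DataPlatform:
    """Hypothetical single control point that delivers data to every environment."""

    def __init__(self, production: Table) -> None:
        self.production = production
        self.environments: Dict[str, Table] = {}

    def register(self, name: str) -> None:
        """Provision a new non-production environment from production."""
        self.environments[name] = refresh(self.production)

    def refresh_all(self) -> None:
        """The scheduled (for example daily or hourly) refresh of every environment."""
        for name in self.environments:
            self.environments[name] = refresh(self.production)

    def forget(self, customer_id: int) -> None:
        """Handle an erasure request by deleting the person at the source only."""
        self.production.pop(customer_id, None)


platform = DataPlatform({
    1: {"name": "Jane Smith", "email": "jane@corp.test", "plan": "gold"},
    2: {"name": "Raj Patel", "email": "raj@corp.test", "plan": "basic"},
})
for env in ("development", "test", "reporting"):
    platform.register(env)

platform.forget(1)      # erasure request applied to production
platform.refresh_all()  # downstream copies catch up at the next refresh

for env, table in platform.environments.items():
    print(env, sorted(table))
```

Note that between the `forget` call and the next refresh, the downstream copies still hold the (masked) record, so the refresh interval sets how long a deleted customer lingers. The sketch also only covers copies that are delivered through the platform; copies made outside it would not be reached.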
The GDPR was not designed to be so complex and unwieldy that businesses would inevitably fall foul of the rules. Taken all in all, it represents one of the more sensible pieces of European data legislation, based as it is on a full understanding of modern data masking techniques. With a little application, any business will be able to apply the principles of pseudonymisation, and ensure that data subjects can, if they wish, decide to disappear completely – and permanently.
Eric Schrock is CTO at Delphix