Computer scientists from Stanford University and Princeton University have discovered that the so-called "anonymised" data collected by certain websites is not truly private, as it is possible to connect web browsing histories to specific people just by analysing publicly available information from their Facebook, Twitter and Reddit accounts.
When you're signed into Google's services and browse the web or search for videos on YouTube, you know that your browsing activity is being tracked, because Google openly tells you so, and the same goes for Facebook.
However, many other websites you visit on the web also have the ability to track you, even if they state that the only web browsing records they collect are anonymous.
"Users may assume they are anonymous when they are browsing a news or a health website, but our work adds to the list of ways in which tracking companies may be able to learn their identities," said Arvind Narayanan, an assistant professor of computer science at Princeton's Center for Information Technology Policy.
To figure out who to advertise to, online advertising companies typically build up browsing histories of users by embedding tracking programs into web pages.
Most websites and advertisers promise that any web browsing data collected is not tied to anyone's identity, and in some countries, such as the US, internet service providers (ISPs) are only allowed to store and use information on consumers if the data is "not reasonably linkable" to individual users (and yes, this includes websites you look at in Google Chrome's Incognito Mode).
70% of study participants identified by computer algorithm
Princeton and Stanford researchers wanted to test whether web browsing data was truly anonymous, so they studied anonymised web browsing information submitted by 374 volunteers. The researchers developed a computer algorithm to compare web browsing histories against hundreds of millions of social media profiles with publicly accessible links, and the computer was able to discover the identities of over 70% of the volunteers.
"Each person's browsing history is unique and contains tell-tale signs of their identity," said Sharad Goel, an assistant professor at Stanford who co-authored the study.
The researchers say that although the programs were able to pick out patterns from the anonymised data, the method is not foolproof, because it only works if there are social media profiles that contain links to outside websites. Even so, when given a web browsing history that included 30 links that originated from Twitter, the researchers were able to figure out the user's corresponding Twitter profile over 50% of the time.
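To make the idea concrete, here is a minimal Python sketch of how a browsing history could be compared against the links that show up in public social media feeds. It is an illustration only, not the researchers' actual algorithm: the function name `rank_profiles`, the inverse-popularity weighting and the example profiles and URLs are all assumptions made for this example. The intuition is that a link seen in very few feeds says more about who you are than a link everyone has seen.

```python
from collections import Counter
from math import log


def rank_profiles(history, feeds):
    """Rank candidate profiles by how well their feed links match a history.

    history: iterable of URLs visited by the anonymous user.
    feeds:   dict mapping a profile name to the set of URLs in its feed.
    Both the scoring rule and the data layout are illustrative assumptions.
    """
    visited = set(history)
    # Count how many feeds contain each link; rare links are more telling.
    popularity = Counter(url for links in feeds.values() for url in set(links))
    n_profiles = len(feeds)

    scores = {}
    for profile, links in feeds.items():
        shared = visited & set(links)
        # Weight each shared link by the log of its inverse popularity.
        scores[profile] = sum(
            log((1 + n_profiles) / popularity[url]) for url in shared
        )
    # Best-matching profile first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical data: three public profiles and one "anonymous" history.
feeds = {
    "alice": {"a.example/1", "b.example/2", "c.example/3"},
    "bob": {"a.example/1", "d.example/4"},
    "carol": {"e.example/5"},
}
history = ["a.example/1", "b.example/2", "c.example/3", "z.example/9"]

ranked = rank_profiles(history, feeds)
print(ranked[0][0])  # the profile whose feed best explains the history
```

In this toy data, the history shares three links with one feed and only one link with another, so the first profile comes out on top. A real attack would have to cope with hundreds of millions of candidate profiles and with histories that contain many links from other sources.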
This raises concerns about whether users are ever truly anonymous online. Even if you use the Tor browser to anonymise your traffic, if you have a Reddit account and post many links to it, in theory the algorithm could still connect the traffic to your Reddit account from the type of links you post, even if it cannot detect your real name.
The researchers will be presenting their research at the 2017 World Wide Web Conference, which will be held in Perth, Australia from 3 to 7 April.