Well, maybe not everyone – but some clever types from the University of Texas at Austin have determined that even when your social networking data is divorced from your identity, it’s a relatively easy job to analyse it and fit the names back to the profiles.
In tests involving the photo-sharing site Flickr and the microblogging service Twitter, the Texas researchers were able to identify a third of the users with accounts on both sites simply by searching for recognizable patterns in anonymized network data. Both Twitter and Flickr display user information publicly, so the researchers anonymized much of the data in order to test their algorithms.
The researchers wanted to see if they could extract sensitive information about individuals using just the connections between users, even if almost all of the names, addresses, and other forms of personally identifying information had been removed. They found that they could, provided they could compare these patterns with those from another social-network graph where some user information was accessible.
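To make the idea concrete, here is a toy sketch of that kind of matching in Python. It is not the researchers' actual algorithm, just a simplified "seed and extend" illustration under stated assumptions: both graphs are small undirected friendship graphs, a handful of anonymized nodes (the `seeds`) are already identified, and a node is only matched when one candidate clearly out-scores the rest. The function name `propagate_matches`, the `min_score` threshold and the sample data are all made up for illustration.

```python
from collections import defaultdict


def propagate_matches(anon_graph, known_graph, seeds, min_score=2):
    """Greedily extend a few known matches across two friendship graphs.

    anon_graph, known_graph: dict mapping node -> set of neighbouring nodes.
    seeds: dict mapping some anonymized nodes to their known identities.
    Returns a dict mapping anonymized nodes to inferred identities.
    """
    mapping = dict(seeds)
    matched_known = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for anon_node, neighbours in anon_graph.items():
            if anon_node in mapping:
                continue
            # Score each unmatched identity by how many already-matched
            # neighbours it shares with this anonymized node.
            scores = defaultdict(int)
            for nb in neighbours:
                if nb in mapping:
                    for cand in known_graph[mapping[nb]]:
                        if cand not in matched_known:
                            scores[cand] += 1
            if not scores:
                continue
            ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
            best, best_score = ranked[0]
            runner_up = ranked[1][1] if len(ranked) > 1 else 0
            # Accept only a clear winner with enough supporting evidence.
            if best_score >= min_score and best_score > runner_up:
                mapping[anon_node] = best
                matched_known.add(best)
                changed = True
    return mapping


# Hypothetical data: a public graph with names, and the same friendships
# with the names replaced by numbers.
known = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol", "dave"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}
anon = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}

result = propagate_matches(anon, known, seeds={1: "alice", 2: "bob"})
print(result)
```

Starting from two known users, the sketch re-identifies two more from the friendship pattern alone, while the sparsely connected node 5 stays unmatched because a single shared friend falls below the `min_score` threshold. That is roughly the flavour of the result described above: well-connected users are the easiest to pin down.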
The prime appeal of that data is, of course, the ability to use it for targeting advertising at the most desirable demographics – which, for many people, is objectionable in and of itself. More worrying is the potential for unearthing data that – under a repressive regime, for example – could be used to persecute or criminalise:
For example, the algorithm could theoretically employ the names of a user’s favorite bands and concert-going friends to decode sensitive details such as sexual orientation from supposedly anonymized data. Acquisti believes that the result paints a bleak picture for the future of online privacy. “There is no such thing as complete anonymity,” he says. “It’s impossible.”
Leaving the risks aside for the moment, though, this research has produced some rather fascinating insights into the nature of social networks and human behaviour as a unique identifier:
“The structure of the network around you is so rich, and there are so many different possibilities, that even though you have millions of people participating in the network, we all end up with different networks around us,” says Shmatikov. “Once you deal with sufficiently sophisticated human behavior, whether you’re talking about purchases people make or movies they view or – in this case – friends they make and how they behave socially, people tend to be fairly unique. Every person does a few quirky, individual things which end up being strongly identifying.”
I wonder if the open-source argument about security would apply here? Open software advocates say that having the source code out in the open means that everyone can work on making a program more secure and efficient, rather than just the developers and the crackers; should these analysis methods be made public so we can keep up in the arms race with the snoops and marketeers? [image by luc legay]
What’s almost certain, though, is what any good security expert will have been saying all along: if you’re even slightly worried about something about you becoming public knowledge, then assuming you can put it somewhere on the web and keep it private is an act of uninformed delusion. If you want to keep your privacy, it’s down to you to do it.