Tag Archives: analysis

Software and sentiments – language as battlefield

I consider myself pretty fortunate in that I don’t have to moderate the comments here at Futurismic with a heavy hand[1], but that’s down to matters of scale: there just aren’t enough active commenters here for serious flamewars to get started. Moderating the discussion on a site like BoingBoing is a different matter entirely, and usually requires a layer of direct human interaction after the common-or-garden \/1/\9|2/\ spambots have been weeded out.

Those days may be nearing an end, however; New Scientist reports on a new breed of software agent that is programmed to analyse the tone and sentiment of written communication on the web:

The early adopters of these tools are the owners of big brand names in a world where company reputations are affected by customer blogs as much as advertising campaigns. A small but growing group of firms is developing tools that can trawl blogs and online comments, gauging the emotional responses brought about by the company or its products.

[…]

The abusive “flame wars” that plague online discussions are encouraged by the way human psychology plays out over the web, as we’ve explained before. Moderating such discussions can be a time-consuming job, needing much judgment to spot when a heated exchange crosses over into abuse.

Sentiment-aware software can help here too. One example is Adaptive Semantics’ JuLiA – a software agent based on a learning algorithm that has been trained to recognise abusive comments. “She” can take down or quarantine comments that cross a predetermined abuse threshold […]

Work is underway to expand JuLiA’s comprehension abilities – for example, to decide whether text is intelligent, sarcastic, or political in tone.
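For the curious, the basic mechanism – score a comment against patterns learned from labelled examples, then quarantine anything that crosses a threshold – can be sketched in a few lines of Python. To be clear, this is not JuLiA’s actual algorithm: the training comments, the crude word-counting “model” and the threshold below are all invented for illustration, where the real thing would be a proper learning algorithm trained on large volumes of human-labelled comments.

```python
import re
from collections import Counter

# Toy training data: (comment, is_abusive). Purely illustrative --
# a real moderation system learns from many thousands of labelled comments.
TRAINING = [
    ("you make a fair point", False),
    ("thanks for the thoughtful reply", False),
    ("you idiot, shut up", True),
    ("what a stupid idiot take", True),
]

def tokenize(text):
    """Lowercase and split a comment into bare words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count how often each word appears in abusive vs. civil comments."""
    abusive, civil = Counter(), Counter()
    for text, is_abusive in examples:
        (abusive if is_abusive else civil).update(tokenize(text))
    return abusive, civil

def abuse_score(comment, abusive, civil):
    """Fraction of the comment's words seen more often in abusive training text."""
    words = tokenize(comment)
    flagged = sum(1 for w in words if abusive[w] > civil[w])
    return flagged / len(words) if words else 0.0

def moderate(comment, model, threshold=0.3):
    """Quarantine any comment whose score crosses the preset abuse threshold."""
    return "quarantine" if abuse_score(comment, *model) >= threshold else "publish"

model = train(TRAINING)
print(moderate("you absolute idiot", model))          # crosses the threshold
print(moderate("interesting article, thanks", model))
```

The “arms race” point falls straight out of this sketch: anyone who can probe the threshold can rephrase their abuse to score just under it.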

That’s all well and good, and it’ll probably work for a while – but much like anything else, it’ll be seen as a challenge to exactly the sort of people it’s designed to filter, and we’ll have another software arms race on our hands – albeit one initially played for much lower stakes than the virus/anti-virus game.

But look here a moment:

Another firm, Lexalytics, uses sentiment analysis to influence what people say before it is too late. It can identify which “good news” messages from company executives have the greatest effect on stock price. These results can then be used to advise certain people to speak out more, or less, often, or to gauge the likely effectiveness of a planned release.

Now there’s a double-edged sword; if you can use that analysis to protect and strengthen a stock price, someone can surely use it for exactly the opposite. And even beyond the battlefields of the trading floors and corporate boardrooms, there are plenty of folk who could find a use for software that could advise them on how to make their communications less offensive or incendiary… or more so, if the situation demanded it.

We live in the communication age, so I guess it’s inevitable that communication should become another new frontier for warfare… but look at the bright side: slam poetry contests are going to become a lot more interesting for spectators and participants alike. 😉

[ 1 – That’s not a challenge or a complaint, OK? Thanks. 🙂 ]

On the internet, *everyone* knows you’re a dog

Well, maybe not everyone – but some clever types from the University of Texas at Austin have determined that even when your social networking data is divorced from your identity, it’s a relatively easy job to do some analysis and fit the names to the profiles.

In tests involving the photo-sharing site Flickr and the microblogging service Twitter, the Texas researchers were able to identify a third of the users with accounts on both sites simply by searching for recognizable patterns in anonymized network data. Both Twitter and Flickr display user information publicly, so the researchers anonymized much of the data in order to test their algorithms.

The researchers wanted to see if they could extract sensitive information about individuals using just the connections between users, even if almost all of the names, addresses, and other forms of personally identifying information had been removed. They found that they could, provided they could compare these patterns with those from another social-network graph where some user information was accessible.
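The trick is easier to grasp with a toy example. The sketch below is a drastic simplification of the researchers’ actual propagation algorithm – the two little graphs and the handful of “seed” identities are invented – but it shows the core move: an anonymous profile gets matched to a named one because they both link to the same already-identified people.

```python
# Named graph, e.g. public Flickr contacts (invented data).
flickr = {
    "alice": {"bob", "carol", "dave"},
    "bob":   {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave":  {"alice", "carol"},
}

# "Anonymized" graph, e.g. Twitter with names stripped -- same people,
# labelled only u1..u4 (secretly alice, bob, carol, dave in that order).
twitter = {
    "u1": {"u2", "u3", "u4"},
    "u2": {"u1", "u3"},
    "u3": {"u1", "u2", "u4"},
    "u4": {"u1", "u3"},
}

def jaccard(a, b):
    """Similarity of two sets: size of overlap over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0

def deanonymize(anon_graph, named_graph, seeds):
    """Greedily extend a few known seed identities to the whole graph by
    matching each anonymous node to the named node whose friends best
    overlap the anonymous node's already-identified friends."""
    mapping = dict(seeds)
    for node in sorted(set(anon_graph) - set(seeds)):
        mapped_neighbours = {mapping[n] for n in anon_graph[node] if n in mapping}
        candidates = set(named_graph) - set(mapping.values())
        mapping[node] = max(
            candidates, key=lambda c: jaccard(mapped_neighbours, named_graph[c])
        )
    return mapping

# Starting from just two known identities, the rest fall out of the structure.
result = deanonymize(twitter, flickr, {"u1": "alice", "u2": "bob"})
print(result)
```

Note that no profile content is used at all – the links alone are enough, which is exactly why stripping names and addresses doesn’t buy you the anonymity you might expect.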

The prime appeal of that data is, of course, the ability to use it to target advertising at the most desirable demographics – which, for many people, is objectionable in and of itself. More worrying is the potential for unearthing data that – under a restrictive regime, for example – could be used to persecute or criminalise:

For example, the algorithm could theoretically employ the names of a user’s favorite bands and concert-going friends to decode sensitive details such as sexual orientation from supposedly anonymized data. Acquisti believes that the result paints a bleak picture for the future of online privacy. “There is no such thing as complete anonymity,” he says. “It’s impossible.”

Leaving the risks aside for the moment, though, this research has produced some rather fascinating insights into the nature of social networks and human behaviour as a unique identifier:

“The structure of the network around you is so rich, and there are so many different possibilities, that even though you have millions of people participating in the network, we all end up with different networks around us,” says Shmatikov. “Once you deal with sufficiently sophisticated human behavior, whether you’re talking about purchases people make or movies they view or – in this case – friends they make and how they behave socially, people tend to be fairly unique. Every person does a few quirky, individual things which end up being strongly identifying.”

I wonder if the open-source argument about security would apply here? Open software advocates say that having the source code out in the open means that everyone can work on making a program more secure and efficient, rather than just the developers and the crackers; should these analysis methods be made public so we can keep up in the arms race with the snoops and marketeers? [image by luc legay]

What’s almost certain, though, is what any good security expert will have been saying all along – if you’re even slightly worried about something about you becoming public knowledge, assuming you can put it somewhere on the web and keep it private is an act of uninformed delusion. If you want to keep your privacy, it’s down to you to do it.