Every day, I spend a couple of hours digging through my RSS subscriptions for interesting stories, some of which I use here at Futurismic and most of which I store away at del.icio.us as research material (you know, for those fiction pieces that I keep meaning to find time to write… ahem). [image by ottonassar]
I’m a big fan of tagging my links because it enables me to trawl through the stored pieces (mine, and other people’s as well) by context and related topics, but it turns out there’s a greater benefit – user folksonomies on social bookmarking sites can be used to track and predict emerging trends and fads using mass data analysis:
The researchers tracked different users and noted the submissions they made, as well as the tags used on those posts. Taking this data, they could see what tags were frequently used in correlation with one another. This created a “co-occurrence network,” which assigns weight to tags based on how often the tag was used and how many different users applied it.
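To make the idea concrete, here is a minimal Python sketch of how such a co-occurrence network might be assembled. The sample posts, the function names and the exact weighting are my own assumptions for illustration, not the researchers' actual method: it simply counts how often two tags share a bookmark, and how many distinct users applied each tag.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: (user, tags applied to one bookmark).
posts = [
    ("alice", ["python", "web", "tutorial"]),
    ("bob", ["python", "web"]),
    ("carol", ["python", "data"]),
    ("alice", ["web", "design"]),
]


def build_cooccurrence(posts):
    """Count how often each unordered pair of tags appears on the same post."""
    weights = Counter()
    for _user, tags in posts:
        # Sort so that (a, b) and (b, a) land on the same key.
        for a, b in combinations(sorted(set(tags)), 2):
            weights[(a, b)] += 1
    return weights


def users_per_tag(posts):
    """Count how many distinct users applied each tag."""
    users = {}
    for user, tags in posts:
        for tag in set(tags):
            users.setdefault(tag, set()).add(user)
    return {tag: len(who) for tag, who in users.items()}


weights = build_cooccurrence(posts)
tag_users = users_per_tag(posts)

for (a, b), w in weights.most_common():
    print(f"{a} <-> {b}: {w}")
```

A real analysis would combine the two counts into a single edge weight; how exactly is a design choice the quoted summary doesn't spell out.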
With this information, it was possible to conduct a random walk (stepping randomly from one tag to another) and note how tags that occur together can form an otherwise undetectable semantic chain. These tags, based on their association with one another, allowed the researchers to follow along as one popular trend gradually replaced its predecessor.
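A random walk of this kind can be sketched in a few lines of Python. The tag graph and its weights below are invented for illustration, and stepping to a neighbour with probability proportional to the co-occurrence weight is my assumption; the summary only says the steps are random.

```python
import random

# Hypothetical co-occurrence weights between pairs of tags (undirected).
edges = {
    ("python", "web"): 2,
    ("python", "data"): 1,
    ("web", "design"): 1,
    ("data", "statistics"): 3,
}


def neighbours(edges):
    """Turn the edge list into an adjacency map: tag -> {neighbour: weight}."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj


def random_walk(adj, start, steps, rng):
    """Step from tag to tag, favouring heavily weighted co-occurrences."""
    path = [start]
    current = start
    for _ in range(steps):
        options = adj[current]
        current = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        path.append(current)
    return path


adj = neighbours(edges)
walk = random_walk(adj, "python", 5, random.Random(0))
print(" -> ".join(walk))
```

Running many such walks and comparing which tags turn up together is, roughly, how a chain of associated tags would show itself in the data.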
When comparing individual random walks with one another, researchers noted that tags that appear close together in a non-obvious semantic network were likely to be visited by the same user, and tags that were far apart were visited together less often. Although no individual user might be aware of following these obscure connections, they became obvious when the data was examined in bulk.
The applicability of Heaps’ law to Internet tags was noted in particular. Heaps’ law states that the number of distinct words used in a body of text grows sublinearly relative to the size of the text: bigger texts have a more diverse vocabulary, but there are diminishing returns as things scale up. Likewise, the number of unique tags on del.icio.us and BibSonomy grows nearly linearly relative to the total number of tags; that is to say, our interests and the vocabulary used to describe them grow directly along with the Internet. It isn’t all just lolcats and musical parodies, even though it might seem so sometimes.
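Heaps’ law is usually written as V(n) ≈ K·n^β with 0 < β < 1, where n is the total number of words (or tags) seen so far and V(n) is the number of distinct ones. As a rough sketch of how you might check that against a stream of tags, the Python below tracks vocabulary growth and estimates β from a straight-line fit on log-log axes. The tag stream is synthetic and deliberately built so that a new tag appears only at perfect-square positions, which should give an exponent close to 0.5; the function names are my own.

```python
import math


def vocabulary_growth(tokens):
    """Return (n, V(n)) pairs: tokens seen so far and distinct tokens so far."""
    seen = set()
    curve = []
    for n, token in enumerate(tokens, start=1):
        seen.add(token)
        curve.append((n, len(seen)))
    return curve


def fit_heaps_exponent(curve):
    """Estimate beta as the least-squares slope of log V(n) against log n."""
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def synthetic_stream(length):
    """Hypothetical tag stream: a new tag only at positions 1, 4, 9, 16, ..."""
    stream = []
    for n in range(1, length + 1):
        root = math.isqrt(n)
        stream.append(f"tag{root}" if root * root == n else "tag1")
    return stream


curve = vocabulary_growth(synthetic_stream(10_000))
beta = fit_heaps_exponent(curve)
print(f"estimated Heaps exponent: {beta:.2f}")
```

An exponent near 1 would mean new tags keep arriving almost as fast as tags are applied; a smaller one means the vocabulary is levelling off.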
This fascinates me, because it confirms as a real phenomenon something that I always dismissed as a fallacy born of close involvement; scanning close to a thousand RSS feeds a day from a variety of sources and covering a variety of subjects gives me a sense of being able to observe trends bubbling up out of the web’s chaotic maelstrom. I get a real kick out of watching a story or meme moving from low-level niche sites into the wider world of the web, and seeing new obsessions gather popularity.
And talk about hindsight – if I’d thought about it, I’d have seen the economic collapse coming six months or more before it bit, and shifted all my investments somewhere safer. If I’d had any investments, that is…
Of course, this sort of trend analysis could probably be used for profit or surveillance purposes as well as the more abstract goals of research and cultural analysis, but if you haven’t realised that the internet is the ultimate double-edged sword by now… well, you’ve not been following along with my links, have you? 😉