Tag Archives: aggregation

Meatpuppet farming: the (dark) grey-hat global freelance job market

Compsec maven Brian Krebs rakes over the findings of a University of California, San Diego research report into the online market for what I like to call meatpuppets: cheap human labour-on-the-web that gets leveraged for bypassing the security systems that are supposed to stop automated spammers.

“The availability of this on-demand, for-hire contract market to do just about anything you can think of means it’s very easy for people to innovate around new scams,” said Stefan Savage, a UCSD computer science professor and co-author of the study.

The UCSD team examined almost seven years’ worth of data from freelancer.com, a popular marketplace for those looking for work. They found that 65-70 percent of the 84,000+ jobs offered for bidding during that time appeared to be for legitimate work such as online content creation and Web programming. The remainder centered around four classes of what they termed “dirty” jobs, such as account registration and verification, social network linking (buying friends and followers), search engine optimization, and ad posting and bulk mailing.

“Though not widely appreciated, today there are vibrant markets for such abuse-oriented services,” the researchers wrote. “In a matter of minutes, one can buy a thousand phone-verified Gmail accounts for $300, or a thousand Facebook ‘friends’ for $26 – all provided using extensive manual labor.”

The evolving marketplace is best illustrated by the market for services that mass-solve CAPTCHAs — those agglomerations of squiggly numbers and letters that webmail providers and forums frequently require users to input before approving new accounts. The researchers found that the market for CAPTCHA-solving was fostered on freelancer, but quickly expanded into custom markets when the model proved profitable on a large scale. Today, there are plenty of commercial services that pay pennies per day to low-wage workers in India and Eastern Europe to solve these puzzles for people wanting to create huge numbers of accounts at one time.

It’s interesting to see massive crowds of human labour getting rolled quite effectively into these vast and largely automated systems: the darkside equivalent of Amazon’s Mechanical Turk, with a smattering of Matrix metaphors on the side. But those digital peons are just trying to make a living, and when you look at the prices being charged for Twitter followers by the thousand and factor in the significant cut being taken by the service aggregators, you realise that they’re probably not making much more than sweatshop wages. Which means that until the massive differential in income between developed and developing nations gets narrower, web security procedures will always be subject to this sort of outsourced brute-forcing. Shorter version: spam ain’t going anywhere anytime soon.

The irony of having blocked five fake Twitter accounts in the time it took me to write this post is palpable. Death, taxes, noise*, spam.

[ * Anyone who’s worked in the recording or music industries will tell you that noise is the third certainty of life. As, I suspect, will anyone who has lived in a block of flats. ]

Young market problems: ebooks as clearing house for unpublishable content

Part of me really wants to get a decent ereader and start plunging into the brave new market of electronic books; as a writer, reader, some-time publisher and general technoforesight wonk, I feel I should be down in the trenches if I want to see how the campaign is really going. The other half of me is the half that’s been burned by classic early-adopter screw-ups ever since I acquired that tendency from my father; I’m waiting for either a universally accepted open format, a decent open platform, or both. (I doubt I’ll have much longer to wait; I expect I’ll be nailing myself an affordable Android-based tablet in the post-Xmas sales next year.)

So, perforce, I have to get my news about the actual content sloshing around in the ebook marketplace from other people… and while I’m not taking it as broadly representative, this post from James “Big Dumb Object” Bloomer highlights the state of play wherein creators and new middle-men/aggregator outfits are testing the water to see what will actually float. Or, to put it more plainly: everyone’s throwing shit at the wall in order to see what sticks:

The other day I bought How To Write Science Fiction by Paul Di Filippo, tempted by the price (69p) and the prospect of another author’s view on writing SF.

It’s an interesting read, containing thoughts on what maximalist SF is, how to (attempt to) write it and an essay on the creation of Di Filippo’s novel Ciphers. There’s a few interesting nuggets there for me to think about (plus, now, a need to read some Pynchon). However it’s not very long, not really a book and not really about how to write Science Fiction. It’s the sort of text I’d expect to be posted to a blog. It’s the sort of text that in physical form would be thin and flimsy, and I probably wouldn’t ever buy.

It’s going to take a while for pricing to settle down in line with customer expectations, but the nature of the content being sold is a big part of that. Perhaps it’s the case that no one’s gonna pay for a lengthy blog essay when there are umpteen thousand of the things – some of exceptional quality, others not so much – floating around out here on the unwalled web, just waiting to be read. But then again, Nick Mamatas’ Starve Better – my dead-tree version of which I’ve been greatly enjoying over the last week or so, incidentally – is essentially a collection of essays and articles, many of which either were or started out as blog posts or fanzine pieces; it’s retailing at $3.99 for a selection of electronic formats, and – had I been in possession of a decent ereader – I’d have considered that a damned good price for the material it contains. I don’t know how long the Di Filippo piece is, exactly, but perhaps the problem here is the attempt to price a single essay fairly; meanwhile, Starve Better is a curation product, an act of filtering Mamatas’ prodigious output down to the best material devoted to a specific topic.

So perhaps we could say that Apex, by doing the old-school publisher thing, have added value to the raw material and thus earned their middle-man cut, while 40k – who, I should note, I think are one of the more interesting ebook ventures I’m aware of at the moment, and not just because they’re publishing a lot of stuff from sf authors – are just rolling chunks of content out of the door with a snappy title and hoping for the best. Maybe the latter would work at a lower price… but until someone sorts out a decent and widely-adopted micropayments system, pricing at under a buck will remain the province of big clearing houses like Amazon who can afford to eat up the transaction charges on a lot of tiny purchases. Economies of scale haven’t gone away just yet, it seems.

More musings from James:

Will this mean that buyers will tread ever more safely when buying books? Perhaps now people will only trust books from the bestseller top ten or those recommended by a high profile book club? It feels to me right now that the lack of physical form may actually hinder more experimental buying once the blush of the new fangled eBooks dies to the norm, the marketing departments have tried to pull a few fast ones and readers have been bitten by buying some dreadful self-published novels?

I think these are very real issues, and not just for publishing; a flattened media landscape means curation and aggregation are becoming at least as important as the traditional editorial roles, and the marketing/PR channel needs to become more focussed on finding the right niche vertical to pitch to, as opposed to the old model of making generalised statements of awesomeness about a piece of work and hoping some hack will cut’n’paste it verbatim. Interesting times ahead.

Excellent Bill Gibson interview

The best author interviews are surely the ones where the interviewer asks the sort of questions that you yourself would have picked, had the opportunity arisen. Granted, the list of questions I’d like to ask of William Gibson is long enough that I could keep the poor guy occupied with them until the heat death of the universe, but Aileen Gallagher of NY Mag’s The Vulture column has whittled a few of them away on my behalf [via MetaFilter]. Here he is, rethinking terrorism:

You also wrote in Zero History that terrorism is “almost exclusively about branding but only slightly less so about the psychology of lotteries.” How so?

If you’re a terrorist (or a national hero, depending on who’s looking at you), there are relatively few of you and relatively a lot of the big guys you’re up against. Terrorism is about branding because a brand is most of what you have as a terrorist. Terrorists have virtually no resources. I don’t even like using the word terrorism. It’s not an accurate descriptor of what’s going on.

What do you think is going on?

Asymmetric warfare, when you’ve got a little guy and a big guy. [There are] a lot of strategies that the little guy uses to go after the big guy, and a lot of them are branding strategies. The little guy needs a brand because that’s basically all he’s got. He’s got very little manpower, very little money compared to the big guy. The big guy’s got a ton of manpower and a ton of money. So this small coterie of plotters decides to go after a nation-state. If they don’t have a strong brand, nothing’s going to happen. From the first atrocity on, the little guy is building his brand. And that’s why somebody phones in after every bomb and says, “It was us, the Situationist Liberation Army. We blew up that mall.” That’s branding. By the same token, you get these other, surreal moments where they call up and say, “We didn’t do that one.” That’s branding. That’s all it is. A terrorist without a brand is like a fish without a bicycle. It’s just not going anywhere.

And a vindication of Twitter:

I’ve taken to Twitter like a duck to water. Its simplicity allows the user to customize the experience with relatively little input from the Twitter entity itself. I hope they keep it simple. It works because it’s simple. I was never interested in Facebook or MySpace because the environment seemed too top-down mediated. They feel like malls to me. But Twitter actually feels like the street. You can bump into anybody on Twitter.

[…]

Twitter’s huge. There’s a whole culture of people on Twitter who do nothing but handicap racehorses. I’ll never go there. One commonality about people I follow is that they’re all doing what I’m doing: They’re all using it as novelty aggregation and out of that grows some sense of being part of a community. It’s a strange thing. There are countless millions of communities on Twitter. They occupy the same virtual space but they never see each other. They never interact. Really, the Twitter I’m always raving about is my Twitter.

Lots more good nuggets in there; go read.

Homeopapes: journalism by machine

Here’s an interesting piece at Wired UK that picks up the “OMG journalism is dying” ball and runs with it in the direction of automated machine-to-machine and machine-to-person news aggregation:

NewsScope is a machine-readable news service designed for financial institutions that make their money from automated, event-driven, trading. Triggered by signals detected by algorithms within vast mountains of real-time data, trading of this kind now accounts for a significant proportion of turnover in the world’s financial centres.

Reuters’ algorithms parse news stories. Then they assign “sentiment scores” to words and phrases. The company argues that its systems are able to do this “faster and more consistently than human operators”.

Millisecond by millisecond, the aim is to calculate “prevailing sentiment” surrounding specific companies, sectors, indices and markets. Untouched by human hand, these measurements of sentiment feed into the pools of raw data that trigger trading strategies.

[…]

Here and there, interesting possibilities are emerging. Earlier this year, at Northwestern University in the US, a group of computer science and journalism students rigged up a programme called Stats Monkey that uses statistical data to generate news reports on baseball matches.

Stats Monkey relies upon two key metrics: Game Score (which allows a computer to figure out which team members are influencing the action most significantly) and Win Probability (which analyses the state of a game at any particular moment, and calculates which side is likely to win).

Combining the two, Stats Monkey identifies the players who change the course of games, alongside specific turning points in the action. The rest of the process involves on-the-fly assembly of templated “narrative arcs” to describe the action in a format recognisable as a news story.

The resulting news stories read surprisingly well. If we assume that the underlying data is accurate, there’s little to prevent newspapers from using similar techniques to report a wide range of sporting events.
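For the curious, here's a rough Python sketch of the general idea described above: find the biggest swing in win probability, pick a templated "narrative arc", and fill in the blanks. To be clear, this is my own toy illustration and not Stats Monkey's actual code; the `Play` structure, the 0.3 "comeback" threshold, the template wording, the team and player names and the probabilities are all invented assumptions, and it ignores the Game Score half of the process entirely.

```python
from dataclasses import dataclass


@dataclass
class Play:
    inning: int
    player: str
    description: str
    home_win_prob: float  # home side's win probability after this play (made-up numbers)


def turning_point(plays, start_prob=0.5):
    """Return (play, swing) for the largest absolute change in win probability."""
    best, best_swing, prev = None, 0.0, start_prob
    for play in plays:
        swing = play.home_win_prob - prev
        if best is None or abs(swing) > abs(best_swing):
            best, best_swing = play, swing
        prev = play.home_win_prob
    return best, best_swing


# Hypothetical templated "narrative arcs"; a real system would have many more.
TEMPLATES = {
    "comeback": "{winner} rallied to beat {loser} {score}.",
    "steady": "{winner} beat {loser} {score}.",
}
TURNING_POINT = " The turning point came in inning {inning}, when {player} {description}."


def write_report(home, away, home_runs, away_runs, plays):
    """Assemble a two-sentence report; assumes a non-empty play list and no tie."""
    if not plays or home_runs == away_runs:
        raise ValueError("need at least one play and a decided game")
    home_won = home_runs > away_runs
    winner, loser = (home, away) if home_won else (away, home)
    score = f"{max(home_runs, away_runs)}-{min(home_runs, away_runs)}"
    # Win probability from the eventual winner's point of view.
    winner_probs = [p.home_win_prob if home_won else 1 - p.home_win_prob for p in plays]
    # Assumed rule: if the winner ever looked unlikely to win, call it a comeback.
    arc = "comeback" if min(winner_probs) < 0.3 else "steady"
    play, _ = turning_point(plays)
    return TEMPLATES[arc].format(winner=winner, loser=loser, score=score) + TURNING_POINT.format(
        inning=play.inning, player=play.player, description=play.description
    )


plays = [
    Play(2, "Rivera", "hit a two-run homer", 0.25),
    Play(7, "Chen", "doubled in two runs", 0.55),
    Play(8, "Chen", "singled home the go-ahead run", 0.90),
]
report = write_report("Hawks", "Owls", 5, 4, plays)
print(report)
```

Everything interesting here lives in the data and the templates, which is rather the point: the "journalism" is only as trustworthy as the numbers fed in and the arcs someone chose to write.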

The first knee-jerk question here is “can (or should) we trust those algorithms to remain uncorrupted? How easy would it be for such a system to create news that wasn’t true, or that spun the truth in a particular direction?”

The instant counterargument would be to ask how much more prone to corruption and error an automated system would be compared to the existing human-based systems… all trust needs to be earned, after all, and (speaking for myself) I’ve little trust in the worldview of any media outlet when viewed in isolation. I aggregate my incoming news already through a bunch of semi-manual processes and routines; would something that removes the drudgery of that be inherently bad, or does the risk lie in our laziness and subconscious gravitation toward echo-chambers of our own ideas? Is there any such thing as objective news (at least about anything that really matters, a category which I feel sports doesn’t really occupy)?

All this talk of truth, trust and objective realities puts me in mind of Philip K Dick – more specifically “If There Were No Benny Cemoli”, with its homeopapes churning out news of a planetary adversary who may or may not actually exist. Can anyone recommend more stories that deal with similar themes?