Spam: good food for growing AIs

If you’ve been groaning in terror at the seemingly ever-growing contents of your spam folder, here’s a silver lining to the internet’s perennial plague – the ever-increasing ability of spambots to solve CAPTCHA puzzles may end up advancing the cause of artificial intelligence research. You see, it turns out that crime actually does pay:

“[von Ahn, inventor of the reCAPTCHA test] has seen bounties as high as $500,000 offered for software to break it – enough to attract people with the skills to the task and five times more than the Loebner Grand Prize offers to the programmer who designs a computer that can truly pass the Turing test.

The demise of reCAPTCHA could, however, be beneficial.

It has users decode distorted text taken from historic books and newspapers that is beyond the ability of optical character recognition (OCR) software to digitise. Humans who fill in a reCAPTCHA are helping translate those books, and spam software could do the same.

“If [the spammers] are really able to write a programme to read distorted text, great – they have solved an AI problem,” says von Ahn. The criminal underworld has created a kind of X prize for OCR.

That bonus for artificial intelligence will come at no more than a short-term cost for security groups. They can simply switch for an alternative CAPTCHA system – based on images, for example – presenting the eager spamming community with a new AI problem to crack.”

Indeed, it appears that the Google gang are doing exactly that:

“… the Google researchers were apparently able to come up with the new technique simply by looking into areas that computer scientists had identified as being problematic for computer-based solutions.

They apparently came up with image orientation. Humans can apparently properly orient a variety of images so that the vertical axis matches the real-world orientation of the photograph’s subject; computers can only handle a subset of these. […]

The basic idea behind their scheme is that any functional system will first have to eliminate any images that an automated system is likely to handle properly, as well as any that are difficult for humans to orient. So, for example, computers are good at recognizing things like faces in group shots, as well as horizons in landscape scenes, both of which provide sufficient information to orient the image. In other cases, the image doesn’t have enough information for either humans or computers to properly sort things out—the paper uses the example of a guitar on a featureless background, which could be oriented horizontally, vertically, or in the angled position from which it’s typically played.”
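For the curious, here is a rough sketch of that filtering step in Python. It is only an illustration of the idea in the excerpt, not the researchers' code: the `Candidate` fields, the 15-degree tolerance and both thresholds are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    # Hypothetical score: how confident an automated orientation detector is (0..1).
    machine_confidence: float
    # Upright angles, in degrees, chosen by a panel of human testers.
    human_angles: List[float]


def circular_distance(a: float, b: float) -> float:
    """Smallest angle between two orientations, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def human_agreement(angles: List[float], tolerance: float = 15.0) -> float:
    """Fraction of testers within `tolerance` degrees of the most popular answer."""
    if not angles:
        return 0.0
    best = max(
        sum(1 for other in angles if circular_distance(ref, other) <= tolerance)
        for ref in angles
    )
    return best / len(angles)


def usable_for_captcha(
    c: Candidate,
    max_machine_confidence: float = 0.5,  # assumed cut-off, not from the paper
    min_human_agreement: float = 0.8,     # assumed cut-off, not from the paper
) -> bool:
    """Keep images that machines struggle with but humans orient consistently."""
    machine_struggles = c.machine_confidence < max_machine_confidence
    humans_agree = human_agreement(c.human_angles) >= min_human_agreement
    return machine_struggles and humans_agree


group_shot = Candidate("group shot", 0.95, [0, 2, 358, 1])  # faces give it away
guitar = Candidate("guitar", 0.20, [0, 90, 45, 270])        # ambiguous even for people
dog = Candidate("dog", 0.30, [0, 5, 355, 10, 2])            # hard for machines, easy for people

keepers = [c.name for c in (group_shot, guitar, dog) if usable_for_captcha(c)]
```

With these made-up numbers only the dog photo survives: the group shot is too easy for a machine, and the guitar is too ambiguous for the human testers.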

I wonder if there’ll ever be an end to this particular arms race? And, if there is, will it be heralded by the arrival of the Canned Ham Singularity? [image by freezelight]

A beautiful synergy

In a wonderful example of what Jeff Bezos describes as Artificial Artificial Intelligence, researchers at Carnegie Mellon University have developed a system whereby words from old documents that cannot be read by OCR scanners are used as CAPTCHAs to prevent spamming ‘bots from accessing websites, thereby simultaneously assisting in digitizing our written heritage and hindering malicious spammers. From the ScienceNOW article:

The team developed a new program, called reCAPTCHA, which collects words flagged as unreadable by optical scanners as they digitize texts. Those words, in the form of computer optical scans, are then sent to cooperating Web sites and used in place of random CAPTCHAs. The software presents one optically unreadable word and one “control” CAPTCHA word. Getting the control word right identifies the user as a human, and the program records his or her response to the unreadable word and adds it to a database.
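The two-word trick is simple enough to sketch in a few lines of Python. This is a toy model of the scheme as the excerpt describes it, not reCAPTCHA's actual code: the class name, the image ids and the `min_votes` consensus rule are all assumptions added for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional


def normalize(text: str) -> str:
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


class WordPairChallenge:
    """Toy model: one known control word plus one word the scanner could not read."""

    def __init__(self, control_answers: Dict[str, str]):
        # control image id -> text we already know is correct
        self.control_answers = control_answers
        # unknown image id -> tally of transcriptions from users who passed the control
        self.votes = defaultdict(Counter)

    def submit(self, control_id: str, control_guess: str,
               unknown_id: str, unknown_guess: str) -> bool:
        """Return True if the user passes; only then record the unknown-word answer."""
        if normalize(control_guess) != normalize(self.control_answers[control_id]):
            return False  # failed the control word: treat as a bot, record nothing
        self.votes[unknown_id][normalize(unknown_guess)] += 1
        return True

    def consensus(self, unknown_id: str, min_votes: int = 2) -> Optional[str]:
        """Assumed rule: accept a transcription once enough humans agree on it."""
        counts = self.votes.get(unknown_id)
        if not counts:
            return None
        word, n = counts.most_common(1)[0]
        return word if n >= min_votes else None


challenge = WordPairChallenge({"c1": "morning"})
challenge.submit("c1", "Morning ", "u1", "upon")   # passes, answer recorded
challenge.submit("c1", "mourning", "u1", "spam")   # fails control, ignored
challenge.submit("c1", "morning", "u1", "upon")    # passes, second vote
result = challenge.consensus("u1")
```

The point of the design is that the site never needs to know the answer to the unreadable word in advance; the control word does the gatekeeping, and the unknown word rides along for free.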

[story at ScienceNOW via KurzweilAI.net][image from Thorn Enterprises on flickr]

Spammers Defeat CAPTCHA?

In the war between spammers and everyone else, the spammers may have captured new territory. A new trojan appears to be capable of bypassing the CAPTCHA systems on Yahoo and Hotmail, allowing spammers to create 500 bogus email addresses per hour. CAPTCHA tests are the distorted images of text that computers have previously been unable to read. They’re a kind of simple Turing Test meant to require a human behind a keyboard when creating a new email address.

I am suspicious of the claim that the trojan is actually somehow able to read these images, which have thus far been impossible to crack as a security measure. New Scientist Blog agrees. 500 an hour is not very fast: it works out to roughly one new address every seven seconds, slow enough that a handful of human solvers could plausibly keep up. I suspect there is some trickery at work here, perhaps in the form of passing the CAPTCHAs from Hotmail to another website where humans are doing the solving work for the spammers.