How can a computer win at Jeopardy? Elementary, my dear Watson

Paul Raven @ 17-06-2010

This is not only an interesting story but an engaging piece of journalism, and I heartily recommend you go read it. It’s an NYT magazine piece about Watson, an IBM artificial intelligence project headed by one David Ferrucci, which does something that artificial intelligences have heretofore been unable to do: beat human players at Jeopardy! [found in a tweet by @noahtron, which was retweeted by someone I follow who, regrettably, has slipped both my memory and my notetaking process – apologies for incomplete attribution]

I’ll pick out a few highlights for the short-on-time, but bookmark it for reading later anyway. We’ll start off with the methodology:

The great shift in artificial intelligence began in the last 10 years, when computer scientists began using statistics to analyze huge piles of documents, like books and news stories. They wrote algorithms that could take any subject and automatically learn what types of words are, statistically speaking, most (and least) associated with it. Using this method, you could put hundreds of articles and books and movie reviews discussing Sherlock Holmes into the computer, and it would calculate that the words “deerstalker hat” and “Professor Moriarty” and “opium” are frequently correlated with one another, but not with, say, the Super Bowl. So at that point you could present the computer with a question that didn’t mention Sherlock Holmes by name, but if the machine detected certain associated words, it could conclude that Holmes was the probable subject — and it could also identify hundreds of other concepts and words that weren’t present but that were likely to be related to Holmes, like “Baker Street” and “chemistry.”

In theory, this sort of statistical computation has been possible for decades, but it was impractical. Computers weren’t fast enough, memory wasn’t expansive enough and in any case there was no easy way to put millions of documents into a computer.
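To make the word-association idea a bit more concrete, here is a minimal sketch in Python. It is emphatically not Watson's code: the toy corpus, the function names (`train`, `guess_topic`, `related_words`) and the choice of a smoothed naive-Bayes-style score are all assumptions made purely for illustration. It counts which words turn up alongside each subject, guesses the subject of a clue that never names it, and lists associated words the clue left out.

```python
import math
from collections import Counter, defaultdict

# Toy corpus of (subject, text) pairs, invented for this illustration.
DOCS = [
    ("Sherlock Holmes", "the detective in the deerstalker hat faced professor moriarty near baker street"),
    ("Sherlock Holmes", "holmes studied chemistry at baker street and pursued moriarty"),
    ("Super Bowl", "the quarterback threw a touchdown in the super bowl halftime game"),
    ("Super Bowl", "fans watched the halftime show at the football game"),
]


def train(docs):
    """Count how often each word appears in documents about each subject."""
    word_counts = defaultdict(Counter)
    for subject, text in docs:
        word_counts[subject].update(text.split())
    return word_counts


def guess_topic(word_counts, clue):
    """Return the subject whose word statistics best match the clue."""
    vocab = {word for counts in word_counts.values() for word in counts}
    clue_words = [word for word in clue.split() if word in vocab]
    scores = {}
    for subject, counts in word_counts.items():
        total = sum(counts.values())
        # Add-one smoothing so an unseen word does not zero out a subject.
        scores[subject] = sum(
            math.log((counts[word] + 1) / (total + len(vocab)))
            for word in clue_words
        )
    return max(scores, key=scores.get)


def related_words(word_counts, subject, clue, limit=3):
    """List words tied to the subject that are absent from the clue
    and do not also appear under any other subject."""
    elsewhere = {
        word
        for other, counts in word_counts.items()
        if other != subject
        for word in counts
    }
    seen = set(clue.split())
    ranked = [
        word
        for word, _ in word_counts[subject].most_common()
        if word not in elsewhere and word not in seen
    ]
    return ranked[:limit]


if __name__ == "__main__":
    counts = train(DOCS)
    clue = "this detective wore a deerstalker hat and lived on baker street"
    subject = guess_topic(counts, clue)
    print(subject, related_words(counts, subject, clue))
```

With four sentences of training data this is a party trick; the approach described in the article only becomes interesting once the corpus runs to millions of documents.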

Those are no longer obstacles, of course, or at least not obstacles on the same scale. So, add multiple parallel algorithms, shake vigorously, and…

Watson’s speed allows it to try thousands of ways of simultaneously tackling a “Jeopardy!” clue. Most question-answering systems rely on a handful of algorithms, but Ferrucci decided this was why those systems do not work very well: no single algorithm can simulate the human ability to parse language and facts. Instead, Watson uses more than a hundred algorithms at the same time to analyze a question in different ways, generating hundreds of possible solutions. Another set of algorithms ranks these answers according to plausibility; for example, if dozens of algorithms working in different directions all arrive at the same answer, it’s more likely to be the right one. In essence, Watson thinks in probabilities. It produces not one single “right” answer, but an enormous number of possibilities, then ranks them by assessing how likely each one is to answer the question.
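The "many algorithms, then rank by agreement" idea can be sketched in a few lines of Python. This is a hedged illustration only: the article doesn't say how Watson actually combines its scores, so the confidence-weighted vote below, the function name `rank_candidates` and the sample proposals are all assumptions of mine rather than anything IBM has described.

```python
from collections import defaultdict


def rank_candidates(proposals):
    """Combine (answer, confidence) pairs from independent scorers.

    Answers proposed by several scorers accumulate confidence, so
    agreement pushes a candidate up the ranking. Returns a list of
    (answer, share) pairs, best first, with shares summing to 1.
    """
    totals = defaultdict(float)
    for answer, confidence in proposals:
        totals[answer] += confidence
    norm = sum(totals.values())
    if norm == 0:
        return []
    return sorted(
        ((answer, score / norm) for answer, score in totals.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical output of five scorers looking at the same clue.
    proposals = [
        ("Sherlock Holmes", 0.6),
        ("Hercule Poirot", 0.7),
        ("Sherlock Holmes", 0.5),
        ("Sherlock Holmes", 0.4),
        ("Miss Marple", 0.2),
    ]
    for answer, share in rank_candidates(proposals):
        print(f"{answer}: {share:.3f}")
```

Note that the single most confident scorer backs Poirot, yet Holmes comes out on top because three scorers converge on it. That is the "thinks in probabilities" point in miniature.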

The result? Watson actually competes pretty well against players in the “winner cloud” of Jeopardy! performance, though it’s by no means cock of the rock. Not yet, anyway.

What made the article itself so enjoyable for me was the human story behind it – Ferrucci comes across as a real Driven Man, striving to come first in a fiercely competitive and high-stakes scientific race:

Ferrucci refused to talk on the record about Watson’s blind spots. He’s aware of them; indeed, his team does “error analysis” after each game, tracing how and why Watson messed up. But he is terrified that if competitors knew what types of questions Watson was bad at, they could prepare by boning up in specific areas. I.B.M. required all its sparring-match contestants to sign nondisclosure agreements prohibiting them from discussing their own observations on what, precisely, Watson was good and bad at. I signed no such agreement, so I was free to describe what I saw; but Ferrucci wasn’t about to make it easier for me by cataloguing Watson’s vulnerabilities.

As with most AI projects, however, Watson only does one thing, though it (he?) does it well. It’s a function with potential commercial uses (which is why IBM is still throwing money at Ferrucci and team), but a general artificial intelligence needs to be able to do more than win at a certain quiz show format. The difficulties of producing a natural-language question-answering intelligence on a par with human learning were neatly showcased by Wolfram|Alpha last year (which, despite disappointing the public, is a pretty impressive piece of work in its own right):

This, Wolfram says, is the deep challenge of artificial intelligence: a lot of human knowledge isn’t represented in words alone, and a computer won’t learn that stuff just by encoding English language texts, as Watson does. The only way to program a computer to do this type of mathematical reasoning might be to do precisely what Ferrucci doesn’t want to do — sit down and slowly teach it about the world, one fact at a time. […] Watson can answer only questions asking for an objectively knowable fact. It cannot produce an answer that requires judgment. It cannot offer a new, unique answer to questions like “What’s the best high-tech company to invest in?” or “When will there be peace in the Middle East?” All it will do is look for source material in its database that appears to have addressed those issues and then collate and compose a string of text that seems to be a statistically likely answer. Neither Watson nor Wolfram Alpha, in other words, comes close to replicating human wisdom.

So don’t go announcing the Singularity just yet, eh? Even so, it’s a pretty big leap that Ferrucci and friends have made, and the practical applications should hopefully pay the way for more research. Weird times ahead… though Ferrucci’s suggestion that Watson could replace call centre drones has a certain appeal.
