Watson’s victory clear, but perhaps not as impressive as it seems

Paul Raven @ 17-02-2011

So, Watson won at Jeopardy!… by a pretty significant lead, too. Inevitably, lots of folk are keen to downplay this victory, and for a variety of reasons. Commonest complaint would have to be regarding Watson’s speed-to-buzzer advantage, but its designers say that it’s not really that big a deal:

Though Watson seemed to be running the round and beating Jennings and Rutter to the punch with its answers many times, Welty insisted that Watson had no particular advantage in terms of buzzer speed. Players can’t buzz in to give their questions until a light turns on after the answer is read, but Welty says that humans have the advantage of timing and rhythm.

“They’re not waiting for the light to come on,” Welty said; rather, the human players try to time their buzzer presses so that they’re coming in as close as possible to the light. Though Watson’s reaction times are faster than a human, Welty noted that Watson has to wait for the light. Dr. Adam Lally, another member of Watson’s team, noted that “Ken and Brad are really fast. They have to be.”

A re-run with some sort of handicap might prove this one way or the other, but I suspect the doubters will find new advantages to pin on the machine… which, to my mind, rather misses the point of the exercise, which was to demonstrate whether or not a machine could outperform humans at a particular task. Quod erat demonstrandum, y’know?

A more interesting point is that even Watson’s creators aren’t entirely sure how Watson achieves what it achieves. George Dvorsky:

Great quote from David Ferrucci, the Lead Researcher of IBM’s Watson Project:

“Watson absolutely surprises me. People say: ‘Why did it get that one wrong?’ I don’t know. ‘Why did it get that one right?’ I don’t know.”

Essentially, the IBM team came up with a whole whack of fancy algorithms and shoved them into Watson. But they didn’t know how these formulas would work in concert with each other and result in emergent effects (i.e. computational cognitive complexity). The result is the seemingly intangible, and not always coherent, way in which Watson gets questions right, and the ways in which it gets questions wrong.

As Watson has revealed, when it errs it errs really badly.

This kind of freaks me out a little. When asking computers questions that we don’t know the answers to, we aren’t going to know beyond a shadow of a doubt when a system like Watson is right or wrong. Because we don’t know the answer ourselves, and because we don’t necessarily know how the computer got the answer, we are going to have to take a tremendous leap of faith that it got it right when the answer seems even remotely plausible.

Dvorsky’s underlying point here is that we shouldn’t be too cocky about our ability to ensure artificial intelligences think in the ways we want them to. They’re just as inscrutable as another human mind. Perhaps even more so… which is why he and Anders Sandberg (among others) believe we should foster a healthy fear of powerful AI systems.

But the most interesting point I’ve seen made about Watson’s victory is a skeptical stance over at Memesteading:

When Alex Trebek walked by the 10 racks of 9 servers each, said to include 2880 computing cores and 15 terabytes (15,000 gigabytes) of high-speed RAM main-memory, I couldn’t shake the feeling: this seems like too much hardware… at least if any of the software includes new breakthroughs of actual understanding. As parts of the show took on the character of an IBM infomercial, the feeling only grew.


An offline copy of all of Wikipedia’s articles, as of the last full data-dump, is about 6.5GB compressed, 30GB uncompressed – that’s 1/500th Watson’s RAM. Furthermore, chopping this data up for rapid access – such as creating an inverted index, and replacing named/linked entities with ordinal numbers – tends to result in even smaller representations. So with fast lookup and a modicum of understanding, one server, with 64GB of RAM, could be more than enough to contain everything a language-savvy agent would need to dominate at Jeopardy.

But what if you’re not language savvy, and only have brute-force text-lookup? We can simulate the kinds of answers even a naive text-search approach against a Wikipedia snapshot might produce, by performing site-specific queries on Google.

For many of the questions Watson got right, a naive Google query of the ‘en.wikipedia.org’ domain, using the key words in the clue, will return as the first result the exact Wikipedia article whose title is the correct answer.


With a full, inverse-indexed, cross-linked, de-duplicated version of Wikipedia all in RAM, even a single server, with a few cores, can run hundreds of iteratively-refined probe queries, and scan the full-text of articles for sentences that correlate with the clue, in the seconds it takes Trebek to read the clue.

That makes me think that if you gave a leaner, younger, hungrier team millions of dollars and years to mine the entire history of Jeopardy answers-and-questions for workable heuristics, they could match Watson’s performance with a tiny fraction of Watson’s hardware.

All of which isn’t to demean Watson’s achievement so much as to suggest that perhaps the same results could be reached with a much smaller hardware outlay… though there is an undercurrent of “Big Iron infomercial” in there, too.
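To make the Memesteading argument a bit more concrete, here’s a minimal sketch of the naive approach it describes: build an inverted index over a text snapshot, swap article titles for ordinal numbers, then score articles by how many clue keywords they contain. Everything in it is an assumption for illustration. The three-article `ARTICLES` dictionary is a made-up stand-in for a Wikipedia dump, and `tokenize`, `build_index` and `best_article` are hypothetical names. It is not how Watson works, and a serious attempt would need far better ranking than a keyword count.

```python
import re
from collections import defaultdict

# Made-up stand-in for a Wikipedia snapshot: title -> article text.
ARTICLES = {
    "On the Road": "On the Road is a 1957 novel by Jack Kerouac about travels across America.",
    "Toronto": "Toronto is the largest city in Canada and the capital of Ontario.",
    "Chicago": "Chicago is a city in Illinois whose largest airport is named for a World War II hero.",
}

# Very common words carry no signal, so they are dropped before indexing.
STOPWORDS = {"the", "a", "an", "of", "in", "is", "and", "by", "for", "this", "its", "whose"}


def tokenize(text):
    """Lower-case the text and return its alphanumeric words, minus stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_index(articles):
    """Return (titles, index): titles maps ordinal id -> title,
    index maps each word -> set of ordinal ids of articles containing it."""
    titles = list(articles)
    index = defaultdict(set)
    for doc_id, title in enumerate(titles):
        for word in tokenize(title + " " + articles[title]):
            index[word].add(doc_id)
    return titles, index


def best_article(clue, titles, index):
    """Return the title sharing the most distinct keywords with the clue, or None."""
    scores = defaultdict(int)
    for word in set(tokenize(clue)):
        for doc_id in index.get(word, ()):
            scores[doc_id] += 1
    if not scores:
        return None
    return titles[max(scores, key=scores.get)]


titles, index = build_index(ARTICLES)
print(best_article("This 1957 Kerouac novel follows travels across America", titles, index))
```

On this toy data the clue matches six keywords in one article and none in the others, so the lookup returns that article’s title. The hard part, which this sketch ignores entirely, is the clues where the keywords point at the wrong article.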


3 Responses to “Watson’s victory clear, but perhaps not as impressive as it seems”

  1. Sterling Camden says:

    Well sure. That’s just the next step, on the way to the desktop (or palmtop, or braintop, …)

  2. Rick York says:

    The most important thing about Watson was not his store of knowledge. It was his ability to understand natural language. Particularly, when you recognize that Jeopardy’s questions are loaded with puns, weird references and all sorts of other linguistic tricks. All of which are designed, much like crossword puzzle hints, to confuse even the brightest humans.

    Without the natural language capacity, no amount of data would suffice.

  3. Wintermute says:

    I find it interesting, although not surprising, that nearly every original article and 2nd-order retweet-baiting link roundup of the “Watson victory”, including the usually minimally sensationalist National Public Radio, has talked about Watson’s victory as a “blow to humanity” or “Machines are becoming human” or “OHNOES TEH ROBOT OVERLORDS IZ COMIN!”. It reminds me of a koan bomb dropped by the AGI world’s Antichrist, John “Chinese Room” Searle, at an IBM conference on Cognitive Computing, Consciousness, Science Philosophy and Mind.

    “I was frequently asked by reporters at the time of the triumph of Deep Blue if I did not think that this was somehow a blow to human dignity. I think it is nothing of the sort. Any pocket calculator can out arithmetic any human mathematician. Is this a blow to human dignity? No, it’s a credit to the ingenuity of the IBM programmers and engineers.”

    And, I think, Watson is simply the spiritual descendant of the pocket calculator, and the Deep Blue human-beating spectacle / PR stunt, which, as before, is shrinkwrapped in mainstream TV, milked by journalists for the “Terminator” sense-a-terror, and candy coated in teh Singularitarian / AI-evangelist koolaid. This machine, while surely a laudable accomplishment by the IBM programmers and engineers, is not “coming alive” or “turning into Skynet” or “having cognitive processes” or bringing us closer to conscious machines. What we have when the Madison Avenue lip gloss and the flashing lights and Sci-Fi Theatrics are stripped away is another expert system with a big brute-forced database designed specifically for reading Jeopardy clues and searching for single-output Jeopardy answers. Which is why the health care industry is all over this, because they already *have* expert systems that do the same thing, except theirs read patient symptoms and output single-output patient diagnoses. Ask a doctor: diagnosis can be, at times, as confusing as or more confusing than simple Jeopardy puns. Breaking language down into syntactical components, analyzing massive amounts of language usage, and discerning statistical patterns is not exactly trivial, but any chatbot does basically that, just not on steroids. Ultimately, what it is is a very, very elaborate lookup table for computing probable outputs based on inputs. It’s “Looking up the answer” to the question in the Chinese Room, and returning it. Nowhere along the way from Alex Trebek’s prompter to the return of the answer is the meaning of the words “Toronto” or “On The Road” actually understood.