Tag Archives: visual recognition

Software that learns to recognise faces and voices like a child

A computer scientist at the University of Pennsylvania has decided to mimic the way children learn to recognise faces and voices in order to speed up the artificial learning curve of intelligent systems:

Using novel learning algorithms that combine audio, video, and text streams, Taskar and his research team are teaching computers to recognize faces and voices in videos. Their system recognizes when someone in the video or audio mentions a name, whether he or she is talking about himself or herself, or whether he or she is talking about someone in the third person. It then maps that correspondence between names and faces and names and voices.

“An intelligent system needs to understand more than just visual input, and more than just language input or audio or speech. It needs to integrate everything in order to really make any progress,” Taskar says.

The information Taskar’s team feeds into the system is free training data harvested from the Internet. Attempts to teach computers visual recognition in the pre-Internet age were hampered in large part by a lack of training content. Today, Taskar says, the Internet provides a “massive digitization of knowledge.” People post videos, comments, blogs, music, and critiques about their favorite things and interests.

Hah! And they said YouTube would never do any real good! Taskar’s computer seems destined for a life of increasing frustration with irresolvable plot lines, though, as they’re training it by showing it episodes of Lost:

As Taskar’s team feeds more data about Lost into the computer, such as video clips, scripts, or blogs, the system improves at identifying people in the video. If, for example, a clip contains footage of characters Kate and Anna Lucia, after being taught, the computer will recognize their faces.

“The algorithm is learning this from what people say, or from screenplays as well,” Taskar adds. “The screenplay doesn’t tell you who is who, but it tells you there’s a scene with [two characters] talking to each other.”

Taskar says the information the research has produced can be helpful in many ways, particularly in searching videos for content. Currently, if a father is searching for a photo of his daughter playing with the family dog in his gigabytes of photos and videos on his hard drive, unless the photo is tagged “daughter playing with dog,” chances are he isn’t going to be able to find it.

Well, that’s your consumer-level pitch, sure, but the system will be too large and ungainly (and expensive) for Joe Average for a long time. Taskar should probably talk to the UK government… that panoply of CCTV cameras keeps growing, and it costs big money to hire people to watch their output. And what could possibly go wrong with putting an automated recognition system in charge of crime prevention? [image by bixentro]

Hyperlinking reality

Researchers at the MOBVIS project are working on a pattern-recognition system that allows you to take a picture of a building on your mobile and have the software identify where you are and what you’re looking at:

…the genius of the system boils down to a higher-dimension, feature-matching algorithm developed by the University of Ljubljana in Slovenia, one of the partners of the project. It can very accurately detect minute but telling differences between similar objects, such as buildings or monuments, both by the appearance of the buildings themselves and their context in the streetscape.

Apparently the system gets it right about 80% of the time.

[from Physorg] [from Unhindered by Talent on flickr]

Spam-trap Turing tests train smarter software

Email and comment spam is one of those constant low-grade annoyances that simply becomes part of the furniture if you spend a lot of time on the ‘net, as are the CAPTCHA puzzles you have to take to prove you’re a human. [image from Wikimedia Commons]

Signs are that they won’t be much use for much longer, though; a UK researcher has been using the ‘twisted letters’ type of CAPTCHA to train his visual recognition algorithms, while a chap at Palo Alto has a program that can correctly identify cats and dogs 83% of the time – which, let’s face it, is probably a better success rate than the average YouTube user can manage.

Sadly, training algorithms against Turing test spam-traps is no more likely to produce a recognisably intelligent piece of software than the Loebner Artificial Intelligence Prize is. But maybe one day we’ll be able to combine all the pieces… if they don’t beat us to it and combine themselves, of course. 😉