… so watch what you say when the webcam’s plugged in, eh?
A research team from the School of Computing Sciences at UEA compared the performance of a machine-based lip-reading system with that of 19 human lip-readers. They found that the automated system significantly outperformed the human lip-readers – scoring a recognition rate of 80 per cent, compared with only 32 per cent for human viewers on the same task.
Furthermore, they found that machines are able to exploit very simplistic features that represent only the shape of the face, whereas human lip-readers require full video of people speaking.
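To make "shape-only features" a bit more concrete, here's a minimal illustrative sketch (not the UEA team's actual pipeline) of how a video frame's mouth region might be boiled down to a handful of geometric numbers; the landmark input and the particular features are assumptions for illustration only.

```python
import numpy as np

def lip_shape_features(mouth_landmarks):
    """Compute simple per-frame shape features from mouth landmarks.

    mouth_landmarks: array of shape (frames, points, 2) holding (x, y)
    coordinates of lip-contour points for each video frame. The landmark
    source (e.g. an off-the-shelf face tracker) is assumed here, not
    specified by the study.
    """
    left   = mouth_landmarks[:, :, 0].min(axis=1)   # leftmost lip x per frame
    right  = mouth_landmarks[:, :, 0].max(axis=1)   # rightmost lip x per frame
    top    = mouth_landmarks[:, :, 1].min(axis=1)   # highest lip y per frame
    bottom = mouth_landmarks[:, :, 1].max(axis=1)   # lowest lip y per frame

    width  = right - left
    height = bottom - top
    aspect = height / np.maximum(width, 1e-6)       # mouth "openness" ratio

    # Stack into a (frames, 3) sequence a classifier could consume:
    # pure shape, no pixel appearance at all.
    return np.stack([width, height, aspect], axis=1)

# Toy usage: 10 frames, 20 lip-contour points each (random stand-in data).
frames = np.random.rand(10, 20, 2) * 100
print(lip_shape_features(frames).shape)  # (10, 3)
```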
The study also showed that, in contrast to the traditional approach to lip-reading training, in which viewers are taught to spot key lip-shapes from static (often drawn) images, the dynamics and the full appearance of speech gestures are very important.
Using a new video-based training system, viewers with very limited training significantly improved their ability to lip-read monosyllabic words, which in itself is a very difficult task. It is hoped this research might lead to novel methods of lip-reading training for the deaf and hard of hearing.
Might this be a short-cut around the persistent problem of poor voice-recognition software? Why analyse the sound if you can do a better job by watching the face producing it? [via Technovelgy; image by i_forbes, chosen for an old yet oddly topical cultural reference that I suspect no one under 25 is likely to get]
Computers usually do outperform humans when it comes to things they are programmed to do. No doubt about it. Although, I personally would rather hear broken-up speech from a friend than have a computer tell me what s/he is saying. Just personal preference.