The Speakularity is Near

NPR’s Matt Thompson crops up at NiemanLab’s “Predictions For Journalism 2011” with a suggestion that’ll make the volume of the Wikileaks cable-dump look like a drop in the ocean [via MetaFilter]. The nub of the theory is this: pretty soon, automatic speech transcription is going to be cheap and widespread… and that means absolute masses of journalistic material will become easily and cheaply available.

So much of the raw material of journalism consists of verbal exchanges — phone conversations, press conferences, meetings. One of journalism’s most significant production challenges, even for those who don’t work at a radio company, is translating these verbal exchanges into text to weave scripts and stories out of them.

After the Speakularity, much more of this raw material would become available. It would render audio recordings accessible to the blind and aid in translation of audio recordings into different languages. Obscure city meetings could be recorded and auto-transcribed; interviews could be published nearly instantly as Q&As; journalists covering events could focus their attention on analyzing rather than capturing the proceedings.

Because text is much more scannable than audio, recordings automatically indexed to a transcript would be much quicker to search through and edit. Jon Stewart’s crew for The Daily Show uses expensive technology to process and search through the hundreds of hours of video the various news programs air each week. Imagine if that capability were opened up to citizens — if every on-air utterance of every pundit, politician, or policy wonk were searchable on Google.

The very first thing I can imagine would be all the Googlephobes wailing about privacy and data monopolies… but Thompson makes a valid point here, which is the potential for disconnecting the production of the raw materials of journalism – interviews, press conferences, Q&As, etc. – from the analysis, comparison and synthesis of that material.

Obviously that’s going to mean further job losses in the journalism sector; that production work is currently done by the folk at the bottom of the office hierarchy, or so I assume, so it’s not all kittens and roses. Still, once the tech becomes cheap and ubiquitous enough, it’ll open up the field to independent journalists and small niche venues in a way that’s never been logistically or economically sustainable before (though whether there’ll be a good way to monetise those niche verticals is another question entirely). The privileged access and momentum of the big venues will be hard to maintain, and that may lead to a fall in quality… though that will depend on how one defines quality journalism, of course, which is another open question.

But the most important factor would be the widespread access to the raw materials – not just for journalists, but for the public. Storage is cheap, and text doesn’t eat much bandwidth; there’d be no reason not to upload the entire transcript of an interview for those who wanted to read it alongside the edited highlights and pull-quotes. Indeed, those venues that failed to make said materials available would start to look as if they had something to hide… after all, recent events suggest that transparency will become a big issue in the near future, wouldn’t you say?