YouTube recently introduced automatic captioning for its videos. "O frabjous day! Callooh! Callay!" said people* I work with who face the time-consuming and tricky task of creating transcripts for videos. Having seen some very … odd … output from high-end, state-of-the-art speech recognition software, I suggested they wait and see for themselves before getting too excited.
I could easily make this a "bloopers" post, but it's almost too easy. Did you see the one where Massachusetts Secretary of Education Paul Reville says, "what the application does" but the automatic caption reads, "for the prostitutes"? That particular video is transcribed so poorly that it's not clear editing the automatic transcript would save any time over transcribing it yourself.
But I think a better measure of accuracy is how consistently a particular word or phrase is transcribed within a single video. Take Mark Roth's TED Talk, "Suspended animation is within our grasp," in which he uses the phrase "hydrogen sulfide" fifteen times. It's correctly transcribed five times, including the first time he says it. The rest are far off and give you no idea what he's talking about. This is a problem, because the phrase is central to his presentation.
- hydrogen sulfide (8:42)
- I didn't sell five (8:44)
- hydrogen sulfide (9:04)
- some hydrant sell fighting (10:17)
- hundreds cell five (10:48)
- hydrogen sulfide (11:14)
- him I didn't sell fight (11:36)
- I didn't sell five (13:43)
- hydrogen sulfide (14:04)
- hydrogen sulfide (14:17)
- hide himself I (14:46)
- apply to sell five (14:55)
- the party itself by (16:18)
- I didn't sell side (16:28)
- high turn still fighters (17:07)
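
For anyone who wants to repeat this kind of tally on another video, here is a minimal Python sketch that counts the list above. The function name `phrase_accuracy` and the matching rule (a caption counts as correct only if it equals the expected phrase, ignoring case) are assumptions made for illustration; nothing here comes from YouTube's own tools.

```python
# Captions as listed above, in the order they appear in the video.
captions = [
    "hydrogen sulfide",
    "I didn't sell five",
    "hydrogen sulfide",
    "some hydrant sell fighting",
    "hundreds cell five",
    "hydrogen sulfide",
    "him I didn't sell fight",
    "I didn't sell five",
    "hydrogen sulfide",
    "hydrogen sulfide",
    "hide himself I",
    "apply to sell five",
    "the party itself by",
    "I didn't sell side",
    "high turn still fighters",
]

EXPECTED = "hydrogen sulfide"


def phrase_accuracy(captions, expected):
    """Return (correct, total, fraction) using exact, case-insensitive matching.

    Hypothetical helper: the matching rule is an assumption, not a standard metric.
    """
    if not captions:
        raise ValueError("captions must not be empty")
    target = expected.strip().lower()
    correct = sum(1 for c in captions if c.strip().lower() == target)
    total = len(captions)
    return correct, total, correct / total


correct, total, fraction = phrase_accuracy(captions, EXPECTED)
print(f"{correct} of {total} correct ({fraction:.0%})")
```

On this list it reports 5 of 15 correct, or about a third. Exact matching is deliberately strict: a near miss like "I didn't sell side" counts as wrong, which seems fair, since a reader of the captions would not recover the phrase from it.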
The take-away is clear. If you care about accuracy, if you want people to be able to understand what is actually being said, do not rely solely on automatic captioning.
* No, they didn't really say that. That's from Lewis Carroll's poem, Jabberwocky. I wonder what YouTube would make of that?