Auto-garbled captioning

YouTube recently introduced automatic captioning for its videos. "O frabjous day! Callooh! Callay!" said people* I work with who are faced with the time-consuming and tricky task of creating transcripts for videos. Having seen some very … odd … output from high-end, state-of-the-art, speech recognition software, I suggested they wait to see for themselves before getting too excited.

I could easily make this a "bloopers" post, but it's almost too easy. Did you see the one where Massachusetts Secretary of Education Paul Reville says, "what the application does" but the automatic captioning reads, "for the prostitutes"? That particular video is transcribed so poorly that it's not clear editing the automatic transcript file would save any time over transcribing it yourself.

But I think a better measure of accuracy is how consistently a particular word or phrase is transcribed within the same video. Let's take a look at Mark Roth giving a TEDTalk, "Suspended animation is within our grasp," in which he uses the phrase "hydrogen sulfide" fifteen times. It's correctly transcribed five times, including the first time he says it. The rest are pretty far off, and give you no idea what he's talking about. This is a problem: it's a key fact in his presentation.

  1. hydrogen sulfide (8:42)
  2. I didn't sell five (8:44)
  3. hydrogen sulfide (9:04)
  4. some hydrant sell fighting (10:17)
  5. hundreds cell five (10:48)
  6. hydrogen sulfide (11:14)
  7. him I didn't sell fight (11:36)
  8. I didn't sell five (13:43)
  9. hydrogen sulfide (14:04)
  10. hydrogen sulfide (14:17)
  11. hide himself I (14:46)
  12. apply to sell five (14:55)
  13. the party itself by (16:18)
  14. I didn't sell side (16:28)
  15. high turn still fighters (17:07)
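
For anyone who wants to run the same kind of tally on another video, here is a minimal sketch in Python. The caption strings are copied from the list above; the function name phrase_accuracy and the exact-match rule (ignoring case and surrounding spaces) are assumptions made for illustration, not anything YouTube provides.

```python
# The fifteen captions produced for the spoken phrase, in order of appearance.
captions = [
    "hydrogen sulfide",
    "I didn't sell five",
    "hydrogen sulfide",
    "some hydrant sell fighting",
    "hundreds cell five",
    "hydrogen sulfide",
    "him I didn't sell fight",
    "I didn't sell five",
    "hydrogen sulfide",
    "hydrogen sulfide",
    "hide himself I",
    "apply to sell five",
    "the party itself by",
    "I didn't sell side",
    "high turn still fighters",
]


def phrase_accuracy(captions, expected):
    """Count captions that match the expected phrase exactly.

    Matching ignores case and leading/trailing whitespace (an assumed rule).
    Returns (correct, total, fraction correct).
    """
    target = expected.strip().lower()
    correct = sum(1 for caption in captions if caption.strip().lower() == target)
    total = len(captions)
    return correct, total, correct / total


correct, total, fraction = phrase_accuracy(captions, "hydrogen sulfide")
print(f"{correct} of {total} correct ({fraction:.0%})")  # prints "5 of 15 correct (33%)"
```

Exact matching is deliberately strict: a caption like "I didn't sell side" gets no partial credit, which seems fair, since a reader can't recover the meaning from it.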

The take-away is clear. If you care about accuracy, if you want people to be able to understand what is actually being said, do not rely solely on automatic captioning.

* No, they didn't really say that. That's from Lewis Carroll's poem, Jabberwocky. I wonder what YouTube would make of that?

4 responses
I tried triggering an automatic caption when I finally saw the feature one day. I created captioning for some YouTube clip from the recent Oscars, and had a good laugh. When I returned a few days later, it had been removed for some copyright violation.

I was a bit amused by the ease with which I could start a captioning process. I imagined starting some activism, getting everyone to press the Caption button all over YouTube. However, I didn't know how to edit. (I didn't poke around that much.) Having said "A" (create captions), they need to say "B" (now here's how to edit them).

Karen, right now, you can only edit the transcripts of your own videos. YouTube allows you to download the captioning file so you can edit the transcript or timings, and then upload it. Wouldn't it be lovely if there were crowdsourcing options, though? I'd be worried about vandalism; there would need to be a way for corrections to be submitted for review. Maybe Google will work on this next, since it would benefit them, too, to have accurate transcripts to index.
@Sarah There actually are crowdsourcing options. Dotsub.com is one I've used a lot. Just provide DotSub with the YouTube URL and some metadata, upload the Google-generated captions if they're any good at all, and let the community edit the captions and translate them into other languages. TED has used this site to translate over 5000 videos into over 70 languages: http://www.ted.com/OpenTranslationProject
@Terrill, thanks for the great tip. I wonder if any governments are doing something like this? (I couldn't find any in a few quick searches.) It's interesting that TED is only using it for translations, providing the transcriptions themselves.