A conversation on browser-based speech recognition

About a month ago, I got into a brief Twitter conversation with the brilliant fellow who wrote Annyang, Tal Ater. Twitter sadly does not do a great job of threading the conversation, but there were some excellent points made. I wanted to share them with the Mojo Lingo audience.

You can read the whole exchange below, but I think there are a few key takeaways:

  • Browser-based speech recognition is important, and will become even more important
  • Browser-based speech recognition still has a long way to go before it is universally available and as useful as it should be to developers (and, ultimately, to users)
  • This area is getting a lot of attention, especially from developers
  • We need browser makers to get this done!
  • As speech recognition continues to get cheaper, more accurate, and more widely available, some really exciting products will become reality
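
To make the availability point concrete, here is a minimal sketch of the feature check a page needs before it can rely on speech input. It assumes the recognizer constructor is exposed either as `SpeechRecognition` or under Chrome's prefixed name `webkitSpeechRecognition`. The `SpeechGlobal` type and both helper functions are hypothetical names made up for this illustration; they are not part of Annyang or of any browser API.

```typescript
// Minimal shape of the global object we inspect; in a browser this would be `window`.
type SpeechGlobal = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

// Hypothetical helper: returns the recognizer constructor if the environment
// exposes one (standard name first, then the prefixed one), otherwise null.
function findSpeechRecognition(root: SpeechGlobal): unknown {
  return root.SpeechRecognition ?? root.webkitSpeechRecognition ?? null;
}

// Hypothetical helper: decide which input mode the page should offer.
function inputMode(root: SpeechGlobal): "voice" | "keyboard" {
  return findSpeechRecognition(root) ? "voice" : "keyboard";
}
```

In a real page you would pass `window` as `root`. Any browser that exposes neither name falls back to keyboard input, which is exactly the gap that keeps us from deploying.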

Here is the entire conversation:

@bklang @benlangfeld Just watched a couple of your talks from @adhearsion. Very interesting. It was great seeing #annyang demoed on stage.

@bklang @benlangfeld If you've used it anywhere, I'd love to see it, as well as hear your feedback.

@TalAter @benlangfeld Thanks! Annyang is pretty cool. We've demoed it several times, but haven't deployed. Main problem: browser support :(

@bklang @benlangfeld Yes, browsers are slow to catch on… But I've seen resourceful hackers using it on anything from Arduino to AR Drones.

@TalAter @benlangfeld In this case, it's not just "catching on". Major problems to adoption: need a speech recognizer (and they aren't free)

@TalAter @benlangfeld Other problems: the spec isn't complete/ratified; Chrome implementation does not allow selecting alternate recognizer

@TalAter @benlangfeld I did find a partial @Firefox implementation from GSoC, but I don't think it was merged, probably for those reasons

@bklang @benlangfeld Major boons to adoption: Browsers are built by companies with deep pockets. Also, client side recognition is an option.

@TalAter @benlangfeld I wish I were that optimistic. Client-side would be poor quality and expensive to build/maintain: different API per OS

@TalAter @benlangfeld Server-side would need to be licensed, not-cheap at scale. ASR market has too few competitors esp for open-ended recog

@bklang @benlangfeld My hope is that once people get used to SR as a basic feature in their car, Android, Siri, Google Glass, they won't

@bklang @benlangfeld settle for anything less in their browser. It won't be just a fancy feature you can ignore, but a basic requirement.

@TalAter @benlangfeld Me too! I think good/cheap (enough) speech recog WILL eventually happen…just hope it's sooner rather than later

@bklang @benlangfeld and I do wish Chrome implemented grammar. I'm tired of it calling me Tall, and annyang not knowing its own name.

Thanks, Tal, for your insight, and thank you for Annyang. I look forward to continuing the conversation and working together for a better, more speech-enabled world.

Don’t forget to follow both Tal (@TalAter) and me (@bklang) and join in the conversation!

