Upcoming Features in Adhearsion 1.2

Over the last couple of weeks we have been hard at work on some cool new features for Adhearsion 1.2. I will describe them in detail below, but let me start by asking you to help us out by testing them in your environment before the actual release. This is simple to do if your project already uses Bundler. Just add the following line to your Gemfile:

gem 'adhearsion', :git => 'git://github.com/adhearsion/adhearsion.git', :branch => :develop

And now, the features:

Text-to-Speech Support

A long-awaited and often-requested feature, properly abstracted support for Text-to-Speech engines is finally coming to Adhearsion. We are starting with support in Asterisk for two of the more commonly used Asterisk TTS engines: Cepstral and UniMRCP (in our case, used with the NeoSpeech engine). We also support TTS via Tropo for all you AGItate users. Support for others is easy to add, so let us know (or better: send a pull request) if there is one we missed.

Direct TTS with #speak

Using TTS is so easy, it is a wonder we did not do this earlier.

First, configure your default TTS engine in config/startup.rb. This does assume that you have already configured the engine in Asterisk as well:

config.asterisk.speech_engine = :cepstral

Then just call #speak in your dialplan.rb or your component:

speak 'Hello, this is the voice of Adhearsion!'

You can also override the configured engine with an argument to #speak. This is handy for side-by-side comparison:

speak 'Four score and seven years ago...', :engine => :cepstral
speak 'Four score and seven years ago...', :engine => :unimrcp

But wait…there’s more! Using the excellent and brand new RubySpeech library by Ben Langfeld, you can easily build SSML (that is, Speech Synthesis Markup Language) to get even more control over your TTS output. SSML is the industry standard and is supported natively by Tropo, Cepstral and NeoSpeech (the backend behind our UniMRCP setup). Example:

require 'ruby_speech'

stuff_to_say = RubySpeech::SSML.draw do
  voice gender: :male, name: 'fred' do
    string "Hi, I'm Fred. The time is currently "
    say_as 'date', format: 'dmy' do
      "01/02/1960"
    end
  end
end

speak stuff_to_say

This will speak using the male voice Fred and hint to the TTS backend that the string “01/02/1960” should be spoken as a date. Cool, huh? SSML could probably be an entire blog post on its own. For more information about RubySpeech, see its GitHub repository, or even check out the official SSML specification.

TTS prompts with #input

In addition to calling #speak directly, we have extended the #input method to allow TTS prompts as well. Just pass :speak with a hash of options and your prompt will be spoken:

input 3, :speak => {:text => 'How much wood could a woodchuck chuck?'}
# Or, to specify the speech engine explicitly:
input 3, :speak => {:text => 'How much wood could a woodchuck chuck?', :engine => :unimrcp}

TTS as a fallback if a sound file is unplayable

Let us pretend that we have an application where a user is able to create a prompt. The prompt has text associated with it (for reporting purposes) and they also have the ability to upload or record the actual audio. What happens if the user forgets to record the actual prompt? Well, now that we have TTS support and the actual text of the prompt handy, we can cover for them!

play_or_speak 'user_prompt_1' => {:text => "This is the text of the user's prompt"}

Of course, we gave the same treatment to #input. If both :play and :speak are given, then Adhearsion will attempt to play the sound file and, if that fails, fall back to the TTS prompt:

input 5, :play => 'user_prompt_1', :speak => {:text => "This is the text of the user's prompt"}
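To make the fallback order concrete, here is a minimal sketch in plain Ruby. It is not Adhearsion's implementation: `play_with_fallback`, `player` and `speaker` are hypothetical names, and the sketch assumes that playback reports failure by returning a false value. The two lambdas stand in for the real playback and TTS calls, which need a live call to run.

```ruby
# Hypothetical sketch of the play-then-speak fallback (not Adhearsion code).
# Assumption: the player returns a truthy value only if the file was played.
def play_with_fallback(sound_file, text, player, speaker)
  return :played if player.call(sound_file)

  # Playback failed, so fall back to speaking the prompt text.
  speaker.call(text)
  :spoken
end

recorded = ['welcome']  # pretend only this prompt was ever recorded
spoken   = []           # collects what the stand-in TTS was asked to say

player  = lambda { |file| recorded.include?(file) }
speaker = lambda { |text| spoken << text }

play_with_fallback('welcome', 'Welcome!', player, speaker)
play_with_fallback('user_prompt_1', "This is the text of the user's prompt", player, speaker)
```

After both calls, only the text of the unrecorded prompt ends up in `spoken`, because the first prompt was played from its sound file.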

Smarter #input

Be user friendly by terminating your #inputs early

Let us consider another hypothetical example: you want the caller to enter an amount of money. To make the experience more user-friendly, you want the input request to terminate as soon as you get a valid response, instead of waiting for the full 7 digits required to represent $1000.00 (keyed as “1000*00”). How can you do this? Well, starting with Adhearsion 1.2, you can pass a block to #input that will determine when to stop collecting digits:

input 7, :speak => {:text => "How much is that doggie in the window?"} do |value|
  value =~ /\*\d\d/
end

This block will return something that evaluates to true if the digit string contains “*00” (or, more generally, “*” followed by any two digits), and false (actually, nil in this case) otherwise. Thus, if a caller enters “1*50” you know that they mean “$1.50” without having to wait for the digit timeout.
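The early-termination behaviour can be tried out without a phone call. The sketch below is a plain Ruby simulation, not Adhearsion's digit loop: `collect_digits` and `keyed_amount_to_cents` are hypothetical helpers that feed keypresses to the same kind of block one at a time and then convert the keyed string into cents.

```ruby
# Hypothetical stand-in for #input's digit collection (not Adhearsion code).
# Appends one keypress at a time and stops when the maximum length is
# reached or when the block returns a truthy value for the digits so far.
def collect_digits(keypresses, max_digits)
  buffer = String.new
  keypresses.each do |key|
    buffer << key
    break if buffer.length >= max_digits
    break if block_given? && yield(buffer)
  end
  buffer
end

# Hypothetical helper: convert a keyed amount such as "1*50" into cents.
def keyed_amount_to_cents(keyed)
  dollars, cents = keyed.split('*', 2)
  dollars.to_i * 100 + cents.to_i
end

# The caller keeps pressing keys, but collection stops after "1*50".
keyed = collect_digits(%w[1 * 5 0 9 9 9], 7) { |value| value =~ /\*\d\d/ }

puts keyed                         # 1*50
puts keyed_amount_to_cents(keyed)  # 150
```

Note that the match position returned by `=~` is an integer, and every integer (including 0) is truthy in Ruby, so the block only needs to return the result of the match.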

Final thoughts

We at Mojo Lingo and the Adhearsion project are excited about how this framework is maturing. Thanks to constant feedback from the Adhearsion community, we are making consistent improvements. Special thanks also go to Mojo Lingo’s client IfByPhone for sponsoring development of most of the exciting new features going into Adhearsion 1.2 as well as RubySpeech.

