Cloud vs. Premise: How to Choose a Voice Application Platform

Recently, a question was posed on a telephony mailing list:

“can anyone recommend a fast time-to-market development platform for voice apps? (Not LAMP + Asterisk)”

My answer: It depends a bit on your requirements. Let’s take a look at two high-level options: Cloud vs. Premise.


For a pure-cloud system, let’s take a look at the likes of Tropo or Twilio. First, the advantages:


  • Nice feature list. Especially for Tropo, you have immediate, no-extra-cost Text-to-speech and Speech Recognition, phone numbers in area codes around the world, and support for several programming languages
  • Simple pricing. Last I checked they’re both around $0.03/min — yes, that’s expensive relative to wholesale SIP, but you have no other costs
  • No infrastructure to run. Combine something like Twilio or Tropo with an app running on Heroku and you have zero infrastructure to manage. In Tropo’s case, you can even run the app directly on their servers
  • Best for simple apps. Low complexity lends itself to small, self-contained functionality, like a simple IVR or a call recording feature
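
To give a feel for how small a "simple IVR" can be on a cloud platform, here is a minimal sketch that builds Twilio-style TwiML responses using only the Python standard library. The verbs used (Gather, Say, Dial, Record, Hangup) are standard TwiML; the action URL, the phone number, and the function names are hypothetical placeholders, and the web framework that would actually serve these responses is left out. Tropo uses a different API and is not covered here.

```python
import xml.etree.ElementTree as ET


def build_menu_twiml(action_url: str) -> str:
    """Return TwiML that reads a one-digit menu and posts the keypress to action_url."""
    response = ET.Element("Response")
    # <Gather> collects keypad digits; numDigits="1" ends input after one keypress.
    gather = ET.SubElement(
        response,
        "Gather",
        {"numDigits": "1", "action": action_url, "method": "POST"},
    )
    prompt = ET.SubElement(gather, "Say")
    prompt.text = "Press 1 for sales, or 2 to leave a message."
    # Reached only if the caller presses nothing before the Gather times out.
    fallback = ET.SubElement(response, "Say")
    fallback.text = "We did not receive any input. Goodbye."
    return ET.tostring(response, encoding="unicode")


def build_choice_twiml(digit: str) -> str:
    """Return TwiML for the digit the caller pressed in the menu above."""
    response = ET.Element("Response")
    if digit == "1":
        # Connect the caller to a (hypothetical) sales line.
        dial = ET.SubElement(response, "Dial")
        dial.text = "+15555550100"
    elif digit == "2":
        # Record a voicemail of up to 60 seconds.
        ET.SubElement(response, "Record", {"maxLength": "60"})
    else:
        sorry = ET.SubElement(response, "Say")
        sorry.text = "Sorry, that is not a valid choice."
        ET.SubElement(response, "Hangup")
    return ET.tostring(response, encoding="unicode")
```

Calling build_menu_twiml("/menu-choice") returns the XML a web app would hand back when a call arrives, and build_choice_twiml handles the follow-up request. Everything else (phone numbers, media, speech) is the provider’s job, which is the appeal of this approach.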


And the drawbacks:

  • Service/pricing options limited. In both cases you can’t bring in third-party providers. We’ve had customers in the past who wanted DIDs from regions not served by the cloud host. There’s no (good) way around that: you get whatever service they provide, nothing more. You also cannot shop around for better termination rates, or make quality/cost tradeoffs
  • Limited to simple applications. Twilio covers probably 80% to 90% of telephony use cases, but that leaves a lot of interesting applications out in the cold. Tropo is better about this, but there are still limits to the API. Call progress analysis (answering machine detection) is one example: it’s not available on either platform. Twilio also does not allow some interesting interactions between two live calls that are possible when you control the whole stack, as you do with a premise system


For many people, the right choice is to run the infrastructure themselves. Here are some reasons why. We will focus on Adhearsion in tandem with one of the supported telephony engines: Asterisk, FreeSWITCH or PRISM.


  • Functionality. Really sophisticated voice apps are possible in a real programming language (as opposed to extensions.conf or dialplan.xml). This gives you modern practices like unit testing and functional testing, access to a HUGE library of pre-built functionality (RubyGems), and easy integration with web services and/or databases, including databases behind your firewall
  • Control. Provides the most control possible over phone calls. You can bridge two calls together, tear them apart, redirect them, record media, play media, do ASR and TTS, integrate with instant messaging and web dashboards, etc.
  • Self-hosted or cloud. Adhearsion apps can run on your own infrastructure, if that is a requirement (for PCI or SOX or other compliance reasons, or simply as a matter of choice). Note that while you can self-host, you can also run Adhearsion in the cloud. We’ve done it on both Amazon Web Services (AWS) and Heroku
  • Shop around for rates. You can purchase VoIP services from whatever service providers you choose, and you can mix-and-match. Especially important at high volume or in obscure markets, or when you need different origination and termination providers
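
To make the "mix-and-match" point concrete, here is a rough sketch of least-cost routing: given rate tables from several providers, pick the cheapest one for each dialed number. The provider names, prefixes, and rates below are entirely made up for illustration, and a real deployment would also weigh call quality and failover, not just price.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Route:
    provider: str        # hypothetical carrier name
    prefix: str          # dialed-number prefix this rate applies to
    rate_per_min: float  # hypothetical price in USD per minute


# Made-up rate table; real rates come from each provider's rate deck.
ROUTES: List[Route] = [
    Route("carrier_a", "1", 0.010),
    Route("carrier_b", "1", 0.008),
    Route("carrier_a", "44", 0.015),
    Route("carrier_b", "4420", 0.012),
    Route("carrier_c", "91", 0.020),
]


def cheapest_route(number: str, routes: List[Route] = ROUTES) -> Optional[Route]:
    """Return the cheapest applicable route for an E.164-style number, or None."""
    digits = number.lstrip("+")
    # For each provider, keep only its most specific (longest) matching prefix,
    # since that is the rate the provider would actually charge.
    best_per_provider: Dict[str, Route] = {}
    for route in routes:
        if digits.startswith(route.prefix):
            current = best_per_provider.get(route.provider)
            if current is None or len(route.prefix) > len(current.prefix):
                best_per_provider[route.provider] = route
    if not best_per_provider:
        return None  # no provider serves this destination
    # Among the providers that can complete the call, pick the lowest rate.
    return min(best_per_provider.values(), key=lambda r: r.rate_per_min)
```

With a cloud host you get a single rate table and no say in the matter; on your own stack, a lookup like this (or your telephony engine’s equivalent) decides per call which provider carries the traffic.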


And the drawbacks:

  • Learning curve. Even if you are already familiar with Asterisk or FreeSWITCH, you may still have to learn Adhearsion. These packages have a steeper curve (due to their capabilities) than either cloud offering above
  • Infrastructure costs. Most people who choose to employ Adhearsion end up running it themselves, whether hosted at a hosting facility or in their offices. Someone will need to manage these servers
  • Too much power. I’m not being flippant here: For some jobs the simpler tool is the right one. We use and love Adhearsion for most of our apps, but we’re usually doing more complicated things that need the functionality. I probably wouldn’t start out with Adhearsion if all I wanted to do was make a simple call recording app
  • Licenses needed for ASR and TTS. There are no good open source/free ASR or TTS engines available. If you need them, you’ll have to license them
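
The cost side of this trade-off can be sketched as a simple break-even calculation. The $0.03/min cloud rate is the figure quoted earlier; the wholesale SIP rate and the fixed monthly cost (servers, administration, ASR/TTS licenses) are assumptions you would replace with your own numbers.

```python
from typing import Optional

CLOUD_RATE_PER_MIN = 0.03  # figure quoted earlier for Tropo/Twilio


def monthly_cost_cloud(minutes: float, rate_per_min: float = CLOUD_RATE_PER_MIN) -> float:
    """Cloud cost: per-minute charges only, no infrastructure."""
    return minutes * rate_per_min


def monthly_cost_premise(minutes: float, sip_rate_per_min: float, fixed_monthly: float) -> float:
    """Premise cost: fixed monthly overhead plus wholesale per-minute charges."""
    return fixed_monthly + minutes * sip_rate_per_min


def break_even_minutes(
    sip_rate_per_min: float,
    fixed_monthly: float,
    cloud_rate_per_min: float = CLOUD_RATE_PER_MIN,
) -> Optional[float]:
    """Monthly minutes above which premise becomes cheaper than cloud.

    Returns None when the SIP rate is not lower than the cloud rate,
    because premise can then never catch up on price.
    """
    saving_per_min = cloud_rate_per_min - sip_rate_per_min
    if saving_per_min <= 0:
        return None
    return fixed_monthly / saving_per_min
```

For example, with an assumed wholesale rate of $0.01/min and $500/month of fixed costs, break_even_minutes(0.01, 500) comes out at roughly 25,000 minutes per month. Below that volume the "expensive" cloud rate is actually the cheaper option on price alone.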

Know Thyself

Informed decisions are the best decisions. Hopefully the above provides a starting point for more discussion about your own needs. Knowing exactly what features you need, and what regulatory or policy requirements you have to meet, will help guide the decision. Not every voice application is created equal, and no one size fits all.
