When I lived in north Texas, it was easy to see a storm coming – the thunderheads gathered on the horizon to the west and you could see them building for hours before they arrived.
After eComm 2009 (the Emerging Communications Conference), I saw the same kind of storm clouds gathering in the west. This time, rather than bringing wind, rain and lightning, I believe they mark a significant shift in the evolution of speech recognition: from hosted VXML-based platforms to telephony “in the cloud,” or what’s being called Speech 2.0.
This shift builds on the trend towards cloud computing (the hosting of applications and, more importantly, services in the cloud) and provides a number of potential benefits, including:
- Reduced or no front end capital investment
- Reduced operating expenses and easier upgrade/scalability
- Faster time to market for new applications
- Easier integration and synchronization with existing web self-service
- Access to best-of-breed technologies
- Superior network reliability and redundancy
- Gigabit interfaces for fast, reliable operations
- Scalability to meet high or rapidly changing call volumes
- Compatibility with existing telephone and Web infrastructure
Several specific Speech 2.0 platforms were presented at eComm, each with its own twist on features, supported interfaces and hosting options. All of them offer some form of free developer account so that you can sign up and try them yourself.
The platforms that I took note of were:
Tropo – Voxeo introduced Tropo, an in-the-cloud development platform that lets users create and deploy speech and telephony applications using a simple API (application programming interface). The API supports application development in Groovy, JavaScript, PHP, Python, and Ruby and is designed as an alternative to the standard XML- and VoiceXML-based platforms that have become so common in the last few years. Applications can accept inbound calls via the public switched telephone network, Session Initiation Protocol (SIP), Skype, and iNum, and can also place outbound calls. Capabilities include robust call control, playing and recording audio, touchtone entry, speech recognition, text-to-speech, and mashups with Web services. Planned application capabilities include call recording, conferencing, and Web services.
Twilio – Twilio provides an in-cloud API for voice communications that leverages existing web development skills, resources and infrastructure. Designed to let web applications interact with phone callers, Twilio allows you to use your existing web development skills, existing code, existing servers, existing databases and existing karma to build voice applications quickly, without needing to learn a foreign telecom programming language or set up an entire stack of PBX software. Twilio provides the infrastructure; you provide the business logic via HTTP. Current support covers PHP and REST.
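To make "you provide the business logic via HTTP" a little more concrete, here is a minimal Python sketch of the web-application side. It assumes the platform requests a URL on your server when a call arrives and reads back an XML document of instructions. The `Response` and `Say` element names follow Twilio's markup as I understand it; the function name, its parameters, the sample phone numbers and the greeting text are all hypothetical, and a real application would wrap this in whatever web framework it already uses.

```python
from xml.etree import ElementTree as ET


def build_call_response(caller_number, known_callers):
    """Return the XML instructions for one incoming call.

    caller_number: the caller's number as passed in the HTTP request
                   (assumed; the parameter name is hypothetical).
    known_callers: hypothetical dict mapping phone numbers to names,
                   standing in for your existing database.
    """
    name = known_callers.get(caller_number)
    greeting = f"Welcome back, {name}." if name else "Thanks for calling."

    # Build <Response><Say>...</Say></Response> and serialize it to a string.
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = greeting
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    print(build_call_response("+15550100", {"+15550100": "Pat"}))
```

The point of the sketch is that nothing in it is telephony-specific: it is an ordinary lookup followed by an ordinary XML response.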
IfByPhone – IfByPhone is a hosted voice application and platform company with a simplified approach to the deployment of stand-alone and web-integrated voice services for small and medium-sized businesses (SMB). Through a combination of telephony and web services, IfByPhone offers you prebuilt applications or a programmable API which enables you to create inbound or outbound calls or other IVR functionality. The configuration and deployment tools look and feel just like Web applications, and require no previous knowledge of telephony programming or terminology.
In the spirit of full disclosure, I use a mash-up of IfByPhone’s applications myself: a “Click to Call” button on my website that lets visitors connect to me directly. It prompts the caller for their phone number, places a call to them, and then invokes a “Find Me” application that places simultaneous calls to me at several possible numbers and bridges whichever number I answer to the caller.
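The flow just described can be sketched in a few lines of Python. This is not IfByPhone's API or implementation, only an illustration of the logic under stated assumptions: `place_call` is a hypothetical callable that rings a number and returns `True` if someone answers, and "bridging" is represented simply by returning the pair of numbers to connect.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_me(caller_number, my_numbers, place_call):
    """Ring every number in my_numbers at once; bridge the first that answers.

    place_call is a hypothetical callable: place_call(number) -> True if the
    call was answered. A real platform would supply its own equivalent.
    Returns (caller_number, answered_number), or None if nobody picked up.
    """
    if not my_numbers:
        return None
    with ThreadPoolExecutor(max_workers=len(my_numbers)) as pool:
        # Start all calls at the same time, remembering which number each is.
        futures = {pool.submit(place_call, n): n for n in my_numbers}
        for future in as_completed(futures):
            if future.result():
                # First answered call wins. (Leaving the "with" block still
                # waits for the other attempts to finish; a real platform
                # would cancel them instead.)
                return caller_number, futures[future]
    return None


def click_to_call(caller_number, my_numbers, place_call):
    """Call the website visitor back first, then try to find me."""
    if not place_call(caller_number):
        return None  # The visitor did not pick up; nothing to bridge.
    return find_me(caller_number, my_numbers, place_call)


if __name__ == "__main__":
    # Stand-in for the telephony layer: only these numbers "answer".
    answering = {"+15550100", "+15550102"}
    print(click_to_call("+15550100", ["+15550101", "+15550102"],
                        lambda number: number in answering))
```

Threads are used here only to mimic the simultaneous ringing; on a hosted platform that concurrency happens on the provider's side.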
Jaduka – Jaduka provides a SOAP-based Web Services interface which enables companies to easily blend voice into their workflow activities. Jaduka's Web services API makes adding the benefits of voice communication to enterprise applications as easy as constructing a mash-up.
I’m sure there are other platforms that I’ve missed. If I’ve overlooked you, drop me a note.
These cloud-based telephony providers start with a hosted platform, accessible via the open Internet, and provide an API. Its premise is to let developers who are not familiar with speech recognition or telephony add a speech-based channel to their existing application infrastructure without having to learn many domain-specific skills or development languages, like the VXML or CCXML typically used today to build voice-enabled applications. Each varies slightly, but in general the primary interfaces are built on common web development languages and protocols such as REST, SOAP, PHP, Groovy, JavaScript, Python, and Ruby. In addition, some of these platforms have also developed more complete applets or small applications which can be used right out of the box or with a nominal amount of configuration.
In theory, this approach makes it easy for developers to add voice interfaces to existing applications. But as your mother reminded you when you were a child: “just because you can does not mean you should”. Speech-enabled applications have taken a long time to reach mainstream success. There are many reasons for this, one of which is the time it took to build experience in what makes an interface work well for the caller. There is as much art to this as there is science and technology. As with many endeavors, our first attempts often fail to meet expectations. Developers using these cloud-based platforms will still need to design and implement good user interface practices for voice, which are very different from those applied in visual application interfaces. Much has been written about this, so I’ll leave that topic to another conversation.
This evolution in platforms, added to the strong uptake in hosted platforms, is sure to have a significant impact on business models in the speech recognition and telephony industries. As the buyers of the core technology consolidate into fewer, larger customers, pricing and sales volume will no doubt change for firms like Nuance and the various IVR hardware vendors.
Over the next month, I'll explore these new cloud-based platforms individually by building an application on each and sharing my experience and results with you. I'll wrap up this series of posts with a comparison post giving you my perspective on the pros and cons of each. Stay tuned!
If you're already using or experimenting with one of these cloud-based telephony platforms, I'd love to hear from you about your experience as well.