Monday, April 27, 2009

The Clouds are Gathering - Speech 2.0

When I lived in north Texas, it was easy to see a storm coming – the thunderheads gathered on the horizon to the west and you could see them building for hours before they arrived.

After the eComm 2009: Emerging Communications Conference, I saw the same kind of storm clouds gathering in the west. This time, rather than bringing wind, rain and lightning, I believe they mark a significant shift in the evolution of speech recognition – from hosted VXML-based platforms to telephony “in the cloud”, or what’s being called Speech 2.0.

This shift builds on the trend towards cloud computing, the hosting of applications and, more importantly, services in the cloud. It provides a number of potential benefits, including:


  • Reduced or no front end capital investment
  • Reduced operating expenses and easier upgrade/scalability
  • Rapid speed-to-market deployment for new applications
  • Easier integration and synchronization with existing web self-service
  • Access to best-of-breed technologies
  • Superior network reliability and redundancy
  • Gigabit interfaces for fast, reliable operations
  • Scalability to meet high or rapidly changing call volumes
  • Compatibility with existing telephone and Web infrastructure

Several specific Speech 2.0 platforms were presented at eComm, each with its own twist on features, supported interfaces and hosting options. All of them offer some form of free developer account so that you can sign up and try them yourself.

The platforms that I took note of were:

Tropo - Voxeo introduced Tropo, an in-the-cloud development platform that lets users create and deploy speech and telephony applications using a simple API (application programming interface). The API supports application development in Groovy, JavaScript, PHP, Python, and Ruby and is designed as an alternative to the standard XML- and VoiceXML-based platforms that have become so common in the last few years. Applications can incorporate inbound calling via the public switched telephone network, Session Initiation Protocol (SIP), Skype, and iNum, while also providing appropriate connections for outbound calling. Capabilities include robust call control, playing and recording audio, touchtone entry, speech recognition, text-to-speech, and mashups with Web services. Planned application capabilities include call recording, conferencing, and Web services.

Twilio - Twilio provides an in-cloud API for voice communications that leverages existing web development skills, resources and infrastructure. Designed to let web applications interact with phone callers, Twilio allows you to use your existing web development skills, existing code, existing servers, existing databases and existing karma to solve these problems quickly, without the need to learn a foreign telecom programming language or set up an entire stack of PBX software. Twilio provides the infrastructure; you provide the business logic via HTTP. Current support covers PHP and REST.
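
To make “you provide the business logic via HTTP” concrete, here is a minimal sketch in Python of what such a handler might look like. It is an illustration under assumptions, not Twilio's reference code: the Response and Say element names reflect my understanding of Twilio's XML markup and should be checked against their documentation, and the function names and the greeting text are made up for this example.

```python
from xml.etree import ElementTree as ET


def build_voice_response(message):
    """Return an XML document asking the telephony platform to speak `message`.

    The <Response>/<Say> element names are an assumption about the
    platform's markup; verify them against the provider's documentation.
    """
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message
    return ET.tostring(response, encoding="unicode")


def app(environ, start_response):
    """WSGI handler: the platform fetches this URL when a call arrives."""
    body = build_voice_response("Thanks for calling. Goodbye.").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI-capable web server can host `app`. The point is that the phone call is driven by an ordinary web request and response, so the code sits alongside the rest of a web application.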

IfByPhone - IfByPhone is a hosted voice application and platform company with a simplified approach to the deployment of stand-alone and web-integrated voice services for small and medium sized businesses (SMB). Through a combination of telephony and web services IfByPhone offers you prebuilt applications or a programmable API which enables you to create inbound or outbound calls or other IVR functionality. The configuration and deployment tools look and feel just like Web applications, and require no previous knowledge of telephony programming or terminology.

In the spirit of full disclosure, I use a mash-up of IfByPhone’s applications: a “Click to Call” button on my own website, which allows callers to connect to me directly. It prompts the caller for their phone number, places a call to them, and then invokes a “Find Me” application that places simultaneous calls to me at several possible numbers and bridges whichever number I answer on to the caller.
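
For readers who think in code, the “Find Me” logic can be sketched as a small Python simulation. This is not IfByPhone's API: `ring`, `find_me` and `click_to_call` are hypothetical names, the phone numbers are fictional, and a short sleep stands in for a phone ringing until someone picks up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def ring(number, answer_after):
    """Hypothetical stand-in for placing a call.

    Returns the number once it is 'answered' after `answer_after` seconds,
    or None if nobody ever answers (answer_after is None).
    """
    if answer_after is None:
        return None
    time.sleep(answer_after)
    return number


def find_me(candidates):
    """Ring every candidate number at once; return the first that answers."""
    if not candidates:
        return None
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        futures = [pool.submit(ring, number, delay)
                   for number, delay in candidates.items()]
        for future in as_completed(futures):
            answered = future.result()
            if answered is not None:
                # A real platform would hang up the other legs at this point;
                # this simulation simply lets them finish.
                return answered
    return None


def click_to_call(caller_number, candidates):
    """Return the (caller, answered number) pair to bridge, or None."""
    answered = find_me(candidates)
    if answered is None:
        return None
    return (caller_number, answered)
```

Calling `click_to_call("555-0100", {"555-0101": 0.2, "555-0102": 0.05, "555-0103": None})` pairs the caller with 555-0102, the number that picks up first in this simulation.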

Jaduka - Jaduka provides a SOAP-based Web Services interface which enables companies to easily blend voice into their workflow activities. Jaduka's Web services API makes adding the benefits of voice communication to enterprise applications as easy as constructing a mash-up.

I’m sure there are other platforms that I’ve missed. If I’ve overlooked you, drop me a note.

These cloud-based telephony providers start with a hosted platform, accessible via the open Internet, and provide an API whose premise is to let developers who are not familiar with speech recognition or telephony incorporate a speech-based channel into their existing application infrastructure. Developers can do this without having to learn a great deal of domain-specific skills or development languages, like the VXML or CCXML typically used today to build voice-enabled applications. Each platform varies slightly, but in general the primary interfaces are built on common web-based development languages and protocols such as REST, SOAP, PHP, Groovy, JavaScript, Python, and Ruby. In addition, some of these platforms have also developed more complete applets or small applications which can be used right out of the box or with a nominal amount of configuration.

In theory, this approach makes it easy for developers to add voice interfaces to existing applications. But as your mother reminded you when you were a child: “just because you can does not mean you should”. Speech-enabled applications have taken a long time to reach mainstream success. There are many reasons for this, one of which was the need to develop experience in what’s required to provide an interface that works well for the caller. There is as much art to this as there is science and technology. As with many endeavors, our first attempts often fail to meet expectations. Developers using these cloud-based platforms will still need to design and implement good user interface practices for voice, which are much different from those applied in visual application interfaces. Much has been written about this, so I’ll leave that topic to another conversation.

This evolution in platforms, when added to the strong uptake in hosted platforms, is sure to have a significant impact on the business models in the speech recognition and telephony industries. With the core technology concentrated among fewer and larger consumers, pricing and sales volume will no doubt change for firms like Nuance and the various IVR hardware vendors.

Over the next month, I'll explore these new cloud-based platforms individually by building an application on each and sharing my experience and results with you. I'll wrap up this series of posts with a comparison post giving you my perspective on the pros and cons of each. Stay tuned!

If you're already using or experimenting with one of these cloud based telephony platforms I'd love to hear from you about your experience as well.

2 comments:

Danielle Morrill said...

Hi Jeff,

Thanks for taking a look at Twilio, we're looking forward to playing with the application you cook up with our platform. If you need any help please feel free to drop the team a line at help@twilio.com - we're listening!

Cheers,
Danielle Morrill
Community Manager @ Twilio

Iván said...

Hi Jeff,

We agree with this vision; it is a great article about telephony / Speech 2.0. In answer to your request for missed solutions, you could say a few words about Asterisk PBX, the most widespread open source telephony software, which some of these players are using as part of their API or hosted VoIP services...

Today the Asterisk ecosystem includes add-ons like the VXI* VoiceXML browser, and is able to run advanced VoiceXML IVR apps or provide a hosted IVR service for big server farms, even on cloud virtual servers.

New Telephony 2.0 startups can use this software as a component of their IT infrastructure to provide value-added services on top of it.

The building blocks of Speech 2.0 are very interesting for readers who need to start their own projects.

Bests,
Ivan Sixto
Business Dev. Manager @i6net