Thursday, April 09, 2009

Collecting SLM Data in an existing self service application

I had the opportunity to call Comcast (my internet service provider) this week about a problem with my cable modem.

The call was answered by the usual self-service application, which asked me a series of questions (in a directed-dialog style) including my home phone number, and then whether I was calling about my cable TV service, Internet service, or phone service. At that point the application took an unexpected change of direction by asking me to state why I was calling. After I said "Internet not working," the application informed me that it was collecting that information for a future application enhancement, then went back to the normal directed-dialog style with 5 or 6 more questions. No doubt the data collected will be paired with my ultimate destination and purpose in the call to help develop a natural language component.

I immediately thought of a post by Phillip Hunter this week on his Design-Outloud blog and the related article in SpeechTech Magazine: Is it Natural? I'd encourage you to read both, especially Phillip's blog post. Together they give a good overview of the challenges of this approach.

It's obvious that Comcast is gathering data to add some kind of "Natural Language" capability to their customer self-service application. This would imply that Comcast has a very high call volume for this application, as it typically costs a large six-figure sum to build a Natural Language call-steering application based on SLM (statistical language modeling). It's often difficult to justify the expense of building and maintaining this kind of approach.

While the SLM approach is often productive and cost-justified at high call volumes, an equally workable approach with a slightly narrower focus can be developed for much lower cost using a typical SRGS grammar, built with a technique known as grammatical inference.

Essentially, grammatical inference tools rely on artificial intelligence to build your SRGS and GSL grammars automatically from example utterances. Just as you would build an SLM grammar from caller utterances, with grammatical inference you feed the utterances into a 'grammar learning tool,' which outputs a set of grammar rules in whatever format you require (e.g. GSL or SRGS). The grammar learner has a fundamental knowledge of the language you are building the grammar for (e.g. English) and combines this with your utterances to produce a set of grammar rules. Unlike an SLM approach, grammatical inference lets you build a usable grammar with only a very small number of sample utterances ('tens of utterances' rather than 'tens of thousands'). Of course, if more training data is available, you can feed the grammar learner as much as you like. One such tool that I've used is offered by Inference Communications Pty. Ltd. I'm sure others are out there waiting to be found.
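To make the idea concrete, here is a toy sketch in Python of the utterances-in, grammar-out workflow. Real grammar learners apply linguistic knowledge of the target language to generalize well beyond the training samples; this illustration only factors out shared leading words and emits the alternatives as SRGS XML, and every function name in it is my own invention, not any vendor's API.

```python
from collections import defaultdict

def infer_rules(utterances):
    """Group utterances by their first word, so phrases sharing a
    leading word can be factored into one alternative with sub-choices."""
    groups = defaultdict(list)
    for utt in utterances:
        words = utt.lower().split()
        groups[words[0]].append(" ".join(words[1:]))
    return groups

def to_srgs(utterances, rule_name="reason"):
    """Emit a minimal SRGS-XML grammar: one root rule whose <one-of>
    lists each inferred alternative."""
    items = []
    for first, tails in sorted(infer_rules(utterances).items()):
        uniq = sorted(set(tails))
        if len(uniq) == 1:
            # Only one continuation seen: emit the phrase verbatim.
            items.append(f"      <item>{first} {uniq[0]}</item>")
        else:
            # Several continuations: factor the shared first word.
            alts = "".join(f"<item>{t}</item>" for t in uniq)
            items.append(f"      <item>{first} <one-of>{alts}</one-of></item>")
    body = "\n".join(items)
    return (
        f'<grammar xmlns="http://www.w3.org/2001/06/grammar" root="{rule_name}">\n'
        f'  <rule id="{rule_name}">\n'
        f"    <one-of>\n{body}\n    </one-of>\n"
        f"  </rule>\n"
        f"</grammar>"
    )

samples = ["internet not working", "internet is down", "cancel my service"]
print(to_srgs(samples))
```

Feeding it the three sample utterances above yields a grammar that accepts "cancel my service" plus "internet" followed by either continuation. A real learner would go further, generalizing across word classes so the grammar also covers phrasings it never saw.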

When working with my clients, I often find that major speech vendors jump straight to a recommendation of Natural Language (with glimmers of large licensing and professional services fees in their eyes) when a more conservative and less expensive approach "may" work just as well.

Regardless of the direction you choose, consult with a skilled voice interaction designer who can help you work through the pros & cons of each approach and choose the right one for you and your callers.

P.S. Also worth mentioning this week is a new web site from The Association for Voice Interaction Design. If you have anything more than a passing interest in voice interaction design, they have some great references to other blogs, websites and publications.
