VoiceXML Tutorial: Part 1 Introduction and User Interaction with DTMF


Author: Samuel Walters


Presented by Plum Voice

About Plum

Plum offers high-performance, versatile, and scalable IVR systems and hosting that can automate any phone. We pride ourselves on delivering solutions that satisfy customers ranging from small and medium-sized businesses to some of the largest enterprises in the world.

1. VoiceXML Tutorial

VoiceXML 2.0 is the World Wide Web Consortium (W3C) standard for scripting voice applications. In this tutorial, we construct a VoiceXML interactive voice response (IVR) application for a customer service center. Some aspects of this tutorial assume you have your own web server. For a full production-level application, this is the recommended configuration. Starting from a simple "Hello World" application, we build a telephony application which includes:

- Dynamic response driven by touch tone or speech input
- Advanced text-to-speech (TTS) synthesis and automatic speech recognition (ASR)
- System integration with enterprise databases

1.1 Introduction to VXML

We begin with nearly the simplest complete VoiceXML application. The application here is analogous to an answering machine set to play an announcement only.
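The listing for this first application can be sketched as follows. This is a minimal sketch rather than the original slide content; it assumes the standard VoiceXML 2.0 namespace and the prompt text quoted in the tutorial.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- A form with one block of executable content -->
  <form>
    <block>
      <!-- Plain text in a prompt is rendered by the TTS engine -->
      <prompt>
        Welcome to Plum Voice.
      </prompt>
    </block>
  </form>
</vxml>
```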

In this example, the user would hear a synthesized voice say, "Welcome to Plum Voice." Then the system would simply hang up. The <form> element defines the basic unit of interaction in VoiceXML. This form includes only a single <block> of executable content, which in turn includes a single <prompt> to the user. By default, any plain text within a prompt is passed to the system's text-to-speech (TTS) synthesis engine to be generated as audio.

Also, as the <?xml?> tag declares, every VoiceXML document is an XML document. The basic structure of VoiceXML should be familiar to anyone who has looked at HTML web documents. Tags are set off by angle brackets (<tag>) and are closed with a forward slash (</tag>). VoiceXML documents must adhere strictly to the XML standard. The document must begin with the <?xml?> tag. Then the rest of the document is enclosed within the <vxml></vxml> tags. Unlike HTML, all tags must be closed, and certain special characters must be escaped with a safe alternative. For example, the less-than sign <, when it is not used to open a tag, must be written as &lt;.
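As an illustration of the escaping rule, here is a small sketch of a prompt whose text contains a less-than sign and an ampersand. The prompt wording is made up for the example; only the entity names (&lt; and &amp;) come from the XML standard.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <!-- Inside element text, "<" is written as &lt; and "&" as &amp;.
           Written raw, either character would make the document ill-formed. -->
      <prompt>
        If your order total is &lt; 20 dollars, shipping &amp; handling is extra.
      </prompt>
    </block>
  </form>
</vxml>
```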

For static prompts such as this welcome message, we'll probably want to use a human announcer instead of TTS. TTS has come a long way, but there's still no substitute for the real thing. For recorded prompts, we use the <audio> tag, whose "src" attribute points to the recording and whose body here holds the text "Welcome to Plum Voice." In this case, the source ("src") reference is relative to the URL of the VXML document in which it appears.

WAV files are a generic container type. A WAV file includes a header which indicates the actual audio sample size, encoding, and rate used. Supported formats vary by VoiceXML implementation, and not all possible WAV file formats are supported. The Plum Voice Platform supports 8 kHz audio files in 16-bit linear, 8-bit µ-law (u-law), or 8-bit A-law encoding, either in WAV files or in headerless files.
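A recorded prompt of this kind might be written as in the sketch below. The relative path wav/welcome.wav is an assumption made for illustration; substitute the location of your own recording.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        <!-- "src" is resolved relative to this document's URL.
             The path below is hypothetical. -->
        <audio src="wav/welcome.wav">
          Welcome to Plum Voice.
        </audio>
      </prompt>
    </block>
  </form>
</vxml>
```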

The text within the audio tag is not required. We could have included no content, writing the element with an opening and a closing tag and nothing between them, which is equivalent to the self-closing form of the tag. The text included within the audio tag in the example above is something like the ALT text for images in HTML. If the VoiceXML platform is unable to open or play the source ("src") file in the audio tag, it falls back on generating TTS from the included text.
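The two equivalent empty forms can be sketched as the fragment below. It is meant to sit inside a <prompt>, and the file path is again a hypothetical one.

```xml
<!-- Explicit open and close tags, no fallback text -->
<audio src="wav/welcome.wav"></audio>

<!-- Equivalent self-closing form -->
<audio src="wav/welcome.wav"/>
```

With either form there is no text to fall back on, so nothing is synthesized if the file cannot be played.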

It is good practice to store your audio files on the same local server as your application script. In our example setup, the files folder of our local server holds test.php, the script that contains the reference to the audio file welcome.wav.

welcome.wav is stored in our wav folder. Thus, in the source ("src") attribute of our audio tag, we give the path to that file relative to the script, and keep "Welcome to Plum Voice." as the fallback text.

The benefit of storing audio files on your local server, as opposed to the audio repository, is easier file management. Suppose you wanted to change the name of one of your audio files. If the file is stored locally on your server, you can just go in and rename it yourself. With the audio repository, however, you are not able to manage these files. For example, if you deleted a recording in the audio repository (let's call it 12.wav) and uploaded a replacement file, the replacement would not take the deleted recording's old name. It would take the next highest number available among your recordings (let's say it was named 21.wav).

If you are concerned about loading times for audio files from your local server, note that once these audio files have been cached, they load as quickly as files stored on our audio repository.

1.2 User Interaction with DTMF

Grammars are used by speech recognizers to determine what the recognizer should listen for, and so describe the utterances a user may say. Starting with VoiceXML Version 2.0, the W3C requires all VoiceXML platforms to support at least one common format, the XML Form of the W3C Speech Recognition Grammar Specification (SRGS). Plum implements the SRGS+XML grammar format for both voice and DTMF grammars, as well as the JSpeech Grammar Format (JSGF).

To control user input, we can explicitly create input fields and specify allowable grammars for user input. We do this by explicitly using the <grammar> tag for each <field> inside a <form>. The <grammar> element is used to provide a speech (or DTMF) grammar that:

- Specifies a set of utterances or DTMF key presses that a user may speak or type to perform an action or supply information.
- Returns a corresponding semantic interpretation for a matching input.

The following example shows how to set up a grammar for DTMF input from the user. The grammar for the field accepts the keys 1|2|3. The prompt says, "For sales, press 1. For tech support, press 2. For company directory, press 3." Depending on the key pressed, the caller then hears "Welcome to sales.", "Welcome to tech support.", or "Welcome to the company directory."

Here we specify a grammar for the field using JSGF (JSpeech Grammar Format) syntax, which is the default syntax for the Plum Voice Platform. The same example can also be written with a grammar in SRGS+XML format.
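A sketch of the JSGF version of this menu is shown below. It is reconstructed from the prompts and the grammar text 1|2|3 rather than copied from the slides, so the field name "department", the grammar MIME type, and the exact conditions are assumptions.

```xml
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <!-- "department" is a hypothetical field name -->
    <field name="department">
      <!-- Inline JSGF grammar accepting exactly one of the keys 1, 2, or 3.
           The "type" value is an assumption; platforms differ. -->
      <grammar type="application/x-jsgf" mode="dtmf">
        1|2|3
      </grammar>
      <prompt>
        For sales, press 1.
        For tech support, press 2.
        For company directory, press 3.
      </prompt>
      <!-- Runs once the field has been filled by a matching key press -->
      <filled>
        <if cond="department == '1'">
          <prompt>Welcome to sales.</prompt>
        <elseif cond="department == '2'"/>
          <prompt>Welcome to tech support.</prompt>
        <else/>
          <prompt>Welcome to the company directory.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```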

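A sketch of the same menu field with an SRGS+XML grammar follows. It is a fragment meant to sit inside a <form>, with the <filled> handling omitted; the field name and rule id "department" are hypothetical.

```xml
<field name="department">
  <!-- Inline SRGS+XML grammar: the root rule accepts one of the keys 1, 2, or 3 -->
  <grammar type="application/srgs+xml" version="1.0"
           mode="dtmf" root="department">
    <rule id="department" scope="public">
      <one-of>
        <item>1</item>
        <item>2</item>
        <item>3</item>
      </one-of>
    </rule>
  </grammar>
  <prompt>
    For sales, press 1.
    For tech support, press 2.
    For company directory, press 3.
  </prompt>
</field>
```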

Notice that the SRGS+XML grammar is longer than the JSGF grammar in the example before it. For numeric input, JSGF is often a shorter alternative.

About us

Plum Voice was founded in 2000 as The Plum Group Inc. With headquarters in New York and offices in New York City, Boston, Denver and London, Plum creates technologies for personalized audio communication. Plum provides interactive voice response platforms, systems and hosting services to developers and companies to automate call center and business processes over the phone. Products and services include:

- The Plum VoiceXML Platform
- Plum IVR Hosting Suite
- Plum Survey
- Plum IVR Server
- Plum Professional Services
- QuickFuse

Up Next: User Interaction with Speech, Built-in Grammars, and Standard Events

