Dialogic® Continuous Speech Processing API

**project girl** · 08-11-2012, 02:32 PM

Purpose

This publication provides guidelines for building applications using the Dialogic® Continuous
Speech Processing (CSP) Software and Dialogic® Voice Software in a Linux or Windows®
environment.
It is a companion guide to the Dialogic® Continuous Speech Processing API Library Reference,
which provides details on functions and parameters in the Dialogic® CSP library.

Applicability

This document version (05-1699-006) is published for Dialogic® Host Media Processing Software
Release 3.1LIN.
This document may also be applicable to other software releases (including service updates) on
Linux or Windows® operating systems. Check the Release Guide for your software release to
determine whether this document is supported.

How to Use This Publication

Refer to this publication after you have installed the hardware and the system software, which
includes the CSP software.
This publication assumes that you are familiar with the Linux or Windows® operating system and
the C programming language. It is helpful to keep the Dialogic® Voice API Library Reference and
Dialogic® Voice API Programming Guide handy as you develop your application.

Key Features

The Dialogic® CSP Software provides a high-level interface to Dialogic® boards and is a building
block for creating host-based automatic speech recognition (ASR) applications. Dialogic® CSP
Software gives you the ability to stream voice-activated, pre-speech buffered, echo-cancelled voice
data to an ASR engine.
Dialogic® CSP Software consists of a library of functions, device drivers, firmware, sample
demonstration programs and technical documentation to help you create leading-edge ASR
applications. It is an enhancement to existing echo cancellation resource (ECR) and barge-in
technology.

Key features of CSP include:

• Full-duplex operation: the ability to send and receive (play and record) voice data
simultaneously on a single CSP channel.
• Echo canceller that significantly reduces echo in the incoming signal (up to 64 ms on select
Dialogic® DM3 boards and up to 16 ms on Dialogic® Springware boards).
• Voice activity detector (VAD) that determines when significant audio energy is detected on the
channel and enables data to be sent only when speech is present, thereby reducing CPU
loading.

CSP Components

The Dialogic® CSP Software consists of several CSP components, many of which reside in the
firmware level of the board:

• echo canceller
• voice activity detector (VAD)
• pre-speech buffer
• barge-in and voice event signaling
• streaming or recording

Figure 1 depicts the data flow from the network to the CSP voice channel. This figure shows how
echo is introduced in the signal in the network and how it is cancelled. It also illustrates the option
of sending echo-cancelled data over the TDM bus to another board, regardless of whether this
second board is CSP-capable or not.

Echo Canceller Overview

The echo canceller is a component in the Dialogic® CSP Software that is used by applications to
reduce echo in the incoming signal. In the scenario described in Section 1.2, “CSP
Components”, on page 12, the incoming signal is the utterance “Steve Smith.” Because of the echo
canceller, the “Steve Smith” signal has insignificant echo and can be processed more accurately by
the speech recognition engine.
Figure 3 shows a close-up view of how the echo canceller works. After the incoming signal is
processed by the echo canceller, the resulting signal no longer has significant echo and is then sent
to the host application.
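
The source does not describe the algorithm used in Dialogic® firmware, so the following is only a generic sketch of how an adaptive echo canceller can work. It uses a normalized least-mean-squares (NLMS) filter, a common textbook technique. The names nlms_canceller, nlms_init and nlms_process and the values NLMS_TAPS and NLMS_STEP are hypothetical and are not part of the CSP library.

```c
#include <stddef.h>

#define NLMS_TAPS 16     /* filter length in taps (illustrative value only) */
#define NLMS_STEP 0.5    /* adaptation step size, must lie between 0 and 2 */

/* Hypothetical state for a textbook NLMS echo canceller. */
typedef struct {
    double w[NLMS_TAPS]; /* adaptive estimate of the echo path */
    double x[NLMS_TAPS]; /* recent outgoing (prompt) samples, newest first */
} nlms_canceller;

/* Reset the echo path estimate and the reference history to zero. */
static void nlms_init(nlms_canceller *ec)
{
    size_t i;
    for (i = 0; i < NLMS_TAPS; ++i) {
        ec->w[i] = 0.0;
        ec->x[i] = 0.0;
    }
}

/*
 * Process one sample pair.
 *   ref: outgoing prompt sample (the source of the echo)
 *   in:  incoming sample, which may contain an echo of earlier ref samples
 * Returns the incoming sample with the estimated echo subtracted.
 */
static double nlms_process(nlms_canceller *ec, double ref, double in)
{
    double estimate = 0.0;
    double power = 1e-9; /* small constant avoids division by zero */
    double err;
    size_t i;

    /* Shift the reference history and store the newest sample. */
    for (i = NLMS_TAPS - 1; i > 0; --i)
        ec->x[i] = ec->x[i - 1];
    ec->x[0] = ref;

    /* Estimate the echo and measure the reference power. */
    for (i = 0; i < NLMS_TAPS; ++i) {
        estimate += ec->w[i] * ec->x[i];
        power += ec->x[i] * ec->x[i];
    }

    /* The residual is both the output and the adaptation error. */
    err = in - estimate;
    for (i = 0; i < NLMS_TAPS; ++i)
        ec->w[i] += NLMS_STEP * err * ec->x[i] / power;

    return err;
}
```

Feed each outgoing prompt sample as ref together with the matching incoming sample; the return value is the echo-reduced signal. A filter of this kind cannot model an echo whose delay exceeds its length, which is why the length of the echo canceller matters.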

Tap Length

The duration of an echo is measured in tens of milliseconds. An echo canceller can cancel echo
only up to a limited delay, and this limit is known as the length of the echo canceller.
The length of an echo canceller is sometimes given in “taps,” where each tap is 125 microseconds
long.
The longer the tap length, the more echo is cancelled from the incoming signal. However, this
means more processing power is required. When determining the tap length value, consider the
length of the echo delay in your system as well as your overall system configuration.
On Dialogic® DM3 boards, the media load which is downloaded when you start the board
determines what tap length values are supported. Some Dialogic® DM3 boards support one value
only, 128 taps (16 ms). Other Dialogic® DM3 boards support 512 taps (64 ms). For information on
media loads, see the appropriate Configuration Guide for your product or product family. For
information on tap length support on Dialogic® DM3 boards, see the Release Guide for the system
release you are using.
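
The relationship between taps and milliseconds described above is simple arithmetic. A minimal sketch follows; the helper names taps_to_ms and ms_to_taps are hypothetical and are not part of the Dialogic® CSP library.

```c
/* One tap spans 125 microseconds, as stated in the text above. */
#define TAP_DURATION_US 125u

/* Convert an echo canceller length in taps to milliseconds. */
static double taps_to_ms(unsigned taps)
{
    return (taps * TAP_DURATION_US) / 1000.0;
}

/* Smallest number of taps that covers an echo delay given in milliseconds. */
static unsigned ms_to_taps(unsigned delay_ms)
{
    return (delay_ms * 1000u + TAP_DURATION_US - 1u) / TAP_DURATION_US;
}
```

Here taps_to_ms(128) gives 16 ms and taps_to_ms(512) gives 64 ms, matching the two values quoted for Dialogic® DM3 boards.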

Voice Activity Detector (VAD)

When a caller begins to speak over a prompt (also known as barge-in), the application typically
stops the playing of the prompt so that it isn’t distracting to the caller.
A voice activity detector (VAD) is a component in the Dialogic® CSP Software that examines the
caller’s incoming signal and determines if the signal contains significant energy and is likely to be
speech rather than a click, for example. The significance is determined by configurable parameters.
The VAD has several configurable parameters such as the threshold of energy that is considered
significant during prompt play and after the prompt has completed play. For more information, see
parameter descriptions in ec_setparm( ) in the Dialogic® Continuous Speech Processing API
Library Reference.
For information on the VAD, see Chapter 5, “Using the Voice Activity Detector”. For information
on choices of operating modes for the VAD, see Section 5.1, “Voice Activity Detector Operating
Modes”, on page 35.
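
To illustrate the idea of "significant energy" and of rejecting a click, here is a minimal, generic sketch of an energy-based detector. It is not the VAD implemented in Dialogic® firmware. The names frame_energy, vad_sketch and vad_sketch_update, and the fields threshold and min_frames, are hypothetical and do not correspond to actual ec_setparm( ) parameters.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mean squared amplitude of one frame of 16-bit linear PCM samples. */
static double frame_energy(const short *frame, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += (double)frame[i] * (double)frame[i];
    return n ? sum / (double)n : 0.0;
}

/* Hypothetical detector state; all fields are illustrative assumptions. */
typedef struct {
    double threshold;    /* energy a frame must exceed to count as active */
    unsigned min_frames; /* consecutive active frames needed to declare speech */
    unsigned run;        /* current count of consecutive active frames */
} vad_sketch;

/*
 * Feed one frame to the detector. Returns true once enough consecutive
 * frames exceed the threshold, so a single loud frame (a click) is ignored.
 */
static bool vad_sketch_update(vad_sketch *v, const short *frame, size_t n)
{
    if (frame_energy(frame, n) > v->threshold)
        v->run++;
    else
        v->run = 0;
    return v->run >= v->min_frames;
}
```

An application using a detector like this could keep one threshold for use during prompt play and another for use after the prompt has finished, switching the threshold field when the prompt ends.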
