30-09-2016, 02:15 PM
Introduction
What is Echo?
Echo is an obvious and very annoying problem
in telephony systems, and can occur in Voice over IP,
Cellular and long distance connections.
There are two major types of echo:
Talker Echo (Figures 1 & 3) occurs when a
proportion of the talker’s (i.e. person speaking)
voice is reflected back to them. The talker
hears a delayed copy of his or her own
voice.
Listener Echo (Figure 2) occurs when a
talker’s voice is reflected back to them and
then re-reflected towards the listener.
The listener hears two or more copies of the
talker’s speech. Listener Echo is less common
than Talker Echo.
Echo is a common problem in Voice over IP
services - not because VoIP introduces echo, but
because VoIP increases delay and makes echo more
obvious and annoying.
There are techniques that can be applied to reduce
echo problems, such as echo cancellation and echo
suppression; however, these are not always effective.
Sources of Echo
There are two common causes of echo:
Reflections in 2-4 wire interfaces.
Acoustic Echo
Some “echo” is deliberately introduced in telephone
systems. In a typical telephone handset, a proportion of
the speech energy from the microphone is fed back to
the earpiece. This provides a natural way to control the
loudness of the talker - if someone speaks very loudly,
this results in a loud signal being fed back to their ear.
This deliberate feedback is called “sidetone”. Because
the signal is fed back instantaneously, it does not sound
like echo, which by definition is delayed with respect to
the original speech.
Electrical Echo
Both Voice over IP and traditional digital telephone
systems (PCM, ISDN) are “4-wire” in the sense that
the signal in one direction is carried over a separate
“pair” of wires and the signal in the other direction on
a separate “pair” of wires. This means that the two
signals are independent of each other.
Analog local loops, typically used to connect to
individual telephones, are “2-wire” as the signals going
in both directions are carried over the same pair of
wires. Where a 4-wire digital system connects to a
2-wire analog system, it is necessary to perform a 2-4
wire conversion using either a transformer hybrid or
active hybrid. This conversion function is typically
built into Central Office or PBX line cards, or into
channel banks.
The 2-4 wire conversion process typically relies
on the hybrid being “balanced”, which means that
the loading presented by the 2-wire line matches that
expected by the hybrid. If there is some mismatch, the
transmit and receive signals on the 2-wire line cannot be
properly separated and hence an echo occurs.
Echo is a very common problem on PCM-analog
loop interconnections. However, with conventional
telephone systems the delay is so short that the echo
does not sound like echo; it sounds like sidetone.
Acoustic Echo
Acoustic echo (Figure 3) occurs when some
proportion of the sound coming out of the “speaker”
part of a telephone handset or headset can be heard by
the microphone part of the handset or headset. This can
be due to poor design, or even to the user holding the
handset away from their ear.
Impact of Echo on VoIP Call Quality
Talker echo is the most common type of echo and
results in a proportion of the talker’s (person speaking)
voice being reflected back to them. The discussion
below primarily relates to this type of echo.
Echo is typically reported in terms of Echo Return
Loss (ERL). This is the ratio between the original
signal and the echo level expressed in decibels (dB).
A higher ratio corresponds to a smaller echo, hence a
55 dB echo return loss would be a low echo level and
15 dB quite a high echo level.
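The decibel ratio described above can be sketched in a few lines of Python. This is only an illustration: the function name and the use of RMS amplitudes as inputs are assumptions, not something the original note specifies.

```python
import math

def echo_return_loss_db(signal_rms: float, echo_rms: float) -> float:
    """Return ERL in dB from the RMS amplitude of the original signal
    and of its echo (hypothetical helper, for illustration only)."""
    if signal_rms <= 0 or echo_rms <= 0:
        raise ValueError("RMS levels must be positive")
    # These are amplitude values, so the ratio is scaled by 20*log10;
    # a ratio of powers would use 10*log10 instead.
    return 20.0 * math.log10(signal_rms / echo_rms)
```

An echo at one tenth of the original amplitude gives an ERL of about 20 dB; the smaller the echo, the larger the figure.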
The chart on the following page shows the
relationship between delay and conversational quality
for two conditions - firstly with a low level of echo (55
dB echo return loss) and secondly with moderate level
of echo (35 dB echo return loss).
If round trip delay is very short, say less than 30 ms,
then the talker cannot distinguish between the echo and
the deliberately introduced sidetone.
If the delay is a little longer, say 50 ms, then the
talker cannot hear the delayed copy of their speech as
a distinct copy; however, it does impact speech quality,
resulting in a sound quality generally described as
“hollow”, “cave-like”, “tunnel-like” or similar.
As the delay increases further, the echo becomes
more obviously echo - and the combined effect of the
loudness of the echo and its delay causes considerable
annoyance.
Echo Suppression and Cancellation
Echo Suppression (or NLP)
Low to moderate levels of talker echo cannot be
easily heard while the talker is actually speaking but are
much more obvious during the gaps in speech (silence
periods). An early approach to masking echo problems
was to detect when these silence periods occur and to
replace the silence with artificial background noise.
Echo suppression is often called non-linear processing
(NLP).
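The silence-replacement idea described above might be sketched as follows. This is a simplified illustration and not how any particular product implements NLP: the frame-based structure, the RMS threshold, the noise level and all names are assumptions.

```python
import math
import random

def suppress_residual_echo(frames, threshold_rms=0.01, noise_rms=0.001, seed=0):
    """Replace low-level (silence) frames with artificial background noise.

    frames: list of non-empty lists of float samples on the return path.
    threshold_rms and noise_rms are illustrative values, not from the note.
    """
    rng = random.Random(seed)
    out = []
    for frame in frames:
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        if rms < threshold_rms:
            # Treated as a silence period: whatever is here (possibly
            # low-level echo) is replaced by artificial background noise.
            out.append([rng.gauss(0.0, noise_rms) for _ in frame])
        else:
            # Active speech: pass through unchanged.
            out.append(list(frame))
    return out
```

A real suppressor would also need to decide which direction is active before muting; that decision logic is left out here.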
Echo Cancellation
Echo cancellation is a more sophisticated approach
to removing the echo that may be present on telephone
connections. An adaptive signal-processing algorithm
monitors the speech signals going in each direction and
attempts to learn the characteristics of the echo - i.e. if
an echo is present, then what are its associated delay
and amplitude? As the echo cancellation algorithm
attempts to learn the characteristics of the echo path,
the echo is reduced more and more as the learned
characteristics become more accurate. The adaptation
process is temporarily suspended during doubletalk, i.e.
when both users are speaking simultaneously.
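One widely used adaptive algorithm of this kind is the normalized least-mean-squares (NLMS) filter. The note does not say which algorithm any given canceller uses, so the sketch below is only an assumed example; the tap count, step size and the optional doubletalk flags are illustrative.

```python
def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8,
                        doubletalk=None):
    """Minimal NLMS sketch (assumed algorithm, not taken from the note).

    far_end:    samples sent towards the echo source
    mic:        samples coming back, containing the echo
    doubletalk: optional list of bools; adaptation is skipped where True
    Returns (residual samples, learned echo-path weights).
    """
    weights = [0.0] * num_taps   # learned estimate of the echo path
    history = [0.0] * num_taps   # recent far-end samples, newest first
    residual = []
    for i, (x, d) in enumerate(zip(far_end, mic)):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate          # what is left after cancellation
        residual.append(error)
        if doubletalk is None or not doubletalk[i]:
            # Normalize the update by the energy of the stored history.
            energy = sum(h * h for h in history) + eps
            weights = [w + step * error * h / energy
                       for w, h in zip(weights, history)]
    return residual, weights
```

In this sketch the index of the largest weight corresponds to the learned echo delay in samples, and its value to the learned echo amplitude.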
For an echo canceller to operate, it has to keep
some history of the sampled speech signal that was the
original source of the echo. This history uses significant
amounts of memory, usually a scarce resource in the
digital signal processing (DSP) chips used in VoIP
systems. If the echo delay is greater than the length of
this history kept by the echo canceller, then it will be
unable to cancel the echo.
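The relationship between history length and memory can be made concrete with a small calculation. The 8000 Hz sampling rate below is the usual rate for narrowband telephony; the function names and the example history length are assumptions for illustration.

```python
def history_samples(history_ms: int, sample_rate_hz: int = 8000) -> int:
    """Number of speech samples the canceller must store for a given history."""
    return history_ms * sample_rate_hz // 1000

def can_cancel(echo_delay_ms: float, history_ms: float) -> bool:
    """An echo can only be cancelled if its delay fits within the history."""
    return echo_delay_ms <= history_ms

print(history_samples(64))   # 512 samples for a 64 ms history at 8 kHz
print(can_cancel(80, 64))    # False: the echo arrives after the history ends
```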
The time taken for the echo canceller to learn the
characteristics of the echo is called the convergence
time. Sometimes a severe echo can be heard for a few
seconds at the start of a call - this is due to the time
taken for the echo canceller to converge and cancel the
echo and is therefore called convergence echo.
Implementation of Echo Cancellation and
Suppression
Echo cancellation and echo suppression are usually
implemented together, and are able to reduce quite
significant levels of echo.
The Echo Return Loss Enhancement (ERLE)
represents the improvement in echo level introduced by
the echo canceller. For example:
If echo return loss (ERL) is 25 dB, and echo return
loss enhancement (ERLE) is 30 dB, then residual echo
return loss (ERL + ERLE) is 55 dB.
The echo canceller may reduce echo levels by
25 - 35 dB and the addition of echo suppression (NLP)
can further improve this.
Note that ERL can potentially reach 0 dB and some
echo cancellers can only improve the echo return loss by
30 dB, which may still allow the echo to be audible.
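The arithmetic in the examples above is a simple sum of decibel values. As a small sketch (the function name is hypothetical):

```python
def residual_echo_return_loss_db(erl_db: float, erle_db: float) -> float:
    """Residual echo return loss = ERL + ERLE, all values in dB."""
    return erl_db + erle_db

print(residual_echo_return_loss_db(25, 30))  # 55
print(residual_echo_return_loss_db(0, 30))   # 30
```

The second call corresponds to the worst case mentioned in the text, where ERL is 0 dB and the canceller contributes 30 dB.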
Echo cancellers are commonly implemented in
VoIP gateways and typically are configured to cancel
echoes from the “trunk” side of the gateway (i.e. the
non-VoIP side). Echo cancellers may be also used in IP
phones to control acoustic echo from the handset. This
is common in full-duplex speakerphones. However,
acoustic echo cancellation is not always implemented in
IP handsets or softphones.
Echo Measurement and the
VoIP Performance Management
Framework
Echo may be measured using specialized test
tools that analyze audio signals, or may be estimated
by the echo cancellers typically integrated into VoIP
gateways. Specialized test tools are obviously able to
make more accurate measurements. However, these
tools would only be practical for troubleshooting once a
problem has been identified. Due to the nature of echo,
problems can occur on an apparently ad hoc basis and
hence it is desirable to detect echo problems on live
calls as well as collect data for post-analysis.
The emerging protocols that fit within the VoIP
Performance Management Framework are able to
support the detection and reporting of echo problems
affecting live calls. A key element in this process is
RTCP XR [1].
During calls, endpoints with Telchemy's VQmon/
EP exchange RTCP XR VoIP metrics reports. These
reports contain the estimated Residual Echo Return
Loss (RERL) after the effects of echo cancellation,
as well as the network round trip delay. The RERL
value is an estimate made by the echo canceller. This
approach allows the estimated echo level on every call
to be reported.
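To illustrate how a receiver might pull these two values out of a report, here is a minimal parsing sketch. The field offsets follow the VoIP Metrics report block layout of RTCP XR (RFC 3611) as recalled here and should be checked against the specification before use; the function name and returned dictionary keys are assumptions.

```python
import struct

VOIP_METRICS_BLOCK_TYPE = 7   # block type of the VoIP Metrics report block
UNAVAILABLE = 127             # value signalling "RERL not available"
BLOCK_SIZE = 36               # bytes, including the 4-byte block header

def parse_voip_metrics(block: bytes) -> dict:
    """Extract round trip delay and RERL from one VoIP Metrics block."""
    if len(block) < BLOCK_SIZE:
        raise ValueError("block too short for a VoIP Metrics report")
    block_type, _reserved, _length = struct.unpack_from("!BBH", block, 0)
    if block_type != VOIP_METRICS_BLOCK_TYPE:
        raise ValueError("not a VoIP Metrics report block")
    # Bytes 16-19: round trip delay and end system delay, in milliseconds.
    round_trip_ms, _end_system_ms = struct.unpack_from("!HH", block, 16)
    # Bytes 20-23: signal level, noise level, RERL (dB), Gmin.
    _signal, _noise, rerl, _gmin = struct.unpack_from("!bbBB", block, 20)
    return {
        "round_trip_delay_ms": round_trip_ms,
        "rerl_db": None if rerl == UNAVAILABLE else rerl,
    }
```

The sketch handles a single report block only; locating that block inside a complete RTCP XR packet is not shown.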
For example, consider an IP phone connected to a
remote trunking gateway that in turn connects
to the traditional telephone network. Say that some
echo is occurring on the telephone network and the
IP phone user is experiencing talker echo. The echo
canceller in the trunking gateway will attempt to cancel
the echo (or at least reduce the level of the echo)
from the telephone network. RTCP XR VoIP metrics
reports from the trunking gateway to the IP phone
will report the estimated residual echo level. This
allows the IP phone to incorporate the estimated echo
level into its calculations of call quality. The IP phone
may then report the call quality using SIP. Then the
conversational call quality metric reported by the phone
would incorporate the estimated echo level reported by
the trunking gateway.