11-05-2012, 11:30 AM
Breaking Audio CAPTCHAs
Introduction
CAPTCHAs [1] are automated tests designed to tell computers and humans apart by
presenting users with a problem that humans can solve but current computer programs
cannot. Because CAPTCHAs can distinguish between humans and computers with high
probability, they are used for many different security applications: they prevent bots from
voting continuously in online polls, automatically registering for millions of spam email
accounts, automatically purchasing tickets to buy out an event, etc. Once a CAPTCHA is
broken (i.e., computer programs can successfully pass the test), bots can impersonate
humans and gain access to services that they should not. Therefore, it is important for
CAPTCHAs to be secure.
To pass the typical visual CAPTCHA, a user must correctly type the characters displayed in
an image of distorted text. Many visual CAPTCHAs have been broken with machine
learning techniques [2]-[3], though some remain secure against such attacks. Because
visually impaired users who surf the Web using screen-reading programs cannot see this type
of CAPTCHA, audio CAPTCHAs were created. Typical audio CAPTCHAs consist of one
or several speakers saying letters or digits at randomly spaced intervals. A user must
correctly identify the digits or characters spoken in the audio file to pass the CAPTCHA. To
make this test difficult for current computer systems, specifically automatic speech
recognition (ASR) programs, background noise is injected into the audio files.
Since no official evaluation of existing audio CAPTCHAs has been reported, we tested the
security of audio CAPTCHAs used by many popular Web sites by running machine learning
experiments designed to break them. In the next section, we provide an overview of the
literature related to our project. Section 3 describes our methods for creating training data,
and section 4 describes how we create classifiers that can recognize letters, digits, and noise.
In section 5, we discuss how we evaluated our methods on widely used audio CAPTCHAs
and we give our results. In particular, we show that the audio CAPTCHAs used by sites
such as Google and Digg are susceptible to machine learning attacks. Section 6 proposes
the design of a new, more secure audio CAPTCHA based on our findings.
Literature review
To break the audio CAPTCHAs, we derive features from the CAPTCHA audio and use
several machine learning techniques to perform ASR on segments of the CAPTCHA. There
are many popular techniques for extracting features from speech. The three techniques we use
are mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP), and
relative spectral transform-PLP (RASTA-PLP). MFCC is one of the most widely used
speech feature representations. Like a fast Fourier transform (FFT), MFCC transforms an
audio file into frequency bands, but (unlike FFT) MFCC uses mel-frequency bands, which
are better for approximating the range of frequencies humans hear. PLP was designed to
extract speaker-independent features from speech [4]. Therefore, by using PLP and a variant
such as RASTA-PLP, we were able to train our classifiers to recognize letters and digits
independently of who spoke them. Since many different people recorded the digits used in
one of the types of audio CAPTCHAs we tested, PLP and RASTA-PLP were needed to
extract the features that were most useful for solving them.
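As a concrete illustration of the mel-frequency idea behind MFCC, the standard mapping between Hz and mels (2595 · log10(1 + f/700)) can be sketched in a few lines of Python. The formula is the conventional one; the helper names and the band-edge routine are our own illustration, not code from the system described above:

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to the mel scale (O'Shaughnessy's formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping: a mel value back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Band edges equally spaced on the mel scale, as used when
    constructing the triangular filterbank for MFCC features."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]
```

Because the mel scale is roughly logarithmic above 1 kHz, equally spaced mel bands place more filters at low frequencies, matching how humans resolve pitch.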
In [4]-[5], the authors conducted experiments on recognizing isolated digits in the presence
of noise using both PLP and RASTA-PLP. However, the noise used consisted of telephone
or microphone static caused by recording in different locations. The audio CAPTCHAs we
use contain this type of noise, as well as added vocal noise and/or music, which is supposed
to make the automated recognition process much harder.
The authors of [3] emphasize how many visual CAPTCHAs can be broken by successfully
splitting the task into two smaller tasks: segmentation and recognition. We follow a similar
approach in that we first automatically split the audio into segments, and then we classify
these segments as noise or words.
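A minimal sketch of such a segmentation stage, assuming a simple short-time-energy threshold (the real system would likely add smoothing and overlapping windows; the function and parameter names here are illustrative, not taken from the paper):

```python
def segment_by_energy(samples, frame_len=160, threshold=0.01, min_frames=3):
    """Split a 1-D list of samples into (start, end) index ranges whose
    short-time energy exceeds `threshold`; everything else is treated as
    noise or silence. Runs of fewer than `min_frames` loud frames are
    discarded as spurious."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy > threshold:
            if start is None:
                start = i  # a candidate segment begins here
        else:
            if start is not None and i - start >= min_frames:
                segments.append((start * frame_len, i * frame_len))
            start = None
    if start is not None and n_frames - start >= min_frames:
        segments.append((start * frame_len, n_frames * frame_len))
    return segments
```

Each returned range would then be handed to the noise-vs-word classifier described in the next section.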
In early March 2008, concurrent to our work, the blog of Wintercore Labs [6] claimed to
have successfully broken the Google audio CAPTCHA. After reading their Web article and
viewing the video of how they solve the CAPTCHAs, we are unconvinced that the process
is entirely automatic, and it is unclear what their exact pass rate is. Because we are unable to
find any formal technical analysis of this program, we can neither be sure of its accuracy nor
the extent of its automation.
Creation of training data
Since automated programs can attempt a CAPTCHA repeatedly, a CAPTCHA is effectively
broken once a program can pass it a non-trivial fraction of the time; e.g., a 5% pass rate
is enough.
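The arithmetic behind this criterion is simple: if each attempt passes independently with probability p, then the chance that at least one of n attempts succeeds is 1 - (1 - p)^n. A short sketch (the helper names are ours):

```python
def success_prob(p, n):
    """Probability that at least one of n independent attempts passes,
    given a per-attempt pass rate p."""
    return 1.0 - (1.0 - p) ** n

def attempts_for(p, target):
    """Smallest number of attempts giving success probability >= target."""
    n = 0
    prob = 0.0
    while prob < target:
        n += 1
        prob = success_prob(p, n)
    return n
```

With a 5% per-attempt pass rate, roughly 14 attempts already give a bot a better-than-even chance of getting through, which is why even a modest pass rate breaks a CAPTCHA in practice.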
Our approach to breaking the audio CAPTCHAs began by first splitting the audio files into
segments of noise or words: for our experiments, the words were spoken letters or digits. We
used manual transcriptions of the audio CAPTCHAs to get information regarding the
location of each spoken word within the audio file. We were able to label our segments
accurately by using this information.
We gathered 1,000 audio CAPTCHAs from each of the following Web sites: google.com,
digg.com, and an older version of the audio CAPTCHA on recaptcha.net. Each of the
CAPTCHAs was annotated with the information regarding letter/digit locations provided by
the manual transcriptions. For each type of CAPTCHA, we randomly selected 900 samples
for training and used the remaining 100 for testing.
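The random 900/100 split can be reproduced with a few lines of Python; this is a sketch, and the seed and function name are our own choices rather than details from the experiments:

```python
import random

def train_test_split(samples, n_train=900, seed=0):
    """Randomly select n_train samples for training and return the rest
    as the test set (mirroring the 900/100 split described above)."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = samples[:]
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]
```

Fixing the seed keeps the split reproducible across runs, so classifier results on the held-out 100 samples are comparable between experiments.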