lip reading systems & proposed joint audio visual speech processing system

sho · 28-02-2010, 10:44 AM

hi plz mail me seminar report of this topic , i am in urgent need of this.

tikku · 01-04-2010, 08:02 PM

hi sho can u send me ppt about lip reading and its proposed joint audio visual system to my email jibs143[at]gmail.com. thank you very much

reports-crawler · 03-04-2010, 08:49 AM

Joint Audio-Visual Speech Processing
Visual speech information present in the speaker's mouth region
has long been viewed as a source for improving the robustness
and naturalness of human-computer interfaces. Where the acoustic channel is corrupted, the performance of automatic speech recognition
(ASR) systems falls below usable levels, and this is where a joint audio-visual system becomes useful.

Introduction:
Human speech is by nature bimodal, both in its production and
perception. Humans integrate audio
and visual stimuli to perceive speech. Research has been ongoing into integrating the visual modality into the speech channel of the human-computer interface (HCI), with the aim of improving its robustness and naturalness. The visual channel can also benefit processes such as speaker identification, verification, localization, speech event detection, speech signal separation, coding, video indexing and retrieval, and text-to-speech.

The Visual Front End:
Visual speech features generally fit into one of the following
three categories:
a) Appearance-based features: assume that all video pixels
within a region-of-interest (ROI) are informative about the
spoken utterance.
b) Shape-based features: assume that most speechreading information
is contained in the contours of the speaker's lips, or more generally,
of the face.
c) A combination of both.

Audio-visual features in our system:
The system used here produces appearance-based
features and operates on full-face video with no artificial
face markings, so both face detection and ROI
extraction are required. Tracking provides the mouth location, size, and orientation,
which are then smoothed over time to improve robustness. Based on the resulting estimates, a 64×64 pixel ROI is obtained
for every video frame. A two-dimensional, separable discrete cosine
transform (DCT) is applied to the ROI, and the 100 highest-energy
DCT coefficients are retained. Then an intraframe
linear discriminant analysis (LDA) projection is applied to reduce dimensionality, resulting in a 30-dimensional feature vector. Finally, a maximum likelihood linear transformation (MLLT) is applied, which improves maximum-likelihood-based statistical data modelling.
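
The DCT step described above can be sketched roughly as follows. This is only an illustrative sketch, not the code of the system in the report: the function names (`dct_matrix`, `dct2`, `select_high_energy`, `extract_features`) are made up for this example, the ROIs are random arrays standing in for real mouth images, and "highest energy" is assumed to mean the coefficient positions with the largest mean squared value over the given frames. The LDA and MLLT stages are left out because they need labelled training data.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an n x n matrix (rows are frequencies)."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # DC row has its own scaling
    return m

def dct2(roi):
    """Separable 2-D DCT: a 1-D DCT along one axis, then along the other."""
    d_rows = dct_matrix(roi.shape[0])
    d_cols = dct_matrix(roi.shape[1])
    return d_rows @ roi @ d_cols.T

def select_high_energy(coeffs, keep=100):
    """Flat indices of the `keep` coefficient positions with the highest
    mean energy over all frames (assumed selection rule, see lead-in)."""
    energy = np.mean(coeffs ** 2, axis=0).ravel()
    return np.argsort(energy)[::-1][:keep]

def extract_features(rois, keep=100):
    """rois: array of shape (frames, 64, 64). Returns (frames, keep) features
    and the selected coefficient positions."""
    coeffs = np.stack([dct2(r) for r in rois])
    idx = select_high_energy(coeffs, keep)
    return coeffs.reshape(len(rois), -1)[:, idx], idx

# Stand-in data: 5 random "frames" in place of tracked 64x64 mouth ROIs.
rng = np.random.default_rng(0)
rois = rng.random((5, 64, 64))
features, idx = extract_features(rois)
print(features.shape)  # one 100-dimensional vector per frame
```

In a full pipeline, the 100-dimensional vectors would then go through the LDA projection down to 30 dimensions and the MLLT rotation mentioned in the post.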

report:
http://citeseerx.ist.psu.edu/viewdoc/dow...1&type=pdf

adiiti · 29-10-2011, 09:29 AM

hi plz mail me the full report and the ppt.
