28-02-2010, 10:44 AM
Hi, please mail me the seminar report on this topic. I need it urgently.
01-04-2010, 08:02 PM
Hi sho, can you send me the ppt about lip reading and its proposed joints to my email, jibs143[at]gmail.com? Thank you very much.
03-04-2010, 08:49 AM
Joint Audio-Visual Speech Processing

Visual speech information present in the speaker's mouth region has long been viewed as a source for improving the robustness and naturalness of human-computer interfaces. Where the acoustic channel is corrupted, the performance of automatic speech recognition (ASR) systems falls below usable levels, and this is where the visual channel becomes useful.

Introduction: Human speech is bimodal by nature, both in its production and in its perception: humans integrate audio and visual stimuli to perceive speech. Research has been ongoing into integrating the visual modality into the speech channel of the human-computer interface (HCI), with the aim of improving its robustness and naturalness. The visual channel can also benefit processes such as speaker identification, verification, localization, speech event detection, speech signal separation, coding, video indexing and retrieval, and text-to-speech.

The Visual Front End: Visual speech features generally fall into one of three categories:
a) Appearance-based features assume that all video pixels within a region of interest (ROI) are informative about the spoken utterance.
b) Shape-based features assume that most speechreading information is contained in the contours of the speaker's lips or, more generally, of the face.
c) A combination of both.

Audio-visual features in our system: The system used here produces appearance-based features and operates on full-face video with no artificial face markings, so both face detection and ROI extraction are required. Tracking provides the mouth location, size, and orientation, which are then smoothed over time to improve robustness. Based on the resulting estimates, a 64×64 pixel ROI is obtained for every video frame. A two-dimensional, separable discrete cosine transform (DCT) is applied to the ROI, and the 100 highest-energy DCT coefficients are retained. An intraframe linear discriminant analysis (LDA) projection is then applied to reduce dimensionality, resulting in a 30-dimensional feature vector. Finally, a maximum likelihood linear transformation (MLLT) is applied, which improves maximum-likelihood-based statistical data modelling.

report: http://citeseerx.ist.psu.edu/viewdoc/dow...1&type=pdf
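For anyone who wants to try the DCT step described in the post, here is a rough sketch in plain Python. It is not the code from the report. The function names are made up for illustration, and picking the 100 highest-energy coefficients separately for each frame is an assumption, since the post does not say how the selection is made. The LDA and MLLT stages are left out because they need matrices trained on labelled data.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(roi):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in roi]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is indexed [horizontal freq][vertical freq]; transpose back.
    return [list(r) for r in zip(*cols)]

def top_energy_coefficients(coeffs, keep=100):
    """Return (row, col, value) for the `keep` largest squared coefficients.

    Assumption: selection is done per frame. The post does not specify this.
    """
    flat = [(v * v, r, c, v)
            for r, row in enumerate(coeffs)
            for c, v in enumerate(row)]
    flat.sort(key=lambda t: t[0], reverse=True)
    return [(r, c, v) for _, r, c, v in flat[:keep]]

# Synthetic 64x64 grayscale "mouth ROI" standing in for a real video frame.
roi = [[float((3 * r + 5 * c) % 256) for c in range(64)] for r in range(64)]
features = top_energy_coefficients(dct_2d(roi), keep=100)
print(len(features))
```

In a full system the 100 values would then be projected down to 30 dimensions by LDA, as the post describes.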
29-10-2011, 09:29 AM
Hi, please mail me the full report and the ppt.