Full Version: emotional analysis of text to speech
Abstract
Natural speech has a particular rhythm or prosody. This prosody depends on factors such as dialect, style of speech and the emotional state of the speaker. Prosody is determined by variations in intensity, pitch and duration. While a speaker acquires a specific speaking style in a particular language, the rhythm of its words, phrases and sentences is stored in the brain. At the time of speaking, these patterns are retrieved and contribute to modelling the prosodic structure of natural speech. When new words or phrases are encountered, the prosodic patterns of similar words or phrases are imposed on them.
Synthetic speech fails to incorporate this prosody and hence tends to sound robotic, and even unintelligible at times. Since prosody is language-specific, extensive studies have to be performed for each language. Models have to be developed to capture the variations of these acoustic parameters that result in naturalness. In this thesis, clustering of duration patterns is attempted. To do this, recorded speech is segmented manually with the help of speech analysis software, and duration vectors are formed from the segments. Clustering is then applied to these duration vectors to identify the patterns in the recorded speech.
Introduction

Natural speech has a particular rhythm or prosody. This prosody depends on different factors like dialect, style of speech and emotional state of the speaker. Synthetic speech fails to incorporate this prosody and hence tends to sound robotic, and even unintelligible at times.
Prosody is mainly determined by the variations in intensity, pitch and duration of speech.
• Prosody used to convey lexical meaning: Stress, accentual and tone languages.
• Prosody used to convey non-lexical information: Intonation type (Question vs. declarative sentences).
• Prosody used to convey discourse functions: Focus, prominence, discourse segments, etc.
• Prosody used to convey emotion.
• Prosody tied to the physical system: pitch declination.
To enhance the naturalness of synthetic speech, these variations also need to be incorporated. Since prosody is language-specific, extensive studies have to be performed for each language. Models have to be developed to capture the variations of these acoustic parameters that result in naturalness.
Human Speech
Prosodic features are suprasegmental. They are not confined to any one segment, but occur at some higher level of an utterance. These prosodic units are the actual phonetic "spurts", or chunks of speech. They need not correspond to grammatical units such as phrases and clauses, though they may, and these facts suggest insights into how the brain processes speech.
While a speaker acquires a particular speaking style in a language, the rhythm of words, phrases or even whole sentences is stored in the brain. While speaking, these stored patterns are retrieved and imposed on the spoken words. Even when new words or phrases are encountered, the rhythm of similar words or phrases is imposed on them.


Phonetic nature of Malayalam speech

Malayalam, like most other Indian languages, is a phonetic language whose written form corresponds directly to the spoken form. Each character corresponds to a syllable, which has an invariant pronunciation irrespective of the context in which it occurs (with only one or two exceptions). Because of this one-to-one correspondence between letters and phonemes, framing rules for extracting phonemes from words is comparatively uncomplicated. The phonemes (called ‘varnams’ in Sanskrit) are divided into two types: vowel phonemes (swara varnam) and consonant phonemes (vyanjan varnam). Together they broadly constitute the Varnamala (alphabet set), which is phonetically structured: the vowels and consonants are grouped separately and arranged systematically. The set of 16 vowels forms the first row of the Varnamala, followed by the stop consonants. The phonetic nature of the language and the systematic categorization of the alphabet set can be used effectively for analysis and modelling.
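As a rough illustration of how the vowel/consonant grouping can be exploited in software, the Python sketch below labels each character of a Malayalam word by its Unicode code point. It is an assumption-laden example, not the method used in the report: the function names are hypothetical, the ranges are those of the Unicode Malayalam block, and real phoneme extraction would need further rules (for example for conjuncts and chillu letters).

```python
def classify_malayalam_char(ch):
    """Label one character using Unicode Malayalam block ranges (assumed mapping)."""
    cp = ord(ch)
    if 0x0D05 <= cp <= 0x0D14:
        return "vowel"        # independent vowel letters (swara)
    if 0x0D15 <= cp <= 0x0D39:
        return "consonant"    # consonant letters (vyanjan)
    if 0x0D3E <= cp <= 0x0D4C:
        return "vowel_sign"   # dependent vowel signs attached to a consonant
    if cp == 0x0D4D:
        return "virama"       # suppresses the inherent vowel
    return "other"            # anusvara, visarga, digits, non-Malayalam, ...


def classify_word(word):
    """Return (character, label) pairs for every character in the word."""
    return [(ch, classify_malayalam_char(ch)) for ch in word]


# The word "Malayalam" written in Malayalam script, given as escapes.
word = "\u0d2e\u0d32\u0d2f\u0d3e\u0d33\u0d02"
labels = [label for _, label in classify_word(word)]
```

Because the script is laid out systematically, a handful of range checks is enough to separate the main character classes; that is the property the paragraph above relies on.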

Clustering
Cluster analysis, also called segmentation analysis or taxonomy analysis, creates groups of objects (clusters) such that the profiles of objects in the same cluster are very similar and the profiles of objects in different clusters are quite distinct. It is an exploratory data analysis tool that sorts objects into groups so that the degree of association between two objects is maximal if they belong to the same group and minimal otherwise. Cluster analysis discovers structures in data without explaining why they exist, so it is mostly used in the exploratory phase of research, when there are no a priori hypotheses. In a sense, cluster analysis finds the "most significant solution possible."
One of the important considerations in clustering is the choice of a distance measure that accurately expresses the degree of similarity or dissimilarity between data objects. Similarities are a set of rules that serve as criteria for grouping or separating items. These distances (similarities) can be based on a single dimension or on multiple dimensions, with each dimension representing a rule or condition for grouping objects. The most straightforward way of computing distances between objects in a multi-dimensional space is to compute Euclidean distances.
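The Euclidean distance between duration vectors can be sketched in Python as follows. The duration values (in milliseconds) and the function names are hypothetical and only serve to illustrate the computation; they are not taken from the recorded data.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(vectors):
    """Symmetric matrix of pairwise Euclidean distances."""
    n = len(vectors)
    return [[euclidean(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical duration vectors: one row per utterance,
# one entry per manually segmented unit (milliseconds).
durations = [
    [120.0, 80.0, 150.0],
    [123.0, 84.0, 150.0],
    [200.0, 60.0, 90.0],
]
D = distance_matrix(durations)
```

Here the first two vectors lie close together (distance 5 ms) while the third is far from both, which is the kind of contrast a clustering step would pick up.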

Joining (Tree) clustering

The purpose of this algorithm is to join together objects into successively larger clusters, using some measure of similarity or distance. A typical result of this type of clustering is the hierarchical tree.

Hierarchical Tree
Consider a horizontal hierarchical tree plot (see Figure 1). On the left of the plot, we begin with each object in a class by itself. We then gradually relax the threshold for declaring two or more objects to be members of the same cluster.

Figure 1. Tree diagram
As a result we link more and more objects together and aggregate (amalgamate) larger and larger clusters of increasingly dissimilar elements. Finally, in the last step, all objects are joined together. In these plots, the horizontal axis denotes the linkage distance. Thus, for each node in the graph (where a new cluster is formed) we can read off the criterion distance at which the respective elements were linked together into a new single cluster. When the data contain a clear "structure" in terms of clusters of objects that are similar to each other, then this structure will often be reflected in the hierarchical tree as distinct branches. As the result of a successful analysis with the joining method, one is able to detect clusters (branches) and interpret those branches.
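A minimal Python sketch of the joining procedure is given below, assuming Euclidean distance and the nearest-neighbour (single linkage) rule as the default. The function names and the sample points are hypothetical. Each recorded merge corresponds to one node of the hierarchical tree, and the stored distance is the linkage distance read off the horizontal axis.

```python
import math
from itertools import combinations


def single_link(a, b, points):
    """Distance between clusters a and b: their two closest members."""
    return min(math.dist(points[i], points[j]) for i in a for j in b)


def join_clusters(points, linkage=single_link):
    """Agglomerative joining: returns a list of (cluster_a, cluster_b, distance)."""
    clusters = [(i,) for i in range(len(points))]  # every object starts alone
    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest linkage distance.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: linkage(pair[0], pair[1], points))
        d = linkage(a, b, points)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


# Hypothetical one-dimensional data with two tight groups and one outlier.
points = [[0.0], [1.0], [5.0], [6.5], [20.0]]
merges = join_clusters(points)
heights = [d for _, _, d in merges]
```

For this data the linkage distances grow with each step (1.0, 1.5, 4.0, 13.5): the two tight pairs are joined first, then merged with each other, and the outlier is attached last. This exhaustive search is only meant for small illustrative inputs.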
Amalgamation or Linkage Rules
At the first step, when each object represents its own cluster, the distances between those objects are defined by the chosen distance measure. A linkage or amalgamation rule is needed to determine when two clusters are sufficiently similar to be linked together. There are various possibilities:
Single linkage (nearest neighbour). In this method, the distance between two clusters is determined by the distance between the two closest objects (nearest neighbours) in the different clusters. This rule will, in a sense, string objects together to form clusters, and the resulting clusters tend to represent long "chains."
Complete linkage (furthest neighbour). In this method, the distances between clusters are determined by the greatest distance between any two objects in the different clusters (i.e., by the "furthest neighbours"). This method usually performs quite well in cases when the objects actually form naturally distinct "clumps."
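The difference between the two rules can be seen in a short Python sketch. The two clusters and the function names below are hypothetical, and Euclidean distance is assumed as the underlying distance measure.

```python
import math


def single_linkage(cluster_a, cluster_b):
    """Nearest neighbour: smallest distance between members of the two clusters."""
    return min(math.dist(p, q) for p in cluster_a for q in cluster_b)


def complete_linkage(cluster_a, cluster_b):
    """Furthest neighbour: greatest distance between members of the two clusters."""
    return max(math.dist(p, q) for p in cluster_a for q in cluster_b)


# Two hypothetical clusters of 2-D points lying on a line.
cluster_a = [[0.0, 0.0], [1.0, 0.0]]
cluster_b = [[3.0, 0.0], [7.0, 0.0]]

d_single = single_linkage(cluster_a, cluster_b)      # closest pair: (1,0) and (3,0)
d_complete = complete_linkage(cluster_a, cluster_b)  # furthest pair: (0,0) and (7,0)
```

The same pair of clusters is 2.0 apart under single linkage but 7.0 apart under complete linkage, which is why single linkage readily chains elongated groups together while complete linkage favours compact clumps.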

Get the report here:
http://www.mediafire?kc9qajye15eet3u


For information about the topic "emotional annotation of text" (full report, PPT and related topics), refer to the links below:

https://seminarproject.net/Thread-emotio...on-of-text

https://seminarproject.net/Thread-emotio...-to-speech