Affective computing
Areas of affective computing
Detecting and recognizing emotional information
Detecting emotional information begins with passive sensors which capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic skin response.[6]
Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different modalities, such as speech recognition, natural language processing, or facial expression detection, and produce either labels (e.g. 'confused') or coordinates in a valence-arousal space. Literature reviews[7][8] provide comprehensive coverage of the state of the art.
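As a minimal sketch of these two output types, the Python snippet below maps the same feature vector either to a discrete emotion label or to valence-arousal coordinates. It assumes scikit-learn is available; the feature vectors, labels and coordinates are synthetic stand-in data, not from any real system.

import numpy as np
from sklearn.linear_model import LogisticRegression, Ridge

rng = np.random.default_rng(0)
X = rng.random((200, 12))                                   # hypothetical fused features (speech, face, text)
labels = rng.choice(["confused", "neutral", "engaged"], size=200)
valence_arousal = rng.uniform(-1, 1, size=(200, 2))         # columns: valence, arousal

# Option 1: produce categorical emotion labels
label_model = LogisticRegression(max_iter=1000).fit(X, labels)

# Option 2: produce coordinates in a valence-arousal space
va_model = Ridge().fit(X, valence_arousal)

sample = X[:1]
print("label:", label_model.predict(sample)[0])
print("valence, arousal:", va_model.predict(sample)[0])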
Emotion in machines
Another area within affective computing is the design of computational devices that either possess innate emotional capabilities or can convincingly simulate emotions. A more practical approach, given current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine.[9] While human emotions are often associated with surges in hormones and other neuropeptides, emotions in machines might be associated with abstract states tied to progress (or lack of progress) in autonomous learning systems. In this view, affective states correspond to time-derivatives (perturbations) of the learning curve of an arbitrary learning system.
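A toy illustration of that view, purely as an assumption-laden sketch: treat the change in a training loss curve as a crude machine "affect" signal. The loss values and label thresholds below are invented for illustration.

import numpy as np

loss_curve = np.array([1.00, 0.80, 0.65, 0.64, 0.70, 0.55, 0.40])  # hypothetical training loss over time
progress = -np.diff(loss_curve)                                     # positive when learning is progressing

def affect_label(delta, eps=0.02):
    # Map progress (or lack of it) to a toy affective state
    if delta > eps:
        return "satisfaction"   # learning is progressing
    if delta < -eps:
        return "frustration"    # performance regressed
    return "boredom"            # little change either way

print([affect_label(d) for d in progress])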
Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'"[10]
Technologies of affective computing
Emotional speech
One can take advantage of the fact that changes in the autonomic nervous system indirectly alter speech, and use this information to produce systems capable of recognizing affect based on extracted features of speech. For example, speech produced in a state of fear, anger or joy becomes faster and louder, and is more precisely enunciated, with a higher and wider pitch range, while emotions such as tiredness, boredom or sadness lead to slower, lower-pitched and more slurred speech.[11] Emotional speech processing recognizes the user's emotional state by analyzing speech patterns. Vocal parameters and prosody features such as pitch variables and speech rate are analyzed through pattern recognition.[12][13]
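The following sketch shows how a few of the prosodic cues mentioned above (pitch level, pitch range, loudness, a rough speech-activity ratio) could be extracted. It assumes the librosa library and a hypothetical recording "utterance.wav"; real systems use much richer feature sets.

import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)      # hypothetical recording

# Fundamental frequency (pitch) contour via the YIN estimator
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)
f0 = f0[np.isfinite(f0)]

# Loudness proxy: root-mean-square energy per frame
rms = librosa.feature.rms(y=y)[0]

# Crude speaking-activity proxy: fraction of frames above an energy threshold
speech_ratio = float(np.mean(rms > 0.5 * rms.mean()))

features = {
    "pitch_mean": float(np.mean(f0)),
    "pitch_range": float(np.max(f0) - np.min(f0)),
    "loudness_mean": float(np.mean(rms)),
    "speech_ratio": speech_ratio,
}
print(features)   # such a feature vector would then be fed to a pattern classifier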
Speech-based emotion recognition is an effective method of identifying affective state, with an average success rate of 63% reported in research.[14] This result appears fairly satisfying when compared with humans' success rate at identifying emotions, but somewhat lower than that of other forms of emotion recognition (such as those which employ physiological signals or facial processing).[14] Furthermore, many speech characteristics are independent of semantics or culture, which makes this technique a very promising one to use.[15]
Algorithms
The process of speech affect detection requires the creation of a reliable database, broad enough to cover every need of the application, as well as the selection of a successful classifier that allows for quick and accurate emotion identification.
Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbour (k-NN), Gaussian mixture models (GMM), support vector machines (SVM), decision tree algorithms and hidden Markov models (HMMs).[16] Various studies have shown that choosing an appropriate classifier can significantly enhance the overall performance of the system.[14] The list below gives a brief description of each algorithm; a short illustrative sketch of the classification step follows the list:
• LDC – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of a feature vector.
• k-NN – Classification happens by locating the object in the feature space, and comparing it with the k nearest neighbours (training examples). The majority vote decides on the classification.
• GMM – a probabilistic model that represents the existence of sub-populations within the overall population. Each sub-population is described by one component of the mixture distribution, which allows observations to be assigned to sub-populations.[17]
• SVM – a type of (usually binary) linear classifier which decides into which of two (or more) possible classes each input falls.
• Decision tree algorithms – work based on following a decision tree in which leaves represent the classification outcome, and branches represent the conjunction of subsequent features that lead to the classification.
• HMMs – a statistical Markov model in which the states and state transitions are not directly observable; instead, only a series of outputs that depend on the states is visible. In the case of affect recognition, the outputs are the sequence of speech feature vectors, from which the sequence of states the model passed through can be inferred. The states can consist of various intermediate steps in the expression of an emotion, and each of them has a probability distribution over the possible output vectors. The inferred state sequence is used to predict the affective state being classified, and this is one of the most commonly used techniques within the area of speech affect detection.
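The sketch below illustrates the classifier step for two of the algorithms listed above (k-NN and SVM), assuming scikit-learn. The feature matrix and emotion labels are synthetic stand-ins; a real system would train on prosodic and spectral feature vectors from a labelled speech database.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))                                      # hypothetical speech feature vectors
y = rng.choice(["anger", "joy", "sadness", "neutral"], size=300)    # hypothetical emotion labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for name, clf in [("k-NN", KNeighborsClassifier(n_neighbors=5)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_train, y_train)
    print(name, "accuracy:", clf.score(X_test, y_test))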
Databases
The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions from speech, as it means choosing an appropriate database to train the classifier. Most of the currently available data was obtained from actors and is thus a representation of archetypal emotions. These so-called acted databases are usually based on the Basic Emotions theory (by Paul Ekman), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former.[18] Nevertheless, acted databases offer high audio quality and balanced classes (although often with too few samples), which contribute to high success rates in recognizing emotions.