Spoken Language Input
The ability to converse freely with a machine represents the ultimate challenge to our understanding of the production and perception processes involved in human speech communication.
Speech recognition is the process of converting an acoustic signal, captured by a microphone or a telephone, into a set of words. The recognized words can be the final result, for applications such as command and control, data entry, and document preparation. They can also serve as the input to further linguistic processing in order to achieve speech understanding.
Speech recognition is a difficult problem, largely because of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear.
Second, acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality.
Written Language Input
The written form of language is contained in printed documents, such as newspapers, magazines and books, and in handwritten matter, such as found in notebooks and personal letters.
Fundamental characteristics of writing are:
1. It consists of artificial graphical marks on a surface;
2. Its purpose is to communicate something;
3. This purpose is achieved by virtue of the marks' conventional relation to language.
OCR: Print
OCR: Handwriting
Handwriting as Computer Interface
Handwriting Analysis
Spoken Output Technologies
Though modeling the human speech production process remains one of the ultimate goals of synthesis research, advances in computer science have widened the research field to include text-to-speech (TTS) processing.
Mathematical Methods
1) High-level Linguistic Methods
The mathematical methods of syntax, morphology, and phonology are suited to describing sets of strings, especially hierarchically structured strings. Most of these methods came from formal language theory; most notable is the formal theory of languages and grammars emerging from the Chomsky hierarchy.
A variety of grammar models have been designed in linguistics and language technology, and they do not directly correspond to those of formal language theory. The grammars of formal language theory are rewrite systems with atomic nonterminal symbols that stand for lexical and syntactic categories.
2) Statistical and Low-level Processing Methods
Many researchers have turned to statistical, data-driven methods for designing language technology applications. The predominant approach is based on an information-theoretic view of language processing as transmission over a noisy channel: one component models the channel, and the other is the language model, which gives the so-called a priori distribution, the probability of a message being sent in its context.
A special type of stochastic finite-state automaton, the hidden Markov model (HMM), has been utilized for the recognition of spoken words, syllables, or phonemes.
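A minimal sketch of how an HMM scores an observation sequence (the forward algorithm, which sums over all state paths) follows; the two-state model, symbols, and probabilities are invented for illustration.

```python
# Forward algorithm for a discrete HMM, as used to score how likely an
# observation sequence is under a word model. All parameters are toys.

def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs | model) by summing over all state paths."""
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: sum(alpha[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        }
    return sum(alpha.values())

# Toy two-state "word" model emitting symbols 'a' and 'b'.
states = ["s1", "s2"]
start_p = {"s1": 0.8, "s2": 0.2}
trans_p = {"s1": {"s1": 0.6, "s2": 0.4}, "s2": {"s1": 0.3, "s2": 0.7}}
emit_p = {"s1": {"a": 0.7, "b": 0.3}, "s2": {"a": 0.1, "b": 0.9}}

p = forward(["a", "b", "b"], states, start_p, trans_p, emit_p)
```

In a recognizer, one such model would be trained per word (or phoneme), and the model assigning the input the highest probability wins.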
Statistical methods are employed today to substitute for or support discrete symbolic methods in almost every area of language processing. Examples of promising approaches are:
1) Statistical part-of-speech tagging
2) Probabilistic parsing
3) Ambiguity resolution
4) Lexical knowledge acquisition
5) Statistical machine translation
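As one concrete illustration of the first item, the simplest statistical part-of-speech tagger just assigns each word the tag it was most often seen with in training data; the toy corpus and fallback tag below are invented.

```python
from collections import Counter, defaultdict

# Most-frequent-tag baseline for part-of-speech tagging.
# The tiny tagged "corpus" is invented for illustration.

train = [
    ("the", "DET"), ("dog", "NOUN"), ("runs", "VERB"),
    ("the", "DET"), ("run", "NOUN"), ("ended", "VERB"),
    ("dogs", "NOUN"), ("run", "VERB"), ("run", "VERB"),
]

counts = defaultdict(Counter)
for word, t in train:
    counts[word][t] += 1

def tag(word):
    """Return the most frequent training tag for word; NOUN as fallback."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return "NOUN"
```

Note how the method resolves the NOUN/VERB ambiguity of "run" purely from counts, with no hand-written rules.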
AMBIGUITY RESOLUTION
SEMANTIC INTERPRETATION
This module translates the parse tree or parse tree fragments into a semantic structure, logical form, or event frame. All of these are basically explicit representations of the predicate-argument and modification relations that are implicit in the sentence.
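As a sketch, such a translation can be as simple as mapping parse fragments into a predicate-argument dictionary; the sentence, roles, and slot names below are invented for illustration.

```python
# Turning a toy parse into an event frame, i.e. an explicit
# predicate-argument representation. All names are illustrative.

def to_frame(parse):
    """Map a (verb, subject, object) parse into a frame dict."""
    return {
        "PREDICATE": parse["verb"],
        "AGENT": parse.get("subject"),
        "THEME": parse.get("object"),
    }

# "The system shows flights" -> show(system, flights)
parse = {"verb": "show", "subject": "system", "object": "flights"}
frame = to_frame(parse)
```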
DIALOGUE AND CONVERSATIONAL AGENTS
Conversation, or dialogue, is the most fundamental and specially privileged arena of language.
Conversational agents are also known as spoken dialogue systems or spoken language systems.
These are programs which communicate with users in spoken natural language in order to make travel arrangements, answer questions about weather or sports, route telephone calls, or act as a general telephone assistant.
Another promising domain is automatic call routing. A call routing system directs incoming calls in a telephone call center, transferring each call to the appropriate human operator.
HUMAN CONVERSATION
Turns and Turn-Taking:
Dialogue is characterized by turn-taking.
How do speakers know when it is the proper time to contribute their turn?
It turns out that conversation, and language itself, are structured in such a way as to deal efficiently with this resource-allocation problem.
Turn-taking Rule
a. If during this turn the current speaker has selected A as the next speaker, then A must speak next.
b. If the current speaker does not select the next speaker, any other speaker may take the next turn.
c. If no one else takes the next turn, the current speaker may take the next turn.
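The rule above can be sketched as a small selection function; the speaker names and the volunteer list are illustrative.

```python
# The turn-taking rule (a)-(c) as a selection function.

def next_speaker(current, selected, volunteers):
    """Return who speaks next under rules (a)-(c)."""
    if selected is not None:   # (a) the current speaker selected someone
        return selected
    if volunteers:             # (b) any other speaker may self-select
        return volunteers[0]
    return current             # (c) the current speaker may continue
```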
BASIC DIALOGUE SYSTEMS
ASR component
The ASR (automatic speech recognition) component takes audio input, generally from the telephone, and returns a transcribed string of words. ASR systems used for dictation or transcription generally use a single broadly trained N-gram language model.
ASR systems in conversational agents generally use language models that are specific to a dialogue state.
For example, if the system has just asked the user "What city are you departing from?", the ASR language model can be constrained to consist only of city names, or perhaps of sentences of the form 'I want to (leave/depart) from [CITYNAME]'.
These dialogue-state-specific language models can consist of hand-written finite-state or context-free grammars, or of N-gram grammars trained on subcorpora extracted from the answers to particular questions in some training set.
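As a sketch, such a state-specific "language model" can be as simple as a hand-written pattern; the city list and regular expression below are illustrative.

```python
import re

# Hand-written grammar for one dialogue state: after asking
# "What city are you departing from?", accept either a bare city name
# or "I want to (leave/depart) from [CITYNAME]". All data is a toy.

CITIES = {"boston", "denver", "chicago"}
PATTERN = re.compile(r"^i want to (leave|depart) from (\w+)$")

def parse_departure(utterance):
    """Return the departure city if the utterance fits this state's
    grammar, otherwise None."""
    text = utterance.lower().strip()
    if text in CITIES:
        return text
    m = PATTERN.match(text)
    if m and m.group(2) in CITIES:
        return m.group(2)
    return None
```

Constraining recognition this tightly is exactly why state-specific models outperform a single broad model inside a dialogue.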
NLU (natural language understanding)
The NLU component of dialogue systems must produce a semantic representation which is appropriate for the dialogue task. Thus a sentence like Show me morning flights from Boston to San Francisco on Tuesday might
correspond to the following filled-out frame.
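The frame shown on the original slide did not survive extraction; the following is an illustrative reconstruction, with invented slot names.

```python
# Illustrative filled-out frame for
# "Show me morning flights from Boston to San Francisco on Tuesday".
# Slot names are invented for this sketch.

frame = {
    "INTENT": "SHOW-FLIGHTS",
    "ORIGIN": {"CITY": "Boston"},
    "DESTINATION": {"CITY": "San Francisco"},
    "DATE": "Tuesday",
    "TIME": "morning",
}
```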
In practice, most dialogue systems rely on simpler domain-specific semantic analyzers. In a semantic grammar, the actual node names in the parse tree correspond to the semantic entities being expressed, as in the following grammar fragments:
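The grammar fragments referred to above did not survive extraction; the rules below are an illustrative reconstruction in which node names are semantic entities (SHOW, ORIGIN) rather than syntactic categories (NP, VP).

```python
# Illustrative semantic-grammar fragment: each nonterminal is a
# semantic entity, and each rule body is a token sequence in which
# uppercase tokens are nonterminals. All rules are invented.

GRAMMAR = {
    "SHOW":        [["show", "me"], ["i", "want", "to", "see"]],
    "ORIGIN":      [["from", "CITY"]],
    "DESTINATION": [["to", "CITY"]],
    "CITY":        [["boston"], ["san", "francisco"]],
}
```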
The semantic grammar approach is very widely used, but has two weaknesses: discreteness (since it is non-probabilistic it has no ambiguity-resolution method) and hand-coding (hand-written grammars are expensive and slow to create).
The discreteness problem can be solved by adding probabilities to the grammar; HMMs can be used here as well.
Dialogue Manager
The final component of a dialogue system is the dialogue manager, which controls the architecture and structure of the dialogue. The dialogue manager takes input from the ASR/NLU components, maintains some sort of state, interfaces with the task manager, and passes output to the generation component.
The simplest dialogue manager architecture is a finite-state manager.
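A finite-state manager of this kind can be sketched in a few lines; the states, prompts, and slot names below are illustrative.

```python
# Minimal finite-state dialogue manager: the system asks a fixed
# sequence of questions, and each answer advances the state.
# States and prompts are toys for illustration.

STATES = {
    "ASK_ORIGIN": ("What city are you leaving from?", "ASK_DEST"),
    "ASK_DEST":   ("Where are you going?", "ASK_DATE"),
    "ASK_DATE":   ("What day do you want to travel?", "DONE"),
}

class FiniteStateManager:
    def __init__(self):
        self.state = "ASK_ORIGIN"
        self.slots = {}

    def prompt(self):
        """Return the question for the current state."""
        return STATES[self.state][0]

    def handle(self, answer):
        """Store the answer for the current state and advance."""
        self.slots[self.state] = answer
        self.state = STATES[self.state][1]

dm = FiniteStateManager()
dm.handle("Boston")   # answer to ASK_ORIGIN
```

The simplicity is also the weakness: the user must answer exactly the question asked, in exactly this order.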
Interlingual machine translation
It is one of the classic approaches to machine translation. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingua, an abstract, language-independent representation. The target language is then generated from the interlingua.
Within the rule-based machine translation paradigm, the interlingual approach is an alternative to the direct approach and the transfer approach.
In the direct approach, words are translated directly without passing through an additional representation.
In the transfer approach the source language is transformed into an abstract, less language-specific representation.
Linguistic rules which are specific to the language pair then transform the source-language representation into an abstract target-language representation, and from this the target sentence is generated.
Its advantages are that it requires fewer components to relate each source language to each target language, it takes fewer components to add a new language, and it handles language pairs that are very different from each other.
The obvious disadvantage is that defining an interlingua is difficult, and maybe even impossible, for a wider domain.
The ideal context for interlingual machine translation is thus multilingual machine translation in a very specific domain.
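The component-count argument can be made concrete: relating n languages pairwise by the direct approach needs one translator per ordered pair, while an interlingual design needs only one analyzer into and one generator out of the interlingua per language. A quick check of the arithmetic:

```python
# Component counts for direct vs. interlingual MT over n languages.

def direct_components(n):
    """One translator per ordered language pair: n * (n - 1)."""
    return n * (n - 1)

def interlingua_components(n):
    """One analyzer and one generator per language: 2 * n."""
    return 2 * n

# For 10 languages: 90 pairwise translators vs. 20 components.
```

The advantage only kicks in above three languages, which is why the interlingual approach is pitched at multilingual settings.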
Transfer-based machine translation
It is a type of machine translation based on the idea of an interlingua, and it is currently one of the most widely used methods of machine translation.
In interlingua-based MT this intermediate representation must be independent of the languages in question, whereas in transfer-based MT it has some dependence on the language pair involved.
The way in which transfer-based machine translation systems work varies substantially, but in general they follow the same pattern:
They apply sets of linguistic rules which are defined as correspondences between the structure of the source language and that of the target language.
The first stage involves analyzing the input text for morphology and syntax (and sometimes semantics) to create an internal representation. The translation is generated from this representation using both bilingual dictionaries and grammatical rules.
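These stages can be sketched end to end; the toy English-to-Japanese lexicon, the SVO-to-SOV reordering rule, and the three-word input format are all invented for illustration.

```python
# Transfer-based MT in miniature: analyze the source into a structured
# representation, apply a language-pair-specific transfer rule, then
# generate the target. All data and rules are toys.

LEXICON = {"I": "watashi-wa", "read": "yomu", "books": "hon-o"}

def analyze(sentence):
    """Parse a toy three-word SVO sentence into a representation."""
    subj, verb, obj = sentence.split()
    return {"SUBJ": subj, "VERB": verb, "OBJ": obj}

def transfer(rep):
    """Pair-specific rule: English SVO becomes Japanese SOV order."""
    return [rep["SUBJ"], rep["OBJ"], rep["VERB"]]

def generate(order):
    """Generate target words via the bilingual dictionary."""
    return " ".join(LEXICON.get(w, w) for w in order)

translation = generate(transfer(analyze("I read books")))
```

Note that only the `transfer` step depends on the language pair; analysis and generation are per-language, as the text describes.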
Example-based machine translation
This approach to machine translation is often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base, at run-time.
Example of a bilingual corpus:
English: How much is that red umbrella?
Japanese: Ano akai kasa wa ikura desu ka.
English: How much is that small camera?
Japanese: Ano chiisai kamera wa ikura desu ka.
An example-based machine translation system would learn three units of translation:
How much is that X ? corresponds to Ano X wa ikura desu ka.
red umbrella corresponds to akai kasa
small camera corresponds to chiisai kamera
Composing these units can produce novel translations in the future.
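Composition of the learned units can be sketched directly; the template and the first two fragment pairs come from the examples above, while the substitution mechanics and the extra "big book" fragment are illustrative.

```python
# Example-based MT in miniature: match a learned template, translate
# the variable part via learned fragments, and recompose.

TEMPLATE = ("How much is that X ?", "Ano X wa ikura desu ka")
FRAGMENTS = {
    "red umbrella": "akai kasa",
    "small camera": "chiisai kamera",
    "big book": "ookii hon",   # a new fragment enables a novel translation
}

def translate(sentence):
    """Match the template, translate X, and recompose the target."""
    src, tgt = TEMPLATE
    prefix, suffix = src.split("X")
    if sentence.startswith(prefix) and sentence.endswith(suffix):
        x = sentence[len(prefix):len(sentence) - len(suffix)].strip()
        if x in FRAGMENTS:
            return tgt.replace("X", FRAGMENTS[x])
    return None
```

The point of the sketch: "How much is that big book ?" was never seen as a whole, yet it translates by composing the template with a fragment.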
Dictionary-based machine translation
This method is based on dictionary entries, which means that words are translated as a dictionary does: word by word, usually without much correlation of meaning between them.
Dictionary lookups may be done with or without morphological analysis. Dictionary-based machine translation is best suited to the translation of long lists of phrases at the subsentential level.
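A word-by-word translator of this kind is only a few lines; the tiny English-to-Spanish dictionary below is invented for illustration.

```python
# Dictionary-based MT: each token is looked up independently, with no
# reordering or disambiguation. The mini-dictionary is a toy.

DICTIONARY = {
    "the": "el", "red": "rojo", "book": "libro", "big": "grande",
}

def translate_word_by_word(phrase):
    """Translate each token via dictionary lookup; keep unknown words."""
    return " ".join(DICTIONARY.get(w, w) for w in phrase.lower().split())
```

The output for "the red book" keeps English adjective order ("el rojo libro" rather than "el libro rojo"), which is exactly the lack of correlation between words that the text warns about.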