20-09-2016, 12:22 PM
1 Introduction
Artificial Intelligence (AI) is a vast field of both practical and
intellectual value. Several scientific disciplines contribute to it,
including Philosophy, Mathematics, Economics, Neuroscience, Psychology,
Computer Science, Linguistics and Cybernetics [7]. AI is roughly divided
into the fields of reasoning, knowledge, planning, learning, natural language
processing (communication), perception and the ability to move and manipulate
objects [7].
Lately, Natural Language Processing has received a fair amount of attention
from mainstream media due to its applications in today’s mobile devices.
Almost all of the latest smartphones and tablets are equipped with systems
to recognize, translate and output spoken data.
This article focuses on the current status of Natural Language Processing,
concentrating in particular on the sub-fields Speech Recognition and Machine
Translation.
1.1 Overview
Section 2 introduces some terms and describes them briefly. Section 3
covers the current status of speech recognition. Section 4 focuses on
the actual translation process and its difficulties. Finally, Section 5
gives a short summary and provides some thoughts about the future of AI.
2 Disambiguation
Natural Language Processing: Enabling a computer to communicate in
human language.
AI-Completeness: A classification of problems in the AI field. An AI-Complete
problem is as hard as the central problem of AI, which is the
modeling of human(-like) intelligence.
Or, in other words, an AI-Complete problem is as hard as passing the Turing
Test [5].
3 Speech Recognition
3.1 Description and Status
Speech Recognition (SR) is a field of Natural Language Processing and has
been studied since the 1950s. The main goal is to teach computers how to
translate recorded sound into a textual document. This sounds easy at first
but turns out to be hard in an uncontrolled real-world environment.
Today we are able to recognize speech in a controlled environment with
a limited vocabulary and a fixed grammar, such as credit-card numbers, with
an accuracy of almost 100%.
If we give up the constraints on vocabulary and grammar, we end up with
an AI-Complete problem [6].
3.2 Challenges and Limitations
One of the problems is that word boundaries are not easily detectable in
continuous speech. The resulting ambiguity broadens the search space.
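The effect of missing word boundaries can be sketched with a toy example. The snippet below is only an illustration under simplifying assumptions: it works on letters instead of acoustic units, and the vocabulary and the function name `segmentations` are hypothetical. It lists every way an unbroken stream can be split into known words.

```python
from functools import lru_cache

# Hypothetical toy vocabulary. A real recognizer works on acoustic
# units rather than letters and has a far larger vocabulary.
VOCABULARY = frozenset({"a", "an", "i", "ice", "nice", "cream", "scream"})


def segmentations(stream, vocabulary=VOCABULARY):
    """Return every way to split `stream` into vocabulary words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(stream):
            return [[]]  # exactly one way to split the empty remainder
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in vocabulary:
                # Every split of the remainder yields one more candidate.
                for rest in split_from(end):
                    results.append([word] + rest)
        return results

    return split_from(0)


for words in segmentations("anicecream"):
    print(" ".join(words))
```

Even this ten-letter stream has more than one valid reading, and every additional vocabulary entry can only add candidates, never remove them.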
The search space, or vocabulary, is in fact one of the biggest problems, and
its size is directly related to the quality of an SR system.
Because words often sound similar and cannot be distinguished from the
acoustic input alone, some kind of semantic analysis is needed to determine
the appropriate word in the given context. A proper SR system should also be
able to judge whether a sentence is grammatically correct and to rule out
nonsensical combinations of words. The amount of context knowledge required
to achieve this task is immense and makes real-time computation almost
impossible.
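As a rough illustration of using context to separate similar-sounding words, the sketch below scores each candidate by how often it appears next to its neighbours. Word-pair counts are a much weaker stand-in for the semantic analysis described above; the counts and the function name `pick_word` are made up for this example.

```python
# Hypothetical word-pair counts, standing in for statistics that a real
# system would estimate from a large text corpus.
BIGRAM_COUNTS = {
    ("over", "there"): 50,
    ("over", "their"): 1,
    ("there", "is"): 60,
    ("their", "is"): 1,
    ("their", "house"): 40,
    ("there", "house"): 1,
}


def pick_word(previous_word, candidates, next_word, counts=BIGRAM_COUNTS):
    """Choose the candidate that fits best between its two neighbours."""

    def score(word):
        # Add one to every count so unseen pairs do not zero out the product.
        left = counts.get((previous_word, word), 0) + 1
        right = counts.get((word, next_word), 0) + 1
        return left * right

    return max(candidates, key=score)


print(pick_word("over", ["there", "their"], "is"))
print(pick_word("in", ["there", "their"], "house"))
```

The acoustic input is identical in both calls; only the neighbouring words change the decision.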
Several other factors need to be taken into account when building an SR
system, such as environmental noise, speaker dependence or independence, and
poor articulation and pronunciation.
4 Machine Translation
4.1 Description and Status
Machine Translation (MT) is the computerized translation of a source document
in a given language into a target document in some other defined
language while preserving the semantics of the given text. MT is solely
computer-driven and should not be confused with computer-aided translation,
where humans are involved.
Today’s most successful systems are based on statistical methods and machine
learning. This means that no language-specific rules are hardwired; instead,
a learning algorithm is trained on a large amount of bilingual text
material [1].
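To make "learning from bilingual text" concrete, here is a minimal sketch of IBM Model 1, a classic word-alignment model trained with expectation-maximization. It is chosen here purely for illustration; the text does not say which algorithm the cited systems use. The three-sentence corpus and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical three-sentence parallel corpus (German -> English).
CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate word translation probabilities t(target | source) with EM."""
    target_vocab = {word for _, target in corpus for word in target}
    # Start from a uniform distribution: no language knowledge is built in.
    t = defaultdict(lambda: 1.0 / len(target_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) pair counts
        total = defaultdict(float)  # expected counts per source word
        for source, target in corpus:
            for tw in target:
                # E-step: share each target word among the source words
                # of the same sentence, weighted by the current estimates.
                norm = sum(t[(tw, sw)] for sw in source)
                for sw in source:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # M-step: turn the expected counts back into probabilities.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t


def best_translation(word, t):
    """Return the target word with the highest t(target | word)."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == word}
    return max(candidates, key=candidates.get)


model = train_ibm_model1(CORPUS)
for german in ["das", "haus", "buch", "ein"]:
    print(german, "->", best_translation(german, model))
```

No dictionary is supplied anywhere: the word pairings emerge only from which words co-occur across the sentence pairs, which is why such systems depend on large amounts of bilingual text.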
4.2 Challenges and Limitations
Like SR, MT is an AI-Complete problem [6]. The overall challenge is once more
the ambiguity of language.
Arnold (2003) [2] used four categories to classify those challenges:
Form under-determines content: A sentence can only be fully understood
when its context is given.
Content under-determines form: There is more than one way to express
something.
Languages differ: A simple translation of a sentence does not always
preserve its meaning.
Description problem: This is not a problem of speech itself but of computers,
which need to be capable of gathering and storing the knowledge needed to
translate.
The main challenges for the current statistical approaches are training
and data representation on the one hand, and decision making on the
other [1]. These two points map readily onto Arnold’s four-category scheme.
Training and data representation relate directly to Arnold’s description
problem. The link to decision making is less obvious, but on closer
inspection the first three of Arnold’s categories boil down to making the
right decision based on more or less structured data.
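The decision-making step can be sketched as picking the candidate with the highest combined score from a translation model and a context model. This is one common way to frame the decision in statistical MT, not necessarily the formulation used in [1]; all probabilities and names below are invented for illustration.

```python
import math

# Hypothetical scores for translating the German word "Bank".
# TRANSLATION_MODEL: how well each candidate matches the source word.
TRANSLATION_MODEL = {"bank": 0.6, "bench": 0.4}
# CONTEXT_MODEL: how well each candidate fits a given context.
CONTEXT_MODEL = {
    "money": {"bank": 0.9, "bench": 0.1},
    "park": {"bank": 0.2, "bench": 0.8},
}


def decide(context):
    """Return the candidate with the highest combined log score."""

    def score(candidate):
        # Summing logs is equivalent to multiplying the probabilities.
        return (math.log(TRANSLATION_MODEL[candidate])
                + math.log(CONTEXT_MODEL[context][candidate]))

    return max(TRANSLATION_MODEL, key=score)


print(decide("money"))
print(decide("park"))
```

The same source word yields different outputs depending on context, which is the kind of decision the first three of Arnold’s categories require.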
5 Conclusion
In science fiction, a device capable of translating speech in real time has
been a reality since the book “First Contact”. In real life we reach
acceptable results in controlled and constrained environments. Moore’s law
has helped us build more sophisticated systems and supply the immense
computing power they need.
Nevertheless, both problems are considered AI-Complete. This means that to
build a perfect translation system we have to be able to build
human-like intelligence. This fact alone should give us a clue about the long
road ahead of us.