01-10-2012, 12:00 PM
Statistical Machine Translation
statistical_mt.pdf (Size: 225.33 KB / Downloads: 42)
Abstract
Machine Translation (MT) refers to the use of computers for the task of translating
automatically from one language to another. The differences between languages
and especially the inherent ambiguity of language make MT a very difficult problem.
Traditional approaches to MT have relied on humans supplying linguistic knowledge
in the form of rules to transform text in one language to another. Given the vastness
of language, this is a highly knowledge-intensive task. Statistical MT is a radically
different approach that automatically acquires knowledge from large amounts of
training data. This knowledge, which is typically in the form of probabilities of various
language features, is used to guide the translation process. This report provides
an overview of MT techniques, and looks in detail at the basic statistical model.
Machine Translation: an Overview
Machine Translation (MT) can be defined as the use of computers to automate
some or all of the process of translating from one language to another. MT is an
area of applied research that draws ideas and techniques from linguistics, computer
science, Artificial Intelligence (AI), translation theory, and statistics. Work began
in this field as early as the late 1940s, and various approaches — some ad hoc,
others based on elaborate theories — have been tried over the past five decades.
This report discusses the statistical approach to MT, which was first suggested by
Warren Weaver in 1949 [Weaver, 1949], but has found practical relevance only in
the last decade or so. This approach has been made feasible by the vast advances
in computer technology, in terms of speed and storage capacity, and the availability
of large quantities of text data.
This chapter provides the context for the detailed discussion of Statistical MT
that appears in the following chapters. In this chapter, we look at the issues in
MT, and briefly describe the various approaches that have evolved over the last five
decades of MT research.
Difficulties in Machine Translation
Although the ultimate goal of MT, as in AI, may be to equal the best human efforts,
the current targets are much less ambitious. MT aims to translate not literary
works but technical documents, reports, instruction manuals, etc. Even here, the
goal usually is not fluent translation, but only correct and understandable output.
To appreciate the difficulty of MT, we will look at some examples of language
features that are especially problematic from the point of view of translation.
Structural Differences
Every language follows a characteristic sentence structure: English, for example,
uses a Subject-Verb-Object (SVO) ordering, whereas Hindi is a Subject-Object-Verb
(SOV) language. Apart from this basic feature, languages also differ in the
structural (or syntactic) constructions that they allow and disallow. These differences
have to be respected during translation.
For instance, post-modifiers in English become pre-modifiers in Hindi, as can
be seen from the following pair of sentences. These sentences also illustrate the
SVO and SOV sentence structure in these languages. Here, S is the subject of the
sentence, S_m is the subject modifier, and similarly for the verb (V) and the object
(O).
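The reordering described above can be sketched in a few lines of Python. This is only a toy illustration under the constituent labels used in the text (S, S_m, V, O); the example sentence and the reordering function are assumptions for illustration, not part of the report.

```python
# Toy sketch: English uses SVO order with post-modifiers (S S_m V O);
# Hindi uses SOV order with pre-modifiers (S_m S O V).

def english_to_hindi_order(constituents):
    """Reorder English constituents into the order used by Hindi."""
    s = constituents["S"]      # subject
    s_m = constituents["S_m"]  # subject modifier (post-modifier in English)
    v = constituents["V"]      # verb
    o = constituents["O"]      # object
    # Hindi: modifier precedes the subject, and the verb comes last.
    return [s_m, s, o, v]

# Hypothetical example: "the president of India visited the city"
parts = {"S": "the president", "S_m": "of India",
         "V": "visited", "O": "the city"}
print(" ".join(english_to_hindi_order(parts)))
# -> "of India the president the city visited" (gloss of the Hindi order)
```

A real system would of course operate on full parse trees rather than a flat dictionary of constituents, but the word-order divergence is the same.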
Approaches to Machine Translation
We have seen in the previous section that languages differ in vocabulary and structure.
MT, then, can be thought of as a process that reduces these differences to the
extent possible. This perspective leads to what is known as the transfer model for
MT.
The Transfer Approach
The transfer model involves three stages: analysis, transfer, and generation. In the
analysis stage, the source language sentence is parsed, and the sentence structure and
the constituents of the sentence are identified. In the transfer stage, transformations
are applied to the source language parse tree to convert the structure to that of the
target language. The generation stage translates the words and expresses the tense,
number, gender etc. in the target language.
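The three stages of the transfer model can be sketched as a small pipeline. Everything here is a stand-in: the "parser" assumes a pre-split (S, V, O) triple, the transfer rule handles only SVO-to-SOV reordering, and the lexicon is a hypothetical three-word English-to-Hindi gloss table.

```python
# Minimal sketch of the analysis -> transfer -> generation pipeline.

def analyze(sentence):
    """Analysis: identify the constituents of the source sentence.
    A real system would parse the sentence; we assume a (S, V, O) triple."""
    s, v, o = sentence
    return {"S": s, "V": v, "O": o}

def transfer(tree):
    """Transfer: rewrite the English SVO structure as Hindi SOV."""
    return [tree["S"], tree["O"], tree["V"]]

def generate(constituents, lexicon):
    """Generation: choose target-language words for each constituent.
    Tense, number, and gender agreement are omitted in this sketch."""
    return " ".join(lexicon.get(word, word) for word in constituents)

# Hypothetical romanized Hindi glosses.
lexicon = {"boy": "ladkaa", "ate": "khaayaa", "apple": "seb"}
print(generate(transfer(analyze(("boy", "ate", "apple"))), lexicon))
# -> "ladkaa seb khaayaa"
```

The value of this decomposition is that each stage isolates one kind of knowledge: syntax in analysis, structural divergence in transfer, and morphology and lexical choice in generation.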
Corpus-based Approaches
The approaches that we have seen so far all use human-encoded linguistic knowledge
to solve the translation problem. We will now look at some approaches that do not
explicitly use such knowledge, but instead use a training corpus (plur. corpora) of
already translated texts — a parallel corpus — to guide the translation process. A
parallel corpus consists of two collections of documents: a source language collection,
and a target language collection. Each document in the source language collection
has an identified counterpart in the target language collection.
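In practice, corpus-based systems usually work with sentence-aligned data: two parallel sequences whose i-th entries are translations of each other. A minimal sketch, with invented English-German sentence pairs standing in for a real corpus:

```python
# A sentence-aligned parallel corpus as two parallel lists: the i-th
# source sentence corresponds to the i-th target sentence.
source = ["the house is small", "the book is old"]
target = ["das Haus ist klein", "das Buch ist alt"]

# Pair them up into (source, target) training examples.
corpus = list(zip(source, target))
for src, tgt in corpus:
    print(f"{src}  |||  {tgt}")
```

Statistical models are then trained by counting co-occurrences of words and phrases across these aligned pairs.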