Empirical methods for exploiting parallel texts
by Melamed, Ilya Dan, Ph.D., University of Pennsylvania, 1998, 204 pages; AAT 9829948

Abstract (Summary)

The translation of a text can be viewed as a detailed annotation of the text's meaning. From this point of view, texts that exist in two languages (bitexts) are the richest accessible source of linguistic knowledge. Such knowledge can be exploited in many ways, if it can be automatically acquired. The acquisition process is invariably based on automatic methods for inducing translational equivalence relations between the two halves of a bitext. At the word token level, these relations are called bitext maps; at the word type level, they are called translation models. This dissertation advances the state of the art in methods for determining both kinds of translational equivalence. It also shows how to integrate these methods to exploit a much wider variety of bitexts than was previously possible.

The dissertation begins by showing that the language-specific aspects of the bitext mapping problem can be encapsulated and modularized away, leaving only a problem of geometric pattern recognition. The best solution is then the one that maximizes the signal-to-noise ratio in the search space and employs the fastest and most accurate search algorithm. The dissertation presents new methods for maximizing the signal strength, for filtering noise, and for searching the resulting scatterplot in linear expected space and time. The unprecedented accuracy of this solution enables a new application of bitext maps--automatic detection of omissions in translations.
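The geometric view described above can be illustrated with a small sketch. This is not the dissertation's algorithm: it is a brute-force illustration under stated assumptions. The matching predicate (a longest-common-subsequence ratio with a 0.7 threshold), the function names, and the diagonal-distance filter are all hypothetical choices made for this example. The language-specific part is confined to the `match` predicate; everything else works only with point coordinates.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if ch_a == ch_b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """LCS length divided by the longer string's length (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def similar(a: str, b: str, threshold: float = 0.7) -> bool:
    """Hypothetical language-specific matching predicate; swap in any other."""
    return lcsr(a.lower(), b.lower()) >= threshold


def candidate_points(src, tgt, match=similar):
    """Scatterplot of (i, j) token positions whose tokens satisfy the predicate.

    Brute force, O(len(src) * len(tgt)); for illustration only.
    """
    return [(i, j) for i, s in enumerate(src) for j, t in enumerate(tgt) if match(s, t)]


def near_diagonal(points, n_src, n_tgt, max_dev=0.2):
    """Crude noise filter: keep points close to the main diagonal of the bitext space."""
    return [(i, j) for i, j in points if abs(i / n_src - j / n_tgt) <= max_dev]


src = "the parliament adopted the resolution".split()
tgt = "le parlement a adopté la résolution".split()
points = candidate_points(src, tgt)
print(points)  # [(1, 1), (2, 3), (4, 5)]
print(near_diagonal(points + [(0, 5)], len(src), len(tgt)))  # drops the off-diagonal (0, 5)
```

In this picture, a long stretch of one text with no corresponding points in the other would be a natural place to look for an omitted passage, although the sketch does not attempt that step.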

The second half of the dissertation makes a number of advances in statistical translation modeling. First, it proves the feasibility of modeling translational equivalence independently of word order. Second, the dissertation shows why and how translation models can benefit from an explicit noise model. Third, it shows how the noise model can be conditioned on almost any kind of pre-existing language-specific knowledge, and that even simple linguistic clues can significantly improve translation model accuracy. Fourth, the dissertation shows how to automatically determine the sense inventories of words in bitext and how to automatically discover word sequences that are translated as a unit. This information enables translation models that account for polysemy and for phrasal translations.
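As a rough illustration of modeling translational equivalence independently of word order, the sketch below treats each aligned segment pair as two bags of word types. The scoring function (a Dice coefficient over co-occurrence counts), the greedy one-to-one linking step, the `min_score` threshold, and the toy data are assumptions made for this example and are not taken from the dissertation. The threshold is only a crude stand-in for an explicit noise model.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(pairs):
    """Dice score for every source/target word-type pair seen in the same segment pair.

    Each segment is reduced to a set of types, so token order has no effect.
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_types, t_types = set(src), set(tgt)
        src_freq.update(s_types)
        tgt_freq.update(t_types)
        joint.update(product(s_types, t_types))
    return {(s, t): 2 * c / (src_freq[s] + tgt_freq[t]) for (s, t), c in joint.items()}


def greedy_links(scores, min_score=0.0):
    """Link word types one-to-one, best score first (assumed heuristic).

    Pairs scoring at or below min_score are treated as noise and skipped.
    """
    links, used_src, used_tgt = {}, set(), set()
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    for (s, t), score in ranked:
        if score > min_score and s not in used_src and t not in used_tgt:
            links[s] = t
            used_src.add(s)
            used_tgt.add(t)
    return links


# Toy aligned segments (invented for illustration); note the differing word order
# in the third pair.
pairs = [
    (["the", "house"], ["la", "maison"]),
    (["the", "car"], ["la", "voiture"]),
    (["red", "house"], ["maison", "rouge"]),
]
print(greedy_links(cooccurrence_scores(pairs)))
```

Because the scores depend only on which types co-occur, reversing or shuffling the tokens inside any segment leaves the result unchanged.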

Indexing (document details)

Advisor: Marcus, Mitchell
School: University of Pennsylvania
School Location: United States -- Pennsylvania
Keyword(s): translation, machine translation, bitexts, bilingual lexicons
Source: DAI-B 59/04, p. 1740, Oct 1998
Source type: Dissertation
Subjects: Computer science
Publication Number: AAT 9829948
ISBN: 9780591827996
Document URL: http://proquest.umi.com/pqdlink?did=737704641&Fmt=7&clientId=79356&RQT=309&VName=PQD
ProQuest document ID: 737704641


 
