Machine Translation Practice Flashcards

Description and Tags

These flashcards cover the definitions, history, models, evaluation methods, and tools associated with Machine Translation based on the lecture notes.

Last updated 4:16 AM on 5/19/26

32 Terms

Machine Translation

The process of converting text in one language into another while preserving its meaning.

Statistical Approach

Introduced by IBM in 1988, this approach uses large corpora of translated texts and statistical models to learn translation rules rather than relying on linguists to define transformations and lexicons.

Phrase-based Models

The currently dominant approach in statistical machine translation based on mapping short text chunks, typically 1 to 3 words long.

Fluency

In human assessment, the criterion of whether a translation reads naturally and smoothly in the target language.

Adequacy

In human assessment, the criterion of whether a translation conveys the full meaning of the original text.

Task-based Evaluation

An evaluation method where quality is tested by seeing if the translation fulfills an information need, such as an assessor being able to answer questions about the content of the translated text.

Automatic Evaluation Metrics

Computational methods such as WER, BLEU, and METEOR that make it possible to rank machine translation systems frequently and cheaply. The metrics themselves are validated through correlation studies against human judgments.
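As an illustration of one such metric, here is a minimal sketch of WER (word error rate), assuming the usual definition as word-level edit distance divided by reference length. The function name and whitespace tokenization are assumptions made for this example, not something the lecture notes specify.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # reference word missing from output
                curr[j - 1] + 1,                  # extra word in output
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one missing word out of six reference words.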

Matches

In automatic evaluation, words that appear in both the reference translation and the machine translation output.

Insertions

In automatic evaluation, words that appear only in the machine translation output and not in the reference.

Deletions

In automatic evaluation, words that appear only in the reference translation and are missing from the machine translation output.
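The three counts defined on the cards above (matches, insertions, deletions) can be sketched with multisets. Treating each sentence as a bag of whitespace-separated words, and counting repeated words individually, are assumptions made for this example.

```python
from collections import Counter


def match_counts(reference: str, hypothesis: str) -> tuple[int, int, int]:
    """Return (matches, insertions, deletions) for one sentence pair."""
    ref = Counter(reference.split())
    hyp = Counter(hypothesis.split())
    matches = sum((ref & hyp).values())     # words present in both
    insertions = sum((hyp - ref).values())  # words only in the MT output
    deletions = sum((ref - hyp).values())   # words only in the reference
    return matches, insertions, deletions


print(match_counts("the cat sat on the mat", "the cat is on a mat"))
```

Under these assumptions, matches plus insertions equals the output length, and matches plus deletions equals the reference length.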

PER (Position-independent error rate)

One of the earliest automatic evaluation metrics proposed for measuring translation accuracy. It compares the output and the reference as bags of words, so word order is ignored.
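A minimal sketch of PER follows. The card gives no formula, so this assumes one commonly cited formulation: PER = 1 - (correct - max(0, output length - reference length)) / reference length, where "correct" is the number of words shared by the two sentences regardless of position. The function name and tokenization are also assumptions.

```python
from collections import Counter


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """PER under the assumed formulation described above."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Words shared by both sentences, ignoring where they occur.
    correct = sum((Counter(ref) & Counter(hyp)).values())
    # Penalize output that is longer than the reference.
    surplus = max(0, len(hyp) - len(ref))
    return 1.0 - (correct - surplus) / len(ref)
```

Because positions are ignored, a fully scrambled copy of the reference still scores an error rate of zero, which is the main difference from an order-sensitive metric such as WER.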

Word Alignment

A fundamental step in statistical machine translation models that involves detecting word-level translations from parallel corpora.

Sentence-aligned Parallel Corpus

A collection of texts where each foreign sentence $f$ is paired with its English translation $e$.

IBM Model 1

A very simplistic model for word alignment used as a stepping stone to more sophisticated models.
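To make the model concrete, here is a minimal sketch of IBM Model 1 training with expectation maximization, estimating lexical probabilities t(e|f) from a sentence-aligned corpus. It is a simplification under stated assumptions: there is no NULL word, the toy corpus is invented for illustration, and the function name is hypothetical.

```python
from collections import defaultdict


def train_ibm_model1(corpus, iterations=20):
    """Estimate t(e|f) from (foreign_words, english_words) sentence pairs."""
    english_vocab = {e for _, es in corpus for e in es}
    uniform = 1.0 / len(english_vocab)
    t = defaultdict(lambda: uniform)  # t[(e, f)], uniform at the start

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f translating to e
        total = defaultdict(float)  # expected count of f translating to anything
        # E-step: distribute each English word over the foreign words
        # of its sentence, in proportion to the current t(e|f).
        for fs, es in corpus:
            for e in es:
                norm = sum(t[(e, f)] for f in fs)
                for f in fs:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


toy_corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm_model1(toy_corpus)
print(round(t[("the", "das")], 3))
```

On this toy corpus the probability mass for "das" moves toward "the" as the iterations proceed, because "the" is the only English word that co-occurs with "das" in both of its sentences.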

IBM Model 2

An alignment model that introduces the use of absolute word positions within sentences.

Fertility (IBM Model 3)

A concept introduced in IBM Model 3 describing the phenomenon where a single word can produce multiple words in translation.

Symmetrization

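A refinement of word alignments that addresses a fundamental flaw of the original IBM models, which can produce one-to-many alignments in only one direction. Typically the models are run in both translation directions and the two alignments are merged.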
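A minimal sketch of the simplest merging strategies, assuming alignments are obtained in both translation directions and represented as sets of index pairs. The intersection keeps only links both directions agree on (high precision); the union keeps every link from either direction (high recall). The function name and the index-pair representation are assumptions for this example; practical systems usually apply heuristics that fall between the two.

```python
def symmetrize(src_to_tgt, tgt_to_src):
    """Merge two directional alignments.

    src_to_tgt: set of (source_index, target_index) links
    tgt_to_src: set of (target_index, source_index) links
    Returns (intersection, union), both as (source_index, target_index) sets.
    """
    forward = set(src_to_tgt)
    # Flip the reverse-direction links so both sets use the same orientation.
    backward = {(src, tgt) for (tgt, src) in tgt_to_src}
    return forward & backward, forward | backward


intersection, union = symmetrize(
    {(0, 0), (1, 1), (2, 1)},
    {(0, 0), (1, 1), (2, 3)},
)
print(sorted(intersection), sorted(union))
```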

Phrase Translation Table

The massive knowledge source used in phrase-based models to store mappings between short text chunks.
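As a rough picture of how such a table is used, the sketch below stores a phrase table as a dictionary from source phrases to scored target phrases and looks up every translation option for an input sentence. The entries, probabilities, and function name are invented for illustration; real tables hold far more entries and several scores per phrase pair.

```python
# Hypothetical toy table: source phrase -> list of (target phrase, probability).
phrase_table = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("Haus",): [("house", 0.8)],
    ("das", "Haus"): [("the house", 0.9)],
}


def translation_options(source_words, table, max_len=3):
    """Collect (start, end, target, probability) for every known source span."""
    options = []
    for start in range(len(source_words)):
        last_end = min(start + max_len, len(source_words))
        for end in range(start + 1, last_end + 1):
            phrase = tuple(source_words[start:end])
            for target, prob in table.get(phrase, []):
                options.append((start, end, target, prob))
    return options


options = translation_options(["das", "Haus"], phrase_table)
print(options)
```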

Hypotheses

The term used for partial translations, which are organized in stacks during the decoding process.

Beam Search

A decoding method that explores only the most promising part of the search space by keeping a limited number of alternatives (the beam) at each step.
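The idea can be sketched with a heavily simplified decoder. This toy version assumes word-by-word, left-to-right translation with no reordering and no language model, so it only illustrates the pruning step: after each source word, all partial translations are expanded and only the `beam_size` best are kept. The option table and function name are hypothetical.

```python
import heapq
import math


def beam_search(source_words, options, beam_size=2):
    """Return (log_probability, output_words) of the best hypothesis found."""
    # Each hypothesis is (log probability so far, tuple of output words).
    beam = [(0.0, ())]
    for word in source_words:
        expanded = [
            (score + math.log(prob), output + (target,))
            for score, output in beam
            for target, prob in options[word]
        ]
        # Pruning: keep only the highest-scoring partial translations.
        beam = heapq.nlargest(beam_size, expanded)
    return beam[0]


toy_options = {
    "das": [("the", 0.7), ("that", 0.3)],
    "Haus": [("house", 0.8), ("home", 0.2)],
}
best_score, best_output = beam_search(["das", "Haus"], toy_options)
print(best_output, round(math.exp(best_score), 2))
```

In a full phrase-based decoder the scores would also include language model and reordering costs, which is what makes keeping several alternatives worthwhile.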

Cube Pruning

A popular variation of the decoding heuristic used in machine translation systems.

MERT (Minimum Error Rate Training)

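A parameter tuning method, framed as a multi-dimensional optimization problem: the model weights are set to minimize translation error, as measured by an automatic metric, on a tuning set.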

Recursion

A fundamental property of language that is addressed by tree-based machine translation models.

Berkeley Word Aligner

A tool that integrates the idea of symmetrizing word alignments closely into the alignment method.

SRILM and IRSTLM

Toolkits developed for language modeling, with IRSTLM specifically targeting compact representation and scalable training.

Moses

The most widely used toolkit for machine translation, implementing most standard statistical methods and drawing on tools for alignment and language modeling.

Joshua

A more recent decoder focused on hierarchical and syntax-based translation models.

Apertium

A project aimed at constructing rule-based machine translation systems for many language pairs.

Canadian Hansards

A parallel corpus consisting of the proceedings of the Canadian parliament translated between French and English.

Europarl Corpus

A corpus consisting of translated proceedings of the European parliament, offering about 40 million words in each of 11 languages.

Acquis Corpus

A corpus of legal documents from the European Union covering 22 languages and up to 40 million words per language.

OPUS Project

A project that collects parallel corpora from various sources, including open source documentation and movie subtitles.