English-Afaan Oromoo Translation Study

This document summarizes an experiment on English to Afaan Oromoo machine translation using a statistical approach. The experiment used 20,000 sentence pairs from various documents to build translation and language models. The models achieved an average BLEU score of 17.74% after correcting alignment errors. While the score is fair for this language pair given the limited data, increasing the size and quality of training data could improve accuracy. Next steps include growing the parallel corpus through system output and using comparable corpora.

Uploaded by

Christine Ghali

English – Afaan Oromoo

Machine Translation:
An Experiment Using a Statistical
Approach

Sisay Adugna, Haramaya University, Ethiopia, sisayie@[Link]
Andreas Eisele, DFKI GmbH, Germany, eisele@[Link]
Outline

  Introduction
  Objectives
  Experiment
  Result and Discussion
  Conclusion
  Next Steps
  Acknowledgement
Introduction

  Afaan Oromoo (ISO Language Code: om)

  Mother tongue of 17 million people (MS Encarta)
  Official working language of 24,395,000 people (CSA)
  Also spoken in Kenya and Somalia

  English (ISO Language Code: en)

  Lingua franca of online information
  71% of all web pages – [Link]
Objectives

  The paper has two main goals:

1. to test how far we can go with the available
limited parallel corpus for the English – Oromo
language pair and the applicability of existing
Statistical Machine Translation (SMT) systems
on this language pair.
2. to analyze the output of the system with the
objective of identifying the challenges that need
to be tackled.
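Experiment

[Figure: system architecture. The monolingual corpus feeds language modeling, which produces the language model. The bilingual corpus is split into a training set and a test set; the training set feeds translation modeling, which produces the translation model. Decoding uses both models to translate the test-set source into the target, which is evaluated against the reference to give a performance metric.]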
Experiment ...

  Data
  Documents include the Constitution of the FDRE (Federal Democratic Republic of Ethiopia),
  proclamations of the Council of Oromia Regional State,
  the Universal Declaration of Human Rights and the Kenyan Refugee Act
  Religious and medical documents

  Source
  Council of Oromia Regional State (Caffee Oromiyaa)
  WWW
Experiment ...

  Size and organization

  20K sentence pairs (EN, OM), or 300,000 words, for the translation model (TM)
  62K sentences (OM), or 1,024,156 words, for the language model (LM)
  90% for training and 10% for testing
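
A 90/10 split of a parallel corpus can be sketched as follows. This is a minimal illustration, not the scripts used in the experiment: the function name `split_parallel_corpus`, the random shuffling, the fixed seed and the placeholder sentences are all assumptions, since the slides do not say how the test set was selected.

```python
import random


def split_parallel_corpus(pairs, test_fraction=0.1, seed=0):
    """Shuffle sentence pairs and split them into (train, test).

    Hypothetical helper: the shuffle and the seed are assumptions made
    for reproducibility of this sketch only.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data standing in for the 20K (EN, OM) sentence pairs.
pairs = [(f"english sentence {i}", f"oromo sentence {i}") for i in range(20000)]
train, test = split_parallel_corpus(pairs)
print(len(train), len(test))
```

Keeping each English sentence and its Oromo counterpart together in one tuple ensures that the split never separates a source sentence from its reference translation.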
Experiment ...

  Software tools used

  Preprocessing: Perl and Python scripts
  Language Modeling: SRILM
  Alignment: GIZA++
  Phrase-based Translation Modeling: Moses
  Decoding: Moses
  Postprocessing: Perl scripts
  Evaluation: Perl script
  Demonstration: Python scripts
Result and Discussion

  Tokenization mistake by the sentence aligner

  Due to the apostrophe, called hudhaa ('), in Oromo
  Wrong tokenization: bal'ina → bal ' ina
  Results in wrong alignment
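
The hudhaa problem can be illustrated with two small tokenizers. This is a sketch under assumptions, not the preprocessing used in the experiment: the names `naive_tokenize` and `tokenize`, the regular expressions, and the set of characters treated as hudhaa are all hypothetical.

```python
import re

# Assumption: hudhaa may be typed as a straight apostrophe, a curly
# quote or a backtick.
HUDHAA = "'’‘`"

# A token is a run of word characters that may contain word-internal
# hudhaa marks, or any single punctuation character.
TOKEN_RE = re.compile(rf"\w+(?:[{HUDHAA}]\w+)*|[^\w\s]")


def naive_tokenize(text):
    """Split off every punctuation mark, including word-internal hudhaa."""
    return re.findall(r"\w+|[^\w\s]", text)


def tokenize(text):
    """Keep hudhaa inside a word; split off all other punctuation."""
    return TOKEN_RE.findall(text)


print(naive_tokenize("bal'ina"))  # ['bal', "'", 'ina']
print(tokenize("bal'ina"))        # ["bal'ina"]
```

An apostrophe is kept only when it has word characters on both sides, so quotation marks around a word are still split off as separate tokens.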
Result and Discussion ...

  Impurity in the data
  Misaligned sentence pairs were found to cause a lower BLEU score of 5.06%
  Example of wrongly aligned sentence pair

  Correcting the sentence pairs manually improved the BLEU score to 17.74%
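
The slides describe a manual correction. As a different, automatic aid, a length-ratio heuristic could flag candidate misalignments for a human to review. The sketch below is only that heuristic; the function name `flag_suspect_pairs`, the threshold of 2.0 and the placeholder pairs are assumptions, not values from the experiment.

```python
def flag_suspect_pairs(pairs, max_ratio=2.0):
    """Return indices of pairs whose word counts differ by more than max_ratio.

    Hypothetical heuristic: a large length mismatch often, but not
    always, signals a misaligned pair, so flagged pairs still need
    manual inspection.
    """
    suspects = []
    for i, (src, tgt) in enumerate(pairs):
        n_src, n_tgt = len(src.split()), len(tgt.split())
        shorter, longer = min(n_src, n_tgt), max(n_src, n_tgt)
        if shorter == 0 or longer / shorter > max_ratio:
            suspects.append(i)
    return suspects


# Placeholder pairs, not real corpus data.
pairs = [
    ("a b c", "x y z"),        # similar lengths: kept
    ("a b c d e f g", "x"),    # 7 words against 1: flagged
    ("a", ""),                 # empty side: flagged
]
print(flag_suspect_pairs(pairs))  # [1, 2]
```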
Result and Discussion ...

•  Result after improving the alignment

•  Average BLEU Score of 17.74%

•  As the n-gram order n increases, accuracy decreases sharply
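
The drop in accuracy with growing n follows from how BLEU is computed: it combines the precision of 1-grams up to 4-grams, and longer n-grams are harder to match. The sketch below is a simplified sentence-level BLEU with a single reference, written for illustration; it is not the Perl evaluation script used in the experiment, and the example sentences are placeholders.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Fraction of candidate n-grams found in the reference, with clipping."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of n-gram precisions times the brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = "the cat sat on the mat".split()
candidate = "the cat sat on a mat".split()
precisions = [modified_precision(candidate, reference, n) for n in range(1, 5)]
print(precisions)  # 5/6, 3/5, 2/4, 1/3: precision falls as n grows
print(bleu(candidate, reference))
```

A single wrong word in the middle of a six-word sentence breaks two of the five bigrams, two of the four trigrams and two of the three 4-grams, which is why higher-order precision falls so quickly.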
Result and Discussion ...

  In addition to the limited size and impurity of the data, the BLEU score was affected by:
  Availability of only a single reference translation
  Domain of the test data
  The system performs better when tested on religious documents than on documents from other domains
Conclusion

  How well has this system performed?

  Average BLEU score was 17.74%
  Compared to what?
  No existing MT system for Oromoo to compare against
  Compared to other systems
  Fair score, as shown in the tables on the following slide
Conclusion (Cont.)

[Tables: size and score of other systems (from Koehn, 2005)]

Next Steps

  Grow the parallel corpora for this language pair using the output of the system
  Consider collecting and using comparable corpora
  Build linguistic models of Oromo morphology in a suitable finite-state formalism
Relation to ongoing projects

EuroMatrix Plus plans to build

  easy-to-access MT engines for many EU language pairs
  a platform for translation and post-editing of Wikipedia articles
Languages like Oromoo could be easily incorporated

ACCURAT works on learning MT models from comparable corpora, which would be highly applicable to Oromoo
We would need additional manpower to make this happen
Acknowledgement

  EU projects EuroMatrix and EuroMatrix Plus

  Saarland University
  DFKI GmbH
  Addis Ababa University
  German Academic Exchange Service (DAAD)
