|
The document is freely available |
|
| Reference | No reference available |
|
This paper describes the acquisition, preprocessing, segmentation, and alignment of an Amharic-English parallel corpus. In doing so, we addressed language-specific issues such as normalization and end-of-sentence disambiguation. The corpus consists of 145,820 Amharic-English parallel sentences (segments) from various sources and is larger than previously compiled corpora. It is released for research purposes and can be used to train or support Amharic-English machine translation systems. |
|
|
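The abstract names normalization and end-of-sentence disambiguation as language-specific preprocessing steps without giving details. The following is a minimal sketch of what such steps might look like, not the authors' method. The homophone table `HOMOPHONES` and the function names `normalize` and `split_sentences` are hypothetical; the table covers only a few first-order characters and assumes that homophonous Ethiopic letters are collapsed to one canonical form, and that the ASCII double colon and doubled Ethiopic wordspace stand in for the Ethiopic full stop.

```python
import re

# Hypothetical, deliberately incomplete homophone table (first-order forms only).
# A real normalizer would need to cover every order of each character.
HOMOPHONES = str.maketrans({
    "ሐ": "ሀ",
    "ኀ": "ሀ",
    "ሠ": "ሰ",
    "ዐ": "አ",
    "ፀ": "ጸ",
})

ETHIOPIC_FULL_STOP = "\u1362"  # ።


def normalize(text: str) -> str:
    """Unify sentence-final punctuation variants and collapse homophones."""
    # Assumption: "::" and a doubled Ethiopic wordspace are typed in place
    # of the Ethiopic full stop.
    text = text.replace("::", ETHIOPIC_FULL_STOP)
    text = text.replace("\u1361\u1361", ETHIOPIC_FULL_STOP)
    return text.translate(HOMOPHONES)


def split_sentences(text: str) -> list[str]:
    """Naively split after the Ethiopic full stop, question mark, '?' or '!'."""
    parts = re.split(r"(?<=[\u1362\u1367?!])\s*", text)
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for sentence in split_sentences(normalize("ሠላም:: እንዴት ነህ?")):
        print(sentence)
```

A rule this simple will mis-split wherever a double colon is not a sentence boundary, which is presumably why the task is framed as disambiguation rather than plain splitting.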