An Amharic Stemmer : Reducing Words to their Citation Forms
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Programvaruutveckling.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2007 (English). In: Computational Approaches to Semitic Languages: Common Issues and Resources, 2007. Conference paper, Published paper (Other academic)
Abstract [en]

Stemming is an important analysis step in a number of areas such as natural language processing (NLP), information retrieval (IR), machine translation (MT) and text classification. In this paper we present the development of a stemmer for Amharic that reduces words to their citation forms. Amharic is a Semitic language with rich and complex morphology. The stemmer is intended for dictionary-based cross-language IR, where the translation step requires looking up terms in a machine-readable dictionary (MRD). We apply a rule-based approach supplemented by occurrence statistics of words in an MRD and in a 3.1M-word news corpus. The main purpose of the statistical supplements is to resolve ambiguity between alternative segmentations. The stemmer is evaluated on Amharic text from two domains: news articles and a classic fiction text. It is shown to have an accuracy of 60% for the old-fashioned fiction text and 75% for the news articles.
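The idea of using dictionary membership and corpus counts to choose between alternative segmentations can be illustrated with a minimal sketch. This is not the authors' implementation: the affix lists, the function names `candidate_stems` and `pick_stem`, and the scoring rule (dictionary membership first, corpus frequency as tie-breaker) are all assumptions made for illustration, and the Latin-script strings are placeholders rather than a description of Amharic morphology.

```python
from collections import Counter

# Hypothetical toy affix lists; placeholders only, not real Amharic rules.
PREFIXES = ["", "ye", "be"]
SUFFIXES = ["", "och", "u"]


def candidate_stems(word):
    """Yield every (prefix, stem, suffix) split permitted by the toy affix lists."""
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)] if suffix else rest
            if stem:  # skip splits that leave no stem at all
                yield prefix, stem, suffix


def pick_stem(word, dictionary, corpus_counts):
    """Choose one stem among the alternative segmentations.

    Assumed scoring: a stem found in the dictionary beats one that is not;
    among equals, the stem with the higher corpus count wins.
    """
    candidates = list(candidate_stems(word))
    if not candidates:
        return word

    def score(candidate):
        _, stem, _ = candidate
        return (stem in dictionary, corpus_counts.get(stem, 0))

    return max(candidates, key=score)[1]


if __name__ == "__main__":
    mrd = {"bet"}                              # stand-in for MRD headwords
    counts = Counter({"bet": 10, "yebet": 1})  # stand-in for corpus counts
    print(pick_stem("yebetoch", mrd, counts))
```

When two candidate stems are both dictionary entries, the corpus count decides between them; when no candidate is in the dictionary, the sketch falls back to the unsegmented word or the most frequent candidate.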

Place, publisher, year, edition, pages
2007.
Identifiers
URN: urn:nbn:se:su:diva-12116
OAI: oai:DiVA.org:su-12116
DiVA: diva2:178636
Available from: 2008-01-17. Created: 2008-01-17. Bibliographically approved.

Open Access in DiVA

No full text

Other links

http://www.aclweb.org/anthology/W/W07/W07-0814

Search in DiVA

By author/editor
Asker, Lars; Alemu Argaw, Atelach
By organisation
Department of Computer and Systems Sciences
