Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Information Systems.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2007 (English). Conference paper, Published paper (Other academic)
Abstract [en]

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains five different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary built with Uplug from the news corpus on the website. The precision and recall of the automatically constructed Swedish-English dictionary were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries on the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary consists of about 2 300 words and covers 36 of the top search queries.
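The abstract evaluates dictionaries in two ways: by how many of the most frequent search queries they cover, and by the precision and recall of automatically extracted translation pairs. The Python sketch below illustrates both measures, together with plain word-by-word query translation, under stated assumptions. It is not the authors' implementation and does not call Uplug or SiteSeeker. The function names, the whitespace tokenization, the rule that a query counts as covered only if every token has an entry, the set-based precision/recall definitions and the toy Swedish-English entries are all hypothetical.

```python
"""Hypothetical sketch of dictionary-based cross language query handling.

Assumptions (not taken from the paper): queries are lowercased and split
on whitespace; a query is "covered" only if every token has an entry;
translation pairs are scored as exact (source, target) matches against a
hand-checked gold set.
"""

from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical dictionary type: source word -> list of translations.
Dictionary = Dict[str, List[str]]


def tokenize(query: str) -> List[str]:
    """Lowercase a search query and split it on whitespace."""
    return query.lower().split()


def translate_query(query: str, dictionary: Dictionary) -> List[str]:
    """Translate word by word; unknown tokens are kept unchanged
    (an assumed fallback, useful for names and numbers)."""
    translated: List[str] = []
    for token in tokenize(query):
        translated.extend(dictionary.get(token, [token]))
    return translated


def coverage(queries: Iterable[str], dictionary: Dictionary) -> int:
    """Count the queries whose tokens all have a dictionary entry."""
    covered = 0
    for query in queries:
        tokens = tokenize(query)
        if tokens and all(token in dictionary for token in tokens):
            covered += 1
    return covered


def precision_recall(
    proposed: Set[Tuple[str, str]], gold: Set[Tuple[str, str]]
) -> Tuple[float, float]:
    """Set-based precision and recall of proposed translation pairs."""
    correct = len(proposed & gold)
    precision = correct / len(proposed) if proposed else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy Swedish-English entries, invented for illustration only.
    sv_en: Dictionary = {
        "nordiska": ["nordic"],
        "rådet": ["council"],
        "miljö": ["environment"],
    }
    top_queries = ["nordiska rådet", "miljö", "budget 2007"]
    print(translate_query("Nordiska rådet", sv_en))  # ['nordic', 'council']
    print(coverage(top_queries, sv_en))              # 2
    proposed = {("miljö", "environment"), ("rådet", "advice")}
    gold = {("miljö", "environment"), ("rådet", "council")}
    print(precision_recall(proposed, gold))          # (0.5, 0.5)
```

Counting a query as covered only when all of its tokens are known is a deliberately strict choice; a looser rule (for example, at least one known token) would report higher coverage for the same dictionary.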

Place, publisher, year, edition, pages
2007.
Keywords [en]
Cross language information retrieval, parallel corpora, word alignment, Swedish, Danish, Norwegian.
Identifiers
URN: urn:nbn:se:su:diva-12167
OAI: oai:DiVA.org:su-12167
DiVA, id: diva2:178687
Available from: 2008-01-16 Created: 2008-01-16 Bibliographically approved

Full text not available in DiVA

Authors
Dalianis, Hercules; Rimka, Martin