Noisy Parallel Corpus Filtering through Projected Word Embeddings
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0002-6027-4156
2019 (English). In: Proceedings of the Fourth Conference on Machine Translation (WMT), Association for Computational Linguistics, 2019, Vol. 3, p. 279-283. Conference paper, Published paper (Refereed)
Abstract [en]

We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2019. Vol. 3, p. 279-283
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:su:diva-172783
OAI: oai:DiVA.org:su-172783
DiVA, id: diva2:1349750
Conference
Fourth Conference on Machine Translation (WMT19), Florence, Italy, August 1-2, 2019
Available from: 2019-09-09. Created: 2019-09-09. Last updated: 2019-12-17. Bibliographically approved.

Authors
Kurfali, Murathan; Östling, Robert