Change search
Refine search result
1 - 3 of 3
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the Create feeds function.
  • 1.
    Kurfali, Murathan
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Noisy Parallel Corpus Filtering through Projected Word Embeddings2019In: Proceedings of the Fourth Conference on Machine Translation (WMT), Association for Computational Linguistics, 2019, Vol. 3, p. 279-283Conference paper (Refereed)
    Abstract [en]

    We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.

  • 2.
    Kurfali, Murathan
    et al.
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Östling, Robert
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Zero-shot transfer for implicit discourse relation classification2019In: 20th Annual Meeting of the Special Interest Group on Discourse and Dialogue: Proceedings of the Conference, Association for Computational Linguistics, 2019, p. 226-231Conference paper (Refereed)
    Abstract [en]

    Automatically classifying the relation between sentences in a discourse is a challenging task, in particular when there is no overt expression of the relation. It becomes even more challenging by the fact that annotated training data exists only for a small number of languages, such as English and Chinese. We present a new system using zero-shot transfer learning for implicit discourse relation classification, where the only resource used for the target language is unannotated parallel text. This system is evaluated on the discourse-annotated TEDMDB parallel corpus, where it obtains good results for all seven languages using only English training data.

  • 3. Zeyrek, Deniz
    et al.
    Mendes, Amália
    Grishina, Yulia
    Kurfali, Murathan
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. Middle East Technical University, Turkey.
    Gibbon, Samuel
    Ogrodniczuk, Maciej
    TED Multilingual Discourse Bank (TED-MDB): a parallel corpus annotated in the PDTB style2019In: Language resources and evaluation, ISSN 1574-020X, E-ISSN 1574-0218Article in journal (Refereed)
    Abstract [en]

    TED-Multilingual Discourse Bank, or TED-MDB, is a multilingual resource where TED-talks are annotated at the discourse level in 6 languages (English, Polish, German, Russian, European Portuguese, and Turkish) following the aims and principles of PDTB. We explain the corpus design criteria, which has three main features: the linguistic characteristics of the languages involved, the interactive nature of TED talks—which led us to annotate Hypophora, and the decision to avoid projection. We report our annotation consistency, and post-annotation alignment experiments, and provide a cross-lingual comparison based on corpus statistics.

1 - 3 of 3
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf