A Multi-Word Expression Dataset for Swedish
Stockholm University, Faculty of Humanities, Department of Linguistics. ORCID iD: 0000-0002-7020-8275
Stockholm University, Faculty of Humanities, Department of Linguistics. ORCID iD: 0000-0002-6027-4156
Stockholm University, Faculty of Humanities, Department of Linguistics.
Stockholm University, Faculty of Humanities, Department of Linguistics. ORCID iD: 0000-0003-4040-3544
2020 (English). In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), Marseille: European Language Resources Association (ELRA), 2020, p. 4402-4409.
Conference paper, Published paper (Refereed)
Abstract [en]

We present a new set of 96 Swedish multi-word expressions annotated with degree of (non-)compositionality. In contrast to most previous compositionality datasets, we also consider syntactically complex constructions and publish a formal specification of each expression. This allows evaluation of computational models beyond word bigrams, which have so far been the norm. Finally, we use the annotations to evaluate a system for automatic compositionality estimation based on distributional semantics. Our analysis of the disagreements between human annotators and the distributional model reveals interesting questions related to the perception of compositionality, and should be informative to future work in the area.

Place, publisher, year, edition, pages
Marseille: European Language Resources Association (ELRA), 2020. p. 4402-4409
Keywords [en]
multi-word expressions, compositionality, distributional semantics
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-192115
OAI: oai:DiVA.org:su-192115
DiVA, id: diva2:1543695
Conference
12th Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, May 11–16, 2020
Available from: 2021-04-12. Created: 2021-04-12. Last updated: 2025-02-07. Bibliographically approved.

Authority records

Kurfali, Murathan; Östling, Robert; Sjons, Johan; Wirén, Mats
