Hybrid Methods for Coreference Resolution in Swedish
Nilsson, Kristina
Stockholm University, Faculty of Humanities, Department of Linguistics. ORCID iD: 0000-0002-9447-8544
2010 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

The aim of this thesis is to improve coreference resolution in Swedish by providing a hybrid approach based on combining data-driven methods and linguistic knowledge. Coreference resolution here consists in identifying all expressions in a text that have the same referent, for example, a person or an object.

The linguistic knowledge is based on Accessibility Theory (Ariel 1990), which is used to guide the selection of likely anaphor-antecedent pairs from the set of all possible such pairs in a text. The data-driven method adopted is Memory-Based Learning (MBL), a supervised method based on the idea that learning means storing experiences in memory, and that new problems are solved by reusing solutions from similar experiences (Daelemans and Van den Bosch 2005).
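As an illustration of the memory-based idea described above, the following is a minimal sketch of a nearest-neighbour classifier that stores labelled examples and classifies a new instance by majority vote among the most similar stored ones. It is not the system described in the thesis: the class name, the overlap distance, the value of k, and the four pair features are all hypothetical choices made for this example.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Hypothetical minimal memory-based learner (k nearest neighbours)."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # stored (features, label) experiences

    def store(self, features, label):
        # "Learning" is simply keeping the example in memory.
        self.memory.append((tuple(features), label))

    def classify(self, features):
        if not self.memory:
            raise ValueError("no stored examples")
        # Reuse the solutions of the k most similar stored examples.
        nearest = sorted(
            self.memory, key=lambda ex: overlap_distance(ex[0], features)
        )[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Invented features for an anaphor-antecedent pair (assumption, not from the thesis):
# (anaphor type, gender agreement, number agreement, same sentence)
clf = MemoryBasedClassifier(k=3)
clf.store(("pronoun", "agree", "agree", "yes"), "coreferent")
clf.store(("pronoun", "agree", "agree", "no"), "coreferent")
clf.store(("pronoun", "clash", "agree", "yes"), "not_coreferent")
clf.store(("name", "clash", "clash", "no"), "not_coreferent")

prediction = clf.classify(("pronoun", "agree", "agree", "yes"))
```

In this toy setting, reducing the number of stored pairs directly shrinks the memory that each classification has to search.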

The referring expressions covered by the system are names, definite descriptions, and pronouns. In order to maximize performance, we use different classifiers with a specific set of linguistically motivated features for each type of expression. The great majority of features used for classification are domain- and language-independent.

We demonstrate two ways of using this method of linguistically motivated selection of anaphor-antecedent pairs.

First, the number of training examples stored in memory is reduced. We find that for coreference resolution of definite descriptions and names, the amount of training data can thereby be reduced with only a minor loss in performance, but for pronoun resolution there is a negative effect.

Second, selection can be used for improving on coreference resolution results. This is the first step in our hybrid approach to coreference resolution, where the second step is the application of an MBL classifier for determining coreference between the selected pairs. Results indicate that this hybrid approach is advantageous for coreference resolution of definite descriptions and names. For pronoun resolution, there is a negative effect on recall along with a positive effect on precision.

Place, publisher, year, edition, pages
Stockholm: Department of Linguistics, Stockholm University, 2010. 191 p.
Keyword [en]
coreference resolution, coreference, anaphora, discourse processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-38395
ISBN: 978-91-7447-075-8 (print)
OAI: oai:DiVA.org:su-38395
DiVA: diva2:311715
Public defence
2010-06-03, hörsal 5, hus B, Universitetsvägen 10 B, Stockholm, 13:00 (English)
Opponent
Supervisors
Note
To order the book, send an e-mail to exp@ling.su.se
Available from: 2010-05-11. Created: 2010-04-13. Last updated: 2014-05-26. Bibliographically approved.

Open Access in DiVA

fulltext (3236 kB), 735 downloads
File information
File name: FULLTEXT01.pdf
File size: 3236 kB
Checksum (SHA-512):
e80451bb2409ec27e5bbb157c0c6d4110ed05c502ab565ff5e3c8692cda74c3812d7f906d77a918f87080c85269d7212189f406c580f2b94cf0004bfdc68d2a7
Type: fulltext
Mimetype: application/pdf
errata (36 kB), 83 downloads
File information
File name: ERRATA01.pdf
File size: 36 kB
Checksum (SHA-512):
607f9b6ce5378db22ce65449793041780a4b4b7cb4eefc8f2e8f400ed545b30052ac9e28e9fa503fe663ebba26debab2ae7d750fe8d0eb1285edd800c1e91c45
Type: errata
Mimetype: application/pdf

Search in DiVA

By author/editor
Nilsson, Kristina
By organisation
Department of Linguistics
Language Technology (Computational Linguistics)

Total: 740 downloads
The number of downloads is the sum of all downloads of full texts. It may include, for example, previous versions that are no longer available.

Total: 896 hits