Hybrid Methods for Coreference Resolution in Swedish
Stockholm University, Faculty of Humanities, Department of Linguistics.
ORCID iD: 0000-0002-9447-8544
2010 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

The aim of this thesis is to improve coreference resolution in Swedish by providing a hybrid approach based on combining data-driven methods and linguistic knowledge. Coreference resolution here consists in identifying all expressions in a text that have the same referent, for example, a person or an object.

The linguistic knowledge is based on Accessibility Theory (Ariel 1990), which is used to guide the selection of likely anaphor-antecedent pairs from the set of all possible pairs in a text. The data-driven method adopted is Memory-Based Learning (MBL), a supervised method based on the idea that learning means storing experiences in memory, and that new problems are solved by reusing solutions from similar experiences (Daelemans and Van den Bosch 2005).
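The memory-based idea described above can be illustrated with a minimal sketch: training instances are stored as they are, and a new instance receives the majority label of its most similar stored neighbours. The overlap distance, the class name `MemoryBasedClassifier`, and the four pair features below are assumptions made for illustration; the abstract does not specify the thesis's actual features, metric, or settings.

```python
from collections import Counter


def overlap_distance(a, b):
    """Count the feature positions where two instances disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


class MemoryBasedClassifier:
    """Illustrative k-nearest-neighbour classifier over symbolic features."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # stored (features, label) experiences

    def fit(self, instances, labels):
        # "Learning" is just storing the examples in memory.
        self.memory = list(zip(instances, labels))
        return self

    def predict(self, instance):
        # Reuse the labels of the k most similar stored examples.
        ranked = sorted(
            self.memory, key=lambda ex: overlap_distance(ex[0], instance)
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Hypothetical features for an anaphor-antecedent pair:
# (anaphor type, gender match, number match, distance bucket)
train_x = [
    ("pronoun", "yes", "yes", "near"),
    ("pronoun", "no", "yes", "near"),
    ("name", "yes", "yes", "far"),
    ("definite", "no", "no", "far"),
]
train_y = ["coref", "not_coref", "coref", "not_coref"]

clf = MemoryBasedClassifier(k=1).fit(train_x, train_y)
prediction = clf.predict(("pronoun", "yes", "yes", "far"))
```

In this toy memory, the closest stored example differs from the query in a single feature and is labelled `coref`, so the query is labelled the same way.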

The referring expressions covered by the system are names, definite descriptions, and pronouns. In order to maximize performance, we use different classifiers with a specific set of linguistically motivated features for each type of expression. The great majority of features used for classification are domain- and language-independent.
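One way to picture the use of a separate classifier per expression type is a routing step that projects each candidate pair onto the feature set for its anaphor type. The sketch below is only an illustration: the feature names in `FEATURES_BY_TYPE` and the helper functions are hypothetical, since the abstract does not list the features used in the thesis.

```python
# Hypothetical feature sets per anaphor type (not taken from the thesis).
FEATURES_BY_TYPE = {
    "pronoun": ("gender_match", "number_match", "sentence_distance"),
    "definite": ("head_noun_match", "sentence_distance"),
    "name": ("string_match", "sentence_distance"),
}


def build_instance(anaphor_type, pair):
    """Project a candidate pair onto the feature set for its anaphor type."""
    return tuple(pair[name] for name in FEATURES_BY_TYPE[anaphor_type])


def route(candidates):
    """Group instances by anaphor type so each group can go to its own classifier."""
    routed = {anaphor_type: [] for anaphor_type in FEATURES_BY_TYPE}
    for anaphor_type, pair in candidates:
        routed[anaphor_type].append(build_instance(anaphor_type, pair))
    return routed


candidates = [
    ("pronoun", {"gender_match": True, "number_match": True, "sentence_distance": 1}),
    ("name", {"string_match": False, "sentence_distance": 7}),
]
routed = route(candidates)
```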

We demonstrate two ways of using this method of linguistically motivated selection of anaphor-antecedent pairs.

First, selection reduces the number of training examples stored in memory. We find that for coreference resolution of definite descriptions and names, the amount of training data can thereby be reduced with only a minor loss in performance, but for pronoun resolution there is a negative effect.

Second, selection can be used to improve coreference resolution results. Selection is the first step in our hybrid approach to coreference resolution; the second step applies an MBL classifier to determine coreference between the selected pairs. Results indicate that this hybrid approach is advantageous for coreference resolution of definite descriptions and names. For pronoun resolution, recall decreases while precision increases.
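The two-step approach described above can be sketched as a selection filter followed by a pairwise classifier. Everything specific in this sketch is an assumption: the distance windows in `MAX_DISTANCE` are invented (loosely motivated by the idea that pronouns tend to pick up nearby antecedents while names and definite descriptions can reach further back), the classifier is a stand-in that accepts every pair, and the data are toy examples rather than results from the thesis.

```python
from typing import Callable, Iterable, NamedTuple


class Pair(NamedTuple):
    anaphor: str
    antecedent: str
    anaphor_type: str  # "pronoun", "definite", or "name"
    sentence_distance: int


# Hypothetical selection windows (in sentences) per anaphor type.
MAX_DISTANCE = {"pronoun": 2, "definite": 10, "name": 50}


def select_pairs(pairs: Iterable[Pair]) -> list:
    """Step 1: keep only pairs that the selection rule considers likely."""
    return [p for p in pairs if p.sentence_distance <= MAX_DISTANCE[p.anaphor_type]]


def resolve(pairs: Iterable[Pair], classify: Callable[[Pair], bool]) -> set:
    """Step 2: run the classifier on the selected pairs only."""
    return {p for p in select_pairs(pairs) if classify(p)}


def precision_recall(predicted: set, gold: set) -> tuple:
    """Pairwise precision and recall against a gold set of coreferent pairs."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data: two true pairs (a, c) and one false candidate (b).
a = Pair("hon", "Anna", "pronoun", 1)
b = Pair("hon", "Maria", "pronoun", 5)
c = Pair("den", "boken", "pronoun", 4)
pairs = [a, b, c]
gold = {a, c}


def accept_all(pair: Pair) -> bool:
    """Stand-in classifier that labels every pair as coreferent."""
    return True


baseline = precision_recall(set(pairs), gold)
hybrid = precision_recall(resolve(pairs, accept_all), gold)
```

On this toy data, selection removes the false candidate `b` but also the true pair `c`, so precision rises from 2/3 to 1.0 while recall falls from 1.0 to 0.5. A true pair discarded in the first step cannot be recovered by the classifier in the second, which is one way a precision gain can come with a recall loss.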

Place, publisher, year, edition, pages
Stockholm: Department of Linguistics, Stockholm University, 2010. 191 p.
Keyword [en]
coreference resolution, coreference, anaphora, discourse processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
URN: urn:nbn:se:su:diva-38395
ISBN: 978-91-7447-075-8
OAI: diva2:311715
Public defence
2010-06-03, Lecture Hall 5 (hörsal 5), Building B (hus B), Universitetsvägen 10 B, Stockholm, 13:00 (English)
To order the book, send an e-mail to
Available from: 2010-05-11 Created: 2010-04-13 Last updated: 2014-05-26 Bibliographically approved

Open Access in DiVA

fulltext (3236 kB), 621 downloads
File information
File name: FULLTEXT01.pdf
File size: 3236 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
errata (36 kB), 58 downloads
File information
File name: ERRATA01.pdf
File size: 36 kB
Checksum: SHA-512
Type: errata
Mimetype: application/pdf

Author
Nilsson, Kristina
Organisation
Department of Linguistics

Total: 626 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 634 hits