2018 (English). In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) / [ed] Nicoletta Calzolari, Khalid Choukri, Christopher Cieri, Thierry Declerck, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis, Takenobu Tokunaga, European Language Resources Association, 2018, p. 817-824. Conference paper, Published paper (Refereed)
Abstract [en]
This paper describes an approach to identifying speakers and addressees in dialogues extracted from literary fiction, along with a dataset annotated for speaker and addressee. The overall purpose of this is to provide annotation of dialogue interaction between characters in literary corpora in order to allow for enriched search facilities and construction of social networks from the corpora. To predict speakers and addressees in a dialogue, we use a sequence labeling approach applied to a given set of characters. We use features relating to the current dialogue, the preceding narrative, and the complete preceding context. The results indicate that even with a small amount of training data, it is possible to build a fairly accurate classifier for speaker and addressee identification across different authors, though the identification of addressees is the more difficult task.
Place, publisher, year, edition, pages
European Language Resources Association, 2018
Keywords
literary corpora, speaker identification, addressee identification, quote attribution
National Category
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:su:diva-154260 (URN)
979-10-95546-00-9 (ISBN)
Conference
Language Resources and Evaluation Conference, Miyazaki, Japan, 7–12 May, 2018
Funder
Swedish Research Council, 821-2013-2003
2018-03-21, 2018-03-21, 2025-02-01. Bibliographically approved