Stockholm University Strindberg Corpus: Contents and possibilities
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0002-9447-8544
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0001-5611-6369
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0003-4040-3544
2012 (English)
In: Arvet efter Strindberg - The Strindberg Legacy. The 18th International Strindberg Conference, Stockholm University, May 31 – June 3, 2012
Conference paper, Oral presentation only (Other academic)
Abstract [en]

The Stockholm University Strindberg Corpus (SUSC) consists of seven works by August Strindberg, annotated for part of speech, morphology and lemma. The corpus is freely available.

SUSC consists of approximately 400 000 tokens, annotated for part of speech, morphology and lemma using the Stockholm-Umeå Corpus tag set in PAROLE format. The annotated texts have been converted to XML, which makes the corpus searchable with corpus analysis tools such as Xaira. This allows, for example, searching for concordances of a specific word form, part of speech and/or lemma, as well as pattern matching and collocation extraction.
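As a rough illustration of the queries mentioned above, the following minimal Python sketch builds a keyword-in-context concordance filtered by lemma and/or tag prefix, and counts collocates within a token window. The markup it assumes (`<s>` sentences containing `<w>` tokens with `lemma` and `msd` attributes) is hypothetical and not taken from the SUSC files; the sample sentences and PAROLE-style tags are invented placeholders that have not been checked against the tag set. The collocate counts are raw co-occurrence frequencies only; dedicated collocation extraction usually ranks candidates with an association measure such as mutual information, which is not implemented here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup: one <w> per token with lemma and msd (morphosyntactic
# tag) attributes. Sentences and tags are invented for illustration only.
SAMPLE = """<text>
<s><w lemma="jag" msd="PF@USS@S">Jag</w><w lemma="vara" msd="V@IIAS">var</w><w lemma="ensam" msd="AQPUSNIS">ensam</w><w lemma="." msd="FE">.</w></s>
<s><w lemma="han" msd="PF@USS@S">Han</w><w lemma="vara" msd="V@IPAS">är</w><w lemma="inte" msd="RG0S">inte</w><w lemma="ensam" msd="AQPUSNIS">ensam</w><w lemma="." msd="FE">.</w></s>
</text>"""


def load_tokens(xml_text):
    """Return a list of sentences, each a list of (form, lemma, msd) tuples."""
    root = ET.fromstring(xml_text)
    return [[(w.text, w.get("lemma"), w.get("msd") or "") for w in s.iter("w")]
            for s in root.iter("s")]


def concordance(sentences, lemma=None, msd_prefix=None, width=2):
    """KWIC lines for tokens matching a lemma and/or a tag prefix."""
    lines = []
    for sent in sentences:
        forms = [form for form, _, _ in sent]
        for i, (form, lem, msd) in enumerate(sent):
            if lemma is not None and lem != lemma:
                continue
            if msd_prefix is not None and not msd.startswith(msd_prefix):
                continue
            left = " ".join(forms[max(0, i - width):i])
            right = " ".join(forms[i + 1:i + 1 + width])
            # Skip empty context so sentence-initial/final hits have no stray spaces.
            lines.append(" ".join(p for p in (left, f"[{form}]", right) if p))
    return lines


def collocates(sentences, lemma, window=2):
    """Count lemmas occurring within +/- window tokens of the node lemma."""
    counts = Counter()
    for sent in sentences:
        lemmas = [lem for _, lem, _ in sent]
        for i, lem in enumerate(lemmas):
            if lem != lemma:
                continue
            lo, hi = max(0, i - window), min(len(lemmas), i + window + 1)
            counts.update(other for j, other in enumerate(lemmas[lo:hi], start=lo)
                          if j != i)
    return counts


sentences = load_tokens(SAMPLE)

if __name__ == "__main__":
    for line in concordance(sentences, lemma="vara"):
        print(line)
    print(collocates(sentences, "ensam").most_common(3))
```

Searching by lemma finds both the past-tense form *var* and the present-tense form *är*, which a plain word-form search would treat as unrelated strings.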

The current version of the corpus includes seven works, all of which can be classified as autobiographical:

  • Tjänstekvinnans son (The son of a servant, 1886-87)
  • Han och hon (He and she, 1919)
  • Inferno (Inferno, 1897)
  • Legender and Jakob brottas (Legends and Jacob wrestles, 1898)
  • Fagervik och Skamsund (Fair haven and Foulstrand, 1902)
  • Ensam (Alone, 1903)

We are aware of three other electronic collections of Strindberg’s works: Projekt Runeberg, Litteraturbanken and Språkbanken. While these are valuable resources, SUSC is an important addition because, unlike the first two, it is linguistically annotated, and unlike the third, its data is available for download and can thus be fully inspected and processed with the researcher’s software of choice. Even more importantly, researchers can add their own analyses to the corpus as new layers of annotation.
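One simple way a researcher might add such a layer is sketched below in Python, assuming (hypothetically) that tokens are `<w>` elements with `lemma` and `msd` attributes; the real SUSC markup may differ. The function name `add_layer`, the attribute name `theme` and the label dictionary are illustrative inventions standing in for the researcher's own analysis.

```python
import xml.etree.ElementTree as ET

# Hypothetical token markup; the sentence and tags are invented for illustration.
SAMPLE = ('<s><w lemma="jag" msd="PF@USS@S">Jag</w>'
          '<w lemma="vara" msd="V@IIAS">var</w>'
          '<w lemma="ensam" msd="AQPUSNIS">ensam</w></s>')


def add_layer(xml_text, attr, label_fn):
    """Return the XML with a new token-level attribute `attr`.

    label_fn receives (form, lemma, msd) for each <w> and returns a string
    label, or None to leave that token unannotated. Existing attributes
    are kept unchanged.
    """
    root = ET.fromstring(xml_text)
    for w in root.iter("w"):
        label = label_fn(w.text, w.get("lemma"), w.get("msd"))
        if label is not None:
            w.set(attr, label)
    return ET.tostring(root, encoding="unicode")


# Illustrative analysis: a made-up thematic label keyed on lemma.
THEMES = {"ensam": "solitude"}
annotated = add_layer(SAMPLE, "theme", lambda form, lemma, msd: THEMES.get(lemma))

if __name__ == "__main__":
    print(annotated)
```

Writing the new layer as an inline attribute is only one design option; standoff annotation kept in a separate file and keyed to token identifiers would leave the distributed corpus files untouched.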

Place, publisher, year, edition, pages
2012.
Keyword [en]
corpus stylistics
National Category
Languages and Literature
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:su:diva-80992
OAI: oai:DiVA.org:su-80992
DiVA: diva2:558731
Conference
The 18th International Strindberg Conference, Stockholm University, May 31 – June 3, 2012.
Available from: 2012-10-04 Created: 2012-10-04 Last updated: 2014-05-26

Open Access in DiVA

No full text

Authors
Nilsson Björkenstam, Kristina; Gustafson-Capková, Sofia; Wirén, Mats
