From surprisal to tagging and syntactic parsing: measuring the idiom and syntax principle
Zurich University.
Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics. ORCID iD: 0000-0003-2815-395X
2014 (English). In: LEXICAL BUNDLES IN ENGLISH NON-FICTION WRITING: FORMS AND FUNCTIONS, 2014. Conference paper, Abstract (Refereed)
Abstract [en]

We introduced surprisal as an abstraction from lexical bundles to lexical bundleness. There are forces beyond lexical bundles: on the one hand, word-sequence abstractions to word classes; on the other hand, the syntax principle (SSP) in contradistinction to the idiom principle (SIP). We ultimately aim for a model of their mutual influence (Sinclair 1991).

We motivate the use of models, then abstract to word-class models using a part-of-speech tagger, and to syntactic models using a large-scale parser. Part-of-speech taggers assign word classes based on sequences, and they typically achieve high accuracy. Areas of low accuracy and low tagger confidence in word-class assignment indicate low model fit, and thus often high entropy and a lack of formulaic sequences. Tagger model fit can therefore be used as a measure of morphosyntactic bundleness.

Although creative language (SSP) is rarer, it needs to be respected. We thus also use a syntactic parser language model (Schneider 2008), which combines SSP in the form of a hand-written competence grammar with SIP as probabilistic performance disambiguation, paying tribute to Hoey's (2005) insights on lexical priming. We show that parser model fit is lower on low-level L2 texts, as can be expected according to Pawley and Syder (1983). Finally, we introduce measures of syntactic surprisal.

National Category
General Language Studies and Linguistics
URN: urn:nbn:se:su:diva-109373
OAI: diva2:764524
12th ESSE CONFERENCE, SLANG16 workshop
Available from: 2014-11-19 Created: 2014-11-19 Last updated: 2014-11-19

By author/editor
Grigonyte, Gintare