From surprisal to tagging and syntactic parsing: measuring the idiom and syntax principle
2014 (English). In: LEXICAL BUNDLES IN ENGLISH NON-FICTION WRITING: FORMS AND FUNCTIONS, 2014. Conference paper, Abstract (Refereed)
We introduced surprisal as an abstraction from lexical bundles to lexical bundleness. There are forces beyond lexical bundles: on the one hand, abstractions from word sequences to word classes; on the other hand, the syntax principle (SSP), in contradistinction to the idiom principle (SIP). We ultimately aim for a model of their mutual influence (Sinclair 1991).

We motivate the use of models, then abstract to word-class models using a part-of-speech tagger, and to syntactic models using a large-scale parser. Part-of-speech taggers assign word classes based on sequences, and they typically achieve high accuracy. Areas of low accuracy and low tagger confidence in word-class assignment indicate low model fit, and thus often high entropy and a lack of formulaic sequences. Tagger model fit can therefore be used as a measure of morphosyntactic bundleness.

Although creative language (SSP) is rarer, it needs to be respected. We thus also use a syntactic parser language model (Schneider 2008), which combines SSP in the form of a hand-written competence grammar and SIP as probabilistic performance disambiguation, paying tribute to Hoey's (2005) insights on lexical priming. We show that parser model fit is lower on low-level L2 texts, as we can expect according to Pawley and Syder (1983). Finally, we introduce measures of syntactic surprisal.
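As a rough illustration of the two quantities the abstract relies on, the following sketch computes word-level surprisal under a bigram model and the entropy of a tagger's distribution over word classes. It is not the authors' implementation: the toy corpus, the add-one smoothing, and the function names `train_bigram_model`, `surprisal`, `mean_surprisal` and `tag_entropy` are all assumptions made for this example only.

```python
import math
from collections import Counter

# Toy corpus (an assumption for illustration, not data from the paper).
CORPUS = [
    ["on", "the", "other", "hand"],
    ["on", "the", "one", "hand"],
    ["the", "other", "day"],
]


def train_bigram_model(sentences):
    """Count bigrams and their left contexts; '<s>' marks the sentence start."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sent in sentences:
        tokens = ["<s>"] + sent
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, len(vocab)


def surprisal(prev, cur, model):
    """Surprisal in bits, -log2 P(cur | prev), with add-one smoothing (assumed)."""
    context_counts, bigram_counts, vocab_size = model
    p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
    return -math.log2(p)


def mean_surprisal(sentence, model):
    """Average per-word surprisal; lower values suggest more formulaic text."""
    tokens = ["<s>"] + sentence
    values = [surprisal(p, c, model) for p, c in zip(tokens, tokens[1:])]
    return sum(values) / len(values)


def tag_entropy(tag_probs):
    """Entropy in bits of a tagger's probability distribution over word classes."""
    return -sum(p * math.log2(p) for p in tag_probs if p > 0)


model = train_bigram_model(CORPUS)
formulaic = mean_surprisal(["on", "the", "other", "hand"], model)
scrambled = mean_surprisal(["hand", "other", "the", "on"], model)
print(f"formulaic: {formulaic:.2f} bits, scrambled: {scrambled:.2f} bits")
```

Under these assumptions the attested sequence receives a lower mean surprisal than its scrambled counterpart, and a tagger that is certain about a word class yields zero entropy while an evenly split decision between two classes yields one bit. A real study would replace the toy counts with a large corpus and the probabilities reported by an actual tagger or parser.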
General Language Studies and Linguistics
Identifiers: URN: urn:nbn:se:su:diva-109373; OAI: oai:DiVA.org:su-109373; DiVA: diva2:764524
12th ESSE CONFERENCE, SLANG16 workshop