Comprehenders need to incrementally integrate incoming input with previously processed material. Constraint-based and probabilistic theories of language understanding hold that comprehenders do this by drawing on implicit knowledge about the statistics of the language signal, as observed in their previous experience. I test this prediction against the processing of grammatical relations in Swedish transitive sentences, combining corpus-based modeling and a self-paced reading experiment.
Grammatical relations are often assumed to express role-semantic functions (e.g., Actor and Undergoer) and discourse-related functions (e.g., topic and focus). These functions are encoded through a systematic interplay between morphosyntactic (e.g., case and word order), semantic/referential (e.g., animacy and definiteness) and verb-semantic (e.g., volitionality and sentience) information. Constraint-based and probabilistic theories predict that these information types serve as cues in the process of assigning functions to the argument NPs during language comprehension. The weighting, interplay and availability of these cues vary across languages, but do so in systematic ways. For example, languages with fixed word order tend to have less morphological marking of grammatical relations than languages with less rigid word order restrictions. In many languages, the morphological marking of grammatical relations is also restricted to NP arguments that are non-prototypical or marked in terms of their semantic or referential properties, given their functions; in English and Swedish, for example, overt case marking of objects is restricted to personal pronouns. I first assess how these factors affect constituent order (i.e., the order of grammatical relations) in a corpus of Swedish and then test whether comprehenders use the statistical information contained in these cues.
Corpus study. The distribution of SVO and OVS orders conditional on semantic/referential (e.g., animacy and givenness), morphosyntactic (e.g., case) and verb-semantic (e.g., volitionality) information was calculated on the basis of 16,552 transitive sentences extracted from a syntactically annotated corpus of Swedish. Three separate mixed logistic regression models were fit to derive the incremental predictions that a simulated comprehender with experience in Swedish would have after seeing the sentence up to and including the first NP (model 1), the verb (model 2), or the second NP (model 3). The regression models thus provide separate estimates of the objective probability of SVO vs. OVS word order at each point in the sentence. These estimates were used to design stimuli for a self-paced reading experiment testing whether comprehenders draw on this objectively present information in the input.
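To make the incremental setup concrete, the following is a minimal sketch of how a logistic model maps cue values onto a probability of OVS order at two points in the sentence. It is not the fitted model from the corpus study: the function name, the cue names and every coefficient value are hypothetical placeholders with arbitrary signs and magnitudes, and the random effects of the mixed models are left out.

```python
import math


def p_ovs(intercept, weights, cues):
    """Probability of OVS order under a plain logistic model (no random effects)."""
    log_odds = intercept + sum(weights[name] * value for name, value in cues.items())
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, for illustration only (NOT estimates from the corpus).
INTERCEPT = -3.0
WEIGHTS_AFTER_NP1 = {"np1_inanimate": 1.5}
WEIGHTS_AFTER_VERB = {
    "np1_inanimate": 1.5,
    "verb_experiencer": 0.5,
    "inanimate_x_experiencer": 1.0,  # interaction of NP1 animacy and verb class
}

# A sentence with an inanimate NP1 followed by an experiencer verb.
after_np1 = p_ovs(INTERCEPT, WEIGHTS_AFTER_NP1, {"np1_inanimate": 1})
after_verb = p_ovs(
    INTERCEPT,
    WEIGHTS_AFTER_VERB,
    {"np1_inanimate": 1, "verb_experiencer": 1, "inanimate_x_experiencer": 1},
)
print(f"P(OVS | NP1) = {after_np1:.3f}, P(OVS | NP1, verb) = {after_verb:.3f}")
```

Each of the two weight dictionaries stands in for one separately fitted model, which mirrors the idea that the simulated comprehender has access to more cues as the sentence unfolds.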
Self-paced reading experiment. Forty-five participants read transitive sentences that varied with respect to word order (SVO vs. OVS), NP1 animacy (animate vs. inanimate) and verb class (volitional vs. experiencer). By-region reading times were well described by the region-by-region shifts in the probability of SVO vs. OVS word order, calculated as the relative entropy. For example, reading times in the NP2 region of locally ambiguous, object-initial sentences were reduced when the animacy of NP1 and its interaction with verb class biased towards an object-initial word order, as predicted by constraint-based and probabilistic theories.
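The region-by-region shift measure can be sketched as follows. This is a minimal illustration, assuming that the relative entropy is taken from the updated word order distribution to the preceding one and measured in bits; the direction, the unit, the helper names and the probability values are all assumptions made for the example, not figures from the study.

```python
import math


def relative_entropy(p_new, p_old):
    """Relative entropy D(p_new || p_old) in bits over the same set of outcomes."""
    return sum(p * math.log2(p / q) for p, q in zip(p_new, p_old) if p > 0)


def order_distribution(p_ovs):
    """Two-outcome distribution (P(SVO), P(OVS)) from the probability of OVS."""
    return (1.0 - p_ovs, p_ovs)


# Hypothetical P(OVS) after NP1, after the verb and after NP2 (illustration only).
p_ovs_by_region = [0.1, 0.5, 0.99]

# Shift in expectations at the verb and at NP2, each relative to the previous region.
shifts = [
    relative_entropy(order_distribution(new), order_distribution(old))
    for old, new in zip(p_ovs_by_region[:-1], p_ovs_by_region[1:])
]
for region, shift in zip(["verb", "NP2"], shifts):
    print(f"{region}: {shift:.3f} bits")
```

Under this sketch, a cue that already raises the probability of OVS order early in the sentence leaves less to revise at NP2, so the shift at NP2 is smaller.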
Helsinki, Finland, 2015.