Multiword structures in different materials, and with different goals and methodologies
2013 (English). In: Yearbook of Corpus Linguistics and Pragmatics 2013 / [ed] Jesús Romero Trillo, Berlin, Heidelberg: Springer Berlin/Heidelberg, 2013, 77-103 p. Chapter in book (Refereed)
It is a well-known fact that multi-word units (MWUs), however pervasive they may be in language use, are difficult to define. To date, no widely agreed-upon definition exists and, seemingly, an even more complicated endeavor has been to agree upon which subcategories should be included for analysis.
As Granger & Paquot (2008) point out, two main methodologies are used in the study of MWUs. One is the phraseological method, where researchers apply linguistic criteria such as fixedness and exchangeability, with some degree of researcher introspection and intersubjectivity. Within this tradition, mainly idioms (‘grab the bull by its horns’) and collocations (‘draw a conclusion’) have been explored. The other is the statistical method, where sequences are given the status of MWUs on the basis of purely statistical measures such as log-likelihood and MI score. Studies using this method frequently focus on lexical bundles (Biber et al. 2003, Ellis et al. 2008) and also on collocations (Durrant & Schmitt 2009). In addition, proponents of this latter method tend to use very large corpora of written academic English, whereas the phraseological method is applied to all sorts of corpora.
The fact that these different methods thus target different MWUs is problematic, considering that a major challenge for linguistics today is to map the extent to which language use is composed of ready-made chunks (cf. Erman & Warren 2000, Mel'čuk 1998). Accordingly, a methodology is called for that can account for MWUs from a more holistic perspective.
In order to pinpoint methodological issues related to MWU identification in corpora, the present study will analyze data from English and Spanish spoken corpora and confront phraseological/introspective methods with statistical methods. The results show that the different methods arrive at essentially divergent sets of MWUs, and the consequences thereof will be discussed.
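As a rough illustration of the two statistical measures named above (MI score and log-likelihood), the following sketch scores a single adjacent word pair in a tokenized text. It assumes the common corpus-linguistic definitions, MI as log2 of observed over expected frequency and log-likelihood as the G2 statistic over a 2x2 contingency table. The function name `bigram_scores`, the toy token list, and the restriction to adjacent bigrams are assumptions made for this example and are not taken from the chapter.

```python
import math
from collections import Counter


def bigram_scores(tokens, w1, w2):
    """Return (MI, G2) for the adjacent pair (w1, w2) in a token list.

    Hypothetical helper for illustration only.
    """
    n = len(tokens) - 1  # number of adjacent bigram positions
    bigrams = Counter(zip(tokens, tokens[1:]))
    first = Counter(tokens[:-1])   # words in first position of a bigram
    second = Counter(tokens[1:])   # words in second position of a bigram

    o11 = bigrams[(w1, w2)]  # observed frequency of the pair
    r1 = first[w1]           # bigrams starting with w1
    c1 = second[w2]          # bigrams ending with w2

    # Observed and expected cells of the 2x2 contingency table.
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [
        r1 * c1 / n,
        r1 * (n - c1) / n,
        (n - r1) * c1 / n,
        (n - r1) * (n - c1) / n,
    ]

    # MI score: log2(observed / expected); undefined pairs get -inf.
    mi = math.log2(o11 / expected[0]) if o11 else float("-inf")

    # Log-likelihood (G2): 2 * sum(O * ln(O / E)), skipping empty cells.
    g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return mi, g2


# Toy data, invented for the example.
toy_tokens = "draw a conclusion and draw a conclusion then draw a line".split()
mi, g2 = bigram_scores(toy_tokens, "a", "conclusion")
print(f"MI = {mi:.3f}, G2 = {g2:.3f}")
```

In a statistical approach of this kind, a pair would count as an MWU candidate when its score exceeds a chosen threshold; any such threshold is a decision for the analyst and is not specified here.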
Place, publisher, year, edition, pages
Berlin, Heidelberg: Springer Berlin/Heidelberg, 2013. 77-103 p.
Yearbook of Corpus Linguistics and Pragmatics, ISSN 2213-6819 ; 1
multi-word units, idioms, collocations, lexical bundles, statistical methods, phraseological methods, spoken English, spoken Spanish
Research subject English; Spanish; Bilingualism
Identifiers
URN: urn:nbn:se:su:diva-96968
ISBN: 978-94-007-6250-3
OAI: oai:DiVA.org:su-96968
DiVA: diva2:668328