The Coll Corpus: Towards a corpus of web-based college student newspapers
Stockholm University, Faculty of Humanities, Department of English.
2002 (English)
In: New Frontiers of Corpus Research: Papers from the 21st International Conference on English Language Research on Computerized Corpora, Amsterdam: Rodopi, 2002, 71-90 p.
Chapter in book (Other academic)
Abstract [en]

Unlike the major English-language corpora released to date, on-line college student newspapers offer an as yet unexplored record of writing by much younger writers. In these newspapers, 20-year-olds address their peers in a situation that largely parallels standard newspaper writing in its demands for formal correctness and its time pressure. Largely free of outside intervention and house style sheets, the papers cover a range of university student interests, including creative writing.

This preliminary version of the Coll Corpus consists of one issue each of nearly all 300-plus college and university newspapers available on the Web as of spring 1999, with a total of 3.88 million words. Although AmE dominates, the resultant geographical distribution is relatively well matched to actual population ratios. In its present form, the corpus already allows exploration of numerous lexical and semantic features along temporal and geographic dimensions. Given its on-line accessibility, future versions should be easily expandable by several orders of magnitude.

Place, publisher, year, edition, pages
Amsterdam: Rodopi, 2002. 71-90 p.
Keyword [en]
corpus linguistics, corpora, electronic newspapers, Internet, web newspapers
National Category
Specific Languages
URN: urn:nbn:se:su:diva-131850
ISBN: 90-420-1237-4
OAI: diva2:184551
Available from: 2007-10-16
Created: 2007-10-16
Last updated: 2014-08-27
Bibliographically approved
In thesis
1. Studies in Corpora and Idioms: Getting the cat out of the bag
2014 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

“Idiomatic” expressions, usually called “idioms”, such as a dime a dozen, a busman’s holiday, or to have bats in your belfry are a curious part of any language: they usually have a fixed lexical (why a busman?) and structural composition (only dime and dozen in direct conjunction mean ‘common, ordinary’), can be semantically obscure (why bats?), yet are widely recognized in the speech community, in spite of being so rare that only large corpora can provide us with access to sufficient empirical data on their use.

In this compilation thesis, four published studies focusing on idioms in corpora are presented. Study 1 details the creation of, and the data in, the author’s medium-sized corpus from 1999, the 3.7-million-word Coll corpus of online university student newspapers, and compares it with data from the standard corpora of the time. Study 2 examines the extent to which recognized idioms are found in the Coll corpus and how they can be varied. Study 3 draws on the British National Corpus and a series of British and American newspaper corpora to see how idioms may be “anchored” in their contexts, primarily through premodification by an adjective that fits the context rather than the idiom. Study 4 examines idiom-usage patterns in the Time Magazine corpus, focusing on possible diachronic change over the near-century that Time represents.

The introductory compilation chapter situates and discusses these studies in the context of contemporary idiom and corpus research. Building on them, it offers two specific examples of potential ways forward in idiom research: an examination of the idioms used in a specific newspaper subgenre (editorials), and a detailed proposal for how teachers can examine multiple facets of a specific modern idiom (the glass ceiling) in the classroom. Finally, a summing-up offers suggestions for further research, particularly on the patterning of individual idioms rather than treating idioms as a homogeneous phenomenon.

Place, publisher, year, edition, pages
Stockholm: Department of English, Stockholm University, 2014. 217 p.
Keyword [en]
Coll corpus, corpora, corpus creation, idioms, idiom variation, idiom-breaking, online newspapers, student newspapers, college newspapers
National Category
Specific Languages
Research subject
URN: urn:nbn:se:su:diva-18029
ISBN: 978-91-7447-975-1
Public defence
2014-10-11, Lecture Hall 7 D, Universitetsvägen 10 D, Stockholm, 10:00 (English)
Available from: 2014-09-18
Created: 2007-10-16
Last updated: 2014-12-16
Bibliographically approved

Author/editor
Minugh, David