Automatic Classification of Factuality Levels: A Case Study on Swedish Diagnoses and the Impact of Local Context
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2011 (English). In: The Fourth International Symposium on Languages in Biology and Medicine, Singapore, 2011. Conference paper (Refereed)
Abstract [en]

Clinicians express different levels of knowledge certainty when reasoning about a patient’s status. Automatic extraction of relevant information is crucial in the clinical setting, which means that factuality levels need to be distinguished. We present an automatic classifier using Conditional Random Fields, which is trained and tested on a Swedish clinical corpus annotated for factuality levels at a diagnosis statement level: the Stockholm EPR Diagnosis-Factuality Corpus. The classifier obtains promising results (best overall results are 0.699 average F-measure using all classes, 0.762 F-measure using merged classes), using simple local context features. Preceding context is more useful than posterior, although best results are obtained using a window size of +/-4. Lower levels of certainty are more problematic than higher levels, which was also the case for the human annotators in creating the corpus. A manual error analysis shows that conjunctions and other higher-level features are common sources of errors.
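
The abstract does not list the exact feature set, so the following is only a minimal sketch of what local context features within a +/-4 token window might look like when prepared for a sequence classifier such as a CRF. The function name `window_features`, the feature keys, the `<PAD>` marker and the English example sentence are all illustrative assumptions, not taken from the paper or the corpus.

```python
from typing import Dict, List


def window_features(tokens: List[str], index: int,
                    before: int = 4, after: int = 4) -> Dict[str, str]:
    """Return the target token and its neighbours as position-keyed features.

    Hypothetical helper: the paper's actual features are not specified here.
    """
    features = {"w[0]": tokens[index].lower()}
    for offset in range(-before, after + 1):
        if offset == 0:
            continue
        position = index + offset
        key = f"w[{offset:+d}]"
        if 0 <= position < len(tokens):
            features[key] = tokens[position].lower()
        else:
            # Positions outside the sentence get a padding marker.
            features[key] = "<PAD>"
    return features


# Invented English example sentence, not drawn from the Swedish corpus.
sentence = "patient shows no clear signs of pneumonia today".split()
features = window_features(sentence, sentence.index("pneumonia"))

# Preceding context only (before=4, after=0), as one way to compare
# preceding against posterior context.
preceding_only = window_features(sentence, sentence.index("pneumonia"), after=0)
```

Setting `before` and `after` independently is one way to reproduce the kind of comparison the abstract reports between preceding and posterior context; a dictionary per token is the input shape commonly accepted by CRF toolkits, though the toolkit used in the paper is not stated in this record.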

Place, publisher, year, edition, pages
Singapore, 2011.
National Category
Information Systems
Research subject
Computer and Systems Sciences
URN: urn:nbn:se:su:diva-68729
OAI: diva2:473284
Fourth International Symposium on Languages in Biology and Medicine, LBM 2011
Available from: 2012-01-05; Created: 2012-01-05; Last updated: 2012-03-27; Bibliographically approved
In thesis
1. Shades of Certainty: Annotation and Classification of Swedish Medical Records
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Access to information is fundamental in health care. This thesis presents research on Swedish medical records with the overall goal of building intelligent information access tools that can aid health personnel, researchers and other professions in their daily work, and, ultimately, improve health care in general.

The issue of ethics and identifiable information is addressed by creating an annotated gold standard corpus and porting an existing de-identification system to Swedish from English. The aim is to move towards making textual resources available to researchers without risking exposure of patients’ confidential information. Results for the rule-based system are not encouraging, but results for the gold standard are fairly high.

Affirmed, uncertain and negated information needs to be distinguished when building accurate information extraction tools. Annotation models are created, with the aim of building automated systems. One model distinguishes certain and uncertain sentences, and is applied on medical records from several clinical departments. In a second model, two polarities and three levels of certainty are applied on diagnostic statements from an emergency department. Overall results are promising. Differences are seen depending on clinical practice, annotation task and level of domain expertise among the annotators.

Using annotated resources for automatic classification is studied. Encouraging overall results using local context information are obtained. The fine-grained certainty levels are used for building classifiers for real-world e-health scenarios.

This thesis contributes two annotation models of certainty and one of identifiable information, applied on Swedish medical records. A deeper understanding of the language use linked to conveying certainty levels is gained. Three annotated resources that can be used for further research have been created, and implications for automated systems are presented.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2012. 78 p.
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 12-002
Keywords
Clinical documentation, Certainty level classification, Annotation, E-health, Corpus creation, De-identification, Speculative language, Medical Records, Swedish, Natural Language Processing, Language Technology
National Category
Information Systems, Social aspects
Research subject
Computer and Systems Sciences
URN: urn:nbn:se:su:diva-74828
ISBN: 978-91-7447-444-2
Public defence
2012-04-27, Sal C, Forum 100, Isafjordsgatan 39, Kista, 13:00 (English)
Available from: 2012-04-05; Created: 2012-03-27; Last updated: 2012-03-28; Bibliographically approved

By author/editor
Velupillai, Sumithra