Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2009 (English)In: International Journal of Medical Informatics, ISSN 1386-5056, E-ISSN 1872-8243, Vol. 78, no 12, e19-e26 p.Article in journal (Refereed) Published
Abstract [en]

Background

Electronic patient records (EPRs) contain a large amount of information written in free text. This information is considered very valuable for research but is also very sensitive since the free text parts may contain information that could reveal the identity of a patient. Therefore, methods for de-identifying EPRs are needed. The work presented here aims to perform a manual and automatic Protected Health Information (PHI)-annotation trial for EPRs written in Swedish.

Methods

This study consists of two main parts: the initial creation of a manually PHI-annotated gold standard, and the porting and evaluation of an existing de-identification software written for American English to Swedish in a preliminary automatic de-identification trial. Results are measured with precision, recall and F-measure.

Results

This study reports fairly high Inter-Annotator Agreement (IAA) results on the manually created gold standard, especially for specific tags such as names. The average IAA over all tags was 0.65 F-measure (0.84 F-measure highest pairwise agreement). For name tags the average IAA was 0.80 F-measure (0.91 F-measure highest pairwise agreement). Porting a de-identification software written for American English to Swedish directly was unfortunately non-trivial, yielding poor results.

Conclusion

Developing gold standard sets as well as automatic systems for de-identification tasks in Swedish is feasible. However, discussions and definitions on identifiable information is needed, as well as further developments both on the tag sets and the annotation guidelines, in order to get a reliable gold standard. A completely new de-identification software needs to be developed.

Place, publisher, year, edition, pages
2009. Vol. 78, no 12, e19-e26 p.
Keyword [en]
Medical informatics applications, Natural language processing, Medical record systems, Electronic patient records in Swedish, Protected health information, Ethical issues, Annotation
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:su:diva-33411DOI: 10.1016/j.ijmedinf.2009.04.005ISI: 000272036200012OAI: oai:DiVA.org:su-33411DiVA: diva2:283092
Available from: 2009-12-23 Created: 2009-12-23 Last updated: 2017-12-12Bibliographically approved
In thesis
1. Shades of Certainty: Annotation and Classification of Swedish Medical Records
Open this publication in new window or tab >>Shades of Certainty: Annotation and Classification of Swedish Medical Records
2012 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Access to information is fundamental in health care. This thesis presents research on Swedish medical records with the overall goal of building intelligent information access tools that can aid health personnel, researchers and other professions in their daily work, and, ultimately, improve health care in general.

The issue of ethics and identifiable information is addressed by creating an annotated gold standard corpus and porting an existing de-identification system to Swedish from English. The aim is to move towards making textual resources available to researchers without risking exposure of patients’ confidential information. Results for the rule-based system are not encouraging, but results for the gold standard are fairly high.

Affirmed, uncertain and negated information needs to be distinguished when building accurate information extraction tools. Annotation models are created, with the aim of building automated systems. One model distinguishes certain and uncertain sentences, and is applied on medical records from several clinical departments. In a second model, two polarities and three levels of certainty are applied on diagnostic statements from an emergency department. Overall results are promising. Differences are seen depending on clinical practice, annotation task and level of domain expertise among the annotators.

Using annotated resources for automatic classification is studied. Encouraging overall results using local context information are obtained. The fine-grained certainty levels are used for building classifiers for real-world e-health scenarios.

This thesis contributes two annotation models of certainty and one of identifiable information, applied on Swedish medical records. A deeper understanding of the language use linked to conveying certainty levels is gained. Three annotated resources that can be used for further research have been created, and implications for automated systems are presented.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2012. 78 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 12-002
Keyword
Clinical documentation, Certainty level classification, Annotation, E-health, Corpus creation, De-identification, Speculative language, Medical Records, Swedish, Natural Language Processing, Language Technology
National Category
Information Systems, Social aspects
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-74828 (URN)978-91-7447-444-2 (ISBN)
Public defence
2012-04-27, Sal C, Forum 100, Isafjordsgatan 39, Kista, 13:00 (English)
Opponent
Supervisors
Available from: 2012-04-05 Created: 2012-03-27 Last updated: 2012-03-28Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Velupillai, SumithraDalianis, HerculesHassel, Martin
By organisation
Department of Computer and Systems Sciences
In the same journal
International Journal of Medical Informatics
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 131 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf