From Disorder to Order: Extracting clinical findings from unstructured text
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0001-6164-7762
2012 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Medical disorders and findings are examples of important information in health record text. Developing methods for automatically extracting these entities from health record text makes it easier for automated computerised processes to use the information. However, the mere mention of a disorder or finding in the health record does not imply that it has been observed in the patient, because disorders that have been ruled out and findings that were not observed are also mentioned.

This licentiate thesis investigates the possibility of automatically extracting disorders and findings from Swedish health record text and the possibility of automatically determining whether these findings and disorders are negated or not.

A rule- and terminology-based system that uses several Swedish medical terminologies, including SNOMED CT and ICD-10, for extracting disorders, findings and body structures mentioned in Swedish clinical text was constructed and evaluated. Moreover, an English rule-based system for negation detection, NegEx, was adapted to Swedish and evaluated on clinical text written in Swedish.
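
As an illustration of what terminology-based extraction can look like, the following is a minimal sketch of greedy longest-match lookup against a term list. It is not the system described in the thesis: the entries in `TERMINOLOGY` are toy examples rather than actual SNOMED CT or ICD-10 content, and the names `tokenize` and `match_terms` are hypothetical.

```python
import re

# Toy terminology (assumption: illustrative entries only, not taken from
# SNOMED CT or ICD-10). Keys are token tuples, values are entity classes.
TERMINOLOGY = {
    ("feber",): "finding",               # fever
    ("huvudvärk",): "finding",           # headache
    ("lunginflammation",): "disorder",   # pneumonia
    ("höger", "arm"): "body structure",  # right arm
}
MAX_TERM_LEN = max(len(term) for term in TERMINOLOGY)


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_terms(text):
    """Return (term, entity class) pairs using greedy longest-match lookup."""
    tokens = tokenize(text)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shorter ones.
        for n in range(min(MAX_TERM_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in TERMINOLOGY:
                matches.append((" ".join(candidate), TERMINOLOGY[candidate]))
                i += n
                break
        else:
            # No term starts at this token; move on.
            i += 1
    return matches
```

Exact lookup of this kind only finds expressions that are spelled exactly as in the term list, which is one plausible reason why such an approach can have low recall on clinical text with abbreviations and spelling variation.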

The evaluation showed that disorders and findings were recognised with low recall, whereas body structures were recognised with comparatively good results. The negation detection system that was adapted to Swedish achieved the same recall as the English system, but lower precision.

The evaluated systems are accurate enough to be useful in some applications, but need to be further developed, especially when it comes to recognising disorders and findings.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2012. 79 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 12-005
Keyword [en]
Text mining, named entity recognition, clinical language processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-95967
OAI: oai:DiVA.org:su-95967
DiVA: diva2:662603
Available from: 2013-11-29 Created: 2013-11-07 Last updated: 2013-11-29 Bibliographically approved
List of papers
1. Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text
2012 (English) In: LREC 2012 8th ELRA Conference on Language Resources and Evaluation: Proceedings, European Language Resources Association (ELRA), 2012, 1250-1257 p. Conference paper, Published paper (Refereed)
Abstract [en]

Named entity recognition of the clinical entities disorders, findings and body structures is needed for information extraction from unstructured text in health records. Clinical notes from a Swedish emergency unit were annotated and used for evaluating a rule- and terminology-based entity recognition system. This system used different preprocessing techniques for matching terms to SNOMED CT, and, one by one, four other terminologies were added. For the class body structure, the results improved with preprocessing, whereas only small improvements were shown for the classes disorder and finding. The best average results were achieved when all terminologies were used together. The entity body structure was recognised with a precision of 0.74 and a recall of 0.80, whereas lower results were achieved for disorder (precision: 0.75, recall: 0.55) and for finding (precision: 0.57, recall: 0.30). The proportion of entities containing abbreviations was higher for false negatives than for correctly recognised entities, and no entities containing more than two tokens were recognised by the system. Low recall for disorders and findings shows both that additional methods are needed for entity recognition and that there are many expressions in clinical text that are not included in SNOMED CT.
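
To compare the three entity classes on a single number, precision and recall can be combined into an F1 score (their harmonic mean). The abstract does not report F1, so anything printed by the sketch below is derived arithmetic from the stated precision and recall values, not a result from the paper. The function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as stated in the abstract above.
reported = {
    "body structure": (0.74, 0.80),
    "disorder": (0.75, 0.55),
    "finding": (0.57, 0.30),
}

for entity, (precision, recall) in reported.items():
    # Derived value (assumption: F1 is not given in the source).
    print(f"{entity}: F1 = {f1_score(precision, recall):.2f}")
```

Because the harmonic mean is pulled towards the lower of the two values, the low recall for disorders and findings dominates their combined score even where precision is comparatively high.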

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2012
Keyword
Electronic patient records, Swedish, SNOMED CT, named entity recognition
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-82257 (URN)
000323927701056 ()
978-2-9517408-7-7 (ISBN)
Conference
8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 23-25 May, 2012
Available from: 2012-11-12 Created: 2012-11-12 Last updated: 2014-11-19 Bibliographically approved
2. Retrieving disorders and findings: Results using SNOMED CT and NegEx adapted for Swedish
2011 (English) In: LOUHI 2011 Health Document Text Mining and Information Analysis 2011: Proceedings of LOUHI 2011 Third International Workshop on Health Document Text Mining and Information Analysis, Bled, Slovenia, July 6, 2011 / [ed] Øystein Nytrø, Laura Slaughter, Hans Moen, 2011, 11-17 p. Conference paper, Published paper (Other academic)
Abstract [en]

Access to reliable data from electronic health records is of high importance in several key areas in patient care, biomedical research, and education. However, many of the clinical entities are negated in the patient record text. Detecting what is a negation and what is not is therefore a key to high quality text mining. In this study we used the NegEx system adapted for Swedish to investigate negated clinical entities. We applied the system to a subset of free-text entries under a heading containing the word ‘assessment’ from the Stockholm EPR corpus, containing in total 23,171,559 tokens. Specifically, the explored entities were the SNOMED CT terms having the semantic categories ‘finding’ or ‘disorder’. The study showed that the proportion of negated clinical entities was around 9%. The results thus support that negations are abundant in clinical text and hence negation detection is vital for high quality text mining in the medical domain.

Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 744
Keyword
Negation detection, Clinical text, Electronic patient records, SNOMED CT, Swedish, Negationsdetektion, Klinisk text, Elektroniska patientjournaler, SNOMED CT, Svenska
National Category
Information Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-62354 (URN)
Conference
Third International Workshop on Health Document Text Mining and Information Analysis, Bled, Slovenia, July 6, 2011. Collocated with AIME 2011.
Available from: 2011-09-15 Created: 2011-09-15 Last updated: 2013-11-29 Bibliographically approved
3. Negation detection in Swedish clinical text: An adaption of NegEx to Swedish
2011 (English) In: Journal of Biomedical Semantics, ISSN 2041-1480, E-ISSN 2041-1480, Vol. 2, no S3, 1-12 p. Article in journal (Refereed) Published
Abstract [en]

Background: Most methods for negation detection in clinical text have been developed for English text, and there is a need for evaluating the feasibility of adapting these methods to other languages. A Swedish adaption of the English rule-based negation detection system NegEx, which detects negations through the use of trigger phrases, was therefore evaluated.

Results: The Swedish adaption of NegEx showed a precision of 75.2% and a recall of 81.9%, when evaluated on 558 manually classified sentences containing negation triggers, and a negative predictive value of 96.5% when evaluated on 342 sentences not containing negation triggers.

Conclusions: The precision was significantly lower for the Swedish adaptation than published results for the English version, but since many negated propositions were identified through a limited set of trigger phrases, it could nevertheless be concluded that the same trigger phrase approach is possible in a Swedish context, even though it needs to be further developed.

Availability: The triggers used for the evaluation of the Swedish adaption of NegEx are available at http://people.dsv.su.se/~mariask/resources/triggers.txt and can be used together with the original NegEx program for negation detection in Swedish clinical text.
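
The trigger-phrase idea can be sketched in a few lines. This is a simplified illustration, not the NegEx program or its Swedish adaption: the trigger words below are common Swedish negation words chosen for the example and are not copied from the published trigger list, the window size is an assumption, and the function name `is_negated` is hypothetical. Only triggers that precede the entity are handled.

```python
import re

# Illustrative pre-negation triggers (assumption: not taken from triggers.txt).
# ingen/inga = "no", inte/ej = "not", utan = "without".
PRE_NEGATION_TRIGGERS = {"ingen", "inga", "inte", "ej", "utan"}

# Assumed scope: how many tokens after a trigger may be negated by it.
WINDOW = 5


def is_negated(sentence, entity):
    """Return True if a trigger occurs within WINDOW tokens before the entity."""
    tokens = re.findall(r"\w+", sentence.lower())
    entity_tokens = entity.lower().split()
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != entity_tokens:
            continue
        # Look only at the tokens immediately preceding this occurrence.
        preceding = tokens[max(0, start - WINDOW):start]
        if any(token in PRE_NEGATION_TRIGGERS for token in preceding):
            return True
    return False
```

For example, `is_negated("Patienten har ingen feber.", "feber")` returns `True`, while the same call on "Patienten har feber." returns `False`. The sketch has no notion of words that end the scope of a negation, so in a sentence such as "Ingen feber men huvudvärk" ("No fever but headache") it would also mark "huvudvärk" as negated. Over-generation of this kind lowers precision.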

Keyword
Negation detection, NLP, Medical informatics
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-62353 (URN)
10.1186/2041-1480-2-S3-S3 (DOI)
Conference
Second Louhi Workshop on Text and Data Mining of Health Documents, Los Angeles, CA, USA, 05 June 2010
Available from: 2011-09-15 Created: 2011-09-15 Last updated: 2017-12-08 Bibliographically approved
4. Creating and Evaluating a Consensus for Negated and Speculative Words in a Swedish Clinical Corpus
2010 (English) In: Proceedings of the Workshop on Negation and Speculation in Natural Language Processing (NeSp-NLP 2010) / [ed] Roser Morante, Caroline Sporleder, Antwerp: University of Antwerp, 2010, 5-13 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we describe the creation of a consensus corpus that was obtained by combining three individual annotations of the same clinical corpus in Swedish. We used a few basic rules that were executed automatically to create the consensus. The corpus contains negation words, speculative words, uncertain expressions and certain expressions. We evaluated the consensus by using it for negation and speculation cue detection. For training and detection we used Stanford NER, which is based on the machine learning algorithm Conditional Random Fields. For comparison, we also trained Stanford NER on the clinical part of the BioScope Corpus. For our clinical consensus corpus in Swedish we obtained a precision of 87.9 percent and a recall of 91.7 percent for negation cues, and for English with the BioScope Corpus we obtained a precision of 97.6 percent and a recall of 96.7 percent for negation cues.
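
The abstract does not state which rules were used to build the consensus. One simple rule for combining three annotations is a per-token majority vote, sketched below as an assumption rather than as the authors' procedure. The label names (`NEG`, `SPEC`, `O`) and the function `consensus_labels` are hypothetical.

```python
from collections import Counter


def consensus_labels(annotations, fallback="O"):
    """Merge label sequences from several annotators by strict majority vote.

    annotations: list of equal-length label sequences, one per annotator.
    fallback: label used when no label has a strict majority (assumption).
    """
    if len({len(sequence) for sequence in annotations}) != 1:
        raise ValueError("all annotators must label the same tokens")
    merged = []
    for labels in zip(*annotations):
        label, count = Counter(labels).most_common(1)[0]
        # Keep the label only if more than half of the annotators agree.
        merged.append(label if count > len(labels) / 2 else fallback)
    return merged
```

With three annotators, a label is kept when at least two of them agree. When all three disagree, the sketch falls back to the unmarked label `O`. A real consensus procedure might instead flag such tokens for manual review.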

Place, publisher, year, edition, pages
Antwerp: University of Antwerp, 2010
National Category
Information Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-51878 (URN)
9789057282669 (ISBN)
Conference
Negation and Speculation in Natural Language Processing, NeSp-NLP 2010 Workshop, Uppsala, Sweden
Available from: 2011-01-12 Created: 2011-01-12 Last updated: 2013-11-29 Bibliographically approved

By author/editor
Skeppstedt, Maria