Rule-based Entity Recognition and Coverage of SNOMED CT in Swedish Clinical Text
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2012 (English)
In: LREC 2012 8th ELRA Conference on Language Resources and Evaluation: Proceedings, European Language Resources Association (ELRA), 2012, 1250-1257 p.
Conference paper, Published paper (Refereed)
Abstract [en]

Named entity recognition of the clinical entities disorders, findings and body structures is needed for information extraction from unstructured text in health records. Clinical notes from a Swedish emergency unit were annotated and used for evaluating a rule- and terminology-based entity recognition system. This system used different preprocessing techniques for matching terms to SNOMED CT, and four other terminologies were then added one by one. For the class body structure, the results improved with preprocessing, whereas only small improvements were shown for the classes disorder and finding. The best average results were achieved when all terminologies were used together. The entity body structure was recognised with a precision of 0.74 and a recall of 0.80, whereas lower results were achieved for disorder (precision: 0.75, recall: 0.55) and for finding (precision: 0.57, recall: 0.30). The proportion of entities containing abbreviations was higher for false negatives than for correctly recognised entities, and no entities containing more than two tokens were recognised by the system. The low recall for disorders and findings shows both that additional methods are needed for entity recognition and that many expressions in clinical text are not included in SNOMED CT.
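
The abstract describes matching clinical text against terminologies after preprocessing. As a rough illustration only, the Python sketch below performs greedy longest-match lookup of one- and two-token spans against a tiny, hypothetical lexicon, with crude stripping of the Swedish definite suffixes -en/-et standing in for preprocessing. The lexicon, the suffix list and the default `max_len=2` are assumptions made for this example; they are not the resources, preprocessing steps or matching rules used in the paper.

```python
import re
from typing import Dict, List, Tuple

# Tiny hypothetical lexicon standing in for SNOMED CT and the other terminologies:
# lower-cased term -> entity class.
LEXICON: Dict[str, str] = {
    "lunginflammation": "disorder",   # pneumonia
    "feber": "finding",               # fever
    "huvudvärk": "finding",           # headache
    "arm": "body_structure",
    "vänster arm": "body_structure",  # left arm
}

# Crude stand-in for preprocessing (assumption): strip Swedish definite suffixes.
SUFFIXES = ("en", "et")


def variants(phrase: str) -> List[str]:
    """Return the phrase itself plus forms with a definite suffix removed from the end."""
    forms = [phrase]
    for suffix in SUFFIXES:
        if phrase.endswith(suffix) and len(phrase) > len(suffix) + 2:
            forms.append(phrase[: -len(suffix)])
    return forms


def recognise(text: str, max_len: int = 2) -> List[Tuple[str, str]]:
    """Greedy longest-match lookup of token n-grams (n <= max_len) in LEXICON."""
    tokens = re.findall(r"\w+", text.lower())
    found: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        hit = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):  # try longer spans first
            phrase = " ".join(tokens[i:i + n])
            for form in variants(phrase):
                if form in LEXICON:
                    hit = (phrase, LEXICON[form], n)
                    break
            if hit:
                break
        if hit:
            found.append((hit[0], hit[1]))
            i += hit[2]  # skip past the matched span
        else:
            i += 1
    return found
```

For instance, `recognise("Fraktur i vänster arm")` returns only the body structure "vänster arm": "fraktur" (fracture) is absent from the toy lexicon and is therefore missed, a small-scale picture of how gaps in a terminology limit recall.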

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2012. 1250-1257 p.
Keyword [en]
Electronic patient records, Swedish, SNOMED CT, named entity recognition
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-82257
ISI: 000323927701056
ISBN: 978-2-9517408-7-7 (print)
OAI: oai:DiVA.org:su-82257
DiVA: diva2:567235
Conference
8th International Conference on Language Resources and Evaluation (LREC), Istanbul, Turkey, 23-25 May, 2012
Available from: 2012-11-12 Created: 2012-11-12 Last updated: 2014-11-19 Bibliographically approved
In thesis
1. From Disorder to Order: Extracting clinical findings from unstructured text
2012 (English)
Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Medical disorders and findings are examples of important information in health record text. Developing methods for automatically extracting these entities from health record text increases the possibility of making use of this information in automated, computerised processes. That a disorder or finding is mentioned in the health record does not, however, necessarily imply that it has been observed in the patient, since disorders that have been ruled out and findings that were not observed in the patient are also mentioned.

This licentiate thesis investigates the possibility of automatically extracting disorders and findings from Swedish health record text and the possibility of automatically determining whether these findings and disorders are negated or not.

A rule- and terminology-based system for extracting disorders, findings and body structures mentioned in Swedish clinical text was constructed and evaluated; the system uses several Swedish medical terminologies, including SNOMED CT and ICD-10. Moreover, NegEx, an English rule-based system for negation detection, was adapted to Swedish and evaluated on clinical text written in Swedish.
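
NegEx-style negation detection decides whether a clinical finding is negated by looking for cue phrases within a limited scope around it. The Python sketch below checks only cues that precede the finding, within an assumed five-token window that is cut off early by a termination word such as "men" (but). The Swedish cue list, the termination list and the window size are illustrative assumptions; they do not reproduce the cue lexicon of the adapted system.

```python
import re
from typing import List

# Hypothetical Swedish pre-negation cues (no/none/not/without); not the thesis's lexicon.
PRE_NEGATION = {"ingen", "inga", "inget", "inte", "ej", "utan"}
# Words that end the scope of a preceding cue (assumption).
TERMINATION = {"men"}
# Assumed scope: number of tokens to look back from the finding.
WINDOW = 5


def tokenize(text: str) -> List[str]:
    return re.findall(r"\w+", text.lower())


def is_negated(sentence: str, finding: str) -> bool:
    """True if any occurrence of `finding` has a negation cue within WINDOW preceding tokens."""
    tokens = tokenize(sentence)
    target = tokenize(finding)
    n = len(target)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] != target:
            continue
        # Walk backwards from the finding; stop at a termination word or the window edge.
        for j in range(start - 1, max(start - 1 - WINDOW, -1), -1):
            if tokens[j] in TERMINATION:
                break
            if tokens[j] in PRE_NEGATION:
                return True
    return False
```

With these rules, "Inga tecken på lunginflammation." marks "lunginflammation" as negated, whereas in "Ingen hosta men feber." the word "men" stops the scope of "Ingen", so "feber" is not negated.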

The evaluation showed that disorders and findings were recognised with low recall, whereas body structures were recognised with comparatively good results. The negation detection system that was adapted to Swedish achieved the same recall as the English system, but lower precision.

The evaluated systems are accurate enough to be useful in some applications, but need to be further developed, especially when it comes to recognising disorders and findings.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2012. 79 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 12-005
Keyword
Text mining, named entity recognition, clinical language processing
National Category
Language Technology (Computational Linguistics)
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-95967 (URN)
Available from: 2013-11-29 Created: 2013-11-07 Last updated: 2013-11-29 Bibliographically approved
2. Extracting Clinical Findings from Swedish Health Record Text
2014 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Information contained in the free text of health records is useful for the immediate care of patients as well as for the creation of medical knowledge. Advances in clinical language processing have made it possible to extract this information automatically, but until recently most research has been conducted on clinical text written in English. This thesis instead explores information extraction from Swedish clinical corpora, focusing particularly on the extraction of clinical findings. Unlike in most previous studies, the category Clinical Finding was divided into two more granular sub-categories: Finding (a symptom or the result of a medical examination) and Disorder (a condition with an underlying pathological process).

For detecting clinical findings mentioned in Swedish health record text, a machine learning model trained on a corpus of manually annotated text achieved results in line with the inter-annotator agreement figures obtained. The machine learning approach clearly outperformed an approach based on vocabulary mapping, showing that Swedish medical vocabularies are not extensive enough for high-quality information extraction from clinical text. A rule- and cue-vocabulary-based approach was, however, successful for negation and uncertainty classification of the detected clinical findings.

Methods that facilitate the expansion of medical vocabulary resources are particularly important for Swedish and other languages with less extensive vocabulary resources. The possibility of using distributional semantics, in the form of random indexing, for semi-automatic expansion of medical vocabularies was therefore evaluated. Distributional semantics does not require terms or abbreviations to be explicitly defined in the text, which makes it a method suitable for clinical corpora. Random indexing was shown to be useful for extending vocabularies with medical terms, as well as for extracting medical synonyms and abbreviation dictionaries.
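
Random indexing, in its basic form, gives each word a sparse random index vector and builds a context vector for each word by adding up the index vectors of the words around it; words used in similar contexts end up with similar context vectors, which is what makes synonym candidates retrievable by cosine similarity. The Python sketch below shows that idea on pre-tokenised sentences. The dimensionality, number of non-zero elements, window size and the toy corpus mentioned afterwards are assumptions for illustration, not the settings evaluated in the thesis.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Tuple

DIM = 1000     # assumed vector dimensionality
NONZERO = 10   # assumed number of non-zero elements per index vector
WINDOW = 2     # assumed context window: tokens on each side of the focus word


def index_vector(rng: random.Random) -> Dict[int, int]:
    """Sparse ternary index vector: NONZERO random positions set to +1 or -1."""
    positions = rng.sample(range(DIM), NONZERO)
    return {p: rng.choice((1, -1)) for p in positions}


def train(sentences: List[List[str]], seed: int = 0) -> Dict[str, List[float]]:
    """Build a context vector per word by summing the index vectors of its neighbours."""
    rng = random.Random(seed)
    index: Dict[str, Dict[int, int]] = {}
    for sent in sentences:
        for word in sent:
            if word not in index:
                index[word] = index_vector(rng)

    context: Dict[str, List[float]] = defaultdict(lambda: [0.0] * DIM)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - WINDOW), min(len(sent), i + WINDOW + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                for pos, val in index[sent[j]].items():
                    context[word][pos] += val
    return dict(context)


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(word: str, vectors: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    """Rank other words by cosine similarity of context vectors (candidate synonyms)."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]
```

In a toy corpus where "buksmärta" and "magont" (two ways of describing abdominal pain) occur in identical contexts, such as "patienten har buksmärta sedan igår" and "patienten har magont sedan igår", their context vectors coincide, so `most_similar("buksmärta", ...)` ranks "magont" first. On real clinical corpora the choice of dimensionality and window size affects which neighbours come out on top.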

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2014. 128 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 15-001
Keyword
Named entity recognition, Corpora development, Clinical text processing, Distributional semantics, Random indexing, Vocabulary expansion, Assertion classification, Clinical text mining, Electronic health records, Swedish
National Category
Information Systems, Social aspects
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-109254 (URN)
978-91-7649-054-9 (ISBN)
Public defence
2015-01-23, Lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Available from: 2014-12-29 Created: 2014-11-17 Last updated: 2014-11-21 Bibliographically approved

Other links

http://www.lrec-conf.org/proceedings/lrec2012/pdf/521_Paper.pdf

Authors
Skeppstedt, Maria; Kvist, Maria; Dalianis, Hercules
Organisation
Department of Computer and Systems Sciences
