Vocabulary Expansion by Semantic Extraction of Medical Terms
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2013 (English)
In: Proceedings of the 5th International Symposium on Languages in Biology and Medicine, 2013, 63-68 p.
Conference paper, Published paper (Refereed)
Abstract [en]

Automatic methods for vocabulary expansion are valuable in supporting the development of terminological resources. Here, we evaluate two methods based on distributional semantics for extracting terms that belong to a certain semantic category. In a list of 1000 terms extracted from a corpus of Swedish medical text, the best method obtains a recall of 0.53 and 0.88, respectively, for identifying 90 terms that are known to belong to the semantic categories Medical Finding and Pharmaceutical Drug.

Place, publisher, year, edition, pages
2013. 63-68 p.
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-98599
ISBN: 978-4-9907802-0-3 (print)
OAI: oai:DiVA.org:su-98599
DiVA: diva2:684530
Conference
The 5th International Symposium on Languages in Biology and Medicine (LBM 2013), Tokyo, Japan, 12 - 13 December, 2013
Available from: 2014-01-08 Created: 2014-01-08 Last updated: 2014-11-19 Bibliographically approved
In thesis
1. Extracting Clinical Findings from Swedish Health Record Text
2014 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Information contained in the free text of health records is useful for the immediate care of patients as well as for medical knowledge creation. Advances in clinical language processing have made it possible to automatically extract this information, but most research has, until recently, been conducted on clinical text written in English. In this thesis, however, information extraction from Swedish clinical corpora is explored, particularly focusing on the extraction of clinical findings. Unlike most previous studies, Clinical Finding was divided into the two more granular sub-categories Finding (symptom/result of a medical examination) and Disorder (condition with an underlying pathological process). For detecting clinical findings mentioned in Swedish health record text, a machine learning model, trained on a corpus of manually annotated text, achieved results in line with the obtained inter-annotator agreement figures. The machine learning approach clearly outperformed an approach based on vocabulary mapping, showing that Swedish medical vocabularies are not extensive enough for the purpose of high-quality information extraction from clinical text. A rule and cue vocabulary-based approach was, however, successful for negation and uncertainty classification of detected clinical findings. Methods for facilitating expansion of medical vocabulary resources are particularly important for Swedish and other languages with less extensive vocabulary resources. The possibility of using distributional semantics, in the form of Random indexing, for semi-automatic vocabulary expansion of medical vocabularies was, therefore, evaluated. Distributional semantics does not require that terms or abbreviations are explicitly defined in the text, and it is, thereby, a method suitable for clinical corpora. Random indexing was shown useful for extending vocabularies with medical terms, as well as for extracting medical synonyms and abbreviation dictionaries.

Place, publisher, year, edition, pages
Stockholm University: Department of Computer and Systems Sciences, Stockholm University, 2014. 128 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 15-001
Keyword
Named entity recognition, Corpora development, Clinical text processing, Distributional semantics, Random indexing, Vocabulary expansion, Assertion classification, Clinical text mining, Electronic health records, Swedish
National Category
Information Systems, Social aspects
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-109254
ISBN: 978-91-7649-054-9
Public defence
2015-01-23, Lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Available from: 2014-12-29 Created: 2014-11-17 Last updated: 2014-11-21 Bibliographically approved

Open Access in DiVA

No full text

Other links

http://people.dsv.su.se/~mariask/publications/lbm2013.pdf

Search in DiVA

By author/editor
Skeppstedt, Maria
Henriksson, Aron
By organisation
Department of Computer and Systems Sciences
Information Systems
