Semi-supervised medical entity recognition: A study on Spanish and Swedish clinical corpora
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Number of Authors: 6
2017 (English). In: Journal of Biomedical Informatics, ISSN 1532-0464, E-ISSN 1532-0480, Vol. 71, p. 16-30. Article in journal (Refereed), Published.
Abstract [en]

Objective: The goal of this study is to investigate entity recognition within Electronic Health Records (EHRs), focusing on Spanish and Swedish. Of particular importance is a robust representation of the entities. In our case, we used unsupervised methods to generate such representations.

Methods: The significance of this work rests on its experimental design. The experiments were carried out under the same conditions for both languages. Several classification approaches were explored: maximum probability, CRF, Perceptron and SVM. The classifiers were enhanced by means of ensembles of semantic spaces and ensembles of Brown trees. To mitigate data sparsity without a significant increase in the dimension of the decision space, we propose clustered representations: the hierarchical Brown clustering, represented by trees, and vector quantization for each semantic space.

Results: The results showed that the semi-supervised approaches significantly improved on standard supervised techniques for both languages. Moreover, clustering the semantic spaces contributed to the quality of the entity recognition while keeping the dimension of the feature space two orders of magnitude lower than when directly using the semantic spaces.

Conclusions: The contributions of this study are: (a) a set of thorough experiments that enable comparisons regarding the influence of different types of features on different classifiers, exploring two languages other than English; and (b) the use of ensembles of clusters of Brown trees and semantic spaces on EHRs to tackle the scarcity of available annotated data.
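The clustered features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy word vectors, the Brown-tree bit strings, the prefix lengths, and the function names (`kmeans`, `cluster_feature`, `brown_prefix_features`) are all assumptions made for illustration. The idea shown is that each word contributes one discrete cluster label instead of a full real-valued vector, and that a Brown-tree path can be cut at fixed depths to give coarser or finer classes.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def cluster_feature(word, word_to_cluster):
    """One discrete feature per word instead of a d-dimensional vector."""
    return f"cluster={word_to_cluster.get(word, 'UNK')}"


def brown_prefix_features(bitstring, lengths=(2, 4)):
    """Cut a Brown-tree path at fixed depths (prefix lengths are an assumption)."""
    return [f"brown{n}={bitstring[:n]}" for n in lengths if len(bitstring) >= n]


# Toy semantic space (made-up 3-dimensional vectors, not real embeddings).
space = {
    "fever": [0.9, 0.1, 0.0],
    "cough": [0.8, 0.2, 0.1],
    "nausea": [0.85, 0.15, 0.05],
    "aspirin": [0.1, 0.9, 0.8],
    "ibuprofen": [0.0, 1.0, 0.9],
    "paracetamol": [0.05, 0.95, 0.85],
}
words = list(space)
_, assignments = kmeans([space[w] for w in words], k=2)
word_to_cluster = dict(zip(words, assignments))

# Hypothetical Brown-tree paths; a real run of Brown clustering would supply these.
brown_paths = {"fever": "0010", "aspirin": "1101"}

for w in ("fever", "aspirin"):
    print(w, cluster_feature(w, word_to_cluster), brown_prefix_features(brown_paths[w]))
```

In a tagger such as a CRF, string features like these would sit alongside the usual lexical features. An ensemble, in the sense used in the abstract, would presumably repeat this for several semantic spaces or Brown trees and add one such feature per space or tree.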

Place, publisher, year, edition, pages
2017. Vol. 71, 16-30 p.
Keyword [en]
Medical entity recognition, Supervised and unsupervised learning, Health records
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:su:diva-147956
DOI: 10.1016/j.jbi.2017.05.009
ISI: 000409394600002
PubMedID: 28526460
OAI: oai:DiVA.org:su-147956
DiVA: diva2:1149798
Available from: 2017-10-17. Created: 2017-10-17. Last updated: 2017-10-17. Bibliographically approved.

By author/editor
Perez, Alicia; Weegar, Rebecka; Dalianis, Hercules