Corpus-Driven Terminology Development: Populating Swedish SNOMED CT with Synonyms Extracted from Electronic Health Records
2013 (English). In: Proceedings of the 2013 Workshop on Biomedical Natural Language Processing (BioNLP 2013), Association for Computational Linguistics, 2013, 36-44 p. Conference paper (Refereed)
The various ways in which one can refer to the same clinical concept need to be accounted for in a semantic resource such as SNOMED CT. Developing terminological resources manually is, however, prohibitively expensive and likely to result in low coverage, especially given the high variability of language use in clinical text. To support this process, distributional methods can be employed in conjunction with a large corpus of electronic health records to extract synonym candidates for clinical terms. In this paper, we exemplify the potential of our proposed method using the Swedish version of SNOMED CT, which currently lacks synonyms. A medical expert inspects two thousand term pairs generated by two semantic spaces -- one of which models multiword terms in addition to single words -- for one hundred preferred terms of the semantic types disorder and finding.
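As a rough illustration of the distributional idea in the abstract, the following Python sketch ranks synonym candidates by comparing the contexts in which terms occur. It is a minimal sketch under stated assumptions, not the paper's method: the abstract does not say how its semantic spaces are built, so plain co-occurrence counts with cosine similarity are swapped in here, and the function names, the window size, and the toy English corpus are all invented for illustration. Multiword terms could be handled in this sketch by joining them into single tokens before counting.

```python
import math
from collections import Counter, defaultdict


def build_context_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def synonym_candidates(term, vectors, top_n=10):
    """Return up to `top_n` (token, similarity) pairs, most similar first."""
    if term not in vectors:
        return []
    scores = [
        (other, cosine(vectors[term], vectors[other]))
        for other in vectors
        if other != term
    ]
    # Sort by descending similarity, breaking ties alphabetically.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


# Invented toy corpus (assumption): real input would be tokenized clinical notes.
corpus = [
    "patient reports severe headache since morning".split(),
    "patient reports severe cephalalgia since morning".split(),
    "patient denies chest pain today".split(),
]
vectors = build_context_vectors(corpus)
candidates = synonym_candidates("headache", vectors, top_n=3)
```

In this toy corpus, "headache" and "cephalalgia" occur in identical contexts, so "cephalalgia" ranks first for "headache". With real clinical text, a ranked list of this kind would only be a set of candidates for an expert to review, as the abstract describes.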
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2013. 36-44 p.
Keywords: electronic health records, distributional semantics, terminologies
Research subject: Computer and Systems Sciences
Identifiers: URN: urn:nbn:se:su:diva-95585; ISBN: 978-1-937284-54-1; OAI: oai:DiVA.org:su-95585; DiVA: diva2:660893
2013 Workshop on Biomedical Natural Language Processing (BioNLP 2013), Sofia, Bulgaria, August 4-9, 2013