Cue-based assertion classification for Swedish clinical text - Developing a lexicon for pyConTextSwe
2014 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 61, no. 3, p. 137-144. Article in journal (Refereed). Published
Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain depends, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish.
Methods and material: We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance.
Results: Following the integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-score on the evaluation set was 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively.
Conclusions: We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
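As an illustration of the class scheme and the metric reported above, the following is a minimal sketch, not the authors' implementation and not the pyConTextNLP API. It assumes that the two binary classes are obtained by collapsing the four assertion classes (negated classes give existence "no", probable classes give uncertainty "yes") and that "F-score" means the balanced F1 measure; the abstract states neither explicitly. The names `to_binary` and `f_score` are hypothetical.

```python
# Hypothetical label strings for the four assertion classes named in the abstract.
CLASSES = (
    "definite_existence",
    "probable_existence",
    "probable_negated_existence",
    "definite_negated_existence",
)


def to_binary(label):
    """Collapse a four-way label into (existence, uncertainty) yes/no flags.

    Assumed mapping (not stated in the abstract): 'negated' decides
    existence, 'probable' decides uncertainty.
    """
    existence = "no" if "negated" in label else "yes"
    uncertainty = "yes" if label.startswith("probable") else "no"
    return existence, uncertainty


def f_score(gold, predicted, target):
    """Balanced F-score (F1, assumed) for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a per-class score such as the one reported for "probable negated existence" would be computed by calling `f_score` with that class as `target`, and the binary scores by first mapping gold and predicted labels through `to_binary`.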
Keywords: Assertion classification, Clinical text mining, Dictionaries, Medical Language Processing, Information extraction, Electronic health records
Research subject: Computer and Systems Sciences
Identifiers: URN: urn:nbn:se:su:diva-107440; DOI: 10.1016/j.artmed.2014.01.001; ISI: 000340233700003; OAI: oai:DiVA.org:su-107440; DiVA: diva2:747720