Velupillai, Sumithra
ORCID iD: orcid.org/0000-0002-4178-2980
Publications (10 of 34)
Mowery, D. L., South, B. R., Christensen, L., Leng, J., Peltonen, L.-M., Salantera, S., . . . Chapman, W. W. (2016). Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2. Journal of Biomedical Semantics, 7, Article ID 43.
2016 (English). In: Journal of Biomedical Semantics, E-ISSN 2041-1480, Vol. 7, article id 43. Article in journal (Refereed), Published.
Abstract [en]

Background: The ShARe/CLEF eHealth challenge lab aims to stimulate development of natural language processing and information retrieval technologies to aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a simpler lay term.
Methods: In this study, we evaluate 1) the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, 2) the performance of participants' systems for short forms with variable majority sense distributions, and 3) the accuracy of participating systems in normalizing concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms.
Results: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second best performance. For short forms with two or more senses, the accuracy of participating systems ranged from 52 to 78 % under low ambiguity (majority sense greater than 80 %), from 23 to 57 % under moderate ambiguity (majority sense between 50 and 80 %), and from 2 to 45 % under high ambiguity (majority sense less than 50 %). With respect to the ShARe test set, 69 % of short form annotations contained common concept unique identifiers with the Consumer Health Vocabulary. For these 2594 possible annotations, the performance of participating systems ranged from 50 to 75 % accuracy.
Conclusion: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
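
Illustrative sketch (not taken from the paper): a majority sense baseline of the kind mentioned in the abstract can be approximated by mapping each short form to the concept it is most often annotated with in training data. The function names, the toy annotations and the concept labels below are hypothetical, and the handling of the exact 50 % and 80 % boundaries is an assumption, since the abstract does not state which side is inclusive.

```python
from collections import Counter, defaultdict


def train_majority_sense(annotations):
    """Return {short form: most frequent concept} from (short_form, concept) pairs."""
    counts = defaultdict(Counter)
    for short_form, concept in annotations:
        counts[short_form.lower()][concept] += 1
    return {sf: senses.most_common(1)[0][0] for sf, senses in counts.items()}


def accuracy(model, annotations, fallback="UNKNOWN"):
    """Share of annotations whose predicted concept equals the gold concept."""
    hits = sum(model.get(sf.lower(), fallback) == gold for sf, gold in annotations)
    return hits / len(annotations)


def ambiguity_level(sense_counts):
    """Bucket a short form by the share of its majority sense.

    Thresholds follow the abstract (>80 % low, 50-80 % moderate, <50 % high);
    treating exactly 50 % and 80 % as moderate is an assumption.
    """
    share = max(sense_counts.values()) / sum(sense_counts.values())
    if share > 0.8:
        return "low"
    if share >= 0.5:
        return "moderate"
    return "high"


# Hypothetical toy annotations; the concept labels are placeholders, not UMLS identifiers.
train = [
    ("RA", "right_atrium"),
    ("RA", "right_atrium"),
    ("RA", "rheumatoid_arthritis"),
    ("pt", "patient"),
]
test = [
    ("RA", "right_atrium"),
    ("RA", "rheumatoid_arthritis"),
    ("PT", "patient"),
    ("bid", "twice_daily"),
]

model = train_majority_sense(train)
print(accuracy(model, test))
print(ambiguity_level(Counter(c for sf, c in train if sf == "RA")))
```

A baseline like this needs no context features, which is one plausible reason it can be hard to beat when most short forms have a dominant sense.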

Keywords
Natural language processing, Acronyms, Abbreviations, Consumer health information, Unified Medical Language System
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-132549 (URN); 10.1186/s13326-016-0084-y (DOI); 000379112500001 (ISI); 27370271 (PubMedID)
Available from: 2016-08-18; Created: 2016-08-15; Last updated: 2022-09-15; Bibliographically approved
Grigonyte, G., Kvist, M., Wirén, M., Velupillai, S. & Henriksson, A. (2016). Swedification patterns of Latin and Greek affixes in clinical text. Nordic Journal of Linguistics, 39(1), 5-37
2016 (English). In: Nordic Journal of Linguistics, ISSN 0332-5865, E-ISSN 1502-4717, Vol. 39, no 1, p. 5-37. Article in journal (Refereed), Published.
Abstract [en]

Swedish medical language is rich with Latin and Greek terminology which has undergone a Swedification since the 1980s. However, many original expressions are still used by clinical professionals. The goal of this study is to obtain precise quantitative measures of how the foreign terminology is manifested in Swedish clinical text. To this end, we explore the use of Latin and Greek affixes in Swedish medical texts in three genres: clinical text, scientific medical text and online medical information for laypersons. More specifically, we use frequency lists derived from tokenised Swedish medical corpora in the three domains, and extract word pairs belonging to types that display both the original and Swedified spellings. We describe six distinct patterns explaining the variation in the usage of Latin and Greek affixes in clinical text. The results show that to a large extent affixes in clinical text are Swedified and that prefixes are used more conservatively than suffixes.
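
Illustrative sketch (not the authors' code): the extraction step described above, finding word pairs that occur with both an original and a Swedified affix in a frequency list, could look roughly like this. The function name, the suffix pair ("itis" to "it") and the counts are assumptions made up for the example; the paper's actual affix inventory and corpora are not reproduced here.

```python
def find_spelling_pairs(freq, suffix_pairs):
    """Find words attested with both an original and a Swedified suffix.

    freq: {word: count} frequency list from a tokenised corpus.
    suffix_pairs: iterable of (original_suffix, swedified_suffix).
    Returns sorted tuples (original_word, swedified_word, count_original, count_swedified).
    """
    pairs = []
    for original, swedified in suffix_pairs:
        for word, count in freq.items():
            if word.endswith(original):
                # Swap the original suffix for its Swedified counterpart.
                variant = word[: -len(original)] + swedified
                if variant in freq:
                    pairs.append((word, variant, count, freq[variant]))
    return sorted(pairs)


# Hypothetical frequency list; the numbers are invented for illustration.
freq = {"gastritis": 3, "gastrit": 40, "bronkit": 12, "arthritis": 1}
pairs = find_spelling_pairs(freq, [("itis", "it")])
print(pairs)
```

Comparing the two counts in each tuple gives a simple per-word indication of which spelling dominates in a given genre.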

Keywords
affixes, clinical text, corpus linguistics, health records, Latin and Greek terminology
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:su:diva-129031 (URN); 10.1017/S0332586515000293 (DOI); 000374241300001 (ISI)
Available from: 2016-04-13; Created: 2016-04-13; Last updated: 2022-02-23; Bibliographically approved
Velupillai, S., Weegar, R. & Kvist, M. (2016). Temporal Annotation of Swedish Intensive Care Notes. Paper presented at AMIA 2016 Annual Symposium, Chicago, USA, November 12 - 16, 2016.
2016 (English). Conference paper, Poster (with or without abstract) (Refereed).
Abstract [en]

We describe the creation of a corpus of Swedish intensive care unit (ICU) notes annotated for temporal expressions. Clinical notes from an ICU in Stockholm, Sweden were used. The HeidelTime system was adapted to develop Swedish clinical time expression (TIMEX3) resources. Overall micro-average Inter-Annotator Agreement is high (86% F1). We have created Swedish lexical resources with clinically specific time expressions that will be useful for the development of a Swedish clinical text temporal reasoning system.
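
Illustrative sketch (an assumption, not the procedure reported in the poster): micro-averaged F1 between two annotators is commonly computed by treating one annotator as the reference, pooling true positives, false positives and false negatives over all notes, and then computing precision, recall and F1 once. The function name, the exact-match criterion on (start, end, type) and the toy spans below are hypothetical.

```python
def micro_f1(reference, response):
    """Micro-averaged F1 over documents.

    reference, response: {doc_id: set of (start, end, timex_type)}.
    A response span counts as correct only on an exact match (an assumption).
    """
    tp = fp = fn = 0
    for doc_id in set(reference) | set(response):
        ref = reference.get(doc_id, set())
        res = response.get(doc_id, set())
        tp += len(ref & res)   # spans both annotators marked
        fp += len(res - ref)   # spans only in the response
        fn += len(ref - res)   # spans only in the reference
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations from two annotators on two notes.
annotator_a = {
    "note1": {(0, 5, "DATE"), (10, 15, "TIME")},
    "note2": {(3, 8, "DURATION")},
}
annotator_b = {
    "note1": {(0, 5, "DATE")},
    "note2": {(3, 8, "DURATION"), (20, 25, "DATE")},
}
score = micro_f1(annotator_a, annotator_b)
print(round(score, 3))
```

Because F1 is symmetric in precision and recall, swapping which annotator is the reference does not change the score.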

Keywords
annotations, temporal expressions
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-136636 (URN)
Conference
AMIA 2016 Annual Symposium, Chicago, USA, November 12 - 16, 2016
Available from: 2016-12-12; Created: 2016-12-12; Last updated: 2022-02-28; Bibliographically approved
Velupillai, S., Mowery, D. L., Abdelrahman, S., Christensen, L. & Chapman, W. W. (2015). BluLab: Temporal Information Extraction for the 2015 Clinical TempEval Challenge. In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015). Paper presented at The 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, June 4-5, 2015 (pp. 815-819). Association for Computational Linguistics
2015 (English). In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Association for Computational Linguistics, 2015, p. 815-819. Conference paper, Published paper (Refereed).
Abstract [en]

The 2015 Clinical TempEval Challenge addressed the problem of temporal reasoning in the clinical domain by providing an annotated corpus of pathology and clinical notes related to colon cancer patients. The challenge consisted of six subtasks: TIMEX3 and event span detection, TIMEX3 and event attribute classification, document relation time and narrative container relation classification. Our BluLab team participated in all six subtasks. For the TIMEX3 and event subtasks, we developed a ClearTK support vector machine pipeline using mainly simple lexical features along with information from rule-based systems. For the relation subtasks, we employed a conditional random fields classification approach, with input from a rule-based system for the narrative container relation subtask. Our team ranked first for all TIMEX3 and event subtasks, as well as for the document relation subtask.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2015
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-119903 (URN)
Conference
The 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, Colorado, June 4-5, 2015
Available from: 2015-11-11; Created: 2015-08-28; Last updated: 2022-02-23; Bibliographically approved
Dalianis, H., Henriksson, A., Kvist, M., Velupillai, S. & Weegar, R. (2015). HEALTH BANK - A Workbench for Data Science Applications in Healthcare. In: Industry Track Workshop. Paper presented at CAiSE Industry Track, CAiSE-IT 2015 - co-located with 27th Conference on Advanced Information Systems Engineering, CAiSE 2015; Stockholm; Sweden; 11 June 2015 through ; Code 112715 (pp. 1-18). CEUR Workshop Proceedings, 1381
2015 (English). In: Industry Track Workshop, CEUR Workshop Proceedings, 2015, Vol. 1381, p. 1-18. Conference paper, Published paper (Refereed).
Abstract [en]

The enormous amounts of data that are generated in the healthcare process and stored in electronic health record (EHR) systems are an underutilized resource that, with the use of data science applications, can be exploited to improve healthcare. To foster the development and use of data science applications in healthcare, there is a fundamental need for access to EHR data, which is typically not readily available to researchers and developers. A relatively rare exception is the large EHR database, the Stockholm EPR Corpus, comprising data from more than two million patients, that has been made available to a limited group of researchers at Stockholm University. Here, we describe a number of data science applications that have been developed using this database, demonstrating the potential reuse of EHR data to support healthcare and public health activities, as well as facilitate medical research. However, in order to realize the full potential of this resource, it needs to be made available to a larger community of researchers, as well as to industry actors. To that end, we envision the provision of an infrastructure around this database called HEALTH BANK – the Swedish Health Record Research Bank. It will function both as a workbench for the development of data science applications and as a data exploration tool, allowing epidemiologists, pharmacologists and other medical researchers to generate and evaluate hypotheses. Aggregated data will be fed into a pipeline for open e-access, while non-aggregated data will be provided to researchers within an ethical permission framework. We believe that HEALTH BANK has the potential to promote a growing industry around the development of data science applications that will ultimately increase the efficiency and effectiveness of healthcare.

Place, publisher, year, edition, pages
CEUR Workshop Proceedings, 2015
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 1381
Keywords
electronic health record, data science, health intelligence, infrastructure, data mining, text mining, predictive modeling, clinical text, health bank, health record research
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122827 (URN)
Conference
CAiSE Industry Track, CAiSE-IT 2015 - co-located with 27th Conference on Advanced Information Systems Engineering, CAiSE 2015; Stockholm; Sweden; 11 June 2015 through ; Code 112715
Available from: 2015-11-11; Created: 2015-11-10; Last updated: 2022-02-23; Bibliographically approved
Velupillai, S., Duneld, M., Henriksson, A., Kvist, M., Skeppstedt, M. & Dalianis, H. (Eds.). (2015). Louhi 2014: Special issue on health text mining and information analysis. Paper presented at EACL 2014 Workshop - The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014. London: BioMed Central
2015 (English). Conference proceedings (editor) (Refereed).
Place, publisher, year, edition, pages
London: BioMed Central, 2015
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-119911 (URN)
Conference
EACL 2014 Workshop - The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014
Note

Special Issue: BMC Medical Informatics and Decision Making, ISSN 1472-6947, Volume 15, Supplement 2.

Available from: 2015-11-11; Created: 2015-08-28; Last updated: 2022-02-23; Bibliographically approved
Velupillai, S., Duneld, M., Henriksson, A., Kvist, M., Skeppstedt, M. & Dalianis, H. (2015). Louhi 2014: Special issue on health text mining and information analysis: introduction. Paper presented at Louhi 2014: The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014. BMC Medical Informatics and Decision Making, 2(SI), 1-3
2015 (English). In: BMC Medical Informatics and Decision Making, E-ISSN 1472-6947, Vol. 2, no SI, p. 1-3. Article in journal (Refereed), Published.
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-119912 (URN); 10.1186/1472-6947-15-S2-S1 (DOI); 000367479700001 (ISI)
Conference
Louhi 2014: The Fifth International Workshop on Health Text Mining and Information Analysis, Gothenburg, Sweden, April 27, 2014
Available from: 2015-11-11; Created: 2015-08-28; Last updated: 2022-05-10; Bibliographically approved
Velupillai, S., Mowery, D. L., South, B. R., Kvist, M. & Dalianis, H. (2015). Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis. IMIA Yearbook of Medical Informatics, 10(1), 183-193
2015 (English). In: IMIA Yearbook of Medical Informatics, ISSN 0943-4747, Vol. 10, no 1, p. 183-193. Article in journal (Refereed), Published.
Abstract [en]

Objectives

We present a review of recent advances in clinical Natural Language Processing (NLP), with a focus on semantic analysis and key subtasks that support such analysis.

Methods

We conducted a literature review of clinical NLP research from 2008 to 2014, emphasizing recent publications (2012-2014), based on PubMed and ACL proceedings as well as relevant referenced publications from the included papers.

Results

Significant articles published within this time-span were included and are discussed from the perspective of semantic analysis. Three key clinical NLP subtasks that enable such analysis were identified: 1) developing more efficient methods for corpus creation (annotation and de-identification), 2) generating building blocks for extracting meaning (morphological, syntactic, and semantic subtasks), and 3) leveraging NLP for clinical utility (NLP applications and infrastructure for clinical use cases). Finally, we provide a reflection upon most recent developments and potential areas of future NLP development and applications.

Conclusions

There has been an increase of advances within key NLP subtasks that support semantic analysis. Performance of NLP semantic analysis is, in many cases, close to that of agreement between humans. The creation and release of corpora annotated with complex semantic information models has greatly supported the development of new tools and approaches. Research on non-English languages is continuously growing. NLP methods have sometimes been successfully employed in real-world clinical tasks. However, there is still a gap between the development of advanced resources and their utilization in clinical settings. A plethora of new clinical use cases are emerging due to established health care initiatives and additional patient-generated sources through the extensive use of social media and other devices.

Keywords
Clinical Natural Language Processing, Semantics, Information Extraction, Annotation, Domain Adaptation, Review
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-122874 (URN); 10.15265/IY-2015-009 (DOI); 26293867 (PubMedID)
Available from: 2015-11-11; Created: 2015-11-11; Last updated: 2022-02-23; Bibliographically approved
Lövestam, E., Velupillai, S. & Kvist, M. (2014). Abbreviations in Swedish Clinical Text - use by three professions. In: Christian Lovis, Brigitte Séroussi, Arie Hasman, Louise Pape-Haugaard, Osman Saka, Stig Kjær Andersen (Eds.), e-Health – For Continuity of Care. Paper presented at 25th European Medical Informatics Conference (MIE), Istanbul, Turkey, August 31 - September 3, 2014 (pp. 720-724). IOS Press
2014 (English). In: e-Health – For Continuity of Care / [ed] Christian Lovis, Brigitte Séroussi, Arie Hasman, Louise Pape-Haugaard, Osman Saka, Stig Kjær Andersen, IOS Press, 2014, p. 720-724. Conference paper, Published paper (Refereed).
Abstract [en]

A list of 266 abbreviations from dieticians' notes in patient records was used to extract the same abbreviations from patient records written by three professions: dieticians, nurses and physicians. A context analysis of 40 of the abbreviations showed that ambiguous meanings were common. Abbreviations used by dieticians were found to be used by other professions, but not always with the same meaning. This ambiguity of abbreviations might cause misunderstandings and put patient safety at risk.

Place, publisher, year, edition, pages
IOS Press, 2014
Series
Studies in Health Technology and Informatics, ISSN 0926-9630, E-ISSN 1879-8365 ; 205
Keywords
Abbreviations, health records, dietetic records, ambiguity
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-110946 (URN); 10.3233/978-1-61499-432-9-720 (DOI); 000454226100141 (ISI); 25160281 (PubMedID); 978-1-61499-431-2 (ISBN); 978-1-61499-432-9 (ISBN)
Conference
25th European Medical Informatics Conference (MIE), Istanbul, Turkey, August 31 - September 3, 2014
Available from: 2014-12-19; Created: 2014-12-19; Last updated: 2022-02-23; Bibliographically approved
Velupillai, S., Skeppstedt, M., Kvist, M., Mowery, D., Chapman, B. E., Dalianis, H. & Chapman, W. W. (2014). Cue-based assertion classification for Swedish clinical text-Developing a lexicon for pyConTextSwe. Artificial Intelligence in Medicine, 61(3), 137-144
2014 (English). In: Artificial Intelligence in Medicine, ISSN 0933-3657, E-ISSN 1873-2860, Vol. 61, no 3, p. 137-144. Article in journal (Refereed), Published.
Abstract [en]

Objective: The ability of a cue-based system to accurately assert whether a disorder is affirmed, negated, or uncertain is dependent, in part, on its cue lexicon. In this paper, we continue our study of porting an assertion system (pyConTextNLP) from English to Swedish (pyConTextSwe) by creating an optimized assertion lexicon for clinical Swedish. Methods and material: We integrated cues from four external lexicons, along with generated inflections and combinations. We used subsets of a clinical corpus in Swedish. We applied four assertion classes (definite existence, probable existence, probable negated existence and definite negated existence) and two binary classes (existence yes/no and uncertainty yes/no) to pyConTextSwe. We compared pyConTextSwe's performance with and without the added cues on a development set, and improved the lexicon further after an error analysis. On a separate evaluation set, we calculated the system's final performance. Results: Following integration steps, we added 454 cues to pyConTextSwe. The optimized lexicon developed after an error analysis resulted in statistically significant improvements on the development set (83% F-score, overall). The system's final F-score on an evaluation set was 81% (overall). For the individual assertion classes, F-score results were 88% (definite existence), 81% (probable existence), 55% (probable negated existence), and 63% (definite negated existence). For the binary classifications existence yes/no and uncertainty yes/no, final system performance was 97%/87% and 78%/86% F-score, respectively. Conclusions: We have successfully ported pyConTextNLP to Swedish (pyConTextSwe). We have created an extensive and useful assertion lexicon for Swedish clinical text, which could form a valuable resource for similar studies, and which is publicly available.
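
Illustrative sketch (a toy, not the pyConTextNLP or pyConTextSwe API): the general idea of cue-based assertion classification is that a lexicon maps trigger words to assertion classes, and a disorder mention inherits the class of a nearby cue. The four class names follow the abstract; the English cues, the function name, the five-token window and the restriction to cues preceding a single-token disorder are all simplifying assumptions.

```python
# Hypothetical mini-lexicon: cue word -> assertion class (class names from the abstract).
CUES = {
    "no": "definite_negated_existence",
    "unlikely": "probable_negated_existence",
    "suspected": "probable_existence",
}
DEFAULT_CLASS = "definite_existence"


def classify_assertion(sentence, disorder, window=5):
    """Classify a single-token disorder mention by the closest preceding cue.

    Returns None if the disorder does not occur in the sentence.
    Only cues within `window` tokens before the mention are considered.
    """
    tokens = sentence.lower().replace(",", " ").replace(".", " ").split()
    target = disorder.lower()
    if target not in tokens:
        return None
    idx = tokens.index(target)
    # Scan backwards so the cue nearest to the mention wins.
    for token in reversed(tokens[max(0, idx - window):idx]):
        if token in CUES:
            return CUES[token]
    return DEFAULT_CLASS


print(classify_assertion("No signs of pneumonia.", "pneumonia"))
print(classify_assertion("Suspected pneumonia, started antibiotics.", "pneumonia"))
print(classify_assertion("Patient has pneumonia.", "pneumonia"))
```

In a sketch like this, the coverage and quality of the cue lexicon directly bound what the classifier can recognise, which is consistent with the abstract's emphasis on lexicon development.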

Keywords
Assertion classification, Clinical text mining, Dictionaries, Medical Language Processing, Information extraction, Electronic health records
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-107440 (URN); 10.1016/j.artmed.2014.01.001 (DOI); 000340233700003 (ISI)
Available from: 2014-09-17; Created: 2014-09-15; Last updated: 2022-03-23; Bibliographically approved