Detecting Healthcare-Associated Infections in Electronic Health Records: Evaluation of Machine Learning and Preprocessing Techniques
2014 (English). In: Proceedings of the 6th International Symposium on Semantic Mining in Biomedicine (SMBM 2014), University of Aveiro, 2014, 3-10 p. Conference paper (Refereed)
Healthcare-associated infections (HAI) are infections that patients acquire in the course of medical treatment. Because HAI are a severe public health problem, detecting and monitoring them in healthcare documentation is an important topic to address. Research on automated systems has increased over the past years, but performance still needs to improve. The dataset in this study consists of 214 records obtained from a Point-Prevalence Survey. The records are manually classified into HAI and NoHAI records. Nine different preprocessing steps are carried out on the data. Two learning algorithms, Random Forest (RF) and Support Vector Machines (SVM), are applied to the data. The aim is to determine which of the two algorithms is more applicable to the task and whether preprocessing methods affect performance. RF obtains the best performance results, yielding an F1-score of 85% and AUC of 0.85 when lemmatisation is used as a preprocessing technique. Irrespective of which preprocessing method is used, RF yields higher recall values than SVM, with a statistically significant difference for all but one preprocessing method. Regarding each classifier separately, the choice of preprocessing method led to no statistically significant improvement in performance results.
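The abstract reports recall and F1-score for the HAI class. As a minimal sketch of how these measures are computed from gold and predicted labels, the following uses plain Python. The function name `precision_recall_f1` and the six toy labels are assumptions made for illustration only; they are not taken from the study or its data.

```python
def precision_recall_f1(gold, predicted, positive="HAI"):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical labels for illustration only; not data from the study.
gold = ["HAI", "HAI", "HAI", "NoHAI", "NoHAI", "NoHAI"]
predicted = ["HAI", "HAI", "NoHAI", "HAI", "NoHAI", "NoHAI"]
precision, recall, f1 = precision_recall_f1(gold, predicted)
```

Recall matters here because a missed HAI record (a false negative) lowers recall but leaves precision untouched, which is why the two measures are reported alongside the combined F1-score.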
Place, publisher, year, edition, pages
University of Aveiro, 2014. 3-10 p.
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-108679
DOI: 10.5167/uzh-98982
OAI: oai:DiVA.org:su-108679
DiVA: diva2:760057
Sixth International Symposium on Semantic Mining in Biomedicine (SMBM 2014), Aveiro, Portugal, October 6-7, 2014