Generalization of Malaria Incidence Prediction Models by Correcting Sample Selection Bias
2013 (English) In: Advanced Data Mining and Applications: Proceedings, Part II / [ed] Hiroshi Motoda et al., Springer Berlin/Heidelberg, 2013, 189-200 p. Conference paper (Refereed)
Performance measurements obtained by dividing a single sample into training and test sets, e.g. by cross-validation, may not give an accurate picture of how a model developed from the sample will perform on the examples to which it is actually applied. Such measurements may be misleading when the training sample and the examples encountered in application are drawn according to different distributions. In this study, two support vector machine models for predicting malaria incidence, developed from certain regions and time periods in Mozambique, are evaluated on data from novel regions and time periods, and the use of selection bias correction is investigated. Significant reductions in the predicted error are observed when the correction is applied, strongly suggesting that techniques of this kind should be employed whenever the test data can be expected to be drawn from a distribution other than that of the training data.
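The abstract does not say which correction technique was used, so the following is only a minimal sketch of one common approach to sample selection bias, importance weighting, and not the authors' method. It uses synthetic one-dimensional data and a weighted least-squares line in place of a support vector machine. All names (`importance_weights`, `fit_line`, the Gaussian data) are hypothetical and chosen purely for illustration. Each training example is reweighted by a histogram estimate of the ratio between the target and training input densities, which requires only unlabeled inputs from the target distribution.

```python
import random


def bin_index(x, lo, hi, n_bins):
    """Map x to a histogram bin, clamping values outside [lo, hi)."""
    i = int((x - lo) / (hi - lo) * n_bins)
    return min(max(i, 0), n_bins - 1)


def importance_weights(train_x, target_x, lo=-4.0, hi=5.0, n_bins=18):
    """Histogram estimate of p_target(x) / p_train(x) for each training input."""
    train_counts = [0] * n_bins
    target_counts = [0] * n_bins
    for x in train_x:
        train_counts[bin_index(x, lo, hi, n_bins)] += 1
    for x in target_x:
        target_counts[bin_index(x, lo, hi, n_bins)] += 1
    weights = []
    for x in train_x:
        b = bin_index(x, lo, hi, n_bins)
        p_train = train_counts[b] / len(train_x)   # > 0: x itself falls in bin b
        p_target = target_counts[b] / len(target_x)
        weights.append(p_target / p_train)
    return weights


def fit_line(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic stand-in data (an assumption, not the malaria data):
# the training inputs and the target inputs follow different distributions.
rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
train_y = [x * x + rng.gauss(0.0, 0.1) for x in train_x]
target_x = [rng.gauss(1.0, 1.0) for _ in range(5000)]
target_y = [x * x + rng.gauss(0.0, 0.1) for x in target_x]

# Uncorrected model: every training example counts equally.
plain_model = fit_line(train_x, train_y, [1.0] * len(train_x))

# Corrected model: training examples are reweighted towards the target inputs.
weights = importance_weights(train_x, target_x)
weighted_model = fit_line(train_x, train_y, weights)

unweighted_mse = mse(plain_model, target_x, target_y)
weighted_mse = mse(weighted_model, target_x, target_y)
print("target MSE without correction:", unweighted_mse)
print("target MSE with correction:   ", weighted_mse)
```

In this toy setting the target labels are used only to score the two models, never to fit them. The same weights could in principle be passed to any learner that accepts per-example weights.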
Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013. 189-200 p.
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8347
prediction, generalization, sample selection bias, malaria incidence
Research subject Computer and Systems Sciences
Identifiers: URN: urn:nbn:se:su:diva-97729; DOI: 10.1007/978-3-642-53917-6_17; ISBN: 978-3-642-53916-9; ISBN: 978-3-642-53917-6; OAI: oai:DiVA.org:su-97729; DiVA: diva2:679959
9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013