Mining Mozambique Health Data: The Case of Malaria: From Bayesian Incidence Risk to Incidence Case Predictions
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Eduardo Mondlane University (UEM). (Department of Computer and Systems Sciences (DSV))
2015 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The health sector in Mozambique holds large volumes of data, including records of major public health diseases such as malaria and cholera. Scrutinizing such a mass of health data for useful information is challenging but essential for health authorities and professionals. Statistical learning and inferential approaches, with the analysis performed using Bayesian predictive techniques and data mining, can provide health decision makers with appropriate tools for disease diagnosis and assessment. The purpose of this thesis is to investigate how predictive data mining and Bayesian regression methods can be used effectively to extract useful knowledge from reported malaria health data in order to support decision making and management.

In summary, effective Bayesian predictive methods based on spatial and space-time reported cases of malaria have been derived, allowing the extraction of the main risk factors for malaria. Predictive models that combine consecutive temporal connections within the analysis of the space-time variations of the disease have been found to be relevant when the explicit modeling of seasonality is not required or is even unfeasible.

The most effective ways to derive numerical predictive models were investigated using several regression methods. The conclusion is that effective numerical prediction of new cases of the disease can be achieved by training support vector machines using a time-window approach, in which different training sets are chosen based on a number of years and the time span leading up to the test set is progressively reduced. The best performance is obtained for a smaller time window. Another contribution of this thesis is the determination of the importance of predictors in the prediction of the incidence of malaria, performed by adopting the permutation accuracy strategy (from the random forests method) on the test set. An additional contribution is a significant reduction in the predictive error, obtained by employing a sample selection bias correction strategy when testing the predictive models in regions other than those where they were initially developed.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2015. 93 p.
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 15-020
National Category
Public Health, Global Health, Social Medicine and Epidemiology
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-122672; ISBN: 978-91-7649-304-5 (print); OAI: oai:DiVA.org:su-122672; DiVA: diva2:867936
Public defence
2015-12-16, Aula NOD, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Opponent
Supervisors
Available from: 2015-11-24 Created: 2015-11-08 Last updated: 2015-12-14 Bibliographically approved
List of papers
1. Mapping malaria incidence distribution that accounts for environmental factors in Maputo Province - Mozambique
2010 (English) In: Malaria Journal, ISSN 1475-2875, E-ISSN 1475-2875, Vol. 9, 79. Article in journal (Refereed) Published
Abstract [en]

Background: The objective was to study whether an association exists between the incidence of malaria and some weather parameters in tropical Maputo province, Mozambique. Methods: A Bayesian hierarchical model was formulated for malaria count data aggregated at district level over a two-year period. This model made it possible to account for spatial area variations. The model was extended to include the environmental covariates temperature and rainfall. The study period was then divided into two climate conditions, rainy and dry seasons, and the incidences of malaria in the two seasons were compared. Parameter estimation and inference were carried out using MCMC simulation techniques based on Poisson variation. Model comparisons were made using DIC. Results: For the winter season in 2001, the temperature covariate, with an estimated value of -8.88, shows no association with malaria incidence. In 2002, the estimate for the same covariate was 5.498, indicating a positive association. In both years the rainfall covariate shows no association with malaria incidence. Malaria transmission is higher in the wet season, with both covariates positively related to malaria, with posterior means 1.99 and 2.83 in 2001. For 2002 only temperature is associated with malaria incidence, with an estimated value of 2.23. Conclusions: The incidence of malaria in 2001 presents an independent spatial pattern for temperature in the summer season and for rainfall in the winter season. In 2002 temperature determines the spatial pattern of malaria incidence in the region. Temperature influences the model in cases where both covariates are introduced in the winter and summer seasons. Its influence extends to the summer model with the temperature covariate only. It is reasonable to state that, with the occurrence of high temperatures, malaria incidence escalated in this year.

National Category
Computer and Information Science
Research subject
Statistics; Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-49239 (URN); 10.1186/1475-2875-9-79 (DOI); 000276657300001 ()
Available from: 2010-12-13 Created: 2010-12-13 Last updated: 2017-12-11 Bibliographically approved
2. Spatial and temporal patterns of malaria incidence in Mozambique
2011 (English) In: Malaria Journal, ISSN 1475-2875, E-ISSN 1475-2875, Vol. 10, 189. Article in journal (Refereed) Published
Abstract [en]

Background: The objective of this study is to analyze the spatial and temporal patterns of malaria incidence so as to determine how climatic factors such as temperature, rainfall and humidity affect its distribution in Maputo province, Mozambique. Methods: This study presents a model of malaria that evolves in space and time in Maputo province, Mozambique, over a ten-year period (1999-2008). The model incorporates malaria cases and their relation to environmental variables. Because the climatic data are incomplete, a multiple imputation technique is employed. Additionally, the environmental variables are interpolated over the whole province through a Gaussian process. This method overcomes the misalignment between environmental variables (available at meteorological station points) and malaria cases (available as aggregates for every district area). Markov Chain Monte Carlo (MCMC) methods are used to obtain posterior inference, and the Deviance Information Criterion (DIC) is used to perform model comparison. Results: A Bayesian model with interaction terms was found to be the best-fitting model. Malaria incidence was associated with humidity and maximum temperature. Malaria risk increased with maximum temperature over 28 degrees C (relative risk (RR) of 0.0060 and 95% Bayesian credible interval (CI) of 0.00033-0.0095) and with humidity (RR of 0.00741 and 95% Bayesian CI 0.005141-0.0093). The results suggest that additional non-climatic factors, including socio-economic status and elevation, also influence malaria transmission in Mozambique. Conclusions: These results demonstrate the potential of climate predictors, particularly humidity and maximum temperature, in explaining malaria incidence risk for the studied period in Maputo province. Smoothed maps obtained as monthly averages of malaria incidence made it possible to visualize the months of initial and peak transmission. They also illustrate a variation in malaria incidence risk that might not be related to climatic factors. However, these factors are still determinant for malaria transmission and intensity in the region.

National Category
Computer and Information Science Mathematics
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-66560 (URN); 10.1186/1475-2875-10-189 (DOI); 000294260800001 ()
Available from: 2011-12-22 Created: 2011-12-20 Last updated: 2017-12-08 Bibliographically approved
3. Comparison of infant malaria incidence in districts of Maputo province, Mozambique
2011 (English) In: Malaria Journal, ISSN 1475-2875, E-ISSN 1475-2875, Vol. 10, 93. Article in journal (Refereed) Published
Abstract [en]

Background: Malaria is one of the principal health problems in Mozambique, representing 48% of total external consultations and 63% of paediatric hospital admissions in rural and general hospitals, with 26.7% of total mortality. Plasmodium falciparum is responsible for 90% of all infections and is also the species associated with the most severe cases. The aim of this study was to identify zones of high malaria risk and to show their spatial and temporal patterns. Methods: A space-time Poisson model for the analysis of malaria data is proposed. This model allows for the inclusion of the environmental factors rainfall, temperature and humidity as predictor variables. Modelling and inference use the fully Bayesian approach via Markov Chain Monte Carlo (MCMC) simulation techniques. The methodology is applied to analyse paediatric data arising from districts of Maputo province, Mozambique, between 2007 and 2008. Results: Malaria incidence risk is greater for children in the districts of Manhica, Matola and Magude. Rainfall and humidity are significant predictors of malaria incidence. The risk increased with rainfall (relative risk - RR: .006761, 95% interval: .001874, .01304) and humidity (RR: .049, 95% interval: .03048, .06531). Malaria incidence was found to be independent of temperature. Conclusions: The model revealed a spatial and temporal pattern of malaria incidence. These patterns were found to exhibit stable malaria transmission in most non-coastal districts. The findings may be useful for malaria control, planning and management.

National Category
Computer Science
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-68805 (URN); 10.1186/1475-2875-10-93 (DOI); 000290862800001 ()
Available from: 2012-01-09 Created: 2012-01-07 Last updated: 2017-12-08 Bibliographically approved
4. Predicting the Incidence of Malaria Cases in Mozambique Using Regression Trees and Forests
2013 (English) In: International Journal of Computer Science and Electronics Engineering (IJCSEE), ISSN 2320-401X, Vol. 1, no 1, 50-54 p. Article in journal (Refereed) Published
Abstract [en]

Malaria remains a significant public health concern in Mozambique, with disease cases reported in almost every province. This study investigates models for predicting the number of malaria cases in districts of Maputo province. The data used include administrative districts, malaria cases, indoor residual spray and the climatic variables temperature, rainfall and humidity. Regression tree and random forest models were developed using the statistical tool R and applied to predict the number of malaria cases during one year, based on observations from preceding years. Models were compared with respect to the mean squared error (MSE) and the correlation coefficient. Indoor Residual Spray (IRS), the month of January, minimal temperature and rainfall were found to be the most important factors when predicting the number of malaria cases, with some districts showing high malaria incidence. Additionally, reducing the time window of historical data taken into account can increase predictive performance substantially.

Keyword
malaria, regression trees, regression forests
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-86341 (URN)
Available from: 2013-01-12 Created: 2013-01-12 Last updated: 2015-11-09 Bibliographically approved
5. Strengthening the Health Information System in Mozambique through Malaria Incidence Prediction
2013 (English) In: IST-Africa 2013 Conference Proceedings / [ed] Paul Cunningham, Miriam Cunningham, IEEE Computer Society, 2013, 1-7 p. Conference paper, Published paper (Refereed)
Abstract [en]

Malaria is one of the principal health problems in Mozambique, affecting mostly children. Accurate prediction of future incidence cases is crucial for implementing appropriate policies of intervention and disease control in order to strengthen the health system. We propose a model based on support vector machines (SVM) for predicting yearly malaria incidence cases for children 0-4 years of age in Maputo province, Mozambique. The predictive model is trained on two years of historical malaria data in combination with climatic and malaria control factors. A grid-search parameter tuning procedure was first employed to find the best parameters and select the kernel. In order to determine the most influential factors, variable importance was calculated by estimating the impact of permuting feature values on the predictive performance. The most important malaria incidence predictors turned out to be temperature variation, followed by Matutuine (district), April (month) and Namaacha (district).
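
The permutation-based variable importance described in this abstract can be illustrated with a minimal sketch. It assumes a generic `predict` function and a toy test set invented for illustration; it is not the paper's SVM model or data, and names such as `permutation_importance` and `toy_model` are hypothetical.

```python
import random


def mse(predict, X, y):
    """Mean squared error of `predict` over rows X with targets y."""
    return sum((predict(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in test-set MSE when each feature column is shuffled.

    A larger increase suggests the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving the other columns untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(predict, X_perm, y) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy test set (invented): the target depends on feature 0 only.
X_test = [[float(i), float(i % 2)] for i in range(6)]
y_test = [2.0 * row[0] for row in X_test]

# Hypothetical stand-in for a trained model; it ignores feature 1.
toy_model = lambda row: 2.0 * row[0]

importance = permutation_importance(toy_model, X_test, y_test)
```

In this toy setting, shuffling feature 1 leaves the error unchanged, so its importance is zero, while shuffling feature 0 raises the error. Because the procedure only needs predictions, it can in principle be applied to any trained regressor.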

Place, publisher, year, edition, pages
IEEE Computer Society, 2013
Keyword
Prediction, Malaria Incidence, Support Vector Regression, Data Mining
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-97750 (URN); 978-1-905824-38-0 (ISBN)
Conference
IST-Africa 2013, 29 - 31 May, Nairobi, Kenya
Available from: 2013-12-17 Created: 2013-12-17 Last updated: 2015-11-09 Bibliographically approved
6. Comparing Support Vector Regression and Random Forests for Predicting Malaria Incidence in Mozambique
2013 (English) In: 2013 International Conference on Advances in ICT for Emerging Regions (ICTer), IEEE Computer Society, 2013, 217-221 p. Conference paper, Published paper (Refereed)
Abstract [en]

Accurate prediction of malaria incidence is essential for the management of several activities in the ministry of health in Mozambique. This study compares support vector machines (SVMs) and random forests (RFs) for this purpose. A dataset with records of malaria cases covering the period 1999-2008 was used to evaluate predictive models on the last year when developed from one up to nine years of historical data. Mean squared error (MSE) was used as the performance metric. The scheme for estimating variable importance commonly employed for RFs was also adopted for SVMs. SVMs developed from two years of historical data obtained the best prediction accuracy. Hence, if we are interested in predicting the actual number of malaria cases, the support vector machine model should be chosen. In the analysis of variable importance, Indoor Residual Spray (IRS), the districts of Manhiça and Matola and the month of January turned out to be the most important predictors in both the SVM and RF models.
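
The evaluation scheme in this abstract, training on the k most recent years before a held-out test year and comparing MSE across k, can be sketched as follows. The record layout, the toy numbers and the `fit_monthly_mean` learner are assumptions made for illustration; the learner is a simple stand-in and not an SVM or random forest.

```python
from statistics import mean


def time_window_mse(records, test_year, max_window, fit):
    """Return {window_length: MSE on test_year} for training windows of
    1..max_window years that end immediately before test_year."""
    test = [r for r in records if r["year"] == test_year]
    results = {}
    for k in range(1, max_window + 1):
        # Keep only the k most recent years preceding the test year.
        train = [r for r in records if test_year - k <= r["year"] < test_year]
        predict = fit(train)
        results[k] = mean((predict(r) - r["cases"]) ** 2 for r in test)
    return results


def fit_monthly_mean(train):
    """Stand-in learner (NOT an SVM): predicts the mean number of training
    cases observed in the same calendar month."""
    by_month = {}
    for r in train:
        by_month.setdefault(r["month"], []).append(r["cases"])
    overall = mean(r["cases"] for r in train)

    def predict(r):
        months = by_month.get(r["month"])
        return mean(months) if months else overall

    return predict


# Toy data (invented): two months per year, case counts drifting upward.
records = [
    {"year": y, "month": m, "cases": base + 10 * (y - 2005)}
    for y in range(2005, 2009)
    for m, base in ((1, 100), (7, 40))
]

mse_by_window = time_window_mse(
    records, test_year=2008, max_window=3, fit=fit_monthly_mean
)
# mse_by_window maps window length to MSE: {1: 100, 2: 225, 3: 400}
```

With the drifting toy data, the shortest window gives the lowest error because older years are less representative of the test year. This only illustrates why a smaller window can help and says nothing about the magnitude of the effect on the real data.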

Place, publisher, year, edition, pages
IEEE Computer Society, 2013
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-97714 (URN); 10.1109/ICTer.2013.6761181 (DOI); 978-1-4799-1274-2 (ISBN)
Conference
2013 International Conference on Advances in ICT for Emerging Regions (ICTer), 11-15 December 2013, Colombo (Sri Lanka)
Available from: 2013-12-17 Created: 2013-12-17 Last updated: 2015-11-09 Bibliographically approved
7. Generalization of Malaria Incidence Prediction Models by Correcting Sample Selection Bias
2013 (English) In: Advanced Data Mining and Applications: Proceedings, Part II / [ed] Hiroshi Motoda et al., Springer Berlin/Heidelberg, 2013, 189-200 p. Conference paper, Published paper (Refereed)
Abstract [en]

Performance measurements obtained by dividing a single sample into training and test sets, e.g. by employing cross-validation, may not give an accurate picture of how a model developed from the sample will perform on the set of examples to which it will be applied. Such measurements may be misleading when the training sample and the examples the model is applied to are drawn according to different distributions. In this study, two support vector machine models for predicting malaria incidence, developed from certain regions and time periods in Mozambique, are evaluated on data from novel regions and time periods, and the use of selection bias correction is investigated. It is observed that significant reductions in the prediction error can be obtained using the latter approach, strongly suggesting that techniques of this kind should be employed if test data can be expected to be drawn from a distribution other than that of the training data.
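
The abstract does not say which correction technique was used. As one common illustration of the general idea, the sketch below applies importance weighting: each training example is reweighted by the ratio of target-sample to training-sample frequency of a discrete covariate. The covariate, the numbers and the function names are invented for illustration and should not be read as the paper's method.

```python
from collections import Counter


def importance_weights(train_groups, target_groups):
    """Weight w(g) = P_target(g) / P_train(g) for a discrete covariate g,
    returned once per training example."""
    p_train = Counter(train_groups)
    p_target = Counter(target_groups)
    n_train, n_target = len(train_groups), len(target_groups)
    return [
        (p_target[g] / n_target) / (p_train[g] / n_train)
        for g in train_groups
    ]


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Toy data (invented): the training sample over-represents "wet" records,
# whereas the target region is mostly "dry".
train_groups = ["wet", "wet", "wet", "dry"]
train_cases = [90.0, 110.0, 100.0, 20.0]
target_groups = ["wet", "dry", "dry", "dry"]

weights = importance_weights(train_groups, target_groups)
naive_estimate = sum(train_cases) / len(train_cases)
corrected_estimate = weighted_mean(train_cases, weights)
```

Here the unweighted training mean is 80 cases, while the reweighted estimate is about 40, which matches what the target mix of one "wet" and three "dry" records would imply. The same weights could be passed to a learner that accepts per-example weights, under the assumption that the covariate captures how the two samples differ.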

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2013
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8347
Keyword
prediction, generalization, sample selection bias, malaria incidence
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-97729 (URN); 10.1007/978-3-642-53917-6_17 (DOI); 978-3-642-53916-9 (ISBN); 978-3-642-53917-6 (ISBN)
Conference
9th International Conference, ADMA 2013, Hangzhou, China, December 14-16, 2013
Available from: 2013-12-17 Created: 2013-12-17 Last updated: 2015-11-09 Bibliographically approved

Open Access in DiVA

Mining Mozambique Health Data (2515 kB), 243 downloads
File information
File name: FULLTEXT01.pdf; File size: 2515 kB; Checksum (SHA-512):
0b26595ead1a4926b4f0702a15e4057bb60979eee3457154fbb09e06c232a4286a4ef6577fabca94eb11f5188917693a7cfea5dd10005ce84ba202f3a06faa9b
Type: fulltext; Mimetype: application/pdf

Search in DiVA

By author/editor
Zacarias, Orlando P.
By organisation
Department of Computer and Systems Sciences
Public Health, Global Health, Social Medicine and Epidemiology
