Large-scale regression-based pattern discovery: The example of screening the WHO global drug safety database
2010 (English). In: Statistical Analysis and Data Mining, ISSN 1932-1864, E-ISSN 1932-1872, Vol. 3, no 4, p. 197-208. Article in journal (Refereed), Published
Most measures of interestingness for patterns of co-occurring events are based on data projections onto contingency tables for the events of primary interest. As an alternative, this article presents the first implementation of shrinkage logistic regression for large-scale pattern discovery, with an evaluation of its usefulness in real-world binary transaction data. Regression accounts for the impact of other covariates that may confound or otherwise distort associations. The application considered is international adverse drug reaction (ADR) surveillance, in which large collections of reports on suspected ADRs are screened for interesting reporting patterns worthy of clinical follow-up. Our results show that regression-based pattern discovery does offer practical advantages. Specifically, it can eliminate false positives and false negatives due to other covariates. Furthermore, it identifies some established drug safety issues earlier than a measure based on contingency tables. While regression offers clear conceptual advantages, our results suggest that methods based on contingency tables will continue to play a key role in ADR surveillance, for two reasons: the failure of regression to identify some established drug safety concerns as early as the currently used measures, and the relative lack of transparency of the procedure to estimate the regression coefficients. This suggests shrinkage regression should be used in parallel to existing measures of interestingness in ADR surveillance and other large-scale pattern discovery applications.
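The contrast drawn in the abstract between a contingency-table measure and shrinkage regression can be illustrated with a small sketch. This is not the authors' implementation: the report counts are invented, the drug names A and B are hypothetical, the crude measure is a plain observed-to-expected ratio standing in for whatever measure the article uses, and the fit uses a basic proximal-gradient loop with an L1 (lasso-style) penalty. In the toy data the ADR rate depends only on drug B, but drug A is usually co-reported with B, so A is confounded.

```python
import math

# Hypothetical aggregated counts: (drug_a, drug_b) -> (reports, reports with the ADR).
# By construction the ADR rate depends only on drug B (20% vs 2%).
CELLS = {
    (0, 0): (8000, 160),
    (1, 0): (500, 10),
    (0, 1): (500, 100),
    (1, 1): (1000, 200),
}


def crude_ratio(cells, drug_index):
    """Observed-to-expected ADR count from the 2x2 projection for one drug."""
    total = sum(n for n, _ in cells.values())
    events = sum(y for _, y in cells.values())
    exposed = sum(n for key, (n, _) in cells.items() if key[drug_index] == 1)
    observed = sum(y for key, (_, y) in cells.items() if key[drug_index] == 1)
    return observed / (exposed * events / total)


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(cells, lam=0.001, step=1.0, iterations=20000):
    """L1-penalised logistic regression on grouped data (intercept unpenalised)."""
    total = sum(n for n, _ in cells.values())
    events = sum(y for _, y in cells.values())
    intercept = math.log(events / (total - events))  # start at the overall log-odds
    coefs = [0.0, 0.0]  # coefficients for drug A and drug B
    for _ in range(iterations):
        grad_intercept = 0.0
        grads = [0.0, 0.0]
        for (a, b), (n, y) in cells.items():
            z = intercept + coefs[0] * a + coefs[1] * b
            p = 1.0 / (1.0 + math.exp(-z))
            residual = (n * p - y) / total  # gradient of the mean negative log-likelihood
            grad_intercept += residual
            grads[0] += residual * a
            grads[1] += residual * b
        intercept -= step * grad_intercept
        coefs = [soft_threshold(c - step * g, step * lam) for c, g in zip(coefs, grads)]
    return intercept, coefs


if __name__ == "__main__":
    print("crude ratio, drug A:", round(crude_ratio(CELLS, 0), 2))
    print("crude ratio, drug B:", round(crude_ratio(CELLS, 1), 2))
    _, (coef_a, coef_b) = fit_lasso_logistic(CELLS)
    print("shrinkage coefficients (A, B):", coef_a, coef_b)
```

In this toy setting the crude ratio for drug A is roughly 3 (210 observed against 70.5 expected), which would flag A as interesting, whereas the penalised regression attributes the association to drug B and shrinks the coefficient for A toward zero. That is the kind of false positive due to another covariate that the abstract describes; the sketch says nothing about how the two approaches compare on real surveillance data.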
Keywords: shrinkage regression, lasso, confounding, masking, direct and indirect associations, adverse drug reaction surveillance, drug safety, pharmacovigilance
Research subject: Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-51946
DOI: 10.1002/sam.10078
OAI: oai:DiVA.org:su-51946
DiVA: diva2:386428