Representing descriptors derived from multiple conformations as uncertain features for machine learning
2013 (English). In: Journal of Molecular Modeling, ISSN 1610-2940, E-ISSN 0948-5023, Vol. 19, no. 6, pp. 2679-2685. Article in journal (Refereed), Published
Uncertainty was introduced into the chemical descriptors of 11 datasets through conformational analysis. The aim was to incorporate three-dimensional information and to investigate how this affects the predictive performance of a state-of-the-art machine learning method, random forests, on binary classification tasks. A number of strategies for handling uncertainty in random forests were evaluated. The study showed that, when three-dimensional information is incorporated as uncertainty into chemical descriptors, using uniform probability distributions over the range of possible values, in conjunction with fractional distribution of compounds, clearly outperforms using normal distributions, as well as sampling from either normal or uniform distributions. The main conclusion of this study is that, even when distributions of uncertain values are provided, random forest models that are almost as accurate can be generated from the expected values of these distributions alone. Hence, when using random forests, there seems to be little advantage in the more elaborate methods of incorporating uncertainty into chemical descriptors over simply replacing the distributions with single-point values. The results also show that random forest models with similar performance can be generated using three-dimensional descriptor information derived from single (lowest-energy or Corina-derived) conformations.
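The strategies compared in the abstract can be illustrated with a small sketch. It rests on several assumptions. Each compound's descriptor is computed for several conformations and summarized either as a uniform distribution over the observed range or as a normal distribution fitted to those values. At a tree node testing `X <= t`, "fractional distribution" is taken to mean that a compound is sent down both branches with weights equal to the probability mass on each side. The abstract does not describe the authors' implementation. All names here (`ConformerDescriptor`, `fractional_split`, `point_value`, etc.) are hypothetical, and weighted Gini impurity as the split criterion is an assumption.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ConformerDescriptor:
    """Hypothetical container: one descriptor of one compound, computed per conformation."""
    values: list[float]

    def uniform_params(self):
        # Assumed uniform model: the range of values observed over the conformations
        return min(self.values), max(self.values)

    def normal_params(self):
        # Assumed normal model: sample mean and population standard deviation
        mu = sum(self.values) / len(self.values)
        var = sum((v - mu) ** 2 for v in self.values) / len(self.values)
        return mu, math.sqrt(var)


def p_left_uniform(lo, hi, t):
    """P(X <= t) for X ~ Uniform(lo, hi); a zero-width range acts as a point value."""
    if hi == lo:
        return 1.0 if lo <= t else 0.0
    return min(1.0, max(0.0, (t - lo) / (hi - lo)))


def p_left_normal(mu, sigma, t):
    """P(X <= t) for X ~ Normal(mu, sigma), via the error function."""
    if sigma == 0:
        return 1.0 if mu <= t else 0.0
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))


def fractional_split(examples, t):
    """Send each (weight, descriptor, label) example to both branches of X <= t,
    weighting it by the uniform-model probability mass on each side."""
    left, right = [], []
    for w, desc, label in examples:
        p = p_left_uniform(*desc.uniform_params(), t)
        if p > 0:
            left.append((w * p, desc, label))
        if p < 1:
            right.append((w * (1 - p), desc, label))
    return left, right


def weighted_gini(examples):
    """Gini impurity in which each example counts with its (fractional) weight."""
    total = sum(w for w, _, _ in examples)
    if total == 0:
        return 0.0
    counts = {}
    for w, _, label in examples:
        counts[label] = counts.get(label, 0.0) + w
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def point_value(desc, strategy, rng=random):
    """Single-point alternatives: an expected value, or one random draw."""
    lo, hi = desc.uniform_params()
    mu, sigma = desc.normal_params()
    if strategy == "expected_uniform":
        return (lo + hi) / 2
    if strategy == "expected_normal":
        return mu
    if strategy == "sample_uniform":
        return rng.uniform(lo, hi)
    if strategy == "sample_normal":
        return rng.gauss(mu, sigma)
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    data = [
        (1.0, ConformerDescriptor([1.0, 2.0, 3.0]), "active"),
        (1.0, ConformerDescriptor([4.0, 5.0]), "inactive"),
    ]
    left, right = fractional_split(data, 2.5)
    print([round(w, 2) for w, _, _ in left], [round(w, 2) for w, _, _ in right])  # [0.75] [0.25, 1.0]
    print(round(weighted_gini(right), 2))  # 0.32
```

In this sketch, the sampling and expected-value strategies collapse each distribution to a single number before training, so any standard random forest implementation could consume the result. The fractional split instead changes how each tree routes compounds during induction. This difference is why the abstract's finding, that expected values alone give almost the same accuracy, is notable.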
Place, publisher, year, edition, pages: 2013. Vol. 19, no. 6, pp. 2679-2685.
Keywords: Machine learning, Random forests, Conformational analysis, Uncertainty, Binary classification
National Category: Biochemistry and Molecular Biology; Chemical Sciences; Computer and Information Science
Identifiers: URN: urn:nbn:se:su:diva-91931; DOI: 10.1007/s00894-013-1806-z; ISI: 000319362500052; OAI: oai:DiVA.org:su-91931; DiVA: diva2:636605