Representing descriptors derived from multiple conformations as uncertain features for machine learning
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2013 (English). In: Journal of Molecular Modeling, ISSN 1610-2940, E-ISSN 0948-5023, Vol. 19, no. 6, pp. 2679-2685. Article in journal (Refereed), Published
Abstract [en]

Uncertainty was introduced into the chemical descriptors of 11 datasets by conformational analysis in order to incorporate three-dimensional information and to investigate the resulting predictive performance of a state-of-the-art machine learning method, random forests, for binary classification tasks. A number of strategies for handling uncertainty in random forests were evaluated. The study showed that when incorporating three-dimensional information as uncertainty into chemical descriptors, the use of uniform probability distributions over the range of possible values, in conjunction with fractional distribution of compounds, clearly outperforms the use of normal distributions as well as sampling from both normal and uniform distributions. The main conclusion of this study is that, even when distributions of uncertain values are provided, the random forest method can generate models that are almost as accurate from the expected values of these distributions alone. Hence, there seems to be little advantage in using the more elaborate methods of incorporating uncertainty in chemical descriptors when using random forests, rather than replacing the distributions with single-point values. The results also show that random forest models with similar performance can be generated using three-dimensional descriptor information derived from single (lowest-energy or Corina-derived) conformations.
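The record does not give implementation details, so the following is only a minimal sketch of how the representations named in the abstract might look for a single descriptor at a single tree split. It assumes that "fractional distribution" means sending a fraction of a compound down each branch in proportion to the probability mass on either side of the split threshold. The function names, the sample values, and the threshold are all hypothetical and are not taken from the paper.

```python
from statistics import NormalDist, mean, stdev


def summarize(values):
    """Summarize one descriptor computed over several conformations."""
    return {
        "expected": mean(values),  # single-point (expected) value
        "low": min(values),        # lower end of the uniform range
        "high": max(values),       # upper end of the uniform range
        "sigma": stdev(values) if len(values) > 1 else 0.0,
    }


def fraction_left_uniform(low, high, threshold):
    """Fraction of a compound sent to the 'value <= threshold' branch,
    assuming the descriptor is uniform on [low, high]."""
    if high == low:
        return 1.0 if low <= threshold else 0.0
    frac = (threshold - low) / (high - low)
    return min(1.0, max(0.0, frac))


def fraction_left_normal(mu, sigma, threshold):
    """Same fraction, assuming a normal distribution N(mu, sigma)."""
    if sigma == 0.0:
        return 1.0 if mu <= threshold else 0.0
    return NormalDist(mu, sigma).cdf(threshold)


def fraction_left_point(expected, threshold):
    """Single-point alternative: the whole compound goes one way."""
    return 1.0 if expected <= threshold else 0.0


# Hypothetical descriptor values from four conformations of one compound.
conformer_values = [1.0, 2.0, 3.0, 4.0]
s = summarize(conformer_values)
threshold = 2.5  # hypothetical split threshold

print(fraction_left_uniform(s["low"], s["high"], threshold))
print(fraction_left_normal(s["expected"], s["sigma"], threshold))
print(fraction_left_point(s["expected"], threshold))
```

Under these assumptions, the uniform and normal variants differ only in how the probability mass below the threshold is computed, while the single-point variant discards the spread entirely, which is the simplification the abstract reports as costing little accuracy.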

Keyword [en]
Machine learning, Random forests, Conformational analysis, Uncertainty, Binary classification
National Category
Biochemistry and Molecular Biology; Chemical Sciences; Computer and Information Science
URN: urn:nbn:se:su:diva-91931
DOI: 10.1007/s00894-013-1806-z
ISI: 000319362500052
OAI: diva2:636605


Available from: 2013-07-10. Created: 2013-07-09. Last updated: 2013-11-27. Bibliographically approved.

By author/editor
Boström, Henrik