Learning Decision Trees from Histogram Data
2015 (English). In: 11th International Conference on Data Mining (DMIN'15), The 2015 World Congress in Computer Science, Computer Engineering, and Applied Computing, 2015. Conference paper (Refereed).
When applying learning algorithms to histogram data, the bins of such variables are normally treated as separate, independent variables. However, this may lead to a loss of information, as the underlying dependencies between bins may not be fully exploited. In this paper, we adapt the standard decision tree learning algorithm to handle histogram data by proposing a novel method for partitioning examples using binned variables. Results from applying the algorithm to both synthetic and real-world data sets demonstrate that exploiting dependencies in histogram data may have positive effects on both predictive performance and model size, as measured by the number of nodes in the decision tree. These gains are, however, associated with an increased computational cost and more complex split conditions. To address the former issue, an approximate method is proposed, which speeds up the learning process substantially while retaining the predictive performance.
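The conventional treatment that the abstract contrasts against can be sketched as follows: each histogram bin is flattened into its own scalar feature, discarding the fact that the bins jointly describe one variable. The data, variable names, and helper function below are hypothetical illustrations, not taken from the paper.

```python
# Illustrative sketch (hypothetical data and names): the baseline approach
# in which each histogram bin becomes a separate, independent feature.

def flatten_histograms(examples):
    """Turn a list of {variable: [bin values]} dicts into flat per-bin
    feature dicts, e.g. a 'speed' histogram with 3 bins becomes the
    features speed_0, speed_1, speed_2, each treated independently."""
    names = sorted(examples[0])
    rows = []
    for ex in examples:
        row = {}
        for var in names:
            for i, value in enumerate(ex[var]):
                row[f"{var}_{i}"] = value
        rows.append(row)
    return rows

# Two example objects, each described by two (normalized) histograms.
examples = [
    {"speed": [0.7, 0.2, 0.1], "load": [0.5, 0.5]},
    {"speed": [0.1, 0.3, 0.6], "load": [0.9, 0.1]},
]
flat = flatten_histograms(examples)
# Each bin is now a standalone feature; a standard decision tree learner
# would split on one bin at a time, ignoring inter-bin dependencies.
```

The paper's contribution is precisely to avoid this flattening by letting split conditions operate on the binned variable as a whole.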
Place, publisher, year, edition, pages
The 2015 World Congress in Computer Science, Computer Engineering, and Applied Computing, 2015.
Keywords: Histogram Trees, Histogram Learning
Research subject Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-125140
OAI: oai:DiVA.org:su-125140
DiVA: diva2:891929