Adapted Random Survival Forest for Histograms to Analyze NOx Sensor Failure in Heavy Trucks
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2019 (English). In: Machine Learning, Optimization, and Data Science: Proceedings / [ed] Giuseppe Nicosia, Prof. Panos Pardalos, Renato Umeton, Prof. Giovanni Giuffrida, Vincenzo Sciacca, Springer, 2019, p. 83-94. Conference paper, Published paper (Refereed)
Abstract [en]

In heavy-duty truck operation, important components need to be examined regularly so that unexpected breakdowns can be prevented. Data-driven failure prediction models can be built using operational data from a large fleet of trucks. Machine learning methods such as Random Survival Forest (RSF) can be used to generate a survival model that predicts the survival probabilities of a particular component over time. Operational data from the trucks usually have many feature variables represented as histograms. Although each bin of a histogram can be considered an independent numeric variable, dependencies among the bins might exist that could be useful and that are neglected when bins are treated individually. Therefore, in this article, we propose an extension to the standard RSF algorithm that can handle histogram variables and use it to train survival models for a NOx sensor. The trained model is compared in terms of overall error rate with the standard RSF model, where bins of a histogram are treated individually as numeric features. The experimental results show that the adapted approach outperforms the standard approach, and the feature variables considered important are ranked.
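To make "survival probabilities of a component over time" concrete, the following minimal sketch computes a Kaplan-Meier survival curve from failure and censoring times. It is a standard non-parametric estimator used here only for illustration; it is not the RSF algorithm or the adapted method of the paper, and the function name and toy data are hypothetical.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct failure time.

    durations: time until failure or censoring for each unit
    events:    1 if a failure was observed, 0 if the unit was censored
    """
    failures = Counter(t for t, e in zip(durations, events) if e)
    removed = Counter(durations)  # units leaving the risk set at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(removed):
        d = failures.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk units that survive time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= removed[t]
    return curve


# Hypothetical toy data: days in service and whether a failure was observed.
durations = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
```

Censored units (event 0) still count toward the risk set until they leave it, which is what distinguishes survival estimation from simply averaging observed failure times.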

Place, publisher, year, edition, pages
Springer, 2019. p. 83-94
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11943
Keywords [en]
Histogram survival forest, Histogram features, NOx sensor failure
National Category
Computer Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-178506; DOI: 10.1007/978-3-030-37599-7_8; ISBN: 978-3-030-37598-0 (print); ISBN: 978-3-030-37599-7 (electronic); OAI: oai:DiVA.org:su-178506; DiVA, id: diva2:1390149
Conference
5th International Conference, LOD 2019, Siena, Italy, September 10-13, 2019
Available from: 2020-01-31. Created: 2020-01-31. Last updated: 2022-02-26. Bibliographically approved.
In thesis
1. Random Forest for Histogram Data: An application in data-driven prognostic models for heavy-duty trucks
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data mining and machine learning algorithms are trained on large datasets to find useful hidden patterns. These patterns can help to gain new insights and make accurate predictions. Usually, the training data is structured in a tabular format, where the rows represent the training instances and the columns represent the features of these instances. The feature values are usually real numbers and/or categories. As very large volumes of digital data are becoming available in many domains, the data is often summarized into manageable sizes for efficient handling. Aggregating data into histograms is one way to reduce the size of the data. However, traditional machine learning algorithms have a limited ability to learn from such data, and this thesis explores extensions of the algorithms to allow for more effective learning from histogram data.
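As a minimal sketch of the summarization step described above, the following shows how a stream of raw readings might be reduced to a fixed-length, normalized histogram. The function name, bin edges, and readings are hypothetical and are not taken from the thesis.

```python
def to_histogram(readings, edges):
    """Count readings per bin; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for x in readings:
        for i in range(len(counts)):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    # Normalize so the bins sum to 1, making units with different
    # amounts of usage comparable.
    return [c / total for c in counts] if total else counts


# Hypothetical raw signal, e.g. ambient temperature readings in degrees Celsius.
readings = [-3.0, 4.5, 7.2, 12.0, 15.5, 18.1, 22.4, 9.9]
edges = [-10, 0, 10, 20, 30]
hist = to_histogram(readings, edges)
```

However many readings arrive, the stored feature keeps the same length (here four bins), which is the size reduction the abstract refers to. The order in which readings occurred is lost.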

The thesis focuses on the decision tree and random forest algorithms, which are easy to understand and implement. Although a single decision tree may not result in the highest predictive performance, one of its benefits is that it often allows for easy interpretation. By combining many such diverse trees into a random forest, the performance can be greatly enhanced, but at the cost of reduced interpretability. By first finding out how to effectively train a single decision tree from histogram data, these findings could be carried over to building robust random forests from such data. The overarching research question for the thesis is: How can the random forest algorithm be improved to learn more effectively from histogram data, and how can the resulting models be interpreted? An experimental approach was taken, under the positivist paradigm, in order to answer the question. The thesis investigates how the standard decision tree and random forest algorithms can be adapted to make them learn more accurate models from histogram data. Experimental evaluations of the proposed changes were carried out on both real-world data and synthetically generated experimental data. The real-world data was taken from the automotive domain, concerning the operation and maintenance of heavy-duty trucks. Component failure prediction models were built from the operational data of a large fleet of trucks, where the information about their operation over many years has been summarized as histograms. The experimental results showed that the proposed approaches were more effective than the original algorithms, which treat bins of histograms as separate features. The thesis also contributes towards the interpretability of random forests by evaluating an interactive visual tool for assisting users to understand the reasons behind the output of the models.
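The difference between treating bins as separate features and using several bins together can be illustrated with a small sketch. The windowed sum of adjacent bins below is an assumption chosen purely for illustration; the abstract does not specify how the adapted algorithms combine bins, and all names here are hypothetical.

```python
def single_bin_candidates(hist):
    """Baseline view: every bin is its own numeric feature."""
    return {f"bin{i}": v for i, v in enumerate(hist)}


def window_candidates(hist, width=2):
    """Illustrative alternative: sum each run of `width` adjacent bins,
    so a candidate split can depend on neighbouring bins jointly."""
    return {
        f"bin{i}..{i + width - 1}": sum(hist[i:i + width])
        for i in range(len(hist) - width + 1)
    }


# Hypothetical normalized histogram with four bins.
hist = [0.125, 0.375, 0.375, 0.125]
single = single_bin_candidates(hist)
windows = window_candidates(hist, width=2)
```

A tree that only sees `single` can threshold one bin at a time, whereas `windows` exposes quantities such as "share of time spent in bins 1 and 2 combined", which no single-bin threshold expresses directly.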

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2020. p. 74
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 20-003
Keywords
Histogram data, random forest, NOx sensor failure, random forest interpretation
National Category
Computer Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-178776; ISBN: 978-91-7911-024-6; ISBN: 978-91-7911-025-3
Public defence
2020-03-20, Ka-Sal C (Sven-Olof Öhrvik), Electrum 1, våningsplan 2, Kistagången 16, KTH Kista, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 6: Accepted.

Available from: 2020-02-26. Created: 2020-02-05. Last updated: 2022-02-26. Bibliographically approved.

Authority records

Gurung, Ram B.
