Predicting NOx sensor failure in heavy duty trucks using histogram-based random forests
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2017 (English). In: International Journal of Prognostics and Health Management, E-ISSN 2153-2648, Vol. 8, no. 1, article id 008. Article in journal (Refereed). Published
Abstract [en]

Being able to accurately predict the impending failures of truck components is often associated with significant cost savings, customer satisfaction and flexibility in maintenance service plans. However, because of the diversity in the way trucks are typically configured and their usage under different conditions, creating accurate prediction models is not an easy task. This paper describes an effort to create such a prediction model for the NOx sensor, i.e., a component measuring the emitted level of nitrogen oxide in the exhaust of the engine. This component was chosen because it is vital for the truck to function properly, while at the same time being very fragile and costly to repair. As input to the model, technical specifications of trucks and their operational data are used. The process of collecting the data and preparing it for training the model via a slightly modified random forest learning algorithm is described, along with various challenges encountered during this process. The operational data consists of features represented as histograms, posing an additional challenge for the data analysis task. In the study, a modified version of the random forest algorithm is employed, which exploits the fact that the individual bins in the histograms are related, in contrast to the standard approach, which would consider the bins as independent features. Experiments are conducted using the modified random forest algorithm, and they clearly show that it is indeed beneficial when compared to the standard random forest algorithm. The performance of the resulting prediction model for the NOx sensor is promising, and the model may be adopted for the benefit of operators of heavy trucks.
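The contrast drawn in the abstract, between bins treated as independent features and bins treated as related, can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. The helper names `flatten_bins` and `window_sums`, the idea of summing adjacent bins, and the example counts are all assumptions made for illustration only.

```python
from typing import List


def flatten_bins(histogram: List[int]) -> List[int]:
    """Standard approach: every bin becomes its own independent feature."""
    return list(histogram)


def window_sums(histogram: List[int], width: int) -> List[int]:
    """Hypothetical alternative (an assumption, not the paper's method):
    sum each run of `width` adjacent bins, so that a candidate split
    sees neighbouring bins together instead of one bin at a time."""
    if width < 1 or width > len(histogram):
        raise ValueError("width must be between 1 and the number of bins")
    return [
        sum(histogram[i:i + width])
        for i in range(len(histogram) - width + 1)
    ]


# Made-up bin counts for one truck and one histogram feature.
truck_histogram = [10, 20, 40, 30]

independent_features = flatten_bins(truck_histogram)      # one value per bin
related_features = window_sums(truck_histogram, width=2)  # one value per pair of neighbours
```

In the flattened form a tree learner can only threshold a single bin at each node. In the windowed form each derived value already combines neighbouring bins, which is one simple way to let a split reflect the ordering of the bins.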

Place, publisher, year, edition, pages
2017. Vol. 8, no. 1, article id 008
Keywords [en]
Histogram Features, NOx sensor prognostics, Histogram-based random forest
National subject category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-149432
DOI: 10.36001/ijphm.2017.v8i1.2535
OAI: oai:DiVA.org:su-149432
DiVA, id: diva2:1161600
Available from: 2017-11-30. Created: 2017-11-30. Last updated: 2023-07-24. Bibliographically approved
Part of thesis
1. Random Forest for Histogram Data: An application in data-driven prognostic models for heavy-duty trucks
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Data mining and machine learning algorithms are trained on large datasets to find useful hidden patterns. These patterns can help to gain new insights and make accurate predictions. Usually, the training data is structured in a tabular format, where the rows represent the training instances and the columns represent the features of these instances. The feature values are usually real numbers and/or categories. As very large volumes of digital data are becoming available in many domains, the data is often summarized into manageable sizes for efficient handling. To aggregate data into histograms is one means to reduce the size of the data. However, traditional machine learning algorithms have a limited ability to learn from such data, and this thesis explores extensions of the algorithms to allow for more effective learning from histogram data.
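The aggregation step mentioned above, summarizing many raw readings into a histogram, can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `to_histogram`, the bin edges and the sample readings are hypothetical and do not come from the thesis.

```python
from bisect import bisect_right
from typing import List, Sequence


def to_histogram(readings: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count readings per bin, where bin i covers [edges[i], edges[i + 1]).

    Readings outside [edges[0], edges[-1]) are dropped (an assumption
    made here to keep the example small).
    """
    counts = [0] * (len(edges) - 1)
    for value in readings:
        if edges[0] <= value < edges[-1]:
            # bisect_right returns the index of the first edge greater than value.
            counts[bisect_right(edges, value) - 1] += 1
    return counts


# Made-up raw readings and bin edges for a single signal.
readings = [12.0, 35.5, 47.0, 58.2, 61.0, 79.9, 80.0]
edges = [0, 20, 40, 60, 80]

histogram = to_histogram(readings, edges)
```

However many readings are logged, the stored summary has a fixed size of `len(edges) - 1` counts. That is the size reduction the abstract refers to, and it comes at the cost of losing the order and exact values of the original readings.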

The thesis focuses on the decision tree and random forest algorithms, which are easy to understand and implement. Although a single decision tree may not result in the highest predictive performance, one of its benefits is that it often allows for easy interpretation. By combining many such diverse trees into a random forest, the performance can be greatly enhanced, although at the cost of reduced interpretability. By first finding out how to effectively train a single decision tree from histogram data, these findings could be carried over to building robust random forests from such data. The overarching research question for the thesis is: How can the random forest algorithm be improved to learn more effectively from histogram data, and how can the resulting models be interpreted? An experimental approach was taken, under the positivist paradigm, in order to answer the question. The thesis investigates how the standard decision tree and random forest algorithms can be adapted to make them learn more accurate models from histogram data. Experimental evaluations of the proposed changes were carried out on both real-world data and synthetically generated experimental data. The real-world data was taken from the automotive domain, concerning the operation and maintenance of heavy-duty trucks. Component failure prediction models were built from the operational data of a large fleet of trucks, where the information about their operation over many years has been summarized as histograms. The experimental results showed that the proposed approaches were more effective than the original algorithms, which treat bins of histograms as separate features. The thesis also contributes towards the interpretability of random forests by evaluating an interactive visual tool for assisting users to understand the reasons behind the output of the models.

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2020. p. 74
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 20-003
Keywords
Histogram data, random forest, NOx sensor failure, random forest interpretation
National subject category
Computer Systems
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-178776 (URN)
978-91-7911-024-6 (ISBN)
978-91-7911-025-3 (ISBN)
Public defence
2020-03-20, Ka-Sal C (Sven-Olof Öhrvik), Electrum 1, floor 2, Kistagången 16, KTH Kista, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 6: Accepted.

Available from: 2020-02-26. Created: 2020-02-05. Last updated: 2022-02-26. Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authors

Gurung, Ram B.; Lindgren, Tony; Boström, Henrik
