Regularisation of regression trees by summation of p-values
Stockholm University, Faculty of Science, Department of Mathematics.
Stockholm University, Faculty of Science, Department of Mathematics.
Stockholm University, Faculty of Science, Department of Mathematics.
Stockholm University, Faculty of Science, Department of Mathematics.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

The standard procedure to decide on the complexity of a CART regression tree is to use cross-validation with the aim of obtaining a predictor that generalises well to unseen data. The randomness in the selection of folds implies that the selected CART regression tree is not a deterministic function of the data. Moreover, the cross-validation procedure may become time-consuming and result in inefficient use of training data. We propose a simple deterministic in-sample method that can be used for stopping the growing of a CART regression tree based on node-wise statistical tests. This testing procedure is derived using a connection to change point detection, where the null hypothesis corresponds to no signal. The suggested p-value based procedure allows us to consider covariate vectors of arbitrary dimension and allows us to bound the p-value of an entire tree from above. Further, we show that the test detects a not too weak signal with a high probability, given a not too small sample size. We illustrate our methodology and the asymptotic results on both simulated and real-world data. Additionally, we illustrate how the p-value based method can be used to construct a deterministic piecewise constant auto-calibrated predictor based on a given black-box predictor.
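
To make the idea of a node-wise stopping test concrete, the following is a minimal sketch and not the test derived in the manuscript. It assumes a single numeric covariate, scans all candidate split points as in a change point problem, takes the maximum standardised mean difference, and converts it into a conservative p-value with a normal approximation and a Bonferroni correction over the candidate splits. The function names `node_p_value_bound` and `tree_p_value_bound`, the `min_leaf` parameter, and the use of a plain sum (union bound) to bound the p-value of a whole tree are illustrative assumptions.

```python
import math


def node_p_value_bound(x, y, min_leaf=2):
    """Conservative p-value bound for the null 'no signal' at one node.

    Hypothetical helper, not the manuscript's test: the max statistic over
    candidate splits is calibrated with a normal approximation and a
    Bonferroni correction. Returns (p_value_bound, split_threshold).
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n = len(ys)
    if n < 2 * min_leaf:
        return 1.0, None

    total = sum(ys)
    mean = total / n
    var = sum((v - mean) ** 2 for v in ys) / (n - 1)  # pooled sample variance
    if var == 0.0:
        return 1.0, None  # constant response: nothing to detect

    best_z, best_threshold, n_candidates = 0.0, None, 0
    left_sum = 0.0
    for k in range(1, n):
        left_sum += ys[k - 1]
        # Skip splits that leave a too-small child or separate tied x values.
        if k < min_leaf or n - k < min_leaf or xs[k - 1] == xs[k]:
            continue
        n_candidates += 1
        diff = left_sum / k - (total - left_sum) / (n - k)
        z = abs(diff) / math.sqrt(var * (1.0 / k + 1.0 / (n - k)))
        if z > best_z:
            best_z = z
            best_threshold = 0.5 * (xs[k - 1] + xs[k])

    if n_candidates == 0:
        return 1.0, None
    p_single = math.erfc(best_z / math.sqrt(2.0))  # two-sided normal tail
    return min(1.0, n_candidates * p_single), best_threshold


def tree_p_value_bound(node_p_values):
    """Union-bound style upper bound: sum of node p-values, capped at 1."""
    return min(1.0, sum(node_p_values))


# Usage: a step signal at x = 19.5 with small deterministic noise.
x = list(range(40))
y = [(5.0 if i >= 20 else 0.0) + 0.1 * ((2 * i) % 5 - 2) for i in x]
p_value, threshold = node_p_value_bound(x, y)
grow_further = p_value < 0.05  # stop splitting this node otherwise
```

In this sketch a node is split only while its p-value bound stays below a chosen level, and the whole procedure is a deterministic function of the data because no folds or resampling are involved.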

Keywords [en]
regression trees, CART, p-value, stopping criterion, multiple testing, max statistics, auto-calibration
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:su:diva-253113
DOI: 10.48550/arXiv.2505.18769
OAI: oai:DiVA.org:su-253113
DiVA, id: diva2:2043553
Available from: 2026-03-05 Created: 2026-03-05 Last updated: 2026-03-09 Bibliographically approved
In thesis
1. Large exposure asymptotics in insurance valuation and reserving, tree regularisation and stochastic control
2026 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis investigates several topics in actuarial mathematics and applied probability, including insurance valuation and reserving, regularisation of regression trees, and stochastic optimisation in an extended dividend problem. The thesis is based on four papers. 

Paper I provides a justification of the chain ladder predictor and Mack’s estimator for the prediction error within a classical compound Poisson model under large exposure, that is, when the number of contracts tends to infinity. Although the model does not satisfy the assumptions of Mack’s distribution-free chain ladder, both the predictor and the estimator are shown to arise in the large exposure limit.

Paper II studies the valuation of liability cashflows with capital requirements in a multi-period setting. Since explicit valuation is generally infeasible and Monte Carlo methods are often computationally challenging, an explicit and easily computable valuation formula is derived. The formula is obtained as a large exposure limit under a conditional weak convergence assumption on the liability cashflows.

Paper III introduces a regularisation method for regression trees based on node-wise statistical tests. At each node, a p-value is computed using a change point test, resulting in a regularised regression tree that is a deterministic function of the training data. Unlike cross-validation, the method avoids randomness from data splitting and ensures efficient use of the full dataset.

Paper IV revisits the classical dividend problem with ruin at zero by incorporating an additional default mechanism based on cumulative occupation time in a low-surplus region. This extension reflects realistic default triggers such as regulatory pressure or liquidity stress. The problem is solved explicitly, yielding closed-form expressions for both the optimal control and the value function. 

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2026. p. 56
Keywords
claims reserving, valuation, regression trees, optimal dividends
National Category
Probability Theory and Statistics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-253128 (URN)
978-91-8107-534-2 (ISBN)
978-91-8107-535-9 (ISBN)
Public defence
2026-05-29, Lärosal 4, Albano Hus 1, Vån 2, Albanovägen 28, Stockholm, 13:00 (English)
Opponent
Supervisors
Available from: 2026-05-06 Created: 2026-03-09 Last updated: 2026-03-24 Bibliographically approved

Open Access in DiVA

fulltext (724 kB), 37 downloads
File information
File name: FULLTEXT01.pdf
File size: 724 kB
Checksum (SHA-512):
4df9afb32ebc6e60628892e2116b96eca3ab9cfd8f4cc1d953850fceedd5c10feca91def66f042e641d96940272c1257d22a22f40198a8c5553ee252010acb01
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text: https://arxiv.org/abs/2505.18769v2
