Empirical evaluation of sparse classification boundaries and HC-feature thresholding in high-dimensional data
2013 (English) Report (Other academic)
The analysis of high-throughput data, now common in modern applications, poses many statistical challenges. One of them is selecting a small subset of features that are likely to be informative for a specific project. This issue is crucial for the success of supervised classification in very high-dimensional settings with sparsity patterns. In this paper, we derive an asymptotic framework that represents a sparse and weak block model and suggest a technique for block-wise feature selection by thresholding. Our procedure extends standard Higher Criticism (HC) thresholding to the case where the dependence structure underlying the data can be taken into account. It is shown to be optimally adaptive, i.e., it performs well without knowledge of the sparsity and weakness parameters. We empirically investigate the detection boundary of our HC procedure and the performance of some estimators of the sparsity parameter. The relevance and benefits of our approach in high-dimensional classification are demonstrated using both simulated and real data.
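For readers unfamiliar with HC thresholding, the sketch below shows the standard, independence-based version that the report takes as its starting point, following the formulation popularized by Donoho and Jin. It is not the report's block-wise, dependence-aware extension. Several choices here are assumptions: the per-feature statistics are taken to be z-scores that are approximately N(0,1) for uninformative features, the search is restricted to the top `alpha0 * p` ordered p-values with a hypothetical default `alpha0 = 0.1`, and the function names `two_sided_pvalue` and `hc_threshold` are illustrative only.

```python
import math
import random
from typing import List, Tuple


def two_sided_pvalue(z: float) -> float:
    """P(|N(0,1)| > |z|) for a standard normal statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def hc_threshold(z_scores: List[float], alpha0: float = 0.1) -> Tuple[float, List[int]]:
    """Standard (independent-features) Higher Criticism thresholding.

    Returns the HC threshold t_HC and the indices j with |Z_j| >= t_HC.
    alpha0 (assumed default 0.1) limits the search to the smallest p-values.
    """
    p = len(z_scores)
    if p < 2:
        raise ValueError("need at least two features")

    # Order features by |Z| descending, i.e. by p-value ascending.
    order = sorted(range(p), key=lambda j: abs(z_scores[j]), reverse=True)
    pvals = [two_sided_pvalue(z_scores[j]) for j in order]

    # Keep k/p < 1 so the HC denominator never vanishes.
    k_max = max(1, min(p - 1, int(alpha0 * p)))

    best_k, best_hc = 1, -math.inf
    for k in range(1, k_max + 1):
        frac = k / p
        # HC(k) = sqrt(p) * (k/p - pi_(k)) / sqrt(k/p * (1 - k/p))
        hc = math.sqrt(p) * (frac - pvals[k - 1]) / math.sqrt(frac * (1.0 - frac))
        if hc > best_hc:
            best_k, best_hc = k, hc

    # The threshold is the |Z| value of the feature at the HC-maximizing rank.
    t_hc = abs(z_scores[order[best_k - 1]])
    selected = [j for j in range(p) if abs(z_scores[j]) >= t_hc]
    return t_hc, selected


if __name__ == "__main__":
    # Hypothetical rare/weak setting: a few shifted features among many nulls.
    random.seed(0)
    p, n_signal, mu = 2000, 40, 3.0
    z = [random.gauss(0.0, 1.0) for _ in range(p)]
    for j in range(n_signal):
        z[j] += mu
    t, sel = hc_threshold(z)
    true_pos = sum(1 for j in sel if j < n_signal)
    print(f"threshold={t:.3f}, selected={len(sel)}, true positives={true_pos}")
```

The threshold is chosen from the data by maximizing the HC objective, so no sparsity or signal-strength parameter has to be supplied. This mirrors the adaptivity property the abstract describes. The report's contribution, accounting for dependence through a block structure, would change how the per-feature p-values and the HC objective are formed, and it is not reproduced here.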
Place, publisher, year, edition, pages
2013, 37 p.
Research Report / Department of Statistics, Stockholm University, ISSN 0280-7564 ; 2013:5
Keywords: Higher criticism, detection boundary, high dimensionality, supervised classification, separation strength
Probability Theory and Statistics
Research subject: Statistics
Identifiers: URN: urn:nbn:se:su:diva-95263; OAI: oai:DiVA.org:su-95263; DiVA: diva2:659225