Clustering with Confidence: Finding Clusters with Statistical Guarantees
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2016 (English) Article in journal (Refereed) Submitted
Abstract [en]

Clustering is a widely used unsupervised learning method for finding structure in data. However, the resulting clusters are typically presented without any guarantees on their robustness: slightly changing the data sample, or re-running a clustering algorithm that has a stochastic component, may lead to completely different clusters. There is hence a need for techniques that can quantify the instability of the generated clusters. In this study, we propose a technique for quantifying the instability of a clustering solution and for finding robust clusters, termed core clusters, which correspond to clusters where the co-occurrence probability of each data item within a cluster is at least 1−α. We demonstrate how solving the core clustering problem is linked to finding the largest maximal cliques in a graph. We show that the method can be used with both clustering and classification algorithms. The proposed method is tested on both simulated and real datasets. The results show that the obtained clusters indeed meet the guarantees on robustness.
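
The link between co-occurrence and cliques can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes that co-occurrence is estimated pairwise as the fraction of repeated clustering runs in which two items share a label, that two items are joined by an edge when this fraction is at least 1−α, and that candidate core clusters are the maximal cliques of the resulting graph. The function names (cooccurrence, maximal_cliques, core_cluster_candidates) and the toy label lists are hypothetical.

```python
from itertools import combinations


def cooccurrence(runs):
    """Fraction of runs in which each pair of items receives the same label.

    `runs` is a list of label lists, one per clustering run (assumed input).
    """
    n = len(runs[0])
    freq = {}
    for i, j in combinations(range(n), 2):
        same = sum(1 for labels in runs if labels[i] == labels[j])
        freq[(i, j)] = same / len(runs)
    return freq


def maximal_cliques(adj):
    """Bron-Kerbosch enumeration (no pivoting) of all maximal cliques."""
    def expand(r, p, x):
        if not p and not x:
            yield set(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


def core_cluster_candidates(runs, alpha):
    """Maximal cliques of the graph whose edges have co-occurrence >= 1 - alpha."""
    n = len(runs[0])
    adj = {i: set() for i in range(n)}
    for (i, j), f in cooccurrence(runs).items():
        if f >= 1 - alpha:
            adj[i].add(j)
            adj[j].add(i)
    cliques = (sorted(c) for c in maximal_cliques(adj))
    # Largest cliques first, ties broken by item index.
    return sorted(cliques, key=lambda c: (-len(c), c))


# Toy data: four runs over five items (labels are arbitrary per run).
runs = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
]
print(core_cluster_candidates(runs, alpha=0.1))   # [[0, 1], [3, 4], [2]]
print(core_cluster_candidates(runs, alpha=0.25))  # [[0, 1, 2], [3, 4]]
```

In the toy data, item 2 shares a label with items 0 and 1 in only three of four runs, so it joins their clique at α = 0.25 but stands alone at α = 0.1. Enumerating maximal cliques is exponential in the worst case, so this sketch is only suitable for small examples.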

Place, publisher, year, edition, pages
2016.
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-137472
OAI: oai:DiVA.org:su-137472
DiVA: diva2:1062747
Available from: 2017-01-08 Created: 2017-01-08 Last updated: 2017-01-13

Open Access in DiVA

No full text

Other links

arXiv:1612.08714

By author/editor
Boström, Henrik; Papapetrou, Panagiotis