Kiper, Busra Tas
Publications (4 of 4)
Tas Kiper, B. (2024). Contemporary developments and applications of unsupervised machine learning methods. (Doctoral dissertation). Stockholm: Department of Mathematics, Stockholm University
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis presents state-of-the-art developments in the field of unsupervised learning, particularly in clustering analysis. Unsupervised learning is a branch of machine learning whose task is to discover hidden patterns and relationships in high-dimensional data without any labels. It is an important step in providing valuable insights, e.g., the existence of important discrete structures and low-dimensional features, for downstream statistical analyses as well as revealing anomalies. The achievements of this thesis detailed below advance our toolboxes in pattern recognition and anomaly detection that have potential applications in many scientific areas with unstructured and unlabelled data.

Paper I presents the application of unsupervised change point (CP) detection to molecular time series to explain the dynamics of motor proteins. Data-driven non-parametric detection of CP enables an objective identification and modelling of stepping patterns in molecular motors. Beyond CP detection, this study provides further tools to analyze molecular motors, such as the reliable extraction of reaction statistics and establishing a predictive model for the reaction rates. The methods developed and applied in this paper are applicable to time series data from a broad range of scientific fields.
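As a rough illustration of what non-parametric change point detection can look like, the sketch below implements a Pettitt-type rank statistic for a single change point. This is an assumption chosen for illustration, not the method used in Paper I; the function name `pettitt_change_point` and the toy trace are hypothetical.

```python
def pettitt_change_point(series):
    """Locate a single change point with a Pettitt-type rank statistic.

    Returns (split, statistic), where `split` is the number of samples in
    the first segment and `statistic` is max |U_t| over all splits.
    Illustrative sketch only; not the procedure from the thesis.
    """
    n = len(series)
    if n < 2:
        raise ValueError("need at least two samples")

    u = 0
    best_split, best_stat = 1, -1
    for t in range(n - 1):
        x_t = series[t]
        # Recursion U_t = U_{t-1} + sum_j sign(x_t - x_j); only the ordering
        # of the samples matters, so no distribution is assumed.
        u += sum((x_t > x_j) - (x_t < x_j) for x_j in series)
        if abs(u) > best_stat:
            best_split, best_stat = t + 1, abs(u)
    return best_split, best_stat


# Toy trace (made up): five samples near 0 followed by five samples near 5.
trace = [0.1, -0.2, 0.0, 0.3, -0.1, 5.2, 4.9, 5.1, 5.0, 4.8]
split, stat = pettitt_change_point(trace)
print(split, stat)
```

Because the statistic depends only on ranks, it is insensitive to the noise distribution. Detecting multiple steps would require applying such a test recursively or using a different scheme altogether.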

Paper II proposes the Graph-based Fuzzy Density Peak Clustering (GF-DPC) method, which comprehensively generalizes existing density-based clustering methods. First, graph-based methods are employed to estimate densities and capture nonlinearities in the data, which enhances the power to detect clusters with arbitrary shapes. Second, a fuzzy extension is formulated to provide a probabilistic framework for assigning data points to clusters. Finally, the identification of cluster centers and of the number of clusters is automated in terms of a fuzzy clustering validation index. The superior performance of GF-DPC over other well-known fuzzy clustering methods in discovering clusters with arbitrary shapes, densities, separations and overlaps is demonstrated using both intuitive examples and real datasets.

Paper III establishes a versatile validation framework for fuzzy clustering, termed the Shape-aware Generalized Silhouette Analysis (SAGSA), based on the silhouette index. In SAGSA, a probabilistic framework is formulated to quantify the degree of cohesion and separation of the detected fuzzy clusters. In addition, graph-based distances are employed to facilitate an accurate validation of nonlinear clustering structures. Most importantly, a 2-dimensional graphical tool, the cohesion-separation (CS) plot, is introduced to enable visual diagnosis of possible problems in the clustering results at the point-wise, cluster-wise and global levels, regardless of the dimensionality of the dataset. Finally, we illustrate the effectiveness of SAGSA in cluster validation compared with other commonly used methods on test examples covering various clustering challenges, including clusters with arbitrary shapes, imbalanced sizes, overlaps, hierarchical structures and noise.

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2024. p. 35
Keywords
Clustering analysis, Fuzzy clustering, Graph-based methods, Clustering validation, Time series analysis, Change point detection
National Category
Computational Mathematics
Research subject
Computational Mathematics
Identifiers
urn:nbn:se:su:diva-231996 (URN); 978-91-8014-869-6 (ISBN); 978-91-8014-870-2 (ISBN)
Public defence
2024-09-13, Hörsal 2, Hus 2, Campus Albano, Albanovägen 18, Stockholm, 13:00 (English)
Available from: 2024-08-21 Created: 2024-07-11 Last updated: 2024-08-13 Bibliographically approved
Watanabe, R. R., Tas Kiper, B., Zarco-Zavala, M., Hara, M., Kobayashi, R., Ueno, H., . . . Noji, H. (2023). Rotary properties of hybrid F1-ATPases consisting of subunits from different species. iScience, 26(5), Article ID 106626.
2023 (English). In: iScience, E-ISSN 2589-0042, Vol. 26, no 5, article id 106626. Article in journal (Refereed) Published
Abstract [en]

F1-ATPase (F1) is an ATP-driven rotary motor protein ubiquitously found in many species as the catalytic portion of FoF1-ATP synthase. Despite the highly conserved amino acid sequence of the catalytic core subunits, alpha and beta, F1 shows diversity in the maximum catalytic turnover rate Vmax and the number of rotary steps per turn. To study the design principle of F1, we prepared eight hybrid F1s composed of subunits from two of three genuine F1s: thermophilic Bacillus PS3 (TF1), bovine mitochondria (bMF1), and Paracoccus denitrificans (PdF1), differing in the Vmax and the number of rotary steps. The Vmax of the hybrids can be well fitted by a quadratic model highlighting the dominant roles of beta and the couplings between alpha-beta. Although there exist no simple rules on which subunit dominantly determines the number of steps, our findings show that the stepping behavior is characterized by the combination of all subunits.

National Category
Biochemistry Molecular Biology
Identifiers
urn:nbn:se:su:diva-229678 (URN); 10.1016/j.isci.2023.106626 (DOI); 001001097500001 (); 37192978 (PubMedID); 2-s2.0-85153262159 (Scopus ID)
Available from: 2024-05-27 Created: 2024-05-27 Last updated: 2025-02-20 Bibliographically approved
Tas Kiper, B., Tavakolian, N. & Li, C.-B. Automated graph-based fuzzy density peak clustering to detect high-dimensional discrete structures of arbitrary shapes.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Density-based clustering methods are prominent clustering approaches to discover discrete structures buried in high-dimensional (HD) data in terms of density variations. Among them is the well-known Density Peak Clustering (DPC) proposed by Rodriguez and Laio (2014), which performs fairly well in detecting clusters with nonlinear shapes and varying densities. However, it has several shortcomings: it does not learn the nonlinear shapes of the underlying HD data, lacks a probabilistic framework to handle overlapping clusters, and is not fully automated.
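For orientation, the baseline DPC idea can be sketched in a few lines: each point gets a local density rho (the number of neighbours within a cutoff `d_c`) and a distance delta to the nearest point of higher density; points where both are large are taken as centers, and every other point inherits the label of its nearest higher-density neighbour. The sketch below is a minimal version under stated assumptions, not the GF-DPC method: it uses Euclidean distances and a cutoff kernel, breaks density ties by index, and picks centers as the top `n_clusters` values of rho * delta instead of reading them off a decision graph. The function name and the toy data are hypothetical.

```python
import math


def density_peak_clustering(points, d_c, n_clusters):
    """Minimal Density Peak Clustering sketch (Euclidean, cutoff kernel)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points by decreasing density (ties broken by index, an assumption).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n
    parent = [-1] * n
    top = order[0]
    delta[top] = max(dist[top])  # convention for the densest point
    for rank in range(1, n):
        i = order[rank]
        # Nearest point among those with higher density (earlier in order).
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centers: largest rho * delta; the stable sort keeps the densest point first.
    centers = sorted(order, key=lambda i: -rho[i] * delta[i])[:n_clusters]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    for i in order:  # parents are always labelled before their children
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Toy data (made up): two well-separated square blobs with a central point each.
blob_a = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05)]
blob_b = [(5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1), (5.05, 5.05)]
labels, centers = density_peak_clustering(blob_a + blob_b, d_c=0.12, n_clusters=2)
print(labels, centers)
```

The sketch makes the three limitations concrete: distances are purely Euclidean, each point receives exactly one hard label, and both `d_c` and `n_clusters` must be supplied by hand.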

Here we develop comprehensive generalizations of DPC, termed Graph-based Fuzzy Density Peak Clustering (GF-DPC), to circumvent these limitations. In GF-DPC, graph-based methods are employed to robustly estimate densities and capture nonlinearities in the HD data, which enhances its power in detecting clusters with arbitrary shapes. Furthermore, a fuzzy extension is introduced that returns a probabilistic assignment of data points to the detected clusters. Finally, the identification of cluster centers and of the number of clusters is automated and generalized in terms of a fuzzy clustering validation index. The superior performance of GF-DPC compared to other well-known fuzzy clustering methods in discovering clusters with arbitrary shapes, densities, separations and overlaps is demonstrated using both intuitive examples and real datasets.

Keywords
Density based clustering, Fuzzy clustering, Graph distance, Automatic validation
National Category
Computational Mathematics
Identifiers
urn:nbn:se:su:diva-231993 (URN)
Available from: 2024-07-11 Created: 2024-07-11 Last updated: 2024-07-11
Tas Kiper, B. & Li, C.-B. Shape-aware generalized silhouette analysis to evaluate fuzzy clustering at the point-wise, cluster-wise and global levels.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Validation is an essential part of clustering analysis to assess the quality of the detected patterns. One of the most well-known validation methods is the silhouette index, which is applicable only to hard clustering results. In this paper, we develop a fuzzy clustering validation framework based on the silhouette index, termed Shape-aware Generalized Silhouette Analysis (SAGSA), which allows for an extensive evaluation and diagnosis of possible problems in the clustering results at the point-wise, cluster-wise and global levels.
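For reference, the classical (hard-clustering) silhouette value of a point i is s(i) = (b - a) / max(a, b), where a is the mean distance from i to the other members of its own cluster and b is the smallest mean distance from i to the members of any other cluster. The sketch below computes these point-wise values with Euclidean distances. It illustrates only the classical index, not SAGSA; the function name and the toy data are hypothetical, and the zero-value conventions for singleton clusters and coincident points are stated assumptions.

```python
import math


def silhouette_values(points, labels):
    """Point-wise classical silhouette values for a hard clustering."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette index needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # usual convention for singleton clusters
            continue
        # Cohesion: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # Separation: mean distance to the closest other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


# Toy data (made up): two tight pairs far apart, labelled well and badly.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_values(pts, [0, 0, 1, 1])
bad = silhouette_values(pts, [0, 1, 0, 1])
print(sum(good) / len(good), sum(bad) / len(bad))
```

Values near 1 indicate compact, well-separated clusters, while negative values flag points that sit closer to another cluster than to their own. Averaging over a cluster or over all points gives the cluster-wise and global summaries.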

In particular, a probabilistic framework to quantify the cohesion (compactness) and separation of the detected clusters is formulated to handle fuzzy clustering results. Furthermore, graph-based (shape-aware) distances are employed to faithfully capture nonlinear structures, enabling an accurate validation of curved clusters. Finally, a graphical tool, the cohesion-separation (CS) plot, is introduced that allows us to visually assess clustering results at different levels regardless of the dimensionality of the dataset. To show its effectiveness in diagnosing problems in clustering results, SAGSA is compared with other fuzzy clustering validation methods on test cases with different types of clustering challenges, namely clusters with arbitrary shapes, imbalanced sizes, overlaps, hierarchical structures and noise.

Keywords
Clustering validation, Silhouette index, Graph distance, Fuzzy clustering
National Category
Computational Mathematics
Identifiers
urn:nbn:se:su:diva-231994 (URN)
Available from: 2024-07-11 Created: 2024-07-11 Last updated: 2024-07-11