Nearest Neighbor Classification in High Dimensions
Deegalla, Sampath
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0003-1100-8334
2024 (English) Doctoral thesis, comprising papers (Other academic)
Abstract [en]

The simple k nearest neighbor (kNN) method can be used to learn from high-dimensional data, such as images and microarrays, without any modification to the original version of the algorithm. However, studies show that kNN's accuracy is often poor in high dimensions due to the curse of dimensionality: a large number of instances is required to maintain a given level of accuracy, and distance measures such as the Euclidean distance may become meaningless in high dimensions. Dimensionality reduction can therefore be used to help nearest neighbor classifiers overcome the curse of dimensionality. Although there are success stories of employing dimensionality reduction methods, the choice of which methods to use, and how to use them to improve the effectiveness of the nearest neighbor algorithm, remains an open problem.
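
As an illustration of this idea (my own sketch, not taken from the thesis), a dimensionality reduction step can simply be placed in front of kNN. The Python sketch below uses scikit-learn on synthetic data; the component count and k are arbitrary choices.

    # A minimal sketch (not from the thesis): kNN on raw attributes
    # versus kNN on PCA-reduced features, using scikit-learn.
    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline

    # Synthetic stand-in for a high-dimensional dataset (e.g. a microarray):
    # few instances, many attributes.
    X, y = make_classification(n_samples=100, n_features=2000,
                               n_informative=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # k=3 and 20 components are arbitrary choices for the sketch.
    raw_knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
    pca_knn = make_pipeline(PCA(n_components=20),
                            KNeighborsClassifier(n_neighbors=3))
    pca_knn.fit(X_train, y_train)

    print("raw kNN accuracy:", raw_knn.score(X_test, y_test))
    print("PCA + kNN accuracy:", pca_knn.score(X_test, y_test))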

The thesis examines the research question of how to learn effectively with the nearest neighbor method in high dimensions. This question was broken into three smaller questions, which were addressed by developing effective and efficient nearest neighbor algorithms that leverage dimensionality reduction. The algorithm design reduces the features first and then constructs classifiers on the reduced features to improve the accuracy of the nearest neighbor algorithm. Finally, the use of dimensionality reduction to form nearest neighbor ensembles was investigated.

A series of empirical studies was conducted to determine which dimensionality reduction techniques can enhance the performance of the nearest neighbor algorithm in high dimensions. Building on the initial results, further empirical studies demonstrated that feature fusion and classifier fusion can improve the accuracy further. Two feature and classifier fusion techniques were proposed, and the circumstances in which these techniques should be applied were examined. Furthermore, the choice of the dimensionality reduction method for feature and classifier fusion was investigated; the results indicate that feature fusion is sensitive to this choice. Finally, the use of dimensionality reduction in nearest neighbor ensembles was investigated. The results demonstrate that data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, can be used to select the nearest neighbor ensemble depending on the type of data.
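
The two data complexity measures mentioned above are straightforward to compute; the sketch below (my own, assuming a numeric two-class dataset held in numpy arrays, and using the common maximum-over-attributes variant of Fisher's ratio) shows their usual definitions.

    # A sketch (not from the thesis) of the two complexity measures:
    # attribute-to-instance ratio and Fisher's discriminant ratio.
    import numpy as np

    def attribute_to_instance_ratio(X):
        # Number of attributes divided by number of instances;
        # large values signal "wide" data such as microarrays.
        n_instances, n_attributes = X.shape
        return n_attributes / n_instances

    def fishers_discriminant_ratio(X, y):
        # Maximum per-attribute Fisher ratio for a two-class problem:
        # (mu1 - mu2)^2 / (var1 + var2), computed attribute-wise.
        X1, X2 = X[y == 0], X[y == 1]
        num = (X1.mean(axis=0) - X2.mean(axis=0)) ** 2
        den = X1.var(axis=0) + X2.var(axis=0)
        return float(np.max(num / (den + 1e-12)))  # guard zero variance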

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024, p. 62
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-003
Keywords [en]
Nearest Neighbor, High-Dimensional Data, Curse of Dimensionality, Dimensionality Reduction
HSV category
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-225627
ISBN: 978-91-8014-645-6 (print)
ISBN: 978-91-8014-646-3 (digital)
OAI: oai:DiVA.org:su-225627
DiVA, id: diva2:1829796
Public defence
2024-03-05, lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Opponent
Supervisor
Available from: 2024-02-09 Created: 2024-01-19 Last updated: 2024-02-02 Bibliographically approved
List of papers
1. Reducing High-Dimensional Data by Principal Component Analysis vs. Random Projection for Nearest Neighbor Classification
2006 (English) In: Proceedings of the Fifth International Conference on Machine Learning and Applications, 2006. Conference paper, published paper (refereed)
Identifiers
urn:nbn:se:su:diva-37977 (URN)
Available from: 2011-01-18 Created: 2010-03-24 Last updated: 2024-01-19 Bibliographically approved
2. Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods
2007 (English) In: Intelligent Data Engineering and Automated Learning - IDEAL 2007 / [ed] Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao, Berlin, Heidelberg: Springer Verlag, 2007, pp. 800-809. Conference paper, published paper (refereed)
Abstract [en]

Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. It is observed that all dimensionality reduction methods result in more accurate classifiers than using the raw attributes, and that both PCA and PLS reach their best accuracies with fewer components than the other two methods, whereas RP needs far more components than the others to outperform kNN on the non-reduced data set. None of the dimensionality reduction methods can be concluded to generally outperform the others, although PLS is superior on all four binary classification tasks. The main conclusion from the study is that the choice of dimensionality reduction method can be of major importance when classifying microarrays with kNN.
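
A rough sketch of such a comparison (mine, not the paper's code) is shown below; scikit-learn's mutual_info_classif is used as a stand-in for information-gain feature ranking, and PLS components are taken from PLSRegression. The component count k=10 is an arbitrary choice.

    # A sketch (not the paper's code): the four reduction methods
    # in front of a kNN classifier.
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.decomposition import PCA
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.random_projection import GaussianRandomProjection

    def fit_reducer(name, X_train, y_train, k=10):
        # Fit one reduction method on the training set only.
        if name == "PCA":
            return PCA(n_components=k).fit(X_train)
        if name == "RP":
            return GaussianRandomProjection(n_components=k,
                                            random_state=0).fit(X_train)
        if name == "PLS":
            return PLSRegression(n_components=k).fit(X_train, y_train)
        if name == "IG":
            return SelectKBest(mutual_info_classif, k=k).fit(X_train, y_train)
        raise ValueError(name)

    def knn_accuracy(name, X_train, y_train, X_test, y_test):
        r = fit_reducer(name, X_train, y_train)
        knn = KNeighborsClassifier().fit(r.transform(X_train), y_train)
        return knn.score(r.transform(X_test), y_test)

    # for name in ("PCA", "RP", "PLS", "IG"):
    #     print(name, knn_accuracy(name, X_train, y_train, X_test, y_test))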

Place, publisher, year, edition, pages
Berlin, Heidelberg: Springer Verlag, 2007
Series
Lecture Notes in Computer Science ; 4881/2007
HSV category
Identifiers
urn:nbn:se:su:diva-37828 (URN)
10.1007/978-3-540-77226-2_80 (DOI)
978-3-540-77225-5 (ISBN)
Conference
8th International Conference on Intelligent Data Engineering and Automated Learning, LNCS 4881
Available from: 2010-03-23 Created: 2010-03-23 Last updated: 2024-01-19 Bibliographically approved
3. Fusion of Dimensionality Reduction Methods: A Case Study in Microarray Classification
2009 (English) In: Proceedings of the 12th International Conference on Information Fusion, 2009. Conference paper, published paper (refereed)
Identifiers
urn:nbn:se:su:diva-33439 (URN)
Available from: 2009-12-23 Created: 2009-12-23 Last updated: 2024-01-19
4. Improving Fusion of Dimensionality Reduction Methods for Nearest Neighbor Classification
2009 (English) In: Proceedings of the Eighth International Conference on Machine Learning and Applications, IEEE Computer Society, 2009. Conference paper, published paper (refereed)
Place, publisher, year, edition, pages
IEEE Computer Society, 2009
HSV category
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-135392 (URN)
978-0-7695-3926-3 (ISBN)
Available from: 2016-11-08 Created: 2016-11-08 Last updated: 2024-01-19
5. Choice of Dimensionality Reduction Methods for Feature and Classifier Fusion with Nearest Neighbor Classifiers
2012 (English) In: 15th International Conference on Information Fusion, IEEE Computer Society Digital Library, 2012, pp. 875-881. Conference paper, published paper (refereed)
Abstract [en]

High-dimensional data often cause problems for current learning algorithms in terms of efficiency and effectiveness. One solution is to apply dimensionality reduction, by which the original feature set is reduced to a small number of features while improving the accuracy and/or efficiency of the learning algorithm. We have investigated multiple dimensionality reduction methods for nearest neighbor classification in high dimensions. In previous studies, we demonstrated that fusing the outputs of different dimensionality reduction methods, either by combining classifiers built on the reduced features or by combining the reduced features and then applying the classifier, may yield higher accuracies than individual reduction methods. However, none of the previous studies investigated which dimensionality reduction methods to choose for fusion when the outputs of multiple methods are available. We have therefore empirically investigated different combinations of the outputs of four dimensionality reduction methods on 18 medicinal chemistry datasets. The investigation demonstrates that fusion of nearest neighbor classifiers obtained from multiple reduction methods outperforms the use of individual dimensionality reduction methods in all cases, while fusion of different feature subsets is quite sensitive to the choice of dimensionality reduction methods.
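
The two fusion strategies can be sketched as follows (my own illustration, not the paper's implementation): feature fusion concatenates the reduced feature sets and trains a single kNN, while classifier fusion trains one kNN per reduced representation and majority-votes their predictions.

    # A sketch (not the paper's code) of the two fusion strategies.
    # `reducers` is a list of fitted objects with a transform method
    # (e.g. the fit_reducer outputs from the earlier sketch).
    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    def feature_fusion_predict(reducers, X_train, y_train, X_test):
        # Combine reduced features first, then apply one classifier.
        Z_train = np.hstack([r.transform(X_train) for r in reducers])
        Z_test = np.hstack([r.transform(X_test) for r in reducers])
        return KNeighborsClassifier().fit(Z_train, y_train).predict(Z_test)

    def classifier_fusion_predict(reducers, X_train, y_train, X_test):
        # One kNN per reduced representation; combine by majority vote.
        votes = np.array([
            KNeighborsClassifier()
            .fit(r.transform(X_train), y_train)
            .predict(r.transform(X_test))
            for r in reducers])                   # (n_reducers, n_test)
        # Majority vote per test instance; assumes non-negative int labels.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(),
                                   0, votes)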

Place, publisher, year, edition, pages
IEEE Computer Society Digital Library, 2012
Keywords
machine learning, nearest neighbor classifier, dimensionality reduction, feature fusion, classifier fusion
HSV category
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-82219 (URN)
978-1-4673-0417-7 (ISBN)
978-0-9824438-4-2 (ISBN)
Conference
15th International Conference on Information Fusion, 9-12 July 2012, Singapore
Available from: 2012-11-12 Created: 2012-11-12 Last updated: 2024-01-19 Bibliographically approved
6. Random subspace and random projection nearest neighbor ensembles for high dimensional data
2022 (English) In: Expert Systems with Applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 191, article id 116078. Article in journal (refereed). Published
Abstract [en]

The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high-dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both methods lead to improvements in predictive performance compared to the standard nearest neighbor classifier, while the best method to use depends on the type of data: for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, provides more detailed indications of what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.
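
As a rough illustration (my own sketch, not the article's implementation): the random subspace method can be approximated with scikit-learn's BaggingClassifier by sampling attributes without bootstrapping instances, while a random projection ensemble can be built from differently seeded projections. The member count, subspace fraction, and component count below are arbitrary, and the `estimator` parameter name assumes scikit-learn >= 1.2.

    # A sketch (not the article's code) of the two kNN ensembles.
    import numpy as np
    from sklearn.ensemble import BaggingClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.random_projection import SparseRandomProjection

    # Random subspace: each member sees a random 10% of the attributes.
    subspace_ensemble = BaggingClassifier(
        estimator=KNeighborsClassifier(),   # scikit-learn >= 1.2
        n_estimators=25, max_features=0.1,
        bootstrap=False, bootstrap_features=False)

    def rp_ensemble_predict(X_train, y_train, X_test,
                            n_members=25, n_components=50):
        # Each member is a kNN trained on a differently seeded projection.
        votes = []
        for seed in range(n_members):
            rp = SparseRandomProjection(n_components=n_components,
                                        random_state=seed).fit(X_train)
            knn = KNeighborsClassifier().fit(rp.transform(X_train), y_train)
            votes.append(knn.predict(rp.transform(X_test)))
        # Majority vote; assumes non-negative integer class labels.
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(),
                                   0, np.array(votes))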

Keywords
Nearest neighbor ensemble, High dimensional data, Random subspace method, Random projection method
HSV category
Research subject
Computer and Systems Sciences
Identifiers
urn:nbn:se:su:diva-200514 (URN)
10.1016/J.ESWA.2021.116078 (DOI)
000736167200011
Available from: 2022-01-06 Created: 2022-01-06 Last updated: 2024-01-19 Bibliographically approved

Open Access in DiVA

fulltext (851 kB)
