Classification of Microarrays with kNN: Comparison of Dimensionality Reduction Methods
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2007 (English) In: Intelligent Data Engineering and Automated Learning - IDEAL 2007 / [ed] Hujun Yin, Peter Tino, Emilio Corchado, Will Byrne, Xin Yao, Berlin, Heidelberg: Springer Verlag, 2007, pp. 800-809. Conference paper, published paper (refereed)
Abstract [en]

Dimensionality reduction can often improve the performance of the k-nearest neighbor classifier (kNN) for high-dimensional data sets, such as microarrays. The effect of the choice of dimensionality reduction method on the predictive performance of kNN for classifying microarray data is an open issue, and four common dimensionality reduction methods, Principal Component Analysis (PCA), Random Projection (RP), Partial Least Squares (PLS) and Information Gain (IG), are compared on eight microarray data sets. It is observed that all dimensionality reduction methods result in more accurate classifiers than what is obtained from using the raw attributes. Furthermore, it is observed that both PCA and PLS reach their best accuracies with fewer components than the other two methods, and that RP needs far more components than the others to outperform kNN on the non-reduced data set. None of the dimensionality reduction methods can be concluded to generally outperform the others, although PLS is shown to be superior on all four binary classification tasks. The main conclusion from the study is that the choice of dimensionality reduction method can be of major importance when classifying microarrays using kNN.
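As an illustration only (this does not reproduce the paper's experiments or data), a minimal NumPy sketch of one of the compared pipelines — PCA followed by kNN — on synthetic "microarray-like" data with few samples and many attributes:

```python
import numpy as np

def pca_fit(X, k):
    """Return the mean and top-k principal directions of X (via SVD)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def knn_predict(X_train, y_train, X_test, k=3):
    """Plain kNN: Euclidean distance, majority vote among k neighbors."""
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)
        votes = y_train[np.argsort(dist)[:k]]
        preds.append(np.bincount(votes).argmax())
    return np.array(preds)

# Synthetic data: 60 samples, 500 attributes, with only the first
# five attributes carrying class signal (the rest is noise).
rng = np.random.default_rng(0)
n, p = 60, 500
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 3.0 * y[:, None]

# Fit PCA on the training split only, then project both splits.
mu, W = pca_fit(X[:40], k=10)
Z_train, Z_test = (X[:40] - mu) @ W.T, (X[40:] - mu) @ W.T

preds = knn_predict(Z_train, y[:40], Z_test, k=3)
print("accuracy:", (preds == y[40:]).mean())
```

Fitting the projection on the training split and reusing it for the test split mirrors how any of the four compared reduction methods would be slotted in front of kNN.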

Place, publisher, year, edition, pages
Berlin, Heidelberg: Springer Verlag, 2007. pp. 800-809
Series
Lecture Notes in Computer Science ; 4881/2007
National subject category
Systems Science, Information Systems and Informatics
Identifiers
URN: urn:nbn:se:su:diva-37828
DOI: 10.1007/978-3-540-77226-2_80
ISBN: 978-3-540-77225-5 (print)
OAI: oai:DiVA.org:su-37828
DiVA id: diva2:305374
Conference
8th International Conference on Intelligent Data Engineering and Automated Learning, LNCS 4881
Available from: 2010-03-23 Created: 2010-03-23 Last updated: 2024-01-19 Bibliographically reviewed
Part of thesis
1. Nearest Neighbor Classification in High Dimensions
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The simple k nearest neighbor (kNN) method can be used to learn from high dimensional data such as images and microarrays without any modification to the original version of the algorithm. However, studies show that kNN's accuracy is often poor in high dimensions due to the curse of dimensionality; a large number of instances are required to maintain a given level of accuracy in high dimensions. Furthermore, distance measurements such as the Euclidean distance may be meaningless in high dimensions. As a result, dimensionality reduction could be used to assist nearest neighbor classifiers in overcoming the curse of dimensionality. Although there are success stories of employing dimensionality reduction methods, the choice of which methods to use remains an open problem. This includes understanding how they should be used to improve the effectiveness of the nearest neighbor algorithm.
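The claim that Euclidean distances become less meaningful in high dimensions is easy to observe empirically. The following sketch (illustrative only, not taken from the thesis) measures the relative contrast between the farthest and nearest neighbor of a query point as dimensionality grows:

```python
import numpy as np

# As dimensionality grows, the relative contrast between the nearest
# and farthest neighbor of a query point tends to shrink — one symptom
# of the curse of dimensionality for distance-based methods like kNN.
rng = np.random.default_rng(1)
for p in (2, 100, 10_000):
    X = rng.uniform(size=(500, p))
    d = np.linalg.norm(X[1:] - X[0], axis=1)   # distances from point 0
    contrast = (d.max() - d.min()) / d.min()
    print(f"dim={p:>6}  relative contrast={contrast:.3f}")
```

At low dimensionality the nearest neighbor is many times closer than the farthest; at very high dimensionality nearly all points sit at almost the same distance from the query, so the "nearest" neighbor carries little information.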

The thesis examines the research question of how to learn effectively with the nearest neighbor method in high dimensions. This question was broken into three smaller questions, which were addressed by developing effective and efficient nearest neighbor algorithms that leverage dimensionality reduction. The algorithm design combined feature reduction with classifiers constructed on the reduced features to improve the accuracy of the nearest neighbor algorithm. Finally, the use of dimensionality reduction to form nearest neighbor ensembles was investigated.

A series of empirical studies were conducted to determine which dimensionality reduction techniques could be used to enhance the performance of the nearest neighbor algorithm in high dimensions. Based on the results of the initial studies, further empirical studies were conducted and they demonstrated that feature fusion and classifier fusion could be used to improve the accuracy further. Two feature and classifier fusion techniques were proposed, and the circumstances in which these techniques should be applied were examined. Furthermore, the choice of the dimensionality reduction method for feature and classifier fusion was investigated. The results indicate that feature fusion is sensitive to the selection of the dimensionality reduction method. Finally, the use of dimensionality reduction in nearest neighbor ensembles was investigated. The results demonstrate that data complexity measures such as the attribute-to-instance ratio and Fisher's discriminant ratio can be used to select the nearest neighbor ensemble depending on the data type.
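As a rough illustration of the two fusion ideas studied (the thesis's own methods and experiments are not reproduced here), the sketch below contrasts feature fusion — concatenating two reduced representations before kNN — with classifier fusion — combining the votes of kNN classifiers trained on each representation separately. PCA and Gaussian random projection stand in as the two reduction methods:

```python
import numpy as np

def reduce_pca(X, k):
    """Project X onto its top-k principal components (fit on all of X
    for brevity; a real pipeline would fit on the training split)."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return (X - mu) @ Vt[:k].T

def reduce_rp(X, k, seed=0):
    """Gaussian random projection to k dimensions."""
    R = np.random.default_rng(seed).normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def knn_predict(Xtr, ytr, Xte, k=3):
    """Euclidean kNN with majority vote."""
    out = []
    for x in Xte:
        idx = np.argsort(np.linalg.norm(Xtr - x, axis=1))[:k]
        out.append(np.bincount(ytr[idx]).argmax())
    return np.array(out)

rng = np.random.default_rng(2)
n, p = 60, 300
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, p))
X[:, :5] += 2.5 * y[:, None]        # signal in the first 5 attributes
tr, te = slice(0, 40), slice(40, None)

A, B = reduce_pca(X, 10), reduce_rp(X, 10)

# Feature fusion: concatenate the two reduced representations,
# then train a single kNN classifier on the joint features.
F = np.hstack([A, B])
feat_fused = knn_predict(F[tr], y[tr], F[te])

# Classifier fusion: one kNN per representation, then a majority
# vote over their predictions (ties break toward class 1 here).
votes = np.stack([knn_predict(Z[tr], y[tr], Z[te]) for Z in (A, B)])
clf_fused = (votes.mean(axis=0) >= 0.5).astype(int)

print("feature fusion acc:   ", (feat_fused == y[te]).mean())
print("classifier fusion acc:", (clf_fused == y[te]).mean())
```

The two schemes differ in where the combination happens: feature fusion lets kNN see one joint distance over both representations, while classifier fusion keeps the representations (and their distance scales) separate and only merges decisions, which is why, as the abstract notes, feature fusion is more sensitive to the choice of reduction method.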

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024. p. 62
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-003
Keywords
Nearest Neighbor, High-Dimensional Data, Curse of Dimensionality, Dimensionality Reduction
National subject category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-225627
ISBN: 978-91-8014-645-6
ISBN: 978-91-8014-646-3
Public defence
2024-03-05, 13:00, lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista (English)
Available from: 2024-02-09 Created: 2024-01-19 Last updated: 2024-02-02 Bibliographically reviewed

Open Access in DiVA
fulltext (FULLTEXT01.pdf, application/pdf, 360 kB): 1322 downloads

Other links
Publisher's full text

Authors
Deegalla, Sampath; Boström, Henrik
