Random subspace and random projection nearest neighbor ensembles for high dimensional data
Deegalla, Sampath (Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences; University of Peradeniya, Sri Lanka). ORCID iD: 0000-0003-1100-8334
University of Peradeniya, Sri Lanka.
Papapetrou, Panagiotis (Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences). ORCID iD: 0000-0002-4632-4815
KTH Royal Institute of Technology, Sweden.
2022 (English). In: Expert Systems with Applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 191, article id 116078. Article in journal (Refereed). Published.
Abstract [en]

The random subspace and the random projection methods are investigated and compared as techniques for forming ensembles of nearest neighbor classifiers in high dimensional feature spaces. The two methods have been empirically evaluated on three types of high-dimensional datasets: microarrays, chemoinformatics, and images. Experimental results on 34 datasets show that both the random subspace and the random projection method lead to improvements in predictive performance compared to using the standard nearest neighbor classifier, while the best method to use depends on the type of data considered; for the microarray and chemoinformatics datasets, random projection outperforms the random subspace method, while the opposite holds for the image datasets. An analysis using data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, provides more detailed indications of what relative performance can be expected for specific datasets. The results also indicate that the resulting ensembles may be competitive with state-of-the-art ensemble classifiers; the nearest neighbor ensembles using random projection perform on par with random forests for the microarray and chemoinformatics datasets.

Place, publisher, year, edition, pages
2022. Vol. 191, article id 116078
Keywords [en]
Nearest neighbor ensemble, High dimensional data, Random subspace method, Random projection method
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-200514
DOI: 10.1016/j.eswa.2021.116078
ISI: 000736167200011
OAI: oai:DiVA.org:su-200514
DiVA id: diva2:1625293
Available from: 2022-01-06. Created: 2022-01-06. Last updated: 2024-01-19. Bibliographically approved.
In thesis
1. Nearest Neighbor Classification in High Dimensions
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The simple k nearest neighbor (kNN) method can be used to learn from high dimensional data such as images and microarrays without any modification to the original version of the algorithm. However, studies show that kNN's accuracy is often poor in high dimensions due to the curse of dimensionality; a large number of instances are required to maintain a given level of accuracy in high dimensions. Furthermore, distance measurements such as the Euclidean distance may be meaningless in high dimensions. As a result, dimensionality reduction could be used to assist nearest neighbor classifiers in overcoming the curse of dimensionality. Although there are success stories of employing dimensionality reduction methods, the choice of which methods to use remains an open problem. This includes understanding how they should be used to improve the effectiveness of the nearest neighbor algorithm.
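The claim that Euclidean distances "may be meaningless in high dimensions" refers to distance concentration: as the dimensionality grows, the nearest and farthest points from a query become relatively indistinguishable. A short numpy illustration (not from the thesis) of the shrinking relative contrast:

```python
# Distance concentration demo: the relative contrast
# (d_max - d_min) / d_min shrinks as dimensionality grows.
import numpy as np

rng = np.random.default_rng(0)
contrasts = {}
for d in (2, 100, 10_000):
    points = rng.standard_normal((1000, d))   # 1000 random points in d dims
    query = rng.standard_normal(d)            # one random query point
    dists = np.linalg.norm(points - query, axis=1)
    contrasts[d] = (dists.max() - dists.min()) / dists.min()
    print(d, round(contrasts[d], 3))
```

In 2 dimensions the farthest point is many times farther than the nearest; in 10,000 dimensions all 1000 distances are within a few percent of each other, so the "nearest" neighbor carries little information.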

The thesis examines the research question of how to learn effectively with the nearest neighbor method in high dimensions. This question was broken down into three smaller questions, which were addressed by developing effective and efficient nearest neighbor algorithms that leverage dimensionality reduction. The algorithm design was based on feature reduction, with classification algorithms constructed on the reduced features to improve the accuracy of the nearest neighbor algorithm. Finally, the use of dimensionality reduction for forming nearest neighbor ensembles was investigated.
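The basic design described above, a classifier built on the output of a dimensionality reduction step, can be sketched as a simple pipeline. PCA here is an illustrative stand-in; the thesis evaluates several reduction methods, and the dataset and component count below are arbitrary:

```python
# Sketch: kNN on reduced features vs. plain kNN (PCA as an illustrative
# stand-in for the dimensionality reduction step; not the thesis's code).
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Far more features (1000) than instances (200): the regime the thesis targets.
X, y = make_classification(n_samples=200, n_features=1000, n_informative=30,
                           random_state=0)

plain_knn = KNeighborsClassifier(n_neighbors=5)
reduced_knn = make_pipeline(PCA(n_components=20, random_state=0),
                            KNeighborsClassifier(n_neighbors=5))

for name, clf in [("plain kNN", plain_knn), ("PCA + kNN", reduced_knn)]:
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {score:.3f}")
```

Fitting the reduction step inside the pipeline keeps it within each cross-validation fold, so the test folds are never used to choose the projection.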

A series of empirical studies were conducted to determine which dimensionality reduction techniques could be used to enhance the performance of the nearest neighbor algorithm in high dimensions. Building on the results of these initial studies, further empirical studies demonstrated that feature fusion and classifier fusion could be used to improve the accuracy further. Two feature and classifier fusion techniques were proposed, and the circumstances in which these techniques should be applied were examined. Furthermore, the choice of dimensionality reduction method for feature and classifier fusion was investigated; the results indicate that feature fusion is sensitive to this choice. Finally, the use of dimensionality reduction in nearest neighbor ensembles was investigated. The results demonstrate that data complexity measures, such as the attribute-to-instance ratio and Fisher's discriminant ratio, can be used to select the nearest neighbor ensemble depending on the type of data.
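The two data complexity measures named above are simple to compute; a sketch of one common reading of them (the thesis's exact definitions may differ), with the Fisher ratio taken as the per-feature maximum for a two-class problem, i.e. Ho and Basu's F1 measure:

```python
# Sketch of the two data complexity measures mentioned in the abstract
# (one common formulation; not necessarily the thesis's exact definitions).
import numpy as np

def attribute_to_instance_ratio(X):
    """Number of attributes divided by number of instances; >> 1 signals
    a high-dimensional, sample-starved dataset."""
    n_instances, n_attributes = X.shape
    return n_attributes / n_instances

def fishers_discriminant_ratio(X, y):
    """Per-feature (mu1 - mu2)^2 / (var1 + var2) for a two-class problem;
    the maximum over features is Ho & Basu's F1 complexity measure."""
    c1, c2 = X[y == 0], X[y == 1]
    num = (c1.mean(axis=0) - c2.mean(axis=0)) ** 2
    den = c1.var(axis=0) + c2.var(axis=0)
    return np.max(num / np.maximum(den, 1e-12))

# Toy two-class data: 60 instances, 200 features, class means shifted by 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 200)), rng.normal(1, 1, (30, 200))])
y = np.repeat([0, 1], 30)

print(attribute_to_instance_ratio(X))   # 200 attributes / 60 instances
print(fishers_discriminant_ratio(X, y))
```

A high attribute-to-instance ratio with a low Fisher ratio indicates a dataset where dimensionality reduction has the most to offer the nearest neighbor method.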

Place, publisher, year, edition, pages
Stockholm: Department of Computer and Systems Sciences, Stockholm University, 2024. p. 62
Series
Report Series / Department of Computer & Systems Sciences, ISSN 1101-8526 ; 24-003
Keywords
Nearest Neighbor, High-Dimensional Data, Curse of Dimensionality, Dimensionality Reduction
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-225627
ISBN: 978-91-8014-645-6
ISBN: 978-91-8014-646-3
Public defence
2024-03-05, lilla hörsalen, NOD-huset, Borgarfjordsgatan 12, Kista, 13:00 (English)
Available from: 2024-02-09. Created: 2024-01-19. Last updated: 2024-02-02. Bibliographically approved.

Open Access in DiVA

fulltext: FULLTEXT01.pdf, 1229 kB, application/pdf

Other links
Publisher's full text

Authority records
Deegalla, Sampath; Papapetrou, Panagiotis
