Evaluation of Dimensionality Reduction Techniques: Principal Feature Analysis in case of Text Classification Problems
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. ORCID iD: 0000-0001-7713-1381
2020 (English). In: ICCDE 2020: Proceedings of 2020 the 6th International Conference on Computing and Data Engineering, Association for Computing Machinery (ACM), 2020, p. 75-79. Conference paper, Published paper (Refereed)
Abstract [en]

A commonly observed phenomenon in text classification is the sparsity of the generated feature set. Various dimensionality reduction techniques have been developed to reduce feature spaces to a size from which a learning algorithm can infer. Among these, Principal Component Analysis (PCA) is a well-established technique capable of generating an undistorted view of the data; as a result, variants of the algorithm have been developed and applied in several domains, including text mining. However, once PCA has projected the initial features into a new space, it provides no backward traceability to the original features. It also requires relatively large computational space, since all original features are involved in generating the final ones. These drawbacks are especially problematic in text classification, where high dimensionality and sparsity are common. This paper presents a modified version of PCA, Principal Feature Analysis (PFA), which preserves backward traceability by choosing an optimal subset of features in the original space using the same criteria as PCA, without involving the initial features in the final computation. The proposed technique was tested on benchmark corpora and produced results comparable to PCA while maintaining traceability to the original feature space.
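
The selection procedure the abstract describes can be sketched roughly as follows. This is a minimal illustration of the common PFA formulation (cluster the rows of the PCA loading matrix, then keep the original feature nearest each cluster centre), not the paper's own code; the function name and parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def principal_feature_analysis(X, n_select, n_components=None):
    """Pick n_select original columns of X using PCA criteria,
    keeping the result traceable to the original feature space."""
    Xc = X - X.mean(axis=0)
    # Core PCA step: eigendecomposition of the covariance matrix.
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # descending variance
    q = n_components or n_select
    A = eigvecs[:, order[:q]]                  # row i = feature i in PC space
    # Cluster the feature rows; the feature closest to each centroid
    # represents its cluster, so selected indices map back to X's columns.
    km = KMeans(n_clusters=n_select, n_init=10, random_state=0).fit(A)
    selected = {int(np.argmin(np.linalg.norm(A - c, axis=1)))
                for c in km.cluster_centers_}
    return sorted(selected)
```

Unlike plain PCA, the output here is a list of column indices into the original (sparse, high-dimensional) matrix, which is what makes the reduced representation interpretable for text features.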

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020. p. 75-79
Keywords [en]
Feature extraction, Text mining, Text classification, Principal Component Analysis, Principal Feature Analysis
National Category
Computer Sciences
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-184114
DOI: 10.1145/3379247.3379274
ISBN: 978-1-4503-7673-0 (electronic)
OAI: oai:DiVA.org:su-184114
DiVA, id: diva2:1458026
Conference
ICCDE 2020: 2020 The 6th International Conference on Computing and Data Engineering, Sanya, China, 4-6 January, 2020
Available from: 2020-08-13 Created: 2020-08-13 Last updated: 2022-02-26. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Lindgren, Tony
