Optimizing the Dimensionality of Clinical Term Spaces for Improved Diagnosis Coding Support
Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences.
2013 (English). In: Proceedings of the 4th International Louhi Workshop on Health Document Text Mining and Information Analysis (Louhi 2013) / [ed] Hanna Suominen, NICTA, 2013. Conference paper, Published paper (Refereed)
Abstract [en]

In natural language processing, dimensionality reduction is a common technique to reduce complexity that simultaneously addresses the sparseness property of language. It is also used as a means to capture some latent structure in text, such as the underlying semantics. Dimensionality reduction is an important property of the word space model, not least in random indexing, where the dimensionality is a predefined model parameter. In this paper, we demonstrate the importance of dimensionality optimization and discuss correlations between dimensionality and the size of the vocabulary. This is of particular importance in the clinical domain, where the level of noise in the text leads to a large vocabulary; it may also mitigate the effect of exploding vocabulary sizes when modeling multiword terms as single tokens. A system that automatically assigns diagnosis codes to patient record entries is shown to improve by up to 18 percentage points by manually optimizing the dimensionality.
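
To illustrate what it means for dimensionality to be a predefined model parameter in random indexing, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the default values for `dim`, `nonzeros` and `window`, and the toy sentences are all assumptions made for illustration. Each word receives a sparse index vector of fixed length `dim` with a few +1/-1 entries, and a term's context vector is the sum of the index vectors of the words that co-occur with it inside a sliding window.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzeros, rng):
    """Sparse ternary index vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    # Alternate signs so the vector holds a balanced mix of +1 and -1.
    return {pos: (1 if i % 2 == 0 else -1) for i, pos in enumerate(positions)}


def train_random_indexing(sentences, dim=1000, nonzeros=8, window=2, seed=0):
    """Build context vectors of fixed length `dim` (hypothetical defaults)."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0.0] * dim)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue
                neighbour = tokens[j]
                if neighbour not in index_vectors:
                    index_vectors[neighbour] = make_index_vector(dim, nonzeros, rng)
                # Add the neighbour's index vector to the target's context vector.
                vec = context_vectors[target]
                for pos, sign in index_vectors[neighbour].items():
                    vec[pos] += sign
    return dict(context_vectors)


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy usage: "fever" and "chills" occur in identical contexts.
toy_sentences = [["fever", "and", "cough"], ["chills", "and", "cough"]]
toy_space = train_random_indexing(toy_sentences, dim=50, nonzeros=4)
similarity = cosine(toy_space["fever"], toy_space["chills"])
```

In a sketch like this, the vocabulary can grow without changing the vector length, so `dim` has to be chosen up front. Tuning it would amount to retraining with several values of `dim` and comparing performance on a downstream task, which is the kind of optimization the abstract describes.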

Place, publisher, year, edition, pages
NICTA, 2013.
Keyword [en]
distributional semantics, random indexing, semantic space, dimensionality reduction, multiword terms, diagnosis codes
National Category
Information Systems
Research subject
Computer and Systems Sciences
Identifiers
URN: urn:nbn:se:su:diva-97228
OAI: oai:DiVA.org:su-97228
DiVA: diva2:676272
Conference
4th International Louhi Workshop on Health Document Text Mining and Information Analysis, Sydney, NSW, Australia, 11-12 February 2013
Available from: 2013-12-05. Created: 2013-12-05. Last updated: 2014-01-29. Bibliographically approved.

Open Access in DiVA

No full text

Other links

http://nicta.com.au/__data/assets/pdf_file/0005/37661/louhi2013_submission_12.pdf

By author/editor
Henriksson, Aron; Hassel, Martin