An Information Theoretic Approach to Prevalence Estimation and Missing Data
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0003-2767-8818
Number of authors: 4
2024 (English)
In: IEEE Transactions on Information Theory, ISSN 0018-9448, E-ISSN 1557-9654, Vol. 70, no. 5, pp. 3567-3582
Journal article (Refereed), Published
Abstract [en]

Many data sources, ranging from tracking of social behavior to election polling to testing studies for understanding disease spread, are subject to sampling bias whose implications are not yet fully understood. In this paper we study estimation of a given feature (such as disease, or behavior on social media platforms) from biased samples, treating non-respondent individuals as missing data. Prevalence of the feature among sampled individuals has an upward bias under the assumption of individuals’ willingness to be sampled. This can be viewed as a regression model with symptoms as covariates and the feature as outcome. It is assumed that the outcome is unknown at the time of sampling, and therefore the missingness mechanism depends only on the covariates. We show that data, in spite of this, is missing at random only when the sizes of symptom classes in the population are known; otherwise data is missing not at random. From an information theoretic viewpoint, we show that sampling bias corresponds to external information due to individuals in the population knowing their covariates, and we quantify this external information by active information. The reduction in prevalence, when sampling bias is adjusted for, similarly translates into active information due to bias correction, with opposite sign to active information due to testing bias. We develop unified results that show that prevalence and active information estimates are asymptotically normal under all missing data mechanisms, both when testing errors are absent and when they are present. The asymptotic behavior of the estimators is illustrated through simulations.
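
As a rough illustration of the upward bias described in the abstract, the Python sketch below compares a naive prevalence estimate with one weighted by known symptom class sizes. The two class names, all counts and shares, and the simple weighting estimator are assumptions made up for this example; they are not taken from the paper and are not its estimators.

```python
# Hypothetical illustration only: the numbers and names below are assumptions,
# not data or methods from the paper.

def naive_prevalence(sampled):
    """Share of positives among all sampled individuals, ignoring symptom class."""
    tested = sum(n for n, _ in sampled.values())
    positive = sum(k for _, k in sampled.values())
    return positive / tested


def adjusted_prevalence(sampled, class_shares):
    """Weight each class's positive rate by its (assumed known) population share."""
    return sum(class_shares[c] * k / n for c, (n, k) in sampled.items())


# Assumed population shares of the two symptom classes.
class_shares = {"symptomatic": 0.2, "asymptomatic": 0.8}

# Assumed sample as (number tested, number positive) per class.
# Symptomatic individuals are over-represented relative to their 0.2 share.
sampled = {"symptomatic": (800, 400), "asymptomatic": (200, 10)}

naive = naive_prevalence(sampled)
adjusted = adjusted_prevalence(sampled, class_shares)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")
```

With these assumed counts the naive estimate is 0.41 and the weighted estimate is 0.14. The weighting is only possible because the class shares are treated as known, which is the condition the abstract gives for the data being missing at random.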

Place, publisher, year, edition, pages
2024. Vol. 70, no. 5, pp. 3567-3582
Keywords [en]
Active information, asymptotic normality, biased estimate, missing data, testing errors
National subject category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:su:diva-231614
DOI: 10.1109/TIT.2023.3327399
ISI: 001217153500037
Scopus ID: 2-s2.0-85176319701
OAI: oai:DiVA.org:su-231614
DiVA, id: diva2:1887176
Available from: 2024-08-07 Created: 2024-08-07 Last updated: 2024-08-07 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text
Scopus

Person

Hössjer, Ola