Pooling individual participant data from randomized controlled trials: Exploring potential loss of information
Number of Authors: 14
2020 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 15, no 5, article id e0232970
Article in journal (Refereed). Published
Abstract [en]

Background: Pooling individual participant data to enable pooled analyses is often complicated by diversity in variables across the available datasets. Recoding original variables is therefore often necessary to build a pooled dataset. We aimed to quantify how much information is lost in this process and to what extent this jeopardizes the validity of analysis results.

Methods: Data were derived from a platform that was developed to pool data from three randomized controlled trials on the effect of treatment of cardiovascular risk factors on cognitive decline or dementia. We quantified loss of information using the R-squared of linear regression models with pooled variables as a function of their original variable(s). Where the R-squared was below 0.8, we additionally explored the potential impact of the loss of information on future analyses. In this second step we checked whether the beta coefficient of the predictor differed by more than 10% depending on whether the original or the recoded variable was added as a confounder in a linear regression model. In a simulation we randomly sampled numbers, recoded those ≤ 1000 to 0 and those > 1000 to 1, and varied the range of the continuous variable, the ratio of recoded zeroes to recoded ones, or both, and again extracted the R-squared from linear models to quantify information loss.

Results: The R-squared was below 0.8 for 8 out of 91 recoded variables. In 4 cases this had a substantial impact on the regression models, particularly when a continuous variable was recoded into a discrete variable. Our simulation showed that the least information is lost when the ratio of recoded zeroes to ones is 1:1.

Conclusions: Large, pooled datasets provide great opportunities, justifying the efforts for data harmonization. Still, caution is warranted when using recoded variables whose variance is only partly explained by their original variables, as this may jeopardize the validity of study results.
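The simulation described in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the uniform distribution, the sample size, the way the zero-to-one ratio is varied (by widening the sampled range), and the names `r_squared` and `simulate` are all assumptions made for illustration. Only the cutoff of 1000 and the use of R-squared from a linear model come from the abstract.

```python
import numpy as np


def r_squared(original, recoded):
    """R-squared of a simple linear regression of `recoded` on `original`."""
    slope, intercept = np.polyfit(original, recoded, deg=1)
    fitted = slope * original + intercept
    ss_res = np.sum((recoded - fitted) ** 2)
    ss_tot = np.sum((recoded - recoded.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def simulate(share_zero, n=10_000, cutoff=1000, seed=0):
    """Sample a continuous variable, dichotomize it at `cutoff`, return R-squared.

    Assumption: values are uniform on [0, cutoff / share_zero], so that roughly
    `share_zero` of them are recoded to 0 and the rest to 1.
    """
    rng = np.random.default_rng(seed)
    upper = cutoff / share_zero
    x = rng.uniform(0, upper, size=n)
    y = (x > cutoff).astype(float)  # <= 1000 becomes 0, > 1000 becomes 1
    return r_squared(x, y)


# Vary the share of recoded zeroes and collect the resulting R-squared values.
results = {share: simulate(share) for share in (0.1, 0.3, 0.5, 0.7, 0.9)}

for share, r2 in results.items():
    print(f"share of zeroes = {share:.1f}  R-squared = {r2:.3f}")
```

Under these assumptions the R-squared peaks when zeroes and ones are balanced, in line with the 1:1 finding reported in the Results. For a uniform variable the theoretical value is 3p(1 - p), where p is the share of ones, so even the balanced case tops out at 0.75; other distributions would give different numbers.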

Place, publisher, year, edition, pages
2020. Vol. 15, no 5, article id e0232970
National Category
Public Health, Global Health and Social Medicine; Neurology; Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:su:diva-182999
DOI: 10.1371/journal.pone.0232970
ISI: 000537475000031
PubMedID: 32396543
OAI: oai:DiVA.org:su-182999
DiVA, id: diva2:1451180
Available from: 2020-07-02. Created: 2020-07-02. Last updated: 2025-02-20. Bibliographically approved.

Authority records

Guillemont, Juliette

By organisation
Aging Research Center (ARC), (together with KI)