Robustness of prevalence estimates derived from misclassified data from administrative databases

Martin Ladouceur; Elham Rahme; Christian A Pineau; Lawrence Joseph

doi:10.1111/j.1541-0420.2006.00665.x

Biometrics. 2007 Mar;63(1):272-9. doi: 10.1111/j.1541-0420.2006.00665.x.

Authors

Martin Ladouceur¹, Elham Rahme, Christian A Pineau, Lawrence Joseph

Affiliation

¹ Division of Clinical Epidemiology, Montreal General Hospital, 687 Pine Avenue West, V-Building, Montreal, Quebec H3A 1A1, Canada.

PMID: 17447953

Abstract

Because primary data collection can be expensive, researchers are increasingly using information collected in medical administrative databases for scientific purposes. This information, however, is typically collected for reasons other than research, and many such databases have been shown to contain substantial proportions of misclassification errors. For example, many administrative databases contain fields for patient diagnostic codes, but these are often missing or inaccurate, in part because physician reimbursement schemes depend on medical acts performed rather than any diagnosis. Errors in ascertaining which individuals have a given disease bias not only prevalence estimates, but also estimates of associations between the disease and other variables, such as medication use. We attempt to estimate the prevalence of osteoarthritis (OA) among elderly Quebeckers using a government administrative database. We compare a naive estimate relying solely on the physician diagnoses of OA listed in the database to estimates from several different Bayesian latent class models which adjust for misclassified physician diagnostic codes via use of other available diagnostic clues. We find that the prevalence estimates vary widely, depending on the model used and assumptions made. We conclude that any inferences from these databases need to be interpreted with great caution, until further work estimating the reliability of database items is carried out.
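To illustrate why misclassified diagnostic codes bias a naive prevalence estimate, and why corrected estimates depend on the assumptions made, the following is a minimal sketch using the standard sensitivity/specificity relationship and its algebraic inversion (the Rogan-Gladen correction). It is not the Bayesian latent class approach described in the abstract, only a simpler stand-in for the same idea. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def apparent_prevalence(true_prev, sensitivity, specificity):
    """Expected proportion coded as diseased, given imperfect coding.

    True positives contribute sensitivity * true_prev; false positives
    contribute (1 - specificity) * (1 - true_prev).
    """
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def rogan_gladen(apparent_prev, sensitivity, specificity):
    """Invert the relationship above to recover the true prevalence.

    Assumes sensitivity and specificity are known exactly, which is
    rarely the case for administrative data.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    estimate = (apparent_prev + specificity - 1.0) / denom
    # Clamp to [0, 1]; sampling noise can push the raw value outside.
    return min(1.0, max(0.0, estimate))


if __name__ == "__main__":
    # Hypothetical values: true prevalence 30%, codes catch 60% of cases.
    naive = apparent_prevalence(0.30, sensitivity=0.60, specificity=0.95)
    print(f"naive estimate: {naive:.3f}")

    # The same naive estimate corrected under different assumed accuracies.
    for se, sp in [(0.60, 0.95), (0.50, 0.95), (0.80, 0.90)]:
        corrected = rogan_gladen(naive, se, sp)
        print(f"assumed Se={se:.2f}, Sp={sp:.2f} -> corrected {corrected:.3f}")
```

In this sketch the corrected estimate shifts noticeably as the assumed sensitivity and specificity change, which mirrors the abstract's point that prevalence estimates vary with the model and the assumptions behind it.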

MeSH terms

Bayes Theorem
Databases, Factual / statistics & numerical data*
Diagnostic Errors / statistics & numerical data*
Humans
Prevalence
Probability
Radiography / standards
Reproducibility of Results
Sensitivity and Specificity