TY  - JOUR
T1  - Toward the Estimation of Unbiased Disease Prevalence Estimates Using Administrative Health Records
JF  - The Journal of Rheumatology
JO  - J Rheumatol
SP  - 1549
LP  - 1551
DO  - 10.3899/jrheum.190484
VL  - 46
IS  - 12
AU  - TITILOLA FALASINNU
AU  - JULIA F. SIMARD
Y1  - 2019/12/01
UR  - http://www.jrheum.org/content/46/12/1549.abstract
N2  - Data are information. And what we do with that information, how we process it, and interpret it can be complicated. It should not come as a surprise that these days there is a lot of talk about “big data” – about its promise, its potential, and its pitfalls. Big data (e.g., administrative, birth certificates, claims, electronic health records, registers) are growing in size, accessibility, and application. However, repurposing data from their original use to the research environment requires careful attention. Truthfully, whether we are talking about statistical analysis of small clinical datasets or supervised learning algorithms in big datasets, some of the same principles apply. No matter what, understanding where our data come from informs our design, our analysis, and most importantly, our interpretation. There are 3 major sources of bias that determine whether inferences from a dataset are a close approximation of the truth: confounding, selection, and information. Confounding occurs when an association between 2 factors can be explained by an (often unmeasured) extraneous factor. Confounding often limits our ability to make truthful inferences about causality. Selection bias may occur when the choice of dataset limits the ability to generalize findings to the population affected by a disease. For example, using only drug claims data or hospitalization data to infer the prevalence of osteoarthritis (OA) may underestimate the condition because there may be individuals who may not need medication or have not been hospitalized in the time window evaluated. Information bias (often referred to as misclassification or … Address correspondence to J.F. Simard, Assistant Professor, Division of Epidemiology, Department of Health Research and Policy, Stanford School of Medicine, HRP Redwood Building, Room T152, 259 Campus Drive, Stanford, California 94305-5405, USA. E-mail: jsimard{at}stanford.edu
ER  - 