Epidemiology and Infection

Original Papers

Estimating disease prevalence using census data

M. CHOY (a1), P. SWITZER (a2), C. De MARTEL (a1, a3) and J. PARSONNET (a1, a3; c1)

a1 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA

a2 Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA, USA

a3 Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA


We describe a method that uses publicly available data to estimate disease prevalence in small geographic areas, with Helicobacter pylori as a model infection. Using data from the Third National Health and Nutrition Examination Survey, risk parameters for H. pylori infection were obtained by logistic regression and validated in an independent cohort, where the model predicted 737·5 infections against 736 observed. The prevalence of H. pylori infection in the San Francisco Bay Area was then estimated from the probabilities given by the predictive logistic model, with these risk parameters applied to individual-level 1990 U.S. Census data. Predicted H. pylori prevalence was also compared with gastric cancer incidence obtained from the Northern California Cancer Center: the two were positively correlated (P<0·001, R²=0·87), whereas predicted prevalence showed no statistically significant association with other malignancies. Because they rely exclusively on publicly available data, these methods may be applied to selected conditions with strong demographic predictors.
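
The prediction step summarised above can be illustrated with a minimal sketch. It assumes a logistic model whose coefficients are already fitted, applies it to individual-level records, and sums the predicted probabilities to obtain an expected number of infections and a prevalence for one area. The coefficient values, the covariates (age, foreign birth, low income), the person weights and the records themselves are invented for illustration; they are not the risk parameters or data used in the paper.

```python
import math

# Hypothetical log-odds coefficients, for illustration only.
# These are NOT the fitted values from the paper.
COEFFS = {
    "intercept": -2.0,
    "age_per_decade": 0.30,
    "foreign_born": 1.10,
    "low_income": 0.50,
}


def infection_probability(age, foreign_born, low_income):
    """Predicted probability of infection for one person (logistic model)."""
    logit = (
        COEFFS["intercept"]
        + COEFFS["age_per_decade"] * (age / 10.0)
        + COEFFS["foreign_born"] * foreign_born
        + COEFFS["low_income"] * low_income
    )
    return 1.0 / (1.0 + math.exp(-logit))


def expected_prevalence(records):
    """Return (expected infections, prevalence) for one geographic area.

    Each record is (age, foreign_born, low_income, weight), where weight
    is the number of residents the sampled record represents.
    """
    total_weight = sum(weight for _, _, _, weight in records)
    expected_cases = sum(
        infection_probability(age, fb, li) * weight
        for age, fb, li, weight in records
    )
    return expected_cases, expected_cases / total_weight


# Invented census-style records for a single hypothetical area.
AREA_RECORDS = [
    (25, 0, 0, 120.0),
    (45, 1, 0, 80.0),
    (70, 1, 1, 40.0),
    (35, 0, 1, 60.0),
]

cases, prevalence = expected_prevalence(AREA_RECORDS)
print(f"expected infections: {cases:.1f}")
print(f"estimated prevalence: {prevalence:.3f}")
```

The same summation of predicted probabilities, applied to a cohort with known infection status, gives a predicted count that can be set against the observed count, as in the comparison of 737·5 predicted with 736 observed infections.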

(Accepted September 17 2007)

(Online publication November 30 2007)


c1 Author for correspondence: J. Parsonnet, M.D., Stanford University, 300 Pasteur Dr., Grant Bldg, S-169, Stanford, CA 94305-5107, USA. (Email: parsonnt@stanford.edu)