a1 Department of Health Research and Policy, Stanford University School of Medicine, Stanford, CA, USA
a2 Department of Statistics, Sequoia Hall, Stanford University, Stanford, CA, USA
a3 Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
We describe a method that uses publicly available data to estimate disease prevalence in small geographic areas, with Helicobacter pylori as a model infection. Using data from the Third National Health and Nutrition Examination Survey, risk parameters for H. pylori infection were obtained by logistic regression and validated in an independent cohort, where the model predicted 737·5 infections against 736 observed. The prevalence of H. pylori infection in the San Francisco Bay Area was then estimated from the probabilities given by the predictive logistic model, with individual-level 1990 U.S. Census data as input. Predicted H. pylori prevalence was also compared with gastric cancer incidence obtained from the Northern California Cancer Center; it correlated positively with gastric cancer incidence (P<0·001, R²=0·87) and showed no statistically significant association with other malignancies. Because they rely exclusively on publicly available data, these methods may be applied to selected conditions with strong demographic predictors.
(Accepted September 17 2007)
(Online publication November 30 2007)
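The estimation step summarised in the abstract can be illustrated with a minimal sketch: a logistic model turns each individual's demographic covariates into an infection probability, the probabilities are summed to give the expected number of infections, and the sum divided by the population size gives the estimated prevalence for an area. The coefficient values, covariate names (`age_decade`, `foreign_born`), and toy records below are hypothetical and chosen only for illustration; they are not the fitted parameters or the census variables used in the study.

```python
import math

# Hypothetical coefficients for illustration only; NOT the values fitted in the study.
COEFFS = {"intercept": -2.0, "age_decade": 0.3, "foreign_born": 1.0}


def infection_probability(person, coeffs=COEFFS):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = coeffs["intercept"]
    for name, beta in coeffs.items():
        if name != "intercept":
            z += beta * person[name]
    return 1.0 / (1.0 + math.exp(-z))


def expected_infections(people, coeffs=COEFFS):
    """Sum of individual probabilities = expected number of infected people."""
    return sum(infection_probability(p, coeffs) for p in people)


def estimated_prevalence(people, coeffs=COEFFS):
    """Expected infections divided by the number of people in the area."""
    return expected_infections(people, coeffs) / len(people)


# Toy individual-level records standing in for census microdata (made up).
area = [
    {"age_decade": 2, "foreign_born": 0},
    {"age_decade": 5, "foreign_born": 1},
    {"age_decade": 7, "foreign_born": 0},
]

print(f"expected infections: {expected_infections(area):.2f}")
print(f"estimated prevalence: {estimated_prevalence(area):.2%}")
```

The same expected-count calculation, applied to a cohort with known infection status, gives a predicted total that can be set against the observed total, which is the kind of check the abstract reports (737·5 predicted, 736 observed).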