British Journal of Nutrition

Full Papers

Dietary Surveys and Nutritional Epidemiology

Principal components analysis of diet and alternatives for identifying the combination of foods that are associated with the risk of disease: a simulation study

Ioannis Bakolis a1 c1, Peter Burney a2 and Richard Hooper a3

a1 Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, Norfolk Place, London W2 1PG, UK

a2 Respiratory Epidemiology and Public Health Group, Imperial College, National Heart and Lung Institute, Emmanuel Kaye Building, Manresa Road, London SW3 6LR, UK

a3 Centre for Primary Care and Public Health, Blizard Institute, Barts and The London School of Medicine and Dentistry, Abernethy Building, 2 Newark Street, Whitechapel, London E1 2AT, UK


Dietary patterns derived empirically using principal components analysis (PCA) are widely employed for investigating diet–disease relationships. In the present study, we investigated whether PCA performed better at identifying such associations than an analysis of each food on a FFQ separately, referred to here as an exhaustive single food analysis (ESFA). Data on diet and disease were simulated using real FFQ data and by assuming a number of food intakes in combination that were associated with the risk of disease. In each simulation, ESFA and PCA were employed to identify the combinations of foods that are associated with the risk of disease using logistic regression, allowing for multiple testing and adjusting for energy intake. ESFA was also separately adjusted for principal components of diet, foods that were significant in the unadjusted ESFA and propensity scores. For each method, we investigated the power with which an association between diet and disease could be identified, and the power and false discovery rate (FDR) for identifying the specific combination of food intakes. In some scenarios, ESFA had greater power to detect a diet–disease association than PCA. ESFA also typically had a greater power and a lower FDR for identifying the combinations of food intakes that are associated with the risk of disease. The FDR of both methods increased with increasing sample size, but when ESFA was adjusted for foods that were significant in the unadjusted ESFA, FDR were controlled at the desired level. These results question the widespread use of PCA in nutritional epidemiology. The adjusted ESFA identifies the combinations of foods that are causally linked to the risk of disease with low FDR and surprisingly good power.
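The evaluation described above can be illustrated with a minimal sketch. It takes one p-value per food from an ESFA-style analysis, applies a multiple-testing rule, and compares the selected foods with the set of foods assumed to be causal in the simulation. The abstract does not state which multiple-testing procedure was used, so the Benjamini-Hochberg step-up rule here is an assumption, and the function names `benjamini_hochberg` and `discovery_metrics` and all numbers are hypothetical.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices of tests rejected by the BH step-up rule.

    Assumption: BH is used only for illustration; the source abstract
    does not name the multiple-testing procedure.
    """
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value is under its threshold.
        if pvalues[idx] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])


def discovery_metrics(selected, causal):
    """Return (power, false discovery proportion) for one simulated data set.

    power: fraction of the truly causal foods that were selected.
    false discovery proportion: fraction of selected foods that are not causal.
    """
    selected, causal = set(selected), set(causal)
    power = len(selected & causal) / len(causal) if causal else 0.0
    fdp = len(selected - causal) / len(selected) if selected else 0.0
    return power, fdp


# Hypothetical p-values for five foods; foods 0, 2 and 4 are assumed causal.
pvalues = [0.001, 0.20, 0.012, 0.025, 0.60]
selected = benjamini_hochberg(pvalues, alpha=0.05)
power, fdp = discovery_metrics(selected, causal=[0, 2, 4])
print(selected, round(power, 3), round(fdp, 3))
```

The false discovery proportion is computed for a single simulated data set; averaging it over many simulations gives an estimate of the FDR, and averaging the per-simulation power gives the power for identifying the specific combination of foods.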

(Received February 05 2013)

(Revised December 06 2013)

(Accepted January 13 2014)

(Online publication April 11 2014)

Key Words:

  • Principal components analysis;
  • Dietary patterns;
  • Nutritional epidemiology;
  • Logistic regression;
  • Monte Carlo simulation


c1 Corresponding author: I. Bakolis, email


  Abbreviations: ESFA, exhaustive single food analysis; FDR, false discovery rate; PCA, principal components analysis