British Journal of Nutrition

Horizons in Nutritional Science

Multivariate techniques and their application in nutrition: a metabolomics case study

E. Katherine Kemsley (a1), Gwénaëlle Le Gall (a1), Jack R. Dainty (a1, c1), Andrew D. Watson (a1), Linda J. Harvey (a1), Henri S. Tapp (a1) and Ian J. Colquhoun (a1)

a1 Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK


The post-genomic technologies are generating vast quantities of data, but many nutritional scientists are not trained or equipped to analyse them. In high-resolution NMR spectra of urine, for example, the number and complexity of spectral features mean that computational techniques are required to interrogate and display the data in a manner intelligible to the researcher. In addition, there are often multiple underlying biological factors influencing the data, and it is difficult to pinpoint which are having the most significant effect. This is especially true in nutritional studies, where small variations in diet can trigger multiple changes in gene expression and metabolite concentration. One class of computational tools useful for analysing such highly multivariate data comprises the well-known ‘whole spectrum’ methods of principal component analysis and partial least squares. In this work, we present a nutritional case study in which NMR data generated from a human dietary Cu intervention study are analysed using multivariate methods, and the advantages and disadvantages of each technique are discussed. It is concluded that an alternative approach, called feature subset selection, will be important in this type of work; here we have used a genetic algorithm to identify the small peaks (arising from metabolites of low concentration) that have been altered significantly following a dietary intervention.
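The contrast drawn above between ‘whole spectrum’ methods and feature subset selection can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the simulated ‘spectra’, the peak positions and sizes, the function name pca_scores, and in particular the simple per-variable ranking, which is only a stand-in for feature selection and is not the genetic algorithm used in the study.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """PCA by singular value decomposition of the mean-centred data matrix."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    loadings = Vt[:n_components]                      # one row per PC
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance per PC
    return scores, loadings, explained[:n_components]

# Hypothetical data: 40 "spectra" of 200 points, two groups of 20
# (for example, before and after an intervention).
rng = np.random.default_rng(0)
n_per_group, n_points = 20, 200
n_samples = 2 * n_per_group
labels = np.repeat([0, 1], n_per_group)
axis = np.arange(n_points)

def peak(centre, width=3.0):
    """Gaussian line shape centred on a given data point."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

# A large peak whose height varies between subjects, unrelated to group.
big = rng.normal(10.0, 2.0, size=(n_samples, 1)) * peak(50)
# A small peak whose height depends on group membership.
small_height = 0.2 + 0.1 * labels[:, None] + rng.normal(0.0, 0.02, size=(n_samples, 1))
small = small_height * peak(150)
X = big + small + rng.normal(0.0, 0.01, size=(n_samples, n_points))

# Whole-spectrum view: the first PC follows the largest source of variance.
scores, loadings, explained = pca_scores(X)
top_variable = int(np.argmax(np.abs(loadings[0])))

# Per-variable view: standardised difference between group means.
g0, g1 = X[labels == 0], X[labels == 1]
pooled_sd = np.sqrt(0.5 * (g0.var(axis=0, ddof=1) + g1.var(axis=0, ddof=1)))
effect = (g1.mean(axis=0) - g0.mean(axis=0)) / pooled_sd
best_variable = int(np.argmax(np.abs(effect)))

print("variable with largest PC1 loading:", top_variable)
print("variable with largest group effect:", best_variable)
```

In this simulated setting the first principal component is dominated by the large, group-independent peak, whereas ranking individual variables points to the small peak that actually changes between groups. A genetic algorithm would search over subsets of variables rather than ranking them one at a time.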

(Received September 05 2006)

(Revised December 11 2006)

(Accepted December 11 2006)


c1 *Corresponding author: Jack R. Dainty, fax 01603 507723, email


Abbreviations: EP1, EP2, EP3, experimental period 1, experimental period 2, experimental period 3, respectively; LDA, linear discriminant analysis; PC, principal component; PCA, principal component analysis; PLS, partial least squares