Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK
Post-genomic technologies are generating vast quantities of data, but many nutritional scientists are not trained or equipped to analyse it. In high-resolution NMR spectra of urine, for example, the number and complexity of spectral features mean that computational techniques are required to interrogate the data and display it in a form intelligible to the researcher. In addition, multiple underlying biological factors often influence the data, and it is difficult to pinpoint which of them have the most significant effect. This is especially true in nutritional studies, where small variations in diet can trigger multiple changes in gene expression and metabolite concentration. One class of computational tools useful for analysing such highly multivariate data includes the well-known ‘whole spectrum’ methods of principal component analysis and partial least squares. In this work, we present a nutritional case study in which NMR data generated from a human dietary Cu intervention study is analysed using multivariate methods, and the advantages and disadvantages of each technique are discussed. We conclude that an alternative approach, feature subset selection, will be important in this type of work; here we have used a genetic algorithm to identify the small peaks (arising from low-concentration metabolites) that changed significantly following a dietary intervention.
(Received September 05 2006)
(Revised December 11 2006)
(Accepted December 11 2006)
Abbreviations: EP1, EP2, EP3, experimental period 1, experimental period 2, experimental period 3, respectively; LDA, linear discriminant analysis; PC, principal component; PCA, principal component analysis; PLS, partial least squares
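The abstract does not describe how the genetic algorithm was configured. Purely as an illustration of the general idea of GA-based feature subset selection, the sketch below evolves bit masks over binned spectral features and rewards masks that separate two groups (e.g. samples from two experimental periods) using few bins. Everything here is an assumption, not the study's method: the function names are hypothetical, the fitness function (leave-one-out nearest-centroid accuracy minus a small per-feature penalty) is a stand-in, and the data are synthetic toy "spectra" in which two bins are shifted in one group.

```python
import random
from statistics import mean


def nearest_centroid_loo_accuracy(X, y, mask):
    """Hypothetical fitness component: leave-one-out accuracy of a
    nearest-centroid classifier using only the bins switched on in `mask`."""
    idx = [j for j, on in enumerate(mask) if on]
    if not idx:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Class centroids computed without sample i (leave-one-out).
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in idx]

        def sq_dist(c):
            return sum((X[i][j] - c[t]) ** 2 for t, j in enumerate(idx))

        pred = min(centroids, key=lambda lab: sq_dist(centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    # Reward class separation, penalise large subsets (favours few bins).
    return nearest_centroid_loo_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=30, generations=40, p_mut=0.05, seed=0):
    """Minimal GA: truncation selection with elitism, one-point crossover,
    bit-flip mutation. Returns the indices of the best subset found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() < 0.2 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = ranked[: pop_size // 2]  # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda m: fitness(X, y, m))
    return [j for j, on in enumerate(best) if on]


def make_toy_spectra(n_per_group=10, n_bins=20, seed=1):
    """Synthetic data only: group 1 has bins 3 and 7 raised."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(1.0, 0.3) for _ in range(n_bins)]
            if label == 1:
                row[3] += 1.5
                row[7] += 1.5
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_spectra()
    print("selected bins:", ga_select(X, y))
```

The size penalty is the design choice that pushes the search towards a small subset of discriminating bins rather than the whole spectrum, which is the contrast the abstract draws with ‘whole spectrum’ methods such as PCA and PLS.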