Genetics Research

Paper

LASSO with cross-validation for genomic selection

M. GRAZIANO USAI (a1, c1), MIKE E. GODDARD (a2, a3) and BEN J. HAYES (a3)

a1 Settore Genetica e Biotecnologie, AGRIS-Sardegna, Olmedo 07040, Italy

a2 Faculty of Land and Food Resources, University of Melbourne, Parkville 3010, Australia

a3 Biosciences Research Division, Department of Primary Industries Victoria, 1 Park Drive, Bundoora 3083, Australia

Summary

We used a least absolute shrinkage and selection operator (LASSO) approach to estimate marker effects for genomic selection. The least angle regression (LARS) algorithm and cross-validation were used to define the best subset of markers to include in the model. The LASSO–LARS approach was tested on two data sets: a simulated data set with 5865 individuals and 6000 Single Nucleotide Polymorphisms (SNPs), and a mouse data set with 1885 individuals genotyped for 10 656 SNPs and phenotyped for a number of quantitative traits. In the simulated data, three approaches were used to split the reference population into training and validation subsets for cross-validation: random splitting across the whole population; random sampling of the validation set from the last generation only, within families; and random sampling of the validation set from the last generation only, across families. The highest accuracy was obtained by random splitting across the whole population. The accuracy of genomic estimated breeding values (GEBVs) in the candidate population obtained by LASSO–LARS was 0·89 with 156 explanatory SNPs. This value was higher than those obtained by Best Linear Unbiased Prediction (BLUP) and a Bayesian method (BayesA), which were 0·75 and 0·84, respectively. In the mouse data, 1600 individuals were randomly allocated to the reference population. The GEBVs for the remaining 285 individuals estimated by LASSO–LARS were more accurate than those obtained by BLUP and BayesA for weight at six weeks and slightly less accurate for growth rate and body length. It was concluded that the LASSO–LARS approach is a good alternative method to estimate marker effects for genomic selection, particularly when the cost of genotyping can be reduced by using a limited subset of markers.
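As a rough illustration of the general idea summarised above (an L1-penalised regression of phenotypes on SNP genotypes, with the penalty chosen by cross-validation on random splits of the reference population), a minimal Python sketch might look like the following. It is not the authors' implementation: it swaps in plain coordinate descent for the LARS algorithm, uses a tiny toy data set generated on the spot rather than the simulated or mouse data, and every name in it (`lasso_cd`, `choose_lambda`, the penalty grid, the three assumed QTL) is hypothetical.

```python
import math
import random


def soft_threshold(z, lam):
    """Shrink z towards zero by lam; values inside [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=30):
    """Minimise (1/2n)*sum((y - b0 - X.beta)^2) + lam*sum(|beta|).

    Coordinate descent is used here in place of LARS (an assumption of this sketch).
    """
    n, p = len(X), len(X[0])
    b0 = sum(y) / n
    beta = [0.0] * p
    resid = [yi - b0 for yi in y]
    col_ss = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue  # a marker with all-zero genotypes carries no information
            # Covariance of marker j with the residual that excludes marker j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ss[j] * beta[j]
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
        shift = sum(resid) / n  # refit the unpenalised intercept
        b0 += shift
        resid = [r - shift for r in resid]
    return b0, beta


def predict(X, b0, beta):
    return [b0 + sum(x * b for x, b in zip(row, beta)) for row in X]


def choose_lambda(X, y, lambdas, k=4, seed=0):
    """Pick the penalty with the lowest k-fold squared prediction error,
    splitting at random across the whole reference population."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    best_err, best_lam = None, None
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            b0, beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
            pred = predict([X[i] for i in fold], b0, beta)
            err += sum((p - y[i]) ** 2 for p, i in zip(pred, fold))
        if best_err is None or err < best_err:
            best_err, best_lam = err, lam
    return best_lam


def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Toy data (hypothetical): 20 SNPs coded 0/1/2, three of them with a true effect.
rng = random.Random(1)
n_ref, n_cand, n_snp = 120, 40, 20
true_effects = [0.0] * n_snp
true_effects[2], true_effects[7], true_effects[15] = 1.0, -0.8, 0.6
genotypes = [[float(rng.choice((0, 1, 2))) for _ in range(n_snp)]
             for _ in range(n_ref + n_cand)]
tbv = predict(genotypes, 0.0, true_effects)            # true breeding values
phenotypes = [g + rng.gauss(0.0, 0.5) for g in tbv]    # breeding value plus noise

X_ref, y_ref = genotypes[:n_ref], phenotypes[:n_ref]   # reference population
lambdas = [0.2, 0.1, 0.05, 0.02, 0.01]                 # arbitrary penalty grid
best_lam = choose_lambda(X_ref, y_ref, lambdas)
b0, beta = lasso_cd(X_ref, y_ref, best_lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]  # SNPs kept in the model
gebv = predict(genotypes[n_ref:], b0, beta)             # candidates' GEBVs
accuracy = correlation(gebv, tbv[n_ref:])
print("penalty:", best_lam, "selected SNPs:", selected, "accuracy:", round(accuracy, 3))
```

Here accuracy is taken as the correlation between GEBVs and true breeding values in the candidates, which is only computable because the toy breeding values are known by construction. A LARS-based solver would trace the whole regularisation path and let cross-validation pick the number of markers directly, rather than a value from a fixed penalty grid.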

(Received February 04 2009)

(Revised August 28 2009)

(Revised November 03 2009)

Correspondence:

c1 Corresponding author. Settore Genetica e Biotecnologie, AGRIS-Sardegna, Loc. Bonassai, Km 18·6 S. S. Sassari-Fertilia, 07040, Olmedo (SS), Italy. Tel: +39 079387318. Fax: +39-079389450. e-mail: graziano.usai@gmail.com
