An approach to the development of a core set of germplasm using a mixture of qualitative and quantitative data

Rupam Kumar Sarkar; Prabina Kumar Meher; S. D. Wahi; T. Mohapatra; A. R. Rao

doi:10.1017/S1479262114000732

Published online by Cambridge University Press: 26 June 2014

Affiliations

Rupam Kumar Sarkar: Indian Agricultural Statistics Research Institute, New Delhi 110012, India
Prabina Kumar Meher: Indian Agricultural Statistics Research Institute, New Delhi 110012, India
S. D. Wahi: Indian Agricultural Statistics Research Institute, New Delhi 110012, India
T. Mohapatra: Central Rice Research Institute, Cuttack, Odisha 753006, India
A. R. Rao*: Indian Agricultural Statistics Research Institute, New Delhi 110012, India
*Corresponding author. E-mail: arrao@iasri.res.in; rao.cshl.work@gmail.com

Abstract

Developing a representative, well-diversified core set with minimal duplicate accessions and maximum diversity from a larger germplasm collection is essential for breeders involved in crop improvement programmes. Most existing methodologies for identifying a core set are based on either qualitative or quantitative data alone. In this study, an approach is proposed for identifying a core set of germplasm from a mixture of qualitative (single nucleotide polymorphism genotyping) and quantitative data. For this purpose, six combined distance measures, formed by pairing each of three distance measures for quantitative data with each of two for qualitative data, were proposed and evaluated. The combined distance matrices were used as inputs to seven clustering procedures to classify the germplasm into homogeneous groups. Subsequently, an optimum number of clusters was identified on a consensus basis across all clustering methodologies and combined distance measures. For each clustering methodology, the average cluster robustness value was calculated across the identified optimum numbers of clusters. Three allocation methods were then applied to sample accessions from the clusters identified under each clustering methodology, with the highest average cluster robustness value being used to formulate a core set. Furthermore, an index was proposed for evaluating diversity in the core set. The results reveal that the combined distance measure A1B2 (the average range-standardized absolute difference for quantitative data combined with the rescaled average absolute difference for qualitative data), together with three clusters identified by the k-means clustering algorithm and the proportional allocation method, was suitable for identifying a core set from a collection of rice germplasm.
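The following is a minimal sketch, not the authors' implementation, of how a combined distance in the spirit of A1B2 and a proportional allocation step could be computed. Several details are assumptions, since the abstract does not specify them: SNP genotypes are taken to be coded numerically (for example 0/1/2), "rescaled" is interpreted as division by the largest pairwise value, the two components are combined by an unweighted sum, and leftover core slots are assigned by largest remainder. All function names are hypothetical.

```python
import numpy as np


def range_standardized_distance(X):
    """Quantitative part: mean over traits of |x_ik - x_jk| / range_k."""
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant traits then contribute zero distance
    diffs = np.abs(X[:, None, :] - X[None, :, :]) / ranges
    return diffs.mean(axis=2)


def rescaled_abs_distance(G):
    """Qualitative part: mean absolute difference over markers.

    Assumption: 'rescaled' means dividing by the largest pairwise value
    so that the result lies in [0, 1].
    """
    G = np.asarray(G, dtype=float)
    d = np.abs(G[:, None, :] - G[None, :, :]).mean(axis=2)
    max_d = d.max()
    return d / max_d if max_d > 0 else d


def combined_distance(X, G):
    """Assumption: the two components are combined by an unweighted sum."""
    return range_standardized_distance(X) + rescaled_abs_distance(G)


def proportional_allocation(labels, core_size):
    """Number of accessions to draw per cluster, proportional to cluster size.

    Assumption: slots left over after rounding down go to the clusters
    with the largest fractional parts.
    """
    clusters, counts = np.unique(np.asarray(labels), return_counts=True)
    raw = core_size * counts / counts.sum()
    alloc = np.floor(raw).astype(int)
    leftover = core_size - alloc.sum()
    order = np.argsort(-(raw - alloc))
    alloc[order[:leftover]] += 1
    return dict(zip(clusters.tolist(), alloc.tolist()))


# Toy data: 3 accessions, 2 quantitative traits, 2 SNP markers coded 0/1/2.
X = [[1, 10], [2, 20], [3, 30]]
G = [[0, 0], [1, 1], [2, 2]]
D = combined_distance(X, G)

# Cluster labels would come from a clustering step; they are given here.
allocation = proportional_allocation([0] * 50 + [1] * 30 + [2] * 20, 10)
```

The clustering step itself is left out on purpose: the abstract does not say how k-means was applied to a distance matrix, so the sketch takes cluster labels as given rather than guessing that step.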

Keywords

consensus clustering; core set; germplasm; mixture data; robustness; single nucleotide polymorphisms

Type: Research Article
Information: Plant Genetic Resources, Volume 13, Issue 2, August 2015, pp. 96–103

DOI: https://doi.org/10.1017/S1479262114000732
Copyright: © NIAB 2014

