Abstract
The results of genetic diversity studies using molecular markers depend not only on the biology of the studied objects but also on the quality of the marker data. Poor data quality may prevent biological questions from being answered correctly. A new statistic, called data resolution (DR), is proposed to estimate the quality of a marker data set with regard to its ability to describe the structure of the biological material under study. DR is calculated by randomly splitting a marker data set into two sets, each containing half of the markers. Within each set, similarities between all pairs of objects are calculated, and the similarities obtained from the two sets are then correlated. This process is repeated a large number of times, and the average of the resulting correlation coefficients is the DR of the data set. In the present paper, the DR statistic is applied to four studies involving amplified fragment length polymorphism (AFLP) as well as microsatellite markers. In addition, some properties and possible applications of DR are discussed, including predicting the added value of scoring additional markers and determining which similarity measure is, apart from genetic considerations, most appropriate for analyzing the data.
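The split-correlate-average procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: markers are assumed to be scored as binary presence/absence (0/1) per object, pairwise similarity is computed with either the Jaccard or the simple matching coefficient (both standard choices, but the abstract does not say which the paper uses), the correlation is Pearson's r, and with an odd number of markers one marker is left out of each split so that both halves are equal in size. All function names are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt
from typing import Callable, List, Optional, Sequence

# Assumed coding: each object is a list of 0/1 marker scores (e.g. AFLP band presence/absence).
Profile = Sequence[int]


def jaccard(a: Profile, b: Profile) -> float:
    """Jaccard similarity: shared presences / markers present in at least one object."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 1.0  # two all-absent profiles treated as identical


def simple_matching(a: Profile, b: Profile) -> float:
    """Simple matching: fraction of markers scored identically (shared absences count)."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length vectors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: all similarities in one half are equal")
    return sxy / sqrt(sxx * syy)


def pairwise_similarities(
    data: Sequence[Profile],
    markers: Sequence[int],
    similarity: Callable[[Profile, Profile], float],
) -> List[float]:
    """Similarity for every pair of objects, using only the given marker columns."""
    subsets = [[row[m] for m in markers] for row in data]
    return [similarity(subsets[i], subsets[j])
            for i, j in combinations(range(len(data)), 2)]


def data_resolution(
    data: Sequence[Profile],
    similarity: Callable[[Profile, Profile], float] = jaccard,
    n_reps: int = 1000,
    seed: Optional[int] = None,
) -> float:
    """Average correlation between pairwise similarities from random half-splits of the markers."""
    rng = random.Random(seed)
    markers = list(range(len(data[0])))
    half = len(markers) // 2  # with an odd marker count, one marker is left out per split
    correlations = []
    for _ in range(n_reps):
        rng.shuffle(markers)  # random split: first half vs. second half of the shuffled order
        s1 = pairwise_similarities(data, markers[:half], similarity)
        s2 = pairwise_similarities(data, markers[half:2 * half], similarity)
        correlations.append(pearson(s1, s2))
    return sum(correlations) / len(correlations)


# Two clearly separated groups of three objects, scored for 20 markers.
structured = [[1] * 10 + [0] * 10] * 3 + [[0] * 10 + [1] * 10] * 3
print(data_resolution(structured, seed=1))  # ≈ 1.0: every split recovers the same structure
print(data_resolution(structured, simple_matching, seed=1))
```

Because the similarity coefficient is a parameter, the same routine can be run with different coefficients on one data set; presumably the coefficient that yields the more consistent (higher) DR is the one that, apart from genetic considerations, best suits the data. Microsatellite data would require a multi-allelic similarity measure in place of the binary coefficients assumed here.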
Acknowledgments
The author would like to thank Rob van Treuren, Hans Jansen, Jean Christophe Glaszmann and Graham McLaren for suggestions and comments. The author would also like to thank the anonymous referees for their excellent feedback that greatly helped to improve the manuscript. This work is supported by the Generation Challenge Programme.
Additional information
Communicated by A. Bervillé.
About this article
Cite this article
van Hintum, T.J.L. Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets. Theor Appl Genet 115, 343–349 (2007). https://doi.org/10.1007/s00122-007-0566-5
DOI: https://doi.org/10.1007/s00122-007-0566-5