Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets

van Hintum, Th. J. L.

doi:10.1007/s00122-007-0566-5

Original Paper
Published: 15 May 2007

Theoretical and Applied Genetics, volume 115, pages 343–349 (2007)

Th. J. L. van Hintum¹

Abstract

The results of genetic diversity studies using molecular markers not only depend on the biology of the studied objects but also on the quality of the marker data. Poor data quality may hamper the correct answering of biological questions. A new statistic is proposed to estimate the quality of a marker data set with regard to its ability to describe the structure of the biological material under study. This statistic is called data resolution (DR). It is calculated by splitting a marker data set at random into two sets each with half the number of markers. In each set, similarities between all pairs of objects are calculated. Subsequently, the similarities obtained for the two sets are correlated. This process is repeated a large number of times. The average of the correlation coefficients obtained in this way is the DR of the dataset. In the present paper, the DR statistic is applied to four studies involving amplified fragment length polymorphism as well as micro-satellite markers. In addition, some properties and possible applications of DR are discussed, including the prediction of the added value of scoring additional markers, and the determination of which similarity measure is, apart from genetical considerations, most appropriate for analyzing the data.
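The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: it assumes binary (presence/absence) marker scores with objects as rows and markers as columns, uses the simple matching coefficient as the similarity measure, and uses the Pearson coefficient for the correlation step. The abstract fixes neither choice, and all function names (`data_resolution`, `simple_matching`, and so on) are hypothetical.

```python
import random
from itertools import combinations
from math import sqrt


def simple_matching(a, b):
    """Fraction of markers on which two objects have the same score (assumed measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined: all similarities in one half are equal")
    return sxy / sqrt(sxx * syy)


def pairwise_similarities(data, markers, similarity):
    """Similarities between all pairs of objects, using only the given marker columns."""
    rows = [[row[m] for m in markers] for row in data]
    return [similarity(a, b) for a, b in combinations(rows, 2)]


def data_resolution(data, n_splits=1000, similarity=simple_matching, seed=None):
    """Average split-half correlation of pairwise similarities over random splits."""
    rng = random.Random(seed)
    n_markers = len(data[0])
    half = n_markers // 2  # with an odd marker count, one marker is left out per split
    total = 0.0
    for _ in range(n_splits):
        order = rng.sample(range(n_markers), n_markers)  # random permutation of markers
        first, second = order[:half], order[half:2 * half]
        total += pearson(
            pairwise_similarities(data, first, similarity),
            pairwise_similarities(data, second, similarity),
        )
    return total / n_splits


# Toy data: two groups of three objects with opposite scores on 40 markers.
toy = [[1] * 40] * 3 + [[0] * 40] * 3
dr = data_resolution(toy, n_splits=100, seed=1)
```

Passing a different function as `similarity` (for example a Jaccard or Dice coefficient) gives the DR of the same data under that measure, which is how the comparison of similarity measures mentioned in the abstract could be set up in this sketch.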


Acknowledgments

The author would like to thank Rob van Treuren, Hans Jansen, Jean Christophe Glaszmann and Graham McLaren for suggestions and comments. The author would also like to thank the anonymous referees for their excellent feedback that greatly helped to improve the manuscript. This work is supported by the Generation Challenge Programme.

Author information

Authors and Affiliations

Centre for Genetic Resources, The Netherlands (CGN), Wageningen University and Research Centre, P.O. Box 16, 6700 AA, Wageningen, The Netherlands
Th. J. L. van Hintum

Corresponding author

Correspondence to Th. J. L. van Hintum.

Additional information

Communicated by A. Bervillé.

About this article

Cite this article

van Hintum, T.J.L. Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets. Theor Appl Genet 115, 343–349 (2007). https://doi.org/10.1007/s00122-007-0566-5

Received: 04 July 2006
Accepted: 23 April 2007
Published: 15 May 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s00122-007-0566-5
