Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets

  • Original Paper
  • Theoretical and Applied Genetics

Abstract

The results of genetic diversity studies using molecular markers depend not only on the biology of the studied objects but also on the quality of the marker data. Poor data quality may prevent biological questions from being answered correctly. A new statistic is proposed to estimate the quality of a marker dataset with regard to its ability to describe the structure of the biological material under study. This statistic is called data resolution (DR). It is calculated by splitting a marker dataset at random into two sets, each with half the number of markers. In each set, similarities between all pairs of objects are calculated. Subsequently, the similarities obtained for the two sets are correlated. This process is repeated a large number of times. The average of the correlation coefficients obtained in this way is the DR of the dataset. In the present paper, the DR statistic is applied to four studies involving amplified fragment length polymorphism (AFLP) as well as microsatellite markers. In addition, some properties and possible applications of DR are discussed, including predicting the added value of scoring additional markers and determining which similarity measure is, apart from genetic considerations, most appropriate for analyzing the data.
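The split-half procedure described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the function names (`data_resolution`, `pairwise_similarities`, `jaccard`, `pearson`), the use of the Jaccard coefficient on 0/1 marker scores, the Pearson correlation, the default of 1000 repetitions, dropping one marker when the marker count is odd, and skipping splits whose correlation is undefined are all choices made here for the example.

```python
import random
from itertools import combinations
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity of two 0/1 marker profiles (an assumed measure)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one half shows no variation
    return sxy / sqrt(sxx * syy)


def pairwise_similarities(data, markers, similarity):
    """Similarities between all pairs of objects, restricted to `markers`."""
    profiles = [[row[m] for m in markers] for row in data]
    return [similarity(p, q) for p, q in combinations(profiles, 2)]


def data_resolution(data, n_splits=1000, similarity=jaccard, seed=None):
    """Average split-half correlation of pairwise similarities.

    data: list of objects, each a list of 0/1 marker scores.
    """
    rng = random.Random(seed)
    n_markers = len(data[0])
    half = n_markers // 2  # assumption: one marker is left out if the count is odd
    correlations = []
    for _ in range(n_splits):
        order = rng.sample(range(n_markers), n_markers)  # random permutation
        first, second = order[:half], order[half:2 * half]
        r = pearson(
            pairwise_similarities(data, first, similarity),
            pairwise_similarities(data, second, similarity),
        )
        if r == r:  # assumption: skip splits with an undefined (NaN) correlation
            correlations.append(r)
    if not correlations:
        return float("nan")
    return sum(correlations) / len(correlations)
```

Passing a different function as `similarity` (for example a simple matching coefficient) and comparing the resulting values is one way the comparison of similarity measures mentioned in the abstract might be explored; which measures the paper actually compares is not stated in the abstract.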

Acknowledgments

The author would like to thank Rob van Treuren, Hans Jansen, Jean Christophe Glaszmann and Graham McLaren for suggestions and comments. The author would also like to thank the anonymous referees for their excellent feedback that greatly helped to improve the manuscript. This work is supported by the Generation Challenge Programme.

Corresponding author

Correspondence to Th. J. L. van Hintum.

Additional information

Communicated by A. Bervillé.

About this article

Cite this article

van Hintum, T.J.L. Data resolution: a jackknife procedure for determining the consistency of molecular marker datasets. Theor Appl Genet 115, 343–349 (2007). https://doi.org/10.1007/s00122-007-0566-5

  • DOI: https://doi.org/10.1007/s00122-007-0566-5
