1 Introduction

Gas chromatography–electron impact–mass spectrometry (GC–EI–MS) has been an established technique for several decades. However, its application to ‘global’ metabolite analysis in complex samples has only become routine in plant science in the past 10 years (Fiehn et al. 2000a), and perhaps more recently in animal studies (Dunn 2008), although biofluid analysis first occurred in the 1960s (Horning 1968; Pauling et al. 1971). GC–EI–MS profiling has been greatly facilitated by high data acquisition rate GC–EI–time of flight (TOF)/MS and by reproducible derivatisation procedures suited to polar metabolites (Roessner et al. 2000). Since recognising the groundbreaking work of Sauter et al. (1988) on herbicide mode of action, the Max Planck Institute of Molecular Plant Physiology has striven to update the method, establishing robust standard operating procedures (SOPs) using first quadrupole and later TOF based GC–EI–MS (Fiehn et al. 2000a, b; Fernie et al. 2004; Lisec et al. 2006; Erban et al. 2007). TOF mass analysers give increased sensitivity and very data-rich metabolite profiles, which in turn demand new strategies in data mining. SOPs are well established for targeted methods; however, there is a need for standardisation across laboratories for all aspects of metabolomics work. Suggestions have recently been made by the Metabolomics Standards Initiative (MSI; Fiehn et al. 2007a), which is developing minimal reporting standards in data generation (Fiehn et al. 2008), exchange (Jenkins et al. 2004; Hardy and Taylor 2007), analysis (Goodacre et al. 2007) and reporting (Fiehn et al. 2007b; Sumner et al. 2007).

Food quality traits such as fragrance, taste, appearance, shelf-life and nutritional content are determined by the biochemical composition of the food and are thus reflected in its metabolite profile (Hall 2006, 2007). Metabolomics has proven to be an appropriate tool for the extensive analysis of plant and food composition (Dixon et al. 2006; Schauer and Fernie 2006). META-PHOR (http://www.meta-phor.eu/) (Hall 2007; Hall et al. 2008) aims to develop technological platforms and associated methods that provide a tool to monitor food nutritional quality and safety, whilst adhering to all work guidelines of the MSI. Three target species were selected: melon for its matrix complexity, the dominance of sugars and the analytical challenges which result; broccoli for its extreme complexity and metabolite richness (especially ‘nutraceuticals’); and the rice grain due to its position as the major staple food.

As part of the META-PHOR project's priority towards technology development (Hall 2007), a series of ring experiments comparing proton nuclear magnetic resonance (1H-NMR), liquid chromatography (LC)–MS, and GC–EI–TOF/MS has been initiated. The GC–EI–TOF/MS ring experiment was undertaken by the University of Manchester, UK (UMAN), the Max Planck Institute of Molecular Plant Physiology, Golm, DE (MPIMP), and LECO Instruments, Mönchengladbach, DE (LECO). Each of these groups had SOPs established for variants of an initial analytical methodology (Fiehn et al. 2000a), with the variations largely resulting from the different research activities on which each group focuses. The UMAN method was optimised for primary metabolite detection in yeast media whilst maintaining analysis times of less than 20 min (O’Hagan et al. 2005). The MPIMP method was optimised towards maintaining maximum metabolite coverage with polar extracts from plants (Lisec et al. 2006; Erban et al. 2007). The LECO GC–EI–TOF/MS method was optimised for maximal metabolite coverage regardless of the sample matrix.

The ring experiment study design included a standardised protocol (Erban et al. 2007) for sample preparation (Fiehn et al. 2000a) and multivariate analyses, i.e. principal components analysis (PCA) and independent component analysis (ICA), together with comparisons of major metabolite features. PCA is a statistical technique for sample classification which reduces multivariate data sets to a small number of variables (principal components, PCs) which comprise the major variances in the data set (Jolliffe 1986). ICA is a variant of PCA which additionally allows the unsupervised search for the best bimodal sample partitions. ICA is well suited to the confirmation of known experimental sample classes but also allows the discovery of unexpected classes or trends (Stone 2002; Scholz et al. 2004, 2005; Trygg et al. 2006). Each independent component (IC) encodes a single partition among samples, and a loadings analysis then reveals which signals are most relevant for distinguishing the embedded sample partitions. Since PCA and ICA do not use sample class information, they are so-called unsupervised methods and are therefore ideal for non-biased reproducibility analysis.
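The variance-based projection described above can be illustrated with a minimal sketch. It is not the MetaGeneAlyse implementation used in this study; it assumes NumPy is available, and the function name `pca_scores` and the toy matrix are hypothetical. The ICA step is not shown.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project samples (rows) onto the first principal components.

    X is a samples x features matrix. Returns the scores, the loadings
    and the fraction of total variance explained by each kept component.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # mean-centre each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components].T               # features x components
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

# Hypothetical toy data: two samples each from two clearly different matrices.
X = np.array([[1.0, 2.0, 0.5],
              [1.1, 2.1, 0.4],
              [5.0, 0.2, 3.0],
              [5.2, 0.1, 3.1]])
scores, loadings, explained = pca_scores(X)
```

In this toy case the first component separates the two groups of samples without any class labels being supplied, which is the property that makes such methods suitable for unbiased reproducibility checks.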

To the best of the authors’ knowledge, this is one of the first ring experiments in the metabolomics field to concentrate upon the reproducibility of major differential metabolite features suitable for food sample classification, using a common set of extracts analysed by GC–EI–TOF/MS. By comparing the data of the different laboratories with ICA, reproducibility can be demonstrated for the unambiguous discrimination of the three plant matrices. This indicates that the short-term inter-laboratory reproducibility of GC–EI–TOF/MS based metabolomics is high and thus holds great promise for the current efforts being made towards the generation of global metabolomics databases.

2 Methods

2.1 Plant materials

The French melon varieties, Cucumis melo cv. Cézanne and Escrito, were commercial F1 hybrids. Seeds were obtained from Clause-Tézier (FR). Plants were grown by the French National Institute for Agricultural Research (INRA) in an open field in the South-West of France (Moissac, Bordeaux, 44° N × 1° E) between April and August 2006. The soil type was clay and limestone, and the plant density was 9,200 plants/ha. The Cézanne cultivar, but not Escrito, was protected with a polyethylene sheet. The Israeli melon varieties, C. melo cv. Noy Yize’el and Tam Dew, were obtained from the germplasm collection at the Agricultural Research Organisation (ARO), Volcani Centre (IL). Plants were grown in a standardised greenhouse (32° N × 35° E) between June and September 2006. The soil type was volcanic tuff and peat (1:1), and the plant density was 20,000 plants/ha. French broccoli cultivars, Brassica oleracea cv. Monaco and Chevalier (seed obtained from Syngenta and Seminis (FR), respectively), were grown by INRA in an open field (Toull lan, Bordeaux, 48° N × 3° E) between June and September 2006. The soil type was Eolian silt (12% clay, 16% fine silt, 44% coarse silt, 24% fine sand), and the plant density was 2,500 plants/ha. Seed stocks of the rice cultivars, Oryza sativa cv. Hom Nang Nouane (HNN), Kay Noy (KNL) and TSN1, were obtained from the International Rice Research Institute (IRRI), the Philippines, and grown by the Laos National Agricultural Research Centre (NARC) in open paddy fields in the Saythany District of Vientiane (17° N × 102° E) from 1st September until 1st December 2006. The soil type was clay. Fertilisation involved nitrogen supplementation at three time points (0, 4, and 8 weeks) throughout the three-month growth period. Four different nitrogen fertilisation regimes (0–30–30 kg/ha; 30–30–30 kg/ha; 60–30–30 kg/ha; 90–30–30 kg/ha) were applied to separate plots for each rice cultivar.
For all species (unless otherwise detailed), irrigation, watering, fertilisation and pathogen–pest control were performed according to commercial practices.

For each cultivar, 50 melons were harvested at commercial maturity between July and August 2006 (French varieties) and August and September 2006 (Israeli varieties). Broccoli florets were harvested in mid-September 2006. Both melons and broccoli were transported in insulated boxes and processed within 2 h of arrival. For each cultivar, 36 fruits or 1.5 kg of floret were selected depending on the size, weight and colour in order to make three homogeneous lots (biological replicates) of 11 fruits or 1.5 kg of pooled floret each. For every biological replicate, fruits and florets were rapidly washed for 1–2 min with tap water (~10°C) and air dried. One quarter of each melon was taken, the skin was removed and the flesh was cut into 2 cm × 2 cm cubes. The broccoli floret was also cut into small pieces. The samples were then flash frozen in liquid nitrogen. All samples were next ground (UMC5 grinder, STEPHAN™, Lognes, FR) to a homogeneous fine powder. When the rice grain had reached 22% moisture (December 2006), the panicles were harvested and threshed, and 1 kg of grain was collected per cultivar and per nitrogen treatment. The grain was equilibrated at room temperature for six weeks to reduce variability in moisture content. For each of the 12 biological samples, the rice grain was ground for ~30 s in an IKA A11 basic grinder (Staufen, DE) fitted with a metallic cup to which liquid nitrogen was added, ensuring the material remained frozen. The fine rice flour was further flash frozen in liquid nitrogen. Ground samples for all species were immediately shipped on dry ice and stored at −80°C on receipt. Sample extraction was undertaken within three months of sample receipt. A full list of the samples analysed is provided (Table 1).

Table 1 Sample details

2.2 Chemicals

UMAN obtained succinic-d4 acid, glycine-d5 and malonic-d2 acid standard metabolites (all of 99% purity or greater; 1:1:1 working stock with a final concentration of 0.5 mg/ml for each standard), along with all solvents (HPLC grade), O-methylhydroxylamine chloride, N-methyl-N-(trimethylsilyl)-trifluoroacetamide, pyridine and n-alkane time series from Sigma-Aldrich (Gillingham, UK). LECO and MPIMP obtained O-methylhydroxylamine chloride and n-alkane series from Sigma-Aldrich (Deisenhofen, DE), N-methyl-N-(trimethylsilyl)-trifluoroacetamide from Macherey-Nagel (Düren, DE), and pyridine from Merck (Darmstadt, DE). The use of solvents and reagents from different manufacturers and locations allows a realistic evaluation of laboratory-to-laboratory robustness.

2.3 Sample extraction

Since the ring experiment focused on an evaluation of data-acquisition and processing methods, all extractions were conducted by a single laboratory and technician. The extraction procedure precisely followed that of Lisec et al. (2006), which was developed from the protocol of Fiehn et al. (2000a). Briefly, metabolites were extracted from 100 mg fresh weight (FW) for all plant tissue types with methanol and water. Polar metabolites were separated using chloroform purification. Three technical repeat samples each were combined and mixed well, giving ~7 ml of polar phase ‘super’-extract. One millilitre was then transferred to clean 2 ml microcentrifuge tubes (Greiner Bio-One Ltd., Stonehouse, Glos., UK), to which 100 μl of the aforementioned deuterated internal standard solution (cf. Sect. 2.2) was added. Samples were dried by vacuum centrifugation (Eppendorf Concentrator 5301, set on function 1 at 30°C for 8 h) and stored at −80°C. The only alteration from the protocol of Lisec et al. (2006) was that ribitol was not used as an internal standard. Samples were shipped on dry ice from UMAN to LECO (Mönchengladbach, DE) and MPIMP (Potsdam-Golm, DE), where they were stored dry at −80°C until analysis. Sample analysis was completed by each laboratory within one month of receiving the extracts.

2.4 Analytical methods

Analytical methods are numbered and abbreviated by a capital L prefix in square brackets and detailed in Table 2. The procedures common to all method variations were as follows. Samples were removed from −80°C storage and placed in a speed vacuum concentrator for 1 h to remove residual condensation and water. The dried samples were derivatised with O-methylhydroxylamine and N-methyl-N-(trimethylsilyl)-trifluoroacetamide (MSTFA). Further details are presented in Table 2. All samples were run on a GC–EI–TOF/MS instrument with an Agilent 6890N gas chromatograph and a LECO Pegasus III TOF mass spectrometer using the manufacturer’s ChromaTOF software (versions 2.12, 2.22, 3.34; LECO, St. Joseph, MI, USA).

Table 2 Method parameters highlighting variations in GC–TOF/MS data-acquisition

The UMAN (laboratory 1 [L1]) GC–EI–TOF/MS instrument conditions and parameters (Table 2) were as previously described for the optimised method of O’Hagan et al. (2005). This applies a higher polarity column and different injection system when compared to the other methods. MPIMP (laboratory 3 [L3]) GC–EI–TOF/MS instrument conditions and parameters (Table 2) were the same as previously described by Erban et al. (2007). LECO (laboratory 2) GC–EI–TOF/MS [L2.1] instrument conditions and parameters were essentially the same as MPIMP’s with a slightly reduced oven temperature ramp rate and thus longer chromatographic separation time (Table 2). All of the instrument conditions and parameters for the Pegasus 4D GC×GC–TOF/MS [L2.1 2D] analysis undertaken by LECO were standard (Table 2).

2.5 Data processing and statistical analysis

Processing methods are numbered and abbreviated by a capital M prefix in square brackets and details are given in Table 3. Peak heights of mass (m/z) fragments were normalised using the succinic-d4 acid stable isotope labelled standard (cf. Sect. 2.2). Annotation of peak identity was manually supervised using the TagFinder visualisations for mass spectral matching of so-called time groups and clusters (Lüdemann et al. 2008). Identification required a minimum of three correlating fragments in a cluster or time group and less than 5% deviation from the expected retention index (RI) given in a spectral library of reference compounds of the Golm Metabolome Database (http://csbdb.mpimp-golm.mpg.de/csbdb/gmd/gmd.html) (Kopka et al. 2005). Initial visual and statistical analyses of the data were performed with the Multi Experiment Viewer software (Saeed et al. 2003, 2006) and MetAlign (de Vos et al. 2007; Lommen et al. 2007; Lommen 2009). The pre-processing software tool MetAlign (http://www.metalign.nl/UK/) offers two possibilities for interaction with other software: A, de-noising and baseline correction, which maintains the peak shape information (compatible with deconvolution software and TagFinder); B, de-noising, baseline correction, peak-picking, alignment and export to an Excel format (compatible with TagFinder and multivariate analysis software) (Lommen 2009). PCA and ICA were performed according to Scholz et al. (2004) using the MetaGeneAlyse web-service (Daub et al. 2003). The detailed data processing and statistical analysis methods, [M1] to [M7], are summarised in Tables 3 and 4.
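The internal-standard normalisation and the annotation criteria described above can be sketched as follows. This is an illustration only, not the TagFinder implementation: the function names are hypothetical, and the 5% criterion is assumed here to be a relative deviation between the observed and expected retention index.

```python
def normalise_to_internal_standard(peak_heights, standard_height):
    """Divide each fragment peak height by the internal standard response.

    peak_heights maps m/z to peak height for one sample; standard_height
    is the response of the succinic-d4 acid standard in the same sample.
    """
    if standard_height <= 0:
        raise ValueError("internal standard response must be positive")
    return {mz: h / standard_height for mz, h in peak_heights.items()}

def ri_deviation_percent(observed_ri, expected_ri):
    """Relative retention index deviation in percent (assumed definition)."""
    return abs(observed_ri - expected_ri) / expected_ri * 100.0

def passes_annotation_filter(n_correlating_fragments, observed_ri, expected_ri,
                             min_fragments=3, max_ri_deviation=5.0):
    """Apply both criteria: enough correlating fragments and a small RI deviation."""
    enough_fragments = n_correlating_fragments >= min_fragments
    close_ri = ri_deviation_percent(observed_ri, expected_ri) < max_ri_deviation
    return enough_fragments and close_ri

# Hypothetical values for illustration only.
normalised = normalise_to_internal_standard({73: 200.0, 147: 50.0}, 100.0)
accepted = passes_annotation_filter(3, observed_ri=1310.0, expected_ri=1300.0)
```

A candidate that fails either criterion would still need manual inspection, since annotation in this study was manually supervised rather than fully automatic.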

Table 3 Method variations of data pre-processing
Table 4 Method variations of data mining relevant for laboratory comparisons

3 Results and discussion

GC–TOF/MS is a routine analytical technology with well established standard procedures. Nevertheless, the use of these data in metabolomics, especially with regard to data exchange between laboratories, demands new strategies in data mining. The aim of our work, based on GC–EI–TOF/MS analysis of identical sample sets, is to demonstrate the reproducibility of sample classification results acquired in different laboratories. Data were therefore generated in the three laboratories and processed with different data mining strategies, including non-targeted approaches without deconvolution. These approaches were included because previous reports (e.g., Lisec et al. 2006; Lu et al. 2008) discovered outlying deconvolutions and cautioned against the non-critical use of deconvoluted mass spectral intensities for relative quantification. Our method of using sample classification based on all detected mass features for laboratory-to-laboratory comparison therefore differs from the approach taken in classical ring experiments, where deconvoluted quantified data for specific target analytes are compared.

3.1 Analytes detected by GC–EI–TOF/MS and its potential for application to food quality assurance

Across the META-PHOR target species of rice grain, melon fruit and broccoli floret, GC–EI–TOF/MS analysis detects the typical MSTFA-derived, GC-amenable analytes (those that are not thermo-labile and fall within the instrument’s upper mass range of ~700 m/z). The typical metabolite groups that are detected include amino, organic, nucleic and fatty acids, as well as monosaccharides, disaccharides, sugar phosphates, sugar alcohols, and polyols. In the case of melon fruit, a large number of these metabolite groups are significantly related to fruit flavour and quality. For example, monosaccharides, disaccharides, and sugar alcohols all contribute to the sweet-flavoured flesh of melon fruit, a key quality trait for the consumer (Gao et al. 1999; Stepansky et al. 1999), and are indeed detected as being significantly more concentrated in the fruit inner mesocarp than in the outer mesocarp and epicarp (Biais et al. 2009). In addition, the amino acid profile of the fruit is indicative of its fragrant qualities, with many volatile organic compounds (VOCs) such as esters and aldehydes being derived from amino acids such as alanine and valine.

Amino acids, organic acids, mono- and disaccharides are also significant indicators of broccoli floret flavour and quality. Unfortunately, many nutraceuticals within broccoli, such as the flavones, flavonoids and glucosinolates, are large compounds outside the mass range of typical GC–EI–MS instrumentation. Such nutraceutical compounds are much more amenable to detection via LC–MS (de Vos et al. 2007; Jansen et al. 2008). The quality of rice grain is largely reflected in its starch and vitamin content; thus techniques such as LC–inductively coupled plasma (ICP)–MS, which is capable of elemental profiling, are required for its quality assessment. Since the market value of rice is largely determined by the fragrant nature of the rice variety, VOC analysis is again essential for determining phenotypic measures of market price and quality.

For a metabolomics screen to assess food quality and safety, GC–EI–TOF/MS alone will not provide enough information across a large enough range of metabolite groups. META-PHOR therefore recommends multi-platform analysis with 1H-NMR, GC–EI–TOF/MS, LC–TOF/MS, and VOC analysis via thermal desorption (TD) or solid phase micro extraction (SPME) analyte trapping followed by GC–EI–MS, together with various high resolution MS trap based techniques for the subsequent analyte identification and, where elemental composition analysis is required, LC–ICP–MS.

3.2 Demonstration of global repeatability of GC–EI–TOF/MS based plant metabolomics using independent component analysis (ICA)

Through the use of ICA it was demonstrated that the global repeatability of the sample sets analysed by GC–EI–TOF/MS between the laboratories was high (Fig. 1). The data employed in the generation of Fig. 1a and b from UMAN (laboratory 1 [L1]) (Table 2) differ in data mining strategy. Figure 1a is based on a targeted method using deconvolved data as described by Lisec et al. (2006), corresponding to data analysis method [M7] (Tables 3, 4). Figure 1b is based on the same acquired raw data but processed with a non-targeted fingerprinting approach, which enables the analysis of all acquired mass spectral features from the data set and their subsequent use in comprehensive statistical analysis (data analysis method [M1]) (Tables 3, 4) (Scholz et al. 2004; Pongsuwan et al. 2007).

Fig. 1

Comparative independent component analysis demonstrates the reproducibility of sample discrimination between laboratories and method variations. a–e show independent component analyses based on the first two principal components of a PCA preprocessing. The visualised percentage of total variance (V) is indicated. a shows data of UMAN after metabolite targeted data processing, method combination [L1] and [M7]. b is based on fingerprinting the data set of UMAN with methods [L1] and [M1]. c compares fingerprinting data of LECO with method [L2.1] and [M1] to GC×GC-fingerprinting data (d) of the same laboratory using method [L2.1 2D] and [M1]. e demonstrates the fingerprinting results of MPIMP using the method combination [L3] and [M2]

Figure 1b–d based on data analysis method [M1], and Fig. 1e based on data analysis method [M2] (Tables 3, 4), are all generated by the non-targeted approach with data from all three laboratories [L1] to [L3] (Table 2). Noise reduction was performed by applying a criterion to find at least three unique and mutually correlating mass fragments per analyte for peak height based quantification. By contrast, Fig. 1a generated via data processing method [M7] (Tables 3, 4) is based upon a defined, pre-selected single unique mass for peak area based quantification. A maximum normalised response value was calculated from the available unique masses found via the underlying correlation and cluster analyses performed within TagFinder (Lüdemann et al. 2008). Annotation was manually supervised testing mass spectral similarity between the reference library (Kopka et al. 2005) and the measured feature and retention index behaviour.

Figure 1c and d compare the LECO (laboratory 2 [L2]) methods [L2.1] and [L2.1 2D] (Table 2) respectively, using data processing method [M1] (Tables 3, 4). In contrast to the UMAN method [L1], chromatography is longer and uses a less polar column and splitless injection. Figure 1c and e are based essentially on the same technical settings but were generated by different laboratories (LECO [L2.1] with data processing method [M1], and MPIMP [L3] with data processing method [M2]; Tables 3, 4). All four data sets (3 × GC–EI–TOF/MS and 1 × GC×GC–EI–TOF/MS) were aligned according to retention index, normalised to succinic-d4 acid (as this standard was ideal under all chromatography regimes), mean centred by each mass feature and finally log10 transformed. Missing data were replaced with “0” before uploading into MetaGeneAlyse for PCA and ICA (Daub et al. 2003). Note that the axes of the plots are scaled to the same scores range, allowing comparative visualisation. The comparisons between laboratories, as well as between one (GC) and two (GC×GC) dimensional chromatography, show good reproducibility, and unsupervised ICA achieved clear and highly similar sample classifications for all data sets.
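The pre-processing chain described above can be sketched in a few lines. This is only an interpretation, not the code used in this study: the function name is hypothetical, missing values are represented by `None`, and "mean centred" is assumed here to mean division of each feature by its mean across samples so that the subsequent log10 transformation is defined. Retention index alignment is assumed to have been done already.

```python
import math

def preprocess(matrix, standard_index):
    """Normalise, mean-scale and log10-transform a samples x features table.

    matrix is a list of samples, each a list of peak heights (None = missing);
    standard_index is the column holding the internal standard response.
    """
    n_features = len(matrix[0])

    # 1. Normalise every sample to its own internal standard response.
    normalised = []
    for row in matrix:
        standard = row[standard_index]
        normalised.append([None if v is None else v / standard for v in row])

    # 2. Mean of each feature over the samples in which it was detected.
    means = []
    for j in range(n_features):
        present = [row[j] for row in normalised if row[j] is not None]
        means.append(sum(present) / len(present) if present else None)

    # 3. Scale by the feature mean, log10 transform, and set missing data to 0.
    result = []
    for row in normalised:
        result.append([
            0.0 if v is None or v <= 0 else math.log10(v / means[j])
            for j, v in enumerate(row)
        ])
    return result

# Hypothetical example: column 0 is the internal standard.
table = [[100.0, 50.0, None],
         [100.0, 200.0, 10.0]]
processed = preprocess(table, standard_index=0)
```

Note that replacing missing data with 0 after this scaling places undetected features at the feature mean on the log scale, which is one reason why the handling of missing values should be reported alongside the results.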

3.3 Assessments of technical reproducibility

For a further and more detailed analysis, the subset of rice data was evaluated alone, since the direct simultaneous analysis of the highly different matrices of broccoli, melon and rice, with their large qualitative and quantitative differences in composition, reduces the availability of unique masses which can be employed for quantification. Analysing a sub-set of the data according to biological matrix is therefore advised. First, a detailed non-targeted evaluation of technical reproducibility is shown in Fig. 2. All mass spectral features with pair-wise availability after the respective processing are plotted in Fig. 2a and b. In Fig. 2a, using the MPIMP instrument method [L3] (Table 2) and data processing method [M2] (Table 3), two similar biological rice samples are compared with a minimum fragment intensity of 50. Figure 2b demonstrates the technical reproducibility of two identical analytical replicates, i.e. based on one biological extract without derivative variation, taken from an MPIMP reproducibility experiment with a total of 29 analysed replicates.

Fig. 2

Analyses of technical replicate profiles. a, b compare the reproducibility of all mass spectral features from technical replicates (b) to that of biological replicates of highly similar rice samples (a, cf. samples of Figs. 4, 5). The peak heights (counts) of all aligned acquired mass fragments are plotted. a is limited to a minimum of 50 counts using the baseline correction integrated in method [M2]. b, also processed by [M2], demonstrates the validity of the 50 count cut-off (grey format). c summarises the relative standard deviations (RSDs) of all aligned mass spectral features from an MPIMP experiment comprising 29 technological replicate chromatograms. Note that the population of intense features at 50–60% RSD is caused by reagent contaminations. d demonstrates the expected technological RSDs with regard to choice of peak intensity (count) range as a histogram

A strong impact of the signal-to-noise threshold can be observed in Fig. 2b–d. With increasing fragment intensity from 1 to 10⁶, the technical variability decreases dramatically from approximately 50% down to 5% (based on a minimum of six data points out of 29 replicates). The bi-modal behaviour of quantitative variability observed in Fig. 2c, where some of the high intensity fragments show increased relative standard deviation (RSD), can be traced back to the replicate-specific concentration of artefact polysiloxanes, which are commonly generated by column bleed or silylation reagents independently of sample composition. In typical metabolite profiling experiments, high RSD mass fragments are ignored, as these can be identified by their characteristic mass spectra and removed from further analysis. Since high RSD artefact mass fragments may impact upon the PCA and ICA of non-targeted fingerprinting studies, routine exclusion prior to statistical analyses is recommended. However, artefact exclusion may not always be necessary: the comparative ICA for this study (Fig. 1) was performed including mass fragments of both artefacts and internal standards, and yet reproducible sample classification was obtained.
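The RSD calculation underlying this kind of assessment can be sketched as follows. The sketch uses only the Python standard library; the function name and example values are hypothetical, and the default of six required data points mirrors the minimum stated above. The sample standard deviation (n − 1 denominator) is assumed.

```python
import statistics

def rsd_percent(values, min_points=6):
    """Relative standard deviation (%) of one mass feature across replicates.

    values holds the peak heights of the feature in each replicate
    (None = not detected). Returns None if too few replicates contain
    the feature or if the mean response is zero.
    """
    present = [v for v in values if v is not None]
    if len(present) < min_points:
        return None
    mean = statistics.mean(present)
    if mean == 0:
        return None
    return statistics.stdev(present) / mean * 100.0

# Hypothetical peak heights of one fragment in six technical replicates.
example_rsd = rsd_percent([100, 110, 90, 105, 95, 100])
too_few = rsd_percent([1, 2, 3])
```

Features with an unexpectedly high RSD at high intensity could then be flagged for inspection, for example as candidate reagent or column bleed artefacts.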

The chemical stability of derivatives that differ in the number of silylated groups or in isomerism arising from methoximation is represented in Fig. 3. As the relative quantification of amino acids using GC–MS based profiling has been controversially discussed (Noctor et al. 2007), we used glutamic acid as one example for comparison between laboratories. For glutamic acid, the two major detectable derivatives, with two and three silylated groups, are plotted in Fig. 3a. Figure 3a shows adequate reproducibility between the different participating laboratories of the META-PHOR ring experiment and their datasets based upon instrument methods [L1] to [L3] (Table 2) and data processing method [M1] (Table 3). Although not easily achievable, a stable isotope labelled standard for each metabolite class detected is ultimately advisable to improve precision. It should be noted that glutamic acid can not only form a derivative with four silylated groups but may also generate varying amounts of the cyclic pyroglutamic acid during derivatisation and analysis under high temperatures. The stability of glucose derivatives is much less chemically affected and therefore not a matter of discussion, as shown in Fig. 3b based upon instrument methods [L1] to [L3] (Table 2) and data processing method [M1] (Table 3). Here the derivatives are based on the geometric cis/trans-isomerism of the methoximated carbonyl group.

Fig. 3

Stability of alternative chemical derivatives. The normalised responses after internal standardisation of alternative glutamate (a) and glucose (b) derivatives are shown. The high agreement of the META-PHOR data [L1], [L2.1], [L2.1 2D], [L3], processed by [M1], is demonstrated. For analysis of resilient biological matrices or unstable metabolite derivatives, specific stable isotope labelled standards will enhance accuracy. Note that glutamic acid 2TMS was not detectable in [L1]

When comparing the analytical methodologies employed across the ring experiment, unsurprisingly the medium throughput methods (Lisec et al. 2006; Erban et al. 2007) of MPIMP [L3] and LECO [L2.1] were more appropriate for the analysis of the diverse META-PHOR species, than the UMAN method [L1] which was optimised for high-throughput analysis of yeast media (O’Hagan et al. 2005). The research warrants a further comparison of splitless and split injection methodologies for these sample types in future experimentation, although the repeatability of data between the laboratories on the whole was impressive.

Fig. 4 was generated to provide a measure of the reproducibility of the data mining methods applied to our ring experiment. In Fig. 4a the raw data from instrument method [L3] (Table 2) were mined with several data processing methods, comprising peak area evaluation [M7], peak deconvolution [M6], different baseline correction algorithms [M1], [M2], [M3], and [M4], different peak height picking algorithms [M1], [M2], [M3], and [M4], as well as restrictions with regard to different expected peak widths [M1] and [M5] (Tables 3, 4). For the 12 rice samples from the META-PHOR ring experiment, the maximum normalised response of the deuterated internal standard succinic-d4 acid 2TMS is shown in Fig. 4a and c, reflecting the alternative data-mining possibilities. Fig. 4b and c present the corresponding information allowing comparison between the laboratories’ instrument methods [L1] to [L3] (Table 2); a method as similar as possible to [M1] was used (Table 3).
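Maximum normalisation of the internal standard response, followed by a standard deviation as a simple spread measure, can be sketched as below. This assumes that "maximum normalised" means division of every response in a series by the largest response in that series; the function name and the example values are hypothetical.

```python
import statistics

def max_normalise(responses):
    """Scale a series of responses so that the largest value becomes 1."""
    peak = max(responses)
    if peak <= 0:
        raise ValueError("at least one response must be positive")
    return [r / peak for r in responses]

# Hypothetical internal standard responses across three samples.
scaled = max_normalise([2.0, 4.0, 1.0])
spread = statistics.stdev(scaled)   # sample standard deviation of the scaled series
```

Scaling each series to its own maximum places results from different processing methods or laboratories on a common 0 to 1 range, so that their standard deviations can be compared directly.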

Fig. 4

Technical reproducibility evaluated by the internal standard, d4-succinic acid (2TMS). Response data were maximum normalised for comparison of the data-mining methods [M1] to [M7] using exemplary [L3] data (a). b compares maximum normalised d4-succinic acid (2TMS) response between laboratories [L1] to [L3] using processing method [M1]. The respective standard deviations of each of the previous calculations are reported in (c) with laboratory and method combinations indicated

The highest deviations in reproducibility can be observed in the split mode, faster-GC based dataset from UMAN [L1]. However, this variability, which became apparent through the internal standard compound, can be effectively corrected. When the common normalisation method is applied to matrix metabolites, the fast-GC based dataset exhibits similar reproducibility to the other methods (Fig. 5). The increased standard deviation observed for instrument method [L2.1 2D] (Table 2) should not be interpreted as a technological feature of GC×GC–TOF/MS; it is currently the result of the non-optimised fingerprinting of high intensity GC×GC–TOF/MS peaks, which are split among several subsequent 2nd dimension modulations. Using the information of Lu et al. (2008) we can now demonstrate the improved quality of data based on peak-picking strategies from TagFinder (Lüdemann et al. 2008) and MetAlign (de Vos et al. 2007; Lommen et al. 2007; Lommen 2009).

Fig. 5

Comparisons of endogenous metabolite levels using responses normalised to the d4-succinic acid internal standard. Metabolites were chosen to represent the borderline of potential distinctive features, such as a, b phosphoric acid (3TMS) and c, d aspartic acid (3TMS), as well as clear differences between sample groups, e.g. e, f GABA, 4-aminobutyric acid (3TMS). Variation of processing methods [M1] to [M7] of an identical data set [L3] (a, c, e) is compared to variations between laboratories [L1] to [L3] with processing fixed to [M1] (b, d, f). Abbreviations HNN, KNL and TSN1 represent rice cultivars, numbers encode nitrogen regimes (Sect. 2.1)

3.4 Reproducibility of data for representative metabolites between variations in GC–EI–TOF/MS analytical methodology and data processing strategy

After normalisation to the internal standard, the responses of three representative metabolites were analysed and are visualised in Fig. 5a, c and e (based upon instrument method [L3] and data processing methods [M1] to [M7]) and Fig. 5b, d and f (based upon instrument methods [L1] to [L3] and data processing method [M1]) (Tables 2, 3, 4). Both the error propagation from the internal standard values and the inherent variability of the data processing methods must of course be kept in mind. In the case of phosphoric acid 3TMS, the comparability of data from the different laboratories and data-mining methods is shown in Fig. 5a and b. Aspartic acid 3TMS, the analyte corresponding to the metabolite aspartic acid, is missing in the UMAN dataset [L1] (Table 2), possibly due to discrimination of the analyte by the split injection of the derivative, compared to the splitless injections in the other instrument methods [L2] and [L3] (Fig. 5c, d). However, it must be noted that the UMAN [L1] on-column volume was approximately one-tenth that of the LECO [L2] and MPIMP [L3] methods (Table 2) (0.11 μl [L1] in comparison to 1 μl [L2] and [L3]).

Gamma-aminobutyric acid (GABA) 3TMS represents a metabolite in the rice experiment showing a specific increase associated with the KNL cultivar (Fig. 5e, f). A lower precision was observed for GABA 3TMS in the UMAN instrument method [L1]. GABA is not commonly detected in yeast footprint media, so the method optimisation did not account for it (O’Hagan et al. 2005); however, this observation may also result from the split or different injection system employed by UMAN [L1] (Table 2). In the case of GABA 3TMS, the analyte concentration is still above the detection limit for some conditions but erroneous due to noise in others. Accepting fragment intensities with a signal-to-noise ratio of 2.0 and higher means that noisy data in the low-level detection region remain in the chromatogram after baseline correction (Fig. 2).
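As a rough illustration of why a signal-to-noise cut-off of 2.0 lets low-level data through, and why the outcome depends on the residual noise level, the following sketch classifies fragment intensities against a threshold. The function names, intensities and noise levels are hypothetical and are not taken from any of the software packages evaluated here.

```python
def signal_to_noise(intensity, noise_level):
    """Ratio of a fragment intensity to the estimated noise level."""
    if noise_level <= 0:
        raise ValueError("noise_level must be positive")
    return intensity / noise_level


def classify_intensities(intensities, noise_level, threshold=2.0):
    """Split intensities into (kept, rejected) using a signal-to-noise threshold."""
    kept, rejected = [], []
    for value in intensities:
        if signal_to_noise(value, noise_level) >= threshold:
            kept.append(value)
        else:
            rejected.append(value)
    return kept, rejected


# Hypothetical fragment intensities (arbitrary units) for one analyte.
intensities = [30.0, 60.0, 150.0, 400.0]

# The same intensities evaluated against two illustrative noise levels.
kept_low_noise, _ = classify_intensities(intensities, noise_level=25.0)
kept_high_noise, _ = classify_intensities(intensities, noise_level=100.0)
print(kept_low_noise)   # [60.0, 150.0, 400.0]
print(kept_high_noise)  # [400.0]
```

The same signal is therefore retained or discarded depending on how much noise remains after baseline correction, which is one reason low-abundance analytes can appear precise under one condition and erroneous under another.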

Caution is also necessary when comparing data from different software versions or algorithms. For example, data processing method [M2] used ChromaTOF 2.22, which leaves noise of ~25 units after baseline correction, while data processing method [M1] used ChromaTOF 3.34, which leaves noise of ~100 units. In ChromaTOF 2.22 the operator defines the smoothing factor manually, whereas ChromaTOF 3.34 has the option to select the smoothing factor automatically. The automatic smoothing also takes into account the data acquisition rate to ensure that 18–20 data points are present across the chromatographic peaks. Additionally, standardisation of baseline cutting parameters (above the noise, mid-way and at the noise) across all data processing methods is necessary and must not be overlooked. Different settings can generate different results, so data cannot be compared unless the same standardisation has been applied.
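The relationship between acquisition rate, peak width and the number of data points across a peak is simple arithmetic, sketched below. This is not a description of how ChromaTOF selects its smoothing factor; the function names and the 2 s peak width are assumptions made for illustration only.

```python
def points_across_peak(acquisition_rate_hz, peak_width_s):
    """Number of spectra recorded across a peak of the given baseline width."""
    return acquisition_rate_hz * peak_width_s


def rate_for_target_points(peak_width_s, target_points):
    """Acquisition rate (spectra per second) needed to reach a target point count."""
    if peak_width_s <= 0:
        raise ValueError("peak_width_s must be positive")
    return target_points / peak_width_s


# Hypothetical peak that is 2 s wide at the baseline.
peak_width = 2.0
print(rate_for_target_points(peak_width, 18))  # 9.0 spectra per second
print(rate_for_target_points(peak_width, 20))  # 10.0 spectra per second
```

Narrower peaks, as produced by fast-GC methods, need proportionally higher acquisition rates to keep the same number of points per peak.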

3.5 GC×GC offers enhanced resolution and depth of data over conventional GC

For an overview of the complexity of the evaluated rice matrix, and to assess the resolution and depth of data gained through GC×GC–EI–TOF/MS compared to conventional GC–EI–TOF/MS, data from the same derivatised samples obtained on both instruments were compared. Figure 6a is based on LECO instrument method [L2.1] and Fig. 6b on [L2.1 2D] for comparison between 1D and 2D GC–EI–TOF/MS. As can be seen from the two chromatograms in Fig. 6a and b, the two-dimensional GC×GC–EI–TOF/MS chromatogram shows a markedly greater wealth of information and enhanced resolution, which at the current state of automated data pre-processing cannot be fully accessed. This provides a strong incentive to improve automated metabolite-targeted and non-targeted multi-parallel fingerprinting analyses of these 4-dimensional, data-rich files.

Fig. 6

GC×GC–TOF/MS is expected to enhance routine metabolite profiling. An exemplary GC–TOF/MS chromatogram (a) of the evaluated rice samples (Figs. 4, 5) is compared to the corresponding GC×GC–TOF/MS analysis (b). Total ion count (TIC) is plotted.

3.6 Ring experiment “take homes” and improvements for future laboratory-to-laboratory comparisons

Despite the excellent reproducibility observed between the different laboratories' analytical methodologies, further improvement could be made by using an identical analytical setup and identical chromatographic methods. The differences between split and splitless injection methods (and to a lesser extent the different injection systems used) and in on-column volumes appear to have influenced the results. For future assessments of split-based GC methods it will be crucial to also test for matrix-dependent discrimination effects. The chromatography of the melon extracts suffered greatly from monosaccharide overloading in all laboratories. In future it may be of benefit to perform a two-stage GC–EI–TOF/MS analysis of melon, in which the polar phase of a whole melon extract is used for the analysis of sugars and highly concentrated bulk metabolites, and a second sample is prepared by subjecting the same polar phase to solid phase extraction (SPE) to remove free sugars (Suzuki et al. 2002) before analysis for trace metabolites. Extraction solvents subjected to SPE would also need to be analysed to identify artefacts resulting from the process. To further enhance sample stability during future experiments, the authors recommend that samples be sealed dry under inert gas and shipped on an ample amount of dry ice. It is also recommended that at least one backup sample set per laboratory be held in storage by the laboratory responsible for extract preparation, as a means of investigating unexpected laboratory-to-laboratory deviations.

In our hands, relative quantification based upon peak area worked impressively well, especially for peaks giving high responses. However, for comparing data generated by several laboratories in a short-term experiment, relative quantification based upon peak height was found to be more feasible, as automated peak height retrieval is a simple process compared to the area decomposition required for multiple co-eluting metabolic components. Thus peak height will be the preferred method for future META-PHOR experimentation until robust peak area calculations become available. It must be taken into account that for long-term experimental comparisons (months to years) peak area may be more appropriate, since changing consumables such as injection liners influences peak shape, and thus peak height, more than peak area.
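The point that peak shape affects peak height more than peak area can be illustrated with an idealised Gaussian peak. The sketch below is a simplified model, not a description of real chromatographic peaks or of the software evaluated here: the function names, the peak parameters and the assumption of a purely Gaussian shape are all hypothetical.

```python
import math


def gaussian_peak(t, area, centre, sigma):
    """Signal of an idealised Gaussian peak with the given total area."""
    amplitude = area / (sigma * math.sqrt(2.0 * math.pi))
    return amplitude * math.exp(-0.5 * ((t - centre) / sigma) ** 2)


def peak_height_and_area(area, centre, sigma, t_start=0.0, t_end=20.0, step=0.01):
    """Sample the peak on a regular grid; return (max height, trapezoidal area)."""
    n = int(round((t_end - t_start) / step))
    signal = [gaussian_peak(t_start + i * step, area, centre, sigma) for i in range(n + 1)]
    height = max(signal)
    integrated = sum((signal[i] + signal[i + 1]) * step / 2.0 for i in range(n))
    return height, integrated


# Same amount of analyte (area 1000), eluting as a sharp and as a broadened peak.
sharp_height, sharp_area = peak_height_and_area(area=1000.0, centre=10.0, sigma=0.5)
broad_height, broad_area = peak_height_and_area(area=1000.0, centre=10.0, sigma=1.0)

print(sharp_height / broad_height)  # height halves when the peak width doubles
print(sharp_area, broad_area)       # area is essentially unchanged
```

Under this model, doubling the peak width halves the height while the area stays constant, which is consistent with preferring peak area when peak shape is expected to drift over months to years.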

3.7 Ring experiment precedents from across all disciplines of “omic” research

Precedents for the assessment of inter-laboratory reproducibility can be found in the alternative ‘omic’ fields of proteomics and transcriptomics. In the transcriptomics field, reproducible and highly overlapping results based upon the independent treatment of rats with bromobenzene and microarray analyses have been reported. This was despite the two laboratories using alternative routes of bromobenzene administration and differing in-house constructed microarray chips (Heijne et al. 2003, 2004). Further, in a more recent study a large consortium of transcriptomics laboratories tested standardised operating procedures (SOPs) for the processing and analysis of a common set of sample material, again resulting in highly overlapping datasets (Pennie et al. 2004). Unsurprisingly, the inter-laboratory reproducibility of proteomics is lower than that of transcriptomics. One study produced three technical replicate 2D gels for each biological sample and reported that variability between the gels was so high that statistical analysis could only confirm changes in the levels of 24 proteins, despite a high number of apparent changes that were not technically reproducible between gels (Heijne et al. 2003). Notwithstanding, good reproducibility has been demonstrated for the MS analysis of proteins and mass fingerprinting of peptide digests (Verhoeckx et al. 2004). It is currently a major and ongoing focus of all three ‘omics’ fields to develop robust and standardised high-throughput operating procedures.

Many previous GC–MS ring experiments have been conducted; however, these were not non-targeted metabolomic studies but tended to focus upon the analysis of soils (Karstensen et al. 1998) and water samples (Hoogerbrugge et al. 1999) for the detection of specific contaminants during quality testing. Searches of the non-targeted metabolomics literature found only one previous study in which GC–MS results from two laboratories were compared for biological quality assurance purposes. Here the authors did not focus on the inter-laboratory reproducibility, but more on the biological significance of the data (Catchpole et al. 2005; Beckmann et al. 2007). Catchpole, Beckmann and colleagues performed a comparison of GM potato lines generated from the Désirée cultivar using a combination of flow infusion (FI)MS, LC–MS and GC–MS. However, in that study extracts were prepared independently by different technicians and run on instruments of various manufacturers and models (Catchpole et al. 2005; Beckmann et al. 2007). By contrast, for the present META-PHOR ring experiment a common set of extracts was prepared by a single technician, aliquoted and distributed for parallel runs on the same model of instrument (Agilent 6890N GC with LECO Pegasus III TOF–MS) in different locations, though with different injector systems (Agilent 7673 and CTC CombiPAL).

4 Concluding remarks

In conclusion, the work reported here provides an unbiased assessment of the inter-laboratory repeatability of GC–EI–TOF/MS, taking into consideration the different analytical method variants and the suitability of a range of data processing and statistical analysis routines. The major metabolite features generated in the different META-PHOR laboratories proved to be highly reproducible, indicating great promise for the future generation of global metabolomics databases. We suggest that further ring experiments, tuned to the specific approaches and properties of fingerprinting and profiling studies, be performed to monitor and document future advances in the ongoing standardisation of qualitative and quantitative food and health related analyses in the metabolomics field.