Introduction

Phosphorus (P) is an essential nutrient for plant and animal production. Soils contain insufficient P for optimal crop production in many parts of the world, and therefore are amended with P containing fertiliser and animal manure, especially in affluent countries. This practice is discussed nowadays, because the P containing fertilisers are mainly derived from rock phosphate, which is a non-renewable resource and, therefore, finite (Cordell et al. 2009; Edixhoven et al. 2014; Sattari et al. 2012). Furthermore, many soils of agricultural land in affluent countries have been enriched with P, because P application via fertilisers and manures was larger than P withdrawal via harvested biomass (MacDonald et al. 2011). This practice increases the risks of P leaching and run-off losses (Oenema et al. 2005; Sharpley and Rekolainen 1997). These P losses contribute to eutrophication of surface waters and to a loss of biodiversity (Conley et al. 2009; Sawyer 1966).

Agriculture in the Netherlands started to use P fertilisers since the end of the nineteenth century. Many soils have been enriched with P fertilisers, especially since the 1960s, and increasingly with P from animal manure. As a result, the mean soil P status is relatively high (Reijneveld et al. 2010). To reduce P accumulation in soils, the Dutch government introduced P application limits from 1984 onwards. These P application limits have decreased stepwise over time. Initially, these limits related to P application of animal manure only, but from 2006 limits include application of all sources of P. From 2010 onwards, P application limits are related to the soil P status; the lower the soil P status, the higher the permissible P application, and vice versa. Since 2015, balanced P fertilisation is obligatory for soils within the soil P status class “neutral” (Supplementary Table A), a negative balance for soils with P status “high”, and a positive balance for soils with P status “low”. Balanced P fertilization is defined as a mean P-input per hectare that is equal to the mean P-output via harvested biomass (Anonymous 2013). Soil processes like mobilisation, immobilisation or losses are not taken into account in these limits. The dairy sector in the Netherlands uses a significant part (>60 %) of agricultural land to produce roughages, such as grass, grass silage and maize silage and contributes to the net P accumulation in soil (Smit et al. 2015). Moving towards balanced P fertilisation might affect grassland yield and quality, because soil processes might influence the P availability for plant uptake (Frossard et al. 1995; Morgan 1997; Syers et al. 2008). Some studies indicate that the availability of soil P may decrease with balanced P fertilisation, but that herbage production is not affected (Verloop et al. 2010), or that herbage production is affected negatively in the longer term when grazed by sheep (Nguyen et al. 1989). Cornforth and Sinclair (1982) reason that P losses from soil, through e.g. erosion and ‘fixation’, have to be compensated. Janssen and de Willigen (2006), however, reason that, in case of high soil P status, balanced P fertilisation should be enough to maintain the amount of P available for plant uptake and production.

In the short term, balanced P fertilisation is not expected to limit dry matter (DM) production of grass on soils with a sufficient to high soil P status as the response on P application is only found at low soil P status (Van der Paauw 1956). Decreases in P content and P offtake of grass are, however, expected directly after decreasing P fertilisation (Power et al. 2005; Schulte and Herlihy 2007; Swift et al. 1988). In the next 10–20 years decreases of herbage yield might be foreseen, due to the conversion of available soil P into soil P fractions that are less available to plants (Power et al. 2005). In the end, the P content in herbage may decrease to a level that is insufficient to meet the P requirements of lactating and high-yielding dairy cattle (Valk and Sebek 1999). This insufficiency may be addressed by supplementing P via manufactured feed or by accepting a decrease in the production of milk and meat per unit surface area. Supplementing P via feed will increase the P import and thereby may increase the necessity to export manure P (Smit et al. 2015), whereas decreasing milk and meat production per ha may lower farm income.

Grazing is an important factor that influences P flows on grassland. On grazed grassland, herbage P is returned to the soil via excretion of dung and, in case of rotational grazing, in a random pattern over a paddock (Haynes and Williams 1993; Oudshoorn et al. 2008; Petersen et al. 1956). With balanced P fertilisation at field level, grazed grasslands will have a relatively heavy P supply in manure patches (i.e., a positive P balance), whereas areas without manure patches will have a negative P balance. Areas influenced by manure patches will become larger and areas with negative P balance will decrease in course of time, depending on the grazing management (Petersen et al. 1956).

So far, implications of long-term balanced P fertilisation on herbage yield and quality, and on soil P status are not well quantified under grazing conditions. The objective of the long-term field experiment described here, was to examine the effects of balanced P fertilisation and two levels of excessive P fertilisation, on herbage yield and quality, and on soil P status of grazed grasslands.

Materials and methods

Site description

The field experiment was a multi-site experiment conducted on four permanent grassland sites in the Netherlands on different soil types (Soil Atlas of Europe 2005), including a gleyic podzol (sand1; 52°N), a haplic podzol (sand2; 51°N), a calcaric-gleyic fluvisol (young marine clay; 52°N, reclaimed from the sea in 1957, in use since 1973), and a eutric histosol (peat; 52°N) (Table 1). Weather data were collected from meteorological stations on or close to the sites. Annual precipitation on the sites was about 850 mm, and average temperature 10.5 °C. Soils were sampled and characterised in the autumn of 1996, before the start of the experiment in spring 1997. The sandy soils were rather similar, though sand2 had a deeper groundwater level and lower annual precipitation than sand1 (Table 1). The soil P status, measured as capacity parameter P-AL-value, ranged from ‘amply sufficient’ to ‘high’ (Supplementary Table A), which is representative for a large part of grasslands in the Netherlands (Reijneveld et al. 2010). The swards on clay and sand were dominated (about 90 %) by Lolium Perenne L. (LP). On the peat soil the sward was on average 35 % LP, 20 % Poa trivialis L., 15 % Agrostis stolonifera L., 5 % Poa annua, 10 % Alopecurus geniculatus L., 10 % herbs and 5 % other grasses. The botanic composition was measured at irregular intervals and changed during the experiment, e.g. the share of LP decreased on sand and clay and increased on peat. Within sites, however, no differences between plots developed (data not presented), with an exception for clover, which was introduced to half of the plots on sand1 in 2002 (see “Management” section).

Table 1 General information and soil characteristics (soil analyses 0–5 cm in autumn 1996) on experimental sites

Field experimental design and treatments

The experiment had a longitudinal multifactorial design with 15 experimental years as replications in time on four sites and six treatments per site. The six treatments per site were combinations of three P and two N surpluses, resulting in two and three pseudo-replicates for P and N surpluses, respectively. The treatments were randomly assigned to a plot (Supplementary Table B). Treatments per site were without spatial replicates, comparable to other studies that included grazing (Aarons et al. 2015; Common et al. 1991; Li et al. 2009; Oudshoorn et al. 2008). The plot area per site varied from 345 to 375 m2. Fertilisation levels were aimed to implement surpluses of 0, 9, and 18 kg P ha−1 year−1, and 180 and 300 kg N ha−1 year−1, further indicated as P0, P9, and P18, and N180 and N300, respectively. The P9 and P18 levels were based on a study that suggested that a surplus of 10–20 kg P ha−1 year−1 is needed to maintain soil P status of grassland (Oenema and Van Dijk 1994). The surplus of 300 kg N ha−1 was based on the N fertilisation recommendations of the 1990’s, and a surplus of 180 kg N ha−1 was expected at that time to become the legal standard for the 2000s, and is actually close to current practice on intensively managed dairy farms in the Netherlands.

Management

Cattle slurry and mineral P fertiliser were applied annually, in early spring and directly after the third harvest, whereas mineral N fertiliser was applied in early spring and directly after each subsequent harvest (in March–September), except after the annual final harvest. Slurry was applied using shallow injection (disc injector) on clay and sand and narrow band application (sliding feet machine) on peat soil (Huijsmans et al. 1998). Slurry application per season was targeted at balanced P fertilisation and ranged from 35 to 45 m3 ha−1. On peat soil balanced P fertilisation gave too much slurry to apply correctly; therefore slurry application was reduced and supplemented with a small amount (3.5 kg P ha−1) of triple superphosphate (TSP) fertiliser. Also on the P9 and P18 treatments, slurry was supplemented with triple superphosphate fertiliser (water soluble P > 90 %). The N fertiliser replacement value (NRFV) of slurry was calculated based on chemical analyses of slurry samples and the coefficients of the fertiliser recommendations (Anonymous 2014):

$${\text{NFRV}}\; = \;{\text{NFRV}}\;{\text{ammonium-N}} \times {\text{ammonium-N}}\;{\text{in}}\;{\text{slurry}}\; + \;{\text{NFRV}}\;{\text{organic-N}} \times {\text{organic-N in slurry}}$$
(1)

where, NFRV ammonium-N is 64 % for the sliding feet machine and 76 % for the disc injector and NFRV organic-N is 24 % for both application methods. Mineral fertiliser N was applied as calcium ammonium nitrate (CAN, 27 % N). The distribution of N fertilisation over the harvests depended on the planned use (mowing or grazing) of the harvest and was derived from the fertiliser recommendations (Anonymous 2014).

Total N fertilisation averaged (all sites) at 238 kg N ha−1 year−1 for N180 and 370 kg N ha−1 year−1 for N300, whereas total P fertilisation (all sites) averaged at 24 kg P ha−1 year−1 for P0, 34 kg P ha−1 year−1 for P9 and 44 kg P ha−1 year−1 for P18 (Supplementary Table C). Other nutrients such as sulfur, potassium (K), sodium and magnesium were applied in equal amounts to all plots per site, according to the fertiliser recommendations for grassland (Anonymous 2014).

In 2002, the management of the experimental farm on sand1 was converted to organic farming, implying no application of processed N and P fertilisers. Therefore, P fertiliser was applied as ground phosphate rock (Gafsa-phosphate) which is a less effective P fertiliser than triple superphosphate (Scholefield et al. 1999) in the short term, due to a low water soluble P content of 1–2 %. To reach different N levels, white clover was successfully introduced in former N300 plots in 2003 (introduction of clover was not successful in 2002).

In general the first and fourth harvests of all sites were taken for silage. The other harvests were made via rotational grazing by two or three heifers or dry cows for 2–4 days per harvest, dependent on the rate of intake and actual DM yield. Occasionally grassland use was adjusted, for example when soil conditions were too wet to graze, grass was taken for silage later. Rotational grazing alternated with mowing is common practice on dairy farms in The Netherlands (Aarts et al. 1992). All plots of a site were harvested on the same day, which resulted in different growth stages on N180 and N300 plots. The growth stage was estimated on N300 plots and was targeted at stages that are recommended in the Netherlands for the planned use of the harvests. The 1st harvest was at about 3500 kg DM ha−1, the 4th harvest at about 2500 kg DM ha−1 and grazing (2nd, 3rd, 5th, and 6th) at about 1700 kg DM ha−1. Harvests at different sites were not synchronised as growing rates differed between sites. During the first ten years of the experiment, the animals were weighed at least twice during the grazing season to estimate P output in live weight gain. In subsequent years the mean growth rate obtained during the first ten years was used.

Each treatment plot was split into an entrance plot and an experimental plot. Cattle grazed 2–3 days at the entrance plot, which enabled adaptation of excreta to the mineral composition of the grass. Subsequently, they entered the experimental plot. After each grazing, grassland was topped. Topped material was left on the field to decompose.

Sampling and chemical analyses

At each experimental plot, soil samples were collected in late autumn/early winter (November/December) at depths of 0–5, 5–10, 10–20, and 20–30 cm. Here, we present the results of 0–5 and 5–10 cm. To obtain a soil sample that represents the average properties of the experimental plot (345–375 m2), each plot was divided in 20 squares of equal size. In each square, two sampling points were selected randomly. The 40 subsamples were bulked and analysed for (amongst others) P-AL-value as capacity parameter, and Pw-value as an intensity parameter and from 2004 onwards for P-CaCl2-value as newly introduced intensity parameter (Reijneveld et al. 2014; Van Rotterdam et al. 2012). Stored soil samples from 1996, 1997 and 2000 were analysed for P-CaCl2-value as well in 2011. Slurry was sampled from the tank of the application machine before and after application (the tank was filled with more than sufficient slurry) and analysed for dry matter (DM), organic matter, crude ash, ammonium N, total N, total P and total K. At harvest, DM yield was determined by mowing, weighing and sampling the herbage of four strips of 1.5 m by about 5 m (measured afterwards) per plot with a Haldrup ® crop harvester at a cutting height of 5 cm. The exact location of the strips was varied over harvests to avoid adaptation of the sward in the strips to constant mowing. Herbage samples were dried for 48 h at 70 °C prior to the analysis of the N and P contents. Total N and P contents of grass and cattle slurry were determined following digestion with a mixture of sulphuric acid, salicylic acid, hydrogen peroxide and selenium (Novozamsky et al. 1974, 1983). The N concentrations in the digests were measured by means of the indophenol blue method (Novozamsky et al. 1974). The P concentrations in the digests of slurry were measured according to the molybdenum blue method (Murphy and Riley 1962), whereas the P concentrations in grass digests were measured by an inductively coupled plasma atomic emission spectrometer (ICP–AES). The P-AL-value in the soil was determined following extraction (1:20 v/w) with a 0.1 N ammonium lactate/0.4 N acetic acid solution and pH 3.75 (Egnér et al. 1960). Pw was determined by extraction with water at a soil to solution ratio of 1:60 (v/v) (Sissingh 1971). The P-CaCl2-value was determined following extraction (1:10 v/w) with 0.01 M CaCl2 solution and ortho-P in the extraction solution was measured via segmented flow analysis (Houba et al. 1990).

Calculations and statistical analyses

Soil surface balances were calculated as the P and N inputs via fertilisation minus the P and N outputs via mown grass and the P and N retention in the live weight gain of the grazing cattle. The intake and excretion of P and N by cattle during grazing was not measured and considered to be within plot (internal) P and N cycles. The amounts of P and N retained in gained live weight was calculated using contents of 7.4 g P kg−1 and 25.3 g N kg−1 live weight (Heeres-van der Tol 2002).

Herbage observations that were statistically analysed were: annual DM yields, weighted P and N contents, P and N yields. The number of observations was 360: 4 sites × 6 treatments × 15 years. Soil P observations that were statistically analysed were P-AL-, Pw- and P-CaCl2-values, at depths 0–5 and 5–10 cm. The number of observations was 384 per soil layer, as soil P was measured before the start and at the end of the experimental period. Differences between treatments and trends in time in herbage and soil P observations were statistically analysed in a linear regression model with a fixed and a random part. The fixed part comprises the effects of the experimental treatments as explanatory variables, the random part comprises factors that cannot be controlled, quantified or are not of interest but (can) have an effect. By including the random part the residual variance against which the effects of treatments are tested is reduced. The parameters were estimated with the Restricted Maximum Likelihood (ReML) method (Harville 1977), using Genstat (15th edition).

For the herbage results the fixed part comprises: site, actual P and N fertilisation, trend in time expressed in number of experimental years, and their interactions as explanatory variables. The starting year 1997 was assigned a value of 1 for number of years. Though the experiment was designed with P and N surpluses as treatments, these surpluses could not be used as explanatory variables for yields and contents as the surpluses are calculated using yields and contents. Hence, the surpluses are not independent from herbage yield and P and N contents, therefore P and N fertilisation were used. The random part in the herbage models is year (as factor) × site + plot. The variance between individual years in level of the results on a site, due to e.g. annual climate differences, is assigned to the random interaction year × site and therefore not to the residual variance. Results on an individual plot can be different over the whole period compared to the mean level, due to e.g. unnoticed soil conditions of that plot. This systematic variance between plots is assigned to the random factor “plot” and not to the residual variance.

For the soil P results the fixed part comprised the experimental treatments: site, P surplus, N surplus, number of years, and the interactions. The factors P surplus and N surplus were tested in interaction with number of years only (P or N surplus × number of years) because it is expected that P or N surplus influences soil P measurements not in the observations at the start of the experiment when number of years is zero, but influences the trend in time of soil P. The random part of the model was year (as factor) × site × soil layer + plot × soil layer. Similar to the herbage results the variance between years that influence the level of soil P results per layer on all plots within a site similarly, are assigned to the random interaction year × site × soil layer. If the results on an individual plot over the years are systematically higher or lower than on average, this is assigned to the random interaction “plot × soil layer”. With this procedure initial differences in soil P measurements that are systematically found back in all experimental years are assigned to the random term “plot”. If it appears that initial differences are only coincidental, the initial differences are assigned to the residual variance. When all fixed and random terms are fit, the residual variance that remains can be described as the interaction year × plot for both herbage and soil P observations, this interaction makes the model saturated. The residual variance is the error term which is used for assessing if factors and interactions are significant.

For sand1 the transition from conventional to organic management appeared to influence the results of the experiment. Therefore the experimental period was split into a conventional period (1997–2001), indicated as sand1_conv, and an organic period (2002–2011), indicated as sand1_org. The effect of the transition on the results of the experiment was characterised by two factors: the change of P fertiliser from high to low water soluble P and the introduction of clover on the N300 plots. In the models a factor was added for “clover” which had a value 1 for plots with clover (introduced on N300 plots in 2003) and 0 for plots without clover (N180 plots on sand1 and all plots on other sites), instead of N fertilisation since only one N fertilisation level was applied in that period on sand1. In the final model a factor for “low soluble P fertiliser” was added as interaction to the P treatments when this factor proved to be significant and had a value of 1 for sand1 on the positive surpluses from 2002 on and 0 in all other situations.

If the trends in time in effect of P fertilisation or P surplus is different from zero, indicating that differences between treatments in response to P surplus increase or decrease from the first year onwards, a maximum or minimum might be reached in the long(er) term. Therefore the hypothesis was tested if the responses showed a sign of levelling off in time with the interaction: site × (number of years)2 × P fertilisation/surplus.

The initial model for DM yield, P and N content, P and N yield was:

$$\begin{aligned} {\text{Y}} & = {\text{constant}}_{\text{site}} +\upbeta1_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} +\upbeta2_{\text{site}} \times {\text{N}}\;{\text{fertilisation}}\; \\ & \quad + \;\upbeta3_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} \times {\text{N}}\;{\text{fertilisation}} \\ & \quad + \;\upbeta4_{\text{site}} \times {\text{number}}\;{\text{of}}\;{\text{years}}\; + \;\upbeta5_{\text{site}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{P}}\;{\text{fertilisation}} \\ & \quad + \;\upbeta6_{\text{site}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{N}}\;{\text{fertilisation}}\; \\ & \quad + \;\upbeta7_{\text{site}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{P}}\;{\text{fertilisation}} \times {\text{N}}\;{\text{fertilisation}} \\ & \quad + \;\upbeta8\; \times \;{\text{clover}} +\upbeta9_{\text{site}} \times {\text{number}}\;{\text{of}}\;{\text{years}}^{2} \times {\text{P}}\;{\text{fertilisation}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}}\times{\text{year}}}} \\ \end{aligned}$$
(2)

where Y is herbage yield, P yield, P content, N content, or N yield; constant is the intercept for site; the β’s are site-specific coefficients: β1 for P fertilisation; β2 for N fertilisation; β3 for the interaction of P fertilisation and N fertilisation; β4 for number of years (overall trend); β5 for the interaction of number of years and P fertilisation (trend in the effect of P fertilisation); β6 for the interaction of number of years and N fertilisation (trend in the effect of N fertilisation); β7 for the interaction of number of years, P fertilisation, and N fertilisation (trend in NxP interaction); β8 for the effect of clover on plots with clover; β9 for the interaction P fertilisation and square of number of years; μ random model; ε residual variance.

The initial model for the soil P parameters was:

$$\begin{aligned} {\text{Y}} & = {\text{constant}}_{{{\text{site}},{\text{soil layer}}}} +\upalpha1_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \\ & \quad +\,\upalpha2_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{P}}\;{\text{surplus}}\; + \;\upalpha3_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{N}}\;{\text{surplus}} \\ &\quad +\, \upalpha4_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \times {\text{P}}\;{\text{surplus}} \times {\text{N}}\;{\text{surplus}} \\ & \quad +\,\upalpha5_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}}^{2} \times {\text{P}}\;{\text{surplus}} \\ &\quad +\,\upalpha6 \times {\text{clover}}\; + \;\upmu_{{{\text{site}} \times {\text{year}} \times {\text{soil layer + plot}} \times {\text{soil layer}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}} \times {\text{soil layer}}}} \\ \end{aligned}$$
(3)

where Y is the soil P parameter; constant is the intercept for site and soil layer; the α’s are site and soil layer specific coefficients: α1 for number of years; α2 for the interaction of number of years and P surplus; α3 for the interaction of number of years and N surplus; α4 for the interaction of number of years, P surplus, and N surplus; α5 for the interaction square of number of years and P surplus; α6 for the effect of clover; μ random model; ε residual variance.

In the first step of the statistical analyses, the significance of factors and interactions was tested with an approximate F-test (P ≤ 0.05). The four sites were tested against their individual residual variances per parameter. Non-significant interactions were removed one by one, resulting in a model with the initial main effects and the significant interactions. In the random model it was tested if the initial interactions decreased the residual variance. If not, they were removed. In the second step, a final model was developed by also removing all non-significant main effects, so the model comprised only significant main effects and interactions. This prediction model was applied by using the model-equations and filling in the parameter estimations (α’s and β’s) and the terms (e.g. P fertilisation at P0 and P18) to quantify the effects of the significant treatments and interactions.

Results

DM yield

Mean dry matter (DM) yield responded to P and N fertilisation, but responses differed between sites (Fig. 1, Supplementary Fig. B). The statistical analysis of herbage DM yields resulted in the following model:

$$\begin{aligned} {\text{Dry}}\;{\text{matter}}\;{\text{yield}} &= {\text{constant}}_{\text{site}} +\upbeta1_{\text{site}} \times {\text{P}}\;{\text{fertilisation}}\; + \;\upbeta2_{\text{site}} \times {\text{N}}\;{\text{fertilisation}} \\ & \quad +\upbeta 8 \times {\text{clover}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}}}} \\ \end{aligned}$$
(4)
Fig. 1
figure 1

Mean annual dry matter yields at the four sites (sand1, sand2, clay, peat), as function of mean annual phosphorus surpluses in kg P ha−1 (P0, P9 and P18), averaged over two levels of nitrogen input, during the period 1997–2011. Note that the origin of the Y-axis is at 6 Mg ha−1. Vertical line in sand1 indicates transition from conventional to organic

The factors site, P fertilisation, N fertilisation, and clover were included as fixed effects, whereas year × site and plot were included as random effects (4). Not all fixed factors had an effect on all sites (Table 2). The response to P fertilisation was positive for sand1_conv, sand2, and for peat, whereas for sand1_org and clay no response was found. For sand1_conv and sand2 the response was mainly based on the difference between P0 and P18, the difference between P0 and P9 was small on these sites. On the other hand the difference between P9 and P18 was small on peat. Effects did not increase or decrease across years.

Table 2 Results of statistical analyses; Reml estimates (in 1000 kg DM ha−1) of factors affecting DM yield (see Eq. 4); means and Least Significant Difference (in brackets, P ≤ 0.05)

Differences in DM yield between P surpluses were estimated using Eq. (4) and the estimations of factors from the statistical analysis (Table 2). Equation (4), however, requires P fertiliser instead of P surplus. As actual P surpluses not always equalled targeted surpluses (Supplementary Table B), we estimated P fertilisations that corresponded with targeted surpluses 0, 9 and 18 kg P ha−1 for all sites. We, therefore, assumed that a change in P fertilisation on a plot would result in an equal change in P surplus. For sand1_conv, the estimated difference in DM yield compared to 0 kg P surplus ha−1 (i.e. balanced fertilisation) was 396 kg DM ha−1 for P9 and 792 kg DM ha−1 for P18, which equalled 3.3 and 6.6 % of the estimated DM yield on P0. For sand2 the difference was 312 for P9 and 623 kg DM ha−1 for P18, which equalled 3.1 and 6.2 % of the estimated DM yield on P0. For peat the difference was 392 for P9 and 783 kg DM ha−1 for P18, which equalled 3.8 and 7.5 % of the estimated DM yield on P0. The estimated response on peat between P0 and P18 was higher than the measured response (Fig. 1) because the realised difference in P surplus between P0 and P18 was only 15 kg P ha−1 (Supplementary Table B) and was extrapolated to the targeted 18 kg P ha−1.

P content of grass

Mean P content responded to P and N fertilisation (Fig. 2; Supplementary Fig. C). The statistical analysis of total P content resulted in the following model:

$$\begin{aligned} {\text{P}}\;{\text{content}} & = {\text{constant}}_{\text{site}} +\upbeta1_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} +\upbeta2_{\text{site}} \times {\text{N}}\;{\text{fertilisation}} \\ & \quad +\upbeta5_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} \times {\text{number}}\;{\text{of}}\;{\text{years}} +\upbeta8 \times {\text{clover}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}}}} \\ \end{aligned}$$
(5)
Fig. 2
figure 2

Weighted mean annual phosphorus contents of herbage at the four sites (sand1, sand2, clay, peat), as function of mean annual phosphorus surpluses in kg P ha−1 (P0, P9 and P18), averaged over two levels of nitrogen input, during the period 1997–2011. Note that the origin of the Y-axis is at 2.0 g P kg−1 DM. Vertical line in sand1 indicates transition from conventional to organic

The effects of site, P fertilisation, N fertilisation, change of effect of P fertilisation over time (P fertilisation × number of years) and clover were fixed effects, and plot and year × site random effects.

P content increased on all sites as a response to P fertilisation, except for sand 1_org (Table 3). On clay the effect of P fertilisation was absent in the beginning and positive after 2 years due to an increase of the response to P fertilisation over time (P fertilisation × number of years). The estimated difference in P content between P0, P9 and P18 was 0.26 and 0.51 g P kg−1 DM on sand2 (7.5 and 15 %), and 0.17 and 0.34 g P kg−1 DM on peat (5 and 10 %). After 15 years the estimated difference on clay was 0.11 and 0.21 g P kg−1 DM (2.5 and 5 %).

Table 3 Results of statistical analyses; Reml estimates (in g P kg−1 DM) of factors affecting P content of grass (see Eq. 5); means and Least Significant Difference (in brackets, P ≤ 0.05)

The response to N fertilisation was negative on clay and tended to be negative on the other sites. This effect is probably caused by dilution, i.e. N fertilisation increased DM production more than P uptake (Whitehead 2000). Also, plots were harvested on the same date, which implies that the physiological status of the grass may have been different between N180 and N300 plots. The interaction between the effects of P and N fertilisation was not proven, possibly due to the limited range of P and N fertilisation.

Total P yield

Mean total P yield responded to P and N fertilisation (Supplementary Figs. A, D). The statistical analysis of total P yield resulted in the following model:

$$\begin{aligned} {\text{P}}\;{\text{yield}} & = {\text{constant}} +\upbeta1_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} +\upbeta2_{\text{site}} \times {\text{N}}\;{\text{fertilisation}} \\ & \quad +\upbeta5_{\text{site}} \times {\text{P}}\;{\text{fertilisation}} \times {\text{number}}\;{\text{of}}\;{\text{years}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}}}} \\ \end{aligned}$$
(6)

The factors site, P fertilisation, N fertilisation, and the trend in effect of P fertilisation (P fertilisation × number of years) were fixed factors, and plot and year × site random factors. The response to P fertilisation was positive on sand1_conv, sand2, and on peat (Table 4). The response to N fertilisation was positive on all sites except on sand1_conv and could not be determined on sand1_org. At the peat and clay sites the effect of P fertilisation increased across years (interaction P fertilisation × number of years). The mean estimated difference between 0 kg P surplus ha−1 (balanced fertilisation) and 18 kg P surplus ha−1 after 15 years equalled 5.8 kg P ha−1, on all responsive sites. This was 16 % higher P yield compared to P0. The estimated mean response of P yield to P fertilisation was 0.25 kg P kg−1 P applied on all sites and 0.31 kg P kg−1 P on the responsive sites.

Table 4 Results of statistical analyses; Reml estimates (in kg P ha−1) of factors affecting P yield (see Eq. 6); means and Least Significant Difference (in brackets, P ≤ 0.05)

Total N content and N yield

The responses of N content to N fertilisation (β2) were positive on all sites (Eq. 7, Supplementary Table D). On sand1 the N content with clover was 1.55 g N kg−1 DM higher than without clover.

$${\text{N}}\;{\text{content}} = {\text{constant}}_{\text{site}} +\upbeta2_{\text{site}} \times {\text{N}}\;{\text{fertilisation}} +\upbeta8 \times {\text{clover}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}}}}$$
(7)

The responses of N yield to N fertilisation (β2) were positive on all sites (Eq. 8, Supplementary Table E): On sand1 the N yield with clover was 46 kg N ha−1 higher than without clover.

$$\begin{aligned} {\text{N}}\;{\text{yield}} & \; = {\text{constant}}_{\text{site}} +\upbeta1_{\text{site}} \times {\text{P}}\;{\text{fertilisation}}\; +\upbeta2_{\text{site}} \times N\;{\text{fertilisation}} \\ & \quad + \;\upbeta8 \times {\text{clover}} +\upmu_{{{\text{site}} \times {\text{year}} + {\text{plot}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}}}} \\ \end{aligned}$$
(8)

The recovery of N in grass to applied N (i.e. β2: increase of N yield by N fertilisation) ranged from 0.4 to 0.5 kg N kg−1 N applied on sand and peat and was 0.8 kg N kg−1 N on clay, in the range of N fertilisation between N180 and N300 (Supplementary Table E). These values are in line with results from other experiments (Vellinga and André 1999; Schils and Snijders 2004). On sand2 and peat, the N yield responded positively to P fertilisation (β1), caused by the positive response of DM yield to P application. The N yield on sand2 was 25 kg N ha−1 and on peat 19 kg N ha−1 higher for P18 than for P0.

P-AL-value

The P-AL-values in the soil layers at 0–5 and 5–10 cm depth were higher at a higher P surplus (Fig. 3; Table 5). The difference between sand1_conv and sand1_org was small. Sand1 was therefore considered as one site in the analysis of P-AL-values. The statistical analysis resulted in the model:

$$\begin{aligned} {\text{P-AL-value}} & = {\text{constant}}_{\text{site,soil layer}} +\upalpha1_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \\ & \quad +\upalpha2_{\text{site,soil layer}} \times {\text{P}}\;{\text{surplus}}\; \times \;{\text{number}}\;{\text{of}}\;{\text{years}}\; +\upvarepsilon_{{{\text{site }} \times {\text{ soil layer }} \times {\text{ year }} + {\text{ plot }} \times {\text{ soil layer}}}} +\upmu_{{{\text{plot }} \times {\text{ year }} \times {\text{ soil layer}}}} \\ \end{aligned}$$
(9)
Table 5 Results of statistical analyses; Reml estimates (in mg P kg−1 dry soil) of factors affecting P-AL-value (1997–2011; see Eq. 9); means and Least Significant Difference (in brackets, P ≤ 0.05)
Fig. 3
figure 3

Mean P-AL-values of the topsoil (0–5 cm; left-hand side; and 5–10 cm; right-hand side) at the four sites (sand1, sand2, clay, peat), as function of phosphorus surpluses in kg P ha−1 (P0, P9 and P18), averaged over two levels of nitrogen input, during the period 1997–2011

The effects of site, overall change over time (number of years) and change over time in response to P surplus (P surplus × number of years) were included in the fixed model, whereas plot × soil layer and year × site × soil layer were included in the random model.

With balanced fertilisation, indicated by the value of α1 in Table 5, P-AL-values decreased for sand1 in the layer 0–5 cm and increased for clay in the layer 5–10 cm. The P-AL-values increased with an increase of P surplus. The estimated increase in difference in P-AL-value in the 0–5 cm soil layer between P0 and P9 was 1.6 mg P kg−1 year−1 on sand1, 2.0 mg P kg−1 year−1 on sand2, 2.6 mg P kg−1 year−1 on clay and 3.1 mg P kg−1 year−1 on peat.

Pw-value

The Pw-values in the soil layers at 0–5 and 5–10 cm depth were higher at a higher P surplus (Supplementary Figure E, Supplementary Table F). Sand1 was considered as one site in the analysis of Pw-values. The statistical analysis resulted in the model:

$$\begin{aligned} {\text{Pw-value}}\; & = {\text{constant}}_{\text{site,soil layer}} +\upalpha1_{\text{site,soil layer}} \times {\text{number}}\;{\text{of}}\;{\text{years}} \\ & \quad +\upalpha2_{\text{site,soil layer}} \times {\text{P}}\;{\text{surplus}} \times {\text{number}}\;{\text{of}}\;{\text{year}}s +\upvarepsilon_{{{\text{site }} \times {\text{ soil layer }} \times {\text{ year + plot }} \times {\text{ soil layer}}}} +\upmu_{{{\text{plot }} \times {\text{ year }} \times {\text{ soil layer}}}} \\ \end{aligned}$$
(10)

The effects of site, overall change over time (number of years) and change over time in response to P surplus (P surplus × number of years) were included in the fixed model, whereas plot × soil layer and year × site × soil layer were included in the random model.

With balanced fertilisation, Pw-values increased for clay in the layers 0–5 cm and 5–10 cm (Supplementary Table F). The Pw-values increased as P surplus increased, except on sand1. The estimated increase in Pw-value in the 0–5 cm soil layer between P0 and P9 was 0.21 mg P L−1 year−1 on sand2, 0.35 mg P L−1 year−1 on clay and 0.24 mg P L−1 year−1 on peat.

P-CaCl2-value

The P-CaCl2-value of the soil layers 0–5 and 5–10 cm were analysed for the period 2004–2011 (Fig. 4; Table 6). Only for sand2 a relationship of P-CaCl2-value with P surplus was found, on the other sites no relationship or trend was found. For sand2 the final model for the period 2004–2011 was:

$${\text{P-CaCl}}_{2}{\text{-value}} = {\text{constant}}_{{{\text{soil}} {\text{ layer}}}} +\upalpha1_{\text{soil layer}} \times {\text{P surplus}} +\upmu_{{{\text{site}} \times {\text{year}} \times {\text{soil layer}} + {\text{plot}} \times {\text{soil layer}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}} \times {\text{soil layer}}}}$$
(11)
Table 6 Results of statistical analyses; Reml estimates (in mg P kg−1 dry soil) of factors affecting P-CaCl2-value (2004–2011; see Eqs. 11 and 12); means and Least Significant Difference (in brackets, P ≤ 0.05)
Fig. 4
figure 4

Mean P-CaCl2-values of the topsoil (0–5 cm; left-hand side; and 5–10 cm; right-hand side) at the four sites (sand1, sand2, clay, peat), as function of phosphorus surpluses in kg P ha−1 (P0, P9 and P18), averaged over two levels of nitrogen input, during the period 2004–2011. Note that the scale of the Y-axis differs between depths

For sand1, clay and peat the final model for the period 2004–2011 was an estimation of the mean P-CaCl2-value:

$${\text{P-CaCl}}_{2}{\text{-value}} = {\text{constant}}_{{{\text{site}} \times {\text{soil layer}}}} +\upmu_{{{\text{site}} \times {\text{year}} \times {\text{soil layer}} + {\text{plot}} \times {\text{soil layer}}}} +\upvarepsilon_{{{\text{plot}} \times {\text{year}} \times {\text{soil layer}}}}$$
(12)

The P-CaCl2-value differed greatly between years. On sand2, P-CaCl2-value was lowest at P0 and highest at P18. On other sites, no clear differences between P0, P9 and P18 were found in P-CaCl2-values. At all sites P-CaCl2 was relatively high at the start in 1997. Significant trends in P-CaCl2-values from 2004 were absent on all sites, and the variance in P-CaCl2-values was not related to variance in the responses of DM yield or P content (analysis not shown).

General discussion

Experimental design

The experiment included four sites and six treatments per site, but no spatial replicates of treatments within one site. This design, in combination with Restricted Maximum Likelihood (ReML) statistics, enabled a longitudinal analysis of the effects of P and N fertilisation on herbage yield and P content and uptake, and on soil P parameters for the separate sites due to the replications in time. This design increases the risk of not detecting small effects (type II error). The application of a random part of the models to decrease the residual variance and sufficient replications in time, however, decrease this risk.

Herbage response to P fertilisation

During the experimental period of 15 years the DM yield, P content and P yield of herbage on sand and peat sites were lower at balanced P fertilisation than at a surplus P fertilisation of 9 and 18 kg P ha−1 (Tables 2, 3, 4). The relative mean response to 18 kg P ha−1 of DM yield was 6–8 %. The relative mean response of P yield was larger, 8–26 %, indicating a luxury consumption by the herbage. The mean absolute responses to P surpluses were in DM yield 19 kg DM kg−1 P and P yield 0.25 kg P kg−1 P, averaged over all sites. Differences between sites were large. On sand1, the response in DM yield, P content and P yield disappeared when the P fertiliser changed from highly soluble TSP to low water soluble rock phosphate. On the recently (1957) reclaimed marine clay the DM yield did not respond to P surplus, and herbage P content and P yield responded positively only from about the tenth experimental year onwards.

The positive response of DM yield on sand and peat was not expected, because of the high P status of these soils: the response of a crop to P fertilisation decreases when soil P status is higher and is absent at high soil P status due to the law of diminishing returns (Syers et al. 2008; Van der Paauw 1956). A response in P content and P yield was expected, as herbage P content and P yield continue to respond till a higher soil P status than DM yield (Schulte and Herlihy 2007; Swift et al. 1988; Van der Paauw 1956).

The observed responses in DM and P yield to P fertilisation of our experiment were on the upper side of results reported for multiple-year experiments in mown grassland: responses derived from experiments in the United Kingdom and Ireland ranged from 8.0 to 11.4 kg DM kg−1 P applied on soils without a relation to P status which varied from “low” to “high” or was unknown (Adams 1974; Heddle 1967; Paynter and Dampney 1991; Power et al. 2005; Swift et al. 1988). Only when P status was “very low” (Swift et al. 1988) and after ten years of repeated treatments (Power et al. 2005) responses were larger, 20 kg DM kg−1 P. Responses for P yield ranged from 0.04 to 0.20 kg P kg−1 P which was unrelated to soil P status on soils with “very low” to “high” P status (Paynter and Dampney 1991; Power et al. 2005; Swift et al. 1988) and was 0.32 kg P kg−1 P after 10 years of repeated treatments (Power et al. 2005). The observed response of DM yield to P fertilisation of our experiment was more or less in line with results reported for multiple-year experiments in grazed grassland. Reported responses of DM yield range from 0 up to 110 kg DM kg−1 P (Bolland et al. 2011; Bolland and Guthridge 2007; Davison et al. 1997), responses of P yield were not mentioned.

Soil P response to P surplus

At balanced P fertilisation, P-AL-value decreased on sand1 in the layer 0–5 cm and increased on clay in 5–10 cm. Differences in P-AL-values between P surpluses increased in time. The relationship between P surplus and the change in P-AL-value was assumed to be linear. Data analysis indicated no levelling off in time. Similar linear relations between cumulative P surpluses and soil P values were found in other multiple year experiments on grazed and mown grasslands although different soil P tests were used (Jaakkola et al. 1997; Messiga et al. 2015; Robertson and Nash 2008). The change of fertiliser type on sand1 from high to low water soluble P did not change the relation between P surplus and P-AL-value.

Changes in Pw-values in response to P surpluses were comparable to the responses of P-AL-value, though Pw-value is considered to be an intensity parameter and extracts far less P from soil than P-AL-value (Tunney et al. 1997). Changes in P-CaCl2-values in response to P surpluses were absent, apart from sand2 where we found that P-CaCl2-values were different for different P surpluses. The absence of responses in P-CaCl2 over time would indicate that P-CaCl2 is a poor indicator for the differences in P accumulation in the experiment. This corresponds with findings of other experiments (Neyroud and Lischer 2003; Robertson and Nash 2008).

Relationship between soil and herbage response

The responses of DM yield on the sand and peat soils were higher (to a high water soluble P fertiliser) than on the clay soil (Fig. 5). The differences in herbage response between sites seem to be related to the ratio between the amounts of P extracted with P-CaCl2 and with P-AL. The sand and peat soils had relatively high P-AL and relatively low P-CaCl2-values, while the clay soil had relatively high P-AL and relatively high P-CaCl2 values (Tables 5, 6).

Fig. 5
figure 5

Response of dry matter yield to 18 kg P ha−1 surplus, as function of P-CaCl2-value in 0–5 cm, averaged over 2004–2011, estimated with linear model based on yield results of 1997–2011. Surplus created with water soluble P fertiliser (Triple Super Phosphate), except on Sand1_org with poorly water soluble P fertiliser (ground rock Gafsa-phosphate)

The absence of herbage response on sand1 after the change from high to low water soluble P fertiliser indicates that the herbage mainly responded to directly available P in P-CaCl2 or water soluble P and not to an increase of P-AL-value.

Response to N

The response of DM yield to the applied N fertilisation was in line with results from other experiments on similar soil types and sites (Schils and Snijders 2004; Vellinga and Andre 1999). An interaction between N and P fertilisation in DM yield was not found, likely because there were only two N levels. Soil P did not respond to N fertilisation in the final model, where N fertilisation was a variate. In a model where N and P fertilisation levels were factors, soil P was lower at a higher N level. This indicates that N fertilisation decreased P surplus via an increase of P yield and thereby affected soil P indirectly. There was no indication of a residual effect of N fertilisation level on the response of soil P. Messiga et al. (2014) found also that N fertilisation affected soil P indirectly via P surplus.

Grazing

An explanation for the relative large herbage response to P fertilisation is possibly the influence of grazing and the uneven spreading of P by defecation (Haynes and Williams 1993; Nguyen and Goh 1992) because virtually all P is excreted in faeces (Valk and Beynen 2003). The area covered by dung (and thus P) was estimated using a Poisson distribution (Petersen et al. 1956; see additional information). On average, 59 % of the area is not covered by dung and will have a P balance of −255 kg P ha−1 after 15 years (Table 7), 31 % is covered by one dung patch and has a P balance of +221 kg P ha−1, and 10 % is covered by two or more dung patches and has a P balance of +697 kg P ha−1. This is an estimation for the time span of our experiment (the fields were also grazed before the experiment started). Such site-specific differences in P balances within fields affect the response to P fertilisation. The N in a dung patch influences about 5 times its surface (MacDiarmid and Watkin 1972). For P, being a less mobile element, this is probably an overestimation. With the assumption that P from a dung patch influences the grass growth on twice its area (e.g. by lateral roots), 34 % of the surface would suffer from depletion after 15 years (Table 7). Likely, areas with no dung patch have a larger response to P fertilisation than areas with dung. Averaged over the whole area, the response to P fertilisation in the range around balanced fertilisation is therefore expected to be larger on grazed grassland than on mown grassland. Note that balanced P fertilisation on grazed grassland has a lower level due to a lower off-take than on mown grassland.

Table 7 Estimated surface areas (in percent) of the treatments plots that are uncovered, covered with one and covered with two or more dung patches, using the Poisson distribution (Supplementary Information Eq. 12) and the estimated cumulative phosphorus surpluses (see text)

Implications

The results of this experiment indicate that (1) At balanced P fertilisation of grazed grassland there is a risk for a lower herbage yield and P content compared to grazed grassland with surplus P fertilisation, but differences in yield and P content seem not to increase over time. (2) Surplus P fertilisation leads to a build-up of P in the soil, implying a higher risk of P losses to the environment than at balanced P fertilisation.

Conclusions

At balanced P fertilisation annual DM yield, P content, and P yield of grazed grassland were lower than at a surplus of 9 or 18 kg P ha−1 year−1 on sand and peat. Dry matter yield was 6–8 % lower, P content was 5–15 % lower and P yield was 14–26 % lower at balanced P fertilisation than at a surplus of 18 kg P ha−1 year−1. However, differences between P treatments in DM yield, P content, and P yield remained constant over the whole period of 15 years. On the marine clay soil, no differences in DM yield were found between P treatments, but P content in the herbage and P yield were lower with balanced P fertilisation than with surplus P fertilisation. These differences between sites seem to be related to the ratio between the amounts of P extracted with P-CaCl2 and with P-AL. The sand and peat soils had relatively high P-AL and relatively low P-CaCl2 values, while the clay soil had relatively high P-AL and relatively high P-CaCl2 values. Interestingly, P-AL increased over time at the clay soil and decreased at the sand soil with balanced P fertilisation. Soil P status responded to P surpluses and the differences between the surpluses increased in time.

In summary, there is a risk that balanced P fertilisation of grazed grassland reduces herbage yield and P content relative to surplus P fertilisation, even at relatively high soil P status. The risk of yield reduction seems to be related to uneven distribution of dung patches and the ratio between the P intensity indicator and P capacity indicator.