A model based on Rock-Eval thermal analysis to quantify the size of the centennially persistent organic carbon pool in temperate soils

. Changes in global soil carbon stocks have considerable potential to influence the course of future climate change. However, a portion of soil organic carbon (SOC) has a very long residence time (> 100 years) and may not contribute 15 signiﬁcantly to terrestrial greenhouse gas emissions during the next century. The size of this persistent SOC reservoir is presumed to be large. Consequently, it is a key parameter required for the initialization of SOC dynamics in ecosystem and Earth system models, but there is considerable uncertainty in the methods used to quantify it. Thermal analysis methods provide cost-effective information on SOC thermal stability that has been shown to be qualitatively related to SOC biogeochemical stability. The objective of this work was to build the first quantitative thermal analysis based model of the 20 size of the centennially persistent SOC pool. We used a unique set of soil samples from four agronomic predicted the size of the centennially persistent SOC pool using Rock-Eval 6 thermal parameters as predictors in the calibration set (pseudo-R² = 0.91, RMSEC = 0.06) and the validation set (R² = 0.91, RMSEP = 0.07). The uncertainty of the predictions obtained using the multivariate regression model was quantified by a Monte Carlo approach that produced conservative 95% prediction intervals across the 30 samples of the validation set. This model based on Rock-Eval 6 thermal analysis can thus be used to predict the proportion of centennially persistent SOC with known uncertainty in new soil 5 samples from similar pedoclimates. Our study strengthens the evidence for a link between the thermal and biogeochemical stability of soil organic matter, and demonstrates that Rock-Eval 6 thermal analysis can be used to quantify the size of the centennially persistent organic carbon pool in temperate soils.

predicted the size of the centennially persistent SOC pool using Rock-Eval 6 thermal parameters as predictors in the calibration set (pseudo-R² = 0.91, RMSEC = 0.06) and the validation set (R² = 0.91, RMSEP = 0.07). The uncertainty of the predictions obtained using the multivariate regression model was quantified by a Monte Carlo approach that produced conservative 95% prediction intervals across the 30 samples of the validation set. This model based on Rock-Eval 6 thermal analysis can thus be used to predict the proportion of centennially persistent SOC with known uncertainty in new soil 5 samples from similar pedoclimates. Our study strengthens the evidence for a link between the thermal and biogeochemical stability of soil organic matter, and demonstrates that Rock-Eval 6 thermal analysis can be used to quantify the size of the centennially persistent organic carbon pool in temperate soils.

Introduction
Soils exert a key regulation of the atmospheric greenhouse gas concentrations on a decadal timescale through the net carbon 10 source and sink status of their organic carbon reservoir (Amundson, 2001;Eglin et al., 2010). However, a portion of the soil organic carbon (SOC) reservoir may not contribute significantly to the net exchange of CO 2 and CH 4 between atmosphere and land during the next century because its residence time exceeds 100 years and its rate of carbon inputs is low (Trumbore, 1997;He et al., 2016). The size of this centennially persistent SOC pool is presumed to be large (i.e. between one and two thirds of total SOC) and dependent on geochemical parameters such as soil texture and mineralogy (Buyanovsky and 15 Wagner, 1998a;Trumbore, 2009;Mills et al., 2014;Mathieu et al., 2015). However, the amount of centennially persistent organic carbon in soils is highly uncertain as it cannot be estimated accurately by current analytical methods (Post and Kwon, 2000;von Lützow et al., 2007;Bruun et al., 2008). Physico-chemical procedures attempting to isolate SOC with high residence time from bulk SOC have proven unsatisfactory because of indications that such fractions are a mixture of ancient and recent SOC (von Lützow et al., 2007;Trumbore, 2009;Lutfalla et al., 2014). Even the well-established radiocarbon 20 ( 14 C) analytical technique cannot precisely determine the size of the centennially persistent SOC pool (Schrumpf and Kaiser, 2015;Menichetti et al., 2016). The importance of better information on the size of the centennially persistent SOC pool has been emphasized recently (International Soil Carbon Initiative, 2011;Bailey et al., in press;Bispo et al., 2017;Harden et al., in press), stressing the need for operational and standardized metrics or proxies to accurately quantify SOC persistent at the centennial timescale. The general lack of information on the size and turnover rate of measurable SOC pools hampers the 25 initialization of SOC pools in dynamic models, questioning their predictions of the evolution of the global SOC reservoir (Falloon and Smith, 2000;Luo et al., 2014;Feng et al., 2016;He et al., 2016;Sanderman et al., 2016). Luo et al. (2016) and He et al. (2016) recently claimed that optimizing parameter estimation with global data sets on SOC pools and fluxes was the highest priority to reduce biases among Earth system models.
During the past decade, thermal stability of organic carbon has been proposed as a good surrogate for its biogeochemical 30 stability in litter and soils (e.g. Rovira et al., 2008;Plante et al., 2009;Gregorich et al., 2015). Several studies using thermal analysis techniques such as thermogravimetry and differential scanning calorimetry during ramped combustion have shown that the fast-cycling SOC pool determined as the amount CO 2 respired in laboratory incubation experiments was thermally labile (Plante et al., 2011;Leifeld and von Lützow, 2014;Campo and Merino, 2016). Recently, studies using thermal analysis under oxidative or inert (pyrolysis) reaction atmosphere coupled with evolved gas analysis have shown a high and positive correlation between the thermally stable SOC and persistent SOC determined using 14 C measurements (Plante et al., 2013), and between thermally stable SOC and mineral-associated SOC isolated by a classical SOC physical fractionation 5 scheme (Saenger et al., 2015). Using long-term bare fallow (LTBF) soils kept free of vegetation for several decades (i.e. with negligible carbon inputs), Barré et al. (2016) recently showed that persistent SOC was low in energy and thermally stable.
While there appears to be strong qualitative links between thermal and biogeochemical stability of SOC, there is to date no established quantitative link between the size of the persistent SOC pool and SOC thermal characteristics.
The objective of this work was to design a reliable, routine method based on a thermal analysis technique (Rock-Eval 6; 10 RE6) to quantify centennially persistent SOC in a range of soils. First, we compiled a set of reference soil samples from four long-term agronomic experiments in Northwestern Europe with long-term bare fallow treatments. The SOC concentration of LTBF treatments can be used to estimate the size of the persistent SOC pool of a particular site, as proposed by Rühlmann (1999) and Barré et al. (2010). Here, we refined estimates of the persistent SOC concentration previously published by Barré et al. (2010) for the four sites used in this study. We then used these values to estimate the proportion of centennially 15 persistent SOC in 118 archived soil samples from LTBF and non-LTBF treatments of these four sites. The last step consisted in analyzing these reference samples using RE6 thermal analysis, and building a multivariate regression model to relate RE6 information on SOC thermal stability and bulk chemistry to the estimated proportion of centennially persistent SOC. In this work, we aimed at delivering a thermal analysis based model with reliable prediction intervals around the predicted values of the size of the centennially persistent SOC pool. We thus had a particular focus on the uncertainty of the estimated 20 proportion of centennially persistent SOC and its propagation in the multivariate regression model.

Reference soil sample set with estimated size of the centennially persistent SOC pool
The reference soil sample set was built using samples from four long-term agronomic experimental sites in Northwestern Europe (Versailles, France, Grignon, France, Rothamsted, United Kingdom, Ultuna, Sweden, Supplementary material S1). 25 Each of the four sites includes a LTBF treatment, with bare fallow durations ranging from 48 years at Grignon to 79 years at Versailles. For all experimental sites, we also included non-LTBF treatments that have increased or maintained their total SOC concentrations over time, or sustained smaller losses than the LTBF treatment. The selected non-LTBF treatments included manure amendments (Versailles), straw or composted straw amendments (Grignon), continuous grassland (Rothamsted), and continuous cropland (Ultuna). Soil samples from each site and treatment have been regularly collected 30 and archived since the initiation of the experiments. A total of 118 topsoil (0-20 to 0-25 cm depth, Supplementary material S1) samples were selected from the archives of LTBF and non-LTBF treatments to build the reference sample set. Samples were selected from two or three field replicate plots in each decade from the initiation of the experiments up to 2007 (Grignon), 2008(Versailles, Rothamsted) or 2009 to obtain a sample set with the widest possible range of proportions of centennially persistent SOC. The non-LTBF treatments and multiple sites also added to the diversity of landuse, climate and parent material. For each sample, total SOC concentration was measured by dry combustion with an elemental analyzer (SOC EA , gC.kg -1 soil) after decarbonatation when necessary according to NF ISO 10694 (1995). 5 Based on the decline in total SOC concentration over the duration of the LTBF treatment, Barré et al. (2010) estimated the concentration of centennially persistent SOC at each site using a Bayesian curve fitting method applied to each LTBF field replicate plot. Here, we refined those site-specific estimates by (i) applying a similar Bayesian curve fitting method on combined SOC concentration data from all LTBF field replicate plots of each site (four field replicate plots for Ultuna and Rothamsted, six field replicate plots for Versailles and Grignon), and (ii) using new SOC concentration data up to 2014 for 10 Rothamsted and 2015 for Ultuna, increasing their LTBF duration to 55 years for Rothamsted and 59 years for Ultuna.
For each site, we assumed the temporal evolution of LTBF SOC concentration, γ(t), followed an exponential decay function: where γ(t) (unit = gC.kg -1 soil) is the LTBF SOC concentration at time t, t (unit = year) is the time under bare fallow, and a, b 15 and c are fitting parameters. Parameter a (unit = gC.kg -1 soil) corresponds to the amplitude of the decay and b (unit = y -1 ) is the characteristic decay rate. The parameter c (unit = gC.kg -1 soil) represents a theoretically inert portion of SOC. We considered this parameter as a site-specific metric of the centennially persistent SOC concentration. We used a Bayesian inference method to compute site-specific estimates of centennially persistent SOC concentration and associated uncertainties (detailed in the Section 2.3.1). 20 The proportion of centennially persistent SOC (CP SOC ) in each soil sample was then calculated as the ratio of the site-specific CP SOC concentration to the total SOC concentration of the sample: where the unit of CP SOC concentration [site] and SOC concentration [sample] is gC.kg -1 soil. The CP SOC proportions of five 25 samples that were slightly above 1 were set to 1. In these calculations, we assumed that at each site, the concentration of CP SOC was the same in the LTBF and non-LTBF treatments and was constant with time. The details related to the estimation of the uncertainty on the CP SOC proportion of each sample are reported in the Section 2.3.2.

Thermal analysis of soil samples by Rock-Eval 6
The 118 soil samples from the reference set were analyzed with a RE6 Turbo device (Vinci Technologies) using the basic 30 set-up for the analysis of soil organic matter (Behar et al., 2001;Disnar et al., 2003). The RE6 technique provided measurements from the sequential pyrolysis and oxidation of ca. 40 mg of finely ground (< 250 µm) soil per sample (Fig. 1).
Volatile hydrocarbon effluents from pyrolysis were detected and quantified with flame ionization detection (FID), while the evolution of CO and CO 2 gases were quantified by infrared detection during both pyrolysis and oxidation stages. Pyrolysis was carried out from 200 °C to 650 °C in a N 2 atmosphere with a heating rate of 30 °C minute -1 , while the oxidation was carried out from 300 °C to 850 °C in a laboratory air atmosphere (with O 2 ) with a heating rate of 20 °C minute -1 . The RE6 technique generated five thermograms per sample (Fig. 1, i.e. volatile hydrocarbon (HC) effluents during pyrolysis, CO 2 5 during pyrolysis, CO 2 during oxidation, CO during pyrolysis, and CO during oxidation). On average, the organic carbon yield of the RE6 analysis was greater than 96.5% of SOC EA for the soils of the reference sample set (SOC RE6 = 0.966 × SOC EA + 0.403, R² = 0.97, n = 118).
For each RE6 thermogram, we determined the temperatures corresponding to each incremental proportion of the amount of gases evolved during the pyrolysis and oxidation stages. Upper temperatures of 850 °C (CO oxidation thermogram), 650 °C 10 (HC pyrolysis thermogram), 611 °C (CO 2 oxidation thermogram) and 560 °C (CO and CO 2 pyrolysis thermograms) were chosen for signal integration (Fig. 1), thereby excluding any interference of soil carbonates (Behar et al., 2001). Thermal decomposition of carbonates was indeed observed beyond 560 °C (CO and CO 2 pyrolysis thermograms) and 611 °C (CO 2 oxidation thermogram) for the site of Grignon (data not shown). For the three pyrolysis thermograms, signal integration started after an isotherm step of 200 s at 200 °C. Finally, we retained 5 of these temperature parameters per thermogram: T 10 , 15 T 30 , T 50 , T 70 , T 90 which respectively represent the temperatures corresponding to the evolution of 10,30,50,70, and 90% of the amount of evolved gases for each sample and for each of the five different thermograms (HC, CO 2 pyrolysis, CO 2 oxidation, CO pyrolysis, CO oxidation).

30
Using the HC pyrolysis thermogram, we determined a parameter reflecting SOC bulk chemistry, the hydrogen index (HI, mgHC.g -1 C), that corresponds to the quantity of pyrolyzed hydrocarbons relative to SOC RE6 . Using the CO and CO 2 pyrolysis thermograms, we determined another parameter reflecting SOC bulk chemistry, the oxygen index (OI RE6 , mgO 2 .g -1 C) corresponding to the oxygen yield as CO and CO 2 during thermal pyrolysis of soil organic matter divided by the total SOC (SOC RE6 ) of the sample. The HI correlates with the elemental H:C atomic ratio of SOC and the OI RE6 correlates with the elemental O:C atomic ratio of SOC (Espitalié et al., 1977).
Overall, we thus calculated for each soil sample a series of 30 RE6 parameters reflecting SOC thermal stability and bulk 5 chemistry to be used in subsequent statistical and modelling analyses.

Bayesian inference of site-specific CP SOC concentrations and uncertainties
At each site, the CP SOC concentration was estimated as the model parameter c of the exponential decay function described in Eq. (1). To estimate the value of this parameter and assess its uncertainty, we sampled the posterior Probability Density Function (PDF) of the model parameters in Eq. (1), which is given by Bayes' theorem as a function of the prior PDF (i.e. what we know before collecting data) and the likelihood (i.e. how likely is it to predict the data given a set of parameters). 15 The posterior PDF is the combination of our prior knowledge and of the information carried by the data, including measurement uncertainties. For a model vector m (containing the parameters a, b and c) and a data vector d of all the measurements of SOC concentrations, the posterior PDF, P(d | m), is P( | ) ∝ P( )P( | ), with P( ) the prior PDF on the model parameters and P( | ) the likelihood.
We chose uniform PDFs for the model parameters, a, b, and c to be as uninformative as possible. We use the Gaussian form 20 of the likelihood function, such as P( | ) ∝ e − 1 2 ( -γ( )) d −1 ( -γ( )) , where t is the vector of all observation times and C d is the data covariance matrix describing the uncertainties on the SOC measurements. We consider a conservative standard deviation for SOC concentration data (0.75 gC.kg -1 soil) estimated by Barré et al. (2010) for the same soils. We use a Metropolis algorithm to draw 3 × 10 4 samples from the posterior PDF with a burning phase of 3.7 × 10 5 steps. We can then derive the mean and standard deviation for the parameter c from the posterior PDF. 25

Estimating the uncertainty of CP SOC proportion in each sample
Based on our assessment of the uncertainties on SOC concentration data and site-specific CP SOC concentrations (see above), we propagated these errors to estimate the uncertainty on the CP SOC proportion in each soil sample. This was estimated by calculating the standard deviation of the CP SOC proportion for each sample as follows: where abbreviation sd stands for standard deviation.

Statistical relationships between RE6 parameters and CP SOC proportion
The reference sample set was randomly split into a calibration set (n = 88 samples) and a validation set (n = 30 samples). The correlations between the 30 RE6 parameters and the CP SOC proportion were assessed with a non-parametric Spearman's rank 5 correlation test on the calibration set (n = 88). A principal component analysis (PCA) of the 30 centered and scaled RE6 parameters was performed for the calibration set to (i) summarize the variance of SOC thermal stability and bulk chemistry on a single factorial map, and (ii) illustrate the correlations among RE6 parameters. Correlations between the CP SOC proportion in calibration soils and their principal component scores were determined using Spearman's rank correlation tests, and its relationships with the 30 RE6 parameters were further illustrated by projecting the CP SOC proportion variable in the 10 PCA correlation plot. The RE6 data of the soils from the validation set were projected on the same PCA factorial map to check that the validation set was representative of the calibration set.

Random forests regression model to predict CP SOC proportion from RE6 parameters
A multivariate regression model was built to relate CP SOC proportion in the reference samples from the calibration set (response vector or dependent variable y, n = 88) to their SOC thermal stability and bulk chemistry, summarized by a matrix 15 of predictor variables (X) made of the 30 centered and scaled RE6 parameters. The non-parametric and non-linear machine learning technique of random forests (RF, Breiman, 2001;Strobl et al., 2009) was used to build this model. The random forests regression model was based on a forest of 1000 diverse regression trees made of splits and nodes. A random forests learning algorithm combines bootstrap resampling and random variable selection. Each of the 1000 regression trees was thus grown on a bootstrapped subset of the calibration set (i.e. containing about two thirds of "in-bag" calibration samples) by 20 randomly sampling 10 out of the 30 RE6 parameters as candidates at each split of the tree, and using a minimum size of terminal tree nodes of five soil samples. The random forests regression model was then used to predict the proportion of CP SOC in the validation set (n = 30), a prediction corresponding to the mean of the predicted values across the 1000 regression trees.
The performance of the random forests regression model for predicting CP SOC proportion was assessed by statistics 25 comparing the RF-predicted vs. reference (estimated) values of the sample set. The performance statistics were calculated on: (i) the "out-of-bag" soil samples of the calibration set and (ii) the soil samples of the validation set. Out-of-bag samples are observations from the calibration set not included in the learning sample set for a specific tree that can be used as a "built-in" test set for calculating its prediction accuracy (Strobl et al., 2009). The performance statistics included the coefficient of determination (pseudo-R² for the calibration set or R² for the validation set) and the root-mean-square error of 30 calibration or prediction (RMSEC for the calibration set or RMSEP for the validation set). The ratio of performance to deviation (RPD) and the bias of the random forests regression model were additionally calculated for the validation set. The relative importance (i.e. ranking) of each of the 30 RE6 parameters for the prediction of the proportion of CP SOC in the RF regression model was computed as the unscaled permutation accuracy (Strobl et al., 2009).

Error propagation in the random forests regression model
Since our objective was to deliver a thermal analysis based model with reliable prediction intervals around the predicted 5 values of the CP SOC proportion, we estimated the prediction uncertainty of the random forests model for new soil samples.
We used a methodology recently published by Coulston et al. (2016) to approximate prediction uncertainty for random forests regression models, and adapted it to explicitly take into account the uncertainty on reference values of CP SOC proportion (Eq. (6)) that were used to build the model (Supplementary material S2). that can be calculated for any soil sample by the random forests regression model, and the squared prediction error (y -ŷ ̅ )² that is only available for the reference sample set. The metric τ was calculated as follows (Coulston et al., 2016): 20 A Monte Carlo approach was used to estimate , the 95 th percentile of all calculated τ values for all out-of-bag observations of the 2000 bootstraps (Supplementary material S2). This ̂ value was such that 95% of the predictions of the CP SOC proportion lie within ̂ × sd(ŷ) of the true value of CP SOC proportion (i.e. 95% prediction intervals). As sd(ŷ), the standard deviation of the predictions of the CP SOC proportion across 1000 regression trees, can be calculated by the random forests 25 regression model for any soil sample, this approach allows the calculation of 95% prediction intervals on any new soil sample for which only X data (30 RE6 parameters) are available. We calculated the 95% prediction intervals (ŷ ̅ ± ̂ × sd(ŷ)) for the validation set (n = 30) to examine whether those intervals included the true (estimated) values of CP SOC proportion.
More details on the procedure to approximate prediction uncertainty for random forests regression models are provided in Coulston et al. (2016). We finally checked how the error on CP SOC proportion propagated into the random forests regression 30 model by (i) comparing the value of ̂ with or without incorporating the uncertainty on reference values of CP SOC proportion in the algorithm, and (ii) by comparing the sizes of the 95% prediction intervals calculated for the validation soil samples with their respective 95% confidence intervals (determined by multiplying their standard deviation calculated in Eq. (6) by 1.96).
The Bayesian inference method was performed with Python 2.7 and the PyMC library (Patil et al., 2010). All other statistical analyses were performed with R v.3.4.3 (R Core Team, 2017) and the factoextra package for running PCA (Kassambara, 2015), the randomForest package for running the random forests regression models (Liaw and Wiener, 2002) and the boot 5 package for bootstrapping (Davison and Hinkley, 1997;Canty and Ripley, 2015).

CP SOC concentration at each site and CP SOC proportion in reference soil samples
The Bayesian inference of the parameter c of the exponential decay function (Eq. (1)) yielded site-specific estimates of the CP SOC concentration with 95% confidence intervals (Eq. (1), Table 1, Fig. 2). Estimated CP SOC concentrations ranged from 10 6.22 gC.kg -1 soil at Versailles to 10.46 gC.kg -1 soil at Rothamsted. The uncertainty on CP SOC concentration was lower at Rothamsted (standard deviation of 0.27 gC.kg -1 soil) and Versailles (standard deviation of 0.31 gC.kg -1 soil) than at Ultuna (standard deviation of 0.88 gC.kg -1 soil) and Grignon (standard deviation of 1.00 gC.kg -1 soil).
Overall, the wide range in total SOC concentrations within and across sites (from 5 to 46 gC.kg -1 soil, Table 1) combined with an assumed constant CP SOC concentration within each site, resulted in a reference sample set with a wide spectrum of 15 CP SOC proportions ranging from 0.14 to 1 (Eq. (2), Table 1 The random splitting of the reference sample set generated calibration and validation sample sets with similar mean values, range of values and standard deviations for both total SOC concentration and CP SOC proportion (Table 1).

Relationships between RE6 parameters and CP SOC proportion
The 30 RE6 parameters showed contrasted correlations with the CP SOC proportion in the calibration set (Table 2). Most RE6 temperature parameters showed positive correlations with the CP SOC proportion, with Spearman's ρ above 0.8 for four of 25 them (the RE6 temperature parameter corresponding to 50% of CO 2 gas evolution during the pyrolysis stage, T 50_CO2_PYR , and the RE6 temperature parameters corresponding to 30%, 50% and 70% of CO 2 gas evolution during the oxidation stage, T 30_CO2_OX , T 50_CO2_OX , T 70_CO2_OX , Table 2).
Conversely, five RE6 temperature parameters showed significant negative correlations with the CP SOC proportion (T 10_HC_PYR , T 10_CO_PYR , T 30_CO_PYR , T 50_CO_PYR , T 70_CO_PYR , Table 2). Out of the three RE6 parameters reflecting a proportion 30 of thermally resistant or labile hydrocarbons, only the TLHC-index showed a weakly significant negative Spearman's ρ with the CP SOC proportion, the I-index and the R-index showing no correlations ( Table 2). The two RE6 parameters reflecting SOC bulk chemistry showed highly significant correlations with the CP SOC proportion (Table 2), the HI being negatively correlated and the OI RE6 being positively correlated.
The PCA of the centered and scaled RE6 parameters illustrates the correlations among those 30 variables in the calibration set (Fig. 3). A continuum of CP SOC proportion values was observed in the reference samples along the first two principal 5 components (Fig. 3A), and projecting the CP SOC proportion in the PCA correlation circle further highlighted the relationships between this variable and the 30 RE6 parameters (Fig. 3B). The CP SOC proportion variable had a strongly negative projected loading score on PC1 (Fig. 3B), as well as negative projected loadings on PC2 (Fig. 3B) and PC3 (data not shown). The scores of the calibration soils on the first three principal components were indeed significantly and negatively correlated with the CP SOC proportion (ρ = -0.71, p-value < 0.001 for PC1, ρ = -0.36, p-value < 0.001 for PC2, ρ = -0.25, p-value < 0.05 for 10 PC3), such that a large part (82%) of the variance of the 30 RE6 parameters was linked to the CP SOC proportion in the calibration set.
The random splitting of the reference sample set generated calibration and validation sample sets with similar RE6 thermal characteristics as illustrated by their similar distribution on the factorial map of the first two principal components of the PCA (Fig. 3A). 15

Performance of the regression model using RE6 parameters to predict CP SOC proportion
The random forests regression model performed very well in predicting the CP SOC proportion in the reference sample set using the 30 RE6 parameters as predictors (Fig. 4) Fig.   4), the uncertainties increased to 0.14 (minimum total width), 0.67 (maximum total width) and 0.35 (mean total width). The thirty 95% prediction intervals for RE6-RF predictions of CP SOC proportion in the validation set all included their respective reference estimation of CP SOC proportion (Fig. 4). 30 Out of the 30 RE6 parameters tested by the random forests model as possible predictor variables of the CP SOC proportion in the calibration set, the RE6 temperature parameters corresponding to 50% of CO 2 gas evolution during the pyrolysis stage (T 50_CO2_PYR ) and to 30% of CO 2 gas evolution during the oxidation stage (T 30_CO2_OX ) showed the highest importance scores (based on permutation accuracy importance calculations, Table 2). The twelve most important RE6 parameters for predicting the CP SOC proportion were temperature parameters calculated on the five different RE6 thermograms ( Table 2). The two RE6 parameters reflecting SOC bulk chemistry (OI RE6 and HI) were of medium importance to predict the CP SOC proportion, while the RE6 parameters reflecting a proportion of thermally resistant or labile hydrocarbons (I-index, R-index and TLHC-index) were of weak importance (Table 2). 5

A unique soil sample set with accurate and contrasted values of CP SOC
Adding new SOC concentration data for Rothamsted (up to 2014) and Ultuna (up to 2015), and combining SOC concentration data from all LTBF field replicate plots of each site decreased the uncertainty on the site-specific estimates of the CP SOC concentration (Fig. 2), compared with the previous estimations published by Barré et al. (2010). Indeed, the total 10 width of the 95% confidence interval around the estimation of the site-specific CP SOC concentration slightly decreased from 1.4 to 1.2 gC.kg -1 soil at Versailles and from 4.96 to 3.92 gC.kg -1 soil at Grignon, and strongly decreased from 7.24 to 3.46 gC.kg -1 soil at Ultuna and from 5.98 to 1.06 gC.kg -1 soil at Rothamsted (Table 1, Fig. 2, Barré et al., 2010). The mean estimated values of the CP SOC concentration were marginally changed at Versailles (6.22 vs. 6.12 gC.kg -1 soil in Barré et al., 2010) and Grignon (7.12 vs. 6.80 gC.kg -1 soil in Barré et al., 2010), but strongly modified (increased) at Ultuna (6.95 vs. 3.90 15 gC.kg -1 soil in Barré et al., 2010) and Rothamsted (10.46 vs. 2.72 gC.kg -1 soil in Barré et al., 2010, Table 1).
Our results obtained under four contrasted pedoclimates of Northwestern Europe indicate a minimum value of 5 gC.kg -1 soil (lowest boundary of the four 95% confidence intervals, Table 1) and a maximum value of 11 gC.kg -1 soil (highest boundary of the four 95% confidence intervals, Table 1) for CP SOC concentration in topsoils (0-20 to 0-25 cm depth). These estimates are close, yet below the CP SOC concentration value of 12 gC.kg -1 soil estimated by Buyanovsky and Wagner (1998b) for the 20 topsoil (0-20 cm depth) of the Sanborn long-term (100 years) agronomic experiment (Columbia, Missouri, USA). Our estimates of topsoil CP SOC concentration are also well below the value of 16 gC.kg -1 soil estimated by Franko and Merbach (2017) in the topsoil (0-30 cm depth) of the long-term (28 years) bare fallow experiment of Bad Lauchstädt (Central Germany). The soil type in Bad Lauchstädt (Haplic Chernozem) and its high concentration of slow-cycling black carbon (estimated at 2.5 gC.kg -1 soil, Brodowski et al., 2007) may explain this difference, as well as the relatively short time period 25 under bare fallow (higher uncertainty on the inferred CP SOC concentration).
Among the wide range of CP SOC proportions (0.14 to 1) of our reference sample set, high values of CP SOC proportions (> 0.6) were obtained only for soils which had been under bare fallow for a long period of time: after several years or decades with negligible C inputs and sustained SOC decomposition (Table 1). Similarly, the low values of CP SOC proportions (< 0.25) of our reference sample set were obtained for soils without vegetation but receiving high amounts of manure amendments at 30 Versailles (Table 1). It could be argued that CP SOC proportion values obtained for bare soils with or without organic matter amendments may not be representative of CP SOC proportions of soils under conventional management practices. However, it is interesting to notice that soils of the reference sample set with vegetation and experiencing classical management practices (grassland at Rothamsted, cropland at Ultuna) also showed a wide range of CP SOC proportions, from 0.25 to 0.56 (Table 1).
Moreover, other studies have shown the high variability of CP SOC proportion in soils. For instance, Falloon et al. (1998) listed a series of published values of CP SOC proportions ranging from 0.13 to 0.59. More recently, Mills et al. (2014) published a large dataset of CP SOC proportions in uncultivated topsoils (ca. 15 cm depth). They estimated CP SOC proportions 5 using a global dataset of topsoil radiocarbon ( 14 C) data and a steady-state SOC turnover model with a fixed mean residence time of 1000 years for persistent SOC. Their estimates of CP SOC proportions varied greatly from 0.03 to 0.98 (mean = 0.48, standard deviation = 0.22, n = 232, soils with inconsistent negative modeled SOC pools values were removed), with significantly higher CP SOC proportions in non-forest than in forest uncultivated ecosystems (Mills et al., 2014).
Overall, those combined results illustrate the wide range of CP SOC concentrations and proportions in topsoils that may 10 depend upon pedoclimate, land-use and management practices. Additionally, these results show the value of LTBF experiments to investigate the long-term dynamics of SOM.

A quantitative link between the long-term biogeochemical stability of SOC and its thermal stability and bulk chemistry
This work strengthens the existence of a link between SOC persistence in ecosystems and its thermal stability, providing 15 evidence of the first quantitative link between thermal and in-situ long-term (> 100 years) biogeochemical SOC stability (Fig. 4). The regression model yields accurate RE6-RF predictions of CP SOC proportions with 95% prediction intervals that fully propagate the uncertainties originating from the calibration set that was used to build the model. Predictions on the validation set illustrate that the error propagation scheme provides highly conservative 95% prediction intervals of the CP SOC proportion in new samples, all intervals including their respective reference estimate of CP SOC proportion (Fig. 4). Despite 20 rather large prediction intervals, the RE6-RF regression model clearly discriminates soils with small CP SOC proportions from samples with large CP SOC proportions (Fig. 4). This model based on RE6 thermal analysis can thus be used to predict the size of the CP SOC pool with known uncertainty in new soil samples from similar pedoclimates and with thermal characteristics (i.e. RE6 predictor variables) similar to those of the reference sample set.
Our results also illustrate the complex relationships between thermal analysis based parameters of SOC stability and the 25 CP SOC proportion. The hypothesis behind the use of SOC thermal stability as a proxy of its biogeochemical stability implies positive correlations between the size of the CP SOC pool and temperature parameters derived from thermal analysis such as the 25 RE6 temperature parameters calculated in this study. Significant positive correlations with the CP SOC proportions were indeed found for the majority (15 out of 25) of RE6 temperature parameters, with very high and positive Spearman's ρ values for some of them (Table 2). This was notably the case of the RE6 temperature parameter corresponding to 50% of 30 CO 2 gas evolution during the oxidation stage, T 50_CO2_OX that had been previously shown to systematically increase with bare fallow duration on the same soils by Barré et al. (2016). This study extends the results of Barré et al. (2016) towards a quantitative link between RE6 temperature parameters and SOC persistence (direct correlations and predictions of the size of the CP SOC pool rather than time under bare fallow treatment). It also extends those results to non-bare fallow soils: bare soils receiving organic amendments (at Grignon and Versailles), cropland soils (Ultuna) and grassland soils (Rothamsted).
Conversely, ten RE6 temperature parameters showed no significant correlation or significant negative correlations with the CP SOC proportion. Weak or negative correlations occurred principally for temperature parameters calculated on thermograms of the pyrolysis stage of the RE6 analysis: for all parameters of the CO thermogram and low temperature parameters of the 5 HC and CO 2 thermograms (Table 2). Negative correlations contradict the above-mentioned hypothesis, with the evolution of a similar proportion of the total amount of gases (HC pyrolysis effluents or CO) occurring at lower temperatures for samples with high CP SOC proportions than for soils with low CP SOC proportions. A possible explanation for this unexpected observation could be that the pyrolysis of SOC in samples with high proportion of CP SOC may undergo an enhanced pyrolysis catalyst effect by soil minerals (Auber, 2009), which are relatively more abundant in those samples generally 10 characterized by low total SOC concentrations.
Despite the fact that three RE6 parameters used here, i.e. the TLHC-index, the I-index, and the R-index, had originally been proposed as useful qualitative metrics of soil carbon dynamics, reflecting a proportion of thermally resistant or labile hydrocarbons (Disnar et al., 2003;Sebag et al., 2006;Saenger et al., 2013Saenger et al., , 2015Sebag et al., 2016), those parameters were weakly correlated (TLHC-index) or not correlated (I-index, R-index) to the CP SOC proportion. Furthermore, they also had a 15 weak importance in the random forests model predictions of the CP SOC proportion ( Table 2). The poor link between those three RE6 parameters and the CP SOC proportion may be explained by the high residence time of CP SOC (> 100 years). Indeed, so far those parameters have been related to the proportion of SOC present in the particulate organic matter fraction (> 50 µm), a SOC pool characterized by a residence time in soils generally below 20 years (Saenger et al., 2015;Soucémarianadin et al., 2018). 20 The two RE6 parameters reflecting SOC bulk chemistry showed highly significant correlations with the CP SOC proportion.
This confirms, and extends to vegetated soils, the observed decreasing trend for HI and increasing trend for OI RE6 (except at Versailles which has high proportions of pyrogenic carbon) with bare fallow duration observed by Barré et al. (2016) on the bare fallow treatments of the same experimental sites. Soils with high proportions of CP SOC are thus characterized by an oxidized and H-depleted organic matter. 25

Perspectives to improve and foster RE6 thermal analysis based predictions of the size of the CP SOC pool
Future developments of this work must extend the Rock-Eval 6 thermal analysis regression model to a wider range of pedoclimates and to other biomes. As sites with LTBF treatments are not widespread, complementing the reference sample set may be achieved by using soils that have different soil forming factors (e.g. climate, parent material) and (i) which are sampled from long-term (> 50 to 100 years) experiments with contrasted SOC inputs, enabling the estimation of their CP SOC 30 concentration (Buyanovsky and Wagner, 1998a;1998b), or (ii) for which the mean SOC age is known from radiocarbon data, enabling the estimation of the size of their persistent SOC pool (Trumbore, 2009;Mills et al., 2014).
Another development of this work will involve elucidating the fundamental mechanisms linking the biogeochemical stability of SOC with its thermal stability (see e.g. Leifeld and von Lützow, 2014). This was beyond the scope of this work, yet it remains an exciting field of research that should be addressed in the future, as highlighted by the unexpected observations discussed in Section 4.2 and by other recent works that found no relationships between the thermal oxidation of SOC between 200 °C and 400 °C and the size of SOC pools with shorter residence times in soils (below or above ca. 18 years, 5 Schiedung et al., 2017).
Overall, this work demonstrates the value of Rock-Eval 6 as a routine method for quantifying the size of the centennially persistent SOC pool with known uncertainty in temperate soils. The relatively low cost of the Rock-Eval 6 technique and the robustness of the thermal analysis regression model makes it possible to apply it to soil monitoring networks across a continuum of scales, as a reliable proxy of SOC persistence. This may be part of the framework proposed by O' Rourke et al. 10 (2015) to better understand SOC processes at the biosphere to biome scales, and should be added to the soil carbon cycling proxies recently listed by Bailey et al. (in press). Mapping persistent SOC at large scales may allow the identification of regional hotspots of centennially persistent SOC that may contribute little to climate change by 2100. It may also provide information on the sustainability of additional SOC storage from soil carbon sequestration strategies such as those promoted by the international 4 per 1000 initiative in agriculture and forestry (https://www.4p1000.org/; Dignac et al., 2017;Minasny 15 et al., 2017;Soussana et al., in press). A global map of persistent SOC based on this empirical RE6 thermal analysis model could also be useful for improving the parameterization of models simulating SOC dynamics (Falloon and Smith, 2000;Luo et al., 2014;He et al., 2016). The integration of large-scale information on the size of SOC kinetic pools may indeed provide an adequate complement to the global data sets on SOC fluxes that are currently under development and restructuration (Hashimoto et al., 2015;Luo et al., 2016;Harden et al., in press). 20    [sample]). Vertical bars represent the propagated errors (95% confidence intervals) on the RE6-RF predicted CP SOC proportion values of the validation sample set, calculated as ŷ ̅ ± ̂ × sd(ŷ) (see Section 2.3.5), with a ̂ value of 2.10 (Fig. S1). Abbreviations: RMSEC, root-mean-square error of calibration; RMSEP, root-mean-square error of prediction; RPD, ratio of performance to deviation; sd, standard deviation. Calibration set (n = 88) 18.0 (5.5, 45.5, 9.5) 0.50 (0.14, 1.00, 0.21) Validation set (n = 30) 16.1 (5.4, 38.8, 7.9) 0.53 (0.16, 1.00, 0.23) All samples (n = 118) 17. 5 (5.4, 45.5, 9.1) 0.51 (0.14, 1.00, 0.21)