A traceability analysis system for model evaluation on land carbon dynamics: design and applications

An increasing number of ecological processes have been incorporated into Earth system models. However, model evaluations usually lag behind the fast development of models, leading to a pervasive simulation uncertainty in key ecological processes, especially the terrestrial carbon (C) cycle. Traceability analysis provides a theoretical basis for tracking and quantifying the structural uncertainty of simulated C storage in models. Thus, a new model evaluation tool based on traceability analysis is urgently needed to efficiently diagnose the sources of inter-model variation in the terrestrial C cycle in Earth system models. A new cloud-based model evaluation platform, i.e., the online traceability analysis system for model evaluation (TraceME v1.0), was established. TraceME was applied to analyze the uncertainties of seven models from the Coupled Model Intercomparison Project (CMIP6). TraceME can effectively diagnose the key sources of different land C dynamics among CMIP6 models. For example, the analyses based on TraceME showed that the estimated global land C storage varied about 2.4-fold across the seven CMIP6 models. Among all models, IPSL-CM6A-LR simulated the lowest land C storage, which mainly resulted from its shortest baseline C residence time. Over the historical period of 1850–2014, gross primary productivity and baseline C residence time were the major uncertainty contributors to the inter-model variation in ecosystem C storage in most land grid cells. TraceME can facilitate model evaluation by identifying sources of model uncertainty and provides a new tool for the next generation of model evaluation.


Introduction
Earth system models are an essential tool for understanding and predicting the interactions between ecological processes and environmental changes at the global scale (Eyring et al. 2016a; Bonan and Doney 2018). In the past three decades, the structural complexity of models has increased rapidly as more and more ecological processes have been incorporated. However, comprehensive and systematic model evaluations usually lag behind the fast development of Earth system models, leading to pervasive uncertainty in their simulation of key ecological processes, especially the terrestrial carbon (C) cycle (Friedlingstein et al. 2006; Bonan et al. 2019; Fisher and Koven 2020; Xia et al. 2020). For example, large uncertainty in the global land C sink has persisted in Earth system models since the 3rd assessment report of the IPCC (Arora et al. 2020; Zarakas et al. 2020). One key challenge is how model evaluation can increase its pace to systematically trace model uncertainty back to its key sources. For the land C cycle in Earth system models, the varied model structure among models (Bonan and Doney 2018), the parameterization of C-related processes (Luo and Schuur 2020), and external climate forcings (Ahlström et al. 2012; Hoffman et al. 2014) are three major uncertainty contributors. Thus, a traceability analysis tool for efficiently evaluating terrestrial C cycles in Earth system models is useful to accelerate the pace of model inter-comparisons and model-data comparisons as well as their feedbacks to model development.
A few new analytical tools have been recently developed to facilitate the evaluation of Earth system models, such as the International Land Model Benchmarking (ILAMB) System (Hoffman et al. 2016; Collier et al. 2018), the Earth System Model Evaluation Tool (ESMValTool) (Eyring et al. 2016b; Eyring et al. 2020), and the Land surface Verification Toolkit (LVT) (Kumar et al. 2012). The evaluation methods of these tools mainly focus on measuring the biases of a specific predicted variable across models or between models and observations using statistical metrics. For example, the ILAMB system uses a set of statistical methods to construct a data-driven scoring system to benchmark global C cycle models (Collier et al. 2018). These new tools have greatly increased the efficiency of model evaluations for Earth system models (Eyring et al. 2019). Even so, it is still difficult to quantitatively trace the structural sources of uncertainty among models. For the terrestrial C cycle, a traceability analysis has been developed to diagnose the inter-model variations in the land C cycle based on its fundamental properties (Xia et al. 2013; Luo et al. 2017). This method provides a traceability framework that can decompose the land C dynamics into a few traceable components, such as net primary productivity (NPP), C residence time, and environmental factors (temperature and precipitation). The traceability analysis has been applied to some local-level model evaluations (Rafique et al. 2017). However, it remains unclear whether the traceability analysis is applicable to Earth system models, which simulate the terrestrial C cycle at a global scale. Thus, developing the traceability analysis into an available tool for analyzing Earth system models, especially those that have participated in the Coupled Model Intercomparison Project (CMIP), can effectively facilitate the simulation of the global terrestrial C cycle and its feedbacks to climate change.
The model evaluation process usually consists of three steps: downloading the model output data and archiving them locally, pre-processing the data to be suitable for analyses, and utilizing a dedicated program to finish the evaluation. The data volumes of both model outputs and data products have increased rapidly in the recent CMIPs (Overpeck et al. 2011; Stockhause and Lautenschlager 2017). Routine evaluation of single or multiple models has become increasingly time-consuming when researchers must download, manage, pre-process, and analyze the data comprehensively on their local equipment (Bai and Di 2012; Xu et al. 2019). Fortunately, cloud-based technology facilitates the processing of distributed big data and provides user-friendly web interfaces. Such web-based technology has been used in the field of ecological modeling and model evaluation (Abramowitz 2012; Huang et al. 2019). Web-based cloud technology can help researchers focus on the scientific questions rather than on processing the data.
Here, we introduce a new online traceability analysis system for model evaluation (TraceME v1.0), which can be applied to analyze the uncertainty of the terrestrial C cycle in the ongoing CMIP6. The specific aims of this study are (1) to describe the design and workflow of TraceME, including the overview of TraceME, the introduction of the traceability analysis method, and the available data; (2) to use TraceME to evaluate the performance of seven CMIP6 models in simulating the terrestrial C cycle; and (3) to discuss the potential applications and the implications of TraceME for the next generation of model evaluation.

Materials and methods
Design of the TraceME
TraceME (v1.0) is an online framework for automatically analyzing and evaluating the performance of models using the traceability analysis. It is built on a collaborative analysis framework for distributed gridded environmental data (Collaborative Analysis Framework for Environmental data, CAFE; more details are described in Xu et al. (2019)) with different core functions and focuses. The basic cyberinfrastructure of TraceME consists of one central server (node) and one or more work nodes (Fig. 1). Work nodes can be set up in different data centers and can archive the data stored in these data centers. The central node archives the descriptive information of each work node and of the data stored on it, receives task requests from users, and sends them to the corresponding work nodes. Each work node contains a data analysis module and a data management module. The data analysis module includes an analysis launcher, a command executor, and the traceability analytic script, which together perform the traceability analysis and output the corresponding results. The data management module includes a data index submodule and a task management submodule. The data index submodule manages the descriptive information about the data (data file name, storage path, and data attributes) stored on each work node. The task management submodule is used for task submission, task dispatching, and task status/results query services on each node.
The web-based technology provides a straightforward way for users to interact with the system through a web browser, while the model evaluation process of TraceME runs in the background. Users only need to filter the data of interest from the entire system. The selected data is then packaged into a task and delivered to the assigned work node for data processing, which includes data pre-processing, traceability analysis, and evaluation. Finally, the evaluation results are output and visualized for the users. This scientific workflow is essential for TraceME to realize online automated model evaluation.
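As a rough illustration of the dispatch step, the following Python sketch groups a user's selected datasets into one task per work node. The function name `dispatch`, the node names, and the dataset identifiers are hypothetical and chosen only for illustration; the actual TraceME and CAFE interfaces are described in Xu et al. (2019) and are not reproduced here.

```python
def dispatch(index, requested):
    """Package requested datasets into one task per work node.

    index     : dict mapping a work-node name to the set of dataset ids it archives
                (a stand-in for the data index kept by the central node)
    requested : iterable of dataset ids selected by the user
    Returns a dict mapping node name -> sorted list of dataset ids for that node.
    """
    tasks = {}
    unassigned = set(requested)
    for node, held in index.items():
        picked = unassigned & held          # datasets this node can serve
        if picked:
            tasks[node] = sorted(picked)
            unassigned -= picked            # each dataset goes to one node only
    if unassigned:
        raise KeyError(f"no work node holds: {sorted(unassigned)}")
    return tasks


# Hypothetical index with two work nodes
index = {
    "node_a": {"CESM2/cSoil", "CESM2/npp"},
    "node_b": {"CanESM5/cSoil"},
}
tasks = dispatch(index, {"CESM2/npp", "CanESM5/cSoil"})
```

In this sketch a dataset archived on several nodes is assigned to the first node that holds it; a production system would likely also weigh node load and data locality.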

Fig. 1 Schematic overview of TraceME (v1.0). The online collaborative framework of TraceME (v1.0) consists of one central node (Central server) and several work nodes (NODE). Users trigger the tasks of model evaluation through the browser and the tasks can be transferred by the application-programming interface. The work nodes consist of the data management module, the data analysis module, and the data archiving function. The central node collects all information about the work nodes, the data stored in those nodes, and the information of the tasks
Traceability analysis
The core functionality of TraceME is based on the framework of traceability analysis developed by Xia et al. (2013). This framework has been extended to transient dynamics by decomposing the C storage dynamics into a three-dimensional parameter space. The latter can be further partitioned into traceable components to track the sources of model uncertainty. In the framework of traceability analysis, terrestrial C storage is at dynamic disequilibrium, which is collectively influenced by internal C-related processes, environmental forces, and their interactions (Luo and Weng 2011). Under given environmental conditions, the C storage of an ecosystem can reach a steady state, which is defined as the C storage capacity (X_C). In a land C cycle model, X_C can be obtained by spinning up the model to the steady state (Xia et al. 2012). Because external forces, such as climate, are never at steady state, X_C always deviates from the actual C storage in natural ecosystems. The difference between the transient C storage and X_C is defined as the C storage potential (X_P). A positive X_P means that an ecosystem has the potential to store additional C, while a negative X_P means that it has the potential to lose C. Hence, the transient C storage of an ecosystem is determined by X_C and X_P. X_C, in turn, is jointly determined by ecosystem C input (e.g., net primary production, NPP) and ecosystem C residence time (τ_E). As the net ecosystem C input, NPP is decomposed into gross primary production (GPP) and C use efficiency (CUE). CUE describes the capacity of an ecosystem to effectively absorb C from the atmosphere and is defined as the ratio of NPP to GPP (DeLucia et al. 2007; Xia et al. 2017). τ_E can be further traced to the baseline C residence time (τ′_E) and the environmental scalar (ξ). τ′_E represents the ecosystem C residence time under optimal environmental conditions, which is usually determined by the preset soil properties and vegetation characteristics in the model (Xia et al. 2013). ξ is influenced by several factors, such as climate, oxygen concentration, and land cover. Climate is the most common limiting factor in land C cycle models. In this study, we focus on the effect of climate forcing (i.e., temperature and precipitation) on the ecosystem C residence time. The details of the traceability analysis method are described in Xia et al. (2013), Luo et al. (2017), and Zhou et al. (2018).
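The decomposition described in this section can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the analytic script used by TraceME: the function name `trace_components` is hypothetical, the ecosystem residence time is diagnosed from a one-pool mass balance (dX/dt = NPP − X/τ_E), and the environmental scalar is assumed to act as τ_E = τ′_E/ξ. The relations X_C = NPP × τ_E, X_P = X_C − X, and CUE = NPP/GPP follow the definitions given in the text.

```python
import numpy as np

def trace_components(x, npp, gpp, baseline_tau):
    """Decompose annual land C storage into traceable components.

    x            : total ecosystem C storage per year (e.g., Pg C)
    npp, gpp     : annual fluxes (e.g., Pg C per year)
    baseline_tau : baseline C residence time in years (a model constant)
    """
    x = np.asarray(x, dtype=float)
    npp = np.asarray(npp, dtype=float)
    gpp = np.asarray(gpp, dtype=float)

    dxdt = np.gradient(x)        # net C change per year (finite differences)
    tau_e = x / (npp - dxdt)     # assumed one-pool balance: dX/dt = NPP - X / tau_E
    x_c = npp * tau_e            # C storage capacity
    x_p = x_c - x                # C storage potential (positive: room to gain C)
    cue = npp / gpp              # C use efficiency
    xi = baseline_tau / tau_e    # environmental scalar, assuming tau_E = tau'_E / xi
    return {"tau_e": tau_e, "x_c": x_c, "x_p": x_p, "cue": cue, "xi": xi}


# Illustrative numbers only, not model output
out = trace_components(x=[1000.0, 1001.0, 1002.0],
                       npp=[50.0, 50.0, 50.0],
                       gpp=[100.0, 100.0, 100.0],
                       baseline_tau=10.0)
```

By construction, the returned capacity minus the returned potential reproduces the input storage, which is a convenient consistency check when applying the sketch to real model output.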
Under the framework of traceability analysis, land C storage is ultimately attributed to its traceable components, which are related to the natural properties expressed by the model (Fig. 2). For example, GPP is the photosynthetic property of vegetation; baseline C residence time is related to the soil attributes (Fig. 2). To quantify the contributions of these traceable components to the uncertainty of models, we use a hierarchical partitioning method (Chevan and Sutherland 1991) to decompose the uncertainty of simulated C storage dynamics. This method can be used to calculate the independent effect of each explanatory variable (x_1, x_2, x_3, …, x_k) on a single dependent variable (y). The independent effect of x_l (I_xl) is the contribution of x_l to the variable y, which is calculated by comparing the fit of all models that include x_l with that of the corresponding models that lack x_l, across all 2^k possible models in the hierarchy (Chevan and Sutherland 1991; Murray and Conner 2009). In our system, we calculate the variance contribution of the variables using the "hier.part" package in R. Based on the relationships built by traceability, we first calculate the relative contributions of X_C and X_P to X. Second, the contributions of NPP and τ_E to X_C are calculated in their logarithmic form: ln(X_C), ln(NPP), and ln(τ_E). Third, the variation contributions of the components of NPP and C residence time are calculated in the same way. Finally, the contributions of these traceable components (GPP, CUE, baseline C residence time, temperature, and precipitation) can be calculated.
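To make the hierarchical partitioning step concrete, the following Python sketch computes independent effects in the sense of Chevan and Sutherland (1991). It is an illustrative re-implementation, not the "hier.part" R package used by TraceME. It assumes ordinary least-squares fits with R² as the goodness-of-fit measure, and the function names `r_squared` and `independent_effects` are hypothetical.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on the chosen columns of X."""
    if not cols:
        return 0.0                      # intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - (resid @ resid) / ss_tot

def independent_effects(X, y):
    """Independent effect of each column of X on y.

    For each variable, the gain in R^2 from adding it is averaged over all
    subsets of the other variables of a given size, then over subset sizes.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    p = X.shape[1]
    effects = np.zeros(p)
    for j in range(p):
        others = [c for c in range(p) if c != j]
        level_means = []
        for k in range(p):              # subset sizes 0 .. p-1
            gains = [r_squared(X, y, s + (j,)) - r_squared(X, y, s)
                     for s in combinations(others, k)]
            level_means.append(np.mean(gains))
        effects[j] = np.mean(level_means)
    return effects


# Synthetic example in logarithmic form: ln(X_C) = ln(NPP) + ln(tau_E)
rng = np.random.default_rng(0)
ln_npp = rng.normal(0.0, 0.1, 50)       # small spread among "models"
ln_tau = rng.normal(0.0, 0.5, 50)       # large spread among "models"
effects = independent_effects(np.column_stack([ln_npp, ln_tau]), ln_npp + ln_tau)
```

The independent effects sum to the R² of the full model, so dividing each by their sum gives a relative contribution for each component. In this synthetic case, the component with the larger spread receives the larger share.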

Fig. 2 The theoretical framework of traceability analysis. The transient carbon (C) storage dynamic (X) can be decomposed into carbon storage capacity (X_C) and potential (X_P). Then, the net primary productivity (NPP) and ecosystem C residence time (τ_E) can explain the C storage capacity. NPP can be traced to gross primary productivity (GPP) and carbon use efficiency (CUE). τ_E can be traced to environmental scalars (ξ) and baseline C residence time (τ′_E). These traceable components can be explained by related attributes
CMIP6 and modeling outputs
TraceME (v1.0) is compatible with any model output that follows the NetCDF Climate and Forecast (CF) Metadata Convention (http://cfconventions.org/), for example, all data from CMIP5 and CMIP6. TraceME (v1.0) is a systematic framework for uncertainty analysis of the terrestrial C cycle for CMIPs. It requires a multivariable dataset to analyze and trace the sources of uncertainty in simulating ecosystem C storage. Time-series data of total ecosystem C storage are needed, which generally consists of living biomass C, litter C, and soil C pools. Time-series data of NPP, GPP, and forcing data (temperature and precipitation) are also used for further model intercomparisons. In this study, TraceME (v1.0) used CMIP6 model outputs as examples to describe the workflow of the platform. All data are from seven CMIP6 models (data released before July 2019) and were collected from ESGF (http://esgf.llnl.gov/), as shown in Table 1.
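As a rough illustration of the input requirement described in this section, the sketch below sums the three C pools into total ecosystem C and aggregates a gridded density field to a global total. It assumes pool densities in kg C m⁻² (as for the CMIP6 variables cVeg, cLitter, and cSoil), a cell-area field in m², and a land fraction between 0 and 1. The function names are hypothetical, and reading the NetCDF files is deliberately left out.

```python
import numpy as np

KG_PER_PG = 1e12  # 1 Pg = 10^15 g = 10^12 kg

def ecosystem_carbon(c_veg, c_litter, c_soil):
    """Total ecosystem C density (kg C m^-2) as the sum of the three pools."""
    return np.asarray(c_veg) + np.asarray(c_litter) + np.asarray(c_soil)

def global_total_pg(density_kg_m2, cell_area_m2, land_fraction):
    """Area-weighted global total in Pg C; NaN (e.g., ocean) cells are skipped."""
    mass_kg = (np.asarray(density_kg_m2)
               * np.asarray(cell_area_m2)
               * np.asarray(land_fraction))
    return float(np.nansum(mass_kg) / KG_PER_PG)


# Toy 2 x 2 grid with illustrative values only
shape = (2, 2)
density = ecosystem_carbon(np.full(shape, 2.0),   # vegetation C
                           np.full(shape, 1.0),   # litter C
                           np.full(shape, 7.0))   # soil C
total = global_total_pg(density,
                        cell_area_m2=np.full(shape, 1e12),
                        land_fraction=np.full(shape, 0.5))
```

Models that do not report a separate litter pool would need that term set to zero, which is a choice the user has to make explicitly.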

Results
Temporal dynamics of land carbon storage in CMIP6 models
TraceME (v1.0) provided an automatic traceability analysis for data of temporal interest, which can be used to evaluate the temporal dynamics of land C storage simulated by models. We used seven models that had submitted results to CMIP6 to analyze the uncertainty of these models in simulating historical land C storage from 1850 to 2014. From the results of TraceME, the temporal dynamics of global annual C storage simulated by different models were first calculated (Fig. 3a). The global annual C storage varied greatly among the seven models, ranging from 938.76 ± 11.36 to 2206.76 ± 50.14 Pg C (Fig. 3a). When C storage was decomposed into C storage capacity and potential, the C storage potential ranged considerably, from about −21.66 ± 54.39 to 58.07 ± 57.62 Pg C (Fig. 3a). The C storage capacity of different models in response to external forces was also quite different. For example, IPSL-CM6A-LR simulated the lowest C storage capacity from 1850 to 2014 (944 ± 27.14 Pg C), whereas the other models ranged from about 1677.57 ± 57.21 to 2263.43 ± 106.61 Pg C (Fig. 3a). To further analyze the uncertainty of C storage capacity, the results of NPP and C residence time reflected the net C input capacity (38.48 ± 2.72 to 68.74 ± 5.88 Pg C year⁻¹) and the C turnover time of the ecosystem (23.22 ± 1.75 to 56.23 ± 3.10 years) in the models (Figs. 3b, c and Fig. 4a). In detail, CESM2 simulated the lowest NPP and IPSL-CM6A-LR the shortest C residence time, while CanESM5 had the largest NPP and the longest C residence time among all models (Figs. 3b, c and Fig. 4a).
GPP and CUE were used to explain the uncertainty sources of NPP simulated by models (Figs. 3d, e and Fig. 4b). The differences in GPP and CUE among models reflected each model's photosynthetic capacity and C transfer efficiency from the atmosphere to ecosystem biomass. Based on this process, TraceME could quantify how the simulation of photosynthesis and respiration affected the uncertainty of NPP. For example, NPP differed markedly between CanESM5 and EC-Earth3-Veg (68.74 ± 5.88 and 48.96 ± 2.78 Pg C year⁻¹, respectively, during 1850–2014), whereas their GPP was similar (132.22 ± 8.18 and 127.72 ± 4.38 Pg C year⁻¹, respectively) (Figs. 3b-e and Fig. 4b). Therefore, the difference in NPP between the two models mainly came from CUE (0.52 ± 0.01 and 0.38 ± 0.02, respectively), which is related to autotrophic respiration. In addition, to show the sources of uncertainty in C residence time, TraceME provided the uncertainties of baseline C residence time and environmental scalars. For example, IPSL-CM6A-LR had the shortest C residence time (23.22 ± 1.75 years) among all models during 1850–2014, and the main reason, rather than external forces, was that it had the shortest baseline C residence time (18 years) among all models (Figs. 3c, f-i, and Fig. 4c). Hence, further development of IPSL-CM6A-LR should pay more attention to the preset soil attributes. Furthermore, the environmental scalars in TraceME were calculated here at the global annual scale. Their uncertainty reflected the interannual variability of temperature and precipitation in each model rather than the direct difference in external forces among models (Figs. 3f-h and Fig. 4c, d).
Overall, after analyzing the uncertainties of all traceable components, TraceME summarized the variance contributions of the components to the uncertainty of land C storage among models. This framework traced the uncertainty of land C storage to several sources. For example, the variation of land C storage among seven CMIP6 models was mainly from C residence time that contributed 74.8%, while NPP and the C storage potential contributed about 20.7% and 4.5%, respectively (Fig. 5). Comparing all traceable components, the variation in C storage simulated by these models was dominated by baseline C residence time (Fig. 5).
Fig. 3 The time series of annual carbon (C) storage (solid lines) and C storage capacity (the contour lines) (a), and the traceable components: b-h for net primary productivity (NPP), C residence time, gross primary productivity (GPP), C use efficiency (CUE), environmental scalars, temperature, and precipitation simulated by seven CMIP6 models, respectively. i The baseline C residence time for each model. The shades in (a) represent the annual variation in C storage potential for models (positive above the solid lines, and negative below the solid lines)
Different spatial distributions of land carbon storage among CMIP6 models
TraceME (v1.0) provided the ability to analyze the spatial uncertainty of models. It could trace the sources of the uncertainty of models in simulating C storage at each grid cell. From the results, the mean spatial pattern of the seven models showed that C storage in boreal regions was higher than in other regions (Fig. 6a). However, some models, such as IPSL-CM6A-LR, had no such spatial pattern (Fig. 7a), and the high variability of C storage simulated by these models also appeared in the boreal regions, such as Siberia and northern North America (Fig. 6b). To further examine the sources of the uncertainty of models in simulating C storage, TraceME (v1.0) provided the spatial patterns of C storage capacity and C storage potential (Figs. 6c-f and Fig. 7). According to the traceability framework, the spatial distributions of NPP and C residence time were used to explain the uncertainty of land C storage capacity among models (Fig. 7). Among the seven CMIP6 models, the variation in NPP occurred mainly in the lower latitudes, while the variation in C residence time was mainly distributed in the northern high latitudes (Fig. 8a, d). Following the workflow of TraceME (v1.0), the uncertainty in the global distribution of NPP had a pattern similar to that of GPP (Fig. 8a-c). The variation in baseline C residence time was mainly in the northern high latitudes and the Tibetan Plateau (Fig. 8e). To better guide model development, model evaluation needs to provide information on the spatial distribution of the dominant factor influencing the simulation of land C storage. TraceME (v1.0) could analyze the variation contributions of all traceable components to land C storage at each grid cell and offered the spatial pattern of the dominant factor (Fig. 9a). For example, the baseline C residence time and GPP were the major contributors to the global distribution of the variation in C storage simulated by the seven CMIP6 models (Fig. 9a). Compared to GPP, baseline C residence time dominated the uncertainties of simulated land C storage in the northern high latitudes, eastern Asia, and the northern part of South America (Fig. 9a).
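A map of the dominant factor of the kind described in this section can be derived from per-component contribution maps by taking, for each grid cell, the component with the largest contribution. The Python sketch below shows one way to do this. The function name `dominant_component`, the component labels, and the toy values are hypothetical, and cells without data (e.g., ocean) are marked with −1.

```python
import numpy as np

def dominant_component(contributions, names):
    """Index of the component with the largest contribution in each grid cell.

    contributions : dict mapping component name -> 2-D array of variance
                    contributions (NaN where no data are available)
    names         : ordered list of component names; the result indexes into it
    Returns an integer array with -1 where every component is NaN.
    """
    stack = np.stack([np.asarray(contributions[n], dtype=float) for n in names])
    missing = np.isnan(stack).all(axis=0)              # cells with no data at all
    filled = np.where(np.isnan(stack), -np.inf, stack) # NaN never wins the argmax
    index = filled.argmax(axis=0)
    index[missing] = -1
    return index


# Toy 2 x 2 example with three components (illustrative values only)
names = ["GPP", "CUE", "baseline_tau"]
contributions = {
    "GPP":          [[0.6, 0.2], [np.nan, 0.1]],
    "CUE":          [[0.1, 0.1], [np.nan, 0.1]],
    "baseline_tau": [[0.3, 0.7], [np.nan, 0.8]],
}
dominant = dominant_component(contributions, names)
```

The integer map can then be plotted with a categorical color scale, one color per entry in `names`.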

Spatiotemporal changes in the dominant uncertainty sources of simulated carbon storage in CMIP6 models
Assessing the performances of the model over different periods could provide a more comprehensive understanding of the model's ability to simulate land C storage. For example, the environmental scalars among the seven CMIP6 models had larger variability at the initial state (e.g., from 1850 to 1860) than those at the current state (e.g., 2004 to 2014) (Fig. 3f). It is necessary to examine whether and how the sources of model uncertainty change with time. For example, the dominant contributor to the inter-model variance of global land C storage was baseline C residence time from 1850-1860 to 2004-2014 (Fig. 9b, c). However, the contribution of C storage potential increased from 5.2% over 1850-1860 to 19.1% over 2004-2014 (Fig. 9b, c). In addition, GPP and C residence time were the major contributors to the inter-model variance of ecosystem C storage in most land grid cells (Fig. 9b, c). In the regions at northern high latitudes, GPP was the dominant contributor in more grid cells in the period of 1850-1860 than 2004-2014 (Fig. 9b, c).

Evaluations on the uncertainty source of land C dynamics in CMIP6 models
The increase in model complexity and the rapid expansion of observational data volumes together push model evaluation into the next generation (Collier et al. 2018; Eyring et al. 2019; Xia et al. 2020). In our study, we introduce a new model evaluation platform, TraceME (v1.0), which uses traceability analysis and a collaborative cloud-based framework. As the core function of TraceME, the traceability analysis increases the traceability of model evaluations. Rather than simply comparing the differences in simulated C storage among models, this method can trace and quantify the uncertainties to the traceable ecological processes (Figs. 3 and 7). For example, the annual C storage simulated by IPSL-CM6A-LR is much lower than that of other models, and TraceME can first track this to C storage capacity (Fig. 3a). Further analysis shows that the low estimates of ecosystem C storage capacity at the global scale in IPSL-CM6A-LR result mainly from C residence time, especially the baseline C residence time (Figs. 3 and 4). Thus, TraceME not only shows the structural sources of the disagreement on global land C storage between models but also identifies the key uncertain component of a specific model for further development. Recent studies have highlighted the importance of developing model evaluations to explore and understand the sources of uncertainties in Earth system models (Lovenduski and Bonan 2017; Bonan and Doney 2018; Bonan et al. 2019). For example, the ILAMB package used the variable-to-variable relationships between metrics to benchmark Earth system models. Overall, TraceME gives model evaluation a new way to systematically trace the structural sources of the uncertainties in global C cycle models.

Potential applications of TraceME
An advantage of multi-model intercomparison projects (MIPs) is that model evaluation can provide a multifaceted understanding of a given model by comparing its performance with its older versions or other models (Eyring et al. 2016b). Model evaluation needs to establish whether and how the fidelity of the models in simulating terrestrial C processes increases across different phases of a MIP. For example, ESMValTool has been used to analyze whether the emergent constraints on equilibrium climate sensitivity in CMIP5 still hold for CMIP6 (Schlund et al. 2020). ILAMB has benchmarked and intercompared the terrestrial C cycle simulated by CMIP5 and CMIP6 models and presented the results in a detailed assessment report (https://www.ilamb.org/CMIP5v6/historical/). In our study, we analyzed the spatiotemporal changes in the uncertainty sources of simulated C storage in CMIP6 models at different periods using TraceME (Fig. 9). TraceME also has the potential to examine the terrestrial C cycle dynamics in the two phases of CMIP from a traceability perspective. Compared with other tools, it can diagnose whether the sources of uncertainty in CMIP6 models have shifted relative to CMIP5, and which processes cause the change. Furthermore, TraceME can provide detailed reports of traceability analysis on the performance of specific models in CMIP5 and CMIP6.
Global C cycle models have incorporated a broad set of terrestrial processes, such as human management and societal impacts (Fisher and Koven 2020). Model evaluation needs to comprehensively diagnose the effect of the new modules on the simulations of C cycle processes (Collier et al. 2018). TraceME has the advantage of traceability in measuring which components of the C cycle are affected by new processes represented in the model. For example, Du et al. (2018) explored the effect of three different carbon-nitrogen coupling schemes on C storage capacity based on the framework of traceability analysis. In addition, some plant functional traits have been considered in models because of the robust relationship between traits (Wright et al. 2004; Fyllas et al. 2014; Sakschewski et al. 2015; Cui et al. 2020). A traceability framework has been used to analyze the uncertainty of simulated ecosystem productivity by linking different vegetation functional properties. Thus, TraceME can further update its traceable framework to evaluate the effect of some new processes on the performance of models.
Benchmarking analysis is an essential part of model evaluation. Some model evaluation systems (e.g., ILAMB and ESMValTool) have built large datasets of observation data as benchmarks to diagnose the performance of models (Eyring et al. 2020; Collier et al. 2018). The TraceME package can be applied together with those existing tools to offer additional diagnoses of model uncertainty. Recently, more and more observational products have been generated as measurement techniques and algorithms have improved. For example, Wang et al. (2019) have constructed a global soil C residence time database and used it to evaluate the mean soil C transit times simulated by Earth system models. Many new global datasets about other ecological processes based on both field measurements (Salunkhe et al. 2018; Li et al. 2019; Zheng et al. 2020; Zhu & Xia 2020; Ustin & Middleton 2021) and manipulative experiments (Song et al. 2019) are greatly valuable for model evaluation. These observational products make it possible for TraceME to develop datasets for evaluating those key processes which have not been incorporated in other tools.
Fig. 5 Variation decomposition of the carbon storage based on annual data from models (CMIP6). The inner circle indicates that the carbon storage is decomposed into carbon storage capacity and carbon storage potential, and their variance contributions. The middle circle indicates that the carbon storage capacity is decomposed into net primary productivity (NPP) and carbon residence time, and their variance contributions. The outer circle indicates that the NPP is decomposed into gross primary productivity (GPP) and carbon use efficiency (CUE), and carbon residence time is decomposed into baseline carbon residence time and environmental scalars (temperature and precipitation), and their variation contributions to carbon storage

Challenges and future developments of TraceME
Although TraceME (v1.0) provides a traceable and comprehensive system for evaluating global terrestrial C cycle models, some challenges remain for its future development. One challenge is the theoretical development of the traceability analysis in TraceME. The theoretical foundation of the traceability analysis is built on the internal properties of the land C cycle, which can be described as a matrix equation (Xia et al. 2013; Luo et al. 2017). Some other terrestrial processes, such as nutrient cycles, hydrological processes, and energy fluxes, are difficult to incorporate into the matrix equation. The second challenge is that it is difficult to obtain observational data for some traceable components in the framework of the traceability analysis, such as baseline C residence time. The third challenge comes from the shortcomings of the cyberinfrastructure of the current TraceME. For example, the efficiency of the evaluation process of TraceME depends strongly on the performance of the computer where the TraceME node is located. Moreover, the installation of work nodes requires some specific environment settings in the operating system.
Fig. 6 The spatial distribution of the mean land carbon storage (a), land carbon storage capacity (c), and potential (e) simulated by seven models from CMIP6 during 1850 to 2014, and the standard deviation of land carbon storage (b), land carbon storage capacity (d), and potential (f) from these models
The development of TraceME is ongoing. Many efforts are being made to improve the framework of traceability analysis, to build up the observational datasets for benchmarking analysis, and to improve the infrastructure of TraceME. In terms of developing the traceability analysis, several directions can be considered. For example, recent studies have shown that GPP is jointly controlled by plant phenology and physiology, and it can be decomposed into the CO2 uptake period (CUP) and the maximal GPP during the CUP, which represents a property of plant canopy physiology (Xia et al. 2015; Huang et al. 2018). Both phenological and physiological processes are influenced by environmental factors, such as temperature and water availability (Jaworski and Hilszczański 2013; Xie et al. 2015; Piao et al. 2019). Meanwhile, other environmental factors besides temperature and water, such as oxygen and nutrient availability, also affect C residence time (Tian et al. 1999; Wu et al. 2003; Melillo et al. 2011; Van Groenigen et al. 2014; Wieder et al. 2015). These traceable processes and factors still need to be added to TraceME. On the other hand, new advances in machine learning methods could be useful for producing datasets for some components in the framework of the traceability analysis. For example, Shi et al. (2020) have used a machine learning method to link the measurements of radiocarbon with environmental factors to obtain the age distribution of global soil C. Finally, the infrastructure of TraceME is expected to evolve into a more open community for users and developers, so some aspects need to be further improved, such as the version-control mechanism, intermediate analytical results, and encryption techniques. Developing an offline package is also one way to make TraceME more effective. Moreover, the databases in TraceME (v1.0) need to be updated in a timely and automated manner, especially since the amount of both observational and modeling data is increasing rapidly.
Fig. 7 The global distribution of the mean of carbon storage and its traceable components simulated by seven CMIP6 models for the historical period 1850-2014. a Carbon storage (kg C m⁻²). b Carbon storage capacity (kg C m⁻²). c Carbon storage potential (kg C m⁻²). d Net primary productivity (NPP, kg C m⁻² year⁻¹). e Carbon residence time (year). f Gross primary productivity (GPP, kg C m⁻² year⁻¹). g Carbon use efficiency (CUE). h Baseline carbon residence time (year). i Temperature scalar. j Precipitation scalar

Conclusions
We developed an online tool, TraceME, for analyzing and evaluating the performance of CMIP6 models in simulating the land C cycle using a traceability analysis. TraceME can effectively diagnose the sources of uncertainty in land C cycle models. As shown in this study, TraceME can accelerate the pace of model evaluation of the land C cycle, and its evaluation results can help specific models improve their representation of some ecological processes. Overall, new model evaluation tools like TraceME will provide new opportunities to understand the large uncertainty in complex Earth system models.