Applicability and improvement of different potential evapotranspiration models in different climate zones of China

Background Accurate estimation of potential evapotranspiration (PET) is key to studying land–atmosphere interactions and hydrological processes. Several models are used to estimate PET from standard meteorological data. Although combination-based models estimate PET most accurately, they require more meteorological data and may therefore be difficult to apply in areas lacking meteorological observations. Results The results showed significant differences in the spatial trends of PET calculated by different models in China. The Doorenbos–Pruitt model yielded the highest PET (1902.6 mm) and the Kuzmin model the lowest (349.6 mm), the former being about 5.4 times the latter. The Romanenko and Rohwer models were the recommended temperature-based and aerodynamic-based models, respectively. Among the radiation-based models, the Abtew model was more suitable for arid and semi-arid regions, whereas the Priestley–Taylor model was more suitable for humid regions. Combination-based models achieved good calculation accuracy, and the Penman–Monteith model was the best option for PET calculation. Conclusions After calibration, the accuracy gains of the Romanenko, Rohwer, Abtew, Priestley–Taylor, and Penman–Monteith models were larger in the MPZ and TCZ than in the TMZ and SMZ. This does not mean that the calibrated models are more accurate in the MPZ and TCZ than in the TMZ and SMZ. On the contrary, the original models performed poorly in the MPZ and TCZ, leaving more room for improvement, whereas the uncalibrated models were already well suited to the TMZ and SMZ, so the gains there were smaller. Therefore, regional calibration of PET models can improve the accuracy and applicability of PET calculation, providing a reference for studying hydrological processes in different climatic zones.


Introduction
Evapotranspiration (ET) has long been recognized as a major component of the hydrological cycle. It is the largest component of the hydrological budget after precipitation (Gharbia et al. 2018) and connects the water, energy, and carbon cycles. In addition, evapotranspiration plays a key role in land–atmosphere interactions (Wang et al. 2011; Li et al. 2021) and has been widely used in extreme-weather monitoring and in evaluating sustainable water resource use (Sheffield et al. 2012; Duethmann and Blöschl 2018). However, actual evapotranspiration is difficult to quantify accurately because of the complex interactions between meteorological and land-surface factors, so other variables are often used to represent it, such as potential evapotranspiration (PET), reference crop evapotranspiration (ET0), and pan evaporation (Epan) (Zheng et al. 2017; Anabalón and Sharma 2017). PET has been widely used in watershed hydrological models to estimate runoff and to assess aridity (Douglas et al. 2009; Zhou et al. 2020). Moreover, long-term changes in PET have been used in climate-based prediction models to assess the impacts of climate change on ecosystems (Milly and Dunne 2016; Yang et al. 2021).
PET was first proposed by Thornthwaite (1948). It refers to the maximum amount of water that can be lost through evapotranspiration under optimal conditions. However, for decades scientists have defined PET in different ways, and some have even treated PET and ET0 as the same concept. The use of "potential" by Hargreaves and Samani (1982) and of "reference crop" by Hargreaves and Samani (1985) is a typical example. PET can be estimated by numerous models from observed meteorological data. According to their assumptions and input data requirements, these models are classified into temperature-based, aerodynamic-based, radiation-based, and combination-based models (Bormann 2011; Zhao et al. 2013; Xiang et al. 2020; Zhou et al. 2020; Yang et al. 2021). Temperature-based PET estimation was first proposed by Thornthwaite (1948) with the Thornthwaite model, and other temperature-based models, such as the Baier-Robertson, Blaney-Criddle, Kharrufa, and Oudin models, followed (Blaney and Criddle 1950; Baier and Robertson 1965; Kharrufa 1985; Oudin et al. 2005). This category has been applied extensively worldwide because it requires few, simple input parameters and performs reliably (Xiang et al. 2020; Yang et al. 2021). Aerodynamic-based PET models are derived from the Dalton model. However, several researchers have reported highly biased results for this category because its underlying mechanism differs significantly from that of current PET methods (Bormann 2011; Xiang et al. 2020), and where meteorological data are lacking, PET can only be estimated after the model parameters are adjusted (Azhar and Perera 2011; Jakimavičius et al. 2013; Valipour 2014). Radiation-based models describe the relationship between PET and radiation with empirical equations, such as the Priestley-Taylor equation, which is a simplified form of the Penman-Monteith equation (Jensen and Haise 1965). Although the Priestley-Taylor equation has been widely used for PET estimation worldwide, it may rest on a flawed premise about whether large-scale advection is independent of the surface energy balance (McNaughton and Spriggs 1988), so its accuracy varies considerably across regions. Combination-based models consider both aerodynamics and the energy balance, because ET involves both the energy supply for evaporation and the turbulent transport of water vapor. Penman (1948) proposed the first typical combination equation for PET, accounting for the effects of temperature and other meteorological factors. In recent decades, numerous researchers have proposed Penman-based equations (Rijtema and Wageningen 1965; Van Bavel 1966; Wright and Jensen 1972; Thom and Oliver 1977), of which the most widely used is the Penman-Monteith equation, which adds a surface resistance term (rs) representing the crop canopy (Monteith 1965). The Food and Agriculture Organization (FAO) later recommended the Penman-Monteith equation as the standard method for estimating ET0 (Allen et al. 1998). However, its application is ambiguous, because some scientists use it to estimate PET and others to estimate ET0.
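The FAO-56 form of the Penman-Monteith equation mentioned above can be sketched as follows. This is a minimal illustration, assuming a daily time step, the FAO-56 expressions for the slope of the vapor pressure curve (Δ) and the psychrometric constant (γ = 0.665 × 10−3 P, with P in kPa), and hypothetical function names; it is not necessarily the exact variant listed in Table 1.

```python
import math

def saturation_vp(t_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C), FAO-56 form."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def fao56_pm_et0(t_mean, rn, u2, es, ea, pressure_kpa, g=0.0):
    """Daily reference ET (mm/day) from the FAO-56 Penman-Monteith equation (sketch).

    t_mean: mean daily air temperature (deg C)
    rn, g: net radiation and soil heat flux density (MJ m-2 day-1); g ~ 0 at a daily step
    u2: wind speed at 2 m (m/s); es, ea: saturation and actual vapor pressure (kPa)
    pressure_kpa: atmospheric pressure in kPa (station values in hPa must be divided by 10)
    """
    delta = 4098.0 * saturation_vp(t_mean) / (t_mean + 237.3) ** 2  # kPa/degC
    gamma = 0.665e-3 * pressure_kpa                                  # kPa/degC
    radiation_term = 0.408 * delta * (rn - g)
    aero_term = gamma * (900.0 / (t_mean + 273.0)) * u2 * (es - ea)
    return (radiation_term + aero_term) / (delta + gamma * (1.0 + 0.34 * u2))

# Inputs approximating the FAO-56 daily worked example for Brussels (published ET0 about 3.9 mm/day)
et0 = fao56_pm_et0(t_mean=16.9, rn=13.28, u2=2.078, es=1.997, ea=1.409, pressure_kpa=100.1)
print(round(et0, 2))
```

The radiation term and the aerodynamic term are kept separate so that the contribution of each pathway can be inspected, which mirrors the "combination" idea described above.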
Numerous studies have assessed the reliability of PET models under different climates and land-surface conditions. Most have found that combination models estimate PET most accurately, followed by radiation-, aerodynamic-, and temperature-based models, in that order (Singh and Xu 1997; Douglas et al. 2009; Zuo et al. 2009; Bormann 2011; Valipour 2015; Zhou et al. 2020; Yang et al. 2021). However, almost all combination models require several climate variables, including net radiation, wind speed, and soil heat flux density, which are often difficult to obtain, especially in regions lacking extensive and reliable meteorological data; this is why the other model categories are still used. Given the wide variety of PET models, many of which were developed for specific hydrological conditions and evaluated with limited measured data (Tegos et al. 2015), it is difficult to select the best model for estimating PET in a given area.
Based on climatic characteristics, China is divided into five climatic zones: the mountain plateau zone (MPZ), temperate continental zone (TCZ), temperate monsoon zone (TMZ), subtropical monsoon zone (SMZ), and tropical monsoon zone (PMZ) (Song et al. 2011). Because of its small area, the PMZ was merged into the adjacent SMZ, as in other studies (Fan et al. 2016; Yang et al. 2021). The SMZ, TMZ, MPZ, and TCZ are humid, semi-humid, semi-arid, and arid regions, respectively (Fan et al. 2016; Feng et al. 2018). Studies have revealed significant variation in the PET estimated by different models among these four climatic zones (Zhou et al. 2020; Yang et al. 2021). Therefore, the reliability of PET models needs to be evaluated in each climatic zone.
Because PET is a theoretical concept that is difficult to measure directly (Prudhomme and Williamson 2013), indirect estimation methods are commonly used, including the lysimeter method (Herrnegger et al. 2012), scintillation method (Xu et al. 2013; Yee et al. 2015), Bowen ratio method (Douglas et al. 2009), eddy covariance method (Li et al. 2016; Zheng et al. 2017), and pan evaporation method (Xu and Singh 2001; Xu et al. 2016). At the watershed scale, some scientists have used hydrological models (Bai et al. 2016) and water balance methods (Lu et al. 2005) to validate estimated PET. Pan evaporation (Epan) reflects the effects of the climatic factors that determine PET, including temperature, radiation, and wind speed, and the method is simple and inexpensive. The FAO recommends the Class A pan, with a diameter of 1.21 m and a depth of 25.5 cm, as a standard instrument for determining PET. The Class A pan has been widely used worldwide in recent years and has provided satisfactory results (Padmakumari et al. 2013; Azorin-Molina et al. 2015; Liu and Sun 2016). It should be noted that Epan is not equal to the PET rate, but in theory the two variables should be significantly correlated (Liu et al. 2010). Epan integrates the effects of radiation, humidity, wind, and air temperature on wet-surface evaporation and therefore provides measurements that are conceptually similar to PET rates (McVicar et al. 2007; Donohue et al. 2010). Therefore, Epan was used in this study to validate the PET rates determined by the models, under the assumption that Epan and PET are highly correlated. This study aims to: (1) use 30 PET models from four categories to estimate PET values and trends over the 2001-2020 period in different climatic zones of China; (2) recommend the most suitable model in each category based on measured Epan; and (3) calibrate the recommended models for the different climatic zones.

Study area
China is located in East Asia and covers more than 9.6 million square kilometers. Its topography descends in three steps, from the highest mountains in the west to the lowest plains in the east. Annual precipitation ranges from 50 mm in the arid northwest to 2000 mm in the humid southeast. The climate is dominated by the alternation of the winter and summer monsoons, which are significantly influenced by ENSO (Ding et al. 2014; Li et al. 2021). Warm, humid southerly winds prevail in summer, whereas cold, dry northerly winds prevail in winter (Ge et al. 2017).

Data collection
Daily meteorological data observed at 699 stations were obtained from the China Meteorological Administration (http://data.cma.cn/). The observed variables include precipitation (mm), air temperature (ºC), relative humidity (%), atmospheric pressure (hPa), wind speed at 10 m height (m/s), and sunshine duration (h). Of the 699 stations, 87, 119, 195, and 298 are located in the MPZ, TCZ, TMZ, and SMZ, respectively (Fig. 1). The average station elevation, Epan, annual mean precipitation, air temperature, and aridity index over the 2001-2020 period in the four climatic zones are reported in Additional file 1: Table S1. Missing data were interpolated using linear regression relationships with neighboring stations. The stations are spatially well distributed and thus reflect the general characteristics of the regional climate in China. The meteorological data were converted into 1 km resolution raster data covering all of China with the ANUSPLIN spatial interpolation software. Radiation, vapor pressure, and other meteorological variables required by the various PET methods were estimated following the procedure described in FAO-56 (Allen et al. 1998).
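The gap-filling step can be illustrated with a minimal sketch. The source does not state how many neighboring stations were used or how they were combined, so this example assumes a single neighbor and an ordinary least-squares line; the function name is hypothetical.

```python
def fill_gaps_by_regression(target, neighbor):
    """Fill missing values (None) in `target` from a neighboring station's series.

    Assumption: one neighbor, linear model target = a + b * neighbor fitted by
    ordinary least squares on the days where both stations have observations.
    """
    pairs = [(x, y) for x, y in zip(neighbor, target) if x is not None and y is not None]
    if len(pairs) < 2:
        raise ValueError("need at least two overlapping observations")
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("neighbor series is constant over the overlap")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Keep observed values; predict only where the target is missing and the neighbor is not.
    return [y if y is not None else (a + b * x if x is not None else None)
            for x, y in zip(neighbor, target)]

# Hypothetical daily temperatures (deg C) at a target station and its neighbor
filled = fill_gaps_by_regression([1.0, None, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In practice several neighbors would usually be screened by correlation before one is chosen; that selection step is not shown here.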
The saturation vapor pressure and actual vapor pressure were calculated using the following equations:

$$e_s = 0.6108\exp\left(\frac{17.27\,T}{T + 237.3}\right) \qquad (1)$$

$$e_a = \frac{RH \times e_s}{100} \qquad (2)$$

where $e_s$ is the saturation vapor pressure (kPa); $T$ is the air temperature (℃); $e_a$ is the actual vapor pressure (kPa); and $RH$ is the relative humidity (%).
The radiation terms were computed following FAO-56 (Allen et al. 1998), where $R_a$ is extraterrestrial radiation (MJ•m−2•day−1), $R_s$ is solar (shortwave) radiation, and $R_{so}$ is clear-sky solar radiation.
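Equations (1) and (2), together with the FAO-56 radiation terms referred to above, can be sketched as follows. The radiation formulas (extraterrestrial radiation from latitude and day of year, the Ångström relation with a_s = 0.25 and b_s = 0.50, and the elevation-based clear-sky radiation) are the standard FAO-56 forms and are assumed here, because the original radiation equations were not preserved; the function names are hypothetical.

```python
import math

def saturation_vapor_pressure(t_c):
    """Eq. (1): saturation vapor pressure e_s (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))

def actual_vapor_pressure(rh_pct, es_kpa):
    """Eq. (2): actual vapor pressure e_a (kPa) from relative humidity RH (%)."""
    return rh_pct * es_kpa / 100.0

def extraterrestrial_radiation(lat_deg, doy):
    """Daily R_a (MJ m-2 day-1) and daylight hours N, standard FAO-56 forms (assumed)."""
    gsc = 0.0820                                             # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    dr = 1 + 0.033 * math.cos(2 * math.pi * doy / 365)      # inverse relative Earth-Sun distance
    decl = 0.409 * math.sin(2 * math.pi * doy / 365 - 1.39)  # solar declination (rad)
    cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
    ws = math.acos(cos_ws)                                   # sunset hour angle (rad)
    ra = (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(decl)
        + math.cos(phi) * math.cos(decl) * math.sin(ws))
    return ra, 24 / math.pi * ws

def solar_radiation(sunshine_h, daylight_h, ra, a_s=0.25, b_s=0.50):
    """Angstrom relation: R_s from sunshine duration n and daylight hours N."""
    return (a_s + b_s * sunshine_h / daylight_h) * ra

def clear_sky_radiation(ra, elevation_m):
    """Clear-sky R_so when locally calibrated a_s, b_s are unavailable (FAO-56 form)."""
    return (0.75 + 2e-5 * elevation_m) * ra

es = saturation_vapor_pressure(25.0)           # about 3.168 kPa
ea = actual_vapor_pressure(60.0, es)
ra, n_day = extraterrestrial_radiation(-20.0, 246)   # FAO-56 example location: about 32.2
```

Because the station data include sunshine duration, R_s would be obtained from `solar_radiation` with the observed sunshine hours, and R_so from station elevation.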

Mann-Kendall trend test
We used the Mann-Kendall (MK) test to analyze the trends in PET calculated by the 30 models from 2001 to 2020. The MK test is a non-parametric method proposed by Mann (1945) and Kendall (1948); it is recommended by the World Meteorological Organization (WMO) and is widely used. The MK test can detect monotonic (linear or non-linear) trends, is robust to outliers, and is suitable for hydrological data that are not normally distributed. The test statistic S is used to assess trends in hydrological or meteorological series (Zandt and Owens 1986) and is calculated as follows:

$$S = \sum_{k=1}^{n-1}\sum_{j=k+1}^{n}\operatorname{sgn}(x_j - x_k), \qquad \operatorname{sgn}(\theta) = \begin{cases}1 & \theta > 0\\ 0 & \theta = 0\\ -1 & \theta < 0\end{cases}$$

$$\operatorname{Var}(S) = \frac{n(n-1)(2n+5) - \sum_{p=1}^{q} t_p(t_p-1)(2t_p+5)}{18}$$

where n is the length of the series x, q is the number of tied groups, and t_p is the number of values in the p-th tied group. The significance of the trend is assessed with the test statistic Z:

$$Z = \begin{cases}\dfrac{S-1}{\sqrt{\operatorname{Var}(S)}} & S > 0\\ 0 & S = 0\\ \dfrac{S+1}{\sqrt{\operatorname{Var}(S)}} & S < 0\end{cases}$$

where Var(S) is the variance of S. Z > 0 and Z < 0 indicate increasing and decreasing trends, respectively, and |Z| ≥ 1.96 and |Z| ≥ 2.58 correspond to significance at the 95% and 99% levels, respectively.
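The MK statistics defined above can be computed with a short sketch like the following. It implements the standard test with the usual tie correction and no correction for serial autocorrelation; whether the authors applied any pre-whitening is not stated, so that is left out. The function name is hypothetical.

```python
import math
from collections import Counter

def mann_kendall(x):
    """Return (S, Var(S), Z) for series x using the standard Mann-Kendall test."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values")
    s = 0
    for k in range(n - 1):
        for j in range(k + 1, n):
            diff = x[j] - x[k]
            s += (diff > 0) - (diff < 0)      # sgn(x_j - x_k)
    # Tie correction: groups of identical values reduce the variance of S
    ties = Counter(x).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in ties)) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, var_s, z

# Example: a hypothetical annual PET series (mm) that rises every year
s, var_s, z = mann_kendall([900 + 5 * year for year in range(10)])
significant_99 = abs(z) >= 2.58
```

Applied pixel by pixel to the 20 annual PET values of each model, the sign of Z gives the trend direction and |Z| its significance level.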

Selected PET models
A total of 30 PET models were selected in this work, including 8 temperature-based, 7 aerodynamic-based, 11 radiation-based, and 4 combination-based methods (Table 1). These models were selected not only because their calculation formulas differ theoretically, but also because together they cover the full range of key input meteorological variables (e.g., radiation, air temperature, relative humidity, and wind speed).

Table 1 The 30 potential evapotranspiration (PET) models selected for this study [columns: Category, No., Methods, Equation, References]

Notes: PET is the potential evapotranspiration (mm•day−1); u2 and u8 are the wind speeds at 2 and 8 m height (m•s−1), respectively; es and ea are the saturation and actual vapor pressures (kPa), respectively; Ta, Tmax, and Tmin are the average, maximum, and minimum daily air temperatures (°C), respectively (°F for the Jensen-Haise and Stephens-Stewart equations); k is the monthly consumptive use coefficient; p is the percentage of total annual daytime hours occurring in the period used (daily or monthly) (365 × 12); RH is the relative humidity (%); A is a constant ($A = 6.75\times10^{-7}H^3 - 7.71\times10^{-5}H^2 + 1.792\times10^{-2}H + 0.49$); Δ is the slope of the vapor pressure curve (kPa•°C−1); Rn, Ra, and Rs are the net, extraterrestrial, and incident solar radiation, respectively (MJ•m−2•day−1); G is the soil heat flux density (MJ•m−2•day−1), which can be neglected at a daily time step; r is the roughness coefficient; γ is the psychrometric constant (kPa•°C−1); λ is the latent heat of vaporization (2.45 MJ•kg−1); ρ is the water density (kg•m−3).

Evaluation criteria
Statistical indices were used to quantify the PET modelling performance. The PET values calculated by the 30 models were compared with Epan using the following statistical criteria:

$$R^2 = \left[\frac{\operatorname{cov}(P, E)}{\sigma_P\,\sigma_E}\right]^2$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n}\lvert P_i - E_i\rvert$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - E_i)^2}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n}(P_i - E_i)^2}{\sum_{i=1}^{n}(E_i - \bar{E})^2}$$

where R², MAE, RMSE, and NSE are the coefficient of determination, mean absolute error (mm), root mean square error (mm), and Nash-Sutcliffe efficiency, respectively; P_i and E_i are the calculated PET and the measured Epan for day i, and Ē is the mean Epan; n is the number of days; and cov and σ are the covariance and standard deviation, respectively. R² was chosen as the primary evaluation index, and NSE, MAE, and RMSE were used as supplementary indices. The statistical analysis was conducted both for the whole period and for each of the 12 months. All analyses were performed using R (4.1.0).
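The four criteria above can be computed together with a small sketch like this one, which takes paired PET and Epan values for the same days. The function name and the dictionary output are illustrative choices, not the authors' R code.

```python
import math

def evaluate(pet, epan):
    """R2, MAE, RMSE and NSE of modelled PET against observed Epan (same units, same days)."""
    n = len(pet)
    if n != len(epan) or n < 2:
        raise ValueError("series must have equal length >= 2")
    mean_p = sum(pet) / n
    mean_e = sum(epan) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(pet, epan)) / n
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pet) / n)
    sd_e = math.sqrt(sum((e - mean_e) ** 2 for e in epan) / n)
    sq_err = sum((p - e) ** 2 for p, e in zip(pet, epan))
    return {
        "R2": (cov / (sd_p * sd_e)) ** 2,                      # squared Pearson correlation
        "MAE": sum(abs(p - e) for p, e in zip(pet, epan)) / n,
        "RMSE": math.sqrt(sq_err / n),
        "NSE": 1 - sq_err / sum((e - mean_e) ** 2 for e in epan),
    }

metrics = evaluate([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The example shows why R² alone is not sufficient: a model offset by a constant 1 mm still has R² = 1, while MAE and RMSE equal 1 mm and NSE falls to −0.5. Using population (divide by n) or sample standard deviations makes no difference to R², because the normalization cancels in the ratio.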

Model calibration
The best-performing models were calibrated for each climatic zone to improve their estimation precision. The calibration method follows Xu and Singh (2000): the models that need improvement are rewritten in five forms containing constants (Table 2), and these constants are then calibrated against the PET benchmark (PETm) as the reference value, where a and b are the constants to be estimated.

Spatial and temporal variation of potential evapotranspiration in China calculated using 30 models
In total, 30 models were used to calculate the average PET in the different climatic zones from the interpolated raster data of the 699 weather stations over the 2001-2020 period, and the MK test was then used to assess the PET trends. The average PET varied significantly among the models, and the temporal and spatial changes in PET differed considerably among the climatic zones of China. The Doorenbos-Pruitt model produced the maximum PET in all climatic zones: 1959.8 mm (MPZ), 1965.2 mm (TCZ), 1860.4 mm (SMZ), and 1825.2 mm (TMZ), while the lowest PET values were obtained with the Kuzmin model (375.8 and 339.4
The PET trends determined with the MK test over the 2001-2020 period for the 30 models showed high spatial heterogeneity. Most models revealed a significant upward trend in PET in the MPZ, whereas a few radiation-based models showed a downward trend there (Fig. 3). The PET trends also differed considerably among models across the climatic zones of China, indicating that the models need to be evaluated.

Comprehensive comparison of temperature-based models
In this study, the monthly mean PET values across China calculated by the temperature-based models, namely  4), suggesting better applicability of temperature-based models in humid regions.
The monthly PET of the 8 temperature-based models also showed different monthly and spatial patterns in R² with Epan (Fig. 5). Although the R² values of the Oudin, Thornthwaite, Kharrufa, and McCloud models were significant, they were smaller than those of the other models and varied irregularly between months, indicating that these four models may not be suitable for calculating PET in China. The R² values of the Baier-Robertson, Romanenko, and Schendel models showed similar trends, with higher R² between monthly PET and Epan in warm months than in cold months, indicating that these three models are more appropriate for calculating PET in the warm season in China. In contrast, the Blaney-Criddle model appeared more suitable for calculating PET in the cold season. Because daily average temperature is an important input variable for these models, seasonal temperature differences may explain the variation in R² between months.
The monthly R² of the Romanenko model was higher than those of the other models and varied little between months, with a maximum monthly R² of 0.90 in the TMZ. Across the four climatic zones, the Romanenko model also had higher R² values than the other models, with its highest value in the SMZ. Therefore, the Romanenko model may be the most suitable temperature-based model, especially in the humid regions of China. Zuo et al. (2009) used temperature-based models to simulate PET in the arid regions of Northwest China and found that the corrected Romanenko model, with a conversion factor of 0.6 relative to small evaporation pan measurements, had a high correlation coefficient (r = 0.97), significantly better than those of the other models. However, some studies may reach different conclusions due
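To make the discussion of the Romanenko model concrete, here is a minimal sketch of its commonly cited form, PET = 0.0018(25 + T)²(100 − RH) in mm per month. The exact variant in Table 1 is not recoverable here, so this form is an assumption; the `factor` argument is a hypothetical multiplicative correction of the kind a regional calibration might introduce.

```python
def romanenko_pet(t_mean_c, rh_pct, factor=1.0):
    """Monthly PET (mm/month), commonly cited Romanenko form 0.0018*(25+T)^2*(100-RH).

    t_mean_c: monthly mean air temperature (deg C); rh_pct: mean relative humidity (%).
    `factor` is a hypothetical correction coefficient; 1.0 leaves the model unchanged.
    """
    return factor * 0.0018 * (25.0 + t_mean_c) ** 2 * (100.0 - rh_pct)

warm_dry = romanenko_pet(20.0, 60.0)    # warmer, drier month: larger PET
cool_moist = romanenko_pet(5.0, 80.0)   # cooler, moister month: smaller PET
```

Because the model uses only temperature and humidity, its seasonal behavior follows those two variables, consistent with the seasonal R² patterns described above.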

Comprehensive comparison of aerodynamic-based models
The monthly average PET values across China calculated with the aerodynamic-based Albrecht, Brockamp-Wenner, Harbeck, Kuzmin, Mahringer, Rohwer, and Trabert models were 85.24, 90.1, 44.87, 29.72, 54.34, 73.32, and 30.79 mm, respectively. The Rohwer model had a higher R² (0.73) than the other models in all climatic zones (Fig. 6). In addition, except for the Harbeck and Mahringer models, the R² values of the aerodynamic-based models were higher in the TCZ than in the other regions. The monthly PET of the aerodynamic models showed similar variations (Fig. 7), with high R² values in winter and   7), the wind speed at 8 m, an input variable in these three models, is calculated using the wind speed profile relationship in
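The stations measure wind at 10 m, whereas the aerodynamic models use u2 or u8. The profile relationship referred to above is truncated in the source; the sketch below assumes the FAO-56 logarithmic profile, in which wind speed at height z is proportional to ln(67.8z − 5.42), and uses it to rescale between any two heights. The function name is hypothetical.

```python
import math

def convert_wind_speed(u_z1, z1, z2):
    """Rescale wind speed measured at height z1 (m) to height z2 (m).

    Assumption: the FAO-56 logarithmic profile, u(z) proportional to ln(67.8 z - 5.42),
    which for z2 = 2 m reduces to u2 = u_z * 4.87 / ln(67.8 z - 5.42).
    """
    if 67.8 * min(z1, z2) - 5.42 <= 1.0:
        raise ValueError("height too low for this profile")
    return u_z1 * math.log(67.8 * z2 - 5.42) / math.log(67.8 * z1 - 5.42)

u10 = 3.0                                  # hypothetical 10 m wind speed (m/s)
u2 = convert_wind_speed(u10, 10.0, 2.0)    # 10 m -> 2 m factor is about 0.748
u8 = convert_wind_speed(u10, 10.0, 8.0)
```

The same function covers both conversions the aerodynamic models need, so u2 and u8 stay mutually consistent.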

Comprehensive comparison of radiation-based models
The monthly average PET values across China calculated with the Abtew, Christiansen, Doorenbos-Pruitt, Hargreaves, Jensen-Haise, Makkink, Milly-Dunne, Stephens, and Turc models were 100.30, 151.73, 159.52, 83.38, 29.02, 153.93, 82.68, 82.96, 56.42, 34.39, and 36.97 mm. The Abtew and Stephens-Stewart models had the highest R² values in the TCZ, while the Priestley-Taylor model had the highest R² values in the SMZ, TCZ, and TMZ (Fig. 8). From a monthly perspective, the 11 radiation-based models showed similar seasonal changes in R²: an increase in spring, a maximum in summer, and a minimum in winter. Because temperature and radiation are the principal input variables of radiation-based models, their seasonal changes explain the seasonal changes in R² (Fig. 9). Unlike in the temperature-, aerodynamic-, and combination-based categories, no single radiation-based model outperformed the others (Fig. 9). Therefore, a monthly comprehensive comparison was conducted (Fig. 10) to determine which model performed best in each month. The Abtew formula is similar to those of the Christiansen, Doorenbos-Pruitt, and Hargreaves models (Table 1), with incident solar radiation and the psychrometric constant as the main influencing parameters, and the Abtew model was more suitable than the others. The selection of the Stephens-Stewart model and the Stephens model is also
Although the results showed no significant difference in PET calculation accuracy among the 11 radiation-based models, the Abtew model is more suitable for arid regions (TCZ), while the Priestley-Taylor model is more suitable for humid and semi-humid regions (SMZ, TMZ). However, both models should be used with caution in colder months. Unlike in the temperature- and aerodynamic-based categories, no particular radiation-based model showed markedly higher accuracy. Nevertheless, numerous studies have recommended the Abtew and Priestley-Taylor models among radiation-based models for calculating PET (Douglas et al. 2009; Valipour 2015), while other studies have reached different conclusions about the accuracy of radiation-based models for several reasons, including model selection, study area characteristics, study period, variables used for validation, evaluation criteria, and evaluation scales (Xu and Singh 2002; Zuo et al. 2009; Bormann 2011; Yang et al. 2021).
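The two recommended radiation-based models can be sketched as follows, assuming their commonly quoted forms: Abtew, PET = k·Rs/λ with k = 0.53, and Priestley-Taylor, PET = α·Δ/(Δ + γ)·(Rn − G)/λ with α = 1.26. The coefficients k and α are the kind of constants a regional calibration would adjust; the Table 1 variants may differ, and the function names are hypothetical.

```python
LAMBDA = 2.45  # latent heat of vaporization, MJ/kg (value given in the Table 1 notes)

def abtew_pet(rs, k=0.53):
    """Abtew-type PET (mm/day) = k * Rs / lambda; rs is incident solar radiation (MJ m-2 day-1)."""
    return k * rs / LAMBDA

def priestley_taylor_pet(rn, delta, gamma, g=0.0, alpha=1.26):
    """Priestley-Taylor PET (mm/day) = alpha * delta / (delta + gamma) * (Rn - G) / lambda.

    rn, g: net radiation and soil heat flux (MJ m-2 day-1); delta, gamma in kPa/degC.
    """
    return alpha * delta / (delta + gamma) * (rn - g) / LAMBDA

pet_abtew = abtew_pet(24.5)
pet_pt = priestley_taylor_pet(rn=13.28, delta=0.122, gamma=0.0666)
```

Neither form contains a wind or humidity term, which is one reason radiation-based models can drift from Epan where advection is strong, as in the arid TCZ.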

Comprehensive comparison of combination-based models
The mean monthly PET values across China calculated by Penman, Rijtema, 80.59, 76.17 Penman-Monteith models were better than those of the Rijtema and Wright-Jensen models across China (Figs. 11, 12).
Unlike the temperature-, radiation-, and aerodynamic-based models, the monthly average R² of the combination-based models did not show considerable seasonal changes, possibly because they use more input meteorological variables than the other categories. However, the mean R² values of the Penman and Wright-Jensen models were slightly lower in cold months than in warm months (Fig. 12). The Rijtema model did not have the highest R² in any month or climatic zone. The Penman-Monteith model had the highest R² in all months in the TCZ and in more than 6 months in the other climatic zones of China. The Penman model had the highest R² in July, August, and October in the SMZ and in July and August in the TMZ, while the Wright-Jensen model had the highest R² in March-June in the MPZ and in May in the SMZ (Fig. 12).
The Penman-Monteith model is an improved version of the Penman model, and the results of the current study showed that its calculation accuracy was higher than those of the Penman, Wright-Jensen, and Rijtema models, with the Rijtema model being the least accurate. Many studies have indicated that the Penman-Monteith model calculates PET accurately, and it even serves as a reference model for MODIS products (Chen and Liu 2020; Cheng et al. 2021).

Comprehensive comparison of the four PET model categories
Ranking by mean R² showed that the combination category had the highest calculation accuracy in the climatic zones of China, followed by the radiation-, aerodynamic-, and temperature-based models. The average R² values of the combination models were relatively close to each other (between 0.62 and 0.88) compared with those of the other categories, indicating that this category is relatively stable. This stability is also reflected in two other aspects. First, the R² of the combination models has a small standard deviation, significantly lower than those of the other categories. Second, the R² of the combination models is very similar across climatic zones (Fig. 13). Over the whole period, the four climatic zones ranked as follows in terms of the mean R² between Epan and the PET calculated by the combination and aerodynamic methods: TCZ > TMZ > SMZ > MPZ. The corresponding rankings were TCZ > SMZ > TMZ > MPZ for the radiation-based methods and TCZ > TMZ > MPZ > SMZ for the temperature-based methods.
PET is driven primarily by four key meteorological variables: radiation, wind speed, temperature, and vapor pressure (Donohue et al. 2010). The combination methods consider all four factors. The results of the current study showed that the combination category was more accurate than the other categories, followed by the radiation-based models, which is consistent with several studies (Xu and Singh 2005; Donohue et al. 2010; Lakatos et al. 2020; Yang et al. 2021). The lower accuracy of the other categories may be due to their neglect of one or more types of variables: the temperature- and radiation-based methods ignore the aerodynamic variables, while the aerodynamic-based models ignore the energy term (Shiri 2018; Shiri et al. 2019). In this study, the radiation-based models outperformed the aerodynamic- and temperature-based models, and the temperature-based models had the poorest accuracy. Some authors reported results similar to ours (Xu and Singh 2001; Zuo et al. 2009; Valipour 2015; Yang et al. 2021), while others, such as Xu and Singh (2002), reported inconsistent results, which may be due to their selection of the best model from each category for cross-comparison and to differences in the comparison method and the variables used for validation. In this study, except for the combination category, the mean R² values of the other three categories did not differ considerably among climatic zones; the differences in mean R² were found mainly among models within the same category. Therefore, choosing different models from the same category for PET calculation may bias the evaluation, so the best-selected models from each category need to be compared further to obtain conclusive results, as reported in the next section. The previous sections analyzed the correlations between the PET calculated by the models and the measured Epan; the calculated PET values themselves are presented in the following part of this section.

Comprehensive comparison of the
The equally weighted mean of the five selected models was taken as the PET benchmark (PETm) (Fig. 14). Across the four climate zones, the PET values of the Romanenko model were higher than PETm in the MPZ and TCZ and lower than PETm in the SMZ and TMZ. The PET values of the Rohwer and Priestley-Taylor models were lower than PETm in the MPZ, SMZ, and TMZ and higher than PETm in the TCZ. In addition, the PET values of the Penman-Monteith model were very close to PETm in all months, confirming the accuracy and applicability of the Penman-Monteith model in the four climatic zones. It should be noted that the radiation terms used in the net radiation calculation were estimated in this study with the FAO-56 Ångström coefficients (as = 0.25, bs = 0.50), so that clear-sky solar radiation equals (as + bs)Ra (Allen et al. 1998). The MPZ, however, receives more radiation than the other climatic zones of China because of its high-altitude areas (e.g., the Qinghai-Tibet Plateau) (Chen et al. 2004; Zhang et al. 2010). In addition, the monthly average PET calculated by the Abtew model was significantly higher than PETm, suggesting that this model overestimates PET in alpine regions in practice. While different model categories can be expected to exhibit different sensitivities because they use different climatic variables, the different sensitivities of PET models within the same category
Using the mean PET of the five models (PETm) as a benchmark, the constants (Table 2) of the Romanenko, Rohwer, Abtew, Priestley-Taylor, and Penman-Monteith models can be calibrated for each of the four climatic zones. Table 2 shows the original and calibrated coefficients of these five models in the four climate zones. After the empirical coefficients were corrected, the PET values of the models were closer to PETm (Fig. 15, Table 2), indicating higher PET accuracy (higher R² and NSE, lower MAE and RMSE). The Romanenko model, the best temperature-based model, required significantly larger coefficient adjustments in the TCZ and MPZ than in the SMZ and TMZ, suggesting that it is not well suited to high-latitude and more arid regions. In addition, adjusting the empirical coefficients of the Rohwer model produced the largest improvement in PET (Table 2 and Fig. 15), because aerodynamic models were developed for the specific characteristics of particular regions, and differences in climate and underlying surface conditions may lead to erroneous results. Thus, this category should be applied only in areas with conditions similar to those of the regions in which the models were developed (Shiri 2018; Yang et al. 2021). Many studies have confirmed that the Abtew and Priestley-Taylor models are relatively accurate radiation-based models (Singh and Xu 1997; Valipour 2015), and our study supports this view. After correction, the Abtew model had its largest adjustment in the MPZ, while the Priestley-Taylor model had its largest adjustment in the TCZ, indicating that the two models differ in their applicable spatial ranges; in the future, the choice between them can be made according to the climate zone. The combination methods consider all meteorological factors, and the original Penman-Monteith model already calculated PET well, so calibration improved it less than the other models. Across the four climate zones, the adjustments of the five models were smaller in the SMZ and TMZ than in the MPZ and TCZ, indicating that the original models already calculated PET more accurately in these humid areas.
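The benchmark construction described above is simple arithmetic and can be sketched as follows: PETm is the equally weighted mean of the five models for a given zone and month, and each model's deviation from it shows whether the model over- or underestimates. The numbers in the example are invented for illustration and are not results of this study; the function names are hypothetical.

```python
def pet_benchmark(model_values):
    """Equally weighted mean PET (PETm) of the selected models for one zone and month."""
    return sum(model_values.values()) / len(model_values)

def relative_deviation(model_values):
    """Percentage deviation of each model from PETm; positive means above the benchmark."""
    petm = pet_benchmark(model_values)
    return {name: 100.0 * (value - petm) / petm for name, value in model_values.items()}

# Invented monthly PET values (mm) for one zone, for illustration only
zone_month = {"Romanenko": 110.0, "Rohwer": 90.0, "Abtew": 130.0,
              "Priestley-Taylor": 80.0, "Penman-Monteith": 90.0}
petm = pet_benchmark(zone_month)
deviations = relative_deviation(zone_month)
```

Because every model contributes equally, a single strongly biased model (such as an overestimating one in alpine areas) shifts PETm itself, which is worth keeping in mind when interpreting calibration against this benchmark.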

Integrated discussion
Over the past 20 years, the PET calculated for the different climate zones of China by 30 different models exhibits significant spatial heterogeneity in both magnitude and trend, which underlines the need to assess the applicability of each model in each climatic region. Combination-based models, which account for temperature, wind speed, radiation, and vapor pressure together, perform better than the other model types, as corroborated by numerous studies (Xu and Singh 2001; Zhou et al. 2020; Yang et al. 2021). Zuo et al. (2009) simulated PET in the arid region of Northwest China using temperature-based models and found that the corrected Romanenko model achieved a correlation coefficient of 0.97 against small evaporation pan data, with a conversion factor of 0.6. This performance was significantly better than that of the other temperature-based models, consistent with our findings. In China, wind speeds are higher in spring, peaking in April, and lowest in August. Because wind speed is a key input of aerodynamic models, the R² of these models shows a seasonal variation similar to that of wind speed. Based on our findings, the Rohwer model is the only aerodynamic model we recommend. Singh and Xu (1997) evaluated 13 aerodynamic-based models at four climatic stations in Northwestern Ontario, Canada, and found that the Rohwer model calculated PET most accurately, which is consistent with the results of this study. Unlike the temperature-based and aerodynamic-based categories, in which a single model clearly performed best, no single radiation-based model was clearly preferable. Radiation-based models do not represent the effect of wind speed on aerodynamic resistance (stronger winds lower the resistance and raise evaporative demand), so they tend to underestimate PET under windy conditions. Accordingly, this model category is recommended for areas with low wind speed (Irmak and Irmak 2008; Xiang et al. 2020). In our study, the Abtew model appears more suitable for the arid and semi-arid zones (TCZ, MPZ), whereas the Priestley-Taylor model is better suited to the relatively humid zones (SMZ, TMZ). However, caution is advised when using either model during cold months.
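To make the wind-speed argument concrete, here is a sketch of the standard Priestley-Taylor formula, using the FAO-56 expressions for the slope of the saturation vapour pressure curve and the psychrometric constant. Wind speed appears nowhere in it. The default α = 1.26, a standard pressure of 101.3 kPa, and zero soil heat flux are illustrative assumptions; the study's own implementation and its calibrated coefficients (Table 2) may differ.

```python
import math


def sat_vapour_pressure_slope(t_mean_c: float) -> float:
    """Slope of the saturation vapour pressure curve (kPa per degC), FAO-56 form."""
    es = 0.6108 * math.exp(17.27 * t_mean_c / (t_mean_c + 237.3))
    return 4098.0 * es / (t_mean_c + 237.3) ** 2


def psychrometric_constant(pressure_kpa: float) -> float:
    """Psychrometric constant (kPa per degC), FAO-56 form."""
    return 0.665e-3 * pressure_kpa


def priestley_taylor_pet(rn: float, t_mean_c: float, pressure_kpa: float = 101.3,
                         g: float = 0.0, alpha: float = 1.26) -> float:
    """Daily PET (mm/day) from net radiation rn and soil heat flux g (MJ m-2 day-1).

    Only radiation, temperature and pressure enter; there is no wind-speed term,
    so the aerodynamic effect of stronger winds is not represented.
    """
    delta = sat_vapour_pressure_slope(t_mean_c)
    gamma = psychrometric_constant(pressure_kpa)
    lam = 2.45  # latent heat of vaporization (MJ/kg); dividing by it converts MJ m-2 to mm
    return alpha * delta / (delta + gamma) * (rn - g) / lam
```

Because PET is proportional to α, recalibrating the Priestley-Taylor coefficient for a climate zone simply rescales the whole output.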
Calibrating the optimal model within each category significantly improves the precision of PET simulations. The calibration results show that the aerodynamic models required the largest adjustments. The calibrated Rohwer model performs exceptionally well in the zone with higher wind speeds (TCZ), with R² exceeding 0.9; TCZ is the only zone in which this model achieves such high accuracy. Although both the Abtew and Priestley-Taylor radiation-based models simulate PET well, they differ in their applicable spatial ranges, so the choice between them may depend on the specific climate conditions. The Penman-Monteith (PM) model, the best of the combination-based models and the method recommended by FAO, performed best among the 30 models studied and is applicable to most regions of China.
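The text does not spell out the fitting procedure behind the calibrated coefficients in Table 2. The sketch below assumes the simplest case: a coefficient that multiplies the whole model output (such as a leading empirical constant or the Priestley-Taylor α), rescaled for each climate zone by ordinary least squares through the origin so that the model best matches PETm. The function names and the `(zone, model_pet, petm)` record layout are hypothetical; coefficients that enter nonlinearly, such as a wind-function term, would need a general optimizer instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def fit_scale_factor(model_pet: Sequence[float], benchmark: Sequence[float]) -> float:
    """Least-squares factor k minimizing sum((k * model - benchmark)^2)."""
    num = sum(m * b for m, b in zip(model_pet, benchmark))
    den = sum(m * m for m in model_pet)
    if den == 0:
        raise ValueError("model PET is all zeros; cannot rescale")
    return num / den


def calibrate_by_zone(records: Iterable[Tuple[str, float, float]],
                      original_coef: float) -> Dict[str, float]:
    """Return a calibrated coefficient per zone.

    records: (zone, model_pet, petm) tuples, e.g. one per month and zone.
    Assumes model PET is directly proportional to the coefficient.
    """
    by_zone = defaultdict(lambda: ([], []))  # zone -> (model values, PETm values)
    for zone, pet, petm in records:
        by_zone[zone][0].append(pet)
        by_zone[zone][1].append(petm)
    return {zone: original_coef * fit_scale_factor(pets, refs)
            for zone, (pets, refs) in by_zone.items()}
```

A large zone-specific change in the coefficient signals that the original model fitted that zone poorly, which is how the size of the adjustments is interpreted in the text.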

Conclusions
Meteorological data from 699 meteorological stations in the four climatic zones of China for 2001-2020 were spatially interpolated, and 30 PET models were then used to calculate PET across China. The calculated PET values varied significantly among the models, both for China as a whole and within individual climatic zones. The Doorenbos-Pruitt model yielded the highest PET value (1902.6 mm), and the Kuzmin model the lowest (349.6 mm). The PET fields produced by the different models also showed considerable spatial heterogeneity.
The combination-based model category showed the highest accuracy for monthly mean PET in the different climatic zones, followed by the radiation-, aerodynamic-, and temperature-based categories. Based on these results, the Romanenko and Rohwer models are recommended as the temperature- and aerodynamic-based models, respectively, whereas the remaining temperature- and aerodynamic-based models may not be suitable for PET calculation in the climatic zones of China. Among the radiation-based models, the Abtew and Priestley-Taylor models outperformed the others; the Abtew model was more suitable for arid and semi-arid regions, and the Priestley-Taylor model for humid regions. However, these radiation-based models should be used with caution in areas with low temperatures. With the exception of the Rijtema model, the combination-based models achieved satisfactory accuracy, and the Penman-Monteith model was the best option for PET calculation.
The empirical coefficients of the optimal model in each category were calibrated against the mean PET of these models (PETm). Accuracy improved for all models, but the magnitude of the improvement differed markedly among the climatic zones, suggesting that the applicability of the models varies considerably. Across the four climatic zones, the accuracy gains of the Romanenko, Rohwer, Abtew, Priestley-Taylor, and Penman-Monteith models were larger in MPZ and TCZ than in TMZ and SMZ. This does not mean that the calibrated models are more accurate in MPZ and TCZ than in TMZ and SMZ. Rather, the original models performed poorly in MPZ and TCZ, leaving more room for improvement, whereas they were already well suited to TMZ and SMZ, so the gains there were smaller. Therefore, regional calibration of PET models can improve the accuracy and applicability of PET calculation and provides a reference for studying hydrological processes in different climatic zones.

Fig. 2
Fig. 2 Area-averaged annual potential evapotranspiration (2001-2020) calculated by 30 models and measured Epan evaporation in different climatic zones of China

Fig. 3
Fig. 3 Mann-Kendall (MK) trend results of average annual PET (2001-2020) calculated by different models. The blue and red colors indicate significant upward and downward trends in PET, respectively
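The trend classification in Fig. 3 relies on the Mann-Kendall (MK) test. A minimal sketch of the classic test follows, assuming a two-sided test at the 5% level and omitting the tie correction to the variance of S; the text does not state which MK variant or significance level was used, and `mann_kendall` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def mann_kendall(series: Sequence[float], alpha: float = 0.05) -> Tuple[int, float, float, str]:
    """Classic Mann-Kendall trend test (no tie correction).

    Returns (S, Z, two-sided p-value, trend label).
    """
    n = len(series)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S counts concordant minus discordant pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected standardized statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value under the normal approximation
    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no trend"
    return s, z, p, trend
```

For a 20-year annual PET series (2001-2020), the function would be applied once per grid cell and model to produce a map like Fig. 3.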

Fig. 4
Fig. 4 Comparison of calculated and measured monthly PET (2001-2020) determined by 8 temperature-based models and Epan, respectively. Orange, green, blue, and purple line colors correspond to trend lines and PET scattered points in MPZ, SMZ, TCZ, and TMZ, respectively; the black line is the 1:1 line

Fig. 5
Fig. 5 Monthly variation of the mean ± standard deviation of R² (coefficient of determination) for 8 temperature-based models in the four climatic zones of China

Fig. 6
Fig. 6 Comparison of calculated and measured monthly PET (2001-2020) determined by 7 aerodynamic-based models and Epan, respectively

Fig. 7
Fig. 7 Monthly variation of the mean ± standard deviation of R² (coefficient of determination) for 7 aerodynamic-based models in the four climatic zones of China

Fig. 8
Fig. 8 Comparison of calculated and measured monthly PET (2001-2020) determined by 11 radiation-based models and Epan, respectively

Fig. 10

… and 70.72 mm, respectively. The calculation results of the Penman and …

Fig. 11
Fig. 11 Comparison of calculated and measured monthly PET (2001-2020) determined by the 4 combination-based models and Epan, respectively

Fig. 12
Fig. 12 Monthly variation of the mean ± standard deviation of R² (coefficient of determination) for the 4 combination-based models in the four climatic zones of China

Fig. 13
Fig. 13 Mean ± standard deviation of the coefficient of determination for the 18 models in the four climatic zones of China. MPZ mountain plateau zone, SMZ subtropical monsoon zone, TMZ temperate monsoon zone, TCZ temperate continental zone

Romanenko, Rohwer, Abtew, Priestley-Taylor, and Penman-Monteith models

Potential evapotranspiration is a theoretical value that is difficult to quantify accurately; comparing models from different categories therefore overcomes, to some extent, the limitations of any single category (Diks and Vrugt 2010; Chen and Liu 2020). Based on the results presented in the previous sections, the Romanenko temperature-based model, the Rohwer aerodynamic-based model, the Abtew and Priestley-Taylor radiation-based models, and the Penman-Monteith combination-based model were selected as the five optimal models for comparison.
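The selection step can be illustrated with a hypothetical sketch that ranks the models within each category by their mean R² (the statistic summarized in Fig. 13) and keeps the top one. This is only an assumption about how such a ranking could be automated: the study weighed several results and retained two radiation-based models (Abtew and Priestley-Taylor) rather than one. Any model names and scores passed in are placeholders, not values from the paper.

```python
from statistics import mean
from typing import Dict, List, Tuple


def best_per_category(scores: Dict[str, Tuple[str, List[float]]]) -> Dict[str, str]:
    """Pick the model with the highest mean R2 in each category.

    scores maps model name -> (category, list of R2 values across zones/months).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for model, (category, r2_values) in scores.items():
        m = mean(r2_values)
        if category not in best or m > best[category][1]:
            best[category] = (model, m)
    return {category: model for category, (model, _) in best.items()}
```

Keeping the top two models of a category instead of one only requires sorting each category's models by mean R² and slicing.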

Fig. 14
Fig. 14 Monthly variation of the calculated potential evapotranspiration by five PET models, PETm, and PET of Epan in the four climatic zones of China

Fig. 15
Fig. 15 Comparison of the monthly potential evapotranspiration (PET) calculated by the original (black circle) and calibrated (blue circle) potential evapotranspiration models against PETm in four climatic zones of China

Table 2
Evaluation criteria for the original and calibrated potential evapotranspiration models against the potential evapotranspiration reference in different climatic zones of China

… Baier-Robertson, Kharrufa, McCloud, Oudin, Romanenko, Schendel, and Thornthwaite models were 83.71, 42.99, 38.85, 69.04, 88.88, 95.05, 91.04, and 70.34 mm, respectively, while the monthly Epan value was 100.76 mm. It should be noted that, except for the Baier-Robertson model, all other PET models showed higher R² values against Epan in the SMZ than in the other climatic zones (Fig. …