Using hierarchical spatial models to assess the occurrence of an island endemism: the case of Salamandra corsica

Island species are vulnerable to rapid extinction, so it is important to develop accurate methods to determine their occurrence and habitat preferences. In this study, we assessed two methods for modeling the occurrence of the Corsican endemic Salamandra corsica, based on macro-ecological and fine habitat descriptors. We expected that models based on habitat descriptors would better estimate S. corsica occurrence, because its distribution could be influenced by micro-environmental gradients. The occurrence of S. corsica was modeled according to two ensembles of variables using random forests. Salamandra corsica was mainly found in forested habitats, with a complex vertical structure. These habitats are associated with more stable environmental conditions. The model based on fine habitat descriptors was better able to predict occurrence, and gave no false negatives. The model based on macro-ecological variables underestimated the occurrence of the species on its ecological boundary, which is important as such locations may facilitate interpopulation connectivity. Implementing fine spatial resolution models requires greater investment of resources, but this is advisable for study of microendemic species, where it is important to reduce type II error (false negatives).


Introduction
Macro-ecological models are popular for evaluating ecological and evolutionary hypotheses for vertebrates. This approach is commonly used because the free access to databases, including Worldclim and the Global Biodiversity Information Facility, enables the evaluation of hypotheses involving large taxonomic groups over broad geographical areas (Buckley and Jetz 2007;Araújo et al. 2008;Gregory et al. 2009). Studies involving the use of fine spatial scale descriptors are expensive in terms of resources required per species and site, which typically restricts their application to a limited number of species in small regions (Freemark and Merriam 1986;Williams et al. 2002). However, in some cases (e.g., forest ectothermic vertebrates) macroclimatic models cannot adequately describe a species niche, leading to erroneous extrapolations (Scheffers et al. 2014a). For this reason, studies evaluating the niche of ectothermic vertebrates associated with forests should include parameters that describe the habitats in finer detail (Petranka et al. 1993;Rödel and Ernst 2004;Escoriza et al. 2018).
Urodeles are particularly sensitive to environmental conditions because they have narrow ranges of thermal/ humidity tolerance (Hutchison 1961;Spotila 1972). Many species of terrestrial salamander are habitat specialists, confined to thermally buffered microhabitats including riparian forests, caves, or fissured substrates (Salvidio 1998;Johnston and Frid 2002;Godmann et al. 2016). This suggests that terrestrial salamanders are useful for testing the effectiveness of ecological models constructed using sets of variables covering different spatial scales. In addition, a large proportion of salamander species are threatened globally, by habitat loss and the expansion of pathogens (Rovito et al. 2009;Martel et al. 2013). For this reason to have accurate tools for describing their habitat preferences could aid their conservation, including defining optimal survival habitats so that such areas can be included in natural reserves (Esselman and Allan 2011). Terrestrial salamanders are poor terrestrial dispersers and frequently the territory of adults is close to breeding habitats (Bar-David et al. 2007). It can be assumed that the optimal habitats for the survival of populations are those that include suitable habitats for reproduction and for surface activity of adults.
In this study, we evaluated the performance of two modeling methods for predicting the occurrence of an island salamander, Salamandra corsica. This species is endemic to Corsica but distributed discontinuously, and the factors explaining the pattern of occurrence are not fully understood (Lanza et al. 2007;Bosc and Destandau 2012). The fact that this species is confined to a geographically small region provides the opportunity to characterize most of its distributional range at fine spatial resolution, in contrast to other species of the Mediterranean salamander. In addition, it is a priority to understand the ecological requirements of S. corsica, because island species are particularly prone to rapid extinction because of the accidental translocation of pathogens or alien species (Sax and Gaines 2008).
Corsica has a very heterogeneous landscape, with major topographic gradients in climate and plant communities (Jeanmonod and Gamisans 2007). Characterizing these communities can be useful in describing the local landscape complexity, because plants respond at fine scale and in competitive ways to subtle changes in temperature, moisture, substrate chemistry, and anthropogenic disturbance (Beatty 1984;Wilson and Keddy 1986). Moreover, plants constitute shelter and food for animals, so their importance is not limited to reflecting the abiotic background (Garden et al. 2007). For these reasons, the gradients of vegetation have a marked influence on the organization of zoological assemblages (Bersier and Meyer 1994).
In this study, using macro-and micro-ecological data, we investigated the capacity of two statistical models to determine and predict the occurrence of the endemic S. corsica found in Corsica Island, western Mediterranean. The macro-ecological variables described the landscape and climate at coarse spatial resolution, and were obtained from freely available databases (Chander et al. 2009;Fick and Hijmans 2017). The micro-ecological variables described the fine composition and structure of the terrestrial habitats involved, particularly focusing on those parameters that determine, or are associated with, thermal/humidity buffers (Johnston and Frid 2002); these were characterized in situ in each locality where the target species occurred. We predicted that those models that included habitat descriptors would better estimate S. corsica occurrence, because the distribution of the species could be influenced by micro-gradients that are not described by large-scale spatial models (Scheffers et al. 2014b). We expected this pattern because this salamander appears discontinuously in the meso-Mediterranean climate belt, suggesting the use of habitats associated with very specific topographic and vegetation profiles (Lanza et al. 2007).

Study area and surveys
The study region was the island of Corsica (8680 km 2 ), which is located in the western Mediterranean. Two thirds of the surface of the island is mountainous, with a maximum elevation of 2706 m at Monte Cinto (van Geldern et al. 2014). The climate of the island is Csa type (Köppen system) on the peri-coastal plains, but the Csb type predominates inland, from elevations of 400 m (Peel et al. 2007). Most of the island receives moderate to high rainfall, from 774 mm year −1 at Vizzavona (974 m) to 547 mm year −1 at Bonifacio (63 m). The natural vegetation is dominated by meso-Mediterranean forests (Quercus ilex/Q. suber) and evergreen shrubs ('maquis'; 0-700 m), supra-Mediterranean forests (Pinus nigra/Quercus pubescens; 700-1300 m) and oro-Mediterranean and temperate forests (Fagus sylvatica/P. nigra; 1000-1800 m; IFN (Inventaire Forestier National) 2006; Gamisans 2010). Above 1600-1800 m, sub-alpine fir forests (Abies alba) and shrubs predominate (Gamisans 2010).
Salamandra corsica is one of two species of urodeles that occur on Corsica (Delaugerre and Cheylan 1992). This salamander is endemic to the island, where it occurs from 50 to 1890 m, although it is rare below 400-500 m, where it is mostly confined to humid forests (Lanza et al. 2007;Lescure and de Massary 2012). According to the IUCN, this species is not threatened but its ecological requirements are poorly known, and it could be adversely affected by habitat disturbance and emerging diseases (Miaud et al. 2009;Bosc and Destandau 2012). We surveyed S. corsica over a year from September 2014 to September 2015, and for six shorter periods of 10 days in October, November, February, March, and May in the years 2006 and 2018. Surveying in different months and years reduced the uncertainty associated with interannual variability in the use of aquatic habitats by amphibians under unstable hydrological regimes (Gómez-Rodríguez et al. 2010). Area-constrained surveys (30 × 30 m) were conducted by assessing ponds, springs, and streams for larvae (Duguet and Melki 2003), and assessing terrestrial habitats on foot for adults during the day and at night in rainy weather (Hernandez et al. 2018). We lifted rock and logs under which larvae and adult salamanders usually seek refuge.
The surveyed sites were classified according to the presence or absence of S. corsica. Assessing for the occurrence of a terrestrial salamander at a given site can be difficult because they are elusive or confined to microhabitat patches (Warburg 1986;Surasinghe and Baldwin 2015). Sites for which we determined that the species was absent were defined as those outside the convex hull generated by the known species distribution range (Delaugerre and Cheylan 1992;Bosc and Destandau 2012), and where S. corsica was not detected in at least two surveys conducted in different years and seasons (Fig. 1). The combination of several survey methods (i.e., terrestrial and aquatic habitats) under different environmental conditions at different periods of the year reduces the probability of including false absences in the survey data (Williams and Berkson 2004).

Data collection
Both presence and absence sites were characterized using environmental variables that described climate and landscape features important in the natural history of terrestrial salamanders (Kozak and Wiens 2010;Escoriza and Ben Hassine 2017). The climate was characterized by the maximum temperature in July and the minimum temperature in January (°C), water vapor pressure in July (kPa), and mean annual precipitation (mm year −1 ) (Hijmans et al. 2005;Fick and Hijmans 2017). Land cover was characterized using a vegetation index (enhanced vegetation index, EVI; Huete et al. 2002). The EVI data were obtained from Landsat 8 (32-day composite) for the month of July (Chander et al. 2009). This vegetation index is less sensitive to random environmental noise than the normalized difference vegetation index (NDVI), and describes variations in the biomass and in the architecture of the forest canopy (Pettorelli et al. 2005). The topography descriptors included altitude and the terrain ruggedness index, an estimator of topographic heterogeneity (Riley et al. 1999). The terrain ruggedness index was obtained from a digital elevation model having a resolution of 250 m (Jarvis et al. 2008). Elevation was obtained in situ using a global positioning system at an accuracy of ± 15 m (Garmin Etrex 10; Garmin Ltd., Olathe, KS, USA). GIS environmental variables data were extracted using QGIS v. 3.4.1 (QGIS Development Team 2018).
We also characterized the composition and structure of plant communities at the study sites (James and Wamer 1982). The richness of plant species in Corsica is too high (2858 species; Jeanmonod and Gamisans 2007) Fig. 1 Map of the study region. The shaded polygon shows the distribution of Salamandra corsica according to the IUCN (International Union for Conservation of Nature and Natural Resources) (2018). Blue triangles, presence sites; orange circles, absence/undetected presence sites to be useful in a statistical model. For this reason, we summarized plant diversity by grouping species into functional taxonomic types, widely used in standard botanical classifications (Thorogood 2018). These groups were (i) height type: trees (species having a maximum height > 5 m), shrubs (1-5 m), sub-shrubs (< 1 m), lianas (climbing plants), and perennial grasses (plants lacking woody stems and having a lifespan > 2 years); (ii) leaf type: aciculifolious (conifers), deciduous, and perennial; and (iii) taxonomic groups: Spermatophyta (including all previous subgroups), Lycopodiophyta (lycophytes), and Pteridophyta (equisetums and ferns). In addition, we included a group comprising naturalized plants (alien plants of spontaneous growth, following Gamisans 2010Gamisans , 2014 as bio-indicators of anthropic disturbance (Larson et al. 2001). Data regarding plant composition were collected by sampling along three transects, each of 30 m in length radiating in north, southeast, and southwest directions (respectively) from a central point of occurrence of S. corsica (Guénnette and Villard 2005). The relative importance of these groups to the vertical structure of the plant community was estimated by multiplying the number of stems of the species by their average heights. Tree density combined size estimations are commonly used in forest inventories to assess the dominance of tree species (McIntyre et al. 2015). The height was estimated by measuring average-sized specimens, using images calibrated in ImageJ 1.52 (Rasband 2018). The dataset supporting the conclusions of this article is included within the article and its Additional file 1.

Data analyses
Correlations involving all variables were tested. Those that showed a correlation value > 0.75 were removed from subsequent analyses (Griffith et al. 2003). The remaining variables were normalized. The relative position of the presence and absence sites within the environmental gradient was visualized using principal component analysis (PCA). We assessed whether the two groups of sites differed significantly in their environmental conditions, applying a PERMANOVA test with the matrix based on Euclidean distances (one fixed factor = presence/absence) (Anderson 2001). These analyses were carried out using the package PRIMER-E (Primer-E Ltd., Plymouth) and the software 'pca3d' (Weiner 2017) The significant associations and the sense (positive or negative) of the associations between the presence of S. corsica and the environmental descriptors were estimated separately using univariate binomial generalized linear models (GLMs). A univariate method was used to test the sense and the statistical significance (alpha level = 0.05) of the associations between the dependent variable and predictors, not biased by variable correlations (Dormann et al. 2013). The relative importance of the pooled variables was evaluated using automated model selection and multi-model inference procedures for GLMs (Calcagno and de Mazancourt 2010). The best candidate models were obtained using a holistic selection, and the models were ranked using the small-sample corrected Akaike information criterion (AICc; Burnham and Anderson 2002). We assessed the statistical importance of each variable according to its model-averaged weighting in the 100 best models. These analyses were conducted using the package 'glmulti' (Calcagno 2015) in R.
The efficiency of the model in determining the presence of S. corsica at the study sites was assessed using random forests (Cutler et al. 2007). This classification method has been shown to be superior to other classification algorithms, and is less prone to overfit when using high dimensional datasets (Hastie et al. 2009;Broennimann et al. 2012). We built two models, one including only macro-ecological parameters (hereafter ŷ1 model) and having a spatial resolution of 250-1000 m (except for elevation, which had a spatial resolution of 15 m). The other model was built by combining macro-ecological and microhabitat parameters (hereafter ŷ2 model).
We set the random forest parameters for classification, and used 10,000 training trees (Liaw and Wiener 2002). The optimum number of variables randomly sampled at each split was estimated by the decrease in the out-of-bag (OOB) error, using the 'tuneRF' function implemented in the 'randomForest' package (Liaw and Wiener 2002). We assessed model performance based on several estimators that evaluated the efficiency, rate, and type of errors in the discriminatory capacity of binary models, including the OOB error, the area under the receiver-operating characteristic curve (AUC), and the rates of false positives (FP) and false negatives (FN) (Lalkhen and McCluskey 2008;Mwitondi 2013). The OOB is the misclassification rate of the random forest model estimated for the training data, with higher values indicating models having lower classification accuracy (Mwitondi 2013). The AUC facilitated evaluation of the predictive efficiency of the model; its values ranged from 0.5 for models having predictive capacity similar to chance, to 1.0 for models having perfect predictive ability (Araújo et al. 2005). We evaluated variable importance of each variable based on the mean decrease accuracy (MDA; loss of model accuracy if the variable is excluded from the model; Guo et al. 2016). These analyses were carried out with the package 'ran-domForest' (Liaw and Wiener 2002) and 'ROCR' (Sing et al. 2015) in R.

Results
Salamandra corsica was found at 35 sites, encompassing most of the island of Corsica (Fig. 1). The species  typically occurs in elevated habitats having high rainfall and moderate temperatures (Table 1). In the sites where the species was present, the vegetation had a complex vertical structure: high canopies of broadleaf/aciculifolious trees (approximately 10-30 m height), with a mid-understory of deciduous bushes (approximately 2-5 m height) and a low understory of lianas, ferns, and perennial grasses (approximately 1 m height) (Table 1 and Additional file 1). Correlation matrices indicated that the water vapor pressure in July and the minimum temperature in January were highly correlated with the maximum temperature in July (r > 0.75). Thus, the variables temperature and water vapor pressure were removed from subsequent analyses. The PCA showed that the presence and absence sites were distributed separately along a multidimensional environmental gradient (explained variance = 28%; Fig. 2). The first axis of the PCA (explained variance = 11.02%) was negatively correlated with the elevation and was positively correlated with temperatures (Fig. 2). The second axis of the PCA (explained variance = 9.24%) was positively correlated with aciculifolious trees and was negatively correlated with perennial liana (Fig. 2). The third axis of the PCA (explained variance = 7.89%) was positively correlated with perennial grass and was negatively correlated with perennial broadleaved trees (Fig. 2). The PERMA-NOVA test confirmed that both groups of sites differed statistically (Pseudo-F = 13.147, P = 0.0001).
The GLM analyses showed that the topography (elevation: positive association with S. corsica occurrence), vegetation structure (proportion of deciduous broad-leaved trees, ferns, deciduous lianas: positive association with S. corsica occurrence), and temperature (negative association with S. corsica occurrence) were the major variables (present in at least the 22% of the best models) differentiating presence and absence sites ( Table 2). All these associations were statistically significant (Table 2).
Both models generated using a random forest classifier correctly discriminated a large proportion of the study sites (Table 3). The ŷ1 model showed AUC = 0.984 and OOB = 6.06% and the ŷ2 model showed AUC = 0.999 and OOB = 1.52%. The ŷ2 model was superior to the ŷ1  (Table 4).

Discussion
Our analyses showed that the model including micro-ecological and macro-ecological descriptors (the ŷ2 model) provided the best prediction of the occurrence of S. corsica. Although the difference in the performance of the models was relatively small, the ŷ2 model better predicted the occurrence of the species in environments at the edge of its distribution. This included low elevation meso-Mediterranean shrubland or 'maquis, ' where the species is more localized in smaller populations. These populations could be important to the long-term survival of the species, acting as genetic corridors interconnecting mountain populations, as has been observed for other urodeles (Giordano et al. 2007). Furthermore, some peripheral amphibian populations show greater genetic diversity than those located in core areas, possibly because of the suboptimal environmental conditions (Munwes et al. 2010). However, no studies have evaluated whether this is also the case for S. corsica. Consistent with the findings of Delaugerre and Cheylan (1992) and Lanza et al. (2007), our data showed that S. corsica occupies mainly temperate and supra-Mediterranean deciduous forests composed of beech (F. sylvatica) and chestnut (Castanea sativa), and meso-/oro-Mediterranean forests of black pine (P. nigra) and holm oak (Q. ilex), from 240 to 1722 masl. However, this salamander also occurred in meso-Mediterranean sclerophyllous shrubland, from 139 to 215 masl. The habitat of the sites where S. corsica was present differed structurally from those where it was absent. The two methods used to evaluate these differences showed that this could mainly be explained by the proportions of deciduous trees, ferns, and deciduous lianas. The deciduous broad-leaved trees included species with a height range of 15-30 m (mainly Alnus glutinosa, C. sativa, F. sylvatica, Fraxinus ornus, and Q. pubescens), which commonly formed a high and dense canopy poorly permeable to sunlight (Gálhidy et al. 2006). The group of deciduous lianas comprised species of the genera Lonicera, Rosa, and Rubus. All the species involved, together with ferns, are confined to habitats having high levels of edaphic moisture, such as Mediterranean riverine strips and/or humid montane forests (Gamisans 2010(Gamisans , 2014. Our data also showed that sites where S. corsica was present were also topographically heterogeneous. This heterogeneity, together with a dense forest canopy, is typically associated with more stable air temperatures and humidity (Suggitt et al. 2011;von Arx et al. 2012). This could explain why macroclimate models failed to predict the occurrence of S. corsica in the edge habitats, where it is mainly (or exclusively) located in or around closed forest ravines (Lanza et al. 2007).
Another aspect evaluated in this study was anthropogenic effects on the sites where S. corsica was present, determined by the presence of aliens in the plant community structure. The results indicated that S. corsica can occupy highly altered habitats in which alien species comprise up to 75.4% of the vegetation community. However, this was associated with the massive forests (about 24,600 ha; IFN (Inventaire Forestier National) 2006) of probably non-native chestnut on the island (Roces-Díaz et al. 2018). The chestnut forests typically comprise mature trees mixed with native trees and bushes, providing a vertical structure that mimics the natural deciduous forests (IFN (Inventaire Forestier National) 2006; Gamisans 2010), so the microclimatic conditions may also be similar. On the other hand, S. corsica was absent from other types of highly modified habitat, including intensive agriculture and suburban habitats, where the vegetation structure does not resemble natural forest formations. Our findings, together with the data provided by the literature (Delaugerre and Cheylan 1992;Lanza et al. 2007), suggested that this species could avoid open human-disturbed landscapes.

Conclusions
Our findings showed that for describing the niche of a forest salamander, models including fine habitat descriptors are preferable to those based solely on macro-ecological parameters. This finding has important implications, particularly for the conservation of micro-endemic and threatened species, where it is crucial to minimize type II errors (sites are not protected, but include populations of the target species; Loiselle et al. 2003). Our method allows determining these sites with high probability of presence of the target species and therefore where to concentrate the survey effort to detect overlooked refuges or populations (Hughes et al. 2008).
The methodology used in this study is relatively simple to apply in the field, although it requires substantial survey effort (depending on the detectability of the species) and some botanical knowledge. The use of taxonomic and basic functional groups to describe plant associations enables this protocol to be used in any ecoregion worldwide, so its effectiveness could be tested in niche models generated for other species of forest-dwelling amphibian.

Additional file
Additional file 1: Data describing the characteristics of plant species found at the study sites. (XLSX 14 kb)