Refining spatial neighbourhoods to capture terrain effects

Nelson, Trisalyn A; Robertson, Colin

doi:10.1186/2192-1709-1-3

Research
Open access
Published: 10 February 2012

Refining spatial neighbourhoods to capture terrain effects

Trisalyn A Nelson¹ &
Colin Robertson²

Ecological Processes volume 1, Article number: 3 (2012) Cite this article

6082 Accesses
7 Citations
Metrics details

Abstract

Introduction

Spatially explicit ecological research has increased substantially in the past 20 years. Most spatial approaches require the definition of a spatial neighbourhood or the region over which spatial relationships are modelled or assessed. Spatial neighbourhood definitions impact analysis results, and there are benefits in considering neighbourhood definitions that better capture ecological processes. The goal of this research is to present a simple and flexible approach in constraining ecological spatial neighbourhoods using terrain data.

Methods

Using watershed boundaries, we can restrict spatial neighbourhoods from combining populations or processes that should be separated by terrain effects. We demonstrate the need for ecological constraints by way of a simulation study and highlight our approach with a case study examining mountain pine beetle (Dendroctonus ponderosae, Coleoptera; Hopkins) infestation hot spots.

Results

Our results demonstrate how failure to constrain neighbourhoods can lead to errors when the spatial signals from unrelated populations are mixed. Also, unconstrained spatial neighbourhoods can unintentionally detect spatial relationships across many scales.

Conclusions

There will be benefits to studies that develop new, ecology-based approaches in defining spatial neighbourhoods that better illuminate ecological function of phenomena under study.

Introduction

Spatial ecology has developed over the past two decades as spatial research questions have intersected with advancing, geographically explicit technology, such as global positioning systems [GPS], geographic information systems [GIS] and remote sensing. Spatial perspectives to ecological research have proven beneficial (e.g. Liebhold and Gurevitch 2002), but they have also presented researchers with new methodological challenges (e.g. Legendre and Fortin 1989; Legendre et al. 2002). In response, many good references have been written to guide ecologists as they collect and analyse spatial data to explore or generate geographically explicit ecological hypotheses (e.g. Fortin and Dale 2005; Dormann et al. 2007; Nelson and Boots 2008). Most spatial methods require an a priori definition of a spatial neighbourhood, also referred to as a spatial weight matrix or kernel (Bavaud 1998; Getis and Aldstadt 2002), and results of analyses are dependent on the spatial neighbourhood definition (e.g. Moilanen and Nieminen 2002). While ecologists are aware of the importance of spatial neighbourhood selections (see Fortin and Dale 2005, pages 113 to 118, for discussion of standard approaches), few definitions capture the ecological function.

Spatial neighbourhoods are commonly categorised as binary or weighted (O'Sullivan and Unwin 2003, pages 44 to 45). If binary, locations are either part of the same neighbourhood or not. Binary neighbourhoods, which are computationally simple and require the least a priori knowledge, are the most common. Weighted neighbourhoods allow the amount of interaction between two locations within the same neighbourhood to vary based on proximity or some other measure. Weighting is most often based on linear distance decay (O'Sullivan and Unwin 2003, pages 44 to 45). Spatial neighbourhoods can also be fixed or adaptive (e.g. Aldstadt and Getis 2006). Fixed neighbourhoods have the same definition regardless of location in the study area, whereas adaptive neighbourhoods enable the definition of neighbourhood to vary over the study area. It may be appropriate to use adaptive neighbourhood definitions when a study area is known to have fine scale processes in one region and coarse scale processes in another. Due to ease of use and interpretability, most spatial studies use fixed neighbourhoods.

Within the aforementioned categories of spatial neighbourhoods, there are many possible definitions. A complete enumeration of spatial neighbourhoods is beyond the scope of this paper. Here, we focus on neighbourhood definitions that are most commonly used with point and areal data, and we refer the reader to O'Sullivan and Unwin (2003, pages 41 to 52) for more details. The majority of studies use definitions based on distance, k neighbours or contiguity (Griffith 1996). Distance definitions can be employed by buffering each location in the study with a radius of a specific distance (Figure 1a). A binary distance neighbourhood would assign all locations within the buffer to the same neighbourhood. Distance weighting could also be employed, and distances used to buffer locations may be variable or fixed in terms of size (Figure 1.1 in O'Sullivan and Unwin 2003). When distance neighbourhoods are of fixed size and when densities vary over space, the number of locations in each neighbourhood will change.

k-Neighbour definitions are used to create neighbourhoods with a constant number of locations in each neighbourhood (Griffith 1996; Figure 1b). k-Neighbour statistics are used to identify the nearest data locations or quantify the distance to a specific number of nearest neighbours (see Bailey and Gatrell 1995 for further discussion and equations). For instance, k could equal 10, and all locations would be assigned neighbourhoods that include the 10 nearest neighbouring locations. If data locations are close together in one part of the study area and far apart in another region, the area associated with each neighbourhood will change. Some statistical, spatial measures are more stable when k is constant or larger than a threshold; for instance, the distribution of the Getis statistics is known when k ≥ 8, and statistical considerations may make k neighbours a useful neighbourhood definition (Ord and Getis 1995).

Contiguity- or adjacency-based neighbourhoods are based on topology (O'Sullivan and Unwin 2003, pages 43 to 44) and are best described in an application to spatial data represented as uninterrupted polygons (i.e. satellite, census or forest inventory data; Figure 1c). Contiguity is defined based on polygon adjacency or shared boundaries (Dubin 2009). First-order contiguity assigns to the same neighbourhood all locations that share a boundary with a single polygon. Higher-order contiguity can be defined by including polygons that share boundaries with any first-order polygon. If data are represented as points, contiguity can be used to define neighbourhoods if points are first tessellated using triangulation or Voronoi polygons (O'Sullivan and Unwin 2003, pages 50 to 52). Voronoi polygons tessellate space so that all locations within a polygon are closer to the point where the polygon is generated from than all other points (for further details and equations, we refer the reader to Okabe et al. 2000). Properties of neighbourhood structures defined by contiguous Voronoi polygons can be a link between the point pattern and the generating spatial process. For example, complete spatial randomness processes will typically produce neighbourhood structures with six first-order neighbours (Okabe et al. 2000). For regular lattices, contiguity definitions can include all neighbours (queen) or can be directional, using only neighbours in the north, south, east and west directions (rook) or only in corner locations (bishop) (Dubin 2009). Contiguity-based weighted neighbourhoods can also be weighted, for instance, by length of shared boundaries.

Exploration of spatial neighbourhood definitions is often insufficiently explored when conducting analysis. There are many examples where the definition of spatial neighbourhood is not even stated in spatial literature. In part, it can be wise to avoid drawing attention to the selection of spatial neighbourhood, which is inherently subjective and easy fodder during the review process. However, and more importantly, the impact of spatial neighbourhood has not been fully explored in spatial literature, and relatively little attention has been given to appropriate selection of spatial neighbourhoods in ecology specifically (Fortin and Dale 2005, pages 113 to 118). The absence of spatial neighbourhood selection discussions is problematic in ecology given that most standard definitions were developed outside the discipline. Ecology has unique spatial issues to consider in selecting spatial neighbourhoods. For example, if the objective of analysis was to model the spatial diffusion of mountain pine beetle across a heterogeneous landscape, a fixed distance definition of spatial neighbourhoods may produce misleading results. Mountain pine beetle populations disperse along valleys, and mountains are typically a barrier to dispersion (Robertson et al. 2009a). As such, when modelling diffusion or detecting hot spots, spatial neighbourhoods should be restricted within valleys or along the sides of mountains and should not reach across mountain peaks or ridges. The underlying issue is that the conceptualisation of space in most spatial analysis methods is in reference to a two-dimensional plane when, in reality, topographic relief structures much of the ecological variation in plant and animal communities. Neighbourhoods defined by distance will ignore topography and may aggregate the spatial signals from multiple, unrelated sub-populations.

When study areas are small, it is reasonable to modify spatial neighbourhoods manually. However, as with many applied areas of spatial analysis, ecological data sets are increasingly large, and automated approaches are required for appropriate spatial neighbourhood delineation. The goal of this research is to present a semi-automated approach to the delineation of spatial neighbourhoods that captures impacts of terrain features on ecological function. To meet this goal, we use terrain data to delineate watersheds and demonstrate how watersheds can be used to constrain spatial neighbourhoods. Watersheds are useful because they are often divided by mountain peaks and can be used to partition terrain that separates species' populations. Our general approach is demonstrated through the application of a local measure of spatial autocorrelation, Moran's I_i (Anselin 1995), useful for detecting spatially explicit hot spots or locations where values are similar and extreme relative to the mean. We apply our topographic approach in constraining spatial neighbourhoods to simulated data and a case study identifying infestation hot spots in epidemic mountain pine beetle populations in British Columbia, Canada. The methods and results for the simulation and case study are presented separately. First, however, we will outline our terrain approach to spatial neighbourhood definition and outline the spatial statistic used in our demonstrations.

Approach to spatial neighbourhood delineation

Our approach in modifying spatial neighbourhoods is to constrain standard definitions of spatial neighbourhoods, like distance, by watershed boundaries. The topographic approach to spatial neighbourhood selection will be most appropriate where sub-populations are separated by topographic barriers, when dispersal is constrained by mountains, valleys or water bodies, or when samples from different topographic regions are known a priori to have unique characteristics. Also, for spatial research of hydrological applications, a watershed approach has been shown to be beneficial (Laffan 2002). Our method's simplicity is its strength, in which creating more ecologically meaningful weight matrices is a relatively straightforward extension to standard techniques, and constraints are easily implemented in standard GIS software packages. Using an elevation model, watershed boundaries can be delineated. It may be necessary to remove watershed boundaries in low-lying areas that do not have terrain high enough to constrain ecological function. Spatial neighbourhoods can then be defined by distance, k neighbours or contiguity, but the watersheds provide a barrier forcing neighbours to be selected from only within watersheds. More specific steps will be provided.

There are many good textbooks that outline how to delineate watersheds using elevation data, and it is not our aim to replicate these (see Chang 2006, Chapter 15). Briefly, there are four conceptual steps. First, the direction that water will flow out of each location (cell) in a digital elevation model [DEM] is calculated (flow direction). Second, flow accumulation is tabulated as the number of cells that water will flow through to accumulate in each cell. Third, a stream network is defined by applying a threshold to the flow accumulation layer. Selecting the accumulation threshold value is largely subjective. Larger thresholds will generate smaller watersheds, and threshold selection enables users to control the scale of watershed delineation. The accumulation threshold should be set to ensure that 'impassable' terrain is separated by the watershed boundaries. Also, it is important that each watershed is large enough to have a sufficient number of data points to ensure the spatial analysis or that statistic is robust. In our study, we set the accumulation threshold to ensure that large terrain features such as mountain ridges were spatially separated. For further discussion of the impact of watershed accumulation threshold selection, we refer to Band (1993). Fourth, using inputs generated on flow, watersheds are defined around each stream.

Once watersheds are defined, it is best to assess the characteristics of the terrain within each watershed. If the watershed has low variability or maximum elevation, variation or height of terrain may be insufficient to impact ecological function, and watershed boundaries should be dissolved. The remaining watershed boundaries may be used to constrain the definition of spatial neighbourhood in combination with any standard approach in assigning spatial neighbours.

Moran's I_i

To show the impact of spatial neighbourhood definition on spatial analysis and the benefit of the terrain approach to neighbourhood definition, we undertake hot spot detection using local Moran's I_i , a measure of spatial autocorrelation. Spatial autocorrelation is the first law of geography which states that all things are related and that near things more so than far (Tobler 1965). Positive spatial autocorrelation exists when nearby events are similar. Negative spatial autocorrelation exists when nearby events are dissimilar. Local Moran's I_i quantifies spatial autocorrelation for every location in the study region, differing from its global counterpart, which generates one summary value for an entire study area (Anselin 1995).

If the observed values x, i ∈ {1, ..., n} of a random variable x are recorded at a set of n data sites, then local measures of spatial autocorrelation take the general form of a cross-product statistic:

Γ_{i} = \sum_{j} w_{i j} y_{i j}

(1)

where w_ij is a measure of the spatial relationships of data sites i and j at a given time, and y_ij is a measure of their relationship in attribute space (Getis and Ord 1992; Boots 2002).

For local Moran's I_i , the attribute relationship in the cross-product statistic is defined as $y_{i j} = (x_{i} - \bar{x}) (x_{j} - \bar{x})$ and

I_{i} = (\frac{z_{i}}{(\frac{\sum_{i} z_{i}^{2}}{n})}) \sum_{j} w_{i j} z_{j},

(2)

where $z_{i j} = (x_{i} - \bar{x})$ . Positive values of Moran's I_i indicate positive spatial autocorrelation in values that are extreme relative to the mean. Negative Moran's I_i values indicate a negative spatial autocorrelation in values that are extreme relative to the mean. When Moran's I_i approaches 0, it could be that there is no spatial autocorrelation or that spatial autocorrelation is present in values near the mean.

For hot spot detection, a Moran's scatter plot is useful to categorise positive and negative spatial autocorrelations based on the attribute value of a location in relation to the attribute value of its neighbours. On a Moran's scatter plot, the x axis is the attribute value in deviation form and the y axis is a standardized average of the neighbour values also in deviation form (Anselin 1995). The upper right quadrant indicates high values surrounded by high values (high-high); the upper left quadrant indicates low values surrounded by high values (low-high). The lower right quadrant indicates high values surrounded by low values (high-low), and the lower left quadrant indicates low values surrounded by low values (low-low) (see Nelson and Boots 2008 for details and figure of scatter plot). Moran's scatter plot can be combined with statistical testing to determine the significance of spatial autocorrelation. Expected values of I_i can be derived under the hypothesis of no local spatial autocorrelation (Sokal et al. 1998); however, the distribution does not follow a known distribution (Boots and Tiefelsdorf 2000), and a randomization approach must be adopted for statistical testing. It should be mentioned that all local spatial statistics are complicated by the presence of global spatial autocorrelation and multiple and correlated tests (Boots 2002). As such, statistical significance is usually considered exploratory rather than confirmatory. In our study, hot spots are defined as high-high values that are statistically different from configurations of data expected by chance.

Methods

Simulation

A simulation study was conducted to examine the theoretical impacts of using spatial neighbourhoods constrained by terrain. When a broad phenomenon, such as terrain, influences the mean of a spatial process, landscapes exhibit first-order non-stationarity and violate the assumption of most spatial statistics (Kabos and Csillag 2002). The aim of the simulation is to examine the impact of accounting for terrain constraints when doing spatial analysis on point patterns that exhibit first-order non-stationarity.

To create a simulated landscape with first-order non-stationarity, a spatially autocorrelated process was generated onto a 30 × 30-km grid. Each grid cell value was drawn from a normal distribution with a mean of 0 and a spatially structured standard deviation. Standard deviations were estimated with the spatial covariance function (Ripley 1981) up to a maximum distance of 10 km (i.e. range parameter) with a standard error of 1 km. The spatially correlated surface was categorised into 5 quantiles (Figure 2a), representing distinct theoretical landscape types whereby ecological processes of interest within each landscape were functionally independent. A 3 × 3 modal-spatial filter was run over the surface to remove unrealistically small landscape polygons. Separate autocorrelation processes were simulated for grid cells in each of the landscape polygons, thereby creating first-order homogeneity within each polygon and non-stationarity across polygon boundaries (Figure 2b). Points (n = 1,000) were randomly allocated to the study area (Figure 2c), and assigned values were extracted from the grid cells. This process was replicated 99 times to create a total of 99 simulated point patterns with different realisations of the non-stationary process. These point patterns were used for subsequent hot spot analysis.

Hot spots were quantified for the simulated data using local Moran's I_i . Pseudo-significance was determined via randomisation using 999 permutations. The spatial neighbourhoods defined were binary, fixed and based on distance. Neighbourhoods were defined using constraints similar to topographic barriers (i.e. only using points within a landscape polygon to calculate local Moran's I_i ) and also a standard unconstrained definition (using all points within the spatial neighbourhood). For both constrained and unconstrained neighbourhoods, several distances were used to explore the impact of the neighbourhood scale on analysis results. Distances of 2, 5, 10, 15, and 20 km were examined, and the number of hot spots was tabulated for each of the 99 point patterns.

Case study

Beginning in the mid 1990s, British Columbia became home to the largest recorded infestation of mountain pine beetle (Robertson et al. 2009a). The infestation continues to have substantial impact on pine (Pinus spp.) forests. Due to infestation severity, many spatial studies of mountain pine beetle have been conducted, and standard spatial neighbourhood definitions are typically employed. Given that mountain pine beetle tends to spread via valleys and that mountains and large water bodies can be barriers to dispersal (Robertson et al. 2009b), standard neighbourhood definitions are not always appropriate. Ideally, neighbourhood relationships are defined based on knowledge of the ecosystem functioning in the study area. However, landscape scale studies require large data sets, making manual definition of spatial neighbourhoods problematic. We hypothesize that watershed boundaries are ideal for restricting spatial neighbourhoods to represent the ecological characteristics of mountain pine beetle spatial processes as terrain impediments to dispersal can be captured.

Study area and data

The case study area is the southern portion of the Vanderhoof Forest District located in central British Columbia. This area of Vanderhoof was chosen due to its topographic variability, ranging from being flat in the north to being mountainous in the south. The Vanderhoof Forest District has experienced epidemic levels of mountain pine beetle over the last decade (Nelson et al. 2006a).

Vanderhoof Forest District monitors the mountain pine beetle infestation using point-based, GPS-based aerial surveys (Nelson et al. 2006a). Aerial surveys of mountain pine beetle infestations use indicators of pine mortality, mainly changes in crown foliage colour, to monitor mountain pine beetle activity. During aerial surveys, clusters of visually infested trees are identified, typically those with yellow and red crowns, indicating mortality 1 to 2 years prior to infestation, and a GPS is used to map cluster centres with a point. For each cluster, the number of infested trees is estimated and the infesting insect species, recorded. Attributes have been shown to be accurate to ± 10 trees for 92.6% of points (Nelson et al. 2006b). In the current study, there are a total of 14,117 GPS points representing 170,639 infested trees.

A DEM was used to derive watershed boundaries. The elevation model had 25-m² grid cells and was created from the 1:20,000 scale of Terrain Research Information Management data (Province of British Columbia 1996). The data were interpolated using a linear interpolation process, and the DEM is reported to be accurate within 10 m (Province of British Columbia 1996).