
Spatial Data Analysis Techniques

Chapter 2 of Applied Spatial Statistics focuses on spatial data analysis, covering point pattern data, lattice data, spatial autocorrelation measures, and spatial auto-regressive models. It emphasizes the importance of spatial relationships and the construction of spatial weights matrices for analyzing geographic influences. The chapter also discusses practical applications and examples, such as tree clustering and environmental impacts on health data.
Copyright © All Rights Reserved

Applied Spatial Statistics (Stat 4121)

Chapter 2-Spatial Data Analysis

Bedilu A. Ejigu
Department of Statistics
Addis Ababa University, Ethiopia

Stat 4121 Bedilu A. Ejigu Chapter 2 1 / 34


Outline

Basics on point pattern data (clustering, spatial point processes)


Lattice data and analysis (spatial weight construction)
Spatial autocorrelation measures
Areal data visualization
Spatial auto-regressive models
Practical lab examples



Spatial point pattern data

A spatial point process generates realizations that are finite or countably infinite sets of
points in the plane.
Goal: describe, model, and infer structure in the locations (and optionally marks) of
events/objects.
Observed locations $\{x_i\}_{i=1}^{n} \subset W \subset \mathbb{R}^2$ of events/objects.

What to know
Characterizations of point-process distributions (e.g., via product densities / generating
functionals).
Campbell theorem & moment measures (mean/variance; higher-order moments).
Palm theory: interior/exterior conditioning (distribution seen from a typical point).



Motivating example (forest stand)

Data: 270 trees in a 100 × 100 m plot.


Phenomena to capture:
Inhibition among large trees (competition for space/nutrients).
Clustering of seedlings near adults (recruitment).
Environmental heterogeneity (e.g., soil fertility) driving apparent clusters.
Visual evidence of clustering (short-range attraction) and inhibition (preferred spacing).
Large (older) trees exhibit a preferred inter-tree distance (e.g., ≈ 4 m) ⇒ a potential design/management signal.

Marked point patterns (adding attributes)


Each point carries a mark (e.g., diameter, species, health status): $(x_i, m_i)$.
Subpatterns (small vs large trees) reveal structure: small trees often fill gaps around large
trees.
Practical question: are marks independent of locations, or is there mark–location interaction?
Examples: Simulated point process data

[Figure: simulated point process data; panel: Intensity]
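The simulated patterns and the intensity panel are not recoverable here. As a minimal sketch (Python, standard library only), the code below simulates the simplest reference model, a homogeneous Poisson process (complete spatial randomness), in a window the size of the forest plot. The intensity λ = 270/10,000 trees per m² and the function name are assumptions chosen for illustration; they are not the models behind the slides' figures.

```python
import random


def simulate_homogeneous_poisson(intensity, width, height, seed=None):
    """Simulate a homogeneous Poisson point process on [0, width] x [0, height].

    intensity: expected number of points per unit area (lambda).
    """
    rng = random.Random(seed)
    mean_count = intensity * width * height
    # Draw N ~ Poisson(mean_count) by counting the arrivals of a unit-rate
    # Poisson process (cumulative Exp(1) waiting times) up to time mean_count.
    n = 0
    t = rng.expovariate(1.0)
    while t <= mean_count:
        n += 1
        t += rng.expovariate(1.0)
    # Conditional on N = n, the points are i.i.d. uniform over the window.
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


# Assumed intensity matching the forest example: 270 trees per 100 x 100 m plot.
lam = 270 / (100 * 100)
points = simulate_homogeneous_poisson(lam, 100, 100, seed=1)
lam_hat = len(points) / (100 * 100)  # intensity estimate: count / area
print(len(points), round(lam_hat, 4))
```

Clustered or inhibited patterns, such as those in the forest example, are judged as departures from this baseline.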


Outline

1 Areal/Lattice Data Analysis



Spatial influence

Goal: quantify how geographic objects affect each other ("spatial influence").
Encoded via adjacency or (inverse) distance; commonly stored in a spatial weights matrix W.
True influence is complex and unobservable ⇒ we use practical proxies.
Examples for polygons (e.g., countries):
Binary adjacency (share a border = 1, else 0)
Centroid distance ("as-the-crow-flies")
Border length (longer shared boundary ⇒ stronger influence)

SEDA
How do we explore spatial relationships in the variable of interest?
Spatial weight construction
Global and local spatial autocorrelation versus the variogram
How to handle spatial data in R
Adjacency

Objects are adjacent if they “touch” (e.g., neighboring countries) or if they are within a
distance threshold (common for point data).
Build an adjacency matrix A from a distance matrix D:
Threshold rule: $A_{ij} = \mathbb{1}\{D_{ij} \le 50\}$ (units as in D).
Diagonal convention: set $A_{ii}$ = NA (or 0); no self-adjacency.
Convert TRUE/FALSE to 1/0 as needed (e.g., multiply by 1).
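A minimal Python sketch of the threshold rule, assuming planar (x, y) coordinates and Euclidean distance; the coordinates and function names are hypothetical. Python has no integer NA, so the diagonal is set to 0 (the slide's alternative convention), and the comparison yields 1/0 directly rather than TRUE/FALSE.

```python
import math


def distance_matrix(coords):
    """Euclidean distance matrix D for a list of (x, y) coordinates."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def threshold_adjacency(D, threshold):
    """A_ij = 1 if D_ij <= threshold and i != j; the diagonal is 0 (no self-adjacency)."""
    n = len(D)
    return [[1 if i != j and D[i][j] <= threshold else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical coordinates, in the same units as the threshold of 50.
coords = [(0, 0), (30, 30), (100, 0), (130, 30)]
A = threshold_adjacency(distance_matrix(coords), 50)
for row in A:
    print(row)
```

Points 0 and 1 (about 42 units apart) and points 2 and 3 are linked; every other pair is farther than 50 units apart.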


Two nearest neighbours (k-NN adjacency)
Build a k-nearest neighbours adjacency (e.g., k = 2, 3, 4):
For each row i, rank the columns j by increasing $D_{ij}$.
Mark the k smallest non-self distances as neighbours:
$$A_{ij}^{(k)} = \begin{cases} 1, & \text{if } j \text{ is among the } k \text{ nearest to } i,\\ 0, & \text{otherwise.} \end{cases}$$
Useful under varying point density: ensures each feature has k neighbours.
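The k-NN rule can be sketched in Python as follows. The points are hypothetical, distance is Euclidean, and ties are broken by point index (an assumption; software packages may treat ties differently). The resulting matrix is generally not symmetric: j can be among the k nearest to i without i being among the k nearest to j.

```python
import math


def knn_adjacency(coords, k):
    """A_ij^(k) = 1 if j is among the k nearest (non-self) points to i, else 0.

    Ties are broken by point index (an illustrative assumption).
    """
    n = len(coords)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other points j by increasing D_ij, excluding j == i.
        others = sorted((math.dist(coords[i], coords[j]), j)
                        for j in range(n) if j != i)
        for _, j in others[:k]:
            A[i][j] = 1
    return A


coords = [(0, 0), (1, 0), (5, 0), (6, 0), (20, 0)]  # hypothetical points
A2 = knn_adjacency(coords, k=2)
for row in A2:
    print(row)
```

Every row sums to k = 2, even for the isolated point at x = 20. This is the property that makes k-NN useful under varying point density.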
Weights matrix
Use continuous weights instead of binary adjacency:
Inverse distance: $w_{ij} = D_{ij}^{-\alpha}$ with $\alpha > 0$.
Kernel decay (e.g., Gaussian), or weights based on shared border length.
Row-normalize so each row sums to 1: $\tilde{w}_{ij} = w_{ij} / \sum_j w_{ij}$.
Handle Inf/NA carefully:
Self-distances $D_{ii} = 0 \Rightarrow 1/0 = \infty$; set diagonals to 0 or NA before computing W.
Replace remaining $\infty$ with NA, then normalize over valid entries only.
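A minimal Python sketch of row-normalized inverse-distance weights, assuming α = 1 and hypothetical coordinates. The diagonal is fixed at 0 before any division, so 1/0 never occurs. Leaving coincident distinct points at weight 0 (instead of NA) is an illustrative assumption.

```python
import math


def inverse_distance_weights(coords, alpha=1.0, row_normalize=True):
    """w_ij = D_ij ** (-alpha) for i != j and w_ii = 0, optionally row-normalized."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # D_ii = 0 would give 1/0 = inf, so the diagonal stays 0
            d = math.dist(coords[i], coords[j])
            if d > 0:  # coincident points would also give inf; left at 0 (assumption)
                W[i][j] = d ** (-alpha)
        if row_normalize:
            s = sum(W[i])  # total over valid (finite) entries only
            if s > 0:
                W[i] = [w / s for w in W[i]]
    return W


coords = [(0, 0), (1, 0), (3, 0)]  # hypothetical points on a line
W = inverse_distance_weights(coords, alpha=1.0)
for row in W:
    print([round(w, 3) for w in row])
```

For point 0, the raw weights 1/1 and 1/3 become 0.75 and 0.25 after row normalization. Nearer points always receive the larger share.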
Basic activities in SEDA

Type of spatial data and research question
Areal/lattice, geostatistics, point pattern
Assessment of spatial dependency
Autocorrelation: occurs in either space (spatial autocorrelation) or time (temporal autocorrelation)
Spatio-temporal autocorrelation
Stationarity, non-stationarity
Stationarity: the statistical properties of the process (e.g., mean and covariance) do not change over space or time.
Non-stationarity: the process changes over space or time.
Global and local statistics
Global: a single statistic adequately summarizes the data.
Local: useful when the process under study varies over space, i.e., different areas have different local values that may cluster together to form a local deviation from the overall mean.
Neighborhoods: who is next to whom
How the weights are constructed (distance, queen, rook, ...)
Spatial Autocorrelation Learning Objectives

Understand what spatial autocorrelation is and why it matters for spatial statistical
analysis.
Understand how a spatial weights matrix is constructed.
Understand the difference between global and local spatial autocorrelation methods.
Know the differences among Local Moran’s I, Local Geary’s C, and Getis–Ord $G_i$/$G_i^*$.
Perform local spatial autocorrelation and hotspot analysis on imported data and interpret
the results.



Spatial Autocorrelation

Spatial autocorrelation measures quantify the correlation of the spatial random field $Y(l)$ with itself at different locations.
Different statistics have been developed to test for the presence and magnitude of spatial association among areal units:
1 Moran’s I and Geary’s C (see Cressie 1993; Banerjee 2015 for discussion).
2 B-statistics (Ejigu 2020).
Summary of numeric scales for Moran’s I, Geary’s C and B-statistic indices:

Spatial pattern                 Geary’s C    Moran’s I   B (spatio-env)
Clustered, similar (+ρ)         0 < C < 1    I > E[I]    B > E[B]
Random, independent (ρ = 0)     C ≈ 1        I ≈ E[I]    B ≈ E[B]
Dissimilar, contrasting (−ρ)    C > 1        I < E[I]    B < E[B]
The main differences between Moran’s I and Geary’s C are:
Moran’s I is built from cross-products of deviations from the mean and is scaled by N (a population-type variance), whereas Geary’s C is built from squared pairwise differences and is scaled by N − 1 (a sample-type variance).
The numerical ranges of the two indices differ.
Spatial Autocorrelation

Question: Is the value of an outcome at a location influenced by its neighbours?


Examples: disease status, housing prices, population density.
Visual assessment alone may be insufficient:
How confident are we that a significant spatial pattern exists?
How can we quantify the spatial pattern?
Compare $x_i$ (at location i) with neighbours $x_j$ and with the overall dataset.
Different measures address distinct questions:
Moran’s I: are deviations from the mean similar across neighbours? $(x_i - \bar{x})$ vs $(x_j - \bar{x})$
Geary’s C: are pairwise differences $(x_i - x_j)$ small/large relative to the variance?
Getis–Ord G/G*: are local products/sums of values large or small compared with the global ones?
Neighbour definition is essential ⇒ weights.


Global vs Local Spatial Autocorrelation

Global: combines information into a single statistic for the dataset.


Question: Is there overall spatial autocorrelation?
Local: computes a statistic for each location.
Components/“decomposition” of global measures.
Multiple values per dataset (one per feature).
Focuses on location–neighbour interactions.
Identifies where clusters are located.
Distinguishes cluster type:
High–High (hotspots)
Low–Low (coldspots)
High–Low and Low–High (outliers)
Use the same spatial weights matrix in both settings.



Moran’s I statistic and spatial weighting matrix

Moran’s I statistic
$$I = \frac{n \sum_i \sum_j w_{ij} (Y_i - \bar{Y})(Y_j - \bar{Y})}{\left(\sum_{i \ne j} w_{ij}\right) \sum_i (Y_i - \bar{Y})^2} \tag{1}$$
The choice of weighting matrix is a central component of Moran’s I, because it fixes the assumed structure of spatial dependence in advance.
Specifying a spatial weights matrix starts by identifying the neighborhood structure of each cell.
The spatial weighting matrix is created based on the concept of the 1st law of geography (Tobler: near things are more related than distant things).
1 Rook contiguity
2 Queen contiguity
3 q-nearest neighbors of cases
4 Exponential distance weights


Weighting Matrix Construction Approaches

Typically, a spatial weighting matrix is created based on the concept of geographical distance/neighborhood.
Common types of weighting matrices (Waller & Gotway, 2004; Getis, 2009):
1 Rook contiguity:
$$w_{ij} = \begin{cases} 1, & \text{if locations } i \text{ and } j \text{ share a boundary (edge)},\\ 0, & \text{otherwise.} \end{cases}$$
2 Queen contiguity: every neighbouring cell in all directions (sharing an edge or a corner) is given the value 1.


Rook and Queen Spatial weighting matrix construction Example

Row standardization: each neighbour receives proportional weight so row sums are 1
(e.g., Rook → 0.50, Queen → 0.33 per neighbour in this toy example).
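The toy figure is not recoverable, so the Python sketch below assumes a 2 × 2 grid (the layout used in the later worked examples). A corner cell of any larger grid gives the same 0.50/0.33 values. Cells are indexed row by row, and the function names are illustrative.

```python
def grid_contiguity(nrow, ncol, rule="rook"):
    """Binary contiguity for the cells of an nrow x ncol grid (row-major indexing).

    rook: neighbours share an edge; queen: neighbours share an edge or a corner.
    """
    n = nrow * ncol
    W = [[0] * n for _ in range(n)]
    for r in range(nrow):
        for c in range(ncol):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue  # a cell is not its own neighbour
                    if rule == "rook" and dr != 0 and dc != 0:
                        continue  # diagonal moves count only for queen
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        W[r * ncol + c][rr * ncol + cc] = 1
    return W


def row_standardize(W):
    """Divide each row by its sum so that every row sums to 1."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out


rook = row_standardize(grid_contiguity(2, 2, "rook"))
queen = row_standardize(grid_contiguity(2, 2, "queen"))
print(rook[0])   # cell A: two rook neighbours, 0.5 each
print(queen[0])  # cell A: three queen neighbours, 1/3 each
```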
Weighting Matrix Construction Approaches

3 q-nearest neighbors of cases:
$$w_{ij} = \begin{cases} 1, & \text{if site } i \text{ is among the } q \text{ nearest neighbors of site } j,\\ 0, & \text{otherwise.} \end{cases}$$
4 Exponential distance weights:
$w_{ij} = \exp(-p\, d_{ij})$, where $d_{ij}$ is the Euclidean distance between sites i and j and p > 0 controls the rate of decay.
The limitations of distance-based functions for creating weighting matrices in spatial statistics are not well explored (Earnest et al. 2007; Wang et al. 2013).
Methods of constructing weights should take into account how the outcome variable is generated over the spatial units under consideration.
None of these tools directly represents the dynamic aspects of environmental effects on the occurrence of the outcome variable.


Motivation to consider the 3rd law of geography (3rd LG)

The prevalence of environmentally mediated diseases is highly linked with the environment ⇒ environmental configuration.
Meuse River dataset: heavy metal concentration
Distance from the river has an impact on lead concentration.
Correlation decreases as distance from the river increases.

[Figure: correlogram plot using the 1st law of geography]
[Figure: correlogram plot using the 3rd law of geography]
Other weighting matrix construction approaches

Two locations close geographically but separated by other barriers may not be considered
as near neighbors (Ejigu & Wencheko 2020).
To create a meaningful weighting matrix:
1. Decide which relationships between observations are to be given non-zero weights.
2. Assign weights to the identified neighbor links:
a) environmental contiguity, b) using meaningful functions.

[Figure: similar symbols represent similar environmental conditions]
Ejigu et al. (2020) weighting matrix:
$$m_{ij} = \exp\{-(\alpha u_e + (1 - \alpha) u_s)\}, \tag{2}$$
where $u_e = |e_i - e_{i'}|$ is the absolute difference in the environmental covariate between two locations, and $u_s = \lVert s - s' \rVert$ is the Euclidean distance between the two locations.
Environmentally, the neighbors of site 1 are not sites 2 and 4 but sites 3, 5 and 7.
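A minimal Python sketch of equation (2) on hypothetical sites. The choice α ∈ [0, 1], the zero diagonal, and the unscaled units are assumptions. In practice u_e and u_s may need to be put on comparable scales first. With α = 0 the weights reduce to exponential distance weights with p = 1.

```python
import math


def env_spatial_weights(coords, env, alpha):
    """m_ij = exp(-(alpha * u_e + (1 - alpha) * u_s)), as in equation (2).

    u_e = |e_i - e_j| (environmental difference), u_s = Euclidean distance.
    The diagonal is left at 0 (no self-neighbours), an illustrative assumption.
    """
    n = len(coords)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                u_e = abs(env[i] - env[j])
                u_s = math.dist(coords[i], coords[j])
                M[i][j] = math.exp(-(alpha * u_e + (1 - alpha) * u_s))
    return M


# Hypothetical sites on a line: site 0 is geographically next to site 1
# but environmentally similar to site 2.
coords = [(0, 0), (1, 0), (2, 0)]
env = [5.0, 9.0, 5.2]
M_space = env_spatial_weights(coords, env, alpha=0.0)  # distance only
M_env = env_spatial_weights(coords, env, alpha=0.8)    # mostly environment
print(round(M_space[0][1], 3), round(M_space[0][2], 3))
print(round(M_env[0][1], 3), round(M_env[0][2], 3))
```

With α = 0, site 1 is site 0's strongest neighbour. With α = 0.8, site 2 takes over, mirroring the slide's point that environmental neighbours can differ from geographic ones.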
Computing Moran’s I Statistics Example

N = number of observations, $\bar{x} = \frac{1}{N} \sum_i x_i$, $W = \sum_i \sum_j w_{ij}$.
$$I = \frac{N \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{W \sum_i (x_i - \bar{x})^2}.$$
Toy grid (N = 4), mean $\bar{x} = 27$, Rook weights with W = 8.

Values and deviations:
Cell   x    x − x̄
A      36    9
B      22   −5
C      26   −1
D      24   −3

Cross-deviations summary:
$\sum_{i,j} w_{ij} (x_i - \bar{x})(x_j - \bar{x}) = -72$, $\sum_i (x_i - \bar{x})^2 = 116$
$$I = \frac{4}{8} \cdot \frac{-72}{116} \approx -0.31$$
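The arithmetic above can be checked with a short Python sketch. The cell layout (A B on the top row, C D on the bottom row) follows the local Moran example. Binary rook weights then link A–B, A–C, B–D and C–D, so W = 8.

```python
def morans_i(x, w):
    """Global Moran's I = (N / W) * sum_ij w_ij d_i d_j / sum_i d_i^2, with d = x - mean."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    W = sum(sum(row) for row in w)
    cross = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / W) * cross / sum(di * di for di in d)


# Binary rook weights for the toy grid  A B / C D  (order A, B, C, D).
rook = [[0, 1, 1, 0],
        [1, 0, 0, 1],
        [1, 0, 0, 1],
        [0, 1, 1, 0]]
x = [36, 22, 26, 24]
I = morans_i(x, rook)
print(round(I, 2))  # -0.31
```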


Local Moran’s I Definition

Let $x_i$ be the value at location i, $\bar{x}$ the mean, N the number of observations, $w_{ij}$ the spatial weights ($w_{ii} = 0$), and $W = \sum_i \sum_j w_{ij}$. For location i,
$$I_i = \frac{N (x_i - \bar{x}) \sum_j w_{ij} (x_j - \bar{x})}{W \sum_k (x_k - \bar{x})^2}.$$
With this scaling, the local values sum to the global Moran’s I.
Inputs: x (outcome), indices i, j, weights $w_{ij}$, dataset mean $\bar{x}$, total weight W.
Interpretation (range ≈ [−1, 1]): $I_i > 0$ suggests a local cluster (high with high, or low with low); $I_i = 0$ suggests no local association; $I_i < 0$ suggests a local spatial outlier (high with low, or low with high).


Local Moran’s I Example
Toy 2 × 2 grid with values:
A = 36   B = 22
C = 26   D = 24
⇒ $\bar{x} = 27$, N = 4, Rook weights with W = 8.

Cross-deviations (per cell):
A: (36 − 27)(22 − 27) + (36 − 27)(26 − 27) = −45 + (−9) = −54
B: −45 + 15 = −30
C: −9 + 3 = −6
D: 15 + 3 = 18
Denominator: $\sum (x - \bar{x})^2 = 116$

Local indices:
$$I_A = \frac{4}{8} \cdot \frac{-54}{116} \approx -0.23, \quad I_B = \frac{4}{8} \cdot \frac{-30}{116} \approx -0.13,$$
$$I_C = \frac{4}{8} \cdot \frac{-6}{116} \approx -0.03, \quad I_D = \frac{4}{8} \cdot \frac{18}{116} \approx 0.08.$$
Type labels: High–High, Low–Low (clusters); High–Low, Low–High (outliers).
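A Python sketch reproducing the local indices and attaching descriptive quadrant labels. Here "High"/"Low" compares $x_i$ with the mean, and the neighbour side uses the average neighbour deviation. Labelling values exactly at the mean as "Low" is an arbitrary assumption, and no significance is assessed.

```python
def local_morans_i(x, w):
    """Local Moran's I_i = (N / W) * d_i * sum_j w_ij d_j / sum_k d_k^2."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    W = sum(sum(row) for row in w)
    ss = sum(di * di for di in d)
    return [(n / W) * d[i] * sum(w[i][j] * d[j] for j in range(n)) / ss
            for i in range(n)]


def quadrant_labels(x, w):
    """High/Low of x_i (vs the mean) paired with High/Low of its neighbours' mean deviation."""
    n = len(x)
    mean = sum(x) / n
    labels = []
    for i in range(n):
        nbrs = [j for j in range(n) if w[i][j]]
        lag = sum(x[j] - mean for j in nbrs) / len(nbrs)
        own = "High" if x[i] > mean else "Low"   # values at the mean count as Low
        nbr = "High" if lag > 0 else "Low"
        labels.append(own + "-" + nbr)
    return labels


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
x = [36, 22, 26, 24]
local = local_morans_i(x, rook)
print([round(v, 2) for v in local])  # [-0.23, -0.13, -0.03, 0.08]
print(quadrant_labels(x, rook))      # ['High-Low', 'Low-High', 'Low-High', 'Low-Low']
```

The three negative values correspond to outlier-type cells and the single positive value (D) to a Low–Low cell. The four values sum to the global I ≈ −0.31.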
Interpreting Results

Use z-scores and p-values computed from the observed statistic, its expected value, and its variance.
If p > α: no evidence of spatial autocorrelation.
If p < α:
Positive z: positive spatial autocorrelation.
Negative z: negative spatial autocorrelation.
Moran scatter plot: with row-standardized weights, Moran’s I equals the slope of the regression line of the spatial lag on the variable.

Interpreting Results (Quadrant Plot)


Top Right: High–High cluster (hotspot).
Bottom Left: Low–Low cluster (coldspot).
Top Left: Low–High outlier.
Bottom Right: High–Low outlier.

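The slides describe z-scores based on an analytical variance. A common alternative, swapped in here and not the slides' method, is a permutation (randomization) test. The statistic is recomputed under reassignments of the values to cells. With N = 4 all 4! = 24 arrangements can be enumerated exactly, so the Python sketch below is deterministic. The mean of this reference distribution also equals $E[I] = -1/(N-1)$.

```python
from itertools import permutations


def morans_i(x, w):
    """Global Moran's I with weights w and values x."""
    n = len(x)
    mean = sum(x) / n
    d = [xi - mean for xi in x]
    W = sum(sum(row) for row in w)
    cross = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    return (n / W) * cross / sum(di * di for di in d)


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
x = [36, 22, 26, 24]
observed = morans_i(x, rook)

# Reference distribution: Moran's I under every reassignment of the values to cells.
null = [morans_i(list(p), rook) for p in permutations(x)]
expected = sum(null) / len(null)  # equals -1 / (N - 1)
# One-sided p-value for negative autocorrelation (small tolerance for float ties).
p_lower = sum(v <= observed + 1e-12 for v in null) / len(null)
print(round(observed, 3), round(expected, 3), round(p_lower, 3))
```

Two thirds of the arrangements give an I at least as negative as the observed −0.31, so this toy grid provides no evidence of spatial autocorrelation. With only four cells, no arrangement could be significant at usual levels.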


Geary’s C
Geary’s C test statistic:
$$C = \frac{(N - 1) \sum_i \sum_j w_{ij} (x_i - x_j)^2}{2W \sum_i (x_i - \bar{x})^2}.$$
Interpretation: C < 1 positive autocorrelation; C = 1 none; C > 1 negative autocorrelation.

Geary’s C worked example (using Rook weights, N − 1 = 3, 2W = 16):
$\sum_{i,j} w_{ij} (x_i - x_j)^2 = 608$, $\sum_i (x_i - \bar{x})^2 = 116$
$$C = \frac{3}{16} \cdot \frac{608}{116} \approx 0.98$$
Local Geary’s C, for location i:
$$C_i = \frac{(N - 1) \sum_j w_{ij} (x_i - x_j)^2}{2W \sum_k (x_k - \bar{x})^2}.$$
Interpretation (range [0, 2]): $C_i < 1$ cluster, $C_i = 1$ none, $C_i > 1$ outlier.
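A Python check of the global Geary’s C example, using the same toy grid and binary rook weights (layout A B / C D, an assumption carried over from the local Moran example).

```python
def gearys_c(x, w):
    """Geary's C = (N - 1) * sum_ij w_ij (x_i - x_j)^2 / (2 W sum_i (x_i - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    W = sum(sum(row) for row in w)
    num = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    den = 2 * W * sum((xi - mean) ** 2 for xi in x)
    return (n - 1) * num / den


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
C = gearys_c([36, 22, 26, 24], rook)
print(round(C, 2))  # 0.98
```

The double sum counts each rook link in both directions, which is why the numerator is 2 × (196 + 100 + 4 + 4) = 608.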
Getis–Ord G and G ∗

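$$G = \frac{\sum_i \sum_{j \ne i} w_{ij} x_i x_j}{\sum_i \sum_{j \ne i} x_i x_j}$$ (for G*, include the i = j terms).
No centering by $\bar{x}$; row standardization is not advised.
Cannot identify negative autocorrelation (focus on hot/cold spots).
Interpret relative to the expected value $E[G] = \frac{W}{N(N-1)}$ (for simple binary $w_{ij}$, $i \ne j$).
With binary weights and positive x, G cannot exceed 1, because the numerator keeps only a subset of the denominator's terms.
Getis–Ord G example with Rook weights, W = 8, N = 4:
Numerator $\sum_{i \ne j} w_{ij} x_i x_j = 5760$
Denominator $\sum_{i \ne j} x_i x_j = 8632$ (ordered pairs, matching the numerator)
G = 5760/8632 ≈ 0.667
Expected $E[G] = \frac{W}{N(N-1)} = \frac{8}{4 \cdot 3} \approx 0.667$
Rule: G > E[G] ⇒ potential hotspots (clustering of high values). Here G is only marginally above E[G] (0.6673 vs 0.6667), so the toy grid shows no real evidence of hotspots.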
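A Python check of the global G example with binary rook weights on the toy grid (layout A B / C D assumed). Both sums run over ordered pairs with j ≠ i, so numerator and denominator are on the same footing.

```python
def getis_ord_g(x, w):
    """Global G = sum_{i != j} w_ij x_i x_j / sum_{i != j} x_i x_j (binary w, w_ii = 0)."""
    n = len(x)
    num = sum(w[i][j] * x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    den = sum(x[i] * x[j] for i in range(n) for j in range(n) if i != j)
    return num / den


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
x = [36, 22, 26, 24]
G = getis_ord_g(x, rook)
W = sum(map(sum, rook))
expected = W / (len(x) * (len(x) - 1))
print(round(G, 4), round(expected, 4))  # 0.6673 0.6667
```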
Local Geary’s C (worked example)

Same toy grid, Rook weights; N − 1 = 3, 2W = 16.

Squared differences:
A: 14² + 10² = 296
B: 14² + (−2)² = 200
C: 10² + 2² = 104
D: 2² + (−2)² = 8
$\sum (x - \bar{x})^2 = 116$.

Local indices:
$$C_A = \frac{3}{16} \cdot \frac{296}{116} \approx 0.48, \quad C_B = \frac{3}{16} \cdot \frac{200}{116} \approx 0.32, \quad C_C = \frac{3}{16} \cdot \frac{104}{116} \approx 0.17, \quad C_D = \frac{3}{16} \cdot \frac{8}{116} \approx 0.01.$$
Type labels: $C_i < 1$ cluster (HH/LL); $C_i > 1$ outlier.
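A Python sketch reproducing the local Geary values (toy grid, binary rook weights, layout A B / C D assumed). With the (N − 1)/(2W) scaling used on these slides, the local values add up to the global C.

```python
def local_gearys_c(x, w):
    """C_i = (N - 1) * sum_j w_ij (x_i - x_j)^2 / (2 W sum_k (x_k - mean)^2)."""
    n = len(x)
    mean = sum(x) / n
    W = sum(sum(row) for row in w)
    ss = sum((xk - mean) ** 2 for xk in x)
    return [(n - 1) * sum(w[i][j] * (x[i] - x[j]) ** 2 for j in range(n)) / (2 * W * ss)
            for i in range(n)]


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
ci = local_gearys_c([36, 22, 26, 24], rook)
print([round(c, 2) for c in ci])  # [0.48, 0.32, 0.17, 0.01]
print(round(sum(ci), 2))          # 0.98, the global Geary's C
```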


Getis–Ord Gi / Gi∗ (definition)

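Let x be the outcome and $w_{ij}$ the spatial weights. There is no adjustment by N and no centering by $\bar{x}$; row standardization is not advised.
$$G_i^* = \frac{\sum_j w_{ij} x_j}{\sum_j x_j}$$ (for $G_i$, exclude j = i in both the numerator and the denominator).

Interpreted relative to the expected value under spatial randomness (for simple binary weights):
$$E[G_i^*] = \frac{W_i^*}{N}, \quad W_i^* = \sum_j w_{ij} \ (\text{including } w_{ii} = 1); \qquad E[G_i] = \frac{W_i}{N - 1}, \quad W_i = \sum_{j \ne i} w_{ij}.$$
(The value $W / (N(N-1))$ is the expected value of the global G, not of $G_i$ or $G_i^*$.)

Detects hot/cold spots; does not diagnose negative autocorrelation.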


Getis–Ord Gi∗ (worked example)
Toy grid with Rook weights: N = 4, W = 8.
$\sum x = 36 + 22 + 26 + 24 = 108$.
Neighbour sums (including i for $G_i^*$):
A: 36 + 22 + 26 = 84 ⇒ $G_A^* = 84/108 \approx 0.78$
B: 22 + 36 + 24 = 82 ⇒ $G_B^* = 82/108 \approx 0.76$
C: 26 + 36 + 24 = 86 ⇒ $G_C^* = 86/108 \approx 0.80$
D: 24 + 22 + 26 = 72 ⇒ $G_D^* = 72/108 \approx 0.67$
Expected value: each cell has two rook neighbours plus itself, so $W_i^* = 3$ and
$$E[G_i^*] = \frac{W_i^*}{N} = \frac{3}{4} = 0.75.$$
Rule of thumb: $G_i^* > E[G_i^*]$ potential hotspot; $G_i^* < E[G_i^*]$ potential coldspot. Here A, B and C lie slightly above 0.75 and D lies below it.
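A Python sketch of the raw (unstandardized) $G_i^*$ ratio and its expected value $W_i^*/N$ for binary weights, on the same toy grid (layout A B / C D assumed). Software often reports a standardized z-score version instead, which is not shown here.

```python
def getis_ord_gi_star(x, w):
    """G_i* = sum_j w*_ij x_j / sum_j x_j, where w*_ii = 1 (i counts as its own neighbour)."""
    n = len(x)
    total = sum(x)
    out = []
    for i in range(n):
        num = x[i] + sum(w[i][j] * x[j] for j in range(n) if j != i)
        out.append(num / total)
    return out


def expected_gi_star(w, i):
    """E[G_i*] = W_i* / N with W_i* = 1 + sum_{j != i} w_ij (binary weights)."""
    n = len(w)
    return (1 + sum(w[i][j] for j in range(n) if j != i)) / n


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
x = [36, 22, 26, 24]
g = getis_ord_gi_star(x, rook)
print([round(v, 2) for v in g])   # [0.78, 0.76, 0.8, 0.67]
print(expected_gi_star(rook, 0))  # 0.75
```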
Constructing neighborhoods for areal data using R

The spdep package has a number of functions to construct neighbor lists based on different criteria.
poly2nb: to construct the list of neighbors based on areas with contiguous boundaries.
knearneigh: to obtain a matrix of indices of the points belonging to the set of the k nearest neighbors.
knn2nb: to convert the object returned by knearneigh into a neighbor list.
dnearneigh: to get a list of neighbors whose distances lie between specified lower and upper limits.
nblag (and nblag_cumul): to get neighbors of order k based on contiguity.
nb2listw: to create a spatial weights object (listw) containing the spatial weights corresponding to a neighbors list.
Review the RMarkdown output "Spatial [Link]" file for further details and practical examples using R.


Practice Weighting matrix construction using R
Consider the admin2 boundaries (zones) of Ethiopia, and:
1 Visualize which zones are neighbors of each other.
2 Identify the zones with the maximum and minimum number of neighbors.
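The exercise needs the Ethiopian zone polygons, which are not included here. As a stand-in, the Python sketch below uses a hypothetical 3 × 3 grid with queen contiguity to show the counting step. In R, the same counts can be read from a poly2nb neighbour list (e.g., with spdep's card()).

```python
def queen_grid_neighbours(nrow, ncol):
    """Neighbour list {cell: [neighbour cells]} under queen contiguity on a regular grid."""
    nb = {}
    for r in range(nrow):
        for c in range(ncol):
            nb[(r, c)] = [(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)
                          and 0 <= r + dr < nrow and 0 <= c + dc < ncol]
    return nb


nb = queen_grid_neighbours(3, 3)  # hypothetical stand-in for the zone polygons
counts = {cell: len(v) for cell, v in nb.items()}
max_n, min_n = max(counts.values()), min(counts.values())
most = [cell for cell, k in counts.items() if k == max_n]
fewest = [cell for cell, k in counts.items() if k == min_n]
print(max_n, most)    # 8 [(1, 1)]
print(min_n, fewest)  # 3 [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Ties are kept as lists, since several zones may share the maximum or minimum count.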


Moran I scatter plot

Moran’s I scatter plot: a plot of the spatial data against their spatially lagged values.
[Figure: two Moran’s I scatter plots]
What do the two Moran’s I scatter plots tell us?
In which dataset is spatial dependency demonstrated?
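A Python sketch of the coordinates behind a Moran scatter plot: deviations $z_i = x_i - \bar{x}$ on the horizontal axis and row-standardized spatial lags on the vertical axis, for the toy grid (layout A B / C D assumed). It assumes every cell has at least one neighbour. The fitted slope reproduces Moran’s I. Here binary and row-standardized weights give the same I because every cell has two neighbours.

```python
def moran_scatter(x, w):
    """Return (z, lag): z_i = x_i - mean and lag_i = sum_j w_ij z_j / sum_j w_ij."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    lag = []
    for i in range(n):
        s = sum(w[i])  # row sum used for row standardization (assumed > 0)
        lag.append(sum(w[i][j] * z[j] for j in range(n)) / s)
    return z, lag


def ols_slope(u, v):
    """Least-squares slope of v on u (with intercept)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / sum((a - mu) ** 2 for a in u)


rook = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]  # A B / C D
z, lag = moran_scatter([36, 22, 26, 24], rook)
slope = ols_slope(z, lag)
print(lag)              # [-3.0, 3.0, 3.0, -3.0]
print(round(slope, 2))  # -0.31, the same as Moran's I for this grid
```

Points in the top-right and bottom-left quadrants of such a plot pull the slope upward (positive dependence). Here A falls bottom-right, B and C top-left, and D bottom-left, giving a negative slope.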


Hot-spot or Cluster identification
Local Moran’s I used to identify clusters of the following types:
High-High: areas of high values with neighbors of high values,
High-Low: areas of high values with neighbors of low values,
Low-High: areas of low values with neighbors of high values,
Low-Low: areas of low values with neighbors of low values.



Practical Session

In this practical session, you will learn the following:


1 How to define/construct a spatial neighborhood using the poly2nb function;
2 How to convert a spatial neighborhood into spatial weights;
3 How to compute global and local Moran’s I statistics to check for spatial autocorrelation;
4 Cluster identification (if any).
Instructions for the lab practice
Load the "[Link]" and "SA_SDP_HIV.csv" datasets and familiarize yourself with the variables in the data.
Check the columns of the data and make sure you understand what each column means.
Get and visualize the shapefiles of Ethiopia and South Africa.
Construct the spatial neighborhood matrix and compute Moran’s I statistics to check for the presence of spatial autocorrelation.
For detailed support and the full code, refer to the RMarkdown output "Spatial [Link]" file.
