Geospatial Analysis: Principles & Techniques

The document is the 7th edition of 'Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools' authored by Dr. Michael J. de Smith, Prof. Michael F. Goodchild, and Prof. Paul A. Longley. It provides an extensive overview of spatial analysis, GIS, and related software tools, along with methodologies for geospatial project design and research practices. The guide includes various chapters covering concepts, statistical methods, data exploration, and surface analysis, aimed at both practitioners and researchers in the field.


Geospatial Analysis

A Comprehensive Guide to Principles, Techniques and Software Tools

7th edition; Updated July 2025

Dr Michael J de Smith, Prof Michael F Goodchild, Prof Paul A Longley & Associates

Copyright © 2007-2025 All Rights reserved. 7th Edition. Issue version: 2025-1

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form
or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except
under the terms of the UK Copyright, Designs and Patents Act 1988 or with the written permission of
the authors. The moral right of the authors has been asserted. Copies of this edition are available in
print, electronic book and web-accessible formats. Users of mono print versions should refer to the
web or special PDF versions for color images.

Disclaimer: This publication is designed to offer accurate and authoritative information in regard to
the subject matter. It is provided on the understanding that it is not supplied as a form of professional
or advisory service. References to software products, datasets or publications are purely made for
information purposes and the inclusion or exclusion of any such item does not imply recommendation
or otherwise of the product or material in question.

For more details please refer to the Guide’s website: [Link]

ISBN-13

978-1-912556-06-9 Hardback
978-1-912556-07-6 Paperback
978-1-912556-08-3 ebook

The cover image background map is © Crown Copyright Ordnance Survey 2014-18 and shows part of
central London, UK from the London Output Area Classification carried out by Dr Alex Singleton and
Prof Paul Longley.

3D visualization of modeled road-related noise levels in an urbanized area

Visualization using CadnaA software, courtesy of Accon GmbH & DataKustik GmbH

Optimized service center location and allocated demand: Tripolis, in Arcadia, Greece

Coverage or p-center location optimization problem. See Section 7.4.2 for more details. Map
produced using S-Distance software (2006), courtesy of S A Sirigos

Acknowledgements
The authors would like to express their particular thanks to the following individuals and organizations:
Accon GmbH, Greifenberg, Germany for permission to use the noise mapping images on the inside cover
of this Guide and in Figure 3-4; Prof D Martin for permission to use Figure 4-19 and Figure 4-20; Prof D
Dorling and colleagues for permission to use Figure 4-50 and Figure 4-52; Dr K McGarigal for permission
to use the Fragstats summary in Section 5.3.4; Prof Wei Li and colleagues, China National Laboratory for
High Speed Railway Construction for permission to use Figures 4-69A and 4-69B; Dr H Kristinsson, Faculty
of Engineering, University of Iceland for permission to use Figure 4-70; Dr S Rana, formerly of the Center
for Transport Studies, University College London for permission to use Figure 6-24; Prof B Jiang,
Department of Technology and Built Environment of University of Gävle, Sweden for permission to use
the Axwoman software and sample data in Section [Link] and the inside cover page; Dr G Dubois,
European Commission (EC), Joint Research Center Directorate (DG JRC) for comments on parts of
Chapter 6 and permission to use material from the original AI-Geostats website; Geovariances (France)
for provision of an evaluation copy of their Isatis geostatistical software; F O’Sullivan for use of Figure
6-41; Profs A Okabe, K Okunuki and S Shiode (Center for Spatial Information Science, Tokyo University,
Japan) for use of their SANET software and sample data; and S A Sirigos, University of Thessaly, Greece
for permission to use his Tripolis dataset, the provision of his S-Distance software, and comments on
part of Chapter 7. Sections 8.1 and 8.2 of Chapter 8 are substantially derived from material researched
and written by Christian Castle and Andrew Crooks (updated for the latest editions by Andrew) with the
financial support of the Economic and Social Research Council (ESRC), Camden Primary Care Trust
(PCT), and the Greater London Authority (GLA) Economics Unit. The recommended citation of this
Chapter 8 is: "Castle C, Crooks A T, de Smith M J, Goodchild M F and Longley P A (2024)
Geocomputational methods and modeling, Chapter 8 in de Smith M J, Goodchild M F, and Longley P A
(2024) Geospatial Analysis: A comprehensive guide to principles, techniques and software tools, 7th
edition, The Winchelsea Press, UK". Chapter 9 was originally written by Guy Lansley of the UCL
Consumer Data Research Centre and has subsequently been amended and developed by the principal
authors. The recommended citation of Chapter 9 is: "Lansley G, de Smith M J, Goodchild M F and
Longley P A (2024) Big Data and Geospatial Analysis, Chapter 9 in de Smith M J, Goodchild M F, and
Longley P A (2024) Geospatial Analysis: A comprehensive guide to principles, techniques and software
tools, 7th edition, The Winchelsea Press, UK".

We would also like to express our thanks to the many users of the book and website for their comments,
suggestions and, occasionally, corrections. Particular thanks for corrections go to Bryan Thrall, Juanita
Francis-Begay, Paul Johnson, Nathan Franz and Prof Liem Tran. A number of the maps displayed in this
Guide, notably those in Chapter 6, have been created using GB Ordnance Survey data provided via the
EDINA Digimap/JISC service. These datasets and other GB OS data illustrated including the cover image
are © Crown Copyright and database right Ordnance Survey 2014-18. Every effort has been made to
acknowledge and establish copyright of materials used in this publication. The cover image shows part
of central London, UK from the London Output Area Classification carried out by Dr Alex Singleton and
Prof Paul Longley. Anyone with a query regarding any items in this document should contact the authors
via the Guide’s website, [Link]

de Smith, Goodchild, Longley and Associates [Link]


Table of Contents

1 Overview and Terminology
1.1 Spatial analysis, GIS and software tools
1.2 Intended audience and scope
1.3 Software tools and Companion Materials
1.3.1 GIS and related software tools
1.3.2 Suggested reading
1.4 Terminology and Abbreviations
1.4.1 Definitions
1.5 Common Measures and Notation
1.5.1 Notation
1.5.2 Statistical measures and related formulas

2 Conceptual Frameworks for Spatial Analysis
2.1 Basic Primitives
2.1.1 Place
2.1.2 Attributes
2.1.3 Objects
2.1.4 Maps
2.1.5 Multiple properties of places
2.1.6 Fields
2.1.7 Networks
2.1.8 Density estimation
2.1.9 Detail, resolution, and scale
2.1.10 Topology
2.2 Spatial Relationships
2.2.1 Co-location
2.2.2 Distance, direction and spatial weights matrices
2.2.3 Multidimensional scaling
2.2.4 Spatial context
2.2.5 Neighborhood
2.2.6 Spatial heterogeneity
2.2.7 Spatial dependence
2.2.8 Spatial sampling
2.2.9 Spatial interpolation
2.2.10 Smoothing and sharpening
2.2.11 First- and second-order processes
2.3 Spatial Statistics
2.3.1 Spatial probability
2.3.2 Probability density
2.3.3 Uncertainty
2.3.4 Statistical inference
2.4 Spatial Data Infrastructure
2.4.1 Geoportals
2.4.2 Metadata
2.4.3 Interoperability
2.4.4 Conclusion

3 Geospatial Project Design and Research Practice
3.1 Analytical methodologies
3.2 Spatial analysis as a process
3.3 Spatial analysis and the PPDAC model
3.3.1 Problem: Framing the question
3.3.2 Plan: Formulating the approach
3.3.3 Data: Data acquisition
3.3.4 Analysis: Analytical methods and tools
3.3.5 Conclusions: Delivering the results
3.4 Geospatial analysis and model building
3.5 Research-ready data
3.5.1 A retrospective on sampling and inference in social investigation
3.5.2 The design of Smart Data Infrastructure
3.5.3 Access to RRD
3.5.4 Creation and maintenance of geospatial RRD
3.6 The changing context of GIScience

4 Building Blocks of Spatial Analysis
4.1 Spatial and Spatio-temporal Data Models and Methods
4.2 Geometric and Related Operations
4.2.1 Length and area for vector data
4.2.2 Length and area for raster datasets
4.2.3 Surface area
4.2.4 Line Smoothing and point-weeding
4.2.5 Centroids and centers
4.2.6 Point (object) in polygon (PIP)
4.2.7 Polygon decomposition
4.2.8 Shape
4.2.9 Overlay and combination operations
4.2.10 Areal interpolation
4.2.11 Districting and re-districting
4.2.12 Classification and clustering
4.2.13 Boundaries and zone membership
4.2.14 Tessellations and triangulations
4.3 Queries, Computations and Density
4.3.1 Spatial selection and spatial queries
4.3.2 Simple calculations
4.3.3 Ratios, indices, normalization, standardization and rate smoothing
4.3.4 Density, kernels and occupancy
4.4 Distance Operations
4.4.1 Metrics
4.4.2 Cost distance
4.4.3 Distance Transforms
4.4.4 Network distance
4.4.5 Buffering
4.4.6 Distance decay models
4.5 Directional Operations
4.5.1 Directional analysis of linear datasets
4.5.2 Directional analysis of point datasets
4.5.3 Directional analysis of surfaces
4.6 Grid Operations and Map Algebra
4.6.1 Operations on single and multiple grids
4.6.2 Linear spatial filtering
4.6.3 Non-linear spatial filtering
4.6.4 Erosion and dilation

5 Data Exploration and Spatial Statistics
5.1 Statistical Methods and Spatial Data
5.1.1 Descriptive statistics
5.1.2 Spatial sampling
5.2 Exploratory Spatial Data Analysis
5.2.1 EDA, ESDA and ESTDA
5.2.2 Outlier detection
5.2.3 Cross tabulations and conditional choropleth plots
5.2.4 ESDA and mapped point data
5.2.5 Trend analysis of continuous data
5.2.6 Cluster hunting and scan statistics
5.3 Grid-based Statistics and Metrics
5.3.1 Overview of grid-based statistics
5.3.2 Crosstabulated grid data, the Kappa Index and Cramer’s V statistic
5.3.3 Quadrat analysis of grid datasets
5.3.4 Landscape Metrics
5.4 Point Sets and Distance Statistics
5.4.1 Basic distance-derived statistics
5.4.2 Nearest neighbor methods
5.4.3 Pairwise distances
5.4.4 Hot spot and cluster analysis
5.4.5 Proximity matrix comparisons
5.5 Spatial Autocorrelation
5.5.1 Autocorrelation, time series and spatial analysis
5.5.2 Global spatial autocorrelation
5.5.3 Local indicators of spatial association (LISA)
5.5.4 Significance tests for autocorrelation indices
5.6 Spatial Regression
5.6.1 Regression overview
5.6.2 Simple regression and trend surface modeling
5.6.3 Geographically Weighted Regression (GWR)
5.6.4 Spatial autoregressive and Bayesian modeling
5.6.5 Spatial filtering models

6 Surface and Field Analysis
6.1 Modeling Surfaces
6.1.1 Test datasets
6.1.2 Surfaces and fields
6.1.3 Raster models
6.1.4 Vector models
6.1.5 Mathematical models
6.1.6 Statistical and fractal models
6.2 Surface Geometry
6.2.1 Gradient, slope and aspect
6.2.2 Profiles and curvature
6.2.3 Directional derivatives
6.2.4 Paths on surfaces
6.2.5 Surface smoothing
6.2.6 Pit filling
6.2.7 Volumetric analysis
6.3 Visibility
6.3.1 Viewsheds and RF propagation
6.3.2 Line of sight
6.3.3 Isovist analysis and space syntax
6.4 Watersheds and Drainage
6.4.1 Drainage modeling
6.4.2 D-infinity model
6.4.3 Drainage modeling case study
6.5 Gridding, Interpolation and Contouring
6.5.1 Overview of gridding and interpolation
6.5.2 Gridding and interpolation methods
6.5.3 Contouring
6.6 Deterministic Interpolation Methods
6.6.1 Inverse distance weighting (IDW)
6.6.2 Natural neighbor
6.6.3 Nearest-neighbor
6.6.4 Radial basis and spline functions
6.6.5 Modified Shepard
6.6.6 Triangulation with linear interpolation
6.6.7 Triangulation with spline-like interpolation
6.6.8 Rectangular or bi-linear interpolation
6.6.9 Profiling
6.6.10 Polynomial regression
6.6.11 Minimum curvature
6.6.12 Moving average
6.6.13 Local polynomial
6.6.14 Topogrid/Topo to raster
6.7 Geostatistical Interpolation Methods
6.7.1 Core concepts in Geostatistics
6.7.2 Kriging interpolation

7 Network and Location Analysis
7.1 Introduction to Network and Location Analysis
7.1.1 Terminology
7.1.2 Source data
7.1.3 Algorithms and computational complexity theory
7.2 Key Problems in Network and Location Analysis
7.2.1 Overview - network and locational analysis
7.2.2 Heuristic and meta-heuristic algorithms
7.3 Network Construction, Optimal Routes and Optimal Tours
7.3.1 Minimum spanning tree
7.3.2 Gabriel network
7.3.3 Steiner trees
7.3.4 Shortest (network) path problems
7.3.5 Tours, travelling salesman problems and vehicle routing
7.4 Location and Service Area Problems
7.4.1 Location problems
7.4.2 Larger p-median and p-center problems
7.4.3 Service areas
7.5 Arc Routing
7.5.1 Network traversal problems

8 Geocomputational methods and modeling
8.1 Introduction to Geocomputation
8.1.1 Modeling dynamic processes within GIS
8.2 Geosimulation
8.2.1 Cellular automata (CA)
8.2.2 Agents and agent-based models
8.2.3 Applications of agent-based models
8.2.4 Advantages of agent-based models
8.2.5 Limitations of agent-based models
8.2.6 Explanation or prediction?
8.2.7 Developing an agent-based model
8.2.8 Types of simulation/modeling (s/m) systems for agent-based modeling
8.2.9 Guidelines for choosing a simulation/modeling (s/m) system
8.2.10 Simulation/modeling (s/m) systems for agent-based modeling
8.2.11 Verification and calibration of agent-based models
8.2.12 Validation and analysis of agent-based model outputs
8.3 Artificial Neural Networks (ANN)
8.3.1 Introduction to artificial neural networks
8.3.2 Radial basis function networks
8.3.3 Self organizing networks
8.4 Genetic Algorithms and Evolutionary Computing
8.4.1 Genetic algorithms - introduction
8.4.2 Genetic algorithm components
8.4.3 Example GA applications
8.4.4 Evolutionary computing and genetic programming

9 Big Data, AI and Geospatial Analysis
9.1 Big Data and Research
9.2 Types of Big Data
9.2.1 Human-sourced data
9.2.2 Process-Mediated data
9.2.3 Machine-Generated data
9.3 Challenges of Big Data
9.3.1 Access
9.3.2 Ethics
9.3.3 Data Quality
9.3.4 Repurposing Data
9.3.5 Demographic Bias
9.3.6 Spatial and Temporal Coverage
9.3.7 Unstructured Data
9.3.8 Data Linkage
9.3.9 Tools and Skills
9.4 Conclusions

10 Resources
10.1 Appendices
10.1.1 CATMOG Guides
10.1.2 R-Project spatial statistics software packages
10.1.3 Fragstats landscape metrics
10.1.4 Web links

11 References


Foreword
This 7th edition (3rd release) includes the following principal changes from the 2018 6th edition: many small
changes to the text have been made; weblinks and associated information have been updated and/or added or
removed as appropriate; a new section on Research Ready Data (RRD) has been added; and the final Chapter has
been substantially amended in light of recent developments. Note that new versions of software tools referenced
in the text have not been re-run and re-tested so readers should refer to the latest versions of these software
tools and their documentation where appropriate. Users of mono print versions should refer to the web or special
PDF versions for color images.

Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools originated as material
to accompany the spatial analysis module of MSc programmes at University College London delivered by the
principal author, Dr Mike de Smith. The project was discussed with Professors Longley and Goodchild. They kindly
agreed to contribute to the contents of the Guide itself. As such, this Guide may be seen as a companion to the
pioneering book on Geographic Information Systems and Science (now changed to Science and Systems) by
Longley, Goodchild, Maguire and Rhind, particularly the chapters that deal with spatial analysis and modeling.
Their participation has also facilitated links with broader “spatial literacy” and spatial analysis programmes.
Notable amongst these are the GIS&T Body of Knowledge materials provided by the Association of American
Geographers together with the spatial educational programmes provided through UCL and UCSB. The formats in
which this Guide has been published have proved to be extremely popular, encouraging us to seek to improve
and extend the material and associated resources further. Many academics and industry professionals have
provided helpful comments on previous editions, and universities in several parts of the world have now
developed courses which make use of the Guide and the accompanying resources. Workshops based on these
materials have been run in Ireland, the USA, East Africa, Italy and Japan, and a Chinese version of the Guide
(2nd ed.) has been published by the Publishing House of Electronics Industry, Beijing, PRC, [Link] in
2009. A Chinese version of the 6th edition is due to be published by Science Press.

A unique, ongoing feature of this Guide is its independent evaluation of software, in particular the set of
readily available tools and packages for conducting various forms of geospatial analysis. To our knowledge, there
is no similarly extensive resource that is available in printed or electronic form. We remain convinced that there
is a need for guidance on where to find and how to apply selected tools. Inevitably, some topics have been
omitted, primarily where there is little or no readily available commercial or open source software to support
particular analytical operations. Other topics, whilst included, have been covered relatively briefly and/or with
limited examples, reflecting the inevitable constraints of time and the authors’ limited access to some of the
available software resources. Every effort has been made to ensure the information provided is up-to-date,
accurate, compact, comprehensive and representative - we do not claim it to be exhaustive. However, with
fast-moving changes in the software industry and in the development of new techniques and data sources it
would be impractical and uneconomic to publish the material in a conventional manner. Accordingly the Guide
has been prepared without intermediary typesetting. We would like to thank all those users of the book for
their comments and suggestions, which have assisted us in producing this latest edition.

Mike de Smith, UK, Mike Goodchild, USA, Paul Longley, UK, 2025 (7th edition, third release)


1 Overview and Terminology


In this Guide we address the full spectrum of spatial analysis and associated modeling techniques that are
provided within currently available and widely used geographic information systems (GIS) and associated software.
Collectively such techniques and tools are often now described as geospatial analysis, although we use the more
common form, spatial analysis, in most of our discussions.
The term ‘GIS’ is widely attributed to Roger Tomlinson and colleagues, who used it in 1963 to describe their
activities in building a digital natural resource inventory system for Canada (Tomlinson 1967, 1970). The history of
the field has been charted in an edited volume by Foresman (1998) containing contributions by many of its early
protagonists. A timeline of many of the formative influences upon the field is provided in Longley et al. (2015,
p20). These accounts make the unassailable point that the success of GIS as an area of activity has been driven by
the success of its applications in solving real world problems.
In order to cover such a wide range of topics, this Guide has been divided into a number of main sections or
chapters. These are then further subdivided, in part to identify distinct topics as closely as possible, facilitating
the creation of a web site from the text of the Guide. Hyperlinks embedded within the document enable users of
the web and PDF versions of this document to navigate around the Guide and to external sources of information,
data, software, maps, and reading materials.
Chapter 2 provides an introduction to spatial thinking, described by some as “spatial literacy”, and addresses the
central issues and problems associated with spatial data that need to be considered in any analytical exercise.
Readers who are already familiar with these concepts may wish to skip this chapter and move straight on to
Chapter 3.
Real-world problems and applications are typically governed by the organizational practices and procedures that
prevail with respect to particular places. Not only are there wide differences in the volume and remit of data that
the public sector collects about population characteristics in different parts of the world, but there are
differences in the ways in which data are collected, assembled and disseminated (e.g. general purpose censuses
versus statistical modeling of social surveys, property registers and tax payments). Data collected by the private
sector, often as a result of the use of services that automatically gather information on events, locations and
individuals, present a further challenge (see Chapter 9 for an extended discussion of this issue). There are also
differences in the ways in which different data holdings can legally be merged and the purposes for which data
may be used — particularly with regard to health and law enforcement data. Finally, there are geographical
differences in the cost of geographically referenced data. Some organizations, such as the US Geological Survey,
are bound by statute to limit charges for data to sundry costs such as media used for delivering data while others,
such as most national mapping organizations in Europe, are required to exact much heavier charges in order to
recoup much or all of the cost of data creation. Analysts may already be aware of these contextual considerations
through local knowledge, and other considerations may become apparent through browsing metadata catalogs. GIS
applications must by definition be sensitive to context, since they represent unique locations on the Earth’s
surface.
This initial discussion is followed in Chapter 3 by an examination of the methodological background to GIS analysis.
Initially we examine a number of formal methodologies and then apply ideas drawn from these to the specific case
of spatial analysis. A process known by its initials, PPDAC (Problem, Plan, Data, Analysis, Conclusions) is described
as a methodological framework that may be applied to a very wide range of spatial analysis problems and projects.
We continue Chapter 3 with a brief look at model-building, with particular reference to the various types of GIS
model that can be constructed to address geospatial problems, followed by a discussion of so-called Research-
ready data (RRD).
Subsequent Chapters present the various analytical methods supported within widely available software tools. The
majority of the methods described in Chapter 4 (Building blocks of spatial analysis) and many of those in Chapter 6
(Surface and field analysis) are implemented as standard facilities in modern commercial GIS packages such as
ArcGIS, MapInfo, Manifold, TNT/Datum Workstation and Intergraph (now part of Hexagon Geospatial). Many are
also provided in more specialized GIS products such as TerrSet/Idrisi, GRASS, QGIS and ENVI. Note that GRASS and
QGIS (which includes GRASS in its download kit) are open source. A number of software tools described and
illustrated in this book have since been superseded or are no longer available, but their underlying principles and
methods remain applicable to the analytical processes examined. Similarly, a number of software tools have been
dramatically enhanced, often as applications-specific or sector-specific add-on packages.
In addition we discuss a number of more specialized tools, designed to address the needs of specific sectors or
technical problems that are otherwise not well-supported within the core GIS packages at present. The methods
covered in Chapter 5, which focuses on statistical methods, and in Chapters 7 and 8, which address Network and
Location Analysis and Geocomputation respectively, are less commonly supported in GIS packages, but the relevant
tools may provide loose- or close-coupling with such systems, depending upon the application area. In all instances we provide examples and commentary on
software tools that are readily available. The final Chapter (Chapter 9) addresses issues associated with so-called
Big Data and AI - this is clearly an area that is developing extremely rapidly and is expected to result in major
changes to the way the GIS industry operates and the forms of Geospatial Analysis that are available.
As noted above, throughout this Guide examples are drawn from and refer to specific products — these have been
selected purely as examples and are not intended as recommendations. Extensive use has also been made of
tabulated information, providing abbreviated summaries of techniques and formulas for reasons of both
compactness and coverage. These tables are designed to provide a quick reference to the various topics covered
and are, therefore, not intended as a substitute for fuller details on the various items covered. We provide limited
discussion of 2D and 3D mapping facilities, and the support for digital globe formats (e.g. KML and KMZ), which is
increasingly being embedded into general-purpose and specialized data analysis toolsets. These developments
confirm the trend towards integration of geospatial data and presentation layers into mainstream software
systems and services, both terrestrial and planetary (see, for example, the KML images of Mars DEMs developed by
Google as part of the Google Earth project).
Just as all datasets and software packages contain errors, known and unknown, so too do all books and websites,
and the authors of this Guide expect that there will be errors despite our best efforts to remove these! Some may
be genuine errors or misprints, whilst others may reflect our use of specific versions of software packages and
their documentation. Inevitably with respect to the latter, new versions of the packages that we have used to
illustrate this Guide will have appeared even before publication, so specific examples, illustrations and comments
on scope or restrictions may have been superseded. In all cases the user should review the documentation
provided with the software version they plan to use, check release notes for changes and known bugs, and look at
any relevant online services (e.g. user/developer forums and blogs on the web) for additional materials and
insights.
The web version of this Guide may be accessed via the associated Internet site: [Link].
The contents and sample sections of the PDF version may also be accessed from this site. In both cases the
information is regularly updated. The Internet is now well established as society’s principal mode of information
exchange and most GIS users are accustomed to searching for material that can easily be customized to specific
needs. Our objective for such users is to provide an independent, reliable and authoritative first port of call for
conceptual, technical, software and applications material that addresses the panoply of new user requirements.


1.1 Spatial analysis, GIS and software tools


Our objective in producing this Guide is to be comprehensive in terms of concepts and techniques (but not
necessarily exhaustive), representative and independent in terms of software tools, and above all practical in
terms of application and implementation. However, we believe that it is no longer appropriate to think of a
standard, discipline-specific textbook as capable of satisfying every kind of new user need. Accordingly, an
innovative feature of our approach here is the range of formats and channels through which we disseminate the
material.
Given the vast range of spatial analysis techniques that have been developed over the past half century many
topics can only be covered to a limited depth, whilst others have been omitted because they are not implemented
in current mainstream GIS products. This is a rapidly changing field and increasingly GIS packages are including
analytical tools as standard built-in facilities or as optional toolsets, add-ins or analysts. In many instances such
facilities are provided by the original software suppliers (commercial vendors or collaborative non-commercial
development teams) whilst in other cases facilities have been developed and are provided by third parties. Many
products offer software development kits (SDKs), programming languages and language support, scripting facilities
and/or special interfaces for developing one’s own analytical tools or variants.
In addition, a wide variety of web-based or web-deployed tools have become available, enabling datasets to be
analyzed and mapped, including dynamic interaction and drill-down capabilities, without the need for local GIS
software installation. These tools include the widespread use of web-based Java, JavaScript, AJAX and HTML5
applications, and interactive Virtual Globe explorers, some of which are described in this Guide. They provide an
illustration of the direction that many toolset and service providers are taking.
Throughout this Guide there are numerous examples of the use of software tools that facilitate geospatial
analysis. In addition, some subsections of the Guide, and the software section of the accompanying website,
provide summary information about such tools and links to their suppliers. Commercial software products rarely
provide access to source code or full details of the algorithms employed. Typically they provide references to
books and articles on which procedures are based, coupled with online help and “white papers” describing their
parameters and applications. This means that results produced using one package on a given dataset can rarely be
exactly matched to those produced using any other package or through hand-crafted coding. There are many
reasons for these inconsistencies including: differences in the software architectures of the various packages and
the algorithms used to implement individual methods; errors in the source materials or their interpretation; coding
errors; inconsistencies arising out of the ways in which different GIS packages model, store and manipulate
information; and differing treatments of special cases (e.g. missing values, boundaries, adjacency, obstacles,
distance computations etc.).
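The effect of differing treatments of special cases can be seen even in a trivial calculation. The following minimal sketch computes the mean of a small set of cell values under two plausible conventions for missing data. The function names and the cell values are illustrative assumptions, not drawn from any particular package.

```python
from typing import Optional, Sequence


def mean_excluding_missing(values: Sequence[Optional[float]]) -> float:
    """Mean of the valid cells only; missing cells (None) are ignored."""
    valid = [v for v in values if v is not None]
    if not valid:
        raise ValueError("no valid values")
    return sum(valid) / len(valid)


def mean_missing_as_zero(values: Sequence[Optional[float]]) -> float:
    """Mean over all cells, counting each missing cell (None) as zero."""
    if not values:
        raise ValueError("no values")
    return sum(0.0 if v is None else v for v in values) / len(values)


# Illustrative cell values with one missing observation.
cells = [2.0, 4.0, None, 6.0]
print(mean_excluding_missing(cells))  # 4.0
print(mean_missing_as_zero(cells))    # 3.0
```

Both conventions are defensible, yet they give different answers for the same data, which is one reason why results from two packages may not match exactly.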
Non-commercial packages sometimes provide source code and test data for some or all of the analytical functions
provided, although it is important to understand that “non-commercial” often does not mean that users can
download the full source code. Source code greatly aids understanding, reproducibility and further development.
Such software will often also provide details of known bugs and restrictions associated with functions — although
this information may also be provided with commercial products it is generally less transparent. In this respect
non-commercial software may meet the requirements of scientific rigor more fully than many commercial
offerings, but is often provided with limited documentation, training tools, cross-platform testing and/or technical
support, and thus is generally more demanding on the users and system administrators. In many instances open
source and similar not-for-profit GIS software may also be less generic, focusing on a particular form of spatial
representation (e.g. a grid or raster spatial model). Like some commercial software, it may also be designed with
particular application areas in mind, such as addressing problems in hydrology or epidemiology.
The process of selecting software tools encourages us to ask: (i) “what is meant by geospatial analysis
techniques?” and (ii) “what should we consider to be GIS software?” To some extent the answer to the second
question is the simpler, if we are prepared to be guided by self-selection. For our purposes we focus principally on
products that claim to provide geographic information systems capabilities, supporting at least 2D mapping
(display and output) of raster (grid based) and/or vector (point/line/polygon based) data, with a minimum of basic
map manipulation facilities. We concentrate our review on a number of the products most widely used or with the
most readily accessible analytical facilities. This leads us beyond the realm of pure GIS. For example: we use
examples drawn from packages that do not directly provide mapping facilities (e.g. the now rather outdated
software called CrimeStat) but which provide input and/or output in widely used GIS mappable formats; products
that include some mapping facilities but whose primary purpose is spatial or spatio-temporal data exploration and
analysis (e.g. GS+, GeoDa, PySAL); and products that are general- or special-purpose analytical engines
incorporating mapping capabilities (e.g. MATLAB with the Mapping Toolbox, WinBUGS). For more details on these
and other example software tools, please see the website page: [Link]/[Link]
The more difficult of the two questions above is the first — what should be considered as “geospatial analysis”? In
conceptual terms, the phrase identifies the subset of techniques that are applicable when, as a minimum, data
can be referenced on a two-dimensional frame and relate to terrestrial activities. The results of geospatial
analysis will change if the location or extent of the frame changes, or if objects are repositioned within it: if they
do not, then “everywhere is nowhere”, location is unimportant, and it is simpler and more appropriate to use
conventional, aspatial, techniques.
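This distinction can be illustrated with a small sketch. Below, an aspatial summary (the mean of an attribute) is unaffected when the same four observations are repositioned, whereas a simple spatial measure (the mean nearest-neighbour distance) changes. The coordinates, attribute values and function name are assumptions made purely for illustration.

```python
import math
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]


def mean_nearest_neighbour_distance(points: List[Point]) -> float:
    """Average distance from each point to its closest other point."""
    if len(points) < 2:
        raise ValueError("at least two points are required")
    return mean(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    )


# The same four attribute values, observed at two different layouts.
values = [10.0, 20.0, 30.0, 40.0]
clustered = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dispersed = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]

print(mean(values))                                # 25.0 for either layout
print(mean_nearest_neighbour_distance(clustered))  # 1.0
print(mean_nearest_neighbour_distance(dispersed))  # 10.0
```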
Many GIS products apply the term (geo)spatial analysis in a very narrow context. In the case of vector-based GIS
this typically means operations such as: map overlay (combining two or more maps or map layers according to
predefined rules); simple buffering (identifying regions of a map within a specified distance of one or more
features, such as towns, roads or rivers); and similar basic operations. This reflects (and is reflected in) the use of
the term spatial analysis within the Open Geospatial Consortium (OGC) “simple feature specifications” (see
further Table 4-2). For raster-based GIS, widely used in the environmental sciences and remote sensing, this
typically means a range of actions applied to the grid cells of one or more maps (or images) often involving
filtering and/or algebraic operations (map algebra). These techniques involve processing one or more raster layers
according to simple rules resulting in a new map layer, for example replacing each cell value with some
combination of its neighbors’ values, or computing the sum or difference of specific attribute values for each grid
cell in two matching raster datasets. Descriptive statistics, such as cell counts, means, variances, maxima,
minima, cumulative values, frequencies and a number of other measures and distance computations are also often
included in this generic term “spatial analysis”.
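The two kinds of raster operation described above can be sketched in a few lines. The example below uses plain Python lists as stand-ins for raster layers and shows a local operation (the cell-by-cell difference of two matching grids) and a focal operation (replacing each cell with the mean of its 3x3 neighbourhood). The function names, the grid values and the choice to clip the window at the grid edge are all assumptions made for illustration; real packages differ in how they treat edges.

```python
from typing import List

Grid = List[List[float]]


def local_difference(a: Grid, b: Grid) -> Grid:
    """Local operation: cell-by-cell difference of two matching grids."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("grids must have the same shape")
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def focal_mean(grid: Grid) -> Grid:
    """Focal operation: mean of each cell's 3x3 window, clipped at the edges."""
    rows, cols = len(grid), len(grid[0])
    out: Grid = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            # Collect the cell and its neighbours that fall inside the grid.
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


layer_a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
layer_b = [[1.0, 1.0, 1.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]]

print(local_difference(layer_a, layer_b))
print(focal_mean(layer_a)[1][1])  # 5.0 (mean of all nine cells)
print(focal_mean(layer_a)[0][0])  # 3.0 (corner window has four cells)
```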
However, at this point only the most basic of facilities have been included, albeit those that may be the most
frequently used by the greatest number of GIS professionals. To this initial set must be added a large variety of
statistical techniques (descriptive, exploratory, explanatory and predictive) that have been designed specifically
for spatial and spatio-temporal data. Today such techniques are of great importance in social and political
sciences, despite the fact that their origins may often be traced back to problems in the environmental and life
sciences, in particular ecology, geology and epidemiology. It is also to be noted that spatial statistics is largely an
observational science (like astronomy) rather than an experimental science (like agronomy or pharmaceutical
research). This aspect of geospatial science has important implications for analysis, particularly the application of
a range of statistical methods to spatial problems.
Limiting the definition of geospatial analysis to 2D mapping operations and spatial statistics remains too restrictive
for our purposes. There are other very important areas to be considered. These include: surface analysis, in
particular analyzing the properties of physical surfaces, such as gradient, aspect and visibility, and analyzing
surface-like data “fields”; network analysis, which examines the properties of natural and man-made networks in
order to understand the behavior of flows within and around such networks; and locational analysis. GIS-based
network analysis may be used to address a wide range of practical problems such as route selection and facility
location, and problems involving flows such as those found in hydrology. In many instances location problems
relate to networks and as such are often best addressed with tools designed for this purpose, but in others existing
networks may have little or no relevance or may be impractical to incorporate within the modeling process.


Problems that are not specifically network constrained, such as new road or pipeline routing, regional warehouse
location, mobile phone mast positioning, pedestrian movement or the selection of rural community health care
sites, may be effectively analyzed (at least initially) without reference to existing physical networks. Locational
analysis “in the plane” is also applicable where suitable network datasets are not available, or are too large or
expensive to be utilized, or where the location algorithm is very complex or involves the examination or simulation
of a very large number of alternative configurations.
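A very simple form of locational analysis “in the plane” is to compare a short list of candidate sites by the total demand-weighted straight-line distance to the points they would serve. The sketch below does this by direct evaluation. The demand points, weights, candidate sites and function names are hypothetical, and practical location models are generally far richer than this.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Demand = List[Tuple[Point, float]]  # (location, weight) pairs


def total_weighted_distance(site: Point, demand: Demand) -> float:
    """Sum of weight x straight-line distance from the site to each demand point."""
    return sum(weight * math.dist(site, location) for location, weight in demand)


def best_site(candidates: List[Point], demand: Demand) -> Point:
    """Candidate with the smallest total weighted distance."""
    if not candidates:
        raise ValueError("no candidate sites supplied")
    return min(candidates, key=lambda site: total_weighted_distance(site, demand))


# Hypothetical demand points (with weights) and candidate sites.
demand = [((0.0, 0.0), 1.0), ((4.0, 0.0), 1.0), ((0.0, 3.0), 2.0)]
candidates = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (2.0, 1.5)]

chosen = best_site(candidates, demand)
print(chosen, total_weighted_distance(chosen, demand))
```

No network data are needed here: distances are computed directly from coordinates, which is what makes this kind of first-pass analysis inexpensive.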
A further important aspect of geospatial analysis is visualization (or geovisualization): the use, creation and
manipulation of images, maps, diagrams, charts, 3D static and dynamic views, high resolution satellite imagery
and digital globes, and their associated tabular datasets (see further Slocum et al., 2008, Dodge et al., 2008,
Longley et al., 2015, Ch12). For early insights into how some of these developments may be applied, see Andrew
Hudson-Smith (2008) “Digital Geography: Geographic visualization for urban environments” and Martin Dodge and
Rob Kitchin’s earlier “Atlas of Cyberspace” which is available as a free downloadable document. A Working Paper
on this topic from the Centre for Advanced Spatial Analysis (CASA) can be found here:
[Link]
GIS packages and web-based services increasingly incorporate a range of such tools, providing static or rotating
views, draping images over 2.5D surface representations, providing animations and fly-throughs, dynamic linking
and brushing and spatio-temporal visualizations. This latter class of tools has been, until recently, the least
developed, reflecting in part the limited range of suitable compatible datasets and the limited set of analytical
methods available, although this picture is changing rapidly. One recent example is the availability of image time
series from NASA’s Earth Observation Satellites, yielding vast quantities of data on a daily basis (e.g. Aqua
mission, commenced 2002; Terra mission, commenced 1999).
Geovisualization is the subject of ongoing research by the International Cartographic Association (ICA),
Commission on Visual Analytics, who have organized a series of workshops and publications addressing
developments in geovisualization, notably with a cartographic focus.
As datasets, software tools and processing capabilities develop, 3D geometric and photo-realistic visualization are
becoming a sine qua non of modern geospatial systems and services — see Andy Hudson-Smith’s “Digital Urban”
blog for a regularly updated commentary on this field. We expect to see an explosion of tools and services and
datasets in this area over the coming years — many examples are included as illustrations in this Guide. Other
examples readers may wish to explore include: the static and dynamic visualizations at 3DNature and similar sites;
the 2D and 3D Atlas of Switzerland; Urban 3D modeling programmes such as CityGML; and the integration of GIS
technologies and data with digital globe software, e.g. data from Digital Globe (Maxar), and Earth-based
frameworks such as Google Earth, Microsoft Virtual Earth, NASA Worldwind and Edushi (Chinese). There are also
automated translators between GIS packages such as ArcGIS and Digital Earth models (see for example the now
withdrawn Arc2Earth).
These novel visualization tools and facilities augment the core tools utilized in spatial analysis throughout many
parts of the analytical process: exploration of data; identification of patterns and relationships; construction of
models; dynamic interaction with models; and communication of results — see, for example, the work of the city
of Portland, Oregon, who have used 3D visualization to communicate the results of zoning, crime analysis and
other key local variables to the public; and with Covid at the forefront of much work and reporting from 2020
onward, the US interactive Covid mapping project (now ended). Another example is the 3D visualizations provided
as part of the web-accessible London Air Quality network. These are designed to enable:
· users to visualize air pollution in the areas that they work, live or walk
· transport planners to identify the most polluted parts of London
· urban planners to see how building density affects pollution concentrations in the City and other high density
areas, and
· students to understand pollution sources and dispersion characteristics


Physical 3D models and hybrid physical-digital models have also been developed and applied to practical analysis
problems. For example: 3D physical models constructed from plaster, wood, paper and plastics have been used for
many years in architectural and engineering planning projects; hybrid sand tables are used to help firefighters in
California visualize the progress of wildfires (see Figure 1-1A, below, and tools from companies like SimTable); very
large sculptured solid terrain models are being used for educational purposes, to assist land use modeling
programmes, and to facilitate participatory 3D modeling in less-developed communities (P3DM); and 3D digital
printing technology is being used to generate 3D landscapes and cityscapes from GIS, CAD and/or VRML files with
planning, security, architectural, archaeological and geological applications (see Figure 1-1B, below and the
websites of 3D Systems and Stratasys for more details). To create large landscape models, multiple individual
prints, which are typically only around 20cm x 20cm x 5cm, are made and assembled, in much the same manner as
raster file mosaics.
Figure 1-1A: 3D Physical GIS models: Sand-in-a-box model, Albuquerque, USA

Figure 1-1B: 3D Physical GIS models: 3D GIS printing

GIS software, notably in the commercial sphere, is driven primarily by demand and applicability, as manifest in
willingness to pay. Hence, to an extent, the facilities available often reflect commercial and resourcing realities
(including the development of improvements in processing and display hardware, and the ready availability of high
quality datasets) rather than the status of development in geospatial science. Indeed, there may be many
capabilities available in software packages that are provided simply because it is extremely easy for the designers
and programmers to implement them, especially those employing object-oriented programming and data models.
For example, a given operation may be provided for polygonal features in response to a well-understood
application requirement, which is then easily enabled for other features (e.g. point sets, polylines) despite the
fact that there may be no known or likely requirement for the facility.
Despite this cautionary note, for specific well-defined or core problems, software developers will frequently utilize
the most up-to-date research on algorithms in order to improve the quality (accuracy, optimality) and efficiency
(speed, memory usage) of their products. For further information on algorithms and data structures, see the
online NIST Dictionary of algorithms and data structures.
Furthermore, the quality, variety and efficiency of spatial analysis facilities provide an important discriminator
between commercial offerings in an increasingly competitive and open market for software and spatial datasets.
However, the ready availability of analysis tools does not imply that one product is necessarily better or more
complete than another — it is the selection and application of appropriate tools in a manner that is fit for purpose
that is important. Guidance documents exist in some disciplines that assist users in this process, e.g. Perry et al.
(2002) dealing with ecological data analysis, and to a significant degree we hope that this Guide will assist users
from many disciplines in the selection process.


1.2 Intended audience and scope


This Guide has been designed to be accessible to a wide range of readers — from undergraduates and
postgraduates studying GIS and spatial analysis, to GIS practitioners and professional analysts. It is intended to be
much more than a cookbook of formulas, algorithms and techniques: its aim is to provide an explanation of the
key techniques of spatial analysis using examples from widely available software packages. It stops short,
however, of attempting a systematic evaluation of competing software products. A substantial range of
application examples are provided, but any specific selection inevitably illustrates only a small subset of the huge
range of facilities available. Wherever possible, examples have been drawn from non-academic sources,
highlighting the growing understanding and acceptance of GIS technology in the commercial and government
sectors.
The scope of this Guide incorporates the various spatial analysis topics included within the seminal NCGIA Core
Curriculum (Goodchild and Kemp, 1990, now updated) and as such may provide a useful accompaniment to GIS
Analysis courses based closely or loosely on this programme. More recently the Education Committee of the
University Consortium for Geographic Information Science (UCGIS) in conjunction with the Association of American
Geographers (AAG) has produced a comprehensive “Body of Knowledge” (BoK) document (see the associated
website [Link]). This Guide covers materials that primarily relate to
the BoK sections CF: Conceptual Foundations; AM: Analytical Methods and GC: Geocomputation. In the general
introduction to the AM knowledge area the authors of the BoK summarize this component as follows:
“This knowledge area encompasses a wide variety of operations whose objective is to derive analytical results
from geospatial data. Data analysis seeks to understand both first-order (environmental) effects and second-
order (interaction) effects. Approaches that are both data-driven (exploration of geospatial data) and model-
driven (testing hypotheses and creating models) are included. Data-driven techniques derive summary
descriptions of data, evoke insights about characteristics of data, contribute to the development of research
hypotheses, and lead to the derivation of analytical results. The goal of model-driven analysis is to create and
test geospatial process models. In general, model-driven analysis is an advanced knowledge area where previous
experience with exploratory spatial data analysis would constitute a desired prerequisite.” (BoK, p83 of the e-
book version, first edition).


1.3 Software tools and Companion Materials


In this section you will find the following topics:
GIS and related software tools
Suggested reading

1.3.1 GIS and related software tools


The GIS software and analysis tools that an individual, group or corporate body chooses to use will depend very
much on the purposes to which they will be put. There is an enormous difference between the requirements of
academic researchers and educators, and those with responsibility for planning and delivery of emergency control
systems or large scale physical infrastructure projects. The spectrum of products that may be described as a GIS
includes (amongst others):
· mainstream GIS software packages and services, such as those from Caliper (Maptitude/TransCAD), Clark Labs
(TerrSet/IDRISI), ESRI (ArcGIS Pro and related toolboxes and services), GRASS (open source), Manifold, and
MicroImages (TNT/Datum Workstation)
· highly specialized, sector specific packages: for example civil engineering design and costing systems;
satellite image processing systems; and utility infrastructure management systems
· transportation and logistics management systems
· civil and military control room systems
· systems for visualizing the built environment for architectural purposes, for public consultation or as part of
simulated environments for interactive gaming
· land registration systems
· census data management systems
· commercial location services and Digital Earth models
· geospatial data visualization tools
· AI-augmented geospatial data services and software tools
· online versions of offline (desktop) GIS software
The list of software functions and applications is long and in some instances suppliers would not describe their
offerings as a GIS. In many cases such systems fulfill specific operational needs, solving a well-defined subset of
spatial problems and providing mapped output as an incidental but essential part of their operation. Many of the
capabilities may be found in generic GIS products. In other instances a specialized package may utilize a GIS
engine for the display and in some cases processing of spatial data (directly, or indirectly through interfacing or
file input/output mechanisms). For this reason, and in order to draw a boundary around the present work,
reference to application-specific GIS will be limited.
A number of GIS packages and related toolsets have particularly strong facilities for processing and analyzing
binary, grayscale and color images. They may have been designed originally for the processing of remotely sensed
data from satellite and aerial surveys, but many have developed into much more sophisticated and complete GIS
tools, e.g. Clark Labs’ TerrSet/IDRISI software; MicroImages’ TNT/Datum Workstation product set; the ERDAS suite
of products; and ENVI with associated packages such as RiverTools. Alternatively, image handling may have been
deliberately included within the original design parameters for a generic GIS package (e.g. Manifold), or simply be
toolsets for image processing that may be combined with mapping tools (e.g. the MATLAB Image Processing
Toolbox). Whatever their origins, a central purpose of such tools has been the capture, manipulation and
interpretation of image data, rather than spatial analysis per se, although the latter inevitably follows from the
former.
In this Guide we do not provide a separate chapter on image processing, despite its considerable importance in
GIS, focusing instead on those areas where image processing tools and concepts are applied for spatial analysis
(e.g. surface analysis). We have adopted a similar position with respect to other forms of data capture, such as
field and geodetic survey systems and data cleansing software — although these incorporate analytical tools, their
primary function remains the recording and georeferencing of datasets, rather than the analysis of such datasets
once stored.
For most GIS professionals, spatial analysis and associated modeling is an infrequent activity. Even for those whose
job focuses on analysis the range of techniques employed tends to be quite narrow and application specific. GIS
consultants, researchers and academics on the other hand are continually exploring and developing analytical
techniques. For the first group and for consultants, especially in commercial environments, the imperatives of
financial considerations, timeliness and corporate policy loom large, directing attention to: delivery of solutions
within well-defined time and cost parameters; working within commercial constraints on the cost and availability
of software, datasets and staffing; ensuring that solutions are fit for purpose/meet client and end-user
expectations and agreed standards; and in some cases, meeting “political” expectations.
For the second group of users it is common to make use of a variety of tools, data and programming facilities
developed in the academic sphere. Increasingly these make use of non-commercial wide-ranging spatial analysis
software libraries, such as the R-Spatial project (in “R”); PySAL (in “Python”); and Splancs (in “S”). Many of these
tools include support for a variety of file types and formats, including those required for mapping the results (e.g.
handling of ESRI shapefiles, shp). For more details and worked examples using R, see Lansley and
Cheshire's (2016) Introductory Guide downloadable as a PDF.

Sample software products


The principal products we have included in this latest edition of the Guide are listed on the accompanying
website’s software page. Many of these products are free whilst others are available (at least in some form) for a
small fee for all or selected groups of users. Others are licensed at varying per user prices, from a few hundred to
over a thousand US dollars per user. Our tests and examples have largely been carried out using desktop/Windows
versions of these software products. Different versions that support Unix-based operating systems and more
sophisticated back-end database engines have not been utilized. In the context of this Guide we do not believe
these selections affect our discussions in any substantial manner, although such issues may have performance and
systems architecture implications that are extremely important for many users. Increasingly online services are
being offered that provide access to very large datasets, data mining and AI-driven analytical tools; brief
discussion of the opportunities and issues relating to some of these services is provided in the final Chapter of this
book. OGC compliant software products are listed on the OGC resources web page:
[Link]
To quote from the latest OGC statement: “The OGC Compliance Program is a certification process that ensures
organizations’ solutions are compliant with OGC Standards. It is a universal credential that allows agencies,
industry, and academia to better integrate their solutions. OGC compliance provides confidence that a product
will seamlessly integrate with other compliant solutions regardless of the vendor that created them.”

Software performance
Suppliers should be able to provide advice on performance issues (e.g. see the ESRI web site, "Services" area for
relevant material relating to their products) and in some cases such information is provided within product Help
files (e.g. see the Performance Tips section within the Manifold GIS support section). Some analytical tasks are
very processor- and memory-hungry, particularly as the number of elements involved increases. For example,
vector overlay and buffering are relatively fast with a few objects and layers, but slow appreciably as the number
of elements involved increases. The increase in processing time is generally at least linear with the number of
layers and features, but for some problems grows in a highly non-linear (i.e. geometric) manner. Offline (desktop)
software like Manifold has, in recent years, been dramatically improved in terms of data loading and processing
speed, notably by using multiple cores and processor units, in particular NVIDIA GPUs.
Many optimization tasks, such as optimal routing through networks or trip distribution modeling, are known to be
extremely hard or impossible to solve optimally and methods to achieve a best solution with a large dataset can
take a considerable time to run (see our discussion of Algorithms and computational complexity theory later in this
Guide for a fuller discussion of this topic). Similar problems exist with the processing and display of raster files,
especially large images or sets of images. Geocomputational methods, some of which are beginning to appear
within GIS packages and related toolsets, are almost by definition computationally intensive. This certainly applies
to large-scale (Monte Carlo) simulation models, cellular automata and agent-based models and some raster-based
optimization techniques, especially where modeling extends into the time domain.
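The scale of the difficulty can be illustrated with some simple counting. The sketch below compares the number of feature pairs that a naive all-against-all comparison must examine with the number of distinct round trips through n locations that a brute-force search for the shortest tour would have to evaluate. The function names are our own, and practical software uses spatial indexing and heuristics precisely to avoid such exhaustive enumeration.

```python
import math


def pairwise_comparisons(n: int) -> int:
    """Pairs of features examined by a naive all-against-all test: n(n-1)/2."""
    return n * (n - 1) // 2


def round_trip_tours(n: int) -> int:
    """Distinct closed tours visiting n locations once each: (n-1)!/2.

    A tour and its reverse are counted as the same tour, so n must be >= 3.
    """
    if n < 3:
        raise ValueError("at least three locations are required")
    return math.factorial(n - 1) // 2


for n in (5, 10, 15):
    print(n, pairwise_comparisons(n), round_trip_tours(n))
```

The pair count grows quadratically, whereas the tour count grows factorially: with only 15 locations there are already more than 43 billion possible tours.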
A frequent criticism of GIS software is that it is over-complicated, resource-hungry and requires specialist
expertise to understand and use. Such criticisms are often valid and for many problems it may prove simpler,
faster and more transparent to utilize specialized tools for the analytical work and draw on the strengths of GIS in
data management and mapping to provide input/output and visualization functionality. Example approaches
include: (i) using high-level programming facilities within a GIS (e.g. macros, scripts, VBA, Python) – many add-ins
are developed in this way; (ii) using wide-ranging programmable spatial analysis software libraries and toolsets
that incorporate GIS file reading, writing and display, such as the R-Spatial and PySAL projects noted earlier; (iii)
using general purpose data processing toolsets, e.g. MATLAB, Excel, Matplotlib, Numeric Python (NumPy), Cython
("an optimizing static compiler for both the Python programming language and the extended Cython programming
language (based on Pyrex). It makes writing C extensions for Python as easy as Python itself"; see further
[Link]) and other libraries from Enthought; or (iv) directly utilizing mainstream programming
languages (e.g. Java, C++). The advantage of these approaches is control and transparency; the disadvantages are
that software development is never trivial, is often subject to frustrating and unforeseen delays and errors, and
generally requires ongoing maintenance. In some instances analytical applications may be well-suited to parallel
or grid-enabled processing – as for example is the case with GWR (see Harris et al., 2006). In some instances many
of these issues are effectively 'outsourced' to a geospatial services provider, with datasets, mapping and analytical
toolsets delivered through a single, user-friendly interface, with all the pros and cons of such packaged offerings.
At present there are no standardized tests for the quality, speed and accuracy of GIS procedures. It remains the
buyer’s and user’s responsibility and duty to evaluate the software or service they wish to use for the specific task
at hand, and by systematic controlled tests or by other means establish that the product and facility within that
product they choose to use is truly fit for purpose — caveat emptor! Details of how to obtain many of these
products are provided on the software page of the website that accompanies this book. The list maintained on
Wikipedia is also a useful source of information and links, although it is far from being complete or independent. A
number of trade magazines and websites provide ad hoc reviews of GIS software and service offerings, especially
new releases, although coverage of analytical functionality may be limited.
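One simple form of controlled test is to run a procedure on inputs for which the correct answer is known analytically, and to compare the result within a stated tolerance. In the sketch below, polygon_area is a hypothetical stand-in (using the standard shoelace formula) for whichever package function is being evaluated; the check helper and the tolerance are likewise assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_area(vertices: List[Point]) -> float:
    """Planar polygon area by the shoelace formula (stand-in for the procedure under test)."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def check(name: str, computed: float, expected: float, rel_tol: float = 1e-9) -> bool:
    """Report whether a computed value matches the known answer within tolerance."""
    ok = math.isclose(computed, expected, rel_tol=rel_tol)
    print(f"{name}: computed={computed}, expected={expected}, {'PASS' if ok else 'FAIL'}")
    return ok


# Test shapes whose areas are known exactly.
unit_square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
right_triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]

check("unit square", polygon_area(unit_square), 1.0)
check("3-4-5 right triangle", polygon_area(right_triangle), 6.0)
```

The same pattern extends to buffers, distances and overlays: choose cases simple enough to solve by hand, then add awkward ones (holes, self-touching boundaries, missing values) to probe how special cases are treated.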

1.3.2 Suggested reading


There are numerous excellent modern books on GIS and spatial analysis, although few address software facilities
and developments. Hypertext links are provided here, and throughout the text where they are cited, to the more
recent publications and web resources listed.
As a background to this Guide any readers unfamiliar with GIS are encouraged to first tackle “Geographic
Information Science and Systems” (GISSc) by Longley et al. (2015). GISSc seeks to provide a comprehensive and
highly accessible introduction to the subject as a whole. The GB Ordnance Survey’s “GIS pages” also provides an
excellent brief introduction to GIS and its applications.
Some of the basic mathematics and statistics of relevance to GIS analysis is covered in Dale (2005) and Allan
(2004). For detailed information on datums and map projections, see Iliffe and Lott (2008). Useful online
resources for those involved in data analysis, particularly with a statistical content, include the StatsRef website
and the e-Handbook of Statistical Methods produced by the US National Institute of Standards and Technology
(NIST). The more informally produced set of articles on statistical topics provided under the Wikipedia umbrella is
also an extremely useful resource. These sites, and the mathematics reference site, Mathworld, are referred to
(with hypertext links) at various points throughout this document. For those who find mathematics and statistics
something of a mystery, de Smith (2018) and Bluman (2003) provide useful starting points. For guidance on how to
avoid the many pitfalls of statistical data analysis readers are recommended the material in the classic work by
Huff (1993) “How to lie with statistics”, and the 2008 book by Blastland and Dilnot “The tiger that isn’t”.
A relatively new development has been the increasing availability of out-of-print published books, articles and
guides as free downloads in PDF format. These include: the series of 59 short guides published under the CATMOG
umbrella (Concepts and Methods in Modern Geography), published between 1975 and 1995, most of which are now
available at the QMRG website (a full list of all the guides is provided at the end of this book); the Atlas of
Cyberspace by Dodge and Kitchin; and Fractal Cities, by Batty and Longley.
Undergraduates and MSc programme students will find that Burrough and McDonnell (1998, 2015) provide excellent
coverage of many aspects of geospatial analysis, especially from an environmental sciences perspective. Valuable
guidance on the relationship between spatial process and spatial modeling may be found in Cliff and Ord (1981)
and Bailey and Gatrell (1995). The latter provides an excellent introduction to the application of statistical
methods to spatial data analysis. O’Sullivan and Unwin (2010, 2nd ed.) is a more broad-ranging book covering the
topic the authors describe as “Geographic Information Analysis”. This work is best suited to advanced
undergraduates and first year postgraduate students. In many respects a deeper and more challenging work is
Haining’s (2003) “Spatial Data Analysis — Theory and Practice”. This book is strongly recommended as a
companion to the present Guide for postgraduate researchers and professional analysts involved in using GIS in
conjunction with statistical analysis.
However, these authors do not address the broader spectrum of geospatial analysis and associated modeling as we
have defined it. For example, problems relating to networks and location are often not covered and the literature
relating to this area is scattered across many disciplines, being founded upon the mathematics of graph theory,
with applications ranging from electronic circuit design to computer networking and from transport planning to the
design of complex molecular structures. Useful books addressing this field include Miller and Shaw (2001)
“Geographic Information Systems for Transportation” (especially Chapters 3, 5 and 6), and Rodrigue et al. (2006)
“The geography of transport systems” (see further: [Link]).
As companion reading on these topics for the present Guide we suggest the two volumes from the Handbooks in
Operations Research and Management Science series by Ball et al. (1995): “Network Models”, and “Network
Routing”. These rather expensive volumes provide collections of reviews covering many classes of network
problems, from the core optimization problems of shortest paths and arc routing (e.g. street cleaning), to the
complex problems of dynamic routing in variable networks, and a great deal more besides. This is challenging
material and many readers may prefer to seek out more approachable material, available in a number of other
books and articles, e.g. Ahuja et al. (1993), Mark Daskin’s excellent book “Network and Discrete Location” (1995)
and the earlier seminal works by Haggett and Chorley (1969), and Scott (1971), together with the widely available
online materials accessible via the Internet. Final recommendations here are Stephen Wise’s excellent GIS Basics
(2002) and Worboys and Duckham (2004), which address GIS from a computing perspective. Both these volumes
cover many topics, including the central issues of data modeling and data structures, key algorithms, system
architectures and interfaces.


Many recent books described as covering (geo)spatial analysis are essentially edited collections of papers or brief
articles. As such most do not seek to provide comprehensive coverage of the field, but tend to cover information
on recent developments, often with a specific application focus (e.g. health, transport, archaeology). The latter is
particularly common where these works are selections from sector- or discipline-specific conference proceedings,
whilst in other cases they are carefully chosen or specially written papers. Classic amongst these is Berry and
Marble (1968) “Spatial Analysis: A reader in statistical geography”. More recent examples include “GIS, Spatial
Analysis and Modeling” edited by Maguire, Batty and Goodchild (2005), and the excellent (but costly) compendium
work “The SAGE handbook of Spatial Analysis” edited by Fotheringham and Rogerson (2008).
A second category of companion materials to the present work is the extensive product-specific documentation
available from software suppliers. Some of the online help files and product manuals are excellent, as are
associated example data files, tutorials, worked examples and white papers (see, for example, ESRI’s What is GIS?,
which provides a wide-ranging guide to GIS). In many instances we utilize these to illustrate the capabilities of
specific pieces of software and to enable readers to replicate our results using readily available materials. In
addition some suppliers, notably ESRI, have a substantial publishing operation, including more general (i.e. not
product specific) books of relevance to the present work. Amongst their publications we strongly recommend the
“ESRI Guide to GIS Analysis Volume 1: Geographic patterns and relationships” (1999) by Andy Mitchell, which is full
of valuable tips and examples. This is a basic introduction to GIS Analysis, which he defines in this context as “a
process for looking at geographic patterns and relationships between features”. Mitchell’s Volume 2 (July 2005)
covers more advanced techniques of data analysis, notably some of the more accessible and widely supported
methods of spatial statistics, and is equally highly recommended. A number of the topics covered in his Volume 2
also appear in this Guide. David Allen has produced a tutorial book and DVD (GIS Tutorial II: Spatial Analysis
Workbook) to go alongside Mitchell’s volumes, and these are obtainable from ESRI Press. For a perspective on
modern applications of GIS and geospatial analysis see Esri's GIS For Science series (Volume 3 is the latest and
Chapter 7 focuses on geospatial analysis and AI, also referred to now as GeoAI) and the recently published
collection of edited chapters brought together as the "Handbook of Geospatial Artificial Intelligence" (2024, Song
Gao et al., with a Foreword by Prof Mike Goodchild). Those considering using Open Source software may wish to
investigate the books by Neteler and Mitasova (2008), Tyler Mitchell (2005) and Sherman (2008).
In parallel with the increasing range and sophistication of spatial analysis facilities to be found within GIS
packages, there has been a major change in spatial analytical techniques. In large measure this has come about as
a result of technological developments and the related availability of software tools and detailed publicly
available datasets. One aspect of this has been noted already — the move towards network-based location
modeling where in the past this would have been unfeasible. More general shifts can be seen in the move towards
local rather than simply global analysis, for example in the field of exploratory data analysis; in the increasing use
of advanced forms of visualization as an aid to analysis and communication; in the growth of online spatial data,
mapping and analysis services; in the development of a wide range of computationally intensive and simulation
methods that address problems through micro-scale processes (geocomputational methods); and finally, in the
recent development of AI-augmented services, many of which fit within the somewhat portmanteau term, GeoAI.
These trends are addressed at many points throughout this Guide.


1.4 Terminology and Abbreviations


GIS, like all disciplines, utilizes a wide range of terms and abbreviations, many of which have well-understood and
recognized meanings. For a large number of commonly used terms online dictionaries have been developed, for
example: those created by the Association for Geographic Information (AGI); the Open Geospatial Consortium
(OGC); and by various software suppliers. The latter includes many terms and definitions that are particular to
specific products, but remain a valuable resource. Web site details for each of these are provided at the end of
this Guide.

1.4.1 Definitions
Geospatial analysis utilizes many of these terms, but many others are drawn from disciplines such as mathematics
and statistics. The result is that the same terms may mean entirely different things depending on their context and,
in many cases, on the software provider utilizing them. In most instances terms used in this Guide are defined on
the first occasion they are used, but a number warrant defining at this stage. Table 1-1, below, provides a
selection of such terms, utilizing definitions from widely recognized sources where available and appropriate.

Table 1-1 Selected terminology

Term Definition

Adjacency The sharing of a common side or boundary by two or more polygons (AGI). Note that
adjacency may also apply to features that lie either side of a common boundary where
these features are not necessarily polygons

Arc Commonly used to refer to a straight line segment connecting two nodes or vertices of a
polyline or polygon. Arcs may include segments or circles, spline functions or other forms
of smooth curve. In connection with graphs and networks, arcs may be directed or
undirected, and may have other attributes (e.g. cost, capacity etc.)

Artifact A result (observation or set of observations) that appears to show something unusual (e.g.
a spike in the surface of a 3D plot) but which is of no significance. Artifacts may be
generated by the way in which data have been collected, defined or re-computed (e.g.
resolution changing), or as a result of a computational operation (e.g. rounding error or
substantive software error). Linear artifacts are sometimes referred to as “ghost lines”

Aspect The direction in which slope is maximized for a selected point on a surface (see also,
Gradient and Slope)

Attribute A data item associated with an individual object (record) in a spatial database. Attributes
may be explicit, in which case they are typically stored as one or more fields in tables
linked to a set of objects, or they may be implicit (sometimes referred to as intrinsic),
being either stored but hidden or computed as and when required (e.g. polyline length,
polygon centroid). Raster/grid datasets typically have a single explicit attribute (a value)
associated with each cell, rather than an attribute table containing as many records as
there are cells in the grid

Azimuth The horizontal direction of a vector, measured clockwise in degrees of rotation from the
positive Y-axis, for example, degrees on a compass (AGI)


Azimuthal Projection A type of map projection constructed as if a plane were to be placed at a tangent to the
Earth's surface and the area to be mapped were projected onto the plane. All points on
this projection keep their true compass bearing (AGI)

(Spatial) Autocorrelation The degree of relationship that exists between two or more (spatial) variables, such that
when one changes, the other(s) also change. This change can either be in the same
direction, which is a positive autocorrelation, or in the opposite direction, which is a
negative autocorrelation (AGI). The term autocorrelation is usually applied to ordered
datasets, such as those relating to time series or spatial data ordered by distance band.
The existence of such a relationship suggests but does not definitely establish causality

Cartogram A cartogram is a form of map in which some variable such as Population Size or Gross
National Product typically is substituted for land area. The geometry or space of the map
is distorted in order to convey the information of this alternate variable. Cartograms use a
variety of approaches to map distortion, including the use of continuous and discrete
regions. The term cartogram (or linear cartogram) is also used on occasion to refer to
maps that distort distance for particular display purposes, such as the London Underground
map

Choropleth A thematic map [i.e. a map showing a theme, such as soil types or rainfall levels]
portraying properties of a surface using area symbols such as shading [or color]. Area
symbols on a choropleth map usually represent categorized classes of the mapped
phenomenon (AGI)

Conflation A term used to describe the process of combining (merging) information from two data
sources into a single source, reconciling disparities where possible (e.g. by rubber-sheeting
— see below). The term is distinct from concatenation which refers to combinations of
data sources (e.g. by overlaying one upon another) but retaining access to their distinct
components

Contiguity The topological identification of adjacent polygons by recording the left and right polygons
of each arc. Contiguity is not concerned with the exact locations of polygons, only their
relative positions. Contiguity data can be stored in a table, matrix or simply as [i.e. in] a
list, that can be cross-referenced to the relevant co-ordinate data if required (AGI).

Curve A one-dimensional geometric object stored as a sequence of points, with the subtype of
curve specifying the form of interpolation between points. A curve is simple if it does not
pass through the same point twice (OGC). A LineString (or polyline — see below) is a
subtype of a curve

Datum Strictly speaking, the singular of data. In GIS the word datum usually relates to a
reference level (surface) applying on a nationally or internationally defined basis from
which elevation is to be calculated. In the context of terrestrial geodesy datum is usually
defined by a model of the Earth or section of the Earth, such as WGS84 (see below). The
term is also used for horizontal referencing of measurements; see Iliffe and Lott (2008) for
full details

DEM Digital elevation model (a DEM is a particular kind of DTM, see below)

DTM Digital terrain model

EDM Electronic distance measurement

EDA, ESDA Exploratory data analysis/Exploratory spatial data analysis

Ellipsoid/Spheroid An ellipse rotated about its minor axis determines a spheroid (sphere-like object), also
known as an ellipsoid of revolution (see also, WGS84)

Feature Frequently used within GIS referring to point, line (including polyline and mathematical
functions defining arcs), polygon and sometimes text (annotation) objects (see also,
vector)

Geoid An imaginary shape for the Earth defined by mean sea level and its imagined continuation
under the continents at the same level of gravitational potential (AGI)

Geodemographics The analysis of people by where they live, in particular by type of neighborhood. Such
localized classifications have been shown to be powerful discriminators of consumer
behavior and related social and behavioral patterns

Geospatial Referring to location relative to the Earth's surface. "Geospatial" is more precise in many
GI contexts than "geographic," because geospatial information is often used in ways that
do not involve a graphic representation, or map, of the information (OGC)

Geostatistics Statistical methods developed for and applied to geographic data. These statistical
methods are required because geographic data do not usually conform to the requirements
of standard statistical procedures, due to spatial autocorrelation and other problems
associated with spatial data (AGI). The term is widely used to refer to a family of tools
used in connection with spatial interpolation (prediction) of (piecewise) continuous
datasets and is widely applied in the environmental sciences. Spatial statistics is a term
more commonly applied to the analysis of discrete objects (e.g. points, areas) and is
particularly associated with the social and health sciences

Geovisualization A family of techniques that provide visualizations of spatial and spatio-temporal datasets,
extending from static, 2D maps and cartograms, to representations of 3D using perspective
and shading, solid terrain modeling and increasingly extending into dynamic visualization
interfaces such as linked windows, digital globes, fly-throughs, animations, virtual reality
and immersive systems. Geovisualization is the subject of ongoing research by the
International Cartographic Association (ICA), Commission on Geovisualization


GIS-T GIS applied to transportation problems

GPS/ DGPS Global positioning system; Differential global positioning system — DGPS provides improved
accuracy over standard GPS by the use of one or more fixed reference stations that
provide corrections to GPS data

Gradient Used in spatial analysis with reference to surfaces (scalar fields). Gradient is a vector field
comprised of the aspect (direction of maximum slope) and slope computed in this direction
(magnitude of rise over run) at each point of the surface. The magnitude of the gradient
(the slope or inclination) is sometimes itself referred to as the gradient (see also, Slope
and Aspect)

Graph A collection of vertices and edges (links between vertices) constitutes a graph. The
mathematical study of the properties of graphs and paths through graphs is known as graph
theory

Heuristic A term derived from the same Greek root as Eureka, heuristic refers to procedures for
finding solutions to problems that may be difficult or impossible to solve by direct means.
In the context of optimization heuristic algorithms are systematic procedures that seek a
good or near optimal solution to a well-defined problem, but not one that is necessarily
optimal. They are often based on some form of intelligent trial and error or search
procedure

iid An abbreviation for “independently and identically distributed”. Used in statistical analysis
in connection with the distribution of errors or residuals

Invariance In the context of GIS invariance refers to properties of features that remain unchanged
under one or more (spatial) transformations

Kernel Literally, the core or central part of an item. Often used in computer science to refer to
the central part of an operating system, the term kernel in geospatial analysis refers to
methods (e.g. density modeling, local grid analysis) that involve calculations using a well-
defined local neighborhood (block of cells, radially symmetric function)

Layer A collection of geographic entities of the same type (e.g. points, lines or polygons).
Grouped layers may combine layers of different geometric types

Map algebra A range of actions applied to the grid cells of one or more maps (or images) often involving
filtering and/or algebraic operations. These techniques involve processing one or more
raster layers according to simple rules resulting in a new map layer, for example replacing
each cell value with some combination of its neighbors’ values, or computing the sum or
difference of specific attribute values for each grid cell in two matching raster datasets

Mashup A recently coined term used to describe websites whose content is composed from multiple
(often distinct) data sources, such as a mapping service and property price information,
constructed using programmable interfaces to these sources (as opposed to simple
compositing or embedding)

MBR/ MER Minimum bounding rectangle/Minimum enclosing (or envelope) rectangle (of a feature set)

Planar/non-planar/planar enforced Literally, lying entirely within a plane surface. A polygon set is said to be planar enforced
if every point in the set lies in exactly one polygon, or on the boundary between two or
more polygons. See also, planar graph. A graph or network with edges crossing (e.g.
bridges/underpasses) is non-planar

Planar graph If a graph can be drawn in the plane (embedded) in such a way as to ensure edges only
intersect at points that are vertices then the graph is described as planar

Pixel/image Picture element — a single defined point of an image. Pixels have a “color” attribute
whose value will depend on the encoding method used. They are typically either binary
(0/1 values), grayscale (effectively a color mapping with values, typically in the integer
range [0,255]), or color with values from 0 upwards depending on the number of colors
supported. Image files can be regarded as a particular form of raster or grid file

Polygon A closed figure in the plane, typically comprised of an ordered set of connected vertices,
$v_1, v_2, \ldots, v_{n-1}, v_n = v_1$, where the connections (edges) are provided by straight line segments. If
the sequence of edges is not self-crossing it is called a simple polygon. A point is inside a
simple polygon if traversing the boundary in a clockwise direction the point is always on
the right of the observer. If every pair of points inside a polygon can be joined by a
straight line that also lies inside the polygon then the polygon is described as being convex
(i.e. the interior is a connected point set). The OGC definition of a polygon is “a planar
surface defined by 1 exterior boundary and 0 or more interior boundaries. Each interior
boundary defines a hole in the polygon”

Polyhedral surface A Polyhedral surface is a contiguous collection of polygons, which share common boundary
segments (OGC). See also, Tesseral/Tessellation

Polyline An ordered set of connected vertices, $v_1, v_2, \ldots, v_{n-1}, v_n \neq v_1$, where the connections (edges) are
provided by straight line segments. The vertex $v_1$ is referred to as the start of the polyline
and $v_n$ as the end of the polyline. The OGC specification uses the term LineString which it
defines as: "a curve with linear interpolation between points. Each consecutive pair of
points defines a line segment"

Raster/grid A data model in which geographic features are represented using discrete cells, generally
squares, arranged as a (contiguous) rectangular grid. A single grid is essentially the same
as a two-dimensional matrix, but is typically referenced from the lower left corner rather
than the norm for matrices, which are referenced from the upper left. Raster files may
have one or more values (attributes or bands) associated with each cell position or pixel


Resampling 1. Procedures for (automatically) adjusting one or more raster datasets to ensure that the
grid resolutions of all sets match when carrying out combination operations. Resampling is
often performed to match the coarsest resolution of a set of input rasters. Increasing
resolution rather than decreasing requires an interpolation procedure such as bicubic
spline.
2. The process of reducing image dataset size by representing a group of pixels with a
single pixel. Thus, pixel count is lowered, individual pixel size is increased, and overall
image geographic extent is retained. Resampled images are “coarse” and have less
information than the images from which they are taken. Conversely, this process can also
be executed in the reverse (AGI)
3. In a statistical context the term resampling (or re-sampling) is sometimes used to
describe the process of selecting a subset of the original data, such that the samples can
reasonably be expected to be independent

Rubber sheeting A procedure to adjust the co-ordinates of all of the data points in a dataset to allow a more
accurate match between known locations and a few data points within the dataset.
Rubber sheeting … preserves the interconnectivity or topology, between points and objects
through stretching, shrinking or re-orienting their interconnecting lines (AGI). Rubber-
sheeting techniques are widely used in the production of Cartograms (op. cit.)

Slope The amount of rise of a surface (change in elevation) divided by the distance over which
this rise is computed (the run), along a straight line transect in a specified direction. The
run is usually defined as the planar distance, in which case the slope is the tangent, tan(), of the angle of inclination.
Unless the surface is flat the slope at a given point on a surface will (typically) have a
maximum value in a particular direction (depending on the surface and the way in which
the calculations are carried out). This direction is known as the aspect. The vector
consisting of the slope and aspect is the gradient of the surface at that point (see also,
Gradient and Aspect)

Spatial econometrics A subset of econometric methods that is concerned with spatial aspects present in cross-sectional
and space-time observations. These methods focus in particular on two forms of
so-called spatial effects in econometric models, referred to as spatial dependence and
spatial heterogeneity (Anselin, 1988, 2006)

Spheroid A flattened (oblate) form of a sphere, or ellipse of revolution. The most widely used model
of the Earth is that of a spheroid, although the detailed form is slightly different from a
true spheroid

SQL/Structured Query Language Within GIS software SQL extensions known as spatial queries are frequently implemented.
These support queries that are based on spatial relationships rather than simply attribute
values

Surface A 2D geometric object. A simple surface consists of a single ‘patch’ that is associated with
one exterior boundary and 0 or more interior boundaries. Simple surfaces in 3D are
isomorphic to planar surfaces. Polyhedral surfaces are formed by ‘stitching’ together
simple surfaces along their boundaries (OGC). Surfaces may be regarded as scalar fields,
i.e. fields with a single value, e.g. elevation or temperature, at every point

Tesseral/Tessellation A gridded representation of a plane surface into disjoint polygons. These polygons are
normally either square (raster), triangular (TIN, see below), or hexagonal. These models
can be built into hierarchical structures, and have a range of algorithms available to
navigate through them. A (regular or irregular) 2D tessellation involves the subdivision of a
2-dimensional plane into polygonal tiles (polyhedral blocks) that completely cover a plane
(AGI). The term lattice is sometimes used to describe the complete division of the plane
into regular or irregular disjoint polygons. More generally the subdivision of the plane may
be achieved using arcs that are not necessarily straight lines

TIN Triangulated irregular network. A form of the tesseral model based on triangles. The
vertices of the triangles form irregularly spaced nodes. Unlike the grid, the TIN allows
dense information in complex areas, and sparse information in simpler or more
homogeneous areas. The TIN dataset includes topological relationships between points and
their neighboring triangles. Each sample point has an X,Y co-ordinate and a surface, or Z-
Value. These points are connected by edges to form a set of non-overlapping triangles
used to represent the surface. TINs are also called irregular triangular mesh or irregular
triangular surface models (AGI)

Topology The relative location of geographic phenomena independent of their exact position. In
digital data, topological relationships such as connectivity, adjacency and relative position
are usually expressed as relationships between nodes, links and polygons. For example, the
topology of a line includes its from- and to-nodes, and its left and right polygons (AGI). In
mathematics, a property is said to be topological if it survives stretching and distorting of
space

Transformation 1. Map Map transformation: A computational process of converting an image or map from one
coordinate system to another. Transformation … typically involves rotation and scaling of
grid cells, and thus requires resampling of values (AGI)

Transformation 2. Affine Affine transformation: When a map is digitized, the X and Y coordinates are initially held
in digitizer measurements. To make these X,Y pairs useful they must be converted to a
real world coordinate system. The affine transformation is a combination of linear
transformations that converts digitizer coordinates into Cartesian coordinates. The basic
property of an affine transformation is that parallel lines remain parallel (AGI, with
modifications). The principal affine transformations are contraction, expansion, dilation,
reflection, rotation, shear and translation

Transformation 3. Data Data transformation (see also, subsection 6.7.1): A mathematical procedure (usually a
one-to-one mapping or function) applied to an initial dataset to produce a result dataset.
An example might be the transformation of a set of sampled values $\{x_i\}$ using the log()
function, to create the set $\{\log(x_i)\}$. Affine and map transformations are examples of
mathematical transformations applied to coordinate datasets. Note that operations on
transformed data, e.g. checking whether a value is within 10% of a target value, are not
equivalent to the same operations on untransformed data, even after back transformation

Transformation 4. Back Back transformation: If a set of sampled values $\{x_i\}$ has been transformed by a one-to-one
mapping function $f()$ into the set $\{f(x_i)\}$, and $f()$ has a one-to-one inverse mapping function
$f^{-1}()$, then the process of computing $f^{-1}\{f(x_i)\}=\{x_i\}$ is known as back transformation.
Example: $f()=\ln()$ and $f^{-1}()=\exp()$

Vector 1. Within GIS the term vector refers to data that are comprised of lines or arcs, defined by
beginning and end points, which meet at nodes. The locations of these nodes and the
topological structure are usually stored explicitly. Features are defined by their boundaries
only and curved lines are represented as a series of connecting arcs. Vector storage
involves the storage of explicit topology, which raises overheads, however it only stores
those points which define a feature and all space outside these features is “non-
existent” (AGI)
2. In mathematics the term refers to a directed line, i.e. a line with a defined origin,
direction and orientation. The same term is used to refer to a single column or row of a
matrix, in which case it is denoted by a bold letter, usually in lower case

Viewshed Regions of visibility observable from one or more observation points. Typically a viewshed
will be defined by the numerical or color coding of a raster image, indicating whether the
(target) cell can be seen from (or probably seen from) the (source) observation points. By
definition a cell that can be viewed from a specific observation point is inter-visible with
that point (each location can see the other). Viewsheds are usually determined for
optically defined visibility within a maximum range

WGS84 World Geodetic System, 1984 version. This models the Earth as a spheroid with semi-major axis
6378.137 km and flattening factor of 1:298.257, i.e. roughly 0.3% flatter at the poles
than a perfect sphere. One of a number of such global models. See further, Distance
Operations
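
As a rough check on the figures quoted in the WGS84 entry, the sketch below derives the polar radius from the equatorial radius and the flattening factor, using the standard relation b = a(1 - f). The variable names are illustrative only.

```python
# Values quoted in the WGS84 entry above.
A_KM = 6378.137            # equatorial radius, km
INV_FLATTENING = 298.257   # the "1:298.257" flattening factor

f = 1.0 / INV_FLATTENING       # flattening, f = (a - b) / a
b_km = A_KM * (1.0 - f)        # polar radius implied by a and f
percent_flatter = 100.0 * f    # the "roughly 0.3%" quoted in the entry
```

Under these assumptions the polar radius comes out about 21.4 km shorter than the equatorial radius.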

Note: Where cited, references are drawn from the Association for Geographic Information (AGI), and the Open
Geospatial Consortium (OGC). Square bracketed text denotes insertion by the present authors into these
definitions. For OGC definitions see: Open Geospatial Consortium Inc (2006) in References section


1.5 Common Measures and Notation


Throughout this Guide a number of terms and associated formulas are used that are common to many analytical
procedures. In this section we provide a brief summary of those that fall into this category. Others, that are more
specific to a particular field of analysis, are treated within the section to which they primarily apply. Many of the
measures we list will be familiar to readers, since they originate from standard single variable (univariate)
statistics. For brevity we provide details of these in tabular form. In order to clarify the expressions used here and
elsewhere in the text, we use the notation shown in Table 1-2. Italics are used within the text and formulas to
denote variables and parameters, as well as selected terms.

1.5.1 Notation
Table 1-2 Notation and symbology

[a,b] A closed interval of the Real line, for example [0,1] means the set of all values between 0 and 1,
including 0 and 1

(a,b) An open interval of the Real line, for example (0,1) means the set of all values between 0 and 1, NOT
including 0 and 1. This should not be confused with the notation for coordinate pairs, (x,y), or its use
within bivariate functions such as f(x,y), or in connection with graph edges (see below) — the meaning
should be clear from the context

(i,j) In the context of graph theory, which forms the basis for network analysis, this pairwise notation is
often used to define an edge connecting the two vertices i and j

(x,y) A (spatial) data pair, usually representing a pair of coordinates in two dimensions. Terrestrial
coordinates are typically Cartesian (i.e. in the plane, or planar) based on a pre-specified projection of
the sphere, or Spherical (latitude, longitude). Spherical coordinates are often quoted in positive or
negative degrees from the Equator and the Greenwich meridian, so may have the ranges [-90,+90] for
latitude (north-south measurement) and [-180,180] for longitude (east-west measurement)

(x,y,z) A (spatial) data triple, usually representing a pair of coordinates in two dimensions, plus a third
coordinate (usually height or depth) or an attribute value, such as soil type or household income

{xi} A set of n values x1, x2, x3, … xn, typically continuous ratio-scaled variables in the range (-∞,∞) or [0,∞).
The values may represent measurements or attributes of distinct objects, or values that represent a
collection of objects (for example the population of a census tract)

{Xi} An ordered set of n values X1, X2, X3, … Xn, such that Xi is less than or equal to Xi+1 for all i

X,x The use of bold symbols in expressions indicates matrices (upper case) and vectors (lower case)

{fi} A set of k frequencies (k<=n), derived from a dataset {xi}. If {xi} contains discrete values, some of which
occur multiple times, then {fi} represents the number of occurrences or the count of each distinct
value. {fi} may also represent the number of occurrences of values that lie in a range or set of ranges,
{ri}. If a dataset contains n values, then the sum ∑fi=n. The set {fi} can also be written f(xi). If {fi} is


regarded as a set of weights (for example attribute values) associated with the {xi}, it may be written
as the set {wi} or w(xi)

{pi} A set of k probabilities (k<=n), estimated from a dataset or theoretically derived. With a finite set of
values {xi}, pi=fi/n. If {xi} represents a set of k classes or ranges then pi is the probability of finding an
occurrence in the ith class or range, i.e. the proportion of events or values occurring in that class or
range. The sum ∑pi=1. If a set of frequencies, {fi}, has been standardized by dividing each value fi by
their sum, ∑fi, then {pi} is equivalent to {fi}

∑ Summation symbol, e.g. x1+x2+x3+ … +xn. If no limits are shown the sum is assumed to apply to all
subsequent elements, otherwise upper and/or lower limits for summation are provided

∏ Product symbol, e.g. x1∙x2∙x3∙ … ∙xn. If no limits are shown the product is assumed to apply to all
subsequent elements, otherwise upper and/or lower limits for multiplication are provided

^ Used here in conjunction with Greek symbols (directly above) to indicate a value is an estimate of the
true population value. Sometimes referred to as “hat”

~ Is distributed as, for example y~N(0,1) means the variable y has a distribution that is Normal with a
mean of 0 and standard deviation of 1

! Factorial symbol. z=x! means z=x(x-1)(x-2)…1. x>=0. Usually applied to integer values of x. May be
defined for fractional values of x using the Gamma function (Table 1-3)

≡ ‘Equivalent to’ symbol

≈ ‘Approximately equal to’ symbol

∈ ‘Belongs to’ symbol, e.g. x∈[0,2] means that x belongs to/is drawn from the set of all values in the
closed interval [0,2]; x∈{0,1} means that x can take the values 0 and 1

≤ Less than or equal to, represented in the text where necessary by <= (provided in this form to support
display by some web browsers)

≥ Greater than or equal to, represented in the text where necessary by >= (provided in this form to
support display by some web browsers)

1.5.2 Statistical measures and related formulas


Table 1-3, below, provides a list of common measures (univariate statistics) applied to datasets, and associated
formulas for calculating the measure from a sample dataset in summation form (rather than integral form) where
necessary. In some instances these formulas are adjusted to provide estimates of the population values rather
than those obtained from the sample of data one is working on.
Many of the measures can be extended to two-dimensional forms in a very straightforward manner, and thus they
provide the basis for numerous standard formulas in spatial statistics. For a number of univariate statistics


(variance, skewness, kurtosis) we refer to the notion of (estimated) moments about the mean. These are
computations of the form

$\sum_{i=1}^{n} (x_i - \bar{x})^r, \quad r = 1, 2, 3, \ldots$

When r=1 this summation will be 0, since it is just the sum of the differences of all values from the mean. For values of r>1
the expression provides measures that are useful for describing the shape (spread, skewness, peakedness) of a
distribution, and simple variations on the formula are used to define the correlation between two or more
datasets (the product moment correlation). The term moment in this context comes from physics, i.e. like
‘momentum’ and ‘moment of inertia’, and in a spatial (2D) context provides the basis for the definition of a
centroid — the center of mass or center of gravity of an object, such as a polygon (see further, Section 4.2.5,
Centroids and centers).
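
The moment summation described above can be sketched in a few lines. The function name `central_moment_sum` and the sample data are hypothetical; the calculation is simply the sum of the r-th powers of deviations from the sample mean.

```python
def central_moment_sum(values, r):
    """Sum of (x_i - mean)^r over all values, for integer r >= 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** r for x in values)

# Hypothetical sample with mean 5.
data = [2.0, 4.0, 4.0, 5.0, 10.0]

first = central_moment_sum(data, 1)    # deviations cancel out
second = central_moment_sum(data, 2)   # basis of the variance
third = central_moment_sum(data, 3)    # basis of the skewness
```

For this sample the first sum vanishes, while the second and third are positive, the latter reflecting the long upper tail created by the value 10.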

Table 1-3 Common formulas and statistical measures


This table of measures has been divided into 9 subsections for ease of use. Each is provided with its own
subheading. For more details on these topics, see the relevant topic within the StatsRef website.

Counts and specific values


Measure Definition Expression(s)

Count The number of data values in a set Count({xi})=n

Top m, Bottom m    The set of the largest (smallest) m values from a set. May be generated via an SQL command
    Topm{xi}={Xn-m+1,…,Xn-1,Xn}; Botm{xi}={X1,X2,…,Xm}

Variety    The number of distinct i.e. different data values in a set. Some packages refer to the variety as
diversity, which should not be confused with information theoretic and other diversity measures

Majority    The most common i.e. most frequent data values in a set. Similar to mode (see below), but often
applied to raster datasets at the neighborhood or zonal level. For general datasets the term should only be
applied to cases where a given class is 50%+ of the total

Minority    The least common i.e. least frequently occurring data values in a set. Often applied to raster
datasets at the neighborhood or zonal level

Maximum, Max    The maximum value of a set of values. May not be unique    Max{xi}=Xn

Minimum, Min    The minimum value of a set of values. May not be unique    Min{xi}=X1

Sum    The sum of a set of data values    $\sum_{i=1}^{n} x_i$
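
The count-based measures in this subsection can be illustrated with a short sketch using only the Python standard library. The data values and variable names are hypothetical, and ties for majority or minority are returned as lists because these values may not be unique.

```python
from collections import Counter

# Hypothetical dataset of discrete values.
values = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]

ordered = sorted(values)          # the ordered set {X_i}
m = 3
top_m = ordered[-m:]              # largest m values
bottom_m = ordered[:m]            # smallest m values

counts = Counter(values)          # frequency of each distinct value
variety = len(counts)             # number of distinct values

max_count = max(counts.values())
min_count = min(counts.values())
majority = sorted(v for v, c in counts.items() if c == max_count)
minority = sorted(v for v, c in counts.items() if c == min_count)

total = sum(values)
n = len(values)
```

Note that this sketch reports the most frequent value as the majority even when it accounts for less than half of the data, which is the looser raster-style usage rather than the strict 50%+ definition.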

Measures of centrality
Measure Definition Expression(s)

Mean (arithmetic)    The arithmetic average of a set of data values (also known as the sample mean where the
data are a sample from a larger population). Note that if the set {fi} are regarded as weights rather than
frequencies the result is known as the weighted mean. Other mean values include the geometric and harmonic
mean. The population mean is often denoted by the symbol μ. In many instances the sample mean is the best
(unbiased) estimate of the population mean and is sometimes denoted by μ with a ^ symbol above it, or as a
variable such as x with a bar above it.
    $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; $\bar{x} = \sum_{i=1}^{n} f_i x_i \Big/ \sum_{i=1}^{n} f_i$; $\bar{x} = \sum_{i=1}^{n} p_i x_i$

1
Mean (harmonic) The harmonic mean, H, is the mean of the reciprocals 1 n 1
of the data values, which is then adjusted by taking H
n

 
the reciprocal of the result. The harmonic mean is  i 1 xi 
less than or equal to the geometric mean, which is
less than or equal to the arithmetic mean

Mean (geometric)    The geometric mean, G, is the mean defined by taking the products of the data values and
then adjusting the value by taking the nth root of the result. The geometric mean is greater than or equal
to the harmonic mean and is less than or equal to the arithmetic mean
    $G = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$, hence $\log(G) = \frac{1}{n}\sum_{i=1}^{n} \log(x_i)$

Mean (power)    The general (limit) expression for mean values. Values for p give the following means: p=1
arithmetic; p=2 root mean square; p=-1 harmonic. Limit values for p (i.e. as p tends to these values) give
the following means: p=0 geometric; p=-∞ minimum; p=∞ maximum
    $M = \left(\frac{1}{n}\sum_{i=1}^{n} x_i^{p}\right)^{1/p}$

Trim-mean, TM, t, Olympic mean    The mean value computed with a specified percentage (proportion), t/2, of
values removed from each tail to eliminate the highest and lowest outliers and extreme values. For small
samples a specific number of observations (e.g. 1) rather than a percentage, may be ignored. In general an
equal number, k, of high and low values should be removed and the number of observations summed should equal
n(1-t) expressed as an integer. This variant is sometimes described as the Olympic mean, as has been used in
scoring Olympic gymnastics for example
    $TM = \frac{1}{n(1-t)}\sum_{i=nt/2}^{n(1-t/2)} X_i$, $t \in [0,1]$

Mode    The most common or frequently occurring value in a set. Where a set has one dominant value or range
of values it is said to be unimodal; if there are several commonly occurring values or ranges it is described
as multi-modal. Note that (arithmetic mean - mode) ≈ 3 × (arithmetic mean - median) for many unimodal
distributions

Median, Med    The middle value in an ordered set of data if the set contains an odd number of values, or the
average of the two middle values if the set contains an even number of values. For a continuous distribution
the median is the 50% point (0.5) obtained from the cumulative distribution of the values or function
    $\mathrm{Med}\{x_i\} = X_{(n+1)/2}$, n odd; $\mathrm{Med}\{x_i\} = (X_{n/2} + X_{n/2+1})/2$, n even

Mid-range, MR    The middle value of the Range, i.e. the value midway between the minimum and maximum    MR{xi}=(X1+Xn)/2=X1+Range/2

Root mean square (RMS)    The root of the mean of squared data values. Squaring removes negative values
    $RMS = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^{2}}$
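
The relationships between the means listed in this subsection can be checked numerically. The sketch below is illustrative only: `power_mean` and `trimmed_mean` are hypothetical helper names, the data are invented and strictly positive (as required for the geometric and harmonic means), and the p=0 case is handled through the logarithmic form of the geometric mean.

```python
import math
import statistics

def power_mean(values, p):
    """Power mean of positive values; p=0 is taken as the geometric mean."""
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(x) for x in values) / n)
    return (sum(x ** p for x in values) / n) ** (1.0 / p)

def trimmed_mean(values, k):
    """Mean after removing the k lowest and k highest values."""
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical positive sample.
data = [1.0, 2.0, 4.0, 8.0, 16.0]

harmonic = power_mean(data, -1)
geometric = power_mean(data, 0)
arithmetic = power_mean(data, 1)
rms = power_mean(data, 2)
olympic = trimmed_mean(data, 1)     # drop one value from each tail
median = statistics.median(data)
```

For positive data the ordering harmonic ≤ geometric ≤ arithmetic ≤ root mean square holds, and the standard library's `statistics.harmonic_mean` gives an independent check on the p=-1 case.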

Measures of spread
Measure Definition Expression(s)

Range    The difference between the maximum and minimum values of a set    Range{xi}=Xn-X1

Lower quartile (25%), LQ    In an ordered set, 25% of data items are less than or equal to the upper bound of
this range. For a continuous distribution the LQ is the set of values from 0% to 25% (0.25) obtained from
the cumulative distribution of the values or function. Treatment of cases where n is even and n is odd, and
when i runs from 1 to n or 0 to n, varies
    $LQ = \{X_1, \ldots, X_{(n+1)/4}\}$

Upper quartile (75%), UQ    In an ordered set, 75% of data items are less than or equal to the lower bound of
this range. For a continuous distribution the UQ is the set of values from 75% (0.75) to 100% obtained from
the cumulative distribution of the values or function. Treatment of cases where n is even and n is odd, and
when i runs from 1 to n or 0 to n, varies
    $UQ = \{X_{3(n+1)/4}, \ldots, X_n\}$

Inter-quartile range, IQR    The difference between the lower and upper quartile values, hence covering the
middle 50% of the distribution. The inter-quartile range can be obtained by taking the median of the dataset,
then finding the median of the upper and lower halves of the set. The IQR is then the difference between
these two secondary medians
    IQR=UQ-LQ

Trim-range, TR, t    The range computed with a specified percentage (proportion), t/2, of the highest and
lowest values removed to eliminate outliers and extreme values. For small samples a specific number of
observations (e.g. 1) rather than a percentage, may be ignored. In general an equal number, k, of high and
low values are removed (if possible)
    $TR_t = X_{n(1-t/2)} - X_{nt/2}$, $t \in [0,1]$; $TR_{50\%} = IQR$

Variance, Var, σ2, s2, μ2    The average squared difference of values in a dataset from their population
mean, μ, or from the sample mean (also known as the sample variance where the data are a sample from a larger
population). Differences are squared to remove the effect of negative values (the summation would otherwise
be 0). The third formula is the frequency form, where frequencies have been standardized, i.e. ∑fi=1. Var is
a function of the 2nd moment about the mean. The population variance is often denoted by the symbol μ2 or σ2.
The estimated population variance is often denoted by s2 or by σ2 with a ^ symbol above it
    $Var = \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$
    $Var = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
    $Var = \sum_{i=1}^{n} f_i (x_i - \bar{x})^2$
    $Var = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})$
    $s^2 = \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

Standard deviation, SD, s or RMSD    The square root of the variance, hence it is the Root Mean Squared
Deviation (RMSD). The population standard deviation is often denoted by the symbol σ. SD* shows the estimated
population standard deviation (sometimes denoted by σ with a ^ symbol above it or by s)
    $SD = \sqrt{Var} = \sigma$
    $SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
    $SD^{*} = \hat{\sigma} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Standard error of the mean, SE    The estimated standard deviation of the mean values of n samples from the
same population. It is simply the sample standard deviation reduced by a factor equal to the square root of
the number of samples, n>=1
    $SE = \frac{SD}{\sqrt{n}}$

Root mean squared error, RMSE    The standard deviation of samples from a known set of true values, xi*. If
xi* are estimated by the mean of sampled values RMSE is equivalent to RMSD
    $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - x_i^{*})^2}$

Mean deviation/error, MD or ME    The mean deviation of samples from the known set of true values, xi*
    $MD = \frac{1}{n}\sum_{i=1}^{n} (x_i - x_i^{*})$

Mean absolute deviation/error, MAD or MAE    The mean absolute deviation of samples from the known set of
true values, xi*
    $MAE = \frac{1}{n}\sum_{i=1}^{n} |x_i - x_i^{*}|$

Covariance, Cov    Literally the pattern of common (or co-) variation observed in a collection of two (or
more) datasets, or partitions of a single dataset. Note that if the two sets are the same the covariance is
the same as the variance
    $Cov(x,y) = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$; $Cov(x,x) = Var(x)$

Correlation/ product moment or Pearson’s correlation coefficient, r    A measure of the similarity between
two (or more) paired datasets. The correlation coefficient is the ratio of the covariance to the product of
the standard deviations. If the two datasets are the same or perfectly matched this will give a result=1
    $r = Cov(x,y) / (SD_x SD_y)$
    $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$

Coefficient of variation, CV    The ratio of the standard deviation to the mean, sometimes computed as a
percentage. If this ratio is close to 1, and the distribution is strongly right-skewed (skewness greater than
0, with most values concentrated at the lower end of the range), it may suggest the underlying distribution
is Exponential. Note, mean values close to 0 may produce unstable results
    $SD / \bar{x}$

Variance mean ratio, VMR    The ratio of the variance to the mean, sometimes computed as a percentage. If
this ratio is close to 1, and the distribution is unimodal and relates to count data, it may suggest the
underlying distribution is Poisson. Note, mean values close to 0 may produce unstable results
    $Var / \bar{x}$
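
The spread measures in this subsection differ mainly in the divisor used (n or n-1) and in whether one or two datasets are involved. The sketch below is a minimal illustration with hypothetical data and helper names; it assumes the standard error is computed from the estimated (n-1) standard deviation, which is one common reading of the SE entry.

```python
import math
import statistics

def mean(values):
    return sum(values) / len(values)

def variance(values, sample=False):
    """Divisor n (population form) or n-1 (estimated population variance)."""
    m = mean(values)
    ss = sum((x - m) ** 2 for x in values)
    return ss / (len(values) - 1 if sample else len(values))

def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))

# Hypothetical paired data; y is an exact linear function of x.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
y = [2.0 * v + 1.0 for v in x]

var_n = variance(x)                    # divisor n
sd_n = math.sqrt(var_n)
var_est = variance(x, sample=True)     # divisor n-1
se = math.sqrt(var_est) / math.sqrt(len(x))
r = correlation(x, y)
cv = sd_n / mean(x)
```

The results can be cross-checked against `statistics.pvariance` and `statistics.variance` from the standard library, which use the n and n-1 divisors respectively.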

Measures of distribution shape


Measure Definition Expression(s)

Skewness, α3    If a frequency distribution is unimodal and symmetric about the mean it has a skewness of 0.
Values greater than 0 suggest skewness of a unimodal distribution to the right, whilst values less than 0
indicate skewness to the left. A function of the 3rd moment about the mean (denoted by α3 with a ^ symbol
above it for the sample skewness)
    $\alpha_3 = \frac{1}{n\sigma^3}\sum_{i=1}^{n} (x_i - \mu)^3$
    $\alpha_3 = \frac{1}{n\hat{\sigma}^3}\sum_{i=1}^{n} (x_i - \bar{x})^3$
    $\hat{\alpha}_3 = \frac{n}{(n-1)(n-2)\hat{\sigma}^3}\sum_{i=1}^{n} (x_i - \bar{x})^3$

Kurtosis, α4    A measure of the peakedness of a frequency distribution. More pointy distributions tend to
have high kurtosis values. A function of the 4th moment about the mean. It is customary to subtract 3 from
the raw kurtosis value (which is the kurtosis of the Normal distribution) to give a figure relative to the
Normal (denoted by α4 with a ^ symbol above it for the sample kurtosis)
    $\alpha_4 = \frac{1}{n\hat{\sigma}^4}\sum_{i=1}^{n} (x_i - \bar{x})^4$
    $\alpha_4 = \frac{1}{n\sigma^4}\sum_{i=1}^{n} (x_i - \mu)^4$
    $\hat{\alpha}_4 = \frac{a}{\hat{\sigma}^4}\sum_{i=1}^{n} (x_i - \bar{x})^4 - b$
    where $a = \frac{n(n+1)}{(n-1)(n-2)(n-3)}$, $b = \frac{3(n-1)^2}{(n-2)(n-3)}$
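
A minimal sketch of the moment-based shape measures follows. It uses the simplest form, in which both the r-th moment and the variance are computed with divisor n about the sample mean; the bias-adjusted sample forms in the table are not implemented here. The function name and data are hypothetical.

```python
def standardized_moment(values, r):
    """r-th moment about the mean divided by the r-th power of the SD (divisor n)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n
    mr = sum((x - m) ** r for x in values) / n
    return mr / m2 ** (r / 2)

# Hypothetical samples: one symmetric, one with a long upper tail.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
long_upper_tail = [1.0, 1.0, 1.0, 2.0, 10.0]

skew_sym = standardized_moment(symmetric, 3)
kurt_sym = standardized_moment(symmetric, 4)
excess_kurt_sym = kurt_sym - 3.0          # figure relative to the Normal
skew_tail = standardized_moment(long_upper_tail, 3)
```

The symmetric sample has zero skewness and a kurtosis below 3 (flatter than the Normal), while the second sample has positive skewness, i.e. skewness to the right.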

Measures of complexity and dimensionality


Measure Definition Expression(s)

Information statistic (Entropy), I (Shannon’s)    A measure of the amount of pattern, disorder or
information, in a set {xi} where pi is the proportion of events or values occurring in the ith class or
range. Note that if pi=0 then pilog2(pi) is taken to be 0. I takes values in the range [0,log2(k)]. The lower
value means all data falls into 1 category, whilst the upper means all data are evenly spread
    $I = -\sum_{i=1}^{k} p_i \log_2(p_i)$

Information statistic (Diversity), Div    Shannon’s entropy statistic (see above) standardized by the number
of classes, k, to give a range of values from 0 to 1
    $Div = \frac{-\sum_{i=1}^{k} p_i \log_2(p_i)}{\log_2(k)}$

Dimension (topological), DT    Broadly, the number of (intrinsic) coordinates needed to refer to a single
point anywhere on the object. The dimension of a point=0, a rectifiable line=1, a surface=2 and a solid=3.
See text for fuller explanation. The value 2.5 (often denoted 2.5D) is used in GIS to denote a planar region
over which a single-valued attribute has been defined at each point (e.g. height). In mathematics topological
dimension is now equated to a definition similar to cover dimension (see below)
    DT=0,1,2,3,…

Dimension (capacity, cover or fractal), Dc    Let N(h) represent the number of small elements of edge length
h required to cover an object. For a line of length 1, 1/h elements of length h are needed. For a plane
surface of unit area, 1/h2 small squares of side length h are needed, and for a unit volume, 1/h3 cubes of
side length h. More generally N(h)=1/hD, where D is the topological dimension, so N(h)=h-D and thus
log(N(h))=-Dlog(h) and so Dc=-log(N(h))/log(h). Dc may be fractional, in which case the term fractal is used
    $D_c = -\lim_{h \to 0} \frac{\ln N(h)}{\ln(h)}$, $D_c \geq 0$
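
The entropy and diversity statistics can be sketched directly from class counts. The function names are hypothetical; classes with zero counts are skipped, which implements the convention that a term with pi=0 contributes nothing, and a single-class input is given a diversity of 0 to avoid dividing by log2(1)=0 (an assumption made for this sketch).

```python
import math

def entropy_bits(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def diversity(counts):
    """Entropy standardized by log2(k), where k is the number of classes."""
    k = len(counts)
    if k < 2:
        return 0.0
    return entropy_bits(counts) / math.log2(k)

# Hypothetical class counts for k=4 classes.
even = [5, 5, 5, 5]          # all classes equally represented
concentrated = [20, 0, 0, 0] # all data in one class
mixed = [10, 5, 4, 1]
```

Evenly spread data reach the upper bound log2(k), data concentrated in one class give 0, and mixed cases fall in between.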

Common distributions
Measure Definition Expression(s)

Uniform (continuous)    All values in the range are equally likely. Mean=a/2, variance=a2/12. Here we use
f(x) to denote the probability distribution associated with continuous valued variables x, also described as
a probability density function
    $f(x) = \frac{1}{a}$; $x \in [0,a]$

Binomial (discrete)    The terms of the Binomial give the probability of x successes out of n trials, for
example 3 heads in 10 tosses of a coin, where p=probability of success and q=1-p=probability of failure.
Mean, m=np, variance=npq. Here we use p(x) to denote the probability distribution associated with discrete
valued variables x
    $p(x) = \frac{n!}{(n-x)!\,x!}\, p^{x} q^{n-x}$; $x = 0, 1, 2, \ldots, n$

Poisson (discrete)    An approximation to the Binomial when p is very small and n is large (>100), but the
mean m=np is fixed and finite (usually not large). Mean=variance=m
    $p(x) = \frac{m^{x}}{x!} e^{-m}$; $x = 0, 1, 2, \ldots$

Normal (continuous)    The distribution of a measurement, x, that is subject to a large number of
independent, random, additive errors. The Normal distribution may also be derived as an approximation to the
Binomial when p is not small (e.g. p≈1/2) and n is large. If μ=mean and σ=standard deviation, we write N(μ,σ)
as the Normal distribution with these parameters. The Normal- or z-transform z=(x-μ)/σ changes (normalizes)
the distribution so that it has a zero mean and unit variance, N(0,1). The distribution of the mean of n
independent random variables drawn from any underlying distribution with finite variance is also
approximately Normal for large n (Central Limit Theorem)
    $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}$; $z \in (-\infty,\infty)$
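
The Binomial and Poisson terms, and the quality of the Poisson approximation for small p and large n, can be checked with a short sketch. The function names and the choice n=1000, p=0.002 are hypothetical; `math.comb` requires Python 3.8 or later.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(x, m):
    """Probability of x events when the mean number of events is m."""
    return m ** x * math.exp(-m) / math.factorial(x)

# The coin example from the table: 3 heads in 10 tosses of a fair coin.
three_heads = binomial_pmf(3, 10, 0.5)

# Hypothetical rare-event case: n large, p small, mean m = np = 2.
n, p = 1000, 0.002
m = n * p
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, m)) for x in range(11))
```

For the rare-event case the two sets of probabilities agree to about three decimal places, whereas for the coin example (p=0.5) the Poisson form would not be an appropriate approximation.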

Data transforms and back transforms


Measure Definition Expression(s)

Log    If the frequency distribution for a dataset is broadly unimodal and right-skewed (most values
concentrated at the lower end of the range, with a long upper tail), the natural log transform (logarithms
base e) will adjust the pattern to make it more symmetric/similar to a Normal distribution. For variates
whose values may range from 0 upwards a value of 1 is often added to the transform. Back transform with the
exp() function
    z=ln(x) or z=ln(x+1); n.b. ln(x)=loge(x)=log10(x)/log10(e); x=exp(z) or x=exp(z)-1

Square root (Freeman-Tukey)    A transform that may adjust the dataset to make it more similar to a Normal
distribution. For variates whose values may range from 0 upwards a value of 1 is often added to the
transform. For 0<=x<=1 (e.g. rate data) the combined form of the transform is often used, and is known as the
Freeman-Tukey (FT) transform
    $z = \sqrt{x}$, or $z = \sqrt{x+1}$, or $z = \sqrt{x} + \sqrt{x+1}$ (FT); $x = z^{2}$, or $x = z^{2} - 1$

Logit    Often used to transform binary response data, such as survival/non-survival or present/absent, to
provide a continuous value in the range (-∞,∞), where p is the proportion of the sample that is 1 (or 0). The
inverse or back-transform is shown as p in terms of z. This transform avoids concentration of values at the
ends of the range. For samples where proportions p may take the values 0 or 1 a modified form of the
transform may be used. This is typically achieved by adding 1/2n to the numerator and denominator, where n is
the sample size. Often used to correct S-shaped (logistic) relationships between response and explanatory
variables
    $z = \ln\left(\frac{p}{1-p}\right)$, $p \in [0,1]$; $p = \frac{e^{z}}{1+e^{z}}$

Normal, z-transform    This transform normalizes or standardizes the distribution so that it has a zero mean
and unit variance. If {xi} is a set of n sample mean values from any probability distribution with mean μ and
variance σ2 then the z-transform shown here as z2 will be distributed N(0,1) for large n (Central Limit
Theorem). The divisor in this instance is the standard error. In both instances the standard deviation must
be non-zero
    $z_1 = \frac{(x - \mu)}{\sigma}$; $z_2 = \frac{(\bar{x} - \mu)}{\sigma / \sqrt{n}}$

Box-Cox, power transforms    A family of transforms defined for positive data values only, that often can
make datasets more Normal; k is a parameter. The inverse or back-transform is also shown as x in terms of z
    $z = \frac{(x^{k} - 1)}{k}$, $k \neq 0$, $x > 0$; $x = (kz + 1)^{1/k}$, $k \neq 0$

Angular transforms (Freeman-Tukey)    A transform for proportions, p, designed to spread the set of values
near the end of the range. k is typically 0.5. Often used to correct S-shaped relationships between response
and explanatory variables. If p=x/n then the Freeman-Tukey (FT) version of this transform is the averaged
version shown. This is a variance-stabilizing transform
    $z = \sin^{-1}(p^{k})$, $p = \sin(z)^{1/k}$
    $z = \sin^{-1}\sqrt{\frac{x}{n+1}} + \sin^{-1}\sqrt{\frac{x+1}{n+1}}$ (FT)
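
Each transform in this subsection is paired with a back transform, and a simple round-trip check is a useful safeguard when implementing them. The sketch below covers the logit and Box-Cox pairs only; the function names and test values are hypothetical, and the logit is applied to proportions strictly between 0 and 1 (the modified form for p=0 or p=1 is not implemented).

```python
import math

def logit(p):
    """Logit transform for a proportion p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    return math.exp(z) / (1.0 + math.exp(z))

def box_cox(x, k):
    """Box-Cox power transform for x > 0 and k != 0."""
    return (x ** k - 1.0) / k

def inverse_box_cox(z, k):
    return (k * z + 1.0) ** (1.0 / k)

# Hypothetical values used for the round trip.
p = 0.8
x, k = 7.5, 0.5

p_back = inverse_logit(logit(p))
x_back = inverse_box_cox(box_cox(x, k), k)
```

A proportion of 0.5 maps to a logit of 0, and values above and below 0.5 map symmetrically to positive and negative values.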


Selected functions
Measure Definition Expression(s)

Bessel functions of the first kind    Bessel functions occur as the solution to specific differential
equations. They are described with reference to a parameter known as the order, shown as a subscript. For
non-negative real orders Bessel functions can be represented as an infinite series. Series expansions are
shown here for the standard (J) Bessel function of order 0 and the modified (I) Bessel function of order 1.
Usage in spatial analysis arises in connection with directional statistics and spline curve fitting. See the
Mathworld website entry for more details
    $J_0(\kappa) = \sum_{i=0}^{\infty} \frac{(-1)^{i} (\kappa/2)^{2i}}{(i!)^{2}}$ and $I_1(\kappa) = \sum_{i=0}^{\infty} \frac{(\kappa/2)^{2i+1}}{i!\,(i+1)!}$


Exponential integral function, E1(x)    A definite integral function. Used in association with spline curve
fitting. See the Mathworld website entry for more details
    $E_1(x) = \int_{1}^{\infty} \frac{e^{-tx}}{t}\,dt$

Gamma function, Γ    A widely used definite integral function. For integer values of x: Γ(x)=(x-1)! and
Γ(x/2)=(x/2-1)! so Γ(3/2)=(1/2)!=Γ(1/2)/2=(√π)/2. See the Mathworld website entry for more details
    $\Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t}\,dt$; $\Gamma(1/2) = \sqrt{\pi}$
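
The Gamma function values quoted above and the order 0 Bessel series can be checked numerically. The sketch below uses `math.gamma` from the Python standard library and a truncated version of the J series; the function name `bessel_j0_series`, the number of terms and the test point (the first positive zero of the order 0 function, approximately 2.404826) are choices made for this illustration.

```python
import math

def bessel_j0_series(x, terms=30):
    """Truncated series for the order 0 Bessel function of the first kind."""
    return sum(
        (-1) ** i * (x / 2.0) ** (2 * i) / math.factorial(i) ** 2
        for i in range(terms)
    )

gamma_three_halves = math.gamma(1.5)   # expected to equal sqrt(pi)/2
gamma_five = math.gamma(5)             # (5-1)! for an integer argument

j0_at_zero = bessel_j0_series(0.0)
j0_at_first_zero = bessel_j0_series(2.404825557695773)
```

The series converges quickly for small arguments; for large arguments a truncated alternating series loses accuracy and a library routine would be preferable.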

Matrix expressions
Measure Definition Expression(s)

Identity    A matrix with diagonal elements 1 and off-diagonal elements 0
    $I = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$

Determinant Determinants are only defined for square matrices. Let |A|, Det(A)
A be an n by n matrix with elements {aij }. The matrix
Mij here is a subset of A known as the minor, formed by
eliminating row i and column j from A. An n by n
matrix, A, with Det=0 is described as singular, and such
a matrix has no inverse. If Det(A) is very close to 0 it is
described as ill-conditioned

Inverse    The matrix equivalent of division in conventional algebra. For a matrix, A, to be invertible its
determinant must be non-zero, and ideally not very close to zero. A matrix that has an inverse is by
definition non-singular. A symmetric real-valued matrix is positive definite if all its eigenvalues are
positive, whereas a positive semi-definite matrix allows for some eigenvalues to be 0. A matrix, A, that is
invertible satisfies the relation AA-1=I
    $A^{-1}$

Transpose    A matrix operation in which the rows and columns are transposed, i.e. in which elements aij are
swapped with aji for all i,j. The inverse of a transposed matrix is the same as the transpose of the matrix
inverse
    $A^{T}$ or $A'$; $(A^{T})^{-1} = (A^{-1})^{T}$

Symmetric A matrix in which element aij =aji for all i,j A=AT

Trace The sum of the diagonal elements of a matrix, aii — the Tr(A)
sum of the eigenvalues of a matrix equals its trace

Eigenvalue, Eigenvector    If A is a real-valued k by k square matrix and x is a non-zero real-valued
vector, then a scalar λ that satisfies the equation shown in the adjacent column is known as an eigenvalue of
A and x is an eigenvector of A. There are k eigenvalues of A, each with a corresponding eigenvector. The
matrix A can be decomposed into three parts, as shown, where E is a matrix of its eigenvectors and D is a
diagonal matrix of its eigenvalues
    $(A - \lambda I)x = 0$; $A = EDE^{-1}$ (diagonalization)
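
Several of the matrix relations above can be verified by hand for a small case. The sketch below works with a hypothetical 2 by 2 symmetric matrix stored as nested lists and uses only the standard library; the helper names are illustrative, and the eigenvalue formula used is the closed form for a 2 by 2 matrix in terms of its trace and determinant.

```python
import math

def det2(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse2(a):
    d = det2(a)
    if d == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def eigenvalues_symmetric2(a):
    """Eigenvalues of a symmetric 2x2 matrix from its trace and determinant."""
    tr = a[0][0] + a[1][1]
    root = math.sqrt(tr * tr - 4.0 * det2(a))
    return ((tr - root) / 2.0, (tr + root) / 2.0)

# Hypothetical symmetric matrix.
A = [[2.0, 1.0],
     [1.0, 2.0]]

trace = A[0][0] + A[1][1]
lam_small, lam_large = eigenvalues_symmetric2(A)
product = matmul2(A, inverse2(A))    # should be close to the identity
```

For this matrix both eigenvalues are positive, so it is positive definite, and their sum and product reproduce the trace and determinant respectively.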


2 Conceptual Frameworks for Spatial Analysis


Geospatial analysis provides a distinct perspective on the world, a unique lens through which to examine events,
patterns, and processes that operate on or near the surface of our planet. It makes sense, then, to introduce the
main elements of this perspective, the conceptual framework that provides the background to spatial analysis, as
a preliminary to the main body of this Guide’s material. This chapter provides that introduction. It is divided into
four main sections. The first, Basic Primitives, describes the basic components of this view of the world — the
classes of things that a spatial analyst recognizes in the world, and the beginnings of a system of organization of
geographic knowledge. The second section, Spatial Relationships, describes some of the structures that are built
with these basic components and the relationships between them that interest geographers and others. The third
section, Spatial Statistics, introduces the concepts of spatial statistics, including probability, that provide perhaps
the most sophisticated elements of the conceptual framework. Finally, the fourth section, Spatial Data
Infrastructure, discusses some of the basic components of the data infrastructure that increasingly provides the
essential facilities for spatial analysis.
The domain of geospatial analysis is the surface of the Earth, extending upwards in the analysis of topography and
the atmosphere, and downwards in the analysis of groundwater and geology. In scale it extends from the most
local, when archaeologists record the locations of pieces of pottery to the nearest centimeter or property
boundaries are surveyed to the nearest millimeter, to the global, in the analysis of sea surface temperatures or
global warming. In time it extends backwards from the present into the analysis of historical population
migrations, the discovery of patterns in archaeological sites, or the detailed mapping of the movement of
continents, and into the future in attempts to predict the tracks of hurricanes, the melting of the Greenland ice-
cap, or the likely growth of urban areas. Methods of spatial analysis are robust and capable of operating over a
range of spatial and temporal scales.
Ultimately, geospatial analysis concerns what happens where, and makes use of geographic information that links
features and phenomena on the Earth’s surface to their locations. This sounds very simple and straightforward,
and it is not so much the basic information as the structures and arguments that can be built on it that provide
the richness of spatial analysis. In principle there is no limit to the complexity of spatial analytic techniques that
might find some application in the world, and might be used to tease out interesting insights and support practical
actions and decisions. In reality, some techniques are simpler, more useful, or more insightful than others, and
the contents of this Guide reflect that reality. This chapter is about the underlying concepts that are employed,
whether it be in simple, intuitive techniques or in advanced, complex mathematical or computational ones.
Spatial analysis exists at the interface between the human and the computer, and both play important roles. The
concepts that humans use to understand, navigate, and exploit the world around them are mirrored in the
concepts of spatial analysis. So the discussion that follows will often appear to be following parallel tracks — the
track of human intuition on the one hand, with all its vagueness and informality, and the track of the formal,
precise world of spatial analysis on the other. The relationship between these two tracks forms one of the
recurring themes of this Guide.

de Smith, Goodchild, Longley and Associates, 2025 [Link]


2.1 Basic Primitives


The building blocks for any form of spatial analysis are a set of basic primitives that refer to the place or places of
interest, their attributes and their arrangement. These basic primitives are discussed in the following subsections.

2.1.1 Place
At the center of all spatial analysis is the concept of place. The Earth’s surface comprises some 500,000,000 sq
km, so there would be room to pack half a billion industrial sites of 1 sq km each (assuming that nothing else
required space, and that the two-thirds of the Earth’s surface that is covered by water were as acceptable as the
one-third that is land), or 500 trillion sites of 1 sq m each (roughly the space occupied by a sleeping human).
People identify with places of various sizes and shapes, from the room to the parcel of land, to the neighborhood,
the city, the county, the state or province, or the nation-state. Places may overlap, as when a watershed spans
the boundary of two counties, and places may be nested hierarchically, as when counties combine to form a state
or province.
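
The packing figures quoted at the start of this subsection follow from simple unit arithmetic. The short Python sketch below reproduces them, assuming the rounded surface area of 500,000,000 sq km used in the text and that every part of the surface is usable; the variable names are illustrative only.

```python
# Rounded surface area of the Earth used in the text (sq km).
EARTH_SURFACE_SQ_KM = 500_000_000

# 1 km = 1,000 m, so 1 sq km = 1,000,000 sq m.
SQ_M_PER_SQ_KM = 1_000 ** 2

# Number of 1 sq km sites that would fit if the whole surface were usable.
sites_of_1_sq_km = EARTH_SURFACE_SQ_KM

# Number of 1 sq m sites under the same assumption.
sites_of_1_sq_m = EARTH_SURFACE_SQ_KM * SQ_M_PER_SQ_KM

print(f"{sites_of_1_sq_km:,} sites of 1 sq km")
print(f"{sites_of_1_sq_m:,} sites of 1 sq m")
```

The second figure is 5 x 10^14, which is the 500 trillion quoted above.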
Places often have names, and people use these to talk about and distinguish between places. Some names are
official, having been recognized by national or state agencies charged with bringing order to geographic names. In
the U.S., for example, the Board on Geographic Names exists to ensure that all agencies of the federal
government use the same name in referring to a place, and to ensure as far as possible that duplicate names are
removed from the landscape. A list of officially sanctioned names is termed a gazetteer, though that word has
come to be used for any list of geographic names.
Places change continually, as people move, climate changes, cities expand, and a myriad of social and physical
processes affect virtually every spot on the Earth’s surface. For some purposes it is sufficient to treat places as if
they were static, especially if the processes that affect them are comparatively slow to operate. It is difficult, for
example, to come up with instances of the need to modify maps as continents move and mountains grow or shrink
in response to earthquakes and erosion. On the other hand it would be foolish to ignore the rapid changes that
occur in the social and economic makeup of cities, or the constant movement that characterizes modern life.
Throughout this Guide, it will be important to distinguish between these two cases, and to judge whether time is
or is not important.
People associate a vast amount of information with places. Three Mile Island, Sellafield, and Chernobyl are
associated with nuclear reactors and accidents, while Tahiti and Waikiki conjure images of (perhaps somewhat
faded) tropical paradise. One of the roles of places and their names is to link together what is known in useful
ways. So for example the statements “I am going to London next week” and “There’s always something going on
in London” imply that I will be having an exciting time next week. But while “London” plays a useful role, it is
nevertheless vague, since it might refer to the area administered by the Greater London Authority, the area inside
the M25 motorway, or something even less precise and determined by the context in which the name is used.
Science clearly needs something better if information is to be linked exactly to places, and if places are to be
matched, measured, and subjected to the rigors of spatial analysis.
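
The ambiguity of a name such as “London” can be made concrete with a small data structure. The following Python sketch is illustrative only: the dictionary `CANDIDATE_EXTENTS` and the function `resolve` are hypothetical names, and the entries simply restate the two interpretations mentioned above rather than any real gazetteer record.

```python
# Hypothetical lookup from a place name to the distinct areas it may denote.
CANDIDATE_EXTENTS = {
    "London": [
        "area administered by the Greater London Authority",
        "area inside the M25 motorway",
    ],
}


def resolve(name: str) -> str:
    """Return the single extent a name denotes, or raise if unknown or ambiguous."""
    extents = CANDIDATE_EXTENTS.get(name, [])
    if not extents:
        raise KeyError(f"no extent recorded for {name!r}")
    if len(extents) > 1:
        raise ValueError(f"{name!r} is ambiguous: {len(extents)} candidate extents")
    return extents[0]


try:
    resolve("London")
except ValueError as error:
    # The name alone does not identify one area.
    print(error)
```

Under this sketch a name alone is not enough to link information to a place; one of the candidate extents has to be chosen explicitly before any matching or measurement can take place.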
The basis of rigorous and precise definition of place is a coordinate system, a set of measurements that allows
place to be specified unambiguously and in a way that is meaningful to everyone. The International Meridian
Conference of 1884 established the Greenwich Observatory in London as the basis of longitude, replacing a confusing multitude
of earlier systems. Today, the World Geodetic System (WGS84) of 1984 and subsequent adjustments provide a
highly accurate pair of coordinates for every location on the Earth’s surface (and incidentally place the line of
zero longitude about 100m east of the Greenwich Observatory). Elevation continues to be problematic, however,
since countries and even agencies within countries insist on their own definitions of what marks zero elevation, or
exactly how to define “sea level”. Many other coordinate systems are in use, but most are easily converted to and
from latitude/longitude. Today it is possible to measure location directly, using the Global Positioning System
(GPS), its Russian counterpart GLONASS, or its European counterpart Galileo. Spatial analysis is most
