EXERCISE 6
VECTOR DATA EXPLORATION
WITH R
Outline of Exercise
6.1 Introduction
    Expected Learning Skills
6.2 Requirements
6.3 Exploring Vector Data
    Reading Shapefile
    Clipping a Shapefile
    Working on State Shapefile
    Merging Districts
6.4 Exercises: To Be Submitted
6.5 Exercises: Do It Yourself
6.6 Further / Suggested Readings
6.1 INTRODUCTION
Vector data, representing geographic features as points, lines, and polygons, is fundamental to
Geographic Information Systems (GIS). R, with its powerful statistical and graphical capabilities
combined with specialized spatial packages, provides an excellent environment for exploring and
analyzing vector data. This exploration involves a series of steps, from data import and
manipulation to spatial analysis and visualisation.
Exploring vector data of Indian district and state boundaries in R involves using the sf package for
spatial data handling, combined with dplyr for attribute manipulation and ggplot2 for visualisation.
The process begins with importing shapefiles using st_read(), ensuring a consistent CRS through
st_transform(), and inspecting the data structure with str(). Attribute-based queries are performed using
dplyr verbs like filter(), while spatial relationships and geometric operations rely on sf functions
like st_intersects(), st_union(), and st_area(). Boundaries are dissolved to create state polygons from
districts using group_by() and summarise(). Finally, ggplot2 with geom_sf(), or mapview,
can be used to create static or interactive maps for visualising and analysing
administrative divisions within India.
MGYL-012 Advanced Geoinformatics Laboratory
Expected Learning Skills
After completing this exercise, you should be able to:
read and visualise a shapefile in R;
clip a shapefile in R; and
merge various features in a shapefile using R.
6.2 REQUIREMENTS
Before starting the geospatial operations, install and load the following R
packages, which are used in this exercise:
1. sf: A modern package for working with vector data.
2. terra: For raster and vector data processing.
3. dplyr: For data manipulation and filtering.
4. tmap: For visualisation of spatial data.
5. ggplot2: For plotting maps with ggplot() and geom_sf(), as used in the commands of this exercise.
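The package setup described above can be sketched as follows. This is only one possible way of doing it: the variable names are illustrative, and the install line is left commented out on the assumption that you may prefer to install packages manually from CRAN.

```r
# Packages used in this exercise (ggplot2 is needed for ggplot() and geom_sf())
required_pkgs <- c("sf", "terra", "dplyr", "tmap", "ggplot2")

# Check which packages are already installed on this system
is_installed <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- required_pkgs[!is_installed]

if (length(to_install) > 0) {
  message("Not yet installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # uncomment to install from CRAN
}

# Load every package that is available
for (pkg in required_pkgs[is_installed]) {
  library(pkg, character.only = TRUE)
}
```

Installation is needed only once per system, whereas the packages must be loaded in every new R session.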
6.3 EXPLORING VECTOR DATA
6.3.1 Reading A Shapefile
Now, we will read the India district boundary, which is a polygon shapefile.
India_dist <- st_read("C:\\DEM\\spatialdata\\DISTRICT_BOUNDARY.shp")
Note: Check the folder where you have saved the shapefile on your system.
The path will vary depending upon where you have kept the file.
Reading layer `DISTRICT_BOUNDARY' from data source
`C:\\DEM\\spatialdata\\DISTRICT_BOUNDARY.shp'
using driver `ESRI Shapefile'
Simple feature collection with 742 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2818364 ymin: 2177527 xmax: 5679119 ymax: 5444563
Projected CRS: LCC_WGS84
This also gives you all the details of the shapefile. There are 742 features and 7
fields. This means there are 742 polygons in this shapefile and there are 7
columns in the attribute table of the shapefile.
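The same details can also be retrieved programmatically. The sketch below is not part of the original exercise: it assumes the North Carolina shapefile that ships with the sf package as a stand-in, so that it runs without DISTRICT_BOUNDARY.shp, and the object names are illustrative. It also shows how st_transform() can be used if a different CRS is required.

```r
library(sf)

# Stand-in data: the North Carolina shapefile bundled with sf.
# Replace shp_path with the path to DISTRICT_BOUNDARY.shp on your system.
shp_path <- system.file("shape/nc.shp", package = "sf")

# Stop early with a clear message if the path is wrong
if (!file.exists(shp_path)) stop("Shapefile not found: ", shp_path)

demo <- st_read(shp_path, quiet = TRUE)   # quiet = TRUE hides the printed summary

n_features <- nrow(demo)                     # number of features (polygons)
n_fields   <- ncol(st_drop_geometry(demo))   # attribute columns, geometry excluded
geom_types <- unique(as.character(st_geometry_type(demo)))

print(c(features = n_features, fields = n_fields))
print(geom_types)
print(st_bbox(demo))   # bounding box
print(st_crs(demo))    # coordinate reference system

# Reproject to geographic WGS 84 (EPSG:4326) if a common CRS is needed
demo_wgs84 <- st_transform(demo, 4326)
```

For the India district layer, nrow() and ncol(st_drop_geometry()) should agree with the feature and field counts printed by st_read().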
We can look at the structure.
str(India_dist)
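Since str() can be verbose for spatial objects, a few shorter checks are often useful alongside it. The following is a minimal sketch using the North Carolina shapefile bundled with sf as a stand-in for India_dist (an assumption made only so that the code runs on any system).

```r
library(sf)

# Stand-in data; use India_dist in place of demo for the actual exercise
demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

print(class(demo))    # an sf object is also a data frame
print(names(demo))    # column names, including the geometry column

# Attribute table only, without the geometry column
attrs <- st_drop_geometry(demo)
print(head(attrs, 3))          # first three rows of the attribute table
print(sapply(attrs, class))    # data type of each attribute column
```

Looking at the column names first helps you find which column holds the state and district names before writing any filter.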
We can visualise the layer using ggplot() from the ggplot2 package.
ggplot(India_dist) + geom_sf(fill = "white", color = "black") + theme_void()
6.3.2 Clipping a Shapefile
Clipping involves extracting a subset of a shapefile based on a defined
boundary. For instance, if you have a shapefile containing administrative
boundaries for an entire country and you want to focus on a specific district or
region, clipping is the way to go.
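When the boundary is itself a geometry rather than an attribute value, the clip can be done with st_intersection() from sf. The sketch below uses hypothetical toy data (three unit squares standing in for districts and a rectangle standing in for the clip boundary); none of these names come from the exercise dataset.

```r
library(sf)

# Hypothetical toy data: three unit squares standing in for districts
make_square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}
toy_dist <- st_sf(
  DISTRICT = c("A", "B", "C"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), make_square(2, 0))
)

# Hypothetical clip boundary: a rectangle covering parts of A and B only
clip_box <- st_as_sfc(st_bbox(c(xmin = 0.5, ymin = 0, xmax = 1.5, ymax = 1)))

# Keep only the parts of each district that fall inside the clip boundary
clipped <- st_intersection(toy_dist, clip_box)

print(clipped$DISTRICT)
print(st_area(clipped))
```

st_intersection() cuts the geometries at the boundary, so features lying partly outside it are trimmed rather than kept whole. Both layers should share the same CRS before clipping.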
Suppose we want to find the average elevation of each district in
Kerala state. For that, we first need to extract the Kerala state boundary with its
districts from India_dist.
kerala <- India_dist %>% filter(STATE == "KERALA")
print(kerala)
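A filter like this returns zero rows if the state name is spelled or capitalised differently in the attribute table, so it is worth checking the stored values first. The sketch below illustrates this on hypothetical toy data. Only the STATE column name is taken from the command above; the DISTRICT column and all values are assumptions.

```r
library(sf)
library(dplyr)

# Hypothetical toy data: unit squares standing in for district polygons
make_square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}
toy_dist <- st_sf(
  STATE    = c("KERALA", "KERALA", "TAMIL NADU"),
  DISTRICT = c("A", "B", "C"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), make_square(2, 0))
)

# Check the exact spelling and case of the state names before filtering
print(sort(unique(toy_dist$STATE)))

# Attribute-based selection, as in the exercise
kerala_toy <- toy_dist %>% filter(STATE == "KERALA")

# A more forgiving variant that ignores case and surrounding spaces
kerala_toy2 <- toy_dist %>% filter(toupper(trimws(STATE)) == "KERALA")

print(nrow(kerala_toy))   # number of districts selected
```

The result is still an sf object, so it can be passed directly to plot() or ggplot().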
plot(kerala)
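Note that plot() on an sf object draws one panel per attribute column. If only the outline is wanted, the geometry can be plotted on its own. A minimal sketch, assuming the North Carolina shapefile bundled with sf as a stand-in for kerala:

```r
library(sf)

# Stand-in data; use kerala in place of demo_sub for the actual exercise
demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
demo_sub <- demo[1:5, ]   # first five features only

# One panel for a single chosen attribute
plot(demo_sub["NAME"])

# Outline only: plot the geometry column without any attribute
outline <- st_geometry(demo_sub)
plot(outline, col = "white", border = "black")
```

Selecting a single column with square brackets keeps the geometry attached, which is why plot(demo_sub["NAME"]) still draws a map.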
6.3.3 Working on State Shapefile
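The following command plots the Kerala boundary. Its components are
explained step by step below:
ggplot(kerala) + geom_sf(fill = "white", color = "black") + theme_void()
1. ggplot(kerala):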
ggplot() is the main function from the ggplot2 package for creating plots
in R.
kerala is the data argument. It should be an sf object (a spatial data
frame) containing the geometry and attributes of the state of Kerala (or
whatever region you are plotting). This sf object likely came from
reading a shapefile or other spatial data source using st_read() from the
sf package. This line initializes a ggplot2 plot, specifying that the data for
the plot will come from the kerala object.
2. + geom_sf(fill = "white", color = 'black'):
The + symbol in ggplot2 is used to add layers or components to the plot.
geom_sf() is the key function for plotting spatial data stored as sf
objects. It tells ggplot2 to draw the geometries (points, lines, or
polygons) contained in the sf object.
fill = "white" sets the fill color of the polygons (in this case, the state of
Kerala) to white.
color = 'black' sets the outline or border color of the polygons to black.
3. + theme_void():
theme_void() is a function from ggplot2 that removes all non-data
elements from the plot, such as axes, grid lines, background, and plot
margins. This creates a very clean map with just the shape of Kerala
displayed.
In summary:
This code creates a map of Kerala (or whatever is in the kerala sf object) using
ggplot2. The state's shape is filled with white and has a black outline. The
theme_void() removes all other visual elements, leaving only the shape itself on
a blank background.
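The same layered structure can be extended to colour each polygon by an attribute, by moving fill inside aes(). The sketch below assumes the North Carolina shapefile bundled with sf as a stand-in, with its NAME column playing the role of the district name column (the actual column name in the India district layer is not given here and must be checked with names()).

```r
library(sf)
library(ggplot2)

# Stand-in data; use kerala and its district name column in the exercise
demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
demo_sub <- demo[1:5, ]

p <- ggplot(demo_sub) +                         # 1. initialise the plot with the data
  geom_sf(aes(fill = NAME), color = "black") +  # 2. draw polygons, one colour per name
  labs(fill = "District") +                     #    legend title
  theme_void()                                  # 3. remove axes, grid and background

print(p)

# Optionally save the map to a file (hypothetical file name)
# ggsave("districts_map.png", p, width = 6, height = 6, dpi = 300)
```

Placing fill inside aes() maps colour to the data, whereas fill = "white" outside aes() applies one fixed colour to every polygon.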
6.3.4 Merging Districts
Merging district boundaries to create a state boundary (or similar operations at
other administrative levels) is a fundamental GIS operation with several
important applications:
Data Generalisation and Simplification:
Map Clarity: When displaying maps at smaller scales (e.g., a map of a
country showing its states), showing all district boundaries would be too
cluttered and difficult to interpret. Merging simplifies the map by showing
only the higher-level administrative units (states).
Data Storage and Processing: Storing and processing fewer, larger
polygons (states) is more efficient than handling many smaller polygons
(districts). This is especially important when dealing with large datasets or
complex analyses.
In the resulting plot, the merged district boundary is inconsistent, which means
that the vector information for the districts of Kerala is itself inconsistent.
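A merge of this kind can be sketched with group_by() and summarise(), or with st_union(). The code below is an illustration on hypothetical toy data, not the exact command used to produce the plot discussed above. It also applies st_make_valid() first, on the assumption that some inconsistencies in merged boundaries come from invalid geometries; it may not remove gaps or overlaps between neighbouring districts.

```r
library(sf)
library(dplyr)

# Hypothetical toy data: unit squares standing in for district polygons
make_square <- function(x, y) {
  st_polygon(list(rbind(
    c(x, y), c(x + 1, y), c(x + 1, y + 1), c(x, y + 1), c(x, y)
  )))
}
toy_dist <- st_sf(
  STATE    = c("KERALA", "KERALA", "TAMIL NADU"),
  DISTRICT = c("A", "B", "C"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0), make_square(2, 0))
)

# Check for and repair invalid geometries before merging
print(table(st_is_valid(toy_dist)))
toy_valid <- st_make_valid(toy_dist)

# Dissolve districts into one polygon per state
states <- toy_valid %>%
  group_by(STATE) %>%
  summarise()

# Alternative: a single outline for one state using st_union()
kerala_outline <- toy_valid %>%
  filter(STATE == "KERALA") %>%
  st_union()

print(states)
plot(st_geometry(states))
```

summarise() on a grouped sf object unions the geometries within each group, so the internal district boundaries disappear where neighbouring polygons share exactly the same edge.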
By mastering clipping and merging operations in RStudio using packages
like sf, dplyr, and ggplot2, you will enhance your geospatial analysis skills
significantly. These practical exercises will enable you to work with vector data
effectively, paving the way for more advanced spatial analyses and applications
in your field. As you practise these techniques, consider exploring
additional functionalities within these packages to further expand your
geospatial toolkit.
6.4 EXERCISES: TO BE SUBMITTED
Submit answers to the following to your counsellor for evaluation as
practical records:
1. Map of India with its state boundaries as derived using R.
2. Map of Kerala state clipped from the shapefile as derived using R.
3. Map of the Kerala state with its districts shown in different colours as
derived using R.
6.5 EXERCISES: DO IT YOURSELF
1. Try to work on the main attribute column listing the districts of Kerala and
plot the districts in different colours.
2. Try to clip only one district of interest.
3. Try to group the three districts with the largest area and plot them in different colours.
4. Try to group the three districts with the smallest area and plot them in different colours.
5. Try to colour all districts differently according to their increasing areas.
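As a starting hint for the area-based tasks, polygon areas can be computed with st_area() and used for sorting and colouring. The sketch below runs on the North Carolina shapefile bundled with sf as a stand-in. The column name area_km2 and the object names are illustrative, and adapting the code to the Kerala districts is left to you.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Stand-in data; replace with the Kerala districts for the actual task
demo <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# st_area() returns square metres here; convert to square kilometres
demo$area_km2 <- as.numeric(st_area(demo)) / 1e6

# Three largest and three smallest polygons by area
largest3  <- demo %>% arrange(desc(area_km2)) %>% head(3)
smallest3 <- demo %>% arrange(area_km2) %>% head(3)

print(st_drop_geometry(largest3)[, c("NAME", "area_km2")])
print(st_drop_geometry(smallest3)[, c("NAME", "area_km2")])

# Colour all polygons by area on a continuous scale
p <- ggplot(demo) +
  geom_sf(aes(fill = area_km2), color = "black") +
  labs(fill = "Area (sq km)") +
  theme_void()
print(p)
```

For a projected layer, st_area() reports the area in the squared unit of the CRS, so the conversion factor should be checked against the CRS units.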
6.6 FURTHER / SUGGESTED READINGS
Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R.
CRC Press.
Bivand, R. S., Pebesma, E., & Gómez-Rubio, V. (2013). Applied Spatial
Data Analysis with R (2nd ed.). Springer.
Pebesma, E., & Bivand, R. S. (2023). Spatial Data Science with R. Springer.
Islam, S. (2021). Hands-On Geographic Information Science with R and
QGIS. Apress.
Gopi, K. (2021). Introduction to Geospatial Technologies Using R. CRC
Press.