
Vector Data Analysis in R: A Guide

This document outlines an exercise on exploring vector data using R, focusing on reading, clipping, and merging shapefiles related to Indian district and state boundaries. It details the use of packages such as sf, dplyr, and ggplot2 for data manipulation and visualization. The exercise also includes practical tasks for students to submit and further readings for advanced understanding of geospatial analysis in R.

Uploaded by

manasranjandash
Copyright
© All Rights Reserved

EXERCISE 6

VECTOR DATA EXPLORATION WITH R
Outline of Exercise
6.1 Introduction
    Expected Learning Skills
6.2 Requirements
6.3 Exploring Vector Data
    Reading Shapefile
    Clipping a Shapefile
    Working on State Shapefile
    Merging Districts
6.4 Exercises: To Be Submitted
6.5 Exercises: Do It Yourself
6.6 Further / Suggested Readings

6.1 INTRODUCTION
Vector data, representing geographic features as points, lines, and polygons, is fundamental to
Geographic Information Systems (GIS). R, with its powerful statistical and graphical capabilities
combined with specialized spatial packages, provides an excellent environment for exploring and
analyzing vector data. This exploration involves a series of steps, from data import and
manipulation to spatial analysis and visualisation.
Exploring vector data of Indian district and state boundaries in R involves using the sf package for
spatial data handling, combined with dplyr for attribute manipulation and ggplot2 for visualization.
This process begins with importing shapefiles using st_read(), ensuring consistent CRS through
st_transform(), and inspecting data structure with str(). Attribute-based queries are performed using
dplyr verbs like filter(), while spatial relationships and geometric operations leverage sf functions
like st_intersects(), st_union(), and st_area(). Boundary dissolution to create state polygons from
districts is achieved using group_by() and summarise(). Finally, ggplot2 with geom_sf() or mapview
allows for creating static or interactive maps, enabling effective visualization and analysis of
administrative divisions within India.
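
The CRS step mentioned above can be sketched as follows. This is only an illustration, not the code of the exercise: it assumes the North Carolina sample shapefile that ships with the sf package as a stand-in for the India district layer, and EPSG:4326 (WGS 84 longitude/latitude) is a target CRS chosen purely for the example.

```r
library(sf)

# Stand-in data (assumption): the sample shapefile bundled with sf.
# The exercise itself reads DISTRICT_BOUNDARY.shp instead.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Inspect the coordinate reference system stored with the layer.
print(st_crs(nc))

# Reproject to WGS 84 longitude/latitude (EPSG:4326, chosen for illustration).
nc_wgs84 <- st_transform(nc, 4326)

# Layers should be combined only when their CRS match; this comparison
# tells you whether a transformation is still needed.
same_crs <- st_crs(nc) == st_crs(nc_wgs84)
print(same_crs)
```

Reprojecting changes the coordinates of every vertex but leaves the attribute table untouched, so the number of features stays the same.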
MGYL-012 Advanced Geoinformatics Laboratory
Expected Learning Skills
After completing this exercise, you should be able to:
• read and visualise a shapefile in R;
• clip a shapefile in R; and
• merge various features in a shapefile using R.

6.2 REQUIREMENTS
Before starting the geospatial operations, install and load the following
packages used in this exercise:
1. sf: A modern package for working with vector data.
2. terra: For raster and vector data processing.
3. dplyr: For data manipulation and filtering.
4. tmap: For visualization of spatial data.
5. ggplot2: For plotting maps with ggplot() and geom_sf(), as used in Section 6.3.
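
One possible way to install and load these packages is sketched below. The object names pkgs and not_installed and the CRAN mirror URL are choices made for this sketch; ggplot2 is included because the plotting commands in Section 6.3 call ggplot().

```r
# Packages named in this exercise (ggplot2 is needed by the ggplot() commands).
pkgs <- c("sf", "terra", "dplyr", "tmap", "ggplot2")

# Find which of them are not installed yet.
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Install only the missing ones (requires an internet connection).
if (length(not_installed) > 0) {
  install.packages(not_installed, repos = "https://cloud.r-project.org")
}

# Load the packages whose functions are called directly in this exercise.
library(sf)
library(dplyr)
library(ggplot2)
```

Installation is needed only once per system, whereas the library() calls must be repeated in every new R session.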

6.3 EXPLORING VECTOR DATA


6.3.1 Reading A Shapefile
Now, we will read the India district boundary, which is a polygon shapefile.
India_dist <- st_read("C:\\DEM\\spatialdata\\DISTRICT_BOUNDARY.shp")

Note: Check the folder in which you have saved the shapefile on your system.
The path will vary depending on where you have kept the file.
Reading layer `DISTRICT_BOUNDARY' from data source
`C:\\DEM\\spatialdata\\DISTRICT_BOUNDARY.shp'
using driver `ESRI Shapefile'
Simple feature collection with 742 features and 7 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2818364 ymin: 2177527 xmax: 5679119 ymax: 5444563
Projected CRS: LCC_WGS84
This output summarises the shapefile. There are 742 features and 7 fields,
which means that the shapefile contains 742 district (multi)polygons and that
its attribute table has 7 columns.

We can look at the structure.


str(India_dist)
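
Besides str(), a few other calls give a quick overview of an sf object. The sketch below assumes the North Carolina sample shapefile bundled with sf as stand-in data; with the exercise data you would pass India_dist instead of nc.

```r
library(sf)

# Stand-in data (assumption): sample shapefile shipped with the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

class(nc)                       # an sf object is also a data.frame
dim(nc)                         # rows = features; columns = fields + geometry column
names(nc)                       # attribute fields followed by the geometry column
unique(st_geometry_type(nc))    # geometry type(s) present in the layer
st_bbox(nc)                     # bounding box of all features
head(st_drop_geometry(nc), 3)   # first rows of the attribute table only
```

Note that the column count reported by dim() is one more than the number of fields reported by st_read(), because the geometry is stored as an additional column.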

We can visualize the shapefile using the ggplot() function from ggplot2.
ggplot(India_dist) + geom_sf(fill = "white", color = 'black') + theme_void()

6.3.2 Clipping a Shapefile


Clipping involves extracting a subset of a shapefile based on a defined
boundary. For instance, if you have a shapefile containing administrative
boundaries for an entire country and you want to focus on a specific district or
region, clipping is the way to go.
Suppose we want to find the average elevation of each district in Kerala
state. For that, we first need to extract the Kerala state boundary, with its
districts, from India_dist.

kerala <- India_dist %>% filter(STATE == "KERALA")


print(kerala)


plot(kerala)
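
The filter() command above selects features by an attribute value. When the area of interest is defined by a shape rather than by an attribute, a geometric clip with st_intersection() is a common alternative. The sketch below is an illustration under stated assumptions: it uses the North Carolina sample data bundled with sf, the county name "Ashe" from that data set, and a hypothetical 20 km circular clip window; EPSG:32119 is a projected CRS in metres chosen for that sample area.

```r
library(sf)
library(dplyr)

# Stand-in data (assumption), reprojected to a CRS with metre units.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)

# 1. Attribute-based selection, as in the Kerala example.
ashe <- nc %>% filter(NAME == "Ashe")

# 2. Geometric clip: keep only the parts of all polygons that fall inside
#    a hypothetical window (20 km buffer around the county centroid).
clip_window <- st_buffer(st_centroid(st_geometry(ashe)), dist = 20000)
clipped <- st_intersection(nc, clip_window)

# Plot the geometry only; plot() on the whole sf object would instead
# draw one panel per attribute column.
plot(st_geometry(clipped))
```

The attribute filter returns whole features unchanged, whereas st_intersection() cuts polygons along the edge of the clip window.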

6.3.3 Working on State Shapefile


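The map of Kerala is drawn with the following command, which is explained
step by step below:
ggplot(kerala) + geom_sf(fill = "white", color = 'black') + theme_void()
1. ggplot(kerala):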

• ggplot() is the main function from the ggplot2 package for creating plots
in R.
• kerala is the data argument. It should be an sf object (a spatial data
frame) containing the geometry and attributes of the state of Kerala (or
whatever region you are plotting). This sf object likely came from
reading a shapefile or other spatial data source using st_read() from the
sf package. This line initializes a ggplot2 plot, specifying that the data for
the plot will come from the kerala object.
2. + geom_sf(fill = "white", color = 'black'):
• The + symbol in ggplot2 is used to add layers or components to the plot.
• geom_sf() is the key function for plotting spatial data stored as sf
objects. It tells ggplot2 to draw the geometries (points, lines, or
polygons) contained in the sf object.
• fill = "white" sets the fill color of the polygons (in this case, the districts
of Kerala) to white.
• color = 'black' sets the outline or border color of the polygons to black.
3. + theme_void():
• theme_void() is a function from ggplot2 that removes all non-data
elements from the plot, such as axes, grid lines, background, and plot
margins. This creates a very clean map with just the shape of Kerala
displayed.
In summary:
This code creates a map of Kerala (or whatever is in the kerala sf object) using
ggplot2. The state's shape is filled with white and has a black outline. The
theme_void() removes all other visual elements, leaving only the shape itself on
a blank background.
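
The same layered structure can colour each polygon by an attribute when fill is mapped inside aes() instead of being fixed to "white". The sketch below is an illustration only: it assumes the North Carolina sample data bundled with sf and its NAME column, and keeps just the first five polygons so that the legend stays readable. With the exercise data, the column holding the district names would take the place of NAME.

```r
library(sf)
library(ggplot2)

# Stand-in data (assumption): sample shapefile shipped with sf.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
few <- nc[1:5, ]   # first five polygons only

# Map the fill colour to an attribute column; the border colour stays fixed.
p <- ggplot(few) +
  geom_sf(aes(fill = NAME), color = "black") +
  labs(fill = "County") +
  theme_void()

print(p)
```

Arguments placed inside aes() vary with the data and produce a legend, while arguments outside aes() apply one fixed value to every polygon.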

6.3.4 Merging Districts
Merging district boundaries to create a state boundary (or similar operations at
other administrative levels) is a fundamental GIS operation with several
important applications:
• Data Generalisation and Simplification:
  - Map Clarity: When displaying maps at smaller scales (e.g., a map of a
    country showing its states), showing all district boundaries would be too
    cluttered and difficult to interpret. Merging simplifies the map by showing
    only the higher-level administrative units (states).
  - Data Storage and Processing: Storing and processing fewer, larger
    polygons (states) is more efficient than handling many smaller polygons
    (districts). This is especially important when dealing with large datasets or
    complex analyses.

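Here we see that the plot of the merged district boundaries is inconsistent,
which means that the vector information for the districts of Kerala is itself
inconsistent. Such artefacts typically arise from small gaps or overlaps
between adjacent district polygons.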
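
A merge of this kind can be sketched with group_by() and summarise(), or with st_union() when a single outline is wanted. The code below is a hedged illustration rather than the command used in this exercise: it assumes the North Carolina sample data bundled with sf, and the GROUP column is a hypothetical grouping field created only for the example. With the exercise data, the STATE column would play that role.

```r
library(sf)
library(dplyr)

# Stand-in data (assumption), reprojected to a projected CRS (EPSG:32119).
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)

# Hypothetical grouping column standing in for a field such as STATE.
nc$GROUP <- ifelse(nc$AREA > median(nc$AREA), "larger", "smaller")

# Dissolve: one output feature per group, with member geometries unioned.
merged <- nc %>%
  group_by(GROUP) %>%
  summarise(n_units = n())

# Alternative: union every polygon into a single outline.
outline <- st_union(st_geometry(nc))

plot(st_geometry(merged))
```

If stray internal lines remain after merging, the input polygons probably do not share exact boundaries, and the source data may need cleaning before the merge.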


By mastering clipping and merging operations in RStudio using packages
like sf, dplyr, and ggplot2, you will enhance your geospatial analysis skills
significantly. These practical exercises will enable you to work with vector data
effectively, paving the way for more advanced spatial analyses and applications
in your own field. As you practise these techniques, consider exploring
additional functionalities within these packages to further expand your
geospatial toolkit.

6.4 EXERCISES: TO BE SUBMITTED


Submit answers to the following to your counsellor for evaluation as
practical records:
1. Map of India with its state boundaries as derived using R.
2. Map of Kerala state clipped from the shapefile as derived using R.
3. Map of Kerala state with its districts shown in different colours as
derived using R.

6.5 EXERCISES: DO IT YOURSELF


1. Try to work on the master column showing the districts of Kerala and plot
them with different colours.
2. Try to clip only one district of interest.
3. Try to group the three districts with the largest area and plot them with different colours.
4. Try to group the three districts with the smallest area and plot them with different colours.
5. Try to colour all districts differently according to their increasing areas.
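
As a starting hint for the area-based tasks, polygon areas can be computed with st_area() and then sorted. The sketch below deliberately uses the North Carolina sample data bundled with sf rather than the Kerala districts; the column name area_km2 and the CRS EPSG:32119 (metres) are assumptions made for this example.

```r
library(sf)
library(dplyr)

# Stand-in data (assumption), in a projected CRS with metre units.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)

# st_area() returns square metres here; convert to square kilometres.
nc$area_km2 <- as.numeric(st_area(nc)) / 1e6

# Three largest and three smallest polygons by area.
largest3  <- nc %>% arrange(desc(area_km2)) %>% slice(1:3)
smallest3 <- nc %>% arrange(area_km2) %>% slice(1:3)

print(st_drop_geometry(largest3)[, c("NAME", "area_km2")])
```

The resulting area column can also be passed to aes(fill = ...) in geom_sf() to shade polygons by size.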

6.6 FURTHER / SUGGESTED READINGS


• Lovelace, R., Nowosad, J., & Muenchow, J. (2019). Geocomputation with R.
CRC Press.

• Bivand, R. S., Pebesma, E., & Gómez-Rubio, V. (2013). Applied Spatial
Data Analysis with R (2nd ed.). Springer.
• Pebesma, E., & Bivand, R. S. (2023). Spatial Data Science with R. Springer.
• Islam, S. (2021). Hands-On Geographic Information Science with R and
QGIS. Apress.
• Gopi, K. (2021). Introduction to Geospatial Technologies Using R. CRC
Press.
