Packages:
- data.table data wrangling based on high performance [website]()
- dplyr data wrangling, using the tidy data approach [website]()
- rgdal bindings for GDAL and PROJ4 libraries
- rgeos
- sp 1st gen (2003) vector spatial data I/O and manipulation website
- sf 2nd gen (2015) vector spatial data I/O and manipulation website
- raster raster spatial data I/O and manipulation [website]()
- mapedit interactive editing for spatial objects blogpost-1 blogpost-2
- ggplot2 generic data visualization, including some mapping functionalities, using the Grammar of Graphics approach website
- tmap static mapping, using the same Grammar of Graphics layers' approach as ggplot2 GH
- leaflet interactive mapping website
- mapview interactive mapping website
Data I/O
# load packages
pkgs <- c('data.table', 'dplyr', 'rgdal', 'rgeos', 'sp', 'sf', 'raster', 'mapedit', 'ggplot2', 'tmap', 'leaflet', 'mapview')
lapply(pkgs, require, char = TRUE)
# Read in a vector spatial points (in the form of a dataframe with two columns `x_lon` and `y_lat` mapping to coordinates) using sp, adding the [WGS84 CRS](https://epsg.io/4326)
sp1 <- read.csv('poi.csv')
coordinates(sp1) <- ~x_lon+y_lat
proj4string(dts) <- CRS("+init=epsg:4326")
# Read in a vector spatial points using sf
sf1 <- st_as_sf('poi.csv', coords = c('x_lon', 'y_lat'), crs = 4326)
# Read in a vector polygons shapefile using rgdal for sp object
sp2 <- readOGR('path/to/file', 'areas')
# Read in a vector polygons shapefile using sf
sp2 <- st_read('areas.shp')
# convert an sf object into an sp object (as sp has been so important for more than a decade, it'll remain a legacy for a while. For example, raster+vector analysis can't be done over *sf* objects)
sp2 <- as(sf2, Class = 'Spatial')
# convert an sp object into an sf object
sf2 <- st_as_sf(sp2)
# write an sp object
writeOGR(sp1, )
# write an sf Points object (notice that without the `layer_options` argument the output won't include the coordinates, but only the attributes)
st_write(sf1, layer_options = "GEOMETRY=AS_XY")
# write an sf Polygon object
st_write(sf2)
# Read in a raster single layer tiff file using raster
s3 <- raster('grid.tif')
# plot an object using R core
plot(sf1)
# plot multiple objects together using R core (remember to run the code for all layers at once rather than line-by-line)
plot(sf1)
plot(sf2, add = TRUE)
plot(s3, add = TRUE)
...
# Read in a raster multi band tiff file using raster
s4 <- brick('col_field.tif')
# plot multiple layers in a raster brick in the same chart using raster
plotRGB(s4)
# Join a spatial object with a dataframe (= add attributes to a spatial object)
# for more maps use tmap
tm_shape(s1) +
tm_polygons(alpha = 0.5) +
tm_shape(s3) +
tm_rgb()
The sf package
Vector data are ...
Their structure can take one of three form: point, line, polygon, or any collection of them. In the case of polygons there can even be a collection of collections.
-
st_read('/path/to/file/fname.ext')
The function can recognize lots of different file types, inferring the right one from the file extension.
The output is a sort of dataframe with a metadata header, containing information about the spatial structure of the features, and an additional geometry column containing the actual spatial references. This means, in particular, that sf objects can be manipulated natively in the tidyverse. Notice though that when selecting columns, the geometry column is carried over implicitly.
We saw that geometry is a list column. A list-column behaves, to a certain extent, like any other R column. The main difference is that instead of an atomic value, each observation in that column is a piece of an R list and this list can be as complex as needed. The list column allows you to store far more information in a single variable, andsftakes advantage of this by storing all geographic information for each feature in the list. -
st_write(s, '/path/to/file/fname.ext') -
st_geometry(s)extract only the spatial feature from thesfobject. This can be useful in order to plot (only) the outlines of the spatial object, as otherwise the plot will draw a choropleth for every attribute in the table. -
st_sf(s) -
st_crs(s)get/set the coordinate reference system (CRS). When the geographic data you read in with sf already has a CRS defined, sf will recognize and retain it. When the CRS is not defined you will need to define it yourself using either an EPSG code, or a proj4 string, a full set of parameters spelled out in a string. See: IOGP's EPSG Geodetic Parameter Dataset and Registry, SpatialReference, EPSG.io: Coordinate Systems Worldwide.
It is also worth noting that some spatial operations require the involved spatial object(s) to have a projected CRS, for the operation to work or the result to be meaningful. An unprojected CRS uses latitude and longitude coordinates, and
references the earth as a three-dimensional object. A projected CRS uses X and Y coordinates as a two-dimensional
representation of the earth. To determine if an object has a projected CRS, simply look at the value set for +proj= at the start of the projection string; if it is longlat, then the corresponding CRS is unprojected. -
st_transform(s, crs = )transform (or project) the spatial object to the CRS specified with thecrsargument, that can be specified as:- a EPSG code,
- a proj4 string,
- a CRS of another spatial object. In order to perform any spatial analysis with multiple layers, all involved layers should share the same CRS (for example, if the layers do not share a common CRS, they may not correctly align on a plot).
For more information on the required conversions and transformations formulas, see this document.
-
Indexing, subsetting, filtering, and augmenting an sf object, through recoding, transformations or merging, could be easily done with
dplyras if the spatial object were instead a standard tibble. One important difference is that the resulting data frames will always include the geometry variable by default, unless it's explicitly dropped using one of the following commands:st_set_geometry(s, NULL) set_geometry(s) <- NULLthat removes the geometry from the original spatial object, coercing it to a data.frame.
-
st_simplify(s, preserveTopology = TRUE, dTolerance = t)Often, for the purpose of statistical mapping, the much detailed geographical maps are not only ..., but also ineffective. The process of reducing the details of map This is a good way to reduce object size for computation or quick display on the web. Most times the user will barely see the difference, unlesspreserveTopology = TRUEallows the borders to stay aligned,dTolerance =set the amount of simplification allowed in meters: the higher the value, the more simplified will be the result.
-
st_cast(s, ) -
To count the number of vertices for every polygon in a multipolygon spatial object:
ss <- st_cast(s$geometry, "MULTIPOINT") ss <- sapply(ss, length) sum(ss)
For the following spatial analysis functionalities, any layer involved should use a projected CRS, and if multiple layers are involved in the same analysis, all of them should have the same CRS.
Measures
-
st_area(s)returns the area of each feature in s. Notice that the the result is a units object, and requires some additional processing, likeunclassit, in order to use it in subsequent calculations -
st_length(s)returns the perimeter of each features in s -
st_distance(s1, s2, by_element = , )calculates the distances between every combination of the features of boths1ands2. Provides a useful feature-to-feature distance matrix as output and can be used for most distance calculation needs.
Unary Operations
-
st_buffer(s, dist = )Calculates a buffer layers ofdistradius, around every point ins. The units of dist is the projection unit. -
st_centroid(s, of_largest_polygon = FALSE)Returns the geometric centroid for every polygon ins. Whenof_largest_polygonis set to TRUE, return centroid of the largest (sub)polygon of a MULTIPOLYGON rather than of the whole MULTIPOLYGON -
st_bboxReturns the coordinates for the bounding box of any vector spatial object, which is the smallest rectangle surrounding every feature included in the object. These are helpful to create polygons to clip layers to a common area or identify regions of influence. -
st_make_grid(s, n = )Subdivides any spatial objectsusing a grid ofn * nequal squares. Whenn = 1, returns the polygon corresponding to the bounding box ofs. Ifn = c(n1, n2)the grid is made up ofn1 * n2equal rectangles. -
st_combine(s) -
st_convex_hull(s)returns the convex hull of aMULTI*feature, which is the smallest convex set that contains all the features in the collection. Notice thatst_convex_hullcomputes a tight box around each one of your features individually so if you want to create a convex hull around a group of features you'll need to use firstst_union()to combine individual features into a single multi-feature. -
st_boundary(x) -
st_simplify(x, preserveTopology = FALSE, dTolerance = 0) -
st_triangulate(x, dTolerance = 0, bOnlyEdges = FALSE) -
st_voronoi(x, envelope, dTolerance = 0, bOnlyEdges = FALSE) -
st_polygonize(x) -
st_line_merge(x) -
st_point_on_surface(x) -
st_node(x) -
st_segmentize(x, dfMaxLength, ...)
Binary Operations
The output of most of these and related functions that test relationships between two sets of features, is a special kind of list (of class sgbp) whose first element [[1]] shows the features from the second object that acts, according to the function used, on the first object. Likewise, [[2]] would show the features from the first object that acts on the second object.
-
st_union(s1, s2, ..., by_feature = FALSE)dissolve multiple features into one. If the elements of the collection are distinctPOINT, or non-overlappingPOLYGON, the resulting feature is always aMULTI*, that can be used for subsequent tasks like computing the convex hull. -
st_intersects(s1, s2)returns the indices of the features ins2that intersect withs1 -
st_intersection(s1, s2)clipss2based on the spatial shape ofs1 -
st_contains(s1, s2)returns the indices of the features ins2that are completely inside any feature ins1 -
st_disjoint(s1, s2 = s1) -
st_touches(s1, s2) -
st_crosses(s1, s2) -
st_within(s1, s2) -
st_contains_properly(s1, s2) -
st_overlaps(s1, s2) -
st_equals(s1, s2) -
st_covers(s1, s2) -
st_covered_by(s1, s2) -
st_equals_exact(s1, s2, par) -
st_is_within_distance(s1, s2, dist) -
st_difference(s1, s2) -
st_sym_difference(s1, s2) -
st_snap(s1, s2, tolerance) -
st_join(s1, s2)for spatial joins, where all involved objects are of spatial type, and are linked using their spatial features. This is used when there is no common id that can link the features in the usual way. -
st_spatial relationship
Mapping
ggplot2
-
geom_sfoperates over an sf object like any other layer in ggplot2, mapping variables from the data into aesthetics on the map through theaesfunction. Leaving off the aesthetic mapping will map the geometry alone. -
stat_sf -
coord_sf
tmap
tmap operates similarly to ggplot2, in the sense that it starts with a function to set up the map, in this case tm_shape(data), and then is followed by layers and other instructions separated with the plus sign. There are a few dedicated geometries according to the type of the spatial object:
tm_dotsortm_bubblesfor POINTStm_linesfor LINEStm_polygonsfor POLYGONStm_rgbfor RASTERS
To build a choropleth map just add the argumentcolwith a char vector identifying the variable(s) to be mapped.
The raster package
When working with raster data, one of the most important things to keep in mind is that the raw data can be of two format:
-
single-band rasters with a single layer of values, where each cell value represents some attribute at that location
-
multi-band rasters with more than one layer, where each cell group of values represents different aspects of the same attribute, or an attribute at different points in time (the typical example is a color aerial photo in which there would be one band each representing red, green or blue light).
-
raster('/path/to/file/fname.ext')to read single-band rasters -
brick('/path/to/file/fname.ext')to read a multi-band raster
Instead of storing raster objects in data frames, the raster package stores spatial data and metadata in specially designed R classes that contain slots where the data and metadata are stored. The data and metadata can be accessed using a suite of dedicated functions
-
nlayers(s)the number of bands/layers -
extent(s)the bounding box -
crs(s)the coordinate reference system -
ncell(s)the number of grid cells -
res(rst)returns the resolution of the grid -
inMemory(s)tests if the raster values are loaded in memory. Raster data can be very big, depending on the extent and resolution (grid size). In order to deal with this, both the raster() and brick() functions are designed to only read in the actual raster values as needed, partly or the full set, to save space in memory. The raster values will be read in by default when performing spatial analysis operations that require it. -
getValues(s)to manually read in memory the values for a raster already defined. -
plotRGB(s)plot all the raster layers in a brick as a single image. Using the coreplotfunction with a multi-band raster to draw as many charts as the number of layers in the raster. -
crs(s)get/set the coordinate reference system (CRS) -
projectRaster(s, crs = '', method = c('ngb'))transform the CRS associated with the spatial object to another one indicated by the argumentcrs. The new CRS can be- a proj4 string,
- a EPSG code, in the form "+init=epsg:xxxxx",
- a CRS of another spatial object. In this case, the parameter
asText = TRUEis needed to force thecrs()function to output the CRS as a string.
-
'aggregate(rst, fact = 2, fun = mean)
reduces the complexity (resolution) of the grid by a factor equal tofact, usingfunto calculate the value of each output cells. The higher the value offact`, the more pixelated the result. -
reclassify(rst, rcl = M)allows to recode the grid cells using the rules found along the rows of the matrixM, which could be structured with 2 or 3 columns:- In the former case, the first element of each row represent the old value, while the second value represent the new one.
- When in the 3-cols form, the first and second elements indicate the min and respectively the max of the range of values to be set at the value represented by the third element.
Raster + vector analysis
-
mask(rst, mask = sp)limit the scope of a raster using the shape of a vector layer as mask. Any raster cells outside of the vector object are assigned NA values. The result is a raster object whose extent is not changed. Remember that both layers must have the same CRS. -
crop(rst, sp)the output raster is the result of cutting the extent of the input raster using the bounding box of the vector layer, so that the two object share the same extent. Notice that no raster cells are set to NA. -
extract(rst, sp, fun = )extract the values from the raster object corresponding to the location in the vector object, applyingfunif necessary for neighborhood aggregation -
overlay(rst1, rst2, fun = )
Use Case: Trees streets density and average canopy comparison in New York Neighborhoods
Data available are:
neighborhoodsboundaries as vector Polygonstreesfor the city as vector Pointscanopyfor the city as raster
# Add areas to the neighborhoods object (need to unclass!)
neighborhoods <- mutate(neighborhoods, area = unclass(st_area(neighborhoods)))
# Compute the counts of all trees by hood, possibly renaming the count field and removing the geometry
trees <- count(trees, hood) %>% st_set_geometry(NULL) %>% rename(n_trees = n)
# Join neighborhoods and counts
neighborhoods <- left_join(neighborhoods, trees, by = "hood")
# Replace NA values with 0
neighborhoods <- mutate(neighborhoods, trees = ifelse(is.na(trees), 0, trees))
# Compute the tree density
neighborhoods <- mutate(neighborhoods, tree_density = n_tree / area)
# Transform the neighborhoods CRS to match the canopy layer (This is actually not strictly necessary because extract() can transform CRS on the fly. But it will be needed for plotting and other operations later)
neighborhoods <- st_transform(neighborhoods, crs = crs(canopy, asText = TRUE))
# Convert neighborhoods object to a Spatial sp object
neighborhoods <- as(neighborhoods, 'Spatial')
# Compute the mean of canopy values by neighborhood, and add it to the neighborhoods object
neighborhoods <- mutate(neighborhoods, avg_canopy = extract(canopy, neighborhoods, fun = mean) )
# Compute the correlation between tree density and average canopy
cor(neighborhoods$tree_density, neighborhoods$avg_canopy)
# Create a choropleth (color-coded) map of the tree density using ggplot
ggplot(neighborhoods) +
geom_sf(aes(fill = tree_density)) +
scale_fill_gradient(low = "#edf8e9", high = "#005a32")
# Create a choropleth map of the average canopy using tmap
tm_shape(neighborhoods) +
tm_polygons('avg_canopy',
palette = 'Greens',
style = 'quantile', n = 7,
title = 'Average tree canopy (%)'
)
# Create a choropleth map of both the tree densities and average canopy using tmap
tm_shape(neighborhoods) +
tm_polygons(c('tree_density', 'avg_canopy'),
palette = 'Greens',
style = 'quantile', n = 7,
title = c('Tree Density', 'Average tree canopy (%)')
)
# Create a raster aerial photo of Manhattan with the neighborhoods overlaying for comparison
tm_shape(manhattan) +
tm_rgb() +
tm_shape(neighborhoods) +
tm_polygons(col = "black", lwd = 0.5, alpha = 0.5)
# Combine the last two maps into a single output
tmap_arrange(tm1, tm2, asp = NA)