Spatial Analysis in R with sf and raster


        
                Markdown 19.05 KB
                                    
                        | None                    
                
                                        |
    0    0                            

            
                                    raw
                    download
                    clone
                    embed
                    print
                
                                    report
                
                
            Packages:

data.table  data wrangling based on high performance [website]() 
dplyr       data wrangling, using the tidy data approach [website]()
rgdal       bindings for GDAL and PROJ4 libraries
rgeos       
sp          1st gen (2003) vector spatial data I/O and manipulation website
sf          2nd gen (2015) vector spatial data I/O and manipulation website
raster      raster spatial data I/O and manipulation [website]()
mapedit     interactive editing for spatial objects blogpost-1 blogpost-2
ggplot2     generic data visualization, including some mapping functionalities, using the Grammar of Graphics approach website
tmap        static mapping, using the same Grammar of Graphics layers' approach as ggplot2 GH
leaflet     interactive mapping website
mapview     interactive mapping website

Data I/O
# load packages
pkgs <- c('data.table', 'dplyr', 'rgdal', 'rgeos', 'sp', 'sf', 'raster', 'mapedit', 'ggplot2', 'tmap', 'leaflet', 'mapview')
lapply(pkgs, require, char = TRUE)

# Read in a vector spatial points (in the form of a dataframe with two columns `x_lon` and `y_lat` mapping to coordinates) using sp, adding the [WGS84 CRS](https://epsg.io/4326)
sp1 <- read.csv('poi.csv')
coordinates(sp1) <- ~x_lon+y_lat
proj4string(dts) <- CRS("+init=epsg:4326")

# Read in a vector spatial points using sf
sf1 <- st_as_sf('poi.csv', coords = c('x_lon', 'y_lat'), crs = 4326)

# Read in a vector polygons shapefile using rgdal for sp object
sp2 <- readOGR('path/to/file', 'areas')

# Read in a vector polygons shapefile using sf
sp2 <- st_read('areas.shp')

# convert an sf object into an sp object (as sp has been so important for more than a decade, it'll remain a legacy for a while. For example, raster+vector analysis can't be done over *sf* objects)
sp2 <- as(sf2, Class = 'Spatial')

# convert an sp object into an sf object
sf2 <- st_as_sf(sp2)

# write an sp object
writeOGR(sp1, )

# write an sf Points object (notice that without the `layer_options` argument the output won't include the coordinates, but only the attributes)
st_write(sf1, layer_options = "GEOMETRY=AS_XY")

# write an sf Polygon object
st_write(sf2)

# Read in a raster single layer tiff file using raster
s3 <- raster('grid.tif')

# plot an object using R core
plot(sf1)

# plot multiple objects together using R core (remember to run the code for all layers at once rather than line-by-line)
plot(sf1)
plot(sf2, add = TRUE)
plot(s3, add = TRUE)
...

# Read in a raster multi band tiff file using raster
s4 <- brick('col_field.tif')

# plot multiple layers in a raster brick in the same chart using raster
plotRGB(s4)

# Join a spatial object with a dataframe (= add attributes to a spatial object)

# for more maps use tmap
tm_shape(s1) + 
    tm_polygons(alpha = 0.5) +
    tm_shape(s3) + 
    tm_rgb()
The sf package
Vector data are ...

Their structure can take one of three form: point, line, polygon, or any collection of them. In the case of polygons there can even be a collection of collections.


st_read('/path/to/file/fname.ext')

The function can recognize lots of different file types, inferring the right one from the file extension.

The output is a sort of dataframe with a metadata header, containing information about the spatial structure of the features, and an additional geometry column containing the actual spatial references. This means, in particular, that sf objects can be manipulated natively in the tidyverse. Notice though that when selecting columns, the geometry column is carried over implicitly.

We saw that geometry is a list column. A list-column behaves, to a certain extent, like any other R column. The main difference is that instead of an atomic value, each observation in that column is a piece of an R list and this list can be as complex as needed. The list column allows you to store far more information in a single variable, and sf takes advantage of this by storing all geographic information for each feature in the list.


st_write(s, '/path/to/file/fname.ext')


st_geometry(s) extract only the spatial feature from the sf object. This can be useful in order to plot (only) the outlines of the spatial object, as otherwise the plot will draw a choropleth for every attribute in the table.


st_sf(s) 


st_crs(s) get/set the coordinate reference system (CRS). When the geographic data you read in with sf already has a CRS defined, sf will recognize and retain it. When the CRS is not defined you will need to define it yourself using either an EPSG code, or a proj4 string, a full set of parameters spelled out in a string. See: IOGP's EPSG Geodetic Parameter Dataset and Registry, SpatialReference, EPSG.io: Coordinate Systems Worldwide.

It is also worth noting that some spatial operations require the involved spatial object(s) to have a projected CRS, for the operation to work or the result to be meaningful. An unprojected CRS uses latitude and longitude coordinates, and

references the earth as a three-dimensional object. A projected CRS uses X and Y coordinates as a two-dimensional

representation of the earth. To determine if an object has a projected CRS, simply look at the value set for +proj= at the start of the projection string; if it is longlat, then the corresponding CRS is unprojected.


st_transform(s, crs = ) transform (or project) the spatial object to the CRS specified with the crs argument, that can be specified as:

a EPSG code, 
a proj4 string, 
a CRS of another spatial object. In order to perform any spatial analysis with multiple layers, all involved layers should share the same CRS (for example, if the layers do not share a common CRS, they may not correctly align on a plot).

For more information on the required conversions and transformations formulas, see this document.


Indexing, subsetting, filtering, and augmenting an sf object, through recoding, transformations or merging, could be easily done with dplyr as if the spatial object were instead a standard tibble. One important difference is that the resulting data frames will always include the geometry variable by default, unless it's explicitly dropped using one of the following commands:
st_set_geometry(s, NULL) 
set_geometry(s) <- NULL
that removes the geometry from the original spatial object, coercing it to a data.frame.


st_simplify(s, preserveTopology = TRUE, dTolerance = t) Often, for the purpose of statistical mapping, the much detailed geographical maps are not only ..., but also ineffective. The process of reducing the details of map This is a good way to reduce object size for computation or quick display on the web. Most times the user will barely see the difference, unless  

preserveTopology = TRUE allows the borders to stay aligned, 
dTolerance = set the amount of simplification allowed in meters: the higher the value, the more simplified will be the result.


st_cast(s, )


To count the number of vertices for every polygon in a multipolygon spatial object:
ss <- st_cast(s$geometry, "MULTIPOINT")
ss <- sapply(ss, length)
sum(ss)


For the following spatial analysis functionalities, any layer involved should use a projected CRS, and if multiple layers are involved in the same analysis, all of them should have the same CRS.
Measures


st_area(s) returns the area of each feature in s. Notice that the the result is a units object, and requires some additional processing, like unclass it, in order to use it in subsequent calculations


st_length(s) returns the perimeter of each features in s


st_distance(s1, s2, by_element = , ) calculates the distances between every combination of the features of both s1 and s2. Provides a useful feature-to-feature distance matrix as output and can be used for most distance calculation needs. 


Unary Operations


st_buffer(s, dist = ) Calculates a buffer layers of dist radius, around every point in s. The units of dist is the projection unit.


st_centroid(s, of_largest_polygon = FALSE) Returns the geometric centroid for every polygon in s. When of_largest_polygon is set to TRUE, return centroid of the largest (sub)polygon of a MULTIPOLYGON rather than of the whole MULTIPOLYGON


st_bbox Returns the coordinates for the bounding box of any vector spatial object, which is the smallest rectangle surrounding every feature included in the object. These are helpful to create polygons to clip layers to a common area or identify regions of influence.


st_make_grid(s, n = ) Subdivides any spatial object s using a grid of n * n equal squares. When n = 1, returns the polygon corresponding to the bounding box of s. If n = c(n1, n2) the grid is made up of n1 * n2 equal rectangles.


st_combine(s)


st_convex_hull(s) returns the convex hull of a MULTI* feature, which is the smallest convex set that contains all the features in the collection. Notice that st_convex_hull computes a tight box around each one of your features individually so if you want to create a convex hull around a group of features you'll need to use first st_union() to combine individual features into a single multi-feature.


st_boundary(x)


st_simplify(x, preserveTopology = FALSE, dTolerance = 0)


st_triangulate(x, dTolerance = 0, bOnlyEdges = FALSE)


st_voronoi(x, envelope, dTolerance = 0, bOnlyEdges = FALSE)


st_polygonize(x)


st_line_merge(x)


st_point_on_surface(x)


st_node(x)


st_segmentize(x, dfMaxLength, ...)


Binary Operations
The output of most of these and related functions that test relationships between two sets of features, is a special kind of list (of class sgbp) whose first element [[1]] shows the features from the second object that acts, according to the function used, on the first object. Likewise, [[2]] would show the features from the first object that acts on the second object.


st_union(s1, s2, ..., by_feature = FALSE) dissolve multiple features into one. If the elements of the collection are distinct POINT, or non-overlapping POLYGON, the resulting feature is always a MULTI*, that can be used for subsequent tasks like computing the convex hull.


st_intersects(s1, s2) returns the indices of the features in s2 that intersect with s1


st_intersection(s1, s2) clips s2 based on the spatial shape of s1


st_contains(s1, s2) returns the indices of the features in s2 that are completely inside any feature in s1


st_disjoint(s1, s2 = s1) 


st_touches(s1, s2) 


st_crosses(s1, s2) 


st_within(s1, s2) 


st_contains_properly(s1, s2) 


st_overlaps(s1, s2) 


st_equals(s1, s2) 


st_covers(s1, s2) 


st_covered_by(s1, s2) 


st_equals_exact(s1, s2, par) 


st_is_within_distance(s1, s2, dist) 


st_difference(s1, s2)


st_sym_difference(s1, s2)


st_snap(s1, s2, tolerance)


st_join(s1, s2) for spatial joins, where all involved objects are of spatial type, and are linked using their spatial features. This is used when there is no common id that can link the features in the usual way.


st_ spatial relationship


Mapping
ggplot2


geom_sf operates over an sf object like any other layer in ggplot2, mapping variables from the data into aesthetics on the map through the aes function. Leaving off the aesthetic mapping will map the geometry alone.


stat_sf


coord_sf


tmap
tmap operates similarly to ggplot2, in the sense that it starts with a function to set up the map, in this case tm_shape(data), and then is followed by layers and other instructions separated with the plus sign. There are a few dedicated geometries according to the type of the spatial object:

tm_dots or tm_bubbles for POINTS
tm_lines for LINES
tm_polygons for POLYGONS
tm_rgb for RASTERS

To build a choropleth map just add the argument col with a char vector identifying the variable(s) to be mapped.

The raster package
When working with raster data, one of the most important things to keep in mind is that the raw data can be of two format:


single-band rasters with a single layer of values, where each cell value represents some attribute at that location


multi-band rasters with more than one layer, where each cell group of values represents different aspects of the same attribute, or an attribute at different points in time (the typical example is a color aerial photo in which there would be one band each representing red, green or blue light).


raster('/path/to/file/fname.ext') to read single-band rasters


brick('/path/to/file/fname.ext') to read a multi-band raster


Instead of storing raster objects in data frames, the raster package stores spatial data and metadata in specially designed R classes that contain slots where the data and metadata are stored. The data and metadata can be accessed using a suite of dedicated functions


nlayers(s) the number of bands/layers 


extent(s) the bounding box


crs(s) the coordinate reference system 


ncell(s) the number of grid cells


res(rst) returns the resolution of the grid


inMemory(s) tests if the raster values are loaded in memory. Raster data can be very big, depending on the extent and resolution (grid size). In order to deal with this, both the raster() and brick() functions are designed to only read in the actual raster values as needed, partly or the full set, to save space in memory. The raster values will be read in by default when performing spatial analysis operations that require it.


getValues(s) to manually read in memory the values for a raster already defined.


plotRGB(s) plot all the raster layers in a brick as a single image. Using the core plot function with a multi-band raster to draw as many charts as the number of layers in the raster.


crs(s) get/set the coordinate reference system (CRS)


projectRaster(s, crs = '', method = c('ngb')) transform the CRS associated with the spatial object to another one indicated by the argument crs. The new CRS can be 

a proj4 string, 
a EPSG code, in the form "+init=epsg:xxxxx", 
a CRS of another spatial object. In this case, the parameter asText = TRUE is needed to force the crs() function to output the CRS as a string.


'aggregate(rst, fact = 2, fun = mean)reduces the complexity (resolution) of the grid by a factor equal tofact, usingfunto calculate the value of each output cells. The higher the value offact`, the more pixelated the result.


reclassify(rst, rcl = M) allows to recode the grid cells using the rules found along the rows of the matrix M, which could be structured with 2 or 3 columns:

In the former case, the first element of each row represent the old value, while the second value represent the new one.
When in the 3-cols form, the first and second elements indicate the min and respectively the max of the range of values to be set at the value represented by the third element. 


Raster + vector analysis


mask(rst, mask = sp) limit the scope of a raster using the shape of a vector layer as mask. Any raster cells outside of the vector object are assigned NA values. The result is a raster object whose extent is not changed. Remember that both layers must have the same CRS.


crop(rst, sp) the output raster is the result of cutting the extent of the input raster using the bounding box of the vector layer, so that the two object share the same extent. Notice that no raster cells are set to NA.


extract(rst, sp, fun = ) extract the values from the raster object corresponding to the location in the vector object, applying fun if necessary for neighborhood aggregation 


overlay(rst1, rst2, fun = ) 


Use Case: Trees streets density and average canopy comparison in New York Neighborhoods
Data available are:

neighborhoods boundaries as vector Polygons
trees for the city as vector Points
canopy for the city as raster

# Add areas to the neighborhoods object (need to unclass!)
neighborhoods <- mutate(neighborhoods, area = unclass(st_area(neighborhoods)))
# Compute the counts of all trees by hood, possibly renaming the count field and removing the geometry
trees <- count(trees, hood) %>% st_set_geometry(NULL) %>% rename(n_trees = n)
# Join neighborhoods and counts
neighborhoods <- left_join(neighborhoods, trees, by = "hood")
# Replace NA values with 0
neighborhoods <- mutate(neighborhoods, trees = ifelse(is.na(trees), 0, trees))
# Compute the tree density
neighborhoods <- mutate(neighborhoods, tree_density = n_tree / area)
# Transform the neighborhoods CRS to match the canopy layer (This is actually not strictly necessary because extract() can transform CRS on the fly. But it will be needed for plotting and other operations later)
neighborhoods <- st_transform(neighborhoods, crs = crs(canopy, asText = TRUE))
# Convert neighborhoods object to a Spatial sp object
neighborhoods <- as(neighborhoods, 'Spatial')
# Compute the mean of canopy values by neighborhood, and add it to the neighborhoods object
neighborhoods <- mutate(neighborhoods, avg_canopy = extract(canopy, neighborhoods, fun = mean) )
# Compute the correlation between tree density and average canopy
cor(neighborhoods$tree_density, neighborhoods$avg_canopy)
# Create a choropleth (color-coded) map of the tree density using ggplot
ggplot(neighborhoods) + 
  geom_sf(aes(fill = tree_density)) + 
  scale_fill_gradient(low = "#edf8e9", high = "#005a32")
# Create a choropleth map of the average canopy using tmap
tm_shape(neighborhoods) + 
    tm_polygons('avg_canopy', 
        palette = 'Greens', 
        style = 'quantile', n = 7, 
        title = 'Average tree canopy (%)'
    )
# Create a choropleth map of both the tree densities and average canopy using tmap
tm_shape(neighborhoods) + 
    tm_polygons(c('tree_density', 'avg_canopy'), 
        palette = 'Greens', 
        style = 'quantile', n = 7, 
        title = c('Tree Density', 'Average tree canopy (%)')
    )
# Create a raster aerial photo of Manhattan with the neighborhoods overlaying for comparison
tm_shape(manhattan) + 
    tm_rgb() + 
    tm_shape(neighborhoods) + 
    tm_polygons(col = "black", lwd = 0.5, alpha = 0.5)
# Combine the last two maps into a single output
tmap_arrange(tm1, tm2, asp = NA)