::p_load(sf, terra, gstat, automap,
pacman tmap, viridis, tidyverse)
In-class Exercise 7
Overview
isohyet map is a surface map of the same precipitation: rain, snow, and others. Figure below is an isohyet map showing to spatial distribution of total monthly rainfall of Singapore in February 2024.
In order to prepare an isohyet map, spatial interpolation will be used. Spatial interpolation is the process of using points with known values to estimate values at other unknown points. For example, to make a rainfall above, we will not find enough evenly spread weather stations to cover the entire region. Spatial interpolation can estimate the rainfall at locations without recorded data by using known rainfall readings at nearby weather stations (see figure_temperature_map). This type of interpolated surface is often called a geostatistical surface. Elevation data, temperature, property prices, air quality index and population density are other types of data that can be computed using interpolation.
There are many interpolation methods. In this hands-on exercise, two widely used spatial interpolation methods called Inverse Distance Weighting (IDW) and kriging will be introduce. If you are looking for additional interpolation methods, please refer to the ‘Further Reading’ section at the end of this topic.
Getting Started
In this in-class exercise, beside tidyverse, viridis, sf and tmap libaries, two new R packages will be used, they are:
terra is a replacement of the raster package. It has a very similar, but simpler, interface, and it is faster than raster. In this hands-on exercise, it will be used to create grid (also known as raster) objects as the input and output of spatial interpolation.
gstat, an r packages for spatial and spatio-temporal geostatistical modelling, prediction and simulation. In this in-class exercise, it will be used to perform spatial interpolation.
automap, an r package for performing automatic variogram modelling and kriging interpolation.
The Data
Three data sets will be used in this exercise, they are:
- RainfallStation.csv provides location information of existing rainfall stations in Singapore. The data is downloaded from Meteological Service Singapore.
- DAILYDATA_202402.csv provides weather data are rainfall stations for the month February, 2024. The data is also downloaded from Meteological Service Singapore.
- MPSZ-2019 contains planning subzone boundary of URA Master Plan 2019. It is downloaded from data.gov.sg. The original data is in kml format.
Importing rainfall station data
In the code chunk below, read_csv()
of readr package is used to import RainfallStation.csv. rfstations, the output object is in tibble data.frame format.
<- read_csv("data/aspatial/RainfallStation.csv") rfstations
Importing rainfall record data
In the code chunk below, read_csv()
of readr package is used to import DAILYDATA_202402.csv. rfdata, the output object is in tibble data.frame format.
<- read_csv("data/aspatial/DAILYDATA_202402.csv") %>%
rfdata select(c(1, 5)) %>%
group_by(Station) %>%
summarise(MONTHSUM = sum(`Daily Rainfall Total (mm)`)) %>%
ungroup()
- select() of dplyr package is used to retain column 1 and 5 of the input data.
- group_by() and summarise() of dplyr package are used to compute the total monthly rainfall from Daily Rainfall Total (mm) field. The output is stored in a new field called MONTHSUM.
Converting aspatial data into geospatial data
Next, left_join()
of dplyr is used to join rfstations to rfdata by using the code chunk below.
<- rfdata %>%
rfdata left_join(rfstations)
Because Station field is available in both rfstations and rfdata, by()
argument of left_join()
is not needed.
In the code chunk below, st_as_sf()
of sf package is used to convert rfdata into a simple feature data.frame object called rfdata_sf.
<- st_as_sf(rfdata,
rfdata_sf coords = c("Longitude",
"Latitude"),
crs= 4326) %>%
st_transform(crs = 3414)
- For
coords
argument, it is important to map the X (i.e. Longitude) first, then follow by the Y (i.e. Latitude). crs = 4326
indicates that the source data is in wgs84 coordinates system.st_transform()
of sf package is then used to transform the source data from wgs84 to svy21 projected coordinates system.- svy21 is the official projected coordinates of Singapore. 3414 is the EPSG code of svy21.
Importing planning subzone boundary data
In the code chunk below, st_read()
of sf package is used to import MPSZ-2019 shapefile into R. The output is called mpsz2019. It is in polygon feature tibble data.frame format.
<- st_read(dsn = "data/geospatial",
mpsz2019 layer = "MPSZ-2019") %>%
st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source
`C:\AlexeiJason\ISSS608-VAA\In-class_Ex\In-class_Ex07\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS: WGS 84
- The source data is in wgs84 coordinates system, hence
st_tranform()
of sf package is used to theo output sf data.frame into svy21 project coordinates system.
Visualising the data prepared
It is always a good practice to visualise the data prepared. In the code chunk below, tmap functions are used to create a dot map showing locations of rainfall station in Singapore.
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tm_shape(rfdata_sf) +
tm_dots(col = "red")
tmap_mode("plot")
In the code chunk below, tmap functions are used to create a quantitative dot map of rainfall distribution by rainfall station in Singaspore.
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
tm_shape(mpsz2019) +
tm_borders() +
tm_shape(rfdata_sf) +
tm_dots(col = 'MONTHSUM')
tmap_mode("plot")
Spatial Interpolation: gstat method
In this section, you will gain hands-on experience on performing spatial interpolation by using gstat package. In order to perform spatial interpolation by using gstat, we first need to create an object of class called gstat, using a function of the same name: gstat
. A gstat object contains all necessary information to conduct spatial interpolation, namely:
- The model definition
- The calibration data
Based on its arguments, the gstat function “understands” what type of interpolation model we want to use:
- No variogram model → IDW
- Variogram model, no covariates → Ordinary Kriging
- Variogram model, with covariates → Universal Kriging
The complete decision tree of gstat
, including several additional methods which we are not going to use, is shown in the figure below.
Data preparation
To getting start, we need create a grid data object by using rast()
of terra package as shown in the cod chunk below.
<- terra::rast(mpsz2019,
grid nrows = 690,
ncols = 1075)
grid
class : SpatRaster
dimensions : 690, 1075, 1 (nrow, ncol, nlyr)
resolution : 49.98037, 50.01103 (x, y)
extent : 2667.538, 56396.44, 15748.72, 50256.33 (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414)
Next, a list called xy will be created by using xyFromCell()
of terra package.
<- terra::xyFromCell(grid,
xy 1:ncell(grid))
head(xy)
x y
[1,] 2692.528 50231.33
[2,] 2742.509 50231.33
[3,] 2792.489 50231.33
[4,] 2842.469 50231.33
[5,] 2892.450 50231.33
[6,] 2942.430 50231.33
xyFromCell()
gets coordinates of the center of raster cells for a row, column, or cell number of a SpatRaster. Or get row, column, or cell numbers from coordinates or from each other.
Lastly, we will create a data frame called coop with prediction/simulation locations by using the code chunk below.
<- st_as_sf(as.data.frame(xy),
coop coords = c("x", "y"),
crs = st_crs(mpsz2019))
<- st_filter(coop, mpsz2019)
coop head(coop)
Simple feature collection with 6 features and 0 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 25883.42 ymin: 50231.33 xmax: 26133.32 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
geometry
1 POINT (25883.42 50231.33)
2 POINT (25933.4 50231.33)
3 POINT (25983.38 50231.33)
4 POINT (26033.36 50231.33)
5 POINT (26083.34 50231.33)
6 POINT (26133.32 50231.33)
Inverse Distance Weighted (IDW)
The method
In the IDW interpolation method, the sample points are weighted during interpolation such that the influence of one point relative to another declines with distance from the unknown point you want to create.
Weighting is assigned to sample points through the use of a weighting coefficient that controls how the weighting influence will drop off as the distance from new point increases. The greater the weighting coefficient, the less the effect points will have if they are far from the unknown point during the interpolation process. As the coefficient increases, the value of the unknown point approaches the value of the nearest observational point.
It is important to notice that the IDW interpolation method also has some disadvantages: the quality of the interpolation result can decrease, if the distribution of sample data points is uneven. Furthermore, maximum and minimum values in the interpolated surface can only occur at sample data points. This often results in small peaks and pits around the sample data points.
Working with gstat
We are going to use three parameters of the gstat function:
- formula: The prediction “formula” specifying the dependent and the independent variables (covariates)
- data: The calibration data
- model: The variogram model
Keep in mind that we need to specify parameter names, because these three parameters are not the first three in the gstat function definition.
For example, to interpolate using the IDW method we create the following gstat object, specifying just the formula and data:
g = gstat(formula = annual ~ 1, data = rainfall)
In R, formula objects are used to specify relation between objects, in particular—the role of different data columns in statistical models. A formula object is created using the ~ operator, which separates names of dependent variables (to the left of the ~ symbol) and independent variables (to the right of the ~ symbol). Writing 1 to the right of the ~ symbol, as in ~ 1, means that there are no independent variables38.
In the code chunk below,
<- gstat(formula = MONTHSUM ~ 1,
res locations = rfdata_sf,
nmax = 5,
set = list(idp = 0))
Spatial interpolation is not a rocket science, students should try to explore the method by changing nmax
argument in order to understand how the final surface map will be affected by different nmax values.
Now that our model is defined, we can use predict()
to actually interpolate, i.e., to calculate predicted values. The predict function accepts:
- A raster—stars object, such as dem
- A model—gstat object, such as g
The raster serves for two purposes:
- Specifying the locations where we want to make predictions (in all methods), and
- Specifying covariate values (in Universal Kriging only).
<- predict(res, coop) resp
[inverse distance weighted interpolation]
$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp
<- terra::rasterize(resp, grid,
pred field = "pred",
fun = "mean")
Now, we will map the interpolated surface by using tmap functions as shown in the code chunk below.
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(pred) +
tm_raster(alpha = 0.6,
palette = "viridis")
Kriging
The method
Kriging is one of several methods that use a limited set of sampled data points to estimate the value of a variable over a continuous spatial field. An example of a value that varies across a random spatial field might be total monthly rainfall over Singapore. It differs from Inverse Distance Weighted Interpolation discussed earlier in that it uses the spatial correlation between sampled points to interpolate the values in the spatial field: the interpolation is based on the spatial arrangement of the empirical observations, rather than on a presumed model of spatial distribution. Kriging also generates estimates of the uncertainty surrounding each interpolated value.
In a general sense, the kriging weights are calculated such that points nearby to the location of interest are given more weight than those farther away. Clustering of points is also taken into account, so that clusters of points are weighted less heavily (in effect, they contain less information than single points). This helps to reduce bias in the predictions.
The kriging predictor is an “optimal linear predictor” and an exact interpolator, meaning that each interpolated value is calculated to minimize the prediction error for that point. The value that is generated from the kriging process for any actually sampled location will be equal to the observed value at this point, and all the interpolated values will be the Best Linear Unbiased Predictors (BLUPs).
Kriging will in general not be more effective than simpler methods of interpolation if there is little spatial autocorrelation among the sampled data points (that is, if the values do not co-vary in space). If there is at least moderate spatial autocorrelation, however, kriging can be a helpful method to preserve spatial variability that would be lost using a simpler method (for an example, see Auchincloss 2007, below).
Kriging can be understood as a two-step process:
- first, the spatial covariance structure of the sampled points is determined by fitting a variogram; and
- second, weights derived from this covariance structure are used to interpolate values for unsampled points or blocks across the spatial field.
Kriging methods require a variogram model. A variogram (sometimes called a “semivariogram”) is a visual depiction of the covariance exhibited between each pair of points in the sampled data. For each pair of points in the sampled data, the gamma-value or “semi-variance” (a measure of the half mean-squared difference between their values) is plotted against the distance, or “lag”, between them. The “experimental” variogram is the plot of observed values, while the “theoretical” or “model” variogram is the distributional model that best fits the data.
Working with gstat
Firstly, we will calculate and examine the empirical variogram by using variogram()
of gstat package. The function requires two arguments:
- formula, the dependent variable and the covariates (same as in gstat, see Section 12.2.1)
- data, a point layer with the dependent variable and covariates as attributes
as shown in the code chunk below.
<- variogram(MONTHSUM ~ 1,
v data = rfdata_sf)
plot(v)
We can then compare the plot with the theoretical models below.
With reference to the comparison above, am empirical variogram model will be fitted by using fit.variogram()
of gstat package as shown in the code chunk below.
<- fit.variogram(object = v,
fv model = vgm(
psill = 0.5,
model = "Sph",
range = 5000,
nugget = 0.1))
fv
model psill range
1 Nug 0.1129190 0.000
2 Sph 0.5292397 5213.396
Spatial interpolation is not a rocket science, students should try to explore the method by changing psill
, model
, range
, nugget
arguments in order to understand how the final surface map will be affected by different options used.
We can visualise how well the observed data fit the model by plotting fv using the code chunk below.
plot(v, fv)
The plot above reveals that the empirical model fits rather well. In view of this, we will go ahead to perform spatial interpolation by using the newly derived model as shown in the code chunk below.
<- gstat(formula = MONTHSUM ~ 1,
k data = rfdata_sf,
model = fv)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
model psill range
var1[1] Nug 0.1129190 0.000
var1[2] Sph 0.5292397 5213.396
Once we are happy with the results, predict() of gstat package will be used to estimate the unknown grids by using the code chunk below.
<- predict(k, coop) resp
[using ordinary kriging]
$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
resp resp
Simple feature collection with 314019 features and 5 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 2692.528 ymin: 15773.73 xmax: 56371.45 ymax: 50231.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
var1.pred var1.var geometry x y pred
1 131.0667 0.6608399 POINT (25883.42 50231.33) 25883.42 50231.33 131.0667
2 130.9986 0.6610337 POINT (25933.4 50231.33) 25933.40 50231.33 130.9986
3 130.9330 0.6612129 POINT (25983.38 50231.33) 25983.38 50231.33 130.9330
4 130.8698 0.6613782 POINT (26033.36 50231.33) 26033.36 50231.33 130.8698
5 130.8092 0.6615303 POINT (26083.34 50231.33) 26083.34 50231.33 130.8092
6 130.7514 0.6616697 POINT (26133.32 50231.33) 26133.32 50231.33 130.7514
7 130.6965 0.6617971 POINT (26183.3 50231.33) 26183.30 50231.33 130.6965
8 130.6446 0.6619131 POINT (26233.28 50231.33) 26233.28 50231.33 130.6446
9 130.5958 0.6620184 POINT (26283.26 50231.33) 26283.26 50231.33 130.5958
10 132.5484 0.6542154 POINT (25033.76 50181.32) 25033.76 50181.32 132.5484
resp is a sf tibble data.frame with point features.
In order to create a raster surface data object, rasterize() of terra is used as shown in the code chunk below.
<- terra::rasterize(resp, grid,
kpred field = "pred")
kpred
class : SpatRaster
dimensions : 690, 1075, 1 (nrow, ncol, nlyr)
resolution : 49.98037, 50.01103 (x, y)
extent : 2667.538, 56396.44, 15748.72, 50256.33 (xmin, xmax, ymin, ymax)
coord. ref. : SVY21 / Singapore TM (EPSG:3414)
source(s) : memory
name : last
min value : 72.77826
max value : 195.53284
The output object kpred is in SpatRaster object class with a spatial resolution of 50m x 50m. It consists of 1075 columns and 690 rows and in SVY21 projected coordinates system.
Mapping the interpolated rainfall raster
Finally, tmap functions are used to map the interpolated rainfall raster (i.e. kpred) by using the code chunk below.
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) +
tm_raster(alpha = 0.6,
palette = "viridis",
title = "Total monthly rainfall (mm)") +
tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)
Automatic variogram modelling
Beside using gstat to perform variogram modelling manually, autofirVariogram()
of automap package can be used to perform varigram modelling as shown in the code chunk below.
<- autofitVariogram(MONTHSUM ~ 1,
v_auto
rfdata_sf)plot(v_auto)
v_auto
$exp_var
np dist gamma dir.hor dir.ver id
1 15 1957.436 311.9613 0 0 var1
2 33 3307.349 707.7685 0 0 var1
3 54 4861.368 848.1314 0 0 var1
4 116 6716.531 730.3969 0 0 var1
5 111 9235.708 1006.5381 0 0 var1
6 120 11730.199 1167.5988 0 0 var1
7 135 14384.636 1533.5903 0 0 var1
$var_model
model psill range kappa
1 Nug 0.00 0 0.0
2 Ste 24100.71 1647955 0.3
$sserr
[1] 0.2178294
attr(,"class")
[1] "autofitVariogram" "list"
<- gstat(formula = MONTHSUM ~ 1,
k model = v_auto$var_model,
data = rfdata_sf)
k
data:
var1 : formula = MONTHSUM`~`1 ; data dim = 43 x 2
variograms:
model psill range kappa
var1[1] Nug 0.00 0 0.0
var1[2] Ste 24100.71 1647955 0.3
<- predict(k, coop) resp
[using ordinary kriging]
$x <- st_coordinates(resp)[,1]
resp$y <- st_coordinates(resp)[,2]
resp$pred <- resp$var1.pred
resp$pred <- resp$pred
resp
<- terra::rasterize(resp, grid,
kpred field = "pred")
tmap_options(check.and.fix = TRUE)
tmap_mode("plot")
tm_shape(kpred) +
tm_raster(alpha = 0.6,
palette = "viridis",
title = "Total monthly rainfall (mm)") +
tm_layout(main.title = "Distribution of monthly rainfall, Feb 2024",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2)
Reference
Olea, Ricardo A. (2006-07) “A six-step practical approach to semivariogram modeling”, Stochastic Environmental Research and Risk Assessment, 2006-07, Vol.20 (5), p.307-318. SMU e-journal.