Henry Mount Soil Climate Database Tutorial
D.E. Beaudette
2016-10-14
Introduction
This document demonstrates how to use the soilDB package to download data from the Henry Mount soil climate database. Soil climate data are routinely collected by SSO staff via buried sensor/data-logger devices ("hobos") and now above ground weather stations. The Henry Mount Soil Climate database was established to assist with the management and analysis of these data.
Setup R Environment
With a recent version of R (>= 2.15), it is possible to get all of the packages that this tutorial depends on via:
# run these commands in the R console
install.packages('RColorBrewer', dep=TRUE)
install.packages('reshape', dep=TRUE)
install.packages('dismo', dep=TRUE)
install.packages('rgdal', dep=TRUE)
install.packages('soilDB', dep=TRUE)
# get latest version from GitHub
install.packages('devtools', dep=TRUE)
devtools::install_github("ncss-tech/soilDB", dependencies=FALSE, upgrade_dependencies=FALSE)
Getting and Viewing Data
Soil climate data can be queried by:
- project (typically a soil survey area, "CA630")
- NASIS user site ID (e.g. "2006CA7920001")
- MLRA soil survey office (e.g. "2-SON")
and optionally filtered by:
- start date ("YYYY-MM-DD")
- end date ("YYYY-MM-DD")
- sensor type ("soiltemp" is the only type currently available)
and aggregated to the following granularity:
- "day" (MAST and mean summer/winter temperatures are automatically computed)
- "week"
- "month"
- "year"
Query daily sensor data associated with the Sequoia / Kings Canyon soil survey.
library(soilDB)
library(lattice)
library(RColorBrewer)
library(plyr)
# get soil temperature, soil moisture, and air temperature data
x <- fetchHenry(project='CA792')
# check object structure:
str(x, 2)
Quick listing of essential site-level data. "Functional years" is the number of years of non-missing data, after grouping data by Julian day. "Complete years" is the number of years that have 365 days of non-missing data. "dslv" is the number of days since the data-logger was last visited.
# convert into data.frame
d <- as.data.frame(x$sensors)
# keep only information on soil temperature sensors
d <- subset(d, subset=sensor_type == 'soiltemp')
# check top 6 rows and select columns
head(d[, c('user_site_id', 'name', 'sensor_depth', 'MAST', 'Winter', 'Summer', 'STR', 'functional.yrs', 'complete.yrs', 'dslv')])
| user_site_id | name | sensor_depth | MAST | Winter | Summer | STR | functional.yrs | complete.yrs | dslv |
|---|---|---|---|---|---|---|---|---|---|
| 2006CA7920001 | Muir Pass | 50 | 1.31 | -1.60 | 5.19 | cryic* | 7 | 7 | 446 |
| 2012CA7921062 | Dusy Basin | 50 | 5.47 | 1.07 | 12.43 | frigid* | 1 | 0 | 1102 |
| 2015CA7921071 | Tyndall | 50 | NA | NA | NA | NA | NA | NA | NA |
| S2012CA019001 | Littlepete | 50 | 5.48 | 1.17 | 10.93 | frigid* | 2 | 2 | 449 |
| S2012CA019002 | LeConte | 50 | 6.79 | 1.48 | 12.69 | frigid* | 2 | 2 | 449 |
| S2012CA019003 | McDermand | 50 | 4.02 | 0.85 | 8.96 | frigid* | 2 | 2 | 447 |
Plot Data
Note that there are gaps in the data: between site visits and lack of synchronization of site visits with start/end of the year.
xyplot(sensor_value ~ date_time | name, data=x$soiltemp, main='Daily Soil Temperature (Deg. C)', type=c('l', 'g'), as.table=TRUE, layout=c(2,9), xlab='Date', ylab='Deg C')
xyplot(sensor_value ~ date_time | name, data=x$soilVWC, main='Daily Soil Moisture', type=c('l', 'g'), as.table=TRUE, layout=c(2,6), xlab='Date', ylab='Deg C')
Another approach for investigating data gaps, blue: data, grey: no data.
levelplot(factor(!is.na(sensor_value)) ~ doy * factor(year) | name, main='Daily Soil Temperature (Deg. C)',
data=x$soiltemp, layout=c(2,7), col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Julian Day', ylab='Year')
levelplot(factor(!is.na(sensor_value)) ~ doy * factor(year) | name, main='Daily Soil Moisture',
data=x$soilVWC, layout=c(2,4), col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
xlab='Julian Day', ylab='Year')
# levelplot(factor(!is.na(sensor_value)) ~ doy * factor(year) | name, main='Daily Air Temperature (Deg. C)',
# data=x$airtemp, layout=c(1,1), col.regions=c('grey', 'RoyalBlue'), cuts=1,
# colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
# par.strip.text=list(cex=0.85), strip=strip.custom(bg='yellow'),
# xlab='Julian Day', ylab='Year')
This style of plotting data can be useful for making comparisons between years.
# generate some better colors
cols <- colorRampPalette(rev(brewer.pal(11, 'RdYlBu')), space='Lab', interpolate='spline')
levelplot(sensor_value ~ doy * factor(year) | name, main='Daily Soil Temperature (Deg. C)',
data=x$soiltemp, layout=c(2,7), col.regions=cols,
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'),
xlab='Julian Day', ylab='Year')
levelplot(sensor_value ~ doy * factor(year) | name, main='Daily Soil Moisture',
data=x$soilVWC, layout=c(2,4), col.regions=cols,
colorkey=list(space='top'), as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'),
xlab='Julian Day', ylab='Year')
Aggregate over years by sensor / Julian day.
# compute MAST by sensor
a <- ddply(x$soiltemp, c('name', 'doy'), .fun=plyr::summarise, soiltemp=mean(sensor_value, na.rm = TRUE))
# re-order sensor names according to MAST
a.mast <- sort(tapply(a$soiltemp, a$name, mean, na.rm=TRUE))
a$name <- factor(a$name, levels=names(a.mast))
levelplot(soiltemp ~ doy * name, main='Daily Soil Temperature (Deg. C)',
data=a, col.regions=cols, xlab='Julian Day', ylab='',
colorkey=list(space='top'), scales=list(alternating=3, cex=0.75, x=list(tick.number=15)))
Convert data to percent saturation. (still working on this)
fun <- function(i) {
i$pct.sat <- i$sensor_value / max(i$sensor_value, na.rm = TRUE)
return(i)
}
z <- ddply(x$soilVWC, c('sid', 'year'), .fun=fun)
z$pct.sat <- factor(z$pct.sat >= 0.5, levels = c('TRUE', 'FALSE'), labels = c('Moist', 'Dry'))
levelplot(pct.sat ~ doy * factor(year) | name, main='Daily Soil Moisture',
data=z, layout=c(2,4), col.regions=c('grey', 'RoyalBlue'), cuts=1,
colorkey=FALSE, as.table=TRUE, scales=list(alternating=3, cex=0.75),
par.strip.text=list(cex=0.85), strip=strip.custom(bg='grey'),
xlab='Julian Day', ylab='Year')
Data Summaries
In the presence of missing data, MAST calculations will be biased towards those data that are not missing. For example, a block of missing data in January will result in an estimated MAST that is too high due to the missing data from the middle of winter. It is possible to estimate (mostly) unbiased MAST values in the presence of some missing data by averaging multiple years of data by Julian day. This approach will generate reasonable summaries in the presence of missing data, as long as data gaps are "covered" by corresponding data from another year. The longer the period of record and shorter the data gaps, the better.
Soil temperature regime assignment for gelic, cryic, and frigid conditions typically require additional information and are marked with an '*'.
When daily data are queried, unbiased summaries and indices of data "completeness" are calculated.
as.data.frame(x$sensors)[which(!is.na(x$sensors$MAST)), c('user_site_id', 'sensor_depth', 'name', 'MAST', 'Winter', 'Summer', 'STR', 'functional.yrs', 'complete.yrs', 'gap.index')]
| user_site_id | sensor_depth | name | MAST | Winter | Summer | STR | functional.yrs | complete.yrs | gap.index | |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2006CA7920001 | 50 | Muir Pass | 1.31 | -1.60 | 5.19 | cryic* | 7 | 7 | 0.17 |
| 2 | 2012CA7921062 | 50 | Dusy Basin | 5.47 | 1.07 | 12.43 | frigid* | 1 | 0 | 0.46 |
| 4 | S2012CA019001 | 50 | Littlepete | 5.48 | 1.17 | 10.93 | frigid* | 2 | 2 | 0.29 |
| 5 | S2012CA019002 | 50 | LeConte | 6.79 | 1.48 | 12.69 | frigid* | 2 | 2 | 0.29 |
| 6 | S2012CA019003 | 50 | McDermand | 4.02 | 0.85 | 8.96 | frigid* | 2 | 2 | 0.29 |
| 7 | S2012CA019004 | 50 | Evolution | 4.54 | 0.80 | 9.34 | frigid* | 2 | 2 | 0.29 |
| 18 | S2013CA107001 | 8 | Dome-MinKingRd | 11.57 | 5.90 | 21.29 | mesic | 0 | 0 | 0.70 |
| 19 | S2013CA107001 | 50 | Dome-MinKingRd | 11.39 | 7.29 | 18.76 | mesic | 0 | 0 | 0.70 |
| 20 | S2013CA107001 | 20 | Dome-MinKingRd | 11.61 | 6.56 | 20.34 | mesic | 0 | 0 | 0.70 |
| 21 | S2013CA107001 | 100 | Dome-MinKingRd | 11.37 | 8.16 | 17.42 | mesic | 0 | 0 | 0.70 |
| 35 | S2014CA107005 | 8 | SND-Headquarters | 20.57 | 11.57 | 29.72 | thermic | 2 | 1 | 0.22 |
| 36 | S2014CA107005 | 50 | SND-Headquarters | 20.42 | 12.92 | 27.37 | thermic | 2 | 1 | 0.22 |
| 37 | S2014CA107005 | 100 | SND-Headquarters | 20.59 | 14.28 | 25.96 | thermic | 2 | 1 | 0.22 |
| 42 | S2014CA107005 | 20 | SND-Headquarters | 20.47 | 11.98 | 28.89 | thermic | 2 | 1 | 0.22 |
Additional Ideas
- Save sites as shape file
library(rgdal)
writeOGR(x$sensors, dsn='foldername', layer='filename', driver='ESRI Shapefile')
- Overlay site locations on a Google map
library(dismo)
g <- gmap(x$sensors)
plot(g, interpolate=TRUE)
points(Mercator(x$sensors), col='red')
- Fit a simple model relating MAST to MAAT (PRISM) using soil temperature data from the 2-SON office.
library(raster)
library(rms)
# get soil temperature, soil moisture, and air temperature data
x <- fetchHenry(sso = '2-SON', what = 'soiltemp')
r <- raster('E:/gis_data/prism/final_MAAT_800m.tif')
x$sensors$maat <- extract(r, x$sensors)
m.sp <- subset(x$sensors, subset=sensor_depth == 50)
m <- as.data.frame(m.sp)
plot(MAST ~ maat, data=m)
dd <- datadist(m)
options(datadist="dd")
(m.ols <- ols(MAST ~ rcs(maat), data=m))
##
## Linear Regression Model
##
## ols(formula = MAST ~ rcs(maat), data = m)
## Frequencies of Missing Values Due to Each Variable
## MAST maat
## 13 0
##
##
## Model Likelihood Discrimination
## Ratio Test Indexes
## Obs 76 LR chi2 141.23 R2 0.844
## sigma 1.9803 d.f. 4 R2 adj 0.835
## d.f. 71 Pr(> chi2) 0.0000 g 5.074
##
## Residuals
##
## Min 1Q Median 3Q Max
## -3.1118 -1.4355 -0.1889 1.1481 7.2236
##
## Coef S.E. t Pr(>|t|)
## Intercept 4.3576 0.7391 5.90 <0.0001
## maat -0.0404 0.2013 -0.20 0.8415
## maat' 1.6910 0.5521 3.06 0.0031
## maat'' -11.7788 5.0577 -2.33 0.0227
## maat''' 35.6551 13.6737 2.61 0.0111
plot(Predict(m.ols, conf.type = 'simultaneous'), ylab='MAST', xlab='MAAT (PRISM)')
m.sp$resid <- resid(m.ols)
spplot(m.sp, 'resid')
This document is based on aqp version 1.9.10 and soilDB version 1.8-4.
This document is based on
aqp version 1.9 and
soilDB version 1.8-4.