header_tag.html

Skip to contents

Increasingly there are high frequency sensor data available for water quality data. There’s a need to join the sensor and discrete data by the closest time.

This article will discuss how to do that using the data.table package.

Let’s look at site “01646500”, and a nearby site with a real-time nitrate-plus-nitrite sensor. Our goal is to get the discharge and nitrate-plus-nitrite sensor data aligned with the discrete water quality samples.

library(dataRetrieval)

site_uv <- "01646500"
site_qw <- "USGS-01646580"
pcode_uv <- "99133"
pcode_qw <- "00631"
start_date <- as.Date("2018-01-01")
end_date <- as.Date("2020-01-01")
qw_data <- readWQPqw(site_qw, pcode_qw, 
                     startDate = start_date,
                     endDate = end_date)

uv_data <- readNWISuv(siteNumbers = site_uv, 
                      parameterCd = c(pcode_uv, "00060"),
                      startDate = start_date,
                     endDate = end_date)

The sensor data (“uv” data) at this particular site has 2 columns of data that are important. The first task is to combine those columns. This is rather unique to this particular site and probably won’t need to be done generally.

library(dplyr)

uv_trim <- uv_data |> 
  select(uv_date = dateTime, 
         val1 = X_SUNA...Discontinued._99133_00000,
         val2 = X_SUNA_99133_00000,
         flow = X_00060_00000) |> 
  mutate(val_uv = if_else(is.na(val1), val2, val1)) |> 
  select(-val1, -val2)
uv_date flow val_uv
2018-01-01 05:00:00 1940 1.43
2018-01-01 05:15:00 1940 1.43
2018-01-01 05:30:00 1850 1.43
2018-01-01 05:45:00 1850 1.43
2018-01-01 06:00:00 1850 1.43
2018-01-01 06:15:00 1810 1.43

Next we’ll clean up the discrete water quality “qw” data to make it easy to follow in this tutorial.

qw_trim <- qw_data |> 
  filter(ActivityTypeCode == "Sample-Routine",
         !is.na(ActivityStartDateTime)) |> 
  select(qw_date = ActivityStartDateTime,
         val_qw = ResultMeasureValue,
         det_txt = ResultDetectionConditionText)
qw_date val_qw det_txt
2018-01-04 15:30:00 1.39 NA
2018-01-15 15:15:00 1.73 NA
2018-02-01 15:30:00 1.66 NA
2018-02-11 19:00:00 1.33 NA
2018-02-13 17:30:00 1.53 NA
2018-02-15 16:00:00 1.65 NA

Finally, we’ll use the data.table package to do a join to the nearest date. The code to do that is here:

library(data.table)

setDT(qw_trim)[, join_date := qw_date]

setDT(uv_trim)[, join_date := uv_date]

closest_dt <- uv_trim[qw_trim, on = .(join_date), roll = "nearest"]

closest_dt is a data.table object. It similar to a data.frame, but not identical. We can convert it to a data.frame and then use dplyr commands. Note: the whole analysis can be done via data.table, but most examples in dataRetrieval have used dplyr, which is why we bring it back to data.frame. dplyr also has a join_by(closest()) option, but it is more complicated because you can only specify the closeness in either the forward or backwards direction (and we want either direction).

We can calculate “delta_time” - the difference in time between the uv and qw data. We’ll probably want to add a threshold that we don’t join values if they are too far apart in time. In this example, if the difference is greater than 24 hours, we’ll substitute NA.

qw_closest <- data.frame(closest_dt) |> 
  mutate(delta_time = difftime(qw_date, uv_date, 
                               units = "hours"),
         val_uv = if_else(abs(as.numeric(delta_time)) >= 24, NA, val_uv)) |> 
  select(-join_date)
uv_date flow val_uv qw_date val_qw det_txt delta_time
2018-01-04 15:30:00 1850 1.43 2018-01-04 15:30:00 1.39 NA 0 hours
2018-01-15 15:15:00 8960 1.84 2018-01-15 15:15:00 1.73 NA 0 hours
2018-02-01 15:30:00 6000 1.67 2018-02-01 15:30:00 1.66 NA 0 hours
2018-02-11 19:00:00 20800 1.29 2018-02-11 19:00:00 1.33 NA 0 hours
2018-02-13 17:30:00 30700 1.63 2018-02-13 17:30:00 1.53 NA 0 hours
2018-02-15 16:00:00 25300 1.68 2018-02-15 16:00:00 1.65 NA 0 hours

Here are a few plots to show the applications of these joins:

library(ggplot2)

ggplot(data = qw_closest) +
  geom_point(aes(x = val_uv, y = val_qw)) +
  theme_bw() +
  xlab("Sensor") +
  ylab("Discrete")

ggplot(data = qw_closest) +
  geom_point(aes(x = flow, y = val_qw)) +
  theme_bw() +
  xlab("Discharge") +
  ylab("Concentration") +
  scale_x_log10() +
  scale_y_log10()

ggplot() +
  geom_line(data = uv_trim,
            aes(x = uv_date, val_uv),
            color = "lightgrey") +
  geom_point(data = qw_closest,
             aes(x = qw_date, y = val_qw),
             color= "red") +
  theme_bw() +
  ggtitle("Red dots = discrete samples, grey lines = continuous sensor") +
  xlab("") +
  ylab("Concentration")