Given a data frame with at least Date, Q, and Qualifier columns, populates the remaining columns of the basic Daily data frame used in EGRET analysis.

Usage

populateDaily(rawData, qConvert, verbose = TRUE, adjust = TRUE,
  fill = FALSE, maxgap = 21, fill_type = "interpolation")

Arguments

rawData

data frame containing at least Date, Q, and Qualifier columns.

qConvert

numeric conversion factor used to express discharge in cubic meters per second.

verbose

logical specifying whether or not to display messages.

adjust

logical specifying whether or not to add a constant to zero values to allow log transformation. Defaults to TRUE.
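As a rough illustration of the qConvert and adjust arguments, the base R sketch below divides raw discharge by a conversion factor and then, when zeros are present and no values are negative, adds a small constant to every value so that the logarithm is defined. The factor 35.314667 (cubic feet per cubic meter), the constant of 0.001 times the mean discharge, and the helper name `convert_and_adjust` are assumptions made for illustration; consult the EGRET source for the exact rule populateDaily applies.

```r
# Hypothetical helper, not part of EGRET: convert units, then shift zeros.
convert_and_adjust <- function(q_raw, q_convert = 35.314667, adjust = TRUE) {
  # Assumption: raw discharge is divided by the conversion factor
  # (35.314667 cubic feet per cubic meter turns ft^3/s into m^3/s).
  q <- q_raw / q_convert

  n_zero <- sum(q == 0, na.rm = TRUE)
  n_neg  <- sum(q < 0, na.rm = TRUE)

  # Assumption: the shift is 0.1% of mean discharge, applied to every value,
  # and only when zeros exist and no negative flows are present.
  if (adjust && n_zero > 0 && n_neg == 0) {
    q <- q + 0.001 * mean(q, na.rm = TRUE)
  }
  q
}

q_cfs <- c(0, 35.314667, 70.629334, NA)
q_cms <- convert_and_adjust(q_cfs)
log(q_cms)  # finite for every non-missing day after the shift
```

Skipping the shift when negative flows are present mirrors the message in the first example below, where adjust is TRUE but discharge is left unadjusted.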

fill

logical specifying whether to fill NA values using the method given by fill_type. Defaults to FALSE.

maxgap

Maximum number of NA days allowed for interpolating gaps. Default is 21. Only used if fill is set to TRUE.

fill_type

character specifying the method used to fill missing data. Options are "interpolation", "spline", "tsSmooth", "tsStruct", or "log_interp". "interpolation" is linear interpolation using `zoo::na.approx`. "spline" is a spline fit using `zoo::na.spline`. "tsSmooth" uses `stats::tsSmooth`, which performs fixed-interval smoothing on a time series. "tsStruct" uses a structural time series model. "log_interp" is linear interpolation in log space. Only used if fill is set to TRUE.
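To make the interaction between maxgap and the two interpolation options concrete, here is a minimal base R sketch. It is not the EGRET implementation (which, per the description above, relies on `zoo::na.approx`); the helper name `fill_gaps` and its `log_space` argument are hypothetical. It fills interior NA runs of at most maxgap values, either on Q directly or on log(Q).

```r
# Hypothetical helper, not part of EGRET: fill interior NA runs no longer
# than maxgap, interpolating either on Q itself or on log(Q).
fill_gaps <- function(q, maxgap = 21, log_space = FALSE) {
  idx <- seq_along(q)
  y <- if (log_space) log(q) else q   # log_space assumes all observed Q > 0

  # Interpolate across every interior gap; rule = 1 leaves leading and
  # trailing NA values untouched.
  ok <- !is.na(y)
  filled <- stats::approx(idx[ok], y[ok], xout = idx, rule = 1)$y
  if (log_space) filled <- exp(filled)

  # Restore NA for runs longer than maxgap.
  runs <- rle(is.na(q))
  too_long <- rep(runs$values & runs$lengths > maxgap, runs$lengths)
  filled[too_long] <- NA
  filled
}

q <- c(1, NA, NA, 8, NA, NA, NA, NA, 8)
fill_gaps(q, maxgap = 3)                    # first gap filled, second left NA
fill_gaps(q, maxgap = 3, log_space = TRUE)  # first gap filled in geometric steps
```

Interpolating in log space gives fills that change by a constant ratio rather than a constant difference, which may suit discharge records that span orders of magnitude.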

Value

A data frame 'Daily' with the following columns:

Name       Type       Description
Q          numeric    Discharge in m^3/s
Julian     integer    Number of days since Jan. 1, 1850
Month      integer    Month of the year [1-12]
Day        integer    Day of the year [1-366]
DecYear    numeric    Decimal year
MonthSeq   integer    Number of months since January 1, 1850
Qualifier  character  Qualifying code
i          integer    Index of days, starting with 1
LogQ       numeric    Natural logarithm of Q
Q7         numeric    7 day running average of Q
Q30        numeric    30 day running average of Q
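The sketch below shows one way several of these columns could be derived from Date and Q in base R. It is illustrative only: the trailing window for Q7 and the convention that January 1850 has MonthSeq 1 are assumptions, not statements about how populateDaily computes them.

```r
# Illustrative only: rebuild a few Daily-style columns for a short record.
Date <- seq(as.Date("2001-01-01"), by = "day", length.out = 10)
Q <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)

lt <- as.POSIXlt(Date)
Julian   <- as.integer(Date - as.Date("1850-01-01"))  # days since 1850-01-01
Month    <- lt$mon + 1L                               # 1-12
Day      <- lt$yday + 1L                              # day of year, 1-366
MonthSeq <- (lt$year + 1900L - 1850L) * 12L + Month   # assumes Jan 1850 = 1
LogQ     <- log(Q)

# Assumption: trailing window, so the first six days of Q7 are NA.
Q7 <- as.numeric(stats::filter(Q, rep(1 / 7, 7), sides = 1))

data.frame(Date, Q, Julian, Month, Day, MonthSeq, LogQ, Q7)
```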

Author

Robert M. Hirsch rhirsch@usgs.gov

Examples

Date <- as.character(seq(from = as.Date("2001/1/1"),
                         to = as.Date("2002/1/2"),
                         by = "day"))
Q <- c(-1:365)
Qualifier <- rep("",367)
dataInput_complete <- data.frame(Date, Q, Qualifier)
dataInput <- dataInput_complete[-4:-5,]

# No fill, but with 0 and negative:
Daily <- populateDaily(dataInput, qConvert = 1)
#> There are 1 negative flow days.
#> Many EGRET functions will not work with negative values.
#> Adjust is TRUE but there are 1 negative flow days.
#> Discharge was not adjusted.
#> 0.545% missing data.
#> NA ranges in Q (2 total NA values across 1 run):
#>   2001-01-04 to 2001-01-05 (2 days)
#> Many EGRET functions will not work with missing values.

# No negatives/zeros:
Q <- 2+sin(seq(from = 0, to = 2*pi, length.out = 367))
Q <- jitter(Q, factor = 500)
plot(Q, ylim = c(0, 3.2))

dataInput_complete <- data.frame(Date, Q, Qualifier)
# Remove some rows to test missing:
dataInput <- dataInput_complete[-4:-5,]
dataInput <- dataInput[-10:-20,]

# No fill:
Daily <- populateDaily(dataInput, qConvert = 1)
#> 3.54% missing data.
#> NA ranges in Q (13 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-05 (2 days)
#>   2001-01-12 to 2001-01-22 (11 days)
#> Many EGRET functions will not work with missing values.
plot(Daily$Date[1:30], Daily$Q[1:30], type = "b", ylim = c(0, 3.2))


# Linear interpolation:
Daily_fill <- populateDaily(dataInput,
                            qConvert = 1,
                            fill = TRUE,
                            fill_type = "interpolation")
#> 3.54% missing data.
#> NA ranges in Q (13 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-05 (2 days)
#>   2001-01-12 to 2001-01-22 (11 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_fill$Date[1:30],
     Daily_fill$Q[1:30],
     col = as.factor(Daily_fill$Qualifier[1:30]),
     type = "b", pch = 16, ylim = c(0, 3.2),
     main = "Linear Interpolation")


# Spline fit:
Daily_spline <- populateDaily(dataInput,
                              qConvert = 1,
                              fill = TRUE,
                              fill_type = "spline")
#> 3.54% missing data.
#> NA ranges in Q (13 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-05 (2 days)
#>   2001-01-12 to 2001-01-22 (11 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_spline$Date[1:30],
     Daily_spline$Q[1:30],
     col = as.factor(Daily_spline$Qualifier[1:30]),
     main = "Spline Fit",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Fixed-Interval Smoothing on Time Series:
Daily_tsSmooth <- populateDaily(dataInput,
                              qConvert = 1,
                              fill = TRUE,
                              fill_type = "tsSmooth")
#> 3.54% missing data.
#> NA ranges in Q (13 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-05 (2 days)
#>   2001-01-12 to 2001-01-22 (11 days)
#> Warning: possible convergence problem: 'optim' gave code = 52 and message ‘ERROR: ABNORMAL_TERMINATION_IN_LNSRCH’
#> NA values filled by when gap range less than 21 days.
plot(Daily_tsSmooth$Date[1:30],
     Daily_tsSmooth$Q[1:30],
     col = as.factor(Daily_tsSmooth$Qualifier[1:30]),
     main = "Fixed-interval smoothing on time series",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Structural time series model:
Daily_tsStruct <- populateDaily(dataInput,
                              qConvert = 1,
                              fill = TRUE,
                              fill_type = "tsStruct")
#> 3.54% missing data.
#> NA ranges in Q (13 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-05 (2 days)
#>   2001-01-12 to 2001-01-22 (11 days)
#> Warning: possible convergence problem: 'optim' gave code = 52 and message ‘ERROR: ABNORMAL_TERMINATION_IN_LNSRCH’
#> NA values filled by when gap range less than 21 days.
plot(Daily_tsStruct$Date[1:30],
     Daily_tsStruct$Q[1:30],
     col = as.factor(Daily_tsStruct$Qualifier[1:30]),
     main = "Structural time series model",
     type = "b", pch = 16, ylim = c(0, 3.2))


# Add a gap that is too big to deal with:
dataInput <- dataInput_complete[-4:-20,]
dataInput <- dataInput[-200:-255,]

Daily_interp <- populateDaily(dataInput,
                              qConvert = 1,
                              fill = TRUE,
                              fill_type = "interpolation")
#> 19.9% missing data.
#> NA ranges in Q (73 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-20 (17 days)
#>   2001-08-05 to 2001-09-29 (56 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_interp$Date, Daily_interp$Q,
     col = as.factor(Daily_interp$Qualifier),
     main = "Linear Interpolation",
     type = "b", pch = 16, ylim = c(0, 3.2))

plot(Daily_interp$Date[1:50], Daily_interp$Q[1:50],
     col = as.factor(Daily_interp$Qualifier[1:50]),
     main = "Linear Interpolation",
     type = "b", pch = 16, ylim = c(0, 3.2))


Daily_log_interp <- populateDaily(dataInput,
                              qConvert = 1,
                              fill = TRUE,
                              fill_type = "log_interp")
#> 19.9% missing data.
#> NA ranges in Q (73 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-20 (17 days)
#>   2001-08-05 to 2001-09-29 (56 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_log_interp$Date, Daily_log_interp$Q,
     col = as.factor(Daily_log_interp$Qualifier),
     main = "Linear Interpolation in Log Scale",
     type = "b", pch = 16, ylim = c(0, 3.2))

plot(Daily_log_interp$Date[1:50], Daily_log_interp$Q[1:50],
     col = as.factor(Daily_log_interp$Qualifier[1:50]),
     main = "Linear Interpolation in Log Scale",
     type = "b", pch = 16, ylim = c(0, 3.2))


Daily_spline <- populateDaily(dataInput,
                             qConvert = 1,
                             fill = TRUE,
                             fill_type = "spline")
#> 19.9% missing data.
#> NA ranges in Q (73 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-20 (17 days)
#>   2001-08-05 to 2001-09-29 (56 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_spline$Date[1:50], Daily_spline$Q[1:50],
     col = as.factor(Daily_spline$Qualifier[1:50]),
     main = "Spline Fit",
     type = "b", pch = 16, ylim = c(0, 3.2))


Daily_tsSmooth <- populateDaily(dataInput,
                                qConvert = 1,
                                fill = TRUE,
                                fill_type = "tsSmooth")
#> 19.9% missing data.
#> NA ranges in Q (73 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-20 (17 days)
#>   2001-08-05 to 2001-09-29 (56 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_tsSmooth$Date[1:50], Daily_tsSmooth$Q[1:50],
     col = as.factor(Daily_tsSmooth$Qualifier[1:50]),
     type = "b", pch = 16, ylim = c(0, 3.2))


Daily_tsStruct <- populateDaily(dataInput,
                                qConvert = 1,
                                fill = TRUE,
                                fill_type = "tsStruct")
#> 19.9% missing data.
#> NA ranges in Q (73 total NA values across 2 runs):
#>   2001-01-04 to 2001-01-20 (17 days)
#>   2001-08-05 to 2001-09-29 (56 days)
#> NA values filled by when gap range less than 21 days.
plot(Daily_tsStruct$Date[1:50], Daily_tsStruct$Q[1:50],
     col = as.factor(Daily_tsStruct$Qualifier[1:50]),
     type = "b", pch = 16, ylim = c(0, 3.2))


# Real data:
eList <- Choptank_eList
Daily_chop <- eList$Daily
df <- Daily_chop[,c("Date", "Q")]
df <- df[-2:-5, ]
df <- df[-100:-200,]
D2 <- populateDaily(df, 1, fill = TRUE)
#> 0.898% missing data.
#> NA ranges in Q (105 total NA values across 2 runs):
#>   1979-10-02 to 1979-10-05 (4 days)
#>   1980-01-12 to 1980-04-21 (101 days)
#> NA values filled by when gap range less than 21 days.
plot(D2$Date[1:20], D2$Q[1:20],
     col = as.factor(D2$Qualifier[1:20]),
     main = "Linear Interpolation",
     type = "b", pch = 16)

plot(D2$Date[1:110], D2$Q[1:110],
     col = as.factor(D2$Qualifier[1:110]),
     main = "Linear Interpolation",
     type = "b", pch = 16)


D3 <- populateDaily(df, 1, fill = TRUE, fill_type = "spline")
#> 0.898% missing data.
#> NA ranges in Q (105 total NA values across 2 runs):
#>   1979-10-02 to 1979-10-05 (4 days)
#>   1980-01-12 to 1980-04-21 (101 days)
#> NA values filled by when gap range less than 21 days.
plot(D3$Date[1:20], D3$Q[1:20],
     col = as.factor(D3$Qualifier[1:20]),
     main = "Spline Fit",
     type = "b", pch = 16)

plot(D3$Date[1:110], D3$Q[1:110],
     col = as.factor(D3$Qualifier[1:110]),
     main = "Spline Fit",
     type = "b", pch = 16)


D4 <- populateDaily(df, 1, fill = TRUE, fill_type = "tsSmooth")
#> 0.898% missing data.
#> NA ranges in Q (105 total NA values across 2 runs):
#>   1979-10-02 to 1979-10-05 (4 days)
#>   1980-01-12 to 1980-04-21 (101 days)
#> NA values filled by when gap range less than 21 days.
plot(D4$Date[1:20], D4$Q[1:20],
     col = as.factor(D4$Qualifier[1:20]),
     main = "tsSmooth Fit",
     type = "b", pch = 16)

plot(D4$Date[1:110], D4$Q[1:110],
     col = as.factor(D4$Qualifier[1:110]),
     main = "tsSmooth Fit",
     type = "b", pch = 16)