EGRET was designed for water-quality and streamflow
exploration. One of its nice features is a set of
convenient functions for importing data from web services. This vignette
gives some advice for updating the classic workflows for importing
water-quality data into EGRET.
Users of the readNWISSample function may have noticed a
warning recently:
Warning message:
NWIS qw web services are being retired.
Please see vignette('qwdata_changes', package = 'dataRetrieval')
for more information.
https://cran.r-project.org/web/packages/dataRetrieval/vignettes/qwdata_changes.html
The exact date when the NWIS water-quality services will be shut down
is not known, but it is expected to happen in the first half of
2024. When the NWIS web service shuts down, we will remove the
readNWISSample function from EGRET.
USGS data that was historically retrieved from the NWIS services will still be available from the Water Quality Portal.
dataRetrieval Workflow
This is the recommended workflow for any water-quality data that can be accessed through the Water Quality Portal (WQP). WQP houses USGS, EPA, and other water-quality data. It is currently possible to query the WQP by USGS parameter code for a USGS site, but querying by parameter code is not guaranteed to remain supported in the future. Therefore, this workflow focuses on best practices for the WQP.
First, it’s recommended to get the raw data from WQP using functions
in the dataRetrieval package. Please see the dataRetrieval site
for complete information on dataRetrieval.
This will let us analyze and explore the data using all the
information available. Let’s start with our classic EGRET
example data from the Choptank River. The USGS station id is “01491000”.
When using the WQP, we need to add a “USGS-” prefix to the station id. Our
example eList in EGRET is for “Inorganic nitrogen (nitrate
and nitrite)”. In the WQP, we use this phrase as the input to the
CharacteristicName argument.
library(dataRetrieval)
nitrogen <- readWQPdata(siteNumbers = "USGS-01491000",
                        CharacteristicName = "Inorganic nitrogen (nitrate and nitrite)")
This returns a data frame with 1413 rows. There are a few columns I recommend checking:
unique(nitrogen$ResultSampleFractionText)
## [1] "Dissolved" "Total"
We’ll need to decide if it’s appropriate to use both Total and Dissolved in a single analysis. Usually it is not. There may be other sample fraction values depending on the parameters.
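Before deciding, it can help to count how many samples of each fraction exist and which years they cover. For example, a site may have switched from Dissolved to Total at some point. The sketch below uses a small invented data frame. The column names ActivityStartDate, ResultSampleFractionText, and ResultMeasureValue follow the WQP output, but the rows are made up, and only base R is used.

```r
# Toy stand-in for a WQP download; the rows are invented for illustration.
wqp_demo <- data.frame(
  ActivityStartDate = as.Date(c("1985-03-01", "1990-06-12", "2005-04-20",
                                "2010-08-02", "2012-11-15")),
  ResultSampleFractionText = c("Dissolved", "Dissolved", "Total",
                               "Total", "Total"),
  ResultMeasureValue = c(1.2, 1.5, 1.9, 2.1, 1.7),
  stringsAsFactors = FALSE
)

# Number of samples per fraction
fraction_counts <- table(wqp_demo$ResultSampleFractionText)

# First and last sample year for each fraction
years <- as.numeric(format(wqp_demo$ActivityStartDate, "%Y"))
fraction_years <- lapply(split(years, wqp_demo$ResultSampleFractionText), range)

print(fraction_counts)
print(fraction_years)
```

If the two fractions cover different periods, combining them would mix a change in method with a change in water quality.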
unique(nitrogen$ActivityTypeCode)
## [1] "Sample-Routine"
## [2] "Sample-Composite Without Parents"
## [3] "Quality Control Sample-Reference Sample"
## [4] "Quality Control Sample-Field Replicate"
## [5] "Not determined"
Here we see that some quality control samples are included in the data. Replicates may or may not be acceptable for your analysis. You would definitely want to remove samples that are “Blanks”, for example.
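An alternative to listing each quality control code by hand is to match on the prefix. The sketch below assumes that every QC activity type in your download starts with "Quality Control", as the ones above do. The "Quality Control Sample-Field Blank" row is invented here to show that blanks are caught too. If you decide to keep replicates, you would need a narrower pattern.

```r
# Invented activity types resembling the WQP values listed above,
# plus a hypothetical blank to show why pattern matching can help.
activity <- data.frame(
  ActivityTypeCode = c("Sample-Routine",
                       "Sample-Composite Without Parents",
                       "Quality Control Sample-Reference Sample",
                       "Quality Control Sample-Field Replicate",
                       "Quality Control Sample-Field Blank",
                       "Not determined"),
  value = c(1.1, 1.3, 5.0, 1.2, 0.01, 1.4),
  stringsAsFactors = FALSE
)

# Drop every row whose activity type starts with "Quality Control",
# which removes blanks, references, and replicates in one step.
is_qc <- grepl("^Quality Control", activity$ActivityTypeCode)
field_samples <- activity[!is_qc, ]
print(field_samples$ActivityTypeCode)
```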
unique(nitrogen$ActivityMediaName)
## [1] "Water"             "Biological Tissue"
For this analysis, we’re only interested in water.
Looking at these results, we need to filter the results down to “Water” as the media, “Total” for the sample fraction, and we want to exclude any quality control results. For your individual analysis, there might be other decisions you need to make. It’s important to take a look at this raw data early in your workflow to make sure you are looking at the right things.
We’ll use the dplyr package for some general
cleanup:
library(dplyr)
# Keep water samples with the Total fraction, and drop the QC activity types found above
total_nitrogen_water <- nitrogen %>%
  filter(ActivityMediaName == "Water",
         ResultSampleFractionText == "Total",
         !ActivityTypeCode %in% c("Quality Control Sample-Reference Sample",
                                  "Quality Control Sample-Field Replicate"))
We’ve taken our data down to 454 rows. Let’s look at a few more columns:
unique(total_nitrogen_water$ResultMeasure.MeasureUnitCode)
## [1] "mg/l as N"
unique(total_nitrogen_water$ResultDetectionConditionText)
## [1] NA
unique(total_nitrogen_water$HydrologicEvent)
## [1] "Routine sample"              "Hurricane"
## [3] "Not Determined (historical)" "Drought"
## [5] "Storm"                       "Snowmelt"
## [7] "Dambreak"
unique(total_nitrogen_water$HydrologicCondition)
## [1] "Not determined"       "Stable, high stage"   "Stable, normal stage"
## [4] "Stable, low stage"    "Rising stage"         "Peak stage"
## [7] "Falling stage"
In this example, we have one reported measurement unit and no detection
condition text. For most EGRET-type analyses, this is probably fine.
However, you may decide you don’t want the “Hurricane” or “Dambreak”
data. These are the kinds of things you’ll need to consider at the
beginning of an analysis.
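These decisions can be written into the script as explicit checks, so a later download that changes the picture does not slip through unnoticed. The sketch below uses invented rows with the column names shown above. The choice to drop "Hurricane" and "Dambreak" is only an example of a judgment call, not a recommendation.

```r
# Toy data with the WQP column names used above; the values are invented.
tn <- data.frame(
  HydrologicEvent = c("Routine sample", "Hurricane", "Storm",
                      "Dambreak", "Routine sample"),
  ResultMeasure.MeasureUnitCode = rep("mg/l as N", 5),
  ResultMeasureValue = c(1.1, 3.4, 2.2, 6.0, 1.0),
  stringsAsFactors = FALSE
)

# Stop early if more than one unit appears; mixing units would
# silently corrupt the concentration record.
units_found <- unique(tn$ResultMeasure.MeasureUnitCode)
stopifnot(length(units_found) == 1)

# Example judgment call: drop the events you decided not to model.
events_to_drop <- c("Hurricane", "Dambreak")
tn_kept <- tn[!tn$HydrologicEvent %in% events_to_drop, ]
print(tn_kept)
```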
So now we need to convert the WQP output into a Sample
data frame. The readWQPSample function does this in three steps.
The first is processQWData. This function tries to automate the
conversion from the WQP format to three simple columns:
dateTime, qualifier, and value.
library(EGRET)
processed_qw <- processQWData(total_nitrogen_water)
The “qualifier” column will only come back with left-censored
indicators (“<”). If your data have right- or interval-censored values, you
will need to determine how to flag those. It is a good idea to check the
output of processQWData to make sure the “qualifier” flag
matches the raw data. The function checks the
ResultDetectionConditionText column for any “non-detect” type text. It also
checks whether the reported value is less than the reported detection limit.
Because the WQP has no required text fields, it is a good idea to
look for any unusual ResultDetectionConditionText values that may not
be flagged by processQWData.
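One way to audit the qualifier flag is to reproduce the two checks described above on the raw data and then list any condition text that did not produce a "<". The helper flag_left_censored and its regular expression are hypothetical. They only roughly mirror the described behavior and are not the actual processQWData source. DetectionQuantitationLimitMeasure.MeasureValue is a WQP column name, but check your own download for the exact names. The rows are invented.

```r
# Invented rows mimicking WQP columns.
raw <- data.frame(
  ResultMeasureValue = c(0.5, NA, 0.02, 1.3),
  ResultDetectionConditionText = c(NA, "Not Detected", NA,
                                   "Present Above Quantification Limit"),
  DetectionQuantitationLimitMeasure.MeasureValue = c(0.05, 0.05, 0.05, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flags a row "<" if its condition text looks like a
# non-detect OR its value is below the detection limit. An approximation
# of the checks described in the text, not processQWData itself.
flag_left_censored <- function(df,
                               nd_pattern = "not detected|non-detect|below") {
  txt <- tolower(df$ResultDetectionConditionText)
  text_nd <- !is.na(txt) & grepl(nd_pattern, txt)
  dl <- df$DetectionQuantitationLimitMeasure.MeasureValue
  below_dl <- !is.na(df$ResultMeasureValue) & !is.na(dl) &
    df$ResultMeasureValue < dl
  ifelse(text_nd | below_dl, "<", "")
}

raw$qualifier <- flag_left_censored(raw)

# Condition texts that did NOT produce a "<" flag deserve a manual look.
unflagged_text <- unique(raw$ResultDetectionConditionText[
  !is.na(raw$ResultDetectionConditionText) & raw$qualifier == ""])
print(unflagged_text)
```

Here the leftover text describes a value above a quantification limit, which is exactly the kind of right-censored case that needs your own decision.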
The function compressData converts the qualifier column
to the ConcLow/ConcHigh/Uncen columns that EGRET requires.
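Conceptually, each observation becomes an interval: an uncensored value has equal lower and upper bounds, and a left-censored “<0.05” means the true value lies somewhere between zero and 0.05. The sketch below illustrates that idea with a hypothetical helper, to_interval. It is not compressData itself. In particular, it stores the lower bound of a censored value as 0, and EGRET may represent that bound differently (for example as NA) in the final Sample, so check the EGRET documentation for the exact storage.

```r
# Hypothetical helper: turns (qualifier, value) pairs into interval form.
# An illustration of the concept, not the compressData implementation.
to_interval <- function(qualifier, value) {
  qualifier[is.na(qualifier)] <- ""
  censored <- qualifier == "<"
  data.frame(
    ConcLow  = ifelse(censored, 0, value),  # true value is at least this
    ConcHigh = value,                       # and at most the reported value
    Uncen    = as.integer(!censored)        # 1 = uncensored, 0 = censored
  )
}

demo <- to_interval(qualifier = c("", "<", NA), value = c(1.2, 0.05, 0.9))
print(demo)
```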
Finally, populateSampleColumns adds in the necessary date
columns:
compressedData <- compressData(processed_qw[c("dateTime",
                                              "qualifier",
                                              "value")],
                               verbose = FALSE)
Sample <- populateSampleColumns(compressedData)
Classic workflows
The readNWISSample, readUserSample, and
readWQPSample functions are the classic ways to get
water-quality data in an EGRET-friendly format.
readUserSample
This has always been a valid option for getting independent
water-quality data into EGRET. Users will need to generate a
delimited file. The separator can be any character; it is set with the
“separator” argument. The file must be organized in a very strict way:
the first column must be the date, the second column is the
remark code (which should use “<” for left-censored values), and the
third column is the value.
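A quick way to produce a file with this layout is to write it from a data frame and read it back to confirm the column order. The sketch below uses base R with invented rows, a made-up file name, a semicolon separator, and ISO-formatted dates. It assumes a header row is acceptable, as in the Choptank example file. Confirm the date formats readUserSample accepts before relying on this.

```r
# Write a tiny semicolon-delimited file in the required column order:
# date, remark code, value. The rows and file name are made up.
user_file <- file.path(tempdir(), "my_site_nitrate.csv")
user_data <- data.frame(
  date   = c("2020-01-15", "2020-02-12", "2020-03-18"),
  remark = c("", "<", ""),
  value  = c(1.42, 0.05, 1.37),
  stringsAsFactors = FALSE
)
write.table(user_data, user_file, sep = ";", row.names = FALSE, quote = FALSE)

# Read it back to confirm the layout before handing it to readUserSample.
check <- read.table(user_file, sep = ";", header = TRUE,
                    colClasses = c("character", "character", "numeric"))
print(check)
```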
filePath <- system.file("extdata", package = "EGRET")
fileName <- 'ChoptankRiverNitrate.csv'
Sample <- readUserSample(filePath,
                         fileName,
                         separator = ";",
                         verbose = FALSE)
readWQPSample
readWQPSample is a function that attempts to retrieve water-quality
data and automatically format it in an
EGRET-friendly way. Generally, it does a pretty good job.
However, the full set of raw data is not retained, and some details about
the data may be lost.
Sample_All <- readWQPSample(siteNumber = 'WIDNR_WQX-10032762',
                            characteristicName = 'Specific conductance',
                            startDate = '',
                            endDate = '')
The details described above in the recommended workflow section should help you understand why starting with the original raw data is preferable. It takes a little more work to understand the data, but you can also be more confident that you are analyzing the correct information.
