This article will show an example of how to convert “long” Water Quality Portal results to a “wide” data frame. Generally “long” data frames are preferred for R analysis. However, there are a few situations where one needs to make a “wide” table.
First, what do we mean? Here is an example of a “long” table:
site | day | param | value |
---|---|---|---|
A | 1 | Atrazine | 0.10 |
A | 1 | Metolachlor | 0.25 |
A | 2 | Atrazine | 0.13 |
A | 2 | Metolachlor | 0.38 |
B | 1 | Atrazine | 0.14 |
B | 1 | Metolachlor | 0.38 |
B | 2 | Atrazine | 0.15 |
B | 2 | Metolachlor | 0.40 |
Here is an example with the exact same data, but instead formatted in a “wide” table:
site | day | Atrazine | Metolachlor |
---|---|---|---|
A | 1 | 0.10 | 0.25 |
A | 2 | 0.13 | 0.38 |
B | 1 | 0.14 | 0.38 |
B | 2 | 0.15 | 0.40 |
The “long” table has one measurement per row, which means there is an additional column that defines the parameter. The “wide” table on the other hand has many measurements per row, but only 1 date/site.
Water Quality Portal example
Let’s grab data from 2 sites that measured 2 nutrients.
sites <- c("USGS-04027000", "USGS-04063700")
chracterisitic_names <- c("Phosphorus",
"Ammonia and ammonium")
nutrient_data <- readWQPqw(siteNumbers = sites,
chracterisitic_names)
The data comes back with 65 columns, let’s pull out the ones we decide are important for our analysis. We’ll rename the columns so they are easier to read on the screen, however the rename is not necessary. Your own pivots may need many more columns!
nutrient_data_slim <- nutrient_data |>
select(date = ActivityStartDateTime,
site = MonitoringLocationIdentifier,
param = CharacteristicName,
fraction = ResultSampleFractionText,
condition = ResultDetectionConditionText,
value = ResultMeasureValue,
det_va = DetectionQuantitationLimitMeasure.MeasureValue,
units = ResultMeasure.MeasureUnitCode,
det_units = DetectionQuantitationLimitMeasure.MeasureUnitCode) |>
mutate(units = if_else(is.na(units), det_units, units)) |>
select(-det_units) |>
filter(!is.na(date))
knitr::kable(head(nutrient_data_slim))
date | site | param | fraction | condition | value | det_va | units |
---|---|---|---|---|---|---|---|
1967-12-09 18:50:00 | USGS-04063700 | Phosphorus | Total | NA | 0.40 | NA | mg/l PO4 |
1968-01-08 21:30:00 | USGS-04063700 | Phosphorus | Total | NA | 0.05 | NA | mg/l PO4 |
1968-02-07 16:30:00 | USGS-04063700 | Phosphorus | Total | NA | 0.06 | NA | mg/l PO4 |
1968-03-11 18:10:00 | USGS-04063700 | Phosphorus | Total | NA | 0.07 | NA | mg/l PO4 |
1968-04-08 14:00:00 | USGS-04063700 | Phosphorus | Total | NA | 0.06 | NA | mg/l PO4 |
1968-04-30 12:30:00 | USGS-04063700 | Phosphorus | Total | NA | 0.04 | NA | mg/l PO4 |
For this analysis, each parameter needs individual detection
conditions, values, units, and detection limits. Using the
tidyr
package, we can run the pivot_wider
function and get a table with all the required columns.
library(tidyr)
nutrients_wide <- nutrient_data_slim |>
pivot_wider(names_from = c(param, units),
values_from = c(condition, value, det_va))
names(nutrients_wide)
## [1] "date"
## [2] "site"
## [3] "fraction"
## [4] "condition_Phosphorus_mg/l PO4"
## [5] "condition_Phosphorus_mg/l as P"
## [6] "condition_Ammonia and ammonium_mg/l as N"
## [7] "condition_Ammonia and ammonium_mg/l NH4"
## [8] "condition_Phosphorus_mg/l"
## [9] "condition_Phosphorus_%"
## [10] "condition_Phosphorus_mg/kg as P"
## [11] "value_Phosphorus_mg/l PO4"
## [12] "value_Phosphorus_mg/l as P"
## [13] "value_Ammonia and ammonium_mg/l as N"
## [14] "value_Ammonia and ammonium_mg/l NH4"
## [15] "value_Phosphorus_mg/l"
## [16] "value_Phosphorus_%"
## [17] "value_Phosphorus_mg/kg as P"
## [18] "det_va_Phosphorus_mg/l PO4"
## [19] "det_va_Phosphorus_mg/l as P"
## [20] "det_va_Ammonia and ammonium_mg/l as N"
## [21] "det_va_Ammonia and ammonium_mg/l NH4"
## [22] "det_va_Phosphorus_mg/l"
## [23] "det_va_Phosphorus_%"
## [24] "det_va_Phosphorus_mg/kg as P"
Often users want a table that is sorted by parameter. Interleave
columns can be done directly in the pivot using the
build_wider_spec
function. In this example, the columns
will come back grouped by parameter and unit:
spec <- nutrient_data_slim |>
build_wider_spec(
names_from = c(param, units),
values_from = c(condition, value, det_va),
names_glue = "{param}_{units}_{.value}") |>
arrange(param, units, .value)
nutrients_wide2 <- nutrient_data_slim |>
pivot_wider_spec(spec)
names(nutrients_wide2)
## [1] "date"
## [2] "site"
## [3] "fraction"
## [4] "Ammonia and ammonium_mg/l NH4_condition"
## [5] "Ammonia and ammonium_mg/l NH4_det_va"
## [6] "Ammonia and ammonium_mg/l NH4_value"
## [7] "Ammonia and ammonium_mg/l as N_condition"
## [8] "Ammonia and ammonium_mg/l as N_det_va"
## [9] "Ammonia and ammonium_mg/l as N_value"
## [10] "Phosphorus_%_condition"
## [11] "Phosphorus_%_det_va"
## [12] "Phosphorus_%_value"
## [13] "Phosphorus_mg/kg as P_condition"
## [14] "Phosphorus_mg/kg as P_det_va"
## [15] "Phosphorus_mg/kg as P_value"
## [16] "Phosphorus_mg/l_condition"
## [17] "Phosphorus_mg/l_det_va"
## [18] "Phosphorus_mg/l_value"
## [19] "Phosphorus_mg/l PO4_condition"
## [20] "Phosphorus_mg/l PO4_det_va"
## [21] "Phosphorus_mg/l PO4_value"
## [22] "Phosphorus_mg/l as P_condition"
## [23] "Phosphorus_mg/l as P_det_va"
## [24] "Phosphorus_mg/l as P_value"