Skip to contents

This article will show an example of how to convert “long” Water Quality Portal results to a “wide” data frame. Generally “long” data frames are preferred for R analysis. However, there are a few situations where one needs to make a “wide” table.

First, what do we mean? Here is an example of a “long” table:

site day param value
A 1 Atrazine 0.10
A 1 Metolachlor 0.25
A 2 Atrazine 0.13
A 2 Metolachlor 0.38
B 1 Atrazine 0.14
B 1 Metolachlor 0.38
B 2 Atrazine 0.15
B 2 Metolachlor 0.40

Here is an example with the exact same data, but instead formatted in a “wide” table:

site day Atrazine Metolachlor
A 1 0.10 0.25
A 2 0.13 0.38
B 1 0.14 0.38
B 2 0.15 0.40

The “long” table has one measurement per row, which means there is an additional column that defines the parameter. The “wide” table on the other hand has many measurements per row, but only 1 date/site.

Water Quality Portal example

Let’s grab data from 2 sites that measured 2 nutrients.

sites <- c("USGS-04027000", "USGS-04063700")
chracterisitic_names <- c("Phosphorus",
                          "Ammonia and ammonium")

nutrient_data <- readWQPqw(siteNumbers = sites,
                           chracterisitic_names)

The data comes back with 67 columns, let’s pull out the ones we decide are important for our analysis. We’ll rename the columns so they are easier to read on the screen, however the rename is not necessary. Your own pivots may need many more columns!

nutrient_data_slim <- nutrient_data |> 
  select(date = ActivityStartDateTime,
         site = MonitoringLocationIdentifier,
         param = CharacteristicName,
         fraction = ResultSampleFractionText,
         condition = ResultDetectionConditionText,
         value = ResultMeasureValue,
         det_va = DetectionQuantitationLimitMeasure.MeasureValue,
         units = ResultMeasure.MeasureUnitCode,
         det_units = DetectionQuantitationLimitMeasure.MeasureUnitCode
         ) |> 
  mutate(units = if_else(is.na(units), det_units, units)) |> 
  select(-det_units) |> 
  filter(!is.na(date))

knitr::kable(head(nutrient_data_slim))
date site param fraction condition value det_va units

For this analysis, each parameter needs individual detection conditions, values, units, and detection limits. Using the tidyr package, we can run the pivot_wider function and get a table with all the required columns.

library(tidyr)

nutrients_wide <- nutrient_data_slim |> 
  pivot_wider(names_from = c(param, units),
              values_from = c(condition, value, det_va))

names(nutrients_wide)
## [1] "date"     "site"     "fraction"

Often users want a table that is sorted by parameter. Interleave columns can be done directly in the pivot using the build_wider_spec function. In this example, the columns will come back grouped by parameter and unit:

spec <- nutrient_data_slim |> 
  build_wider_spec(
    names_from = c(param, units), 
    values_from = c(condition, value, det_va), 
    names_glue = "{param}_{units}_{.value}") |> 
  arrange(param, units, .value) 

nutrients_wide2 <- nutrient_data_slim |> 
  pivot_wider_spec(spec)

names(nutrients_wide2)
## [1] "date"      "site"      "fraction"  "condition"
## [5] "value"     "det_va"