header_tag.html

Skip to contents

This article will show an example of how to convert “long” Water Quality Portal results to a “wide” data frame. Generally “long” data frames are preferred for R analysis. However, there are a few situations where one needs to make a “wide” table.

First, what do we mean? Here is an example of a “long” table:

site day param value
A 1 Atrazine 0.10
A 1 Metolachlor 0.25
A 2 Atrazine 0.13
A 2 Metolachlor 0.38
B 1 Atrazine 0.14
B 1 Metolachlor 0.38
B 2 Atrazine 0.15
B 2 Metolachlor 0.40

Here is an example with the exact same data, but instead formatted in a “wide” table:

site day Atrazine Metolachlor
A 1 0.10 0.25
A 2 0.13 0.38
B 1 0.14 0.38
B 2 0.15 0.40

The “long” table has one measurement per row, which means there is an additional column that defines the parameter. The “wide” table on the other hand has many measurements per row, but only 1 date/site.

Water Quality Portal example

Let’s grab data from 2 sites that measured 2 nutrients.

sites <- c("USGS-04027000", "USGS-04063700")
chracterisitic_names <- c("Phosphorus",
                          "Ammonia and ammonium")

nutrient_data <- readWQPqw(siteNumbers = sites,
                           chracterisitic_names)

The data comes back with 65 columns, let’s pull out the ones we decide are important for our analysis. We’ll rename the columns so they are easier to read on the screen, however the rename is not necessary. Your own pivots may need many more columns!

nutrient_data_slim <- nutrient_data |> 
  select(date = ActivityStartDateTime,
         site = MonitoringLocationIdentifier,
         param = CharacteristicName,
         fraction = ResultSampleFractionText,
         condition = ResultDetectionConditionText,
         value = ResultMeasureValue,
         det_va = DetectionQuantitationLimitMeasure.MeasureValue,
         units = ResultMeasure.MeasureUnitCode,
         det_units = DetectionQuantitationLimitMeasure.MeasureUnitCode) |> 
  mutate(units = if_else(is.na(units), det_units, units)) |> 
  select(-det_units) |> 
  filter(!is.na(date))

knitr::kable(head(nutrient_data_slim))
date site param fraction condition value det_va units
1967-12-09 18:50:00 USGS-04063700 Phosphorus Total NA 0.40 NA mg/l PO4
1968-01-08 21:30:00 USGS-04063700 Phosphorus Total NA 0.05 NA mg/l PO4
1968-02-07 16:30:00 USGS-04063700 Phosphorus Total NA 0.06 NA mg/l PO4
1968-03-11 18:10:00 USGS-04063700 Phosphorus Total NA 0.07 NA mg/l PO4
1968-04-08 14:00:00 USGS-04063700 Phosphorus Total NA 0.06 NA mg/l PO4
1968-04-30 12:30:00 USGS-04063700 Phosphorus Total NA 0.04 NA mg/l PO4

For this analysis, each parameter needs individual detection conditions, values, units, and detection limits. Using the tidyr package, we can run the pivot_wider function and get a table with all the required columns.

library(tidyr)

nutrients_wide <- nutrient_data_slim |> 
  pivot_wider(names_from = c(param, units),
              values_from = c(condition, value, det_va))

names(nutrients_wide)
##  [1] "date"                                    
##  [2] "site"                                    
##  [3] "fraction"                                
##  [4] "condition_Phosphorus_mg/l PO4"           
##  [5] "condition_Phosphorus_mg/l as P"          
##  [6] "condition_Ammonia and ammonium_mg/l as N"
##  [7] "condition_Ammonia and ammonium_mg/l NH4" 
##  [8] "condition_Phosphorus_mg/l"               
##  [9] "condition_Phosphorus_%"                  
## [10] "condition_Phosphorus_mg/kg as P"         
## [11] "value_Phosphorus_mg/l PO4"               
## [12] "value_Phosphorus_mg/l as P"              
## [13] "value_Ammonia and ammonium_mg/l as N"    
## [14] "value_Ammonia and ammonium_mg/l NH4"     
## [15] "value_Phosphorus_mg/l"                   
## [16] "value_Phosphorus_%"                      
## [17] "value_Phosphorus_mg/kg as P"             
## [18] "det_va_Phosphorus_mg/l PO4"              
## [19] "det_va_Phosphorus_mg/l as P"             
## [20] "det_va_Ammonia and ammonium_mg/l as N"   
## [21] "det_va_Ammonia and ammonium_mg/l NH4"    
## [22] "det_va_Phosphorus_mg/l"                  
## [23] "det_va_Phosphorus_%"                     
## [24] "det_va_Phosphorus_mg/kg as P"

Often users want a table that is sorted by parameter. Interleave columns can be done directly in the pivot using the build_wider_spec function. In this example, the columns will come back grouped by parameter and unit:

spec <- nutrient_data_slim |> 
  build_wider_spec(
    names_from = c(param, units), 
    values_from = c(condition, value, det_va), 
    names_glue = "{param}_{units}_{.value}") |> 
  arrange(param, units, .value) 

nutrients_wide2 <- nutrient_data_slim |> 
  pivot_wider_spec(spec)

names(nutrients_wide2)
##  [1] "date"                                    
##  [2] "site"                                    
##  [3] "fraction"                                
##  [4] "Ammonia and ammonium_mg/l NH4_condition" 
##  [5] "Ammonia and ammonium_mg/l NH4_det_va"    
##  [6] "Ammonia and ammonium_mg/l NH4_value"     
##  [7] "Ammonia and ammonium_mg/l as N_condition"
##  [8] "Ammonia and ammonium_mg/l as N_det_va"   
##  [9] "Ammonia and ammonium_mg/l as N_value"    
## [10] "Phosphorus_%_condition"                  
## [11] "Phosphorus_%_det_va"                     
## [12] "Phosphorus_%_value"                      
## [13] "Phosphorus_mg/kg as P_condition"         
## [14] "Phosphorus_mg/kg as P_det_va"            
## [15] "Phosphorus_mg/kg as P_value"             
## [16] "Phosphorus_mg/l_condition"               
## [17] "Phosphorus_mg/l_det_va"                  
## [18] "Phosphorus_mg/l_value"                   
## [19] "Phosphorus_mg/l PO4_condition"           
## [20] "Phosphorus_mg/l PO4_det_va"              
## [21] "Phosphorus_mg/l PO4_value"               
## [22] "Phosphorus_mg/l as P_condition"          
## [23] "Phosphorus_mg/l as P_det_va"             
## [24] "Phosphorus_mg/l as P_value"