Long to Wide • dataRetrieval

This article will show an example of how to convert “long” Water Quality Portal results to a “wide” data frame. Generally “long” data frames are preferred for R analysis. However, there are a few situations where one needs to make a “wide” table.

First, what do we mean? Here is an example of a “long” table:

site	day	param	value
A	1	Atrazine	0.10
A	1	Metolachlor	0.25
A	2	Atrazine	0.13
A	2	Metolachlor	0.38
B	1	Atrazine	0.14
B	1	Metolachlor	0.38
B	2	Atrazine	0.15
B	2	Metolachlor	0.40

Here is an example with the exact same data, but instead formatted in a “wide” table:

site	day	Atrazine	Metolachlor
A	1	0.10	0.25
A	2	0.13	0.38
B	1	0.14	0.38
B	2	0.15	0.40

The “long” table has one measurement per row, which means there is an additional column that defines the parameter. The “wide” table on the other hand has many measurements per row, but only 1 date/site.

Water Quality Portal example

Let’s grab data from 2 sites that measured 2 nutrients.

sites <- c("USGS-04027000", "USGS-04063700")
chracterisitic_names <- c("Phosphorus",
                          "Ammonia and ammonium")

nutrient_data <- readWQPqw(siteNumbers = sites,
                           chracterisitic_names)

The data comes back with 65 columns, let’s pull out the ones we decide are important for our analysis. We’ll rename the columns so they are easier to read on the screen, however the rename is not necessary. Your own pivots may need many more columns!

nutrient_data_slim <- nutrient_data |> 
  select(date = ActivityStartDateTime,
         site = MonitoringLocationIdentifier,
         param = CharacteristicName,
         fraction = ResultSampleFractionText,
         condition = ResultDetectionConditionText,
         value = ResultMeasureValue,
         det_va = DetectionQuantitationLimitMeasure.MeasureValue,
         units = ResultMeasure.MeasureUnitCode,
         det_units = DetectionQuantitationLimitMeasure.MeasureUnitCode) |> 
  mutate(units = if_else(is.na(units), det_units, units)) |> 
  select(-det_units) |> 
  filter(!is.na(date))

knitr::kable(head(nutrient_data_slim))

date	site	param	fraction	condition	value	det_va	units
1967-12-09 18:50:00	USGS-04063700	Phosphorus	Total	NA	0.40	NA	mg/l PO4
1968-01-08 21:30:00	USGS-04063700	Phosphorus	Total	NA	0.05	NA	mg/l PO4
1968-02-07 16:30:00	USGS-04063700	Phosphorus	Total	NA	0.06	NA	mg/l PO4
1968-03-11 18:10:00	USGS-04063700	Phosphorus	Total	NA	0.07	NA	mg/l PO4
1968-04-08 14:00:00	USGS-04063700	Phosphorus	Total	NA	0.06	NA	mg/l PO4
1968-04-30 12:30:00	USGS-04063700	Phosphorus	Total	NA	0.04	NA	mg/l PO4

For this analysis, each parameter needs individual detection conditions, values, units, and detection limits. Using the tidyr package, we can run the pivot_wider function and get a table with all the required columns.

library(tidyr)

nutrients_wide <- nutrient_data_slim |> 
  pivot_wider(names_from = c(param, units),
              values_from = c(condition, value, det_va))

names(nutrients_wide)

##  [1] "date"                                    
##  [2] "site"                                    
##  [3] "fraction"                                
##  [4] "condition_Phosphorus_mg/l PO4"           
##  [5] "condition_Phosphorus_mg/l as P"          
##  [6] "condition_Ammonia and ammonium_mg/l as N"
##  [7] "condition_Ammonia and ammonium_mg/l NH4" 
##  [8] "condition_Phosphorus_mg/l"               
##  [9] "condition_Phosphorus_%"                  
## [10] "condition_Phosphorus_mg/kg as P"         
## [11] "value_Phosphorus_mg/l PO4"               
## [12] "value_Phosphorus_mg/l as P"              
## [13] "value_Ammonia and ammonium_mg/l as N"    
## [14] "value_Ammonia and ammonium_mg/l NH4"     
## [15] "value_Phosphorus_mg/l"                   
## [16] "value_Phosphorus_%"                      
## [17] "value_Phosphorus_mg/kg as P"             
## [18] "det_va_Phosphorus_mg/l PO4"              
## [19] "det_va_Phosphorus_mg/l as P"             
## [20] "det_va_Ammonia and ammonium_mg/l as N"   
## [21] "det_va_Ammonia and ammonium_mg/l NH4"    
## [22] "det_va_Phosphorus_mg/l"                  
## [23] "det_va_Phosphorus_%"                     
## [24] "det_va_Phosphorus_mg/kg as P"

Often users want a table that is sorted by parameter. Interleave columns can be done directly in the pivot using the build_wider_spec function. In this example, the columns will come back grouped by parameter and unit:

spec <- nutrient_data_slim |> 
  build_wider_spec(
    names_from = c(param, units), 
    values_from = c(condition, value, det_va), 
    names_glue = "{param}_{units}_{.value}") |> 
  arrange(param, units, .value) 

nutrients_wide2 <- nutrient_data_slim |> 
  pivot_wider_spec(spec)

names(nutrients_wide2)

##  [1] "date"                                    
##  [2] "site"                                    
##  [3] "fraction"                                
##  [4] "Ammonia and ammonium_mg/l NH4_condition" 
##  [5] "Ammonia and ammonium_mg/l NH4_det_va"    
##  [6] "Ammonia and ammonium_mg/l NH4_value"     
##  [7] "Ammonia and ammonium_mg/l as N_condition"
##  [8] "Ammonia and ammonium_mg/l as N_det_va"   
##  [9] "Ammonia and ammonium_mg/l as N_value"    
## [10] "Phosphorus_%_condition"                  
## [11] "Phosphorus_%_det_va"                     
## [12] "Phosphorus_%_value"                      
## [13] "Phosphorus_mg/kg as P_condition"         
## [14] "Phosphorus_mg/kg as P_det_va"            
## [15] "Phosphorus_mg/kg as P_value"             
## [16] "Phosphorus_mg/l_condition"               
## [17] "Phosphorus_mg/l_det_va"                  
## [18] "Phosphorus_mg/l_value"                   
## [19] "Phosphorus_mg/l PO4_condition"           
## [20] "Phosphorus_mg/l PO4_det_va"              
## [21] "Phosphorus_mg/l PO4_value"               
## [22] "Phosphorus_mg/l as P_condition"          
## [23] "Phosphorus_mg/l as P_det_va"             
## [24] "Phosphorus_mg/l as P_value"