USGS dataretrieval Python Package get_samples() Examples

This notebook provides examples of using the Python dataretrieval package to retrieve water quality sample data for United States Geological Survey (USGS) monitoring sites. The dataretrieval package provides a collection of functions to get data from the USGS Samples database and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it is not already installed in your Jupyter Python environment.

[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.12/site-packages (0.1.dev1+g2c8419c)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from dataretrieval) (2.31.0)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.12/site-packages (from dataretrieval) (2.3.0)
Requirement already satisfied: numpy>=1.26.0 in /home/runner/.local/lib/python3.12/site-packages (from pandas==2.*->dataretrieval) (2.3.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.12/site-packages (from pandas==2.*->dataretrieval) (2025.2)

Import the waterdata module from dataretrieval, along with IPython's display() function, which this notebook uses to show data frames.

[2]:
from dataretrieval import waterdata
from IPython.display import display

Basic Usage

The dataretrieval package has several functions for retrieving data from different web services. This example uses the get_samples() function to retrieve water quality sample data for USGS monitoring sites from the USGS Samples database. The following arguments are supported:

  • ssl_check : boolean, optional
    Check the SSL certificate.

  • service : string
    One of the available Samples services: “results”, “locations”, “activities”, “projects”, or “organizations”. Defaults to “results”.

  • profile : string
    One of the profiles available for the chosen service. Options for each service are:
    - results: “fullphyschem”, “basicphyschem”, “fullbio”, “basicbio”, “narrow”, “resultdetectionquantitationlimit”, “labsampleprep”, “count”
    - locations: “site”, “count”
    - activities: “sampact”, “actmetric”, “actgroup”, “count”
    - projects: “project”, “projectmonitoringlocationweight”
    - organizations: “organization”, “count”

  • activityMediaName : string or list of strings, optional
    Name or code indicating the environmental medium in which the sample was taken. Check the activityMediaName_lookup() function in this module for all possible inputs. Example: “Water”.

  • activityStartDateLower : string, optional
    The start date of a date range, in the format YYYY-MM-DD. The comparison is inclusive, i.e. results on this date are also returned. If left as None, all data on or before activityStartDateUpper (if populated) are returned.

  • activityStartDateUpper : string, optional
    The end date of a date range, in the format YYYY-MM-DD. The comparison is inclusive, i.e. results on this date are also returned. If left as None, all data on or after activityStartDateLower, up to the most recent available results, are returned.

  • activityTypeCode : string or list of strings, optional
    Text code that describes the type of field activity performed. Example: “Sample-Routine, regular”.

  • characteristicGroup : string or list of strings, optional
    A characteristic group is a broad category of characteristics describing one or more results. Check the characteristicGroup_lookup() function in this module for all possible inputs. Example: “Organics, PFAS”.

  • characteristic : string or list of strings, optional
    A characteristic is a specific category describing one or more results. Check the characteristic_lookup() function in this module for all possible inputs. Example: “Suspended Sediment Discharge”.

  • characteristicUserSupplied : string or list of strings, optional
    A user-supplied characteristic name describing one or more results.

  • boundingBox : list of four floats, optional
    Filters on the associated monitoring location’s point location by checking whether it lies within the specified geographic area. The logic is inclusive, i.e. locations on the edge of the bounding box are included. Values are expressed in decimal degrees (NAD83), longitudes west of Greenwich are negative, and the four values are given in this order:
    - Western-most longitude
    - Southern-most latitude
    - Eastern-most longitude
    - Northern-most latitude
    Example: [-92.8, 44.2, -88.9, 46.0]

  • countryFips : string or list of strings, optional
    Example: “US” (United States).

  • stateFips : string or list of strings, optional
    Check the stateFips_lookup() function in this module for all possible inputs. Example: “US:15” (United States: Hawaii).

  • countyFips : string or list of strings, optional
    Check the countyFips_lookup() function in this module for all possible inputs. Example: “US:15:001” (United States: Hawaii, Hawaii County).

  • siteTypeCode : string or list of strings, optional
    An abbreviation for a site type. Check the siteType_lookup() function in this module for all possible inputs. Example: “GW” (Groundwater site).

  • siteTypeName : string or list of strings, optional
    The full name of a site type. Check the siteType_lookup() function in this module for all possible inputs. Example: “Well”.

  • usgsPCode : string or list of strings, optional
    5-digit code used in the US Geological Survey computerized data system, the National Water Information System (NWIS), to uniquely identify a specific constituent. Check the characteristic_lookup() function in this module for all possible inputs. Example: “00060” (Discharge, cubic feet per second).

  • hydrologicUnit : string or list of strings, optional
    Hydrologic unit code of up to 12 digits. Example: “070900020502”.

  • monitoringLocationIdentifier : string or list of strings, optional
    A monitoring location identifier has two parts, the agency code and the location number, separated by a dash (-). Example: “USGS-040851385”.

  • organizationIdentifier : string or list of strings, optional
    Designator used to uniquely identify a specific organization. Currently only the organization “USGS” is accepted.

  • pointLocationLatitude : float, optional
    Latitude for a point/radius query (decimal degrees). Must be used with pointLocationLongitude and pointLocationWithinMiles.

  • pointLocationLongitude : float, optional
    Longitude for a point/radius query (decimal degrees). Must be used with pointLocationLatitude and pointLocationWithinMiles.

  • pointLocationWithinMiles : float, optional
    Radius, in miles, for a point/radius query. Must be used with pointLocationLatitude and pointLocationLongitude.

  • projectIdentifier : string or list of strings, optional
    Designator used to uniquely identify a data collection project. Project identifiers are specific to an organization (e.g. USGS). Example: “ZH003QW03”.

  • recordIdentifierUserSupplied : string or list of strings, optional
    Internal AQS record identifier that returns a single entry. Only available for the “results” service.
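Several of the arguments above only make sense in certain combinations: each profile belongs to one service, boundingBox expects four values in west, south, east, north order, and the three point/radius arguments go together. The sketch below collects those rules in a hypothetical helper, validate_samples_args(), which is not part of dataretrieval and makes no web request. It only encodes the list above and does not cover every rule the service may enforce.

```python
# Hypothetical helper, NOT part of dataretrieval: it checks argument
# combinations described in the parameter list and never contacts the service.
PROFILES = {
    "results": {"fullphyschem", "basicphyschem", "fullbio", "basicbio", "narrow",
                "resultdetectionquantitationlimit", "labsampleprep", "count"},
    "locations": {"site", "count"},
    "activities": {"sampact", "actmetric", "actgroup", "count"},
    "projects": {"project", "projectmonitoringlocationweight"},
    "organizations": {"organization", "count"},
}

POINT_ARGS = ("pointLocationLatitude", "pointLocationLongitude", "pointLocationWithinMiles")


def validate_samples_args(service="results", profile=None, **kwargs):
    """Return a list of problems found in get_samples()-style keyword arguments."""
    problems = []
    if service not in PROFILES:
        problems.append(f"unknown service {service!r}")
    elif profile is not None and profile not in PROFILES[service]:
        problems.append(f"profile {profile!r} is not valid for service {service!r}")

    bbox = kwargs.get("boundingBox")
    if bbox is not None:
        if len(bbox) != 4:
            problems.append("boundingBox needs four values: west, south, east, north")
        else:
            west, south, east, north = bbox
            # Simplification: boxes crossing the antimeridian are not handled.
            if west > east or south > north:
                problems.append("boundingBox must be ordered [west, south, east, north]")

    # Latitude, longitude, and radius are only meaningful together.
    supplied = [name for name in POINT_ARGS if kwargs.get(name) is not None]
    if 0 < len(supplied) < len(POINT_ARGS):
        missing = [name for name in POINT_ARGS if name not in supplied]
        problems.append("point/radius query is missing " + ", ".join(missing))

    if kwargs.get("recordIdentifierUserSupplied") is not None and service != "results":
        problems.append("recordIdentifierUserSupplied only applies to the 'results' service")
    return problems


print(validate_samples_args(service="locations", profile="site",
                            boundingBox=[-92.8, 44.2, -88.9, 46.0]))
print(validate_samples_args(pointLocationLatitude=44.0, pointLocationLongitude=-90.0))
```

The first call reports no problems; the second reports that pointLocationWithinMiles is missing.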

Example 1: Get all water quality sample data for a single monitoring site

[3]:
siteID = 'USGS-10109000'
wq_data = waterdata.get_samples(monitoringLocationIdentifier=siteID)
print('Retrieved data for ' + str(len(wq_data[0])) + ' sample results.')
Request: https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?monitoringLocationIdentifier=USGS-10109000&mimeType=text%2Fcsv
Retrieved data for 2293 sample results.

Interpreting the Result

The result of calling the get_samples() function has two parts: a pandas data frame and an associated metadata object, accessed here as wq_data[0] and wq_data[1]. The data frame contains the water quality sample data for the requested sites, observed variables, and time period.

Once you have the data frame, there are several useful things you can do to explore the data.

Display the data frame as a table. By default, this function returns a long, flat table with one row for each observed variable at a given site and date/time.

[4]:
display(wq_data[0])
Org_Identifier Org_FormalName Project_Identifier Project_Name Project_QAPPApproved Project_QAPPApprovalAgency ProjectAttachment_FileName ProjectAttachment_FileType Location_Identifier Location_Name ... ResultAttachment_FileName ResultAttachment_FileType ResultAttachment_FileDownload ProviderName Result_CharacteristicComparable Result_CharacteristicGroup Org_Type LastChangeDate USGSpcode USGSSampleAquifer
0 USGS U.S. Geological Survey ["USGS"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 20 NaN
1 USGS U.S. Geological Survey ["USGS"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 61 NaN
2 USGS U.S. Geological Survey ["USGS"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 95 NaN
3 USGS U.S. Geological Survey ["USGS","UT2110000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Information Federal/US Government 2025-04-21 9 NaN
4 USGS U.S. Geological Survey ["USGS","UT1710001"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Information Federal/US Government 2025-04-21 9 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2288 USGS U.S. Geological Survey ["USGS","UT2510000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-05-20 4 NaN
2289 USGS U.S. Geological Survey ["USGS","UT2510000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-05-20 95 NaN
2290 USGS U.S. Geological Survey ["USGS","UT2510000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-05-20 65 NaN
2291 USGS U.S. Geological Survey ["USGS","UT2510000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-05-20 61 NaN
2292 USGS U.S. Geological Survey ["USGS","UT2510000"] NaN NaN NaN NaN NaN USGS-10109000 LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT ... NaN NaN NaN USGS NaN Information Federal/US Government 2025-05-20 3 NaN

2293 rows × 181 columns
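Because the table is long, its row count is the number of individual results, not the number of sampling events. The minimal sketch below shows the distinction, assuming a sampling event is identified by Location_Identifier, Activity_StartDate, and Activity_StartTime (the columns Example 4 uses as its pivot index). The six rows are copied from the first two rows of the wide table in Example 4, keeping only 6 of the 181 columns; on the real data you would apply the same drop_duplicates() call to wq_data[0].

```python
import pandas as pd

# Six long-format rows (values copied from Example 4's wide table for USGS-10109000).
long_df = pd.DataFrame({
    "Location_Identifier": ["USGS-10109000"] * 6,
    "Activity_StartDate": ["1967-09-13"] * 3 + ["1968-01-18"] * 3,
    "Activity_StartTime": ["07:35:00"] * 3 + ["12:20:00"] * 3,
    "Result_Characteristic": ["Calcium", "Chloride", "pH"] * 2,
    "Result_Measure": [44.0, 3.5, 8.1, 52.0, 3.9, 7.8],
    "Result_MeasureUnit": ["mg/L", "mg/L", "standard units"] * 2,
})

# Assumed definition of a sampling event: one site, date, and time.
event_cols = ["Location_Identifier", "Activity_StartDate", "Activity_StartTime"]

n_results = len(long_df)                                     # one row per result
n_events = len(long_df.drop_duplicates(subset=event_cols))   # one row per event
print(f"{n_results} results from {n_events} sampling events")
```

Example 4 makes the same distinction when it pivots the 2293 result rows for this site into 348 site/date/time rows.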

Show the data types of the columns in the resulting data frame.

[5]:
print(wq_data[0].dtypes)
Org_Identifier                 object
Org_FormalName                 object
Project_Identifier             object
Project_Name                  float64
Project_QAPPApproved          float64
                               ...
Result_CharacteristicGroup     object
Org_Type                       object
LastChangeDate                 object
USGSpcode                       int64
USGSSampleAquifer             float64
Length: 181, dtype: object
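Two of the dtypes shown above deserve attention. USGSpcode was read as int64, so parameter codes lose their leading zeros (the 00065 requested in Example 2 appears as 65 in its output), and LastChangeDate is a plain object column. The sketch below converts both, using values copied from the printed output. It assumes USGSpcode has no missing values; a column containing NaN would be read as float and would need different handling.

```python
import pandas as pd

# Values as printed above: USGSpcode parsed as int64, LastChangeDate as text.
df = pd.DataFrame({
    "USGSpcode": [20, 61, 95, 9],
    "LastChangeDate": ["2025-04-21", "2025-04-21", "2025-04-21", "2025-05-20"],
})

# Restore the 5-character parameter codes used by the usgsPCode argument.
df["USGSpcode"] = df["USGSpcode"].astype(str).str.zfill(5)

# Parse the ISO dates so they sort and compare as dates rather than strings.
df["LastChangeDate"] = pd.to_datetime(df["LastChangeDate"], format="%Y-%m-%d")

print(df["USGSpcode"].tolist())
print(df.dtypes)
```

On the real data, the same two assignments can be applied to wq_data[0].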

The other part of the result returned by the get_samples() function is a metadata object containing information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. USGS web service responses also contain a descriptive header that can be helpful in interpreting the contents of the response.

[6]:
print('The query URL used to retrieve the data from USGS Samples was: ' + wq_data[1].url)
The query URL used to retrieve the data from USGS Samples was: https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?monitoringLocationIdentifier=USGS-10109000&mimeType=text%2Fcsv
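The query URL can also be taken apart programmatically, for example to check which filters produced a given data frame. This sketch uses only Python's standard urllib.parse module on the URL printed above (in practice you would pass wq_data[1].url). Reading the service and profile from the last two path segments is an assumption based on the request URLs shown in this notebook.

```python
from urllib.parse import urlsplit, parse_qs

# The URL printed above; with live data, use wq_data[1].url instead.
url = ("https://api.waterdata.usgs.gov/samples-data/results/fullphyschem"
       "?monitoringLocationIdentifier=USGS-10109000&mimeType=text%2Fcsv")

parts = urlsplit(url)
# Assumed layout: .../samples-data/<service>/<profile>
service, profile = parts.path.rstrip("/").split("/")[-2:]
# parse_qs decodes %2F and collects repeated keys into one list per key.
params = parse_qs(parts.query)

print(service, profile)
print(params)
```

Because parse_qs returns a list for every key, a query with several sites (as in Example 2) would show all of them under a single monitoringLocationIdentifier entry.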

Additional Examples

Example 2: Get water quality sample data for multiple sites for a single parameter

[7]:
site_ids = ['USGS-04024430', 'USGS-04024000']
parameter_code = '00065'
wq_multi_site = waterdata.get_samples(monitoringLocationIdentifier=site_ids, usgsPCode=parameter_code)
print('Retrieved data for ' + str(len(wq_multi_site[0])) + ' sample results.')
display(wq_multi_site[0])
Request: https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?usgsPCode=00065&monitoringLocationIdentifier=USGS-04024430&monitoringLocationIdentifier=USGS-04024000&mimeType=text%2Fcsv
Retrieved data for 298 sample results.
Org_Identifier Org_FormalName Project_Identifier Project_Name Project_QAPPApproved Project_QAPPApprovalAgency ProjectAttachment_FileName ProjectAttachment_FileType Location_Identifier Location_Name ... ResultAttachment_FileName ResultAttachment_FileType ResultAttachment_FileDownload ProviderName Result_CharacteristicComparable Result_CharacteristicGroup Org_Type LastChangeDate USGSpcode USGSSampleAquifer
0 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR13NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
1 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","MN-00300"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
2 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR12NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
3 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","00GQ436... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
4 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","MN-00300"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
293 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","NA00V18"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-04-21 65 NaN
294 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","NA00V18"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-05-01 65 NaN
295 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","NA00V18"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-06-09 65 NaN
296 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","NA00V18"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-06-03 65 NaN
297 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","NA00V18"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Physical Federal/US Government 2025-06-05 65 NaN

298 rows × 181 columns
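The Request line above shows how list arguments are sent: each site in site_ids becomes its own monitoringLocationIdentifier=... pair. The standard library reproduces that encoding with urllib.parse.urlencode(..., doseq=True). The helper below, build_samples_url(), is hypothetical and only mimics the printed URLs; it is not necessarily how dataretrieval builds requests internally, and waterdata.get_samples() remains the way to run real queries.

```python
from urllib.parse import urlencode

# Base URL taken from the Request lines printed in this notebook.
BASE_URL = "https://api.waterdata.usgs.gov/samples-data"


def build_samples_url(service="results", profile="fullphyschem", **params):
    """Hypothetical helper that mimics the Request URLs shown above."""
    query = {key: value for key, value in params.items() if value is not None}
    query["mimeType"] = "text/csv"
    # doseq=True expands a list value into one key=value pair per element.
    return f"{BASE_URL}/{service}/{profile}?{urlencode(query, doseq=True)}"


url = build_samples_url(usgsPCode="00065",
                        monitoringLocationIdentifier=["USGS-04024430", "USGS-04024000"])
print(url)
```

With the arguments in this order, the printed URL matches the Request line for this example.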

Example 3: Retrieve water quality sample data for multiple sites and a list of parameters, from a start date to the present

[8]:
site_ids = ['USGS-04024430', 'USGS-04024000']
parameterCd = ['34247', '30234', '32104', '34220']
startDate = '2012-01-01'
wq_data2 = waterdata.get_samples(monitoringLocationIdentifier=site_ids, usgsPCode=parameterCd,
                                 activityStartDateLower=startDate)
print('Retrieved data for ' + str(len(wq_data2[0])) + ' sample results.')
display(wq_data2[0])

Request: https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?activityStartDateLower=2012-01-01&usgsPCode=34247&usgsPCode=30234&usgsPCode=32104&usgsPCode=34220&monitoringLocationIdentifier=USGS-04024430&monitoringLocationIdentifier=USGS-04024000&mimeType=text%2Fcsv
Retrieved data for 152 sample results.
Org_Identifier Org_FormalName Project_Identifier Project_Name Project_QAPPApproved Project_QAPPApprovalAgency ProjectAttachment_FileName ProjectAttachment_FileType Location_Identifier Location_Name ... ResultAttachment_FileName ResultAttachment_FileType ResultAttachment_FileDownload ProviderName Result_CharacteristicComparable Result_CharacteristicGroup Org_Type LastChangeDate USGSpcode USGSSampleAquifer
0 USGS U.S. Geological Survey ["USGS","F6X7H"] NaN NaN NaN NaN NaN USGS-04024430 NEMADJI RIVER NEAR SOUTH SUPERIOR, WI ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 32104 NaN
1 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","00GQ414... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 34247 NaN
2 USGS U.S. Geological Survey ["USGS","GLRI TOX"] NaN NaN NaN NaN NaN USGS-04024430 NEMADJI RIVER NEAR SOUTH SUPERIOR, WI ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 32104 NaN
3 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR12NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Pesticide Federal/US Government 2025-04-21 30234 NaN
4 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","MN-00300"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 32104 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
147 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","MN-00300"] NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Pesticide Federal/US Government 2025-04-21 30234 NaN
148 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR14NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 34220 NaN
149 USGS U.S. Geological Survey ["USGS","GLRI TOX"] NaN NaN NaN NaN NaN USGS-04024430 NEMADJI RIVER NEAR SOUTH SUPERIOR, WI ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-22 34220 NaN
150 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR12NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-22 32104 NaN
151 USGS U.S. Geological Survey ["Great Lakes Restoration Initiative","GR12NK0... NaN NaN NaN NaN NaN USGS-04024000 ST. LOUIS RIVER AT SCANLON, MN ... NaN NaN NaN USGS NaN Organics, Other Federal/US Government 2025-04-21 34247 NaN

152 rows × 181 columns
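When you request several parameter codes, it is worth checking which of them actually returned results, since a code with no data at these sites would simply not appear in the table. Because USGSpcode comes back as an integer, the comparison has to restore the 5-character form first. The helper missing_pcodes() below is hypothetical; on the real data you would pass wq_data2[0]["USGSpcode"] as returned_codes.

```python
def missing_pcodes(requested, returned_codes):
    """Hypothetical helper: requested 5-character codes with no returned rows."""
    # USGSpcode is read as an integer, so pad back to 5 characters (65 -> "00065").
    returned = {str(code).zfill(5) for code in returned_codes}
    return sorted(set(requested) - returned)


# USGSpcode values visible in the displayed rows of this example.
shown = [32104, 34247, 32104, 30234, 32104, 30234, 34220, 34220, 32104, 34247]
print(missing_pcodes(["34247", "30234", "32104", "34220"], shown))
```

All four requested codes appear in the displayed rows, so the list printed here is empty; the zero padding matters for codes such as the 00065 used in Example 2.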

Example 4: Retrieve water quality sample data for one site and convert to a wide format

Note that the USGS Samples database returns multiple parameters in a “long” format: each row in the resulting table represents a single observation of a single parameter, and every row carries 181 columns of data and metadata. If you want your water quality data in a “wide” format instead, where each column represents one water quality characteristic paired with its unit, the code below shows one solution.

[9]:
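siteID = 'USGS-10109000'
# Unpack the (data frame, metadata) result; the metadata is not needed here.
wq_data, _ = waterdata.get_samples(monitoringLocationIdentifier=siteID)
print('Retrieved data for ' + str(len(wq_data)) + ' sample results.')

# Combine characteristic name and unit so each wide column is unambiguous.
wq_data["characteristic_unit"] = wq_data["Result_Characteristic"] + ", " + wq_data["Result_MeasureUnit"]
# One row per site/date/time and one column per characteristic; 'first' keeps
# the first value if a characteristic occurs more than once in the same cell.
wq_data_wide = wq_data.pivot_table(index=['Location_Identifier', 'Activity_StartDate', 'Activity_StartTime'],
                                   columns="characteristic_unit",
                                   values="Result_Measure",
                                   aggfunc='first')
display(wq_data_wide)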

Request: https://api.waterdata.usgs.gov/samples-data/results/fullphyschem?monitoringLocationIdentifier=USGS-10109000&mimeType=text%2Fcsv
Retrieved data for 2293 sample results.
characteristic_unit Acidity, (H+), mg/L Alkalinity, mg/L Bicarbonate, mg/L Calcium, mg/L Carbon dioxide, mg/L Carbonate, mg/L Chloride, mg/L Depth of water column, ft Hardness, Ca, Mg, mg/L Hardness, non-carbonate, mg/L ... Stream flow, instantaneous, m3/sec Stream flow, m3/sec Stream width measure, ft Sulfate, mg/L Temperature, air, deg C Temperature, water, deg C Total dissolved solids, mg/L Total dissolved solids, tons/ac ft Total dissolved solids, tons/day pH, standard units
Location_Identifier Activity_StartDate Activity_StartTime
USGS-10109000 1967-09-13 07:35:00 0.00001 187 228 44.0 2.9 0.0 3.5 NaN 190 0.0 ... NaN 5.8 NaN 6.5 NaN 7.0 196 0.27 108 8.1
1968-01-18 12:20:00 0.00002 207 252 52.0 6.5 0.0 3.9 NaN 220 17 ... NaN 3.2 NaN 17.0 NaN 4.0 210 0.29 63.5 7.8
1968-05-15 12:30:00 0.00004 149 182 38.0 12 0.0 1.7 NaN 150 0.0 ... NaN 10 NaN 5.5 NaN 7.0 155 0.21 151 7.4
1968-07-26 14:40:00 0.00002 179 218 53.0 7.0 0.0 2.2 NaN 180 3 ... NaN 7.5 NaN 6.2 NaN 12.0 188 0.26 135 7.7
1972-12-08 16:15:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 4.0 NaN NaN NaN NaN 3.0 NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-11-06 11:02:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 50.0 NaN NaN 3.9 NaN NaN NaN NaN
2024-12-11 16:21:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 52.0 NaN NaN 3.1 NaN NaN NaN NaN
2025-02-03 17:53:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 47.0 NaN NaN NaN NaN NaN NaN NaN
2025-03-26 15:09:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 49.0 NaN NaN 10.3 NaN NaN NaN NaN
2025-05-13 16:42:00 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN 53.8 NaN NaN 8.2 NaN NaN NaN NaN

348 rows × 34 columns
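Two details of the pivot above are worth checking on real data. First, the mixed number formatting in the wide table (for example 0.0 and 17 in the same column) suggests Result_Measure may be stored as text; run wq_data.dtypes to confirm, and convert with pd.to_numeric() if so. Second, aggfunc='first' keeps only one value when a site/date/time has more than one result for the same characteristic and unit, so it helps to count such collisions before pivoting. A minimal sketch on six rows copied from the first two rows of the wide table, with Result_Measure deliberately stored as strings to show the conversion:

```python
import pandas as pd

# Six long-format rows copied from the first two rows of the wide table above.
long_df = pd.DataFrame({
    "Location_Identifier": ["USGS-10109000"] * 6,
    "Activity_StartDate": ["1967-09-13"] * 3 + ["1968-01-18"] * 3,
    "Activity_StartTime": ["07:35:00"] * 3 + ["12:20:00"] * 3,
    "Result_Characteristic": ["Calcium", "Chloride", "pH"] * 2,
    "Result_MeasureUnit": ["mg/L", "mg/L", "standard units"] * 2,
    "Result_Measure": ["44.0", "3.5", "8.1", "52.0", "3.9", "7.8"],
})

index_cols = ["Location_Identifier", "Activity_StartDate", "Activity_StartTime"]
long_df["characteristic_unit"] = (long_df["Result_Characteristic"] + ", "
                                  + long_df["Result_MeasureUnit"])

# Text values become floats; anything non-numeric becomes NaN.
long_df["Result_Measure"] = pd.to_numeric(long_df["Result_Measure"], errors="coerce")

# Rows that would compete for the same wide cell (aggfunc='first' keeps one).
dupes = long_df.duplicated(subset=index_cols + ["characteristic_unit"], keep=False)
print(f"{int(dupes.sum())} rows share a site/date/time/characteristic cell")

wide = long_df.pivot_table(index=index_cols, columns="characteristic_unit",
                           values="Result_Measure", aggfunc="first")
print(wide)
```

If the duplicate count is nonzero on the real data, inspecting those rows (for example with long_df[dupes]) shows which values the pivot would discard.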