USGS dataretrieval Python Package get_stats() Examples
This notebook provides examples of using the Python dataretrieval package to retrieve statistics for observed variables at a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).
Install the Package
Use the following code to install the package if it doesn’t exist already within your Jupyter Python environment.
[1]:
!pip install dataretrieval
Load the package so you can use it along with other packages used in this notebook.
[2]:
from IPython.display import display
from matplotlib import ticker
from dataretrieval import nwis
Basic Usage
The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the get_stats() function to retrieve statistics for one or more observed variables at a USGS monitoring site from USGS NWIS. The following arguments are available:
Arguments (Additional parameters, if supplied, will be used as query parameters).
sites (string or list of strings): A string or list of strings containing the USGS site identifiers for which to retrieve data.
parameterCd (string or list of strings): A string or list of strings containing the USGS parameter codes for which to retrieve data.
statReportType (string): The aggregation period for which statistics should be reported. Can be specified as ‘daily’ (default), ‘monthly’, or ‘annual’.
statTypeCd (string): The type of statistic to be returned in the result. Can be specified as ‘all’, ‘mean’, ‘max’, ‘min’, or ‘median’.
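To make the accepted values concrete, the following minimal sketch checks arguments like those listed above before a request is made. The helper name build_stats_params is hypothetical and is not part of dataretrieval. Joining multiple values with commas and the default of "all" for statTypeCd are assumptions made for this illustration.

```python
# Hypothetical helper (not part of dataretrieval): validate get_stats()-style
# arguments and normalize them into a flat dictionary of query parameters.

REPORT_TYPES = {"daily", "monthly", "annual"}
STAT_TYPES = {"all", "mean", "max", "min", "median"}


def as_list(value):
    """Wrap a single string in a list so strings and lists are handled alike."""
    return [value] if isinstance(value, str) else list(value)


def build_stats_params(sites, parameterCd, statReportType="daily", statTypeCd="all"):
    """Return a dict of query parameters, raising ValueError on unknown options."""
    if statReportType not in REPORT_TYPES:
        raise ValueError(f"unsupported statReportType: {statReportType!r}")
    stat_types = as_list(statTypeCd)
    unknown = [s for s in stat_types if s not in STAT_TYPES]
    if unknown:
        raise ValueError(f"unsupported statTypeCd: {unknown}")
    # Assumption: multiple values are sent as one comma-separated string.
    return {
        "sites": ",".join(as_list(sites)),
        "parameterCd": ",".join(as_list(parameterCd)),
        "statReportType": statReportType,
        "statTypeCd": ",".join(stat_types),
    }


params = build_stats_params(["02319394", "02171500"], "00060", statReportType="annual")
print(params)
```

Checking the options locally in this way surfaces a misspelled report type as a ValueError before any request is sent.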
Example 1: Get all of the annual mean discharge data for a single site
[3]:
# Set the parameters needed to retrieve data
siteNumber = "02319394"
parameterCode = "00060" # Discharge
# Retrieve the statistics
x1 = nwis.get_stats(
    sites=siteNumber, parameterCd=parameterCode, statReportType="annual"
)
print("Retrieved " + str(len(x1[0])) + " data values.")
Interpreting the Result
The result of calling the get_stats() function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the statistics values for the site and observed variable requested.
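Assuming the result behaves like a two-element tuple (this notebook reads it as x1[0] and x1[1]), it can also be unpacked into two named variables, which tends to read more clearly in later cells. The sketch below uses a stand-in function, fake_get_stats, so that it runs without network access. The function and its placeholder URL are hypothetical and are not part of dataretrieval.

```python
from types import SimpleNamespace


def fake_get_stats(**kwargs):
    """Stand-in for a get_stats()-style call; returns (data, metadata)."""
    data = []  # placeholder for the data frame of statistics
    metadata = SimpleNamespace(url="https://example.invalid/stat")  # not a real query
    return data, metadata


# Index-based access, as used in this notebook
result = fake_get_stats(sites="02319394", parameterCd="00060")
data_by_index, metadata_by_index = result[0], result[1]

# Equivalent tuple unpacking into two named variables
df, md = fake_get_stats(sites="02319394", parameterCd="00060")
print(md.url)
```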
Once you have the data frame, there are several useful things you can do to explore the data.
[4]:
# Display the data frame as a table
display(x1[0])
Show the data types of the columns in the resulting data frame.
[5]:
print(x1[0].dtypes)
Make a quick time series plot of the annual mean values.
[6]:
ax = x1[0].plot(x="year_nu", y="mean_va")
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter("%d"))
ax.set_xlabel("Year")
ax.set_ylabel("Annual mean discharge (cfs)")
The other part of the result returned from the get_stats() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the contents of the response and can be helpful in interpreting them.
[7]:
print("The query URL used to retrieve the data from NWIS was: " + x1[1].url)
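The query URL is an ordinary URL, so its parameters can be inspected with the Python standard library. In the sketch below, example_url is an assumed illustration of what such a URL might look like. Its host and path are assumptions, not output captured from this notebook. In practice the string would come from x1[1].url.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed example URL; replace with the metadata URL from a real query.
example_url = (
    "https://waterservices.usgs.gov/nwis/stat"
    "?sites=02319394&parameterCd=00060&statReportType=annual"
)

parts = urlsplit(example_url)
# parse_qs maps each parameter name to a list of values; keep the first of each.
query = {name: values[0] for name, values in parse_qs(parts.query).items()}

for name, value in query.items():
    print(f"{name} = {value}")
```

Reading the parameters back this way is a quick check that the site numbers and parameter codes sent to the service match what was intended.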
Additional Examples
Example 2: Get all of the annual mean discharge data for two sites
Note: When multiple parameter codes are passed (here temperature, 00010, and discharge, 00060), the request appears to return statistics only for the parameters that are available (in this example discharge, 00060).
[8]:
x2 = nwis.get_stats(
    sites=["02319394", "02171500"],
    parameterCd=["00010", "00060"],
    statReportType="annual",
)
display(x2[0])
Example 3: Request daily mean and median values for temperature and discharge for a site for years between 2000 and 2007
Note: The startDt and endDt parameters are not named arguments of this function, but they are passed through as query parameters in the request to USGS NWIS, which means that they can be used to limit the time window requested.
[9]:
x3 = nwis.get_stats(
    sites="02171500",
    parameterCd=["00010", "00060"],
    statReportType="daily",
    statTypeCd=["mean", "median"],
    startDt="2000",
    endDt="2007",
)
display(x3[0])
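The pass-through behavior described in the note for Example 3 can be pictured with a small wrapper that forwards any extra keyword arguments as query parameters. This is only a sketch of the idea: stats_query is a hypothetical function and is not the dataretrieval implementation.

```python
def stats_query(sites, parameterCd, statReportType="daily", **kwargs):
    """Hypothetical sketch: named arguments plus pass-through extras."""
    query = {
        "sites": sites,
        "parameterCd": parameterCd,
        "statReportType": statReportType,
    }
    # Anything not named in the signature (for example startDt and endDt)
    # is merged into the query unchanged.
    query.update(kwargs)
    return query


query = stats_query("02171500", "00060", startDt="2000", endDt="2007")
print(query)
```

Because the extras are forwarded without checking, a misspelled keyword would also end up in the query, so it is worth confirming the final request URL when using parameters that the function does not name.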