USGS dataretrieval Python Package get_stats() Examples

This notebook provides examples of using the Python dataretrieval package to retrieve statistics for observed variables at a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it isn't already installed in your Jupyter Python environment.

[1]:
!pip install dataretrieval

Load the package so you can use it along with other packages used in this notebook.

[2]:
from IPython.display import display
from matplotlib import ticker

from dataretrieval import nwis


Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the get_stats() function to retrieve statistics for one or more observed variables at a USGS monitoring site from USGS NWIS. The following arguments are available:

Arguments (additional parameters, if supplied, are passed through as query parameters):

  • sites (string or list of strings): A string or list of strings containing the USGS site identifiers for which to retrieve data.

  • parameterCd (string or list of strings): A string or list of strings containing the USGS parameter codes for which to retrieve data.

  • statReportType (string): The aggregation period for which statistics should be reported. Can be specified as ‘daily’ (default), ‘monthly’, or ‘annual’.

  • statTypeCd (string or list of strings): The type of statistic(s) to be returned in the result. Can be specified as ‘all’, ‘mean’, ‘max’, ‘min’, or ‘median’.

Example 1: Get all of the annual mean discharge data for a single site

[3]:
# Set the parameters needed to retrieve data
siteNumber = "02319394"
parameterCode = "00060"  # Discharge

# Retrieve the statistics
x1 = nwis.get_stats(
    sites=siteNumber, parameterCd=parameterCode, statReportType="annual"
)
print("Retrieved " + str(len(x1[0])) + " data values.")

Interpreting the Result

The get_stats() function returns a tuple containing a pandas data frame and an associated metadata object. The data frame (x1[0] in this notebook) contains the statistics values for the site and observed variable requested.
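Since the notebook indexes the result as x1[0] and x1[1], it is often clearer to unpack the pair into named variables. The sketch below builds a stand-in tuple with made-up placeholder values and uses types.SimpleNamespace in place of the real metadata object (only a url attribute is mimicked), so it runs without contacting NWIS.

```python
from types import SimpleNamespace

import pandas as pd

# Stand-in for the (data frame, metadata) pair returned by nwis.get_stats().
# Values are made-up placeholders, not real observations; the metadata object
# is mimicked with a SimpleNamespace carrying only a url attribute.
result = (
    pd.DataFrame({"year_nu": [2001, 2002, 2003], "mean_va": [1.0, 2.0, 3.0]}),
    SimpleNamespace(url="https://example.invalid/placeholder"),
)

# Unpack once, then use descriptive names instead of result[0] / result[1].
stats_df, metadata = result

print(type(stats_df).__name__)
print(len(stats_df), "rows")
print(metadata.url)
```

With a live query, the same unpacking should apply directly, for example stats_df, metadata = nwis.get_stats(sites=siteNumber, parameterCd=parameterCode, statReportType="annual").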

Once you have the data frame, there are several useful things you can do to explore the data.

[4]:
# Display the data frame as a table
display(x1[0])
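Beyond displaying the table, a few pandas calls summarize an annual-statistics frame. This sketch assumes the year_nu and mean_va columns that the plotting cell uses and fills them with made-up placeholder values; with real data you would apply the same calls to x1[0].

```python
import pandas as pd

# Made-up placeholder values shaped like an annual-statistics frame
# (year_nu = calendar year, mean_va = annual mean value).
df = pd.DataFrame(
    {"year_nu": [2001, 2002, 2003, 2004], "mean_va": [10.0, 40.0, 25.0, 5.0]}
)

# Summary statistics of the annual means.
print(df["mean_va"].describe())

# Years with the highest and lowest annual mean.
year_max = int(df.loc[df["mean_va"].idxmax(), "year_nu"])
year_min = int(df.loc[df["mean_va"].idxmin(), "year_nu"])
print("Highest annual mean:", year_max)
print("Lowest annual mean:", year_min)

# Period of record covered by the statistics.
print("Years:", df["year_nu"].min(), "to", df["year_nu"].max())
```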

Show the data types of the columns in the resulting data frame.

[5]:
print(x1[0].dtypes)

Make a quick time series plot of the annual mean values.

[6]:
ax = x1[0].plot(x="year_nu", y="mean_va")
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter("%d"))
ax.set_xlabel("Year")
ax.set_ylabel("Annual mean discharge (cfs)")
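For plotting or further analysis it can be convenient to turn the frame into a Series indexed by year, sorted so the line is drawn in chronological order. The sketch below uses made-up placeholder values and also shows why the "%d" format string passed to FormatStrFormatter gives clean year labels: it renders float tick positions as plain integers.

```python
import pandas as pd

# Made-up placeholder values, deliberately out of chronological order.
df = pd.DataFrame({"year_nu": [2003, 2001, 2002], "mean_va": [25.0, 10.0, 40.0]})

# Sort chronologically and index by year so plotting and lookups use the year directly.
annual = df.sort_values("year_nu").set_index("year_nu")["mean_va"]
print(annual)

# "%d" formatting turns a float tick position such as 2002.0 into "2002".
print("%d" % 2002.0)
```

Calling annual.plot() on such a Series should draw the same kind of line as the cell above, with the year on the x axis.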

The other part of the result returned by the get_stats() function (x1[1]) is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses also contain a descriptive header that documents the contents of the response and can be helpful in interpreting them.

[7]:
print("The query URL used to retrieve the data from NWIS was: " + x1[1].url)
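Because the query URL is a plain string, the Python standard library can split it back into its query parameters, which is a quick way to confirm what was actually requested. The URL below is illustrative and assembled by hand (its endpoint and format=rdb parameter are assumptions), not captured output; with a live result you would pass x1[1].url instead.

```python
from urllib.parse import parse_qs, urlsplit

# Illustrative URL assembled by hand; with a live result use x1[1].url instead.
url = (
    "https://waterservices.usgs.gov/nwis/stat/"
    "?sites=02319394&parameterCd=00060&statReportType=annual&format=rdb"
)

parts = urlsplit(url)
query = parse_qs(parts.query)  # maps each parameter name to a list of values

print(parts.netloc)
for name, values in query.items():
    print(name, "=", ",".join(values))
```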

Additional Examples

Example 2: Get all of the annual mean discharge data for two sites

Note: When multiple parameter codes are requested (here temperature, 00010, and discharge, 00060), the function appears to return only those for which statistics are available (in this example, discharge, 00060).

[8]:
x2 = nwis.get_stats(
    sites=["02319394", "02171500"],
    parameterCd=["00010", "00060"],
    statReportType="annual",
)
display(x2[0])
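To check which of the requested parameter codes actually returned statistics, compare the request with the codes present in the frame. This sketch assumes the returned frame has site_no and parameter_cd columns (an assumption about the response layout) and uses a placeholder frame containing only discharge rows, mirroring the behaviour described in the note above.

```python
import pandas as pd

requested = ["00010", "00060"]  # temperature, discharge

# Placeholder frame; the site_no and parameter_cd column names are assumptions.
df = pd.DataFrame(
    {
        "site_no": ["02319394", "02171500"],
        "parameter_cd": ["00060", "00060"],
        "mean_va": [1.0, 2.0],
    }
)

# Set difference: requested codes that do not appear in the returned rows.
returned = set(df["parameter_cd"].unique())
missing = sorted(set(requested) - returned)
print("Returned:", sorted(returned))
print("No statistics returned for:", missing)
```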

Example 3: Request daily mean and median values for temperature and discharge for a site for years between 2000 and 2007

NOTE: startDt and endDt are not named arguments of this function. Like any additional parameters, they are passed through as query parameters in the request to USGS NWIS, which means they can be used to limit the time window requested.

[9]:
x3 = nwis.get_stats(
    sites="02171500",
    parameterCd=["00010", "00060"],
    statReportType="daily",
    statTypeCd=["mean", "median"],
    startDt="2000",
    endDt="2007",
)
display(x3[0])
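Because Example 3 requests both mean and median daily values, a natural follow-up is to compare them per calendar day for a single parameter. The sketch below is hedged in two ways: the column names (parameter_cd, month_nu, day_nu, mean_va, and p50_va for the median) are assumptions about the daily statistics layout, and the values are made-up placeholders rather than real results.

```python
import pandas as pd

# Placeholder daily statistics; column names (parameter_cd, month_nu, day_nu,
# mean_va, and p50_va for the median) are assumptions about the response layout.
x3_df = pd.DataFrame(
    {
        "parameter_cd": ["00060", "00060", "00010", "00010"],
        "month_nu": [1, 1, 1, 1],
        "day_nu": [1, 2, 1, 2],
        "mean_va": [12.0, 15.0, 8.0, 8.5],
        "p50_va": [10.0, 11.0, 8.0, 8.4],
    }
)

# Keep one parameter (discharge) and compare mean with median for each calendar day.
flow = x3_df[x3_df["parameter_cd"] == "00060"].copy()
flow["mean_minus_median"] = flow["mean_va"] - flow["p50_va"]
print(flow[["month_nu", "day_nu", "mean_va", "p50_va", "mean_minus_median"]])
```

A mean noticeably above the median on a given day suggests that a few high values in the record are pulling the average up.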