USGS dataretrieval Python Package get_stats() Examples

This notebook provides examples of using the Python dataretrieval package to retrieve statistics for observed variables at a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it doesn’t exist already within your Jupyter Python environment.

[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.10/site-packages (0.1.dev1+g3ba0c83)
Requirement already satisfied: requests in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.32.3)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.2.3)
Requirement already satisfied: numpy>=1.22.4 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2022.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/runner/.local/lib/python3.10/site-packages (from requests->dataretrieval) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (2020.6.20)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas==2.*->dataretrieval) (1.16.0)

Load the package so you can use it along with other packages used in this notebook.

[2]:
from dataretrieval import nwis
from IPython.display import display
from matplotlib import ticker

Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the get_stats() function to retrieve statistics for observed variable(s) for a USGS monitoring site from USGS NWIS. The following arguments are available:

Arguments (Additional parameters, if supplied, will be used as query parameters).

  • sites (string or list of strings): A string or list of strings contining the USGS site identifiers for which to retrive data.

  • parameterCd (string or list of strings): A list of USGS parameter codes for which to retrieve data.

  • statReportType (string): The aggregation period for which statistics should be reported. Can be specified as ‘daily’ (default), ‘monthly’, or ‘annual’.

  • statTypeCd (string): The type of statistic to be returned in the result. Can be specified as ‘all’, ‘mean’, ‘max’, ‘min’, or ‘median’

Example 1: Get all of the annual mean discharge data for a single site

[3]:
# Set the parameters needed to retrieve data
siteNumber = "02319394"
parameterCode = "00060" # Discharge

# Retrieve the statistics
x1 = nwis.get_stats(sites=siteNumber, parameterCd=parameterCode, statReportType="annual")
print("Retrieved " + str(len(x1[0])) + " data values.")
Retrieved 20 data values.

Interpreting the Result

The result of calling the get_stats() function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the statistics values for the site and observed variable requested.

Once you’ve got the data frame, there’s several useful things you can do to explore the data.

[4]:
# Display the data frame as a table
display(x1[0])
agency_cd site_no parameter_cd ts_id loc_web_ds year_nu mean_va count_nu
0 USGS 02319394 00060 26452 NaN 2001 1404.0 365
1 USGS 02319394 00060 26452 NaN 2002 795.9 365
2 USGS 02319394 00060 26452 NaN 2003 3395.0 365
3 USGS 02319394 00060 26452 NaN 2004 2471.0 366
4 USGS 02319394 00060 26452 NaN 2005 3764.0 365
5 USGS 02319394 00060 26452 NaN 2006 1029.0 365
6 USGS 02319394 00060 26452 NaN 2007 688.5 365
7 USGS 02319394 00060 26452 NaN 2008 2186.0 366
8 USGS 02319394 00060 26452 NaN 2009 2620.0 365
9 USGS 02319394 00060 26452 NaN 2012 784.3 366
10 USGS 02319394 00060 26452 NaN 2013 2929.0 365
11 USGS 02319394 00060 26452 NaN 2014 2847.0 365
12 USGS 02319394 00060 26452 NaN 2015 2258.0 365
13 USGS 02319394 00060 26452 NaN 2016 2125.0 366
14 USGS 02319394 00060 26452 NaN 2017 1215.0 365
15 USGS 02319394 00060 26452 NaN 2018 1961.0 365
16 USGS 02319394 00060 26452 NaN 2019 1684.0 365
17 USGS 02319394 00060 26452 NaN 2020 1665.0 366
18 USGS 02319394 00060 26452 NaN 2021 3582.0 365
19 USGS 02319394 00060 26452 NaN 2022 1801.0 365

Show the data types of the columns in the resulting data frame.

[5]:
print(x1[0].dtypes)
agency_cd        object
site_no          object
parameter_cd     object
ts_id             int64
loc_web_ds      float64
year_nu           int64
mean_va         float64
count_nu          int64
dtype: object

Make a quick time series plot of the annual mean values.

[6]:
ax = x1[0].plot(x='year_nu', y='mean_va')
ax.xaxis.set_major_formatter(ticker.FormatStrFormatter('%d'))
ax.set_xlabel('Year')
ax.set_ylabel('Annual mean discharge (cfs)')
[6]:
Text(0, 0.5, 'Annual mean discharge (cfs)')
../_images/examples_USGS_dataretrieval_Statistics_Examples_13_1.png

The other part of the result returned from the get_stats() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response.

[7]:
print("The query URL used to retrieve the data from NWIS was: " + x1[1].url)
The query URL used to retrieve the data from NWIS was: https://waterservices.usgs.gov/nwis/stat?sites=02319394&parameterCd=00060&statReportType=annual&format=rdb

Additional Examples

Example 2: Get all of the annual mean discharge data for two sites

Note: Passing multiple parameters (temperature and flow) looks like it returns only what is available (in this example flow, 00060)

[8]:
x2 = nwis.get_stats(sites=["02319394", "02171500"], parameterCd=["00010", "00060"],
                    statReportType="annual")
display(x2[0])
agency_cd site_no parameter_cd ts_id loc_web_ds year_nu mean_va count_nu
0 USGS 02171500 00010 306223 NaN 2023 20.12 365
1 USGS 02171500 00060 125737 NaN 1943 3026.00 365
2 USGS 02171500 00060 125737 NaN 1944 3966.00 366
3 USGS 02171500 00060 125737 NaN 1945 2577.00 365
4 USGS 02171500 00060 125737 NaN 1946 3274.00 365
... ... ... ... ... ... ... ... ...
91 USGS 02319394 00060 26452 NaN 2018 1961.00 365
92 USGS 02319394 00060 26452 NaN 2019 1684.00 365
93 USGS 02319394 00060 26452 NaN 2020 1665.00 366
94 USGS 02319394 00060 26452 NaN 2021 3582.00 365
95 USGS 02319394 00060 26452 NaN 2022 1801.00 365

96 rows × 8 columns

Example 3: Request daily mean and median values for temperature and discharge for a site for years between 2000 and 2007

NOTE: The startDt and endDt parameters are not directly supported by this function but are turned into query parameters in the request to USGS NWIS, which means that they can be used to limit the time window requested.

[9]:
x3 = nwis.get_stats(sites="02171500", parameterCd=["00010", "00060"],
                    statReportType="daily", statTypeCd=["mean", "median"],
                    startDt="2000", endDt="2007")
display(x3[0])
agency_cd site_no parameter_cd ts_id loc_web_ds month_nu day_nu begin_yr end_yr count_nu mean_va p50_va
0 USGS 02171500 00010 306223 NaN 1 1 2006 2006 1 9.5 9.5
1 USGS 02171500 00010 306223 NaN 1 2 2006 2006 1 9.7 9.7
2 USGS 02171500 00010 306223 NaN 1 3 2006 2006 1 10.3 10.3
3 USGS 02171500 00010 306223 NaN 1 4 2006 2006 1 10.1 10.1
4 USGS 02171500 00010 306223 NaN 1 5 2006 2006 1 10.3 10.3
... ... ... ... ... ... ... ... ... ... ... ... ...
708 USGS 02171500 00060 125737 NaN 12 27 2001 2008 8 631.0 628.0
709 USGS 02171500 00060 125737 NaN 12 28 2001 2008 8 616.0 626.0
710 USGS 02171500 00060 125737 NaN 12 29 2001 2008 8 605.0 624.0
711 USGS 02171500 00060 125737 NaN 12 30 2001 2008 8 610.0 629.0
712 USGS 02171500 00060 125737 NaN 12 31 2001 2008 7 589.0 621.0

713 rows × 12 columns