USGS dataretrieval Python Package get_dv() Examples

This notebook provides examples of using the Python dataretrieval package to retrieve daily streamflow data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it doesn’t exist already within your Jupyter Python environment.

[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.10/site-packages (0.1.dev1+g3ba0c83)
Requirement already satisfied: requests in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.32.3)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.2.3)
Requirement already satisfied: numpy>=1.22.4 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2022.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/runner/.local/lib/python3.10/site-packages (from requests->dataretrieval) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (2020.6.20)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas==2.*->dataretrieval) (1.16.0)

Load the package so you can use it along with other packages used in this notebook.

[2]:
from dataretrieval import nwis
from IPython.display import display

Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This examples uses the get_dv() function to retrieve daily streamflow data for a USGS monitoring site from NWIS. The following arguments are supported:

Arguments (Additional arguments, if supplied, will be used as query parameters)

  • sites (string or list of strings): A list of USGS site identifiers for which to retrieve data.

  • parameterCd (list of strings): A list of USGS parameter codes for which to retrieve data.

  • statCd (list of strings): A list of USGS statistic codes for which to retrieve data.

  • start (string): The beginning date for a period for which to retrieve data. If the waterdata parameter startDT is supplied, it will overwrite the start parameter.

  • end (string): The ending date for a period for which to retrieve data. If the waterdata parameter endDT is supplied, it will overwrite the end parameter.

Example 1: Get daily value data for a specific parameter at a single USGS NWIS monitoring site between a begin and end date.

[3]:
# Set the parameters needed to retrieve data
siteNumber = "10109000" # LOGAN RIVER ABOVE STATE DAM, NEAR LOGAN, UT
parameterCode = "00060" # Discharge
startDate = "2020-10-01"
endDate = "2021-09-30"

# Retrieve the data
dailyStreamflow = nwis.get_dv(sites=siteNumber, parameterCd=parameterCode, start=startDate, end=endDate)
print("Retrieved " + str(len(dailyStreamflow[0])) + " data values.")
Retrieved 365 data values.

Interpreting the Result

The result of calling the get_dv() function is an object that contains a Pandas data frame object and an associated metadata object. The Pandas data frame contains the daily values for the observed variable and time period requested. The data frame is indexed by the dates associated with the data values.

Once you’ve got the data frame, there’s several useful things you can do to explore the data.

[4]:
# Display the data frame as a table
display(dailyStreamflow[0])
site_no 00060_Mean 00060_Mean_cd
datetime
2020-10-01 00:00:00+00:00 10109000 88.8 A
2020-10-02 00:00:00+00:00 10109000 90.5 A
2020-10-03 00:00:00+00:00 10109000 92.7 A
2020-10-04 00:00:00+00:00 10109000 91.8 A
2020-10-05 00:00:00+00:00 10109000 91.0 A
... ... ... ...
2021-09-26 00:00:00+00:00 10109000 54.2 A
2021-09-27 00:00:00+00:00 10109000 54.5 A
2021-09-28 00:00:00+00:00 10109000 54.1 A
2021-09-29 00:00:00+00:00 10109000 53.5 A
2021-09-30 00:00:00+00:00 10109000 54.0 A

365 rows × 3 columns

Show the data types of the columns in the resulting data frame.

[5]:
print(dailyStreamflow[0].dtypes)
site_no           object
00060_Mean       float64
00060_Mean_cd     object
dtype: object

Get summary statistics for the daily streamflow values.

[6]:
dailyStreamflow[0].describe()
[6]:
00060_Mean
count 365.000000
mean 95.795616
std 41.146872
min 45.900000
25% 75.200000
50% 85.800000
75% 106.000000
max 271.000000

Make a quick time series plot.

[7]:
ax = dailyStreamflow[0].plot(y='00060_Mean')
ax.set_xlabel('Date')
ax.set_ylabel('Streamflow (cfs)')
[7]:
Text(0, 0.5, 'Streamflow (cfs)')
../_images/examples_USGS_dataretrieval_DailyValues_Examples_15_1.png

The other part of the result returned from the get_dv() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines and can be helpful in interpreting the contents of the response.

[8]:
print("The query URL used to retrieve the data from NWIS was: " + dailyStreamflow[1].url)
The query URL used to retrieve the data from NWIS was: https://waterservices.usgs.gov/nwis/dv?format=json&parameterCd=00060&startDT=2020-10-01&endDT=2021-09-30&sites=10109000

Additional Examples

Example 2: Get daily mean and max discharge and temperature values for a site between a begin and end date.

Parameter Code: 00010 = temperature, 00060 = discharge See https://help.waterdata.usgs.gov/codes-and-parameters/parameters

Statistic Code: 00001 = Maximum, 00003 = Mean See https://help.waterdata.usgs.gov/stat_code

NOTE: There’s not full overlap in the availability of data for temperature and discharge for both statistics at this site. When data for one statistic is not available, a “NaN” value is returned in the data frame.

[9]:
siteID = "04085427"
dailyQAndT = nwis.get_dv(sites=siteID, parameterCd=["00010", "00060"],
                         start=startDate, end=endDate,
                         statCd=["00001", "00003"])
display(dailyQAndT[0])
site_no 00010_Maximum 00010_Maximum_cd 00010_Mean 00010_Mean_cd 00060_Mean 00060_Mean_cd
datetime
2020-10-01 00:00:00+00:00 04085427 NaN NaN NaN NaN 96.2 A
2020-10-02 00:00:00+00:00 04085427 NaN NaN NaN NaN 90.4 A
2020-10-03 00:00:00+00:00 04085427 NaN NaN NaN NaN 84.7 A
2020-10-04 00:00:00+00:00 04085427 NaN NaN NaN NaN 84.9 A
2020-10-05 00:00:00+00:00 04085427 NaN NaN NaN NaN 93.4 A
... ... ... ... ... ... ... ...
2021-09-26 00:00:00+00:00 04085427 17.4 A 15.9 A 127.0 A
2021-09-27 00:00:00+00:00 04085427 18.4 A 16.8 A 110.0 A
2021-09-28 00:00:00+00:00 04085427 18.1 A 16.1 A 100.0 A
2021-09-29 00:00:00+00:00 04085427 18.4 A 16.2 A 96.2 A
2021-09-30 00:00:00+00:00 04085427 19.0 A 16.9 A 92.9 A

365 rows × 7 columns

Example 3: Get daily mean and max discharge and temperature values for multiple sites between a begin and end date

[10]:
dailyMultiSites = nwis.get_dv(sites=["01491000", "01645000"], parameterCd=["00010", "00060"],
                              start="2012-01-01", end="2012-06-30", statCd=["00001","00003"])
display(dailyMultiSites[0])
00010_discontinued_Maximum 00010_discontinued_Maximum_cd 00010_discontinued_Mean 00010_discontinued_Mean_cd 00060_Mean 00060_Mean_cd
site_no datetime
01491000 2012-01-01 00:00:00+00:00 8.4 A 7.7 A 205.0 A
2012-01-02 00:00:00+00:00 8.5 A 7.3 A 193.0 A
2012-01-03 00:00:00+00:00 6.0 A 4.6 A 180.0 A
2012-01-04 00:00:00+00:00 3.0 A 1.8 A 162.0 A
2012-01-05 00:00:00+00:00 2.9 A 2.3 A 155.0 A
... ... ... ... ... ... ... ...
01645000 2012-06-26 00:00:00+00:00 NaN NaN NaN NaN 56.3 A
2012-06-27 00:00:00+00:00 NaN NaN NaN NaN 54.8 A
2012-06-28 00:00:00+00:00 NaN NaN NaN NaN 53.6 A
2012-06-29 00:00:00+00:00 NaN NaN NaN NaN 53.1 A
2012-06-30 00:00:00+00:00 NaN NaN NaN NaN 129.0 A

364 rows × 6 columns

The following example is the same as the previous example but with multi index turned off (multi_index=False)

[11]:
dailyMultiSites = nwis.get_dv(sites=["01491000", "01645000"], parameterCd=["00010", "00060"],
                              start="2012-01-01", end="2012-06-30", statCd=["00001","00003"],
                              multi_index=False)
display(dailyMultiSites[0])
site_no 00010_discontinued_Maximum 00010_discontinued_Maximum_cd 00010_discontinued_Mean 00010_discontinued_Mean_cd 00060_Mean 00060_Mean_cd
datetime
2012-01-01 00:00:00+00:00 01491000 8.4 A 7.7 A 205.00 A
2012-01-01 00:00:00+00:00 01645000 NaN NaN NaN NaN 175.00 A
2012-01-02 00:00:00+00:00 01645000 NaN NaN NaN NaN 168.00 A
2012-01-02 00:00:00+00:00 01491000 8.5 A 7.3 A 193.00 A
2012-01-03 00:00:00+00:00 01491000 6.0 A 4.6 A 180.00 A
... ... ... ... ... ... ... ...
2012-06-28 00:00:00+00:00 01491000 NaN NaN NaN NaN 12.10 A
2012-06-29 00:00:00+00:00 01645000 NaN NaN NaN NaN 53.10 A
2012-06-29 00:00:00+00:00 01491000 NaN NaN NaN NaN 8.58 A
2012-06-30 00:00:00+00:00 01491000 NaN NaN NaN NaN 7.15 A
2012-06-30 00:00:00+00:00 01645000 NaN NaN NaN NaN 129.00 A

364 rows × 7 columns

Example 4: Test for a site that is not active - returns an empty DataFrame.

[12]:
siteID = "05212700"
notActive = nwis.get_dv(sites=siteID, parameterCd="00060", start="2014-01-01", end="2014-01-07")
display(notActive[0])
site_no
datetime