USGS dataretrieval Python Package get_ratings() Examples
This notebook provides examples of using the Python dataretrieval package to retrieve rating curve data for a United States Geological Survey (USGS) streamflow gage. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).
Install the Package
Use the following code to install the package if it is not already installed in your Jupyter Python environment.
[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.10/site-packages (0.1.dev1+g3ba0c83)
Requirement already satisfied: requests in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.32.3)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.2.3)
Requirement already satisfied: numpy>=1.22.4 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2022.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/runner/.local/lib/python3.10/site-packages (from requests->dataretrieval) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (2020.6.20)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas==2.*->dataretrieval) (1.16.0)
Load the package so you can use it along with other packages used in this notebook.
[2]:
from dataretrieval import nwis
from IPython.display import display
Basic Usage
The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the get_ratings() function to retrieve rating curve data for a monitoring site from USGS NWIS. The following arguments are available:
Arguments (Additional arguments, if supplied, will be used as query parameters)
site (string): A USGS site number, usually an 8-digit number passed as a string. If the NWIS parameter site_no is supplied, it overrides the site parameter.
file_type (string): Can be “base”, “corr”, or “exsa”
county (string): County IDs from county lookup or “ALL”
categories (list-like): List or comma-delimited string of two-letter category abbreviations
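Site numbers often begin with a zero (Example 2 uses '01594440'), so they need to stay strings. The following is a minimal sketch of a helper that restores a leading zero lost when an ID was stored as an integer. The function name normalize_site_id and the default width of 8 digits are assumptions made for illustration; the helper is not part of dataretrieval.

```python
def normalize_site_id(site, width=8):
    """Return a site number as a zero-padded string of digits.

    Hypothetical helper (not part of dataretrieval). Assumes the site
    number should be at least `width` digits long; longer IDs are
    returned unchanged.
    """
    text = str(site).strip()
    if not text.isdigit():
        raise ValueError(f"Site number must contain only digits: {text!r}")
    return text.zfill(width)


print(normalize_site_id(1594440))     # integer input that lost its leading zero
print(normalize_site_id("10109000"))  # already well formed
```

The returned string can then be passed as the site argument.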
NOTE: Not all active USGS streamflow gages have traditional rating curves that relate stage to flow.
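Because some gages have no rating curve, it can help to guard the request. The sketch below wraps the call so that a failed or empty request yields None instead of stopping the notebook. It rests on assumptions: the wrapper name fetch_ratings_or_none is hypothetical, the fetch function is passed in as an argument so the example runs offline with a stand-in, and the exception types caught here (ValueError and OSError) are a guess at how failures surface rather than documented behavior.

```python
def fetch_ratings_or_none(fetch, site, file_type="base"):
    """Return (data_frame, metadata), or None when no rating is available.

    `fetch` is any callable that accepts the same keyword arguments as
    nwis.get_ratings. Assumption: failures surface as ValueError (bad
    request) or OSError (network problems); adjust to what you observe.
    """
    try:
        df, meta = fetch(site=site, file_type=file_type)
    except (ValueError, OSError) as err:
        print(f"Could not retrieve a rating for site {site}: {err}")
        return None
    if len(df) == 0:
        print(f"Site {site} returned an empty rating table.")
        return None
    return df, meta


# Stand-in for nwis.get_ratings so the example runs without network access.
# It returns a plain list instead of a data frame; only len() is needed here.
def fake_fetch(site, file_type):
    if site == "00000000":
        raise ValueError("no rating for this site")
    return [(1.77, 7.46)], {"site": site, "file_type": file_type}


print(fetch_ratings_or_none(fake_fetch, "00000000"))
print(fetch_ratings_or_none(fake_fetch, "10109000", file_type="exsa"))
```

In the notebook itself, nwis.get_ratings would be passed as the fetch argument in place of fake_fetch.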
Example 1: Get rating data for an NWIS Site
[3]:
# Specify the USGS site number/code
site_id = "10109000"
# Get the rating curve data
ratingData = nwis.get_ratings(site=site_id, file_type="exsa")
print("Retrieved " + str(len(ratingData[0])) + " data values.")
Retrieved 409 data values.
Interpreting the Result
The result of calling the get_ratings() function is a tuple that contains a pandas data frame and an associated metadata object. The data frame contains the rating curve data for the requested site.
Once you have the data frame, there are several useful things you can do to explore the data. You can execute the following code to display the data frame as a table.
If the “file_type” parameter in the request has a value of “base”, then the columns in the data frame are as follows:
* INDEP - typically the gage height in feet
* DEP - typically the streamflow in cubic feet per second
* STOR - where an “*” indicates that the pair is a fixed point of the rating curve
If the “file_type” parameter is specified as “exsa”, then an additional column called SHIFT is included that indicates the current shift in the rating for that value of INDEP.
If the “file_type” parameter is specified as “corr”, then the columns are as follows:
* INDEP - typically gage height in feet
* CORR - the correction for that value
* CORRINDEP - the corrected value for CORR
[4]:
display(ratingData[0])
| | INDEP | SHIFT | DEP | STOR |
---|---|---|---|---|
0 | 1.77 | -0.07 | 7.46 | NaN |
1 | 1.78 | -0.07 | 7.87 | NaN |
2 | 1.79 | -0.07 | 8.29 | NaN |
3 | 1.80 | -0.07 | 8.73 | NaN |
4 | 1.81 | -0.07 | 9.18 | NaN |
... | ... | ... | ... | ... |
404 | 5.81 | 0.00 | 2175.11 | NaN |
405 | 5.82 | 0.00 | 2186.34 | NaN |
406 | 5.83 | 0.00 | 2197.61 | NaN |
407 | 5.84 | 0.00 | 2208.91 | NaN |
408 | 5.85 | 0.00 | 2220.24 | * |
409 rows × 4 columns
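One common use of a rating table is to estimate flow for a gage height that falls between tabulated rows. The sketch below does this with linear interpolation on the first five rows shown above. It assumes INDEP is gage height and DEP is streamflow, it ignores the SHIFT column, and linear interpolation is an illustrative choice rather than the method USGS uses. The function name estimate_flow is hypothetical.

```python
import numpy as np
import pandas as pd

# First five rows of the table shown above (site 10109000, file_type="exsa").
rating = pd.DataFrame(
    {
        "INDEP": [1.77, 1.78, 1.79, 1.80, 1.81],
        "DEP": [7.46, 7.87, 8.29, 8.73, 9.18],
    }
)


def estimate_flow(rating_table, gage_height):
    """Linearly interpolate DEP at gage_height; NaN outside the table range."""
    return float(
        np.interp(
            gage_height,
            rating_table["INDEP"],  # must be sorted in increasing order
            rating_table["DEP"],
            left=np.nan,
            right=np.nan,
        )
    )


print(estimate_flow(rating, 1.785))  # halfway between the 1.78 and 1.79 rows
print(estimate_flow(rating, 9.99))   # outside this excerpt, so NaN
```

Returning NaN outside the tabulated range avoids silently extrapolating the curve.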
Show the data types of the columns in the resulting data frame.
[5]:
print(ratingData[0].dtypes)
INDEP float64
SHIFT float64
DEP float64
STOR object
dtype: object
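STOR has the object data type because it mixes missing values with the “*” marker. A minimal sketch of turning it into a boolean flag follows, using three rows copied from the table above. The column name is_fixed_point is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Last three rows of the table above; STOR is NaN except for the final row.
rating = pd.DataFrame(
    {
        "INDEP": [5.83, 5.84, 5.85],
        "DEP": [2197.61, 2208.91, 2220.24],
        "STOR": [np.nan, np.nan, "*"],
    }
)

# eq() compares element-wise and treats NaN as not equal, so the result
# is a boolean column with no missing values.
rating["is_fixed_point"] = rating["STOR"].eq("*")

# Keep only the rows flagged as fixed points of the rating curve.
print(rating[rating["is_fixed_point"]])
```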
The other part of the result returned from the get_ratings() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the contents of the response and can be helpful in interpreting them.
[6]:
print("The query URL used to retrieve the data from NWIS was: " + ratingData[1].url)
The query URL used to retrieve the data from NWIS was: https://nwis.waterdata.usgs.gov/nwisweb/get_ratings/?site_no=10109000&file_type=exsa
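The printed URL shows how the function arguments map onto query parameters: site becomes site_no and file_type is passed through unchanged. As a rough illustration, the same URL can be assembled and taken apart with the standard library. The endpoint and parameter names are copied from the output above; how dataretrieval builds the URL internally is not shown here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Endpoint and parameter names taken from the URL printed above.
BASE_URL = "https://nwis.waterdata.usgs.gov/nwisweb/get_ratings/"
params = {"site_no": "10109000", "file_type": "exsa"}

# Assemble the query string from the parameters.
url = BASE_URL + "?" + urlencode(params)
print(url)

# Going the other way: recover the parameters from a metadata URL.
query = parse_qs(urlparse(url).query)
recovered = {key: values[0] for key, values in query.items()}
print(recovered)
```

Parsing the URL this way is a quick check that a request used the site and file type you intended.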
Example 2: Get rating data for a different NWIS site by changing the site_id
[7]:
site_id = '01594440'
data = nwis.get_ratings(site=site_id, file_type="base")
print("Retrieved " + str(len(data[0])) + " data values.")
Retrieved 11 data values.