USGS dataretrieval Python Package what_sites() Examples

This notebook provides examples of using the Python dataretrieval package to search NWIS for sites within a region with specific data. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it is not already installed in your Jupyter Python environment.

[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.10/site-packages (0.1.dev1+g3ba0c83)
Requirement already satisfied: requests in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.32.3)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.2.3)
Requirement already satisfied: numpy>=1.22.4 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2022.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/runner/.local/lib/python3.10/site-packages (from requests->dataretrieval) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (2020.6.20)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas==2.*->dataretrieval) (1.16.0)
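
As a minimal sketch of the "install only if missing" idea, the standard-library function importlib.util.find_spec can report whether a package is importable in the current environment. The helper name is_installed is hypothetical and is not part of dataretrieval.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be imported in this environment."""
    return importlib.util.find_spec(package_name) is not None


# Only suggest installing when the package is actually missing.
if not is_installed("dataretrieval"):
    print("dataretrieval is missing; run `pip install dataretrieval` first.")
```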

Load the package so you can use it along with other packages used in this notebook.

[2]:
from dataretrieval import nwis
from IPython.display import display

Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the what_sites() function to search NWIS for sites within a region with specific data. The function takes several arguments, depending on the result you want to retrieve.

Note: You must specify one major argument.

Major Arguments (Additional arguments, if supplied, will be used as query parameters)

  • sites (string or list): A list of site numbers. Sites may be prefixed with an optional agency code followed by a colon.

  • stateCd (string): U.S. postal service two-letter state code (for example, OH). Only 1 state can be specified per request.

  • huc (string or list): A list of hydrologic unit codes (HUC) or aggregated watersheds. Only 1 major HUC can be specified per request, or up to 10 minor HUCs. A major HUC has two digits.

  • bBox (list): A contiguous range of decimal latitude and longitude, starting with the west longitude, then the south latitude, then the east longitude, and then the north latitude, with each value separated by a comma. The product of the latitude range and the longitude range cannot exceed 25 degrees. Whole or decimal degrees must be specified, up to six digits of precision. Minutes and seconds are not allowed.

  • countyCd (string or list): A list of county numbers, in a 5 digit numeric format. The first two digits of a county’s code are the FIPS State Code. (url: https://help.waterdata.usgs.gov/code/county_query?fmt=html)
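
The rules in the list above can be checked locally before a request is sent. The following is a minimal sketch, not part of dataretrieval: the helper names pick_major_argument and check_bbox are hypothetical, "one major argument" is read here as "exactly one", and "six digits of precision" is read as six decimal places.

```python
MAJOR_ARGUMENTS = ("sites", "stateCd", "huc", "bBox", "countyCd")


def pick_major_argument(**kwargs):
    """Return the name of the single major argument present in kwargs."""
    present = [name for name in MAJOR_ARGUMENTS if kwargs.get(name) is not None]
    if len(present) != 1:
        raise ValueError(f"Expected exactly one major argument, got {present}")
    return present[0]


def check_bbox(bbox):
    """Validate [west, south, east, north] in decimal degrees; round to 6 places."""
    west, south, east, north = (float(value) for value in bbox)
    if not (west < east and south < north):
        raise ValueError("bBox must be ordered west, south, east, north")
    # Product of the longitude range and the latitude range is capped at 25.
    if (east - west) * (north - south) > 25:
        raise ValueError("longitude range x latitude range exceeds 25 degrees")
    return [round(value, 6) for value in (west, south, east, north)]


# Illustrative box of 2 x 2 degrees, combined with a minor argument.
query = {"bBox": check_bbox([-83.0, 36.5, -81.0, 38.5]), "siteType": "ST"}
print(pick_major_argument(**query))  # bBox
# The validated arguments could then be passed on, for example:
# sites_df, metadata = nwis.what_sites(**query)
```

Validating up front keeps an oversized bounding box or a second major argument from reaching the web service at all.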

Minor Arguments

  • startDt (string): Selects sites based on whether data was collected at a point in time beginning after startDt (start date). Dates must be in ISO-8601 Calendar Date format (for example: 1990-01-01).

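  • endDt (string): Selects sites based on whether data was collected at a point in time ending before endDt (end date). Dates must be in the same ISO-8601 Calendar Date format as startDt.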

  • period (string): Selects sites based on whether or not they were active between now and a time in the past. For example, period=P10W will select sites active in the last ten weeks.

  • modifiedSince (string): Returns only sites where site attributes or period of record data have changed during the request period.

  • parameterCd (string or list): Returns only site data for those sites containing the requested USGS parameter codes.

  • siteType (string or list): Restricts sites to those having one or more major and/or minor site types, such as stream, spring or well. For a list of all valid site types see https://help.waterdata.usgs.gov/site_tp_cd. For example, siteType='ST' returns streams only.
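
The date, period, and parameter-code formats described above are easy to get subtly wrong, so it can help to build them programmatically. This is a minimal sketch using only the standard library; the helper names are hypothetical, and zero-padding to five characters assumes parameter codes are five digits long, as in "00665".

```python
from datetime import date


def iso_date(value: date) -> str:
    """Format a date as an ISO-8601 calendar date, e.g. 1990-01-01."""
    return value.isoformat()


def weeks_period(weeks: int) -> str:
    """Build an ISO-8601 duration in weeks, e.g. P10W for ten weeks."""
    if weeks <= 0:
        raise ValueError("weeks must be positive")
    return f"P{weeks}W"


def as_parameter_codes(codes):
    """Zero-pad codes to five characters so leading zeros are not lost."""
    return [str(code).zfill(5) for code in codes]


minor_args = {
    "startDt": iso_date(date(1990, 1, 1)),
    "parameterCd": as_parameter_codes([665, "00060"]),
    "siteType": "ST",
}
print(minor_args)
print(weeks_period(10))  # P10W
```

Passing parameter codes as strings matters because an integer such as 665 silently drops the leading zeros of "00665".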

Formatting Parameters

  • siteOutput (string 'basic' or 'expanded'): Indicates the richness of metadata you want for site attributes. Note that for visually oriented formats like Google Map format, this argument has no meaning. Note: for performance reasons, siteOutput='expanded' cannot be used if seriesCatalogOutput=true or with any values for outputDataTypeCd.

  • seriesCatalogOutput (boolean): A switch that provides detailed period of record information for certain output formats. The period of record indicates date ranges for a certain kind of information about a site, for example the start and end dates for a site’s daily mean streamflow.

For additional parameter options see https://waterservices.usgs.gov/docs/site-service/site-service-details
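
The restriction on siteOutput='expanded' described above can be caught before a request is made. The sketch below is an illustration only: check_output_options is a hypothetical helper, not a dataretrieval function, and it accepts seriesCatalogOutput as either a boolean or the string 'true'.

```python
def check_output_options(**kwargs):
    """Raise ValueError for option combinations the text above rules out."""
    expanded = kwargs.get("siteOutput") == "expanded"
    # Accept True as well as 'true' / 'True' for the catalog switch.
    catalog = str(kwargs.get("seriesCatalogOutput", "false")).lower() == "true"
    has_data_type = kwargs.get("outputDataTypeCd") is not None
    if expanded and (catalog or has_data_type):
        raise ValueError(
            "siteOutput='expanded' cannot be combined with "
            "seriesCatalogOutput=true or outputDataTypeCd"
        )
    return kwargs


# A valid combination passes through unchanged.
options = check_output_options(sites="05114000", siteOutput="expanded")
print(options)
```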

Example 1: Retrieve information about sites in Ohio where phosphorus data was collected

[3]:
siteListPhos = nwis.what_sites(stateCd="OH", parameterCd="00665")

Interpreting the Result

The result of calling the what_sites() function is a pair of objects: a pandas data frame at index 0 and an associated metadata object at index 1. The data frame contains the requested site inventory data.

Once you’ve got the data frame, there are several useful things you can do to explore the data.

[4]:
# Display the data frame as a table
display(siteListPhos[0])
agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va coord_acy_cd dec_coord_datum_cd alt_va alt_acy_va alt_datum_cd huc_cd
0 USGS 03086500 Mahoning River at Alliance OH ST 40.932836 -81.094541 S NAD83 1034.79 0.10 NAVD88 5030103.0
1 USGS 03089500 Mill Creek near Berlin Center OH ST 41.000336 -80.968424 S NAD83 1032.90 0.01 NGVD29 5030103.0
2 USGS 03090500 Mahoning River bl Berlin Dam nr Berlin Center OH ST 41.048391 -81.001203 S NAD83 957.72 0.01 NAVD88 5030103.0
3 USGS 03091500 Mahoning River at Pricetown OH ST 41.131446 -80.971202 S NAD83 904.77 0.10 NAVD88 5030103.0
4 USGS 03092000 Kale Creek near Pricetown OH ST 41.139779 -80.995092 S NAD83 914.70 0.01 COE1912 5030103.0
... ... ... ... ... ... ... ... ... ... ... ... ...
1275 USGS 414144084242500 WM-103 OH GW 41.686439 -84.406893 S NAD83 850.00 10.00 NGVD29 4100006.0
1276 USGS 414150084331000 WM-87-S14 OH GW 41.697272 -84.552728 S NAD83 895.00 10.00 NGVD29 4100003.0
1277 USGS 414214083151000 Lake Erie at site WE12 near Toledo OH LK NaN NaN S NaN NaN NaN NaN 4120200.0
1278 USGS 414233083595500 Little Bear Creek at HWY 120 nr Seward, OH ST-DCH 41.709167 -83.998611 S NAD83 NaN NaN NaN 4100002.0
1279 USGS 414937083112500 Lake Erie at site WE4 near Toledo OH LK NaN NaN S NaN NaN NaN NaN 4120200.0

1280 rows × 12 columns
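
Beyond displaying the table, ordinary pandas operations apply to the result. The sketch below uses a small stand-in data frame built from a few rows and columns of the output above, so it runs without a network request; in the notebook you would use siteListPhos[0] instead. The conversion of huc_cd to text assumes hydrologic unit codes here are eight digits long and that the leading zero was lost when the column was read as a number.

```python
import pandas as pd

# Stand-in for siteListPhos[0]: values copied from the displayed output.
sites_df = pd.DataFrame(
    {
        "site_no": ["03086500", "03089500", "414144084242500", "414214083151000"],
        "station_nm": [
            "Mahoning River at Alliance OH",
            "Mill Creek near Berlin Center OH",
            "WM-103",
            "Lake Erie at site WE12 near Toledo OH",
        ],
        "site_tp_cd": ["ST", "ST", "GW", "LK"],
        "dec_lat_va": [40.932836, 41.000336, 41.686439, float("nan")],
        "huc_cd": [5030103.0, 5030103.0, 4100006.0, 4120200.0],
    }
)

# Number of rows and columns.
print(sites_df.shape)

# How many sites of each type were returned.
type_counts = sites_df["site_tp_cd"].value_counts()
print(type_counts)

# Keep only stream sites, or only sites that have coordinates.
streams = sites_df[sites_df["site_tp_cd"] == "ST"]
located = sites_df.dropna(subset=["dec_lat_va"])

# Restore HUCs as zero-padded text (assumes 8-digit codes, no missing values).
huc_text = sites_df["huc_cd"].map(lambda value: f"{int(value):08d}")
print(huc_text.tolist())
```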

The other part of the result returned from the what_sites() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the contents of the response and can be helpful in interpreting them.

[5]:
print('The query URL used to retrieve the data from NWIS was: ' + siteListPhos[1].url)
The query URL used to retrieve the data from NWIS was: https://waterservices.usgs.gov/nwis/site?stateCd=OH&parameterCd=00665&format=rdb
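
To confirm which arguments actually reached the web service, the query URL can be split back into its parts with the standard library. This sketch hard-codes the URL printed above; in the notebook you would pass siteListPhos[1].url instead.

```python
from urllib.parse import parse_qs, urlparse

query_url = (
    "https://waterservices.usgs.gov/nwis/site"
    "?stateCd=OH&parameterCd=00665&format=rdb"
)

parsed = urlparse(query_url)
# parse_qs returns a list per key; each key appears once here, so take the first.
params = {key: values[0] for key, values in parse_qs(parsed.query).items()}

print(parsed.netloc)  # waterservices.usgs.gov
print(parsed.path)    # /nwis/site
print(params)
```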

Additional Examples

Example 2: Retrieve site information for a single site

[6]:
oneSite = nwis.what_sites(sites='05114000')
display(oneSite[0])
agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va coord_acy_cd dec_coord_datum_cd alt_va alt_acy_va alt_datum_cd huc_cd
0 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 NAVD88 9010008

Example 3: Retrieve site information for a single site and show the result with expanded output

[7]:
oneSite = nwis.what_sites(sites='05114000', siteOutput='expanded')
display(oneSite[0])
agency_cd site_no station_nm site_tp_cd lat_va long_va dec_lat_va dec_long_va coord_meth_cd coord_acy_cd ... local_time_fg reliability_cd gw_file_cd nat_aqfr_cd aqfr_cd aqfr_type_cd well_depth_va hole_depth_va depth_src_cd project_no
0 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 485924 1015728 48.989957 -101.958335 M F ... Y NaN NNNNNNNN NaN NaN NaN NaN NaN NaN NaN

1 rows × 42 columns

Example 4: Retrieve site information for sites in Utah with daily values data falling within a specified date range

[8]:
UTsites = nwis.what_sites(stateCd='UT', outputDataTypeCd='dv', startDT='1971-07-01', endDT='2021-07-28')
display(UTsites[0])
agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va coord_acy_cd dec_coord_datum_cd alt_va alt_acy_va ... stat_cd ts_id loc_web_ds medium_grp_cd parm_grp_cd srs_id access_cd begin_date end_date count_nu
0 USGS 09163675 COTTONWOOD WASH AT I-70, NEAR CISCO, UTAH ST 39.081652 -109.217615 F NAD83 NaN NaN ... 3 142731 NaN wat NaN 1645423 0 1983-04-13 1986-09-30 1267
1 USGS 09180000 DOLORES RIVER NEAR CISCO, UT ST 38.797208 -109.195114 F NAD83 4168.32 0.11 ... 1 241055 NaN wat NaN 1645597 0 2006-07-18 2024-10-25 6603
2 USGS 09180000 DOLORES RIVER NEAR CISCO, UT ST 38.797208 -109.195114 F NAD83 4168.32 0.11 ... 2 241056 NaN wat NaN 1645597 0 2006-07-19 2024-10-25 6586
3 USGS 09180000 DOLORES RIVER NEAR CISCO, UT ST 38.797208 -109.195114 F NAD83 4168.32 0.11 ... 3 142732 NaN wat NaN 1645597 0 2006-07-19 2024-10-24 6579
4 USGS 09180000 DOLORES RIVER NEAR CISCO, UT ST 38.797208 -109.195114 F NAD83 4168.32 0.11 ... 11 142733 NaN wat NaN 1645597 0 1949-05-01 2004-08-17 13002
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1214 USGS 414411112543701 (B-12- 9)30cda- 1 GW 41.736312 -112.911094 S NAD83 4239.00 0.50 ... 2 144019 NaN wat NaN 1642008 0 1975-10-05 2024-10-07 9114
1215 USGS 414500112000000 COM FLOW BEAR AREA GSL INFLOW GROUP 1 ST 41.749928 -112.000782 F NAD83 NaN NaN ... 3 144020 NaN wat NaN 1645423 0 1960-10-01 1982-09-29 7303
1216 USGS 414500112000100 COM FLOW BEAR AREA GSL INFLOW GROUP 2 ST 41.749928 -112.001059 F NAD83 NaN NaN ... 3 144021 NaN wat NaN 1645423 0 1960-10-01 1980-09-29 6207
1217 USGS 414500112000200 COM FLOW BEAR AREA GSL INFLOW GROUP 3 ST 41.749928 -112.001337 F NAD83 NaN NaN ... 3 144022 NaN wat NaN 1645423 0 1960-10-01 1980-09-29 6573
1218 USGS 415703112514501 (B-14- 9) 9add- 1 GW 41.960106 -112.863444 1 NAD83 4387.88 0.10 ... 2 144023 NaN wat NaN 1642008 0 1981-07-21 2024-10-07 14872

1219 rows × 24 columns
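
The 1219 rows above are not 1219 sites: site 09180000 appears several times, apparently once per time series (note the differing stat_cd and ts_id values). The sketch below shows one way to count distinct sites and summarize the record per site. It uses a stand-in data frame with values copied from the first five rows of the output, so it runs offline; in the notebook you would use UTsites[0] instead.

```python
import pandas as pd

# Stand-in for UTsites[0]: selected columns from the first five displayed rows.
catalog_df = pd.DataFrame(
    {
        "site_no": ["09163675", "09180000", "09180000", "09180000", "09180000"],
        "stat_cd": [3, 1, 2, 3, 11],
        "begin_date": ["1983-04-13", "2006-07-18", "2006-07-19", "2006-07-19", "1949-05-01"],
        "end_date": ["1986-09-30", "2024-10-25", "2024-10-25", "2024-10-24", "2004-08-17"],
        "count_nu": [1267, 6603, 6586, 6579, 13002],
    }
)

# Convert the date columns from text so they can be compared and aggregated.
for column in ("begin_date", "end_date"):
    catalog_df[column] = pd.to_datetime(catalog_df[column])

# Count distinct sites rather than rows.
n_sites = catalog_df["site_no"].nunique()
print(n_sites)

# One row per site: number of series and the overall period of record.
per_site = catalog_df.groupby("site_no").agg(
    series=("stat_cd", "size"),
    first_record=("begin_date", "min"),
    last_record=("end_date", "max"),
)
print(per_site)
```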

Example 5: Retrieve site information for a single site and show the series catalog output

The series catalog output is a list of the parameters that have been collected at that site.

[9]:
oneSite = nwis.what_sites(sites='05114000', seriesCatalogOutput='true')
display(oneSite[0])
agency_cd site_no station_nm site_tp_cd dec_lat_va dec_long_va coord_acy_cd dec_coord_datum_cd alt_va alt_acy_va ... stat_cd ts_id loc_web_ds medium_grp_cd parm_grp_cd srs_id access_cd begin_date end_date count_nu
0 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 0 NaN wat NaN 0 0 2006 2024 19
1 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... 1.0 91355 NaN wat NaN 1645597 0 1983-08-11 2023-10-09 5903
2 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... 2.0 91356 NaN wat NaN 1645597 0 1983-08-11 2023-10-09 5903
3 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... 3.0 91357 NaN wat NaN 1645597 0 1983-08-11 2023-10-09 5903
4 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... 11.0 91358 NaN wat NaN 1645597 0 1974-10-17 1981-09-02 1937
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
402 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 92591 NaN wat NaN 1645423 0 1994-10-01 2024-10-25 10982
403 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 92592 NaN wat NaN 17164583 0 2007-10-01 2024-10-25 6234
404 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 249682 NaN wat NaN 1646694 0 2019-05-15 2023-10-10 1609
405 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 249681 NaN wat NaN 1736457 0 2019-05-15 2023-10-10 1609
406 USGS 05114000 SOURIS RIVER NEAR SHERWOOD, ND ST 48.989957 -101.958335 F NAD83 1605.0 0.19 ... NaN 317832 NaN wat NaN 1642503 0 2024-01-01 2024-10-25 298

407 rows × 24 columns