USGS dataretrieval Python Package `get_gwlevels()` Examples

This notebook provides examples of using the Python dataretrieval package to retrieve groundwater level data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).

Install the Package

Use the following code to install the package if it is not already installed in your Jupyter Python environment.

[1]:

!pip install dataretrieval

Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.10/site-packages (0.1.dev1+g3ba0c83)
Requirement already satisfied: requests in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.32.3)
Requirement already satisfied: pandas==2.* in /home/runner/.local/lib/python3.10/site-packages (from dataretrieval) (2.2.3)
Requirement already satisfied: numpy>=1.22.4 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas==2.*->dataretrieval) (2022.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.10/site-packages (from pandas==2.*->dataretrieval) (2024.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/runner/.local/lib/python3.10/site-packages (from requests->dataretrieval) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (3.3)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (1.26.5)
Requirement already satisfied: certifi>=2017.4.17 in /usr/lib/python3/dist-packages (from requests->dataretrieval) (2020.6.20)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil>=2.8.2->pandas==2.*->dataretrieval) (1.16.0)

Load the package so you can use it along with other packages used in this notebook.

[2]:

from dataretrieval import nwis
from IPython.display import display

Basic Usage

The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the get_gwlevels() function to retrieve groundwater level data from USGS NWIS. The following arguments are supported:

Arguments (Additional parameters, if supplied, will be used as query parameters)

sites (string or list of strings): One or more USGS site identifiers for which to retrieve data.
start (string): The beginning date of the period for which to retrieve data (defaults to ‘1851-01-01’). If the waterdata parameter begin_date is supplied, it will overwrite the start parameter.
end (string): The ending date of the period for which to retrieve data. If the waterdata parameter end_date is supplied, it will overwrite the end parameter.
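
As a rough illustration of how these arguments end up in a request, the sketch below assembles a query URL by hand using only the standard library. The helper name build_gwlevels_url is hypothetical and is not part of dataretrieval. The parameter names format, begin_date, and site_no are taken from the query URL that this notebook prints for the single-site example; the position of end_date and the comma-joined site list are assumptions.

```python
from urllib.parse import urlencode

# Base address copied from the query URL printed in this notebook.
BASE_URL = "https://nwis.waterdata.usgs.gov/nwis/gwlevels"


def build_gwlevels_url(sites, start="1851-01-01", end=None):
    """Hypothetical helper: map sites/start/end onto query parameters."""
    if isinstance(sites, str):
        sites = [sites]
    params = {"format": "rdb", "begin_date": start}
    if end is not None:
        # Assumption: the end argument is sent as end_date.
        params["end_date"] = end
    # Assumption: several sites are sent as one comma-separated value.
    params["site_no"] = ",".join(sites)
    return BASE_URL + "?" + urlencode(params, safe=",")


print(build_gwlevels_url("434400121275801"))
```

This is not how the package builds its requests internally; it only makes the relationship between the arguments and the URL easier to see.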

Example 1: Get groundwater level data for a single monitoring site.

[3]:

# Set the parameters needed to retrieve data
site_id = "434400121275801"

# Retrieve the data
data = nwis.get_gwlevels(sites=site_id)
print("Retrieved " + str(len(data[0])) + " data values.")

Retrieved 744 data values.

/home/runner/.local/lib/python3.10/site-packages/dataretrieval/utils.py:90: UserWarning: Warning: 567 incomplete dates found, consider setting datetime_index to False.
  warnings.warn(

Interpreting the Result

The get_gwlevels() function returns two items: a Pandas data frame, accessed in this notebook as data[0], and an associated metadata object, accessed as data[1]. The data frame contains the requested data and is indexed by the date and time associated with each data value.

Once you’ve got the data frame, there are several useful things you can do to explore the data.

Display the data frame as a table

[4]:

display(data[0])

	agency_cd	site_no	site_tp_cd	lev_dt	lev_tm	lev_tz_cd	lev_va	sl_lev_va	sl_datum_cd	lev_status_cd	lev_agency_cd	lev_dt_acy_cd	lev_acy_cd	lev_src_cd	lev_meth_cd	lev_age_cd	parameter_cd
datetime
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	NaN	4192.65	NGVD29	1	USGS	m	2	S	O	A	62610
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	NaN	4196.67	NAVD88	1	USGS	m	2	S	O	A	62611
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	27.35	NaN	NaN	1	USGS	m	2	S	O	A	72019
1999-06-04 18:00:00+00:00	USGS	434400121275801	GW	1999-06-04	18:00	+0000	NaN	4203.22	NGVD29	1	USGS	m	2	S	S	A	62610
1999-06-04 18:00:00+00:00	USGS	434400121275801	GW	1999-06-04	18:00	+0000	NaN	4207.24	NAVD88	1	USGS	m	2	S	S	A	62611
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
NaT	USGS	434400121275801	GW	1999-03-19	NaN	NaN	NaN	4204.97	NAVD88	1	NaN	D	2	NaN	S	A	62611
NaT	USGS	434400121275801	GW	1999-03-19	NaN	NaN	19.05	NaN	NaN	1	NaN	D	2	NaN	S	A	72019
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	NaN	4201.99	NGVD29	1	USGS	D	2	S	S	A	62610
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	NaN	4206.01	NAVD88	1	USGS	D	2	S	S	A	62611
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	18.01	NaN	NaN	1	USGS	D	2	S	S	A	72019

744 rows × 17 columns
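
In the table above, the rows with a NaT index are the ones that have a date in lev_dt but no time in lev_tm. The sketch below, which uses a small hand-built frame with values copied from that table rather than a live query, shows one possible way to derive a date-only column so those rows remain usable. The column names date_only and has_time are introduced here for illustration.

```python
import pandas as pd

# Stand-in for a few rows of the frame returned by get_gwlevels().
# Values are copied from the table above; two rows have no time.
sample = pd.DataFrame(
    {
        "lev_dt": ["1945-10-12", "1999-03-19", "1999-05-14"],
        "lev_tm": ["22:35", None, None],
        "lev_va": [27.35, 19.05, 18.01],
    }
)

# Parse the date column on its own, ignoring the (possibly missing) time.
sample["date_only"] = pd.to_datetime(sample["lev_dt"], format="%Y-%m-%d")

# Mark which rows carry a time of day.
sample["has_time"] = sample["lev_tm"].notna()

print(sample)
```

This approach assumes every lev_dt value is a complete year-month-day string, which is true for the rows shown but not for every site.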

Show the data types of the columns in the resulting data frame.

[5]:

print(data[0].dtypes)

agency_cd         object
site_no           object
site_tp_cd        object
lev_dt            object
lev_tm            object
lev_tz_cd         object
lev_va           float64
sl_lev_va        float64
sl_datum_cd       object
lev_status_cd     object
lev_agency_cd     object
lev_dt_acy_cd     object
lev_acy_cd         int64
lev_src_cd        object
lev_meth_cd       object
lev_age_cd        object
parameter_cd      object
dtype: object

Get summary statistics for the groundwater level values.

[6]:

data[0]['lev_va'].describe()

[6]:

count    248.000000
mean      23.690605
std        5.265540
min       10.900000
25%       20.127500
50%       24.495000
75%       27.375000
max       41.630000
Name: lev_va, dtype: float64
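
The count of 248 is one third of the 744 rows retrieved. In the table displayed earlier, each measurement appears once per parameter code (62610, 62611, and 72019), and lev_va is populated only on the 72019 rows, while the other two rows carry sl_lev_va for a vertical datum. The sketch below separates the two kinds of rows using a small hand-built frame with values copied from that table; it is an illustration, not output from a live query.

```python
import pandas as pd

nan = float("nan")

# One measurement from 1945-10-12, reported under three parameter codes.
sample = pd.DataFrame(
    {
        "lev_dt": ["1945-10-12"] * 3,
        "lev_va": [nan, nan, 27.35],
        "sl_lev_va": [4192.65, 4196.67, nan],
        "sl_datum_cd": ["NGVD29", "NAVD88", None],
        "parameter_cd": ["62610", "62611", "72019"],
    }
)

# Rows holding depth to water below land surface (lev_va).
depth = sample[sample["parameter_cd"] == "72019"]

# Rows holding water level elevations relative to a datum (sl_lev_va).
elevation = sample[sample["parameter_cd"].isin(["62610", "62611"])]

print(depth[["lev_dt", "lev_va"]])
print(elevation[["lev_dt", "sl_lev_va", "sl_datum_cd"]])
```

Note that parameter_cd is compared as a string, matching the object dtype reported for that column above.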

Make a quick time series plot.

[7]:

ax = data[0].plot(x='lev_dt', y='lev_va')
ax.set_xlabel('Date')
ax.set_ylabel('Water Level (feet below land surface)')

[7]:

Text(0, 0.5, 'Water Level (feet below land surface)')

[Figure: time series plot of lev_va, water level in feet below land surface, against lev_dt.]

The other part of the result returned from the get_gwlevels() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that defines the returned fields and can be helpful in interpreting the contents of the response.

[8]:

print("The query URL used to retrieve the data from NWIS was: " + data[1].url)

The query URL used to retrieve the data from NWIS was: https://nwis.waterdata.usgs.gov/nwis/gwlevels?format=rdb&begin_date=1851-01-01&site_no=434400121275801
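
If you want to check which parameters were actually sent, the query string can be taken apart with the standard library. The sketch below works on the URL printed above, pasted in as a literal so that it runs without a network connection; with a live result you would pass data[1].url instead.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the output above.
url = (
    "https://nwis.waterdata.usgs.gov/nwis/gwlevels"
    "?format=rdb&begin_date=1851-01-01&site_no=434400121275801"
)

parts = urlsplit(url)

# parse_qs returns a list per key; each key appears once here,
# so keep only the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.path)
print(params)
```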

Additional Examples

You can also request data for multiple sites at the same time.

Example 2: Get data for multiple sites. Site numbers are specified as a Python list of strings.

[9]:

site_ids = ["434400121275801", "375907091432201"]
data2 = nwis.get_gwlevels(sites=site_ids)
print("Retrieved " + str(len(data2[0])) + " data values.")
display(data2[0])

Retrieved 933 data values.

/home/runner/.local/lib/python3.10/site-packages/dataretrieval/utils.py:90: UserWarning: Warning: 621 incomplete dates found, consider setting datetime_index to False.
  warnings.warn(

		agency_cd	site_tp_cd	lev_dt	lev_tm	lev_tz_cd	lev_va	sl_lev_va	sl_datum_cd	lev_status_cd	lev_agency_cd	lev_dt_acy_cd	lev_acy_cd	lev_src_cd	lev_meth_cd	lev_age_cd	parameter_cd
site_no	datetime
375907091432201	2007-01-26 17:08:00+00:00	USGS	GW	2007-01-26	17:08	+0000	NaN	871.79	NGVD29	1	USGS	m	2	S	V	A	62610
	2007-01-26 17:08:00+00:00	USGS	GW	2007-01-26	17:08	+0000	NaN	871.93	NAVD88	1	USGS	m	2	S	V	A	62611
	2007-01-26 17:08:00+00:00	USGS	GW	2007-01-26	17:08	+0000	317.21	NaN	NaN	1	USGS	m	2	S	V	A	72019
	2007-06-11 19:30:00+00:00	USGS	GW	2007-06-11	19:30	+0000	NaN	879.68	NGVD29	1	MO005	m	2	A	T	A	62610
	2007-06-11 19:30:00+00:00	USGS	GW	2007-06-11	19:30	+0000	NaN	879.82	NAVD88	1	MO005	m	2	A	T	A	62611
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
434400121275801	NaT	USGS	GW	1999-03-19	NaN	NaN	NaN	4204.97	NAVD88	1	NaN	D	2	NaN	S	A	62611
	NaT	USGS	GW	1999-03-19	NaN	NaN	19.05	NaN	NaN	1	NaN	D	2	NaN	S	A	72019
	NaT	USGS	GW	1999-05-14	NaN	NaN	NaN	4201.99	NGVD29	1	USGS	D	2	S	S	A	62610
	NaT	USGS	GW	1999-05-14	NaN	NaN	NaN	4206.01	NAVD88	1	USGS	D	2	S	S	A	62611
	NaT	USGS	GW	1999-05-14	NaN	NaN	18.01	NaN	NaN	1	USGS	D	2	S	S	A	72019

933 rows × 16 columns
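
With the default multi-index result, rows are keyed by site_no and then by datetime. The sketch below shows one way to pull out a single site or count rows per site. It uses a small hand-built frame with values copied from the table above; the datetime level is simplified to plain dates, so it is a stand-in and not the exact index the package produces.

```python
import pandas as pd

# Two-level index mirroring the (site_no, datetime) layout shown above.
index = pd.MultiIndex.from_arrays(
    [
        ["375907091432201", "375907091432201", "434400121275801"],
        pd.to_datetime(["2007-01-26", "2007-06-11", "1999-05-14"]),
    ],
    names=["site_no", "datetime"],
)

sample = pd.DataFrame(
    {
        "sl_lev_va": [871.79, 879.68, 4201.99],
        "sl_datum_cd": ["NGVD29", "NGVD29", "NGVD29"],
    },
    index=index,
)

# Select every row for one site; the site_no level is dropped.
one_site = sample.xs("375907091432201", level="site_no")

# Count rows per site.
rows_per_site = sample.groupby(level="site_no").size()

print(one_site)
print(rows_per_site)
```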

The following example is the same as the previous one, but with the multi-index turned off (multi_index=False).

[10]:

site_ids = ["434400121275801", "375907091432201"]
data2 = nwis.get_gwlevels(sites=site_ids, multi_index=False)
print("Retrieved " + str(len(data2[0])) + " data values.")
display(data2[0])

Retrieved 933 data values.

/home/runner/.local/lib/python3.10/site-packages/dataretrieval/utils.py:90: UserWarning: Warning: 621 incomplete dates found, consider setting datetime_index to False.
  warnings.warn(

	agency_cd	site_no	site_tp_cd	lev_dt	lev_tm	lev_tz_cd	lev_va	sl_lev_va	sl_datum_cd	lev_status_cd	lev_agency_cd	lev_dt_acy_cd	lev_acy_cd	lev_src_cd	lev_meth_cd	lev_age_cd	parameter_cd
datetime
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	NaN	4192.65	NGVD29	1	USGS	m	2	S	O	A	62610
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	NaN	4196.67	NAVD88	1	USGS	m	2	S	O	A	62611
1945-10-12 22:35:00+00:00	USGS	434400121275801	GW	1945-10-12	22:35	+0000	27.35	NaN	NaN	1	USGS	m	2	S	O	A	72019
1999-06-04 18:00:00+00:00	USGS	434400121275801	GW	1999-06-04	18:00	+0000	NaN	4203.22	NGVD29	1	USGS	m	2	S	S	A	62610
1999-06-04 18:00:00+00:00	USGS	434400121275801	GW	1999-06-04	18:00	+0000	NaN	4207.24	NAVD88	1	USGS	m	2	S	S	A	62611
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
NaT	USGS	434400121275801	GW	1999-03-19	NaN	NaN	NaN	4204.97	NAVD88	1	NaN	D	2	NaN	S	A	62611
NaT	USGS	434400121275801	GW	1999-03-19	NaN	NaN	19.05	NaN	NaN	1	NaN	D	2	NaN	S	A	72019
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	NaN	4201.99	NGVD29	1	USGS	D	2	S	S	A	62610
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	NaN	4206.01	NAVD88	1	USGS	D	2	S	S	A	62611
NaT	USGS	434400121275801	GW	1999-05-14	NaN	NaN	18.01	NaN	NaN	1	USGS	D	2	S	S	A	72019

933 rows × 17 columns

Some groundwater level data have dates that include only a year or a month and year, but no day.

Example 3: Retrieve groundwater level data that have dates without a day.

[11]:

data3 = nwis.get_gwlevels(sites="425957088141001")
print("Retrieved " + str(len(data3[0])) + " data values.")

# Print the date/time index values, which show up as NaT because
# the dates can't be converted to a date/time data type
print(data3[0].index)

Retrieved 102 data values.
DatetimeIndex(['NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT',
               'NaT',
               ...
               'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT', 'NaT',
               'NaT'],
              dtype='datetime64[ns, UTC]', name='datetime', length=102, freq=None)

/home/runner/.local/lib/python3.10/site-packages/dataretrieval/utils.py:90: UserWarning: Warning: 102 incomplete dates found, consider setting datetime_index to False.
  warnings.warn(
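
One possible way to work with dates of mixed precision is to parse the lev_dt strings yourself and record how precise each one is. The sketch below uses only the standard library. The helper name parse_partial_date is hypothetical, the string layouts "YYYY-MM" and "YYYY" for partial dates are assumptions, and the partial date values in the loop are made up for illustration.

```python
from datetime import datetime

# Assumed layouts for complete and partial dates, most precise first.
FORMATS = [("%Y-%m-%d", "day"), ("%Y-%m", "month"), ("%Y", "year")]


def parse_partial_date(text):
    """Return (datetime, precision) for a possibly incomplete date string.

    Missing month or day components default to 1.
    """
    for fmt, precision in FORMATS:
        try:
            return datetime.strptime(text, fmt), precision
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date string: {text!r}")


# Illustrative values only; the last two are not taken from NWIS.
for value in ["1999-05-14", "1959-03", "1959"]:
    print(value, parse_partial_date(value))
```

Keeping the precision next to the parsed value avoids treating a year-only measurement as if it had been taken on January 1.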

If you want to see the USGS RDB (delimited text) version of the data just retrieved, you can get the URL for the request that was sent to the USGS web service.

[12]:

# Print the URL used to retrieve the data
print("You can examine the data retrieved from NWIS at: " + data3[1].url)

You can examine the data retrieved from NWIS at: https://nwis.waterdata.usgs.gov/nwis/gwlevels?format=rdb&begin_date=1851-01-01&site_no=425957088141001

You can also retrieve data for a site within a specified time window by specifying a start date and an end date.

Example 4: Get groundwater level data for a site between a start date and an end date, using the start and end arguments.

[13]:

data4 = nwis.get_gwlevels(sites=site_id, start="1980-01-01", end="2000-12-31")
print("Retrieved " + str(len(data4[0])) + " data values.")

Retrieved 213 data values.

/home/runner/.local/lib/python3.10/site-packages/dataretrieval/utils.py:90: UserWarning: Warning: 189 incomplete dates found, consider setting datetime_index to False.
  warnings.warn(
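
If the full record has already been downloaded, a similar window can be applied locally instead of issuing a second request. The sketch below filters on the lev_dt strings, since the datetime index may contain NaT values. It uses a small hand-built frame with values copied from the tables above, and it assumes every lev_dt value is a complete year-month-day string, which sorts correctly as text.

```python
import pandas as pd

# Stand-in for a few rows of an already retrieved data frame.
sample = pd.DataFrame(
    {
        "lev_dt": ["1945-10-12", "1999-03-19", "1999-05-14"],
        "lev_va": [27.35, 19.05, 18.01],
    }
)

# ISO-style date strings compare in chronological order,
# so an inclusive string comparison selects the window.
in_window = sample[sample["lev_dt"].between("1980-01-01", "2000-12-31")]

print(in_window)
```

This is a local convenience and is not a description of how the web service applies begin_date and end_date.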
