USGS dataretrieval Python Package get_gwlevels() Examples
This notebook provides examples of using the Python dataretrieval package to retrieve groundwater level data for a United States Geological Survey (USGS) monitoring site. The dataretrieval package provides a collection of functions to get data from the USGS National Water Information System (NWIS) and other online sources of hydrology and water quality data, including the United States Environmental Protection Agency (USEPA).
Install the Package
Use the following code to install the package if it is not already installed in your Jupyter Python environment.
[1]:
!pip install dataretrieval
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: dataretrieval in /home/runner/.local/lib/python3.12/site-packages (0.1.dev1+g4dc9f6a68)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from dataretrieval) (2.31.0)
Requirement already satisfied: pandas<3.0.0,>=2.0.0 in /home/runner/.local/lib/python3.12/site-packages (from dataretrieval) (2.3.3)
Requirement already satisfied: numpy>=1.26.0 in /home/runner/.local/lib/python3.12/site-packages (from pandas<3.0.0,>=2.0.0->dataretrieval) (2.4.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/lib/python3/dist-packages (from pandas<3.0.0,>=2.0.0->dataretrieval) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/lib/python3/dist-packages (from pandas<3.0.0,>=2.0.0->dataretrieval) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /home/runner/.local/lib/python3.12/site-packages (from pandas<3.0.0,>=2.0.0->dataretrieval) (2025.3)
Load the package so you can use it along with other packages used in this notebook.
[2]:
from dataretrieval import nwis
from IPython.display import display
Basic Usage
The dataretrieval package has several functions that allow you to retrieve data from different web services. This example uses the get_gwlevels() function to retrieve groundwater level data from USGS NWIS. The following arguments are supported:
Arguments (Additional parameters, if supplied, will be used as query parameters)
sites (string or list of strings): One or more USGS site identifiers for which to retrieve data.
start (string): The beginning date of the period for which to retrieve data (defaults to ‘1851-01-01’). If the waterdata parameter begin_date is supplied, it overrides the start parameter.
end (string): The ending date of the period for which to retrieve data. If the waterdata parameter end_date is supplied, it overrides the end parameter.
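The argument rules listed above can be sketched in plain Python. The helper below, `build_gwlevels_query`, is hypothetical and is not part of dataretrieval. It only mirrors the behavior described in this list: a default start of 1851-01-01, and begin_date or end_date taking precedence over start or end. Joining multiple site identifiers with commas, and the exact parameter names emitted, are assumptions.

```python
def build_gwlevels_query(sites, start=None, end=None, **kwargs):
    """Hypothetical sketch: map the documented arguments to query parameters."""
    # Accept either a single site identifier or a list of identifiers.
    if isinstance(sites, str):
        sites = [sites]

    query = {
        "sites": ",".join(sites),             # assumption: comma-delimited
        "begin_date": start or "1851-01-01",  # documented default for start
    }
    if end is not None:
        query["end_date"] = end

    # Additional keyword arguments are used as query parameters, so an
    # explicit begin_date or end_date replaces the value taken from start or end.
    query.update(kwargs)
    return query


print(build_gwlevels_query("434400121275801"))
print(
    build_gwlevels_query(
        ["434400121275801", "375907091432201"],
        start="1980-01-01",
        begin_date="1990-01-01",
    )
)
```

In the second call, begin_date wins over start, which is the precedence the argument descriptions state.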
Example 1: Get groundwater level data for a single monitoring site.
[3]:
# Set the parameters needed to retrieve data
site_id = "434400121275801"
# Retrieve the data
data = nwis.get_gwlevels(sites=site_id)
print("Retrieved " + str(len(data[0])) + " data values.")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'lev_tz_cd'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[3], line 5
2 site_id = "434400121275801"
4 # Retrieve the data
----> 5 data = nwis.get_gwlevels(sites=site_id)
6 print("Retrieved " + str(len(data[0])) + " data values.")
File ~/.local/lib/python3.12/site-packages/dataretrieval/nwis.py:346, in get_gwlevels(sites, start, end, multi_index, datetime_index, ssl_check, **kwargs)
343 df = _read_rdb(response.text)
345 if datetime_index is True:
--> 346 df = format_datetime(df, "lev_dt", "lev_tm", "lev_tz_cd")
348 # Filter by kwarg parameterCd because the service doesn't do it
349 if "parameterCd" in kwargs:
File ~/.local/lib/python3.12/site-packages/dataretrieval/utils.py:79, in format_datetime(df, date_field, time_field, tz_field)
56 """Creates a datetime field from separate date, time, and
57 time zone fields.
58
(...) 76
77 """
78 # create a datetime index from the columns in qwdata response
---> 79 df[tz_field] = df[tz_field].map(tz)
81 df["datetime"] = pd.to_datetime(
82 df[date_field] + " " + df[time_field] + " " + df[tz_field],
83 format="ISO8601",
84 utc=True,
85 )
87 # if there are any incomplete dates, warn the user
File ~/.local/lib/python3.12/site-packages/pandas/core/frame.py:4113, in DataFrame.__getitem__(self, key)
4111 if self.columns.nlevels > 1:
4112 return self._getitem_multilevel(key)
-> 4113 indexer = self.columns.get_loc(key)
4114 if is_integer(indexer):
4115 indexer = [indexer]
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3819, in Index.get_loc(self, key)
3814 if isinstance(casted_key, slice) or (
3815 isinstance(casted_key, abc.Iterable)
3816 and any(isinstance(x, slice) for x in casted_key)
3817 ):
3818 raise InvalidIndexError(key)
-> 3819 raise KeyError(key) from err
3820 except TypeError:
3821 # If we have a listlike key, _check_indexing_error will raise
3822 # InvalidIndexError. Otherwise we fall through and re-raise
3823 # the TypeError.
3824 self._check_indexing_error(key)
KeyError: 'lev_tz_cd'
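The traceback above shows the KeyError being raised inside the `if datetime_index is True` branch of get_gwlevels(), where format_datetime() looks up a lev_tz_cd column that the response does not contain. One possible workaround, sketched below, is to retry with datetime_index=False so that the branch is skipped. The helper name `fetch_with_fallback` and the stand-in function are hypothetical, and whether the rest of get_gwlevels() succeeds with the installed version is an assumption that has not been verified here.

```python
def fetch_with_fallback(fetch, **query):
    """Call fetch(**query); on a KeyError, retry once with datetime_index=False.

    `fetch` is expected to accept a `datetime_index` keyword, as the
    get_gwlevels() signature shown in the traceback does.
    """
    try:
        return fetch(**query)
    except KeyError as err:
        print(f"Datetime formatting failed on column {err}; retrying without it")
        return fetch(**{**query, "datetime_index": False})


# Stand-in for nwis.get_gwlevels that mimics the failure seen above.
# The returned values are placeholders, not real NWIS data.
def fake_get_gwlevels(sites, datetime_index=True):
    if datetime_index:
        raise KeyError("lev_tz_cd")
    return ([{"site_no": sites}], {"url": "placeholder"})


frame, metadata = fetch_with_fallback(fake_get_gwlevels, sites="434400121275801")
print(len(frame), "record(s) returned by the stand-in")
```

With the package installed and network access available, the equivalent call would be `fetch_with_fallback(nwis.get_gwlevels, sites=site_id)`.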
Interpreting the Result
The get_gwlevels() function returns two objects: a pandas data frame and an associated metadata object, accessed in this notebook as data[0] and data[1]. The data frame contains the requested data and is indexed by the dates associated with the data values.
Once you have the data frame, there are several useful things you can do to explore the data.
Display the data frame as a table.
[4]:
display(data[0])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[4], line 1
----> 1 display(data[0])
NameError: name 'data' is not defined
Show the data types of the columns in the resulting data frame.
[5]:
print(data[0].dtypes)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[5], line 1
----> 1 print(data[0].dtypes)
NameError: name 'data' is not defined
Get summary statistics for the groundwater level values.
[6]:
data[0]['lev_va'].describe()
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[6], line 1
----> 1 data[0]['lev_va'].describe()
NameError: name 'data' is not defined
Make a quick time series plot.
[7]:
ax = data[0].plot(x = 'lev_dt', y='lev_va')
ax.set_xlabel('Date')
ax.set_ylabel('Water Level (feet below land surface)')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[7], line 1
----> 1 ax = data[0].plot(x = 'lev_dt', y='lev_va')
2 ax.set_xlabel('Date')
3 ax.set_ylabel('Water Level (feet below land surface)')
NameError: name 'data' is not defined
The other part of the result returned from the get_gwlevels() function is a metadata object that contains information about the query that was executed to return the data. For example, you can access the URL that was assembled to retrieve the requested data from the USGS web service. The USGS web service responses contain a descriptive header that can be helpful in interpreting the contents of the response.
[8]:
print("The query URL used to retrieve the data from NWIS was: " + data[1].url)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[8], line 1
----> 1 print("The query URL used to retrieve the data from NWIS was: " + data[1].url)
NameError: name 'data' is not defined
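Because the URL stored in the metadata object is an ordinary query string, the standard library can split it into its parameters, which is a quick way to confirm what was actually requested. The URL below is a made-up placeholder rather than a real NWIS address, and `query_parameters` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def query_parameters(url):
    """Return the query parameters of a URL as single values or lists."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs always returns lists; unwrap parameters that occur only once.
    return {key: vals[0] if len(vals) == 1 else vals for key, vals in params.items()}


# Placeholder URL for illustration only.
example_url = (
    "https://example.invalid/gwlevels"
    "?sites=434400121275801&begin_date=1851-01-01&format=rdb"
)
for name, value in query_parameters(example_url).items():
    print(name, "=", value)
```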
Additional Examples
You can also request data for multiple sites at the same time.
Example 2: Get data for multiple sites. Site numbers are specified as a list of strings.
[9]:
site_ids = ["434400121275801", "375907091432201"]
data2 = nwis.get_gwlevels(sites=site_ids)
print("Retrieved " + str(len(data2[0])) + " data values.")
display(data2[0])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'lev_tz_cd'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[9], line 2
1 site_ids = ["434400121275801", "375907091432201"]
----> 2 data2 = nwis.get_gwlevels(sites=site_ids)
3 print("Retrieved " + str(len(data2[0])) + " data values.")
4 display(data2[0])
KeyError: 'lev_tz_cd'
The following example is the same as the previous one, but with the multi-index turned off (multi_index=False).
[10]:
site_ids = ["434400121275801", "375907091432201"]
data2 = nwis.get_gwlevels(sites=site_ids, multi_index=False)
print("Retrieved " + str(len(data2[0])) + " data values.")
display(data2[0])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'lev_tz_cd'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[10], line 2
1 site_ids = ["434400121275801", "375907091432201"]
----> 2 data2 = nwis.get_gwlevels(sites=site_ids, multi_index=False)
3 print("Retrieved " + str(len(data2[0])) + " data values.")
4 display(data2[0])
KeyError: 'lev_tz_cd'
Some groundwater level data have dates that include only a year or a month and year, but no day.
Example 3: Retrieve groundwater level data that have dates without a day.
[11]:
data3 = nwis.get_gwlevels(sites="425957088141001")
print("Retrieved " + str(len(data3[0])) + " data values.")
# Print the date/time index values, which show up as NaT because
# the dates can't be converted to a date/time data type
print(data3[0].index)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'lev_tz_cd'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[11], line 1
----> 1 data3 = nwis.get_gwlevels(sites="425957088141001")
2 print("Retrieved " + str(len(data3[0])) + " data values.")
4 # Print the date/time index values, which show up as NaT because
5 # the dates can't be converted to a date/time data type
KeyError: 'lev_tz_cd'
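Dates that lack a day or a month, as in Example 3, cannot be turned into full timestamps without inventing the missing parts. One way to handle them, sketched below with only the standard library, is to record the precision of each date string alongside the parsed value. The function name `parse_partial_date` and the sample strings are illustrative assumptions, not values returned by NWIS, and the sketch assumes dates arrive as YYYY, YYYY-MM, or YYYY-MM-DD.

```python
from datetime import datetime

# Formats tried from most to least precise.
_FORMATS = (("%Y-%m-%d", "day"), ("%Y-%m", "month"), ("%Y", "year"))


def parse_partial_date(text):
    """Return (date, precision) for a date string, or (None, None) if unparseable.

    Missing month or day components default to 1, so the precision label
    should be kept with the date to avoid implying false accuracy.
    """
    for fmt, precision in _FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date(), precision
        except ValueError:
            continue
    return None, None


for sample in ("1987-06-15", "1987-06", "1987", "unknown"):
    print(sample, "->", parse_partial_date(sample))
```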
If you want to see the USGS RDB (delimited text) version of the data just retrieved, you can get the URL for the request that was sent to the USGS web service.
[12]:
# Print the URL used to retrieve the data
print("You can examine the data retrieved from NWIS at: " + data3[1].url)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[12], line 2
1 # Print the URL used to retrieve the data
----> 2 print("You can examine the data retrieved from NWIS at: " + data3[1].url)
NameError: name 'data3' is not defined
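The RDB text behind such a URL can also be read without pandas. An RDB file starts with comment lines beginning with `#`, followed by a tab-delimited row of column names, a row of column format codes, and then the data rows. The sketch below parses a small hand-written sample. The sample values are made up for illustration, and `read_rdb` is a hypothetical name (it is not the package's internal `_read_rdb`).

```python
import csv

# Made-up sample in RDB layout, not real NWIS output.
SAMPLE_RDB = (
    "# Illustrative header comment\n"
    "agency_cd\tsite_no\tlev_dt\tlev_va\n"
    "5s\t15s\t10d\t12s\n"
    "USGS\t434400121275801\t1990-05-17\t12.34\n"
    "USGS\t434400121275801\t1990-06\t12.80\n"
)


def read_rdb(text):
    """Parse RDB text into a list of dicts keyed by column name."""
    # Drop blank lines and the '#' comment header.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    next(reader)  # skip the column-format row (for example "5s", "15s")
    return [dict(zip(header, row)) for row in reader]


rows = read_rdb(SAMPLE_RDB)
print(len(rows), "rows; first water level:", rows[0]["lev_va"])
```

All values are returned as strings, so numeric columns such as lev_va still need to be converted before any arithmetic.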
You can also retrieve data for a site within a specified time window by specifying a start date and an end date.
Example 4: Get groundwater level data for a site between a start date and an end date.
[13]:
data4 = nwis.get_gwlevels(sites=site_id, start="1980-01-01", end="2000-12-31")
print("Retrieved " + str(len(data4[0])) + " data values.")
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File ~/.local/lib/python3.12/site-packages/pandas/core/indexes/base.py:3812, in Index.get_loc(self, key)
3811 try:
-> 3812 return self._engine.get_loc(casted_key)
3813 except KeyError as err:
File pandas/_libs/index.pyx:167, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/index.pyx:196, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7096, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'lev_tz_cd'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[13], line 1
----> 1 data4 = nwis.get_gwlevels(sites=site_id, start="1980-01-01", end="2000-12-31")
2 print("Retrieved " + str(len(data4[0])) + " data values.")
KeyError: 'lev_tz_cd'