Continuous Data
Continuous data are collected by automated sensors, typically at a fixed 15-minute interval (you may also hear them called “instantaneous values” or “IV”). They are described by parameter name and parameter code, and retrieved with get_continuous.
This notebook covers what matters when a continuous pull gets large. dataretrieval chunks big requests for you and can resume a pull that was interrupted partway through; the one case you still handle yourself is the service’s 3-year-per-request time limit.
[1]:
import pandas as pd
from dataretrieval import waterdata
site = "USGS-0208458892"
What continuous data are available?
Filter the combined metadata to data_type="Continuous values" to see which time series a site offers and how far back each goes:
[2]:
continuous_available, _ = waterdata.get_combined_metadata(
    monitoring_location_id=site,
    data_type="Continuous values",
)
avail = continuous_available[["parameter_code", "parameter_name", "begin", "end"]]
avail.sort_values("parameter_code").reset_index(drop=True)
Retrieving: combined-metadata · 1 page · 8 rows
No API key detected — register for higher rate limits at https://api.waterdata.usgs.gov/signup/
[2]:
| | parameter_code | parameter_name | begin | end |
|---|---|---|---|---|
| 0 | 00010 | Temperature, water | 2012-09-21 08:00:00+00:00 | 2026-06-15 12:15:00+00:00 |
| 1 | 00062 | Reservoir elevation | 2013-10-01 09:00:00+00:00 | 2026-06-15 14:15:00+00:00 |
| 2 | 00095 | Specific cond at 25C | 2012-09-21 08:00:00+00:00 | 2026-06-15 12:15:00+00:00 |
| 3 | 00300 | Dissolved oxygen | 2012-09-21 08:00:00+00:00 | 2026-06-15 12:15:00+00:00 |
| 4 | 00400 | pH | 2012-09-21 08:00:00+00:00 | 2026-06-15 14:15:00+00:00 |
| 5 | 00480 | Salinity | 2012-09-21 08:00:00+00:00 | 2026-06-15 12:15:00+00:00 |
| 6 | 62615 | Elevation, lake/res, NAVD88 | 2013-10-01 09:00:00+00:00 | 2026-06-15 12:15:00+00:00 |
| 7 | 63680 | Turbidity, FNU | 2015-10-02 08:00:00+00:00 | 2020-02-25 21:00:00+00:00 |
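The begin and end columns also hint at how much time-splitting a full-record pull would need, given the 3-year-per-request limit mentioned in the introduction. A minimal sketch, using two rows copied from the table above rather than a live request; the column names span_years and yearly_windows are illustrative and not part of dataretrieval:

```python
import pandas as pd

# Two rows copied from the availability table above (no network request).
avail = pd.DataFrame(
    {
        "parameter_code": ["00095", "63680"],
        "begin": pd.to_datetime(
            ["2012-09-21 08:00:00+00:00", "2015-10-02 08:00:00+00:00"]
        ),
        "end": pd.to_datetime(
            ["2026-06-15 12:15:00+00:00", "2020-02-25 21:00:00+00:00"]
        ),
    }
)

# Approximate record length in years (illustrative column name).
avail["span_years"] = ((avail["end"] - avail["begin"]).dt.days / 365.25).round(1)

# Number of calendar years each record touches, i.e. how many
# one-calendar-year requests a full pull would take (illustrative column name).
avail["yearly_windows"] = avail["end"].dt.year - avail["begin"].dt.year + 1

print(avail[["parameter_code", "span_years", "yearly_windows"]])
```

Any series whose span exceeds three years cannot be fetched in a single request and needs the time-window splitting covered later in this notebook.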
Large requests are chunked for you
Any list-valued argument — a long list of monitoring locations, several parameter codes, a complex CQL filter — can push a single request URL past the server’s ~8 KB limit. dataretrieval handles this automatically: it splits the query into URL-sized sub-requests, issues them, and recombines (and de-duplicates) the results into one frame. You never need to loop over sites yourself — request everything in one call.
For example, asking for several parameter codes at once just returns one combined long-format frame:
[3]:
multi, _ = waterdata.get_continuous(
    monitoring_location_id=site,
    parameter_code=["00095", "00010"],  # specific conductance + water temperature
    time="2024-07-01/2024-07-02",
)
multi.groupby("parameter_code")["value"].agg(["count", "min", "max"])
Retrieving: continuous · 1 page · 194 rows
[3]:
| parameter_code | count | min | max |
|---|---|---|---|
| 00010 | 97 | 27.2 | 30.9 |
| 00095 | 97 | 954.0 | 975.0 |
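To make the idea of URL-sized sub-requests concrete, here is a minimal sketch of greedy splitting by URL length. It is not dataretrieval's implementation: the endpoint, the comma-joined encoding of list values, the 8000-byte budget, and the helper names build_url and split_for_url are all assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only to measure URL length.
BASE_URL = "https://example.invalid/collections/continuous/items"


def build_url(param, values, **fixed):
    """Build a query URL with one comma-joined list parameter (assumed encoding)."""
    return BASE_URL + "?" + urlencode({**fixed, param: ",".join(values)})


def split_for_url(param, values, max_bytes=8000, **fixed):
    """Greedily pack values into groups whose URL stays within max_bytes.

    A single value that is too long on its own still gets its own group.
    """
    groups, current = [], []
    for value in values:
        candidate = current + [value]
        too_long = len(build_url(param, candidate, **fixed).encode()) > max_bytes
        if current and too_long:
            groups.append(current)  # close the full group
            current = [value]       # start a new one with this value
        else:
            current = candidate
    if current:
        groups.append(current)
    return groups


# 2,000 made-up site identifiers, far too many for one URL.
sites = [f"USGS-{i:08d}" for i in range(2000)]
groups = split_for_url(
    "monitoring_location_id", sites, parameter_code="00095", time="2024-07-01/2024-07-02"
)
print(f"{len(sites)} sites -> {len(groups)} sub-requests")
```

Each group would become one sub-request, and the per-group results would then be concatenated and de-duplicated, which is the step the library performs for you.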
Resilient pulls: resume after an interruption
A large request becomes many sub-requests under the hood, so a long pull can be interrupted partway through by a rate limit (HTTP 429) or a transient server error (HTTP 5xx). Rather than discard the work already done, dataretrieval raises a ChunkInterrupted that preserves the completed sub-requests and lets you continue:
QuotaExhausted (429) and ServiceInterrupted (5xx) both subclass ChunkInterrupted.
exc.partial_frame holds whatever completed before the failure.
exc.retry_after is the server’s suggested wait (when provided).
exc.call.resume() re-issues only the still-pending sub-requests and returns the full (data, metadata).
The pattern below waits out the interruption and resumes until the pull finishes. (In normal conditions the request completes on the first try and the except block never runs.)
[4]:
import time
from dataretrieval.waterdata.chunking import ChunkInterrupted
try:
    sensor_data, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time="2024-07-01/2024-07-08",
    )
except ChunkInterrupted as exc:
    print(
        f"interrupted after {exc.completed_chunks}/{exc.total_chunks} chunks; resuming"
    )
    while True:
        time.sleep(exc.retry_after or 5 * 60)  # honor Retry-After, else back off
        try:
            sensor_data, _ = exc.call.resume()
            break
        except ChunkInterrupted as again:
            exc = again
print(f"{len(sensor_data):,} rows")
sensor_data[["time", "parameter_code", "value", "approval_status"]].head()
Retrieving: continuous · 1 page · 673 rows
673 rows
[4]:
| | time | parameter_code | value | approval_status |
|---|---|---|---|---|
| 0 | 2024-07-01 00:00:00+00:00 | 00095 | 967 | Approved |
| 1 | 2024-07-01 00:15:00+00:00 | 00095 | 967 | Approved |
| 2 | 2024-07-01 00:30:00+00:00 | 00095 | 967 | Approved |
| 3 | 2024-07-01 00:45:00+00:00 | 00095 | 966 | Approved |
| 4 | 2024-07-01 01:00:00+00:00 | 00095 | 966 | Approved |
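The loop above retries indefinitely. For unattended jobs it can help to cap the number of resume attempts. The sketch below wraps the same pattern in a reusable helper; resume_until_done is a hypothetical name, and the helper relies only on the retry_after attribute and call.resume() method described above. So that it runs without network access, it is exercised here with stand-in classes (FakeInterrupted, FakeCall) rather than the real ChunkInterrupted.

```python
import time


def resume_until_done(first_call, interrupted_type, max_attempts=5,
                      default_wait=300, sleep=time.sleep):
    """Run first_call(); on interruption, wait and resume up to max_attempts times."""
    try:
        return first_call()
    except interrupted_type as exc:
        pending = exc
    for _ in range(max_attempts):
        # Honor the server's suggested wait when present, else back off.
        sleep(pending.retry_after or default_wait)
        try:
            return pending.call.resume()
        except interrupted_type as again:
            pending = again
    raise pending  # give up and surface the last interruption


# --- Stand-ins for demonstration only (not dataretrieval classes) ---
class FakeInterrupted(Exception):
    def __init__(self, call, retry_after=None):
        super().__init__("interrupted")
        self.call = call
        self.retry_after = retry_after


class FakeCall:
    def __init__(self, failures_left):
        self.failures_left = failures_left

    def resume(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise FakeInterrupted(self, retry_after=2)
        return "data", "metadata"


def first_call():
    # Fails once with no Retry-After; its resume() fails once more, then succeeds.
    raise FakeInterrupted(FakeCall(failures_left=1))


waits = []  # record the waits instead of actually sleeping
result = resume_until_done(first_call, FakeInterrupted, sleep=waits.append)
print(result, waits)
```

With the real library, first_call would be a zero-argument function (for example a lambda) around waterdata.get_continuous, and interrupted_type would be ChunkInterrupted.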
The 3-year window: the one axis you split yourself
There is one limit the library does not chunk for you: the continuous service returns at most 3 years of data per request, and a time window is not a list-shaped axis it can fan out. (With no time argument the service returns the latest year; continuous data also have no geometry column, and bounding-box queries are ignored.)
So a multi-year, single-site pull is the one place you still split by time. The service is most efficient one calendar year at a time, so build a list of yearly windows:
[5]:
# Split [start, end] into per-calendar-year (start, end) date strings.
def year_chunks(start, end):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    edges = pd.to_datetime([f"{y}-01-01" for y in range(start.year + 1, end.year + 1)])
    starts = [start, *edges]
    ends = [*(edges - pd.Timedelta(days=1)), end]
    return [
        (s.strftime("%Y-%m-%d"), e.strftime("%Y-%m-%d")) for s, e in zip(starts, ends)
    ]

# Covering a full multi-year record (no data downloaded here):
pd.DataFrame(year_chunks("2012-10-01", "2025-09-30"), columns=["start", "end"])
[5]:
| | start | end |
|---|---|---|
| 0 | 2012-10-01 | 2012-12-31 |
| 1 | 2013-01-01 | 2013-12-31 |
| 2 | 2014-01-01 | 2014-12-31 |
| 3 | 2015-01-01 | 2015-12-31 |
| 4 | 2016-01-01 | 2016-12-31 |
| 5 | 2017-01-01 | 2017-12-31 |
| 6 | 2018-01-01 | 2018-12-31 |
| 7 | 2019-01-01 | 2019-12-31 |
| 8 | 2020-01-01 | 2020-12-31 |
| 9 | 2021-01-01 | 2021-12-31 |
| 10 | 2022-01-01 | 2022-12-31 |
| 11 | 2023-01-01 | 2023-12-31 |
| 12 | 2024-01-01 | 2024-12-31 |
| 13 | 2025-01-01 | 2025-09-30 |
Then request each window and concatenate. (We use a short two-window span here so the notebook runs quickly; widen the dates for a full period of record.)
[6]:
chunks = year_chunks("2023-10-01", "2024-03-31")
frames = []
for start, end in chunks:
    part, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time=f"{start}/{end}",
    )
    frames.append(part)
por = pd.concat(frames, ignore_index=True)
print(
    f"{len(por):,} rows from {len(chunks)} windows, "
    f"{por['time'].min()} -> {por['time'].max()}"
)
Retrieving: continuous · 1 page · 8,734 rows
Retrieving: continuous · 1 page · 8,637 rows
17,371 rows from 2 windows, 2023-10-01 00:00:00+00:00 -> 2024-03-31 00:00:00+00:00
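The outputs in this notebook suggest that a date-only end bound resolves to midnight at the start of that day: a one-day span returned 97 rows per parameter, and the maximum time above is 2024-03-31 00:00:00. If that reading is right, a window ending on December 31 would omit most of that day's readings. A minimal sketch, under that assumption, of windows that share their January 1 boundary and drop the duplicated boundary row when concatenating. The helper names shared_edge_windows and concat_windows are hypothetical, and the frames are synthetic stand-ins for get_continuous results:

```python
import pandas as pd


def shared_edge_windows(start, end):
    """Split [start, end] at each January 1, using that date as both an end and a start."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    edges = [
        pd.Timestamp(year=y, month=1, day=1)
        for y in range(start.year + 1, end.year + 1)
    ]
    points = [start, *(e for e in edges if start < e < end), end]
    return [
        (a.strftime("%Y-%m-%d"), b.strftime("%Y-%m-%d"))
        for a, b in zip(points, points[1:])
    ]


def concat_windows(frames, keys=("time", "parameter_code")):
    """Concatenate window results and drop rows repeated at shared boundaries."""
    combined = pd.concat(frames, ignore_index=True)
    return combined.drop_duplicates(subset=list(keys)).reset_index(drop=True)


windows = shared_edge_windows("2023-10-01", "2024-03-31")
print(windows)

# Synthetic frames: the 2024-01-01 00:00 reading appears in both windows.
first = pd.DataFrame(
    {
        "time": pd.to_datetime(["2023-12-31 23:45", "2024-01-01 00:00"], utc=True),
        "parameter_code": "00095",
        "value": [960.0, 961.0],
    }
)
second = pd.DataFrame(
    {
        "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:15"], utc=True),
        "parameter_code": "00095",
        "value": [961.0, 962.0],
    }
)
combined = concat_windows([first, second])
print(len(combined), "rows after de-duplication")
```

For a multi-site pull, the de-duplication keys would also need a site identifier column. It is worth checking row counts around a year boundary on real data to confirm how the service treats the end bound before relying on either approach.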
Wrap each window’s call in the resume pattern above for an unattended, restart-safe pull. USGS also expects to offer a direct full-period-of-record download before the legacy NWIS services are decommissioned, which may make time-window splitting unnecessary — check the documentation for updates.
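One way to make such a pull restart-safe is to save each finished window to disk and skip it on the next run. The sketch below assumes a hypothetical fetch(start, end) callable that returns a DataFrame (in practice, a wrapper around get_continuous plus the resume pattern). The helper name pull_windows and the pickle cache layout are illustrative choices, and a fake fetch stands in for the service:

```python
import tempfile
from pathlib import Path

import pandas as pd


def pull_windows(windows, fetch, cache_dir):
    """Fetch each (start, end) window once, caching results on disk between runs."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    frames = []
    for start, end in windows:
        path = cache_dir / f"{start}_{end}.pkl"
        if not path.exists():
            # Write to a temporary name first so a crash never leaves
            # a half-written file that looks like a finished window.
            tmp = path.with_suffix(".tmp")
            fetch(start, end).to_pickle(tmp)
            tmp.replace(path)
        frames.append(pd.read_pickle(path))
    return pd.concat(frames, ignore_index=True)


# Fake fetch for demonstration: records its calls and returns two rows per window.
calls = []


def fake_fetch(start, end):
    calls.append((start, end))
    return pd.DataFrame(
        {"time": pd.to_datetime([start, end], utc=True), "value": [1.0, 2.0]}
    )


windows = [("2023-10-01", "2023-12-31"), ("2024-01-01", "2024-03-31")]
with tempfile.TemporaryDirectory() as cache:
    first_run = pull_windows(windows, fake_fetch, cache)
    second_run = pull_windows(windows, fake_fetch, cache)  # served from the cache

print(f"{len(calls)} fetches for 2 runs over {len(windows)} windows")
```

Pickle is used here only because it preserves dtypes without extra dependencies; any format that round-trips the frame would serve.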
More help
Documentation: https://doi-usgs.github.io/dataretrieval-python/
Chunking and resume internals: dataretrieval.waterdata.chunking
Issues / questions: https://github.com/DOI-USGS/dataretrieval-python/issues
Equivalent R article: Continuous Data