Continuous Data

Continuous data are collected by automated sensors, typically at a fixed 15-minute interval (you may also hear them called “instantaneous values” or “IV”). They are described by parameter name and parameter code, and retrieved with get_continuous.

This notebook covers what matters when a continuous pull gets large. The dataretrieval package chunks big requests for you and can resume a pull that was interrupted partway through, which leaves one case you still handle yourself: the service’s 3-year-per-request time limit.

[1]:

import pandas as pd

from dataretrieval import waterdata

site = "USGS-0208458892"

What continuous data are available?

Filter the combined metadata to data_type="Continuous values" to see which time series a site offers and how far back each goes:

[2]:

continuous_available, _ = waterdata.get_combined_metadata(
    monitoring_location_id=site,
    data_type="Continuous values",
)
avail = continuous_available[["parameter_code", "parameter_name", "begin", "end"]]
avail.sort_values("parameter_code").reset_index(drop=True)

Retrieving: combined-metadata · 1 page · 8 rows
No API key detected — register for higher rate limits at https://api.waterdata.usgs.gov/signup/

[2]:

	parameter_code	parameter_name	begin	end
0	00010	Temperature, water	2012-09-21 08:00:00+00:00	2026-07-23 10:15:00+00:00
1	00062	Reservoir elevation	2013-10-01 09:00:00+00:00	2026-07-23 11:15:00+00:00
2	00095	Specific cond at 25C	2012-09-21 08:00:00+00:00	2026-07-23 10:15:00+00:00
3	00300	Dissolved oxygen	2012-09-21 08:00:00+00:00	2026-07-23 10:15:00+00:00
4	00400	pH	2012-09-21 08:00:00+00:00	2026-07-23 11:15:00+00:00
5	00480	Salinity	2012-09-21 08:00:00+00:00	2026-07-23 10:15:00+00:00
6	62615	Elevation, lake/res, NAVD88	2013-10-01 09:00:00+00:00	2026-07-23 10:15:00+00:00
7	63680	Turbidity, FNU	2015-10-02 08:00:00+00:00	2020-02-25 21:00:00+00:00
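The begin and end columns also tell you whether a series is longer than the 3-year-per-request limit mentioned in the introduction. The following is a minimal sketch using only the standard library, with two rows copied from the table above; span_years and LIMIT_YEARS are illustrative names, not part of dataretrieval, and the year length is approximated as 365.25 days.

```python
from datetime import datetime

# (parameter_code, begin, end) copied from two rows of the table above.
series = [
    ("00010", "2012-09-21 08:00:00+00:00", "2026-07-23 10:15:00+00:00"),
    ("63680", "2015-10-02 08:00:00+00:00", "2020-02-25 21:00:00+00:00"),
]

LIMIT_YEARS = 3  # per-request limit stated in this notebook


def span_years(begin, end):
    """Approximate record length in years, using 365.25-day years."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(begin)
    return delta.total_seconds() / (365.25 * 24 * 3600)


for code, begin, end in series:
    years = span_years(begin, end)
    # A record longer than the limit cannot be fetched in a single request.
    print(f"{code}: {years:.1f} years, needs time splitting: {years > LIMIT_YEARS}")
```

Both series here exceed 3 years, so a full-record pull of either one would need the time splitting described later in this notebook.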

Large requests are chunked for you

A long list of monitoring locations, several parameter codes, or a complex CQL filter can push a single request URL past the server’s ~8 KB limit. dataretrieval handles this automatically: it splits the query into URL-sized sub-requests, issues them, and recombines (and de-duplicates) the results into one frame. You never need to loop over sites or parameter codes yourself; request everything in one call.

For example, asking for several parameter codes at once just returns one combined long-format frame:

[3]:

multi, _ = waterdata.get_continuous(
    monitoring_location_id=site,
    parameter_code=["00095", "00010"],  # specific conductance + water temperature
    time="2024-07-01/2024-07-02",
)
multi.groupby("parameter_code")["value"].agg(["count", "min", "max"])

Retrieving: continuous · 1 page · 194 rows

[3]:

	count	min	max
parameter_code
00010	97	27.2	30.9
00095	97	954.0	975.0
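To make the idea of URL-sized sub-requests concrete, here is a minimal sketch of greedy splitting. It is not the library's implementation: split_for_url, base_len, and max_len are hypothetical names, the separator is assumed to cost one character, and the 8000-character budget simply stands in for the ~8 KB limit mentioned above.

```python
def split_for_url(values, base_len, max_len=8000):
    """Greedily group values so base_len plus each comma-joined group fits max_len.

    base_len stands in for the length of the rest of the URL (an assumption).
    A single value that exceeds the budget on its own still gets its own group.
    """
    groups, current, current_len = [], [], base_len
    for value in values:
        text = str(value)
        cost = len(text) + (1 if current else 0)  # +1 for the "," separator
        if current and current_len + cost > max_len:
            # The current group is full: close it and start a new one.
            groups.append(current)
            current, current_len = [], base_len
            cost = len(text)
        current.append(value)
        current_len += cost
    if current:
        groups.append(current)
    return groups


# 1,000 made-up site identifiers, used only to exercise the splitter.
sites = [f"USGS-{i:08d}" for i in range(1000)]
groups = split_for_url(sites, base_len=200)
print(len(groups), [len(g) for g in groups])
```

Each group would become one sub-request, and the per-group results would then be concatenated and de-duplicated, which is the part the library does for you.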

Resilient pulls: resume after an interruption

A large request becomes many sub-requests under the hood, so a long pull can be interrupted partway through by a rate limit (HTTP 429) or a transient server error (HTTP 5xx). Rather than discard the work already done, dataretrieval raises a ChunkInterrupted exception that preserves the completed sub-requests and lets you continue:

QuotaExhausted (429) and ServiceInterrupted (5xx) both subclass ChunkInterrupted.
exc.partial_frame holds whatever completed before the failure.
exc.retry_after is the server’s suggested wait (when provided).
exc.call.resume() re-issues only the still-pending sub-requests and returns the full (data, metadata).

The pattern below waits out the interruption and resumes until the pull finishes. (In normal conditions the request completes on the first try and the except block never runs.)

[4]:

import time

from dataretrieval import ChunkInterrupted

try:
    sensor_data, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time="2024-07-01/2024-07-08",
    )
except ChunkInterrupted as exc:
    print(
        f"interrupted after {exc.completed_chunks}/{exc.total_chunks} chunks; resuming"
    )
    while True:
        time.sleep(exc.retry_after or 5 * 60)  # honor Retry-After, else back off
        try:
            sensor_data, _ = exc.call.resume()
            break
        except ChunkInterrupted as again:
            exc = again

print(f"{len(sensor_data):,} rows")
sensor_data[["time", "parameter_code", "value", "approval_status"]].head()

Retrieving: continuous · 1 page · 673 rows

673 rows

[4]:

	time	parameter_code	value	approval_status
0	2024-07-01 00:00:00+00:00	00095	967	Approved
1	2024-07-01 00:15:00+00:00	00095	967	Approved
2	2024-07-01 00:30:00+00:00	00095	967	Approved
3	2024-07-01 00:45:00+00:00	00095	966	Approved
4	2024-07-01 01:00:00+00:00	00095	966	Approved

The 3-year window: the one axis you split yourself

There is one limit the library does not chunk for you: the continuous service returns at most 3 years of data per request, and a time window is a single interval rather than a list of values the library can split into sub-requests. (With no time argument, the service returns the latest year. Continuous data also has no geometry column and ignores bounding-box queries.)

So a multi-year, single-site pull is the one place you still split by time. The service is most efficient one calendar year at a time, so build a list of yearly windows:

[5]:

# Split [start, end] into per-calendar-year (start, end) date strings.
def year_chunks(start, end):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    edges = pd.to_datetime([f"{y}-01-01" for y in range(start.year + 1, end.year + 1)])
    starts = [start, *edges]
    ends = [*(edges - pd.Timedelta(days=1)), end]
    return [
        (s.strftime("%Y-%m-%d"), e.strftime("%Y-%m-%d")) for s, e in zip(starts, ends)
    ]


# Covering a full multi-year record (no data downloaded here):
pd.DataFrame(year_chunks("2012-10-01", "2025-09-30"), columns=["start", "end"])

[5]:

	start	end
0	2012-10-01	2012-12-31
1	2013-01-01	2013-12-31
2	2014-01-01	2014-12-31
3	2015-01-01	2015-12-31
4	2016-01-01	2016-12-31
5	2017-01-01	2017-12-31
6	2018-01-01	2018-12-31
7	2019-01-01	2019-12-31
8	2020-01-01	2020-12-31
9	2021-01-01	2021-12-31
10	2022-01-01	2022-12-31
11	2023-01-01	2023-12-31
12	2024-01-01	2024-12-31
13	2025-01-01	2025-09-30

Then request each window and concatenate. (We use a short two-window span here so the notebook runs quickly; widen the dates for a full period of record.)

[6]:

chunks = year_chunks("2023-10-01", "2024-03-31")

frames = []
for start, end in chunks:
    part, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time=f"{start}/{end}",
    )
    frames.append(part)

por = pd.concat(frames, ignore_index=True)
print(
    f"{len(por):,} rows from {len(chunks)} windows, "
    f"{por['time'].min()} -> {por['time'].max()}"
)

Retrieving: continuous · 1 page · 8,734 rows
Retrieving: continuous · 1 page · 8,637 rows

17,371 rows from 2 windows, 2023-10-01 00:00:00+00:00 -> 2024-03-31 00:00:00+00:00
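One detail is worth checking before a long pull. The row counts in the outputs above (97 rows per parameter for a one-day span and 673 for a seven-day span, at 96 readings per day) and the final timestamp of 2024-03-31 00:00:00 suggest that a date-only end bound is read as midnight at the start of that day. If so, windows that end on December 31 would omit most of that day. The sketch below is one way to avoid such gaps under that assumption: adjacent windows share their January 1 boundary, so nothing falls between them. year_windows_shared_edges is a hypothetical helper, not part of dataretrieval.

```python
from datetime import date


def year_windows_shared_edges(start, end):
    """Split [start, end] at each January 1; adjacent windows share that date.

    Assumes a date-only bound means midnight at the start of that day, as the
    outputs in this notebook suggest. This is an inference, not documented here.
    """
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    # Interior January 1 boundaries; one equal to `last` would add an empty window.
    edges = [
        date(year, 1, 1)
        for year in range(first.year + 1, last.year + 1)
        if date(year, 1, 1) < last
    ]
    bounds = [first, *edges, last]
    return [(a.isoformat(), b.isoformat()) for a, b in zip(bounds[:-1], bounds[1:])]


print(year_windows_shared_edges("2023-10-01", "2024-03-31"))
```

Because adjacent windows both include the shared midnight reading, drop duplicates after concatenating, for example with drop_duplicates on the columns that identify a reading (time and parameter_code in the single-site frame shown here). Under the same assumption the final window still stops at midnight on the end date, so pass the following day if you need that last day in full.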

Wrap each window’s call in the resume pattern above for an unattended, restart-safe pull. USGS also expects to offer a direct full-period-of-record download before the legacy NWIS services are decommissioned, which may make time-window splitting unnecessary — check the documentation for updates.

More help

Documentation: https://doi-usgs.github.io/dataretrieval-python/
Chunking and resume internals: dataretrieval.ogc.chunking
Issues / questions: https://github.com/DOI-USGS/dataretrieval-python/issues
Equivalent R article: Continuous Data