Continuous Data

Continuous data are collected by automated sensors, typically at a fixed 15-minute interval (you may also hear them called “instantaneous values” or “IV”). They are described by parameter name and parameter code, and retrieved with get_continuous.

This notebook covers the two things that matter when a continuous pull gets large. First, dataretrieval chunks big requests for you and can resume a pull that was interrupted partway through. Second, there is one case you still handle yourself: the service’s 3-year-per-request time limit.

[1]:
import pandas as pd

from dataretrieval import waterdata

site = "USGS-0208458892"

What continuous data are available?

Filter the combined metadata to data_type="Continuous values" to see which time series a site offers and how far back each goes:

[2]:
continuous_available, _ = waterdata.get_combined_metadata(
    monitoring_location_id=site,
    data_type="Continuous values",
)
avail = continuous_available[["parameter_code", "parameter_name", "begin", "end"]]
avail.sort_values("parameter_code").reset_index(drop=True)
Retrieving: combined-metadata · 1 page · 8 rows
No API key detected — register for higher rate limits at https://api.waterdata.usgs.gov/signup/
[2]:
   parameter_code  parameter_name               begin                      end
0  00010           Temperature, water           2012-09-21 08:00:00+00:00  2026-06-15 12:15:00+00:00
1  00062           Reservoir elevation          2013-10-01 09:00:00+00:00  2026-06-15 14:15:00+00:00
2  00095           Specific cond at 25C         2012-09-21 08:00:00+00:00  2026-06-15 12:15:00+00:00
3  00300           Dissolved oxygen             2012-09-21 08:00:00+00:00  2026-06-15 12:15:00+00:00
4  00400           pH                           2012-09-21 08:00:00+00:00  2026-06-15 14:15:00+00:00
5  00480           Salinity                     2012-09-21 08:00:00+00:00  2026-06-15 12:15:00+00:00
6  62615           Elevation, lake/res, NAVD88  2013-10-01 09:00:00+00:00  2026-06-15 12:15:00+00:00
7  63680           Turbidity, FNU               2015-10-02 08:00:00+00:00  2020-02-25 21:00:00+00:00
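
The begin and end columns are enough to judge, before downloading anything, how long each record is and how many calendar years a full pull would touch. The sketch below is only an illustration: it rebuilds two rows copied from the output above so that it runs offline, and record_years and calendar_years are column names introduced here, not fields returned by the service.

```python
import pandas as pd

# Two rows copied from the metadata output above (no network access needed).
avail = pd.DataFrame(
    {
        "parameter_code": ["00095", "63680"],
        "begin": pd.to_datetime(
            ["2012-09-21 08:00:00+00:00", "2015-10-02 08:00:00+00:00"]
        ),
        "end": pd.to_datetime(
            ["2026-06-15 12:15:00+00:00", "2020-02-25 21:00:00+00:00"]
        ),
    }
)

span = avail.assign(
    # Approximate record length in years (365.25-day years, one decimal).
    record_years=((avail["end"] - avail["begin"]).dt.days / 365.25).round(1),
    # Number of distinct calendar years the record touches.
    calendar_years=avail["end"].dt.year - avail["begin"].dt.year + 1,
)
print(span[["parameter_code", "record_years", "calendar_years"]])
```

Any record longer than the per-request limit discussed later in this notebook needs to be split by time before it can be pulled in full.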

Large requests are chunked for you

Any list-valued argument (a long list of monitoring locations, several parameter codes, a complex CQL filter) can push a single request URL past the server’s ~8 KB limit. dataretrieval handles this automatically: it splits the query into URL-sized sub-requests, issues them, and recombines (and de-duplicates) the results into one frame. You never need to loop over sites yourself; request everything in one call.

For example, asking for several parameter codes at once just returns one combined long-format frame:

[3]:
multi, _ = waterdata.get_continuous(
    monitoring_location_id=site,
    parameter_code=["00095", "00010"],  # specific conductance + water temperature
    time="2024-07-01/2024-07-02",
)
multi.groupby("parameter_code")["value"].agg(["count", "min", "max"])
Retrieving: continuous · 1 page · 194 rows
[3]:
                count  min    max
parameter_code
00010           97     27.2   30.9
00095           97     954.0  975.0
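
To make the chunking idea concrete, here is a minimal sketch of one way a long list of values could be packed into sub-requests that each stay under a byte budget. It is an illustration only and not dataretrieval's actual algorithm: pack_values is a hypothetical helper, the site identifiers are made up, the 200-byte budget is deliberately tiny, and real URLs would also need percent-encoding, which is ignored here.

```python
def pack_values(values, budget_bytes, base_len=0):
    """Greedily group values so each comma-joined group fits the budget.

    Hypothetical helper for illustration; not part of dataretrieval.
    base_len stands for the rest of the URL (path, other parameters).
    """
    groups, current, size = [], [], base_len
    for value in values:
        n = len(value.encode("utf-8"))
        # Every value after the first in a group also costs one comma.
        cost = n + (1 if current else 0)
        if current and size + cost > budget_bytes:
            groups.append(current)  # close the full group
            current, size, cost = [], base_len, n
        current.append(value)
        size += cost
    if current:
        groups.append(current)
    return groups


# 50 made-up site identifiers packed under a tiny budget.
sites = [f"USGS-{n:010d}" for n in range(50)]
groups = pack_values(sites, budget_bytes=200, base_len=80)
print(len(groups), "sub-requests with sizes", [len(g) for g in groups])
```

Each group would become one sub-request, and concatenating the per-group results (then dropping duplicates) gives back a single frame, which is the behavior described above.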

Resilient pulls: resume after an interruption

A large request becomes many sub-requests under the hood, so a long pull can be interrupted partway through by a rate limit (HTTP 429) or a transient server error (HTTP 5xx). Rather than discard the work already done, dataretrieval raises a ChunkInterrupted that preserves the completed sub-requests and lets you continue:

  • QuotaExhausted (429) and ServiceInterrupted (5xx) both subclass ChunkInterrupted.

  • exc.partial_frame holds whatever completed before the failure.

  • exc.retry_after is the server’s suggested wait (when provided).

  • exc.call.resume() re-issues only the still-pending sub-requests and returns the full (data, metadata).

The pattern below waits out the interruption and resumes until the pull finishes. (In normal conditions the request completes on the first try and the except block never runs.)

[4]:
import time

from dataretrieval.waterdata.chunking import ChunkInterrupted

try:
    sensor_data, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time="2024-07-01/2024-07-08",
    )
except ChunkInterrupted as exc:
    print(
        f"interrupted after {exc.completed_chunks}/{exc.total_chunks} chunks; resuming"
    )
    while True:
        time.sleep(exc.retry_after or 5 * 60)  # honor Retry-After, else back off
        try:
            sensor_data, _ = exc.call.resume()
            break
        except ChunkInterrupted as again:
            exc = again

print(f"{len(sensor_data):,} rows")
sensor_data[["time", "parameter_code", "value", "approval_status"]].head()
Retrieving: continuous · 1 page · 673 rows
673 rows
[4]:
   time                       parameter_code  value  approval_status
0  2024-07-01 00:00:00+00:00  00095           967    Approved
1  2024-07-01 00:15:00+00:00  00095           967    Approved
2  2024-07-01 00:30:00+00:00  00095           967    Approved
3  2024-07-01 00:45:00+00:00  00095           966    Approved
4  2024-07-01 01:00:00+00:00  00095           966    Approved
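
The loop in the cell above retries forever, which is fine interactively but risky in an unattended job. As a sketch of one alternative, the helper below caps the number of resumes and re-raises the last interruption once the cap is reached. resume_until_done is a hypothetical name, and FakeInterrupted and FlakyCall are stand-ins so the example runs offline; the only attributes assumed on the real exception are retry_after and call.resume(), as described in the list above.

```python
import time


def resume_until_done(fetch, interrupted, max_resumes=5, default_wait=300,
                      sleep=time.sleep):
    """Call fetch(); on an interruption, wait and resume up to max_resumes times.

    Hypothetical helper. `interrupted` is the exception class to catch
    (ChunkInterrupted in the cell above).
    """
    try:
        return fetch()
    except interrupted as exc:
        pending = exc
    for _ in range(max_resumes):
        # Honor the server's suggested wait when present, else back off.
        sleep(pending.retry_after or default_wait)
        try:
            return pending.call.resume()
        except interrupted as again:
            pending = again
    raise pending  # still interrupted after max_resumes attempts


class FakeInterrupted(Exception):
    """Stand-in for ChunkInterrupted so the sketch runs without a network."""

    def __init__(self, call, retry_after=None):
        super().__init__("interrupted")
        self.call, self.retry_after = call, retry_after


class FlakyCall:
    """Fails a set number of times, then returns a (data, metadata) pair."""

    def __init__(self, failures):
        self.failures = failures

    def resume(self):
        if self.failures > 0:
            self.failures -= 1
            raise FakeInterrupted(self, retry_after=1)
        return ["row"], {"status": "ok"}


waits = []  # record the requested waits instead of actually sleeping
data, meta = resume_until_done(
    FlakyCall(failures=2).resume, FakeInterrupted, sleep=waits.append
)
print(data, meta, waits)
```

With the real library, fetch would be a zero-argument function wrapping the get_continuous call and interrupted would be ChunkInterrupted.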

The 3-year window: the one axis you split yourself

There is one limit the library does not chunk for you: the continuous service returns at most 3 years of data per request, and a time window is not a list-shaped axis it can fan out. (With no time argument the service returns the latest year; continuous data also has no geometry column and ignores bounding-box queries.)

So a multi-year, single-site pull is the one place you still split by time. The service is most efficient one calendar year at a time, so build a list of yearly windows:

[5]:
# Split [start, end] into per-calendar-year (start, end) date strings.
def year_chunks(start, end):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    edges = pd.to_datetime([f"{y}-01-01" for y in range(start.year + 1, end.year + 1)])
    starts = [start, *edges]
    ends = [*(edges - pd.Timedelta(days=1)), end]
    return [
        (s.strftime("%Y-%m-%d"), e.strftime("%Y-%m-%d")) for s, e in zip(starts, ends)
    ]


# Covering a full multi-year record (no data downloaded here):
pd.DataFrame(year_chunks("2012-10-01", "2025-09-30"), columns=["start", "end"])
[5]:
    start       end
0   2012-10-01  2012-12-31
1   2013-01-01  2013-12-31
2   2014-01-01  2014-12-31
3   2015-01-01  2015-12-31
4   2016-01-01  2016-12-31
5   2017-01-01  2017-12-31
6   2018-01-01  2018-12-31
7   2019-01-01  2019-12-31
8   2020-01-01  2020-12-31
9   2021-01-01  2021-12-31
10  2022-01-01  2022-12-31
11  2023-01-01  2023-12-31
12  2024-01-01  2024-12-31
13  2025-01-01  2025-09-30

Then request each window and concatenate. (We use a short two-window span here so the notebook runs quickly; widen the dates for a full period of record.)

[6]:
chunks = year_chunks("2023-10-01", "2024-03-31")

frames = []
for start, end in chunks:
    part, _ = waterdata.get_continuous(
        monitoring_location_id=site,
        parameter_code="00095",
        time=f"{start}/{end}",
    )
    frames.append(part)

por = pd.concat(frames, ignore_index=True)
print(
    f"{len(por):,} rows from {len(chunks)} windows, "
    f"{por['time'].min()} -> {por['time'].max()}"
)
Retrieving: continuous · 1 page · 8,734 rows
Retrieving: continuous · 1 page · 8,637 rows
17,371 rows from 2 windows, 2023-10-01 00:00:00+00:00 -> 2024-03-31 00:00:00+00:00
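
One detail in the output above deserves a second look: the last timestamp is 2024-03-31 00:00:00, which suggests that a date-only bound resolves to midnight at the start of that day. If that reading is right, a window ending 2023-12-31 followed by one starting 2024-01-01 leaves most of December 31 unrequested. A minimal sketch of one workaround, assuming that midnight interpretation, is to let consecutive windows share their January 1 edge and drop the duplicated boundary row afterwards. year_windows and combine are hypothetical names, and the two small frames are made-up stand-ins for get_continuous results.

```python
import pandas as pd


def year_windows(start, end):
    """Split [start, end] at each Jan 1, reusing the boundary as the next start.

    Hypothetical variant of year_chunks: consecutive windows share an edge,
    so no part of Dec 31 is skipped if date-only bounds mean midnight.
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    edges = [
        pd.Timestamp(year=y, month=1, day=1)
        for y in range(start.year + 1, end.year + 1)
    ]
    bounds = [start, *edges, end]
    return [
        (a.strftime("%Y-%m-%d"), b.strftime("%Y-%m-%d"))
        for a, b in zip(bounds[:-1], bounds[1:])
        if a < b  # skip an empty final window when end falls on Jan 1
    ]


def combine(frames):
    """Concatenate windows and drop the row repeated at each shared edge.

    Assumes a single site, so (time, parameter_code) identifies a row.
    """
    out = pd.concat(frames, ignore_index=True)
    return out.drop_duplicates(subset=["time", "parameter_code"]).reset_index(drop=True)


print(year_windows("2023-10-01", "2024-03-31"))

# Made-up frames that overlap at the shared 2024-01-01 00:00 boundary.
a = pd.DataFrame({
    "time": pd.to_datetime(["2023-12-31 23:45", "2024-01-01 00:00"], utc=True),
    "parameter_code": "00095",
    "value": [967.0, 966.0],
})
b = pd.DataFrame({
    "time": pd.to_datetime(["2024-01-01 00:00", "2024-01-01 00:15"], utc=True),
    "parameter_code": "00095",
    "value": [966.0, 965.0],
})
merged = combine([a, b])
print(len(merged), "rows after de-duplication")
```

Under the same assumption, covering the whole of the final day means passing the day after it as the end bound.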

Wrap each window’s call in the resume pattern above for an unattended, restart-safe pull. USGS also expects to offer a direct full-period-of-record download before the legacy NWIS services are decommissioned, which may make time-window splitting unnecessary; check the documentation for updates.
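
Resuming protects a single call; surviving a crashed or restarted process also needs the finished windows to be kept somewhere. One possible approach, sketched here under stated assumptions, is to save each window to disk as soon as it completes and skip windows whose file already exists. pull_windows, the file naming, and the choice of pickle files are all hypothetical, and fake_fetch stands in for a function that would wrap get_continuous (and the resume pattern) for one window.

```python
import tempfile
from pathlib import Path

import pandas as pd


def pull_windows(windows, fetch, cache_dir):
    """Fetch each (start, end) window once, caching results on disk.

    Hypothetical helper: fetch(start, end) must return a DataFrame.
    Re-running after a crash skips windows whose file already exists.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    frames = []
    for start, end in windows:
        path = cache / f"{start}_{end}.pkl"
        if not path.exists():
            partial = path.with_suffix(".tmp")
            fetch(start, end).to_pickle(partial)
            partial.replace(path)  # rename only after a complete write
        frames.append(pd.read_pickle(path))
    return pd.concat(frames, ignore_index=True)


calls = []  # records which windows were actually fetched


def fake_fetch(start, end):
    """Made-up stand-in for a per-window get_continuous call."""
    calls.append((start, end))
    return pd.DataFrame(
        {"time": pd.to_datetime([start, end], utc=True), "value": [1.0, 2.0]}
    )


windows = [("2023-10-01", "2023-12-31"), ("2024-01-01", "2024-03-31")]
with tempfile.TemporaryDirectory() as workdir:
    first = pull_windows(windows, fake_fetch, workdir)
    second = pull_windows(windows, fake_fetch, workdir)  # served from cache

print(len(calls), "fetches for two passes over", len(windows), "windows")
```

Writing to a temporary name and renaming afterwards means an interrupted write never leaves a file that looks complete.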

More help