dataretrieval.exceptions

Exception taxonomy for dataretrieval.

Every service module (nwis, wqp, nldi, waterdata, streamstats) raises a subclass of DataRetrievalError when a request fails, so one except dataretrieval.DataRetrievalError catches them all – including connection-level failures (timeouts, DNS, refused connections), which are wrapped as NetworkError with the underlying httpx exception on __cause__.

Most failures are an HTTPError carrying the response .status_code, of which TransientError (429 / 5xx) is the retryable subset. The remaining types are not plain status errors: RequestTooLarge (with subclasses URLTooLong and Unchunkable), NetworkError (a failed connection, as described above), and NoSitesError. error_for_status() maps an HTTP error status to its type.

This module has no third-party runtime dependencies – httpx is imported only for type checking – so any module can import it without pulling in pandas / httpx and without risking an import cycle.

exception dataretrieval.exceptions.DataRetrievalError[source]

Bases: Exception

Base class for every failed-request error in dataretrieval.

Catch it to handle any USGS or EPA service failure uniformly, and branch on the read-anywhere fields below without needing the concrete subclass:

import time

import dataretrieval

backoff = 30  # fallback wait in seconds when the server gives no hint

try:
    df, md = dataretrieval.waterdata.get_daily(...)
except dataretrieval.DataRetrievalError as e:
    if e.retryable:  # 429 / 5xx / connection failure
        time.sleep(e.retry_after or backoff)
        ...  # re-issue the request
    elif e.status_code == 404:  # ``None`` unless an HTTP status error
        ...
    else:
        raise

Connection-level failures (timeouts, DNS) are wrapped as NetworkError, so this single clause covers them too.

retry_after: float | None = None: Seconds the server asked us to wait before retrying (its Retry-After header), or None when it gave no hint. Set by TransientError.

retryable: ClassVar[bool] = False: Whether re-issuing the same request might succeed – True for the transient HTTP statuses (429 / 5xx, TransientError) and for connection failures (NetworkError); False otherwise.

status_code: int | None = None: HTTP status that triggered the error, or None for errors without one (connection failure, too-long URL, no data). Set by HTTPError.
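Because these three fields are readable on every error, a generic retry helper does not need to know the concrete subclass. The following is a minimal sketch and not part of dataretrieval: call_with_retry, max_attempts, default_wait, sleep and error_type are hypothetical names, and FakeError / FakeTransient only stand in for the real exceptions so the block runs without network access.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class FakeError(Exception):
    """Stand-in exposing the same three fields as DataRetrievalError."""

    retryable = False
    retry_after = None
    status_code = None


class FakeTransient(FakeError):
    """Stand-in for a retryable error (TransientError / NetworkError)."""

    retryable = True


def call_with_retry(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    error_type: type = FakeError,
) -> T:
    """Call ``fn`` until it succeeds, a non-retryable error occurs,
    or ``max_attempts`` calls have been made."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except error_type as exc:
            # Give up on permanent failures and on the final attempt.
            if not exc.retryable or attempt == max_attempts:
                raise
            # Prefer the server's hint; otherwise use the fixed fallback.
            hint = exc.retry_after
            sleep(hint if hint is not None else default_wait)
    raise AssertionError("unreachable: the loop always returns or raises")
```

With the real package, passing error_type=dataretrieval.DataRetrievalError should give the same behaviour, since the helper only reads retryable and retry_after.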

exception dataretrieval.exceptions.HTTPError(message: str, *, status_code: int)[source]

Bases: DataRetrievalError

The service returned an error HTTP status.

The numeric status is on status_code; branch on it, e.g. except HTTPError as e: ... if e.status_code == 404. TransientError (429 / 5xx) is the retryable subset, and is itself an HTTPError. The one exception to “a status is an HTTPError” is a request the service rejects as too long: it surfaces as URLTooLong (a RequestTooLarge), not an HTTPError – so catch DataRetrievalError to be certain of spanning every failure. See error_for_status() for the full mapping.

Parameters:

message (str) – Human-readable error message.
status_code (int) – The HTTP status the service returned.

exception dataretrieval.exceptions.NetworkError[source]

Bases: DataRetrievalError

The request never completed a round-trip to the service – a DNS failure, refused connection, or timeout – so no HTTP response arrived to classify.

Wraps the underlying httpx transport exception, preserved on __cause__. Worth retrying (retryable is True), but carries no .status_code because no response came back.
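Keeping the original exception on __cause__ is what Python's raise ... from ... statement does. The sketch below illustrates that mechanism only. NetworkError here is a stand-in class, fetch and flaky_transport are hypothetical, and the builtin TimeoutError replaces the httpx transport exception so the block has no dependencies.

```python
class NetworkError(Exception):
    """Stand-in for dataretrieval.exceptions.NetworkError."""

    retryable = True
    status_code = None


def flaky_transport() -> bytes:
    # Simulates a transport-level failure. In the library the cause would
    # be an httpx transport exception, not the builtin TimeoutError.
    raise TimeoutError("read timed out")


def fetch() -> bytes:
    try:
        return flaky_transport()
    except OSError as exc:
        # ``from exc`` stores the original exception on ``__cause__``.
        raise NetworkError(f"request failed: {exc}") from exc


try:
    fetch()
except NetworkError as err:
    caught = err           # keep a reference; ``err`` is unbound after the block
    cause = err.__cause__  # the original transport-level exception
    print(type(cause).__name__, "-", cause)
```

Inspecting __cause__ this way lets a caller distinguish, say, a timeout from a refused connection while still catching only the wrapper type.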

retryable: ClassVar[bool] = True: Whether re-issuing the same request might succeed – True for the transient HTTP statuses (429 / 5xx, TransientError) and for connection failures (NetworkError); False otherwise.

exception dataretrieval.exceptions.NoSitesError(url: httpx.URL)[source]

Bases: DataRetrievalError

A request succeeded (HTTP 200) but matched no sites/data.

A no-data result is normally not an error: the modern getters (waterdata, wqp, nldi) return an empty DataFrame. Only the deprecated nwis (waterservices) path still raises this.

exception dataretrieval.exceptions.RateLimited(message: str, *, status_code: int | None = None, retry_after: float | None = None)[source]

Bases: TransientError

A request was rejected with HTTP 429 (too many requests).

_DEFAULT_STATUS: ClassVar[int] = 429: Canonical status a concrete transient stamps when built without an explicit status_code (RateLimited = 429, ServiceUnavailable = 503). TransientError itself is abstract and sets none, so constructing it bare requires status_code.

exception dataretrieval.exceptions.RequestTooLarge[source]

Bases: DataRetrievalError

The request is too large for the service to satisfy.

Base for the two ways that happens; catch it to handle either: URLTooLong (a single request rejected for length) and Unchunkable (a Water Data call the chunker could not split small enough to fit).

exception dataretrieval.exceptions.ServiceUnavailable(message: str, *, status_code: int | None = None, retry_after: float | None = None)[source]

Bases: TransientError

A request was rejected with a server error (HTTP 5xx).

Raised by both the legacy query path and the Water Data path, so a 5xx surfaces as one type whichever subsystem issued the request. .status_code holds the actual 5xx; it falls back to 503 only on a bare hand-construction.

_DEFAULT_STATUS: ClassVar[int] = 503: Canonical status a concrete transient stamps when built without an explicit status_code (RateLimited = 429, ServiceUnavailable = 503). TransientError itself is abstract and sets none, so constructing it bare requires status_code.
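One way to picture the _DEFAULT_STATUS fallback is a class-level default that the constructor consults only when status_code is omitted. This is a sketch with stand-in classes, not the library's source. In particular, raising TypeError for a bare TransientError is an assumption, since the documentation says only that status_code is required in that case.

```python
from typing import ClassVar, Optional


class HTTPError(Exception):
    """Stand-in: an error that always carries a status."""

    def __init__(self, message: str, *, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    """Stand-in: abstract parent with no canonical status of its own."""

    _DEFAULT_STATUS: ClassVar[Optional[int]] = None
    retryable: ClassVar[bool] = True

    def __init__(
        self,
        message: str,
        *,
        status_code: Optional[int] = None,
        retry_after: Optional[float] = None,
    ) -> None:
        if status_code is None:
            # Fall back to the leaf class's canonical status, if it has one.
            status_code = self._DEFAULT_STATUS
        if status_code is None:
            # Assumption: the real library may signal this differently.
            raise TypeError("status_code is required for a bare TransientError")
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    _DEFAULT_STATUS = 429


class ServiceUnavailable(TransientError):
    _DEFAULT_STATUS = 503
```

Under this sketch, ServiceUnavailable("down") reports 503, while ServiceUnavailable("down", status_code=502) keeps the real 502.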

exception dataretrieval.exceptions.TransientError(message: str, *, status_code: int | None = None, retry_after: float | None = None)[source]

Bases: HTTPError

A 429 or 5xx the server may serve on a later try – RateLimited for 429, ServiceUnavailable for 5xx.

This only classifies the condition; it does not itself retry. Whether to retry is up to the calling path: a single-shot request raises it for the caller to handle (e.g. wait retry_after seconds, then re-issue), while the Water Data chunker retries and resumes automatically.

Parameters:

message (str) – Human-readable error message.
status_code (int, optional) – The HTTP status the service returned. Defaults to the leaf’s canonical code (429 / 503) when omitted; error_for_status() always passes the real status.
retry_after (float, optional) – Seconds to wait before retrying, parsed from the Retry-After response header; None when the header is absent or unparseable.
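The HTTP specification allows Retry-After to be either a number of seconds or an HTTP-date. The sketch below shows one way to turn such a header into the float-or-None value described above. parse_retry_after is a hypothetical helper, and whether dataretrieval's own parser accepts the date form is not stated in this documentation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(
    value: Optional[str], *, now: Optional[datetime] = None
) -> Optional[float]:
    """Return the wait in seconds, or None if absent or unparseable."""
    if value is None:
        return None
    text = value.strip()
    # Form 1: delay-seconds, a non-negative decimal integer.
    if text.isascii() and text.isdigit():
        return float(text)
    # Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(text)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait.
    return max(0.0, (when - now).total_seconds())
```

The now parameter exists only to make the date branch deterministic in tests.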

_DEFAULT_STATUS: ClassVar[int]: Canonical status a concrete transient stamps when built without an explicit status_code (RateLimited = 429, ServiceUnavailable = 503). TransientError itself is abstract and sets none, so constructing it bare requires status_code.

retryable: ClassVar[bool] = True: Whether re-issuing the same request might succeed – True for the transient HTTP statuses (429 / 5xx, TransientError) and for connection failures (NetworkError); False otherwise.

exception dataretrieval.exceptions.URLTooLong[source]

Bases: RequestTooLarge

A single request URL was too long for the service.

Raised on the legacy query path (which sends one un-chunked request), whether the URL is rejected client-side before sending or by the server (see error_for_status()). Remediation: query fewer sites, or split the call manually.
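Splitting the call manually can be as simple as halving the site list whenever the request is rejected as too large. The following is a minimal sketch under stated assumptions. fetch_in_batches and fake_fetch are hypothetical, RequestTooLarge is a stand-in class, and results are plain lists of dicts rather than DataFrames.

```python
from typing import Callable, List, Sequence


class RequestTooLarge(Exception):
    """Stand-in for dataretrieval.exceptions.RequestTooLarge."""


def fetch_in_batches(
    sites: Sequence[str],
    fetch: Callable[[Sequence[str]], List[dict]],
) -> List[dict]:
    """Try one request for ``sites``; halve the batch on RequestTooLarge."""
    try:
        return fetch(sites)
    except RequestTooLarge:
        if len(sites) <= 1:
            # A single site that is still too large cannot be split further.
            raise
        mid = len(sites) // 2
        # Recurse on each half; concatenation preserves the input order.
        return fetch_in_batches(sites[:mid], fetch) + fetch_in_batches(
            sites[mid:], fetch
        )


def fake_fetch(batch: Sequence[str]) -> List[dict]:
    # Hypothetical service that rejects more than two sites per request.
    if len(batch) > 2:
        raise RequestTooLarge(f"{len(batch)} sites is too many")
    return [{"site": s} for s in batch]


rows = fetch_in_batches(["a", "b", "c", "d", "e"], fake_fetch)
```

With real getters, each batch would return a DataFrame and the halves would be combined with something like pandas.concat instead of list addition.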

exception dataretrieval.exceptions.Unchunkable[source]

Bases: RequestTooLarge

No chunking plan fits the URL byte limit.

Raised by the Water Data chunker when even the smallest reducible plan (every list axis at one atom per sub-request, the filter at one clause per sub-request) still exceeds the server’s byte limit – so unlike URLTooLong, automatic splitting has already been tried and exhausted. Shrink the input lists, simplify the filter, or split the call manually.

dataretrieval.exceptions.error_for_status(status: int, message: str, *, retry_after: float | None = None) → DataRetrievalError[source]

Return the typed DataRetrievalError for an HTTP error status.

The one status-to-type mapping every request path shares (the legacy query path, waterdata, streamstats), so a given status becomes the same type everywhere:

413, 414 -> URLTooLong (a RequestTooLarge) – the “too long” semantic is more actionable than a bare status, and it matches the client-side over-long-URL case
429 -> RateLimited
5xx -> ServiceUnavailable
anything else -> HTTPError

message is used verbatim; retry_after is attached only to the transient (TransientError) types. status must be an error status (>= 400) – classifying a success or redirect is a usage error and raises ValueError.
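The mapping above can be sketched as a small function over stand-in classes. This is an illustration of the documented table, not the library's implementation. classify_status is a hypothetical name chosen to avoid confusion with the real error_for_status(), and the class bodies are reduced to the fields the mapping touches.

```python
class DataRetrievalError(Exception):
    status_code = None
    retry_after = None


class HTTPError(DataRetrievalError):
    def __init__(self, message, *, status_code):
        super().__init__(message)
        self.status_code = status_code


class TransientError(HTTPError):
    def __init__(self, message, *, status_code, retry_after=None):
        super().__init__(message, status_code=status_code)
        self.retry_after = retry_after


class RateLimited(TransientError):
    pass


class ServiceUnavailable(TransientError):
    pass


class RequestTooLarge(DataRetrievalError):
    pass


class URLTooLong(RequestTooLarge):
    pass


def classify_status(status, message, *, retry_after=None):
    """Return (not raise) the typed error for an HTTP error status."""
    if status < 400:
        # Success and redirect statuses are a usage error.
        raise ValueError(f"{status} is not an error status")
    if status in (413, 414):
        return URLTooLong(message)
    if status == 429:
        return RateLimited(message, status_code=status, retry_after=retry_after)
    if 500 <= status <= 599:
        return ServiceUnavailable(
            message, status_code=status, retry_after=retry_after
        )
    # Everything else keeps its bare status; retry_after is not attached.
    return HTTPError(message, status_code=status)
```

Returning the exception instead of raising it leaves the caller free to attach context and choose where to raise.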

Resumable chunk interruptions

These are raised when a transparently-chunked request is interrupted mid-stream; the completed work is preserved and exc.call.resume() continues it. They are defined in dataretrieval.ogc.interruptions (they carry pandas/httpx state) but are importable from the top level, e.g. from dataretrieval import ChunkInterrupted.

class dataretrieval.ChunkInterrupted(*, completed_chunks: int, total_chunks: int, call: ChunkedCall | None = None, retry_after: float | None = None, cause: BaseException | None = None)[source]

Bases: DataRetrievalError

Base class for mid-stream chunk failures whose completed work is preserved and resumable.

A ChunkInterrupted subclass means that a sub-request failed, but ChunkedCall still owns whatever completed successfully before the failure. Call exc.call.resume() to pick up where the call stopped; only still-pending sub-requests are re-issued.

Subclasses describe why ChunkedCall stopped so callers can pick a retry policy: QuotaExhausted for 429 (wait for the rate-limit window), ServiceInterrupted for 5xx (wait for the upstream to recover). The .call handle is the same object across every interruption of a single chunked call — frames accumulate across retries.

call

Resumable handle into the ChunkedCall that raised this exception. None only on hand-constructed exceptions (test fixtures), where .call-derived accessors degrade to empty/None.

Type: ChunkedCall or None

retry_after

Seconds the server suggested waiting (Retry-After header). None when the server gave no hint.

Type: float or None

completed_chunks

Number of sub-requests successfully completed before the failure.

Type: int

total_chunks

Total sub-requests in the plan.

Type: int

partial_frame

Combined frame of the work that had completed when this exception was raised. It is a snapshot taken at raise time and does NOT advance on a later call.resume(); use exc.call.partial_frame for the live view.

Type: pandas.DataFrame

partial_response

Raw aggregate response covering the completed sub-requests at raise time; None if nothing had completed yet. Same snapshot semantics as partial_frame. (Raw, not finalized — use exc.call.resume() for the finalized (df, metadata) result.)

Type: httpx.Response or None

Examples

Retry on any transient interruption, honoring the server’s Retry-After hint when present and falling back to a fixed wait otherwise. Each new interruption keeps the already-completed work intact — only the still-pending sub-requests are re-issued.

import time
from dataretrieval import ChunkInterrupted

# ``getter`` is any chunked OGC getter — e.g.
# ``waterdata.get_daily`` or ``ngwmn.get_water_level``.
try:
    df, md = getter(monitoring_location_id=long_list_of_sites)
except ChunkInterrupted as exc:
    while True:
        time.sleep(exc.retry_after or 5 * 60)
        try:
            df, md = exc.call.resume()
            break
        except ChunkInterrupted as next_exc:
            exc = next_exc
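The loop above retries indefinitely. A bounded variant, which gives up after a fixed number of resumes and re-raises the last interruption, might look like the sketch below. resume_until_done, max_resumes, default_wait and FakeCall are hypothetical names, and ChunkInterrupted is a stand-in class so the block runs without the library or a network.

```python
import time
from typing import Callable


class ChunkInterrupted(Exception):
    """Stand-in for dataretrieval.ChunkInterrupted."""

    def __init__(self, call, retry_after=None):
        super().__init__("interrupted")
        self.call = call
        self.retry_after = retry_after


def resume_until_done(
    exc: ChunkInterrupted,
    *,
    max_resumes: int = 5,
    default_wait: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Resume ``exc.call`` at most ``max_resumes`` times, then re-raise
    the most recent interruption if the call has still not finished."""
    for _ in range(max_resumes):
        # Honor the server's hint; otherwise wait the fixed fallback.
        sleep(exc.retry_after or default_wait)
        try:
            return exc.call.resume()
        except ChunkInterrupted as next_exc:
            exc = next_exc
    raise exc


class FakeCall:
    """Hypothetical call that is interrupted twice, then completes."""

    def __init__(self):
        self.failures_left = 2

    def resume(self):
        if self.failures_left:
            self.failures_left -= 1
            raise ChunkInterrupted(self, retry_after=1.0)
        return ("frame", "metadata")
```

Because the re-raised exception still carries .call, a caller that gives up can keep the handle and resume later.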

class dataretrieval.QuotaExhausted(*, completed_chunks: int, total_chunks: int, call: ChunkedCall | None = None, retry_after: float | None = None, cause: BaseException | None = None)[source]

Bases: ChunkInterrupted

A sub-request returned HTTP 429 — the per-key rate-limit window is exhausted. Subclass of ChunkInterrupted.

The completed sub-requests are preserved on .call; once the rate-limit window resets, .call.resume() re-issues only the still-pending work. partial_frame holds what completed before the 429.

class dataretrieval.ServiceInterrupted(*, completed_chunks: int, total_chunks: int, call: ChunkedCall | None = None, retry_after: float | None = None, cause: BaseException | None = None)[source]

Bases: ChunkInterrupted

A sub-request returned HTTP 5xx — the upstream service failed transiently. Subclass of ChunkInterrupted.

The completed sub-requests are preserved on .call; once the upstream recovers, .call.resume() resumes only the still-pending work.
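Since the two subclasses describe different causes, a caller can choose a different fallback wait for each when the server gives no hint. The sketch below shows that branching with stand-in classes. pick_wait and the default durations are hypothetical placeholders, not values recommended by the library.

```python
class ChunkInterrupted(Exception):
    """Stand-in for dataretrieval.ChunkInterrupted."""

    retry_after = None


class QuotaExhausted(ChunkInterrupted):
    """Stand-in: a sub-request returned HTTP 429."""


class ServiceInterrupted(ChunkInterrupted):
    """Stand-in: a sub-request returned HTTP 5xx."""


def pick_wait(
    exc: ChunkInterrupted,
    *,
    quota_wait: float = 3600.0,   # placeholder: wait out a rate-limit window
    outage_wait: float = 60.0,    # placeholder: short pause for an outage
) -> float:
    """Return how many seconds to wait before calling exc.call.resume()."""
    if exc.retry_after is not None:
        # The server's own hint always wins.
        return exc.retry_after
    if isinstance(exc, QuotaExhausted):
        return quota_wait
    # ServiceInterrupted and any other interruption use the shorter wait.
    return outage_wait
```

The returned value can then be passed to time.sleep before resuming.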