dataretrieval.wateruse

Retrieve USGS water-use data from the National Water Availability Assessment Data Companion (NWDC).

The NWDC web services provide national-scale, USGS-modeled water-use data that underlie the USGS National Water Availability Assessment. Estimates are served on a HUC12 (12-digit hydrologic unit) spatial grid and can be queried for any county, state, or hydrologic unit. This is the modern replacement for the defunct legacy NWIS water-use service (nwis.get_water_use).

Unlike the main Water Data getters (dataretrieval.waterdata) and NGWMN (dataretrieval.ngwmn), the NWDC is a plain CSV REST service rather than an OGC API Features collection. This module supplies the NWDC-specific bits — request building, CSV parsing, the Link-header cursor, and the {detail} error envelope — but reuses the OGC engine’s generic, API-agnostic pagination and sync-from-async plumbing (_paginate() and _run_sync()) rather than re-implementing it. It follows the same conventions: shared request headers (_default_headers()), the typed DataRetrievalError taxonomy, and a (DataFrame, BaseMetadata) return.

See https://api.water.usgs.gov/docs/nwaa-data/ for the API reference and https://water.usgs.gov/nwaa-data/ for the catalog of available models and variables.

Examples

from dataretrieval import wateruse

# Monthly public-supply withdrawals for Rhode Island, 2020 onward.
df, md = wateruse.get_wateruse(
    model="wu-public-supply-wd",
    variable=["pswdtot", "pswdgw", "pswdsw"],
    state="RI",
    start_date="2020-01",
    time_resolution="monthly",
)

dataretrieval.wateruse.MAX_CONCURRENT_REQUESTS = 4

Maximum number of locations fetched concurrently when a list of state/county/huc selectors is fanned out (one request per location). Kept conservative because this module intentionally carries no request backoff/retry; the NWDC tolerates this level of concurrency without rate-limit errors (verified by stress test). Set wateruse.MAX_CONCURRENT_REQUESTS = 1 to fetch serially.

dataretrieval.wateruse.MODELS = ('wu-public-supply-wd', 'wu-public-supply-cu', 'wu-thermoelectric', 'wu-irrigation-wd', 'wu-irrigation-cu')

Water-use models (categories) served by the NWDC. The catalog at https://water.usgs.gov/nwaa-data/ lists the variables available within each.

dataretrieval.wateruse.TIME_RESOLUTIONS = ('monthly', 'annualcy', 'annualwy')

Temporal resolutions: monthly, annual calendar year, annual water year.

dataretrieval.wateruse._as_list(value: object) -> list[Any]

A scalar becomes a one-element list; any non-string iterable (list, tuple, Series, ndarray, generator) is materialized to a list. A string is treated as a scalar so it isn’t exploded into characters.
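The rule described above can be sketched in a few lines. This is a minimal stand-in written from the description, not the module's source; the name as_list is hypothetical.

```python
from collections.abc import Iterable
from typing import Any


def as_list(value: object) -> list[Any]:
    """Hypothetical stand-in for _as_list, written from its description."""
    # A string is iterable, but is treated as a scalar so "RI" does not
    # become ["R", "I"].
    if isinstance(value, str):
        return [value]
    # Lists, tuples, generators (and pandas/NumPy containers) are materialized.
    if isinstance(value, Iterable):
        return list(value)
    # Anything else is a scalar.
    return [value]


print(as_list("RI"))                             # ['RI']
print(as_list(("RI", "CT")))                     # ['RI', 'CT']
print(as_list(55))                               # [55]
print(as_list(code for code in ["04", "05"]))    # ['04', '05']
```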

async dataretrieval.wateruse._fan_out(requests: list[Request], headers: dict[str, str], ssl_check: bool) -> tuple[DataFrame, Response]

Fetch every request (each paginated) concurrently over one shared client.

Each request is paginated by the engine’s _paginate() with NWDC strategies: parse a CSV page and read its Link header cursor (parse), follow that cursor (follow), and raise the typed error carrying the NWDC detail (raise_for_status). Concurrency is bounded by a semaphore at MAX_CONCURRENT_REQUESTS, and asyncio.gather preserves input order, so the concatenation is deterministic. The shared httpx.AsyncClient keeps connections alive across pages and requests.
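The concurrency pattern described above (a semaphore bounding the work in flight, with asyncio.gather keeping input order) can be illustrated without any network access. In this sketch, fetch_one, fan_out, and MAX_CONCURRENT are hypothetical names, and asyncio.sleep stands in for the paginated HTTP request.

```python
import asyncio

MAX_CONCURRENT = 2  # illustrative value, not the module's setting


async def fetch_one(name: str, delay: float, limiter: asyncio.Semaphore) -> str:
    """Stand-in for one paginated request; sleeps instead of doing I/O."""
    async with limiter:  # at most MAX_CONCURRENT bodies run at once
        await asyncio.sleep(delay)
        return name


async def fan_out(jobs: list[tuple[str, float]]) -> list[str]:
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [fetch_one(name, delay, limiter) for name, delay in jobs]
    # gather returns results in argument order, not completion order.
    return await asyncio.gather(*tasks)


# "CT" finishes first, yet the result order matches the input order.
results = asyncio.run(fan_out([("RI", 0.03), ("CT", 0.01), ("MA", 0.02)]))
print(results)  # ['RI', 'CT', 'MA']
```

Because the order of results follows the order of inputs, concatenating the per-request frames gives the same output on every run, regardless of which request completes first.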

dataretrieval.wateruse._next_page_url(response: Response) -> str | None

Return the absolute URL of the next page, or None if this is the last.

Reads the standard Link: <...>; rel="next" header (parsed by httpx into response.links). A next link served against the bare water.usgs.gov host is normalized to the public api.water.usgs.gov gateway so the follow-up request reaches the API.
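A rough sketch of this lookup and host rewrite, using only the standard library, might look as follows. The function name next_page_url and the example path are hypothetical; the links argument mirrors the dictionary shape that httpx exposes as response.links, and the sketch assumes that only the host changes during normalization.

```python
from __future__ import annotations

from urllib.parse import urlsplit, urlunsplit

BARE_HOST = "water.usgs.gov"
API_HOST = "api.water.usgs.gov"


def next_page_url(links: dict[str, dict[str, str]]) -> str | None:
    """Hypothetical sketch; `links` has the shape of httpx's response.links."""
    url = links.get("next", {}).get("url")
    if url is None:
        return None  # no rel="next" link: this was the last page
    parts = urlsplit(url)
    if parts.netloc == BARE_HOST:
        # Assumption: only the host is swapped; path and query are kept as-is.
        parts = parts._replace(netloc=API_HOST)
    return urlunsplit(parts)


# The path and cursor below are placeholders, not a real NWDC URL.
links = {"next": {"url": "https://water.usgs.gov/example/path?cursor=abc", "rel": "next"}}
print(next_page_url(links))  # https://api.water.usgs.gov/example/path?cursor=abc
print(next_page_url({}))     # None
```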

dataretrieval.wateruse._nwdc_error_detail(response: Response) -> str | None

Pull the detail message out of an NWDC JSON error envelope, if any.

The NWDC reports errors as {"detail": "Invalid model name: ..."}. Passed to _raise_for_status() as detail_from so the service’s wording surfaces in the typed error message.
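As an illustration of reading that envelope defensively, the sketch below takes the raw response body as a string. The name error_detail is hypothetical, and returning None for non-JSON or unexpected bodies is an assumption about reasonable behavior rather than a statement about the module's code.

```python
from __future__ import annotations

import json


def error_detail(body: str) -> str | None:
    """Hypothetical sketch: return the "detail" string from a JSON body, if any."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return None
    if isinstance(payload, dict) and isinstance(payload.get("detail"), str):
        return payload["detail"]
    return None


print(error_detail('{"detail": "Invalid model name: ..."}'))  # Invalid model name: ...
print(error_detail("<html>Bad Gateway</html>"))               # None
```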

dataretrieval.wateruse._read_csv_page(response: Response) -> DataFrame

Parse one CSV page; huc12_id stays a string to keep leading zeros.
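The leading-zero concern is easy to reproduce with pandas. The CSV text below is a made-up two-row page (the values are placeholders, not real estimates), and the call shown is one way to keep the identifier as text, not necessarily the module's exact call.

```python
import io

import pandas as pd

# Made-up page: column names follow the return description, values are placeholders.
csv_text = (
    "huc12_id,year_month,pswdtot_mgd\n"
    "010900020502,2020-01,0.0\n"
    "010900020502,2020-02,0.0\n"
)

# Default inference reads the identifier as an integer and drops the leading zero.
naive = pd.read_csv(io.StringIO(csv_text))

# Pinning the column to str keeps the 12-character code intact.
page = pd.read_csv(io.StringIO(csv_text), dtype={"huc12_id": str})

print(naive["huc12_id"].iloc[0])  # 10900020502
print(page["huc12_id"].iloc[0])   # 010900020502
```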

dataretrieval.wateruse._resolve_locations(state: str | int | Iterable[str | int] | None, county: str | Iterable[str] | None, huc: str | Iterable[str] | None) -> list[str]

Build the NWDC location=<type>:<id> value(s) from the selectors.

Exactly one of state / county / huc must be given; each may be a single value or a list. state is normalized to the two-letter postal code that stateCd requires; county is a five-digit FIPS code; and a huc code’s length selects its level (huc2 through huc12). Returns one location string per value; the caller issues one request per location.
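The two rules in this description, the exactly-one check and the length-based HUC level, can be sketched as below. Both function names are hypothetical, and the exact location string (for example "huc8:07070005") is an assumption inferred from the <type>:<id> pattern and the huc2 to huc12 levels; the county type name is not given in this reference, so it is left out.

```python
from __future__ import annotations


def pick_selector(state=None, county=None, huc=None) -> tuple[str, object]:
    """Hypothetical sketch: return the single selector given, else raise."""
    candidates = {"state": state, "county": county, "huc": huc}
    given = {name: value for name, value in candidates.items() if value is not None}
    if len(given) != 1:
        raise ValueError("provide exactly one of state, county, or huc")
    return next(iter(given.items()))


def huc_location(code: str) -> str:
    # Assumed format: the type is "huc" plus the code length (huc2 ... huc12).
    return f"huc{len(code)}:{code}"


print(pick_selector(huc="07070005"))  # ('huc', '07070005')
print(huc_location("07070005"))       # huc8:07070005
print(huc_location("04"))             # huc2:04
```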

dataretrieval.wateruse._validate_county(value: object) -> str

Validate and normalize a five-digit state+county FIPS code.

dataretrieval.wateruse._validate_huc(value: object) -> str

Validate a HUC code (digits only, even length from 2 to 12; the level is set by the length).
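The checks performed by the two validators above can be approximated as follows. The names validate_county and validate_huc are hypothetical stand-ins, and the normalization step (converting to str and stripping whitespace) is an assumption, since the reference does not spell out what normalizing involves.

```python
import re


def validate_county(value: object) -> str:
    """Hypothetical sketch: accept exactly five ASCII digits (state + county FIPS)."""
    code = str(value).strip()  # assumed normalization
    if not re.fullmatch(r"[0-9]{5}", code):
        raise ValueError(f"county must be a five-digit FIPS code, got {value!r}")
    return code


def validate_huc(value: object) -> str:
    """Hypothetical sketch: accept 2, 4, 6, 8, 10, or 12 ASCII digits."""
    code = str(value).strip()  # assumed normalization
    if not re.fullmatch(r"[0-9]+", code) or len(code) % 2 or not 2 <= len(code) <= 12:
        raise ValueError(f"HUC must be 2-12 digits of even length, got {value!r}")
    return code


print(validate_county("55025"))       # 55025
print(validate_huc("010900020502"))   # 010900020502
```

Passing codes as strings matters here: an integer such as 4 cannot represent the region code "04", so a quoted value is the safer input.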

dataretrieval.wateruse.get_wateruse(model: str, variable: str | Iterable[str] | None = None, state: str | int | Iterable[str | int] | None = None, county: str | Iterable[str] | None = None, huc: str | Iterable[str] | None = None, time_resolution: str | None = None, start_date: str | None = None, end_date: str | None = None, intersection: str = 'overlap', limit: int = 600, ssl_check: bool = True) -> tuple[DataFrame, BaseMetadata]

Get USGS water-use data from the NWDC web service.

Retrieves modeled water-use estimates from the USGS National Water Availability Assessment Data Companion. The area is given as exactly one of state, county, or huc; results are always returned on a HUC12 grid, in a long (tidy) frame with one row per HUC12 and time step. Large areas (e.g. a whole region or a populous state) are served across multiple pages, which this function follows transparently and concatenates into one frame.

Each selector also accepts a list of values. The NWDC queries one area per request, so a list is fanned out into one request per value — up to MAX_CONCURRENT_REQUESTS in parallel — and the results are concatenated in the order given.

Parameters:
  • model (string) – Water-use category to query. See MODELS for the available options (e.g. "wu-public-supply-wd"). The full catalog of models and their variables is at https://water.usgs.gov/nwaa-data/.

  • variable (string or iterable of strings, optional) – One or more variable IDs within model (e.g. "pswdtot" for total public-supply withdrawals, or ["pswdgw", "pswdsw"] for the groundwater and surface-water components). Multiple variables are comma-joined into a single request. The service requires at least one variable; omitting it returns a 400 listing the model’s valid variable IDs (surfaced as a DataRetrievalError).

  • state (string, int, or iterable, optional) – One or more US states/territories to query. Each accepts a full name ("Wisconsin"), a two-letter postal code ("WI"), or a two-digit ANSI/FIPS code ("55" or 55), mirroring dataretrieval.ngwmn.get_sites().

  • county (string or iterable, optional) – One or more five-digit county FIPS codes — state FIPS + county FIPS, e.g. "55025" for Dane County, Wisconsin.

  • huc (string or iterable, optional) –

    One or more hydrologic unit codes. Each code’s level is taken from its length: a 2-digit code queries a HUC2 region, 8-digit a HUC8 subbasin, 12-digit a single HUC12, and so on (even lengths 2-12, e.g. "04", "07070005", "010900020502").

    Provide exactly one of state, county, or huc (each may be a single value or a list).

  • time_resolution (string, optional) – Temporal resolution: "monthly", "annualcy" (annual, calendar year), or "annualwy" (annual, water year). See TIME_RESOLUTIONS.

  • start_date (string, optional) – Start of the query window, formatted "YYYY" for annual data or "YYYY-MM" for monthly data.

  • end_date (string, optional) – End of the query window, in the same format as start_date.

  • intersection (string, optional) – How to select HUC12s that straddle the queried-area boundary: "overlap" (any overlap, the default) or "envelop" (fully enclosed).

  • limit (int, optional) – Maximum number of HUC12s returned per page. Queries spanning more than limit HUC12s are split across pages and reassembled. Default 600.

  • ssl_check (bool, optional) – If True (default), verify SSL certificates; set False to skip verification (e.g. behind a TLS-intercepting proxy).

Returns:

  • df (pandas.DataFrame) – Water-use estimates in long form: a huc12_id column (string, leading zeros preserved), a time column (year_month for monthly data or year for annual data), and one value column per requested variable (suffixed with its unit, e.g. pswdtot_mgd for million gallons per day).

  • md (dataretrieval.utils.BaseMetadata) – Metadata describing the request (URL, query time, response headers).
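To show what a long-form frame of this shape allows, the sketch below builds a small stand-in frame by hand and reshapes it. The column names follow the description above; the second HUC12 code and all numeric values are placeholders invented for illustration, not output from the service.

```python
import pandas as pd

# Hand-built stand-in for a monthly result; values are placeholders.
frame = pd.DataFrame(
    {
        "huc12_id": ["010900020502", "010900020502", "010900020503", "010900020503"],
        "year_month": ["2020-01", "2020-02", "2020-01", "2020-02"],
        "pswdtot_mgd": [1.0, 2.0, 3.0, 4.0],
    }
)

# Wide view: one row per time step, one column per HUC12.
wide = frame.pivot(index="year_month", columns="huc12_id", values="pswdtot_mgd")

# Area total per time step: sum the HUC12 rows that share a month.
total = frame.groupby("year_month")["pswdtot_mgd"].sum()

print(wide.shape)        # (2, 2)
print(total["2020-02"])  # 6.0
```

The long form keeps one observation per row, so the same frame supports either reshaping without re-querying.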

Raises:
  • ValueError – If not exactly one of state, county, or huc is given, or a given selector is malformed (an unrecognized state, a county code that is not five digits, or a HUC of invalid length).

  • DataRetrievalError – On an HTTP error response, raised as the typed subclass matching the status code (see dataretrieval.exceptions.error_for_status()); on a connection-level failure (timeout, DNS), raised as NetworkError.

Examples

>>> from dataretrieval import wateruse
>>> df, md = wateruse.get_wateruse(
...     model="wu-public-supply-wd",
...     variable=["pswdtot", "pswdgw", "pswdsw"],
...     state="RI",
...     start_date="2020-01",
...     time_resolution="monthly",
... )