dataretrieval.utils

Useful utilities for data munging.

class dataretrieval.utils.Ambient(name: str, default: _T)

A ContextVar paired with a scoping contextmanager.

Bundles the var and its set/reset-token handling into one object, so an ambient value needs a single declaration instead of a var plus a setter function. Read the current value with get(). To set it for the duration of a with block, call the instance; the previous value is restored on exit, so it cannot leak into a later call the way a hand-written try/finally can when its reset is dropped:

_base_url = Ambient("ogc_base_url", DEFAULT)
with _base_url(other):  # scoped to the block
    _base_url.get()  # -> other
__call__(value: _T) → Iterator[None]

Set the value for the duration of the with block.

__init__(name: str, default: _T) → None
__weakref__

list of weak references to the object

get() → _T

Return the current value, which is the default outside any active scope.
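The pattern described above can be sketched with only the standard library. This is an illustrative approximation, not the library's actual implementation: the class name AmbientSketch and the example URLs are hypothetical.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from typing import Generic, Iterator, TypeVar

_T = TypeVar("_T")


class AmbientSketch(Generic[_T]):
    """Hypothetical stand-in: a ContextVar plus a scoping context manager."""

    def __init__(self, name: str, default: _T) -> None:
        self._var: ContextVar[_T] = ContextVar(name, default=default)

    def get(self) -> _T:
        # Returns the default when no scope is active.
        return self._var.get()

    @contextmanager
    def __call__(self, value: _T) -> Iterator[None]:
        # set() returns a token; reset(token) restores the previous value
        # even if the body of the with block raises.
        token = self._var.set(value)
        try:
            yield
        finally:
            self._var.reset(token)


base_url = AmbientSketch("base_url", "https://example.invalid/default")
with base_url("https://example.invalid/other"):
    inside = base_url.get()   # value scoped to the block
outside = base_url.get()      # back to the default
```

Because the reset lives in a finally clause inside the object, callers cannot forget it, which is the leak the description above warns about.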

class dataretrieval.utils.BaseMetadata(response: Response)

Base class for metadata.

url

Response url

Type:

str

query_time

Response elapsed time

Type:

datetime.timedelta

header

Response headers

Type:

httpx.Headers

__init__(response: Response) → None

Generates a standard set of metadata informed by the response.

Parameters:

response (httpx.Response) – Response object from the httpx module.

__repr__() → str

Return repr(self).

__weakref__

list of weak references to the object

dataretrieval.utils._attach_datetime_columns(df: DataFrame) → DataFrame

Add <prefix>DateTime UTC columns for any Date/Time/TimeZone triplets and sort the frame by the activity-start datetime.

Detects two naming patterns that appear in USGS Samples and Water Quality Portal CSV responses:

  • WQX3: <prefix>Date, <prefix>Time, <prefix>TimeZone

  • Legacy WQP: <prefix>Date, <prefix>Time/Time, <prefix>Time/TimeZoneCode

For every triplet present, a new <prefix>DateTime column is appended holding a UTC Timestamp (offsets resolved via dataretrieval.codes.tz). The original Date/Time/TimeZone columns are left intact, and an existing <prefix>DateTime column is never overwritten.

Rows are sorted (and the index reset) by the canonical activity-start datetime when present: Activity_StartDateTime (WQX3) or ActivityStartDateTime (legacy WQP). If neither is present, the sort falls back to the first detected *Date column. This mirrors the end-of-pipeline sort in R dataRetrieval’s importWQP.R.

Parameters:

df (pandas.DataFrame) – DataFrame returned from a Samples or WQP CSV endpoint.

Returns:

df – A new DataFrame with derivable <prefix>DateTime columns appended and rows sorted by the activity-start datetime (if any date column was detected).

Return type:

pandas.DataFrame
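The triplet detection described above can be sketched over a plain list of column names. This is a simplified illustration under stated assumptions, not the library's code: find_triplets is a hypothetical helper, the column names are examples, and the actual function operates on a DataFrame and also builds and sorts by the new columns.

```python
from typing import Dict, List, Tuple


def find_triplets(columns: List[str]) -> Dict[str, Tuple[str, str, str]]:
    """Map each derivable '<prefix>DateTime' name to its source triplet."""
    present = set(columns)
    found: Dict[str, Tuple[str, str, str]] = {}
    for col in columns:
        if not col.endswith("Date"):
            continue
        prefix = col[: -len("Date")]
        target = prefix + "DateTime"
        if target in present:
            # An existing <prefix>DateTime column is never overwritten.
            continue
        wqx3 = (col, prefix + "Time", prefix + "TimeZone")
        legacy = (col, prefix + "Time/Time", prefix + "Time/TimeZoneCode")
        for triplet in (wqx3, legacy):
            if all(name in present for name in triplet):
                found[target] = triplet
                break
    return found


columns = [
    "Activity_StartDate", "Activity_StartTime", "Activity_StartTimeZone",
    "ActivityEndDate", "ActivityEndTime/Time", "ActivityEndTime/TimeZoneCode",
    "Result_Measure",
]
triplets = find_triplets(columns)
# Keys: "Activity_StartDateTime" (WQX3 pattern) and
# "ActivityEndDateTime" (legacy WQP pattern).
```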

dataretrieval.utils._build_utc_datetime(date_series: Series, time_series: Series, tz_series: Series) → Series

Combine date + time + tz-abbreviation columns into a UTC pandas Series.

Unknown timezone codes (and rows missing any of the three values) yield NaT. The input columns are not mutated.
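For a single row, the conversion could look roughly like the sketch below. It uses only the standard library and returns None where the pandas version would yield NaT. The TZ_OFFSETS_HOURS table is a small hypothetical stand-in for dataretrieval.codes.tz, whose actual contents and format are not shown here, and to_utc is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical abbreviation -> UTC offset (hours) table, for illustration only.
TZ_OFFSETS_HOURS = {"UTC": 0, "EST": -5, "EDT": -4, "CST": -6, "MST": -7, "PST": -8}


def to_utc(date_str: Optional[str], time_str: Optional[str],
           tz_code: Optional[str]) -> Optional[datetime]:
    """Combine date, time, and tz abbreviation into a UTC datetime."""
    # A missing value or an unknown timezone code gives None (NaT analogue).
    if not date_str or not time_str or tz_code not in TZ_OFFSETS_HOURS:
        return None
    try:
        local = datetime.fromisoformat(f"{date_str}T{time_str}")
    except ValueError:
        return None
    offset = timezone(timedelta(hours=TZ_OFFSETS_HOURS[tz_code]))
    return local.replace(tzinfo=offset).astimezone(timezone.utc)


stamp = to_utc("2024-03-01", "08:30:00", "EST")  # 13:30 UTC on the same day
unknown = to_utc("2024-03-01", "08:30:00", "XYZ")  # None: unknown code
```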

dataretrieval.utils._default_headers() → dict[str, str]

Build the default HTTP headers for a USGS web-API request.

Always sets a descriptive User-Agent plus Accept / Accept-Encoding and lang. If the API_USGS_PAT environment variable is set, its value is added as the X-Api-Key header — a USGS personal access token raises the request rate limit.

Shared by the OGC engine (dataretrieval.ogc), the Water Data getters (dataretrieval.waterdata), and dataretrieval.wateruse, so the request identity is consistent across every USGS API the package talks to.

Returns:

Headers suitable for an httpx request against a USGS API.

Return type:

dict[str, str]
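A minimal sketch of the behavior described above, assuming placeholder header values: the User-Agent string, Accept types, and lang value here are illustrative guesses, not the values the package sends. Only the API_USGS_PAT variable and the X-Api-Key header name come from the description.

```python
import os
from typing import Dict


def default_headers_sketch(user_agent: str = "example-client/0.0") -> Dict[str, str]:
    """Hypothetical approximation of a default-header builder."""
    headers = {
        "User-Agent": user_agent,       # placeholder value
        "Accept": "application/json",   # placeholder value
        "Accept-Encoding": "gzip",      # placeholder value
        "lang": "en-US",                # placeholder value
    }
    # Add the token only when the environment variable is set and non-empty.
    token = os.environ.get("API_USGS_PAT")
    if token:
        headers["X-Api-Key"] = token
    return headers


headers = default_headers_sketch()
```

Reading the environment on each call, rather than once at import time, lets a token set later in the session take effect; whether the library does the same is not stated above.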

dataretrieval.utils._get(url: str | URL, **kwargs: Any) → Response

Call httpx.get for the single-shot request paths, re-raising a transport failure as a typed NetworkError. The chunker does not use this wrapper, because it wraps its own transport failures as resumable interruptions.

dataretrieval.utils._network_error(url: str | URL, exc: TransportError) → NetworkError

Build the NetworkError for exc, a round-trip that failed without producing an HTTP response (timeout, DNS failure, refused connection).
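The wrap-and-chain pattern behind these two helpers might be sketched as follows. To stay dependency-free, the sketch uses OSError as a stand-in for httpx.TransportError and a caller-supplied fetch callable instead of httpx.get. SketchNetworkError, network_error_sketch, and get_sketch are hypothetical names.

```python
from typing import Any, Callable


class SketchNetworkError(Exception):
    """Hypothetical stand-in for the package's typed NetworkError."""


def network_error_sketch(url: str, exc: Exception) -> SketchNetworkError:
    # Build (but do not raise) the typed error for a failed round-trip.
    return SketchNetworkError(f"No response from {url}: {exc!r}")


def get_sketch(fetch: Callable[[str], Any], url: str) -> Any:
    try:
        return fetch(url)
    except OSError as exc:  # stand-in for httpx.TransportError
        # "from exc" keeps the underlying exception on __cause__.
        raise network_error_sketch(url, exc) from exc


def refusing_fetch(url: str) -> Any:
    raise ConnectionRefusedError("connection refused")


try:
    get_sketch(refusing_fetch, "https://example.invalid/data")
except SketchNetworkError as err:
    cause = err.__cause__  # the original ConnectionRefusedError
```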

dataretrieval.utils._raise_for_status(response: Response, *, detail_from: Callable[[Response], str | None] | None = None) → None

Raise the typed DataRetrievalError for an HTTP error response; return None on success.

Shared by the legacy query() path (and streamstats / wateruse). Delegates the status-to-type mapping to dataretrieval.exceptions.error_for_status(), except a too-long-URL status (413 / 414): that gets the same actionable “split your query” remediation as the client-side over-long-URL case below, rather than a bare HTTP 414 (both still raise URLTooLong).

detail_from, when given, is called only on an error response to pull an API-specific detail string (e.g. a JSON error envelope’s message) out of the body; a truthy result is appended to the raised message. This lets callers surface their API’s error wording without re-implementing the status-to-type mapping and message format.
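The detail_from hook can be illustrated with a self-contained sketch. Everything here is a simplified assumption: FakeResponse stands in for httpx.Response, the Sketch* exceptions stand in for the package's typed errors, and the message wording is invented.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class SketchHTTPError(Exception):
    """Hypothetical stand-in for a typed DataRetrievalError."""


class SketchURLTooLong(SketchHTTPError):
    """Hypothetical stand-in for URLTooLong."""


@dataclass
class FakeResponse:
    status_code: int
    text: str = ""


def raise_for_status_sketch(
    response: FakeResponse,
    *,
    detail_from: Optional[Callable[[FakeResponse], Optional[str]]] = None,
) -> None:
    if response.status_code < 400:
        return None  # success: nothing to raise
    if response.status_code in (413, 414):
        # Too-long URL: give actionable advice instead of a bare status.
        raise SketchURLTooLong("Request URL too long; split your query.")
    message = f"HTTP {response.status_code}"
    # detail_from is consulted only on an error response.
    detail = detail_from(response) if detail_from is not None else None
    if detail:
        message = f"{message}: {detail}"
    raise SketchHTTPError(message)


try:
    raise_for_status_sketch(FakeResponse(500, "backend unavailable"),
                            detail_from=lambda r: r.text)
except SketchHTTPError as err:
    message = str(err)  # "HTTP 500: backend unavailable"
```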

dataretrieval.utils.format_datetime(df: DataFrame, date_field: str, time_field: str, tz_field: str) → DataFrame

Creates a datetime field from separate date, time, and time zone fields.

Assumes the date and time fields are in ISO 8601 format.

Parameters:
  • df (pandas.DataFrame) – A data frame containing date, time, and timezone fields.

  • date_field (string) – Name of date column in df.

  • time_field (string) – Name of time column in df.

  • tz_field (string) – Name of time zone column in df.

Returns:

df – The data frame with a formatted ‘datetime’ column

Return type:

pandas.DataFrame

dataretrieval.utils.query(url: str, payload: dict[str, Any], delimiter: str = ',', ssl_check: bool = True) → Response

Send a query.

Wrapper for httpx.get that handles errors, converts list-valued query parameters to delimiter-separated strings (comma by default), and returns the response.

Parameters:
  • url (string) – URL to query

  • payload (dict) – query parameters passed to httpx.get

  • delimiter (string) – delimiter to use with lists

  • ssl_check (bool) – If True (the default), verify SSL certificates; if False, skip verification.

Returns:

response – The response returned by the underlying httpx.get call.

Return type:

httpx.Response

Raises:

DataRetrievalError – On an HTTP error response, the typed subclass for the status (see dataretrieval.exceptions.error_for_status() for the mapping); or NoSitesError when a 200 response reports no data matched; or NetworkError on a connection-level failure (timeout, DNS), with the underlying httpx exception on __cause__.
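The list-to-string conversion that query() applies to its payload might look like the sketch below. flatten_payload is a hypothetical helper, the parameter names and site numbers are only examples, and the real function may treat other list-like types (such as pandas Series) differently.

```python
from typing import Any, Dict


def flatten_payload(payload: Dict[str, Any], delimiter: str = ",") -> Dict[str, Any]:
    """Join list or tuple values into one delimited string per parameter."""
    flat: Dict[str, Any] = {}
    for key, value in payload.items():
        if isinstance(value, (list, tuple)):
            flat[key] = delimiter.join(str(item) for item in value)
        else:
            flat[key] = value  # scalars pass through unchanged
    return flat


params = flatten_payload({"sites": ["01646500", "01638500"], "parameterCd": "00060"})
# params == {"sites": "01646500,01638500", "parameterCd": "00060"}
```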

dataretrieval.utils.to_str(listlike: object, delimiter: str = ',') → str | None

Translates list-like objects into strings.

Parameters:
  • listlike (list-like object) – An object that is a list, or list-like (e.g., pandas.core.series.Series)

  • delimiter (string, optional) – The delimiter that is placed between entries in listlike when it is turned into a string. Default value is a comma.

Returns:

listlike – The entries of listlike joined into a single string, separated by the delimiter.

Return type:

string

Examples

>>> dataretrieval.utils.to_str([1, "a", 2])
'1,a,2'

>>> dataretrieval.utils.to_str([0, 10, 42], delimiter="+")
'0+10+42'