dataretrieval.utils

Useful utilities for data munging.

class dataretrieval.utils.BaseMetadata(response: Response)

Base class for metadata.

url

Response url

Type:

str

query_time

Response elapsed time

Type:

datetime.timedelta

header

Response headers

Type:

httpx.Headers

__init__(response: Response) -> None

Generates a standard set of metadata informed by the response.

Parameters:

response (httpx.Response) – Response object from the httpx module.

__repr__() -> str

Return repr(self).

__weakref__

list of weak references to the object
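
As a rough illustration of how the three attributes relate to the response, the sketch below mirrors them on a stand-in class. MetadataSketch and the SimpleNamespace stand-in response are hypothetical and are not part of dataretrieval; converting the URL with str() is an assumption based on the documented str type. Only the attribute names come from the description above and from httpx.

```python
from datetime import timedelta
from types import SimpleNamespace


class MetadataSketch:
    """Hypothetical stand-in for BaseMetadata, not the library class."""

    def __init__(self, response):
        # httpx.Response exposes .url, .elapsed and .headers.
        self.url = str(response.url)         # assumption: stored as str
        self.query_time = response.elapsed   # datetime.timedelta
        self.header = response.headers

    def __repr__(self):
        return f"{type(self).__name__}(url={self.url!r})"


# Stand-in for an httpx.Response so the sketch runs without a network call.
fake_response = SimpleNamespace(
    url="https://example.com/data?site=01",
    elapsed=timedelta(milliseconds=250),
    headers={"content-type": "text/csv"},
)

meta = MetadataSketch(fake_response)
print(meta.url, meta.query_time.total_seconds(), meta.header["content-type"])
```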

dataretrieval.utils._attach_datetime_columns(df: DataFrame) -> DataFrame

Add <prefix>DateTime UTC columns for any Date/Time/TimeZone triplets and sort the frame by the activity-start datetime.

Detects two naming patterns that appear in USGS Samples and Water Quality Portal CSV responses:

  • WQX3: <prefix>Date, <prefix>Time, <prefix>TimeZone

  • Legacy WQP: <prefix>Date, <prefix>Time/Time, <prefix>Time/TimeZoneCode

For every triplet present, a new <prefix>DateTime column is appended holding a UTC Timestamp (offsets resolved via dataretrieval.codes.tz). The original Date/Time/TimeZone columns are left intact, and an existing <prefix>DateTime column is never overwritten.

Rows are sorted (and the index reset) by the canonical activity-start datetime when present — Activity_StartDateTime (WQX3) or ActivityStartDateTime (legacy WQP) — falling back to the first detected *Date column. Mirrors R dataRetrieval’s end-of-pipeline sort in importWQP.R.

Parameters:

df (pandas.DataFrame) – DataFrame returned from a Samples or WQP CSV endpoint.

Returns:

df – A new DataFrame with derivable <prefix>DateTime columns appended and rows sorted by the activity-start datetime (if any date column was detected).

Return type:

pandas.DataFrame
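
The triplet detection described above can be sketched on plain column names, without pandas. This is a minimal illustration of the WQX3 pattern only, not the library's implementation; find_wqx3_triplets, derivable_columns, and the sample column names other than the Activity_Start prefix are hypothetical.

```python
def find_wqx3_triplets(columns):
    """Return {prefix: (date_col, time_col, tz_col)} for complete triplets."""
    present = set(columns)
    triplets = {}
    for name in columns:
        if not name.endswith("Date"):
            continue
        prefix = name[: -len("Date")]
        time_col, tz_col = prefix + "Time", prefix + "TimeZone"
        # Only a full Date/Time/TimeZone triplet qualifies.
        if time_col in present and tz_col in present:
            triplets[prefix] = (name, time_col, tz_col)
    return triplets


def derivable_columns(columns):
    """Names of the <prefix>DateTime columns that would be appended."""
    present = set(columns)
    return [
        prefix + "DateTime"
        for prefix in find_wqx3_triplets(columns)
        # An existing <prefix>DateTime column is never overwritten.
        if prefix + "DateTime" not in present
    ]


# Illustrative column names, not taken from a real response.
cols = [
    "Activity_StartDate", "Activity_StartTime", "Activity_StartTimeZone",
    "Result_Measure",
    "LabInfo_AnalysisStartDate",  # no matching Time/TimeZone, so skipped
]
print(find_wqx3_triplets(cols))
print(derivable_columns(cols))
```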

dataretrieval.utils._build_utc_datetime(date_series: Series, time_series: Series, tz_series: Series) -> Series

Combine date + time + tz-abbreviation columns into a UTC pandas Series.

Unknown timezone codes (and rows missing any of the three values) yield NaT. The input columns are not mutated.
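
For a single row, the combination can be sketched with the standard library alone. The offset table below is illustrative and much smaller than whatever dataretrieval.codes.tz holds; to_utc and TZ_OFFSETS_SKETCH are hypothetical names, and None stands in for NaT.

```python
from datetime import datetime, timedelta, timezone

# Illustrative offsets (hours from UTC); the library resolves codes through
# dataretrieval.codes.tz, whose contents are not reproduced here.
TZ_OFFSETS_SKETCH = {"UTC": 0, "EST": -5, "EDT": -4, "CST": -6, "PST": -8}


def to_utc(date_str, time_str, tz_code):
    """Return an aware UTC datetime, or None when a piece is missing/unknown."""
    if not date_str or not time_str or tz_code not in TZ_OFFSETS_SKETCH:
        return None  # plays the role of NaT in this sketch
    local = datetime.fromisoformat(f"{date_str}T{time_str}")
    offset = timezone(timedelta(hours=TZ_OFFSETS_SKETCH[tz_code]))
    # Attach the local offset, then convert to UTC.
    return local.replace(tzinfo=offset).astimezone(timezone.utc)


rows = [
    ("2024-03-01", "13:30:00", "EST"),
    ("2024-03-01", "13:30:00", "XYZ"),  # unknown code
    ("2024-03-01", None, "EST"),        # missing time
]
for row in rows:
    print(row, "->", to_utc(*row))
```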

dataretrieval.utils._get(url: str | URL, **kwargs: Any) -> Response

Wrapper around httpx.get for the single-shot paths that surfaces a transport failure as a typed NetworkError. The chunker wraps its own transport failures as resumable interruptions, so it does not use this wrapper.

dataretrieval.utils._network_error(url: str | URL, exc: TransportError) -> NetworkError

Build the NetworkError for a failed round-trip exc (no HTTP response: timeout, DNS, refused connection).

dataretrieval.utils._raise_for_status(response: Response) -> None

Raise the typed DataRetrievalError for an HTTP error response; return None on success.

Shared by the legacy query() path (and nadp / streamstats). Delegates the status-to-type mapping to dataretrieval.exceptions.error_for_status(), except for a too-long-URL status (413 / 414). That status gets the same actionable “split your query” remediation as the client-side over-long-URL case below, rather than a bare HTTP 414; both still raise URLTooLong.
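
The control flow can be sketched as follows. The two exception classes are local stand-ins that borrow the documented names, the subclass relationship between them is an assumption, and the generic branch raises the base class only because the real status-to-type mapping in error_for_status() is not reproduced here.

```python
class DataRetrievalError(Exception):
    """Stand-in base class; the real one lives in dataretrieval.exceptions."""


class URLTooLong(DataRetrievalError):
    """Stand-in for the too-long-URL error (assumed to be a subclass)."""


def raise_for_status_sketch(status_code, url):
    """Raise a typed error for HTTP error statuses; return None otherwise."""
    if status_code < 400:
        return None
    if status_code in (413, 414):
        # Special case: give an actionable hint instead of a bare status.
        raise URLTooLong(
            f"HTTP {status_code} for {url}: the request is too long; "
            "split your query into smaller requests."
        )
    # The library delegates this branch to error_for_status(); the generic
    # base class is used here only to keep the sketch short.
    raise DataRetrievalError(f"HTTP {status_code} for {url}")


for code in (200, 404, 414):
    try:
        raise_for_status_sketch(code, "https://example.com/query")
        print(code, "ok")
    except DataRetrievalError as err:
        print(code, type(err).__name__, "-", err)
```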

dataretrieval.utils.format_datetime(df: DataFrame, date_field: str, time_field: str, tz_field: str) -> DataFrame

Creates a datetime field from separate date, time, and time zone fields.

Assumes ISO 8601.

Parameters:
  • df (pandas.DataFrame) – A data frame containing date, time, and timezone fields.

  • date_field (string) – Name of date column in df.

  • time_field (string) – Name of time column in df.

  • tz_field (string) – Name of time zone column in df.

Returns:

df – The data frame with a formatted ‘datetime’ column

Return type:

pandas.DataFrame

dataretrieval.utils.query(url: str, payload: dict[str, Any], delimiter: str = ',', ssl_check: bool = True) -> Response

Send a query.

Wrapper for httpx.get that handles errors, converts list-valued query parameters to delimiter-separated strings (comma by default), and returns the response.

Parameters:
  • url (string) – URL to query

  • payload (dict) – query parameters passed to httpx.get

  • delimiter (string) – delimiter to use with lists

  • ssl_check (bool) – If True, verify SSL certificates; if False, skip verification. Default is True.

Returns:

response – The response returned by the httpx.get call for the API query.

Return type:

httpx.Response

Raises:

DataRetrievalError – On an HTTP error response, the typed subclass for the status (see dataretrieval.exceptions.error_for_status() for the mapping); or NoSitesError when a 200 response reports no data matched; or NetworkError on a connection-level failure (timeout, DNS), with the underlying httpx exception on __cause__.
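
The list-to-string conversion applied to the payload can be sketched on its own, without sending a request. flatten_payload is a hypothetical helper, and the parameter names and values in the sample payload are illustrative only.

```python
def flatten_payload(payload, delimiter=","):
    """Join list-like values so each parameter becomes a single string."""
    flat = {}
    for key, value in payload.items():
        if isinstance(value, (list, tuple)):
            flat[key] = delimiter.join(str(item) for item in value)
        else:
            # Scalars pass through unchanged.
            flat[key] = value
    return flat


# Illustrative payload; keys and values are not taken from a real query.
payload = {
    "sites": ["01491000", "01645000"],
    "format": "rdb",
    "parameterCd": ("00060", "00065"),
}
print(flatten_payload(payload))
```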

dataretrieval.utils.to_str(listlike: object, delimiter: str = ',') -> str | None

Translates list-like objects into strings.

Parameters:
  • listlike (list-like object) – An object that is a list, or list-like (e.g., pandas.core.series.Series)

  • delimiter (string, optional) – The delimiter that is placed between entries in listlike when it is turned into a string. Default value is a comma.

Returns:

listlike – The entries of the list-like object joined into a single string, separated by the delimiter.

Return type:

string

Examples

>>> dataretrieval.utils.to_str([1, "a", 2])
'1,a,2'

>>> dataretrieval.utils.to_str([0, 10, 42], delimiter="+")
'0+10+42'