dataretrieval.waterdata

Water Data API module for accessing USGS water data services.

This module provides functions for downloading data from the Water Data APIs, including the USGS Aquarius Samples database.

See https://api.waterdata.usgs.gov/ for API reference.

dataretrieval.waterdata.get_codes(code_service: Literal['characteristicgroup', 'characteristics', 'counties', 'countries', 'observedproperty', 'samplemedia', 'sitetype', 'states']) → DataFrame[source]

Return codes from a Samples code service.

Parameters: code_service (string) – One of the following options: “states”, “counties”, “countries”, “sitetype”, “samplemedia”, “characteristicgroup”, “characteristics”, or “observedproperty”.

dataretrieval.waterdata.get_continuous(monitoring_location_id: List[str] | str | None = None, parameter_code: List[str] | str | None = None, statistic_id: List[str] | str | None = None, properties: List[str] | None = None, time_series_id: List[str] | str | None = None, continuous_id: List[str] | str | None = None, approval_status: List[str] | str | None = None, unit_of_measure: List[str] | str | None = None, qualifier: List[str] | str | None = None, value: List[str] | str | None = None, last_modified: str | None = None, time: List[str] | str | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

Continuous data provide instantaneous water conditions.

This is an early version of the continuous endpoint that is feature-complete and is being made available for limited use. Geometries are not included with the continuous endpoint. If the “time” input is left blank, the service will return the most recent year of measurements. Users may request no more than three years of data with each function call.
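A request that spans more than three years therefore has to be split across several calls. The following is a minimal sketch of one way to do that, not part of dataretrieval: the helper names `split_into_windows` and `to_interval` are hypothetical, and it assumes the three-year limit can be treated as three calendar years and that timestamps are in UTC. Each resulting string could be passed as the `time` argument of a separate call.

```python
from datetime import datetime, timezone


def split_into_windows(start, end, max_years=3):
    """Split [start, end] into consecutive windows of at most max_years calendar years.

    Hypothetical helper; assumes the service limit can be treated as calendar years.
    """
    windows = []
    window_start = start
    while window_start < end:
        try:
            candidate = window_start.replace(year=window_start.year + max_years)
        except ValueError:
            # Feb 29 shifted into a non-leap year: fall back to Feb 28.
            candidate = window_start.replace(
                year=window_start.year + max_years, day=28
            )
        window_end = min(candidate, end)
        windows.append((window_start, window_end))
        window_start = window_end
    return windows


def to_interval(window_start, window_end):
    """Format a window as a bounded RFC 3339 interval (datetimes assumed UTC)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return f"{window_start.strftime(fmt)}/{window_end.strftime(fmt)}"


start = datetime(2015, 1, 1, tzinfo=timezone.utc)
end = datetime(2022, 6, 1, tzinfo=timezone.utc)
intervals = [to_interval(s, e) for s, e in split_into_windows(start, end)]
for interval in intervals:
    print(interval)
```

Adjacent windows share a boundary timestamp, so an observation that falls exactly on a boundary may be returned by two calls and may need to be de-duplicated afterwards.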

Continuous data are collected at a high frequency, typically 15-minute intervals. Depending on the specific monitoring location, the data may be transmitted automatically via telemetry and be available on WDFN within minutes of collection, while other times the delivery of data may be delayed if the monitoring location does not have the capacity to automatically transmit data. Continuous data are described by parameter name and parameter code (pcode). These data might also be referred to as “instantaneous values” or “IV”.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
statistic_id (string or list of strings, optional) – A code corresponding to the statistic an observation represents. Continuous data are nearly always associated with statistic id 00011. Using a different code (such as 00003 for mean) will typically return no results. A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
properties (string or list of strings, optional) – A vector of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified
time_series_id (string or list of strings, optional) – A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint.
continuous_id (string or list of strings, optional) – A universally unique identifier (UUID) representing a single version of a record. It is not stable over time. Every time the record is refreshed in our database (which may happen as part of normal operations and does not imply any change to the data itself) a new ID will be generated. To uniquely identify a single observation over time, compare the time and time_series_id fields; each time series will only have a single observation at a given time.
approval_status (string or list of strings, optional) – Some of the data that you have obtained from this U.S. Geological Survey database may not have received Director’s approval. Any such data values are qualified as provisional and are subject to revision. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This field reflects the approval status of each record, and is either “Approved”, meaning processing review has been completed and the data is approved for publication, or “Provisional” and subject to revision. For more information about provisional data, go to: https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
qualifier (string or list of strings, optional) – This field indicates any qualifiers associated with an observation, for instance if a sensor may have been impacted by ice or if values were estimated.
value (string or list of strings, optional) – The value of the observation. Values are transmitted as strings in the JSON response format in order to preserve precision.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate that anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
Only features that have a last_modified that intersects the value of datetime are selected.
time (string, optional) –
The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a time that intersects the value of datetime are selected. If a feature has multiple temporal properties, it is the decision of the server whether only a single temporal property is used to determine the extent or all relevant temporal properties. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 10000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types (for example, date columns to datetimes and qualifiers to strings).

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get instantaneous gage height data from a
>>> # single site from a single year
>>> df, md = dataretrieval.waterdata.get_continuous(
...     monitoring_location_id="USGS-02238500",
...     parameter_code="00065",
...     time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z",
... )
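Because `continuous_id` changes whenever a record is refreshed, repeated downloads can contain several versions of the same observation. The sketch below shows one way to keep only the most recently modified version per (`time_series_id`, `time`) pair, as the parameter description suggests. It is illustrative only: the helper `latest_versions` and the sample records are hypothetical, and it assumes all `last_modified` values are strings in the same RFC 3339 UTC format, so that string comparison matches chronological order.

```python
def latest_versions(records):
    """Keep one record per (time_series_id, time), preferring the newest last_modified.

    Hypothetical helper; assumes last_modified strings share one RFC 3339 UTC format.
    """
    latest = {}
    for rec in records:
        key = (rec["time_series_id"], rec["time"])
        current = latest.get(key)
        if current is None or rec["last_modified"] > current["last_modified"]:
            latest[key] = rec
    return list(latest.values())


# Made-up records for illustration; not real API output.
rows = [
    {"id": "a1", "time_series_id": "ts-1", "time": "2021-01-01T00:00:00Z",
     "value": "2.31", "last_modified": "2021-01-02T00:00:00Z"},
    {"id": "b2", "time_series_id": "ts-1", "time": "2021-01-01T00:00:00Z",
     "value": "2.31", "last_modified": "2021-03-05T00:00:00Z"},
    {"id": "c3", "time_series_id": "ts-1", "time": "2021-01-01T00:15:00Z",
     "value": "2.30", "last_modified": "2021-01-02T00:00:00Z"},
]

unique_rows = latest_versions(rows)
kept_ids = [rec["id"] for rec in unique_rows]
print(kept_ids)
```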

Daily data provide one data value to represent water conditions for the day.

Throughout much of the history of the USGS, the primary water data available was daily data collected manually at the monitoring location once each day. With improved availability of computer storage and automated transmission of data, the daily data published today are generally a statistical summary or metric of the continuous data collected each day, such as the daily mean, minimum, or maximum value. Daily data are automatically calculated from the continuous data of the same parameter code and are described by parameter code and a statistic code. These data have also been referred to as “daily values” or “DV”.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
statistic_id (string or list of strings, optional) – A code corresponding to the statistic an observation represents. Example codes include 00001 (max), 00002 (min), and 00003 (mean). A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
properties (string or list of strings, optional) – A vector of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified
time_series_id (string or list of strings, optional) – A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint.
daily_id (string or list of strings, optional) – A universally unique identifier (UUID) representing a single version of a record. It is not stable over time. Every time the record is refreshed in our database (which may happen as part of normal operations and does not imply any change to the data itself) a new ID will be generated. To uniquely identify a single observation over time, compare the time and time_series_id fields; each time series will only have a single observation at a given time.
approval_status (string or list of strings, optional) – Some of the data that you have obtained from this U.S. Geological Survey database may not have received Director’s approval. Any such data values are qualified as provisional and are subject to revision. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This field reflects the approval status of each record, and is either “Approved”, meaning processing review has been completed and the data is approved for publication, or “Provisional” and subject to revision. For more information about provisional data, go to: https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
qualifier (string or list of strings, optional) – This field indicates any qualifiers associated with an observation, for instance if a sensor may have been impacted by ice or if values were estimated.
value (string or list of strings, optional) – The value of the observation. Values are transmitted as strings in the JSON response format in order to preserve precision.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate that anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
Only features that have a last_modified that intersects the value of datetime are selected.
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returning object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
time (string, optional) –
The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a time that intersects the value of datetime are selected. If a feature has multiple temporal properties, it is the decision of the server whether only a single temporal property is used to determine the extent or all relevant temporal properties. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in crs 4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get daily flow data from a single site
>>> # over a yearlong period
>>> df, md = dataretrieval.waterdata.get_daily(
...     monitoring_location_id="USGS-02238500",
...     parameter_code="00060",
...     time="2021-01-01T00:00:00Z/2022-01-01T00:00:00Z",
... )

>>> # Get approved daily flow data from multiple sites
>>> df, md = dataretrieval.waterdata.get_daily(
...     monitoring_location_id = ["USGS-05114000", "USGS-09423350"],
...     approval_status = "Approved",
...     time = "2024-01-01/.."
... )
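The `time` and `last_modified` filters used above accept RFC 3339 date-times and bounded or half-bounded intervals. As a rough sketch, strings of that shape can be built from timezone-aware `datetime` objects as follows. The helpers `rfc3339` and `time_filter` are hypothetical and not part of dataretrieval.

```python
from datetime import datetime, timezone


def rfc3339(dt):
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    if dt.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def time_filter(start=None, end=None):
    """Build a bounded or half-bounded interval; '..' marks an open end."""
    if start is None and end is None:
        raise ValueError("at least one of start or end is required")
    left = rfc3339(start) if start is not None else ".."
    right = rfc3339(end) if end is not None else ".."
    return f"{left}/{right}"


start = datetime(2018, 2, 12, tzinfo=timezone.utc)
end = datetime(2018, 3, 18, 12, 31, 12, tzinfo=timezone.utc)

bounded = time_filter(start, end)
open_ended = time_filter(start=start)
open_started = time_filter(end=end)
print(bounded)
print(open_ended)
print(open_started)
```

Duration strings such as “P1M” or “PT36H” need no conversion and can be passed as written.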

dataretrieval.waterdata.get_field_measurements(monitoring_location_id: List[str] | str | None = None, parameter_code: List[str] | str | None = None, observing_procedure_code: List[str] | str | None = None, properties: List[str] | None = None, field_visit_id: List[str] | str | None = None, approval_status: List[str] | str | None = None, unit_of_measure: List[str] | str | None = None, qualifier: List[str] | str | None = None, value: List[str] | str | None = None, last_modified: List[str] | str | None = None, observing_procedure: List[str] | str | None = None, vertical_datum: List[str] | str | None = None, measuring_agency: List[str] | str | None = None, skip_geometry: bool | None = None, time: List[str] | str | None = None, bbox: List[float] | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

Field measurements are physically measured values collected during a visit to the monitoring location. Field measurements consist of measurements of gage height and discharge, and readings of groundwater levels, and are primarily used as calibration readings for the automated sensors collecting continuous data. They are collected at a low frequency, and delivery of the data in WDFN may be delayed due to data processing time.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
observing_procedure_code (string or list of strings, optional) – A short code corresponding to the observing procedure for the field measurement.
properties (string or list of strings, optional) – A vector of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified
field_visit_id (string or list of strings, optional) – A universally unique identifier (UUID) for the field visit. Multiple measurements may be made during a single field visit.
approval_status (string or list of strings, optional) – Some of the data that you have obtained from this U.S. Geological Survey database may not have received Director’s approval. Any such data values are qualified as provisional and are subject to revision. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This field reflects the approval status of each record, and is either “Approved”, meaning processing review has been completed and the data is approved for publication, or “Provisional” and subject to revision. For more information about provisional data, go to: https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
qualifier (string or list of strings, optional) – This field indicates any qualifiers associated with an observation, for instance if a sensor may have been impacted by ice or if values were estimated.
value (string or list of strings, optional) – The value of the observation. Values are transmitted as strings in the JSON response format in order to preserve precision.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate that anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a last_modified that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
observing_procedure (string or list of strings, optional) – Water measurement or water-quality observing procedure descriptions.
vertical_datum (string or list of strings, optional) – The datum used to determine altitude and vertical position at the monitoring location. A list of codes is available.
measuring_agency (string or list of strings, optional) – The agency performing the measurement.
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returning object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
time (string, optional) –
The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a time that intersects the value of datetime are selected. If a feature has multiple temporal properties, it is the decision of the server whether only a single temporal property is used to determine the extent or all relevant temporal properties. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in crs 4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get field measurements from a single groundwater site
>>> # and parameter code, and do not return geometry
>>> df, md = dataretrieval.waterdata.get_field_measurements(
...     monitoring_location_id="USGS-375907091432201",
...     parameter_code="72019",
...     skip_geometry=True,
... )

>>> # Get field measurements from multiple sites and
>>> # parameter codes from the last 20 years
>>> df, md = dataretrieval.waterdata.get_field_measurements(
...     monitoring_location_id = ["USGS-451605097071701",
...                               "USGS-263819081585801"],
...     parameter_code = ["62611", "72019"],
...     time = "P20Y"
... )
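The `bbox` argument expects longitude before latitude, which is easy to get backwards. The following sketch shows one possible sanity check before a four-number box is passed to a query. The helper `make_bbox` is hypothetical, not part of dataretrieval, and it assumes coordinates in decimal degrees (crs 4326) with no vertical axis.

```python
def make_bbox(west, south, east, north):
    """Return [xmin, ymin, xmax, ymax] after basic range checks.

    Hypothetical helper; assumes decimal degrees in crs 4326.
    """
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
    for lat in (south, north):
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude out of range: {lat}")
    if south > north:
        raise ValueError("south must not exceed north")
    return [float(west), float(south), float(east), float(north)]


# Made-up box for illustration.
bbox = make_bbox(west=-81.7, south=28.9, east=-81.2, north=29.4)
print(bbox)
```

The check does not reject a west value greater than the east value, since such a box may be intended to cross the antimeridian.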

dataretrieval.waterdata.get_latest_continuous(monitoring_location_id: List[str] | str | None = None, parameter_code: List[str] | str | None = None, statistic_id: List[str] | str | None = None, properties: List[str] | str | None = None, time_series_id: List[str] | str | None = None, latest_continuous_id: List[str] | str | None = None, approval_status: List[str] | str | None = None, unit_of_measure: List[str] | str | None = None, qualifier: List[str] | str | None = None, value: int | None = None, last_modified: List[str] | str | None = None, skip_geometry: bool | None = None, time: List[str] | str | None = None, bbox: List[float] | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

This endpoint provides the most recent observation for each time series of continuous data. Continuous data are collected via automated sensors installed at a monitoring location. They are collected at a high frequency and often at a fixed 15-minute interval. Depending on the specific monitoring location, the data may be transmitted automatically via telemetry and be available on WDFN within minutes of collection, while other times the delivery of data may be delayed if the monitoring location does not have the capacity to automatically transmit data. Continuous data are described by parameter name and parameter code. These data might also be referred to as “instantaneous values” or “IV”.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
statistic_id (string or list of strings, optional) – A code corresponding to the statistic an observation represents. Example codes include 00001 (max), 00002 (min), and 00003 (mean). A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
properties (string or list of strings, optional) – A vector of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified
time_series_id (string or list of strings, optional) – A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint.
latest_continuous_id (string or list of strings, optional) – A universally unique identifier (UUID) representing a single version of a record. It is not stable over time. Every time the record is refreshed in our database (which may happen as part of normal operations and does not imply any change to the data itself) a new ID will be generated. To uniquely identify a single observation over time, compare the time and time_series_id fields; each time series will only have a single observation at a given time.
approval_status (string or list of strings, optional) – Some of the data that you have obtained from this U.S. Geological Survey database may not have received Director’s approval. Any such data values are qualified as provisional and are subject to revision. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This field reflects the approval status of each record, and is either “Approved”, meaning processing review has been completed and the data is approved for publication, or “Provisional” and subject to revision. For more information about provisional data, go to: https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
qualifier (string or list of strings, optional) – This field indicates any qualifiers associated with an observation, for instance if a sensor may have been impacted by ice or if values were estimated.
value (string or list of strings, optional) – The value of the observation. Values are transmitted as strings in the JSON response format in order to preserve precision.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate that anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a last_modified that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returning object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
time (string, optional) –
The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a time that intersects the value of datetime are selected. If a feature has multiple temporal properties, it is the decision of the server whether only a single temporal property is used to determine the extent or all relevant temporal properties. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in crs 4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get latest flow data from a single site
>>> df, md = dataretrieval.waterdata.get_latest_continuous(
...     monitoring_location_id="USGS-02238500", parameter_code="00060"
... )

>>> # Get latest continuous measurements for multiple sites
>>> df, md = dataretrieval.waterdata.get_latest_continuous(
...     monitoring_location_id=["USGS-05114000", "USGS-09423350"]
... )
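The `monitoring_location_id` values in these examples combine an agency code and a site number with a hyphen. A small sketch of building such IDs from bare site numbers is shown below. The helper `to_monitoring_location_id` is hypothetical and not part of dataretrieval. It insists on string input because a site number such as 02238500 would lose its leading zero if stored as an integer.

```python
def to_monitoring_location_id(site_number, agency_code="USGS"):
    """Join an agency code and a site number into an ID like 'USGS-02238500'.

    Hypothetical helper; site numbers must be strings to keep leading zeros.
    """
    if not isinstance(site_number, str):
        raise TypeError("site_number must be a string to preserve leading zeros")
    return f"{agency_code}-{site_number.strip()}"


site_numbers = ["05114000", "09423350"]
location_ids = [to_monitoring_location_id(s) for s in site_numbers]
print(location_ids)
```

The resulting list has the same form as the `monitoring_location_id` argument in the multi-site example above.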

dataretrieval.waterdata.get_latest_daily(monitoring_location_id: List[str] | str | None = None, parameter_code: List[str] | str | None = None, statistic_id: List[str] | str | None = None, properties: List[str] | str | None = None, time_series_id: List[str] | str | None = None, latest_daily_id: List[str] | str | None = None, approval_status: List[str] | str | None = None, unit_of_measure: List[str] | str | None = None, qualifier: List[str] | str | None = None, value: int | None = None, last_modified: List[str] | str | None = None, skip_geometry: bool | None = None, time: List[str] | str | None = None, bbox: List[float] | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

Daily data provide one data value to represent water conditions for the day.

Throughout much of the history of the USGS, the primary water data available was daily data collected manually at the monitoring location once each day. With improved availability of computer storage and automated transmission of data, the daily data published today are generally a statistical summary or metric of the continuous data collected each day, such as the daily mean, minimum, or maximum value. Daily data are automatically calculated from the continuous data of the same parameter code and are described by parameter code and a statistic code. These data have also been referred to as “daily values” or “DV”.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
statistic_id (string or list of strings, optional) – A code corresponding to the statistic an observation represents. Example codes include 00001 (max), 00002 (min), and 00003 (mean). A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
properties (string or list of strings, optional) – A list of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified.
time_series_id (string or list of strings, optional) – A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint.
latest_daily_id (string or list of strings, optional) – A universally unique identifier (UUID) representing a single version of a record. It is not stable over time. Every time the record is refreshed in our database (which may happen as part of normal operations and does not imply any change to the data itself) a new ID will be generated. To uniquely identify a single observation over time, compare the time and time_series_id fields; each time series will only have a single observation at a given time.
approval_status (string or list of strings, optional) – Some of the data that you have obtained from this U.S. Geological Survey database may not have received Director’s approval. Any such data values are qualified as provisional and are subject to revision. Provisional data are released on the condition that neither the USGS nor the United States Government may be held liable for any damages resulting from its use. This field reflects the approval status of each record, and is either “Approved”, meaning processing review has been completed and the data is approved for publication, or “Provisional” and subject to revision. For more information about provisional data, go to: https://waterdata.usgs.gov/provisional-data-statement/.
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
qualifier (string or list of strings, optional) – This field indicates any qualifiers associated with an observation, for instance if a sensor may have been impacted by ice or if values were estimated.
value (string or list of strings, optional) – The value of the observation. Values are transmitted as strings in the JSON response format in order to preserve precision.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a last_modified that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returning object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
time (string, optional) –
The date an observation represents. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a time that intersects the value of datetime are selected. If a feature has multiple temporal properties, it is the decision of the server whether only a single temporal property is used to determine the extent or all relevant temporal properties. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in CRS EPSG:4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get most recent daily flow data from a single site
>>> df, md = dataretrieval.waterdata.get_latest_daily(
...     monitoring_location_id="USGS-02238500", parameter_code="00060"
... )

>>> # Get most recent daily measurements for two sites
>>> df, md = dataretrieval.waterdata.get_latest_daily(
...     monitoring_location_id=["USGS-05114000", "USGS-09423350"]
... )
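
The time and last_modified parameters accept RFC 3339 date-times and bounded or half-bounded intervals. The following is a minimal sketch of how such strings might be assembled from Python datetime objects. The helper names rfc3339 and interval are hypothetical and not part of dataretrieval, and the sketch assumes timezone-aware datetimes.

```python
from datetime import datetime, timezone
from typing import Optional


def rfc3339(dt: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def interval(start: Optional[datetime] = None, end: Optional[datetime] = None) -> str:
    """Build a bounded or half-bounded interval; '..' marks an open end."""
    if start is None and end is None:
        raise ValueError("at least one of start or end is required")
    left = rfc3339(start) if start is not None else ".."
    right = rfc3339(end) if end is not None else ".."
    return f"{left}/{right}"


start = datetime(2018, 2, 12, tzinfo=timezone.utc)
end = datetime(2018, 3, 18, 12, 31, 12, tzinfo=timezone.utc)
print(interval(start, end))   # 2018-02-12T00:00:00Z/2018-03-18T12:31:12Z
print(interval(start=start))  # 2018-02-12T00:00:00Z/..
print(interval(end=end))      # ../2018-03-18T12:31:12Z
```

A string built this way could be passed as time=... or last_modified=...; ISO 8601 durations such as "P1M" can be passed directly as plain strings.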

dataretrieval.waterdata.get_monitoring_locations(monitoring_location_id: List[str] | None = None, agency_code: List[str] | None = None, agency_name: List[str] | None = None, monitoring_location_number: List[str] | None = None, monitoring_location_name: List[str] | None = None, district_code: List[str] | None = None, country_code: List[str] | None = None, country_name: List[str] | None = None, state_code: List[str] | None = None, state_name: List[str] | None = None, county_code: List[str] | None = None, county_name: List[str] | None = None, minor_civil_division_code: List[str] | None = None, site_type_code: List[str] | None = None, site_type: List[str] | None = None, hydrologic_unit_code: List[str] | None = None, basin_code: List[str] | None = None, altitude: List[str] | None = None, altitude_accuracy: List[str] | None = None, altitude_method_code: List[str] | None = None, altitude_method_name: List[str] | None = None, vertical_datum: List[str] | None = None, vertical_datum_name: List[str] | None = None, horizontal_positional_accuracy_code: List[str] | None = None, horizontal_positional_accuracy: List[str] | None = None, horizontal_position_method_code: List[str] | None = None, horizontal_position_method_name: List[str] | None = None, original_horizontal_datum: List[str] | None = None, original_horizontal_datum_name: List[str] | None = None, drainage_area: List[str] | None = None, contributing_drainage_area: List[str] | None = None, time_zone_abbreviation: List[str] | None = None, uses_daylight_savings: List[str] | None = None, construction_date: List[str] | None = None, aquifer_code: List[str] | None = None, national_aquifer_code: List[str] | None = None, aquifer_type_code: List[str] | None = None, well_constructed_depth: List[str] | None = None, hole_constructed_depth: List[str] | None = None, depth_source_code: List[str] | None = None, properties: List[str] | None = None, skip_geometry: bool | None = None, time: List[str] | str | None = None, bbox: List[float] | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

Location information is basic information about the monitoring location including the name, identifier, agency responsible for data collection, and the date the location was established. It also includes information about the type of location, such as stream, lake, or groundwater, and geographic information about the location, such as state, county, latitude and longitude, and hydrologic unit code (HUC).

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
agency_code (string or list of strings, optional) – The agency that is reporting the data. Agency codes are fixed values assigned by the National Water Information System (NWIS). A list of agency codes is available at: https://help.waterdata.usgs.gov/code/agency_cd_query?fmt=html.
agency_name (string or list of strings, optional) – The name of the agency that is reporting the data.
monitoring_location_number (string or list of strings, optional) – Each monitoring location in the USGS database has a unique 8- to 15-digit identification number. Monitoring location numbers are assigned based on this logic: https://help.waterdata.usgs.gov/faq/sites/do-station-numbers-have-any-particular-meaning.
monitoring_location_name (string or list of strings, optional) – This is the official name of the monitoring location in the database. For well information this can be a district-assigned local number.
district_code (string or list of strings, optional) – The Water Science Centers (WSCs) across the United States use the FIPS state code as the district code. In some cases, monitoring locations and samples may be managed by a water science center that is adjacent to the state in which the monitoring location actually resides. For example, a monitoring location may have a district code of 30, which translates to Montana, but the state code could be 56 for Wyoming because that is where the monitoring location is actually located.
country_code (string or list of strings, optional) – The code for the country in which the monitoring location is located.
country_name (string or list of strings, optional) – The name of the country in which the monitoring location is located.
state_code (string or list of strings, optional) – State code. A two-digit ANSI code (formerly FIPS code) as defined by the American National Standards Institute, to define States and equivalents. A three-digit ANSI code is used to define counties and county equivalents. A lookup table is available. The only countries with political subdivisions other than the US are Mexico and Canada. The Mexican states have US state codes ranging from 81-86 and Canadian provinces have state codes ranging from 90-98.
state_name (string or list of strings, optional) – The name of the state or state equivalent in which the monitoring location is located.
county_code (string or list of strings, optional) – The code for the county or county equivalent (parish, borough, etc.) in which the monitoring location is located. A list of codes is available.
county_name (string or list of strings, optional) –
The name of the county or county equivalent (parish, borough, etc.) in which the monitoring location is located. A list of codes is available.
minor_civil_division_code (string or list of strings, optional) – Codes for primary governmental or administrative divisions of the county or county equivalent in which the monitoring location is located.
site_type_code (string or list of strings, optional) –
A code describing the hydrologic setting of the monitoring location. A list of codes is available. Example: “GW” (Groundwater site)
site_type (string or list of strings, optional) –
A description of the hydrologic setting of the monitoring location. A list of codes is available.
hydrologic_unit_code (string or list of strings, optional) – The United States is divided and sub-divided into successively smaller hydrologic units which are classified into four levels: regions, sub-regions, accounting units, and cataloging units. The hydrologic units are arranged within each other, from the smallest (cataloging units) to the largest (regions). Each hydrologic unit is identified by a unique hydrologic unit code (HUC) consisting of two to eight digits based on the four levels of classification in the hydrologic unit system.
basin_code (string or list of strings, optional) – The Basin Code or “drainage basin code” is a two-digit code that further subdivides the 8-digit hydrologic-unit code. The drainage basin code is defined by the USGS State Office where the monitoring location is located.
altitude (string or list of strings, optional) – Altitude of the monitoring location referenced to the specified Vertical Datum.
altitude_accuracy (string or list of strings, optional) – Accuracy of the altitude, in feet. An accuracy of +/- 0.1 foot would be entered as “.1”. Many altitudes are interpolated from the contours on topographic maps; accuracies determined in this way are generally entered as one-half of the contour interval.
altitude_method_code (string or list of strings, optional) –
Codes representing the method used to measure altitude. A list of codes is available.
altitude_method_name (string or list of strings, optional) –
The name of the method used to measure altitude. A list of codes is available.
vertical_datum (string or list of strings, optional) –
The datum used to determine altitude and vertical position at the monitoring location. A list of codes is available.
vertical_datum_name (string or list of strings, optional) –
The name of the datum used to determine altitude and vertical position at the monitoring location. A list of codes is available.
horizontal_positional_accuracy_code (string or list of strings, optional) –
Indicates the accuracy of the latitude and longitude values. A list of codes is available.
horizontal_positional_accuracy (string or list of strings, optional) –
Indicates the accuracy of the latitude and longitude values. A list of codes is available.
horizontal_position_method_code (string or list of strings, optional) –
Indicates the method used to determine the latitude and longitude values. A list of codes is available.
horizontal_position_method_name (string or list of strings, optional) –
Indicates the method used to determine the latitude and longitude values. A list of codes is available.
original_horizontal_datum (string or list of strings, optional) –
Coordinates are published in EPSG:4326 / WGS84 / World Geodetic System 1984. This field indicates the original datum used to determine coordinates before they were converted. A list of codes is available.
original_horizontal_datum_name (string or list of strings, optional) –
Coordinates are published in EPSG:4326 / WGS84 / World Geodetic System 1984. This field indicates the original datum used to determine coordinates before they were converted. A list of codes is available.
drainage_area (string or list of strings, optional) – The area enclosed by a topographic divide from which direct surface runoff from precipitation normally drains by gravity into the stream above that point.
contributing_drainage_area (string or list of strings, optional) – The contributing drainage area of a lake, stream, wetland, or estuary monitoring location, in square miles. This item should be present only if the contributing area is different from the total drainage area. This situation can occur when part of the drainage area consists of very porous soil or depressions that either allow all runoff to enter the groundwater or trap the water in ponds so that rainfall does not contribute to runoff. A transbasin diversion can also affect the total drainage area.
time_zone_abbreviation (string or list of strings, optional) – A short code describing the time zone used by a monitoring location.
uses_daylight_savings (string or list of strings, optional) – A flag indicating whether or not a monitoring location uses daylight savings.
construction_date (string or list of strings, optional) – Date the well was completed.
aquifer_code (string or list of strings, optional) – Local aquifers in the USGS water resources database are identified by a geohydrologic unit code (a three-digit number related to the age of the formation, followed by a 4- or 5-character abbreviation for the geologic unit or aquifer name). Additional information is available at this link.
national_aquifer_code (string or list of strings, optional) – National aquifers are the principal aquifers or aquifer systems in the United States, defined as regionally extensive aquifers or aquifer systems that have the potential to be used as a source of potable water. Not all groundwater monitoring locations can be associated with a National Aquifer. Such monitoring locations will not be retrieved using this search criteria. A list of National aquifer codes and names is available.
aquifer_type_code (string or list of strings, optional) –
Groundwater occurs in aquifers under two different conditions. Where water only partly fills an aquifer, the upper surface is free to rise and decline. These aquifers are referred to as unconfined (or water-table) aquifers. Where water completely fills an aquifer that is overlain by a confining bed, the aquifer is referred to as a confined (or artesian) aquifer. When a confined aquifer is penetrated by a well, the water level in the well will rise above the top of the aquifer (but not necessarily above land surface). Additional information is available at this link.
well_constructed_depth (string or list of strings, optional) – The depth of the finished well, in feet below land surface datum. Note: Not all groundwater monitoring locations have information on Well Depth. Such monitoring locations will not be retrieved using this search criteria.
hole_constructed_depth (string or list of strings, optional) – The total depth to which the hole is drilled, in feet below land surface datum. Note: Not all groundwater monitoring locations have information on Hole Depth. Such monitoring locations will not be retrieved using this search criteria.
depth_source_code (string or list of strings, optional) –
A code indicating the source of water-level data. A list of codes is available.
properties (string or list of strings, optional) – A list of requested columns to be returned from the query. Available options are: geometry, id, agency_code, agency_name, monitoring_location_number, monitoring_location_name, district_code, country_code, country_name, state_code, state_name, county_code, county_name, minor_civil_division_code, site_type_code, site_type, hydrologic_unit_code, basin_code, altitude, altitude_accuracy, altitude_method_code, altitude_method_name, vertical_datum, vertical_datum_name, horizontal_positional_accuracy_code, horizontal_positional_accuracy, horizontal_position_method_code, horizontal_position_method_name, original_horizontal_datum, original_horizontal_datum_name, drainage_area, contributing_drainage_area, time_zone_abbreviation, uses_daylight_savings, construction_date, aquifer_code, national_aquifer_code, aquifer_type_code, well_constructed_depth, hole_constructed_depth, depth_source_code.
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in CRS EPSG:4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returning object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get monitoring locations within a bounding box
>>> # and leave out geometry
>>> df, md = dataretrieval.waterdata.get_monitoring_locations(
...     bbox=[-90.2, 42.6, -88.7, 43.2], skip_geometry=True
... )

>>> # Get monitoring location info for specific sites
>>> # and only specific properties
>>> df, md = dataretrieval.waterdata.get_monitoring_locations(
...     monitoring_location_id=["USGS-05114000", "USGS-09423350"],
...     properties=["monitoring_location_id", "state_name", "country_name"],
... )
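
As described for monitoring_location_id, an identifier combines an agency code and a location number with a hyphen. The following is a minimal sketch of composing and splitting such identifiers. The helper names are hypothetical and not part of dataretrieval, and the split assumes the agency code itself contains no hyphen. Location numbers are kept as strings so that leading zeros are preserved.

```python
from typing import Tuple


def make_location_id(agency_code: str, location_number: str) -> str:
    """Join an agency code and a location number, e.g. 'USGS-02238500'."""
    if not agency_code or not location_number:
        raise ValueError("agency_code and location_number are required")
    return f"{agency_code}-{location_number}"


def split_location_id(monitoring_location_id: str) -> Tuple[str, str]:
    """Split on the first hyphen (assumes no hyphen in the agency code)."""
    agency_code, sep, number = monitoring_location_id.partition("-")
    if not sep or not agency_code or not number:
        raise ValueError(f"not a valid identifier: {monitoring_location_id!r}")
    return agency_code, number


site_id = make_location_id("USGS", "02238500")
print(site_id)                     # USGS-02238500
print(split_location_id(site_id))  # ('USGS', '02238500')
```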

dataretrieval.waterdata.get_reference_table(collection: str, limit: int | None = None, query: dict | None = {}) → Tuple[DataFrame, BaseMetadata][source]

Get metadata reference tables for the USGS Water Data API.

Reference tables provide the range of allowable values for parameter arguments in the waterdata module.

Parameters:

collection (string) – One of the following options: “agency-codes”, “altitude-datums”, “aquifer-codes”, “aquifer-types”, “coordinate-accuracy-codes”, “coordinate-datum-codes”, “coordinate-method-codes”, “counties”, “hydrologic-unit-codes”, “medium-codes”, “national-aquifer-codes”, “parameter-codes”, “reliability-codes”, “site-types”, “states”, “statistic-codes”, “topographic-codes”, “time-zone-codes”
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
query (dictionary, optional) – The optional query parameter can be used to pass a dictionary of query parameters to the collection API call.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query. The primary metadata of each reference table will show up in the first column, where the name of the column is the singular form of the collection name, separated by underscores (e.g. the “medium-codes” reference table has a column called “medium_code”, which contains all possible medium code values).
md (dataretrieval.utils.Metadata) – A custom metadata object including the URL request and query time.

Examples

>>> # Get table of USGS parameter codes
>>> ref, md = dataretrieval.waterdata.get_reference_table(
...     collection="parameter-codes"
... )

>>> # Get table of selected USGS parameter codes
>>> ref, md = dataretrieval.waterdata.get_reference_table(
...     collection="parameter-codes",
...     query={'id': '00001,00002'}
... )
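
The query dictionary in the example above selects several entries by passing comma-separated IDs under the "id" key. The following is a minimal sketch of building that dictionary from a Python list. The helper name ids_query is hypothetical and not part of dataretrieval, and the sketch assumes the comma-separated form shown in the example is the accepted format.

```python
from typing import Dict, Iterable


def ids_query(ids: Iterable[str]) -> Dict[str, str]:
    """Join IDs with commas under the 'id' key (hypothetical helper)."""
    cleaned = [str(i).strip() for i in ids]
    if not cleaned or any(not c for c in cleaned):
        raise ValueError("ids must be a non-empty collection of non-empty strings")
    return {"id": ",".join(cleaned)}


query = ids_query(["00001", "00002"])
print(query)  # {'id': '00001,00002'}
```

The resulting dictionary could then be passed as query=query to get_reference_table.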

Search Samples database for USGS water quality data. This is a wrapper function for the Samples database API. All potential filters are provided as arguments to the function, but please do not populate all possible filters; leave as many as feasible with their default value (None). This is important because overcomplicated web service queries can bog down the database’s ability to return an applicable dataset before it times out.

The web GUI for the Samples database can be found here: https://waterdata.usgs.gov/download-samples/#dataProfile=site

If you would like more details on feasible query parameters (complete with examples), please visit the Samples database swagger docs, here: https://api.waterdata.usgs.gov/samples-data/docs#/

Parameters:

ssl_check (bool, optional) – Check the SSL certificate.
service (string) – One of the available Samples services: “results”, “locations”, “activities”, “projects”, or “organizations”. Defaults to “results”.
profile (string) – One of the available profiles associated with a service. Options for each service are:
- results: “fullphyschem”, “basicphyschem”, “fullbio”, “basicbio”, “narrow”, “resultdetectionquantitationlimit”, “labsampleprep”, “count”
- locations: “site”, “count”
- activities: “sampact”, “actmetric”, “actgroup”, “count”
- projects: “project”, “projectmonitoringlocationweight”
- organizations: “organization”, “count”
activityMediaName (string or list of strings, optional) – Name or code indicating environmental medium in which sample was taken. Check the activityMediaName_lookup() function in this module for all possible inputs. Example: “Water”.
activityStartDateLower (string, optional) – The start date if using a date range. Takes the format YYYY-MM-DD. The logic is inclusive, i.e. it will also return results that match the date. If left as None, will pull all data on or before activityStartDateUpper, if populated.
activityStartDateUpper (string, optional) – The end date if using a date range. Takes the format YYYY-MM-DD. The logic is inclusive, i.e. it will also return results that match the date. If left as None, will pull all data after activityStartDateLower up to the most recent available results.
activityTypeCode (string or list of strings, optional) – Text code that describes type of field activity performed. Example: “Sample-Routine, regular”.
characteristicGroup (string or list of strings, optional) – Characteristic group is a broad category of characteristics describing one or more results. Check the characteristicGroup_lookup() function in this module for all possible inputs. Example: “Organics, PFAS”
characteristic (string or list of strings, optional) – Characteristic is a specific category describing one or more results. Check the characteristic_lookup() function in this module for all possible inputs. Example: “Suspended Sediment Discharge”
characteristicUserSupplied (string or list of strings, optional) – A user supplied characteristic name describing one or more results.
boundingBox (list of four floats, optional) –
Filters on the associated monitoring location’s point location by checking if it is located within the specified geographic area. The logic is inclusive, i.e. it will include locations that overlap with the edge of the bounding box. Values are separated by commas, expressed in decimal degrees, NAD83, and longitudes west of Greenwich are negative. The format is a list consisting of:
- Western-most longitude
- Southern-most latitude
- Eastern-most longitude
- Northern-most longitude
Example: [-92.8,44.2,-88.9,46.0]
countryFips (string or list of strings, optional) – Example: “US” (United States)
stateFips (string or list of strings, optional) – Check the stateFips_lookup() function in this module for all possible inputs. Example: “US:15” (United States: Hawaii)
countyFips (string or list of strings, optional) – Check the countyFips_lookup() function in this module for all possible inputs. Example: “US:15:001” (United States: Hawaii, Hawaii County)
siteTypeCode (string or list of strings, optional) – An abbreviation for a certain site type. Check the siteType_lookup() function in this module for all possible inputs. Example: “GW” (Groundwater site)
siteTypeName (string or list of strings, optional) – A full name for a certain site type. Check the siteType_lookup() function in this module for all possible inputs. Example: “Well”
usgsPCode (string or list of strings, optional) – 5-digit number used in the US Geological Survey computerized data system, National Water Information System (NWIS), to uniquely identify a specific constituent. Check the characteristic_lookup() function in this module for all possible inputs. Example: “00060” (Discharge, cubic feet per second)
hydrologicUnit (string or list of strings, optional) – Max 12-digit number used to describe a hydrologic unit. Example: “070900020502”
monitoringLocationIdentifier (string or list of strings, optional) – A monitoring location identifier has two parts: the agency code and the location number, separated by a dash (-). Example: “USGS-040851385”
organizationIdentifier (string or list of strings, optional) – Designator used to uniquely identify a specific organization. Currently only accepting the organization “USGS”.
pointLocationLatitude (float, optional) – Latitude for a point/radius query (decimal degrees). Must be used with pointLocationLongitude and pointLocationWithinMiles.
pointLocationLongitude (float, optional) – Longitude for a point/radius query (decimal degrees). Must be used with pointLocationLatitude and pointLocationWithinMiles.
pointLocationWithinMiles (float, optional) – Radius, in miles, for a point/radius query. Must be used with pointLocationLatitude and pointLocationLongitude.
projectIdentifier (string or list of strings, optional) – Designator used to uniquely identify a data collection project. Project identifiers are specific to an organization (e.g. USGS). Example: “ZH003QW03”
recordIdentifierUserSupplied (string or list of strings, optional) – Internal AQS record identifier that returns 1 entry. Only available for the “results” service.

Returns:

df (pandas.DataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – Custom dataretrieval metadata object pertaining to the query.

Examples

>>> # Get PFAS results within a bounding box
>>> df, md = dataretrieval.waterdata.get_samples(
...     boundingBox=[-90.2, 42.6, -88.7, 43.2],
...     characteristicGroup="Organics, PFAS",
... )

>>> # Get all activities for the Commonwealth of Virginia over a date range
>>> df, md = dataretrieval.waterdata.get_samples(
...     service="activities",
...     profile="sampact",
...     activityStartDateLower="2023-10-01",
...     activityStartDateUpper="2024-01-01",
...     stateFips="US:51",
... )

>>> # Get all pH samples for two sites in Utah
>>> df, md = dataretrieval.waterdata.get_samples(
...     monitoringLocationIdentifier=[
...         "USGS-393147111462301",
...         "USGS-393343111454101",
...     ],
...     usgsPCode="00400",
... )
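
The stateFips and countyFips parameters use colon-separated codes such as “US:15” and “US:15:001”. The following is a minimal sketch of composing those strings from their parts. The helper names are hypothetical and not part of dataretrieval, and the zero-padding to two and three digits is an assumption based on the examples in the parameter descriptions.

```python
def state_fips(state_code: str, country: str = "US") -> str:
    """Build a state filter such as 'US:15' (hypothetical helper)."""
    return f"{country}:{str(state_code).zfill(2)}"


def county_fips(state_code: str, county_code: str, country: str = "US") -> str:
    """Build a county filter such as 'US:15:001' (hypothetical helper)."""
    return f"{state_fips(state_code, country)}:{str(county_code).zfill(3)}"


print(state_fips("15"))        # US:15
print(county_fips("15", "1"))  # US:15:001
print(state_fips("51"))        # US:51
```

Strings built this way could be passed as stateFips=... or countyFips=... in a get_samples call.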

Get monthly and annual water data statistics from the USGS Water Data API. This service (called the “observationIntervals” endpoint on api.waterdata.usgs.gov) provides endpoints for access to computations on the historical record regarding water conditions, including minimum, maximum, mean, median, and percentiles for month-year, and water/calendar years. For more information regarding the calculation of statistics and other details, please visit the Statistics documentation page: https://waterdata.usgs.gov/statistics-documentation/.

Note: This API is under active beta development and subject to change. Improved handling of significant figures will be addressed in a future release.

Parameters:

approval_status (string, optional) – Whether to include approved and/or provisional observations. At this time, only approved observations are returned.
computation_type (string or list of strings, optional) – Desired statistical computation method. Available values are: arithmetic_mean, maximum, median, minimum, percentile.
country_code (string, optional) – Country query parameter. API defaults to “US”.
state_code (string, optional) – State query parameter. Takes the format “US:XX”, where XX is the two-digit state code. API defaults to “US:42” (Pennsylvania).
county_code (string, optional) – County query parameter. Takes the format “US:XX:YYY”, where XX is the two-digit state code and YYY is the three-digit county code. API defaults to “US:42:103” (Pennsylvania, Pike County).
start_date (string or datetime, optional) – Start date for the query in the year-month-day format (YYYY-MM-DD).
end_date (string or datetime, optional) – End date for the query in the year-month-day format (YYYY-MM-DD).
monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
page_size (int, optional) – The number of results to return per page, where one result represents a monitoring location. The default is 1000.
parent_time_series_id (string, optional) – The parent_time_series_id returns statistics tied to a particular database entry.
site_type_code (string, optional) – Site type code query parameter. You can see a list of valid site type codes here: https://api.waterdata.usgs.gov/ogcapi/v0/collections/site-types/items. Example: “GW” (Groundwater site)
site_type_name (string, optional) – Site type name query parameter. You can see a list of valid site type names here: https://api.waterdata.usgs.gov/ogcapi/v0/collections/site-types/items. Example: “Well”
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
expand_percentiles (boolean) – Percentile data for a given day of year or month of year by default are returned from the service as lists of string values and percentile thresholds in the “values” and “percentiles” columns, respectively. When expand_percentiles is set to True (default), each value and percentile threshold specific to a computation id are returned as individual rows in the dataframe, with the value reported in the “value” column and the corresponding percentile reported in a “percentile” column (and the “values” and “percentiles” columns are removed). Missing percentile values expressed as ‘nan’ in the list of string values are removed from the dataframe to save space. Setting expand_percentiles to False retains the “values” and “percentiles” columns produced by the service. Including both ‘percentile’ and one or more other statistics (‘median’, ‘minimum’, ‘maximum’, or ‘arithmetic_mean’) in the computation_type argument will return both the “values” column, containing the list of percentile threshold values, and a “value” column, containing the singular summary value for the other statistics.

Examples

>>> # Get monthly and yearly medians for streamflow at streams in Rhode Island
>>> # from calendar year 2024.
>>> df, md = dataretrieval.waterdata.get_stats_date_range(
...     state_code="US:44",  # State code for Rhode Island
...     parameter_code="00060",
...     site_type_code="ST",
...     start_date="2024-01-01",
...     end_date="2024-12-31",
...     computation_type="median"
... )
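
The state_code and county_code formats described above (“US:XX” and “US:XX:YYY”) can be built from numeric FIPS codes. The following is a minimal sketch; format_state_code and format_county_code are hypothetical helpers written for illustration and are not part of dataretrieval.

```python
def format_state_code(state_fips):
    """Hypothetical helper: format a state FIPS code as 'US:XX'."""
    # Zero-pad to two digits so that 6 becomes "06".
    return f"US:{int(state_fips):02d}"


def format_county_code(state_fips, county_fips):
    """Hypothetical helper: format state and county FIPS codes as 'US:XX:YYY'."""
    # The county part is zero-padded to three digits.
    return f"{format_state_code(state_fips)}:{int(county_fips):03d}"


rhode_island = format_state_code(44)
pike_county = format_county_code(42, 103)
```

The resulting strings could then be passed as the state_code or county_code arguments shown in the parameter list.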

>>> # Get monthly and yearly minimums and maximums for gage height at
>>> # a monitoring location of interest
>>> df, md = dataretrieval.waterdata.get_stats_date_range(
...     monitoring_location_id="USGS-05114000",
...     parameter_code="00065",
...     computation_type=["minimum", "maximum"]
... )
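
To illustrate what expand_percentiles=True does conceptually, the sketch below reshapes one record holding parallel “values” and “percentiles” lists into one record per percentile and drops ‘nan’ entries. It uses plain dictionaries rather than a DataFrame, the helper name expand_percentile_rows and the sample record are made up for illustration, and it is not the library's actual implementation.

```python
import math


def expand_percentile_rows(rows):
    """Hypothetical helper: one output row per (value, percentile) pair."""
    expanded = []
    for row in rows:
        # Keep every column except the two list-valued ones.
        base = {k: v for k, v in row.items() if k not in ("values", "percentiles")}
        for value, percentile in zip(row["values"], row["percentiles"]):
            number = float(value)  # float("nan") parses the 'nan' placeholder
            if math.isnan(number):
                continue  # missing percentile values are dropped
            expanded.append({**base, "value": number, "percentile": float(percentile)})
    return expanded


# Made-up record shaped like the description of the service output.
sample = [
    {
        "computation_id": "example-id",
        "values": ["1.5", "nan", "4.2"],
        "percentiles": ["10", "50", "90"],
    }
]
rows = expand_percentile_rows(sample)
```

In this sketch the middle entry is discarded because its value is ‘nan’, so two rows remain, each with a single “value” and “percentile”.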

Get day-of-year and month-of-year water data statistics from the USGS Water Data API. This service (called the “observationNormals” endpoint on api.waterdata.usgs.gov) provides access to computations on the historical record of water conditions, including minimum, maximum, mean, median, and percentiles for each day of year and month of year. For more information regarding the calculation of statistics and other details, please visit the Statistics documentation page: https://waterdata.usgs.gov/statistics-documentation/.

Note: This API is under active beta development and subject to change. Improved handling of significant figures will be addressed in a future release.

Parameters:

approval_status (string, optional) – Whether to include approved and/or provisional observations. At this time, only approved observations are returned.
computation_type (string or list of strings, optional) – Desired statistical computation method. Available values are: arithmetic_mean, maximum, median, minimum, percentile.
country_code (string, optional) – Country query parameter. API defaults to “US”.
state_code (string, optional) – State query parameter. Takes the format “US:XX”, where XX is the two-digit state code. API defaults to “US:42” (Pennsylvania).
county_code (string, optional) – County query parameter. Takes the format “US:XX:YYY”, where XX is the two-digit state code and YYY is the three-digit county code. API defaults to “US:42:103” (Pennsylvania, Pike County).
start_date (string or datetime, optional) – Start day for the query in the month-day format (MM-DD).
end_date (string or datetime, optional) – End day for the query in the month-day format (MM-DD).
monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
page_size (int, optional) – The number of results to return per page, where one result represents a monitoring location. The default is 1000.
parent_time_series_id (string, optional) – The parent_time_series_id returns statistics tied to a particular database entry.
site_type_code (string, optional) – Site type code query parameter. You can see a list of valid site type codes here: https://api.waterdata.usgs.gov/ogcapi/v0/collections/site-types/items. Example: “GW” (Groundwater site)
site_type_name (string, optional) – Site type name query parameter. You can see a list of valid site type names here: https://api.waterdata.usgs.gov/ogcapi/v0/collections/site-types/items. Example: “Well”
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
expand_percentiles (boolean) – Percentile data for a given day of year or month of year by default are returned from the service as lists of string values and percentile thresholds in the “values” and “percentiles” columns, respectively. When expand_percentiles is set to True (default), each value and percentile threshold specific to a computation id are returned as individual rows in the dataframe, with the value reported in the “value” column and the corresponding percentile reported in a “percentile” column (and the “values” and “percentiles” columns are removed). Missing percentile values expressed as ‘nan’ in the list of string values are removed from the dataframe to save space. Setting expand_percentiles to False retains the “values” and “percentiles” columns produced by the service. Including both ‘percentile’ and one or more other statistics (‘median’, ‘minimum’, ‘maximum’, or ‘arithmetic_mean’) in the computation_type argument will return both the “values” column, containing the list of percentile threshold values, and a “value” column, containing the singular summary value for the other statistics.

Examples

>>> # Get daily, monthly, and annual percentiles for streamflow at
>>> # a monitoring location of interest
>>> df, md = dataretrieval.waterdata.get_stats_por(
...     monitoring_location_id="USGS-05114000",
...     parameter_code="00060",
...     computation_type="percentile"
... )

>>> # Get all daily and monthly statistics for the month of January
>>> # over the entire period of record for streamflow and gage height
>>> # at a monitoring location of interest
>>> df, md = dataretrieval.waterdata.get_stats_por(
...     monitoring_location_id="USGS-05114000",
...     parameter_code=["00060", "00065"],
...     start_date="01-01",
...     end_date="01-31"
... )
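
The month-day (MM-DD) strings used for start_date and end_date above can be derived with the standard library. This is a small sketch; month_day_range is a hypothetical helper, and using a leap year as the reference year (so that February ends on 02-29) is an assumption made here, not documented service behavior.

```python
import calendar
from datetime import date


def month_day_range(month, reference_year=2024):
    """Hypothetical helper: return (start, end) MM-DD strings covering one month."""
    # monthrange returns (weekday of first day, number of days in the month).
    last_day = calendar.monthrange(reference_year, month)[1]
    start = date(reference_year, month, 1).strftime("%m-%d")
    end = date(reference_year, month, last_day).strftime("%m-%d")
    return start, end


january_start, january_end = month_day_range(1)
```

The two strings correspond to the start_date and end_date values used in the January example above.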

dataretrieval.waterdata.get_time_series_metadata(monitoring_location_id: List[str] | str | None = None, parameter_code: List[str] | str | None = None, parameter_name: List[str] | str | None = None, properties: List[str] | str | None = None, statistic_id: List[str] | str | None = None, hydrologic_unit_code: List[str] | str | None = None, state_name: List[str] | str | None = None, last_modified: List[str] | str | None = None, begin: List[str] | str | None = None, end: List[str] | str | None = None, begin_utc: List[str] | str | None = None, end_utc: List[str] | str | None = None, unit_of_measure: List[str] | str | None = None, computation_period_identifier: List[str] | str | None = None, computation_identifier: List[str] | str | None = None, thresholds: int | None = None, sublocation_identifier: List[str] | str | None = None, primary: List[str] | str | None = None, parent_time_series_id: List[str] | str | None = None, time_series_id: List[str] | str | None = None, web_description: List[str] | str | None = None, skip_geometry: bool | None = None, time: List[str] | str | None = None, bbox: List[float] | None = None, limit: int | None = None, convert_type: bool = True) → Tuple[DataFrame, BaseMetadata][source]

Daily data and continuous measurements are grouped into time series, which represent a collection of observations of a single parameter, potentially aggregated using a standard statistic, at a single monitoring location. This endpoint provides metadata about those time series, including their operational thresholds, units of measurement, and when the earliest and most recent observations in a time series occurred.

Parameters:

monitoring_location_id (string or list of strings, optional) – A unique identifier representing a single monitoring location. This corresponds to the id field in the monitoring-locations endpoint. Monitoring location IDs are created by combining the agency code of the agency responsible for the monitoring location (e.g. USGS) with the ID number of the monitoring location (e.g. 02238500), separated by a hyphen (e.g. USGS-02238500).
parameter_code (string or list of strings, optional) – Parameter codes are 5-digit codes used to identify the constituent measured and the units of measure. A complete list of parameter codes and associated groupings can be found at https://help.waterdata.usgs.gov/codes-and-parameters/parameters.
parameter_name (string or list of strings, optional) – A human-understandable name corresponding to parameter_code.
properties (string or list of strings, optional) – A list of requested columns to be returned from the query. Available options are: geometry, id, time_series_id, monitoring_location_id, parameter_code, statistic_id, time, value, unit_of_measure, approval_status, qualifier, last_modified
statistic_id (string or list of strings, optional) – A code corresponding to the statistic an observation represents. Example codes include 00001 (max), 00002 (min), and 00003 (mean). A complete list of codes and their descriptions can be found at https://help.waterdata.usgs.gov/code/stat_cd_nm_query?stat_nm_cd=%25&fmt=html.
hydrologic_unit_code (string or list of strings, optional) – The United States is divided and sub-divided into successively smaller hydrologic units which are classified into four levels: regions, sub-regions, accounting units, and cataloging units. The hydrologic units are nested within each other, from the smallest (cataloging units) to the largest (regions). Each hydrologic unit is identified by a unique hydrologic unit code (HUC) consisting of two to eight digits based on the four levels of classification in the hydrologic unit system.
state_name (string or list of strings, optional) – The name of the state or state equivalent in which the monitoring location is located.
last_modified (string, optional) –
The last time a record was refreshed in our database. This may happen due to regular operational processes and does not necessarily indicate that anything about the measurement has changed. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a last_modified that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
begin (string or list of strings, optional) – This field contains the same information as “begin_utc”, but in the local time of the monitoring location. It is retained for backwards compatibility, but will be removed in V1 of these APIs.
end (string or list of strings, optional) – This field contains the same information as “end_utc”, but in the local time of the monitoring location. It is retained for backwards compatibility, but will be removed in V1 of these APIs.
begin_utc (string or list of strings, optional) –
The datetime of the earliest observation in the time series. Together with end, this field represents the period of record of a time series. Note that some time series may have large gaps in their collection record. This field is currently in the local time of the monitoring location. We intend to update this in version v0 to use UTC with a time zone. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have a begin that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
end_utc (string or list of strings, optional) –
The datetime of the most recent observation in the time series. Data returned by this endpoint updates at most once per day, and potentially less frequently than that, and as such there may be more recent observations within a time series than the time series end value reflects. Together with begin, this field represents the period of record of a time series. It is additionally used to determine whether a time series is “active”. We intend to update this in version v0 to use UTC with a time zone. You can query this field using date-times or intervals, adhering to RFC 3339, or using ISO 8601 duration objects. Intervals may be bounded or half-bounded (double-dots at start or end). Only features that have an end that intersects the value of datetime are selected. Examples:
- A date-time: “2018-02-12T23:20:50Z”
- A bounded interval: “2018-02-12T00:00:00Z/2018-03-18T12:31:12Z”
- Half-bounded intervals: “2018-02-12T00:00:00Z/..” or “../2018-03-18T12:31:12Z”
- Duration objects: “P1M” for data from the past month or “PT36H” for the last 36 hours
unit_of_measure (string or list of strings, optional) – A human-readable description of the units of measurement associated with an observation.
computation_period_identifier (string or list of strings, optional) – Indicates the period of data used for any statistical computations.
computation_identifier (string or list of strings, optional) – Indicates whether the data from this time series represent a specific statistical computation.
thresholds (numeric or list of numbers, optional) – Thresholds represent known numeric limits for a time series, for example the historic maximum value for a parameter or a level below which a sensor is non-operative. These thresholds are sometimes used to automatically determine if an observation is erroneous due to sensor error, and therefore shouldn’t be included in the time series.
sublocation_identifier (string or list of strings, optional)
primary (string or list of strings, optional)
parent_time_series_id (string or list of strings, optional)
time_series_id (string or list of strings, optional) – A unique identifier representing a single time series. This corresponds to the id field in the time-series-metadata endpoint.
web_description (string or list of strings, optional) – A description of what this time series represents, as used by WDFN and other USGS data dissemination products.
skip_geometry (boolean, optional) – This option can be used to skip response geometries for each feature. The returned object will be a data frame with no spatial information. Note that the USGS Water Data APIs use camelCase “skipGeometry” in CQL2 queries.
bbox (list of numbers, optional) – Only features that have a geometry that intersects the bounding box are selected. The bounding box is provided as four or six numbers, depending on whether the coordinate reference system includes a vertical axis (height or depth). Coordinates are assumed to be in CRS 4326. The expected format is a list of numbers structured as [xmin, ymin, xmax, ymax], that is, [western-most longitude, southern-most latitude, eastern-most longitude, northern-most latitude].
limit (numeric, optional) – The optional limit parameter is used to control the subset of the selected features that should be returned in each page. The maximum allowable limit is 50000. It may be beneficial to set this number lower if your internet connection is spotty. The default (None) will set the limit to the maximum allowable limit for the service.
convert_type (boolean, optional) – If True, converts columns to appropriate types.

Returns:

df (pandas.DataFrame or geopandas.GeoDataFrame) – Formatted data returned from the API query.
md (dataretrieval.utils.Metadata) – A custom metadata object

Examples

>>> # Get timeseries metadata information from a single site
>>> df, md = dataretrieval.waterdata.get_time_series_metadata(
...     monitoring_location_id="USGS-02238500"
... )
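
Because the bbox argument expects longitude before latitude, a small check can catch swapped coordinates before a request is sent. The sketch below covers only the four-number form; make_bbox is a hypothetical helper, not a dataretrieval function, and its range checks assume CRS 4326 degrees as stated in the parameter description.

```python
def make_bbox(west, south, east, north):
    """Hypothetical helper: build [xmin, ymin, xmax, ymax] with basic sanity checks."""
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be between -180 and 180 degrees")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    # Order: western-most longitude, southern-most latitude,
    # eastern-most longitude, northern-most latitude.
    return [west, south, east, north]


bbox = make_bbox(west=-90.2, south=42.6, east=-88.7, north=43.2)
```

The resulting list has the shape described for the bbox parameter. The helper deliberately does not require west to be less than east, so boxes that cross the antimeridian are not rejected.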

>>> # Get timeseries metadata information from multiple sites
>>> # that begin after January 1, 1990.
>>> df, md = dataretrieval.waterdata.get_time_series_metadata(
...     monitoring_location_id=["USGS-05114000", "USGS-09423350"],
...     begin="1990-01-01/.."
... )
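
The date-time and interval strings accepted by fields such as last_modified, begin_utc, and end_utc can be assembled from datetime objects. The following is a minimal sketch using only the standard library; rfc3339 and interval are hypothetical helper names, and the sketch assumes timezone-aware datetimes.

```python
from datetime import datetime, timezone


def rfc3339(dt):
    """Hypothetical helper: format an aware datetime as an RFC 3339 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def interval(start=None, end=None):
    """Hypothetical helper: build a bounded or half-bounded interval string."""
    if start is None and end is None:
        raise ValueError("at least one of start or end is required")
    # A missing bound is written as double dots.
    left = rfc3339(start) if start is not None else ".."
    right = rfc3339(end) if end is not None else ".."
    return f"{left}/{right}"


start = datetime(2018, 2, 12, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2018, 3, 18, 12, 31, 12, tzinfo=timezone.utc)

bounded = interval(start, end)
open_ended = interval(start=start)
open_start = interval(end=end)
```

These three strings match the bounded and half-bounded forms listed in the parameter descriptions above.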