Daily statistics: `get_stats_por` and `get_stats_date_range`

get_stats_por and get_stats_date_range return pre-computed temporal statistics from the modernized statistics API, the modern replacement for the legacy NWIS statistics service. The two functions wrap endpoints that look similar but answer different questions:

Function	API endpoint	Returns
`get_stats_por`	`observationNormals`	day-of-year and month-of-year statistics across the period of record
`get_stats_date_range`	`observationIntervals`	monthly and annual statistics within a requested date range

A couple of usage notes:

Pass computation_type= to choose the statistic — arithmetic_mean, median, minimum, maximum, or percentile.
There is no dedicated argument to return only day-of-year vs. month-of-year (or only calendar vs. water year), so filter the returned time_of_year_type / interval_type column in pandas, as shown below.

[1]:

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

from dataretrieval import waterdata

%matplotlib inline

site = "USGS-02037500"

Fetching day-of-year and month-of-year statistics

Day-of-year and month-of-year statistics aggregate observations for the same calendar day or month across many years to describe typical seasonal conditions (all Januarys, or all January 1sts). Below we request day-of-year discharge averages for January 1 and 2 — note start_date/end_date are in MM-DD format:

[2]:

jan_por_mean, _ = waterdata.get_stats_por(
    monitoring_location_id=site,
    parameter_code="00060",
    computation_type="arithmetic_mean",
    start_date="01-01",
    end_date="01-02",
)
jan_por_mean[["time_of_year", "time_of_year_type", "computation", "value"]]

Retrieving: observationNormals · 1 page · 3 rows
No API key detected — register for higher rate limits at https://api.waterdata.usgs.gov/signup/

[2]:

	time_of_year	time_of_year_type	computation	value
0	01-01	day_of_year	arithmetic_mean	9342.396
1	01-02	day_of_year	arithmetic_mean	9437.714
2	01	month_of_year	arithmetic_mean	7118.02

The first two rows are the day-of-year averages. What’s the third row? Its time_of_year_type is month_of_year — it’s the average across all Januarys. This is a quirk of the statistics API: whenever the start_date–end_date range overlaps the first day of a month (here 01-01), you also get the month-of-year summary.

To return only one type, filter the time_of_year_type column — here, month-of-year only:

[3]:

moy = jan_por_mean[jan_por_mean["time_of_year_type"] == "month_of_year"]
moy[["time_of_year", "time_of_year_type", "computation", "value"]]

[3]:

	time_of_year	time_of_year_type	computation	value
2	01	month_of_year	arithmetic_mean	7118.02

Percentile band plot

Now an example that shows the power of the statistics API: we pull all day-of-year discharge percentiles for the site. Computing these without the API would mean downloading the entire daily period of record and computing percentiles by hand.

By default get_stats_por sets expand_percentiles=True, returning one row per percentile with the value in value and the threshold in percentile (minimum is reported as percentile 0, maximum as 100).

[4]:

full_por_percentiles, _ = waterdata.get_stats_por(
    monitoring_location_id=site,
    parameter_code="00060",
    computation_type=["minimum", "maximum", "percentile"],
    start_date="01-01",
    end_date="12-31",
)
# The January 1 day-of-year percentiles (used on the WDFN state pages):
jan1 = full_por_percentiles[
    (full_por_percentiles["time_of_year"] == "01-01")
    & (full_por_percentiles["time_of_year_type"] == "day_of_year")
]
jan1.sort_values("percentile")[["time_of_year", "computation", "percentile", "value"]]

Retrieving: observationNormals · 1 page · 1,134 rows

[4]:

	time_of_year	computation	percentile	value
378	01-01	minimum	0.0	380.0
0	01-01	percentile	5.0	1266.0
1	01-01	percentile	10.0	1922.0
2	01-01	percentile	25.0	3580.0
3	01-01	percentile	50.0	5550.0
4	01-01	percentile	75.0	9890.0
5	01-01	percentile	90.0	22000.0
6	01-01	percentile	95.0	37780.0
0	01-01	maximum	100.0	60000.0

Pivoting the day-of-year rows so each percentile is a column lets us draw the percentile “ribbons” — each band spans two adjacent percentiles (min–5th, 5th–10th, …):

[5]:

doy = full_por_percentiles[
    full_por_percentiles["time_of_year_type"] == "day_of_year"
].copy()
doy["value"] = pd.to_numeric(doy["value"], errors="coerce")  # API returns strings
bands = doy.pivot_table(index="time_of_year", columns="percentile", values="value")
bands.columns = [int(c) for c in bands.columns]
bands = bands.sort_index()  # "MM-DD" strings sort chronologically within a year

# x positions: map MM-DD onto a reference (leap) year so 02-29 is included
x = pd.to_datetime("2024-" + bands.index, format="%Y-%m-%d")

# (lo, hi) percentile range, fill color, legend label
band_defs = [
    ((95, 100), "#292f6b", "95th Percentile - Max"),
    ((90, 95), "#5699c0", "90th - 95th Percentile"),
    ((75, 90), "#aacee0", "75th - 90th Percentile"),
    ((25, 75), "#e9e9e9", "25th - 75th Percentile"),
    ((10, 25), "#ebd6ab", "10th - 25th Percentile"),
    ((5, 10), "#dcb668", "5th - 10th Percentile"),
    ((0, 5), "#8f4f1f", "Min - 5th Percentile"),
]

fig, ax = plt.subplots(figsize=(9, 5))
for (lo, hi), color, label in band_defs:
    ax.fill_between(x, bands[lo], bands[hi], facecolor=color, alpha=0.7, label=label)
ax.set_yscale("log")
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set_xlabel("Month")
ax.set_ylabel("Discharge, cubic feet per second")
ax.set_title("Day-of-year percentile bands (USGS-02037500)")
ax.legend(title="Historical percentiles", fontsize=7, loc="upper right")
plt.show()

../_images/examples_USGS_WaterData_DailyStatistics_Examples_9_0.png

Finally, overlay the actual daily mean discharge so we can see where recent conditions fall relative to the historical bands — exactly the view on the Water Data for the Nation (WDFN) statistical graphs. We pull two water years of daily means and join them to the bands by month-day.

[6]:

daily, _ = waterdata.get_daily(
    monitoring_location_id=site,
    parameter_code="00060",
    statistic_id="00003",
    time=["2024-01-01", "2025-12-31"],
)
daily = daily.sort_values("time").reset_index(drop=True)
daily["md"] = daily["time"].dt.strftime("%m-%d")

# Repeat the day-of-year bands across each actual calendar date
b = bands.reindex(daily["md"]).reset_index(drop=True)

fig, ax = plt.subplots(figsize=(9, 5))
for (lo, hi), color, label in band_defs:
    ax.fill_between(
        daily["time"], b[lo], b[hi], facecolor=color, alpha=0.7, label=label
    )
ax.plot(daily["time"], daily["value"], color="black", lw=0.9, label="Daily mean")
prov = daily[daily["approval_status"] == "Provisional"]
ax.scatter(prov["time"], prov["value"], color="red", s=5, zorder=3, label="Provisional")
ax.set_yscale("log")
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.set_ylabel("Discharge, cubic feet per second")
ax.set_title("Daily mean discharge vs. historical percentile bands")
ax.legend(fontsize=7, ncol=2, loc="upper right")
fig.autofmt_xdate()
plt.show()

Retrieving: daily · 1 page · 731 rows

../_images/examples_USGS_WaterData_DailyStatistics_Examples_11_1.png

Fetching monthly and annual statistics within a date range

Unlike the day-/month-of-year normals, get_stats_date_range summarizes specific months and years inside a requested window. Here we ask for the average discharge for January 2024 — note the YYYY-MM-DD date format:

[7]:

jan_daterange_mean, _ = waterdata.get_stats_date_range(
    monitoring_location_id=site,
    parameter_code="00060",
    computation_type="arithmetic_mean",
    start_date="2024-01-01",
    end_date="2024-01-31",
)
jan_daterange_mean[["start_date", "end_date", "interval_type", "value"]]

Retrieving: observationIntervals · 1 page · 3 rows

[7]:

	start_date	end_date	interval_type	value
0	2024-01-01	2024-01-31	month	15787.097
1	2024-01-01	2024-12-31	calendar_year	6795.699
2	2023-10-01	2024-09-30	water_year	6306.902

Instead of time_of_year, the output has start_date, end_date, and interval_type. The first row is the monthly average; the API also returns the calendar year and water year averages for any year intersecting the range. A 93-day window can therefore touch two calendar and two water years:

[8]:

multiyear, _ = waterdata.get_stats_date_range(
    monitoring_location_id=site,
    parameter_code="00060",
    computation_type="arithmetic_mean",
    start_date="2023-09-30",
    end_date="2024-01-01",
)
multiyear[["start_date", "end_date", "interval_type", "value"]]

Retrieving: observationIntervals · 1 page · 9 rows

[8]:

	start_date	end_date	interval_type	value
0	2023-09-01	2023-09-30	month	1917.667
1	2023-10-01	2023-10-31	month	1264.516
2	2023-11-01	2023-11-30	month	2000.333
3	2023-12-01	2023-12-31	month	5939.355
4	2024-01-01	2024-01-31	month	15787.097
5	2023-01-01	2023-12-31	calendar_year	5062.219
6	2024-01-01	2024-12-31	calendar_year	6795.699
7	2022-10-01	2023-09-30	water_year	5740.247
8	2023-10-01	2024-09-30	water_year	6306.902

Filter the interval_type column (values month, calendar_year, water_year) to keep only certain intervals — here, the annual rows:

[9]:

multiyear[multiyear["interval_type"].isin(["calendar_year", "water_year"])][
    ["start_date", "end_date", "interval_type", "value"]
]

[9]:

	start_date	end_date	interval_type	value
5	2023-01-01	2023-12-31	calendar_year	5062.219
6	2024-01-01	2024-12-31	calendar_year	6795.699
7	2022-10-01	2023-09-30	water_year	5740.247
8	2023-10-01	2024-09-30	water_year	6306.902

Monthly mean table

We can reproduce something like a Water Year Summary monthly-mean table. We pull the full period of record (no dates), keep the monthly intervals, and aggregate by calendar month in water-year order. (Values may differ slightly from the official summaries due to rounding.)

[10]:

monthly_raw, _ = waterdata.get_stats_date_range(
    monitoring_location_id=site,
    parameter_code="00060",
    computation_type="arithmetic_mean",
)
m = monthly_raw[monthly_raw["interval_type"] == "month"].copy()
m["start_date"] = pd.to_datetime(m["start_date"])
m["value"] = pd.to_numeric(m["value"], errors="coerce")
m = m[(m["start_date"] >= "2004-10-01") & (m["start_date"] < "2025-09-01")]
m = m.dropna(subset=["value"])
m["month"] = m["start_date"].dt.strftime("%b")
m["water_year"] = (m["start_date"] + pd.DateOffset(months=3)).dt.year


def summarize(g):
    hi = g.loc[g["value"].idxmax()]
    lo = g.loc[g["value"].idxmin()]
    return pd.Series(
        {
            "Mean": round(g["value"].mean()),
            "Max (WY)": f"{round(hi['value'])} ({int(hi['water_year'])})",
            "Min (WY)": f"{round(lo['value'])} ({int(lo['water_year'])})",
        }
    )


wy_order = [
    "Oct",
    "Nov",
    "Dec",
    "Jan",
    "Feb",
    "Mar",
    "Apr",
    "May",
    "Jun",
    "Jul",
    "Aug",
    "Sep",
]
table = m.groupby("month")[["value", "water_year"]].apply(summarize).reindex(wy_order)
table.T

Retrieving: observationIntervals · 1 page · 1,277 rows

[10]:

month	Oct	Nov	Dec	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep
Mean	4752	6211	9464	9315	11258	10980	10901	10397	5855	3337	2661	3370
Max (WY)	14059 (2019)	20085 (2019)	24194 (2019)	18138 (2010)	26647 (2025)	19426 (2019)	17269 (2019)	19037 (2020)	13455 (2013)	11236 (2013)	6883 (2020)	15211 (2018)
Min (WY)	1230 (2009)	1497 (2008)	1508 (2018)	2536 (2018)	2900 (2009)	3316 (2017)	5459 (2025)	3344 (2006)	1833 (2008)	1163 (2010)	1094 (2008)	968 (2010)

Statistics API tips

The statistics API does not follow the OGC standards used by the api.waterdata.usgs.gov/ogcapi/v0/ endpoints. A few things to keep in mind:

Higher rate limits. At the time of writing the statistics API allows ~4000 requests/hour per IP (per token if a token is supplied).
All columns, always. There is no skip_geometry or properties argument — the API returns the full column set.
Month-of-year normals. To get month-of-year statistics from get_stats_por, make the start_date–end_date range overlap the first of the month (e.g. 01-01–03-01 returns the January, February, and March month-of-year stats in addition to each day-of-year).
Monthly/annual intervals. get_stats_date_range returns a summary for every calendar month, calendar year, and water year that intersects the range.
Median = the 50th percentile. Requesting both median and percentile duplicates the median; you rarely need both.
Min/max are not percentiles. Use computation_type=["minimum", "maximum", "percentile"] for a complete set of order statistics (as we did for the band plot).
Fixed percentiles. percentile only ever returns the 5th, 10th, 25th, 50th, 75th, 90th, and 95th. For other percentiles, pull the daily record with get_daily and compute them yourself.
Watch ``sample_count``. It’s the number of observations behind a statistic; there is no minimum, so a monthly/annual value can rest on a single daily observation.

More help

Documentation: https://doi-usgs.github.io/dataretrieval-python/
Statistics documentation: https://waterdata.usgs.gov/statistics-documentation/
Equivalent R article: daily statistics
Issues / questions: https://github.com/DOI-USGS/dataretrieval-python/issues

Daily statistics: get_stats_por and get_stats_date_range