Percentile Calculation Examples

Here we showcase the percentile calculation functionality available in the hyswap.percentiles module.

Calculating Historic Percentiles for One Site

The hyswap.percentiles.calculate_fixed_percentile_thresholds function is used to calculate a set of percentile thresholds given a set of data. This function simply calculates one set of fixed percentile thresholds using all available data, and is not intended to be used for calculating percentile thresholds separately for individual days of the year.

By default, this function calculates percentiles using the Weibull plotting-position method, which corresponds to an alpha parameter of 0 and a beta parameter of 0. The Weibull method is the default for percentile calculations, following the USGS Guidelines for Determining Flood Flow Frequency, Bulletin 17C, Appendix 5.
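To make the alpha and beta parameters concrete, here is a minimal sketch of a plotting-position percentile estimate in plain Python. It is not hyswap's implementation. The function name `plotting_position_percentile` and the sample flows are hypothetical, and the rank formula is the general alpha/beta form, which is assumed to match what `numpy.percentile` uses for `method="weibull"` (alpha = beta = 0) and `method="linear"` (alpha = beta = 1).

```python
import math


def plotting_position_percentile(values, pct, alpha=0.0, beta=0.0):
    """Estimate the pct-th percentile (0-100) of values.

    alpha = beta = 0 gives the Weibull plotting position;
    alpha = beta = 1 gives ordinary linear interpolation.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("values must not be empty")
    q = pct / 100.0
    # Zero-based fractional rank of the requested quantile.
    h = n * q + alpha + q * (1.0 - alpha - beta) - 1.0
    # Ranks outside the observed range are clamped to the extremes.
    if h <= 0:
        return data[0]
    if h >= n - 1:
        return data[-1]
    lower = math.floor(h)
    frac = h - lower
    # Interpolate linearly between the two neighbouring order statistics.
    return data[lower] + frac * (data[lower + 1] - data[lower])


# Hypothetical sample of nine flow values (not NWIS data).
flows = [12.0, 3.0, 45.0, 7.0, 150.0, 22.0, 9.0, 60.0, 5.0]
weibull = [plotting_position_percentile(flows, p) for p in (10, 50, 90)]
linear = [plotting_position_percentile(flows, p, alpha=1.0, beta=1.0)
          for p in (10, 50, 90)]
print(weibull)
print(linear)
```

With the Weibull position the 10th and 90th percentiles of a nine-value sample land on the smallest and largest observations, whereas linear interpolation pulls them toward the middle. This difference matters most for short records.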

Below is an example of fetching NWIS streamflow data for a USGS gage and then calculating the 10th, 50th, and 90th percentiles for that data.

# import the packages used throughout these examples
import dataretrieval
import hyswap

# fetch data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1776-01-01",
                                  end="2022-12-31")

# calculate percentiles
pct_values = hyswap.percentiles.calculate_fixed_percentile_thresholds(
    df['00060_Mean'], percentiles=[10, 50, 90])

# print percentile values (corresponding to 10th, 50th, 90th percentiles)
print(pct_values)
         min  p10   p50    p90      max    mean  count start_yr end_yr
values  0.23  4.9  75.6  637.0  18400.0  278.54  27911     1935   2022

To calculate percentiles with a method other than Weibull, pass the method keyword argument. See the numpy.percentile documentation for the available methods.

Calculating Historic Variable Percentiles for the Full Year

The hyswap.percentiles module can also calculate percentile thresholds for each day of the year (variable thresholds) using historical values. This is done with the hyswap.percentiles.calculate_variable_percentile_thresholds_by_day function. Like hyswap.percentiles.calculate_fixed_percentile_thresholds, this function defaults to the Weibull method but accepts other methods.

Below is an example of fetching NWIS streamflow data for a USGS gage and then calculating the 10th, 50th, and 90th percentiles for each day of the year.

# fetch data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1776-01-01",
                                  end="2022-12-31")

# calculate percentiles by day
pcts = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
    df, '00060_Mean', percentiles=[10, 50, 90])

# print first 5 rows of the percentile dataframe
print(pcts.head())
           min   p10    p50     p90     max    mean count start_yr end_yr
month_day
01-01      2.0  29.6  226.0  1514.0  4810.0  523.32    75     1936   2022
01-02      2.0  41.0  199.0  1694.0  3760.0  533.11    75     1936   2022
01-03      2.0  43.4  260.0  1724.0  3620.0  567.63    75     1936   2022
01-04      2.0  56.4  239.0  1800.0  5120.0  589.99    75     1936   2022
01-05      2.0  53.8  257.0  1368.0  8690.0  624.48    75     1936   2022

By default, percentiles are only computed for days that have at least 10 years of data available. This threshold can be changed by setting the min_years parameter to a different value. Multi-day averaging can also be performed by setting the window_width parameter to a value like 7-day, 14-day, or 28-day. The default value is daily, which applies no temporal averaging. See the function documentation (hyswap.percentiles.calculate_variable_percentile_thresholds_by_day) for additional details about the parameters and options for this function.
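The two options described above can be pictured with a small standard-library sketch. It is illustrative only and does not show hyswap's internals. The helper names `trailing_mean` and `group_by_month_day` are hypothetical, the averaging window is assumed to be trailing (right-aligned), and "years of data" is assumed to mean the number of distinct calendar years with a value for that month-day.

```python
from collections import defaultdict
from datetime import date


def trailing_mean(values, window):
    """Trailing moving average; positions without a full window give None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


def group_by_month_day(dates, values, min_years=10):
    """Collect values per 'MM-DD' key, keeping only days that were
    observed in at least min_years distinct years."""
    by_day = defaultdict(dict)
    for d, v in zip(dates, values):
        if v is not None:
            by_day[d.strftime("%m-%d")][d.year] = v
    return {key: sorted(per_year.values())
            for key, per_year in by_day.items()
            if len(per_year) >= min_years}


# Synthetic record: January 1-3 observed for 12 years,
# January 4 observed for only 5 years.
dates, values = [], []
for year in range(2000, 2012):
    last_day = 4 if year < 2005 else 3
    for day in range(1, last_day + 1):
        dates.append(date(year, 1, day))
        values.append(float(year - 2000 + day))

grouped = group_by_month_day(dates, values, min_years=10)
smoothed = trailing_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(sorted(grouped))
print(smoothed)
```

In this synthetic record, January 4 is dropped because it has only five years of values, while the other three days keep all twelve. Percentiles for each remaining day would then be computed from that day's list of values.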

Interpolating New Percentiles Using Previously Calculated Percentiles

To support faster calculations of percentiles without the need to repeatedly fetch all historic data from NWIS, the hyswap.percentiles.calculate_fixed_percentile_from_value and hyswap.percentiles.calculate_variable_percentile_from_value functions support the interpolation of a new percentile value for a measurement given a previously calculated set of percentiles and their associated values.

First is an example of fetching NWIS streamflow data for a USGS gage and then calculating the 10th, 50th, and 90th fixed-threshold percentiles using all of the data. Then, a new fixed-threshold percentile value is interpolated for a measurement of 100.0 cfs.

# fetch data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1776-01-01",
                                  end="2022-12-31")

# calculate percentiles
pct_values = hyswap.percentiles.calculate_fixed_percentile_thresholds(
    df['00060_Mean'], percentiles=[10, 50, 90])

# calculate the percentile associated with 100.0 cfs
pct = hyswap.percentiles.calculate_fixed_percentile_from_value(
    100.0, pct_values)

# print that percentile value
print(pct)
51.74
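The interpolation step can be checked by hand with a short sketch. It assumes simple piecewise-linear interpolation between the threshold values and treats the record minimum and maximum as the 0th and 100th percentiles. Both are assumptions made for illustration, and `percentile_from_value` is a hypothetical helper, not a hyswap function. The threshold values are copied from the fixed-percentile table printed earlier on this page.

```python
def percentile_from_value(value, thresholds):
    """Linearly interpolate a percentile for value.

    thresholds maps percentile level -> flow value and is assumed
    to be non-decreasing in flow as the level increases.
    """
    points = sorted(thresholds.items())
    levels = [level for level, _ in points]
    flows = [flow for _, flow in points]
    # Values outside the threshold range are clamped to the end levels.
    if value <= flows[0]:
        return float(levels[0])
    if value >= flows[-1]:
        return float(levels[-1])
    for i in range(1, len(flows)):
        if value <= flows[i]:
            frac = (value - flows[i - 1]) / (flows[i] - flows[i - 1])
            return levels[i - 1] + frac * (levels[i] - levels[i - 1])


# min, p10, p50, p90, max from the table above
# (min and max as levels 0 and 100 is an assumption).
fixed_thresholds = {0: 0.23, 10: 4.9, 50: 75.6, 90: 637.0, 100: 18400.0}
pct = percentile_from_value(100.0, fixed_thresholds)
print(round(pct, 2))
```

A flow of 100.0 cfs falls between the 50th percentile (75.6) and the 90th percentile (637.0), so the estimate is 50 + 40 * (100.0 - 75.6) / (637.0 - 75.6), which rounds to 51.74 and agrees with the value printed above.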

Next is an example of fetching NWIS streamflow data for a USGS gage and then calculating the variable-threshold percentiles using all of the data. Then, a new variable-threshold percentile value is interpolated for a measurement of 100.0 cfs on September 1st.

# fetch data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1776-01-01",
                                  end="2022-12-31")

# calculate percentiles
pct_values = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
    df,'00060_Mean')

# calculate the percentile associated with 100.0 cfs for September 1st
pct = hyswap.percentiles.calculate_variable_percentile_from_value(
    100.0, pct_values, '09-01')

# print that percentile value
print(pct)
90.03

Percentiles can also be calculated for multiple streamflow values at once. Below is an example of fetching NWIS streamflow data for a USGS gage and then calculating variable-threshold percentiles using all of the data. Then, new variable-threshold percentile values are interpolated for measurements from a recent month.

# fetch data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1776-01-01",
                                  end="2022-12-31")

# calculate percentiles
pct_values = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
    df,'00060_Mean')

# fetch data from NWIS using dataretrieval
new_df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="2023-01-01",
                                  end="2023-01-31")

# calculate the percentiles associated with streamflow for January 2023
pcts = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
    new_df, '00060_Mean', pct_values)

# print the first five estimated percentiles
print(pcts['est_pct'].head())

datetime
2023-01-01    24.31
2023-01-02    21.20
2023-01-03    33.21
2023-01-04    77.94
2023-01-05    74.12
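Conceptually, estimating percentiles for many dated values means looking up each date's month-day thresholds and interpolating against them. The sketch below shows that lookup with plain dictionaries instead of DataFrames. The helper names are hypothetical, and treating the minimum and maximum as the 0th and 100th percentiles is an assumption. The thresholds are copied from the 01-01 and 01-02 rows of the table above, and the observed flows are invented. Because only a few percentile levels are used, the results are illustrative and will not reproduce the output above.

```python
from datetime import date

# Percentile levels for min, p10, p50, p90, max
# (min and max as 0 and 100 is an assumption).
LEVELS = [0, 10, 50, 90, 100]

# Per-day thresholds taken from the 01-01 and 01-02 rows of the table above.
THRESHOLDS = {
    "01-01": [2.0, 29.6, 226.0, 1514.0, 4810.0],
    "01-02": [2.0, 41.0, 199.0, 1694.0, 3760.0],
}


def interpolate_percentile(value, levels, flows):
    """Piecewise-linear interpolation, clamped at both ends."""
    if value <= flows[0]:
        return float(levels[0])
    if value >= flows[-1]:
        return float(levels[-1])
    for i in range(1, len(flows)):
        if value <= flows[i]:
            frac = (value - flows[i - 1]) / (flows[i] - flows[i - 1])
            return levels[i - 1] + frac * (levels[i] - levels[i - 1])


def percentiles_for_observations(observations, thresholds):
    """Map each dated value to a percentile using that day's thresholds.

    Dates without thresholds yield None.
    """
    estimates = {}
    for day, value in observations.items():
        flows = thresholds.get(day.strftime("%m-%d"))
        if flows is None:
            estimates[day] = None
        else:
            estimates[day] = interpolate_percentile(value, LEVELS, flows)
    return estimates


# Invented observations for illustration only.
observations = {
    date(2023, 1, 1): 226.0,
    date(2023, 1, 2): 120.0,
    date(2023, 1, 3): 170.0,  # no thresholds supplied for 01-03
}
estimates = percentiles_for_observations(observations, THRESHOLDS)
for day, est in estimates.items():
    print(day.isoformat(), est)
```

Returning None for days without thresholds keeps missing information explicit. The same flow can map to different percentiles on different days because each day has its own thresholds.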

Below is an example of fetching daily statistics, including percentiles and their associated values, from the NWIS statistics service for a USGS gage and then interpolating a variable-threshold percentile value for a measurement of 100.0 cfs on January 1st.

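# numpy is needed for the rounding step at the end of this example
import numpy as np

# fetch daily statistics from the NWIS statistics service using dataretrieval
df, _ = dataretrieval.nwis.get_stats("03586500",
                                     parameterCd="00060",
                                     statReportType="daily")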

# munge the data
munged_df = hyswap.utils.munge_nwis_stats(df)

# calculate the percentile associated with 100.0 cfs
pct = hyswap.percentiles.calculate_variable_percentile_from_value(
    100.0, munged_df, '01-01')

# print that percentile value
print(np.round(pct, 2))
22.97

Categorizing Streamflow Conditions Based on Estimated Percentiles

To support generation of tables, figures and maps of current and past streamflow conditions, the category of a given streamflow can be determined using hyswap.utils.categorize_flows. The function assigns a category to a given streamflow observation based on interpolated percentiles and a given categorization schema.
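As a rough illustration of what a categorization schema does, the sketch below maps an estimated percentile to a label using fixed break points. The break points (10, 25, 75, 90), the full set of labels, and the handling of values that fall exactly on a break are assumptions chosen for illustration. They are not taken from hyswap's schema definitions, and `categorize` is a hypothetical helper.

```python
from bisect import bisect_right

# Assumed break points and labels for a five-class schema.
BREAKS = [10, 25, 75, 90]
LABELS = [
    "Much below normal",
    "Below normal",
    "Normal",
    "Above normal",
    "Much above normal",
]


def categorize(est_pct):
    """Return the label for an estimated percentile.

    Missing values (None or NaN) return None.
    """
    if est_pct is None or est_pct != est_pct:  # NaN is not equal to itself
        return None
    # bisect_right counts how many break points are <= est_pct,
    # which is the index of the matching label.
    return LABELS[bisect_right(BREAKS, est_pct)]


for value in (5.0, 21.2, 33.21, 80.0, 96.0):
    print(value, categorize(value))
```

Keeping the break points and labels in two parallel lists makes it easy to swap in a different schema without touching the lookup logic.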

Below is an example of fetching NWIS streamflow data for a USGS gage and then calculating the variable-threshold percentiles using all of the data. Then, new variable-threshold percentile values are interpolated for measurements from a recent month and flow categories assigned.

# fetch historic data from NWIS using dataretrieval
df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="1900-01-01",
                                  end="2022-12-31")

# calculate percentiles
pct_values = hyswap.percentiles.calculate_variable_percentile_thresholds_by_day(
    df,'00060_Mean')

# fetch recent data for the same gage from NWIS using dataretrieval
new_df, _ = dataretrieval.nwis.get_dv("03586500",
                                  parameterCd="00060",
                                  start="2023-01-01",
                                  end="2023-01-31")

# calculate the percentiles associated with streamflow for January 2023
new_df = hyswap.percentiles.calculate_multiple_variable_percentiles_from_values(
    new_df, '00060_Mean', pct_values)

# categorize streamflow using the NWD categorization schema
flow_cat = hyswap.utils.categorize_flows(new_df, 'est_pct', schema_name='NWD')

# print the flow categorizations
print(flow_cat[['00060_Mean', 'est_pct', 'flow_cat']].head())
                            00060_Mean  est_pct      flow_cat
datetime
2023-01-01 00:00:00+00:00       112.0    24.31  Below normal
2023-01-02 00:00:00+00:00       103.0    21.20  Below normal
2023-01-03 00:00:00+00:00       170.0    33.21        Normal
2023-01-04 00:00:00+00:00       823.0    77.94  Above normal
2023-01-05 00:00:00+00:00       559.0    74.12        Normal