Preprocessing hydrography

Sometimes we might need to customize a pre-existing hydrography dataset in some way, for example by adding additional streams that weren’t mapped or fixing errors in mapped streams. If we’re working with a large area or with the High Resolution version of NHDPlus, we may also need to combine file sets from multiple areas, or prune intermittent headwater streams from the dataset.

The preprocessing module of SFRmaker is intended to automate these tasks in a consistent, dependable way. Currently, the preprocessing module accepts NHDPlus version 2 data as input, and outputs either a culled and merged set of NHDPlus version 2 files (cull_flowlines function) or a single shapefile with all of the information needed to build an SFR package (preprocess_preprocess_nhdplus function).

As of SFRmaker 0.9.3, the preprocessing module is still a work in progress, and liable to change as improvements are made to streamline the workflow. This page is based on the preprocessing_demo.ipynb notebook in the examples/ folder, and therefore reflects the current state of the code.

[1]:
from pathlib import Path
import geopandas as gpd
import matplotlib.pyplot as plt
from sfrmaker.preprocessing import (
    clip_flowlines_to_polygon,
    cull_flowlines,
    preprocess_nhdplus
)

We start by supplying one or more paths to NHDPlus data (in the same structure as downloaded). Multiple drainage basins can be combined by included multiple paths in the list (for example, the Great Lakes and Upper Mississippi basins as shown here:

[2]:
NHDPlus_paths_list = ['/NHDPlusGL/NHDPlus04/',
                      '/NHDPlusMS/NHDPlus07/']

Merging and culling flowlines

The cull_flowlines() function culls NHDPlus version 2 data to a bounding box defined by an active_area coordinate tuple or polygon, and to flowlines with arbolate sums greater than specified thresholds. There are two arbolate sum thresholds:

  • asum_thresh: all non perennial streams (with FCodes other than 46006) with arbolate sums less than this amount (in km) will be culled.

  • intermittent_streams_asum_thresh: streams coded as intermittent (FCode 46003) with arbolate sums less than this amount will be culled. This allows other features such as ditches or artifical paths to be culled more aggressively than intermittent streams.

Neither of the arbolate sum thresholds apply to streams classified as perennial, which are assumed to be important boundary conditions that should be included in the model. With the SFR package, streams are allowed to dry, in which case they have no effect on the groundwater solution (until there is water again). Therefore, the goal of culling of streams is mostly to achieve a stream network that is computationally managable and appropriate for the scale of the model, while retaining features that may affect groundwater flow.

cull_flowlines() can also remove lines that are isolated from the stream network (cull_isolated=True) or are missing attribute information (cull_invalid=True). The output data are merged in a single set of NHDPlus version 2 files that are saved to outfolder:

elevslope.dbf
PlusFlow.dbf
PlusFlowlineVAA.dbf
NHDFlowline.shp

The Tyler Forks example is contained in the Great Lakes basin, so we just need one path. We also need to specify a folder for writing the merged and culled files.

[3]:
outfolder = Path('output')
outfolder.mkdir(exist_ok=True)

results = cull_flowlines(NHDPlus_paths=['../tylerforks/NHDPlus/'],
                         asum_thresh=3, intermittent_streams_asum_thresh=3,
                         cull_invalid=True, cull_isolated=False,
                         active_area='../tylerforks/active_area.shp',
                         outfolder=outfolder)
for basins:
../tylerforks/NHDPlus/

reading ../tylerforks/active_area.shp...
--> building dataframe... (may take a while for large shapefiles)

reading ../tylerforks/NHDPlus/NHDSnapshot/Hydrography/NHDFlowline.shp...
filtering on bounding box -90.59552642527598, 46.37324928457199, -90.45520721646749, 46.43603727904705...
--> building dataframe... (may take a while for large shapefiles)

reading ../tylerforks/NHDPlus/NHDPlusAttributes/PlusFlowlineVAA.dbf...
--> building dataframe... (may take a while for large shapefiles)

reading ../tylerforks/NHDPlus/NHDPlusAttributes/PlusFlow.dbf...
--> building dataframe... (may take a while for large shapefiles)

reading ../tylerforks/NHDPlus/NHDPlusAttributes/elevslope.dbf...
--> building dataframe... (may take a while for large shapefiles)
writing output/flowlines_gt3km.shp... Done
writing output/PlusFlowlineVAA_gt3km.dbf... Done
writing output/PlusFlow_gt3km.dbf... Done
writing output/elevslope_gt3km.dbf... Done
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):

The output files are listed in a returned results dictionary, and are named based on asum_thresh. This allows for easy experimentation with variuos arbolate sum thresholds, without a lot of extra file wrangling.

[4]:
results
[4]:
{'flowlines_file': 'output/flowlines_gt3km.shp',
 'pfvaa_file': 'output/PlusFlowlineVAA_gt3km.dbf',
 'pf_file': 'output/PlusFlow_gt3km.dbf',
 'elevslope_file': 'output/elevslope_gt3km.dbf'}

Plot the culled flowlines

We can use geopandas to quickly plot the results. As noted above, all perennial streams that intersect the active area bounding box are retained (blue lines). Within the active area, two intermittent streams with arbolate sums of less than 3km are culled (red lines).

[5]:
original_fl = gpd.read_file('../tylerforks/NHDPlus/NHDSnapshot/Hydrography/NHDFlowline.shp')
fig, ax = plt.subplots(figsize=(11.5, 8))
original_fl.plot(ax=ax, color='r')

culled_fl = gpd.read_file(results['flowlines_file'])
culled_fl.plot(ax=ax)

active_area = gpd.read_file('../tylerforks/active_area.shp')
active_area.plot(ax=ax, zorder=-1, fc='0.9')
active_area.envelope.plot(ax=ax, color='1.0', ec='k', zorder=-2)
[5]:
<Axes: >
../_images/notebooks_preprocessing_demo_9_1.png

Preprocessing the flowlines

The preprocess_nhdplus() function creates a single shapefile with the information needed to build an SFR package (that can be input to `sfrmaker.Lines.from_shapefile() <https://aleaf.github.io/sfrmaker/latest/api/sfrmaker.lines.html#sfrmaker.lines.Lines.from_shapefile>`__). The set of unified NHDPlus files from cull_flowlines() can be input or, if no culling is needed and the project area falls within a single drainage basin, an original set of NHDPlus version 2 files can be used.

The shapefile output from preprocess_nhdplus() can be thought of as a grid-independent representation of the SFR package. Further editing can be done on the shapefile either manually in a GIS environment, or automatically with the `preprocessing.edit_flowlines() <https://aleaf.github.io/sfrmaker/latest/api/sfrmaker.preprocessing.html#sfrmaker.preprocessing.edit_flowlines>`__ function. Abstracting the key details of the SFR package to this shapefile allows for the SFR package to be easily regenerated if other apsects of the model (such as the grid) change.

To make a suitable SFR dataset, preprocess_nhdplus() does some additional processing that is described in more detail in the code reference, including:
* selecting a single downstream segment at each divergence, using zonal statistics computed on buffered areas (customizable with buffersize_meters) around the lines. * for instances where the automated proceedure selects the wrong distributary, or where routing information is absent in NHDPlus, connections can be established manually via a dictionary of FROMCOMID:TOCOMID pairs * smoothed streambed elevations are also produced in the zonal statistics sampling; for large models with cell sizes of several hundred meters or more (in which zonal statistics may take a hour or longer to run), this may be preferable to sampling the elevations during the construction of the SFR package, which would greatly slow (re)building of the model. * after the divergences are pruned to single downstream segments, arbolate sums are recomputed, and the lines can optionally be culled again with the asum_thresh argument here. This can be used to remove minor distributies that are only active during high water events, and therefore potentially not relevant to the groundwater model. * the computed arbolate sums are also used to estimate channel widths, using the relationship \(a\cdot arbolate\_sum ^b\) (Leaf, 2023; Feinstein et al, 2010). The \(a\) and \(b\) parameters in this relationship can be adjusted as needed to achieve the desired widths for a study area. A minimum_width can also be specified for headwater tributaries that may not be well represented by the above power-law relationship (and that may also have greater streambed conductance).

Additional processing options not shown below

  • for large project areas, additional width information from the North American Width Dataset (NARWidth) can be incorporated via the narwidth_shapefile argument (see Leaf, 2023; Allen and Pavelsky, 2015 and the code reference

  • field measurements of streambed elevation can be incorporated into the smoothed streambed elevations produced for each line, via the update_up_elevations and update_dn_elevations arguments.

[6]:
preprocessed_flowlines = preprocess_nhdplus(
    flowlines_file=results['flowlines_file'],
    pfvaa_file=results['pfvaa_file'],
    pf_file=results['pf_file'],
    elevslope_file=results['elevslope_file'],
    demfile='../tylerforks/dem_26715.tif',
    dem_length_units='meters',
    buffersize_meters=50,
    asum_thresh= 3.,
    width_from_asum_a_param=0.0592,
    width_from_asum_b_param=0.5127,
    minimum_width=1.,
    known_connections={1814967: 1814897},
    output_length_units='meters',
    outfolder=outfolder,
    dest_crs=26915  # UTM zone 15 north
)

reading output/PlusFlowlineVAA_gt3km.dbf...
--> building dataframe... (may take a while for large shapefiles)

reading output/PlusFlow_gt3km.dbf...
--> building dataframe... (may take a while for large shapefiles)

reading output/elevslope_gt3km.dbf...
--> building dataframe... (may take a while for large shapefiles)
writing output/flowlines_gt3km_buffers.shp... Done

Smoothing elevations...
finished in 0.00s
writing output/preprocessed_flowlines_gt3km.shp... Done
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:117: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(fiona.__gdal_version__) < LooseVersion("3.0.0"):

preprocess_nhdplus() writes a shapefile of the preprocessed flowlines, and returns a GeoDataFrame representation that is described in detail in the code reference.

[7]:
preprocessed_flowlines.head()
[7]:
COMID FDATE RESOLUTION GNIS_ID GNIS_NAME LENGTHKM REACHCODE FLOWDIR WBAREACOMI FTYPE ... elevup elevdn elevupsmo elevdnsmo asum_calc asum_diff width1asum width2asum width1 width2
COMID
1814983 1814983 1999-07-05 Medium 1580686 Tyler Forks 2.947 04010302000160 With Digitized 0 StreamRiver ... 343.306821 340.423085 343.306821 340.423085 133.052 0.0 7.558493 7.645791 7.558493 7.645791
1814907 1814907 1999-07-05 Medium 1580686 Tyler Forks 1.483 04010302000161 With Digitized 0 StreamRiver ... 344.879608 343.306821 344.879608 343.306821 125.007 0.0 7.359996 7.405168 7.359996 7.405168
1814869 1814869 1999-07-05 Medium 1580686 Tyler Forks 7.555 04010302000162 With Digitized 0 StreamRiver ... 353.597050 344.879608 353.597050 344.879608 119.314 0.0 6.991834 7.230301 6.991834 7.230301
1814859 1814859 1999-07-05 Medium 1580686 Tyler Forks 2.558 04010302000163 With Digitized 0 StreamRiver ... 367.337701 353.597050 367.337701 353.597050 107.771 0.0 6.778782 6.862785 6.778782 6.862785
1814897 1814897 1999-07-05 Medium 1580686 Tyler Forks 1.624 04010302000164 With Digitized 0 StreamRiver ... 389.698828 367.337701 389.698828 367.337701 102.417 0.0 6.631250 6.685816 6.631250 6.685816

5 rows × 38 columns

[8]:
ax = preprocessed_flowlines.plot()
active_area.to_crs(26915).plot(ax=ax, zorder=-1, fc='0.9')
[8]:
<Axes: >
../_images/notebooks_preprocessing_demo_14_1.png

Clipping the flowlines to a specific area

As a final step, we may want to clip the flowlines to an irregular area where SFR will be included in the model. For large project areas, this can reduce the file size and make the preprocessed lines easier to work with. This step may be best done last, so that the information in the surrounding flowlines (routing, elevations, etc) can be used in the preprocessing above. For large study areas with an overly detailed active area boundary (for example, one generated from a raster), specifying a simplification tolerance (simplify_tol) for the active area boundary can greatly speed up the clipping. All lines that intersect the simplified active area will be retained.

[9]:
clipped_flowlines = clip_flowlines_to_polygon(
    preprocessed_flowlines, '../tylerforks/active_area.shp',
    simplify_tol=100)
/home/runner/micromamba/envs/sfrmaker_ci/lib/python3.12/site-packages/gisutils/shapefile.py:278: FionaDeprecationWarning: instances of this class -- CRS, geometry, and feature objects -- will become immutable in fiona version 2.0
  props['geometry'] = line.get('geometry', None)

reading ../tylerforks/active_area.shp...
--> building dataframe... (may take a while for large shapefiles)
starting lines: 19
remaining lines: 11
[10]:
clipped_flowlines.plot()
[10]:
<Axes: >
../_images/notebooks_preprocessing_demo_17_1.png