03: Useful standard library modules

(pathlib, shutil, sys, os, subprocess, zipfile, etc.)

These packages are part of the standard python library and provide very useful functionality for working with your operating system and files. This notebook will explore these packages and demonstrate some of their functionality. Online documentation is at https://docs.python.org/3/library/.

Topics covered:

  • pathlib:

    • listing files

    • creating, moving and deleting files

    • absolute vs relative paths

    • useful path object attributes

  • shutil:

    • copying, moving and deleting files AND folders

  • sys:

    • python and platform information

    • command line arguments

    • modifying the python path to import code from other locations

  • os:

    • changing the working directory

    • recursive iteration through folder structures

    • accessing environmental variables

  • subprocess:

    • running system commands and checking the results

  • zipfile:

    • creating and extracting from zip archives

[1]:
import os
from pathlib import Path
import shutil
import subprocess
import sys
import zipfile

pathlib — Object-oriented filesystem paths

Pathlib provides convenient “pathlike” objects for working with file paths across platforms, meaning that paths or operations done with pathlib work the same on Windows or POSIX systems (Linux, macOS, etc.). The main entry point for users is the Path() class.

Make a Path() object for the current folder

[2]:
cwd = Path('.')
cwd
[2]:
PosixPath('.')
[3]:
for f in cwd.iterdir():
    print(f)
07b_VSCode.md
10_Rasterio.ipynb
05_numpy.ipynb
11_xarray_mt_rainier_precip.ipynb
00_python_basics_review.ipynb
06b_matplotlib_animation.ipynb
09_a_Geopandas.ipynb
data
solutions
09_b_Geopandas_ABQ.ipynb
03_useful-std-library-modules.ipynb

List just the notebooks using the .glob() method

[4]:
for nb in cwd.glob('*.ipynb'):
    print(nb)
10_Rasterio.ipynb
05_numpy.ipynb
11_xarray_mt_rainier_precip.ipynb
00_python_basics_review.ipynb
06b_matplotlib_animation.ipynb
09_a_Geopandas.ipynb
09_b_Geopandas_ABQ.ipynb
03_useful-std-library-modules.ipynb

Note: .glob() works across folders too

List all notebooks for both class components

[5]:
for nb in cwd.glob('../*/*.ipynb'):
    print(nb)
../part0_python_intro/10_Rasterio.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part1_flopy/01-Flopy-intro.ipynb
../part1_flopy/05-unstructured-grids.ipynb
../part1_flopy/08_Modflow-setup-demo.ipynb
../part1_flopy/09-gwt-voronoi-demo.ipynb
../part1_flopy/10_modpath_particle_tracking-demo.ipynb

But glob results aren’t sorted alphabetically!

(and the sorting is platform-dependent)

https://arstechnica.com/information-technology/2019/10/chemists-discover-cross-platform-python-scripts-not-so-cross-platform/?comments=1&post=38113333

we can easily sort them with sorted(), which accepts any iterable (so casting the results to a list first is optional)

[6]:
sorted(list(cwd.glob('../*/*.ipynb')))
[6]:
[PosixPath('../part0_python_intro/00_python_basics_review.ipynb'),
 PosixPath('../part0_python_intro/03_useful-std-library-modules.ipynb'),
 PosixPath('../part0_python_intro/05_numpy.ipynb'),
 PosixPath('../part0_python_intro/06b_matplotlib_animation.ipynb'),
 PosixPath('../part0_python_intro/09_a_Geopandas.ipynb'),
 PosixPath('../part0_python_intro/09_b_Geopandas_ABQ.ipynb'),
 PosixPath('../part0_python_intro/10_Rasterio.ipynb'),
 PosixPath('../part0_python_intro/11_xarray_mt_rainier_precip.ipynb'),
 PosixPath('../part1_flopy/01-Flopy-intro.ipynb'),
 PosixPath('../part1_flopy/05-unstructured-grids.ipynb'),
 PosixPath('../part1_flopy/08_Modflow-setup-demo.ipynb'),
 PosixPath('../part1_flopy/09-gwt-voronoi-demo.ipynb'),
 PosixPath('../part1_flopy/10_modpath_particle_tracking-demo.ipynb')]

Note: There is also a glob module in the standard python library that works directly with string paths

[7]:
import glob
sorted(list(glob.glob('../*/*.ipynb')))
[7]:
['../part0_python_intro/00_python_basics_review.ipynb',
 '../part0_python_intro/03_useful-std-library-modules.ipynb',
 '../part0_python_intro/05_numpy.ipynb',
 '../part0_python_intro/06b_matplotlib_animation.ipynb',
 '../part0_python_intro/09_a_Geopandas.ipynb',
 '../part0_python_intro/09_b_Geopandas_ABQ.ipynb',
 '../part0_python_intro/10_Rasterio.ipynb',
 '../part0_python_intro/11_xarray_mt_rainier_precip.ipynb',
 '../part1_flopy/01-Flopy-intro.ipynb',
 '../part1_flopy/05-unstructured-grids.ipynb',
 '../part1_flopy/08_Modflow-setup-demo.ipynb',
 '../part1_flopy/09-gwt-voronoi-demo.ipynb',
 '../part1_flopy/10_modpath_particle_tracking-demo.ipynb']

List just the subfolders

[8]:
[f for f in cwd.iterdir() if f.is_dir()]
[8]:
[PosixPath('data'), PosixPath('solutions')]

Create a new path for the data subfolder

[9]:
data_path = cwd / 'data'
data_path
[9]:
PosixPath('data')

or an individual file

[10]:
f = cwd / '00_python_basics_review.ipynb'
f
[10]:
PosixPath('00_python_basics_review.ipynb')

check if it exists, or if it’s a directory

[11]:
f.exists(), f.is_dir()
[11]:
(True, False)

make a new subdirectory

[12]:
new_folder = cwd / 'more_files'
new_folder
[12]:
PosixPath('more_files')
[13]:
new_folder.exists()
[13]:
False
[14]:
new_folder.mkdir(); new_folder.exists()
[14]:
True

Note that if you try to run the above cell twice, you’ll get an error that the folder already exists. Passing exist_ok=True suppresses these errors.

[15]:
new_folder.mkdir(exist_ok=True)

make a new subfolder within a new subfolder

The parents=True argument allows for making subfolders within new subfolders

[16]:
(new_folder / 'subfolder').mkdir(exist_ok=True, parents=True)

Get the absolute location of the current working directory

[17]:
abs_cwd = Path.cwd()
abs_cwd
[17]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro')

Go up two levels to the course repository

[18]:
class_root = (abs_cwd / '../../')
class_root
[18]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro/../..')

Simplify or resolve the path

[19]:
class_root = class_root.resolve()
class_root
[19]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')

Get the cwd relative to the course repository

[20]:
abs_cwd.relative_to(class_root)
[20]:
PosixPath('notebooks/part0_python_intro')

check if this is an absolute or relative path

[21]:
abs_cwd.relative_to(class_root).is_absolute()
[21]:
False
[22]:
abs_cwd.is_absolute()
[22]:
True

gotcha: Path.relative_to() only works when the first path is a subpath of the second path; otherwise it raises a ValueError, even if both paths are absolute

For example, try executing this line:

Path('../part1_flopy/').relative_to('data')

If you need a relative path that will work robustly in a script, os.path.relpath might be a better choice

[23]:
os.path.relpath('../part1_flopy/', 'data')
[23]:
'../../part1_flopy'
[24]:
os.path.relpath('data', '../part1_flopy/')
[24]:
'../part0_python_intro/data'
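
The two approaches can be combined in a small helper. The following is a minimal sketch, not part of the original notebook: the function name relative_path and the folder names are assumptions chosen for illustration. It tries Path.relative_to() first and falls back to os.path.relpath() when the target is not inside the start folder.

```python
import os
from pathlib import Path


def relative_path(target, start):
    """Return target relative to start (hypothetical helper)."""
    try:
        # succeeds only when target is located inside start
        return Path(target).relative_to(start)
    except ValueError:
        # os.path.relpath inserts '..' components as needed
        return Path(os.path.relpath(target, start))


# illustrative absolute paths; nothing needs to exist on disk
base = Path.cwd()
data = base / 'part0_python_intro' / 'data'

inside = relative_path(data / 'pandas' / 'site_info.csv', data)
outside = relative_path(base / 'part1_flopy', data)

print(inside)
print(outside)
```

Neither call touches the filesystem, so the helper can be tried with any path names.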
[25]:
abs_cwd.parent
[25]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks')
[26]:
abs_cwd.parent.parent
[26]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')
[27]:
f.name
[27]:
'00_python_basics_review.ipynb'
[28]:
f.suffix
[28]:
'.ipynb'
[29]:
f.with_suffix('.junk')
[29]:
PosixPath('00_python_basics_review.junk')
[30]:
f.stem
[30]:
'00_python_basics_review'

Make a file

[31]:
fname = Path('new_file.txt')
with open(fname, 'w') as dest:
    dest.write("A new text file.")
[32]:
fname.exists()
[32]:
True
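
Path objects can also write and read small text files directly, without an explicit open() call. The sketch below assumes a throwaway temporary folder and a hypothetical file name (note.txt) so that it doesn't interfere with the files created in this notebook.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / 'note.txt'  # hypothetical file name

    # write_text() opens the file, writes the string and closes it again
    note.write_text("A new text file.")

    # read_text() returns the whole file as one string
    contents = note.read_text()
    existed = note.exists()

# the temporary folder (and the file in it) is removed on leaving the with block
print(contents)
```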

Move the file

[33]:
fname2 = Path('new_file2.txt')
fname.rename(fname2)
[33]:
PosixPath('new_file2.txt')
[34]:
fname.exists()
[34]:
False

Delete the file

[35]:
fname2.unlink()
[36]:
fname2.exists()
[36]:
False

Delete the empty folder we made above

Note: this only works for empty directories (use shutil.rmtree() very carefully for removing folders and all contents within)

[37]:
Path('more_files/subfolder/').rmdir()

shutil — High-level file operations

A module for copying, moving, and deleting files and directories.

https://docs.python.org/3/library/shutil.html

The functions from shutil that you may find useful are:

shutil.copy()
shutil.copy2()  # unlike copy(), this also tries to preserve file metadata (e.g. timestamps)
shutil.copytree()
shutil.move()
shutil.rmtree()  #obviously, you need to be careful with this one!

Give these guys a shot and see what they do. Remember, you can always get help by typing:

help(shutil.copy)
[38]:
#try them here.  Be careful!
[39]:
shutil.rmtree(new_folder)
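
One low-risk way to experiment with these functions is inside a temporary directory, so that nothing of value can be deleted by accident. The following is a sketch under that assumption; all file and folder names (src, a.txt, backup, moved.txt) are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src_dir = root / 'src'
    src_dir.mkdir()
    original = src_dir / 'a.txt'
    original.write_text('some text')

    # copy a single file (copy2 also tries to preserve timestamps)
    duplicate = Path(shutil.copy2(original, root / 'a_copy.txt'))

    # copy a whole folder tree to a destination that does not exist yet
    backup_dir = Path(shutil.copytree(src_dir, root / 'backup'))

    # move (rename) the copied file into the backup folder
    moved = Path(shutil.move(str(duplicate), str(backup_dir / 'moved.txt')))

    # remove the source folder and everything in it
    shutil.rmtree(src_dir)

    backup_files = sorted(p.name for p in backup_dir.iterdir())
    src_exists = src_dir.exists()

print(backup_files)
print(src_exists)
```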

sys — System-specific parameters and functions

Getting information about python and the os

where python is installed

[40]:
print(sys.prefix)
/home/runner/micromamba/envs/pyclass-docs
[41]:
print(sys.version_info)
sys.version_info(major=3, minor=11, micro=9, releaselevel='final', serial=0)
[42]:
sys.platform
[42]:
'linux'
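
These attributes are often used to guard version- or platform-specific code. A minimal sketch follows; the variable names and the version threshold are assumptions for illustration (the threshold is taken from the note later in this notebook that pathlib has been available since python 3.4).

```python
import sys

# version_info compares like a tuple, so no string parsing is needed
if sys.version_info < (3, 4):
    raise RuntimeError("this notebook relies on pathlib (python 3.4 or newer)")

# sys.platform starts with 'win' on Windows, and is 'linux' or 'darwin' elsewhere
is_windows = sys.platform.startswith('win')
print('Running on Windows?', is_windows)
```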

Adding command line arguments to a script

Here the command line arguments reflect that we’re running a Jupyter Notebook.

In a python script, the first item in the list (sys.argv[0]) is the script name; any command line arguments are listed after it.

[43]:
sys.argv
[43]:
['/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages/ipykernel_launcher.py',
 '-f',
 '/tmp/tmplw6oqjym.json',
 '--HistoryManager.hist_file=:memory:']

Exercise: Make a script with a command line argument using sys.argv

  1. Using a text editor such as VSCode, make a new *.py file with the following contents:

import sys

if len(sys.argv) > 1:
    for argument in sys.argv[1:]:
        print(argument)
else:
    print("usage is: python <script name>.py argument")
    sys.exit(1)  # quit() is meant for interactive sessions; sys.exit() is the usual choice in scripts
  2. Try running the script at the command line

modifying the python path

If you haven’t seen sys.path already mentioned in a python script, you will soon. sys.path is a list of directories. This path list is used by python to search for python modules and packages. If for some reason, you want to use a python package or module that is not installed in the main python folder, you can add the directory containing your module to sys.path.

Any packages installed by linking the source code in place (i.e. pip install -e .) will also show up here.

[44]:
for pth in sys.path:
    print(pth)
/home/runner/micromamba/envs/pyclass-docs/lib/python311.zip
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/lib-dynload

/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages

Using sys.path to import code from an arbitrary location

  1. Using a text editor such as VSCode (or pathlib and python) make a new *.py file in another folder (anything in the same folder as this notebook can already be imported). For example:

[45]:
subfolder = Path('another_subfolder/scripts')
subfolder.mkdir(exist_ok=True, parents=True)

with open(subfolder / 'mycode.py', 'w') as dest:
    dest.write("stuff = {'this is': 'a dictionary'}")

Now add this folder to the python path

[46]:
sys.path.append('another_subfolder/scripts')

Code can now be imported from the module by name

[47]:
from mycode import stuff

stuff
[47]:
{'this is': 'a dictionary'}

Note: Generally, importing code using sys.path is considered bad practice, because

  • it can hide dependencies.

    • from the information above, we don’t know whether mycode is a package that is installed, a module in the current folder, or anywhere else for that matter.

    • Similarly, we know that any modules from 'another_subfolder/scripts' can be imported, but we don’t know which modules in that folder are needed without some additional checking.

  • importing code using sys.path is also sensitive to the location of the script relative to the path. If the script is moved or used on someone else’s computer with a different file structure, it’ll break.

In general, installing reusable code in a package is the best way to go. Packages provide a framework for organizing, documenting, testing and sharing code in a way that is easily understood by others.

Whatever you do, avoid importing with an * (i.e. from mycode import *) at all costs. This imports everything from the namespace of a module, which can lead to unintended consequences.
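
If sys.path does have to be modified, some of the fragility described above can be reduced by adding an absolute path and removing it again once the import is done. The following is only a sketch under those assumptions: the module name mycode_demo and the temporary folder are hypothetical stand-ins, and this does not replace installing the code as a package.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / 'scripts'
    scripts.mkdir()
    # hypothetical module, written here only so the example is self-contained
    (scripts / 'mycode_demo.py').write_text("stuff = {'this is': 'a dictionary'}\n")

    # resolve() gives an absolute path, so the entry doesn't depend on the cwd
    entry = str(scripts.resolve())
    sys.path.insert(0, entry)
    try:
        importlib.invalidate_caches()  # make sure the new file is noticed
        mycode_demo = importlib.import_module('mycode_demo')
    finally:
        # remove the entry again so later imports are unaffected
        sys.path.remove(entry)

print(mycode_demo.stuff)
```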

os — Miscellaneous operating system interfaces

Historically, the os.path module was the de facto standard for file and path manipulation. Since python 3.4 however, pathlib is generally cleaner and easier to use for most of these operations. But there are some exceptions.

Changing the current working directory

pathlib doesn’t do this.
Note: this can obviously lead to trouble in scripts, so should usually be avoided, but sometimes it is necessary.
[48]:
# Example of changing the working directory
old_wd = os.getcwd()

# Go up one directory
os.chdir('..')
cwd = os.getcwd()
print('Now in: ', cwd)

# Change back to original
os.chdir(old_wd)
cwd = os.getcwd()
print('Switched back to: ', cwd)
Now in:  /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks
Switched back to:  /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro
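
Because a forgotten os.chdir() back to the original folder can break later code, the change-and-restore pattern can be wrapped in a context manager. This is a minimal sketch; the name working_directory is an assumption, not something defined by the os module. Python 3.11 and later also provide contextlib.chdir for the same purpose.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    old_wd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # runs even if the code inside the with block raises an error
        os.chdir(old_wd)


start = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        inside = os.getcwd()
after = os.getcwd()

print('Inside the block: ', inside)
print('Afterwards: ', after)
```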

os.walk

os.walk() is a great way to recursively generate all the file names and folders in a directory, which makes it useful for tasks such as identifying large directories. The following lists everything under the parent folder.

[49]:
pth = Path('..')
results = list(os.walk(pth))
results
[49]:
[('..', ['part0_python_intro', 'part1_flopy'], []),
 ('../part0_python_intro',
  ['data', 'solutions', 'another_subfolder'],
  ['07b_VSCode.md',
   '10_Rasterio.ipynb',
   '05_numpy.ipynb',
   '11_xarray_mt_rainier_precip.ipynb',
   '00_python_basics_review.ipynb',
   '06b_matplotlib_animation.ipynb',
   '09_a_Geopandas.ipynb',
   '09_b_Geopandas_ABQ.ipynb',
   '03_useful-std-library-modules.ipynb']),
 ('../part0_python_intro/data',
  ['fileio', 'geopandas', 'pandas', 'rasterio', 'numpy', 'xarray'],
  ['theis_charles_vernon.jpg', 'netcdf_data.zip', 'dream.txt']),
 ('../part0_python_intro/data/fileio',
  [],
  ['FileWithComments.txt', 'friends.txt']),
 ('../part0_python_intro/data/geopandas',
  ['abq'],
  ['Madison_Tree_Species_Lookup.xlsx',
   'Neighborhood_Associations.geojson',
   'Street_Trees.geojson',
   'Madison_Parks.geojson']),
 ('../part0_python_intro/data/geopandas/abq',
  [],
  ['abq_films.geojson', 'zoneatlaspagegrid.kmz']),
 ('../part0_python_intro/data/pandas',
  [],
  ['stock_russian.jpg',
   'site_info.csv',
   'panda.jpg',
   'santa_rosa_CIMIS_83.csv',
   'RussianRiverGWsites.csv',
   'RR_gage_data.csv']),
 ('../part0_python_intro/data/rasterio',
  [],
  ['20150818_rainier_summer-tile-30.tif',
   '20080901_rainierlidar_30m-adj.tif',
   '19700901_ned1_2003_adj_warp.tif',
   'rgi60_glacierpoly_rainier.shx',
   'rgi60_glacierpoly_rainier.dbf',
   'rgi60_glacierpoly_rainier.shp',
   'rgi60_glacierpoly_rainier.prj']),
 ('../part0_python_intro/data/numpy',
  [],
  ['mt_st_helens_before.dat',
   'bottom_commented.dat',
   'ahf.csv',
   'bottom.dat',
   'mt_st_helens_after.dat',
   'bottom.txt']),
 ('../part0_python_intro/data/xarray',
  [],
  ['daymet_prcp_rainier_1980-2018.nc',
   'aligned-19700901_ned1_2003_adj_4269.tif']),
 ('../part0_python_intro/solutions',
  [],
  ['09_Geopandas__solutions.ipynb',
   '04_files_and_strings.ipynb',
   '08_pandas.ipynb',
   '01_functions_script__solution.ipynb',
   '05_numpy__solutions.ipynb',
   '02_Namespace_objects_modules_packages__solution.ipynb',
   '06_matplotlib__solution.ipynb',
   '07a_Theis-exercise-solution.ipynb',
   '03_useful-std-library-modules-solutions.ipynb']),
 ('../part0_python_intro/another_subfolder', ['scripts'], []),
 ('../part0_python_intro/another_subfolder/scripts',
  ['__pycache__'],
  ['mycode.py']),
 ('../part0_python_intro/another_subfolder/scripts/__pycache__',
  [],
  ['mycode.cpython-311.pyc']),
 ('../part1_flopy',
  ['data', 'solutions', 'data_project'],
  ['01-Flopy-intro.ipynb',
   '05-unstructured-grids.ipynb',
   '08_Modflow-setup-demo.ipynb',
   '09-gwt-voronoi-demo.ipynb',
   '10_modpath_particle_tracking-demo.ipynb',
   'basin.py']),
 ('../part1_flopy/data',
  ['quadtree',
   'voronoi',
   'pleasant-lake',
   'modelgrid_intersection',
   'depletion_results'],
  ['flopylogo_sm.png', 'pleasant_lgr_inset.yml', 'pleasant_lgr_parent.yml']),
 ('../part1_flopy/data/quadtree',
  ['grid'],
  ['project.rcha',
   'project.disv',
   'project.sfr',
   'project.hds',
   'project.ims',
   'project.disv.grb',
   'project.ic',
   'project.oc',
   'sfr_obs.csv',
   'project.nam',
   'project.sfr.obs',
   'project.npf',
   'project.chd',
   'project.wel',
   'project.cbc',
   'mfsim.nam',
   'project.lst',
   'project.tdis',
   'mfsim.lst']),
 ('../part1_flopy/data/quadtree/grid',
  [],
  ['qtg.c2.dat',
   'qtg.vtu',
   'qtg_sv.vtu',
   'quadtreegrid.top1.dat',
   '_gridgen_build.dfn',
   'qtg.nodesperlay.dat',
   'qtgrid_pt.shx',
   'qtg.area.dat',
   'qtgrid.shx',
   'qtg.gnc.dat',
   'qtg.c1.dat',
   'quadtreegrid.bot1.dat',
   'qtg.fldr.dat',
   'qtgrid_pt.dbf',
   'qtgrid.dbf',
   'qtg.fahl.dat',
   'qtgrid.shp',
   'qtgrid_pt.shp',
   '_gridgen_export.dfn',
   'qtg.ia.dat',
   'qtg.nod',
   'qtg.ja.dat',
   'quadtreegrid.dfn',
   'quadtreegrid.tsf',
   'qtg.iac.dat']),
 ('../part1_flopy/data/voronoi',
  ['grid'],
  ['project.rcha',
   'project.disv',
   'project.sfr',
   'project.hds',
   'project.ims',
   'project.disv.grb',
   'project.ic',
   'project.oc',
   'sfr_obs.csv',
   'project.nam',
   'project.sfr.obs',
   'project.npf',
   'project.chd',
   'project.wel',
   'project.cbc',
   'mfsim.nam',
   'project.lst',
   'project.tdis',
   'mfsim.lst']),
 ('../part1_flopy/data/voronoi/grid',
  [],
  ['_triangle.1.neigh',
   '_triangle.1.poly',
   '_triangle.1.edge',
   '_triangle.0.node',
   '_triangle.1.node',
   '_triangle.1.ele',
   '_triangle.0.poly']),
 ('../part1_flopy/data/pleasant-lake',
  ['external', 'source_data'],
  ['pleasant.lak.obs',
   'pleasant.chd.obs',
   'pleasant.rcha',
   'pleasant.nam',
   'pleasant.sfr.obs',
   'pleasant.dis',
   'pleasant.npf',
   'mfsim.nam',
   'pleasant.sto',
   'pleasant.oc',
   'pleasant.tdis',
   'pleasant.sfr',
   'pleasant.hds',
   'pleasant.chd',
   'pleasant.ic',
   'pleasant.obs',
   'pleasant.wel',
   'pleasant.ims',
   'pleasant.lak']),
 ('../part1_flopy/data/pleasant-lake/external',
  [],
  ['chd_001.dat',
   'rch_007.dat',
   'rch_004.dat',
   'botm_002.dat',
   'botm_003.dat',
   'k33_000.dat',
   'rch_011.dat',
   'pleasant_packagedata.dat',
   'chd_004.dat',
   'rch_009.dat',
   'rch_003.dat',
   'wel_001.dat',
   'chd_002.dat',
   'rch_006.dat',
   '600059060_stage_area_volume.dat',
   'chd_005.dat',
   'strt_003.dat',
   'wel_004.dat',
   'chd_007.dat',
   'k_003.dat',
   'chd_011.dat',
   'rch_000.dat',
   'irch.dat',
   'chd_008.dat',
   'rch_001.dat',
   'k_002.dat',
   'strt_002.dat',
   'wel_010.dat',
   'rch_002.dat',
   'wel_008.dat',
   'ss_003.dat',
   'idomain_003.dat',
   'wel_000.dat',
   'wel_009.dat',
   'rch_010.dat',
   'sy_000.dat',
   'k33_001.dat',
   'idomain_001.dat',
   'ss_000.dat',
   'strt_000.dat',
   'k_000.dat',
   'rch_012.dat',
   'wel_005.dat',
   'k33_002.dat',
   'ss_001.dat',
   'botm_000.dat',
   'sy_002.dat',
   'top.dat',
   'chd_000.dat',
   'chd_010.dat',
   'botm_001.dat',
   'k_001.dat',
   'sy_001.dat',
   'wel_006.dat',
   'idomain_002.dat',
   'rch_005.dat',
   'chd_006.dat',
   'pleasant_top.dat.original',
   'k33_003.dat',
   'strt_001.dat',
   'idomain_000.dat',
   'chd_009.dat',
   'wel_007.dat',
   'ss_002.dat',
   'rch_008.dat',
   'chd_003.dat',
   'chd_012.dat',
   'sy_003.dat']),
 ('../part1_flopy/data/pleasant-lake/source_data',
  ['rasters', 'shps', 'tables'],
  ['PRISM_ppt_tmean_stable_4km_189501_201901_43.9850_-89.5522.csv']),
 ('../part1_flopy/data/pleasant-lake/source_data/rasters',
  [],
  ['botm3.tif',
   'dem40m.tif',
   'botm2.tif',
   'pleasant_bathymetry.tif',
   'botm1.tif',
   'botm0.tif']),
 ('../part1_flopy/data/pleasant-lake/source_data/shps',
  ['NHDSnapshot', 'NHDPlusAttributes'],
  ['all_lakes.shx',
   'all_lakes.cpg',
   'all_lakes.shp',
   'all_lakes.dbf',
   'all_lakes.prj']),
 ('../part1_flopy/data/pleasant-lake/source_data/shps/NHDSnapshot',
  ['Hydrography'],
  []),
 ('../part1_flopy/data/pleasant-lake/source_data/shps/NHDSnapshot/Hydrography',
  [],
  ['NHDFlowline.prj',
   'NHDFlowline.shp',
   'NHDFlowline.shx',
   'NHDFlowline.cpg',
   'NHDFlowline.dbf']),
 ('../part1_flopy/data/pleasant-lake/source_data/shps/NHDPlusAttributes',
  [],
  ['elevslope.cpg',
   'PlusFlow.cpg',
   'elevslope.dbf',
   'PlusFlowlineVAA.cpg',
   'PlusFlowlineVAA.dbf',
   'PlusFlow.dbf']),
 ('../part1_flopy/data/pleasant-lake/source_data/tables',
  [],
  ['nwis_heads_info_file.csv',
   'wgnhs_head_targets.csv',
   'uwsp_heads.csv',
   'wdnr_gw_sites.csv',
   'area_stage_vol_Pleasant.csv',
   'gages.csv',
   'lake_sites.csv']),
 ('../part1_flopy/data/modelgrid_intersection',
  [],
  ['prcp.tif',
   'sagehen_nhd.shx',
   'sagehen_gage_data.csv',
   'sagehen_nhd.dbf',
   'sagehen_main_nhd.cpg',
   'pet.tif',
   'active_area.prj',
   'refined_area.dbf',
   'sagehen_nhd.shp',
   'ksat.img',
   'sagehen_basin.shp',
   'sagehen_basin.prj',
   'sagehen_main_nhd.shp',
   'sagehen_basin.cpg',
   'sagehen_nhd.cpg',
   'sagehen_main_nhd.dbf',
   'refined_area.prj',
   'sagehen_nhd.prj',
   'active_area.shp',
   'sagehen_main_nhd.shx',
   'refined_area.shp',
   'active_area.dbf',
   'trib_cells.txt',
   'refined_area.shx',
   'dem_30m.img',
   'sagehen_main_nhd.prj',
   'active_area.shx',
   'sagehen_basin.dbf',
   'pet.txt',
   'sagehen_basin.shx']),
 ('../part1_flopy/data/depletion_results', [], ['depletion_results.csv']),
 ('../part1_flopy/solutions',
  [],
  ['04_Modelgrid_and_intersection_solution.ipynb',
   '02-Building-Post-Processing-MODFLOW6__solutions.ipynb',
   '06-Project-voronoi.ipynb',
   '03_Loading_and_visualizing_models-solutions.ipynb',
   '06-Project-quadtree.ipynb',
   '07-stream_capture_voronoi.ipynb',
   '06-Project-structured_completed.ipynb']),
 ('../part1_flopy/data_project',
  [],
  ['pumping_well_locations.dbf',
   'inactive_area.dbf',
   'Flowline_river.prj',
   'aquifer_bottom.asc',
   'inactive_area.shp',
   'Flowline_river.shx',
   'Flowline_river.shp',
   'aquifer_k.asc',
   'active_area.shp',
   'active_area.dbf',
   'pumping_well_locations.shx',
   'pumping_well_locations.shp',
   'active_area.shx',
   'Flowline_river.dbf',
   'inactive_area.shx',
   'aquifer_top.asc'])]
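
As one illustration of what the (root, dirs, files) tuples can be used for, the sketch below totals the size of the files directly inside each folder and picks the largest. The helper name folder_sizes and the small temporary folder tree are assumptions made so that the example is self-contained.

```python
import os
import tempfile
from pathlib import Path


def folder_sizes(top):
    """Return {folder: total bytes of the files directly inside it}."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        sizes[root] = sum(os.path.getsize(os.path.join(root, f)) for f in files)
    return sizes


with tempfile.TemporaryDirectory() as tmp:
    # build a tiny folder tree with one large and one small folder
    big = Path(tmp) / 'big'
    small = Path(tmp) / 'small'
    big.mkdir()
    small.mkdir()
    (big / 'a.dat').write_bytes(b'x' * 1000)
    (small / 'b.dat').write_bytes(b'x' * 10)

    sizes = folder_sizes(tmp)
    largest = max(sizes, key=sizes.get)

print(Path(largest).name, sizes[largest])
```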

Make a more readable list of just the jupyter notebooks

Note: the key advantage of os.walk over the single-level glob patterns used above is the recursion: individual subfolder levels don’t need to be known or specified a priori. (Path.rglob() and glob patterns containing ** also recurse.)

[50]:
for root, dirs, files in os.walk(pth):
    for f in files:
        filepath = Path(root, f)
        if filepath.suffix == '.ipynb':
            print(filepath)
../part0_python_intro/10_Rasterio.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part0_python_intro/solutions/09_Geopandas__solutions.ipynb
../part0_python_intro/solutions/04_files_and_strings.ipynb
../part0_python_intro/solutions/08_pandas.ipynb
../part0_python_intro/solutions/01_functions_script__solution.ipynb
../part0_python_intro/solutions/05_numpy__solutions.ipynb
../part0_python_intro/solutions/02_Namespace_objects_modules_packages__solution.ipynb
../part0_python_intro/solutions/06_matplotlib__solution.ipynb
../part0_python_intro/solutions/07a_Theis-exercise-solution.ipynb
../part0_python_intro/solutions/03_useful-std-library-modules-solutions.ipynb
../part1_flopy/01-Flopy-intro.ipynb
../part1_flopy/05-unstructured-grids.ipynb
../part1_flopy/08_Modflow-setup-demo.ipynb
../part1_flopy/09-gwt-voronoi-demo.ipynb
../part1_flopy/10_modpath_particle_tracking-demo.ipynb
../part1_flopy/solutions/04_Modelgrid_and_intersection_solution.ipynb
../part1_flopy/solutions/02-Building-Post-Processing-MODFLOW6__solutions.ipynb
../part1_flopy/solutions/06-Project-voronoi.ipynb
../part1_flopy/solutions/03_Loading_and_visualizing_models-solutions.ipynb
../part1_flopy/solutions/06-Project-quadtree.ipynb
../part1_flopy/solutions/07-stream_capture_voronoi.ipynb
../part1_flopy/solutions/06-Project-structured_completed.ipynb
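
For simple filename patterns, a similar recursive search can be sketched with Path.rglob(). The folder and file names below are made up, and a temporary directory is used so the result doesn't depend on the course folder layout.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    (top / 'part0' / 'solutions').mkdir(parents=True)
    (top / 'part0' / 'a.ipynb').touch()
    (top / 'part0' / 'solutions' / 'b.ipynb').touch()
    (top / 'part0' / 'notes.md').touch()

    # rglob() searches top and every folder below it
    notebooks = sorted(p.relative_to(top).as_posix() for p in top.rglob('*.ipynb'))

print(notebooks)
```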
[51]:
os.environ
[51]:
environ{'GITHUB_STATE': '/home/runner/work/_temp/_runner_file_commands/save_state_88f62326-4df8-488e-9a56-f6f46b84f903',
        'CONDA_PROMPT_MODIFIER': '(pyclass-docs) ',
        'STATS_TRP': 'true',
        'DOTNET_NOLOGO': '1',
        'DEPLOYMENT_BASEPATH': '/opt/runner',
        'USER': 'runner',
        'CI': 'true',
        'GITHUB_ENV': '/home/runner/work/_temp/_runner_file_commands/set_env_88f62326-4df8-488e-9a56-f6f46b84f903',
        'PIPX_HOME': '/opt/pipx',
        'RUNNER_ENVIRONMENT': 'github-hosted',
        'JAVA_HOME_8_X64': '/usr/lib/jvm/temurin-8-jdk-amd64',
        'SHLVL': '1',
        'CONDA_SHLVL': '1',
        'HOME': '/home/runner',
        'RUNNER_TEMP': '/home/runner/work/_temp',
        'GITHUB_EVENT_PATH': '/home/runner/work/_temp/_github_workflow/event.json',
        'GITHUB_REPOSITORY_OWNER': 'DOI-USGS',
        'JAVA_HOME_11_X64': '/usr/lib/jvm/temurin-11-jdk-amd64',
        'PIPX_BIN_DIR': '/opt/pipx_bin',
        'STATS_RDCL': 'true',
        'ANDROID_NDK_LATEST_HOME': '/usr/local/lib/android/sdk/ndk/27.0.12077973',
        'GRADLE_HOME': '/usr/share/gradle-8.9',
        'GITHUB_RETENTION_DAYS': '80',
        'JAVA_HOME_21_X64': '/usr/lib/jvm/temurin-21-jdk-amd64',
        'POWERSHELL_DISTRIBUTION_CHANNEL': 'GitHub-Actions-ubuntu22',
        'CPL_ZIP_ENCODING': 'UTF-8',
        'GITHUB_HEAD_REF': '',
        'GITHUB_REPOSITORY_OWNER_ID': '65027635',
        'AZURE_EXTENSION_DIR': '/opt/az/azcliextensions',
        'MAKEFLAGS': 'w',
        'SYSTEMD_EXEC_PID': '591',
        'GITHUB_GRAPHQL_URL': 'https://api.github.com/graphql',
        'NVM_DIR': '/home/runner/.nvm',
        'GOROOT_1_20_X64': '/opt/hostedtoolcache/go/1.20.14/x64',
        'DOTNET_SKIP_FIRST_TIME_EXPERIENCE': '1',
        'JAVA_HOME_17_X64': '/usr/lib/jvm/temurin-17-jdk-amd64',
        'GOROOT_1_21_X64': '/opt/hostedtoolcache/go/1.21.12/x64',
        'ImageVersion': '20240804.1.0',
        'RUNNER_OS': 'Linux',
        'GITHUB_API_URL': 'https://api.github.com',
        'SWIFT_PATH': '/usr/share/swift/usr/bin',
        'GOROOT_1_22_X64': '/opt/hostedtoolcache/go/1.22.5/x64',
        'RUNNER_USER': 'runner',
        'CHROMEWEBDRIVER': '/usr/local/share/chromedriver-linux64',
        '_': '/usr/bin/make',
        'JOURNAL_STREAM': '8:20513',
        'GITHUB_WORKFLOW': 'Publish Docs',
        'STATS_V3PS': 'true',
        'CONDARC': '/home/runner/work/_temp/setup-micromamba/.condarc',
        'STATS_D': 'false',
        'GITHUB_RUN_ID': '10290788848',
        'ACTIONS_RUNNER_ACTION_ARCHIVE_CACHE': '/opt/actionarchivecache',
        'STATS_VMFE': 'true',
        'GITHUB_WORKFLOW_SHA': 'a476e0dee9494b2799a44e2b34a694c1d4b6fc9f',
        'MODFLOW_BIN_PATH': '/home/runner/.local/bin',
        'BOOTSTRAP_HASKELL_NONINTERACTIVE': '1',
        'GITHUB_REF_TYPE': 'branch',
        'ImageOS': 'ubuntu22',
        'GITHUB_BASE_REF': '',
        'GITHUB_ACTION_REPOSITORY': '',
        'PERFLOG_LOCATION_SETTING': 'RUNNER_PERFLOG',
        'GITHUB_WORKFLOW_REF': 'DOI-USGS/python-for-hydrology/.github/workflows/build_docs.yaml@refs/heads/main',
        'PATH': '/home/runner/micromamba/envs/pyclass-docs/bin:/home/runner/micromamba/condabin:/home/runner/.local/bin:/home/runner/work/_temp/setup-micromamba:/home/runner/micromamba-bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/runner/.dotnet/tools',
        'RUNNER_TRACKING_ID': 'github_3426e3a0-20a6-4f55-9579-5f26dfb9ca8d',
        'DOTNET_MULTILEVEL_LOOKUP': '0',
        'INVOCATION_ID': 'd06df2c0cd08434eb993ade434351fa7',
        'PROJ_NETWORK': 'ON',
        'RUNNER_TOOL_CACHE': '/opt/hostedtoolcache',
        'ANT_HOME': '/usr/share/ant',
        'GITHUB_TRIGGERING_ACTOR': 'mnfienen',
        'GITHUB_RUN_NUMBER': '35',
        'RUNNER_ARCH': 'X64',
        'XDG_RUNTIME_DIR': '/run/user/1001',
        'AGENT_TOOLSDIRECTORY': '/opt/hostedtoolcache',
        'GITHUB_ACTION': '__run_7',
        'PROJ_DATA': '/home/runner/micromamba/envs/pyclass-docs/share/proj',
        'MAKELEVEL': '1',
        'MAMBA_ROOT_PREFIX': '/home/runner/micromamba',
        'LANG': 'C.UTF-8',
        'VCPKG_INSTALLATION_ROOT': '/usr/local/share/vcpkg',
        'RUNNER_NAME': 'GitHub Actions 381',
        'GITHUB_REF_NAME': 'main',
        'STATS_D_D': 'false',
        'XDG_CONFIG_HOME': '/home/runner/.config',
        'CONDA': '/usr/share/miniconda',
        'STATS_VMD': 'true',
        'GITHUB_REPOSITORY': 'DOI-USGS/python-for-hydrology',
        'XML_CATALOG_FILES': 'file:///home/runner/micromamba/envs/pyclass-docs/etc/xml/catalog file:///etc/xml/catalog',
        'STATS_UE': 'true',
        'GITHUB_ACTION_REF': '',
        'ANDROID_NDK_ROOT': '/usr/local/lib/android/sdk/ndk/27.0.12077973',
        'DEBIAN_FRONTEND': 'noninteractive',
        'GSETTINGS_SCHEMA_DIR': '/home/runner/micromamba/envs/pyclass-docs/share/glib-2.0/schemas',
        'GITHUB_REPOSITORY_ID': '537542085',
        'GITHUB_ACTIONS': 'true',
        'GDAL_DRIVER_PATH': '/home/runner/micromamba/envs/pyclass-docs/lib/gdalplugins',
        'GITHUB_REF_PROTECTED': 'false',
        'ACCEPT_EULA': 'Y',
        'RUNNER_PERFLOG': '/home/runner/perflog',
        'GITHUB_JOB': 'docs',
        'CONDA_DEFAULT_ENV': 'pyclass-docs',
        'GITHUB_WORKSPACE': '/home/runner/work/python-for-hydrology/python-for-hydrology',
        'GITHUB_SHA': 'a476e0dee9494b2799a44e2b34a694c1d4b6fc9f',
        'GITHUB_RUN_ATTEMPT': '1',
        'GITHUB_REF': 'refs/heads/main',
        'ANDROID_SDK_ROOT': '/usr/local/lib/android/sdk',
        'MAMBA_EXE': '/home/runner/micromamba-bin/micromamba',
        'GITHUB_ACTOR': 'mnfienen',
        'LEIN_HOME': '/usr/local/lib/lein',
        'JAVA_HOME': '/usr/lib/jvm/temurin-11-jdk-amd64',
        'PWD': '/home/runner/work/python-for-hydrology/python-for-hydrology/docs',
        'RUNNER_WORKSPACE': '/home/runner/work/python-for-hydrology',
        'GITHUB_ACTOR_ID': '1110827',
        'GITHUB_PATH': '/home/runner/work/_temp/_runner_file_commands/add_path_88f62326-4df8-488e-9a56-f6f46b84f903',
        'GHCUP_INSTALL_BASE_PREFIX': '/usr/local',
        'GITHUB_EVENT_NAME': 'push',
        'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop',
        'GITHUB_SERVER_URL': 'https://github.com',
        'STATS_TIS': 'mining',
        'ANDROID_HOME': '/usr/local/lib/android/sdk',
        'LEIN_JAR': '/usr/local/lib/lein/self-installs/leiningen-2.11.2-standalone.jar',
        'GECKOWEBDRIVER': '/usr/local/share/gecko_driver',
        'NVM_CD_FLAGS': '',
        'HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS': '3650',
        'GITHUB_OUTPUT': '/home/runner/work/_temp/_runner_file_commands/set_output_88f62326-4df8-488e-9a56-f6f46b84f903',
        'HOMEBREW_NO_AUTO_UPDATE': '1',
        'EDGEWEBDRIVER': '/usr/local/share/edge_driver',
        'STATS_EXT': 'true',
        'SGX_AESM_ADDR': '1',
        'CHROME_BIN': '/usr/bin/google-chrome',
        'MFLAGS': '-w',
        'CONDA_PREFIX': '/home/runner/micromamba/envs/pyclass-docs',
        'ANDROID_NDK': '/usr/local/lib/android/sdk/ndk/27.0.12077973',
        'GSETTINGS_SCHEMA_DIR_CONDA_BACKUP': '',
        'SELENIUM_JAR_PATH': '/usr/share/java/selenium-server.jar',
        'STATS_EXTP': 'https://provjobdsettingscdn.blob.core.windows.net/settings/provjobdsettings-0.5.181+6/provjobd.data',
        'ANDROID_NDK_HOME': '/usr/local/lib/android/sdk/ndk/27.0.12077973',
        'GDAL_DATA': '/home/runner/micromamba/envs/pyclass-docs/share/gdal',
        'GITHUB_STEP_SUMMARY': '/home/runner/work/_temp/_runner_file_commands/step_summary_88f62326-4df8-488e-9a56-f6f46b84f903',
        'NVM_RC_VERSION': '',
        'DOCUTILSCONFIG': '/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/docutils.conf',
        'JPY_PARENT_PID': '4227',
        'PYDEVD_USE_FRAME_EVAL': 'NO',
        'TERM': 'xterm-color',
        'CLICOLOR': '1',
        'FORCE_COLOR': '1',
        'CLICOLOR_FORCE': '1',
        'PAGER': 'cat',
        'GIT_PAGER': 'cat',
        'MPLBACKEND': 'module://matplotlib_inline.backend_inline'}

Example: get the location of the current python (Conda) environment

[52]:
os.environ['CONDA_PREFIX']
[52]:
'/home/runner/micromamba/envs/pyclass-docs'
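
Indexing os.environ with a name that is not defined raises a KeyError. As a minimal sketch, os.environ.get() can be used to return None or a default value instead. The variable name MY_MODEL_DIR and its values are hypothetical and used only for illustration.

```python
import os

# 'MY_MODEL_DIR' is a hypothetical variable name, used only for illustration;
# make sure it is not already set so the example behaves the same everywhere
os.environ.pop('MY_MODEL_DIR', None)

missing = os.environ.get('MY_MODEL_DIR')             # None, because it is not set
fallback = os.environ.get('MY_MODEL_DIR', 'models')  # default value instead of None

# Setting a variable affects only this Python process
# and any subprocesses that it starts afterwards
os.environ['MY_MODEL_DIR'] = 'models/run01'
current = os.environ['MY_MODEL_DIR']
print(missing, fallback, current)
```

Changes made this way do not persist after the Python session ends.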

subprocess — Subprocess management

The subprocess module offers a way to execute system commands, for example MODFLOW, or any operating system command that you can type at the command line.

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Take a look at the following help descriptions for run.

Note that on Windows, you may have to specify shell=True to run commands that are built into the shell (such as dir).

[53]:
help(subprocess.run)
Help on function run in module subprocess:

run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs)
    Run command with arguments and return a CompletedProcess instance.

    The returned instance will have attributes args, returncode, stdout and
    stderr. By default, stdout and stderr are not captured, and those attributes
    will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
    or pass capture_output=True to capture both.

    If check is True and the exit code was non-zero, it raises a
    CalledProcessError. The CalledProcessError object will have the return code
    in the returncode attribute, and output & stderr attributes if those streams
    were captured.

    If timeout is given, and the process takes too long, a TimeoutExpired
    exception will be raised.

    There is an optional argument "input", allowing you to
    pass bytes or a string to the subprocess's stdin.  If you use this argument
    you may not also use the Popen constructor's "stdin" argument, as
    it will be used internally.

    By default, all communication is in bytes, and therefore any "input" should
    be bytes, and the stdout and stderr will be bytes. If in text mode, any
    "input" should be a string, and stdout and stderr will be strings decoded
    according to locale encoding, or by "encoding" if set. Text mode is
    triggered by setting any of text, encoding, errors or universal_newlines.

    The other arguments are the same as for the Popen constructor.

[54]:
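# if on mac/unix
# note: with shell=True on POSIX, only the first list item ('ls') is run as the
# command, so '-l' is ignored here; drop shell=True to run `ls -l`
print(subprocess.run(['ls', '-l'], shell=True))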
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
05_numpy.ipynb
06b_matplotlib_animation.ipynb
07b_VSCode.md
09_a_Geopandas.ipynb
09_b_Geopandas_ABQ.ipynb
10_Rasterio.ipynb
11_xarray_mt_rainier_precip.ipynb
another_subfolder
data
solutions
CompletedProcess(args=['ls', '-l'], returncode=0)

With the cwd argument, we can control the working directory for the command. Here we list the files in the parent directory.

[55]:
print(subprocess.run(['ls', '-l'], shell=True, cwd='..'))
part0_python_intro
part1_flopy
CompletedProcess(args=['ls', '-l'], returncode=0)
[56]:
# if on windows
print(subprocess.run(['dir'], shell=True))
00_python_basics_review.ipynb        09_b_Geopandas_ABQ.ipynb
03_useful-std-library-modules.ipynb  10_Rasterio.ipynb
05_numpy.ipynb                       11_xarray_mt_rainier_precip.ipynb
06b_matplotlib_animation.ipynb       another_subfolder
07b_VSCode.md                        data
09_a_Geopandas.ipynb                 solutions
CompletedProcess(args=['dir'], returncode=0)
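
The help text above mentions the capture_output, text and check arguments. The following is a minimal sketch of how they might be combined. It runs the current Python interpreter (sys.executable) as the command so that it works on any platform; the commands themselves are placeholders chosen for illustration.

```python
import subprocess
import sys

# sys.executable is the path to the running Python interpreter,
# so this command is available on Windows, Mac and Linux alike
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a subprocess")'],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output to str instead of bytes
    check=True,           # raise CalledProcessError on a non-zero exit code
)
print(result.returncode)
print(result.stdout.strip())

# With check=True, a failing command raises an exception
# that carries the return code
try:
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(2)'], check=True)
except subprocess.CalledProcessError as err:
    failed_code = err.returncode
    print(f'command failed with return code {failed_code}')
```

Passing the command as a list without shell=True avoids differences in how each platform's shell interprets the arguments.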

zipfile — Work with ZIP archives

zip up one of the files in data/

[57]:
with zipfile.ZipFile('junk.zip', 'w') as dest:
    dest.write('data/xarray/daymet_prcp_rainier_1980-2018.nc')

now extract it

[58]:
with zipfile.ZipFile('junk.zip') as src:
    src.extract('data/xarray/daymet_prcp_rainier_1980-2018.nc', path='extracted_data')
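
By default, ZipFile.write() stores a file under the same relative path it was given, and without compression. As a sketch, the arcname and compression arguments can be used to change both. The file and folder names below are hypothetical stand-ins, created in a temporary directory so that nothing in the class folder is touched.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # hypothetical stand-in for a data file in a subfolder
    src_file = tmp / 'subfolder' / 'example.txt'
    src_file.parent.mkdir()
    src_file.write_text('some data')

    archive = tmp / 'example.zip'
    with zipfile.ZipFile(archive, 'w', compression=zipfile.ZIP_DEFLATED) as dest:
        # arcname sets the name stored in the archive (here, without the subfolder)
        dest.write(src_file, arcname=src_file.name)

    with zipfile.ZipFile(archive) as src:
        names = src.namelist()
        src.extractall(tmp / 'extracted')

    extracted_text = (tmp / 'extracted' / 'example.txt').read_text()

print(names)
print(extracted_text)
```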

Testing Your Skills with a truly awful example:

the problem:

Pretend that the file data/netcdf_data.zip contains some climate data (in the NetCDF format, with the *.nc extension) that we downloaded. If you open data/netcdf_data.zip, you’ll see that within a subfolder named zipped there are a bunch of additional subfolders, one for each year. Within each subfolder is another zip file. Within each of these zip files is yet another subfolder, inside of which is the actual data file we want (prcp.nc).

[59]:
with zipfile.ZipFile('data/netcdf_data.zip') as src:
    for f in src.namelist()[:10]:
        print(f)
netcdf_data/
netcdf_data/zipped/
netcdf_data/zipped/zipped_1991/
netcdf_data/zipped/zipped_1991/12270_1991.zip
netcdf_data/zipped/zipped_1996/
netcdf_data/zipped/zipped_1996/12270_1996.zip
netcdf_data/zipped/zipped_1998/
netcdf_data/zipped/zipped_1998/12270_1998.zip
netcdf_data/zipped/zipped_1999/
netcdf_data/zipped/zipped_1999/12270_1999.zip

the goal:

To extract all of these prcp.nc files into a single folder, after renaming them with their respective years (obtained from their enclosing folders or zip files). e.g.

prcp_1980.nc
prcp_1981.nc
...

This will allow us to open them together as a dataset in xarray (more on that later). Does this sound awful? I’m not making this up. This is the kind of structure you get when downloading tiles of climate data with the Daymet Tile Selection Tool.

hint:

you might find these functions helpful:

ZipFile.extractall
ZipFile.extract
Path.glob
Path.mkdir
Path.stem
Path.parent
Path.name
shutil.move
Path.rmdir
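
As a quick reminder of what the path attributes in this list return, here is a small sketch using one of the zip file paths from this exercise. PurePosixPath is used only so that the result is the same on any platform; no files are read or written.

```python
from pathlib import PurePosixPath

p = PurePosixPath('03-output/netcdf_data/zipped/zipped_2017/12270_2017.zip')

name = p.name                # file name with extension: '12270_2017.zip'
stem = p.stem                # file name without extension: '12270_2017'
suffix = p.suffix            # extension only: '.zip'
parent_name = p.parent.name  # name of the enclosing folder: 'zipped_2017'
print(name, stem, suffix, parent_name)
```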

hint: start by using ZipFile.extractall() to extract all of the individual zip files from the main zip archive

This extracts the entire contents of the zip file to a designated folder

[60]:
output_folder = Path('03-output')
output_folder.mkdir(exist_ok=True)

with zipfile.ZipFile('data/netcdf_data.zip') as src:
    src.extractall(output_folder)

Make a list of the zipfiles

[61]:
zipfiles = list(output_folder.glob('netcdf_data/zipped/*/*.zip'))
zipfiles[:5]
[61]:
[PosixPath('03-output/netcdf_data/zipped/zipped_2017/12270_2017.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2015/12270_2015.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2000/12270_2000.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2006/12270_2006.zip'),
 PosixPath('03-output/netcdf_data/zipped/zipped_2013/12270_2013.zip')]
[62]:
f = zipfiles[0]
f
[62]:
PosixPath('03-output/netcdf_data/zipped/zipped_2017/12270_2017.zip')

1a) Use ZipFile.namelist() (as above) to list the contents

This will yield the name of the *.nc file that we need to extract

[ ]:

1b) Use ZipFile.extract() to extract the *.nc file to the destination folder

(you may need to create the destination folder first)

[ ]:

1c) Move the extracted file out of any enclosing subfolders, and rename to prcp_<year>.nc

(so that if we repeat this for subsequent files, the extracted *.nc files will end up in the same place)

[ ]:

1d) Remove the extra subfolders that were extracted

[ ]:

[ ]:

Bonus Application – Using os to find the location of an executable

You will often run an executable that is nested somewhere deep within your system path. It is a good idea to know exactly where that executable is located. This might one day keep you from accidentally using an older version of an executable, such as MODFLOW.

[63]:
# Define two functions to help determine 'which' program you are using
def is_exe(fpath):
    """
    Return True if fpath is an executable, otherwise return False
    """
    return os.path.isfile(fpath) and os.access(fpath, os.X_OK)

def which(program):
    """
    Locate the program and return its full path.  Return
    None if the program cannot be located.
    """
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        # test for exe in current working directory
        if is_exe(program):
            return program
        # test for exe in path statement
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None
[64]:
which('mf6')
[64]:
'/home/runner/.local/bin/mf6'
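
The standard library also provides shutil.which(), which performs a similar search of the system path and returns None if nothing is found. The sketch below looks up the running Python interpreter, since that is guaranteed to exist. The second program name is made up and assumed not to be installed.

```python
import shutil
import sys
from pathlib import Path

# Look for the running Python interpreter by name, restricting the
# search to the folder that we already know contains it
exe_name = Path(sys.executable).name
exe_dir = str(Path(sys.executable).parent)
found = shutil.which(exe_name, path=exe_dir)
print(found)

# A made-up program name (assumed not to exist) returns None
not_found = shutil.which('surely-not-a-real-program-12345')
print(not_found)
```

Without the path argument, shutil.which() searches the folders listed in the PATH environment variable, much like the which() function defined above.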
[ ]: