03: Useful standard library modules¶
(pathlib, shutil, sys, os, subprocess, zipfile, etc.)
These modules are part of the Python standard library and provide very useful functionality for working with your operating system and files. This notebook explores these modules and demonstrates some of their functionality. Online documentation is at https://docs.python.org/3/library/.
Topics covered:¶
pathlib:
listing files
creating, moving and deleting files
absolute vs relative paths
useful path object attributes
shutil:
copying, moving and deleting files AND folders
sys:
python and platform information
command line arguments
modifying the python path to import code from other locations
os:
changing the working directory
recursive iteration through folder structures
accessing environmental variables
subprocess:
running system commands and checking the results
zipfile:
creating and extracting from zip archives
[1]:
import os
from pathlib import Path
import shutil
import subprocess
import sys
import zipfile
pathlib
— Object-oriented filesystem paths¶
Pathlib provides convenient “pathlike” objects for working with file paths across platforms: paths and operations done with pathlib work the same on Windows and on POSIX systems (Linux, macOS, etc.). The main entry point for users is the Path()
class.
Make a Path()
object for the current folder¶
[2]:
cwd = Path('.')
cwd
[2]:
PosixPath('.')
[3]:
for f in cwd.iterdir():
    print(f)
09_b_Geopandas_ABQ.ipynb
10_Rasterio.ipynb
data
06b_matplotlib_animation.ipynb
solutions
11_xarray_mt_rainier_precip.ipynb
09_a_Geopandas.ipynb
05_numpy.ipynb
07b_VSCode.md
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
List just the notebooks using the .glob()
method¶
[4]:
for nb in cwd.glob('*.ipynb'):
    print(nb)
09_b_Geopandas_ABQ.ipynb
10_Rasterio.ipynb
06b_matplotlib_animation.ipynb
11_xarray_mt_rainier_precip.ipynb
09_a_Geopandas.ipynb
05_numpy.ipynb
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
Note: .glob()
works across folders too¶
List all notebooks for both class components
[5]:
for nb in cwd.glob('../*/*.ipynb'):
    print(nb)
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/10_Rasterio.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part1_flopy/08_Modflow-setup-demo.ipynb
../part1_flopy/09-gwt-voronoi-demo.ipynb
../part1_flopy/01-Flopy-intro.ipynb
../part1_flopy/10_modpath_particle_tracking-demo.ipynb
../part1_flopy/05-unstructured-grids.ipynb
But glob
results aren’t sorted alphabetically!¶
(and the order is platform-dependent)
We can easily sort them by passing the results to sorted(), which accepts any iterable and returns a list (so the inner list() call below is optional)
[6]:
sorted(list(cwd.glob('../*/*.ipynb')))
[6]:
[PosixPath('../part0_python_intro/00_python_basics_review.ipynb'),
PosixPath('../part0_python_intro/03_useful-std-library-modules.ipynb'),
PosixPath('../part0_python_intro/05_numpy.ipynb'),
PosixPath('../part0_python_intro/06b_matplotlib_animation.ipynb'),
PosixPath('../part0_python_intro/09_a_Geopandas.ipynb'),
PosixPath('../part0_python_intro/09_b_Geopandas_ABQ.ipynb'),
PosixPath('../part0_python_intro/10_Rasterio.ipynb'),
PosixPath('../part0_python_intro/11_xarray_mt_rainier_precip.ipynb'),
PosixPath('../part1_flopy/01-Flopy-intro.ipynb'),
PosixPath('../part1_flopy/05-unstructured-grids.ipynb'),
PosixPath('../part1_flopy/08_Modflow-setup-demo.ipynb'),
PosixPath('../part1_flopy/09-gwt-voronoi-demo.ipynb'),
PosixPath('../part1_flopy/10_modpath_particle_tracking-demo.ipynb')]
Note: There is also a glob module in the standard python library that works directly with string paths
[7]:
import glob
sorted(list(glob.glob('../*/*.ipynb')))
[7]:
['../part0_python_intro/00_python_basics_review.ipynb',
'../part0_python_intro/03_useful-std-library-modules.ipynb',
'../part0_python_intro/05_numpy.ipynb',
'../part0_python_intro/06b_matplotlib_animation.ipynb',
'../part0_python_intro/09_a_Geopandas.ipynb',
'../part0_python_intro/09_b_Geopandas_ABQ.ipynb',
'../part0_python_intro/10_Rasterio.ipynb',
'../part0_python_intro/11_xarray_mt_rainier_precip.ipynb',
'../part1_flopy/01-Flopy-intro.ipynb',
'../part1_flopy/05-unstructured-grids.ipynb',
'../part1_flopy/08_Modflow-setup-demo.ipynb',
'../part1_flopy/09-gwt-voronoi-demo.ipynb',
'../part1_flopy/10_modpath_particle_tracking-demo.ipynb']
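The glob patterns above can be tried out on a throwaway folder, so the sketch below does not depend on the course files. The folder layout and file names are made up for illustration. It shows that the generator returned by .glob() can be handed straight to sorted(), and that .rglob() searches all subfolder levels.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # hypothetical layout: two notebooks at the top level, one in a subfolder
    (root / 'sub').mkdir()
    for name in ['b.ipynb', 'a.ipynb', 'notes.md', 'sub/c.ipynb']:
        (root / name).touch()

    # non-recursive: only the top-level notebooks, sorted by name
    top_level = sorted(p.name for p in root.glob('*.ipynb'))
    # recursive: rglob() also descends into subfolders
    all_notebooks = sorted(
        p.relative_to(root).as_posix() for p in root.rglob('*.ipynb')
    )

print(top_level)
print(all_notebooks)
```

The paths are converted with as_posix() only so that the printed result looks the same on Windows and POSIX systems.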
List just the subfolders¶
[8]:
[f for f in cwd.iterdir() if f.is_dir()]
[8]:
[PosixPath('data'), PosixPath('solutions')]
Create a new path for the data subfolder¶
[9]:
data_path = cwd / 'data'
data_path
[9]:
PosixPath('data')
or an individual file¶
[10]:
f = cwd / '00_python_basics_review.ipynb'
f
[10]:
PosixPath('00_python_basics_review.ipynb')
check if it exists, or if it’s a directory¶
[11]:
f.exists(), f.is_dir()
[11]:
(True, False)
make a new subdirectory¶
[12]:
new_folder = cwd / 'more_files'
new_folder
[12]:
PosixPath('more_files')
[13]:
new_folder.exists()
[13]:
False
[14]:
new_folder.mkdir(); new_folder.exists()
[14]:
True
Note that if you try to run the above cell twice, you’ll get an error that the folder already exists. Passing exist_ok=True
suppresses this error.
[15]:
new_folder.mkdir(exist_ok=True)
make a new subfolder within a new subfolder¶
The parents=True
argument also creates any missing parent folders, which allows for making subfolders within new subfolders
[16]:
(new_folder / 'subfolder').mkdir(exist_ok=True, parents=True)
Get the absolute location of the current working directory
[17]:
abs_cwd = Path.cwd()
abs_cwd
[17]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro')
Go up two levels to the course repository
[18]:
class_root = (abs_cwd / '../../')
class_root
[18]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro/../..')
Simplify or resolve the path
[19]:
class_root = class_root.resolve()
class_root
[19]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')
Get the cwd relative to the course repository
[20]:
abs_cwd.relative_to(class_root)
[20]:
PosixPath('notebooks/part0_python_intro')
check if this is an absolute or relative path
[21]:
abs_cwd.relative_to(class_root).is_absolute()
[21]:
False
[22]:
abs_cwd.is_absolute()
[22]:
True
Gotcha: Path.relative_to()
only works when the first path is located under the second path (and both are absolute, or both are relative); otherwise it raises a ValueError
For example, try executing this line:
Path('../part1_flopy/').relative_to('data')
If you need a relative path that will work robustly in a script, os.path.relpath
might be a better choice
[23]:
os.path.relpath('../part1_flopy/', 'data')
[23]:
'../../part1_flopy'
[24]:
os.path.relpath('data', '../part1_flopy/')
[24]:
'../part0_python_intro/data'
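One way to combine the two approaches is sketched below: try Path.relative_to() first and fall back to os.path.relpath when the target is not under the start folder. The helper name relative_path is made up for this illustration, and the folder names are only examples (neither function requires the paths to exist).

```python
import os
from pathlib import Path

def relative_path(target, start):
    """Return target relative to start as a Path (hypothetical helper)."""
    try:
        # works only when target is located under start
        return Path(target).relative_to(start)
    except ValueError:
        # relpath can also step upward with '..'
        return Path(os.path.relpath(target, start))

inside = relative_path('data/numpy/bottom.txt', 'data')
outside = relative_path('part1_flopy/data', 'part0_python_intro/data')
print(inside)
print(outside)
```

Note that os.path.relpath works on the path text only; it does not check for symbolic links along the way.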
[25]:
abs_cwd.parent
[25]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks')
[26]:
abs_cwd.parent.parent
[26]:
PosixPath('/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source')
[27]:
f.name
[27]:
'00_python_basics_review.ipynb'
[28]:
f.suffix
[28]:
'.ipynb'
[29]:
f.with_suffix('.junk')
[29]:
PosixPath('00_python_basics_review.junk')
[30]:
f.stem
[30]:
'00_python_basics_review'
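The attributes shown in the cells above are collected in the short sketch below. The example path is invented, and PurePosixPath is used so that the results are the same on any operating system and no real file is needed.

```python
from pathlib import PurePosixPath

# a made-up path; pure paths never touch the filesystem
p = PurePosixPath('/home/user/project/data/results.tar.gz')

print(p.name)      # final path component
print(p.stem)      # name without the last suffix
print(p.suffix)    # last suffix only
print(p.suffixes)  # all suffixes, as a list
print(p.parent)    # containing folder
print(p.parts)     # tuple of the individual components

# build a sibling path with a different file name
renamed = p.with_name('summary.csv')
print(renamed)
```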
Make a file
[31]:
fname = Path('new_file.txt')
with open(fname, 'w') as dest:
    dest.write("A new text file.")
[32]:
fname.exists()
[32]:
True
Move the file
[33]:
fname2 = Path('new_file2.txt')
fname.rename(fname2)
[33]:
PosixPath('new_file2.txt')
[34]:
fname.exists()
[34]:
False
Delete the file
[35]:
fname2.unlink()
[36]:
fname2.exists()
[36]:
False
Delete the empty folder we made above¶
Note: this only works for empty directories (use shutil.rmtree()
very carefully for removing folders and all contents within)
[37]:
Path('more_files/subfolder/').rmdir()
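The make, move and delete steps above can also be written with Path methods alone. The sketch below runs in a temporary folder so nothing in the course folder is touched; it assumes Python 3.8 or later (for rename() returning the new path and for the missing_ok argument).

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'new_file.txt'
    # create (or overwrite) the file in one call
    src.write_text('A new text file.')

    # move/rename; the new path is returned
    dst = src.rename(src.with_name('new_file2.txt'))
    contents = dst.read_text()
    moved = (not src.exists()) and dst.exists()

    # delete; missing_ok=True means no error if the file is already gone
    dst.unlink(missing_ok=True)
    deleted = not dst.exists()

print(contents, moved, deleted)
```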
shutil
— High-level file operations¶
A module for copying, moving, and deleting files and directories.
https://docs.python.org/3/library/shutil.html
The functions from shutil that you may find useful are:
shutil.copy()
shutil.copy2() # like copy(), but also tries to preserve file metadata (e.g. modification times)
shutil.copytree()
shutil.move()
shutil.rmtree() #obviously, you need to be careful with this one!
Give these guys a shot and see what they do. Remember, you can always get help by typing:
help(shutil.copy)
[38]:
#try them here. Be careful!
[39]:
shutil.rmtree(new_folder)
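If you would like a safe place to experiment, the sketch below runs the shutil functions listed above inside a temporary folder that is deleted at the end. The folder and file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src_dir = root / 'original'
    src_dir.mkdir()
    (src_dir / 'notes.txt').write_text('some notes')

    # copy a single file, keeping timestamps where possible
    shutil.copy2(src_dir / 'notes.txt', root / 'notes_copy.txt')
    # copy a whole folder tree (the destination must not exist yet)
    shutil.copytree(src_dir, root / 'backup')
    # move (rename) the folder
    shutil.move(str(src_dir), str(root / 'moved'))
    # delete a folder and everything in it; there is no undo!
    shutil.rmtree(root / 'backup')

    remaining = sorted(p.relative_to(root).as_posix() for p in root.rglob('*'))

print(remaining)
```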
sys
— System-specific parameters and functions¶
Getting information about python and the os¶
where python is installed
[40]:
print(sys.prefix)
/home/runner/micromamba/envs/pyclass-docs
[41]:
print(sys.version_info)
sys.version_info(major=3, minor=11, micro=10, releaselevel='final', serial=0)
[42]:
sys.platform
[42]:
'linux'
Adding command line arguments to a script¶
Here the command line arguments reflect that we’re running a Jupyter Notebook.
In a Python script, the first item in the list is the script name, and any command line arguments follow it.
[43]:
sys.argv
[43]:
['/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages/ipykernel_launcher.py',
'-f',
'/tmp/tmpc1aujvul.json',
'--HistoryManager.hist_file=:memory:']
Exercise: Make a script with a command line argument using sys.argv¶
Using a text editor such as VSCode, make a new
*.py
file with the following contents:
import sys
if len(sys.argv) > 1:
    # sys.argv[0] is the script name; the arguments start at index 1
    for argument in sys.argv[1:]:
        print(argument)
else:
    print("usage is: python <script name>.py argument")
    sys.exit()
Try running the script at the command line
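The same logic can be wrapped in a function that takes the argument list as an input, which makes it possible to try it out without leaving the notebook. This is only a sketch; the function name main and the script and argument names are made up.

```python
def main(argv):
    """Print each argument after the script name; return an exit code."""
    args = argv[1:]  # argv[0] is the script name
    if not args:
        print(f"usage is: python {argv[0]} argument")
        return 1
    for argument in args:
        print(argument)
    return 0

# simulate the command `python myscript.py well_1 well_2`
exit_code = main(['myscript.py', 'well_1', 'well_2'])
# simulate running the script with no arguments
exit_code_no_args = main(['myscript.py'])
```

In a real script, the last line would typically be sys.exit(main(sys.argv)), so that the exit code is passed back to the shell.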
modifying the python path¶
If you haven’t seen sys.path
already mentioned in a python script, you will soon. sys.path
is a list of directories. This path list is used by python to search for python modules and packages. If for some reason, you want to use a python package or module that is not installed in the main python folder, you can add the directory containing your module to sys.path.
Any packages installed by linking the source code in place (i.e. pip install -e .
) will also show up here.
[44]:
for pth in sys.path:
    print(pth)
/home/runner/micromamba/envs/pyclass-docs/lib/python311.zip
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/lib-dynload
/home/runner/micromamba/envs/pyclass-docs/lib/python3.11/site-packages
Using sys.path
to import code from an arbitrary location¶
Using a text editor such as VSCode (or
pathlib
and python) make a new
*.py
file in another folder (anything in the same folder as this notebook can already be imported). For example:
[45]:
subfolder = Path('another_subfolder/scripts')
subfolder.mkdir(exist_ok=True, parents=True)
with open(subfolder / 'mycode.py', 'w') as dest:
    dest.write("stuff = {'this is': 'a dictionary'}")
Now add this folder to the python path
[46]:
sys.path.append('another_subfolder/scripts')
Code can be imported by calling the containing module
[47]:
from mycode import stuff
stuff
[47]:
{'this is': 'a dictionary'}
Note: Generally, importing code using sys.path is considered bad practice, for two reasons.
First, it can hide dependencies. From the information above, we don’t know whether mycode is a package that is installed, a module in the current folder, or anywhere else for that matter. Similarly, we know that any modules from 'another_subfolder/scripts' can be imported, but we don’t know which modules in that folder are needed without some additional checking.
Second, importing code using sys.path is also sensitive to the location of the script relative to the path. If the script is moved or used on someone else’s computer with a different file structure, it’ll break.
In general, installing reusable code in a package is the best way to go. Packages provide a framework for organizing, documenting, testing and sharing code in a way that is easily understood by others.
Whatever you do, avoid importing with an *
(i.e. from mycode import *
) at all costs. This imports everything from the namespace of a module, which can lead to unintended consequences.
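If you do need to import from an arbitrary folder, the sketch below shows one way to limit the side effects: use an absolute path and remove it from sys.path again after the import. The module name mycode_demo is made up, and the module is written to a temporary folder only so that the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    scripts = Path(tmp) / 'scripts'
    scripts.mkdir()
    # hypothetical module, written here just for the demonstration
    (scripts / 'mycode_demo.py').write_text("stuff = {'this is': 'a dictionary'}\n")

    # an absolute path does not depend on the current working directory
    entry = str(scripts.resolve())
    sys.path.insert(0, entry)
    try:
        mycode_demo = importlib.import_module('mycode_demo')
    finally:
        # remove the entry again so that later imports are unaffected
        sys.path.remove(entry)

print(mycode_demo.stuff)
```

This does not fix the hidden-dependency problem described above; it only keeps the change to sys.path as short-lived as possible.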
os
— Miscellaneous operating system interfaces¶
Historically, the os.path
module was the de facto standard for file and path manipulation. Since python 3.4 however, pathlib
is generally cleaner and easier to use for most of these operations. But there are some exceptions.
Changing the current working directory¶
pathlib
doesn’t do this.
[48]:
# Example of changing the working directory
old_wd = os.getcwd()
# Go up one directory
os.chdir('..')
cwd = os.getcwd()
print('Now in: ', cwd)
# Change back to original
os.chdir(old_wd)
cwd = os.getcwd()
print('Switched back to: ', cwd)
Now in: /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks
Switched back to: /home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/notebooks/part0_python_intro
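A common pitfall with os.chdir() is that an error partway through can leave you in the wrong folder. A small context manager, sketched below, always switches back. The name working_directory is made up for this example; Python 3.11 and later also provide contextlib.chdir for the same purpose.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily change the working directory, then switch back."""
    old_wd = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # runs even if the code inside the with block raises an error
        os.chdir(old_wd)

start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp) as inside:
        print('Now in:', inside)
        changed = inside.resolve() == Path(tmp).resolve()
print('Switched back to:', Path.cwd())
```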
os.walk¶
os.walk() is a great way to recursively generate all the file names and folders in a directory. The following shows the raw results for the parent folder: one (root, dirs, files) tuple for each folder visited.
[49]:
pth = Path('..')
results = list(os.walk(pth))
results
[49]:
[('..', ['part0_python_intro', 'part1_flopy'], []),
('../part0_python_intro',
['data', 'solutions', 'another_subfolder'],
['09_b_Geopandas_ABQ.ipynb',
'10_Rasterio.ipynb',
'06b_matplotlib_animation.ipynb',
'11_xarray_mt_rainier_precip.ipynb',
'09_a_Geopandas.ipynb',
'05_numpy.ipynb',
'07b_VSCode.md',
'00_python_basics_review.ipynb',
'03_useful-std-library-modules.ipynb']),
('../part0_python_intro/data',
['fileio', 'geopandas', 'rasterio', 'pandas', 'numpy', 'xarray'],
['theis_charles_vernon.jpg', 'netcdf_data.zip', 'dream.txt']),
('../part0_python_intro/data/fileio',
[],
['FileWithComments.txt', 'friends.txt']),
('../part0_python_intro/data/geopandas',
['abq'],
['Street_Trees.geojson',
'Madison_Tree_Species_Lookup.xlsx',
'Madison_Parks.geojson',
'Neighborhood_Associations.geojson']),
('../part0_python_intro/data/geopandas/abq',
[],
['zoneatlaspagegrid.kmz', 'abq_films.geojson']),
('../part0_python_intro/data/rasterio',
[],
['20150818_rainier_summer-tile-30.tif',
'rgi60_glacierpoly_rainier.shp',
'rgi60_glacierpoly_rainier.shx',
'19700901_ned1_2003_adj_warp.tif',
'rgi60_glacierpoly_rainier.prj',
'rgi60_glacierpoly_rainier.dbf',
'20080901_rainierlidar_30m-adj.tif']),
('../part0_python_intro/data/pandas',
[],
['RussianRiverGWsites.csv',
'stock_russian.jpg',
'RR_gage_data.csv',
'santa_rosa_CIMIS_83.csv',
'site_info.csv',
'panda.jpg']),
('../part0_python_intro/data/numpy',
[],
['mt_st_helens_before.dat',
'bottom.txt',
'bottom_commented.dat',
'bottom.dat',
'mt_st_helens_after.dat',
'ahf.csv']),
('../part0_python_intro/data/xarray',
[],
['daymet_prcp_rainier_1980-2018.nc',
'aligned-19700901_ned1_2003_adj_4269.tif']),
('../part0_python_intro/solutions',
[],
['01_functions_script__solution.ipynb',
'08_pandas.ipynb',
'04_files_and_strings.ipynb',
'06_matplotlib__solution.ipynb',
'03_useful-std-library-modules-solutions.ipynb',
'07a_Theis-exercise-solution.ipynb',
'05_numpy__solutions.ipynb',
'02_Namespace_objects_modules_packages__solution.ipynb',
'09_Geopandas__solutions.ipynb']),
('../part0_python_intro/another_subfolder', ['scripts'], []),
('../part0_python_intro/another_subfolder/scripts',
['__pycache__'],
['mycode.py']),
('../part0_python_intro/another_subfolder/scripts/__pycache__',
[],
['mycode.cpython-311.pyc']),
('../part1_flopy',
['data', 'solutions', 'data_project'],
['08_Modflow-setup-demo.ipynb',
'09-gwt-voronoi-demo.ipynb',
'01-Flopy-intro.ipynb',
'10_modpath_particle_tracking-demo.ipynb',
'05-unstructured-grids.ipynb',
'basin.py']),
('../part1_flopy/data',
['quadtree',
'voronoi',
'modelgrid_intersection',
'depletion_results',
'pleasant-lake'],
['pleasant_lgr_parent.yml', 'flopylogo_sm.png', 'pleasant_lgr_inset.yml']),
('../part1_flopy/data/quadtree',
['grid'],
['mfsim.nam',
'project.disv',
'project.wel',
'project.npf',
'project.ims',
'project.lst',
'project.sfr.obs',
'project.oc',
'sfr_obs.csv',
'mfsim.lst',
'project.ic',
'project.rcha',
'project.cbc',
'project.chd',
'project.disv.grb',
'project.hds',
'project.nam',
'project.tdis',
'project.sfr']),
('../part1_flopy/data/quadtree/grid',
[],
['qtgrid_pt.dbf',
'qtg.iac.dat',
'qtg.fldr.dat',
'qtg.fahl.dat',
'_gridgen_build.dfn',
'qtg.area.dat',
'qtgrid.dbf',
'_gridgen_export.dfn',
'qtg.vtu',
'quadtreegrid.dfn',
'qtgrid_pt.shp',
'qtg.ja.dat',
'qtgrid.shx',
'qtg.gnc.dat',
'qtg.c2.dat',
'qtg.nod',
'qtg.ia.dat',
'qtg_sv.vtu',
'quadtreegrid.tsf',
'qtgrid_pt.shx',
'quadtreegrid.top1.dat',
'quadtreegrid.bot1.dat',
'qtg.nodesperlay.dat',
'qtgrid.shp',
'qtg.c1.dat']),
('../part1_flopy/data/voronoi',
['grid'],
['mfsim.nam',
'project.disv',
'project.wel',
'project.npf',
'project.ims',
'project.lst',
'project.sfr.obs',
'project.oc',
'sfr_obs.csv',
'mfsim.lst',
'project.ic',
'project.rcha',
'project.cbc',
'project.chd',
'project.disv.grb',
'project.hds',
'project.nam',
'project.tdis',
'project.sfr']),
('../part1_flopy/data/voronoi/grid',
[],
['_triangle.0.poly',
'_triangle.1.poly',
'_triangle.1.edge',
'_triangle.0.node',
'_triangle.1.ele',
'_triangle.1.neigh',
'_triangle.1.node']),
('../part1_flopy/data/modelgrid_intersection',
[],
['sagehen_nhd.shx',
'sagehen_gage_data.csv',
'sagehen_nhd.cpg',
'active_area.shp',
'sagehen_main_nhd.dbf',
'sagehen_main_nhd.shp',
'refined_area.shp',
'pet.tif',
'ksat.img',
'sagehen_nhd.dbf',
'sagehen_main_nhd.prj',
'sagehen_basin.dbf',
'prcp.tif',
'sagehen_nhd.prj',
'trib_cells.txt',
'sagehen_main_nhd.cpg',
'dem_30m.img',
'sagehen_nhd.shp',
'sagehen_basin.prj',
'sagehen_basin.shp',
'sagehen_basin.cpg',
'active_area.shx',
'refined_area.prj',
'sagehen_main_nhd.shx',
'refined_area.shx',
'active_area.prj',
'refined_area.dbf',
'pet.txt',
'sagehen_basin.shx',
'active_area.dbf']),
('../part1_flopy/data/depletion_results', [], ['depletion_results.csv']),
('../part1_flopy/data/pleasant-lake',
['external', 'source_data'],
['pleasant.dis',
'pleasant.obs',
'mfsim.nam',
'pleasant.hds',
'pleasant.chd.obs',
'pleasant.wel',
'pleasant.lak.obs',
'pleasant.ims',
'pleasant.chd',
'pleasant.lak',
'pleasant.oc',
'pleasant.tdis',
'pleasant.sfr.obs',
'pleasant.nam',
'pleasant.ic',
'pleasant.sfr',
'pleasant.rcha',
'pleasant.npf',
'pleasant.sto']),
('../part1_flopy/data/pleasant-lake/external',
[],
['chd_001.dat',
'strt_002.dat',
'chd_008.dat',
'k_002.dat',
'chd_002.dat',
'rch_008.dat',
'rch_001.dat',
'rch_007.dat',
'botm_000.dat',
'sy_003.dat',
'wel_010.dat',
'strt_003.dat',
'botm_001.dat',
'wel_004.dat',
'chd_010.dat',
'k_003.dat',
'botm_002.dat',
'wel_007.dat',
'wel_006.dat',
'wel_005.dat',
'chd_011.dat',
'top.dat',
'chd_006.dat',
'strt_001.dat',
'k_001.dat',
'chd_007.dat',
'chd_005.dat',
'rch_000.dat',
'pleasant_top.dat.original',
'rch_002.dat',
'wel_000.dat',
'chd_009.dat',
'wel_009.dat',
'rch_006.dat',
'chd_003.dat',
'sy_001.dat',
'sy_002.dat',
'chd_000.dat',
'k_000.dat',
'strt_000.dat',
'ss_002.dat',
'k33_000.dat',
'idomain_002.dat',
'pleasant_packagedata.dat',
'ss_001.dat',
'wel_001.dat',
'chd_012.dat',
'rch_005.dat',
'irch.dat',
'wel_008.dat',
'chd_004.dat',
'ss_000.dat',
'idomain_001.dat',
'rch_012.dat',
'rch_011.dat',
'sy_000.dat',
'ss_003.dat',
'k33_001.dat',
'idomain_000.dat',
'rch_009.dat',
'k33_002.dat',
'idomain_003.dat',
'rch_010.dat',
'botm_003.dat',
'k33_003.dat',
'rch_003.dat',
'rch_004.dat',
'600059060_stage_area_volume.dat']),
('../part1_flopy/data/pleasant-lake/source_data',
['tables', 'rasters', 'shps'],
['PRISM_ppt_tmean_stable_4km_189501_201901_43.9850_-89.5522.csv']),
('../part1_flopy/data/pleasant-lake/source_data/tables',
[],
['nwis_heads_info_file.csv',
'wgnhs_head_targets.csv',
'area_stage_vol_Pleasant.csv',
'uwsp_heads.csv',
'wdnr_gw_sites.csv',
'lake_sites.csv',
'gages.csv']),
('../part1_flopy/data/pleasant-lake/source_data/rasters',
[],
['botm2.tif',
'botm1.tif',
'dem40m.tif',
'botm0.tif',
'botm3.tif',
'pleasant_bathymetry.tif']),
('../part1_flopy/data/pleasant-lake/source_data/shps',
['NHDSnapshot', 'NHDPlusAttributes'],
['all_lakes.shp',
'all_lakes.shx',
'all_lakes.cpg',
'all_lakes.prj',
'all_lakes.dbf']),
('../part1_flopy/data/pleasant-lake/source_data/shps/NHDSnapshot',
['Hydrography'],
[]),
('../part1_flopy/data/pleasant-lake/source_data/shps/NHDSnapshot/Hydrography',
[],
['NHDFlowline.cpg',
'NHDFlowline.shp',
'NHDFlowline.dbf',
'NHDFlowline.shx',
'NHDFlowline.prj']),
('../part1_flopy/data/pleasant-lake/source_data/shps/NHDPlusAttributes',
[],
['PlusFlow.cpg',
'PlusFlowlineVAA.cpg',
'PlusFlowlineVAA.dbf',
'elevslope.dbf',
'elevslope.cpg',
'PlusFlow.dbf']),
('../part1_flopy/solutions',
[],
['07-stream_capture_voronoi.ipynb',
'06-Project-structured_completed.ipynb',
'02-Building-Post-Processing-MODFLOW6__solutions.ipynb',
'03_Loading_and_visualizing_models-solutions.ipynb',
'04_Modelgrid_and_intersection_solution.ipynb',
'06-Project-quadtree.ipynb',
'06-Project-voronoi.ipynb']),
('../part1_flopy/data_project',
[],
['active_area.shp',
'aquifer_top.asc',
'aquifer_k.asc',
'Flowline_river.shp',
'pumping_well_locations.shx',
'aquifer_bottom.asc',
'Flowline_river.dbf',
'pumping_well_locations.shp',
'inactive_area.shx',
'Flowline_river.prj',
'inactive_area.dbf',
'active_area.shx',
'pumping_well_locations.dbf',
'Flowline_river.shx',
'inactive_area.shp',
'active_area.dbf'])]
Make a more readable list of just the jupyter notebooks¶
Note: the key advantage of os.walk
over glob
is the recursion: individual subfolder levels don’t need to be known or specified a priori (a recursive '**' glob pattern or Path.rglob() offers similar behavior).
[50]:
for root, dirs, files in os.walk(pth):
    for f in files:
        filepath = Path(root, f)
        if filepath.suffix == '.ipynb':
            print(filepath)
../part0_python_intro/09_b_Geopandas_ABQ.ipynb
../part0_python_intro/10_Rasterio.ipynb
../part0_python_intro/06b_matplotlib_animation.ipynb
../part0_python_intro/11_xarray_mt_rainier_precip.ipynb
../part0_python_intro/09_a_Geopandas.ipynb
../part0_python_intro/05_numpy.ipynb
../part0_python_intro/00_python_basics_review.ipynb
../part0_python_intro/03_useful-std-library-modules.ipynb
../part0_python_intro/solutions/01_functions_script__solution.ipynb
../part0_python_intro/solutions/08_pandas.ipynb
../part0_python_intro/solutions/04_files_and_strings.ipynb
../part0_python_intro/solutions/06_matplotlib__solution.ipynb
../part0_python_intro/solutions/03_useful-std-library-modules-solutions.ipynb
../part0_python_intro/solutions/07a_Theis-exercise-solution.ipynb
../part0_python_intro/solutions/05_numpy__solutions.ipynb
../part0_python_intro/solutions/02_Namespace_objects_modules_packages__solution.ipynb
../part0_python_intro/solutions/09_Geopandas__solutions.ipynb
../part1_flopy/08_Modflow-setup-demo.ipynb
../part1_flopy/09-gwt-voronoi-demo.ipynb
../part1_flopy/01-Flopy-intro.ipynb
../part1_flopy/10_modpath_particle_tracking-demo.ipynb
../part1_flopy/05-unstructured-grids.ipynb
../part1_flopy/solutions/07-stream_capture_voronoi.ipynb
../part1_flopy/solutions/06-Project-structured_completed.ipynb
../part1_flopy/solutions/02-Building-Post-Processing-MODFLOW6__solutions.ipynb
../part1_flopy/solutions/03_Loading_and_visualizing_models-solutions.ipynb
../part1_flopy/solutions/04_Modelgrid_and_intersection_solution.ipynb
../part1_flopy/solutions/06-Project-quadtree.ipynb
../part1_flopy/solutions/06-Project-voronoi.ipynb
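os.walk() can also be used to find out which folders take up the most space. The sketch below adds up the sizes of the files directly inside each folder and ranks the folders from largest to smallest. The function name folder_sizes and the test files are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

def folder_sizes(top):
    """Map each folder under top to the total size (bytes) of its own files."""
    sizes = {}
    for root, dirs, files in os.walk(top):
        sizes[Path(root)] = sum((Path(root) / f).stat().st_size for f in files)
    return sizes

with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    # two hypothetical folders with files of known size
    (top / 'small').mkdir()
    (top / 'large').mkdir()
    (top / 'small' / 'a.dat').write_bytes(b'x' * 10)
    (top / 'large' / 'b.dat').write_bytes(b'x' * 1000)
    sizes = {p.relative_to(top).as_posix(): n
             for p, n in folder_sizes(top).items()}

# list the folders from largest to smallest
ranked = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
print(ranked)
```

The sizes here do not include subfolders; summing over subfolders as well would take an extra step.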
[51]:
os.environ
[51]:
environ{'GITHUB_STATE': '/home/runner/work/_temp/_runner_file_commands/save_state_65b2201f-142e-43c2-8643-9e5076048163',
'CONDA_PROMPT_MODIFIER': '(pyclass-docs) ',
'STATS_TRP': 'true',
'DOTNET_NOLOGO': '1',
'DEPLOYMENT_BASEPATH': '/opt/runner',
'USER': 'runner',
'CI': 'true',
'GITHUB_ENV': '/home/runner/work/_temp/_runner_file_commands/set_env_65b2201f-142e-43c2-8643-9e5076048163',
'PIPX_HOME': '/opt/pipx',
'RUNNER_ENVIRONMENT': 'github-hosted',
'JAVA_HOME_8_X64': '/usr/lib/jvm/temurin-8-jdk-amd64',
'SHLVL': '1',
'CONDA_SHLVL': '1',
'HOME': '/home/runner',
'RUNNER_TEMP': '/home/runner/work/_temp',
'GITHUB_EVENT_PATH': '/home/runner/work/_temp/_github_workflow/event.json',
'GITHUB_REPOSITORY_OWNER': 'DOI-USGS',
'JAVA_HOME_11_X64': '/usr/lib/jvm/temurin-11-jdk-amd64',
'PIPX_BIN_DIR': '/opt/pipx_bin',
'STATS_RDCL': 'true',
'ANDROID_NDK_LATEST_HOME': '/usr/local/lib/android/sdk/ndk/27.2.12479018',
'GRADLE_HOME': '/usr/share/gradle-8.11',
'GITHUB_RETENTION_DAYS': '80',
'JAVA_HOME_21_X64': '/usr/lib/jvm/temurin-21-jdk-amd64',
'POWERSHELL_DISTRIBUTION_CHANNEL': 'GitHub-Actions-ubuntu22',
'CPL_ZIP_ENCODING': 'UTF-8',
'GITHUB_HEAD_REF': '',
'GITHUB_REPOSITORY_OWNER_ID': '65027635',
'AZURE_EXTENSION_DIR': '/opt/az/azcliextensions',
'MAKEFLAGS': 'w',
'SYSTEMD_EXEC_PID': '598',
'GITHUB_GRAPHQL_URL': 'https://api.github.com/graphql',
'NVM_DIR': '/home/runner/.nvm',
'DOTNET_SKIP_FIRST_TIME_EXPERIENCE': '1',
'JAVA_HOME_17_X64': '/usr/lib/jvm/temurin-17-jdk-amd64',
'GOROOT_1_21_X64': '/opt/hostedtoolcache/go/1.21.13/x64',
'ImageVersion': '20241117.1.0',
'RUNNER_OS': 'Linux',
'GITHUB_API_URL': 'https://api.github.com',
'SWIFT_PATH': '/usr/share/swift/usr/bin',
'GOROOT_1_22_X64': '/opt/hostedtoolcache/go/1.22.9/x64',
'RUNNER_USER': 'runner',
'GOROOT_1_23_X64': '/opt/hostedtoolcache/go/1.23.3/x64',
'CHROMEWEBDRIVER': '/usr/local/share/chromedriver-linux64',
'_': '/usr/bin/make',
'JOURNAL_STREAM': '8:19163',
'GITHUB_WORKFLOW': 'Publish Docs',
'STATS_V3PS': 'true',
'CONDARC': '/home/runner/work/_temp/setup-micromamba/.condarc',
'STATS_D': 'false',
'GITHUB_RUN_ID': '11939111459',
'ACTIONS_RUNNER_ACTION_ARCHIVE_CACHE': '/opt/actionarchivecache',
'STATS_VMFE': 'true',
'GITHUB_WORKFLOW_SHA': '5560ca304f3ef0601d455df4b00007c4d49df1f2',
'MODFLOW_BIN_PATH': '/home/runner/.local/bin',
'BOOTSTRAP_HASKELL_NONINTERACTIVE': '1',
'GITHUB_REF_TYPE': 'branch',
'ImageOS': 'ubuntu22',
'GITHUB_BASE_REF': '',
'STATS_BLT': 'true',
'GITHUB_ACTION_REPOSITORY': '',
'PERFLOG_LOCATION_SETTING': 'RUNNER_PERFLOG',
'GITHUB_WORKFLOW_REF': 'DOI-USGS/python-for-hydrology/.github/workflows/build_docs.yaml@refs/heads/main',
'PATH': '/home/runner/micromamba/envs/pyclass-docs/bin:/home/runner/micromamba/condabin:/home/runner/.local/bin:/home/runner/work/_temp/setup-micromamba:/home/runner/micromamba-bin:/snap/bin:/home/runner/.local/bin:/opt/pipx_bin:/home/runner/.cargo/bin:/home/runner/.config/composer/vendor/bin:/usr/local/.ghcup/bin:/home/runner/.dotnet/tools:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/runner/.dotnet/tools',
'RUNNER_TRACKING_ID': 'github_c32e7e1e-0988-460d-849c-8626b5809a40',
'DOTNET_MULTILEVEL_LOOKUP': '0',
'INVOCATION_ID': 'a11ad719e8514265a31026602c4bbbdb',
'PROJ_NETWORK': 'ON',
'RUNNER_TOOL_CACHE': '/opt/hostedtoolcache',
'ANT_HOME': '/usr/share/ant',
'GITHUB_TRIGGERING_ACTOR': 'mnfienen',
'GITHUB_RUN_NUMBER': '38',
'RUNNER_ARCH': 'X64',
'XDG_RUNTIME_DIR': '/run/user/1001',
'AGENT_TOOLSDIRECTORY': '/opt/hostedtoolcache',
'GITHUB_ACTION': '__run_7',
'PROJ_DATA': '/home/runner/micromamba/envs/pyclass-docs/share/proj',
'MAKELEVEL': '1',
'MAMBA_ROOT_PREFIX': '/home/runner/micromamba',
'LANG': 'C.UTF-8',
'VCPKG_INSTALLATION_ROOT': '/usr/local/share/vcpkg',
'RUNNER_NAME': 'GitHub Actions 211',
'GITHUB_REF_NAME': 'main',
'STATS_D_D': 'false',
'XDG_CONFIG_HOME': '/home/runner/.config',
'CONDA': '/usr/share/miniconda',
'STATS_VMD': 'true',
'GITHUB_REPOSITORY': 'DOI-USGS/python-for-hydrology',
'XML_CATALOG_FILES': 'file:///home/runner/micromamba/envs/pyclass-docs/etc/xml/catalog file:///etc/xml/catalog',
'STATS_UE': 'true',
'GITHUB_ACTION_REF': '',
'ANDROID_NDK_ROOT': '/usr/local/lib/android/sdk/ndk/27.2.12479018',
'DEBIAN_FRONTEND': 'noninteractive',
'GSETTINGS_SCHEMA_DIR': '/home/runner/micromamba/envs/pyclass-docs/share/glib-2.0/schemas',
'GITHUB_REPOSITORY_ID': '537542085',
'STATS_PIP': 'false',
'GITHUB_ACTIONS': 'true',
'GDAL_DRIVER_PATH': '/home/runner/micromamba/envs/pyclass-docs/lib/gdalplugins',
'GITHUB_REF_PROTECTED': 'false',
'ACCEPT_EULA': 'Y',
'RUNNER_PERFLOG': '/home/runner/perflog',
'GITHUB_JOB': 'docs',
'CONDA_DEFAULT_ENV': 'pyclass-docs',
'GITHUB_WORKSPACE': '/home/runner/work/python-for-hydrology/python-for-hydrology',
'GITHUB_SHA': '5560ca304f3ef0601d455df4b00007c4d49df1f2',
'GITHUB_RUN_ATTEMPT': '1',
'GITHUB_REF': 'refs/heads/main',
'ANDROID_SDK_ROOT': '/usr/local/lib/android/sdk',
'MAMBA_EXE': '/home/runner/micromamba-bin/micromamba',
'GITHUB_ACTOR': 'mnfienen',
'LEIN_HOME': '/usr/local/lib/lein',
'JAVA_HOME': '/usr/lib/jvm/temurin-11-jdk-amd64',
'PWD': '/home/runner/work/python-for-hydrology/python-for-hydrology/docs',
'RUNNER_WORKSPACE': '/home/runner/work/python-for-hydrology',
'GITHUB_ACTOR_ID': '1110827',
'GITHUB_PATH': '/home/runner/work/_temp/_runner_file_commands/add_path_65b2201f-142e-43c2-8643-9e5076048163',
'GHCUP_INSTALL_BASE_PREFIX': '/usr/local',
'GITHUB_EVENT_NAME': 'push',
'XDG_DATA_DIRS': '/usr/local/share:/usr/share:/var/lib/snapd/desktop',
'GITHUB_SERVER_URL': 'https://github.com',
'STATS_TIS': 'mining',
'ANDROID_HOME': '/usr/local/lib/android/sdk',
'LEIN_JAR': '/usr/local/lib/lein/self-installs/leiningen-2.11.2-standalone.jar',
'GECKOWEBDRIVER': '/usr/local/share/gecko_driver',
'NVM_CD_FLAGS': '',
'HOMEBREW_CLEANUP_PERIODIC_FULL_DAYS': '3650',
'GITHUB_OUTPUT': '/home/runner/work/_temp/_runner_file_commands/set_output_65b2201f-142e-43c2-8643-9e5076048163',
'HOMEBREW_NO_AUTO_UPDATE': '1',
'EDGEWEBDRIVER': '/usr/local/share/edge_driver',
'STATS_EXT': 'true',
'SGX_AESM_ADDR': '1',
'CHROME_BIN': '/usr/bin/google-chrome',
'MFLAGS': '-w',
'CONDA_PREFIX': '/home/runner/micromamba/envs/pyclass-docs',
'ANDROID_NDK': '/usr/local/lib/android/sdk/ndk/27.2.12479018',
'GSETTINGS_SCHEMA_DIR_CONDA_BACKUP': '',
'SELENIUM_JAR_PATH': '/usr/share/java/selenium-server.jar',
'STATS_EXTP': 'https://provjobdprod.z13.web.core.windows.net/settings/provjobdsettings-latest/provjobd.data',
'ANDROID_NDK_HOME': '/usr/local/lib/android/sdk/ndk/27.2.12479018',
'GDAL_DATA': '/home/runner/micromamba/envs/pyclass-docs/share/gdal',
'GITHUB_STEP_SUMMARY': '/home/runner/work/_temp/_runner_file_commands/step_summary_65b2201f-142e-43c2-8643-9e5076048163',
'DOCUTILSCONFIG': '/home/runner/work/python-for-hydrology/python-for-hydrology/docs/source/docutils.conf',
'JPY_PARENT_PID': '4142',
'PYDEVD_USE_FRAME_EVAL': 'NO',
'TERM': 'xterm-color',
'CLICOLOR': '1',
'FORCE_COLOR': '1',
'CLICOLOR_FORCE': '1',
'PAGER': 'cat',
'GIT_PAGER': 'cat',
'MPLBACKEND': 'module://matplotlib_inline.backend_inline'}
Example: get the location of the current python (Conda) environment¶
[52]:
os.environ['CONDA_PREFIX']
[52]:
'/home/runner/micromamba/envs/pyclass-docs'
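Indexing os.environ with a variable that isn’t set raises a KeyError (for example, CONDA_PREFIX only exists inside a Conda environment). The sketch below uses .get() with a default value instead, and also sets and removes a variable for the current process. The variable names are made up for illustration.

```python
import os

# .get() returns the default instead of raising a KeyError
missing = os.environ.get('PYCLASS_DEMO_UNSET_VARIABLE', 'not set')

# variables set here are visible to this process and any subprocesses it starts
os.environ['PYCLASS_DEMO_VARIABLE'] = 'hello'
value = os.environ['PYCLASS_DEMO_VARIABLE']

# clean up again
del os.environ['PYCLASS_DEMO_VARIABLE']
still_there = 'PYCLASS_DEMO_VARIABLE' in os.environ

print(missing, value, still_there)
```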
subprocess
— Subprocess management¶
The subprocess module offers a way to execute system commands, for example MODFLOW, or any operating system command that you can type at the command line.
The recommended approach to invoking subprocesses is to use the run()
function for all use cases it can handle. For more advanced use cases, the underlying Popen
interface can be used directly.
Take a look at the following help descriptions for run
.
Note that on Windows, you may have to specify shell=True in order to access system commands.
[53]:
help(subprocess.run)
Help on function run in module subprocess:
run(*popenargs, input=None, capture_output=False, timeout=None, check=False, **kwargs)
Run command with arguments and return a CompletedProcess instance.
The returned instance will have attributes args, returncode, stdout and
stderr. By default, stdout and stderr are not captured, and those attributes
will be None. Pass stdout=PIPE and/or stderr=PIPE in order to capture them,
or pass capture_output=True to capture both.
If check is True and the exit code was non-zero, it raises a
CalledProcessError. The CalledProcessError object will have the return code
in the returncode attribute, and output & stderr attributes if those streams
were captured.
If timeout is given, and the process takes too long, a TimeoutExpired
exception will be raised.
There is an optional argument "input", allowing you to
pass bytes or a string to the subprocess's stdin. If you use this argument
you may not also use the Popen constructor's "stdin" argument, as
it will be used internally.
By default, all communication is in bytes, and therefore any "input" should
be bytes, and the stdout and stderr will be bytes. If in text mode, any
"input" should be a string, and stdout and stderr will be strings decoded
according to locale encoding, or by "encoding" if set. Text mode is
triggered by setting any of text, encoding, errors or universal_newlines.
The other arguments are the same as for the Popen constructor.
[54]:
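# if on mac/unix
# Note: with shell=True on POSIX, only the first list item ('ls') is run as the command;
# '-l' is passed to the shell itself, which is why the listing below is not in long format.
print(subprocess.run(['ls', '-l'], shell=True))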
00_python_basics_review.ipynb
03_useful-std-library-modules.ipynb
05_numpy.ipynb
06b_matplotlib_animation.ipynb
07b_VSCode.md
09_a_Geopandas.ipynb
09_b_Geopandas_ABQ.ipynb
10_Rasterio.ipynb
11_xarray_mt_rainier_precip.ipynb
another_subfolder
data
solutions
CompletedProcess(args=['ls', '-l'], returncode=0)
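As the help text says, stdout and stderr are not captured by default, so the `CompletedProcess` object above only carries the return code. The following sketch passes the command as a list without `shell=True`, so that every list item reaches the program, and captures the output as text. Using `sys.executable` (the running Python) as the command is an assumption made here so that the example runs on any platform.

```python
import subprocess
import sys

# Pass the command as a list WITHOUT shell=True so every item reaches the program.
# sys.executable (the running Python) stands in for a system command here.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a subprocess")'],
    capture_output=True,  # collect stdout/stderr instead of printing them
    text=True,            # decode bytes to str
)
print(result.returncode)
print(result.stdout)
```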
With the cwd
argument, we can control the working directory for the command. Here we list the files in the parent directory.
[55]:
print(subprocess.run(['ls', '-l'], shell=True, cwd='..'))
part0_python_intro
part1_flopy
CompletedProcess(args=['ls', '-l'], returncode=0)
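The `cwd` argument also accepts a `Path` object. As a rough check that it takes effect, this sketch asks a child Python process to report its own working directory. The temporary folder and the use of `sys.executable` are illustrative assumptions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp).resolve()
    # Ask the child process to report its own working directory
    result = subprocess.run(
        [sys.executable, '-c', 'import os; print(os.getcwd())'],
        cwd=workdir,  # cwd accepts a str or a path-like object
        capture_output=True,
        text=True,
    )
    # Resolve while the folder still exists so both paths are comparable
    reported = Path(result.stdout.strip()).resolve()
    print(reported == workdir)
```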
[56]:
# if on windows
print(subprocess.run(['dir'], shell=True))
00_python_basics_review.ipynb 09_b_Geopandas_ABQ.ipynb
03_useful-std-library-modules.ipynb 10_Rasterio.ipynb
05_numpy.ipynb 11_xarray_mt_rainier_precip.ipynb
06b_matplotlib_animation.ipynb another_subfolder
07b_VSCode.md data
09_a_Geopandas.ipynb solutions
CompletedProcess(args=['dir'], returncode=0)
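The help text for `run()` notes that `check=True` raises a `CalledProcessError` when the exit code is non-zero. A minimal sketch of checking the result this way follows. The failing command is simulated with a child Python process that exits with status 3, which is an assumption for illustration.

```python
import subprocess
import sys

try:
    # The child exits with status 3 to simulate a failing command
    subprocess.run([sys.executable, '-c', 'import sys; sys.exit(3)'], check=True)
    failed_code = None
except subprocess.CalledProcessError as err:
    # The exception carries the return code of the failed command
    failed_code = err.returncode
    print(f'command failed with return code {failed_code}')
```

Without `check=True`, no exception is raised, and the `returncode` attribute of the returned `CompletedProcess` has to be inspected instead.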
zipfile
— Work with ZIP archives¶
zip up one of the files in data/¶
[57]:
with zipfile.ZipFile('junk.zip', 'w') as dest:
    dest.write('data/xarray/daymet_prcp_rainier_1980-2018.nc')
now extract it¶
[58]:
with zipfile.ZipFile('junk.zip') as src:
    src.extract('data/xarray/daymet_prcp_rainier_1980-2018.nc', path='extracted_data')
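Note that `ZipFile.write()` stores the file under the relative path it was given, so extracting it recreates the `data/xarray/` folders inside the destination. The sketch below shows how the `arcname` argument controls the name stored in the archive. The file names and the temporary folder are hypothetical stand-ins.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    # Hypothetical stand-in for a data file
    source = tmp / 'nested' / 'folder' / 'example.txt'
    source.parent.mkdir(parents=True)
    source.write_text('some data')

    archive = tmp / 'example.zip'
    with zipfile.ZipFile(archive, 'w') as dest:
        # arcname sets the name stored in the archive (here: no parent folders)
        dest.write(source, arcname=source.name)

    with zipfile.ZipFile(archive) as src:
        names = src.namelist()
        # extract() returns the path of the file it created
        extracted = Path(src.extract('example.txt', path=tmp / 'extracted'))
        contents = extracted.read_text()

print(names)
print(contents)
```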
Testing Your Skills with a truly awful example:¶
the problem:¶
Pretend that the file data/netcdf_data.zip
contains some climate data (in the NetCDF format with the *.nc
extension) that we downloaded. If you open data/netcdf_data.zip
, you’ll see that within a subfolder zipped
are a number of additional subfolders, one for each year. Within each subfolder is another zip file. Within each of these zip files is yet another subfolder, inside of which is the actual data file we want (prcp.nc
).
[59]:
with zipfile.ZipFile('data/netcdf_data.zip') as src:
    for f in src.namelist()[:10]:
        print(f)
netcdf_data/
netcdf_data/zipped/
netcdf_data/zipped/zipped_1991/
netcdf_data/zipped/zipped_1991/12270_1991.zip
netcdf_data/zipped/zipped_1996/
netcdf_data/zipped/zipped_1996/12270_1996.zip
netcdf_data/zipped/zipped_1998/
netcdf_data/zipped/zipped_1998/12270_1998.zip
netcdf_data/zipped/zipped_1999/
netcdf_data/zipped/zipped_1999/12270_1999.zip
the goal:¶
To extract all of these prcp.nc
files into a single folder, after renaming them with their respective years (obtained from their enclosing folders or zip files). e.g.
prcp_1980.nc
prcp_1981.nc
...
This will allow us to open them together as a dataset in xarray
(more on that later). Does this sound awful? I’m not making this up. This is the kind of structure you get when downloading tiles of climate data with the Daymet Tile Selection Tool.
hint:¶
you might find these functions helpful:
ZipFile.extractall
ZipFile.extract
Path.glob
Path.mkdir
Path.stem
Path.parent
Path.name
shutil.move
Path.rmdir
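As a quick reminder of what the `Path` attributes in this list return, here is a small sketch on a made-up path. The folder and file names are hypothetical and are not taken from the exercise data.

```python
from pathlib import Path

# Hypothetical path, for illustration only
p = Path('some_folder/zipped_2001/tile_2001.zip')

print(p.name)         # file name with extension
print(p.stem)         # file name without the final extension
print(p.parent)       # enclosing folder
print(p.parent.name)  # name of the enclosing folder only

# Split on the underscore to pull out the trailing token
year = p.stem.split('_')[-1]
print(year)
```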
hint: start by using ZipFile.extractall()
to extract all of the individual zip files from the main zip archive¶
This extracts the entire contents of the zip file to a designated folder.
[60]:
output_folder = Path('03-output')
output_folder.mkdir(exist_ok=True)
with zipfile.ZipFile('data/netcdf_data.zip') as src:
    src.extractall(output_folder)
Make a list of the zipfiles
[61]:
zipfiles = list(output_folder.glob('netcdf_data/zipped/*/*.zip'))
zipfiles[:5]
[61]:
[PosixPath('03-output/netcdf_data/zipped/zipped_1987/12270_1987.zip'),
PosixPath('03-output/netcdf_data/zipped/zipped_1988/12270_1988.zip'),
PosixPath('03-output/netcdf_data/zipped/zipped_1995/12270_1995.zip'),
PosixPath('03-output/netcdf_data/zipped/zipped_1981/12270_1981.zip'),
PosixPath('03-output/netcdf_data/zipped/zipped_1993/12270_1993.zip')]
[62]:
f = zipfiles[0]
f
[62]:
PosixPath('03-output/netcdf_data/zipped/zipped_1987/12270_1987.zip')
1a) Use ZipFile.namelist()
(as above) to list the contents¶
This will yield the name of the *.nc
file that we need to extract
[ ]:
1b) Use ZipFile.extract()
to extract the *.nc
file to the destination folder¶
(you may need to create the destination folder first)
[ ]:
1c) Move the extracted file out of any enclosing subfolders, and rename to prcp_<year>.nc
¶
(so that if we repeat this for subsequent files, the extracted *.nc
files will end up in the same place)
[ ]:
1d) Remove the extra subfolders that were extracted¶
[ ]:
Bonus Application – Using os
to find the location of an executable¶
You will often run an executable that is located somewhere deep within your system path. It is a good idea to know exactly where that executable is located. This might one day save you from accidentally using an older version of an executable, such as MODFLOW.
[63]:
# Define two functions to help determine 'which' program you are using
def is_exe(fpath):
    """
    Return True if fpath is an executable, otherwise return False
    """
    return os.path.isfile(fpath) and os.access(fpath, os.X_OK)


def which(program):
    """
    Locate the program and return its full path. Return
    None if the program cannot be located.
    """
    fpath, fname = os.path.split(program)
    if fpath:
        if is_exe(program):
            return program
    else:
        # test for exe in current working directory
        if is_exe(program):
            return program
        # test for exe in path statement
        for path in os.environ["PATH"].split(os.pathsep):
            path = path.strip('"')
            exe_file = os.path.join(path, program)
            if is_exe(exe_file):
                return exe_file
    return None
[64]:
which('mf6')
[64]:
'/home/runner/.local/bin/mf6'
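The standard library also provides `shutil.which()`, which performs a similar search of the `PATH`. The sketch below looks up the running Python interpreter rather than `mf6`, since `mf6` may not be installed. Restricting the search to the interpreter's own folder with the `path` argument is an illustrative choice.

```python
import shutil
import sys
from pathlib import Path

# Look up the running Python interpreter by name, searching only its own folder
python_exe = Path(sys.executable)
found = shutil.which(python_exe.name, path=str(python_exe.parent))
print(found)

# A name that is not on the search path gives None (the name is made up)
missing = shutil.which('no-such-program-for-this-example')
print(missing)
```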