10a: Introduction to Rasterio for working with raster data

This exercise introduces ratsterio for working with GIS raster datasets.

We will use the rasterio library, along with numpy and matplotlib to open, explore, and create raster datasets.

[1]:
import numpy as np
import rasterio
from rasterio.plot import show
import matplotlib.pyplot as plt
from pathlib import Path
[2]:
data_path = Path("data/rasterio")  # set path to raster data folder

Opening and closing raster datasets using rasterio

We can open a GeoTIFF file containing a digital elevation model of Mt. Rainier (20150818_rainier_summer-tile-30.tif) using rasterio.open, in a way that’s similar to using Python’s built-in open function for text files.

[3]:
raster_file = data_path / '20150818_rainier_summer-tile-30.tif'  # example raster data
[4]:
dataset = rasterio.open(raster_file)

This returns a rasterio DatasetReader object (in this case called “dataset”). This object provides methods and attributes for accessing raster properties. Let’s explore some of these properties.

[5]:
dataset.name  #  file name
[5]:
'data/rasterio/20150818_rainier_summer-tile-30.tif'
[6]:
dataset.driver  # The file format (`GTiff` is recommended)
[6]:
'GTiff'
[7]:
dataset.mode  # Opening mode - 'r' for read mode by default
[7]:
'r'
[8]:
dataset.count  # number of datasets or "bands" contained in the raster dataset
[8]:
1
[9]:
dataset.dtypes  # The raster data type, one of `numpy` types supported by the driver
[9]:
('float32',)
[10]:
dataset.width  # number of columns
[10]:
827
[11]:
dataset.height  # number of rows
[11]:
761
[12]:
dataset.nodata  # The value which represents “No Data”, if any
[12]:
0.0
[13]:
dataset.crs  # The coordinate reference system EPSG code
[13]:
CRS.from_wkt('PROJCS["WGS 84 / UTM zone 10N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-123],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32610"]]')
[14]:
dataset.transform  # the transform matrix
[14]:
Affine(30.0, 0.0, 583464.253406,
       0.0, -30.0, 5202549.0989)

More on the transform later…

Most of this information is contained within the dataset’s metadata dictionary and can be accessed at once.

[15]:
dataset.meta
[15]:
{'driver': 'GTiff',
 'dtype': 'float32',
 'nodata': 0.0,
 'width': 827,
 'height': 761,
 'count': 1,
 'crs': CRS.from_wkt('PROJCS["WGS 84 / UTM zone 10N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-123],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32610"]]'),
 'transform': Affine(30.0, 0.0, 583464.253406,
        0.0, -30.0, 5202549.0989)}

Plotting

rasterio reads raster data into numpy arrays so plotting a single band as two dimensional data can be accomplished directly with matplotlib.pyplot.imshow().

[16]:
array = dataset.read(1)
type(array)
[16]:
numpy.ndarray

using plt.imshow(), we can visualize the array data contained in the raster dataset

[17]:
plt.imshow(array, vmin=500, vmax=4500);
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_24_0.png

rasterio also offers rasterio.plot.show() to display rasters images and label the axes with geo-referenced extents. It also masks “nodata” cells.

[18]:
show(dataset, vmin=500, vmax=4500);
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_26_0.png

To plot a raster and a vector layers together, we need to initialize a plot object with plt.subplots, and pass the same ax argument to both plots. For example, let’s import the HUC12 polygon of the upper White River (see the GeoPandas notebooks for more information on working with geospatial vector data):

[19]:
import geopandas as gpd
white_riv_huc12 = gpd.read_file('data/rasterio/white_river_huc12.shp')
[20]:
fig, ax = plt.subplots()
show(dataset, ax=ax, vmin=500, vmax=4500)
white_riv_huc12.plot(ax=ax, edgecolor='red', facecolor='None');
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_29_0.png

Close the dataset when we’re done

the close() method is used to close a dataset, releasing the underlying file handle and any associated resources. This is important for proper resource management, especially when dealing with many files or in long-running processes.

[21]:
dataset.closed
[21]:
False
[22]:
dataset.close()
[23]:
dataset.closed
[23]:
True

Accessing and manipulating raster data

It is best practice to use rasterio.open() within a with statement. This ensures proper closing of the dataset and resource management, even if errors occur. You might recognize this syntax from working with files in notebook 04_files_and_strings.ipynb

[24]:
with rasterio.open(data_path / '20150818_rainier_summer-tile-30.tif') as dataset:
    array_m = dataset.read(1)
[25]:
type(array_m)
[25]:
numpy.ndarray

Convert to units of feet

[26]:
array_ft = array_m * 3.2084
[27]:
plt.imshow(array_m,  vmin=500, vmax=4500)
plt.colorbar();
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_40_0.png
[28]:
plt.imshow(array_ft,  vmin=1600, vmax=14450)
plt.colorbar();
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_41_0.png

Looks the same! But notice the new units on the colobar scale.

Now let’s save this raster to a new file.

Make the data categorical

Bin the array data into three categories (1, 2, or 3) based on elevation values using numpy indexing

[29]:
array_classified = array_ft.copy()

array_classified[array_classified < 6000] = 1                                 # Low elevation zone, less than 6,000 ft
array_classified[(array_classified < 10000) & (array_classified > 6000)] = 2  # Mid elevation zone, between 6,000 and 10,000 ft
array_classified[array_classified >= 10000] = 3                               # High elevation zone, greater than 10,000 ft
[30]:
fig, ax = plt.subplots()

ct = show(array_classified, ax=ax, transform=dataset.transform)

im = ct.get_images()[0]
cbar = fig.colorbar(im, ticks=[1, 2, 3], shrink=0.8)
cbar.ax.set_yticklabels(['Low Elev. ', 'Mid Elev.', 'High Elev.']);
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_45_0.png

Writing Raster files

Writing a file with rasterio involves defining the characteristics of the output raster and then writing your data array to it.

The characteristics are provided in the form of metadata (or “profile”) which takes the form of a dictionary containing parameters we looked at above, including:

  • dimensions of the raster (height, width)

  • data type (dtype)

  • number of bands (count)

  • coordinate reference system (crs), and;

  • georeferencing transform (transform)

A word on transforms

In a raster dataset, an affine transformation is a mathematical mapping that relates the dataset’s internal pixel coordinates to a real-world coordinate system. This process, known as georeferencing, allows software to correctly display, scale, and align the image with other spatial data.

The affine matrix

A raster georeferencing transform can be expressed in mathematical terms using an affine matrix. This matrix defines a set of equations to convert a pixel’s row and column coordinates into real-world x-y coordinates.

Geospatial coordinate reference systems are typically defined on a cartesian plane with the 0,0 origin in the bottom left. Raster data uses a different referencing system. Rows and columns with the 0,0 origin in the upper left and rows increase and you move down while the columns increase as you go right.

title

Image from https://www.perrygeo.com/python-affine-transforms.html. More info on the Python Affine library.

[31]:
with rasterio.open(data_path / '20150818_rainier_summer-tile-30.tif') as dataset:
    transform = dataset.transform
    meta = dataset.meta

transform
[31]:
Affine(30.0, 0.0, 583464.253406,
       0.0, -30.0, 5202549.0989)

There are six elements in the affine transform (a, b, c, d, e, f)

  • a = width of a pixel

  • b = row rotation (typically zero)

  • c = x-coordinate of the upper-left corner of the upper-left pixel

  • d = column rotation (typically zero)

  • e = height of a pixel (typically negative)

  • f = y-coordinate of the of the upper-left corner of the upper-left pixel

rasterio provides an rasterio.Affine class, which can be used to create and manipulate new transforms

[32]:
transform = rasterio.Affine(
    a=30.0,
    b=0.0,
    c=583464.253406,
    d=0.0,
    e=-30.0,
    f=5202549.0989
)

transform
[32]:
Affine(30.0, 0.0, 583464.253406,
       0.0, -30.0, 5202549.0989)

… now back to writing that raster

[33]:
new_raster_file = data_path / '20150818_rainier_summer-tile-30_feet.tif'

The long way - specify metadata and close manually:

[34]:
out_dataset = rasterio.open(
    new_raster_file,
    mode='w',         # open file in write mode this time
    driver='GTiff',
    width=827,
    height=761,
    count=1,
    crs=32610,
    transform=transform,
    dtype='float32',
    nodata=0.0
)

out_dataset.write(array_ft, 1)
out_dataset.close()

The short way - use with and the original file metadata, meta, using ** to unpack the dictionary into arguments

[35]:
with rasterio.open(new_raster_file, 'w', **meta) as out_dataset:
    out_dataset.write(array_ft, 1)

Building a raster from scratch

Class activity

Create and write your own raster. This raster should be:

  1. 10 rows, 10 columns

  2. The raster can contain any value(s) you want, but bonus points for a checkerboard pattern

  3. Set upper left coordinates to:

    • xul: 569005.45

    • yul: 10746521.08

  4. EPSG is 3070

Use the completed code blocks above as a template. Don’t be ashamed to adapt code you got from someone else or on the internet to accomplish something useful.

[ ]:

[ ]:

[ ]:

[ ]:

Masking (aka clipping) rasters using polygons

The rasterio mask function can be used to clip a raster to exclude areas outside of the polygon(s) defined in the shapefile.

[36]:
from rasterio.mask import mask

As an example, we can use the geometry information contained in the shapefile we plotted earlier as the mask

[37]:
white_riv_huc12.loc[0,'geometry']
[37]:
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_68_0.svg
[38]:
crop_shape = white_riv_huc12['geometry']

A quick refersher - here’s what we’re working with:

[39]:
fig, ax = plt.subplots()
with rasterio.open(raster_file) as dataset:
    show(dataset, ax=ax, vmin=500, vmax=4500)
    white_riv_huc12.plot(ax=ax, edgecolor='red', facecolor='None');
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_71_0.png

Now let’s use the mask function to clip the raster data with the polygon.

Set crop=True to crop the raster’s extent to the bounding box of the polygon(s).

[40]:
with rasterio.open(raster_file) as dataset:
    mask_image, mask_transform = mask(
        dataset,
        crop_shape,
        crop=True
    )
    mask_meta = dataset.meta

show(mask_image, transform=mask_transform);
../../_images/notebooks_part0_python_intro_10a_Rasterio_intro_73_0.png
[41]:
mask_meta.update({"driver": "GTiff",
                 "height": mask_image.shape[1],
                 "width": mask_image.shape[2],
                 "transform": mask_transform})

masked_file = data_path / '20150818_rainier_summer-tile-30_masked.tif'

with rasterio.open(masked_file, "w", **mask_meta) as out_dataset:
    out_dataset.write(mask_image)
[ ]: