09: Geopandas exercise solutions¶

[1]:

import os
import numpy as np
import pandas as pd
import geopandas as gp
from shapely.geometry import Point, Polygon
from pathlib import Path
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

[2]:

datapath = Path('../data/geopandas/')

TEST YOUR SKILLS #0¶

make a new geodataframe of the parks
add a columns with centroids for each park
plot an interactive window with the park centroids and the neighborhoods
hints:
- remember the shapely methods are available for each geometry object (e.g. centroid())
- you can loop over the column in a couple different ways
- you can define which columns contains the geometry of a geodataframe
- you will likely have to define the CRS

[3]:

parks = gp.read_file(datapath / 'Madison_Parks.geojson')
hoods = gp.read_file(datapath / 'Neighborhood_Associations.geojson')

[4]:

# loopy solution
parks_cent = parks.copy()
centroids = []
for i in parks_cent.geometry.values:
    centroids.append(i.centroid)
parks_cent['centroid'] = centroids

[5]:

# do it all at once with a list comprehension
parks_cent['centroid'] = [i.centroid for i in parks_cent.geometry]

[6]:

# set the geometry and CRS
parks_cent.set_geometry('centroid', inplace=True)
parks_cent.set_crs(parks.crs, inplace=True);

[7]:

m_hoods = hoods.explore()
parks_cent.explore(m=m_hoods)

[7]:

Make this Notebook Trusted to load map: File -> Trust Notebook

[ ]:

TEST YOUR SKILLS #1¶

Using the bounds geodataframe you just made, write a function to visualize predicate behaviors.

your function should accept a left geodataframe, a right geodataframe, and a string for the predicate
your function should plot:
- the left geodataframe in (default) blue
- the result of the spatial join operation in another color
- the right geodataframe in another color with outline only
then you should set the title of the plot to the string predicate value used
the geodataframes to test with are isthmus_parks and bounds
your function should return the joined geodataframe
a couple hints:
- in the plot method are a couple args called facecolor and edgecolor that will help plot the rectangle
- there are other predicates to try out
advanced options: if that was easy, you can try a couple other things like:
- explore joins with points and lines in addition to just polygons
- change around the bounds polygon dimensions
- use explore() to make an interactive map

first have to bring over some things from the main lesson¶

[8]:

parks.to_crs(3071, inplace=True)
hoods.to_crs(parks.crs, inplace=True)
isthmus = hoods.loc[hoods['NEIGHB_NAME'].str.contains('Marquette') |
                   hoods['NEIGHB_NAME'].str.contains('Tenney')]
from shapely.geometry import box
bbox = box(570600, 290000, 573100, 291700)
bounds = gp.GeoDataFrame(geometry=[bbox],crs=parks.crs)
isthmus_parks = gp.sjoin(left_df=parks, right_df=isthmus)
isthmus_parks.drop(columns=[ 'index_right','OBJECTID_right', 'NA_ID', 'STATUS', 'CLASSIFICA', 'Web',
       'ShapeSTArea', 'ShapeSTLength'], inplace=True)

[9]:

def show_predicate(ldf,rdf,predicate):
    sj = gp.sjoin(ldf, rdf, predicate=predicate)
    ax = ldf.plot()
    sj.plot(ax=ax, color='black')
    rdf.plot(facecolor='none', edgecolor='orange', ax=ax)
    ax.set_title(predicate)
    return sj

[10]:

sj = show_predicate(isthmus_parks, bounds, 'intersects')
sj.head()

[10]:

	OBJECTID_left	Park_ID	Type	Acreage	Park_Name	ShortName	Subtype	SHAPESTArea	SHAPESTLength	geometry	NEIGHB_NAME
6	6422	1360	NEIGHBORHOOD	6.08	Yahara Place Park	Yahara Place	None	264814.254303	4619.706039	POLYGON ((572969.327 290871.975, 572979.154 29...	Marquette Neighborhood Association
12	6428	3540	TRAFFICWAY	0.10	Few Street (South) Street End	Few St	None	4522.502594	270.158029	POLYGON ((572000.229 289978.256, 572017.381 28...	Marquette Neighborhood Association
19	6435	3480	TRAFFICWAY	0.12	Baldwin Street End	Baldwin St	None	5319.010742	294.682954	POLYGON ((572164.248 290098.321, 572179.960 29...	Marquette Neighborhood Association
49	6465	1240	MINI	0.66	Morrison Park	Morrison	None	28667.633148	680.142299	POLYGON ((572460.118 290317.086, 572465.604 29...	Marquette Neighborhood Association
56	6472	1100	OPEN SPACE	7.66	Yahara River Parkway	Yahara River	None	333593.558197	13219.124814	MULTIPOLYGON (((572463.071 290603.880, 572425....	Marquette Neighborhood Association

../../../_images/notebooks_part0_python_intro_solutions_09_Geopandas__solutions_14_1.png

[11]:

sj = show_predicate(bounds, isthmus_parks, 'overlaps')
sj.head()

[11]:

geometry	index_right	OBJECTID_left	Park_ID	Type	Acreage	Park_Name	ShortName	Subtype	SHAPESTArea	SHAPESTLength	NEIGHB_NAME
POLYGON ((573100.000 290000.000, 573100.000 29...	35	6451	3590	TRAFFICWAY	0.08	Paterson Street (North) Street End	Paterson St	None	3549.678101	239.046579	Tenney-Lapham Neighborhood Association
POLYGON ((573100.000 290000.000, 573100.000 29...	12	6428	3540	TRAFFICWAY	0.10	Few Street (South) Street End	Few St	None	4522.502594	270.158029	Marquette Neighborhood Association
POLYGON ((573100.000 290000.000, 573100.000 29...	92	6509	1000	MINI	3.58	Orton Park	Orton	None	156086.664276	1712.836076	Marquette Neighborhood Association

../../../_images/notebooks_part0_python_intro_solutions_09_Geopandas__solutions_15_1.png

TEST YOUR SKILLS OPTIONAL¶

We have an Excel file that contains a crosswalk between SPECIES number as provided and species name. Can we bring that into our dataset and evaluate some conclusions about tree species by neighborhood?

start with the trees_with_hoods GeoDataFrame
load up and join the data from datapath / ‘Madison_Tree_Species_Lookup.xlsx’
hint: check the dtypes before merging - if you are going to join on a column, the column must be the same dtype in both dataframes
Make a multipage PDF with a page for each neighborhood showing a bar chart of the top ten tree species (by name) in each neighborhood
Make a map (use explore, or save to SHP or geojson) showing the neighborhoods with a color-coded field showing the most common tree species for each neighborhood

You will need a few pandas operations that we have only touched on a bit:

groupby, count, merge, read_excel, sort_values, iloc

[12]:

# read back in the trees and hoods data
trees = gp.read_file(datapath / 'Street_Trees.geojson', index_col=0)
trees.to_crs(hoods.crs, inplace=True)
trees_with_hoods = trees[['SPECIES','DIAMETER','geometry']].sjoin(hoods[['NEIGHB_NAME','geometry']])
trees_with_hoods.head()

[12]:

	SPECIES	DIAMETER	geometry	index_right	NEIGHB_NAME
0	554	22.0	POINT (569406.122 285635.225)	6	Bram's Addition Neighborhood Association
1	554	20.0	POINT (569391.615 285635.545)	6	Bram's Addition Neighborhood Association
2	320	14.0	POINT (569383.333 285772.643)	6	Bram's Addition Neighborhood Association
3	320	20.0	POINT (569407.725 285782.686)	6	Bram's Addition Neighborhood Association
4	320	20.0	POINT (569431.459 285792.707)	6	Bram's Addition Neighborhood Association

[13]:

# now read the excel file with tree species lookup - might need to fiddle with skiprows parameter
tree_species = pd.read_excel(datapath / 'Madison_Tree_Species_Lookup.xlsx',
                             skiprows = 6)
tree_species

[13]:

	Code	Description
0	768	Cherry 'Pink Flair'
1	769	Amur Chokecherry
2	762	Black Cherry
3	666	Crabapple 'Adirondack'
4	665	Crabapple 'Sugar Tyme'
...	...	...
248	800	Oak Spp.
249	681	White Mulberry
250	680	Mulberry Spp.
251	805	Buckthorn Spp.
252	804	Swamp White Oak

253 rows × 2 columns

[14]:

# check out the data types
tree_species.dtypes

[14]:

Code            int64
Description    object
dtype: object

[15]:

trees_with_hoods.dtypes

[15]:

SPECIES          object
DIAMETER        float64
geometry       geometry
index_right       int64
NEIGHB_NAME      object
dtype: object

[16]:

# d'oh! Code in tree_species and SPECIES in trees_with_hoods are different types.
# To make them consistent, let's convert SPECIES in trees_with_hoods to int
trees_with_hoods.SPECIES = [int(i) for i in trees_with_hoods.SPECIES]

[17]:

trees_with_hoods.dtypes

[17]:

SPECIES           int64
DIAMETER        float64
geometry       geometry
index_right       int64
NEIGHB_NAME      object
dtype: object

[18]:

# now we can merge - check out the left_on, right_on args
trees_final = trees_with_hoods.merge(tree_species, left_on='SPECIES', right_on='Code')
trees_final

[18]:

	SPECIES	DIAMETER	geometry	index_right	NEIGHB_NAME	Code	Description
0	554	22.0	POINT (569406.122 285635.225)	6	Bram's Addition Neighborhood Association	554	Honeylocust Var.
1	554	20.0	POINT (569391.615 285635.545)	6	Bram's Addition Neighborhood Association	554	Honeylocust Var.
2	554	8.0	POINT (569453.239 285930.349)	6	Bram's Addition Neighborhood Association	554	Honeylocust Var.
3	554	17.0	POINT (569463.866 285930.348)	6	Bram's Addition Neighborhood Association	554	Honeylocust Var.
4	554	20.0	POINT (569478.882 285930.117)	6	Bram's Addition Neighborhood Association	554	Honeylocust Var.
...	...	...	...	...	...	...	...
109750	976	19.0	POINT (568756.395 285356.696)	11	Burr Oaks Neighborhood Association	976	Elm 'Regal'
109751	976	15.0	POINT (568814.361 285356.511)	11	Burr Oaks Neighborhood Association	976	Elm 'Regal'
109752	976	10.0	POINT (568827.546 285356.119)	11	Burr Oaks Neighborhood Association	976	Elm 'Regal'
109753	80	4.0	POINT (566329.451 283988.236)	2	Arbor Hills Neighborhood Association	80	Dawn Redwood
109754	80	4.0	POINT (566329.451 283988.236)	2	Arbor Hills Neighborhood Association	80	Dawn Redwood

109755 rows × 7 columns

[19]:

# now the multipage plots
with PdfPages(datapath / 'TreePlots.pdf') as outpdf:
    # first groupby neighborhoods
    for cn, cg in trees_final.groupby('NEIGHB_NAME'):
        #then, for each neighborhood, group by "Description" to get counts by name
        counts = cg.groupby('Description')['SPECIES'].count()
        # sort them in reverse value
        counts.sort_values(ascending=False, inplace=True)
        #make a bar chart of the top ten counts
        counts[:10].plot.bar()
        # set up a title
        plt.title(f'top 10 trees in {cn}')
        # when the x-axis labels are long they can get cut off. tight_layout can help
        plt.tight_layout()
        outpdf.savefig()
        plt.close('all')

[20]:

# we can do this in an extra-pythonic way as well, chaining operations together
# advantage is it's faster to run but can be harder to initially understand and debug
with PdfPages(datapath / 'TreePlots.pdf') as outpdf:
    # first groupby neighborhoods
    for cn, cg in trees_final.groupby('NEIGHB_NAME'):
        cg.groupby('Description')['SPECIES'].count().sort_values(ascending=False)[:10].plot.bar()
        # set up a title
        plt.title(f'top 10 trees in {cn}')
        # when the x-axis labels are long they can get cut off. tight_layout can help
        plt.tight_layout()
        outpdf.savefig()
        plt.close('all')

Now let’s find the most common tree species in each neighborhood and make a map. There are some “sophisticated” ways using lots of pandas intrinsic functionality that can work, but we can also do it in a few (hopefully) logical explicit steps.¶

[21]:

# we can make a couple empty lists and just append the neighborhood name and the index of the
# maximum occurring tree species in each in a loop. Still some "cleverness"
hood = []
max_tree = []
for cn, cg in trees_final.groupby('NEIGHB_NAME'):
    hood.append(cn)
    max_tree.append(cg.groupby('Description').count().sort_values(by='SPECIES', ascending=False).iloc[0].name)

[22]:

# make a dataframe with these values
mts = pd.DataFrame(index=hood, data={'max_tree':max_tree})

[23]:

#now join this back into the GeoDataFrame of hoods
hoods.merge(mts, left_on='NEIGHB_NAME', right_index=True)[['NEIGHB_NAME','max_tree', 'geometry']].explore(column='max_tree')

[23]:

Make this Notebook Trusted to load map: File -> Trust Notebook

[ ]: