{ "cells": [ { "cell_type": "markdown", "id": "1eb9cb43-692a-4d8c-83d2-85065aa4e1ae", "metadata": {}, "source": [ "# Let's get some free city data for Albuquerque, NM and make a heatmap of where in town most Breaking Bad filming took place" ] }, { "cell_type": "code", "execution_count": null, "id": "cc1d171e-aab1-4982-a03b-28c72e4eb021", "metadata": {}, "outputs": [], "source": [ "import geopandas as gp\n", "import pathlib as pl\n", "import fiona\n", "fiona.drvsupport.supported_drivers['libkml'] = 'rw' # enable KML support which is disabled by default\n", "fiona.drvsupport.supported_drivers['LIBKML'] = 'rw' # enable KML support which is disabled by default\n" ] }, { "cell_type": "code", "execution_count": null, "id": "d8485fb9-0c3f-4f8d-b7c5-2fd891c577b5", "metadata": {}, "outputs": [], "source": [ "# data from https://www.cabq.gov/abq-data/" ] }, { "cell_type": "code", "execution_count": null, "id": "04b18ca6-1224-4b8d-8ae7-61ec13ec7953", "metadata": {}, "outputs": [], "source": [ "datapath = pl.Path('./data/geopandas/abq/')" ] }, { "cell_type": "markdown", "id": "ce611231-207c-4572-94cb-fc800c4387b3", "metadata": {}, "source": [ "### here we have to do a bunch of data munging - this is always a big part of any data project" ] }, { "cell_type": "code", "execution_count": null, "id": "7e6ef5b9-afd4-4e04-a03a-7588bf6be72c", "metadata": {}, "outputs": [], "source": [ "gdf = gp.read_file(datapath / 'abq_films.geojson')\n", "gdf = gdf.loc[gdf.geometry.notna()] # there are some missing geometries, so we skip those\n", "gdf['Title'] = gdf['Title'].str.strip() # strip extra whitespace from the Title field\n", "gdf.Title = ['In Plain Sight' if \"Plain Sight\" in i else i for i in gdf.Title] # collapse variant spellings into one title\n", "gdf['Year'] = [i.year for i in gdf.ShootDate] # keep just the year of each shoot\n", "gdf = gdf.drop(columns='ShootDate') # something with this " ] }, { "cell_type": "markdown", "id": "69d88a70-e721-422e-a304-b2139b534e20", "metadata": {}, "source": [ "### what are the top 20 projects 
filming in ABQ?" ] }, { "cell_type": "code", "execution_count": null, "id": "3e691881-5402-4445-a315-7aafd370b6fa", "metadata": {}, "outputs": [], "source": [ "gdf.groupby('Title').count()['Type'].sort_values(ascending=False)[:20].plot.bar()" ] }, { "cell_type": "markdown", "id": "fbbe8751-5a21-46d5-9970-c1647a0b9898", "metadata": {}, "source": [ "### we can just take a look at where Breaking Bad filming events took place as dots on the map" ] }, { "cell_type": "code", "execution_count": null, "id": "646e80be-53f0-4797-8ae5-eef8d2143286", "metadata": {}, "outputs": [], "source": [ "gdf.loc[gdf.Title=='Breaking Bad'].explore()" ] }, { "cell_type": "markdown", "id": "927d8216-86a6-4194-8b92-c9571aa952f4", "metadata": {}, "source": [ "### What are the other titles?" ] }, { "cell_type": "code", "execution_count": null, "id": "7d9b046d-1e9c-4e83-a60a-661175e9f959", "metadata": {}, "outputs": [], "source": [ "gdf['Title'].unique()" ] }, { "cell_type": "code", "execution_count": null, "id": "6c0c882d-1497-4d84-8fd5-59c28e112e6a", "metadata": {}, "outputs": [], "source": [ "gdf['Type'].unique()" ] }, { "cell_type": "markdown", "id": "1d96ecb2-ce40-446a-990d-6d22743523ec", "metadata": {}, "source": [ "## read in the zone atlas - this is a grid from the city that is used for zoning, but we can use it as a base for a heatmap as it covers the city with equal-sized squares" ] }, { "cell_type": "code", "execution_count": null, "id": "9b5e5c62-db6a-490d-95e1-63c3157a950b", "metadata": {}, "outputs": [], "source": [ "gdf_pg = gp.read_file(datapath / 'zoneatlaspagegrid.kmz')\n", "gdf_pg.explore()" ] }, { "cell_type": "markdown", "id": "c5b9087c-b00d-4095-8a63-96f93b453eb2", "metadata": {}, "source": [ "### we can assign a variable `title_to_plot` to be able to check out heatmaps of any of the titles above" ] }, { "cell_type": "code", "execution_count": null, "id": "3b7e5652-89f3-42b3-931a-d65e1ad99e18", "metadata": {}, "outputs": [], "source": [ "title_to_plot = 'Breaking Bad'" ] 
}, { "cell_type": "markdown", "id": "ff353450-af1c-412a-89dd-77d11466812d", "metadata": {}, "source": [ "### Finally, we need to do a few things and save each as an intermediate result:\n", "- load a fresh copy of the polygons\n", "- use `sjoin` to spatially join the zone polygons with the `gdf` points for the title named in `title_to_plot`, after converting the points to the polygons' CRS. The result has one row per intersecting polygon-point pair, and each row carries the polygon's geometry. Note that we can keep only the columns we need (e.g. `[['Name','geometry']]`) as long as `geometry` is included. This makes the results a little cleaner\n", "- next, with that result, use `groupby` to count all the points based on their `Name` column\n", "- choose one column of counts from the result (we arbitrarily chose `geometry` here) and rename it to `points`\n", "- finally merge the column of counts back to the original dataframe of zones, noting that we will use `Name` on the left and the index on the right" ] }, { "cell_type": "code", "execution_count": null, "id": "07fb6e15-0924-4424-b787-f7c4ba64909d", "metadata": {}, "outputs": [], "source": [ "gdf_pg = gp.read_file(datapath / 'zoneatlaspagegrid.kmz') # load a fresh copy of the polygons \n", "gdf_pg.head(3)" ] }, { "cell_type": "markdown", "id": "a88295ee-cc5b-4a38-b82c-daf8a3a344d9", "metadata": {}, "source": [ "Use `sjoin` to spatially join the zone polygons with the `gdf` points for the title named in `title_to_plot`, after converting the points to the polygons' CRS. The result has one row per intersecting polygon-point pair, and each row carries the polygon's geometry. Note that we can keep only the columns we need (e.g. `[['Name','geometry']]`) as long as `geometry` is included. This makes the results a little cleaner. 
\n", "\n", "Note that we print the number of rows in the join result, the selected points, and the zone polygons for reference" ] }, { "cell_type": "code", "execution_count": null, "id": "45d18e70-cdaa-47e6-8843-375e26a9f8f8", "metadata": {}, "outputs": [], "source": [ "sj_result = gdf_pg[['Name','geometry']].sjoin(\n", " gdf[['Title','geometry']].loc[gdf.Title==title_to_plot].to_crs(gdf_pg.crs))\n", "print(len(sj_result), len(gdf.loc[gdf.Title==title_to_plot]), len(gdf_pg))\n", "sj_result.head(3)" ] }, { "cell_type": "markdown", "id": "7efeaf67-6f03-4204-8735-6413841a8be9", "metadata": {}, "source": [ "Next, with that result, use `groupby` to count all the points based on their `Name` column.\n", "\n", "This gives us counts for each column..." ] }, { "cell_type": "code", "execution_count": null, "id": "2d119e22-bcb8-4693-bd7e-725af49f599e", "metadata": {}, "outputs": [], "source": [ "sj_count = sj_result.groupby(\n", " 'Name').count()\n", "print(len(sj_count))\n", "sj_count.sample(4)" ] }, { "cell_type": "markdown", "id": "468ec5ad-7a1b-4a0e-b4a3-b705c499dfdc", "metadata": {}, "source": [ "Choose one column of counts from the result (we arbitrarily chose `geometry` here) and rename it to `points`" ] }, { "cell_type": "code", "execution_count": null, "id": "4d04966b-5f7e-44f7-994c-78e823c10a0d", "metadata": {}, "outputs": [], "source": [ "sj_sum = sj_count.geometry.rename('points')\n", "sj_sum.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "3f147222-40bf-4a9d-8698-3242a09d86e4", "metadata": {}, "outputs": [], "source": [ "# merge the counts back onto the zone polygons; the default inner merge keeps only zones with at least one point\n", "gdf_pg = gdf_pg[['Name','geometry']].merge(sj_sum, left_on='Name', right_index=True)" ] }, { "cell_type": "markdown", "id": "bf2ff361-6350-4b88-a095-02f130806573", "metadata": {}, "source": [ "### Or, we can smack _all_ that work into a single chain (WAT!?)" ] }, { "cell_type": "code", "execution_count": null, "id": "68a06afc-b668-4fc9-8640-82290ecf23ff", "metadata": {}, "outputs": [], "source": [ "gdf_pg = gp.read_file(datapath / 'zoneatlaspagegrid.kmz') # load a fresh copy of the 
polygons \n", "gdf_pg = gdf_pg.merge(gdf_pg[['Name','geometry']].sjoin(\n", " gdf[['Title','geometry']].loc[gdf.Title==title_to_plot].to_crs(gdf_pg.crs)).groupby(\n", " 'Name').count().geometry.rename('points'), left_on='Name', right_index=True)" ] }, { "cell_type": "markdown", "id": "1ba52905-a539-472d-893a-917db7807a6d", "metadata": {}, "source": [ "### finally, let's look at the result on the map overlaying the points with the heatmap. Whew!" ] }, { "cell_type": "code", "execution_count": null, "id": "454c001b-de05-43e5-8b28-f9076a1ea23f", "metadata": {}, "outputs": [], "source": [ "m = gdf_pg[['points','geometry']].explore(column='points', cmap='magma_r')\n", "gdf.loc[gdf.Title==title_to_plot][['Site', 'geometry']].explore(m=m)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "47abb704-2dc5-4a22-b628-7a3258a96ae2", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.7" } }, "nbformat": 4, "nbformat_minor": 5 }