
python-geospatial-fundamentals-legacy's Introduction

D-Lab Python Geospatial Fundamentals Workshop


This repository contains the materials for D-Lab's Python Geospatial Fundamentals workshop. Prior experience with Python Fundamentals and Python Data Wrangling is assumed.

Workshop Goals

Geospatial data are an important component of data visualization and analysis in the social sciences, humanities, and elsewhere. The Python programming language is a great platform for exploring these data and integrating them into your research.

This workshop is divided into two parts:

  • Part 1: Getting started with spatial dataframes. Part one of this two-part workshop series introduces basic methods for working with geospatial data in Python using the GeoPandas library. Participants will learn how to import and export spatial data and store them as GeoPandas GeoDataFrames (or spatial dataframes). We will explore and compare several methods for mapping the data, including the GeoPandas plot function and the matplotlib library. We will also review coordinate reference systems and methods for reading, defining, and transforming them (see the sketch after this list). Note that this workshop focuses on vector spatial data.
  • Part 2: Geoprocessing and analysis. In Part 2, we dive deeper into data-driven mapping in Python, using color palettes and data classification to communicate information with maps. We will also introduce basic methods for processing spatial data, which are the building blocks of common spatial analysis workflows.
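
As a taste of what the notebooks cover, here is a minimal sketch of the Part 1 workflow; the file path and CRS code below are illustrative assumptions, not the workshop's actual datasets.

import geopandas as gpd
import matplotlib.pyplot as plt

# Read a shapefile into a GeoDataFrame (hypothetical path)
counties = gpd.read_file("data/counties.shp")

# Inspect and transform the coordinate reference system
print(counties.crs)
counties_utm10 = counties.to_crs("epsg:26910")  # e.g., UTM zone 10N

# Quick map using the GeoPandas plot method (built on matplotlib)
ax = counties_utm10.plot(figsize=(8, 8), edgecolor="grey")
ax.set_title("Counties (UTM zone 10N)")
plt.show()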

Installation Instructions

Anaconda is a useful package management software that allows you to run Python and Jupyter notebooks very easily. Installing Anaconda is the easiest way to make sure you have all the necessary software to run the materials for this workshop. Complete the following steps:

  1. Download and install Anaconda (Python 3.8 distribution). Click "Download" and then click 64-bit "Graphical Installer" for your current operating system.

  2. Download the Python-Geospatial-Fundamentals workshop materials:

  • Click the green "Code" button in the top right of the repository information.
  • Click "Download Zip".
  • Extract this file to a folder on your computer where you can easily access it (we recommend Desktop).
  3. Optional: if you're familiar with git, you can instead clone this repository by opening a terminal and entering git clone git@github.com:dlab-berkeley/Python-Geospatial-Fundamentals.git.

Run the code

Now that you have all the required software and materials, you need to run the code:

  1. Open the Anaconda Navigator application. You should see the green snake logo appear on your screen. Note that this can take a few minutes to load up the first time.

  2. Click the "Launch" button under "Jupyter Notebooks" and navigate through your file system to the Python-Geospatial-Fundamentals folder you downloaded above.

  3. Go to the lessons folder and find the notebook corresponding to the workshop you are attending.

  4. Press Shift + Enter (or Ctrl + Enter) to run a cell.

  5. You will need to install additional packages depending on which workshop you are attending.

Note that all of the above steps can be run from the terminal, if you're familiar with how to interact with Anaconda in that fashion. However, using Anaconda Navigator is the easiest way to get started if this is your first time working with Anaconda.

Is Python not working on your laptop?

If you do not have Anaconda installed and the materials loaded on your laptop by the time the workshop starts, we strongly recommend using the UC Berkeley DataHub to run the materials for these lessons. You can access the DataHub by clicking the following button:

Datahub

The DataHub downloads this repository, along with any necessary packages, and allows you to run the materials in a Jupyter notebook that is stored on UC Berkeley's servers. No installation is necessary from your end - you only need an internet browser and a CalNet ID to log in. By using the DataHub, you can save your work and come back to it at any time. When you want to return to your saved work, just go straight to DataHub, sign in, and click on the Python-Geospatial-Fundamentals folder.

If you don't have a Berkeley CalNet ID, you can still run these lessons in the cloud, by clicking this button:

Binder

By using this button, however, you cannot save your work.

About the UC Berkeley D-Lab

D-Lab works with Berkeley faculty, research staff, and students to advance data-intensive social science and humanities research. Our goal at D-Lab is to provide practical training, staff support, resources, and space to enable you to use Python for your own research applications. Our services cater to all skill levels, and no programming, statistical, or computer science background is necessary. We offer these services in the form of workshops, one-to-one consulting, and working groups that cover a variety of research topics, digital tools, and programming languages.

Visit the D-Lab homepage to learn more about us. You can view our calendar for upcoming events, learn about how to utilize our consulting and data services, and check out upcoming workshops.

Other D-Lab Python Workshops

Here are other Python workshops offered by the D-Lab:

Introductory Workshops

Advanced Workshops

Contributors


aculich, erthward, hikari-murayama, pattyf, pssachdeva


python-geospatial-fundamentals-legacy's Issues

very slow conda installation

@hikari-murayama @chengren I was helping one of the GeoPandas workshop participants briefly today at the front desk. They were having trouble getting through the install instructions because it was taking a VERY long time on their 2017 MacBook Air with 8 GB of RAM, so we may want to double-check the install process for folks.

As an interim solution, I had them use https://datahub.berkeley.edu/ and they were able to install everything fine there, so that's something to recommend so that people can follow along even if they can't use their own laptop.

We can update this repo to add a 1-click button to get them going in the cloud while they wait for it to install on their laptops.

Notebook 4 solutions gives error

In the second cell of the solutions notebook 4, the following line gives an error:
schools_gdf_utm10 = schools_gdf.to_crs("epsg:26910")

ValueError: Cannot transform naive geometries. Please set a crs on the object first.
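
A likely fix, assuming the schools data are in WGS84 longitude/latitude (an assumption, since the source CRS isn't shown in this issue), is to assign the CRS with set_crs before reprojecting:

schools_gdf = schools_gdf.set_crs("epsg:4326")        # assumed source CRS; use the data's actual CRS
schools_gdf_utm10 = schools_gdf.to_crs("epsg:26910")  # then the UTM zone 10N transform works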

Notebook 5 challenge 3

The challenge says 'Run the next cell to load a dataset containing Berkeley's bicycle boulevards (which we'll be using more in the following notebook).' but the cell with the code to load the data set is not there
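
A placeholder cell along these lines may be what's missing; the file path below is a hypothetical guess and should point at the actual bicycle boulevards file shipped with the notebooks:

# Hypothetical path; replace with the bike boulevards file in notebook_data
bike_blvds = gpd.read_file("notebook_data/transportation/BerkeleyBikeBlvds.geojson")
bike_blvds.head()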

Typo in notebook 6

When checking the results of the intersect query:

pas_utm10[parks_in_ac].head() --> parks_utm10[parks_in_ac].head()
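
For context, a minimal sketch of how the mask and query fit together; the Alameda County variable name here is an assumption:

# alameda_county_utm10 is an assumed single-row GeoDataFrame for Alameda County
parks_in_ac = parks_utm10.intersects(alameda_county_utm10.geometry.squeeze())
parks_utm10[parks_in_ac].head()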

Cleaning geospatial data

It could be helpful to include more instruction on how to clean geospatial data. The data used in the notebooks has been carefully curated and is very clean. However, raw data is often full of errors, missing values, etc.

Currently, there are two points in the notebooks (that I can think of) where the learner encounters (and learns how to handle) common issues with geospatial data: non-matching CRS codes in Notebook 3, and typos in the bike blvds dataset in Notebook 5.

I wonder if it might be useful to also show an example of missing or incorrect geospatial data? For example, a BART station that is clearly out of place (wrong coordinates), etc. How does one identify such errors and then correct them?

This is probably not essential, but after running into some messy geo data myself, I thought it might be something to add to the workshop in the future.
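
As a hedged example of the kind of check that could be added, a station with clearly wrong coordinates can often be flagged with a simple bounding-box test; the GeoDataFrame name and bounds below are illustrative assumptions:

from shapely.geometry import box

# Rough Bay Area bounding box in lon/lat (illustrative values)
bay_area = box(-123.0, 37.0, -121.5, 38.5)

# bart_gdf is a hypothetical stations GeoDataFrame in epsg:4326
suspect = bart_gdf[~bart_gdf.within(bay_area)]
print(suspect)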

Running workshop on D-Lab Datahub

Running these notebooks on the D-Lab Datahub returns the following error:

/opt/conda/lib/python3.8/site-packages/geopandas/_compat.py:106: UserWarning: The Shapely GEOS version (3.8.0-CAPI-1.13.1 ) is incompatible with the GEOS version PyGEOS was compiled with (3.9.0-CAPI-1.16.2). Conversions between both will be slow.
 warnings.warn(

I think this requires a more recent version of Shapely.
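
As one possible workaround (an assumption, not a tested fix for the DataHub image), GeoPandas can be told not to use PyGEOS, which avoids the GEOS version mismatch at the cost of the speedups:

import geopandas

# Fall back to Shapely-only geometry operations instead of PyGEOS
geopandas.options.use_pygeos = False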

Missing Berkeley Bus Routes zip file Ch 9

Hi,

In Chapter 9, section "8.7" (the one after 9.6), the command:
bus_routes = gpd.read_file('zip://notebook_data/transportation/Fall20Routeshape.zip')

exits with file not found.

I can't find the file in the repo under "notebook_data/transportation" or any of the other subfolders in the repository.

I can't view the branch jupyterbook to see if it's in there.

Can you provide the file and/or update Chapter 9?

Thanks,

== todd

Make repo Private temporarily

Hi Hikari, I just realized that the D4H stuff cannot be in a public repo. So I'm going to make this a private repo for now. Best,
Patty

Error while running on Binder link

Binder link is unable to load geopandas module. Could be similar to what we've previously encountered with DataHub, which appears solved now!

Error Message Below:

ImportError                               Traceback (most recent call last)
in
      1 import pandas as pd
----> 2 import geopandas as gpd
      3
      4 import matplotlib # base python plotting library
      5 import matplotlib.pyplot as plt # submodule of matplotlib

/srv/conda/envs/notebook/site-packages/geopandas/__init__.py in <module>
----> 1 from geopandas._config import options # noqa
      2
      3 from geopandas.geoseries import GeoSeries # noqa
      4 from geopandas.geodataframe import GeoDataFrame # noqa
      5 from geopandas.array import points_from_xy # noqa

/srv/conda/envs/notebook/site-packages/geopandas/_config.py in <module>
    124 use_pygeos = Option(
    125     key="use_pygeos",
--> 126     default_value=_default_use_pygeos(),
    127     doc=(
    128         "Whether to use PyGEOS to speed up spatial operations. The default is True "

/srv/conda/envs/notebook/site-packages/geopandas/_config.py in _default_use_pygeos()
    110
    111 def _default_use_pygeos():
--> 112     import geopandas._compat as compat
    113
    114     return compat.USE_PYGEOS

/srv/conda/envs/notebook/site-packages/geopandas/_compat.py in <module>
      6
      7 import pandas as pd
----> 8 import pyproj
      9 import shapely
     10 import shapely.geos

/srv/conda/envs/notebook/site-packages/pyproj/__init__.py in <module>
     48 import warnings
     49
---> 50 from pyproj import _datadir
     51 from pyproj._list import ( # noqa: F401
     52     get_angular_units_map,

ImportError: cannot import name '_datadir'

Typos in notebook 2

  • In 'read in a shapefile': 'But it remains one of the most commonly used file format for vector spatial data' --> 'But shapefiles remain one of the most commonly used file formats for vector spatial data'

  • In 'Exploring the GeoPandas GeoDataFrame': 'Notice at the end - just like we promised - a geometry column containing many numbers. Let's explore what this means, next.' --> 'Notice at the end - just like we promised - there is a geometry column containing many numbers. Let's explore what this means next.'

  • In 'GeoPandas Geometries': 'In each case, coordinates are separated by a spaces, and coordinate pairs are separated by commas.' --> 'In each case, coordinates are separated by a space, and coordinate pairs are separated by commas.'

  • In 'Subset the GeoDataframe': 'It looks like Alameda county is specified as "Alameda" in this dataset.' -->'It looks like Alameda County is specified as "Alameda" in this dataset.' Also the title of the section should be: 'Subset the GeoDataFrame'

  • In 'Plot the GeoDataFrame': Your geodataframe may also include the variants --> Your GeoDataFrame may also include the variants

  • In 'Save Your Data': 'Let's not forget to save out our Alameda County geodataframe alameda_county. This way we won't need to repeat the processing steps and attribute join we did above.' --> 'Let's not forget to save out our Alameda County GeoDataFrame alameda_county. This way we won't need to repeat the processing steps and attribute join that we did above.'

  • In 'Overview': - Subsetting GeoDatFrames --> - Subsetting GeoDataFrames

Workshop Title

Workshop Title should be "Python-Geopandas:-Parts-1-2"

Typo in notebook 7

In the section: Calculating Aggregated School Counts:

The next step is to group our GeoDataFrame by Census tract, and then summarize our data by group. We do this using the DataFrame method **groupy** --> groupby
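
For reference, a minimal sketch of the aggregation that section describes; the GeoDataFrame and column names are assumptions, not the notebook's actual ones:

# Count schools per Census tract; 'tract_id' is an assumed column name
school_counts = schools_by_tract_gdf.groupby("tract_id").size().reset_index(name="school_count")
school_counts.head()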
