
geocompy's Introduction

geocompy


https://py.geocompx.org

Running the code in this book requires the following:

  1. Python dependencies, which can be installed with pip, with a package manager such as conda, or via a Docker container (see below)
  2. An integrated development environment (IDE) such as VS Code (running locally or on Codespaces/other host) or Jupyter Notebook for running and exploring the Python code interactively
  3. Quarto, which is used to generate the book

Reproduce the book with GitHub Codespaces

GitHub Codespaces minimise set-up costs by providing access to a modern IDE (VS Code) plus dependencies in your browser. This can save time on package installation. Codespaces allow you to make and commit changes, providing a way to test changes and contribute fixes in an instant.

To run the book in Codespaces, click on the link below.

Open in GitHub Codespaces

You should then be able to run all the code in the book by opening the terminal (e.g. with the command Ctrl+J) and entering the following command:

quarto preview

Reproduce the book with Docker (devcontainer)

If you can install Docker, this is likely to be the quickest way to reproduce the contents of this book. To do this from within VS Code:

  1. Install Microsoft’s official Dev Container extension
  2. Open the folder containing the repo in VS Code and click on the ‘Reopen in container’ button that should appear, as shown below (you need to have Docker installed on your computer for this to work)

  3. Edit the code in the containerised instance of VS Code that will appear 🎉

See details below for other ways to get the dependencies and reproduce the book.

Install dependencies with pip

Use pip to install the dependencies as follows, after cloning the repo and opening a terminal in the root folder of the repo.

First we’ll set up a virtual environment to install the dependencies in:

# Create a virtual environment called geocompy
python -m venv geocompy
# Activate the virtual environment
source geocompy/bin/activate

Then install the dependencies (the optional python -m prefix ensures pip runs under the active Python interpreter):

# Install dependencies from the requirements.txt file
python -m pip install -r requirements.txt

You can also install packages individually, e.g.:

pip install jupyter-book

Deactivate the virtual environment when you’re done:

deactivate

Install dependencies with a package manager

The environment.yml file contains a list of dependencies that can be installed with a package manager such as conda, mamba or micromamba. The instructions below are for micromamba but should work with conda or mamba if you substitute the command name.

# For Linux, the default shell is bash:
curl -L micro.mamba.pm/install.sh | bash
# For macOS, the default shell is zsh:
curl -L micro.mamba.pm/install.sh | zsh

After answering the questions, install dependencies with the following command:

micromamba env create -f environment.yml

Activate the environment as follows:

micromamba activate geocompy

Install the IPython kernel, which will allow you to select the environment in VS Code or Jupyter, as follows:

python -m ipykernel install --user

You can now reproduce the book (requires quarto to be installed):

micromamba run -n geocompy quarto preview

Reproduce chapters with jupyter

VS Code’s quarto.quarto extension can run the Python code in the chunks of the .qmd files in this book interactively.

However, you can also run any of the chapters in a Jupyter Notebook, e.g. as follows:

cd ipynb
# jupyter notebook . # open a notebook showing all chapters
jupyter notebook 02-spatial-data.ipynb

You should see the notebook for that chapter open in your browser.

See documentation on running and developing Python code in a Jupyter notebook at docs.jupyter.org.

Additional information

If you’re interested in how to auto-generate and run the .py and .ipynb files from the .qmd files, see below.

Updating the .py and .ipynb files

The Python scripts and IPython notebook files stored in the code and ipynb folders are generated from the .qmd files. To regenerate them, you can use the following commands, to generate .ipynb and .py files for local versions of Chapter 2, for example:

quarto convert 02-spatial-data.qmd # generate .ipynb file
jupytext --to py *.ipynb # generate .py files from the .ipynb files

Do this for all chapters with the following bash script in the repo:

./convert.sh

Updating .py and .ipynb files with GitHub Actions

We have set up a GitHub Action to do this automatically: every commit message that contains the text string ‘convert’ will create and push updated .ipynb and .py files.

Executing the .py and .ipynb files

Running the code chunks in the .qmd files in an IDE such as VSCode or directly with quarto is the main way code in this book is designed to be run interactively, but you can also execute the .py and .ipynb files directly. To run the code for chapter 2, for example, you can run one of the following commands from your system shell:

python code/chapters/02-spatial-data.py # currently requires manual intervention to complete, see #71
ipython ipynb/02-spatial-data.ipynb # currently requires manual intervention to complete, see #71
bash ./run-code.sh # run all .py files

Updating packages

We pin package versions in the environment.yml and requirements.txt files to ensure reproducibility.

To update requirements.txt, run the following:

python -m pip install pur
pur -r requirements.txt
python -m pip install -r requirements.txt

To update the environment.yml file in the same way based on your newly installed packages, run the following:

micromamba env export > environment.yml

geocompy's People

Contributors

anisotropi4, anitagraser, joshcole-dta, jtmiclat, michaeldorman, nowosad, robinlovelace, robinlovelace-ate, sgillies, smkerr


geocompy's Issues

Resolving use of both environment.yml and requirements.txt files

Currently we have two files defining dependencies:

https://github.com/geocompr/py/blob/main/data/requirements.txt

and

https://github.com/geocompr/py/blob/main/environment.yml

The fact that we have two versions of the deps can cause issues, e.g. #31.

I think we should standardise; looking at various help files, I'm thinking that an environment.yml file that points to the pip deps should be fine. Source: https://stackoverflow.com/a/68164027

# environment.yml
name: geocompy
dependencies:
  - python>=3.9
  - miniconda3
  - pip
  - pip:
    - -r file:data/requirements.txt

However, I note these caveats from https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html :

Issues may arise when using pip and conda together. When combining conda and pip, it is best to use an isolated conda environment. Only after conda has been used to install as many packages as possible should pip be used to install any remaining software. If modifications are needed to the environment, it is best to create a new environment rather than running conda after pip. When appropriate, conda and pip requirements should be stored in text files.

Thoughts? Another advantage of standardising this could be to automatically update the Docker images when the requirements get updated here, linking to #6

Graph showing popularity of languages

Inspired by https://geobgu.xyz/py/ by @michaeldorman

Updated image from StackOverflow:

(image: Stack Overflow language-popularity chart, not reproduced here)

Points: although Python is more popular overall, for data processing (roughly, but not completely, captured by pandas questions) Python-for-data-science and R are more similar. I've thrown a few new ones in there: Rust is a super future-proof systems language (with low-level geo crates in GeoRust), and Julia is a new kid on the block with a similar focus to R.

Making `landsat.tif` available to GitHub Actions

I've added a few more files to the (local) data directory; one of them (landsat.tif) is large and in .gitignore, which makes the build fail. It's used in this example:

(screenshot of the example not reproduced here)

Will be happy to hear any ideas on how we can make landsat.tif available to GitHub Actions. Thanks!

Switch to Python 3.11

It seems this will be a major release with lots of performance enhancements: https://docs.python.org/3.11/whatsnew/3.11.html

Problem is, it seems that conda will lag behind the official release: conda-forge/conda-forge.github.io#1629

According to the release schedule, the full 3.11 release will be published around October 2022, so we don't need to do anything atm; just opening this up here so we're ready, and to get feedback. From then, it seems that Python 3.11 will be supported for the next 5 years, so I suggest we pin to that version for stability when it's ready and stable, including on conda. Sound reasonable @anitagraser?

Use explore instead of hvplot for geopandas

Hey,

this is a great initiative! Thanks! I was checking Section 2 and noticed that in 2.2.2 you try to get an interactive plot using hvplot. I would suggest going with the built-in explore method, as it is already there and you don't need any other external package. So unless you really prefer holoviews, or need to do something explore cannot, I believe it is better to point users to the built-in interactive plotting. Depending on folium and leaflet instead of hvplot is also much more lightweight.
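For illustration, a minimal sketch of the built-in approach (assuming a GeoDataFrame read from the book's data/world.gpkg dataset and its pop column):

import geopandas as gpd

gdf = gpd.read_file("data/world.gpkg")
# explore() returns a folium.Map, an interactive leaflet view,
# with no plotting dependency beyond folium itself
m = gdf.explore(column="pop", legend=True)
m.save("world.html")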

Set-up notes on Windows

Making some notes here from recent installation experience.

  • Installed Chocolatey
  • Installed miniconda3
  • Installed VS Code
  • Installed the Quarto plugin in VS Code

At that point I tried running some Python code and was asked to select the Python Interpreter:

(screenshot of the interpreter selection prompt not reproduced here)

IPython versions of chapters

  • You can convert from qmd to ipynb with quarto convert test.qmd
  • Should we do that automatically in the CI?
  • Plan: have unevaluated ipynb versions of every chapter to be format agnostic and as accessible as possible
  • May make it easier when making Binder

Domain name

Currently the book is hosted in 2 places:

Directly from GitHub pages: http://geocompr.github.io/py

Random netlify deploy: https://vocal-longma-0d407b.netlify.app/ #68

We can continue with just the former as the default, no problem. Advantage: simplicity.

Advantage of custom domain hosted by Netlify:

  • Netlify deploys do not add anything to the commit history or repo size keeping it clean
  • Potential advantage in terms of discoverability with custom domain

This is linked to a broader question about where to host all 'geocompr' repos, we could get economies of scale by hosting many of them on the same site:

  • Geocomputation with R, deployed from the gh-pages branch via Netlify to https://geocompr.robinlovelace.net/; the default plan is to continue to host it there, but I'm open to moving it at some point, e.g. to geocomp.xyz/r
  • Associated book website, blog and solutions
  • Spanish translation
  • French translation
  • Japanese translation
  • ... translation in another human language
  • translation into another computer language e.g. JavaScript or Rust

All this and more could live on a custom domain. That may take some setting up. Options:

geocomp.xyz suggested by Jakub

geowith.com suggested by me (over a coffee ☕), thinking that r.geowith.com and py.geowith.com, as in geo(computation) with r/py/..., could be good, and then r.geowith.com/en etc. as consistent sites.

This would be a big operation and not to be taken lightly. Simple is good. So the default is to keep it as is for now, with the silent Netlify deploys as an experimental backup, but I thought I'd dump thoughts on this here while fresh in my head (I have discussed this broadly with Jakub, who suggested a dedicated URL, partly as geocompr does not fit so well with the Pythonic nature of this project; keen to hear what others think).

Switch to conda only build/deploy

Just noticed another CI fail, with this message:

NameError                                 Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 multipolygon = shapely.geometry.MultiPolygon([Polygon([(1,5), (2,2), (4,1), (4,4), (1,5)]), 
      2                              Polygon([(0,2), (1,2), (1,3), (0,3), (0,2)])])
      3 multipolygon
NameError: name 'Polygon' is not defined

Source: https://github.com/geocompr/py/runs/7087792814?check_suite_focus=true#step:4:107

I'm wondering at this point if we should use the conda image in the main.yml file. However, I do not understand this latest error message and it would be good to understand the cause before acting.
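That said, the traceback itself points at a likely cause: the cell uses the name Polygon unqualified, so it presumably just needs an explicit import. A minimal sketch of that fix (assuming the cell otherwise only imports shapely.geometry):

import shapely.geometry
from shapely.geometry import Polygon  # the unqualified name the failing cell relies on

multipolygon = shapely.geometry.MultiPolygon([
    Polygon([(1, 5), (2, 2), (4, 1), (4, 4), (1, 5)]),
    Polygon([(0, 2), (1, 2), (1, 3), (0, 3), (0, 2)]),
])
print(multipolygon.wkt)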

Also thinking we should have CI on Windows, Mac and Linux but can save that for another issue. Thoughts?

Pitching Ideas

Good evening,

Following a short exchange with Robin on Twitter, I'd like to see how I could contribute to the project. Which prompts the questions: who is the target audience, and how can I help?

For my part I use Open Data sources to present interactive data visualization as hobby. Examples includes:

  • Population and census data from the Office of National Statistics (ONS) and National Library of Scotland,
  • Transport network data from OpenStreetMap (OSM)
  • Transport locations from the National Public Transport Access Nodes (NaPTAN) dataset
  • Planning and operation data from Network Rail OpenDataFeeds

You appear to have covered these already, but I typically use shell scripts and Python on Linux to extract, clean and wrangle data with numpy, pandas, GeoPandas, NetworkX, OSMnx and SciPy. I've also been playing with PySAL (the Python Spatial Analysis Library) and sklearn (scikit-learn).

For what it's worth, my day job is a bit more prosaic: I am technical lead on projects involving national planning, operations and performance industry data exchange on the British network, and I have a role in European rail data exchange governance.

Given all this, my initial thought would be to take one of the visualizations, say how to draw an interactive population density map, as here.

Or perhaps a trunk road and rail transport map, as here, or a fizzy-knitting* map of the British railway, such as here.

Does any of this work?

Apologies for the length of the note. Shout if you have any questions with this stuff.

Ta,

Will

How to reproduce the book on Windows?

Just trying this after #22 and hitting this error message:

quarto preview
Preparing to preview
[ 1/10] index.qmd

Starting Jupyter kernel...
ERROR: Error executing 'C:/tools/miniconda3/python.exe': The pipe is being closed. (os error 232)

Anyone got any ideas? I guess I should set the environment but don't know how...

Update appendix

I think fixing this is a bit of an MVP requirement before we put this out there: https://geocompr.github.io/py/a1-starting.html

Reasoning: great that it's got installation instructions, but it's for the wrong language (a result of copying the template from jjallaire)!

Happy to give this a bash, but if anyone else has insight/experience into good practice when installing Python for geo work on Linux/Windows/Mac/Docker, here's the place to say so.

Key links:

Build failing

Since removing the landsat and 'air' datasets for #12, the build is failing :( sorry!

I suggest we solve this by downloading the file if it does not exist, e.g. with:

import subprocess
url = "https://github.com/geocompr/py/releases/download/0.1/landsat.tif"
path = "data"  # directory to download into
# -nc skips the download if the file already exists; -P sets the target directory
subprocess.run(["wget", "-nc", "-P", path, url])

Error message related to CRSs in c2

This is my output:

quarto preview # generate live preview of the book

Preparing to preview
[1/8] 02-spatial-data.qmd

Starting Jupyter kernel...Done

Executing '02-spatial-data.ipynb'
  Cell 1/47...Done
  Cell 2/47...Done
  Cell 3/47...Done
  Cell 4/47...

An error occurred while executing the following cell:
------------------
gdf = gpd.read_file("data/world.gpkg")
------------------

---------------------------------------------------------------------------
CRSError                                  Traceback (most recent call last)
Input In [4], in <module>
----> 1 gdf = gpd.read_file("data/world.gpkg")

File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/io/file.py:244, in _read_file(filename, bbox, mask, rows, **kwargs)
    239 if kwargs.get("ignore_geometry", False):
    240     return pd.DataFrame(
    241         [record["properties"] for record in f_filt], columns=columns
    242     )
--> 244 return GeoDataFrame.from_features(
    245     f_filt, crs=crs, columns=columns + ["geometry"]
    246 )

File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:610, in GeoDataFrame.from_features(cls, features, crs, columns)
    608     row.update(feature["properties"])
    609     rows.append(row)
--> 610 return GeoDataFrame(rows, columns=columns, crs=crs)

File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:126, in GeoDataFrame.__init__(self, data, geometry, crs, *args, **kwargs)
    122     super().__init__(data, *args, **kwargs)
    124 # need to set this before calling self['geometry'], because
    125 # getitem accesses crs
--> 126 self._crs = CRS.from_user_input(crs) if crs else None
    128 # set_geometry ensures the geometry data have the proper dtype,
    129 # but is not called if `geometry=None` ('geometry' column present
    130 # in the data), so therefore need to ensure it here manually
   (...)
    134 
    135 # if gdf passed in and geo_col is set, we use that for geometry
    136 if geometry is None and isinstance(data, GeoDataFrame):

File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:479, in CRS.from_user_input(cls, value, **kwargs)
    477 if isinstance(value, cls):
    478     return value
--> 479 return cls(value, **kwargs)

File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:326, in CRS.__init__(self, projparams, **kwargs)
    324     self._local.crs = projparams
    325 else:
--> 326     self._local.crs = _CRS(self.srs)

File pyproj/_crs.pyx:2352, in pyproj._crs._CRS.__init__()

CRSError: Invalid projection: epsg:4326: (Internal Proj Error: proj_create: no database context specified)

(geocompy) 

Is this a PROJ version issue? Will try in a different environment.

GitHub actions error

The GitHub action produces an error as follows:

 fatal: not in a git directory
Error: Process completed with exit code 128.

(screenshot of the failing step not reproduced here)

However, the book is built nevertheless:

(screenshot of the built book not reproduced here)

Use of an explicit index paradigm

Please would you consider, in the introductory chapters, introducing an example of using an index to filter dataframes, so that rather than writing

df[df['A'] == 3]

I would create a variable idx to hold the boolean filter:

idx = df['A'] == 3
df[idx]

This has the advantage that, when the filter becomes more complex, the intent is clearer. For example, when using the ~ (not) operation.

Then working with multiple filters makes using & and | (logical and and or) more straightforward.

For example, I feel the intent of the following examples is somewhat clearer:

idx1 = df['A'] == 3
idx2 = df['B'] == 'Q'

df[~idx1]
df[idx1 & ~idx2]

Rather than

df[~(df['A'] == 3)]
df[(df['A'] == 3) & ~(df['B'] == 'Q')]
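A minimal runnable illustration of the pattern, with a made-up DataFrame (the values here are hypothetical):

import pandas as pd

df = pd.DataFrame({"A": [1, 3, 3], "B": ["P", "Q", "R"]})

# named boolean filters make the intent explicit
idx1 = df["A"] == 3
idx2 = df["B"] == "Q"

print(df[~idx1])         # rows where A is not 3
print(df[idx1 & ~idx2])  # rows where A is 3 and B is not 'Q'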

Update instructions to run .ipynb notebooks

Currently the README states:

Then, navigate to the above-mentioned working directory and open the Jupyter notebook of any of the chapters using a command such as:

jupyter notebook 02-spatial-data.ipynb

But now the .ipynb files are in the ipynb/ directory.

Add text around new content for chapter 9

Following the recently merged #57, we should add some text to describe the code at the very least. I'm up for taking a first look at this, so am assigning myself, but I may ask for input @anisotropi4, so any comments from you or anyone else are welcome. The vis. chapter is in an early state of development, so it may be worth adding basic content before this more advanced stuff, but we can add a link and a brief description at the very least.

Operations on data and Geometry in GeoPandas and Shapely

Having more or less completed the NaPTAN PR, I have been having a look at the other chapters, which look very good.

They did pose a question about how to keep consistency with the PR, for example:

  1. Reading CSV data from a file or buffer with pandas (.read_csv), or from memory (.from_dict)
  2. Extracting dataframe columns to numpy array, and handling issues around numpy array shape
  3. Creating GeoSeries geometry from a 2-d numpy array using .from_xy
  4. Using Shapely binary predicates to identify geometric relationships, for example .within with a GeoSeries

There are then other interesting predicate functions such as .contains, .crosses and so on.
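A minimal sketch of steps 1, 3 and 4 with made-up coordinate data (the column names here are hypothetical):

import pandas as pd
import geopandas as gpd
from shapely.geometry import Polygon

# point data, as if read with pd.read_csv
df = pd.DataFrame({"name": ["a", "b"], "lon": [0.5, 2.0], "lat": [0.5, 2.0]})

# create GeoSeries geometry from the coordinate columns
pts = gpd.GeoSeries.from_xy(df["lon"], df["lat"], crs="EPSG:4326")

# binary predicate: which points fall within a unit square?
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
print(pts.within(square))  # True for 'a', False for 'b'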

Error when the correct environment is not activated

Documenting here as a common issue that people will hit, from the PowerShell command line this time (I really don't like using Windows, but it's good to see how ~90% of people will hit these questions of reproducibility):

quarto preview
Preparing to preview
[1/5] 02-spatial-data.qmd

Starting Jupyter kernel...Done

Executing '02-spatial-data.ipynb'
  Cell 1/40...

An error occurred while executing the following cell:
------------------
import geopandas as gpd
------------------

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Input In [1], in <cell line: 1>()

ModuleNotFoundError: No module named 'geopandas'
ModuleNotFoundError: No module named 'geopandas'

There is an environment.yml file in this directory. Is this for a conda env that you need to restore?

Attempted solution:


PS C:\Users\robinadmin\gh\geocompr\py> conda env activate geocompy
conda : The term 'conda' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was 
included, verify that the path is correct and try again.
At line:1 char:1
+ conda env activate geocompy
+ ~~~~~
    + CategoryInfo          : ObjectNotFound: (conda:String) [], CommandNotFoundException
    + FullyQualifiedErrorId : CommandNotFoundException

Error in requirements.txt?

Heads-up @michaeldorman, I get the following:

pip install -r data/requirements.txt                     

ERROR: Invalid requirement: 'affine=2.3.0=py_0' (from line 4 of data/requirements.txt)
Hint: = is not a valid operator. Did you mean == ?
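For what it's worth, a quick hypothetical cleanup sketch, assuming every offending line uses conda's name=version=build pin format:

import re

# rewrite conda-style pins (name=version=build) as pip-style name==version
with open("data/requirements.txt") as f:
    lines = f.read().splitlines()

fixed = [re.sub(r"^([A-Za-z0-9_.-]+)=([^=]+)=.*$", r"\1==\2", line) for line in lines]

with open("data/requirements.txt", "w") as f:
    f.write("\n".join(fixed) + "\n")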

No module named topojson issue

That error message appeared after running:

conda activate geocompy # the default name of the environment
quarto preview

I guess it's not in the environment.yml file...
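A quick hypothetical check of whether the package is importable in the active environment:

import importlib.util

# None means topojson is not installed in the active environment
print(importlib.util.find_spec("topojson"))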

Add sections on reproducing the book on the landing page

One of the most important things for reproducibility and accessibility. Heads-up @anitagraser, any suggestions on the local installation side are very welcome:

  • Running the code locally (including python set-up instructions)
  • Running the code in Docker with IPython notebook (possibly the second most used approach)
  • Running the code in Docker with VSCode
  • Running the code in Docker with RStudio
  • Running the code in Binder
