geocompx / geocompy
Geocomputation with Python: an open source book and online resource for getting started in this space
Home Page: https://py.geocompx.org/
License: Other
The rasterstats package is missing, so at the moment the book does not render on GitHub. I'm not sure how to solve this; I guess that `RUN pip3 install rasterstats` needs to be added to https://github.com/geocompr/docker/edit/master/python/Dockerfile ?
Having more or less completed the NaPTAN PR, I have been having a look at the other chapters, which look very good.
They did pose a question about how to keep consistency with the PR, for example:
- Pandas: reading data from file (`.read_csv`) or memory (`.from_dict`)
- numpy arrays, and handling issues around numpy array shape
- GeoSeries: creating geometry from a 2-d numpy array using `.from_xy`
- Shapely binary predicates to identify geometric relationships, for example `.within` with a GeoSeries. There are then other interesting predicate functions such as `.contains`, `.crosses` and so on.
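A minimal sketch of those binary predicates, using plain Shapely geometries rather than a GeoSeries for brevity (the shapes are hypothetical illustration data):

```python
from shapely.geometry import Point, Polygon

# a unit square and two test points (hypothetical data for illustration)
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
inside = Point(0.5, 0.5)
outside = Point(2, 2)

print(inside.within(square))    # the point lies inside the square
print(square.contains(inside))  # .contains is the converse of .within
print(outside.within(square))   # this point does not lie inside the square
```

The same predicate names work element-wise on a GeoSeries, which is what makes them convenient for filtering.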
Context: #66
Docs: https://github.com/marketplace/actions/netlify-actions
Currently the README states:
Then, navigate to the above-mentioned working directory and open the Jupyter Notebook of any of the chapters with a command such as:
jupyter notebook 02-spatial-data.ipynb
But now the .ipynb files are in the ipynb/ directory.
Discovered this when testing out the new Binder instance
Suggested solution: something like this at the beginning of each chapter (we could also check whether the `../data` directory exists):
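A minimal sketch of such a check, using only the standard library (the message wording and exact layout are assumptions):

```python
import os

# check that the input data is available relative to the notebook location
data_dir = os.path.join("..", "data")
if not os.path.isdir(data_dir):
    print(f"Input data not found at {data_dir}; please download it first.")
```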
Heads-up @michaeldorman I get the following:
```
pip install -r data/requirements.txt
ERROR: Invalid requirement: 'affine=2.3.0=py_0' (from line 4 of data/requirements.txt)
Hint: = is not a valid operator. Did you mean == ?
```
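For reference, that line uses conda's `name=version=build` export syntax, which pip rejects; pip expects `==` pins. A hypothetical corrected fragment of data/requirements.txt:

```
affine==2.3.0
```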
For minimal dependency testing. I'm thinking of a CI workflow that does not depend on Quarto, which is a bit niche and still has no dedicated quarto package on PyPI.
quarto convert test.qmd
To go into releases if over 1MB.
To enable people to run the .ipynb files without fail. Split out from #8 which contained the following comment:
Problem: the data folder doesn't exist in the ipynb or python directories. Proposed solution: download and unzip a folder containing the input data if it doesn't exist, e.g. using solutions mentioned here: https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url
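A minimal sketch of that approach using only the standard library (the release URL is hypothetical, following the pattern of the repo's other release assets):

```python
import os
import urllib.request
import zipfile

def fetch_data(url, dest="data"):
    """Download and unzip the input data if the folder is missing."""
    if os.path.isdir(dest):
        return  # data already present, nothing to do
    zip_path = dest + ".zip"
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(".")  # the archive is assumed to contain a data/ folder

# hypothetical release asset, following the repo's release-URL pattern:
# fetch_data("https://github.com/geocompr/py/releases/download/0.1/data.zip")
```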
Currently the CI only triggers on pushes to the main branch, meaning PRs such as #17 don't get tested.
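A minimal sketch of a trigger block that would also run on pull requests (the filename main.yml is the one mentioned elsewhere in these issues; the branch name is an assumption):

```yaml
# .github/workflows/main.yml (fragment)
# run the build on pushes to main and on every pull request
on:
  push:
    branches: [main]
  pull_request:
```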
After waiting ages for Binder to start up I'm wondering about adding support for GitPod: https://www.gitpod.io/docs/getting-started
Following the recently merged #57, we should add some text to describe the code at the very least. I'm up for taking a first look at this, so am assigning myself, but I may ask for input from @anisotropi4, so any comments from you or anyone else are welcome. The vis. chapter is in an early state of development, so it may be worth adding basic content before this more advanced stuff, but we can at least add a link and a brief description.
The zip file alone is now 110 MB 😵‍💫
Good news: we can purge the giant files with https://rtyley.github.io/bfg-repo-cleaner/
I can give this a go... You may need to reclone the (much lighter) repo after the job is done as it 'rewrites history' so heads-up @Nowosad, @anitagraser and @michaeldorman.
That error message appeared after running
```sh
conda activate geocompy # the default name of the environment
quarto preview
```
I guess it's not in the environment.yml file...
Documenting here as a common issue that people will hit, this time from the PowerShell command line (I really don't like using Windows, but it's good to see how the ~90% of people who use it will hit these reproducibility questions):
quarto preview
Preparing to preview
[1/5] 02-spatial-data.qmd
Starting Jupyter kernel...Done
Executing '02-spatial-data.ipynb'
Cell 1/40...
An error occurred while executing the following cell:
------------------
import geopandas as gpd
------------------
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
ModuleNotFoundError: No module named 'geopandas'
There is an environment.yml file in this directory. Is this for a conda env that you need to restore?
Attempted solution:
PS C:\Users\robinadmin\gh\geocompr\py> conda env activate geocompy
conda : The term 'conda' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was
included, verify that the path is correct and try again.
At line:1 char:1
+ conda env activate geocompy
+ ~~~~~
+ CategoryInfo : ObjectNotFound: (conda:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Currently the book is hosted in 2 places:
Directly from GitHub pages: http://geocompr.github.io/py
Random netlify deploy: https://vocal-longma-0d407b.netlify.app/ #68
We can continue with just the former no problem as the default. Advantage: simplicity.
Advantage of custom domain hosted by Netlify:
This is linked to a broader question about where to host all 'geocompr' repos, we could get economies of scale by hosting many of them on the same site:
All this and more could live on a custom domain, though that may take some setting up. Options:
geocomp.xyz, suggested by Jakub
geowith.com, suggested by me (over a coffee ☕), thinking that r.geowith.com and py.geowith.com, as in geo(computation) with r/py/..., could be good, and then r.geowith.com/en etc. as consistent sites.
This would be a big operation and not to be taken lightly. Simple is good. So the default is to keep things as they are for now, with the silent Netlify deploys as an experimental backup, but I thought I'd dump my thoughts on this here while fresh in my head (I have discussed this broadly with Jakub, who suggested a dedicated URL partly because geocompr does not fit so well with the Pythonic nature of this project; keen to hear what others think).
One of the most important things for reproducibility and accessibility. Heads-up @anitagraser, any suggestions on the local installation side are very welcome.
Hey,
this is a great initiative! Thanks! I was checking section 2 and noticed that in 2.2.2 you try to get an interactive plot using hvplot. I would suggest going with the built-in `explore` method, as it is already there and you don't need any other external package. So unless you really prefer holoviews or need to do something `explore` cannot, I believe it is better to point users to the built-in interactive plotting. Depending on folium and leaflet instead of hvplot is also much more lightweight.
This should make the build process more resilient and easier to maintain, building on the new conda image: geocompx/docker#27
Good evening,
Following a short exchange with Robin on Twitter, I'd like to see how I could contribute to the project. Which prompts the questions: who is the target audience, and how can I help?
For my part, I use Open Data sources to present interactive data visualization as a hobby. Examples include:
You appear to have covered these already, but I typically use shell scripts and Python on Linux to extract, clean and wrangle data with numpy, pandas, GeoPandas, NetworkX, OSMnx and SciPy. I've also been playing with PySAL (Python Spatial Analysis Library) and sklearn (scikit-learn).
For what it's worth, my day job is a bit more prosaic: I am technical lead on projects involving national planning, operations and performance industry data exchange on the British network, and I have a role in European rail data exchange governance.
Given all this, my initial thought would be to take one of the visualizations, say how to draw an interactive population map here
Or perhaps trunk road and rail transport map here or fizzy-knitting* map of the British Railway such as here.
Does any of this work?
Apologies for the length of the note. Shout if you have any questions with this stuff.
Ta,
Will
Inspired by https://geobgu.xyz/py/ by @michaeldorman
Updated image from StackOverflow:
Points: although Python is more popular overall, for data processing (roughly but not completely captured by pandas questions), Python-for-data-science and R are more similar. I've thrown a few new ones in there: Rust is a super future-proof systems language (with low-level geo crates in GeoRust), and Julia is a new kid on the block with a similar focus to R.
After following these instructions:
https://github.com/geocompr/py/blob/64a9dc742b521b502c42106f8876cd13b793a565/index.qmd#L47-L50
I'm seeing this issue:
Please would you consider, in the introductory chapters, introducing an example of using an index to filter dataframes, so that rather than writing

```python
df[df['A'] == 3]
```

I would create a variable `idx` to hold the boolean filter:

```python
idx = df['A'] == 3
df[idx]
```

This has the advantage that, when the filter becomes more complex, it makes the intent clearer, for example when using the `~` (logical not) operator. Working with multiple filters then makes using the `&` and `|` (logical and and or) operators more straightforward.
For example, I feel the intent of the following is somewhat clearer:

```python
idx1 = df['A'] == 3
idx2 = df['B'] == 'Q'
df[~idx1]
df[idx1 & ~idx2]
```

rather than:

```python
df[~(df['A'] == 3)]
df[(df['A'] == 3) & ~(df['B'] == 'Q')]
```

(Note the parentheses around `df['B'] == 'Q'`: `~` binds more tightly than `==`, so they are required.)
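A runnable sketch of the suggestion, with a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 3], "B": ["P", "Q", "R"]})

# named boolean filters make the intent explicit
idx1 = df["A"] == 3
idx2 = df["B"] == "Q"

print(df[~idx1])         # rows where A is not 3
print(df[idx1 & ~idx2])  # rows where A is 3 and B is not 'Q'
```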
Fixing #68 led to annoying commit messages that are not currently desirable IMO.
Shown here: https://github.com/geocompr/py/runs/6642744049?check_suite_focus=true
I'm on the case...
I think this is a bit of an MVP before we put this out there: https://geocompr.github.io/py/a1-starting.html
Reasoning: it's great that it has installation instructions, but they're for the wrong language (a result of copying the template from jjallaire)!
Happy to give this a bash, but if anyone else has insight or experience in good practice for installing Python for geo work on Linux/Windows/Mac/Docker, here's the place to say.
Key links:
As documented here https://stackoverflow.com/questions/70377390/is-there-any-way-to-trigger-a-specific-github-action-workflow-by-commit-message you can make workflows run only when a certain keyword is in the commit message.
Thinking: we could use this to control when all the .py and .ipynb files are updated, as doing so after every commit is a bit much. Thoughts @anitagraser, @michaeldorman and @Nowosad?
Just noticed another CI fail, with this message:
NameError Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 multipolygon = shapely.geometry.MultiPolygon([Polygon([(1,5), (2,2), (4,1), (4,4), (1,5)]),
2 Polygon([(0,2), (1,2), (1,3), (0,3), (0,2)])])
3 multipolygon
NameError: name 'Polygon' is not defined
Source: https://github.com/geocompr/py/runs/7087792814?check_suite_focus=true#step:4:107
I'm wondering at this point if we should use the conda image in the main.yml file. However, I do not understand this latest error message, and it would be good to understand the cause before acting.
I'm also thinking we should have CI on Windows, Mac and Linux, but that can be saved for another issue. Thoughts?
This is my output:
quarto preview # generate live preview of the book
Preparing to preview
[1/8] 02-spatial-data.qmd
Starting Jupyter kernel...Done
Executing '02-spatial-data.ipynb'
Cell 1/47...Done
Cell 2/47...Done
Cell 3/47...Done
Cell 4/47...
An error occurred while executing the following cell:
------------------
gdf = gpd.read_file("data/world.gpkg")
------------------
---------------------------------------------------------------------------
CRSError Traceback (most recent call last)
Input In [4], in <module>
----> 1 gdf = gpd.read_file("data/world.gpkg")
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/io/file.py:244, in _read_file(filename, bbox, mask, rows, **kwargs)
239 if kwargs.get("ignore_geometry", False):
240 return pd.DataFrame(
241 [record["properties"] for record in f_filt], columns=columns
242 )
--> 244 return GeoDataFrame.from_features(
245 f_filt, crs=crs, columns=columns + ["geometry"]
246 )
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:610, in GeoDataFrame.from_features(cls, features, crs, columns)
608 row.update(feature["properties"])
609 rows.append(row)
--> 610 return GeoDataFrame(rows, columns=columns, crs=crs)
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:126, in GeoDataFrame.__init__(self, data, geometry, crs, *args, **kwargs)
122 super().__init__(data, *args, **kwargs)
124 # need to set this before calling self['geometry'], because
125 # getitem accesses crs
--> 126 self._crs = CRS.from_user_input(crs) if crs else None
128 # set_geometry ensures the geometry data have the proper dtype,
129 # but is not called if `geometry=None` ('geometry' column present
130 # in the data), so therefore need to ensure it here manually
(...)
134
135 # if gdf passed in and geo_col is set, we use that for geometry
136 if geometry is None and isinstance(data, GeoDataFrame):
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:479, in CRS.from_user_input(cls, value, **kwargs)
477 if isinstance(value, cls):
478 return value
--> 479 return cls(value, **kwargs)
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:326, in CRS.__init__(self, projparams, **kwargs)
324 self._local.crs = projparams
325 else:
--> 326 self._local.crs = _CRS(self.srs)
File pyproj/_crs.pyx:2352, in pyproj._crs._CRS.__init__()
CRSError: Invalid projection: epsg:4326: (Internal Proj Error: proj_create: no database context specified)
(geocompy)
Is this a PROJ version issue? Will try in a different environment.
Do they work in Quarto? Is it worth switching to the ipynb format?
Currently we have two files defining dependencies:
https://github.com/geocompr/py/blob/main/data/requirements.txt
and
https://github.com/geocompr/py/blob/main/environment.yml
The fact that we have two versions of the deps can cause issues, e.g. #31.
I think we should standardise; looking at various help files, I'm thinking that an environment.yml file that points to the pip deps should be fine. Source: https://stackoverflow.com/a/68164027
```yaml
# environment.yml
name: geocompy
dependencies:
  - python>=3.9
  - miniconda3
  - pip
  - pip:
    - -r file:data/requirements.txt
```
However, I note these caveats from https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html :
Issues may arise when using pip and conda together. When combining conda and pip, it is best to use an isolated conda environment. Only after conda has been used to install as many packages as possible should pip be used to install any remaining software. If modifications are needed to the environment, it is best to create a new environment rather than running conda after pip. When appropriate, conda and pip requirements should be stored in text files.
Thoughts? Another advantage of standardising this could be to automatically update the Docker images when the requirements get updated here, linking to #6
For ease of reproducing results
Ideally with Jupyter and RStudio options.
Context: https://github.community/t/bot-user-and-email-to-push/174459
Suggesting:
```sh
git config user.name github-actions
git config user.email [email protected]
```
It seems this will be a major release with lots of performance enhancements: https://docs.python.org/3.11/whatsnew/3.11.html
The problem is, it seems that conda will lag behind the official release: conda-forge/conda-forge.github.io#1629
According to the release schedule, the full 3.11 release will be published around October 2022, so we don't need to do anything atm; I'm just opening this up here so we're ready and to get feedback. From then, it seems that Python 3.11 will be supported for the next 5 years, so I suggest we pin to that version for stability when it's ready and stable, including on conda. Sound reasonable @anitagraser?
Pages other than index.html return error 404 at the moment; not sure why. For example:
https://geocompr.github.io/py/02-spatial-data.html
Since removing the landsat and 'air' datasets for #12, the build is failing :( sorry!
I suggest we solve this by downloading the file if it does not exist, e.g. with:
```python
import os
import subprocess

url = "https://github.com/geocompr/py/releases/download/0.1/landsat.tif"
path = "landsat.tif"
if not os.path.exists(path):  # only download if the file is missing
    subprocess.run(["wget", url])
```
Just trying this after #22 and hitting this error message:
quarto preview
Preparing to preview
[ 1/10] index.qmd
Starting Jupyter kernel...
ERROR: Error executing 'C:/tools/miniconda3/python.exe': The pipe is being closed. (os error 232)
Anyone got any ideas? I guess I should set the environment but don't know how...
I think this was first mentioned by @michaeldorman and that I've found the cause. Reproducible example:
Solution described here: quarto-dev/quarto-vscode#5
Add tags like this:
```{r}
#| label: rtest
```