geocompx / geocompy
Geocomputation with Python: an open source book and online resource for getting started in this space
Home Page: https://py.geocompx.org/
License: Other
The rasterstats package is missing, so at the moment the book does not render on GitHub. I'm not sure how to solve this; I guess that `RUN pip3 install rasterstats` needs to be added to https://github.com/geocompr/docker/edit/master/python/Dockerfile ?
Having more or less completed the NaPTAN PR, I have been having a look at the other chapters, which look very good.
They did pose a question about how to keep consistency with the PR, for example:
- Pandas: reading data from file (`.read_csv`) or memory (`.from_dict`)
- numpy arrays, and handling issues around numpy array shape
- GeoSeries: creating geometry from a 2-d numpy array using `.from_xy`
- Shapely binary predicates to identify geometric relationships, for example `.within` with a GeoSeries. There are then other interesting predicate functions such as `.contains`, `.crosses` and so on.
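A minimal sketch of those binary predicates, using plain Shapely geometries rather than a GeoSeries for brevity (the shapes are hypothetical illustration data):

```python
from shapely.geometry import Point, Polygon

# a unit square and two test points (hypothetical data for illustration)
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
inside = Point(0.5, 0.5)
outside = Point(2, 2)

print(inside.within(square))    # the point lies inside the square
print(square.contains(inside))  # .contains is the converse of .within
print(outside.within(square))   # this point does not lie inside the square
```

The same predicate names work element-wise on a GeoSeries, which is what makes them convenient for filtering.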
Context: #66
Docs: https://github.com/marketplace/actions/netlify-actions
Currently the README states:
Then, navigate to the above-mentioned working directory and open the Jupyter Notebook of any of the chapters with a command such as:
jupyter notebook 02-spatial-data.ipynb
But now the .ipynb files are in the ipynb/ directory.
Discovered this when testing out the new Binder instance
Suggested solution: something like this at the beginning of each chapter (we could also check whether the `../data` directory exists):
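A minimal sketch of such a check, using only the standard library (the message wording and exact layout are assumptions):

```python
import os

# check that the input data is available relative to the notebook location
data_dir = os.path.join("..", "data")
if not os.path.isdir(data_dir):
    print(f"Input data not found at {data_dir}; please download it first.")
```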
Heads-up @michaeldorman I get the following:
```
pip install -r data/requirements.txt
ERROR: Invalid requirement: 'affine=2.3.0=py_0' (from line 4 of data/requirements.txt)
Hint: = is not a valid operator. Did you mean == ?
```
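For reference, that line uses conda's `name=version=build` export syntax, which pip rejects; pip expects `==` pins. A hypothetical corrected fragment of data/requirements.txt:

```
affine==2.3.0
```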
For minimal dependency testing. I'm thinking of a CI workflow that does not depend on Quarto, which is a bit niche and still has no dedicated quarto package on PyPI.
quarto convert test.qmd
To go into releases if over 1MB.
To enable people to run the .ipynb files without fail. Split out from #8 which contained the following comment:
Problem: the data folder doesn't exist in the ipynb or python directories. Proposed solution: download and unzip a folder containing the input data if it doesn't exist, e.g. using solutions mentioned here: https://stackoverflow.com/questions/9419162/download-returned-zip-file-from-url
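A minimal sketch of that approach using only the standard library (the release URL is hypothetical, following the pattern of the repo's other release assets):

```python
import os
import urllib.request
import zipfile

def fetch_data(url, dest="data"):
    """Download and unzip the input data if the folder is missing."""
    if os.path.isdir(dest):
        return  # data already present, nothing to do
    zip_path = dest + ".zip"
    urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(zip_path) as z:
        z.extractall(".")  # the archive is assumed to contain a data/ folder

# hypothetical release asset, following the repo's release-URL pattern:
# fetch_data("https://github.com/geocompr/py/releases/download/0.1/data.zip")
```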
Currently the CI only triggers on pushes to the main branch, meaning PRs such as #17 don't get tested.
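A minimal sketch of a trigger block that would also run on pull requests (the filename main.yml is the one mentioned elsewhere in these issues; the branch name is an assumption):

```yaml
# .github/workflows/main.yml (fragment)
# run the build on pushes to main and on every pull request
on:
  push:
    branches: [main]
  pull_request:
```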
After waiting ages for Binder to start up I'm wondering about adding support for GitPod: https://www.gitpod.io/docs/getting-started
Following the recently merged #57, we should add some text to describe the code at the very least. I'm up for taking a first look at this, so am assigning myself, but I may ask for input from @anisotropi4, so any comments from you or anyone else are welcome. The vis. chapter is in an early state of development, so it may be worth adding basic content before this more advanced stuff, but we can at least add a link and a brief description.
The zip file alone is now 110 MB 😵‍💫
Good news: we can purge the giant files with https://rtyley.github.io/bfg-repo-cleaner/
I can give this a go... You may need to reclone the (much lighter) repo after the job is done as it 'rewrites history' so heads-up @Nowosad, @anitagraser and @michaeldorman.
That error message appeared after running
```sh
conda activate geocompy # the default name of the environment
quarto preview
```
I guess it's not in the environment.yml file...
Documenting here as a common issue that people will hit, this time from the PowerShell command line (I really don't like using Windows, but it's good to see how the ~90% of people who use it will hit these reproducibility questions):
quarto preview
Preparing to preview
[1/5] 02-spatial-data.qmd
Starting Jupyter kernel...Done
Executing '02-spatial-data.ipynb'
Cell 1/40...
An error occurred while executing the following cell:
------------------
import geopandas as gpd
------------------
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
Input In [1], in <cell line: 1>()
ModuleNotFoundError: No module named 'geopandas'
There is an environment.yml file in this directory. Is this for a conda env that you need to restore?
Attempted solution:
PS C:\Users\robinadmin\gh\geocompr\py> conda env activate geocompy
conda : The term 'conda' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was
included, verify that the path is correct and try again.
At line:1 char:1
+ conda env activate geocompy
+ ~~~~~
+ CategoryInfo : ObjectNotFound: (conda:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Currently the book is hosted in 2 places:
Directly from GitHub pages: http://geocompr.github.io/py
Random netlify deploy: https://vocal-longma-0d407b.netlify.app/ #68
We can continue with just the former no problem as the default. Advantage: simplicity.
Advantage of custom domain hosted by Netlify:
This is linked to a broader question about where to host all 'geocompr' repos, we could get economies of scale by hosting many of them on the same site:
All this and more could live on a custom domain, though that may take some setting up. Options:
geocomp.xyz, suggested by Jakub
geowith.com, suggested by me (over a coffee ☕), thinking that r.geowith.com and py.geowith.com, as in geo(computation) with r/py/..., could be good, and then r.geowith.com/en etc. as consistent sites.
This would be a big operation and not to be taken lightly. Simple is good. So the default is to keep things as they are for now, with the silent Netlify deploys as an experimental backup, but I thought I'd dump my thoughts on this here while fresh in my head (I have discussed this broadly with Jakub, who suggested a dedicated URL partly because geocompr does not fit so well with the Pythonic nature of this project; keen to hear what others think).
One of the most important things for reproducibility and accessibility. Heads-up @anitagraser, any suggestions on the local installation side are very welcome.
Hey,
this is a great initiative! Thanks! I was checking section 2 and noticed that in 2.2.2 you try to get an interactive plot using hvplot. I would suggest going with the built-in `explore` method, as it is already there and you don't need any other external package. So unless you really prefer holoviews or need to do something `explore` cannot, I believe it is better to point users to the built-in interactive plotting. Depending on folium and leaflet instead of hvplot is also much more lightweight.
This should make the build process more resilient and easier to maintain, building on the new conda image: geocompx/docker#27
Good evening,
Following a short exchange with Robin on Twitter, I'd like to see how I could contribute to the project. Which prompts the questions: who is the target audience, and how can I help?
For my part, I use Open Data sources to present interactive data visualization as a hobby. Examples include:
You appear to have covered these already, but I typically use shell scripts and Python on Linux to extract, clean and wrangle data with numpy, pandas, GeoPandas, NetworkX, OSMnx and SciPy. I've also been playing with PySAL (Python Spatial Analysis Library) and sklearn (scikit-learn).
For what it's worth, my day job is a bit more prosaic: I am technical lead on projects involving national planning, operations and performance industry data exchange on the British network, and I have a role in European rail data exchange governance.
Given all this, my initial thought would be to take one of the visualizations, say how to draw an interactive population map here
Or perhaps trunk road and rail transport map here or fizzy-knitting* map of the British Railway such as here.
Does any of this work?
Apologies for the length of the note. Shout if you have any questions with this stuff.
Ta,
Will
Inspired by https://geobgu.xyz/py/ by @michaeldorman
Updated image from StackOverflow:
Points: although Python is more popular overall, for data processing (roughly but not completely captured by pandas questions), Python-for-data-science and R are more similar. I've thrown a few new ones in there: Rust is a super future-proof systems language (with low-level geo crates in GeoRust), and Julia is a new kid on the block with a similar focus to R.
After following these instructions:
https://github.com/geocompr/py/blob/64a9dc742b521b502c42106f8876cd13b793a565/index.qmd#L47-L50
I'm seeing this issue:
Please would you consider, in the introductory chapters, introducing an example of using an index to filter dataframes, so that rather than writing

```python
df[df['A'] == 3]
```

I would create a variable `idx` to hold the boolean filter:

```python
idx = df['A'] == 3
df[idx]
```

This has the advantage that, when the filter becomes more complex, it makes the intent clearer, for example when using the `~` (logical not) operator. Working with multiple filters then makes using the `&` and `|` (logical and and or) operators more straightforward.
For example, I feel the intent of the following is somewhat clearer:

```python
idx1 = df['A'] == 3
idx2 = df['B'] == 'Q'
df[~idx1]
df[idx1 & ~idx2]
```

rather than:

```python
df[~(df['A'] == 3)]
df[(df['A'] == 3) & ~(df['B'] == 'Q')]
```

(Note the parentheses around `df['B'] == 'Q'`: `~` binds more tightly than `==`, so they are required.)
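A runnable sketch of the suggestion, with a small hypothetical frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3, 3], "B": ["P", "Q", "R"]})

# named boolean filters make the intent explicit
idx1 = df["A"] == 3
idx2 = df["B"] == "Q"

print(df[~idx1])         # rows where A is not 3
print(df[idx1 & ~idx2])  # rows where A is 3 and B is not 'Q'
```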
Fixing #68 led to annoying commit messages that are not currently desirable IMO.
Shown here: https://github.com/geocompr/py/runs/6642744049?check_suite_focus=true
I'm on the case...
I think this is a bit of an MVP before we put this out there: https://geocompr.github.io/py/a1-starting.html
Reasoning: it's great that it has installation instructions, but they're for the wrong language (a result of copying the template from jjallaire)!
Happy to give this a bash, but if anyone else has insight or experience in good practice for installing Python for geo work on Linux/Windows/Mac/Docker, here's the place to say.
Key links:
As documented here https://stackoverflow.com/questions/70377390/is-there-any-way-to-trigger-a-specific-github-action-workflow-by-commit-message you can make workflows run only when a certain keyword is in the commit message.
Thinking: we could use this to control when all the .py and .ipynb files are updated, as doing so after every commit is a bit much. Thoughts @anitagraser, @michaeldorman and @Nowosad?
Just noticed another CI fail, with this message:
NameError Traceback (most recent call last)
Input In [22], in <cell line: 1>()
----> 1 multipolygon = shapely.geometry.MultiPolygon([Polygon([(1,5), (2,2), (4,1), (4,4), (1,5)]),
2 Polygon([(0,2), (1,2), (1,3), (0,3), (0,2)])])
3 multipolygon
NameError: name 'Polygon' is not defined
Source: https://github.com/geocompr/py/runs/7087792814?check_suite_focus=true#step:4:107
I'm wondering at this point if we should use the conda image in the main.yml file. However, I do not understand this latest error message, and it would be good to understand the cause before acting.
I'm also thinking we should have CI on Windows, Mac and Linux, but that can be saved for another issue. Thoughts?
This is my output:
quarto preview # generate live preview of the book
Preparing to preview
[1/8] 02-spatial-data.qmd
Starting Jupyter kernel...Done
Executing '02-spatial-data.ipynb'
Cell 1/47...Done
Cell 2/47...Done
Cell 3/47...Done
Cell 4/47...
An error occurred while executing the following cell:
------------------
gdf = gpd.read_file("data/world.gpkg")
------------------
---------------------------------------------------------------------------
CRSError Traceback (most recent call last)
Input In [4], in <module>
----> 1 gdf = gpd.read_file("data/world.gpkg")
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/io/file.py:244, in _read_file(filename, bbox, mask, rows, **kwargs)
239 if kwargs.get("ignore_geometry", False):
240 return pd.DataFrame(
241 [record["properties"] for record in f_filt], columns=columns
242 )
--> 244 return GeoDataFrame.from_features(
245 f_filt, crs=crs, columns=columns + ["geometry"]
246 )
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:610, in GeoDataFrame.from_features(cls, features, crs, columns)
608 row.update(feature["properties"])
609 rows.append(row)
--> 610 return GeoDataFrame(rows, columns=columns, crs=crs)
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/geopandas/geodataframe.py:126, in GeoDataFrame.__init__(self, data, geometry, crs, *args, **kwargs)
122 super().__init__(data, *args, **kwargs)
124 # need to set this before calling self['geometry'], because
125 # getitem accesses crs
--> 126 self._crs = CRS.from_user_input(crs) if crs else None
128 # set_geometry ensures the geometry data have the proper dtype,
129 # but is not called if `geometry=None` ('geometry' column present
130 # in the data), so therefore need to ensure it here manually
(...)
134
135 # if gdf passed in and geo_col is set, we use that for geometry
136 if geometry is None and isinstance(data, GeoDataFrame):
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:479, in CRS.from_user_input(cls, value, **kwargs)
477 if isinstance(value, cls):
478 return value
--> 479 return cls(value, **kwargs)
File /opt/anaconda3/envs/OSMNX/lib/python3.10/site-packages/pyproj/crs/crs.py:326, in CRS.__init__(self, projparams, **kwargs)
324 self._local.crs = projparams
325 else:
--> 326 self._local.crs = _CRS(self.srs)
File pyproj/_crs.pyx:2352, in pyproj._crs._CRS.__init__()
CRSError: Invalid projection: epsg:4326: (Internal Proj Error: proj_create: no database context specified)
(geocompy)
Is this a PROJ version issue? Will try in a different environment.
Do they work in Quarto? Is it worth switching to the ipynb format?
Currently we have two files defining dependencies:
https://github.com/geocompr/py/blob/main/data/requirements.txt
and
https://github.com/geocompr/py/blob/main/environment.yml
The fact that we have two versions of the deps can cause issues, e.g. #31.
I think we should standardise; looking at various help files, I'm thinking that an environment.yml file that points to the pip deps should be fine. Source: https://stackoverflow.com/a/68164027
```yaml
# environment.yml
name: geocompy
dependencies:
  - python>=3.9
  - miniconda3
  - pip
  - pip:
    - -r file:data/requirements.txt
```
However, I note these caveats from https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html :
Issues may arise when using pip and conda together. When combining conda and pip, it is best to use an isolated conda environment. Only after conda has been used to install as many packages as possible should pip be used to install any remaining software. If modifications are needed to the environment, it is best to create a new environment rather than running conda after pip. When appropriate, conda and pip requirements should be stored in text files.
Thoughts? Another advantage of standardising this could be to automatically update the Docker images when the requirements get updated here, linking to #6
For ease of reproducing results
Ideally with Jupyter and RStudio options.
Context: https://github.community/t/bot-user-and-email-to-push/174459
Suggesting:
```sh
git config user.name github-actions
git config user.email [email protected]
```
It seems this will be a major release with lots of performance enhancements: https://docs.python.org/3.11/whatsnew/3.11.html
The problem is, it seems that conda will lag behind the official release: conda-forge/conda-forge.github.io#1629
According to the release schedule, the full 3.11 release will be published around October 2022, so we don't need to do anything atm; I'm just opening this up here so we're ready and to get feedback. From then, it seems that Python 3.11 will be supported for the next 5 years, so I suggest we pin to that version for stability when it's ready and stable, including on conda. Sound reasonable @anitagraser?
Pages other than index.html return error 404 at the moment; not sure why. For example:
https://geocompr.github.io/py/02-spatial-data.html
Since removing the landsat and 'air' datasets for #12, the build is failing :( sorry!
I suggest we solve this by downloading the file if it does not exist, e.g. with:
```python
import os
import subprocess

url = "https://github.com/geocompr/py/releases/download/0.1/landsat.tif"
path = "landsat.tif"
if not os.path.exists(path):  # only download if the file is missing
    subprocess.run(["wget", url])
```
Just trying this after #22 and hitting this error message:
quarto preview
Preparing to preview
[ 1/10] index.qmd
Starting Jupyter kernel...
ERROR: Error executing 'C:/tools/miniconda3/python.exe': The pipe is being closed. (os error 232)
Anyone got any ideas? I guess I should set the environment but don't know how...
I think this was first mentioned by @michaeldorman and that I've found the cause. Reproducible example:
Solution described here: quarto-dev/quarto-vscode#5
Add tags like this:
```{r}
#| label: rtest
```