geonode / geonode-importer
License: MIT License
the importer needs some specific libraries (like ogr2ogr) that are not available by default in the geonode-project
As you know, the new importer lacks a CSV handler. We were waiting to implement a solution that would replace the upload steps we have now, where the lat/lon columns can be selected at upload time.
We cannot afford to implement a new UI for the custom selection of columns, so our proposal would be the following:
preconfigure the CSV handler with OGR X_POSSIBLE_NAMES="x,lon*" and Y_POSSIBLE_NAMES="y,lat*" options
accept a companion "*.csvt" file, as supported by the OGR CSV driver
This solution would provide an alternative that's not too expensive and complex to implement, and gives the opportunity to remove the current upload system (at the moment it's still required only for CSV files).
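The proposal above can be sketched concretely. X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES are real open options of the OGR CSV driver (passed with -oo on the ogr2ogr command line), and the driver also honours a companion *.csvt file for column types. The helper below is purely illustrative (its name and signature are assumptions, not GeoNode code):

```python
# Hypothetical sketch: how a preconfigured CSV handler could build the
# ogr2ogr invocation. The -oo open options shown are real OGR CSV-driver
# options; the function itself is illustrative.
def build_csv_import_command(csv_path: str, pg_dsn: str) -> list:
    return [
        "ogr2ogr", "-f", "PostgreSQL", f"PG:{pg_dsn}", csv_path,
        # guess the coordinate columns without asking the user
        "-oo", "X_POSSIBLE_NAMES=x,lon*",
        "-oo", "Y_POSSIBLE_NAMES=y,lat*",
    ]
```

A companion stations.csvt placed next to the CSV would then drive the column typing without any extra UI.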
I'm not against the solution based on Tabular Data Resource and VSI.
I think all these options could coexist, letting the handler pick up the best depending on the provided files, with X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES preconfigurations as a fallback.
What's your opinion?
The error shown to the user must be improved
At the moment, GeoNode's upload processing takes into consideration ALL the imports that are in a certain status.
The idea is that GeoNode should ignore the imports coming from the new importer flow
Having comments is important to keep the code easy to read and to let other devs understand it easily.
Adding type hints is also important
if the creation of the dynamic model (not the field) fails, the dynamic model should be deleted
To improve the reliability of the execution, the orchestrator should check the state of all the execution tasks to evaluate whether the execution is finished or not
The idea is to have a panel in the front-end that is called BEFORE the import starts, to let the user choose the configuration needed for the selected dataset.
For example: if the dataset is a CSV, select the geometry columns in the file.
An estimate of the work is needed
Once the data publishing is finished, the resource on GeoNode should be created
3D formats are not correctly handled
The data manager, using django_dynamic_models, will create the layers coming from the geopackage inside the geonode_data table.
As decided, to import the data we have to:
/usr/bin/ogr2ogr -f PostgreSQL PG:"dbname='gpkg' host='localhost' port=5432 user='geonode' password='geonode'" /home/geosolutions/mattia/PBAP_20200203_test.gpkg
example of dynamic models usage: https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L346-L354
The default style is missing
user x can only override their own resources
At the moment the upload size limit and the upload parallelism are handled by the layer form
and by the SizeRestrictedFileField.
Those checks should be extracted from the FORMs so they can also be used at the API level
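One way to do the extraction (a minimal sketch; the function name and error type are assumptions, not existing GeoNode code) is to move the check into a plain function that both the form field and the API view can call:

```python
# Illustrative helper, not GeoNode code: a form-independent size check
# that a Django form field and a REST API view could both reuse.
def validate_upload_size(file_size: int, max_size_bytes: int) -> None:
    """Raise if the uploaded file exceeds the configured limit."""
    if file_size > max_size_bytes:
        raise ValueError(
            f"Upload of {file_size} bytes exceeds the {max_size_bytes}-byte limit"
        )
```

The form field would then become a thin wrapper around this function, and the API endpoint could call it directly before accepting the file.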
resource creation should be moved from the celery into the handler
Error handling for parallel tasks should be improved
if any layer has an issue during the import/publishing/resource creation, it must be written inside the output_params or log field of the ExecutionRequest object
Improve the current base error handling of celery.
If the max retries are reached and the task is still failing, the execution request and the legacy upload should always be put in a failed state
The model ExecutionRequest
https://github.com/GeoNode/geonode/blob/c3739ac74eb7be1add71651365f0ef93afb8e63c/geonode/resource/models.py#L26
should be improved by adding:
STEP: will contain the exact step the async_execution is performing
heartbeat: datetime updated by each step, to understand whether the step is still alive or not
extra_information (optional): some log or information for each step
NOTE: to be decided whether each step should have its own row with its own details, or the step will update the information on a single row
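The proposed fields can be sketched as plain Python (an illustrative dataclass, not the real Django model), here modeled as one row per step, which is one of the two options under discussion:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative sketch only: the fields proposed for ExecutionRequest,
# modeled as a one-row-per-step record.
@dataclass
class ExecutionStep:
    step: str                                 # exact step the async execution is performing
    heartbeat: datetime                       # updated by each step to signal it is still alive
    extra_information: Optional[str] = None   # optional per-step log or details
```

In the real model these would be Django fields (CharField, DateTimeField, JSONField or similar) rather than dataclass attributes.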
the view will be injected into the _pages.
The Django view should just take the request coming from the FE and use the data_retriever to clone the GPKG in a local environment. All the other actions (like validation) should be handled by the Importer
Raster import could be implemented in a way similar to vectors, using a shared volume from which the raster file could be published.
This approach needs careful analysis of the implications, although I don't see blockers.
Meanwhile the current importer-based flow could be wrapped inside a geonode-importer handler. This way we could get rid of the old implementation, both upload and import.
@etj @mattiagiupponi opinions?
Saving the module path of the handler that was used to handle the resource.
This is needed because, in the long term, the handler will take care of:
Each handler should handle all the aspects of the resource
Steps:
ogr2ogr -f "PostgreSQL" PG:"dbname=geonode_data host=localhost port=5434 password=geonode user=geonode" /mnt/c/Users/user/Desktop/gis_data/stations.geojson -lco DIM=2
(evaluate a refactor to re-use the actual one from gpkg)
Note: the OGR library looks like it has some issue getting the geometry column name from a GeoJSON, and this needs to be evaluated:
from osgeo import ogr

driver = ogr.GetDriverByName("GeoJSON")
data_source = driver.Open(path)  # path to the GeoJSON file
layer = data_source.GetLayer(0)
column = layer.GetGeometryColumn()
print(column)  # prints '' — the geometry column name comes back empty
This part should be improved, letting the dictionary be hookable from external apps:
https://github.com/GeoNode/geonode/blob/25e9314b9843491595ae1b0c26056fa5c83d5733/geonode/upload/utils.py#L183
The desired approach is the same used for services & metadata_parsers.
Override the view of GeoNode instead of making it hookable
During the validation process of a geopackage import, the error tooltip is too small to show the error correctly:
From the following image, you can see that the error is cut off:
We expect the error to look like the following (it is visible from the second row):
It may be enough to make the tooltip larger than the current one to show the error correctly
Once the data has been saved inside the database, the layer should be published in GeoServer.
Take this as an example of how to publish the resource on GeoServer -> https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L382-L405
If the table name is longer than 63 chars, Postgres raises an issue due to the length.
We have to decide how to trim these table names, since we append the execution_id to the table name
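One possible trimming strategy (a sketch only; the function name and the hash-suffix approach are assumptions, not a decided design) is to keep a prefix of the combined name and append a short hash so trimmed names stay unique:

```python
import hashlib

PG_MAX_IDENTIFIER = 63  # Postgres truncates identifiers beyond 63 bytes

# Hypothetical trimming strategy: keep as much of the original name as
# fits, and append a short hash of the full name to preserve uniqueness.
def safe_table_name(layer_name: str, execution_id: str) -> str:
    full = f"{layer_name}_{execution_id}"
    if len(full) <= PG_MAX_IDENTIFIER:
        return full
    digest = hashlib.sha1(full.encode()).hexdigest()[:8]
    return f"{full[:PG_MAX_IDENTIFIER - len(digest) - 1]}_{digest}"
```

Plain truncation would also work, but two long layer names sharing the first 63 characters would then collide, which the hash suffix avoids.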
The Importer will be an orchestrator that:
Upload
and on ExecutionRequest
Note: must be wrapped inside an async celery task
Each handler should be responsible to:
For example the GPKGFileHandler:
NOTE: we want to maintain the current geonode behaviour, so if some layer already exists in the geonode_data database, a suffix (a UUID) should be added to create a new resource
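The suffixing behaviour above can be sketched as follows (an illustrative helper; the real handler would check the geonode_data database rather than an in-memory set):

```python
import uuid

# Illustrative sketch of the existing-name suffixing described above.
def unique_layer_name(name: str, existing_names: set) -> str:
    if name not in existing_names:
        return name
    # append a short UUID suffix so a new, distinct resource is created
    return f"{name}_{uuid.uuid4().hex[:8]}"
```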
To maintain consistency of the uploaded resource, each task should be able to handle the rollback of the data.
Example:
The on_failure function of the base task should call the orchestrator to roll back all the actions done by it and its preceding tasks (unpublish the resource, delete the imported data and the dynamic models). As an example, the task can have a boolean called "rollback". Each handler must define a rollback function and perform the actions required to clean up the state
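The rollback pattern can be sketched as follows. Celery is omitted for brevity, and all names here are illustrative assumptions, not the real orchestrator API: on failure, the orchestrator walks the completed steps in reverse and asks each handler to undo its work.

```python
# Hypothetical sketch of the rollback pattern: the orchestrator records
# completed steps and, on failure, undoes them in reverse order
# (unpublish the resource, delete the data, drop the dynamic models).
class Orchestrator:
    def __init__(self):
        self.completed_steps = []  # (handler, step_name), in execution order

    def record(self, handler, step_name):
        self.completed_steps.append((handler, step_name))

    def rollback(self, execution_id):
        for handler, step_name in reversed(self.completed_steps):
            handler.rollback(step_name, execution_id)
```

In the real implementation the on_failure hook of the Celery base task would trigger Orchestrator.rollback with the failing execution_id.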
at the moment the new importer does not consider zip files, because the handlers take care of the base_file extension and zip does not have its own handler (one is not needed)
The upload API should take care of the unzipping and then find the right handler
The orchestrator will evaluate whether the task is finished or not.
To do this, when the task list is finished, the orchestrator (using the execution_id) will: