geonode / geonode-importer
License: MIT License
the importer needs some specific libraries (like ogr2ogr) that are not available by default in the geonode-project
As you know, the new importer lacks a CSV handler. We were waiting to implement a solution that would replace the upload steps we have now, where the lat/lon columns can be selected at upload time.
We cannot afford to implement a new UI for the custom selection of columns, so our proposal would be the following:
preconfigure the CSV handler with OGR X_POSSIBLE_NAMES="x,lon*" and Y_POSSIBLE_NAMES="y,lat*" options
accept a companion "*.csvt" file, as supported by the OGR CSV driver
This solution would provide an alternative that's not too expensive and complex to implement, and gives the opportunity to remove the current upload system (at the moment it's still required only for CSV files).
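The proposal above can be sketched concretely. X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES are real open options of the OGR CSV driver (passed with -oo on the ogr2ogr command line), and the driver also honours a companion *.csvt file for column types. The helper below is purely illustrative (its name and signature are assumptions, not GeoNode code):

```python
# Hypothetical sketch: how a preconfigured CSV handler could build the
# ogr2ogr invocation. The -oo open options shown are real OGR CSV-driver
# options; the function itself is illustrative.
def build_csv_import_command(csv_path: str, pg_dsn: str) -> list:
    return [
        "ogr2ogr", "-f", "PostgreSQL", f"PG:{pg_dsn}", csv_path,
        # guess the coordinate columns without asking the user
        "-oo", "X_POSSIBLE_NAMES=x,lon*",
        "-oo", "Y_POSSIBLE_NAMES=y,lat*",
    ]
```

A companion stations.csvt placed next to the CSV would then drive the column typing without any extra UI.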
I'm not against the solution based on Tabular Data Resource and VSI.
I think all these options could coexist, letting the handler pick up the best depending on the provided files, with X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES preconfigurations as a fallback.
What's your opinion?
The error shown to the user must be improved
At the moment, GeoNode's upload processing takes into consideration ALL the imports that are in a certain status.
The idea is that GeoNode should ignore the imports coming from the new importer flow
Having comments is important to keep the code easy to read and to let other devs understand it easily.
Adding type hints is also important
if the creation of the dynamic model (not the field) fails, the dynamic model should be deleted
To improve the reliability of the execution, the orchestrator should check the state of all the execution tasks to evaluate whether the execution is finished or not
The idea is to have a panel in the front-end that is called BEFORE the import starts, to let the user choose the configuration needed for the selected dataset.
For example: if the dataset is a CSV, select the geometry columns in the file.
An estimate of the work is needed
Once the data publishing is finished, the resource on GeoNode should be created
3D formats are not correctly handled
The data manager, using django_dynamic_models, will create the layers coming from the geopackage inside the geonode_data table.
As decided, to import the data we have to:
/usr/bin/ogr2ogr -f PostgreSQL PG:"dbname='gpkg' host='localhost' port=5432 user='geonode' password='geonode'" /home/geosolutions/mattia/PBAP_20200203_test.gpkg
example of dynamic models usage: https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L346-L354
The default style is missing
user x can only override their own resources
At the moment the upload size limit and the upload parallelism are handled by the layer form
and by the SizeRestrictedFileField.
Those checks should be extracted from the FORMs so they can also be used at the API level
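One way to do the extraction (a minimal sketch; the function name and error type are assumptions, not existing GeoNode code) is to move the check into a plain function that both the form field and the API view can call:

```python
# Illustrative helper, not GeoNode code: a form-independent size check
# that a Django form field and a REST API view could both reuse.
def validate_upload_size(file_size: int, max_size_bytes: int) -> None:
    """Raise if the uploaded file exceeds the configured limit."""
    if file_size > max_size_bytes:
        raise ValueError(
            f"Upload of {file_size} bytes exceeds the {max_size_bytes}-byte limit"
        )
```

The form field would then become a thin wrapper around this function, and the API endpoint could call it directly before accepting the file.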
resource creation should be moved from the celery into the handler
Error handling for parallel tasks should be improved
if any layer has an issue during the import/publishing/resource creation, it must be written inside the output_params or log field of the ExecutionRequest object
Improve the current base error handling of celery.
If the max retries are reached and the task is still failing, the execution request and the legacy upload should always be put in a failed state
The model ExecutionRequest
https://github.com/GeoNode/geonode/blob/c3739ac74eb7be1add71651365f0ef93afb8e63c/geonode/resource/models.py#L26
should be improved by adding:
STEP: will contain the exact step the async_execution is performing
heartbeat: datetime updated by each step, to understand whether the step is still alive or not
extra_information (optional): some log or information for each step
NOTE: to be decided whether each step should have its own row with its own details, or the step will update the information on a single row
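The proposed fields can be sketched as plain Python (an illustrative dataclass, not the real Django model), here modeled as one row per step, which is one of the two options under discussion:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative sketch only: the fields proposed for ExecutionRequest,
# modeled as a one-row-per-step record.
@dataclass
class ExecutionStep:
    step: str                                 # exact step the async execution is performing
    heartbeat: datetime                       # updated by each step to signal it is still alive
    extra_information: Optional[str] = None   # optional per-step log or details
```

In the real model these would be Django fields (CharField, DateTimeField, JSONField or similar) rather than dataclass attributes.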
the view will be injected into the _pages.
The Django view should just take the request coming from the FE and use the data_retriever to clone the GPKG in a local environment. All the other actions (like validation) should be handled by the Importer
Raster import could be implemented in a way similar to vectors, using a shared volume from which the raster file could be published.
This approach needs careful analysis of the implications, although I don't see blockers.
Meanwhile the current importer-based flow could be wrapped inside a geonode-importer handler. This way we could get rid of the old implementation, both upload and import.
@etj @mattiagiupponi opinions?
Saving the module path of the handler that was used to handle the resource.
This is needed because, in the long term, the handler will take care of:
Each handler should handle all the aspects of the resource
Steps:
ogr2ogr -f "PostgreSQL" PG:"dbname=geonode_data host=localhost port=5434 password=geonode user=geonode" /mnt/c/Users/user/Desktop/gis_data/stations.geojson -lco DIM=2
(evaluate a refactor to re-use the actual one from gpkg)
Note: the OGR library looks like it has some issue getting the geometry column name from a GeoJSON, and this needs to be evaluated:
from osgeo import ogr

driver = ogr.GetDriverByName("GeoJSON")
data_source = driver.Open(path)  # path to the GeoJSON file
layer = data_source.GetLayer(0)
column = layer.GetGeometryColumn()
print(column)  # prints '' — the geometry column name comes back empty
This part should be improved, letting the dictionary be hookable from external apps:
https://github.com/GeoNode/geonode/blob/25e9314b9843491595ae1b0c26056fa5c83d5733/geonode/upload/utils.py#L183
The desired approach is the same used for services & metadata_parsers.
Override the view of GeoNode instead of making it hookable
During the validation process of a geopackage import, the error tooltip is too small to show the error correctly:
From the following image, you can see that the error is cut off:
We expect the error to look like the following (it is visible from the second row):
It may be enough to make the tooltip larger than the current one to show the error correctly
Once the data has been saved inside the database, the layer should be published in GeoServer.
Take this as an example of how to publish the resource on GeoServer -> https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L382-L405
If the table name is longer than 63 chars, Postgres raises an issue due to the length.
We have to decide how to trim these table names, since we append the execution_id to the table name
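One possible trimming strategy (a sketch only; the function name and the hash-suffix approach are assumptions, not a decided design) is to keep a prefix of the combined name and append a short hash so trimmed names stay unique:

```python
import hashlib

PG_MAX_IDENTIFIER = 63  # Postgres truncates identifiers beyond 63 bytes

# Hypothetical trimming strategy: keep as much of the original name as
# fits, and append a short hash of the full name to preserve uniqueness.
def safe_table_name(layer_name: str, execution_id: str) -> str:
    full = f"{layer_name}_{execution_id}"
    if len(full) <= PG_MAX_IDENTIFIER:
        return full
    digest = hashlib.sha1(full.encode()).hexdigest()[:8]
    return f"{full[:PG_MAX_IDENTIFIER - len(digest) - 1]}_{digest}"
```

Plain truncation would also work, but two long layer names sharing the first 63 characters would then collide, which the hash suffix avoids.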
The Importer will be an orchestrator that:
Upload
and on ExecutionRequest
Note: must be wrapped inside an async celery task
Each handler should be responsible to:
For example the GPKGFileHandler:
NOTE: we want to maintain the current geonode behaviour, so if some layer already exists in the geonode_data database, a suffix (a UUID) should be added to create a new resource
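The suffixing behaviour above can be sketched as follows (an illustrative helper; the real handler would check the geonode_data database rather than an in-memory set):

```python
import uuid

# Illustrative sketch of the existing-name suffixing described above.
def unique_layer_name(name: str, existing_names: set) -> str:
    if name not in existing_names:
        return name
    # append a short UUID suffix so a new, distinct resource is created
    return f"{name}_{uuid.uuid4().hex[:8]}"
```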
To maintain consistency of the uploaded resource, each task should be able to handle the rollback of the data.
Example:
The on_failure function of the base task should call the orchestrator to roll back all the actions done by it and its preceding tasks (unpublish the resource, delete the imported data and the dynamic models). As an example, the task can have a boolean called "rollback". Each handler must define a rollback function and perform the actions required to clean up the state
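The rollback pattern can be sketched as follows. Celery is omitted for brevity, and all names here are illustrative assumptions, not the real orchestrator API: on failure, the orchestrator walks the completed steps in reverse and asks each handler to undo its work.

```python
# Hypothetical sketch of the rollback pattern: the orchestrator records
# completed steps and, on failure, undoes them in reverse order
# (unpublish the resource, delete the data, drop the dynamic models).
class Orchestrator:
    def __init__(self):
        self.completed_steps = []  # (handler, step_name), in execution order

    def record(self, handler, step_name):
        self.completed_steps.append((handler, step_name))

    def rollback(self, execution_id):
        for handler, step_name in reversed(self.completed_steps):
            handler.rollback(step_name, execution_id)
```

In the real implementation the on_failure hook of the Celery base task would trigger Orchestrator.rollback with the failing execution_id.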
at the moment the new importer does not consider zip files, because the handlers take care of the base_file extension and zip does not have its own handler (one is not needed)
The upload API should take care of the unzipping and then find the right handler
The orchestrator will evaluate whether the task is finished or not.
To do this, when the task list is finished, the orchestrator (using the execution_id) will: