
geonode-importer's Introduction

GeoNode OSGeo Project

Table of Contents

What is GeoNode?

GeoNode is a geospatial content management system, a platform for the management and publication of geospatial data. It brings together mature and stable open-source software projects under a consistent and easy-to-use interface allowing non-specialized users to share data and create interactive maps.

Data management tools built into GeoNode allow for integrated creation of data, metadata, and map visualization. Each dataset in the system can be shared publicly or restricted to allow access to only specific users. Social features like user profiles and commenting and rating systems allow for the development of communities around each platform to facilitate the use, management, and quality control of the data the GeoNode instance contains.

It is also designed to be a flexible platform that software developers can extend, modify or integrate against to meet requirements in their own applications.

Try out GeoNode

If you just want to try out GeoNode, visit our official demo online at: https://development.demo.geonode.org. After registering, you will be able to test all basic functionality like uploading layers, creating maps, editing metadata, styles, and much more. To get an overview of what GeoNode can do, we recommend having a look at the Users Workshop.

Quick Docker Start

  python create-envfile.py

create-envfile.py accepts the following arguments:

  • --https: Enable SSL. It's disabled by default
  • --env_type:
• When set to prod, DEBUG is disabled and a valid SSL certificate is requested from Let's Encrypt's ACME server
    • When set to test DEBUG is disabled and a test SSL certificate is generated for local testing
    • When set to dev DEBUG is enabled and no SSL certificate is generated
  • --hostname: The URL that will serve GeoNode (localhost by default)
• --email: The administrator's email. Note that a real email address and a valid SMTP configuration are required if --env_type is set to prod; Let's Encrypt uses the email when issuing the SSL certificate
  • --geonodepwd: GeoNode's administrator password. A random value is set if left empty
• --geoserverpwd: GeoServer's administrator password. A random value is set if left empty
  • --pgpwd: PostgreSQL's administrator password. A random value is set if left empty
  • --dbpwd: GeoNode DB user role's password. A random value is set if left empty
  • --geodbpwd: GeoNode data DB user role's password. A random value is set if left empty
  • --clientid: Client ID of GeoServer's GeoNode OAuth2 client. A random value is set if left empty
  • --clientsecret: Client secret of GeoServer's GeoNode OAuth2 client. A random value is set if left empty
  docker compose build
  docker compose up -d

Learn GeoNode

After you've finished the setup process, familiarize yourself with the general usage and settings of your GeoNode instance: the User Training goes in depth into what you can do, and the Administrators Workshop guides you through the most important management commands and configuration settings.

Development

GeoNode is a web-based GIS tool, and as such, in order to do development on GeoNode itself or to integrate it into your own application, you should be familiar with basic web development concepts as well as with general GIS concepts.

For development, GeoNode can be run in a 'development environment'. In contrast to a 'production environment', it uses lightweight components to speed things up.

To get started visit the Developer workshop for a basic overview.

If you're planning to customize your GeoNode instance or extend its functionality, it's not advisable to change core files. Instead, it's common to set up a GeoNode Project Template.

Contributing

GeoNode is an open source project and contributors are needed to keep this project moving forward. Learn more on how to contribute on our Community Bylaws.

Roadmap

GeoNode's development roadmap is documented in a series of GeoNode Improvement Projects (GNIPs). They are documented at the GeoNode Wiki.

GNIPs are considered to be large undertakings that will add a large number of features to the project. As such, they are the topic of community discussion and guidance. The community discusses them on the developer mailing list: http://lists.osgeo.org/pipermail/geonode-devel/

Showcase

A handful of other Open Source projects extend GeoNode’s functionality by tapping into the re-usability of Django applications. Visit our gallery to see how the community uses GeoNode: GeoNode Showcase.

The development community is very supportive of new projects and contributes ideas and guidance for newcomers.

Most useful links

General

Related projects

Support

Licensing

GeoNode is Copyright 2018 Open Source Geospatial Foundation (OSGeo).

GeoNode is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. GeoNode is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with GeoNode. If not, see http://www.gnu.org/licenses.

geonode-importer's People

Contributors

afabiani, cesarbenjamindotnet, etj, giohappy, mattiagiupponi, pchevali, ridoo


geonode-importer's Issues

Improve error handling

Improve the current base error handling of Celery.
If the maximum number of retries is reached and the task is still failing, the execution request and the legacy upload should always be marked as failed

GPKG handler parallelism

Each handler should be responsible for:

  • deciding whether some steps should run synchronously or in parallel.

For example, in the GPKGFileHandler:

  • validation can be synchronous, since at least we are not loading the whole GPKG into RAM
  • the creation of the dynamic model and the import can run in parallel

NOTE: we want to keep the current GeoNode behaviour, so if a layer already exists in the geonode_data database, a suffix (a UUID) should be appended to create a new resource
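A minimal sketch of the per-handler decision described above. The `GPKGFileHandler` interface and method names here are illustrative assumptions, not the actual importer API; in the real importer the per-layer import "tasks" would be Celery signatures dispatched as a group:

```python
import uuid


class GPKGFileHandler:
    """Illustrative handler: validation runs synchronously, imports in parallel."""

    def validate(self, layer_names):
        # Synchronous step: we only inspect layer metadata,
        # never load the whole GPKG into RAM.
        if not layer_names:
            raise ValueError("GPKG contains no layers")

    def build_import_tasks(self, layer_names, existing_tables):
        # One import task per layer; these would be dispatched in parallel.
        tasks = []
        for name in layer_names:
            target = name
            if name in existing_tables:
                # Keep current GeoNode behaviour: append a UUID suffix
                # instead of overwriting the existing layer.
                target = f"{name}_{uuid.uuid4().hex[:8]}"
            tasks.append({"layer": name, "target_table": target})
        return tasks
```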

Task consistency

To maintain consistency of the uploaded resource, each task must be able to roll back its data.
Example:

  • if something goes wrong during publishing, the on_failure function of the base task should call the orchestrator to roll back all the actions done by it and its preceding tasks (unpublish the resource, delete the imported data and the dynamic models)

As an example, the task can carry a boolean called "rollback". Each handler must define a rollback function and perform the actions required to clean up the state
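The rollback flow above can be sketched in plain Python. The class and method names are hypothetical (in the importer this would live on a Celery base task class), but the mechanism is the one described: `on_failure` walks the completed steps in reverse and asks the handler to undo each one:

```python
class BaseImportTask:
    """Illustrative base task: on_failure rolls back completed steps in reverse."""

    def __init__(self, handler):
        self.handler = handler
        self.completed_steps = []

    def run_step(self, name, func):
        # Record each step as it completes, so we know what to undo later.
        func()
        self.completed_steps.append(name)

    def on_failure(self):
        # Roll back this task and all preceding ones, most recent first
        # (e.g. unpublish the resource, delete imported data, drop dynamic models).
        for step in reversed(self.completed_steps):
            self.handler.rollback(step)
```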

Save the handler module path into the resource

Save the module path of the handler that was used to handle the resource.
This is needed because, in the long-term view, the handler will take care of:

  • creating the resource
  • updating the resource
  • deleting the resource
  • copying/cloning the resource

Each handler should handle all aspects of the resource

View to run the importer asynchronously

The view will be injected into the _pages.
The Django view should just take the request coming from the FE and use the data_retriever to clone the GPKG into a local environment. All other actions (like validation) should be handled by the importer

Improve task progress consistency

The orchestrator will evaluate whether the task is finished.
To do so, when the task list is finished, the orchestrator (using the execution_id) will:

  • raise an error if any task_result that has the execution_id as a param has failed
  • set the execution as complete if all the task_results are successful
  • if any task is still in a status other than success or failed, the progress is ignored and the task is not marked as completed
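The three rules above can be condensed into a single status-evaluation function. This is a sketch under the assumption that per-task statuses are available as plain strings; it is not the actual orchestrator API:

```python
def evaluate_execution(task_results):
    """Decide the overall execution status from per-task results.

    task_results: list of status strings for the tasks sharing one
    execution_id (e.g. "SUCCESS", "FAILURE", "PENDING").
    """
    if any(status == "FAILURE" for status in task_results):
        # Any failed task_result fails the whole execution.
        raise RuntimeError("at least one task failed for this execution_id")
    if all(status == "SUCCESS" for status in task_results):
        return "complete"
    # Some task is still running/pending: ignore progress, don't mark complete.
    return "in_progress"
```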

Upload size and upload parallelism

At the moment the upload size limit and the upload parallelism are handled by the layer form and by the SizeRestrictedFileField.

Those checks should be extracted from the forms so they can also be used at the API level
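A form-independent version of those checks might look like the following. The function name and parameters are illustrative assumptions (the real limits currently live in the layer form and SizeRestrictedFileField); the point is that a plain function can be called from both a form and an API view:

```python
def validate_upload(file_size, active_uploads, max_size, max_parallel):
    """Check upload size and parallelism limits outside of any Django form,
    so the same logic can be reused at the API level (illustrative)."""
    if file_size > max_size:
        raise ValueError(f"file exceeds the {max_size}-byte upload limit")
    if active_uploads >= max_parallel:
        raise ValueError("too many parallel uploads in progress")
```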

Handle ZIP files

At the moment the new importer does not consider ZIP files, because the handlers key off the base_file extension and ZIP doesn't have its own handler (one is not needed).

The upload API should take care of unzipping and then find the right handler
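The dispatch step could be sketched like this. The extension-to-handler mapping and function name are hypothetical (the real importer registers handlers dynamically); the sketch only shows picking a handler from the base file inside the archive:

```python
import io
import zipfile

# Hypothetical mapping: base-file extension -> handler name.
HANDLERS = {".gpkg": "GPKGFileHandler", ".geojson": "GeoJsonFileHandler"}


def find_handler_for_zip(zip_bytes):
    """Inspect the archive members and pick a handler from the extension
    of the first recognised base file (illustrative only)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            for ext, handler in HANDLERS.items():
                if name.lower().endswith(ext):
                    return handler
    raise ValueError("no handler found for the files in the archive")
```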

Evaluate the implementation of a raster handler

Raster import could be implemented in a way similar to vectors, using a shared volume from which the raster file could be published.

This approach needs careful analysis of the implications, although I don't see blockers.

Meanwhile, the current importer-based flow could be wrapped inside a geonode-importer handler. This way we could get rid of the old implementation, for both upload and import.

@etj @mattiagiupponi opinions?

Error handling for parallel tasks

Error handling for parallel tasks should be improved:
if any layer has an issue during import/publishing/resource creation, it must be written into the output_params or log field of the ExecutionRequest object

Create importer orchestrator

The importer will be an orchestrator that:

  • knows the first step of the import phase
  • knows the steps that every single import type needs to follow
  • updates the execution status of the import on the old Upload and on ExecutionRequest

Note: it must be wrapped inside an async Celery task
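The responsibilities above can be sketched as a tiny step registry. The step names and the `Orchestrator` interface are assumptions for illustration, not the actual importer code (which wraps this in an async Celery task):

```python
# Hypothetical step lists per import type (not the actual importer registry).
STEPS = {
    "gpkg": ["validate", "create_dynamic_model", "import", "publish"],
}


class Orchestrator:
    """Knows the first step and the full step list for each import type;
    callers use this to advance the flow and update Upload / ExecutionRequest."""

    def first_step(self, import_type):
        return STEPS[import_type][0]

    def next_step(self, import_type, current):
        # Returns None when the last step has completed.
        steps = STEPS[import_type]
        idx = steps.index(current)
        return steps[idx + 1] if idx + 1 < len(steps) else None
```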

DataManager

The data manager, using django_dynamic_models, will create inside the geonode_data database the layers coming from the geopackage.
As decided, to import the data we have to:

  • validate the data
  • define the dynamic models by reading the geopackage with the Fiona or OGR library
  • define the schema of the dynamic model and create the relative field schema
  • use ogr2ogr to run the import command via the terminal.
    The basic command is the following:
    /usr/bin/ogr2ogr -f PostgreSQL PG:"dbname='gpkg' host='localhost' port=5432 user='geonode' password='geonode'" /home/geosolutions/mattia/PBAP_20200203_test.gpkg
  
Example of dynamic models usage: https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L346-L354
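The ogr2ogr step above could be invoked from Python by building the same command as an argument list and passing it to `subprocess.run`. The helper name and its parameters are illustrative; the flags mirror the command shown above:

```python
def build_ogr2ogr_command(db, host, port, user, password, gpkg_path):
    """Build the ogr2ogr invocation shown above as an argument list,
    suitable for subprocess.run (illustrative parameters)."""
    conn = (
        f"dbname='{db}' host='{host}' port={port} "
        f"user='{user}' password='{password}'"
    )
    # -f PostgreSQL selects the output driver; PG:... is the connection string.
    return ["/usr/bin/ogr2ogr", "-f", "PostgreSQL", f"PG:{conn}", gpkg_path]
```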

Improve ExecutionRequest model

The model ExecutionRequest
https://github.com/GeoNode/geonode/blob/c3739ac74eb7be1add71651365f0ef93afb8e63c/geonode/resource/models.py#L26

Should be improved by adding:

  • step: will contain the exact step the async execution is performing
  • heartbeat: a datetime updated by each step, to understand whether the step is still alive
  • extra_information (optional): logs or other information for each step

NOTE: it is still to be decided whether each step should have its own row with its own details, or whether each step will update the information on a single row
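A plain-Python mirror of the proposed additions, with a liveness check based on the heartbeat. In GeoNode these would be Django model fields on ExecutionRequest; the dataclass, field types, and the 5-minute timeout are assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ExecutionState:
    """Illustrative mirror of the proposed ExecutionRequest additions."""
    step: str = ""
    heartbeat: datetime = field(default_factory=datetime.utcnow)
    extra_information: dict = field(default_factory=dict)

    def is_alive(self, now, timeout=timedelta(minutes=5)):
        # A step that hasn't updated its heartbeat within `timeout`
        # is considered dead.
        return now - self.heartbeat <= timeout
```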

Estimate work for GeoJSON handler

Steps:

  • create the GeoJSON app in importer/handler
  • dynamically register the app
  • define the task list
  • set up the basic functions required by the BaseHandler
  • create a task to handle the ogr2ogr command to import the GeoJSON, for example: ogr2ogr -f "PostgreSQL" PG:"dbname=geonode_data host=localhost port=5434 password=geonode user=geonode" /mnt/c/Users/user/Desktop/gis_data/stations.geojson -lco DIM=2 (evaluate a refactor to re-use the current one from GPKG)
  • create a task to set up the dynamic fields (evaluate a refactor to re-use the current one from GPKG)
  • handle resource creation/import and publishing

Note:
the ogr2ogr library seems to have an issue getting the geometry column name from a GeoJSON file; this needs to be evaluated:

from osgeo import ogr

driver = ogr.GetDriverByName("GeoJSON")
data_source = driver.Open(path)
layer = data_source.GetLayer(0)
column = layer.GetGeometryColumn()
print(column)  # prints '' — the geometry column name is not detected

Table names longer than 63 chars raise an error

If the table name is longer than 63 chars, Postgres raises an error due to the identifier length limit.

We have to decide how to trim these table names, since we append the execution_id to the table name
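One possible trimming strategy, sketched below: truncate the layer name and insert a short hash so that distinct long names don't collide after truncation. This is an illustrative proposal, not the strategy the importer actually adopted:

```python
import hashlib

MAX_IDENTIFIER_LEN = 63  # PostgreSQL's default identifier limit (NAMEDATALEN - 1)


def safe_table_name(layer_name, execution_id):
    """Trim the layer name so '<layer>_<execution_id>' fits Postgres's
    63-char identifier limit; a short hash keeps trimmed names unique."""
    full = f"{layer_name}_{execution_id}"
    if len(full) <= MAX_IDENTIFIER_LEN:
        return full
    digest = hashlib.sha1(layer_name.encode()).hexdigest()[:8]
    # Reserve room for the hash, the execution_id and two underscores.
    keep = MAX_IDENTIFIER_LEN - len(execution_id) - len(digest) - 2
    return f"{layer_name[:keep]}_{digest}_{execution_id}"
```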

Importer of GPKG

We want to let an external app in GeoNode import the GPKG.
The document describing the approach is available here

cc: @giohappy @etj

Add comments to the code

Having comments is important for keeping the code easy to read and letting other devs understand it easily.

Adding type hints is also important

Error not shown correctly

During the import of a geopackage, during the validation process, the error tooltip is too small to show the error correctly.
In the following screenshot the error is cut off:
(screenshot: truncated error tooltip)

We expect the error to look like the following (fully visible from the second row onwards):
(screenshot: expected full error message)

It may be enough to make the tooltip larger than the current one to show the error correctly

Create CSV handler

As you know, the new importer is missing the CSV handler. We were waiting to implement a solution that would replace the upload steps we have now, where the lat/lon columns can be selected at upload time.
We cannot afford to implement a new UI for the custom selection of columns, so our proposal is the following:

  • preconfigure the CSV handler with the OGR X_POSSIBLE_NAMES="x,lon*" and Y_POSSIBLE_NAMES="y,lat*" options
  • accept a companion "*.csvt" file, as supported by the OGR CSV driver

This solution would provide an alternative that's not too expensive or complex to implement, and gives us the opportunity to remove the current upload system (at the moment it's still required only for CSV files).

I'm not against the solution based on Tabular Data Resource and VSI.
I think all these options could coexist, letting the handler pick the best one depending on the provided files, with the X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES preconfiguration as a fallback.
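The proposed preconfiguration maps onto the OGR CSV driver's open options, which ogr2ogr accepts via `-oo KEY=VALUE` (and `gdal.OpenEx` via `open_options`). A small sketch of building those arguments; the helper names are hypothetical:

```python
def csv_open_options(x_names="x,lon*", y_names="y,lat*"):
    """Build the OGR CSV-driver open options for the proposed
    X/Y_POSSIBLE_NAMES preconfiguration (illustrative helper)."""
    return [
        f"X_POSSIBLE_NAMES={x_names}",
        f"Y_POSSIBLE_NAMES={y_names}",
    ]


def build_csv_import_args(csv_path):
    # Each open option is prefixed with -oo on the ogr2ogr command line.
    args = []
    for opt in csv_open_options():
        args += ["-oo", opt]
    return args + [csv_path]
```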

What's your opinion?

GeoNode/geonode#8714 (comment)

Evaluate the effort to create the FE configuration panel

The idea is to have a panel in the front-end that is shown BEFORE the import starts, to let the user choose the configuration needed for the selected dataset.
For example: if the dataset is a CSV, select the geometry columns in the file.

An estimate of the work is needed
