
geonode-importer's Introduction

GeoNode OSGeo Project

Table of Contents

What is GeoNode?

GeoNode is a geospatial content management system, a platform for the management and publication of geospatial data. It brings together mature and stable open-source software projects under a consistent and easy-to-use interface allowing non-specialized users to share data and create interactive maps.

Data management tools built into GeoNode allow for integrated creation of data, metadata, and map visualization. Each dataset in the system can be shared publicly or restricted to allow access to only specific users. Social features like user profiles and commenting and rating systems allow for the development of communities around each platform to facilitate the use, management, and quality control of the data the GeoNode instance contains.

It is also designed to be a flexible platform that software developers can extend, modify or integrate against to meet requirements in their own applications.

Try out GeoNode

If you just want to try out GeoNode, visit our official demo online at: https://development.demo.geonode.org. After registering, you will be able to test all basic functionality like uploading layers, creating maps, editing metadata, styles, and much more. To get an overview of what GeoNode can do, we recommend having a look at the Users Workshop.

Quick Docker Start

  python create-envfile.py

create-envfile.py accepts the following arguments:

  • --https: Enable SSL. It's disabled by default
  • --env_type:
• When set to prod, DEBUG is disabled and a valid SSL certificate is requested from Let's Encrypt's ACME server
    • When set to test DEBUG is disabled and a test SSL certificate is generated for local testing
    • When set to dev DEBUG is enabled and no SSL certificate is generated
  • --hostname: The URL that will serve GeoNode (localhost by default)
• --email: The administrator's email. Note that a real email address and a valid SMTP configuration are required if --env_type is set to prod; Let's Encrypt uses the email when issuing the SSL certificate
  • --geonodepwd: GeoNode's administrator password. A random value is set if left empty
• --geoserverpwd: GeoServer's administrator password. A random value is set if left empty
  • --pgpwd: PostgreSQL's administrator password. A random value is set if left empty
  • --dbpwd: GeoNode DB user role's password. A random value is set if left empty
  • --geodbpwd: GeoNode data DB user role's password. A random value is set if left empty
  • --clientid: Client ID of GeoServer's GeoNode OAuth2 client. A random value is set if left empty
  • --clientsecret: Client secret of GeoServer's GeoNode OAuth2 client. A random value is set if left empty
  docker compose build
  docker compose up -d

Learn GeoNode

After you've finished the setup process, familiarize yourself with the general usage and settings of your GeoNode instance: the User Training goes in depth into what you can do, and the Administrators Workshop guides you through the most important management commands and configuration settings.

Development

GeoNode is a web-based GIS tool, and as such, in order to do development on GeoNode itself or to integrate it into your own application, you should be familiar with basic web development concepts as well as with general GIS concepts.

For development, GeoNode can be run in a 'development environment'. In contrast to a 'production environment', it uses lightweight components to speed things up.

To get started visit the Developer workshop for a basic overview.

If you're planning to customize your GeoNode instance or extend its functionality, it's not advisable to change core files. Instead, it's common to set up a GeoNode Project Template.

Contributing

GeoNode is an open source project and contributors are needed to keep this project moving forward. Learn more on how to contribute on our Community Bylaws.

Roadmap

GeoNode's development roadmap is documented in a series of GeoNode Improvement Projects (GNIPs). They are documented at the GeoNode Wiki.

GNIPs are considered to be large undertakings that will add a large number of features to the project. As such, they are the topic of community discussion and guidance. The community discusses them on the developer mailing list: http://lists.osgeo.org/pipermail/geonode-devel/

Showcase

A handful of other Open Source projects extend GeoNode’s functionality by tapping into the re-usability of Django applications. Visit our gallery to see how the community uses GeoNode: GeoNode Showcase.

The development community is very supportive of new projects and contributes ideas and guidance for newcomers.

Most useful links

General

Related projects

Support

Licensing

GeoNode is Copyright 2018 Open Source Geospatial Foundation (OSGeo).

GeoNode is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. GeoNode is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with GeoNode. If not, see http://www.gnu.org/licenses.

geonode-importer's People

Contributors

afabiani, cesarbenjamindotnet, etj, giohappy, mattiagiupponi, pchevali, ridoo


geonode-importer's Issues

Improve error handling

Improve the current base error handling of Celery.
If the maximum number of retries is reached and the task is still failing, the execution request and the legacy upload should always be marked as failed

GPKG handler parallelism

Each handler should be responsible for:

  • deciding whether some steps should run synchronously or in parallel.

For example, in the GPKGFileHandler:

  • validation can be synchronous, since at least we are not loading the whole GPKG into RAM
  • the creation of the dynamic model and the import can run in parallel

NOTE: we want to keep the current GeoNode behaviour, so if a layer already exists in the geonode_data database, a suffix (a UUID) should be appended to create a new resource
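A minimal sketch of the per-handler decision described above. The `GPKGFileHandler` interface and method names here are illustrative assumptions, not the actual importer API; in the real importer the per-layer import "tasks" would be Celery signatures dispatched as a group:

```python
import uuid


class GPKGFileHandler:
    """Illustrative handler: validation runs synchronously, imports in parallel."""

    def validate(self, layer_names):
        # Synchronous step: we only inspect layer metadata,
        # never load the whole GPKG into RAM.
        if not layer_names:
            raise ValueError("GPKG contains no layers")

    def build_import_tasks(self, layer_names, existing_tables):
        # One import task per layer; these would be dispatched in parallel.
        tasks = []
        for name in layer_names:
            target = name
            if name in existing_tables:
                # Keep current GeoNode behaviour: append a UUID suffix
                # instead of overwriting the existing layer.
                target = f"{name}_{uuid.uuid4().hex[:8]}"
            tasks.append({"layer": name, "target_table": target})
        return tasks
```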

Task consistency

To maintain consistency of the uploaded resource, each task must be able to roll back its data.
Example:

  • if something goes wrong during publishing, the on_failure function of the base task should call the orchestrator to roll back all the actions done by it and its preceding tasks (unpublish the resource, delete the imported data and the dynamic models)

As an example, the task can carry a boolean called "rollback". Each handler must define a rollback function and perform the actions required to clean up the state
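The rollback flow above can be sketched in plain Python. The class and method names are hypothetical (in the importer this would live on a Celery base task class), but the mechanism is the one described: `on_failure` walks the completed steps in reverse and asks the handler to undo each one:

```python
class BaseImportTask:
    """Illustrative base task: on_failure rolls back completed steps in reverse."""

    def __init__(self, handler):
        self.handler = handler
        self.completed_steps = []

    def run_step(self, name, func):
        # Record each step as it completes, so we know what to undo later.
        func()
        self.completed_steps.append(name)

    def on_failure(self):
        # Roll back this task and all preceding ones, most recent first
        # (e.g. unpublish the resource, delete imported data, drop dynamic models).
        for step in reversed(self.completed_steps):
            self.handler.rollback(step)
```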

Save the handler module path into the resource

Save the module path of the handler that was used to handle the resource.
This is needed because, in the long-term view, the handler will take care of:

  • creating the resource
  • updating the resource
  • deleting the resource
  • copying/cloning the resource

Each handler should handle all aspects of the resource

View to run the importer asynchronously

The view will be injected into the _pages.
The Django view should just take the request coming from the FE and use the data_retriever to clone the GPKG into a local environment. All other actions (like validation) should be handled by the importer

Improve task progress consistency

The orchestrator will evaluate whether the task is finished.
To do so, when the task list is finished, the orchestrator (using the execution_id) will:

  • raise an error if any task_result that has the execution_id as a param has failed
  • set the execution as complete if all the task_results are successful
  • if any task is still in a status other than success or failed, the progress is ignored and the task is not marked as completed
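The three rules above can be condensed into a single status-evaluation function. This is a sketch under the assumption that per-task statuses are available as plain strings; it is not the actual orchestrator API:

```python
def evaluate_execution(task_results):
    """Decide the overall execution status from per-task results.

    task_results: list of status strings for the tasks sharing one
    execution_id (e.g. "SUCCESS", "FAILURE", "PENDING").
    """
    if any(status == "FAILURE" for status in task_results):
        # Any failed task_result fails the whole execution.
        raise RuntimeError("at least one task failed for this execution_id")
    if all(status == "SUCCESS" for status in task_results):
        return "complete"
    # Some task is still running/pending: ignore progress, don't mark complete.
    return "in_progress"
```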

Upload size and upload parallelism

At the moment the upload size limit and the upload parallelism are handled by the layer form and by the SizeRestrictedFileField.

Those checks should be extracted from the forms so they can also be used at the API level
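A form-independent version of those checks might look like the following. The function name and parameters are illustrative assumptions (the real limits currently live in the layer form and SizeRestrictedFileField); the point is that a plain function can be called from both a form and an API view:

```python
def validate_upload(file_size, active_uploads, max_size, max_parallel):
    """Check upload size and parallelism limits outside of any Django form,
    so the same logic can be reused at the API level (illustrative)."""
    if file_size > max_size:
        raise ValueError(f"file exceeds the {max_size}-byte upload limit")
    if active_uploads >= max_parallel:
        raise ValueError("too many parallel uploads in progress")
```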

Handle ZIP files

At the moment the new importer does not consider ZIP files, because the handlers key off the base_file extension and ZIP doesn't have its own handler (one is not needed).

The upload API should take care of unzipping and then find the right handler
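The dispatch step could be sketched like this. The extension-to-handler mapping and function name are hypothetical (the real importer registers handlers dynamically); the sketch only shows picking a handler from the base file inside the archive:

```python
import io
import zipfile

# Hypothetical mapping: base-file extension -> handler name.
HANDLERS = {".gpkg": "GPKGFileHandler", ".geojson": "GeoJsonFileHandler"}


def find_handler_for_zip(zip_bytes):
    """Inspect the archive members and pick a handler from the extension
    of the first recognised base file (illustrative only)."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            for ext, handler in HANDLERS.items():
                if name.lower().endswith(ext):
                    return handler
    raise ValueError("no handler found for the files in the archive")
```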

Evaluate the implementation of a raster handler

Raster import could be implemented in a way similar to vectors, using a shared volume from which the raster file could be published.

This approach needs careful analysis of the implications, although I don't see blockers.

Meanwhile, the current importer-based flow could be wrapped inside a geonode-importer handler. This way we could get rid of the old implementation, for both upload and import.

@etj @mattiagiupponi opinions?

Error handling for parallel tasks

Error handling for parallel tasks should be improved:
if any layer has an issue during import/publishing/resource creation, it must be written into the output_params or log field of the ExecutionRequest object

Create importer orchestrator

The importer will be an orchestrator that:

  • knows the first step of the import phase
  • knows the steps that every single import type needs to follow
  • updates the execution status of the import on the old Upload and on ExecutionRequest

Note: it must be wrapped inside an async Celery task
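The responsibilities above can be sketched as a tiny step registry. The step names and the `Orchestrator` interface are assumptions for illustration, not the actual importer code (which wraps this in an async Celery task):

```python
# Hypothetical step lists per import type (not the actual importer registry).
STEPS = {
    "gpkg": ["validate", "create_dynamic_model", "import", "publish"],
}


class Orchestrator:
    """Knows the first step and the full step list for each import type;
    callers use this to advance the flow and update Upload / ExecutionRequest."""

    def first_step(self, import_type):
        return STEPS[import_type][0]

    def next_step(self, import_type, current):
        # Returns None when the last step has completed.
        steps = STEPS[import_type]
        idx = steps.index(current)
        return steps[idx + 1] if idx + 1 < len(steps) else None
```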

DataManager

The data manager, using django_dynamic_models, will create inside the geonode_data database the layers coming from the geopackage.
As decided, to import the data we have to:

  • validate the data
  • define the dynamic models by reading the geopackage with the Fiona or OGR library
  • define the schema of the dynamic model and create the relative field schema
  • use ogr2ogr to run the import command via the terminal.
    The basic command is the following:
    /usr/bin/ogr2ogr -f PostgreSQL PG:"dbname='gpkg' host='localhost' port=5432 user='geonode' password='geonode'" /home/geosolutions/mattia/PBAP_20200203_test.gpkg
  
Example of dynamic models usage: https://github.com/GeoNode/geonode-contribs/blob/fbb0fae7c905664effb413197123246db1e21da4/sos/geonode_sos/sos_handler.py#L346-L354
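The ogr2ogr step above could be invoked from Python by building the same command as an argument list and passing it to `subprocess.run`. The helper name and its parameters are illustrative; the flags mirror the command shown above:

```python
def build_ogr2ogr_command(db, host, port, user, password, gpkg_path):
    """Build the ogr2ogr invocation shown above as an argument list,
    suitable for subprocess.run (illustrative parameters)."""
    conn = (
        f"dbname='{db}' host='{host}' port={port} "
        f"user='{user}' password='{password}'"
    )
    # -f PostgreSQL selects the output driver; PG:... is the connection string.
    return ["/usr/bin/ogr2ogr", "-f", "PostgreSQL", f"PG:{conn}", gpkg_path]
```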

Improve ExecutionRequest model

The model ExecutionRequest
https://github.com/GeoNode/geonode/blob/c3739ac74eb7be1add71651365f0ef93afb8e63c/geonode/resource/models.py#L26

Should be improved by adding:

  • step: will contain the exact step the async execution is performing
  • heartbeat: a datetime updated by each step, to understand whether the step is still alive
  • extra_information (optional): logs or other information for each step

NOTE: it is still to be decided whether each step should have its own row with its own details, or whether each step will update the information on a single row
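A plain-Python mirror of the proposed additions, with a liveness check based on the heartbeat. In GeoNode these would be Django model fields on ExecutionRequest; the dataclass, field types, and the 5-minute timeout are assumptions for illustration:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ExecutionState:
    """Illustrative mirror of the proposed ExecutionRequest additions."""
    step: str = ""
    heartbeat: datetime = field(default_factory=datetime.utcnow)
    extra_information: dict = field(default_factory=dict)

    def is_alive(self, now, timeout=timedelta(minutes=5)):
        # A step that hasn't updated its heartbeat within `timeout`
        # is considered dead.
        return now - self.heartbeat <= timeout
```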

Estimate work for GeoJSON handler

Steps:

  • create the GeoJSON app in importer/handler
  • dynamically register the app
  • define the task list
  • set up the basic functions required by the BaseHandler
  • create a task to handle the ogr2ogr command to import the GeoJSON, for example: ogr2ogr -f "PostgreSQL" PG:"dbname=geonode_data host=localhost port=5434 password=geonode user=geonode" /mnt/c/Users/user/Desktop/gis_data/stations.geojson -lco DIM=2 (evaluate a refactor to re-use the current one from GPKG)
  • create a task to set up the dynamic fields (evaluate a refactor to re-use the current one from GPKG)
  • handle resource creation/import and publishing

Note:
the ogr2ogr library seems to have an issue getting the geometry column name from a GeoJSON file; this needs to be evaluated:

from osgeo import ogr

driver = ogr.GetDriverByName("GeoJSON")
data_source = driver.Open(path)
layer = data_source.GetLayer(0)
column = layer.GetGeometryColumn()
print(column)  # prints '' — the geometry column name is not detected

Table names longer than 63 chars raise an error

If the table name is longer than 63 chars, Postgres raises an error due to the identifier length limit.

We have to decide how to trim these table names, since we append the execution_id to the table name
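One possible trimming strategy, sketched below: truncate the layer name and insert a short hash so that distinct long names don't collide after truncation. This is an illustrative proposal, not the strategy the importer actually adopted:

```python
import hashlib

MAX_IDENTIFIER_LEN = 63  # PostgreSQL's default identifier limit (NAMEDATALEN - 1)


def safe_table_name(layer_name, execution_id):
    """Trim the layer name so '<layer>_<execution_id>' fits Postgres's
    63-char identifier limit; a short hash keeps trimmed names unique."""
    full = f"{layer_name}_{execution_id}"
    if len(full) <= MAX_IDENTIFIER_LEN:
        return full
    digest = hashlib.sha1(layer_name.encode()).hexdigest()[:8]
    # Reserve room for the hash, the execution_id and two underscores.
    keep = MAX_IDENTIFIER_LEN - len(execution_id) - len(digest) - 2
    return f"{layer_name[:keep]}_{digest}_{execution_id}"
```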

Importer of GPKG

We want to let an external app in GeoNode import the GPKG.
The document describing the approach is available here

cc: @giohappy @etj

Add comments to the code

Having comments is important for keeping the code easy to read and letting other devs understand it easily.

Adding type hints is also important

Error not shown correctly

During the import of a geopackage, during the validation process, the error tooltip is too small to show the error correctly.
In the following screenshot the error is cut off:
(screenshot: truncated error tooltip)

We expect the error to look like the following (fully visible from the second row onwards):
(screenshot: expected full error message)

It may be enough to make the tooltip larger than the current one to show the error correctly

Create CSV handler

As you know, the new importer is missing the CSV handler. We were waiting to implement a solution that would replace the upload steps we have now, where the lat/lon columns can be selected at upload time.
We cannot afford to implement a new UI for the custom selection of columns, so our proposal is the following:

  • preconfigure the CSV handler with the OGR X_POSSIBLE_NAMES="x,lon*" and Y_POSSIBLE_NAMES="y,lat*" options
  • accept a companion "*.csvt" file, as supported by the OGR CSV driver

This solution would provide an alternative that's not too expensive or complex to implement, and gives us the opportunity to remove the current upload system (at the moment it's still required only for CSV files).

I'm not against the solution based on Tabular Data Resource and VSI.
I think all these options could coexist, letting the handler pick the best one depending on the provided files, with the X_POSSIBLE_NAMES and Y_POSSIBLE_NAMES preconfiguration as a fallback.
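The proposed preconfiguration maps onto the OGR CSV driver's open options, which ogr2ogr accepts via `-oo KEY=VALUE` (and `gdal.OpenEx` via `open_options`). A small sketch of building those arguments; the helper names are hypothetical:

```python
def csv_open_options(x_names="x,lon*", y_names="y,lat*"):
    """Build the OGR CSV-driver open options for the proposed
    X/Y_POSSIBLE_NAMES preconfiguration (illustrative helper)."""
    return [
        f"X_POSSIBLE_NAMES={x_names}",
        f"Y_POSSIBLE_NAMES={y_names}",
    ]


def build_csv_import_args(csv_path):
    # Each open option is prefixed with -oo on the ogr2ogr command line.
    args = []
    for opt in csv_open_options():
        args += ["-oo", opt]
    return args + [csv_path]
```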

What's your opinion?

GeoNode/geonode#8714 (comment)

Evaluate the effort to create the FE configuration panel

The idea is to have a panel in the front-end that is shown BEFORE the import starts, to let the user choose the configuration needed for the selected dataset.
For example: if the dataset is a CSV, select the geometry columns in the file.

An estimate of the work is needed
