Comments (5)

mattiagiupponi commented on August 17, 2024

@giohappy @etj
I ran some performance tests using Locust. It fires 3 parallel requests every 1-2 seconds, in order to evaluate how the import behaves under load.

The Locust script is the following:

from locust import HttpUser, task, between

class QuickstartUser(HttpUser):
    # each simulated user waits 1-2 seconds between requests
    wait_time = between(1, 2)

    @task
    def test_post(self):
        url = "http://localhost:8000/api/v2/uploads/upload"

        payload = {}
        # multipart upload of a valid GeoPackage
        files = [
            ("base_file", (
                "lombardia2.gpkg",
                open("/mnt/c/Users/user/Desktop/gis_data/gpkg/valid/lombardia2.gpkg", "rb"),
                "application/octet-stream",
            ))
        ]
        headers = {
            "Authorization": "Basic YWRtaW46YWRtaW4=",
            "Cookie": "csrftoken=Uu1YGJ5b97So3LQQytzzh5FZtSfrjI5WMp2cirj0TjK1NFELGRB1jJpsm2kM0cQS; session=59d1ee62-122d-4bbe-9ae8-7b6ac66ddbcd.T88X_9UJ7L_bkYpAtJQ-o80kO_E; sessionid=1wtrer2rnf9w7qf2t3753u7u1rsjec9w",
        }
        self.client.post(url, headers=headers, data=payload, files=files)
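
For reference, the script can be run headless with 3 concurrent users via something like locust -f locustfile.py --headless -u 3 -r 3 (the file name is an assumption; -u sets the user count, -r the spawn rate).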

Results:
[screenshot: Locust results table]

The import behaves as expected: the file is uploaded correctly and the gpkg passes the first import phase.

As expected, once the parallelism limit is reached, the BE raises an exception:

[2022-07-18 09:20:03,133: ERROR/ForkPoolWorker-6] Task importer.import_resource[638c63c7-2f9a-4ee5-adcf-d710867acea6] raised unexpected: InvalidInputFileException()
Traceback (most recent call last):
  File "/opt/geosolutions/geonode-importer/importer/celery_tasks.py", line 140, in import_resource
    if not _datastore.input_is_valid():
  File "/opt/geosolutions/geonode-importer/importer/datastore.py", line 20, in input_is_valid
    return self.handler.is_valid(self.files, self.user)
  File "/opt/geosolutions/geonode-importer/importer/handlers/gpkg/handler.py", line 72, in is_valid
    upload_validator.validate_parallelism_limit_per_user()
  File "/opt/core/geonode/geonode/upload/utils.py", line 897, in validate_parallelism_limit_per_user
    raise UploadParallelismLimitException(_(
geonode.upload.api.exceptions.UploadParallelismLimitException: The number of active parallel uploads exceeds 200. Wait for the pending ones to finish.
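
For context, the check behind this exception presumably just compares the user's active uploads against a configured cap. A minimal sketch of the idea (names, signature and default cap are assumptions, not the actual geonode code):

class UploadParallelismLimitException(Exception):
    """Raised when a user exceeds the allowed number of parallel uploads."""

def validate_parallelism_limit_per_user(active_uploads_count, max_parallel=5):
    # hypothetical signature: the real validator reads the cap from the upload
    # configuration and counts the user's non-finished uploads itself
    if active_uploads_count >= max_parallel:
        raise UploadParallelismLimitException(
            f"The number of active parallel uploads exceeds {max_parallel}. "
            "Wait for the pending ones to finish."
        )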

And the FE shows the execution in a failed state:

[screenshot: execution in failed state]

The creation of the dynamic models and the import of the gpkg perform as expected and run in parallel (see the ack rate):

[screenshot: queue ack rate]

The status of the upload is correctly managed by checking the execution_id of the whole task:

[2022-07-18 09:30:31,164: INFO/ForkPoolWorker-8] Execution with ID 93ee3946-9969-4531-8943-c3fc09887af2 is completed. All tasks are done
[2022-07-18 09:30:33,195: INFO/ForkPoolWorker-7] Execution with ID 0cbe8e64-0897-4dab-a499-20db9d7d20c7 is completed. All tasks are done
[2022-07-18 09:30:35,230: INFO/ForkPoolWorker-8] Execution with ID 520f8b1e-5101-4402-9005-8bb4340af90d is completed. All tasks are done
[2022-07-18 09:30:35,230: INFO/ForkPoolWorker-7] Execution with ID 496ff6f5-0f94-4e07-ac26-93df65a6a28b is completed. All tasks are done
[2022-07-18 09:30:45,367: INFO/ForkPoolWorker-8] 
[2022-07-18 09:30:45,367: INFO/ForkPoolWorker-8] 1 batches, 0 sent
[2022-07-18 09:30:45,367: INFO/ForkPoolWorker-8] done in 0.02 seconds
Fixup GIS Backend Security Rules Accordingly on resource geonode:alpeggi_d9a8541db1173b47a2e7e0a41642b4b2 True
[2022-07-18 09:30:45,672: ERROR/ForkPoolWorker-7] Fixup GIS Backend Security Rules Accordingly on resource geonode:alpeggi_d9a8541db1173b47a2e7e0a41642b4b2 True
[2022-07-18 09:30:50,960: INFO/ForkPoolWorker-8] Execution with ID 520f8b1e-5101-4402-9005-8bb4340af90d is completed. All tasks are done
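
Since every execution is tracked by its ID, a client can also poll for completion. A rough sketch, assuming an endpoint like /api/v2/executionrequest/<id> returning a status field (both names are assumptions; check the importer API for the real ones):

import time
import requests

def wait_for_execution(base_url, execution_id, auth=("admin", "admin"), timeout=300):
    # hypothetical endpoint and field names; verify against the actual API
    url = f"{base_url}/api/v2/executionrequest/{execution_id}"
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(url, auth=auth).json().get("status")
        if status in ("finished", "failed"):
            return status
        time.sleep(2)
    raise TimeoutError(f"execution {execution_id} not finished after {timeout}s")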

The only bottleneck at the moment is the limit on the parallel requests to GeoServer for publishing the resources (on my local setup it is capped at 5) and on the parallel requests GeoNode makes to the DB (also capped at 5).
Of course, these caps can be increased, but for local testing 5 is the maximum I can set without getting errors.
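
If those caps map to the celery worker concurrency (an assumption about this deployment; they may equally live in the GeoServer/DB connection pools), the worker-side knob is the --concurrency flag, e.g. celery -A geonode.celery_app worker --concurrency=10.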

Any thoughts?

giohappy commented on August 17, 2024

@mattiagiupponi I would try to run some more tests with exceptional conditions, like:

  • DB connection reset / closed during the executions
  • Geoserver not responding
  • Celery not processing the queues (it happened recently when RabbitMQ went crazy on the development demo)

mattiagiupponi commented on August 17, 2024

At the moment only one attempt is made (since we were talking about having a rollback per phase); we can discuss this and how to handle the errors. Btw, I can reply to those points without trying them out:

  • DB connection reset / closed during the executions

If anything goes wrong during the first stage (creation of the dynamic model, import of the resource via ogr2ogr, or creation of the fields), including a DB reset, all the data are rolled back: the dynamic model/fields and the table are deleted (see the sketch after this list).

Otherwise the execution is simply set to a failed state.

  • Geoserver not responding

As I said, only one attempt is made: if something happens during the publishing, the execution is set to a FAILED state.
If it happens during the geonode phase (since, for now, the resource manager still interacts with GS on its own), the error is ignored and the resource is created anyway.

  • Celery not processing the queues (it happened recently when RabbitMQ went crazy on the development demo)

This could be quite challenging... I need some time to think about it. What happens to RabbitMQ? Does the service stop working?
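
As for the first point, here is a minimal sketch of the rollback-per-phase behaviour described above (purely illustrative; the phase callables are placeholders, not the actual importer API):

def run_phase_with_rollback(create_phase, import_phase, rollback, mark_failed):
    """Run the first import stage; on any error, undo everything created so far."""
    state = create_phase()      # e.g. dynamic model + fields
    try:
        import_phase(state)     # e.g. ogr2ogr data copy into the table
    except Exception:
        rollback(state)         # delete the dynamic model/fields, drop the table
        mark_failed()           # set the execution to a failed state
        raise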

giohappy commented on August 17, 2024

As I said, only one attempt is made: if something happens during the publishing, the execution is set to a FAILED state.

Only one attempt? Aren't we retrying here?

If it happens during the geonode phase (since, for now, the resource manager still interacts with GS on its own), the error is ignored and the resource is created anyway.

Why can't we capture errors happening during the geonode phase? Even if they aren't under the explicit control of the orchestrator, I guess they can be handled, no? Are they trapped by internal code?

mattiagiupponi commented on August 17, 2024

Only one attempt? Aren't we retrying here?

Not yet, but I'm going to add a retry on the GeoServer publishing; it should be quite easy (see the sketch below).

Why can't we capture errors happening during the geonode phase? Even if they aren't under the explicit control of the orchestrator, I guess they can be handled, no? Are they trapped by internal code?

Yes, geonode handles it internally, so it is not raised from the task's point of view.
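
For the retry, celery's built-in autoretry options should be enough. A sketch (task name and arguments are placeholders, not the importer's real task):

from celery import shared_task

@shared_task(
    bind=True,
    autoretry_for=(Exception,),  # retry on any publishing error
    retry_backoff=True,          # exponential backoff between attempts
    max_retries=3,
)
def publish_on_geoserver(self, execution_id, resource_name):
    # placeholder body: the real task would call the GeoServer REST API here
    ...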
