
ClusterODM's Issues

AWS S3 bucket creation + autoscaling

Hello,
I have been testing ClusterODM on local machines plus some AWS instances. Since ClusterODM supports autoscaling (creating and destroying nodes on demand), I would like to know a bit more about the process of setting it up. The description provided in https://github.com/OpenDroneMap/ClusterODM/blob/master/docs/aws.md is not very clear for an AWS newbie like me.
So, from that, what I understand is:

  1. Given that I have an AWS account, I need to create an S3 bucket with unblocked access.
  2. Then, I guess, I need to deploy ClusterODM on an instance in AWS?
  3. Also, I did not quite understand what this means:

Select an AMI (machine image) to run - Ubuntu has a handy AMI finder.
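For what it's worth, the pieces fit together like this: ClusterODM itself can run anywhere (locally or on an EC2 instance), the S3 bucket only stores results, and the AMI is simply the base machine image the autoscaled instances boot from (its ID goes into the config). Below is a trimmed sketch of the aws.json file passed via --asr, with field names following docs/aws.md and every value a placeholder to change:

{
  "provider": "aws",
  "accessKey": "CHANGEME",
  "secretKey": "CHANGEME",
  "s3": { "endpoint": "s3.us-west-2.amazonaws.com", "bucket": "my-clusterodm-results" },
  "securityGroup": "clusterodm",
  "region": "us-west-2",
  "ami": "ami-xxxxxxxxxxxx",
  "maxRuntime": -1,
  "maxUploadTime": -1,
  "imageSizeMapping": [
    { "maxImages": 40, "slug": "t3a.small", "storage": 60 },
    { "maxImages": 250, "slug": "m5.large", "storage": 100 }
  ],
  "addSwap": 1,
  "dockerImage": "opendronemap/nodeodm"
}

ClusterODM is then started with: node index.js --asr aws.json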

Split-merge Integration

At some point we should add the ability to distribute large datasets over multiple nodes for parallel processing.

Cleanup does not run after the task finishes when using AWS auto-scaling

hello, geniuses,

Cleanup does not run after the task finishes when I use AWS auto-scaling.

When I used it last year, the instances were automatically shut down after the work finished. I'm using it for the first time this year, and the instances aren't shutting down.

I run ClusterODM via docker-compose, and the NodeODM version is 2.4.10 for compatibility with my WebODM 1.8.1.
(There also looks to be a problem between WebODM 1.8.1 and NodeODM 2.5.0: tasks are not reported as successful even though all processing appears to be done.)

[Screen capture 2021-05-26 16:18:00]

**All tasks were finished.**

[Screen capture 2021-05-26 16:16:39]

**Still online even after a long time...**

My docker-compose.yml:
version: '2.1'
services:
  nodeodm-1:
    image: opendronemap/nodeodm:2.4.10
    container_name: nodeodm-1
    ports:
      - "3000"
    restart: on-failure:10
    oom_score_adj: 500
    entrypoint: /usr/bin/node /var/www/index.js --max_images 1 --max_concurrency 1 --max_runtime 0 -q 0
  clusterodm:
    image: opendronemap/clusterodm
    container_name: clusterodm
    ports:
      - "80:3000"
      - "8082:8080"
      - "10000:10000"
    volumes:
      - ./docker/data:/var/www/data
      - ./config.json:/var/www/config-default.json  
      - ./asr-configuration.json:/var/www/configuration.json
    restart: on-failure:10
    depends_on:
      - nodeodm-1
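In case it helps while debugging: with this compose setup, the autoscaled machines are driven by docker-machine inside the clusterodm container, so a workaround sketch for removing leftover instances by hand (container name taken from the file above; machine name is a placeholder) is:

docker exec -it clusterodm docker-machine ls
docker exec -it clusterodm docker-machine rm -y <machine-name>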

ClusterODM is dropping a high number of uploads

What is the problem?

Since yesterday, my WebODM is constantly failing all tasks after I restarted it. I noticed it pulled a newer image from DockerHub, and there are no previous versions available on DockerHub.
After investigating, I noticed that ClusterODM is closing a lot of POST HTTP requests on the routes /task/new/upload/<task_id>.
The error message displayed in WebODM is sometimes Connection error: HTTPSConnectionPool(host='example.com', port=443): Read timed out. (read timeout=30) and other times just a 502.

Even the smallest jobs are failing; I hit this issue with a dataset containing only 5 images.

On the web interface of ClusterODM, I can still launch a task, but during the uploads I get a lot of messages saying Upload of IMG_NAME.jpg failed, retrying...

After seeing this, I made a clean install of my entire stack (WebODM webapp & worker, ClusterODM, and one locked NodeODM for the autoscaler) on totally different infrastructure and had the exact same problem.

What should be the expected behavior?

Uploading the files on WebODM or ClusterODM UI should work

How can we reproduce this? (What steps did you do to trigger the problem? If applicable, please include multiple screenshots of the problem! Be detailed)

Install WebODM and ClusterODM and try to upload files to launch a task.
My current installation is on a Kubernetes cluster hosted on Scaleway. I can provide the manifests I'm using if needed.
WebODM version: 1.9.11
ClusterODM version: latest on Dockerhub

[Feature Request]: `docker-machine` is deprecated, we need to add support for another tool for autoscaling

docker-machine is being deprecated and is starting to disappear from repositories. It's also not compatible with the latest Docker version. See docker/machine#4537 and docker/roadmap#245.

One alternative would be to move towards Terraform. There is a provider for Scaleway, so we wouldn't lose functionality. Is there any interest in moving towards this solution?

What is the problem?

docker-machine being deprecated.

What should be the expected behavior?

Not losing autoscaling!

How can we reproduce this? (What steps did you do to trigger the problem? If applicable, please include multiple screenshots of the problem! Be detailed)

Try installing docker-machine on a recent OS.

Setting up ClusterODM with WebODM

Hey Everyone,
I am struggling to make ClusterODM work with WebODM. I have set up 3 nodes running NodeODM, and set up ClusterODM on a server-spec machine (no particular reason why I chose a server machine for this task). As the guide says, I connected all the nodes to ClusterODM using telnet, and I can see the nodes are online. Now how do I drive this cluster setup from WebODM? As per the documentation in https://docs.opendronemap.org/large.html#distributed-split-merge, in CLI ODM you need to add --sm-cluster http://cluster_odm_ip:3001.

My question, again, is how do I trigger this action in WebODM? In WebODM's custom settings there is a field to enter the URL of ClusterODM, but I haven't had any luck with that. What I need to know is at what stage of processing ODM creates submodels and distributes them to the different nodes.
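For reference, a sketch of the wiring that generally works (ports shown are the defaults; adjust to your install): nodes are registered on ClusterODM's admin CLI, and WebODM then talks to ClusterODM's proxy port as if it were one big NodeODM.

telnet cluster_odm_ip 8080
> NODE ADD node_1_ip 3000
> NODE LIST

In WebODM, add a Processing Node pointing at cluster_odm_ip:3000 (the proxy port, not 8080), then set --split and --split-overlap in the task options. Submodels are created at the very start of the pipeline (the split stage, before reconstruction) and distributed from there.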

Setup:

WebODM (System 1) --> ClusterODM (System 3) +-------- NodeODM
                                            +-------- NodeODM
                                            +-------- NodeODM

System specifications:

Machine running WebODM: (1)
CPU: Ryzen 7 3600
RAM: 16 GB DDR4-2133
Storage: 500 GB SSD and 1 TB HDD

Node machine specs: (2)
CPU: Ryzen 7 3600
RAM: 16 GB DDR4-2133
Storage: 500 GB SSD and 1 TB HDD

System running ClusterODM (a server machine): (3)
CPU: Intel Xeon Silver 4208 @ 2.10 GHz
RAM: 32 GB
HDD: 198 GB

Add support for autoscaling queue

Optional support for an autoscaling queue could allow users to queue tasks even when they hit account limit restrictions, providing a better flow.

Admin commands to handle tasks

It would be nice to have a way to send manual API commands for certain tasks (for example, to query task status, restart tasks, or cancel tasks).
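In the meantime, a workaround sketch is to hit the proxied NodeODM task API directly; these routes come from the NodeODM API, and host, port, and uuid are placeholders:

curl http://clusterodm_host:3000/task/<uuid>/info
curl -X POST -F uuid=<uuid> http://clusterodm_host:3000/task/cancel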

Add Azure support for auto-scaling

What is the problem?

No option to integrate Azure components for storage & compute.

What should be the expected behavior?

One should be able to use Blob Storage / File Share for storing the results, and add a configuration file to auto-scale by spinning up new Azure Container Instances or removing them.

How can we reproduce this? (What steps did you do to trigger the problem? If applicable, please include multiple screenshots of the problem! Be detailed)

It's a feature enhancement request.

warn: Cannot forward task (ID) to processing node (IP):3000: Failed sending data to the peer

I'm experimenting with ClusterODM. I have a two-node test cluster set up and a VM running ClusterODM. Everything seems to be configured correctly; both devices show up correctly on :10000 and in NODE LIST.

When submitting a job with split, it throws the error warn: Cannot forward task (ID) to processing node (IP):3000: Failed sending data to the peer. The error appears for both IPs alternately, with several test datasets of 14 to 986 images. No GCPs on these. I tried the following splits on a 986-image dataset:

50 - error above
100 - error above
400 - error above
500 - the job splits into FIVE parts and only makes it to one node. I'm raising the issue from my mobile, but hopefully the table below comes out OK:

# Node                 Status  Queue  Version  Flags
1 192.168.1.172:3000   Online  5/4    1.5.2
2 192.168.1.173:3000   Online  0/4    1.5.2

Add ASR config CLI commands

This would allow a person to change the configuration of the ASR at runtime without a restart.

  • Add/set keys
  • Delete keys
  • Write config to file (make changes permanent)
  • Restore original config (?)
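A hypothetical syntax sketch for these, mirroring the existing NODE commands (none of these exist yet; names invented here):

> ASR SET <key> <value>
> ASR DEL <key>
> ASR SAVE
> ASR RESET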

Windows Native ClusterODM

It should be relatively easy to create an executable bundle for running ClusterODM natively, NodeODM-style. We just need to copy the relevant GitHub workflow files and adjust them.

Shared Task / Routes / Nodes Table

Currently, task tables and routes tables are stored as in-process objects. This means a single point of failure, and no ability for the network to have multiple proxies running concurrently on multiple machines.

We should have a shared database of this information (Redis?) so that multiple proxies can stay in sync and spread large numbers of incoming connections.
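A rough sketch of the idea with Redis (key and field names here are invented for illustration):

redis-cli HSET route:<taskId> node 192.168.1.172:3000
redis-cli HGET route:<taskId> node
redis-cli SADD nodes 192.168.1.172:3000 192.168.1.173:3000

Any proxy instance could then resolve a task's route or enumerate nodes without holding local state.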

Add better web interface

It would be nice to have a web UI to show node information (memory, CPU usage, tasks) as well as the ability to do the same tasks as the CLI.

Why do the nodes switch so many times during processing?

122 photos
Options: split: 50, split-overlap: 50
Why do the nodes switch so many times during processing?


Description:
122 pictures are orthomosaicked with a split of 50, and there are four nodes. The submodels are not assigned to a fixed set of three nodes: first nodes 1-2-3 process, then 1-2-4, then 1-4. I want to ask why the node assignment changes like this instead of staying fixed on three nodes.

Like this:

[Four screenshots of the node assignments at successive stages.]

[ERROR] Cluster node offline

[ERROR] Cluster node seems to be offline: HTTPConnectionPool(host='topaz318hn', port=3000): Max retries exceeded with url: /info (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91f14f1910>: Failed to establish a new connection: [Errno -2] Name or service not known',))

Hi,

I deployed ClusterODM to Docker Swarm.

I hit this issue when I start a task with the split option set. It works if I do not set split and let it run on a single node.

Is there any solution?

Thanks so much,
Tianyang

Auto scaling

We should implement the ability to automatically spin up nodes when none are currently available to process a task immediately.

A provider independent abstract layer should be implemented, as we don't want to be tied to a single cloud solution.

ClusterODM distributed split/merge results vary from single instance Docker split/merge

I'm getting mixed results between normal Docker (single-instance) split/merge output and ClusterODM distributed split/merge.

Single-instance split/merge gives me a full orthophoto from the dataset, no problems, other than taking 3.5 days to run (701 images).

ClusterODM distributed split/merge shows zero data for about half of the imaged area, but it completes the task in about 4 hours. Once you set the transparency band to None in QGIS, you can see that it should cover the entire image, but doesn't.

[Four screenshots from 2019-08-11 comparing the two orthophoto outputs in QGIS.]

Fix - AWS Autoscaling w/ Docker-Machine

Problem

Docker-Machine fails to create node instances from an AWS ASR request under some configurations of the AWS platform. Specifically, it fails for configurations without a default VPC or subnet, and/or for those that use region zones, e.g. us-east-1c, for the VPC/subnet/security group resources, yielding the errors below.

No default VPC/subnet:

(debug-machine) Couldn't determine your account Default VPC ID : "default-vpc is 'none'"
Error setting machine configuration from flags provided: amazonec2 driver requires either the --amazonec2-subnet-id or --amazonec2-vpc-id option or an AWS Account with a default vpc-id

Availability zone mismatch:

Error creating machine: Error in driver during machine creation: Error launching instance: InvalidParameterValue: Value (us-east-1a) for parameter availabilityZone is invalid. Subnet 'subnet-****' is in the availability zone us-east-1c
status code: 400,

Defining a default VPC or subnet requires interaction with AWS support. Defining the zone is straightforward given the parameter in the JSON.

Expectation

What is expected: when running ClusterODM locally via 'node index.js --asr aws.json' (assuming a properly formed aws.json file), the docker-machine invocation should create a new machine, load Docker, and invoke a containerized node.

Reproduction

To reproduce this error, in an AWS environment with the configuration above, launch local ClusterODM with the --asr aws.json config flag reflecting that environment. Telnet to 8080 and run 'asr viewcmd <# images>'. This prints the docker-machine command built from the information in the ASR config. Copy that docker-machine command and attempt to execute it on the command line. If a default VPC and subnet are not set in the AWS environment, the machine will not be created. If the AWS resources are contained in a zone of a region, e.g. a, b, c, the machine will not be created.

The Docker-Machine command line described above provides a more descriptive error than simply using the WebODM interface to launch a processing job; that approach fails with just the output below:

node index.js --asr aws.json
info: ClusterODM 1.5.3 started with PID 14656
info: Starting admin CLI on 8080
warn: No admin CLI password specified, make sure port 8080 is secured
info: Starting admin web interface on 10000
warn: No admin password specified, make sure port 10000 is secured
info: Cloud: LocalCloudProvider
info: ASR: AWSAsrProvider
info: Can write to S3
info: Found docker-machine executable
info: Loaded 1 nodes
info: Loaded 0 routes
info: Starting http proxy on 3000
info: Trying to create machine... (1)
warn: Cannot create machine: Error: docker-machine exited with code 1
info: Trying to create machine... (2)
warn: Cannot create machine: Error: docker-machine exited with code 1

Resolution

The resolution is limited in scope to the definition of the AWSAsrProvider extension of the AbstractASRProvider class in ./ClusterODM/libs/asr-providers/aws.js. Documentation must be updated in ~/ClusterODM/docs/aws.md.

The following page documents the Docker-Machine variable names needed to translate between the JSON naming and the Docker-Machine naming: https://gdevillele.github.io/machine/drivers/aws/
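As a sketch, these are the amazonec2 driver flags (documented at the link above) that the JSON additions would need to feed through; all values are placeholders:

docker-machine create --driver amazonec2 \
  --amazonec2-region us-east-1 \
  --amazonec2-zone c \
  --amazonec2-vpc-id vpc-xxxxxxxx \
  --amazonec2-subnet-id subnet-xxxxxxxx \
  --amazonec2-security-group clusterodm \
  test-node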

Better Docs

The README should have some more information on the various flags.

No Splitting of Jobs

There is no splitting of projects in my cluster.
These are my options:
Options: auto-boundary: true, fast-orthophoto: true, split: 40, split-overlap: 15, rerun-from: dataset

# Node              Status  Queue  Engine     API    Cores  RAM available  Flags
1 20.113.*.3:3000   Online  0/1    odm 2.8.1  2.2.0  4      97.38%
2 20.54.*.0:3000    Online  1/1    odm 2.8.1  2.2.0  4      51.87%
3 20.113.*.*3:3000  Online  0/1    odm 2.8.1  2.2.0  4      97.93%

Only one node is used. Any idea how to change this behavior?
I have a photo set of 760 pics.

Cluster node seems to be offline: HTTPConnectionPool(host='127.0.0.1', port=3001)

There's a weird intermittent connectivity issue for the cluster software:

[ERROR] Cluster node seems to be offline: HTTPConnectionPool(host='127.0.0.1', port=3001): Max retries exceeded with url: /info (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9861261810>: Failed to establish a new connection: [Errno 111] Connection refused',))

It works about 50% of the time, but times out for the rest. ClusterODM is started via systemd and remains running fine:

ExecStart=/usr/bin/node /opt/ClusterODM/index.js -p 3001

When submitting a job, it almost seems like WebODM is trying to brute force a job into ClusterODM, as the number of connections is huge:

root@survey-dev:~# netstat -tapn | grep 3001 | wc -l
269
tcp        0      0 127.0.0.1:43016         127.0.0.1:3001          TIME_WAIT   -               
tcp        0      0 127.0.0.1:42818         127.0.0.1:3001          TIME_WAIT   -               
tcp        0      0 127.0.0.1:42802         127.0.0.1:3001          TIME_WAIT   -               
tcp6       0      0 :::3001                 :::*                    LISTEN      13085/node      
tcp6       0      0 127.0.0.1:3001          127.0.0.1:43050         TIME_WAIT   - 

Telnetting to the port works fine. Server load is very low. Packet loss is zero (it's localhost).

Cluster split fails and raises an exception

700 photos to split

CMD:
docker run -ti -v "/ftpfile/1400/1400:/code/images" opendronemap/odm --split 400 --split-overlap 100 --sm-cluster http://192.168.3.86:3100

cluster: http://192.168.3.86:3100
Config:

#> node list
1) 192.168.3.86:3001 [online] [0/2] <version 1.5.3>
2) 192.168.3.86:3002 [online] [0/2] <version 1.5.3>
3) 192.168.3.86:3003 [online] [0/2] <version 1.5.3>
4) 192.168.3.86:3005 [online] [0/2] <version 1.5.3>
5) 192.168.3.86:3006 [online] [0/2] <version 1.5.3>
6) 192.168.3.86:3009 [online] [0/2] <version 1.5.3>

EXCEPTION:

OpenCV Error: Assertion failed (data0.dims <= 2 && type == CV_32F && K > 0) in kmeans, file /code/SuperBuild/src/opencv/modules/core/src/matrix.cpp, line 2701
Traceback (most recent call last):
  File "/code/SuperBuild/src/opensfm/bin/opensfm", line 34, in <module>
    command.run(args)
  File "/code/SuperBuild/src/opensfm/opensfm/commands/create_submodels.py", line 30, in run
    self._cluster_images(meta_data, data.config['submodel_size'])
  File "/code/SuperBuild/src/opensfm/opensfm/commands/create_submodels.py", line 100, in _cluster_images
    labels, centers = tools.kmeans(positions, K)[1:]
  File "/code/SuperBuild/src/opensfm/opensfm/large/tools.py", line 34, in kmeans
    return cv2.kmeans(samples, nclusters, criteria, attempts, flags)
cv2.error: /code/SuperBuild/src/opencv/modules/core/src/matrix.cpp:2701: error: (-215) data0.dims <= 2 && type == CV_32F && K > 0 in function kmeans

Traceback (most recent call last):
  File "/code/run.py", line 56, in <module>
    app.execute()
  File "/code/stages/odm_app.py", line 93, in execute
    self.first_stage.run()
  File "/code/opendm/types.py", line 376, in run
    self.next_stage.run(outputs)
  File "/code/opendm/types.py", line 357, in run
    self.process(self.args, outputs)
  File "/code/stages/splitmerge.py", line 65, in process
    octx.run("create_submodels")
  File "/code/opendm/osfm.py", line 21, in run
    (context.opensfm_path, command, self.opensfm_project_path))
  File "/code/opendm/system.py", line 76, in run
    raise Exception("Child returned {}".format(retcode))
Exception: Child returned 1

Process exited with code 1

288 pictures

Processing Node: ClusterODM (auto)
Options: split: 80, split-overlap: 80
1) 192.168.3.86:3001 [offline] [0/2] <version 1.5.3>
2) 192.168.3.155:3001 [online] [0/2] <version 1.5.3>
3) 192.168.3.24:3001 [online] [0/2] <version 1.5.3>
[WARNING] LRE: submodel_0001 failed with: (ac5bb764-14e0-43ef-a8de-58861b7d0f52) failed with task output: self.process(self.args, outputs)
File "/code/stages/odm_meshing.py", line 72, in process
smooth_dsm=not args.fast_orthophoto)
File "/code/opendm/mesh.py", line 35, in create_25dmesh
apply_smoothing=smooth_dsm
File "/code/opendm/dem/commands.py", line 236, in create_dem
'{merged_vrt} {geotiff}'.format(**kwargs))
File "/code/opendm/system.py", line 76, in run
raise Exception("Child returned {}".format(retcode))
Exception: Child returned 1
Full log saved at /var/www/data/cd504d84-75e8-44c9-9159-6e6f01563722/submodels/submodel_0001/error.log
[INFO]    LRE: Cleaning up remote task (ac5bb764-14e0-43ef-a8de-58861b7d0f52)... OK
[INFO]    LRE: submodel_0002 (1171854e-aaf5-4537-aa83-912e1095a3bd) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0002 (1171854e-aaf5-4537-aa83-912e1095a3bd) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0002 (1171854e-aaf5-4537-aa83-912e1095a3bd) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0002 (1171854e-aaf5-4537-aa83-912e1095a3bd) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0002 (1171854e-aaf5-4537-aa83-912e1095a3bd) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[WARNING] LRE: submodel_0002 failed with: (1171854e-aaf5-4537-aa83-912e1095a3bd) failed with task output: File "/code/opendm/types.py", line 376, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 376, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 357, in run
self.process(self.args, outputs)
File "/code/stages/mve.py", line 129, in process
raise e
Exception: Child returned 137
Full log saved at /var/www/data/cd504d84-75e8-44c9-9159-6e6f01563722/submodels/submodel_0002/error.log
[INFO]    LRE: Cleaning up remote task (1171854e-aaf5-4537-aa83-912e1095a3bd)... OK
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[INFO]    LRE: submodel_0003 (08cba982-e75f-4b0c-a5c6-a642d0129278) is still running
[WARNING] LRE: submodel_0003 failed with: (08cba982-e75f-4b0c-a5c6-a642d0129278) failed with task output: File "/code/opendm/types.py", line 376, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 357, in run
self.process(self.args, outputs)
File "/code/stages/mvstex.py", line 97, in process
'-n {nadirWeight}'.format(**kwargs))
File "/code/opendm/system.py", line 76, in run
raise Exception("Child returned {}".format(retcode))
Exception: Child returned 134
Full log saved at /var/www/data/cd504d84-75e8-44c9-9159-6e6f01563722/submodels/submodel_0003/error.log
[INFO]    LRE: Cleaning up remote task (08cba982-e75f-4b0c-a5c6-a642d0129278)... OK
[INFO]    LRE: No remote tasks left to cleanup
Traceback (most recent call last):
File "/code/run.py", line 56, in <module>
app.execute()
File "/code/stages/odm_app.py", line 93, in execute
self.first_stage.run()
File "/code/opendm/types.py", line 376, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 357, in run
self.process(self.args, outputs)
File "/code/stages/splitmerge.py", line 153, in process
lre.run_toolchain()
File "/code/opendm/remote.py", line 57, in run_toolchain
self.run(ToolchainTask)
File "/code/opendm/remote.py", line 251, in run
raise nonloc.error
pyodm.exceptions.TaskFailedError: (08cba982-e75f-4b0c-a5c6-a642d0129278) failed with task output: File "/code/opendm/types.py", line 376, in run
self.next_stage.run(outputs)
File "/code/opendm/types.py", line 357, in run
self.process(self.args, outputs)
File "/code/stages/mvstex.py", line 97, in process
'-n {nadirWeight}'.format(**kwargs))
File "/code/opendm/system.py", line 76, in run
raise Exception("Child returned {}".format(retcode))
Exception: Child returned 134
Full log saved at /var/www/data/cd504d84-75e8-44c9-9159-6e6f01563722/submodels/submodel_0003/error.log

Description:
288 pictures are split into submodels of 80. At 01:34:45 into the run, it fails with "Process exited with code 1", yet 174 of the pictures can be successfully stitched.
The environment is three machines with three nodes.
I'm asking for help finding the cause from the exception.

Thanks.

Autoscaling sometimes fails

pyodm.exceptions.TaskFailedError: (7e4b9fc9-902d-4865-ae2d-9d20a78b4fa2) failed with task output: Failed sending data to the peer

Running OpenDroneMap on HPC

Purposes:
1a. Implement rootless containers to run ODM on HPC, using Singularity (for ODM) and Podman (for NodeODM).
1b. Use binary files if we cannot proceed with 1a.
2. Connect ClusterODM to NodeODM in an HPC environment, probably through a proxy.
3. Use SLURM to dynamically assign tasks between different NodeODM nodes (see the sketch after this list).
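A minimal sketch of purposes 1a and 3 combined, assuming Singularity can run the NodeODM Docker image and pass arguments through to it (paths and resource numbers are placeholders):

#!/bin/bash
#SBATCH --job-name=nodeodm
#SBATCH --nodes=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G

# One-time: singularity pull docker://opendronemap/nodeodm
# Start a rootless NodeODM on this allocation; ClusterODM would then NODE ADD it.
singularity run --bind "$HOME/nodeodm-data:/var/www/data" nodeodm_latest.sif --port 3000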

Return response faster when processing /task/new

After the upload has finished, it takes some time to forward the input images to one of the nodes.

This causes the upload call to "hang" for a little while. It would be better if we returned a status of running and assigned a task ID right away.

This will require modifying NodeODM to support ID suggestions.
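A hypothetical sketch of the proposed flow (not current behavior; today the ID is only known after forwarding):

# Proxy would reply with a pre-assigned ID immediately:
curl -F images=@IMG_0001.JPG http://clusterodm_host:3000/task/new
# -> {"uuid":"<pre-assigned>"}; images are then forwarded in the background,
#    with NodeODM asked to reuse the suggested ID.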

How to load-balance the queue?

I want processing to happen on every node, but currently I see:

| # | Node | Status | Queue | Engine | API | Flags |
| -- | -- | -- | -- | -- | -- | -- |
| 1 | 192.168.0.247:3000 | Online | 1/4 | odm 2.4.3 | 2.1.4 | |
| 2 | 192.168.0.248:3000 | Online | 0/4 | odm 2.4.3 | 2.1.4 | |
| 3 | 192.168.0.246:3000 | Online | 0/4 | odm 2.4.3 | 2.1.4 | |

One task takes up only one queue slot, but I would like it to queue on all nodes so processing is very fast. Is there any scheduling provision we can use to schedule it across all the queues on multiple nodes?

I do see split_merge options, but I don't see any documentation for them: https://github.com/OpenDroneMap/ClusterODM/blob/master/config.js#L40
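For context, a single task only ever occupies one slot on one node; the way to fan one dataset out across nodes is split-merge. A sketch, mirroring the command used elsewhere in these issues (split size and hosts are placeholders to tune):

docker run -ti -v "/path/to/images:/code/images" opendronemap/odm \
  --split 200 --split-overlap 50 --sm-cluster http://clusterodm_host:3000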

Thank You.

Add --node-priority flag

Add a --node-priority flag for optimizing node selection according to various policies.

Possible values:

  • least cost
  • round robin
  • ...
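Hypothetical usage once implemented (the value name is invented here):

node index.js --node-priority round-robin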

docker-machine is not present in the image from the last build (07/02/2022)

What is the problem?

In the last build, there is an XML file in place of the docker-machine binary:

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>9X64YXJWB3NTJXDY</RequestId><HostId>gcYPM2y5ZRki30XSL44791CDe8Nz0kMvQu0PYlmKNUXVMFzebFBYkjlAB8pfiG3WqQ6uygHt1wc=</HostId></Error>

So the autoscaler is not working anymore.

How can we reproduce this?

Just pull the latest image and try to run a docker-machine command.
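A quick check sketch, assuming docker-machine sits on the image's PATH:

docker run --rm --entrypoint docker-machine opendronemap/clusterodm version
# On the broken build this fails, since the "binary" is the AccessDenied XML above.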
