jhuapl-boss / ingest-client
A Python command line application for performing distributed ingest of image data into the Boss
License: Apache License 2.0
Many services from which our plugins pull raise their rate limits gradually. Because ingest jobs can saturate such a limit very quickly (say, by running with a high -p parallelism count), it would be beneficial to support a gradual ramp-up for use alongside the -p option.
Pass a --ramp (-r) flag that specifies the number of seconds between the addition of subsequent workers:
boss-ingest -p 36 --ramp 10s my-config.json
The above command will run 36 workers in parallel, adding one every 10 seconds: after 30 seconds, three workers will be running, and at 6 minutes, all 36 workers will be running.
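The ramp-up arithmetic can be sketched in a few lines of Python. This is only an illustration of the proposal, not existing ingest-client code: the names parse_ramp, worker_start_times, and workers_running are hypothetical, and the sketch assumes the first worker starts one interval after launch, which matches the "three workers after 30 seconds" figure.

```python
import re

# Seconds per supported unit suffix (assumed formats: "10s", "5m", "1h").
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_ramp(value):
    """Convert a ramp string such as '10s' to seconds (hypothetical helper).

    A bare number is treated as seconds.
    """
    match = re.fullmatch(r"(\d+)([smh]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized ramp interval: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit or "s"]


def worker_start_times(num_workers, ramp_seconds):
    """Return the start offset in seconds for each worker.

    Assumes worker i (0-based) starts after (i + 1) intervals.
    """
    return [ramp_seconds * (i + 1) for i in range(num_workers)]


def workers_running(start_times, elapsed_seconds):
    """Count the workers whose start offset has already passed."""
    return sum(1 for start in start_times if start <= elapsed_seconds)


starts = worker_start_times(36, parse_ramp("10s"))
print(workers_running(starts, 30))   # workers running after 30 seconds
print(workers_running(starts, 360))  # workers running after 6 minutes
```

A worker pool could sleep until each offset before launching the next process; how that would be wired into the -p option is left open here.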
One could imagine adding support for other string formats like --ramp 5m or --ramp 1h, or perhaps adding support for a count of workers to be added at each increment, such as --ramp "10,5m". But this starts getting complicated and may be better served by:
I like this solution slightly less because it would potentially require a change to the ingest-client JSON schema, but it also allows much more flexibility in designing the shape of the ramp-up curve.
For example, borrowing from the k6 options:
{
  stages: [
    { time: "10s", workers: 10 },
    { time: "20s", workers: 10 },
    { time: "20m", workers: 20 },
    { time: "40m", workers: 20 }
  ]
}
The above configuration would add 10 workers at the 0:00:10 mark, another 10 at the 0:00:20 mark, and then 20 workers each at the 0:20:00 and 0:40:00 marks, for a total of 60 workers.
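To make the staged ramp-up concrete, here is a minimal sketch that turns such a stages list into a cumulative schedule. It assumes each stage adds its workers at the given time offset, as described above. The function names to_seconds and cumulative_schedule are hypothetical, and the stages list is written as a Python literal, not as part of any existing ingest-client schema.

```python
import re

# Seconds per supported unit suffix (assumed formats: "10s", "20m", "1h").
_UNITS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(text):
    """Convert a duration string such as '20m' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def cumulative_schedule(stages):
    """Return (offset_seconds, total_workers) pairs ordered by time.

    Each stage is assumed to ADD its 'workers' count at its 'time' offset.
    """
    schedule = []
    total = 0
    for stage in sorted(stages, key=lambda s: to_seconds(s["time"])):
        total += stage["workers"]
        schedule.append((to_seconds(stage["time"]), total))
    return schedule


stages = [
    {"time": "10s", "workers": 10},
    {"time": "20s", "workers": 10},
    {"time": "20m", "workers": 20},
    {"time": "40m", "workers": 20},
]

for offset, total in cumulative_schedule(stages):
    print(f"t={offset:>5}s  workers={total}")
```

The last entry of the schedule carries the final worker count, which is one way to validate a stages list against the -p value before an ingest starts.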
Such a configuration would be particularly useful for, say, pulling from Google Cloud Storage, where read-rate limits are doubled no sooner than 20 minutes after an ingest starts.
The constructor for ingest.core.engine.Engine should take a configuration object, so Engines can be created programmatically. The configuration object can be a dict representing the contents of a configuration file.
For convenience, and to preserve the current behavior, it should be ok to pass a filename, which gets parsed into a configuration object.
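One way to accept either form is a small normalizing helper, sketched below. The helper load_configuration and the class EngineSketch are hypothetical stand-ins for illustration. They do not reflect the actual Engine constructor signature, and the sketch assumes configuration files are plain JSON.

```python
import json
import os


def load_configuration(config):
    """Return a configuration dict from a dict or a path to a JSON file.

    Hypothetical helper: a dict is passed through unchanged, while a
    string or path-like object is read and parsed as JSON.
    """
    if isinstance(config, dict):
        return config
    if isinstance(config, (str, os.PathLike)):
        with open(config, "r") as handle:
            return json.load(handle)
    raise TypeError(
        f"expected a dict or a file path, got {type(config).__name__}"
    )


class EngineSketch:
    """Stand-in for an engine that accepts either configuration form."""

    def __init__(self, config):
        self.config = load_configuration(config)


# Programmatic creation from a dict; passing a filename would also work.
engine = EngineSketch({"schema": {"name": "example"}})
print(engine.config)
```

Normalizing at the constructor boundary keeps the rest of the class working with a single type, so existing callers that pass a filename would not need to change.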
The client should first check that an ingest job exists and hasn't been canceled before attempting the DELETE request, so that an appropriate message can be displayed to the user.
JIRA Task: MICRONS-1011
The progress output currently says "X minutes elapsed"; it should be easy enough to add an "Est. hh:mm remaining" as well.
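A simple estimate can be derived by assuming the upload rate observed so far stays constant. The sketch below illustrates that calculation. The function name format_remaining and its parameters are hypothetical, and how the client actually tracks completed uploads is not shown here.

```python
import math


def format_remaining(elapsed_seconds, completed, total):
    """Format an 'Est. hh:mm remaining' string from progress so far.

    Assumes a constant rate: remaining = elapsed * (total - completed) / completed.
    """
    if completed <= 0 or elapsed_seconds <= 0:
        # No progress yet, so no rate can be estimated.
        return "Est. --:-- remaining"
    remaining_seconds = elapsed_seconds * (total - completed) / completed
    # Round up to whole minutes so the estimate is never optimistic.
    hours, minutes = divmod(math.ceil(remaining_seconds / 60), 60)
    return f"Est. {hours:02d}:{minutes:02d} remaining"


# 100 of 400 uploads done after 10 minutes.
print(format_remaining(600, 100, 400))
```

A moving average over recent uploads would respond better to rate-limit changes, at the cost of a little more bookkeeping.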
There are certainly cases where an arbitrary upload order is preferable, but it would be very helpful to be able to specify upload_order=contiguous on some workers to ensure that a new plugin is working correctly, without having to wait for dataset_size-(dataset_size/cuboid_size) uploads to finish first.
Possible similar functionality is setting a smaller ingest extent and conducting a smaller test-ingest first, but this would make it easier to get results into the hands of users quicker, without configuring multiple ingest jobs.
fixed by #49
Data are not written, but no error is thrown during ingest:
expected:
actual: