
ingest-client's People

Contributors

conradf7, dkleissa, dxenes1, j6k4m8, kunallillaney, lrodri29, movestill, sandyhider, straz

ingest-client's Issues

Add ramp option to ingest config


Many services that our plugins pull from raise their rate limits gradually. Because an ingest job can saturate that limit very quickly (for example, when run with a high parallelism count via -p), it would be useful to support a gradual ramp-up for use alongside the -p option.

Proposed solution 1: CLI argument

Pass a --ramp (-r) flag that specifies the number of seconds between the addition of successive workers:

boss-ingest -p 36 --ramp 10s my-config.json

The above command would run up to 36 workers in parallel, adding one every 10 seconds: after 30 seconds, three workers are running, and at 6 minutes (36 × 10 s) all 36 are running.
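A minimal Python sketch of the linear ramp described above, assuming (as the example implies) that no worker is running at t = 0 and one more starts at each 10-second boundary. `active_workers` and `ramp_up` are hypothetical names for illustration, not part of ingest-client, and `start_worker` stands in for whatever actually launches a worker process.

```python
import math
import time
from typing import Callable


def active_workers(elapsed_s: float, max_workers: int, ramp_interval_s: float) -> int:
    """Workers running `elapsed_s` seconds after start under a linear ramp.

    Hypothetical helper: assumes zero workers at t=0 and one more at each
    multiple of `ramp_interval_s`, capped at `max_workers`.
    """
    if ramp_interval_s <= 0:
        return max_workers  # no ramp requested: everything starts at once
    started = math.floor(elapsed_s / ramp_interval_s)
    return max(0, min(max_workers, started))


def ramp_up(start_worker: Callable[[int], None], max_workers: int, ramp_interval_s: float) -> None:
    """Start workers one at a time, waiting `ramp_interval_s` before each start."""
    for worker_id in range(max_workers):
        time.sleep(ramp_interval_s)
        start_worker(worker_id)
```

With the values from the example, `active_workers(30, 36, 10)` gives 3 and `active_workers(360, 36, 10)` gives 36. Keeping the schedule as a pure function of elapsed time makes it easy to test separately from the process-launching loop.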

One could imagine adding support for other string formats like --ramp 5m or --ramp 1h; or perhaps adding support for a count of workers to be added at each increment, such as --ramp "10,5m". But this starts getting complicated and may be better served by:
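To gauge how much parsing those extended formats would add, here is a hypothetical `parse_ramp` helper that accepts a bare number of seconds ("10"), a duration with a unit ("10s", "5m", "1h"), or a "count,duration" pair ("10,5m" meaning 10 workers every 5 minutes). The accepted syntax is an assumption drawn from the examples above, not an existing ingest-client option.

```python
import re
from typing import Tuple

# Seconds per unit suffix; a bare number is treated as seconds.
_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse '10', '10s', '5m' or '1h' into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smh]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or "s"]


def parse_ramp(text: str) -> Tuple[int, int]:
    """Return (workers_per_step, step_seconds) for a hypothetical --ramp value."""
    if "," in text:
        count_text, duration_text = text.split(",", 1)
        count = int(count_text)
        if count <= 0:
            raise ValueError(f"worker count must be positive: {text!r}")
        return count, parse_duration(duration_text)
    return 1, parse_duration(text)  # default: one worker per step
```

Even this small grammar needs its own validation and error messages, which supports moving richer ramp shapes into the config file instead.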

Proposed solution 2: JSON config

I like this solution slightly less because it would potentially require a change to the ingest-client JSON schema, but it also allows much more flexibility in designing the shape of the ramp-up curve.

For example, borrowing from the k6 options:

{
    "stages": [
        { "time": "10s", "workers": 10 },
        { "time": "20s", "workers": 10 },
        { "time": "20m", "workers": 20 },
        { "time": "40m", "workers": 20 }
    ]
}

The above configuration would add 10 workers at the 0:00:10 mark, another 10 at the 0:00:20 mark, and 20 more at each of the 0:20:00 and 0:40:00 marks, for a total of 60 workers.
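A sketch of how a client might interpret the stages block, assuming the semantics just described: each `time` is measured from the start of the ingest and each `workers` value is added to the running total. (k6's own `stages` use `duration` and `target` with different, ramp-to-target semantics, so this is only borrowing the shape.) `load_stages` and `workers_at` are hypothetical helpers.

```python
import json
import re
from typing import List, Tuple

_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Parse '10s', '20m' or '1h' (bare numbers mean seconds) into seconds."""
    match = re.fullmatch(r"(\d+)([smh]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2) or "s"]


def load_stages(config_text: str) -> List[Tuple[int, int]]:
    """Return (start_second, workers_added) pairs sorted by start time."""
    stages = json.loads(config_text)["stages"]
    return sorted((parse_duration(s["time"]), int(s["workers"])) for s in stages)


def workers_at(stages: List[Tuple[int, int]], elapsed_s: float) -> int:
    """Total workers running once `elapsed_s` seconds have passed."""
    return sum(added for start, added in stages if start <= elapsed_s)


CONFIG = """
{
    "stages": [
        { "time": "10s", "workers": 10 },
        { "time": "20s", "workers": 10 },
        { "time": "20m", "workers": 20 },
        { "time": "40m", "workers": 20 }
    ]
}
"""
```

Loading `CONFIG` and querying `workers_at` reproduces the timeline above: 10 workers at 10 s, 20 from 20 s until 20 minutes, 40 from 20 minutes, and 60 from 40 minutes on.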

Such a configuration would be particularly useful for, say, pulling from Google Cloud Storage, where read-rate limits are doubled no sooner than 20 minutes after an ingest starts.

Engine should take a config object

The constructor for ingest.core.engine.Engine should take a configuration object, so Engines can be created programmatically. The configuration object can be a dict representing the contents of a configuration file.

For convenience, and to preserve the current behavior, it should still be possible to pass a filename, which is then parsed into a configuration object.
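A minimal sketch of a constructor that accepts either form. `EngineSketch` and `load_config` are hypothetical stand-ins; the real `ingest.core.engine.Engine` signature is not shown here, so this only illustrates the dict-or-filename dispatch, assuming configuration files are JSON.

```python
import json
import os
from typing import Any, Mapping, Union

# Either an already-parsed configuration or a path to a JSON config file.
ConfigSource = Union[str, os.PathLike, Mapping[str, Any]]


def load_config(config: ConfigSource) -> dict:
    """Return a configuration dict from a mapping or a JSON file path."""
    if isinstance(config, Mapping):
        return dict(config)  # shallow copy so later edits don't leak back
    with open(config, "r", encoding="utf-8") as fh:
        return json.load(fh)


class EngineSketch:
    """Hypothetical engine that can be built programmatically or from a file."""

    def __init__(self, config: ConfigSource) -> None:
        self.config = load_config(config)
```

Checking for a `Mapping` first, rather than for `str`, keeps the filename path working for both strings and `pathlib.Path` objects.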

[Suggestion] Specify preferred queue ordering with flag or config

There are certainly cases where an arbitrary upload order is preferable, but it would be very helpful to be able to specify upload_order=contiguous on some workers to confirm that a new plugin is working correctly, without having to wait for ≈dataset_size-(dataset_size/cuboid_size) uploads to finish first.

A similar effect is possible by setting a smaller ingest extent and running a small test ingest first, but a queue-ordering option would get results into users' hands sooner, without configuring multiple ingest jobs.
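One way to picture the two orderings: the sketch below enumerates tile indices for an extent of `nx × ny × nz` tiles and either emits them slice by slice, row by row ("contiguous") or shuffles them ("arbitrary"). The `upload_order` values come from the suggestion; `tile_order` and the (x, y, z) tile-index representation are hypothetical and do not reflect how ingest-client actually builds its upload queue.

```python
import itertools
import random
from typing import Iterator, List, Optional, Tuple

Tile = Tuple[int, int, int]  # (x, y, z) tile indices within the ingest extent


def tile_order(nx: int, ny: int, nz: int,
               upload_order: str = "arbitrary",
               seed: Optional[int] = None) -> Iterator[Tile]:
    """Yield tile indices in the requested (hypothetical) upload order."""
    # z outermost and x innermost, so a contiguous run fills one slice at a time.
    tiles: List[Tile] = [
        (x, y, z)
        for z, y, x in itertools.product(range(nz), range(ny), range(nx))
    ]
    if upload_order == "contiguous":
        return iter(tiles)
    if upload_order == "arbitrary":
        random.Random(seed).shuffle(tiles)  # seeded for reproducible tests
        return iter(tiles)
    raise ValueError(f"unknown upload_order: {upload_order!r}")
```

With contiguous ordering, the first slice of the dataset is complete after only `nx * ny` uploads, which is what makes a quick visual check of a new plugin possible.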
