accsr's People

Contributors

adrianokf, fariedabuzaid, mischapanch, sebastiantimwagner, slettner

accsr's Issues

Add a connection_timeout on RemoteStorage operations

When pulling from e.g. Google Storage with wrong credentials, the process seems to simply hang without producing an error. We should ping the storage provider before an operation and let the user specify a connection_timeout, raising an error if the connection fails.
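A minimal sketch of how such a check could look, using only the standard library. The names check_connection and StorageConnectionError, and the ping callable, are hypothetical and not part of accsr; ping stands in for any cheap provider request (for instance, listing a container).

```python
import threading
from typing import Callable


class StorageConnectionError(ConnectionError):
    """Hypothetical error type: the storage provider could not be reached."""


def check_connection(ping: Callable[[], object], connection_timeout: float) -> None:
    """Run `ping` in a daemon thread; raise if it fails or exceeds the timeout."""
    outcome = {}

    def _run() -> None:
        try:
            ping()
        except Exception as e:  # remember the failure so the caller can re-raise it
            outcome["error"] = e

    # A daemon thread does not keep the interpreter alive if the ping hangs forever.
    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    thread.join(connection_timeout)

    if thread.is_alive():
        raise StorageConnectionError(
            f"No response from the storage provider within {connection_timeout}s"
        )
    if "error" in outcome:
        raise StorageConnectionError(
            f"Connecting to the storage provider failed: {outcome['error']}"
        ) from outcome["error"]
```

With this approach a hanging ping is abandoned rather than cancelled, which is probably acceptable for a pre-flight check but would not interrupt a transfer that is already running.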

More convenient path selections in push/pull

We should make it easier to push/pull a set of paths based on patterns. To that end, we should:

  • Permit passing glob patterns to push
  • Permit passing glob patterns to pull
  • Add an except_matches kwarg that takes a regex, permitting simple exclusion of files. The current regex kwarg should be renamed to if_matches.

This would permit things like

storage.push("data/**/*.jpg", except_matches=r".*test.*") 

We could additionally allow passing an except_condition: Callable[[str], bool] = None (or do you think if_condition is more natural?), in which case the above can be rewritten as

storage.push("data/**/*.jpg", except_condition=lambda n: "test" in n) 

The condition could even be made more general, mapping the metadata object to a bool (thereby e.g. allowing filtering by size), at the cost of a more complicated interface for callables. @fariedabuzaid @AnesBenmerzoug what do you think?
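To make the proposal concrete, here is a rough sketch of the local path selection that push could perform before uploading. The function select_paths and its base_dir parameter are made up for illustration; only the except_matches and except_condition names come from the proposal above. It assumes the regex is applied with match semantics to the path relative to the base directory.

```python
import re
from pathlib import Path
from typing import Callable, List, Optional


def select_paths(
    base_dir: str,
    glob_pattern: str,
    except_matches: Optional[str] = None,
    except_condition: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Return files under base_dir matching glob_pattern, minus the exclusions."""
    excluded = re.compile(except_matches) if except_matches else None
    selected = []
    for path in sorted(Path(base_dir).glob(glob_pattern)):
        if not path.is_file():
            continue  # a glob can also match directories
        name = path.relative_to(base_dir).as_posix()
        if excluded is not None and excluded.match(name):
            continue  # dropped by the regex
        if except_condition is not None and except_condition(name):
            continue  # dropped by the callable
        selected.append(name)
    return selected
```

Generalizing the callable to receive a metadata object instead of a string would only change the type of the argument passed to except_condition in the loop.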

Transactional safety for push and pull in remote storage

Currently, pushing and pulling of directories does not check whether the entire operation can be performed successfully (e.g. if modified files already exist and overwrite_existing=False). This leads to a partial execution before an error is thrown and thus to an unpredictable state.

We should check if the entire operation can be performed before pushing/pulling anything.

Also, to be more familiar to git users, overwrite_existing should be renamed to force. This is a breaking change, so the minor version should be bumped.
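A possible shape for the pre-check, sketched with plain dicts that map paths to content hashes. The functions find_conflicts and push_all are hypothetical stand-ins, not the existing RemoteStorage code; the point is only that the whole plan is validated before the first file is touched.

```python
from typing import Dict, List


def find_conflicts(local_hashes: Dict[str, str], remote_hashes: Dict[str, str]) -> List[str]:
    """Return paths that already exist on the remote with different content."""
    return sorted(
        path
        for path, digest in local_hashes.items()
        if path in remote_hashes and remote_hashes[path] != digest
    )


def push_all(
    local_hashes: Dict[str, str],
    remote_hashes: Dict[str, str],
    force: bool = False,
) -> List[str]:
    """Push everything or nothing: fail before any upload if a conflict exists."""
    conflicts = find_conflicts(local_hashes, remote_hashes)
    if conflicts and not force:
        raise FileExistsError(
            f"Refusing to overwrite modified remote files: {conflicts}. "
            "Pass force=True to overwrite."
        )
    pushed = []
    for path, digest in sorted(local_hashes.items()):
        if remote_hashes.get(path) == digest:
            continue  # identical file already on the remote
        remote_hashes[path] = digest  # stands in for the actual upload
        pushed.append(path)
    return pushed
```

This only guards against conflicts known up front; a network failure halfway through could still leave a partial state.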

Improve docs by extending notebooks

We have essentially no documentation on how to use accsr. The interplay of the storage and config modules should be demonstrated in notebooks. See tests/conftest.py for an example of how a storage service is instantiated during local testing and in CI.

Add Simulation Mode

Add a new flag to the RemoteStorage push/pull operations.
If True, the operation should determine and return the actions that would be performed without actually performing them.
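One way this could look, as a sketch only: the flag name dry_run, the TransactionSummary class and the dict-based push below are assumptions for illustration, with dicts of path to bytes standing in for the local directory and the remote bucket.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TransactionSummary:
    """Hypothetical record of what a push did, or would do."""
    to_upload: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)
    executed: bool = False


def push(
    local_files: Dict[str, bytes],
    remote_files: Dict[str, bytes],
    dry_run: bool = False,
) -> TransactionSummary:
    """Plan the push first; only perform the uploads if dry_run is False."""
    summary = TransactionSummary()
    for path in sorted(local_files):
        if remote_files.get(path) == local_files[path]:
            summary.skipped.append(path)  # identical content already on the remote
        else:
            summary.to_upload.append(path)
    if dry_run:
        return summary  # report the plan without touching the remote
    for path in summary.to_upload:
        remote_files[path] = local_files[path]  # stands in for the actual upload
    summary.executed = True
    return summary
```

Returning the same summary type in both modes would let callers inspect the plan and the result with the same code.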

Support config in yaml format

We should keep supporting JSON as well, both for existing projects and because some auto-generated configurations come in JSON format.
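A small sketch of how both formats could be supported by dispatching on the file suffix. The function read_config_file is hypothetical, and the YAML branch assumes the third-party PyYAML package is installed; it is imported lazily so that JSON-only projects would not need it.

```python
import json
from pathlib import Path
from typing import Any, Dict


def read_config_file(path: str) -> Dict[str, Any]:
    """Load a configuration file, choosing the parser from the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        loader = json.load
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML; only required when a YAML config is actually used

        loader = yaml.safe_load
    else:
        raise ValueError(f"Unsupported configuration format: {suffix!r}")
    with open(path, encoding="utf-8") as f:
        return loader(f)
```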

Excessive logging/suboptimal config handling

Accsr seems to re-load the configuration every time it does something - and then tell me about it at INFO level.
I would propose to either

  • log at DEBUG level instead or
  • read the config only once and cache it (preferred).

Example:

INFO  2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO  2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
..
INFO  2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO  2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
..
INFO  2024-01-17 13:47:17,584 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO  2024-01-17 13:47:17,584 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
...
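For the preferred option, a minimal sketch of reading each file once and caching the result with functools.lru_cache, while logging at DEBUG level. The function read_config_cached is hypothetical and is not how accsr.config is currently structured.

```python
import functools
import json
import logging
from typing import Any, Dict

log = logging.getLogger(__name__)


@functools.lru_cache(maxsize=None)
def read_config_cached(path: str) -> Dict[str, Any]:
    """Read a JSON config file; repeated calls with the same path hit the cache."""
    log.debug("Reading configuration from %s", path)  # emitted once per path
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Two caveats with this approach: every caller receives the same dict object, so it should be treated as read-only, and edits to the file are not picked up until read_config_cached.cache_clear() is called.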

CI: make caching work within containers

@MischaPanch This fixes the tests but it currently breaks caching.

WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.

I think running jobs inside containers is the way to go and we should invest some time to make caching work with it.

Originally posted by @AnesBenmerzoug in #1 (comment)

chore: release version 0.3.5-dev0

@MischaPanch can you release the current dev branch? I found a bug in the old version which seems to be fixed now.
It would be great to get the fix installed. Not urgent though, I can work with the dev branch for now :)
