appliedai-initiative / accsr
Lightweight library for accessing data and configuration
License: MIT License
Blocked by #6
When pulling from e.g. google storage with wrong credentials, the process seems to simply hang and not produce an error. We should ping the storage provider prior to an operation and let the user specify a connection_timeout, raising an error if connecting fails.
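A minimal sketch of the timeout idea, not accsr's actual implementation. The helper name `call_with_timeout` and the `probe` callable are hypothetical; only `connection_timeout` comes from the proposal above. The probe (for example, a cheap "list one object" call against the provider) runs in a daemon thread, and a `ConnectionError` is raised if it fails or does not return in time.

```python
import threading
from typing import Any, Callable, Dict


def call_with_timeout(probe: Callable[[], Any], connection_timeout: float) -> Any:
    """Run `probe` and raise ConnectionError if it fails or exceeds the timeout.

    `probe` is a hypothetical stand-in for a cheap request to the storage provider.
    """
    outcome: Dict[str, Any] = {}

    def target() -> None:
        try:
            outcome["value"] = probe()
        except Exception as exc:  # report any provider error to the caller
            outcome["error"] = exc

    # A daemon thread does not keep the interpreter alive if the probe hangs.
    thread = threading.Thread(target=target, daemon=True)
    thread.start()
    thread.join(connection_timeout)

    if thread.is_alive():
        raise ConnectionError(
            f"Storage provider did not respond within {connection_timeout} s"
        )
    if "error" in outcome:
        raise ConnectionError("Connecting to the storage provider failed") from outcome["error"]
    return outcome["value"]
```

A hanging probe still occupies its thread in the background; the sketch only guarantees that the caller gets an error instead of waiting indefinitely.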
We should make it easier to push/pull a bunch of paths based on patterns. For that we should add an except_matches kwarg to permit simple exclusion of files. The current regex kwarg should be renamed to if_matches. This would permit things like
storage.push("data/**/*.jpg", except_matches=r".*test.*")
We could additionally allow passing an except_condition: Callable[[str], bool] = None (or do you think if_condition is more natural?), in which case the above can be rewritten as
storage.push("data/**/*.jpg", except_condition=lambda n: "test" in n)
The condition could even be made more general, mapping the metadata object to a bool (thereby e.g. allowing filtering by size), at the cost of a more complicated interface for callables. @fariedabuzaid @AnesBenmerzoug what do you think?
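To make the proposed semantics concrete, here is a rough sketch of how the three kwargs could combine. It works on plain path strings rather than on accsr's storage objects; the function name `filter_paths` is hypothetical, and using `re.search` for the regex kwargs is an assumption rather than something settled in this issue.

```python
import re
from typing import Callable, Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    if_matches: Optional[str] = None,
    except_matches: Optional[str] = None,
    except_condition: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Keep paths that pass the inclusion regex and none of the exclusions."""
    selected = []
    for path in paths:
        # Inclusion: if given, the path must match if_matches.
        if if_matches is not None and not re.search(if_matches, path):
            continue
        # Exclusion by regex.
        if except_matches is not None and re.search(except_matches, path):
            continue
        # Exclusion by arbitrary callable on the path name.
        if except_condition is not None and except_condition(path):
            continue
        selected.append(path)
    return selected
```

With this shape, `except_matches=r".*test.*"` and `except_condition=lambda n: "test" in n` select the same files, which is the equivalence described above.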
Add a mode that allows existing files to be skipped to the RemoteStorage push/pull methods.
Currently, pushing and pulling of directories does not check whether the entire operation can be performed successfully (e.g. if modified files already exist and overwrite_existing=False). This leads to a partial execution before an error is thrown and thus to an unpredictable state.
We should check if the entire operation can be performed before pushing/pulling anything.
Also, to be more familiar to git users, overwrite_existing should be renamed to force. This is a breaking change, so the minor version should be bumped.
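The up-front check could look roughly like the sketch below. It assumes both sides can be summarized as a mapping from path to content hash; the function names `find_conflicts` and `check_transfer` are hypothetical, and only the `force` flag comes from the proposal above. All conflicts are collected first, so the error is raised before any file has been transferred.

```python
from typing import Dict, List


def find_conflicts(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return paths that exist on both sides with different content hashes."""
    return sorted(
        path
        for path, digest in source.items()
        if path in target and target[path] != digest
    )


def check_transfer(source: Dict[str, str], target: Dict[str, str], force: bool = False) -> None:
    """Raise before any transfer starts if modified files would be overwritten."""
    conflicts = find_conflicts(source, target)
    if conflicts and not force:
        raise FileExistsError(
            "Refusing to overwrite modified files (pass force=True to override): "
            + ", ".join(conflicts)
        )
```

Calling `check_transfer` once before the transfer loop keeps the operation all-or-nothing with respect to overwrite conflicts; failures during the transfer itself (network errors and the like) are outside the scope of this sketch.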
We have essentially no documentation on how to use accsr. The interplay of storage and config modules should be demonstrated in notebooks. See tests/conftest.py for an example of how a storage service is instantiated during local testing and in CI.
When installing dependencies according to CONTRIBUTING.md, the required dependencies conflict. The dev requirements demand isort == 5.6.4 while the linting requirements demand isort ~= 5.12.0.
Add a new flag to the RemoteStorage push/pull operation.
If True, the function should determine and return the operations that need to be conducted without actually performing them.
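As a sketch of the intended behaviour (not the RemoteStorage API itself): the flag name `dry_run`, the function `push_files`, and the `upload` callback are all hypothetical, since the issue does not name the flag. The planned operations are computed the same way in both modes and are only executed when the flag is off.

```python
from typing import Callable, Iterable, List


def push_files(
    paths: Iterable[str],
    upload: Callable[[str], None],
    dry_run: bool = False,
) -> List[str]:
    """Return the paths that would be pushed; upload them unless dry_run is set."""
    planned = sorted(paths)  # deterministic order for the returned plan
    if not dry_run:
        for path in planned:
            upload(path)
    return planned
```

Returning the same plan in both modes lets callers inspect a dry run and then repeat the call with the flag off, knowing the same operations will be attempted.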
We should avoid reinventing the wheel here
We should keep support for json as well for existing projects and because some auto-generated configurations come in json format
Accsr seems to re-load the configuration every time it does something - and then tell me about it at INFO level.
I would propose to either
Example:
INFO 2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO 2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
..
INFO 2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO 2024-01-17 13:46:40,388 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
..
INFO 2024-01-17 13:47:17,584 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config.json
INFO 2024-01-17 13:47:17,584 accsr.config:__init__ - Reading configuration from C:\Users\DominikJain\Dev\rl4sem\sem_env\config_local.json
...
Computing the hash can take a lot of time for large files
@MischaPanch This fixes the tests but it currently breaks caching.
WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
I think running jobs inside containers is the way to go and we should invest some time to make caching work with it.
Originally posted by @AnesBenmerzoug in #1 (comment)
@MischaPanch can you release the current dev branch? I found a bug in the old version which seems to be fixed now.
Would be great to get the fix installed. Not urgent though, I can work with the dev branch for now :)
Due to its dependence on pyyaml, this package is currently not installable with poetry. The error is caused by current versions of pyyaml as documented here.
This allows pulling with absolute paths, which is very handy