savkov / planchet
Your large data processing personal assistant
License: MIT License
The client is trying to import from other modules in the __init__ module.
https://github.com/savkov/planchet/blob/master/planchet/__init__.py#L1
I forgot to add a client method for /purge 🤦
Currently, jobs are running checks against served items whenever they receive items. It would be nice to have special jobs that only receive.
Docker-compose uses the redis name for the container it creates. It's then impossible to run make install-redis.
Currently, the automated CI test coverage does not include the app and client tests, which are a significant part of the project. Ideally, we should be running them inside the docker-compose tests and reporting the coverage.
If multiple users or jobs are running in parallel, they could write to the same place, which could be dangerous.
Suggestion:
[...] force mechanism for when this is the desired effect
The current ledger is basically the Redis client, and each item is recorded 1-to-1 in Redis. This is easy and lazy, but it gets inefficient when Redis needs to sort through millions of records in order to update the 100 records it just received in a batch.
Proposed solutions:
[...] Ledger class.
Allow processors to send a special flag to indicate an error so that items can be re-submitted for processing.
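A minimal sketch of how such an error flag might be handled on the receiving side. It uses a plain dictionary in place of the Redis-backed ledger, and the function name `record_results` and the `(item_id, payload, failed)` triple are assumptions made for illustration, not part of Planchet's API.

```python
from typing import Dict, Iterable, List, Tuple

SERVED = "SERVED"
RECEIVED = "RECEIVED"


def record_results(
    ledger: Dict[int, str],
    results: Iterable[Tuple[int, object, bool]],
) -> List[Tuple[int, object]]:
    """Update the ledger from (item_id, payload, failed) triples.

    Returns the (item_id, payload) pairs that should be written out.
    """
    to_write = []
    for item_id, payload, failed in results:
        if failed:
            # Forget the item so it looks unprocessed and can be served again.
            ledger.pop(item_id, None)
        else:
            ledger[item_id] = RECEIVED
            to_write.append((item_id, payload))
    return to_write
```

Dropping the failed item from the ledger is only one option; a dedicated error status would also make it possible to count or cap retries.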
At the moment, anyone can create jobs and do processing as part of existing jobs. As this is meant to be a tool that lives in a sandbox, this is acceptable, but the risk of users interfering with each other's work remains. Therefore, a simple solution can go a long way.
Proposal:
[...]
CsvReader and CsvWriter use a list as their input/output item structure. The order of the list matters. It may be better to force this to be a dictionary, probably through a new pair of classes that use JSON for their item structure.
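As a rough illustration of dictionary-shaped items, the sketch below reads and writes CSV with the standard library's `csv.DictReader` and `csv.DictWriter`, so each item is keyed by column name rather than by position. The function names `read_dict_items` and `write_dict_items` are hypothetical and do not mirror Planchet's reader or writer interface.

```python
import csv
from typing import Dict, Iterable, Iterator, List, TextIO


def read_dict_items(handle: TextIO, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield batches of rows, each row a dict keyed by the CSV header."""
    batch: List[Dict[str, str]] = []
    for row in csv.DictReader(handle):
        batch.append(dict(row))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


def write_dict_items(
    handle: TextIO, fieldnames: List[str], items: Iterable[Dict[str, str]]
) -> None:
    """Write dict items as CSV; column order comes from fieldnames, not the items."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(items)
```

Because the rows are plain dictionaries, they serialise to JSON objects directly, and the order of keys inside an item no longer matters.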
Currently, there is no mechanism for the served items to be re-served in case the processor has died.
The proposed solution:
[...] SERVED items before anything else
Limitation:
[...] SERVED items [...] RECEIVED status
Currently, /clean is a nice way to clean the ledger and restart a job that may have gone wrong. However, it doesn't remove the output file.
Suggestion:
[...] output [...] True
Currently, overwrite, when used as part of the /receive endpoint, overwrites the output file. This is the desired behaviour, but what is often not clear is that each following request will do the same. So, to protect users from themselves, we should act on this only if there are no active items in the job ledger.
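Two of the ledger-related ideas above, serving SERVED items again before anything else and honouring overwrite only when the ledger holds no items, can be sketched with an in-memory dictionary standing in for the ledger. The helper names `next_batch` and `may_overwrite`, and the choice to treat any recorded item as "active", are assumptions for illustration only.

```python
from typing import Dict, List

SERVED = "SERVED"
RECEIVED = "RECEIVED"


def next_batch(ledger: Dict[int, str], total_items: int, batch_size: int) -> List[int]:
    """Pick item ids to serve: SERVED items first, then items never seen."""
    stale = sorted(i for i, status in ledger.items() if status == SERVED)
    batch = stale[:batch_size]
    for item_id in range(total_items):
        if len(batch) >= batch_size:
            break
        if item_id not in ledger:
            batch.append(item_id)
    for item_id in batch:
        ledger[item_id] = SERVED
    return batch


def may_overwrite(ledger: Dict[int, str], overwrite: bool) -> bool:
    """Allow overwriting the output only while the ledger is still empty."""
    return bool(overwrite and not ledger)
```

This sketch cannot tell a dead processor from a slow one, so an item that is still being worked on could be served twice.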
We need a convenient way to clean up Redis remotely. An endpoint that cleans all entries for a job, or for all jobs, would let users do this without logging in to their planchet server.
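The core of such an endpoint might look like the sketch below. It assumes that a job's ledger keys share a `<job_name>:` prefix, which is a guess about the key layout rather than something stated here, and it only relies on the `scan_iter` and `delete` methods that the redis-py client provides.

```python
from typing import Optional


def purge(client, job_name: Optional[str] = None) -> int:
    """Delete ledger keys for one job, or every key when no job is given.

    `client` is expected to behave like a redis-py client.
    Returns the number of keys removed.
    """
    # Assumed key layout: "<job_name>:<item_id>".
    pattern = f"{job_name}:*" if job_name else "*"
    removed = 0
    for key in client.scan_iter(match=pattern):
        removed += client.delete(key)
    return removed
```

With no job name the pattern matches every key in the database, so this form is only safe when the Redis instance is dedicated to Planchet.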
So far Planchet has assumed that the data would be stored in a single file both in the reading and writing mode. Sometimes it would be more convenient to split data into smaller files for storage reasons or access reasons. Or maybe because that's how the data was stored originally.
Suggestion:
Add new readers and writers that are able to operate with batches of files. Ideally, we should try to wrap around the existing ones.
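One way to wrap an existing single-file reader is to chain it over a list of paths and re-batch the combined stream, as in the sketch below. The names `read_batches` and `read_lines` are hypothetical, and `read_lines` stands in for whichever per-file reader is being wrapped.

```python
from typing import Callable, Iterable, Iterator, List


def read_lines(path: str) -> Iterator[str]:
    """Stand-in single-file reader: yields one item per line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.rstrip("\n")


def read_batches(
    paths: Iterable[str],
    open_reader: Callable[[str], Iterable[str]],
    batch_size: int,
) -> Iterator[List[str]]:
    """Yield fixed-size batches that may span file boundaries."""
    batch: List[str] = []
    for path in paths:
        for item in open_reader(path):
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        # Emit whatever is left after the last file.
        yield batch
```

Letting batches span file boundaries keeps the batch size stable for processors; a variant that never mixes files in one batch would simply flush the batch at the end of each file.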