vkvam / fpipe
A framework for processing file-like objects with pipes
Home Page: https://github.com/vkvam/fpipe
License: MIT License
Line 36 in 5ab853b
FileInfoGenerator and CalculatedFileMeta are connected, but how and why is not clearly defined.
CalculatedFileMeta(FileMeta) is two things, size and checksum; split these. Provide multiple calcs to FileInfo...
Explain intent and alternatives (input and output file types) of each generator and the difference between iter flush() and flush_iter().
Before documenting, ensure there is only one file type. File and FileStream can be differentiated by whether an optional stream is present on a file or not.
FileMeta could be renamed to FileData, and the stream could be one of these entries. For performance reasons, set the stream as an instance variable as well as in the FileData map. File is then turned into a pure container, and FileData can be accessed through get_item().
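A minimal sketch of the container idea above, under assumptions: the class layout, the `is_stream` property and the `"stream"` key are hypothetical and not taken from the fpipe code base. It only illustrates how one File type with an optional stream could replace the File/FileStream pair.

```python
import io
from typing import Any, BinaryIO, Dict, Optional


class File:
    """Pure container: every piece of file data lives in one map (hypothetical layout)."""

    def __init__(self, stream: Optional[BinaryIO] = None, **data: Any) -> None:
        self._data: Dict[str, Any] = dict(data)
        # Keep the stream as an instance attribute for fast access,
        # and mirror it in the map so lookups stay uniform.
        self.stream = stream
        if stream is not None:
            self._data["stream"] = stream

    def get_item(self, key: str) -> Any:
        """Return one FileData entry; raises KeyError if it is missing."""
        return self._data[key]

    @property
    def is_stream(self) -> bool:
        # What used to be a separate FileStream type is just "a stream is present".
        return self.stream is not None


plain = File(path="a.txt", size=3)
streamed = File(stream=io.BytesIO(b"abc"), path="a.txt")
```

With this shape a generator could declare which FileData keys it needs instead of which File subclass it accepts.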
For each Generator specify acceptable FileData combinations.
FileMeta and FileInfoGenerator are extremely confusing; find a better way to structure this.
FileMetaGenerator?
FileInfoGenerator should also be split up, opening the door for multiple instances of metadata produced from one generator, all linked to one file.
Could this be picked up from an environment variable?
Apparently not, check https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer
The suggestion is to set this to 64 KB and allow it to be modified with an env var.
Also, a unix fifo should work as a drop-in replacement for BytesBuffer, a nice cheat if BytesBuffer is profiled and proved to be slow. BytesBuffer should still be a fallback, because windows...
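The 64 KB default with an env var override could look roughly like this. The variable name `FPIPE_PIPE_BUFFER_SIZE` and the function name are assumptions made for illustration; fpipe does not necessarily define either.

```python
import os

DEFAULT_PIPE_BUFFER_SIZE = 64 * 1024  # 64 KB, the default suggested above


def pipe_buffer_size(env_var: str = "FPIPE_PIPE_BUFFER_SIZE") -> int:
    """Return the buffer size in bytes, honouring a (hypothetical) env var override."""
    raw = os.environ.get(env_var)
    if raw is None:
        return DEFAULT_PIPE_BUFFER_SIZE
    try:
        size = int(raw)
    except ValueError:
        # Unparseable override: fall back rather than fail at import time.
        return DEFAULT_PIPE_BUFFER_SIZE
    return size if size > 0 else DEFAULT_PIPE_BUFFER_SIZE
```

Falling back silently on a bad value is one possible choice; raising an error would be equally defensible.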
Generalise some mechanism to shut down all threads when one thread/process fails.
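One possible mechanism, sketched with the standard library only: every worker receives a shared `threading.Event` and polls it, and a wrapper sets the event as soon as any worker raises. The names `run_all`, `failing` and `patient` are hypothetical, and this covers threads only, not child processes.

```python
import threading
from typing import Callable, List

Worker = Callable[[threading.Event], None]


def run_all(workers: List[Worker]) -> List[Exception]:
    """Run workers in threads; the first failure asks all others to stop."""
    stop = threading.Event()
    errors: List[Exception] = []
    lock = threading.Lock()

    def guard(worker: Worker) -> None:
        try:
            worker(stop)
        except Exception as exc:
            with lock:
                errors.append(exc)
            stop.set()  # signal every other worker to shut down

    threads = [threading.Thread(target=guard, args=(w,)) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors


def failing(stop: threading.Event) -> None:
    raise RuntimeError("boom")


def patient(stop: threading.Event) -> None:
    # Cooperative worker: polls the shared event instead of blocking forever.
    while not stop.wait(0.01):
        pass


errors = run_all([patient, failing])
```

This only works if workers check the event regularly; a thread blocked in a read would need a timeout or a closed pipe to notice.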
An example is Program, the data is specified directly on the generator, not on the source File or as a MetaResolver. Fix any inconsistencies in all generators.
Lack of support seems to be down to errors in the typing std library, specifically typing.BinaryIO
Today meta is retrieved by calling file.parent.parent.....meta, in a large pipeline this is not very convenient.
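A more convenient lookup could walk the parent chain once, on behalf of the caller. The sketch below assumes each file has a `meta` mapping and an optional `parent`; the class name `PipelineFile` and the method `find_meta` are hypothetical.

```python
from typing import Any, Dict, Optional


class PipelineFile:
    """Stand-in for a file in a pipeline, linked to the file it was derived from."""

    def __init__(
        self,
        meta: Optional[Dict[str, Any]] = None,
        parent: Optional["PipelineFile"] = None,
    ) -> None:
        self.meta: Dict[str, Any] = dict(meta or {})
        self.parent = parent

    def find_meta(self, key: str, default: Any = None) -> Any:
        """Return the nearest value for key, searching this file and then its ancestors."""
        node: Optional["PipelineFile"] = self
        while node is not None:
            if key in node.meta:
                return node.meta[key]
            node = node.parent
        return default


source = PipelineFile({"path": "a.txt"})
middle = PipelineFile(parent=source)
leaf = PipelineFile({"size": 3}, parent=middle)
```

Here `leaf.find_meta("path")` replaces `leaf.parent.parent.meta["path"]`, regardless of how long the pipeline is.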
T/T2 are source/target.
Property: files return source files.
Exit gracefully when the program itself completes before the whole stream is read.
Split/join
Split one file into multiple:
By line, count, regex/simple pattern, combinations. A predicate function with a buffer setting would be simpler. Maybe define a few predicate presets. Example: split on new day.
Join could be done in the same way. Example: merge on new month.
Stop previous:
Signal previous generators to stop or flush remaining data.
Could be a combination: split count >1 signal stop
Branch/merge
Let's say ffmpeg outputs both video and audio in predictably sized chunks; we could branch audio to one file and video to another, by count. The two files could maybe be merged into ffmpeg again for transcoding.
Split/join: sequential
Branch/merge: parallel
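The predicate-based split mentioned above ("split on new day") could be sketched as a generator over lines. Everything here is an assumption for illustration: the function names, the choice to compare the previous and current line, and the premise that each line starts with an ISO date.

```python
from typing import Callable, Iterable, Iterator, List, Optional


def split_when(
    lines: Iterable[bytes],
    starts_new: Callable[[bytes, bytes], bool],
) -> Iterator[List[bytes]]:
    """Yield groups of lines; a new group starts when starts_new(previous, current) is true."""
    chunk: List[bytes] = []
    previous: Optional[bytes] = None
    for line in lines:
        if previous is not None and starts_new(previous, line):
            yield chunk
            chunk = []
        chunk.append(line)
        previous = line
    if chunk:
        yield chunk


def new_day(previous: bytes, current: bytes) -> bool:
    # Predicate preset: assumes each line begins with a YYYY-MM-DD date.
    return previous[:10] != current[:10]


log = [b"2020-01-01 a", b"2020-01-01 b", b"2020-01-02 c"]
parts = list(split_when(log, new_day))
```

A join ("merge on new month") could reuse the same predicate shape, deciding when to close the current output instead of when to open a new one.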
Consider passing arguments to S3 gen and S3Writer as boto3 compatible ones.
FileMeta could have a context concept, where a value is re-formatted and named according to the context it is used in.
Path = boto3 Key
Version = boto3 Version
This could be in conflict with pathname_resolver in the S3/Local generators. In any case, this should be bound in some way to what we are expecting. Let's say S3 expects something that creates Path, and the default is getting the path from the source file.
Introduce version concept, allow fetching S3 objects by bucket, key, version or all objects by prefix
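As a rough sketch of the Path/Version context mapping, the helper below renames generic values to the keyword arguments that boto3's S3 `get_object` call accepts (`Bucket`, `Key`, `VersionId`). The helper name and its signature are hypothetical; it only builds the dictionary and does not import or call boto3.

```python
from typing import Dict, Optional


def s3_get_object_kwargs(
    bucket: str,
    path: str,
    version: Optional[str] = None,
) -> Dict[str, str]:
    """Translate generic path/version values into boto3-style keyword arguments."""
    kwargs = {"Bucket": bucket, "Key": path}  # Path maps to the S3 Key
    if version is not None:
        kwargs["VersionId"] = version  # omitted means "latest version"
    return kwargs


latest = s3_get_object_kwargs("my-bucket", "logs/a.txt")
pinned = s3_get_object_kwargs("my-bucket", "logs/a.txt", version="abc123")
```

The result could be passed as `client.get_object(**pinned)` on a boto3 S3 client. Fetching all objects by prefix would need a separate listing call, which is left out here.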