fpipe's Issues

Documentation

Explain intent and alternatives (input and output file types) of each generator and the difference between iter flush() and flush_iter().

Before documenting, ensure there is only one file type. File and FileStream can be differentiated by whether an optional stream is present on a file or not.

FileMeta could be changed to FileData, and the stream could be one of these. For performance reasons, set the stream as an instance variable as well as in the FileData map. File is turned into a pure container, and FileData can be accessed through get_item().

For each Generator specify acceptable FileData combinations.

FileMeta is confusing

FileMeta and FileInfoGenerator are extremely confusing; find a better way to structure this.

FileMetaGenerator?

FileInfoGenerator should also be split up, opening the door for multiple instances of metadata produced by one generator, all linked to one file.

2**14 hardcoded everywhere, related to unix pipe buffer

Could this be picked up from an environment variable?

Apparently not, check https://unix.stackexchange.com/questions/11946/how-big-is-the-pipe-buffer

The suggestion is to set this to 64 KB and allow it to be overridden with an environment variable.
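A minimal sketch of that suggestion, assuming a hypothetical variable name FPIPE_CHUNK_SIZE (no such variable exists in fpipe today) and falling back to the 64 KB default when the value is missing or invalid:

```python
import os

# Suggested default from this issue: 64 KB instead of the hardcoded 2**14.
DEFAULT_CHUNK_SIZE = 64 * 1024


def get_chunk_size(environ=None, var_name="FPIPE_CHUNK_SIZE"):
    """Return the chunk size in bytes.

    `var_name` is a hypothetical environment variable name used only for
    illustration. Missing, non-integer, or non-positive values fall back
    to the default.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(var_name)
    if raw is None:
        return DEFAULT_CHUNK_SIZE
    try:
        size = int(raw)
    except ValueError:
        return DEFAULT_CHUNK_SIZE
    return size if size > 0 else DEFAULT_CHUNK_SIZE
```

Reading the value in one place would also remove the repeated 2**14 literals, since every generator could call the same helper.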

Also, a Unix FIFO should work as a drop-in replacement for BytesBuffer, which is a nice cheat if BytesBuffer is profiled and proven to be slow. BytesBuffer should still be the fallback, because Windows...

https://alexdelorenzo.dev/programming/2019/04/14/buffer

Avoid deadlocks

Generalise some mechanism to shut down all threads when one thread/process fails.
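One possible shape for such a mechanism, sketched with a shared threading.Event. The run_all helper and the worker signature are assumptions for illustration, not fpipe's API, and the approach only works if every worker checks the event cooperatively:

```python
import threading


def run_all(workers):
    """Run each worker(stop_event) in its own thread.

    Hypothetical helper: the first worker that raises sets `stop_event`,
    which the other workers are expected to poll so they can exit early.
    The first recorded exception is re-raised in the calling thread.
    """
    stop_event = threading.Event()
    errors = []
    lock = threading.Lock()

    def guarded(worker):
        try:
            worker(stop_event)
        except BaseException as exc:
            with lock:
                errors.append(exc)
            stop_event.set()  # tell every other worker to shut down

    threads = [threading.Thread(target=guarded, args=(w,)) for w in workers]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    if errors:
        raise errors[0]
```

A worker blocked in a read that never returns will not see the event, so blocking calls would still need timeouts or closed file handles to avoid the deadlock.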

Python 3.6 support

Lack of support seems to be down to errors in the standard library's typing module, specifically typing.BinaryIO.

BrokenPipe

Exit gracefully when the program itself completes before the whole stream is read.
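A sketch of one way to handle this, assuming the failure shows up as a BrokenPipeError while writing to a reader that has already exited (for example when piping into head). The function names are illustrative; the devnull redirect follows the pattern recommended in the Python signal documentation:

```python
import os
import sys


def write_stream(chunks, out):
    """Write chunks to `out`.

    Returns True if everything was written, False if the reading side
    closed the pipe before the stream was finished.
    """
    try:
        for chunk in chunks:
            out.write(chunk)
        out.flush()
    except BrokenPipeError:
        return False
    return True


def silence_stdout():
    """Point stdout at devnull so the interpreter's flush at exit
    does not raise a second BrokenPipeError."""
    devnull = os.open(os.devnull, os.O_WRONLY)
    os.dup2(devnull, sys.stdout.fileno())
```

A command-line entry point could call write_stream(chunks, sys.stdout.buffer) and, on a False result, call silence_stdout() before exiting.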

Split/join and branch/merge

Split/join
Split one file into multiple:
By line, count, regex/simple pattern, or combinations of these. A predicate function with a buffer setting would be simpler. Maybe define a few predicate presets. Example: split on new day.

Join could be done in the same way. Example: merge on new month.
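The predicate idea could look roughly like this. It is a sketch over an iterable of text records, not fpipe's File objects; the names split_on and new_day are made up for illustration, and the buffer setting mentioned above is left out:

```python
def split_on(records, starts_new_part):
    """Yield lists of consecutive records.

    A new list starts whenever starts_new_part(previous, current)
    returns True for two neighbouring records.
    """
    part = []
    previous = None
    for record in records:
        if part and starts_new_part(previous, record):
            yield part
            part = []
        part.append(record)
        previous = record
    if part:
        yield part


def new_day(previous, current):
    """Example preset: records start with an ISO date (YYYY-MM-DD)."""
    return previous[:10] != current[:10]
```

For example, list(split_on(lines, new_day)) groups log lines by their leading date. A "new month" preset comparing the first seven characters would work the same way for the join case.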

Stop previous:
Signal previous generators to stop or flush remaining data.

Could be a combination: split count >1 signal stop

Branch/merge
Let's say ffmpeg outputs both video and audio in predictably sized chunks; we could branch audio to one file and video to another, by count. The two files could maybe be merged into ffmpeg again for transcoding.

Split/join: sequential
Branch/merge: parallel

S3 improvements

Consider passing arguments to S3 gen and S3Writer as boto3 compatible ones.

FileMeta could have a context concept, where a value is re-formatted and named according to the context in which it is used.

Path = boto3 Key
Version = boto3 Version
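The mapping above could be expressed as a small lookup table. This is only a sketch: the metadata is modelled as a plain dict, to_s3_kwargs is a hypothetical helper, and VersionId is used because that is the parameter name boto3's get_object accepts for an object version:

```python
# Context table: fpipe-side name -> boto3 keyword argument.
S3_CONTEXT = {
    "Path": "Key",
    "Version": "VersionId",
}


def to_s3_kwargs(meta, bucket):
    """Translate a metadata dict into keyword arguments for an S3 call.

    Names without an S3 equivalent are ignored, and values that are
    None are skipped so optional fields such as Version can be left out.
    """
    kwargs = {"Bucket": bucket}
    for name, value in meta.items():
        if name in S3_CONTEXT and value is not None:
            kwargs[S3_CONTEXT[name]] = value
    return kwargs
```

The result could then be passed on as client.get_object(**kwargs), which keeps the arguments boto3 compatible.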

This could conflict with pathname_resolver in the S3/Local generators. Either way, this should be bound in some way to what we are expecting. Let's say S3 expects something that creates Path, and the default is to get the path from the source file.

S3 improvements

Introduce a version concept; allow fetching S3 objects by bucket, key and version, or all objects by prefix.
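For the prefix case, a sketch using boto3's list_objects_v2 paginator. The iter_keys helper is hypothetical, and the client is passed in so the function works with any object that offers the same get_paginator interface:

```python
def iter_keys(client, bucket, prefix):
    """Yield every object key under `prefix` in `bucket`.

    `client` is expected to behave like boto3.client("s3").
    Pages without a "Contents" entry (no matches) are skipped.
    """
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]
```

list_objects_v2 only reports current objects, so listing every version under a prefix would presumably go through list_object_versions instead. A single version can be fetched by passing VersionId to get_object.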
