
transformer's People

Contributors: adamvinueza

transformer's Issues

Documentation of file overwriting is incorrect

The documentation says the destination file will be overwritten if it exists, but the __init__ method has a parameter that lets the user prevent overwriting. Also, by default the file is not overwritten.
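A minimal sketch of the behavior the documentation should describe, assuming the destination file system exposes an fsspec-style exists(path) method and using the overwrite keyword that appears elsewhere in these issues. The _check_dest helper, the choice of FileExistsError, and the SetFS stand-in are hypothetical.

```python
class Transform(object):
    def __init__(self, src_fs, dest_fs, overwrite=False):
        # overwrite defaults to False: an existing destination file is left
        # alone unless the caller explicitly passes overwrite=True.
        self.src_fs = src_fs
        self.dest_fs = dest_fs
        self.overwrite = overwrite

    def _check_dest(self, dest):
        # Hypothetical helper, meant to be called before opening dest for writing.
        if not self.overwrite and self.dest_fs.exists(dest):
            raise FileExistsError('%s exists and overwrite is False' % dest)


class SetFS(object):
    """Hypothetical stand-in for an fsspec file system; only exists() is implemented."""

    def __init__(self, paths):
        self.paths = set(paths)

    def exists(self, path):
        return path in self.paths
```

With this behavior, the docstring could say that an existing destination is replaced only when overwrite=True.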

Transformer should have a simple copy function

Copying a file from one file system to another should be easy:

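import s3fs
from fsspec.implementations.local import LocalFileSystem
from transformer.transform import Transform

s3 = s3fs.S3FileSystem(...)
tr = Transform(src_fs=s3, dest_fs=LocalFileSystem())
tr.copy('s3://my-bucket/my-file.txt', '/path/to/my-file.txt')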
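A minimal sketch of such a copy method, assuming both file systems follow the fsspec convention of an open(path, mode) method that returns a file-like object. The bufsize default and the PlainLocalFS stand-in are hypothetical; PlainLocalFS exists only so the example runs without S3.

```python
import io
import shutil


class Transform(object):
    def __init__(self, src_fs, dest_fs):
        self.src_fs = src_fs
        self.dest_fs = dest_fs

    def copy(self, src, dest, bufsize=64 * 1024):
        """Copy src (on src_fs) to dest (on dest_fs), bufsize bytes at a time."""
        with self.src_fs.open(src, 'rb') as rdr:
            with self.dest_fs.open(dest, 'wb') as wr:
                # copyfileobj streams the data in chunks, so the file never
                # has to fit in memory.
                shutil.copyfileobj(rdr, wr, bufsize)


class PlainLocalFS(object):
    """Hypothetical stand-in for fsspec's LocalFileSystem (wraps io.open)."""

    def open(self, path, mode='rb'):
        return io.open(path, mode)
```

With real fsspec file systems, such as s3fs.S3FileSystem and LocalFileSystem, the call looks the same.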

Transform should have an easy way to create S3 and local file systems

Something like this:

from transformer.transform import Transform, S3FS, LocalFS
aws_profile='dev'
tr = Transform(src_fs=S3FS(aws_profile), dest_fs=LocalFS())
tr.copy(src, dest)

The S3FS function should automatically find the credentials for the given AWS profile name.
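A minimal sketch of the two helpers. It assumes a recent s3fs release whose S3FileSystem accepts a profile keyword naming an entry in the AWS credentials/config files; older releases may not, so check the version you pin. The imports are deferred so that LocalFS does not require s3fs to be installed.

```python
def S3FS(aws_profile=None, **kwargs):
    """Return an s3fs file system that authenticates with a named AWS profile."""
    import s3fs  # deferred: only needed when an S3 file system is requested

    # Assumption: S3FileSystem(profile=...) resolves credentials for the profile.
    return s3fs.S3FileSystem(profile=aws_profile, **kwargs)


def LocalFS(**kwargs):
    """Return fsspec's local file system implementation."""
    from fsspec.implementations.local import LocalFileSystem

    return LocalFileSystem(**kwargs)
```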

fcopy function: copy from src to dest, passing the data through an intermediate function

The filter function takes a reader and some keyword arguments and returns a bytes object that can be consumed by a writer. fcopy keeps calling the filter until it returns an empty result.

"""
class Transform(object):
    # ...

    def fcopy(self, src, dest, bufsize, filter, **kwargs):
        with self.src_fs.open(src, 'rb', bufsize) as rdr:
            with self.dest_fs.open(dest, 'wb', bufsize) as wr:
                while True:
                    b = filter(rdr, **kwargs)
                    if not b:
                        w.flush()
                        break
                    wr.write(b)
"""

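import gnupg  # python-gnupg
from transformer.transform import Transform, S3FS

# A sample filter function: encrypt an input stream.
def encrypt(src_stream, gpg, password):
    # Read everything that is left. On the next call the stream is
    # exhausted, so b'' is returned and fcopy stops.
    # Note: this holds the whole file in memory.
    data = src_stream.read()
    if not data:
        return b''
    # Symmetric encryption with a passphrase (no recipients).
    result = gpg.encrypt(data, None, symmetric=True, passphrase=password)
    if not result.ok:
        raise RuntimeError('gpg encryption failed: %s' % result.status)
    return result.data  # ciphertext as bytes

passwd = 'my-secret-gpg-password'
gpg = gnupg.GPG()
# configure gpg with key and so on


s3 = S3FS(my_profile)
tr = Transform(src_fs=s3, dest_fs=s3, overwrite=True)
tr.fcopy(src, dest, 1024, encrypt, gpg=gpg, password=passwd)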
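Because fcopy stops only when the filter returns empty bytes, filters that read the input in pieces fit the contract most naturally. A self-contained sketch of the loop; the upper_chunks filter and the PlainLocalFS stand-in for an fsspec file system are hypothetical.

```python
import io


class Transform(object):
    def __init__(self, src_fs, dest_fs):
        self.src_fs = src_fs
        self.dest_fs = dest_fs

    def fcopy(self, src, dest, bufsize, filter, **kwargs):
        # bufsize is forwarded as the (fsspec-style) block size.
        with self.src_fs.open(src, 'rb', bufsize) as rdr:
            with self.dest_fs.open(dest, 'wb', bufsize) as wr:
                while True:
                    b = filter(rdr, **kwargs)
                    if not b:  # empty bytes: the filter has nothing more to emit
                        break
                    wr.write(b)


def upper_chunks(rdr, size=4096):
    """Hypothetical filter: read at most size bytes and upper-case them."""
    return rdr.read(size).upper()


class PlainLocalFS(object):
    """Hypothetical stand-in for fsspec's LocalFileSystem."""

    def open(self, path, mode='rb', block_size=None):
        # Map the block size onto io.open's buffering argument (-1 = default).
        return io.open(path, mode, block_size or -1)
```

Each call to upper_chunks returns at most size bytes, so memory use stays bounded regardless of file size.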

Transformer should be able to read and write to distinct file systems

I might want to upload a local file to an S3 bucket, or transfer a file from an SFTP server to an S3 bucket. In these cases the source and destination live on different file systems.

Basic idea:

import s3fs
from fsspec.implementations import sftp

def decrypt(rdr, wr, params):
    """Decrypts the file read by rdr, writes decrypted stream using wr."""
    pass

src_fs = sftp.SFTPFileSystem(host, **paramiko_params_dict)
dest_fs = s3fs.S3FileSystem(anon=False)
tr = Transform(src_fs=src_fs, dest_fs=dest_fs, overwrite=True)
tr('file_on_sftp_server.gpg', 's3://my_bucket/decrypted_file.csv', decrypt, [passphrase])
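A minimal sketch of how calling the Transform could hand the callback a reader from one file system and a writer from another. DictFS is a hypothetical in-memory stand-in for an fsspec file system, and xor_with_key is a toy substitute for decrypt (not real cryptography) that takes its key from params, the way decrypt takes [passphrase].

```python
import io


class Transform(object):
    def __init__(self, src_fs, dest_fs):
        self.src_fs = src_fs
        self.dest_fs = dest_fs

    def __call__(self, src, dest, func, params=()):
        # The reader comes from src_fs and the writer from dest_fs, so the
        # two paths may live on entirely different storage back ends.
        with self.src_fs.open(src, 'rb') as rdr:
            with self.dest_fs.open(dest, 'wb') as wr:
                func(rdr, wr, params)


class DictFS(object):
    """Hypothetical in-memory stand-in for an fsspec file system."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def open(self, path, mode='rb'):
        if mode == 'rb':
            return io.BytesIO(self.files[path])
        if mode == 'wb':
            return _DictWriter(self.files, path)
        raise ValueError('unsupported mode: %r' % mode)


class _DictWriter(io.BytesIO):
    """Buffer that stores its contents in the owning DictFS when closed."""

    def __init__(self, files, path):
        super().__init__()
        self._files, self._path = files, path

    def close(self):
        if not self.closed:
            self._files[self._path] = self.getvalue()
        super().close()


def xor_with_key(rdr, wr, params):
    """Toy stand-in for decrypt(): XOR every byte with params[0]. Not real crypto."""
    key = params[0]
    wr.write(bytes(b ^ key for b in rdr.read()))
```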

We should still be able to use a single file system, though:

import s3fs

def decrypt(rdr, wr, params):
    """Decrypts the file read by rdr, writes decrypted stream using wr."""
    pass

fs = s3fs.S3FileSystem(anon=False)
tr = Transform(fs=fs)
tr('s3://my_bucket/encrypted.gpg', 's3://my_bucket/decrypted.csv', decrypt, [passphrase])

Mixing the fs parameter with src_fs or dest_fs should raise an error; specifying only one of src_fs and dest_fs should also raise an error.
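A minimal sketch of the argument checks described above. Raising ValueError is an assumption, and so is rejecting the case where no file system is given at all, which the issue does not cover.

```python
class Transform(object):
    def __init__(self, fs=None, src_fs=None, dest_fs=None, overwrite=False):
        if fs is not None:
            # A single shared file system excludes the per-side arguments.
            if src_fs is not None or dest_fs is not None:
                raise ValueError('pass either fs or src_fs/dest_fs, not both')
            src_fs = dest_fs = fs
        elif (src_fs is None) != (dest_fs is None):
            # Exactly one of src_fs/dest_fs was given.
            raise ValueError('src_fs and dest_fs must be given together')
        elif src_fs is None:
            # Assumption: not covered by the issue; no file system at all is an error.
            raise ValueError('no file system given')
        self.src_fs = src_fs
        self.dest_fs = dest_fs
        self.overwrite = overwrite
```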
