adamvinueza / transformer
Library for transformations in Python
License: MIT License
It says the file will be overwritten if it exists, but the __init__ function has a parameter
that lets the user prevent overwriting, and the default is not to overwrite the file.
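The behaviour described above (refuse to overwrite unless explicitly asked) could be sketched roughly as follows. This is only an illustration against the local file system; open_dest and its overwrite parameter are hypothetical names, not the library's actual code.

```python
import os


def open_dest(path, overwrite=False):
    """Open path for binary writing (hypothetical helper).

    Refuses to clobber an existing file unless overwrite is True,
    mirroring the "do not overwrite by default" behaviour.
    """
    if os.path.exists(path) and not overwrite:
        raise FileExistsError(f"{path} exists and overwrite is False")
    return open(path, "wb")
```

With a default of overwrite=False, the documentation would need to say that an existing file is left alone unless the caller opts in.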
I thought I'd removed that and don't see why it's still saying 3.5 is a requirement.
Copying a file from one file system to another should be easy:
s3 = s3fs.S3FileSystem(...)
tr = Transform(src_fs=s3, dest_fs=LocalFileSystem())
tr.copy('s3://my-bucket/my-file.txt', '/path/to/my-file.txt')
Something like this:
from transformer.transform import Transform, S3FS, LocalFS
aws_profile='dev'
tr = Transform(src_fs=S3FS(aws_profile), dest_fs=LocalFS())
tr.copy(src, dest)
The S3FS function should automatically find credentials given the specified AWS profile name.
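One way the profile lookup might work is sketched below. It reads the shared AWS credentials file directly with the standard library; load_profile_credentials and its path parameter are hypothetical names introduced for illustration. In practice botocore already performs this lookup, so a real S3FS would more likely delegate to it.

```python
import configparser
import os


def load_profile_credentials(profile, path=None):
    """Return (access_key, secret_key) for a named AWS profile.

    Hypothetical helper: parses the INI-style shared credentials file,
    defaulting to ~/.aws/credentials when no path is given.
    """
    path = path or os.path.expanduser("~/.aws/credentials")
    parser = configparser.ConfigParser()
    # ConfigParser.read returns the list of files it managed to parse.
    if not parser.read(path):
        raise FileNotFoundError(f"no credentials file at {path}")
    if not parser.has_section(profile):
        raise KeyError(f"profile {profile!r} not found in {path}")
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```

The returned pair could then be handed to whatever constructor S3FS wraps.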
The filter function takes a reader and some keyword arguments, and returns an array of bytes that can be consumed by a writer.
"""
class Transform(object):
    # ...
    def fcopy(self, src, dest, bufsize, filter, **kwargs):
        with self.src_fs.open(src, 'rb', bufsize) as rdr:
            with self.dest_fs.open(dest, 'wb', bufsize) as wr:
                while True:
                    # The filter returns the next chunk of output bytes,
                    # or an empty value once the input is exhausted.
                    b = filter(rdr, **kwargs)
                    if not b:
                        wr.flush()
                        break
                    wr.write(b)
"""
# A sample filter function: encrypt an input stream.
import gnupg

def encrypt(src_stream, gpg, password):
    return gpg.encrypt_file(src_stream, password=password)

passwd = 'my-secret-gpg-password'
gpg = gnupg.GPG()
# configure gpg with key and so on
s3 = S3FS(my_profile)
tr = Transform(src_fs=s3, dest_fs=s3, overwrite=True)
# Argument order follows the fcopy signature: bufsize, then the filter.
tr.fcopy(src, dest, 1024, encrypt, gpg=gpg, password=passwd)
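The filter loop can be tried without S3 or GPG. The sketch below runs the same read-filter-write loop over in-memory streams; the standalone fcopy function, the xor_filter toy filter, and its size and key keyword arguments are assumptions made for illustration, not part of the library.

```python
import io


def fcopy(rdr, wr, filter_fn, **kwargs):
    """Copy rdr to wr, passing the reader through filter_fn until it
    returns an empty value."""
    while True:
        chunk = filter_fn(rdr, **kwargs)
        if not chunk:
            wr.flush()
            break
        wr.write(chunk)


def xor_filter(rdr, size=1024, key=0):
    """Toy filter: read up to size bytes and XOR each one with key."""
    return bytes(b ^ key for b in rdr.read(size))


src = io.BytesIO(b"hello world")
dest = io.BytesIO()
fcopy(src, dest, xor_filter, size=4, key=0x20)
```

Because the filter receives the reader itself, it decides how much to consume per call. A filter that needs the whole input at once, as an encryption call might, would have to return everything on the first call and an empty value on the second.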
I might want to upload a local file to an S3 bucket, or transfer a file from an SFTP server to an S3 bucket. In these cases the source and destination file systems have to be different.
Basic idea:
import s3fs
from fsspec.implementations import sftp

def decrypt(rdr, wr, params):
    """Decrypts the file read by rdr, writes decrypted stream using wr."""
    pass

# SFTPFileSystem takes the SSH options as keyword arguments.
src_fs = sftp.SFTPFileSystem(host, **paramiko_params_dict)
dest_fs = s3fs.S3FileSystem(anon=False)
tr = Transform(src_fs=src_fs, dest_fs=dest_fs, overwrite=True)
tr('file_on_sftp_server.gpg', 's3://my_bucket/decrypted_file.csv', decrypt, [passphrase])
We should still be able to use a single file system, though:
import s3fs

def decrypt(rdr, wr, params):
    """Decrypts the file read by rdr, writes decrypted stream using wr."""
    pass

fs = s3fs.S3FileSystem(anon=False)
tr = Transform(fs=fs)
tr('s3://my_bucket/encrypted.gpg', 's3://my_bucket/decrypted.csv', decrypt, [passphrase])
Mixing the fs parameter with src_fs or dest_fs should raise an error; specifying only one of src_fs and dest_fs should also raise an error.
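The argument rules above might be enforced along these lines. This is a sketch only: resolve_filesystems is a hypothetical helper name, ValueError is an assumed choice of exception, and treating "no file system given at all" as an error is also an assumption, since the issue does not say what should happen in that case.

```python
def resolve_filesystems(fs=None, src_fs=None, dest_fs=None):
    """Return (src_fs, dest_fs), or raise ValueError on a bad combination."""
    if fs is not None:
        # A single file system serves as both source and destination,
        # so it cannot be combined with src_fs or dest_fs.
        if src_fs is not None or dest_fs is not None:
            raise ValueError("pass either fs, or src_fs and dest_fs, not both")
        return fs, fs
    # Without fs, both ends must be given explicitly.
    if src_fs is None or dest_fs is None:
        raise ValueError("src_fs and dest_fs must be given together")
    return src_fs, dest_fs
```

Transform.__init__ could call such a helper first and store the two results, so the rest of the class only ever deals with a source and a destination.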