oshyshko / uio
A Clojure/Java library and a CLI tool for accessing various file systems such as HDFS, S3, SFTP etc.
License: Eclipse Public License 1.0
It would be nice to be able to depend on this library without pulling in all the Hadoop dependencies. A tentative investigation suggests that a straightforward classpath check here would accomplish this:
https://github.com/oshyshko/uio/blob/master/src/uio/uio.clj#L51
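A rough illustration of what such a classpath check could look like, written in Java. This is only a sketch: `ClasspathCheck` and `isOnClasspath` are hypothetical names, not part of uio, and using `org.apache.hadoop.fs.FileSystem` as the marker class for "Hadoop is present" is an assumption.

```java
// Hypothetical helper (not part of uio): probes for a class without initializing it.
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be loaded by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is absent, or present but one of its own dependencies is missing
            return false;
        }
    }

    /** Assumption: this class is a reasonable marker for the Hadoop dependencies. */
    static boolean hadoopAvailable() {
        return isOnClasspath("org.apache.hadoop.fs.FileSystem");
    }
}
```

With a check like this, the HDFS implementation could be registered only when `hadoopAvailable()` returns true, so that users who exclude the Hadoop dependencies would still be able to load the other file systems.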
When uploading very large files to S3, if a temporary network issue occurs, the S3 retry code throws an exception, as seen below. I doubt we want to address retry problems like this in uio itself, but since the underlying library already handles retries in a network-efficient way, maybe we could take advantage of it?
We could wrap `(from from-url)` in a `BufferedInputStream` of a specified size that is then communicated to `request.getRequestClientOptions().setReadLimit(int)`, and then the `TransferManager` would be able to reset the stream far enough back to retry. This avoids falling back to the user to re-upload the file from the beginning.
There is an AWS ticket that seems to recap the problem, aws/aws-sdk-java#427, but they pretty much say "hey, don't give `TransferManager` an `InputStream` but give it a `File`". This isn't helpful for `uio copy`, but I think the ticket does mention the solution I propose, or else the stack trace suggests it.
I really only see an issue with validating the fix.
```
com.amazonaws.ResetException: Failed to reset the request input stream; If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1305)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1129)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4185)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4132)
    at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3172)
    at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3157)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadPartsInSeries(UploadCallable.java:257)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.uploadInParts(UploadCallable.java:191)
    at com.amazonaws.services.s3.transfer.internal.UploadCallable.call(UploadCallable.java:123)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:139)
    at com.amazonaws.services.s3.transfer.internal.UploadMonitor.call(UploadMonitor.java:47)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Resetting to invalid mark
    at java.io.BufferedInputStream.reset(BufferedInputStream.java:448)
    at com.amazonaws.internal.SdkBufferedInputStream.reset(SdkBufferedInputStream.java:106)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.event.ProgressInputStream.reset(ProgressInputStream.java:168)
    at com.amazonaws.internal.SdkFilterInputStream.reset(SdkFilterInputStream.java:102)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.resetRequestInputStream(AmazonHttpClient.java:1303)
    ... 21 more
```
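The "Resetting to invalid mark" cause can probably be reproduced with the JDK alone, which might help with validating a fix. The sketch below is an assumption-laden illustration, not uio or AWS SDK code: `ReplayableStreamSketch` and `canReplay` are hypothetical names, and a 64-byte in-memory array stands in for the upload source. It marks a `BufferedInputStream`, consumes some bytes, and checks whether `reset()` still works.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class: shows when a marked BufferedInputStream can be rewound.
final class ReplayableStreamSketch {
    private ReplayableStreamSketch() {}

    /**
     * Marks a buffered stream with the given read limit, consumes bytesConsumed bytes,
     * then tries to rewind. Returns true if the stream rewinds to the first byte,
     * false if reset() fails (for example with "Resetting to invalid mark").
     */
    static boolean canReplay(int readLimit, int bytesConsumed) {
        byte[] payload = new byte[64];            // stand-in for the data being uploaded
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        try {
            // Buffer size and mark limit are kept equal, as the proposal suggests
            InputStream in = new BufferedInputStream(new ByteArrayInputStream(payload), readLimit);
            in.mark(readLimit);
            for (int i = 0; i < bytesConsumed; i++) {
                in.read();                        // simulate a partially sent request body
            }
            in.reset();                           // what a retry needs to do
            return in.read() == 0;                // back at the first byte of the payload
        } catch (IOException e) {
            return false;                         // mark was invalidated; a retry cannot rewind
        }
    }
}
```

In this sketch, `canReplay(16, 16)` stays within the limit while `canReplay(16, 17)` reads past it and loses the mark. For the real upload, the same size would presumably also need to be passed to `request.getRequestClientOptions().setReadLimit(int)`, as the exception message states.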