
nextflow-s3fs's People

Contributors

delocalizer, fstrozzi, glmanhtu, jarnaiz, lukasjelonek, pditommaso


nextflow-s3fs's Issues

License confusion

The README.md says this is GPLv3.

The pom.xml says it is Apache 2.0.

And it is a fork of a project that was itself Apache 2.0.

Which license actually applies?

Include SignerOverride for S3 Filesystem

Hey,

I'd like to use the S3 support with a self-hosted S3 provider based on Ceph and radosgw. Unfortunately, the authentication requires a different signer type than the default one, and I can't override the signer type in Nextflow.

I have successfully implemented it with the latest Amazon-S3-FileSystem-NIO2 library. See below:

package bio.comp.jlu.psos;

import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.metrics.RequestMetricCollector;
import com.amazonaws.services.s3.AmazonS3;
import com.upplication.s3fs.AmazonS3ClientFactory;

/**
 * S3 client factory that overrides the default signer type so the client
 * can authenticate against a Ceph/radosgw S3 endpoint.
 */
public class CephAmazonS3Factory extends AmazonS3ClientFactory {

    @Override
    protected AmazonS3 createAmazonS3(AWSCredentialsProvider credentialsProvider, ClientConfiguration clientConfiguration, RequestMetricCollector requestMetricsCollector) {
        clientConfiguration.setSignerOverride("S3SignerType");
        return super.createAmazonS3(credentialsProvider,clientConfiguration, requestMetricsCollector);
    }


}

and I can use it with the following code:

import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import com.google.common.collect.ImmutableMap;

// register the custom client factory and the credentials in the filesystem environment
Map<String, ?> env = ImmutableMap.<String, Object>builder()
  .put(com.upplication.s3fs.AmazonS3Factory.ACCESS_KEY, accessKey)
  .put(com.upplication.s3fs.S3FileSystemProvider.AMAZON_S3_FACTORY_CLASS, "bio.comp.jlu.psos.CephAmazonS3Factory")
  .put(com.upplication.s3fs.AmazonS3Factory.SECRET_KEY, secretKey).build();
FileSystem fs = FileSystems.newFileSystem(new URI("s3://" + host + "/"), env, Thread.currentThread().getContextClassLoader());
fs.getRootDirectories().forEach(System.out::println);
Path path = fs.getPath("/psos/Cloud-50.png");
System.out.println(Files.exists(path));

I checked whether I could set the SignerOverride in nextflow-s3fs, but unfortunately it uses a very old version of the aws-java-sdk that lacks an option to set this attribute. I checked out your code and updated the aws-java-sdk to the latest version; it compiled, but a few tests failed. I have also seen that Amazon-S3-FileSystem-NIO2 is based on a newer aws-java-sdk, but your fork has diverged from it and cannot be merged automatically, so implementing my request is non-trivial.

Do you intend to update the S3 filesystem to the latest upstream changes? Or would it be sufficient to update the aws-java-sdk to the latest version?

Implement automatic error failover for multipart upload

Multipart upload splits a file into chunks and uploads each chunk independently. If one of these uploads fails due to a network hiccup or a similar problem, it stops the overall upload process.

Instead of aborting the whole upload, the failing chunk should be retried up to a given maximum number of attempts, as sketched below.
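
A minimal sketch of the desired behaviour, assuming the part payload is buffered so it can be re-read on retry; the PartUploader class, the uploadWithRetry method and the linear back-off are illustrative assumptions, not the library's actual implementation.

import java.io.IOException;
import com.amazonaws.SdkClientException;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Illustrative sketch: retry a single part upload instead of failing the whole transfer.
class PartUploader {

    private final AmazonS3 s3;
    private final int maxAttempts;

    PartUploader(AmazonS3 s3, int maxAttempts) {
        this.s3 = s3;
        this.maxAttempts = maxAttempts;
    }

    PartETag uploadWithRetry(UploadPartRequest request) throws IOException {
        SdkClientException lastError = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return s3.uploadPart(request).getPartETag();
            }
            catch (SdkClientException e) {
                lastError = e;                      // remember the failure and try again
                try {
                    Thread.sleep(1000L * attempt);  // simple linear back-off between attempts
                }
                catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw new IOException("Part upload failed after " + maxAttempts + " attempts", lastError);
    }
}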

Unable to delete file from S3 bucket - permissions error

Command:
file("s3://${params.s3bucket}/test.json").delete()

Apr-26 21:56:51.270 [main] ERROR nextflow.cli.Launcher - @unknown
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: ; S3 Extended Request ID: )

aws s3 rm "s3://s3bucket/test.json" works fine with the AWS CLI.

I am able to access and copy the file, just not move or delete it.

Environment

  • Nextflow version: 20.01.0
  • Java version: [?]
  • Operating system: Linux

Using Apache Ignite with a non-AWS S3 bucket as working directory

Hello,

My goal is to use S3 for the working directory instead of a shared file system. The ideal scenario for me would be that the remote files are staged into a scratch directory on the worker node where the process executes, and the result is then uploaded back to S3. It doesn't matter to me whether the actual executor is 'slurm', 'ignite', etc.
My first try was using Apache Ignite in combination with the -w parameter.
However, I'm using the S3 API of Ceph, which is part of our OpenStack installation: https://docs.ceph.com/en/latest/radosgw/s3/.
I created an example repository, https://github.com/pbelmann/ignite-s3, that shows my approach.

Nextflow Version

      N E X T F L O W
      version 21.04.0 build 5552
      created 02-05-2021 16:22 UTC 
      cite doi:10.1038/nbt.3820
      http://nextflow.io

Nextflow Error reported

While the file is correctly staged in S3 by the master node, the worker node fails with the following message:


Error executing process > 'runBBMapDeinterleave (test1)'

Caused by:
  java.io.IOException: No space left on device

Command executed:

  reformat.sh in=interleaved.fq.gz out1=read1.fq.gz out2=read2.fq.gz

Command exit status:
  -

Command output:
  (empty)

Work dir:
  s3://staging/staging/9d/38a8cf157159b7df900b867731c4ea

Looking at the node-nextflow.log the actual error is the following:

May-14 07:12:44.708 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - Creating a file system instance for provider: S3FileSystemProvider
May-14 07:12:44.721 [pool-2-thread-1] DEBUG nextflow.file.FileHelper - AWS S3 config details: {}
May-14 07:12:47.444 [pool-2-thread-1] ERROR nextflow.executor.IgBaseTask - Cannot execute task > runBBMapDeinterleave (test2)
com.amazonaws.services.s3.model.AmazonS3Exception: Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: *********************; S3 Extended Request ID: ********************)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4854)
        at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:880)
        at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:105)
        at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:113)
        at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:669)
        at java.base/java.nio.file.Files.readAttributes(Files.java:1764)
        at nextflow.util.CacheHelper.hashFile(CacheHelper.java:239)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:186)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:178)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:111)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:107)
        at nextflow.util.CacheHelper.hasher(CacheHelper.java:103)
        at nextflow.file.FileHelper.getLocalCachePath(FileHelper.groovy:645)
        at nextflow.executor.IgFileStagingStrategy.stage(IgFileStagingStrategy.groovy:81)
        at nextflow.executor.IgScriptStagingStrategy.super$2$stage(IgScriptStagingStrategy.groovy)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1268)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:144)
        at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:164)
        at nextflow.executor.IgScriptStagingStrategy.stage(IgScriptStagingStrategy.groovy:55)
        at nextflow.executor.IgScriptTask.beforeExecute(IgScriptTask.groovy:56)
        at nextflow.executor.IgBaseTask.call(IgBaseTask.groovy:120)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor.runTask0(SchedulerAgent.groovy:350)
        at nextflow.scheduler.SchedulerAgent$AgentProcessor$1.run(SchedulerAgent.groovy:339)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

I believe the reason for this error is an incompatibility between the Amazon S3 API and the S3 API offered by Ceph.
Is there any way to see the actual S3 call that fails?

What is the status of this project? (...And an offer to join forces)

Hi,

I'm not sure if you've received my e-mail, so I'm posting this as an issue here.

I came across your fork, as you have contributed to Upplication/Amazon-S3-FileSystem-NIO2 in the past. It looks like your fork has had a life of its own and that many of your fixes have not made it into upstream (which now appears to be dead). Is your fork still being actively developed? Would you be interested in joining forces?

We have started a new spin-off project (a rebranded fork with preserved history, based on the latest master of Upplication/Amazon-S3-FileSystem-NIO2) called s3fs-nio, as we'd like to resuscitate the original project and keep the development going. We have upgraded to AWS SDK v2 as well as all of the outdated dependencies, support JDK 8 and 11, and have done a cleanup of the code base, among many other things (some of the more fundamental ones you can find here). We're working on quite a few other things, and we're wondering if you'd be interested in joining our efforts (or contributing to them).

Would you be interested in:

  • Explaining what fixes you've applied to your fork and what value they add?
  • Submitting pull requests to our spin-off, so that we can apply your fixes to our codebase?
  • Joining us in our efforts to build a better maintained project?
  • Raising issues in our issue tracker to help understand what features still need to be implemented and what bugs exist?
  • Evaluating our fixes so far and providing feedback?
  • Hearing your view with regard to the licensing of the https://github.com/Upplication/Amazon-S3-FileSystem-NIO2 project (Upplication#134, carlspring/s3fs-nio#2)?

We would really like to bring this project back to life and make some noise around it, so that people can contribute to it and enjoy using a stable version of a library that is well maintained and documented. ;)

Here are the details of our project:

Looking forward to hearing back from you!

Kind regards,

Martin Todorov

AWS exception should be captured and reported as IOException

Exception in thread "Thread-3" com.amazonaws.services.s3.model.AmazonS3Exception: The specified bucket is not valid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidBucketName; Request ID: 43F4B931257A8265; S3 Extended Request ID: RIvPDIjCBCM09skkshGtFtgytva7/vUj6l9X6z0Tnc6hyK2HppdHl+Xvb24vgr+COhhbAsP1jhQ=), S3 Extended Request ID: RIvPDIjCBCM09skkshGtFtgytva7/vUj6l9X6z0Tnc6hyK2HppdHl+Xvb24vgr+COhhbAsP1jhQ=
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1587)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1257)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1029)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:741)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:665)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:647)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:511)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4227)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4174)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4168)
	at com.amazonaws.services.s3.AmazonS3Client.listObjects(AmazonS3Client.java:823)
	at com.upplication.s3fs.AmazonS3Client.listObjects(AmazonS3Client.java:85)
	at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:117)
	at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:636)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
	at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.nio.file.Files.walkFileTree(Files.java:2662)
	at nextflow.file.FileHelper.visitFiles(FileHelper.groovy:723)
	at nextflow.file.FileHelper$visitFiles$0.call(Unknown Source)
	at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:48)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:113)
	at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:149)
	at nextflow.Channel$_pathImpl_closure3.doCall(Channel.groovy:269)
	at nextflow.Channel$_pathImpl_closure3.doCall(Channel.groovy)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:93)
	at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:325)
	at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:294)
	at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1022)
	at groovy.lang.Closure.call(Closure.java:414)
	at groovy.lang.Closure.call(Closure.java:408)
	at groovy.lang.Closure.run(Closure.java:495)
	at java.lang.Thread.run(Thread.java:748)
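
A minimal sketch of the requested behaviour, assuming the failure happens inside the attributes lookup; S3FileAttributesReader and readAttributes0 are hypothetical names, not the provider's actual methods.

import java.io.IOException;
import java.nio.file.attribute.BasicFileAttributes;
import com.amazonaws.AmazonClientException;
import com.upplication.s3fs.S3Path;

// Illustrative sketch: wrap the unchecked AWS SDK exception into the checked
// IOException that java.nio.file callers such as Files.walkFileTree expect.
class S3FileAttributesReader {

    BasicFileAttributes readAttributes(S3Path path) throws IOException {
        try {
            return readAttributes0(path);   // the existing lookup via listObjects/getObject
        }
        catch (AmazonClientException e) {
            // surface S3 errors (invalid bucket, access denied, ...) as an I/O failure
            throw new IOException("Cannot read attributes for " + path, e);
        }
    }

    private BasicFileAttributes readAttributes0(S3Path path) {
        throw new UnsupportedOperationException("placeholder for the real lookup");
    }
}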

Files.exists false positive

Given a bucket containing a file named, for example, alpha.txt, the method Files.exists returns true for any path whose name is a prefix of alpha.txt (e.g. s3://bucket/alpha), because the lookup matches keys by prefix only. A sketch of an exact-key check follows.
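
A minimal sketch of an exact-key existence check, assuming the lookup is done with listObjects as in the stack traces above; the ExactKeyLookup class and its method names are illustrative.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// Illustrative sketch: accept a listed summary only if it matches the key exactly,
// or if it lies under "key/" (a directory marker), instead of accepting any object
// whose key merely starts with the requested prefix.
class ExactKeyLookup {

    static boolean exists(AmazonS3 s3, String bucket, String key) {
        ObjectListing listing = s3.listObjects(new ListObjectsRequest()
                .withBucketName(bucket)
                .withPrefix(key));
        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            String found = summary.getKey();
            if (found.equals(key) || found.startsWith(key + "/")) {
                return true;    // exact object, or key is a "directory" prefix
            }
        }
        return false;           // e.g. only "alpha.txt" listed when asking for "alpha"
    }
}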

Directory traversal fails when a homonymous file exists

Directory traversal fails when a bucket contains a file and a directory having the same name. For example:

s3://bucket-name
            |- foo
            \- foo
                  |- file1.txt
                  \- file2.txt

The bucket traversal stops at the s3://bucket-name/foo level; see the sketch below for how a delimiter-based listing reports both entries.
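
A minimal sketch, assuming listing with a "/" delimiter: S3 reports the plain object foo as an object summary and the directory foo/ as a common prefix, so a directory stream has to merge both rather than stop at the first match. The HomonymListing class name is illustrative.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;

// Illustrative sketch: a delimiter-based listing distinguishes the file "foo"
// from the directory "foo/" even though they share the same name.
class HomonymListing {

    static void list(AmazonS3 s3, String bucket, String parentKey) {
        ObjectListing listing = s3.listObjects(new ListObjectsRequest()
                .withBucketName(bucket)
                .withPrefix(parentKey)      // e.g. "" for the bucket root
                .withDelimiter("/"));

        for (S3ObjectSummary summary : listing.getObjectSummaries()) {
            System.out.println("file:      " + summary.getKey());   // "foo"
        }
        for (String prefix : listing.getCommonPrefixes()) {
            System.out.println("directory: " + prefix);             // "foo/"
        }
    }
}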

S3 upload failure for large files (>1 GB)

Bug report

I am getting the following error while trying to upload a large (>1 GB) file to S3.

Feb-11 23:51:50.487 [pool-4-thread-3] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 3 attempt 1 for bucket: methylseq, key:   somefile.bam versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)


Steps to reproduce the problem

tmpfile = Channel.fromPath('mytestfile')
process setExample {
    publishDir 's3://mybucket/test_s3new', mode: 'copy' , overwrite: true
    input:
    file myfile from tmpfile

    output:
    file myfile into tmp

    """
    echo $myfile
    """

}

Program output

Feb-11 23:51:50.487 [pool-4-thread-3] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 3 attempt 1 for bucket: methylseq, key: test_s3new/26415_8#1_1s.fq_000000.gz_val_1_bismark_bt2_pe.deduplicated.bam, versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
Feb-11 23:51:51.744 [pool-4-thread-1] DEBUG com.upplication.s3fs.S3OutputStream - Failed to upload part 1 attempt 2 for bucket: methylseq, key: test_s3new/26415_8#1_1s.fq_000000.gz_val_1_bismark_bt2_pe.deduplicated.bam, versionId: null -- Caused by: Failed to reset the request input stream;  If the request involves an input stream, the maximum stream buffer size can be configured via request.getRequestClientOptions().setReadLimit(int)
Feb-11 23:51:52.036 [pool-4-thread-4] DEBUG com.upplication.s3fs.S3OutputStream - Failed
Feb-11 23:30:05.442 [pool-4-thread-8] ERROR com.upplication.s3fs.S3OutputStream - Upload: 2~zD0KjZq5in0DbQxmC-Qr5_2KlHCFX33 > Error for part: 8
Caused by: java.io.IOException: Failed to upload multipart data to Amazon S3
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:439)
        at com.upplication.s3fs.S3OutputStream.access$000(S3OutputStream.java:68)
        at com.upplication.s3fs.S3OutputStream$1.run(S3OutputStream.java:345)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: com.amazonaws.SdkClientException: Unable to execute HTTP request: The target server failed to respond
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1116)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1066)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
        at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
        at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4365)
        at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4312)
        at com.amazonaws.services.s3.AmazonS3Client.doUploadPart(AmazonS3Client.java:3338)
        at com.amazonaws.services.s3.AmazonS3Client.uploadPart(AmazonS3Client.java:3323)
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:472)
        at com.upplication.s3fs.S3OutputStream.uploadPart(S3OutputStream.java:434)
        ... 7 more
Caused by: org.apache.http.NoHttpResponseException: The target server failed to respond
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:141)
        at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
        at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
        at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
        at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
        at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
        at com.amazonaws.http.protocol.SdkHttpRequestExecutor.doReceiveResponse(SdkHttpRequestExecutor.java:82)
        at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
        at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
        at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
        at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)

Environment

  • Nextflow version: 18.10.1.5003
  • Java version: 1.8.0_171-b11
  • Operating system: Ubuntu

Additional context

Uploads always succeed with smaller files, and the same large files upload fine using aws s3 cp.
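
A minimal sketch following the hint in the error message: raise the SDK read limit above the buffered part size so the request stream can be reset and the part retried. The class name, method signature, and the idea of holding the part in a byte array are assumptions for illustration, not the library's actual code.

import java.io.ByteArrayInputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.PartETag;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Illustrative sketch: configure the read limit so the SDK can rewind the
// part's input stream after a transient network failure.
class PartUploadWithReadLimit {

    static PartETag uploadPart(AmazonS3 s3, String bucket, String key, String uploadId,
                               int partNumber, byte[] partData) {
        UploadPartRequest request = new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(uploadId)
                .withPartNumber(partNumber)
                .withPartSize(partData.length)
                .withInputStream(new ByteArrayInputStream(partData));

        // allow the whole buffered part (plus one byte) to be re-read on retry
        request.getRequestClientOptions().setReadLimit(partData.length + 1);

        return s3.uploadPart(request).getPartETag();
    }
}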

Key cannot be empty error when listing an S3 bucket

When listing the content of an S3 bucket without specifying any subdirectory, e.g. s3://bucket/*, the following error is returned (a possible guard is sketched after the stack trace):

ERROR ~ Key cannot be empty

Error stack trace

java.lang.IllegalArgumentException: Key cannot be empty
	at com.amazonaws.util.ValidationUtils.assertStringNotEmpty(ValidationUtils.java:89)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1421)
	at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1316)
	at com.upplication.s3fs.AmazonS3Client.getObject(AmazonS3Client.java:110)
	at com.upplication.s3fs.util.S3ObjectSummaryLookup.getS3Object(S3ObjectSummaryLookup.java:197)
	at com.upplication.s3fs.util.S3ObjectSummaryLookup.lookup(S3ObjectSummaryLookup.java:88)
	at com.upplication.s3fs.S3FileSystemProvider.readAttributes(S3FileSystemProvider.java:643)
	at java.nio.file.Files.readAttributes(Files.java:1737)
	at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
	at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
	at java.nio.file.FileTreeWalker.walk(FileTreeWalker.java:322)
	at java.nio.file.Files.walkFileTree(Files.java:2662)
	at nextflow.file.FileHelper.visitFiles(FileHelper.groovy:729)
	at nextflow.file.PathVisitor.pathImpl(PathVisitor.groovy:142)
	at nextflow.file.PathVisitor.applyGlobPattern0(PathVisitor.groovy:110)
	at nextflow.file.PathVisitor.apply(PathVisitor.groovy:70)
	at nextflow.file.PathVisitor$_applyAsync_closure1.doCall(PathVisitor.groovy:76)
	at nextflow.file.PathVisitor$_applyAsync_closure1.call(PathVisitor.groovy)
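
A minimal sketch of one possible guard, assuming the empty key corresponds to the bucket root: return synthetic directory-like metadata instead of calling getObject with an empty key, which the SDK rejects. BucketRootLookup and metadataFor are illustrative names, not the library's actual API.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

// Illustrative sketch: treat an empty key as the bucket root (a "directory")
// rather than attempting a per-object lookup that the SDK refuses.
class BucketRootLookup {

    static ObjectMetadata metadataFor(AmazonS3 s3, String bucket, String key) {
        if (key == null || key.isEmpty()) {
            ObjectMetadata dirMeta = new ObjectMetadata();  // bucket root behaves like a directory
            dirMeta.setContentLength(0);
            return dirMeta;
        }
        return s3.getObjectMetadata(bucket, key);           // normal per-object lookup
    }
}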

Add support for canned ACL for multi-part uploads

Support for canned ACLs should be added to the multipart upload request; otherwise the following error is thrown (see the sketch after the stack trace):

Access Denied (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: 9FF5JKN9DRTAMQWN; S3 Extended Request ID: fIs6JStuclo89RTAWGfXQcQSlVPBxv06HaLx12+fdMbRmfK/5myMvJ0pwGgr9fhGl5DB2pP3HKM=)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4914)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4860)
	at com.amazonaws.services.s3.AmazonS3Client.initiateMultipartUpload(AmazonS3Client.java:3552)
	at com.upplication.s3fs.AmazonS3Client.multipartCopyObject(AmazonS3Client.java:237)
	at com.upplication.s3fs.S3FileSystemProvider.copy(S3FileSystemProvider.java:572)
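
A minimal sketch of propagating a canned ACL to the multipart initiation request, mirroring what is already done for plain PutObject requests; the MultipartWithAcl class and its method are illustrative, not the library's actual code.

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;

// Illustrative sketch: apply the canned ACL when initiating a multipart upload
// so the copied/uploaded object ends up with the same permissions.
class MultipartWithAcl {

    static String initiate(AmazonS3 s3, String bucket, String key, CannedAccessControlList acl) {
        InitiateMultipartUploadRequest request = new InitiateMultipartUploadRequest(bucket, key);
        if (acl != null) {
            request.withCannedACL(acl);     // e.g. CannedAccessControlList.BucketOwnerFullControl
        }
        InitiateMultipartUploadResult result = s3.initiateMultipartUpload(request);
        return result.getUploadId();
    }
}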

File access to a public S3 bucket fails when checkIfExists: true

Also raised on:
nextflow-io/nextflow#1055

Bug report

Expected behavior and actual behavior

testfile = Channel.fromPath("s3://ref/test/genome.fa.fai", checkIfExists: true)
Mar-01 11:56:36.524 [PathVisitor-1] ERROR nextflow.Channel - null (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID:-XXXXXX -default; S3 Extended Request ID: 144c2bfe-default-default)

When using a private S3 bucket, file access works with both checkIfExists: true and false.
With the public bucket, it works fine when checkIfExists: false.

Steps to reproduce the problem

cat test.nf
testfile = Channel.fromPath("s3://ref/test/genome.fa.fai", checkIfExists: true)
nextflow run test.nf

Program output

Mar-01 12:28:14.810 [PathVisitor-1] ERROR nextflow.Channel - null (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: tx000000000000003e8cd66-005c79255e-13d7a414-default; S3 Extended Request ID: 13d7a414-default-default)
com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID: tx000000000000003e8cd66-005c79255e-13d7a414-default; S3 Extended Request ID: 13d7a414-default-default)

Environment

  • Nextflow version: 19.01.0.5050
  • Java version: [?]
  • Operating system: Linux


AWS ACL ownership as default

Hello,

Can we add CannedAccessControlList.BucketOwnerFullControl so that the bucket owner has access to the files that are published to the bucket? Not having this causes issues when files are published to buckets with different owners. A configurable variant is sketched after the diff below.

git diff src/main/java/com/upplication/s3fs/S3OutputStream.java
diff --git a/src/main/java/com/upplication/s3fs/S3OutputStream.java b/src/main/java/com/upplication/s3fs/S3OutputStream.java
index 5034756..abcb43c 100644
--- a/src/main/java/com/upplication/s3fs/S3OutputStream.java
+++ b/src/main/java/com/upplication/s3fs/S3OutputStream.java
@@ -545,7 +545,8 @@ public final class S3OutputStream extends OutputStream {
         meta.setContentLength(contentLength);
         meta.setContentMD5( Base64.encodeAsString(checksum) );
 
-        final PutObjectRequest request = new PutObjectRequest(objectId.getBucket(), objectId.getKey(), content, meta);
+        final PutObjectRequest request = new PutObjectRequest(objectId.getBucket(), objectId.getKey(), content, meta)
+                .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
 
         if (storageClass != null) {
             request.setStorageClass(storageClass);
