Comments (7)

zzbennett commented on August 10, 2024

This turns out to be an issue with the httpclient version. After sleuthing around the classpath and the Maven dependency tree, it appears that the aws-java-sdk-s3 dependency, which streamx currently sets to 1.11.69, pulls in httpclient 4.5.1. It seems aws-java-sdk-s3 needs to be downgraded? I'm not sure how this is working for other folks. Downgrading aws-java-sdk-s3 to 1.10.77 pulls in httpclient 4.3.6, which appears to solve the java.lang.NoSuchFieldError: INSTANCE error. However, a new error appears:

Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
	at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
	at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
	at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
	at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
	... 16 more

This is apparently a known issue: an incompatibility between Hadoop 2.7 and aws-java-sdk >1.7. After trying a few different versions of aws-java-sdk-s3, I ended up deleting the dependency entirely, which resolved the issue.
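
For anyone chasing a similar conflict, a minimal sketch of a classpath probe may help. It prints which jar each suspect class is actually loaded from, using only the JDK. The class `ClasspathProbe` and its `locate` helper are hypothetical names for illustration, and the class list is an assumption: `TransferManager` and `S3AFileSystem` come from the stack trace above, while `org.apache.http.client.HttpClient` stands in for whichever httpclient class is involved in the `NoSuchFieldError`.

```java
import java.security.CodeSource;

public class ClasspathProbe {
    /** Returns the jar or directory a class was loaded from, or a short note if unknown. */
    public static String locate(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> cls = Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually report no code source.
            if (source == null || source.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed list of suspects; adjust to the classes named in your own stack trace.
        String[] suspects = {
            "org.apache.http.client.HttpClient",
            "com.amazonaws.services.s3.transfer.TransferManager",
            "org.apache.hadoop.fs.s3a.S3AFileSystem"
        };
        for (String name : suspects) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run it with the same classpath as the connector. If two jars provide the same class, only the one that wins is printed, which is usually the piece of information the dependency tree alone does not give you.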

from streamx.

PraveenSeluka commented on August 10, 2024

Hi @zzbennett, sorry for the late response. Yes, we have found multiple issues with S3A (a thread leak and httpClient-related issues). So far, our experience with NativeS3FileSystem has been very stable. Can you try that instead?

zzbennett commented on August 10, 2024

Thanks for your reply, @PraveenSeluka. I'm not able to use NativeS3FileSystem because it doesn't support AWS temporary security credentials, which is what I'm using. There is a ticket open in the Hadoop community to add support for temporary security credentials, but they have decided not to implement it because s3a already supports it, and (according to the third comment on this thread) they are not planning to make any more enhancements to the s3n connector. So, sadly, s3n will never support temporary security tokens, but I cannot get s3a to work with streamx. The dependency issue appears to have been resolved when I deleted the aws-java-sdk-s3 dependency, though, so I'm unblocked on that for now. I'm still not able to connect to S3 because of an access denied (403) error, so hopefully once that is resolved, things will start working.

PraveenSeluka commented on August 10, 2024

@zzbennett You are right. They are not going to add Roles (temp creds) support to S3N, and S3A is the way forward. I will look into this issue and get back to you soon.

zzbennett commented on August 10, 2024

Regarding the S3 403 error, I resolved that by deleting the access_key and secret_key configs from the Hadoop hdfs-site.xml config file. Streamx seems to be working smoothly now. For the dependency problem, the only thing I ended up doing was deleting the aws-java-sdk-s3 dependency from the streamx pom.xml file.
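
In case it helps others check for the same 403 cause, here is a rough sketch that scans a Hadoop-style configuration file for leftover static key properties. It uses only the JDK's XML parser. The class `StaticKeyCheck` is a hypothetical helper, and the property names `fs.s3a.access.key` and `fs.s3a.secret.key` are an assumption (the usual S3A settings); the comment above only says "access_key and secret_key", so adjust the list to match your file.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class StaticKeyCheck {
    // Assumed property names; the thread does not spell them out.
    static final List<String> KEY_PROPERTIES =
            Arrays.asList("fs.s3a.access.key", "fs.s3a.secret.key");

    /** Returns the names of static key properties present in a Hadoop-style XML config. */
    public static List<String> findStaticKeys(InputStream xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        List<String> found = new ArrayList<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            if (names.getLength() == 0) {
                continue; // malformed entry without a <name>, skip it
            }
            String name = names.item(0).getTextContent().trim();
            if (KEY_PROPERTIES.contains(name)) {
                found.add(name);
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample config for illustration only.
        String sample =
            "<configuration>"
          + "<property><name>fs.s3a.access.key</name><value>PLACEHOLDER</value></property>"
          + "<property><name>fs.defaultFS</name><value>hdfs://namenode</value></property>"
          + "</configuration>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println("Static keys still configured: " + findStaticKeys(in));
    }
}
```

An empty result means none of the listed properties are set in that file. Keys may still be supplied elsewhere (another config file or the environment), which this sketch does not check.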

PraveenSeluka commented on August 10, 2024

Yeah, right: you need to remove the keys (otherwise it won't use roles, and the keys are invalid). I will add a note about this.

PraveenSeluka commented on August 10, 2024

@zzbennett Please look at #30 for issues related to S3A.

