Comments (7)
This is actually an issue with the httpclient version. After sleuthing around the classpath and Maven dependency tree, it appears that the aws-java-sdk-s3 dependency, which streamx currently sets at 1.11.69, pulls in httpclient version 4.5.1. It seems aws-java-sdk-s3 actually needs to be downgraded? I'm not sure how this is working for other folks. Downgrading aws-java-sdk-s3 to version 1.10.77 pulls in httpclient version 4.3.6, which appears to solve the java.lang.NoSuchFieldError: INSTANCE error. However, a new error appears:
Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.s3.transfer.TransferManager.<init>(Lcom/amazonaws/services/s3/AmazonS3;Ljava/util/concurrent/ThreadPoolExecutor;)V
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:287)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.getUnique(FileSystem.java:2675)
at org.apache.hadoop.fs.FileSystem.newInstance(FileSystem.java:418)
at com.qubole.streamx.s3.S3Storage.<init>(S3Storage.java:49)
... 16 more
This apparently is a known issue caused by an incompatibility between Hadoop 2.7 and aws-java-sdk versions later than 1.7. After trying a few different versions of aws-java-sdk-s3, I ended up just deleting the dependency entirely, which resolved the issue.
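For anyone chasing a similar NoSuchFieldError or NoSuchMethodError, it can help to check which jar each class is actually loaded from at runtime. The following is only a minimal sketch, not part of streamx: the class name ClassOrigin and the method locate are hypothetical, and org.apache.http.client.HttpClient is just one representative class picked to stand in for the httpclient jar. The other two class names come from the stack trace above.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of streamx): reports which jar
// or directory a class is loaded from, to spot conflicting versions.
class ClassOrigin {

    /** Returns the code source location of the named class, or a note if it is unavailable. */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers cannot fail here.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK core classes come from the bootstrap loader and have no code source.
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.apache.http.client.HttpClient",                    // assumed representative of httpclient
            "com.amazonaws.services.s3.transfer.TransferManager",   // from the stack trace
            "org.apache.hadoop.fs.s3a.S3AFileSystem"                // from the stack trace
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

Run with the same classpath as the connector, this prints one location per class. If TransferManager resolves to an unexpected aws-java-sdk jar, that would be consistent with the version mismatch described above.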
from streamx.
Hi @zzbennett, sorry for the late response. Yes, we have found multiple issues with S3A (a thread leak and httpClient-related issues). So far, our experience with NativeS3FileSystem has been very stable. Can you try that out instead?
Thanks for your reply @PraveenSeluka. I'm not able to use NativeS3FileSystem because it doesn't support AWS's temporary security credentials, which is what I'm using. There is a ticket open in the Hadoop community to add support for temporary security credentials, but they have decided not to implement it since s3a already supports it, and (according to the third comment on this thread) they are not planning to make any more enhancements to the s3n connector. So, sadly, s3n will never support temporary security tokens, but I cannot get s3a to work with streamx. The dependency issue appears to have been resolved when I deleted the aws-java-sdk-s3 dependency though, so I'm unblocked on that issue for now. I'm still not able to connect to S3 due to an access denied 403 error, so hopefully once that is resolved, things will start working.
@zzbennett You are right. They are not going to add role (temporary credentials) support to S3N, and S3A is the way forward. I will look into this issue and get back to you soon.
Regarding the S3 403 error, I resolved that by deleting the access_key and secret_key configs from the hadoop hdfs-site.xml config file. Streamx seems to be working smoothly now. Really the only thing I ended up doing was deleting the aws-java-sdk-s3 dependency from the streamx pom.xml file.
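To check whether a Hadoop config file still carries static keys that would override role-based credentials, something like the sketch below could be used. It is an illustration under assumptions, not streamx code: the class StaticKeyCheck and its method are hypothetical, and it assumes the key properties have names ending in access.key or secret.key (for example fs.s3a.access.key and fs.s3a.secret.key). The exact property names in your hdfs-site.xml may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of streamx or Hadoop): lists properties in a
// Hadoop-style XML config whose names look like static S3 credentials.
class StaticKeyCheck {

    /** Returns the names of properties ending in "access.key" or "secret.key". */
    static List<String> findStaticKeyProperties(String siteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(siteXml.getBytes(StandardCharsets.UTF_8)));
            List<String> hits = new ArrayList<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                if (names.getLength() == 0) {
                    continue; // property without a <name> element, skip it
                }
                String name = names.item(0).getTextContent().trim();
                // Assumed naming pattern; adjust to match the actual config.
                if (name.endsWith("access.key") || name.endsWith("secret.key")) {
                    hits.add(name);
                }
            }
            return hits;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse config XML", e);
        }
    }

    public static void main(String[] args) {
        String sample =
              "<configuration>"
            + "<property><name>fs.defaultFS</name><value>hdfs://example:8020</value></property>"
            + "<property><name>fs.s3a.access.key</name><value>PLACEHOLDER</value></property>"
            + "<property><name>fs.s3a.secret.key</name><value>PLACEHOLDER</value></property>"
            + "</configuration>";
        // Prints: [fs.s3a.access.key, fs.s3a.secret.key]
        System.out.println(findStaticKeyProperties(sample));
    }
}
```

An empty result suggests no static keys of that naming pattern are set in the file, which matches the state that worked in the comment above.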
Yeah, right. You need to remove the keys (otherwise it won't use roles, and the keys are invalid). I will add a note for this.
@zzbennett Please look at #30 for issues related to S3A.