Comments (2)
Very happy to know that StreamX helps. Here is a sample configuration:
{
  "name": "s3connect",
  "config": {
    "connector.class": "com.qubole.streamx.s3.S3SinkConnector",
    "tasks.max": "1",
    "flush.size": "3000",
    "s3.url": "s3://streamx/demo/",
    "hadoop.conf.dir": "/usr/lib/hadoop2/etc/hadoop/",
    "topics": "clickstream",
    "rotate.interval.ms": "60000",
    "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
    "partition.duration.ms": "3600000",
    "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/",
    "locale": "en",
    "timezone": "GMT",
    "hive.metastore.uris": "thrift://localhost:10000",
    "hive.integration": "true",
    "schema.compatibility": "BACKWARD"
  }
}
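As a sketch of how you might register this connector with a running Kafka Connect cluster: the config is POSTed as JSON to the Connect REST API. The host and port (localhost:8083, the Connect default) are assumptions here, and the config dict is trimmed to a few representative keys.

```python
import json
import urllib.request

# Trimmed version of the sample connector config above.
connector = {
    "name": "s3connect",
    "config": {
        "connector.class": "com.qubole.streamx.s3.S3SinkConnector",
        "tasks.max": "1",
        "s3.url": "s3://streamx/demo/",
        "topics": "clickstream",
        "partitioner.class": "io.confluent.connect.hdfs.partitioner.TimeBasedPartitioner",
        "partition.duration.ms": "3600000",
        "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH/",
        "hive.integration": "true",
    },
}

payload = json.dumps(connector).encode()

# Registering the connector is a POST to the Connect REST API
# (localhost:8083 is the default worker port, an assumption here):
request = urllib.request.Request(
    "http://localhost:8083/connectors",
    data=payload,
    headers={"Content-Type": "application/json"},
)
# urllib.request.urlopen(request)  # uncomment against a live Connect worker
```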
In general, users like to add hourly partitions. The config above creates a new partition every 3600000 ms (1 hour), writing to directories such as "s3://streamx/demo/topics/clickstream/year=2016/month=09/day=21/hour=22/".
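As a rough sketch of how the TimeBasedPartitioner derives that directory, a record's timestamp is floored to the partition duration and then formatted per path.format. The strftime mapping below is my own approximation of the YYYY/MM/dd/HH tokens, not StreamX code:

```python
from datetime import datetime, timezone

def partition_path(topic, ts, duration_ms=3_600_000):
    """Floor the timestamp to the partition duration, then format the
    year=/month=/day=/hour= directory (UTC, matching "timezone":"GMT")."""
    epoch_ms = int(ts.timestamp() * 1000)
    floored = epoch_ms - epoch_ms % duration_ms  # start of the hour
    start = datetime.fromtimestamp(floored / 1000, tz=timezone.utc)
    return "topics/%s/%s" % (topic,
                             start.strftime("year=%Y/month=%m/day=%d/hour=%H/"))

print(partition_path("clickstream",
                     datetime(2016, 9, 21, 22, 35, tzinfo=timezone.utc)))
# → topics/clickstream/year=2016/month=09/day=21/hour=22/
```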
Regarding Hive integration, StreamX packages the required Hive dependencies. You need a Hive metastore server running somewhere, and you point StreamX to it via hive.metastore.uris. Every hour, it issues an add-partition call to keep the Hive table up to date.
In connect-distributed.properties, you need to use AvroConverter and Schema Registry to store the Avro schemas (Avro messages in Kafka, and Avro output to S3):
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
Let us know if you need any more information.
Thanks
Praveen
from streamx.
Hello Praveen,
Thank you so much for your answer. I tested with this configuration and it is working perfectly :)
Amazing job on this project, guys! Thanks!
Jocelyn
Related Issues (20)
- Folders as files with $ dollar sign in their name when using s3n
- NullPointerException when tasks.max > 1 and using s3a
- Writing Avro data
- NullPointerException using DBWAL and s3n
- Error while running copy job in standalone mode
- Support Openstack Swift object store
- Tests failing?
- JSON records to Parquet on S3 won't work
- S3 to Kafka
- S3 partition file per hourly batch
- `NoSuchMethodError` on Kafka 0.11.0.0
- What is the suggested way to configure path in s3 bucket?
- Can we store the data as txt/json format in s3?
- Strange problem of Parquet files in S3
- Saving json data, partition by specific field (timestamp)
- Not a valid partitioner class: io.confluent.connect.hdfs.partitioner.DefaultPartitioner
- Do I have to set up HDFS in order to use streamX?
- From kafka (avro format) to S3 (Parquet format)
- AWS Athena connection
- Update README.md on using AWS IAM ROLES