Comments (5)
Did you try the BUCKET index? It distributes keys evenly among buckets, while the bloom_filter index always tries to append to the small buckets first.
from hudi.
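For readers following the suggestion above, here is a minimal sketch of the write options that would typically select a bucket index. The helper name `bucket_index_options`, the hash field `id`, and the bucket count of 16 are assumptions for illustration, not values from this thread; verify the option keys against the configuration reference for your Hudi version.

```python
def bucket_index_options(hash_field: str, num_buckets: int) -> dict:
    """Build Hudi write options that select the bucket index.

    hash_field and num_buckets are illustrative inputs; pick them to match
    your record key and expected data volume per partition.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be a positive integer")
    return {
        # Switch the index implementation from the default to BUCKET.
        "hoodie.index.type": "BUCKET",
        # Number of buckets (file groups) per partition.
        "hoodie.bucket.index.num.buckets": str(num_buckets),
        # Field whose hash decides the bucket for each record.
        "hoodie.bucket.index.hash.field": hash_field,
    }


opts = bucket_index_options("id", 16)
# With a Spark DataFrame `df` (not defined here), the options would be passed as:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

An existing table written with the bloom index is generally not converted in place by changing these options, so the usual assumption is that the data is rewritten into a new table.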
@danny0405 Thanks for your reply. We have not tried the bucket index yet. Do you mean that in our current setup, which uses the bloom index, the only way to get evenly distributed files without changing the index type is to re-import the table?
from hudi.
@wkhappy1 Did you try clustering to fix the file sizes on the existing table?
from hudi.
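A minimal sketch of what enabling inline clustering could look like, assuming the Spark datasource writer. The helper name `inline_clustering_options` and the size thresholds are hypothetical choices for illustration; the option keys should be checked against the clustering documentation for your Hudi version.

```python
MB = 1024 * 1024


def inline_clustering_options(target_file_mb: int, small_file_mb: int,
                              max_commits: int = 4) -> dict:
    """Build Hudi write options that schedule and run clustering inline.

    Files smaller than small_file_mb become clustering candidates and are
    rewritten into files of up to target_file_mb.
    """
    if small_file_mb >= target_file_mb:
        raise ValueError("small_file_mb should be below target_file_mb")
    return {
        # Run clustering as part of the regular write.
        "hoodie.clustering.inline": "true",
        # Trigger a clustering plan after this many commits.
        "hoodie.clustering.inline.max.commits": str(max_commits),
        # Files below this size are picked up for clustering.
        "hoodie.clustering.plan.strategy.small.file.limit": str(small_file_mb * MB),
        # Upper bound for the size of files produced by clustering.
        "hoodie.clustering.plan.strategy.target.file.max.bytes": str(target_file_mb * MB),
    }


opts = inline_clustering_options(target_file_mb=1024, small_file_mb=300)
# These options would be merged into the existing writer options, e.g.:
# df.write.format("hudi").options(**existing_opts).options(**opts)...
```

Clustering rewrites existing file groups without changing the index type, which is why it is worth trying before re-importing the table.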
@ad1happy2go Sorry, I haven't tried it yet because this code has been running in production for a long time. Are there any considerations or documentation links you can provide for switching from the bloom index to the bucket index? Thank you very much.
from hudi.
@ad1happy2go I also have a question: if I don't want to change the index type, can I simply rewrite the table, since it's not very large?
from hudi.