When partitioning by tenant and writing data, I find write data skew.


[SUPPORT] Write data skew when a COW-type table writes data to Parquet (hudi, 5 comments, open)

wkhappy1 commented on June 18, 2024



Comments (5)

danny0405 commented on June 18, 2024

Did you try the BUCKET index? It distributes the keys evenly among buckets, while the bloom filter index always tries to append to the small buckets first.
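To make the contrast concrete, here is a minimal, simplified model of the two assignment behaviors. It is not Hudi's actual code: the MD5-based `bucket_id` stands in for whatever hash Hudi applies to the hash field, and `assign_small_files_first` is a hypothetical greedy model of "top up small files first, then open new files" that ignores Hudi's record-size estimation and other details.

```python
import hashlib
from collections import Counter


def bucket_id(key: str, num_buckets: int) -> int:
    """Stable hash of a record key modulo the bucket count (MD5 is an assumption)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def assign_by_bucket(keys, num_buckets):
    """Count how many keys land in each bucket under hash bucketing."""
    return Counter(bucket_id(k, num_buckets) for k in keys)


def assign_small_files_first(file_sizes, num_records, record_size,
                             small_file_limit, max_file_size):
    """Hypothetical greedy model: files smaller than small_file_limit are
    topped up to max_file_size (smallest first); leftover records go into
    new files. Returns the number of records written to each file, with
    new files appended after the existing ones."""
    if record_size <= 0 or record_size > max_file_size:
        raise ValueError("record_size must be in (0, max_file_size]")
    written = [0] * len(file_sizes)
    remaining = num_records
    small = sorted((i for i, s in enumerate(file_sizes) if s < small_file_limit),
                   key=lambda i: file_sizes[i])
    for idx in small:
        if remaining == 0:
            break
        capacity = max(0, (max_file_size - file_sizes[idx]) // record_size)
        take = min(capacity, remaining)
        written[idx] += take
        remaining -= take
    # Overflow: open new files, each holding at most max_file_size worth of records.
    per_new_file = max_file_size // record_size
    while remaining > 0:
        take = min(per_new_file, remaining)
        written.append(take)
        remaining -= take
    return written


if __name__ == "__main__":
    # Hash bucketing spreads 10,000 keys roughly evenly over 8 buckets.
    print(sorted(assign_by_bucket([f"key-{i}" for i in range(10_000)], 8).values()))
    # Small-file-first: only the 10 MB file is "small", so it absorbs every insert.
    print(assign_small_files_first([10, 100, 110, 115], 50, 1, 100, 120))
```

In this model, the second call prints `[50, 0, 0, 0]`: all 50 new records go to the single file below the small-file limit, which is the kind of per-file imbalance the thread describes, whereas the bucket counts stay close to 1,250 each.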


wkhappy1 commented on June 18, 2024

@danny0405 Thank you for your reply. We have not tried the bucket index yet. Do you mean that, since we currently use the bloom index, if we don't change the index type, the only way to get evenly distributed files is to reimport the table?


ad1happy2go commented on June 18, 2024

@wkhappy1 Did you try clustering to fix the file sizes on the existing table?
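As a rough illustration of what a size-based clustering pass does, the sketch below groups files smaller than a limit into rewrite groups that each stay under a target size, using first-fit-decreasing bin packing. It is an assumed, simplified model: Hudi's real clustering plan strategies (tuned by keys such as `hoodie.clustering.plan.strategy.small.file.limit` and `hoodie.clustering.plan.strategy.target.file.max.bytes`) differ in details, and `plan_clustering_groups` is a hypothetical helper.

```python
def plan_clustering_groups(file_sizes, small_file_limit, target_group_size):
    """Hypothetical planner: pack files smaller than small_file_limit into
    groups whose total size stays within target_group_size.

    Each returned group would be rewritten as roughly one output file;
    files at or above small_file_limit are left untouched."""
    candidates = sorted((s for s in file_sizes if s < small_file_limit), reverse=True)
    groups = []
    for size in candidates:
        # First fit: place the file in the first group that still has room.
        for group in groups:
            if sum(group) + size <= target_group_size:
                group.append(size)
                break
        else:
            groups.append([size])
    return groups


if __name__ == "__main__":
    # Sizes in MB; the 130 MB file is already large enough and is skipped.
    print(plan_clustering_groups([5, 20, 40, 60, 90, 130], 100, 120))
```

Here five uneven small files collapse into two groups (`[[90, 20, 5], [60, 40]]`), which is why clustering can even out file sizes on an existing table without changing the index type.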


wkhappy1 commented on June 18, 2024

@ad1happy2go Sorry, I haven't tried it yet because this code has been running in production for a long time. Are there any considerations or documentation links you can share for switching from the bloom index to the bucket index? Thank you very much.
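For reference, the bucket index is selected through write options. The fragment below is a minimal sketch assuming a Spark datasource write; the table name, field names and bucket count are hypothetical placeholders. Because a bucket-indexed table lays out its file groups by bucket, an existing bloom-indexed table likely has to be rewritten rather than switched in place, so it is worth checking the Hudi documentation for your version first.

```python
# Hypothetical table and field names; replace with your own schema.
bucket_index_options = {
    "hoodie.table.name": "my_table",                             # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",              # hypothetical record key
    "hoodie.datasource.write.partitionpath.field": "tenant",      # hypothetical partition field
    "hoodie.index.type": "BUCKET",                                 # use the bucket index
    "hoodie.bucket.index.num.buckets": "16",                       # buckets per partition (assumed value)
    "hoodie.bucket.index.hash.field": "id",                        # hash on the record key
}

# Usage with an existing DataFrame `df` and target `path` (not defined here):
# df.write.format("hudi").options(**bucket_index_options).mode("append").save(path)
```

The bucket count is the main design decision: with a fixed number of buckets, each partition gets that many file groups, so it should be sized for the expected data volume per partition.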


wkhappy1 commented on June 18, 2024

@ad1happy2go I also have another question: if I don't want to change the index type, can I simply rewrite the table instead, since it's not very large?

