Comments (3)
You are right, we should encode the partition path for these special characters.
from hudi.
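For reference, a minimal sketch of what percent-encoding a partition value looks like on the client side, using Python's standard `urllib.parse` (the exact encoding Hudi applies internally may differ):

```python
from urllib.parse import quote, unquote

# Percent-encode partition values so characters like "/" and "%"
# cannot be interpreted as path separators or escape sequences.
for city in ["san_francisco%", "sao/paulo"]:
    encoded = quote(city, safe="")
    print(city, "->", encoded, "->", unquote(encoded))
```

With this, `"sao/paulo"` becomes `"sao%2Fpaulo"`, so it stays a single partition path segment instead of creating a nested sub-folder.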
@eshu I tried to insert these values, and at least read/write worked fine. I do understand that in the case of a slash it created an inner sub-folder. Were you able to make it work by encoding them? Let us know if you need any other help here, or feel free to close if all is good.
```python
columns = ["ts", "uuid", "rider", "driver", "fare", "city"]
data = [
    (1695159649087, "334e26e9-8355-45cc-97c6-c31daf0df330", "rider-A", "driver-K", 19.10, "san francisco"),
    (1695091554788, "e96c4396-3fad-413a-a942-4cb36106d721", "rider-B", "driver-L", 27.70, "san-francisco"),
    (1695046462179, "9909a8b1-2d15-4d3d-8ec9-efc48c536a00", "rider-C", "driver-M", 33.90, "san_francisco%"),
    (1695516137016, "e3cf430c-889d-4015-bc98-59bdce1e530c", "rider-C", "driver-N", 34.15, "sao/paulo"),
]

spark = get_spark_session(spark_version="3.2", hudi_version="0.13.0")
inserts = spark.createDataFrame(data).toDF(*columns)

# tableName and basePath are placeholders; substitute your own values.
tableName = "trips_table"
basePath = "file:///tmp/trips_table"

hudi_options = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.precombine.field': 'ts',
    'hoodie.datasource.write.partitionpath.field': 'city',
}

# Insert data
inserts.write.format("hudi"). \
    options(**hudi_options). \
    mode("overwrite"). \
    save(basePath)

spark.read.format("hudi").load(basePath).show()
```
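As a possible workaround, Hudi exposes a write config, `hoodie.datasource.write.partitionpath.urlencode` (default `"false"`), that URL-encodes partition path values; enabling it should keep a value like `"sao/paulo"` from turning into a nested sub-folder. A sketch of the options, assuming the same illustrative `tableName`/`basePath` placeholders as above (please verify the option name against your Hudi version):

```python
# Hypothetical table name/path, for illustration only.
tableName = "trips_table"
basePath = "file:///tmp/trips_table"

hudi_options_encoded = {
    'hoodie.table.name': tableName,
    'hoodie.datasource.write.recordkey.field': 'uuid',
    'hoodie.datasource.write.precombine.field': 'ts',
    'hoodie.datasource.write.partitionpath.field': 'city',
    # Ask Hudi to percent-encode partition values such as "sao/paulo".
    # Verify this option name for your Hudi version (default: "false").
    'hoodie.datasource.write.partitionpath.urlencode': 'true',
}
# Used the same way as before:
# inserts.write.format("hudi").options(**hudi_options_encoded).mode("overwrite").save(basePath)
```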
![image](https://private-user-images.githubusercontent.com/63430370/310177122-45ae12e6-09e4-473b-87be-a05dc212d57f.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTAxOTU5NzIsIm5iZiI6MTcxMDE5NTY3MiwicGF0aCI6Ii82MzQzMDM3MC8zMTAxNzcxMjItNDVhZTEyZTYtMDllNC00NzNiLTg3YmUtYTA1ZGMyMTJkNTdmLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDAzMTElMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwMzExVDIyMjExMlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWIzNGJkNjI3NDg2ZmQ5NDY1YWExYWU2ODhiOTA1YTMxYTVmOGY3MDFhZmNmMzc2YjU3ZDdjOWJlZmI4ZWI4MTAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.1RcDOkVRB0oAXxhML5ebNt4-cHO1dxdd5yv6lKHkIo4)
A similar JIRA has been raised to fix this issue: https://issues.apache.org/jira/browse/HUDI-7484