Comments (4)
Seems like a good idea to me. I'll probably look into this during the weekend (no promises though). Could be an easy win if it is just about passing a command line argument to the parquet writer.
from odbc2parquet.
Awesome, I'll give it a try! Appreciate the update.
If anyone runs into a similar issue or needs custom options and has Python available, here is a short script to compress an existing parquet file.
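import sys

import pyarrow.parquet as pq

# Usage: python <this script> input.parquet output.parquet
pq_file = pq.ParquetFile(sys.argv[1])
# Reuse the input schema and write every column with ZSTD compression.
with pq.ParquetWriter(sys.argv[2], pq_file.schema_arrow, compression='ZSTD') as writer:
    # Copy one row group at a time so the whole file is never loaded at once.
    for ri in range(pq_file.num_row_groups):
        table = pq_file.read_row_group(ri)
        writer.write_table(table)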
New version 0.6.1 published with support for the `--column-compression-default` command line option. The new default is `gzip`. I only did the minimal thing here. There are a lot more options which could be forwarded, like encoding or the ability to control encoding/compression for an individual column.
Let me know if this is already good enough for your use case, or if more is needed (or at least nice to have).
Cheers, Markus
Thank you for your feedback and script!