Comments (2)
PR #119 should alleviate this problem.
It tracks the presence of dictionary encoding and assumes it's a good proxy for choosing whether to cache duplicated ByteArray entries or not. This is currently only enabled for string, as strings are immutable. But the same logic can easily be applied to byte[] (probably behind an optional flag, as arrays are not immutable and sharing one instance could confuse users who modify it).
Local testing has shown that if dictionary encoding is disabled, Parquet C++ returns different ByteArray entries even if the string is identical. With dictionary encoding enabled, however, the entries are properly merged (which makes sense). Testing equality of ByteArray is quite cheap (a pointer and a length), so I expect this to be a winner in most scenarios.
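To illustrate the idea, here is a minimal C++ sketch. It is not the code from the PR: `ByteArrayView`, `StringCache` and the `dictionary_encoded` flag are hypothetical names made up for this example. The only assumption taken from the comment above is that an entry can be identified by its pointer and length, and that identical values share one entry when dictionary encoding is on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a byte-array entry: a pointer plus a length.
struct ByteArrayView {
    const std::uint8_t* ptr;
    std::uint32_t len;
};

// Identity check only: same address and same length. No bytes are compared.
struct ViewEq {
    bool operator()(const ByteArrayView& a, const ByteArrayView& b) const {
        return a.ptr == b.ptr && a.len == b.len;
    }
};

struct ViewHash {
    std::size_t operator()(const ByteArrayView& v) const {
        return std::hash<const void*>()(v.ptr) ^
               (std::hash<std::uint32_t>()(v.len) << 1);
    }
};

// Caches decoded strings per entry, but only when the column is
// dictionary encoded (assumed to mean duplicates share one entry).
// The cache is only valid while the underlying buffer is alive.
class StringCache {
public:
    explicit StringCache(bool dictionary_encoded) : enabled_(dictionary_encoded) {}

    std::shared_ptr<const std::string> get(const ByteArrayView& v) {
        if (enabled_) {
            auto it = cache_.find(v);
            if (it != cache_.end()) return it->second;  // reuse decoded value
        }
        ++decoded_;
        std::shared_ptr<const std::string> s = std::make_shared<std::string>(
            reinterpret_cast<const char*>(v.ptr), v.len);
        if (enabled_) cache_.emplace(v, s);
        return s;
    }

    // Number of times a new string was actually allocated.
    std::size_t decoded_count() const { return decoded_; }

private:
    bool enabled_;
    std::size_t decoded_ = 0;
    std::unordered_map<ByteArrayView, std::shared_ptr<const std::string>,
                       ViewHash, ViewEq> cache_;
};
```

The shared value is immutable (`const std::string`), which mirrors why the caching is safe for string but would need an opt-in flag for byte[].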
from parquetsharp.
Released in 2.2.0-beta1.