This is a bug, although the statistics implementation is correct in ignoring NaN: NaN values should be ignored for the purpose of min and max statistics [1].
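A minimal Python sketch of that rule is below. It assumes a hypothetical per-zone `ColumnStats` record and a hypothetical `compute_stats` helper; neither is Lance's actual statistics API. NaN is excluded from min/max and counted separately instead.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class ColumnStats:
    # Hypothetical per-zone statistics record (not Lance's real type).
    min_value: Optional[float]
    max_value: Optional[float]
    nan_count: int


def compute_stats(values: Sequence[float]) -> ColumnStats:
    """Compute min/max while ignoring NaN, and count the NaNs separately."""
    non_nan = [v for v in values if not math.isnan(v)]
    nan_count = len(values) - len(non_nan)
    if not non_nan:
        # Empty zone, or every value is NaN: there is no meaningful min/max.
        return ColumnStats(min_value=None, max_value=None, nan_count=nan_count)
    return ColumnStats(min_value=min(non_nan), max_value=max(non_nan), nan_count=nan_count)
```

Filtering NaN out first matters because NaN compares false with everything, so a naive `min` is order-dependent: in Python, `min([float("nan"), 1.0])` returns `nan`, while `min([1.0, float("nan")])` returns `1.0`.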
The scanner needs to know whether there are NaN values in the column. I like the idea of adding a `nan_count` column. If it's missing, we can assume the count is unknown and that the statistics can't be used for now. You are correct that we'd have to do upstream work in DataFusion to get the simplification working.
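The pruning decision described above can be sketched as follows for a predicate such as `x > threshold`. This is a hypothetical illustration, not Lance's or DataFusion's code. It assumes an optional `nan_count` where `None` means the file predates the column, and it assumes the engine orders NaN above every other value, so NaN rows would match `x > threshold`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ZoneStats:
    # Hypothetical zone-map entry. nan_count is None when it was not
    # written (e.g. older files), meaning the NaN count is unknown.
    min_value: Optional[float]
    max_value: Optional[float]
    nan_count: Optional[int]


def can_skip_zone_gt(stats: ZoneStats, threshold: float) -> bool:
    """Return True only if no row in the zone can satisfy `x > threshold`.

    Assumes NaN sorts above all other values, so a NaN row would match.
    """
    if stats.nan_count is None:
        # Unknown NaN count: the statistics can't be used for pruning.
        return False
    if stats.nan_count > 0:
        # NaN rows would satisfy the predicate under the assumed ordering.
        return False
    if stats.max_value is None:
        # No non-NaN values recorded and no NaNs: nothing can match.
        return True
    # Every non-NaN value is <= max_value, so none exceeds the threshold.
    return stats.max_value <= threshold
```

The conservative default is the important design choice: whenever the NaN count is unknown or nonzero, the zone is scanned rather than skipped, trading some extra I/O for correct results.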
Footnotes
Related Issues
- Change the writer default to write v2 files
- Compact with many thread could fail with commit conflict
- More sophisticated simplification in v2 zone maps scheduler
- Support for nested fields in v2 pushdown
- Concurrent create dataset gives bad error, doesn't retry
- Document Lance compatibility with python multiprocessing
- feat(rust): expose Dataset uri in Rust
- Android build fails due to `aarch64` in Rust
- feat(encoding): support FSST encoding
- Optimize IOPS for opening fragment
- Make schema cheaper to manipulate
- Cache open fragments
- Use doctest on Read and Write Lance
- test: test_read_consistency_interval is failing on main
- PyTorch Integration throws error with `batch_readahead` argument
- feat: Support sparse vector
- Epic: Primary keys
- "error: Unrecognized option: 'diagnostic-width'" when first run test in visual code on mac
- Failed to get AWS credentials: an error occurred while loading credentials
- Native support for PIL Images in Lance