Comments (13)
julia> scitype([1.0, 2.3, missing, 3.5])
AbstractVector{Union{Missing, Continuous}} (alias for AbstractArray{Union{Missing, Continuous}, 1})
from tabletransforms.jl.
Yes, I guess we could generalize assert_continuous to work in the presence of missing values.
Let's postpone this discussion until after v1.0 is out. We are working on the missing transforms in #13 and will soon release this first major version. The API is quite stable and we don't foresee any major changes. Missing values can potentially change the API and will be considered in a future major release.
I generalized assert_continuous to work in the presence of missing values, but this results in errors during the transformation process. For example:
MethodError: Cannot `convert` an object of type Missing to an object of type Float64
I can think of several alternatives:
- Automatically handle missing values before the transformation to avoid these errors
- Simply generalize assert_continuous and allow the rest of the transform to throw these errors
- Create a helpful warning message in the assert_continuous function
- Do nothing and let assert_continuous assert that the table is not continuous.
Should I implement one of these? Or maybe something else?
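For reference, one way an error of this kind can arise is when a result buffer is typed as Float64 and a missing value is written into it. The snippet below is a minimal sketch in plain Julia and is not the code path inside TableTransforms.jl; the names x and y are purely illustrative.

```julia
# Illustrative only: not the actual code path in TableTransforms.jl.
x = [1.0, missing, 3.0]          # eltype is Union{Missing, Float64}
y = Vector{Float64}(undef, 3)    # buffer that can only hold Float64

err = try
    y .= x                       # storing `missing` requires convert(Float64, missing)
    nothing
catch e
    e
end

println(typeof(err))             # MethodError
```

Under this assumption, any fix has to either widen the element type of the output or remove the missing values before they reach such a buffer.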
That is a great suggestion @ablaom. If I understood correctly, you are suggesting that we use scitype on the column and then check whether the result is <: AbstractVector{Union{Missing,Continuous}}?
@ceferisbarov try to use scitype as @ablaom suggested, happy to review a PR.
@juliohm Great, I am working on it. The problem is that AbstractVector{Continuous} <: AbstractVector{Union{Missing,Continuous}}
returns false, but
Continuous <: Union{Missing, Continuous}
returns true. Is there a reason why I shouldn't use the latter? The following line:
@assert all(T <: Continuous for T in types) "columns must hold continuous variables"
would be replaced with:
@assert all(T <: Union{Missing, Continuous} for T in types) "columns must hold continuous variables"
I think we need two distinct assertion functions. The one we have is more strict in the sense that it asserts that we don't have missing values. Maybe we should add a new function assert_continuous_or_missing if a transform supports missing values. Can you please remind me why we started discussing this generalization? Do we really need to allow missing values in our currently implemented transforms? As far as I remember none of our statistical transforms, which require continuous values, support missing values.
@juliohm The original request was to skip missing values. This can be done at least for the Center transform. I believe we can create assert_continuous_or_missing as you said, and use it for transforms where we can skip missing values. We would then have to add a skip-missing option to those transforms.
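The two-assertion idea could look roughly like the sketch below. It is written against plain Julia element types, with AbstractFloat standing in for the Continuous scitype, so it is an assumption-laden illustration and not the package's implementation. The argument name cols and both message strings are hypothetical.

```julia
# Hypothetical stand-ins: AbstractFloat plays the role of the Continuous scitype.
function assert_continuous(cols)
    # Strict check: rejects any column whose element type admits missing
    @assert all(eltype(c) <: AbstractFloat for c in cols) "columns must hold continuous variables"
end

function assert_continuous_or_missing(cols)
    # Relaxed check: element types may also include Missing
    @assert all(eltype(c) <: Union{Missing,AbstractFloat} for c in cols) "columns must hold continuous variables, possibly with missing values"
end

x1 = [1.0, 2.0, 3.0]
x2 = [missing, 2.0, 3.0]

assert_continuous([x1])                 # passes
assert_continuous_or_missing([x1, x2])  # passes
# assert_continuous([x1, x2])           # would throw an AssertionError
```

Note that the relaxed check in this sketch also accepts a column made only of missing values, which a real implementation might want to reject.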
An example:
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [missing, 2.0, 3.0, 4.0, 5.0]
x3 = [5.0, 5.0, 5.0, 5.0, 5.0]
t = TypedTables.Table(;x1, x2, x3)
t |> Center(skipmissing=true)
Output:
Table with 3 columns and 5 rows:
     x1    x2       x3
   ┌───────────────────
 1 │ -2.0  missing  0.0
 2 │ -1.0  -1.5     0.0
 3 │ 0.0   -0.5     0.0
 4 │ 1.0   0.5      0.0
 5 │ 2.0   1.5      0.0
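The x2 column in this proposed output can be reproduced with a few lines of plain Julia. The helper below, center_skipmissing, is a hypothetical name used only for illustration. It is not how Center is implemented, and the skipmissing keyword shown above is a proposal from this thread.

```julia
using Statistics

# Hypothetical helper, not the TableTransforms.jl implementation of Center.
# The mean is computed over non-missing entries only; missing entries stay missing.
center_skipmissing(x) = x .- mean(skipmissing(x))

x2 = [missing, 2.0, 3.0, 4.0, 5.0]
centered = center_skipmissing(x2)
println(centered)
```

The subtraction propagates missing, so the first entry remains missing while the others are shifted by the mean of the observed values (3.5).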
The problem is that
AbstractVector{Continuous} <: AbstractVector{Union{Missing,Continuous}}
returns false
Maybe this is what you're after:
julia> AbstractVector{Continuous} <: AbstractVector{<:Union{Missing,Continuous}}
true
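The behaviour discussed here follows from the fact that type parameters in Julia are invariant. The sketch below shows the same three checks with Float64 standing in for Continuous, an assumption made so that it runs without ScientificTypes.jl. The aliases T and TM are illustrative.

```julia
# Float64 is used here as a stand-in for the Continuous scitype.
T  = Float64
TM = Union{Missing,Float64}

println(T <: TM)                                    # true
println(AbstractVector{T} <: AbstractVector{TM})    # false: type parameters are invariant
println(AbstractVector{T} <: AbstractVector{<:TM})  # true: any element type that is a subtype of TM
println(Vector{TM} <: AbstractVector{<:TM})         # true
```

The `<:` inside the braces turns the parameter into an upper bound, which is why the third and fourth checks succeed.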
@ablaom This works, thanks! Do you mind taking a look at the PR?
This is no longer an issue with the migration to DataScienceTraits.jl.