Comments (6)
Looking at this, I think we have too many different constructors for PDA's.
If we're going to start enforcing the referential integrity of PDA's, then I don't see any reason to allow users to specify the pool. The constructors for PDA's should be exactly like the constructors for DA's: you can specify data + missingness or just data.
from dataarrays.jl.
Specifying the pool can increase performance (both memory and speed), since you know the number of bits you need to store the references. It may also be used as a way to check that no unexpected values appear in the data (though this could of course be checked later).
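To make the memory argument concrete, here is a minimal sketch of how knowing the number of levels up front fixes the reference width. It is written in Python as a language-neutral illustration, not as DataArrays.jl code; the name `ref_bits` is hypothetical, and reserving reference 0 for missing values is an assumption.

```python
def ref_bits(n_levels: int) -> int:
    """Return the smallest unsigned width (8, 16, 32 or 64 bits) whose
    range can index `n_levels` distinct values.

    Assumption: reference 0 is reserved for "missing", so a width of
    `bits` can address at most 2**bits - 1 levels.
    """
    if n_levels < 0:
        raise ValueError("n_levels must be non-negative")
    for bits in (8, 16, 32, 64):
        if n_levels <= 2**bits - 1:
            return bits
    raise OverflowError("too many levels for a 64-bit reference")
```

Without that hint, a constructor has to either scan the data once to count the unique values or start narrow and widen the references when the pool outgrows them.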
from dataarrays.jl.
If you need performance you could always use a RefArray. The argument that it lets you check values seems to be the mirror image of the "it lets you create an invalid PDA" argument.
from dataarrays.jl.
I thought `RefArray`s were not supposed to be used outside the package itself (#13). I don't think providing a list of values should allow creating an invalid PDA. It would simply ensure that your expectations about the levels that are in the data are correct -- the resulting PDA should be correct in all cases.
Anyway, I don't really care, I was just laying out the possible counterarguments. It may be simpler to accept an optional argument giving the expected maximum number of levels -- or to do nothing for now, as it can always be added later easily.
from dataarrays.jl.
Yes, we shouldn't export RefArray's. But people who are sufficiently interested in performance will often try digging into internals even when we tell them not to.
I think I might just be missing something about your point, but it seems like we can only ensure that the passed pool is correct if we do a bunch of computations to check correctness, which might be just as costly as creating the pool to begin with. It seems like we'd need some substantial benchmarks to really know.
Offering a constructor that says what the ref size should be seems totally reasonable. I mostly just want us to stop exposing the pool at all, because I think it should be an implementation detail, not a property of PDA's. The property of PDA's is that they make it efficient to work with functions like `unique` and `levels`.
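A rough sketch of what "the pool as an implementation detail" could look like, again in Python as a language-neutral illustration rather than DataArrays.jl code. The class name `PooledArray`, the use of `None` for missing values, and ordering levels by first appearance are all assumptions made for this example.

```python
class PooledArray:
    """Toy pooled array: callers pass data only and never touch the pool."""

    def __init__(self, data):
        self._pool = []    # distinct values, in order of first appearance
        self._index = {}   # value -> 1-based reference into the pool
        self._refs = []    # one reference per element; 0 means missing
        for value in data:
            if value is None:          # assumption: None stands for missing
                self._refs.append(0)
                continue
            ref = self._index.get(value)
            if ref is None:            # first time this value is seen
                self._pool.append(value)
                ref = len(self._pool)
                self._index[value] = ref
            self._refs.append(ref)

    def levels(self):
        # Reads the pool directly; no scan over the data is needed.
        return list(self._pool)

    def __len__(self):
        return len(self._refs)

    def __getitem__(self, i):
        ref = self._refs[i]
        return None if ref == 0 else self._pool[ref - 1]
```

Because the pool is built from the data, every level is referenced at least once and every reference points at a real level, so referential integrity holds by construction at creation time.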
My main argument for removing a lot of these constructors is that they only make sense if you intend to allow PDA's to not guarantee referential integrity. Since R allows that, we ended up implementing a lot of functionality that I think we would be better off without. We have too many features and our code is too buggy. I'd like to offer fewer features and more guarantees that everything really works as claimed.
from dataarrays.jl.
In my mind you would create the pool, iterate over the input array and assign elements a reference. If you encounter a value that was not passed in the pool, throw an error. If you find a level in the pool with no value in the data, throw an error at the end of the process too (or drop it, not sure what's best).
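The procedure just described can be sketched as follows. This is a Python illustration under stated assumptions, not DataArrays.jl code: the name `encode_with_pool` is hypothetical, `None` stands for a missing value, references are 1-based with 0 reserved for missing, and an unused level raises an error rather than being dropped.

```python
def encode_with_pool(data, pool):
    """Map each element of `data` to its 1-based position in `pool`.

    Raises ValueError if the pool has duplicates, if a value is not in
    the pool, or if a pool level never occurs in the data.
    """
    index = {}
    for ref, level in enumerate(pool, start=1):
        if level in index:
            raise ValueError(f"duplicate level in pool: {level!r}")
        index[level] = ref

    refs = []
    seen = set()
    for value in data:
        if value is None:              # assumption: None stands for missing
            refs.append(0)
            continue
        if value not in index:
            raise ValueError(f"value {value!r} is not in the supplied pool")
        ref = index[value]
        refs.append(ref)
        seen.add(ref)

    # Final check: every level supplied by the caller must actually occur.
    unused = [level for level, ref in index.items() if ref not in seen]
    if unused:
        raise ValueError(f"levels never used in the data: {unused!r}")
    return refs
```

The check costs one dictionary lookup per element, which is about what building the pool from scratch costs, so any speed benefit would have to come from knowing the reference width in advance.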
But let's go with only an argument specifying the number of unique values (i.e. size of the pool).
from dataarrays.jl.