juliastats / dataarrays.jl
DEPRECATED: Data structures that allow missing values
License: Other
#24 fixes this.
As mentioned in JuliaData/DataFrames.jl#523, we might want to expose an "unsafe" interface to the underlying values of a DataArray for those trying to do high-performance work:
- ...NA by making isna return a reference. We could also implement complex indexing for isna(da, inds...), but that seems like a lot of needless work.
- values(da), which will have undefined values for any NA entries.
This would put us in a position to write code like:
dm = @data([1 2; 3 4])
isna(dm)[1, 1]
values(dm)[1, 1]
We could make this code very fast because it would be perfectly type stable. As a (probably too) radical step, we could even change getindex to implement the semantics of values.
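A minimal sketch of what this split interface buys, in current Julia syntax. ToyDataArray, namask and rawvalues are hypothetical stand-ins (not the DataArrays API) for a DataArray, isna(da) and values(da): each accessor hands back the underlying storage itself, so indexing into it has a concrete, compile-time-known type.

```julia
# Hypothetical stand-in for a DataArray: raw values plus a parallel NA mask.
struct ToyDataArray{T,N}
    data::Array{T,N}   # underlying values; entries flagged NA may hold any value
    na::BitArray{N}    # true where the entry is NA
end

# "Unsafe" accessors: return the storage itself, without copying or checking.
namask(x::ToyDataArray) = x.na        # analogue of the proposed isna(da)
rawvalues(x::ToyDataArray) = x.data   # analogue of the proposed values(da)

dm = ToyDataArray([1 2; 3 4], falses(2, 2))
dm.na[1, 2] = true       # mark the (1, 2) entry as NA

namask(dm)[1, 1]         # always a Bool
rawvalues(dm)[1, 1]      # always an Int, never a Union with an NA type
```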
julia> da = DataArray([1,2,3])
3-element DataArray{Int64,1}:
1
2
3
julia> padNA(da, 1, 1)
5-element DataArray{Int64,1}:
261993005056
1
2
3
140720308486144
Bizarre, not sure how this happens.
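One plausible explanation (not confirmed here) is that the padded slots are allocated but never flagged as NA, so the display shows whatever bits happened to be in uninitialized memory. A sketch of padding that sets the mask explicitly, using a hypothetical padna helper on a plain values vector plus a BitVector mask rather than the real padNA:

```julia
# Hypothetical padding helper: add `front` and `back` NA slots around `values`.
function padna(values::Vector{T}, na::BitVector, front::Integer, back::Integer) where {T}
    n = length(values)
    newvalues = Vector{T}(undef, front + n + back)  # padded slots stay uninitialized...
    newna = trues(front + n + back)                 # ...but are flagged NA from the start
    newvalues[front+1:front+n] = values
    newna[front+1:front+n] = na
    return newvalues, newna
end

vals, mask = padna([1, 2, 3], falses(3), 1, 1)
# vals[1] and vals[5] hold arbitrary bits, but the mask marks them as NA
```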
We have several types of constructors for DataArrays that have no parallels in Base Julia. We should remove them.
This includes things like DataArray(3) and DataArray(3, 3).
For unary operators, binary operators with scalar arguments, and some others (e.g. transpose, which I'm working on now), we could make specialized versions that operate substantially faster on PooledDataArray than the current implementations for AbstractDataArray. My questions are:
Right now, we are far too liberal when recycling values. We should only allow the following behaviors:
We should definitely not follow the R lead of recycling vectors of short length until they match the length of the longer vector. Only arrays whose sizes exactly match should be allowed to interact.
As an example of what we should not allow going forward, consider the following:
julia> using DataArrays
julia> x = @data([1, 2, 3, 4, 5])
5-element DataArray{Int64,1}:
1
2
3
4
5
julia> y = @data([6, 7])
2-element DataArray{Int64,1}:
6
7
julia> x[:] = y
2-element DataArray{Int64,1}:
6
7
julia> x
5-element DataArray{Int64,1}:
6
7
3
4
5
This operation does not work on Julia's normal arrays. In general, we should always try to behave exactly like Julia's normal arrays, except with NA's added in.
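A sketch of the stricter rule, assuming a hypothetical strict_assign! helper that the package's setindex! methods could delegate to: sizes must match exactly, otherwise a DimensionMismatch is thrown, which is also the exception current Julia raises for x[:] = y on plain arrays of mismatched length.

```julia
# Hypothetical helper: copy `src` into `dest` only when the shapes agree exactly.
function strict_assign!(dest::AbstractArray, src::AbstractArray)
    if size(dest) != size(src)
        throw(DimensionMismatch("cannot assign array of size $(size(src)) " *
                                "to destination of size $(size(dest))"))
    end
    copyto!(dest, src)   # no recycling and no partial writes
    return dest
end

x = [1, 2, 3, 4, 5]
strict_assign!(x, [10, 20, 30, 40, 50])   # fine: sizes match
# strict_assign!(x, [6, 7])               # throws DimensionMismatch, x is untouched
```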
The @data and @pdata macros need to be improved to handle more types of expressions. In particular, it would be good if things like isequal(@data ones(2), DataArray(ones(2))) didn't parse beyond the comma.
ERROR: no method array(Array{Float64,1})
in model_response at /Users/johnmyleswhite/.julia/DataFrames/src/formula.jl:181
in glm at /Users/johnmyleswhite/.julia/GLM/src/glmfit.jl:117
in glm at /Users/johnmyleswhite/.julia/GLM/src/glmfit.jl:134
Dividing a DataArray by an integer throws InexactError() when it should convert to floating point:
>DataArray([1:10])./10
InexactError()
at In[8]:1
in ./ at C:\Users\admin\.julia\DataArrays\src\operators.jl:135
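The expected behaviour matches Base, where / on integers always yields a floating-point result; storing 0.1 back into Int storage is the kind of conversion that raises InexactError. A sketch of elementwise division for a values-plus-mask representation (the function name and layout are hypothetical) that derives the output element type from the operation instead of reusing the input's element type:

```julia
# Hypothetical elementwise division of a masked vector by a scalar.
function nadivide(values::Vector{T}, na::BitVector, s::Number) where {T<:Number}
    S = typeof(one(T) / one(s))              # Int / Int -> Float64, not Int
    out = Vector{S}(undef, length(values))
    for i in eachindex(values)
        # NA slots get a placeholder; the mask itself is carried over unchanged
        out[i] = na[i] ? zero(S) : values[i] / s
    end
    return out, copy(na)
end

out, outna = nadivide(collect(1:10), falses(10), 10)
```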
I think our current ad hoc testing practice lets too many things slip through the cracks. I propose that we switch to a simple, systematic testing rule: every file in src needs to have a mirror file in test that contains tests checking every function defined in the src file, in the order they appear in the src file. This kind of test file is pretty boring to write, but it is much more systematic and reliable. I've started doing this myself, so I'll upload tests as they get constructed.
One other rule: I'd propose that the tests for the src file x.jl go in a test file called x.jl that contains a module called TestX. Keeping each file in its own module isolates the test files from one another, so they can be run in parallel. Ideally the module would contain tests written as functions, which can easily be analyzed by a code checker to confirm systematic test coverage of the src file.
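A sketch of the proposed layout in current Julia syntax (the Test standard library postdates this discussion). The file name, module name and the negate stand-in are illustrative only: each src function gets one test function, called in source order, inside a module that keeps its names out of the other test files.

```julia
# Sketch of test/operators.jl under the proposed rule (names are illustrative).
module TestOperators
    using Test

    # Stand-in for a function defined in src/operators.jl; a real test file
    # would load the package instead of defining the function here.
    negate(x) = -x

    # One test function per src function, in the same order as in src.
    function test_negate()
        @test negate(1) == -1
        @test negate([1, 2]) == [-1, -2]
    end

    test_negate()
end
```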
On the latest Julia 0.3.0-dev:
Version 0.3.0-prerelease+734 (2013-12-29 21:13 UTC)
Commit 974b794* (0 days old master)
x86_64-apple-darwin12.5.0
Loading DataArrays fails on extras.jl attempting to define Stats.table:
julia> using DataArrays
ERROR: table not defined
in reload_path at loading.jl:146
in _require at loading.jl:59
in require at loading.jl:43
while loading ~/.julia/DataArrays/src/extras.jl, in expression starting on line 1
while loading ~/.julia/DataArrays/src/DataArrays.jl, in expression starting on line 85
Commenting out this:
function Stats.table{T}(d::AbstractDataArray{T})
counts = Dict{Union(T, NAtype), Int}()
for i = 1:length(d)
if haskey(counts, d[i])
counts[d[i]] += 1
else
counts[d[i]] = 1
end
end
return counts
end
Works as a workaround.
I was expecting it to be false (same for @data(1:3) == @pdata(1:3)), but I guess either way, you lose the ability to evaluate something easily.
Here's a list of what I would consider the most important changes to make to this package:
- Each* types
- isna: isna(da, inds)
- unique and levels definitions right
- @data and @pdata macros
- autocor, percent_change and reldiff to Stats.jl
- databool, datafloat and dataint
- get_indices, index_to_level and level_to_index
- getpoolidx
- pdatabool, pdatafloat and pdataint
- reorder / move it to DataFrames.jl
- rep functionality into Base.repeat
- PooledDataArray
- xtab and xtabs to Stats
- baseval hack
- skipna or dropna
- row* and col* functions with slice-indexing interface
- gl
While discussing the recent changes to DA constructors, @simonster made a suggestion that I really like that would have us remove a constructor that isn't strictly necessary.
I'd like to propose the following design principle:
@data or @pdata
Does this sound right to you, @simonster?
The number of new methods required is really kind of insane and not sustainable if people keep adding things to Base.
... rather than AbstractArray?
In #67, @johnmyleswhite proposed replacing PooledDataArrays with OrdinalVariable/NominalVariable enums. Here are four possible approaches in order of my current preferences regarding them:
- getindex wraps the extracted value as an OrdinalVariable or NominalVariable.
While trying to clean up the PDA code, I realized that our current approach, which allows PDAs to change the type of their references field, might be introducing type instability into our code. I don't know whether there are specific cases where this is a problem yet, but it seems worth starting to debate.
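A small illustration of the type-instability concern, using a hypothetical compact_refs function rather than actual PDA code: if the integer type of the references is chosen from the data (for example, the narrowest type that can hold every code), the return type depends on runtime values and the compiler can only infer a Union.

```julia
# Hypothetical: shrink the reference codes to UInt8 when every code fits.
function compact_refs(refs::Vector{UInt32})
    if maximum(refs; init = zero(UInt32)) <= typemax(UInt8)
        return UInt8.(refs)     # Vector{UInt8}
    else
        return refs             # Vector{UInt32}
    end
end

compact_refs(UInt32[1, 2, 3])   # a Vector{UInt8}
compact_refs(UInt32[1, 300])    # a Vector{UInt32}
# Inference can only conclude Union{Vector{UInt8}, Vector{UInt32}}:
Base.return_types(compact_refs, (Vector{UInt32},))
```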
On 0.2:
julia> using BinDeps
julia> using DataArrays
Warning: New definition
|(NAtype,Any) at C:\Users\mlubin\.julia\DataArrays\src\operators.jl:502
is ambiguous with:
|(Any,SynchronousStepCollection) at C:\Users\mlubin\.julia\BinDeps\src\BinDeps.jl:286.
To fix, define
|(NAtype,SynchronousStepCollection)
before the new definition.
Warning: New definition
|(Any,NAtype) at C:\Users\mlubin\.julia\DataArrays\src\operators.jl:502
is ambiguous with:
|(SynchronousStepCollection,Any) at C:\Users\mlubin\.julia\BinDeps\src\BinDeps.jl:283.
To fix, define
|(SynchronousStepCollection,NAtype)
before the new definition.
I just added a .travis.yml file (copied from DataFrames). @johnmyleswhite, can you flip the switch to enable the service hook?
Right now == on two DataArrays gives an error, but I'd like to implement it. Since NA == NA returns NA, I think the right thing to do is to make == return NA if there are any NAs in either DataArray, and otherwise give the same behavior as == for standard Arrays. Is this reasonable?
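A sketch of the proposed rule, with Julia 1.x's missing standing in for NA and a hypothetical MaskedVector/naequal pair in place of DataArray and ==. For comparison, current Julia's own == on arrays containing missing returns false as soon as some pair is definitely unequal and missing only otherwise; the sketch follows the proposal literally instead.

```julia
# Hypothetical masked vector: values plus a BitVector marking NA entries.
struct MaskedVector{T}
    values::Vector{T}
    na::BitVector
end

# Any NA on either side makes the answer NA (here: missing);
# otherwise defer to ordinary Array equality.
function naequal(a::MaskedVector, b::MaskedVector)
    (any(a.na) || any(b.na)) && return missing
    return a.values == b.values
end

x = MaskedVector([1, 2, 3], falses(3))
y = MaskedVector([1, 2, 3], BitVector([false, true, false]))
naequal(x, x)   # true
naequal(x, y)   # missing
```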
Until we can do something to provide the compiler more information about DataArrays, I'd like to propose that we deprecate isna(x::NAtype) and isna(x::Any). This will encourage people to write loops like
function sum{T}(x::DataArray{T})
    s = 0.0
    for i in 1:length(x)
        # isna(x, i) only inspects the NA mask, so it is type stable
        if !isna(x, i)
            s += x[i]::T
        end
    end
    return s
end
and to discourage loops like
function sum{T}(x::DataArray{T})
    s = 0.0
    for i in 1:length(x)
        # isna(x[i]) must first extract x[i], which may be either a T or NA
        if !isna(x[i])
            s += x[i]::T
        end
    end
    return s
end
Ideally we'd like to get rid of that T annotation, but I'd like to provide and encourage idioms that will perform better than the type-unstable code we currently encourage. The more we push people towards type-stable code, the fewer performance questions we'll have to field.
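The same idiom as a self-contained sketch in current Julia syntax; MaskedVector, isna(x, i) and nasum are hypothetical stand-ins for the DataArray internals. Because the accumulator starts as zero(T) rather than 0.0 and the loop reads only the values vector, the element type is known throughout, so the ::T annotation from the original loop is no longer needed.

```julia
struct MaskedVector{T}
    values::Vector{T}
    na::BitVector
end

# Index-based NA check: looks only at the mask, so it always returns a Bool.
isna(x::MaskedVector, i::Integer) = x.na[i]

function nasum(x::MaskedVector{T}) where {T<:Number}
    s = zero(T)                       # accumulator has the element type from the start
    for i in eachindex(x.values)
        if !isna(x, i)
            s += x.values[i]          # always a T: no Union with NA, no assertion needed
        end
    end
    return s
end

v = MaskedVector([1, 2, 3], BitVector([false, true, false]))
nasum(v)
```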
At some point, the PooledDataArray constructors got a little wonky:
julia> PooledDataArray([true], [false])
1-element PooledDataArray{Bool,Uint32,1}:
NA
julia> PooledDataArray([true], [true])
1-element PooledDataArray{Bool,Uint32,1}:
true
This is almost the exact opposite semantics that we should have.
I've been hoping to get rid of the DataVector[1, NA] hack for a long time now. It's really convenient, but it doesn't extend to matrices in any clear way.
I think the solution is to create @data and @pdata macros, which will take in literals that could contain NA values and generate DataArrays and PooledDataArrays. You'd end up with:
@data [1, NA, 3] #=> DataArray([1, 0, 3], [false, true, false])
@pdata ["a", "a", "a"] #=> PooledDataArray(["a", "a", "a"], [false, false, false])
@data [1 NA; 3 4] #=> DataArray([1 0; 3 4], [false true; false false])
@pdata ["a" "a"; "a" "a"] #=> PooledDataArray(["a" "a"; "a" "a"], [false false; false false])
Currently, DataArray(Int, 3, 1)[:, 1] = @data [1, 2, NA] fails, so I started cleaning up the setindex! functions. Unfortunately, while this is more concise than the old indexing functions, handles a larger variety of cases, and passes tests, the generated code looks pretty bad (not that it looked particularly good before). I'm also missing setindex!(DataArray, AbstractDataArray, inds...) (the general case of this bug), which should probably use Cartesian, as should setindex!(PooledDataArray, AbstractArray, inds...), where I'm currently doing allocation.
@johnmyleswhite Is it really necessary to document each method of getindex and setindex!? Could we just specify that they will behave the same way for DataArrays as they do for Arrays? Right now I've got about twice as much documentation here as I have methods.
We do not presently have logical (boolean) indexing with anything besides vectors:
julia> using DataArrays
julia> a = @data [1 2; 3 4]
2x2 DataArray{Int64,2}:
1 2
3 4
julia> a[a .== 1]
ERROR: BoundsError()
in getindex at bitarray.jl:363
julia> a[a .== 1] = 1
ERROR: no method setindex!(DataArray{Int64,2}, Int64, DataArray{Bool,2})
When implemented, I can remove the ugly vec in the tests for #68.
Per JuliaLang/julia#4552, users will now be recommended to use DataArrays to handle missing data. For completeness, it would be nice to support the full range of statistical functions defined in Base:
Base.stdm
Base.varm
Base.median!
Base.hist
Base.hist2d
Base.histrange
Base.midpoints
Base.quantile
Base.quantile!
Base.cov
Base.cor
While trying to subset a df where both foo and cat are PooledDataArrays, I get:
no method &(PooledDataArray{Bool,Uint32,1},PooledDataArray{Bool,Uint32,1})
But,
select(:(DataArray(foo .== "bar") & DataArray(cat .== "dog")), pda)
works as expected.
Currently, the tests pass on my machine; the package just produces ambiguity warnings when loaded:
julia> using DataArrays
Warning: New definition
-(DataArray{T,N},AbstractArray{T,N}) at /Users/dhlin/.julia/DataArrays/src/operators.jl:324
is ambiguous with:
-(AbstractArray{T,2},Diagonal{T}) at linalg/diagonal.jl:27.
To fix, define
-(DataArray{T,2},Diagonal{T})
before the new definition.
Warning: New definition
-(AbstractArray{T,N},DataArray{T,N}) at /Users/dhlin/.julia/DataArrays/src/operators.jl:324
is ambiguous with:
-(Diagonal{T},AbstractArray{T,2}) at linalg/diagonal.jl:26.
To fix, define
-(Diagonal{T},DataArray{T,2})
before the new definition.
Warning: New definition
-(AbstractDataArray{T,N},AbstractArray{T,N}) at /Users/dhlin/.julia/DataArrays/src/operators.jl:345
is ambiguous with:
-(AbstractArray{T,2},Diagonal{T}) at linalg/diagonal.jl:27.
To fix, define
-(AbstractDataArray{T,2},Diagonal{T})
before the new definition.
Warning: New definition
-(AbstractArray{T,N},AbstractDataArray{T,N}) at /Users/dhlin/.julia/DataArrays/src/operators.jl:345
is ambiguous with:
-(Diagonal{T},AbstractArray{T,2}) at linalg/diagonal.jl:26.
To fix, define
-(Diagonal{T},AbstractDataArray{T,2})
before the new definition.
The problem is that Base defines:
-(AbstractMatrix, Diagonal)
-(Diagonal, AbstractMatrix)
and this package defines:
-(AbstractArray, DataArray)
-(DataArray, AbstractArray)
So when you subtract a data matrix from a diagonal matrix (or vice versa), Julia can't tell which method to use, because neither signature is more specific than the other.
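A toy reproduction of the ambiguity and of the fix the warnings suggest, with hypothetical types DiagLike and DataLike and a hypothetical minus function in place of Diagonal, DataArray and -. On Julia 0.2/0.3 this showed up as a warning at definition time; on current Julia the ambiguous call itself throws a MethodError until the tie-breaking method exists.

```julia
# Stand-ins for Base's Diagonal and the package's DataArray (hypothetical types).
struct DiagLike end
struct DataLike end

# Hypothetical generic function standing in for `-`.
minus(::Any, ::DiagLike) = "Base-style method"       # like -(AbstractMatrix, Diagonal)
minus(::DataLike, ::Any) = "package-style method"    # like -(DataArray, AbstractArray)

# At this point minus(DataLike(), DiagLike()) would be ambiguous:
# both methods match and neither is more specific.

# The fix the warnings suggest: a method for the exact intersection.
minus(::DataLike, ::DiagLike) = "disambiguating method"

minus(DataLike(), DiagLike())
```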
My thinking on PooledDataArrays is slowly changing, especially given the absence of anything like an official factor type. Here are some proposals for how we might use them going forward.
- ...compact. Instead, we should revert our earlier stance and ensure, on every operation, that PDAs are represented in maximally compact form.
- ...ordered::Bool and order::Vector{Uint64}, which define an ordering over the pool by mapping each item of the pool to an integer that matches the rank of the associated level in the ordering over categories.
- ...reorder so that we only have a setorder! function, which takes in a new ordering of the pool as a vector. This would let you take something like pda = @pdata(["a", "b", "c"]) and reorder it as setorder!(pda, ["b", "c", "a"]).
- get_indices and similar functions should not exist.
The values function in DataArrays doesn't make any sense: it turns PDAs into DAs, but returns copies of DAs and Arrays. If people want to convert a PDA into a DA, they should use convert, not values.
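A sketch of the ordered-pool proposal above: the pool itself is never permuted, so existing reference codes stay valid, and setorder! only rewrites the rank vector. OrderedPool and orderedlevels are hypothetical names; setorder! follows the proposed call shape but is applied to a bare pool rather than a full PDA, and uses Vector{Int} instead of Vector{Uint64} for simplicity.

```julia
# Hypothetical pool with an explicit ordering: order[i] is the rank of pool[i].
struct OrderedPool
    pool::Vector{String}   # levels in storage order (refs point here; never reshuffled)
    order::Vector{Int}     # rank of each pool entry in the category ordering
end

function setorder!(p::OrderedPool, neworder::Vector{String})
    sort(neworder) == sort(p.pool) ||
        throw(ArgumentError("new ordering must be a permutation of the pool"))
    rank = Dict(level => r for (r, level) in enumerate(neworder))
    p.order .= map(level -> rank[level], p.pool)   # only the ranks change
    return p
end

# Levels listed in their current category order.
orderedlevels(p::OrderedPool) = p.pool[sortperm(p.order)]

p = OrderedPool(["a", "b", "c"], [1, 2, 3])
setorder!(p, ["b", "c", "a"])
orderedlevels(p)
```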
julia> using DataArrays
Warning: New definition
getindex(DataArray{T,N},Union(Array{T,1},Ranges{T})) at /Users/johnmyleswhite/.julia/DataArrays/src/dataarray.jl:350
is ambiguous with
getindex(DataArray{T<:Number,N},Union(Array{T,1},Ranges{T},BitArray{1})) at /Users/johnmyleswhite/.julia/DataArrays/src/dataarray.jl:334.
Make sure
getindex(DataArray{T<:Number,N},Union(Array{T,1},Ranges{T}))
is defined first.
We should definitely implement these. See also JuliaData/DataFrames.jl#354 and JuliaData/DataFrames.jl#325. I'm beginning to think the most general/performant approach is to replicate some of @lindahua's code from NumericExtensions but add in an if statement to avoid touching values that are NA.
After merging #52, we should provide a tool that constructs a mapping from the levels of a PooledDataArray to the integers. This function should make clear that the mapping is ad hoc and not related to the underlying representation of the data.
Should we call it levelsmap?
Warning: New definition
round(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
round(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
round(DataArray{_<:Real,1},)
before the new definition.
Warning: New definition
round(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
round(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
round(DataArray{_<:Real,2},)
before the new definition.
Warning: New definition
round(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
round(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
round(AbstractDataArray{_<:Real,1},)
before the new definition.
Warning: New definition
round(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
round(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
round(AbstractDataArray{_<:Real,2},)
before the new definition.
Warning: New definition
ceil(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
ceil(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
ceil(DataArray{_<:Real,1},)
before the new definition.
Warning: New definition
ceil(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
ceil(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
ceil(DataArray{_<:Real,2},)
before the new definition.
Warning: New definition
ceil(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
ceil(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
ceil(AbstractDataArray{_<:Real,1},)
before the new definition.
Warning: New definition
ceil(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
ceil(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
ceil(AbstractDataArray{_<:Real,2},)
before the new definition.
Warning: New definition
floor(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
floor(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
floor(DataArray{_<:Real,1},)
before the new definition.
Warning: New definition
floor(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
floor(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
floor(DataArray{_<:Real,2},)
before the new definition.
Warning: New definition
floor(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
floor(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
floor(AbstractDataArray{_<:Real,1},)
before the new definition.
Warning: New definition
floor(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
floor(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
floor(AbstractDataArray{_<:Real,2},)
before the new definition.
Warning: New definition
trunc(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
trunc(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
trunc(DataArray{_<:Real,1},)
before the new definition.
Warning: New definition
trunc(DataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:350
is ambiguous with:
trunc(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
trunc(DataArray{_<:Real,2},)
before the new definition.
Warning: New definition
trunc(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
trunc(AbstractArray{T<:Real,1},) at operators.jl:236.
To fix, define
trunc(AbstractDataArray{_<:Real,1},)
before the new definition.
Warning: New definition
trunc(AbstractDataArray{T<:Number,N},Integer...) at /Users/viral/.julia/DataFrames/src/operators.jl:359
is ambiguous with:
trunc(AbstractArray{T<:Real,2},) at operators.jl:237.
To fix, define
trunc(AbstractDataArray{_<:Real,2},)
before the new definition.
Warning: New definition
formatter(Array{T,N},Date{C<:Calendar}...)
is ambiguous with:
formatter(Array{T,N},FloatingPoint...).
To fix, define
formatter(Array{T,N},)
before the new definition.
It'd be nice to have a subtype of AbstractDataArray that wraps a regular floating-point array and treats NaN values as NA. I think this should generally be faster than indexing the BitArray that holds the list of NA values, and with #4 and some minor API changes this should eventually give us nansum etc. with performance equivalent to manual loops.
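A sketch of the idea with a hypothetical NaNVector wrapper (not an actual AbstractDataArray subtype): the NA test is a comparison on the value itself, so no separate BitArray is read. One design consequence worth weighing is that a NaN produced by arithmetic becomes indistinguishable from a genuine NA.

```julia
# Hypothetical wrapper: a float vector in which NaN plays the role of NA.
struct NaNVector{T<:AbstractFloat}
    values::Vector{T}
end

# No separate mask: an entry is NA exactly when it is NaN.
isna(x::NaNVector, i::Integer) = isnan(x.values[i])

function nansum(x::NaNVector{T}) where {T}
    s = zero(T)
    for i in eachindex(x.values)
        isna(x, i) || (s += x.values[i])   # skip NA (NaN) entries
    end
    return s
end

nansum(NaNVector([1.0, NaN, 2.5]))
```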
julia> using DataArrays
Warning: New definition
copy!(AbstractDataArray{T,N},Any) at /Users/jiahao/.julia/v0.3/DataArrays/src/abstractdataarray.jl:48
is ambiguous with:
copy!(AbstractArray{T,2},AbstractArray{T,2}) at multidimensional.jl:142.
To fix, define
copy!(AbstractDataArray{T,2},AbstractArray{T,2})
before the new definition.
...(repeats)
The bug I fixed last night suggested that similar was not setting the NA values in a DataArray correctly. This led me to wonder whether similar should do any NA setting or only allocation. My inclination is that it should only do allocation, not value setting. But there is obviously an argument for automatically setting everything to NA until a value is known.
import Base.first, Base.last
first(d::DataArray) = d[1]
# Use the linear index of the last element; size(d)[1] is only the row count
last(d::DataArray) = d[length(d)]
Right now we have a bunch of constructors that do conversion:
julia> DataArray(1:10)
10-element DataArray{Int64,1}:
1
2
3
4
5
6
7
8
9
10
julia> Array(1:1)
ERROR: no method Array{T,N}(Range1{Int64})
Standard Arrays don't do this and neither should we. At some point I'm just going to open an issue that lists all constructors for Arrays so that we can match them exactly.
Conversely, we're missing the relevant conversion methods:
julia> convert(DataVector, 1:10)
ERROR: no method convert(Type{DataArray{T,1}}, Range1{Int64})
in convert at base.jl:11
julia> convert(Vector, 1:10)
10-element Array{Int64,1}:
1
2
3
4
5
6
7
8
9
10
The semantics of unique and levels are a mess at the moment, which causes the error Andreas sees in DataFrames:
julia> using DataArrays
julia> da = @data([1, 2, NA])
3-element DataArray{Int64,1}:
1
2
NA
julia> pda = @pdata([1, 2, NA])
3-element PooledDataArray{Int64,Uint32,1}:
1
2
NA
julia> unique(da)
3-element DataArray{Int64,1}:
NA
2
1
julia> levels(da)
3-element DataArray{Int64,1}:
NA
2
1
julia> unique(pda)
3-element DataArray{Int64,1}:
1
2
NA
julia> levels(pda)
2-element Array{Int64,1}:
1
2
My preference is that unique should always return a DataArray of the same type containing all the unique values (including NA), whereas levels should always return an Array of the same type containing only the non-NA unique values. We're doing this for PDAs, but not for DAs.
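The preferred semantics, sketched with Julia 1.x's missing standing in for NA; nalevels is a hypothetical name, and DataArrays' own levels is not reproduced here. Note that the sketch returns values in first-occurrence order.

```julia
x = [1, 2, missing]   # Vector{Union{Missing, Int}}; `missing` plays the role of NA

# unique keeps NA and the container's element type.
u = unique(x)

# Hypothetical levels-style helper: only the non-NA unique values, as a plain Array.
nalevels(v) = unique(skipmissing(v))
l = nalevels(x)       # a Vector{Int}
```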
The max function changes the element type when applied to DataArrays (the result is an Array{Any,1}). As a side effect, taking the maximum of 3 or more DataArrays results in an error:
julia> u = DataArray([1,2,3]); v = DataArray([0,4,0]); w =[4,0,0];
julia> u = DataArray([1,2,3]); v = DataArray([0,4,0]); w = DataArray([4,0,0]);
julia> typeof(max(u,v))
Array{Any,1}
julia> max(u,v,w)
ERROR: no method isless(DataArray{Int64,1}, Array{Any,1})
in max at operators.jl:71
I have this issue with the Julia 0.3.0 preview (Mac binary package). In Julia 0.2.0, max returns an Array{Int64,1}. While perhaps not ideal, that at least allowed max(u,v,w) to execute.
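For reference, the type-preserving result with plain arrays in current Julia, where the elementwise maximum is written by broadcasting the scalar max (this syntax postdates the report and is shown only to illustrate the expected element type):

```julia
u = [1, 2, 3]; v = [0, 4, 0]; w = [4, 0, 0]

# Elementwise maximum across the three vectors; the element type stays Int.
m = max.(u, v, w)
eltype(m)
```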
Now that we have @data and @pdata macros, I'd like to remove the data* constructors. Instead of having a special datazeros() function, it's much more general to use @data zeros().
Redoing the Julia benchmarks results in an error due to a use of DataFrame here.
This method is very common, and when it comes to naming it, it's easy to start calling it something different depending on what package you happen to be using. R has this issue and calls this method ROC, Delt, delt, returns and probably a handful of other names. TimeSeries uses percentchange now (I just deprecated simple_return and log_return). That naming is consistent with the Julian convention of avoiding underscores, the bane of Ruby variable names.
percentchange.
TimeSeries also offers a kwarg for either simple percent change (which is what the currently implemented percent_change offers) or log percent change.
I can do a PR on this. It will also require 3 changes to files in DataFrames, where the method is referenced as percent_change.
Here is what the code looks like:
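function percentchange(dv::DataArray; method="simple")
    if method == "simple"
        # simple return x[t] / x[t-1] - 1, computed as expm1 of the log return
        return expm1(log_return(dv))
    elseif method == "log"
        return log_return(dv)
    else
        throw(ArgumentError("only simple and log methods supported"))
    end
end
function log_return(dv::DataArray)
    # differences of logs, padded with NA so the output lines up with the input
    pad(diff(log(dv)), 1, 0, NA)
end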
The code is short enough that you could refactor it and put the definition of log_return inside the original method.
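With log_return folded in, the whole method fits in one function. A sketch on a plain Vector in current Julia, using missing in place of NA and a Symbol keyword instead of the string method = "simple"; both choices are assumptions, not the TimeSeries or DataArrays signature.

```julia
# Sketch: first slot padded with `missing`, mirroring pad(..., 1, 0, NA) above.
function percentchange(x::AbstractVector{<:Real}; method::Symbol = :simple)
    logret = diff(log.(x))               # log(x[t]) - log(x[t-1])
    if method === :simple
        r = expm1.(logret)               # x[t] / x[t-1] - 1, computed stably
    elseif method === :log
        r = logret
    else
        throw(ArgumentError("only :simple and :log methods are supported"))
    end
    return vcat(missing, r)
end

prices = [100.0, 110.0, 99.0]
pc = percentchange(prices)                 # missing, then about 0.1 and -0.1
pl = percentchange(prices; method = :log)  # missing, then log(1.1) and log(0.9)
```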
Right now we have four functions that could plausibly be merged into a single function that takes keyword args:
failNA
replaceNA
vector
matrix
There's really no need to have vector and matrix be separate functions: they could both be replaced by array. And we could incorporate the ability to (a) fail on encountering NA and (b) replace NA with a chosen value into this new array function.
If we used something like
array(da::DataArray{T}; outtype = T, fail = true, replace = nothing)
we could do all of this work at once. The only tricky part is getting removeNA into this function, since its output has unpredictable size.
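A sketch of the combined function on a values-plus-mask representation. The name toarray is hypothetical, and the way the keywords interact (replace takes priority, then fail, otherwise the stored value is kept) is an assumption; dropping NAs, as removeNA does, is left out because the output length would no longer match the input.

```julia
# Hypothetical unified conversion from values + NA mask to a plain Vector.
#   replace given   -> NA slots get `replace`
#   otherwise, fail -> throw on the first NA
#   otherwise       -> NA slots keep whatever value is stored underneath
function toarray(values::Vector{T}, na::BitVector; outtype::Type = eltype(values),
                 fail::Bool = true, replace = nothing) where {T}
    out = Vector{outtype}(undef, length(values))
    for i in eachindex(values)
        if !na[i]
            out[i] = values[i]
        elseif replace !== nothing
            out[i] = replace
        elseif fail
            throw(ArgumentError("NA encountered at index $i"))
        else
            out[i] = values[i]
        end
    end
    return out
end

vals = [1, 2, 3]; mask = BitVector([false, true, false])
toarray(vals, mask; replace = 0)
toarray(vals, mask; outtype = Float64, replace = NaN)
```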
Xref #73
$ ./julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+1692
Commit 736251d* (2014-02-23 06:21 UTC)
Platform Info:
System: Linux (i686-redhat-linux)
CPU: Genuine Intel(R) CPU T2250 @ 1.73GHz
WORD_SIZE: 32
BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
LAPACK: libopenblas
LIBM: libopenlibm
$ ./julia ~/.julia/DataArrays/test/runtests.jl
Running tests:
* abstractarray.jl
* booleans.jl
* constructors.jl
* containers.jl
* conversions.jl
* data.jl
* dataarray.jl
* datamatrix.jl
* linalg.jl
* operators.jl
ERROR: no method -(PooledDataArray{Float64,Uint32,1})
in anonymous at /home/rick/.julia/DataArrays/test/operators.jl:59
in anonymous at no file:32
in include_from_node1 at loading.jl:120
while loading /home/rick/.julia/DataArrays/test/operators.jl, in expression starting on line 56
while loading /home/rick/.julia/DataArrays/test/runtests.jl, in expression starting on line 30
Right now, the @data macro handles most things correctly except for variables that are equal to NA.
Consider this example:
a, b, c = 1, 2, NA
@data [a, b, c]
This will fail because the type of c isn't known from the surface analysis that the @data macro does. To fix this, we'd need to write code that analyzes the values of the inputs, which can't be known at compile time. So the macro needs to emit code that does the analysis at run time.
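A sketch of that split in current Julia: the macro only rewrites the literal into a call, and a runtime function inspects the values. NAtype and NA are redefined locally as stand-ins, and split_na is a hypothetical helper returning a values vector plus a BitVector mask rather than a real DataArray.

```julia
# Local stand-ins for DataArrays' NA (hypothetical, for illustration only).
struct NAtype end
const NA = NAtype()

# Runtime half: inspect the *values* (not the syntax) and build values + mask.
function split_na(xs::AbstractVector)
    mask = BitVector([x isa NAtype for x in xs])
    nonna = [x for x in xs if !(x isa NAtype)]
    T = isempty(nonna) ? Any : mapreduce(typeof, promote_type, nonna)
    vals = Vector{T}(undef, length(xs))
    for (i, x) in enumerate(xs)
        mask[i] || (vals[i] = x)   # NA slots are left uninitialized
    end
    return vals, mask
end

# Compile-time half: no NA analysis here, so variables like `c` are
# evaluated first and examined by split_na afterwards.
macro data(ex)
    return :(split_na($(esc(ex))))
end

a, b, c = 1, 2, NA
vals, mask = @data [a, b, c]
```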
As discussed in #38, any indexing operation that uses an NA index should fail. Right now, we simply drop NA indices, but it's safer if leaving NAs in your indices always fails.
I haven't looked closely, but this might just need to be changed to |>.
(Of course, the same change might be needed in BinDeps...)
julia> using Winston
Warning: New definition
|(SynchronousStepCollection,Any) at /home/kmsquire/.julia/v0.2/BinDeps/src/BinDeps.jl:283
is ambiguous with:
|(Any,NAtype) at /home/kmsquire/.julia/v0.2/DataArrays/src/operators.jl:502.
To fix, define
|(SynchronousStepCollection,NAtype)
before the new definition.
Warning: New definition
|(Any,SynchronousStepCollection) at /home/kmsquire/.julia/v0.2/BinDeps/src/BinDeps.jl:286
is ambiguous with:
|(NAtype,Any) at /home/kmsquire/.julia/v0.2/DataArrays/src/operators.jl:502.
To fix, define
|(NAtype,SynchronousStepCollection)
before the new definition.