Comments (9)
> Agreed! However, maybe we can introduce scalars without moving far away from this idea.

Yeah, this has sort of been what I've been leaning toward, so it's good to hear you reinforce that direction. As I think about it more, I think we'll have `Scalar`, `Vector`, and in the future (maybe `Matrix`? and) `NDArray` for higher-order arrays. For things that are used frequently, like scalars and vectors, I think it's good to specialize the representation and only defer to a more generic representation for higher dimensions.

> This should have some side-benefits regarding performance as we don't have to box all length 1 vectors.

Certainly - especially if those basic types all implement `Copy`, then we save a lot on performance and avoid the indirection overhead of dealing with boxes/shared references.

Cool, well that's about all the confirmation I needed. I think `Scalar` is a necessary feature here.
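A minimal sketch of what this split could look like in Rust. All names here (`Scalar`, `Obj`, the variants) are illustrative, not the project's actual types; the point is that the scalar representation is unboxed and `Copy`, while the generic N-dimensional case remains a fallback:

```rust
// Hypothetical representation: specialize the hot, low-dimensional cases
// and keep a generic fallback for higher-dimensional data.

#[derive(Copy, Clone, Debug, PartialEq)]
enum Scalar {
    Double(f64),
    Logical(bool),
    Na, // explicit missing value
}

#[derive(Clone, Debug)]
enum Obj {
    Scalar(Scalar),      // unboxed, cheap to pass by value
    Vector(Vec<Scalar>), // the common 1-D case
    NdArray { dims: Vec<usize>, data: Vec<Scalar> }, // generic fallback
}

fn main() {
    // A length-1 value no longer needs a heap-allocated vector.
    let x = Obj::Scalar(Scalar::Double(1.0));
    let v = Obj::Vector(vec![Scalar::Double(1.0), Scalar::Na]);
    println!("{:?} {:?}", x, v);
}
```

Because `Scalar` is `Copy`, passing it around never touches the heap, which is where the performance benefit over boxed length-1 vectors would come from.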
from r.
Was reading *Designing Types for R, Empirically* (Vitek et al., 2020) today and just wanted to highlight some of their conclusions (Section 6.1.1). It's worth a read through all their commentary, but there are at least two that might inform a direction for this project:

Supporting the inclusion of an explicit missing/`NA`, similar to Julia's `Vector{Union{_, Missing}}`:

> [on] NAs. Our data supports making the presence of `NA`s explicit. Only 2923 (or 2.78%) of arguments are marked as possibly having `NA`s, thus the overwhelming majority of types appear to be `NA`-free. In practice, programmers check for them and sanitize them if they are present.

Supporting the introduction of a `Scalar`:

> [on] Scalars. The data also suggests that programmers often use scalars, and do dimensionality checks on their data. In our data 25,064 (or 33.33%) of the arguments are scalar types. While not completely surprising, this is a rather large number. Consider the `hankel.matrix` function, it takes two arguments and checks that `n` is `int`, that `x` is a `vector`, and also, indirectly, that `n` is a scalar (this comes from the fact that it is used in the guard of a conditional which fails if `n` is not a scalar).
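The explicit-`NA` idea can be sketched in Rust with `Option<T>` standing in for Julia's `Union{T, Missing}` (purely illustrative; the project's actual `NA` representation may differ):

```rust
// Sketch: an NA-aware double vector, with None playing the role of NA.
fn main() {
    let xs: Vec<Option<f64>> = vec![Some(1.0), None, Some(3.0)];

    // "Programmers check for them and sanitize them if they are present":
    let has_na = xs.iter().any(|x| x.is_none());
    let sanitized: Vec<f64> = xs.iter().filter_map(|x| *x).collect();

    assert!(has_na);
    assert_eq!(sanitized, vec![1.0, 3.0]);
    println!("has_na = {has_na}, sanitized = {sanitized:?}");
}
```

This makes `NA`-freedom visible in the type (`Vec<f64>` vs `Vec<Option<f64>>`), matching the paper's observation that the overwhelming majority of arguments are `NA`-free.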
I'm very aware - I haven't really figured out a good path forward. I'm at an impasse with this syntax. This is described in this issue, but it warrants its own dedicated issue. I'm glad you homed in on it as well; I'd love to have other ideas bouncing around this.

I really want this syntax to make more sense, because I found the `[` vs `[[` syntax really convoluted in R when I was learning, and I still see people struggling with it frequently.

I'd really like to retire `[[` as a separate postfix operator and instead use dispatch. To me this just feels much more "R" (even if it's not how R works). Making that work raises a few questions:

- How do you differentiate a scalar index from a vector index such that `x[<scalar>]` produces an element of `x` while `x[<vector>]` produces a slice of `x`? Does this mean we need a scalar type?
- Even if we did have a scalar type, and `y = 1; x[y]` (`x[<scalar>]`) could be differentiated from `y = [1]; x[y]` (`x[<vector>]`), then this is still the opposite behavior of R. Personally I'm okay with that if the syntax can be consolidated.
- I think R's roots as an array-programming language are pretty magical, and I wouldn't want to abandon that philosophy just to add scalars, so the semantics of using scalars in an array language need to be hashed out.

I don't have good answers to these, so for now it remains a wide-open experiment.
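The dispatch-based approach could look something like the following Rust sketch. The types (`Obj`, `Index`, `subset`) are hypothetical stand-ins, not the project's actual API; the behavior of indexing changes based on whether the index value is a scalar or a vector:

```rust
// Sketch of dispatch-based indexing: x[<scalar>] extracts an element,
// x[<vector>] produces a slice. All names here are hypothetical.

#[derive(Debug, PartialEq)]
enum Obj {
    Scalar(f64),
    Vector(Vec<f64>),
}

enum Index {
    Scalar(usize),
    Vector(Vec<usize>),
}

// (0-based indexing here purely for brevity.)
fn subset(x: &[f64], i: &Index) -> Obj {
    match i {
        Index::Scalar(n) => Obj::Scalar(x[*n]),
        Index::Vector(ns) => Obj::Vector(ns.iter().map(|n| x[*n]).collect()),
    }
}

fn main() {
    let x = vec![10.0, 20.0, 30.0];
    assert_eq!(subset(&x, &Index::Scalar(0)), Obj::Scalar(10.0));
    assert_eq!(subset(&x, &Index::Vector(vec![0, 2])), Obj::Vector(vec![10.0, 30.0]));
}
```

The hard part, as noted above, is not the dispatch itself but deciding at the language level when a user-written index is a scalar versus a length-1 vector.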
The only way I see to both avoid introducing scalars and not have a dedicated `[[` operator (or an additional parameter to the `[` operator) would be to use non-standard evaluation.

For example:

```r
> l <- list(1, 2, 3)
> l[1]
[1] 1
> l[1:1]
[[1]]
[1] 1
```

However, this then gets a little tricky when the argument passed to `` `[`(x, i) `` is a variable:

```r
> l <- list(1, 2, 3)
> ii <- 1:2
> l[ii]
# Error: Must have length 1
> l[ii[1]:ii[length(ii)]] # needless to say this is extremely ugly, but this could be made prettier
[[1]]
[1] 1 2
> ii1 <- 1
> l[ii1]
[1] 1
```

I am not sure that this is altogether much different from the `[[` operator; just the internal implementation does not use a special operator but non-standard evaluation. Needless to say, the syntax I suggested above is not pretty. I am also not sure whether I would bake NSE into such a core data structure of the language. The above is probably also not easier to understand than the difference between the `[[` and `[` operators (maybe even harder?).
> Even if we did have a scalar type, and `y = 1; x[y]` (`x[<scalar>]`) could be differentiated from `y = [1]; x[y]` (`x[<vector>]`), then this is still the opposite behavior of R. Personally I'm okay with that if the syntax can be consolidated.

Maybe this would be an argument to construct vectors with `(...)` and lists with `[...]`? Then the syntax would be `x[(1)]` to slice the vector and `x[1]` to access the scalar, avoiding the confusion for R users. However, I think `x[(1)]` looks a little weird, so I would probably just bite the bullet and live with the inverted semantics of `[[`.
> I think R's roots as an array-programming-language are pretty magical and I wouldn't want to abandon that philosophy just to add scalars, so the semantics of using scalars in an array language need to be hashed out.

Agreed! However, maybe we can introduce scalars without moving far away from this idea.

What I like about the introduction of scalars (without thinking through all the details):

- We can more naturally express vectors that contain doubles or `NA`s, as we define vector types in terms of the scalar values that they contain, i.e. `double()` is an `Obj::Vector` whose elements are all of type `Obj::Scalar(Double)` or `Obj::Scalar(NA)`.
- This should have some side-benefits regarding performance, as we don't have to box all length-1 vectors. (Admittedly, with the Altreps used throughout this project, the benefits are probably smaller than in R, where vectors are almost always allocated.)
- We should be able to keep the semantics of scalars almost identical to length-1 vectors (except for cases like subsetting, where we want to differentiate). For example, `1 == [1, 2, 3]` should result in `[true, false, false]` and not `false`.
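The last point, that a scalar should still recycle against a vector in elementwise operations, can be sketched as (illustrative names only):

```rust
// Sketch: a scalar participating in elementwise comparison exactly as a
// length-1 vector would under R-style recycling.
fn scalar_eq(s: f64, v: &[f64]) -> Vec<bool> {
    v.iter().map(|x| *x == s).collect()
}

fn main() {
    // 1 == [1, 2, 3] should be [true, false, false], not false
    assert_eq!(scalar_eq(1.0, &[1.0, 2.0, 3.0]), vec![true, false, false]);
}
```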
Just so I don't forget it: the `AtomicMode` trait should then probably also be renamed, as `Vector`s are not the atomic data structure anymore once we have scalars.
We might be able to keep it, since it applies to the `T` in `Vector<T>`, not the `Vector` itself. I imagine that any `T` in either a `Vector<T>` or a `Scalar<T>` will probably need to share some foundational traits, which is currently what `AtomicMode` is for. That said, we might consider renaming it anyways to be more descriptive.

I'm not sure I agree that `Vector` should become `Vector<Scalar<T>>`. In principle I think this would be preferred, but because it's such a widely used data type, we should probably minimize the layers of computation when working with its data.
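A rough sketch of the "shared element trait" idea. `AtomicMode` is the trait name from this discussion, but the bounds and method shown here are made up for illustration:

```rust
// Sketch: one trait bounds the element type of both containers, so the
// trait describes the T, not the Vector itself.
trait AtomicMode: Copy + PartialEq {
    fn type_name() -> &'static str; // hypothetical method
}

impl AtomicMode for f64 {
    fn type_name() -> &'static str { "double" }
}

#[derive(Copy, Clone)]
struct Scalar<T: AtomicMode>(T);

struct Vector<T: AtomicMode>(Vec<T>);

fn main() {
    let s = Scalar(1.0_f64);
    let v = Vector(vec![1.0, 2.0]);
    assert_eq!(<f64 as AtomicMode>::type_name(), "double");
    assert_eq!(s.0, v.0[0]);
}
```

Note that `Vector<f64>` here stores its elements flat rather than as `Vector<Scalar<f64>>`, matching the preference above for minimizing layers around the widely used vector type.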
> As I think about it more, I think we'll have Scalar, Vector, and in the future (maybe Matrix? and) NDArray for higher order arrays.

I think it would be really cool if we could conceptualize a `Scalar` as a 0-dimensional array. Not necessarily in the internal representation, for which I agree that overhead should be as minimal as possible, but in terms of its semantics. (Then an array is still the most atomic data structure and `AtomicMode` is no misnomer; or rather, it should be implemented for scalar, vector, matrix, and nd-array.)

> I'm not sure I agree that Vector should become Vector<Scalar>. In principle I think this would be preferred, but because it's such a widely used data type, we should probably minimize the layers of computation when working with its data.

Yes, I think that makes sense!
> I think it would be really cool if we could conceptualize a Scalar as a 0-dimensional array.

Yeah, that's how I see it too. I think we'll have distinct data types, but broadcasting should operate such that a 0-dim array is operationally identical, at least for broadcasting/recycling operations like mathematical operators.
Thanks so much for sharing this, the title sounds fascinating. Maybe we could even try to reach out to Jan Vitek?