r's Introduction

R

An experimental implementation of R, with embellishments

Check out the live demo

What can it do?

cargo run
# R version 0.3.1 -- "Art Smock"

x <- function(a = 1, ...) { a + c(...) }
# function(a = 1, ...) {
#   a + c(...)
# }
# <environment 0x6000005b8e28>

y <- function(...) x(...)
# function(...) x(...)
# <environment 0x6000005b8e28>

y(4, 3, 2, 1)
# [1] 7 6 5 

This covers most of R's grammar parsing, basic primitives, scope management and ellipsis argument passing.

What's different?

This project is not just a rewrite of R, but a playground for features and reinterpretations. It is not meant to reimplement a compatible R layer, but to rethink some of R's assumptions.

Syntax

To start, there are a few superficial changes:

# 'fn' keyword
f <- fn(a, b, c) {
  a + b + c
}

# vector syntax
v <- [1, 2, 3, 4]

# list syntax
l <- (a = 1, b = 2, c = 3)

# lowercase keywords
kws <- (na, null, inf, true, false)

# destructuring assignment
(a, b) <- (1, 2)

There are plenty of more substantial changes being considered. If you enjoy mulling over the direction of syntax and features, feel free to join the conversation.

Experiments

All experiments are feature-gated and enabled by running (or building) with

cargo run -- --experiments "<experiment>"

Please try them out and share your thoughts in the corresponding issues!

Ellipsis packing and unpacking

Note

--experiments rest-args (discussed in #48, #49)

Current work is focused on ..args named ellipsis arguments and ..args unpacking in function calls. However, due to the experimental nature of this syntax it is currently behind a feature gate.

f <- function(..args) {
  args
}

f(1, 2, 3)  # collect ellipsis args into a named variable
# (1, 2, 3)

args <- (a = 1, b = 2, c = 3)
f <- function(a, b, c) {
  a + b + c
}

f(..args)  # unpack lists into arguments
# [1] 6

more_args <- (c = 10)
f(..args, ..more_args)  # duplicate names okay, last instance takes priority
# [1] 13
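The "last instance takes priority" rule can be sketched in Rust as a merge over named-argument lists. This is only an illustration: `merge_named_args` is a hypothetical helper, not a function from the project, and keeping a repeated name at its first position is an assumption.

```rust
/// Merge several named-argument lists into one, in order.
/// When a name repeats, the later value replaces the earlier one.
/// Assumption: the argument keeps the position of its first appearance.
fn merge_named_args(lists: &[Vec<(String, f64)>]) -> Vec<(String, f64)> {
    let mut merged: Vec<(String, f64)> = Vec::new();
    for list in lists {
        for (name, value) in list {
            if let Some(pos) = merged.iter().position(|(n, _)| n == name) {
                merged[pos].1 = *value; // later instance wins
            } else {
                merged.push((name.clone(), *value));
            }
        }
    }
    merged
}
```

Merging `(a = 1, b = 2, c = 3)` with `(c = 10)` this way leaves `a = 1, b = 2, c = 10`, which sums to the 13 shown above.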

Tail Recursion

Note

--experiments tail-calls (discussed in #60)

Tail recursion allows for arbitrarily deep recursion. More accurately, it discards frames from the call stack in this special case, allowing recursion without overflowing the call stack.

f <- function(n) if (n > 0) f(n - 1) else "done"
f(10000)
# [1] "done"

The way this is achieved requires the tail call's arguments to be evaluated eagerly instead of using R's typical lazy argument evaluation. This change can result in some unexpected behaviors that need discussion before the feature can be fully introduced.

Performance

You might be thinking rust is fast, and therefore this project must be fast. Well, unfortunately you'd be wrong. That's probably more of a reflection on me than rust. To get the basic skeleton in place, my focus has been on getting things working, not on getting them working well. For now, expect this interpreter to be about 1000x slower than R.

I'm feeling good about the general structure of the internals, but there have been plenty of quick proofs of concept that involve excess copies, extra loops, panics and probably less-than-ideal data structures. If you're an optimization fiend and you want to help narrow the gap with R, your help would be very much appreciated!

Why

This project is primarily a personal exploration into language design.

At the outset, many of the choices are researched one-by-one and are almost certainly naive implementations. My goal is to learn and explore, and in that way the project is already a success in my eyes. Beyond advancing my own understanding of language internals, I'd love to see the project garner enough interest to become self-sustaining.

If you see value in the project for anything beyond prototyping ideas, then pushing the project toward something practical is contingent on your support. Contributions, suggestions, feedback and testing are all appreciated.

Values

Being primarily a one-person project, the values currently map closely to my own. Some things I want to aim for:

  • A reasonably approachable language for R users (possibly with the ability to interpret R code).
  • Improved R constructs for complex calls, including argument packing and unpacking, partial function calls, destructuring assignment
  • Guardrails on non-standard-evaluation, allowing for user-facing domain-specific-languages, while allowing a more rigid evaluation scheme internally.
  • Lean into the things that rust does well, such as threading, arguably async evaluation, first-class data structures and algebraic error types.
  • Learn from more general languages like TypeScript to better understand how static typing can be comfortably embedded in a high-level language.

Contribution Guide

If you also want to learn some rust or want to explore language design with me, I'm happy to have you along for the ride. There are plenty of ways to contribute. In order of increasing complexity, this might include:

  • Documenting internals
  • Improving documentation throughout
  • Helping to improve the demo page hosted on GitHub pages
  • Implementing new language concepts
  • Providing feedback on internals

Any and all contributions are appreciated, and you'll earn yourself a mention in release notes!

License

I welcome other contributors, but also have not thoughtfully selected a long-term license yet. For now there's a CLA in place so that the license can be altered later on. I don't intend to keep it around forever. If you have suggestions or considerations for selecting an appropriate license, your feedback would be much appreciated.

My current preference is toward a copyleft license like GPL as opposed to a permissive license like MIT, as I believe that languages are a best-case candidate for such licenses and it fits well with the ethos of the R community as being scientific-community first. If you disagree strongly with that decision, now is your time to let me know.

r's People

Contributors

armenic, dgkf, edgar-manukyan, lornebradia, sebffischer

r's Issues

Pilot interface to `extendr/libR-sys` for bootstrapping R internals

As a first pass, I think just hooking up Rf_isNull would be a good minimal example that might help design the broad data flow from rust to libR and back to rust. Naturally, building is.null in pure rust would be straightforward, so the value here is purely in minimizing the scope to explore the data interop between the two.

This would mean converting from an R (this crate) to a libR-sys SEXP which then calls the C binding, returning a C SEXP, converting to a libR-sys SEXP and back to a R (this crate).

I anticipate that it won't look super pretty, almost undoubtedly with a data conversion to and from libR's SEXP model, but it would help to fill a gap until the library can be built up as a rust-native solution.

The goal is the ability to define a

is.null <- function(x) .Primitive("is.null")

And have it leverage the libR version's internal definition.

Steps to implementation:

  • Implement Into<libR-sys::Robj> for r::R, which under the hood will convert to the various types defined in https://extendr.github.io/extendr/extendr_api/wrapper/index.html and use those as intermediaries for the conversion to libR-sys::Robj.
  • Implement .Primitive (and .Internal?) which call internals with conversion first to libR-sys::Robj
  • Implement From<libR-sys::Robj> for r::R

This will probably result in a lot of data duplication any time a primitive or internal is called, but I'd like to put that out of mind for this first implementation just to get to the point where we can bootstrap a standard library with a huge set of existing R internals.

Conceptually it sounds pretty reasonable. Am I missing anything?

Proposal: Async keyword

This is super neat! Perhaps a niche idea, but I saw in #15 that you were looking at generators/iterators, and the other proposal about a defer keyword, and it made me think of how much nicer it would be in R if async was a keyword. I guess this also presupposes that async will be a thing?

x <- await(foo())
x <- await foo()

# or

x <- await {
  foo()
}

Fix argument matching so that arguments can depend on one another

One of the rather interesting behaviors of R is the ability to use arguments to define default argument values.

x <- function(a, b = a) { 
  b
}

x(3)
# 3

This currently doesn't work, as arguments are not considered when evaluating other arguments. I believe this should just be a matter of creating a closure in the function frame to lazily evaluate arguments that include other argument symbols.
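A minimal Rust sketch of that idea, assuming a much-simplified frame that stores unevaluated argument expressions and only resolves them on demand. `Expr`, `Frame` and `call_x` are hypothetical stand-ins, not the project's actual types.

```rust
use std::collections::HashMap;

/// A tiny expression type: a literal number or a reference to another argument.
enum Expr {
    Num(f64),
    Sym(String),
}

/// A function frame that stores unevaluated argument expressions (promises).
struct Frame {
    promises: HashMap<String, Expr>,
}

impl Frame {
    /// Force a promise by name, resolving symbols against the same frame.
    /// `depth` guards against cyclic defaults such as `a = b, b = a`.
    fn force(&self, name: &str, depth: usize) -> Result<f64, String> {
        if depth > self.promises.len() {
            return Err(format!("promise cycle detected while evaluating '{name}'"));
        }
        match self.promises.get(name) {
            Some(Expr::Num(x)) => Ok(*x),
            Some(Expr::Sym(other)) => self.force(other, depth + 1),
            None => Err(format!("object '{name}' not found")),
        }
    }
}

/// Build the frame for `x <- function(a, b = a) b` called with one argument.
fn call_x(a: f64) -> Result<f64, String> {
    let mut promises = HashMap::new();
    promises.insert("a".to_string(), Expr::Num(a));
    promises.insert("b".to_string(), Expr::Sym("a".to_string())); // default `b = a`
    Frame { promises }.force("b", 0)
}
```

Because `b` is only resolved when the body asks for it, the default can see `a` regardless of the order in which arguments were matched.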

How should tail call recursion be handled?

Inspired by recent advances in the R language, I went ahead and implemented a form of tail call recursion. However, it necessarily executes in a "non-standard" way, greedily evaluating arguments to recursive calls.

This means that R's usual expectations hold for most calls, but the behavior might be unexpected for tail-recursive calls.

Typical Evaluation

f <- function(n, result) {
  if (n > 0) { 
    cat(n, "\n")
    f(n - 1, result)
  } else {
    result
  }
}

f(3, stop("Done"))
#> 3
#> 2
#> 1
#> Error: Done

Tail-Recursive Evaluation

f <- function(n, result) {
 if (n > 0) { 
   cat(n, "\n")
   f(n - 1, result)
 } else {
   result
 }
}

f(3, stop("Done"))
#> Error: Done

This is because result is necessarily greedily evaluated. Recursive calls will not work with promises dependent on the parent frame or non-standard evaluation. The "right" approach here is a bit unclear, so I'm going to wait to see how the R version feels with all of R's non-standard eval tricks to see if I can learn something from the design.
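One common way to get this behavior is a trampoline, sketched below in Rust. This is not necessarily how the interpreter implements it; `Step`, `f_body` and `run` are hypothetical names. It does show why the arguments must be evaluated before the frame is discarded: the next call is represented only by its already-computed arguments.

```rust
/// One step of a tail-recursive function: either a final value or the
/// (already evaluated) argument of the next call.
enum Step {
    Done(String),
    TailCall(u64),
}

/// Body of `f <- function(n) if (n > 0) f(n - 1) else "done"`.
/// The argument `n - 1` is computed eagerly before the call is handed back.
fn f_body(n: u64) -> Step {
    if n > 0 {
        Step::TailCall(n - 1)
    } else {
        Step::Done("done".to_string())
    }
}

/// Trampoline: run steps in a loop so the native call stack never grows.
fn run(mut n: u64) -> String {
    loop {
        match f_body(n) {
            Step::TailCall(next) => n = next,
            Step::Done(value) => return value,
        }
    }
}
```

A lazily evaluated argument would have to hold on to the frame it came from, which is exactly the frame the trampoline wants to drop.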

Implement `q()`

The current implementation sorely lacks an obvious way to quit. Although you can break out of the repl with Ctrl+D, anyone who's booting up a repl for the first time is likely to use q().

To implement a quit functionality, we need to first implement a new signalling condition to signal a termination. This should be added to the possible R Conditions here:

R/src/lang.rs

Lines 34 to 37 in a902965

pub enum RSignal {
    Condition(Cond),
    Error(RError),
}

Next we need to implement a primitive q function. Primitives are callable symbols, and the dispatch to primitive internal calls is handled in this implementation of the Callable trait:

R/src/builtins.rs

Lines 897 to 905 in a902965

impl Callable for String {
    fn call(&self, args: ExprList, env: &mut Environment) -> EvalResult {
        if let Some(f) = primitive(self) {
            return f(args, env);
        }
        (env.get(self.clone())?).call(args, env)
    }
}

Which then calls into the primitive function to try to find an appropriate primitive internal call if one exists:

R/src/builtins.rs

Lines 771 to 777 in a902965

pub fn primitive(name: &str) -> Option<Box<dyn Fn(ExprList, &mut Environment) -> EvalResult>> {
    match name {
        "c" => Some(Box::new(primitive_c)),
        "list" => Some(Box::new(primitive_list)),
        _ => None,
    }
}

A new primitive callback for q() would need to be added. For now, I wouldn't worry about even handling any arguments. It should just return the new RSignal to terminate the session.

And finally, the last step would be to respond to this termination signal in the REPL handler:

R/src/r_repl/repl.rs

Lines 54 to 58 in a902965

let res = global_env.eval(expr);
match res {
    Ok(val) => println!("{}", val),
    Err(e) => println!("{}", e),
}
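Putting the steps together, a self-contained sketch might look like the following. The types are simplified stand-ins for the project's `RSignal` and `EvalResult`; the `Terminate` variant, `primitive_q`, and the toy `eval` and `repl` functions are assumptions made for illustration.

```rust
/// Simplified stand-in for the project's signal type, with a hypothetical
/// `Terminate` variant added for `q()`.
#[derive(Debug, PartialEq)]
enum Signal {
    Error(String),
    Terminate,
}

type EvalResult = Result<String, Signal>;

/// Hypothetical `q()` primitive: takes no arguments and asks to quit.
fn primitive_q() -> EvalResult {
    Err(Signal::Terminate)
}

/// Toy evaluator that only knows `q()`; any other input is echoed back.
fn eval(input: &str) -> EvalResult {
    match input.trim() {
        "q()" => primitive_q(),
        "" => Err(Signal::Error("empty input".to_string())),
        other => Ok(other.to_string()),
    }
}

/// REPL loop over prepared inputs; returns what would have been printed.
fn repl(inputs: &[&str]) -> Vec<String> {
    let mut printed = Vec::new();
    for input in inputs {
        match eval(input) {
            Ok(val) => printed.push(val),
            Err(Signal::Terminate) => break, // leave the loop instead of printing
            Err(Signal::Error(msg)) => printed.push(format!("Error: {msg}")),
        }
    }
    printed
}
```

Routing termination through the same `Err` channel as errors means the REPL only needs one extra match arm.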

Overly aggressive conversions to `Vector::Numeric`

For some mathematical operations, the type conversions do not use the "minimally numeric" type.
For example, adding / multiplying two Integers will result in a Numeric.

I believe the relevant lines are here:

R/src/lang.rs

Lines 361 to 392 in e440cf7

impl std::ops::Add for Obj {
    type Output = EvalResult;
    fn add(self, rhs: Self) -> Self::Output {
        match (self.as_numeric()?, rhs.as_numeric()?) {
            (Obj::Vector(l), Obj::Vector(r)) => Ok(Obj::Vector(l + r)),
            _ => internal_err!(),
        }
    }
}
impl std::ops::Sub for Obj {
    type Output = EvalResult;
    fn sub(self, rhs: Self) -> Self::Output {
        match (self.as_numeric()?, rhs.as_numeric()?) {
            (Obj::Vector(l), Obj::Vector(r)) => Ok(Obj::Vector(l - r)),
            _ => internal_err!(),
        }
    }
}
impl std::ops::Neg for Obj {
    type Output = EvalResult;
    fn neg(self) -> Self::Output {
        match self.as_numeric()? {
            Obj::Vector(x) => Ok(Obj::Vector(-x)),
            _ => internal_err!(),
        }
    }
}

Let me know if these issues are already too nit-picky.
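As a sketch of what "minimally numeric" addition could look like, the following keeps two integer operands as integers and only widens to doubles when the operands disagree. The `Vector` enum here is a simplified stand-in for the project's type. Element-wise pairing without recycling, and turning integer overflow into NA (as R does), are assumptions of this example.

```rust
#[derive(Debug, PartialEq)]
enum Vector {
    Integer(Vec<Option<i32>>), // `None` stands in for NA
    Numeric(Vec<Option<f64>>),
}

impl Vector {
    /// Widen to doubles; only needed when the operand types differ.
    fn to_f64(&self) -> Vec<Option<f64>> {
        match self {
            Vector::Integer(v) => v.iter().map(|x| x.map(f64::from)).collect(),
            Vector::Numeric(v) => v.clone(),
        }
    }
}

/// Element-wise addition that preserves the integer type where possible.
fn add(lhs: &Vector, rhs: &Vector) -> Vector {
    match (lhs, rhs) {
        (Vector::Integer(l), Vector::Integer(r)) => Vector::Integer(
            l.iter()
                .zip(r)
                .map(|(a, b)| match (a, b) {
                    (Some(a), Some(b)) => a.checked_add(*b), // overflow becomes NA
                    _ => None,
                })
                .collect(),
        ),
        _ => Vector::Numeric(
            lhs.to_f64()
                .into_iter()
                .zip(rhs.to_f64())
                .map(|(a, b)| match (a, b) {
                    (Some(a), Some(b)) => Some(a + b),
                    _ => None,
                })
                .collect(),
        ),
    }
}
```

The same dispatch on the pair of operand types would apply to subtraction and multiplication.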

Naming of Numeric / Double

In R, numerics are defined as

Creates or coerces objects of type "numeric". is.numeric is a more general test of an object being interpretable as numbers.

(https://stat.ethz.ch/R-manual/R-devel/library/base/html/numeric.html)

I.e. numerics encompass both integers as well as doubles:

is.numeric(1); is.numeric(1L)
#> [1] TRUE
#> [1] TRUE

Created on 2023-12-27 with reprex v2.0.2

I think the Numeric variant of the Vector enum should maybe be renamed to Double (

Numeric(Rep<Numeric>),
).

I could do the renaming if this is something that is desired.

Try to detect terminal theme when choosing highlight colors

This is totally cosmetic, but I thought it would be nice to try to determine whether a terminal has a light or dark theme before doing any highlighting.

Currently, the theme is really only functional with a dark background as white is used as a foreground color.

Some options:

  • termbg
  • terminal-light
  • Just use the ansi reverse style for any uncolored text, then hope that there's enough contrast in the terminal colors on either background to be readable

It seems like any tool that tries to automatically detect the background color has a few limitations:

  1. it might not be ubiquitous and will only work on specific terminals, meaning we probably want a safer default anyways
  2. it is slow (~10ms) as it needs to pass an ansi control sequence to the terminal and then read back the response, meaning that we want to do this sparingly (perhaps only on top-level repl expression evaluation)

Fix repl handling of `fn` keyword

The repl doesn't seem to recognize valid expressions defining a fn:

> function(a) a
function(a) a
<environment 0x56096e6beb18>

> fn(a) a
:

Instead of returning a function because the expression is parsed as a complete one-line expression, it is awaiting more code.

Designing a better backtrace

Soon this project will get to the point where it has some metaprogramming tools, and when it does we'll need a better way to report the backtrace. R's rlang is of course tackling this issue as well, so I drew a lot of inspiration from their trace_back() output.

After quite a bit of iteration, I think I'm going to aim for something like this (riffing on the output from this rlang example):

┬┬──╴1 base::try(identity(f()))
├│──╴2 tryCatch(...)
├│──╴3 tryCatchList(expr, classes, parentenv, handlers)
├│──╴4 tryCatchOne(expr, names, parentenv, handlers[[1L]])
╰│──╴5 doTryCatch(return(expr), name, parentenv, handler)
 ├──╴6 identity(f())
 ├──╴7 f()
 ├──╴8 g()
 ├──╴9 h()
 ╰─╴10 base::eval(quote(i()), envir = e)
┄┄┬╴11 i()
  ╰╴12 j()

rlang code for comparison:

> e <- new.env(parent = getNamespace("base"))  # any env not in call stack
> f <- function() g()
> g <- function() h()
> h <- function() eval(quote(i()), envir = e)
> i <- function() j()
> j <- function() rlang::trace_back()
> try(identity(f()))
#     ▆
#  1. ├─base::try(identity(f()))
#  2. │ └─base::tryCatch(...)
#  3. │   └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#  4. │     └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#  5. │       └─base (local) doTryCatch(return(expr), name, parentenv, handler)
#  6. ├─base::identity(f())
#  7. └─global f()
#  8.   └─global g()
#  9.     └─global h()
# 10.       └─base::eval(quote(i()), envir = e)
# 11.         └─base::eval(quote(i()), envir = e)
# 12.           └─global i()
# 13.             └─global j()

The evaluation environment can be found by walking up the call stack to the first ─ horizontal bar.

Design goals

  • Less indentation than rlang, hopefully improving the experience of scanning vertically through calls
  • An easier-to-follow network diagram in the margin, while retaining the ability to easily identify the evaluation environment
  • Explicitly indicating when environments can not be found in the call stack (as in frame 11)
  • Not shown, but in conjunction with #57, showing data environments when we get there

If there are benefits to the rlang indentation style, they aren't obvious to me. I think I capture all the same information, but I'd be happy to receive critical feedback enlightening me about what other helpful info might be missing.

Metaprogramming bounds syntax

One thing that has been a central theme of Jan Vitek's work is that R's meta-programming facilities come at a pretty steep performance cost, even though they are used in a small subset of functions. Even though performance isn't a goal, I think the future of an R-alike language should probably consider these ideas and I'm interested in mocking them up.

Argument Passing

R's defaults are quite nice. Arguments are passed as promises that aren't actually evaluated until they're needed. Even before they're evaluated, their expressions can be rearranged and evaluated in different contexts. This is the central feature of R's meta-programming, but it means that most functions carry forward the machinery for meta-programming even if it would have little impact had the arguments all been eagerly evaluated.

For this purpose, I'm considering a default of eager evaluation, with a syntax for lazy evaluation. The exact syntax is very much up for debate, but the crux is that individual arguments can be flagged as lazy:

Example using . "context" syntax

not_null_else <- function(a, .b) {
  # a eagerly evaluated in parent frame
  if (!is.null(a)) a
  else b  # b not evaluated until here as it is just a promise
}

not_null_else(
  loot_chest(1, 2, 3, 4, 5), 
  stop("that password was incorrect")
)

Here the . syntax is borrowed from this proposal which uses the . to mean something like "in this context". Although not a direct mapping of the concept, it evokes a sense of contextual ambiguity at the interface of the calling frame and evaluation frame.

This would also put nice bounds on when tail calls are permitted. When a recursive function requires lazily evaluated arguments a standard evaluation model can be used, while functions that take all eager arguments can leverage tail call optimizations.

Declaring a static function

Note

Feedback needed: What is the right name for this behavior?

A static function could be an even more restrictive constraint on a function which states that a function

  1. only uses parameters and variables defined in its current scope
  2. does not evaluate any expressions in other environments (including promises in other environments)
  3. only calls out to other static functions

This would allow for much more intensive and useful static code analysis and optimization. I'm a long ways off from even considering such ambitions, but I'd like to get the conversation started on whether it would be worth the cognitive overhead. This is intended to address the closing thought of Jan Vitek's R melts brains.

Example using static keyword

f <- static function(n, if_even, if_odd) {
  if (n > 0) f(n - 2, if_even, if_odd)
  else if (n == 0) if_even
  else if_odd
}

Defer keyword

R has the on.exit construction for delaying execution of code, but I feel like this capability should be more central to a language. Taking inspiration from the zig language, providing a more native defer syntax would help to keep cleanup more localized.

To also take a nod from withr and rust's lifetimes, I think defer should also act as a function that accepts a frame, allowing a function to defer execution until a specific frame completes execution. This is a bit like python's context managers, but with the added benefit that they can be attached to arbitrary scopes and do not rely on implementing object-specific entry and exit methods.

n_lines <- function(file) {
  f <- open(file)
  defer close(f)

  length(read_lines(f))
}

connect <- function(file) {
  f <- open(file) 
  
  defer (parent_frame(-1)) {
    close(f)
  }
  
  f
}

Use of semicolon

I tried using a semicolon like this and as you can see the repl expects something else by showing the : .

> x <- 1; c(x, 2)
: 

This would help us a lot in the tests.

Implementing `sum()`

Primitives are implemented as special function names and evaluate code in lower-level code (rust). If a function of that name isn't found in the environment, we look through the known primitives to find the code to execute for that function name. This is a simplification of R's primitive calling, but it's sufficient for us to move forward with building up a standard library.

There are a few primitives that are already in place and will give a good template for implementing new ones. Primitives are matched against a known mapping in primitive(). We'll need to add a "sum" => Some(Box::new(primitive_sum)) branch here:

R/src/builtins.rs

Lines 771 to 778 in 117d4df

pub fn primitive(name: &str) -> Option<Box<dyn Fn(ExprList, &mut Environment) -> EvalResult>> {
    match name {
        "c" => Some(Box::new(primitive_c)),
        "list" => Some(Box::new(primitive_list)),
        "q" => Some(Box::new(primitive_q)),
        _ => None,
    }
}

Then the next step is to implement

pub fn primitive_sum(args: ExprList, env: &mut Environment) -> EvalResult {
  // for each arg (Expr) in args (ExprList)
  //   try eval expression in calling environment (`env.eval(value)?`)
  //   try coerce value to numeric (example of this coercion in `primitive_c`)
  //   add sum of elements to a total sum
}

I'm fine ignoring the na.rm parameter for now... this might be a design quirk of R that we leave in the past.
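To make the outline above concrete without depending on the project's internals, here is a self-contained sketch that works on already-evaluated arguments. `Value` is a hypothetical stand-in for the interpreter's object type, the signature differs from the real `primitive_sum(args, env)`, and the error message is illustrative.

```rust
/// Stand-in for an evaluated argument (the real code would evaluate each
/// `Expr` in the calling environment first).
enum Value {
    Numeric(Vec<f64>),
    Integer(Vec<i32>),
    Character(Vec<String>),
}

/// Sum every element of every argument, coercing integers to doubles.
/// Non-numeric arguments produce an error instead of a panic.
fn primitive_sum(args: &[Value]) -> Result<f64, String> {
    let mut total = 0.0;
    for arg in args {
        match arg {
            Value::Numeric(v) => total += v.iter().sum::<f64>(),
            Value::Integer(v) => total += v.iter().map(|&x| f64::from(x)).sum::<f64>(),
            Value::Character(_) => {
                return Err("cannot coerce character argument to numeric".to_string())
            }
        }
    }
    Ok(total)
}
```

With no arguments this returns 0, matching the empty-sum convention.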

Implement paste()

As a follow-up of #8 (comment) maybe it is a good time to implement paste().

References #5

@dgkf, do I understand correctly that we want to implement paste() as a primitive (written in Rust?) function, similar to q()?

I am also curious whether we will ever write functions in R or port them over from R?

Name this language!

R is just too ambiguous to be used solo, and calling it dgkf/R feels too vain. I think a rebrand is needed to help give this project a bit of its own identity.

Just to lay out some general themes that resonate with me with this project and its identity.

What is part of the language identity

  • developed publicly and visibly
  • is developed first and foremost as an educational and exploratory process, with the hopes of building a community of language enthusiasts
  • embraces what makes R unique - namely lisp-y metaprogramming wrapped in a more familiar data science shell
  • hopes to learn from and reinforce what has made R successful - particularly its fluency for providing DSLs with enough guardrails to keep the core language familiar. There are good reasons why tidyverse reproductions have sprung up in a number of languages.

What isn't part of the identity

  • rust, although a language I enjoy writing in a lot, is not central to the identity of this project
  • Although I'm fine with allusions or homages to R, "beating" R is not part of the identity

Deviations from the R language

An ongoing catalog of intentional deviations from R, in varying stages of maturity

Confident

  • Lowercase equivalents for all uppercase keywords (NULL, TRUE, FALSE, Inf, NA)
  • fn as an alias for function (and replaces R lambda \() syntax)
  • Allow trailing commas in all function calls (eg list(a = 1,))

Undecided

  • [ as a primitive for creating a vector (ie, [1, 2, 3])
  • ( as a primitive for creating a list (ie (1, 2, 3) or (a = 1, b = 2, c = 3))
  • Introduce scalar values
  • Remove [[ as an indexing operator.
    With the above, indexing with a scalar value could return an element (x[1]), while indexing with a vector could return a vector (x[ [1, 2] ]; spaces to emphasize that this also uses the [ operator by passing a vector, [1, 2] - not a separate [[ operator). This is more similar to python's pandas indexing, but doesn't play nicely without scalar values in the first place (otherwise 1 is equivalent to c(1) and we always index by vector anyways).

Needs Feedback

  • A language construct for flagging when non-standard evaluation is enabled. This should exist at function declaration. Undecided on whether it should apply to a whole function or specific arguments.
  • A type system (some early exploration in an R-native implementation in dgkf/typewriter)

Pattern matched assignment

I would love for R to support some sort of destructuring.

Specifically, I think it would look something like this:

list(a, b) <- list(1, 2)

Or, since we already support (, ) syntax as a list, this could be rewritten as:

(a, b) <- (1, 2)

Since R already implements generic assignment calls, this would mean that we'd just have a couple special cases for some basic AST constructors.
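As a rough Rust sketch of the special case, destructuring amounts to pairing target names with values and binding each pair in the environment. The `destructure` helper and the plain `HashMap` environment are hypothetical simplifications, and rejecting a length mismatch is an assumption about the desired behavior.

```rust
use std::collections::HashMap;

/// Bind each target name to the value at the same position.
/// Assumption: a length mismatch is an error rather than being recycled.
fn destructure(
    env: &mut HashMap<String, f64>,
    targets: &[&str],
    values: &[f64],
) -> Result<(), String> {
    if targets.len() != values.len() {
        return Err(format!(
            "cannot destructure {} values into {} names",
            values.len(),
            targets.len()
        ));
    }
    for (name, value) in targets.iter().zip(values) {
        env.insert((*name).to_string(), *value);
    }
    Ok(())
}
```

Calling it with targets `a, b` and values `1, 2` corresponds to `(a, b) <- (1, 2)`.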

Out of bounds subsetting

The following will panic:

x <- 1
x[2]

The panic originates in the printer. However, this out-of-bounds indexing should already be caught during subsetting.
Note that the following will not panic:

x <- 1:3
x[2:4]
#> [1] 2 3

Converting `unreachable!()` and `unimplemented!()` to R errors

In its early form, this project prioritized iterating quickly on the general design of the internals, even if it meant cutting a few corners. This meant forgoing best practices for catching and escalating language errors. If you've tried out the code, you've probably realized that it's not hard to crash a session.

Generally this is because I've taken shortcuts to get code working fast and left many code paths as unimplemented!() - but these will panic and crash a session instead of raising an R error and letting users continue testing the implementation.

Instead, they should raise an R error signal.

Example

For example, you might see an unimplemented code path like this one:

R/src/lang.rs

Line 84 in f4bfaa7

_ => unimplemented!("cannot coerce object into vector"),

Instead, it would be preferred if these raised R errors like this:

R/src/lang.rs

Lines 93 to 96 in f4bfaa7

Numeric(v) => match v[..] {
    [Some(x)] => Ok(x as usize),
    _ => Err(RSignal::Error(RError::CannotBeCoercedToInteger)),
},
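The general shape of the conversion, sketched with simplified stand-in types: the `Other` variant, the `Obj` enum and `as_index` are hypothetical and only loosely modeled on the snippets above. Every arm returns a `Result`, so an unsupported input becomes a recoverable error.

```rust
#[derive(Debug, PartialEq)]
enum RError {
    CannotBeCoercedToInteger,
    Other(String), // hypothetical catch-all for messages
}

enum Obj {
    Numeric(Vec<Option<f64>>),
    Character(Vec<String>),
}

/// Coerce a length-one numeric into an index, returning an error otherwise.
fn as_index(obj: &Obj) -> Result<usize, RError> {
    match obj {
        Obj::Numeric(v) => match v.as_slice() {
            [Some(x)] if *x >= 0.0 && x.fract() == 0.0 => Ok(*x as usize),
            _ => Err(RError::CannotBeCoercedToInteger),
        },
        // Previously a path like this might have been `unimplemented!(...)`,
        // which aborts the whole session instead of reporting an error.
        Obj::Character(_) => Err(RError::Other(
            "cannot coerce object into vector".to_string(),
        )),
    }
}
```

The REPL can then print the error and keep the session alive.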

Function closure constructions

I think that passing around functions as data is an under-loved capability of R, and I think part of the reason is how cumbersome it is to build a closure of a partially evaluated function. Today, any functions-as-data approach requires building functions that return functions, encapsulating a few arguments.

Instead, it would be amazing if we could say "I'm not done specifying arguments yet"

sum_one_plus <- sum(1)..  # this says "I'm not done" and produces a partially evaluated function
sum_one_plus(2, 3, 4)
# [1] 10

alternatively

sum_one_plus <- sum(1, ..)  # perhaps too similar to ellipsis, although those might need to be named
sum_one_plus(2, 3, 4)
# [1] 10

Under the hood, this produces a new function call which starts to populate the function's internal environment with matched arguments and trims those matched arguments from the remaining function signature.
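The mechanism can be approximated in Rust with a closure that captures the arguments matched so far. This is a sketch under simplifying assumptions (positional numeric arguments only); `partial` is a hypothetical helper and `sum` stands in for the R function.

```rust
/// Variadic `sum`, standing in for the R function.
fn sum(args: &[f64]) -> f64 {
    args.iter().sum()
}

/// Capture some leading arguments now and accept the rest later.
/// The returned closure plays the role of the partially evaluated function.
fn partial(bound: Vec<f64>) -> impl Fn(&[f64]) -> f64 {
    move |rest: &[f64]| {
        let mut all = bound.clone(); // already-matched arguments
        all.extend_from_slice(rest); // arguments supplied at call time
        sum(&all)
    }
}
```

Here `partial(vec![1.0])` corresponds to `sum(1)..`, and calling the result with `2, 3, 4` gives 10, as in the example above.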

Implementing visibility / additional metadata for objects

In order to mark an object as visible / invisible (iirc, print currently prints twice), some additional meta-data is needed to store such information. I have seen that this was started to be implemented as part of the Return variant of the Signal enum (

R/src/lang.rs

Line 39 in 006f1a4

Return(Obj, bool), // (value, visibility)
). However, maybe it makes sense to take this as a starting point to introduce the possibility for some metadata for the Obj class? Because there will be other use-cases to store bits alongside R objects, an example being whether a vector is sorted.

Allow non-environment "environments"

In R today, every frame has an environment. However, the likes of dplyr and ggplot2 have shown that there's value in allowing other types of data to act as an environment (akin to using within()).

Although this would be "non-standard", this would allow calling eval() on an expression using an "environment" defined by a data.frame, list, or other data object. This would be similar to attach, but with mutable access to data.

This would also allow someone to implement a get and assign method for their data to allow it to operate as its own "environment" as part of a call stack, which might make it easier to implement dplyr-y interfaces.
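One way to picture this is a trait with `get` and `assign` that both ordinary environments and data objects implement. The sketch below is an assumption about how it could be structured; `Scope`, `DataFrame` and `eval_add` are hypothetical names, and a plain `HashMap` stands in for a regular environment.

```rust
use std::collections::HashMap;

/// Anything that can act as an "environment" during evaluation.
trait Scope {
    fn get(&self, name: &str) -> Option<Vec<f64>>;
    fn assign(&mut self, name: &str, value: Vec<f64>);
}

/// An ordinary environment: a map from names to values.
impl Scope for HashMap<String, Vec<f64>> {
    fn get(&self, name: &str) -> Option<Vec<f64>> {
        HashMap::get(self, name).cloned()
    }
    fn assign(&mut self, name: &str, value: Vec<f64>) {
        self.insert(name.to_string(), value);
    }
}

/// A data object whose columns can be looked up like variables.
struct DataFrame {
    columns: Vec<(String, Vec<f64>)>,
}

impl Scope for DataFrame {
    fn get(&self, name: &str) -> Option<Vec<f64>> {
        self.columns
            .iter()
            .find(|(n, _)| n.as_str() == name)
            .map(|(_, v)| v.clone())
    }
    fn assign(&mut self, name: &str, value: Vec<f64>) {
        if let Some(pos) = self.columns.iter().position(|(n, _)| n.as_str() == name) {
            self.columns[pos].1 = value; // mutable access to existing data
        } else {
            self.columns.push((name.to_string(), value));
        }
    }
}

/// Evaluate `lhs + rhs` element-wise against any scope.
fn eval_add(scope: &dyn Scope, lhs: &str, rhs: &str) -> Option<Vec<f64>> {
    let (l, r) = (scope.get(lhs)?, scope.get(rhs)?);
    Some(l.iter().zip(&r).map(|(a, b)| a + b).collect())
}
```

The evaluator only sees `&dyn Scope`, so it does not need to know whether names resolve to variables or columns.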

Use binary `?` operator as `try-else`

R's error propagation is already quite powerful! All expressions are evaluated as though they can fail, and their errors are reported up through the call stack.

However, R's error capture and recovery is a bit messier. Instead of throwing errors, many codebases resort to returning NULL or some other value of significance to indicate an error, which can be interpreted as a special case by the calling function. This seems to be a pattern that has arisen out of the clunky error handling.

Perhaps we can make code both more readable and more stable by introducing better error handling features:

f <- function() {
  x <- g(x) ? g_default()
  paste0(x, "World")
}

The R world has no shortage of infix operators floating around, so deciding to reuse the binary infix behavior of a symbol should be done cautiously to make the best use of the syntax.
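For comparison, the proposed semantics resemble Rust's `Result::unwrap_or_else`: the right-hand side is only evaluated when the left-hand side fails. A sketch of the example above, where `try_else` and the bodies of `g` and `g_default` are made up for illustration:

```rust
/// `attempt ? fallback`: use the fallback only if the attempt failed.
fn try_else<T, E>(attempt: Result<T, E>, fallback: impl FnOnce() -> T) -> T {
    attempt.unwrap_or_else(|_| fallback())
}

/// Hypothetical fallible function.
fn g(x: &str) -> Result<String, String> {
    if x.is_empty() {
        Err("x must not be empty".to_string())
    } else {
        Ok(x.to_string())
    }
}

/// Hypothetical default used when `g` fails.
fn g_default() -> String {
    "Hello, ".to_string()
}

fn f(x: &str) -> String {
    let x = try_else(g(x), g_default); // x <- g(x) ? g_default()
    format!("{x}World")
}
```

Passing the fallback as a closure keeps it lazy, so `g_default()` never runs on the success path.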

Build error, pattern binding `atom` is ambiguous

error[E0170]: pattern binding `atom` is named the same as one of the variants of the type `parser::Rule`
   --> src/parser.rs:233:9
    |
233 |         atom => unreachable!("invalid postfix operator '{:#?}'", atom),
    |         ^^^^ help: to match on the variant, qualify the path: `parser::Rule::atom`
    |
    = note: `#[deny(bindings_with_variant_name)]` on by default

For more information about this error, try `rustc --explain E0170`.
error: could not compile `r` (lib) due to previous error

This is an issue with a newer version of rust, and an easy fix will be on the way.

Create integers by default if all values are integer-valued

I am wondering why R creates doubles by default even if all values are integer-valued.

typeof(1)
#> [1] "double"

Created on 2023-12-27 with reprex v2.0.2

One reason I can think of is that doubles are 64 bit, while integers are 32 bit by default, so creating doubles allows for a greater range of values.

Vector internal altreps

Since this repo seems to have gathered a bit of visibility, I just want to file an issue for what I've been experimenting with recently.

One of the impressive enhancements that R has received is the idea of internal "altreps" for vectors. This avoids allocating huge vectors for things like ranges, which can be completely described by a start, end and increment (lobstr::obj_size(1:2) == lobstr::obj_size(1:1e12)).

Internally within rust, I'd like to have a few altreps:

  • A Vector:
    • The typical R-style vector with all values allocated
  • Any IntoIterator:
    • This should handle cases of ranges, and could be used for some primitives if things like seq were built as primitives.
    • This would be the representation used for performing any vectorized primitives, iterating over (possibly zipped) elements.
    • Ideally, this would mean vectorized operations also produce iterator representations so that they can be materialized lazily (but this butts up against some tough lifetime management and might be a longer term goal than I care to tackle right now) (other considerations on "views" in Jan Vitek's R melts brains, which suggests a pass to fuse expressions to reduce iterations over vectors when evaluating).
  • A Subset:
    • A vector with added information about how it has been subset.
    • The subset indices can be applied to produce a materialized vector or simply used for indexing to provide a mutable interface to the vector values without copying a whole vector into a new value.
    • Allows for improving the performance of indexed assignment (x[[1]][[1]][[1]] <- 3) to avoid excessive allocations as each layer of indexing reallocates a new vector.
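As a rough illustration of the Subset idea, here is a minimal sketch assuming a hypothetical `Subset` type that borrows the underlying values and stores the selected indices. It ignores nested indexing and bounds handling, and is not the representation used in this repo.

```rust
/// Hypothetical subset view: indices into a mutably borrowed vector.
struct Subset<'a> {
    values: &'a mut Vec<f64>,
    indices: Vec<usize>,
}

impl<'a> Subset<'a> {
    /// Apply the indices to produce a new, materialized vector.
    fn materialize(&self) -> Vec<f64> {
        self.indices.iter().map(|&i| self.values[i]).collect()
    }

    /// Assign through the view, mutating the original values in place.
    fn assign(&mut self, value: f64) {
        for &i in &self.indices {
            self.values[i] = value;
        }
    }
}

fn main() {
    let mut x = vec![1.0, 2.0, 3.0, 4.0];
    let mut view = Subset { values: &mut x, indices: vec![1, 3] };

    println!("{:?}", view.materialize());
    view.assign(0.0);
    println!("{:?}", x);
}
```

The point of the sketch is that `assign` never allocates a copy of `x`; only `materialize` does.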

To sort of map out an ideal workflow in my eyes, this is how I'd like to see the evaluation of vectors:

x = c(1, 2, 3)  # internally represented by a rust Vec<_>
y = 4:6  # internally represented as an iterator of values in the range 4-6

# creates an internal iterator, returning a new iterator (not materialized values!)
# eg, `x.zip(y).map(|(xi, yi)| xi + yi)`
# note that y is iterated over twice in this case, both iterating over the same source
z = x + y + y

# when elements of z are needed, it is then collected into an internal Vec<_>
print(z)
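The workflow above could be sketched in Rust roughly as follows. The `Rep` enum and `lazy_sum` function are hypothetical names used for illustration, boxing the iterator is just the simplest way to sidestep the type juggling, and R's recycling rules are ignored entirely.

```rust
use std::ops::Range;

/// Hypothetical internal representations of a numeric vector.
enum Rep {
    /// All values allocated, as in `c(1, 2, 3)`.
    Materialized(Vec<f64>),
    /// A range such as `4:6`, stored as the half-open range `4..7`.
    Range(Range<i64>),
}

impl Rep {
    /// Iterate over the values without materializing ranges.
    fn iter(&self) -> Box<dyn Iterator<Item = f64> + '_> {
        match self {
            Rep::Materialized(v) => Box::new(v.iter().copied()),
            Rep::Range(r) => Box::new(r.clone().map(|i| i as f64)),
        }
    }
}

/// Lazily compute `x + y + y`; `y` is iterated twice from the same source.
fn lazy_sum<'a>(x: &'a Rep, y: &'a Rep) -> impl Iterator<Item = f64> + 'a {
    x.iter()
        .zip(y.iter())
        .zip(y.iter())
        .map(|((xi, yi), yj)| xi + yi + yj)
}

fn main() {
    let x = Rep::Materialized(vec![1.0, 2.0, 3.0]);
    let y = Rep::Range(4..7);

    // Nothing is computed until the iterator is consumed.
    let z = lazy_sum(&x, &y);

    // "Printing" forces materialization into a Vec.
    let materialized: Vec<f64> = z.collect();
    println!("{:?}", materialized);
}
```

The borrowed lifetimes in `lazy_sum` hint at the lifetime management problem mentioned earlier: the lazy result cannot outlive `x` and `y`.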

I've been writing a series of small experiments to try to build this in a way that covers these use cases. The Vector and Subset use cases are well defined, but I'm finding the IntoIterator case to be a bit challenging.

I think this is an important internal concept to nail down early, so I'm taking my time to do lots of experiments.

Replace stringify! in the `r! {}` macro

The r! {...} macro uses stringify! to convert the token stream into a string, which is then interpreted.
Unfortunately this discards some information, such as whitespace. Ideally, the r! macro would evaluate the code exactly as it is passed to the macro. This matters, for example, when relying on r! {} to write tests for the parser.

I noticed this when implementing the sum() primitive.
Currently, it is impossible to evaluate, for example,

r! {sum(-1)}

as the stringify! macro turns sum(-1) into "sum (- 1)", which results in a parser error (I will look into fixing this when I have time).
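The loss of whitespace is easy to demonstrate in isolation. This sketch does not use the r! macro at all; it only shows that `stringify!` re-prints tokens rather than preserving the source text. The exact spacing of the result is not guaranteed by Rust and has changed between compiler versions, so no specific output is claimed here.

```rust
/// The text `stringify!` produces for a call written with extra whitespace.
fn stringified() -> &'static str {
    stringify!(sum(   1,    2   ))
}

fn main() {
    let source = "sum(   1,    2   )";
    println!("source:      {:?}", source);
    println!("stringified: {:?}", stringified());
    println!("identical:   {}", source == stringified());
}
```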

Add `ls()`

Since it's very unclear what is and isn't implemented, it would be nice to have ls() at least to help users discover what is already available.

This won't be trivial, as primitives are currently trait objects. Ideally, primitives would be automatically registered to some hashmap, which probably means doing some build-time constant construction, modifying the derive macros for Primitives and updating the primitive calling.
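A minimal sketch of what a registry could look like, assuming a hypothetical `Primitive` trait and registering by hand rather than at build time or via derive macros. A `BTreeMap` is used instead of a hash map here only so that `ls()` comes back sorted.

```rust
use std::collections::BTreeMap;

/// Hypothetical primitive interface; the real trait in this repo differs.
trait Primitive {
    fn call(&self, args: &[f64]) -> f64;
}

struct Sum;
struct Length;

impl Primitive for Sum {
    fn call(&self, args: &[f64]) -> f64 {
        args.iter().sum()
    }
}

impl Primitive for Length {
    fn call(&self, args: &[f64]) -> f64 {
        args.len() as f64
    }
}

type Registry = BTreeMap<&'static str, Box<dyn Primitive>>;

/// Manually registered primitives, keyed by the name users would call.
fn registry() -> Registry {
    let mut reg: Registry = BTreeMap::new();
    reg.insert("sum", Box::new(Sum));
    reg.insert("length", Box::new(Length));
    reg
}

/// What `ls()` could return: the names of everything registered.
fn ls(reg: &Registry) -> Vec<&'static str> {
    reg.keys().copied().collect()
}

fn main() {
    let reg = registry();
    println!("{:?}", ls(&reg));
    println!("{}", reg.get("sum").unwrap().call(&[1.0, 2.0, 3.0]));
}
```

With a registry like this, primitive dispatch and `ls()` would share a single source of truth, which is the main appeal over listing names separately.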

Implicit casting of vector types on assignment

One thing I did not anticipate was the implicit casting of R's internal types upon assignment. For example, this is R's behavior:

x <- c(1, 2, 3, 4, 5)
is.numeric(x)
# TRUE

x[[3]] <- "3"
x
# [1] "1" "2" "3" "4" "5"

The current design does not handle such cases and would require a small rework of where mutability is handled. I've got a few different ideas for how things would have to change, but before investing any time in figuring that out, I want to decide whether it's even desirable.
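To show what supporting this would involve, here is a sketch of assignment that may change the vector's type. The `Vector` and `Scalar` enums and the `assign` function are hypothetical, cover only two types, and use 0-based indices; the point is that assignment has to be able to replace the whole vector, not just mutate one element.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Vector {
    Numeric(Vec<f64>),
    Character(Vec<String>),
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Numeric(f64),
    Character(String),
}

/// Assign `value` at a 0-based `index`, promoting the vector when needed.
fn assign(vector: Vector, index: usize, value: Scalar) -> Vector {
    match (vector, value) {
        (Vector::Numeric(mut v), Scalar::Numeric(x)) => {
            v[index] = x;
            Vector::Numeric(v)
        }
        // Assigning a string into a numeric vector converts every element.
        (Vector::Numeric(v), Scalar::Character(s)) => {
            let mut out: Vec<String> = v.iter().map(|n| n.to_string()).collect();
            out[index] = s;
            Vector::Character(out)
        }
        (Vector::Character(mut v), Scalar::Character(s)) => {
            v[index] = s;
            Vector::Character(v)
        }
        (Vector::Character(mut v), Scalar::Numeric(x)) => {
            v[index] = x.to_string();
            Vector::Character(v)
        }
    }
}

fn main() {
    let x = Vector::Numeric(vec![1.0, 2.0, 3.0, 4.0, 5.0]);
    let x = assign(x, 2, Scalar::Character("3".to_string()));
    println!("{:?}", x);
}
```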

Personally, this feels like more of a footgun than a terribly useful feature. For now I think I'll leave it as is.

Possible connection to ExtendR

This is a very fascinating project. I'd like to tell you about our project called ExtendR.
It is "extendr - A safe and user friendly R extension interface using Rust."

This project and our project are loosely connected through Rust, but they aren't necessarily... the same thing.
But we have similar interests, problems, and concerns. Therefore I'd like to invite you to
communicate with us! Our GitHub page is where all the formal discussions and decisions are made, with all the project members.
We also have a fairly active Discord server, where only a small portion of the contributors are, but we have "users",
and basically we chat about every little thing, in a more fluid space there.
You're invited to our Discord as well.
Use this link to join our Discord: deprecated

Ellipsis collection into a variable

In R, ellipsis arguments are often collected into a list of values using function(...) { list(...) }, but if we are to add #47, then we don't really have a good way of collecting "the rest". Instead, we should have some way of collecting ellipsis contents into a variable.

f <- function(a, b, ..rest) {
}

This would assign a new variable rest which contains the contents of list(...).
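On the implementation side, binding a call's arguments could look something like the sketch below. The `bind_args` function is hypothetical and handles positional arguments only: declared parameters are filled in order and whatever is left over becomes the contents of `rest`.

```rust
/// Bind positional `args` to `params`; leftovers are collected as "the rest".
fn bind_args(params: &[&str], mut args: Vec<f64>) -> (Vec<(String, f64)>, Vec<f64>) {
    // Split off everything beyond the declared parameters.
    let n = params.len().min(args.len());
    let rest = args.split_off(n);

    // Pair each declared parameter with its positional argument.
    let named = params
        .iter()
        .zip(args)
        .map(|(name, value)| (name.to_string(), value))
        .collect();

    (named, rest)
}

fn main() {
    // Roughly `f <- function(a, b, ..rest) {}` called as `f(1, 2, 3, 4)`.
    let (named, rest) = bind_args(&["a", "b"], vec![1.0, 2.0, 3.0, 4.0]);
    println!("named: {:?}", named);
    println!("rest:  {:?}", rest);
}
```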

Together with #47, this would mean that we could also do something like:

list(head, ..tail) <- list(1, 2, 3, 4, 5, 6, 7)

Keyword localization

I'll preface this by saying I'm a native English speaker and have never lived through the experience of trying to program in a language that assumes anything but English fluency. I'm certainly not the right person to plan such a feature, but I want to float the idea in case any non-native English speakers want to weigh in on opportunities for improvement on English-first languages and how they might be addressed.

The history of localized languages seems to indicate that there is at least some interest for this, and I'm very interested in experiments for a more native implementation. I'd like to explore translations early so that they're a central feature of the language.

Although I think it's important that there is an assumed language for the portability of code, there could certainly be a way to opt-in to swap out the grammar's keywords and condition message translations. The idea is that a language might have alternative language modes (r --lang=es) that would allow for localized keywords:

#!/usr/bin/env r        # #!/usr/bin/env r --lang=es
f <- function(...) {    # f <- funcion(...) {
  if (TRUE) {           #   si (CIERTO) {
  } else {              #   } sino {
  }                     #   }
}                       # }

(apologies to any Spanish speakers if this isn't a sensible translation!)
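One way this might work internally is for each language mode to map its own words onto the same canonical keyword tokens, so that everything after lexing stays language-agnostic. This is a sketch with hypothetical `Lang` and `Keyword` types, reusing the translations from the example above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Keyword {
    Function,
    If,
    Else,
    True,
}

#[derive(Debug, Clone, Copy)]
enum Lang {
    En,
    Es,
}

/// Look up `word` as a keyword in the given language mode.
fn keyword(lang: Lang, word: &str) -> Option<Keyword> {
    match (lang, word) {
        (Lang::En, "function") | (Lang::Es, "funcion") => Some(Keyword::Function),
        (Lang::En, "if") | (Lang::Es, "si") => Some(Keyword::If),
        (Lang::En, "else") | (Lang::Es, "sino") => Some(Keyword::Else),
        (Lang::En, "TRUE") | (Lang::Es, "CIERTO") => Some(Keyword::True),
        // Anything else is an ordinary identifier in this mode.
        _ => None,
    }
}

fn main() {
    println!("{:?}", keyword(Lang::Es, "si"));
    println!("{:?}", keyword(Lang::En, "si"));
}
```

A side effect worth noting: in this sketch `si` is a plain identifier in the English mode, so the same source file can parse differently depending on the mode.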

Binary releases?

Can you upload binaries to the GitHub release so we can try this out immediately?
I have done such CI setup on other repositories and may be able to help.

Compile to wasm and host web demo

Explore compiling for web and possibly hosting a small browser demo.

Under-informed ideas, looking for input:

  • Not sure how much overhead it's going to take to build a tiny page, but I might split this out into a separate repo if it feels like it's going to clutter things too much.
  • Perhaps feature-gate wasm binding generation? Not sure if it makes more sense to build bindings into the core crate or to repackage just the repl with bindings as a separate project.

Update continuation prompt color

Currently, the continuation prompt (the repl prompt for expressions that span multiple lines) is a bright blue :, but it would be preferred if it was a bit more subtle. Preferably the same color as the first-line prompt.

Following changes to reedline, we can now customize the color of this prompt. First and foremost, adding this functionality will require bumping the version of reedline, which will require updating some code to accommodate reedline's move from using crossterm::style::Color to reedline::Color (a reexport, as I think they're preparing to remove the crossterm::style::Color dependency).

Like the other prompt color settings, this one is configured by implementing a Prompt trait method, in the same way as the existing colors:

R/src/r_repl/prompt.rs

Lines 56 to 60 in a902965

/// Get the default prompt color
fn get_prompt_color(&self) -> Color {
    Color::White
}

A new method will need to be added to determine the continuation prompt:

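pub trait Prompt: Send {
    // Other Prompt methods elided.
    fn get_prompt_multiline_color(&self) -> nu_ansi_term::Color {
        // Assumed default, matching the bright blue described above.
        nu_ansi_term::Color::LightBlue
    }
}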

Note that unlike the other get_*_color methods in the Prompt trait, which return a crossterm::style::Color, this method returns a nu_ansi_term::Color. This is just an artifact of reedline being mid-transition away from crossterm::style::Color.

Ellipsis expansion into arguments

The inverse of #48 is expanding into ellipsis.

I think this should use the same syntax as #48

args <- list(sep = "-", collapse = "; ")
paste("Hello", c("World", "Friends"), ..args)

I don't think there are situations where it's ambiguous. ..rest for collecting arguments will only happen when destructuring or in a function signature, while ..rest for passing arguments only happens in calls. This might introduce some confusion in situations where the ..rest argument in a function signature is reused in the default value of another parameter:

f <- function(a, b = sum(..rest), ..rest) {
}

In this case, the ..rest at the end of the function signature collects args, while the ..rest in sum(..rest) passes those arguments to sum.
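A sketch of how expansion in a call might be implemented: spread arguments are flattened into the argument list in order, and, following the README's note that the last instance of a duplicated name takes priority, a later named argument overwrites an earlier one. All types and helper names here are hypothetical, and values are plain strings to keep it short.

```rust
/// An argument as (optional name, value); values are strings for brevity.
type Arg = (Option<String>, String);

enum CallArg {
    Single(Arg),
    /// A `..args` expansion of an already evaluated list.
    Spread(Vec<Arg>),
}

fn positional(value: &str) -> Arg {
    (None, value.to_string())
}

fn named(name: &str, value: &str) -> Arg {
    (Some(name.to_string()), value.to_string())
}

/// Flatten call arguments; a repeated name replaces the earlier value.
fn expand(args: Vec<CallArg>) -> Vec<Arg> {
    let mut out: Vec<Arg> = Vec::new();
    for arg in args {
        let items = match arg {
            CallArg::Single(item) => vec![item],
            CallArg::Spread(items) => items,
        };
        for (name, value) in items {
            // Find an earlier argument with the same name, if this one is named.
            let existing = name
                .as_ref()
                .and_then(|n| out.iter().position(|(m, _)| m.as_ref() == Some(n)));
            match existing {
                Some(i) => out[i].1 = value,
                None => out.push((name, value)),
            }
        }
    }
    out
}

fn main() {
    // Roughly `paste("Hello", ..args, ..more_args)`.
    let args = vec![named("sep", "-"), named("collapse", "; ")];
    let more_args = vec![named("sep", "+")];
    let call = vec![
        CallArg::Single(positional("Hello")),
        CallArg::Spread(args),
        CallArg::Spread(more_args),
    ];
    println!("{:?}", expand(call));
}
```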

Suggestion: syntax for non-standard string literals

One could define a function to operate on a single string like so:

`g"` <- function(x) {
  glue::glue(x, .envir = parent.frame())
}

This would basically give you something very similar to Python's f-strings:

g"1 + 1 = {1 + 1}"
#> [1] "1 + 1 = 2"

Obviously, the {glue} package isn't available in this version of R, but you get the idea 😄

This idea comes from Julia, which lets you define non-standard string literals in the same way: https://docs.julialang.org/en/v1/manual/metaprogramming/#meta-non-standard-string-literals.
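On the parsing side, one possible reading of this suggestion is that a prefixed literal desugars into a call to the function named by the prefix plus a quote. The sketch below is purely illustrative: `parse_prefixed_string` is a hypothetical helper, it handles a single literal with no escape sequences, and it is not based on this project's grammar.

```rust
/// Split `prefix"body"` into the function name (prefix plus quote) and the body.
/// Returns None for plain strings or anything that is not a prefixed literal.
fn parse_prefixed_string(src: &str) -> Option<(String, String)> {
    let quote = src.find('"')?;
    let (prefix, rest) = src.split_at(quote);

    // The prefix must look like an identifier.
    let is_ident = !prefix.is_empty()
        && prefix
            .chars()
            .all(|c| c.is_alphanumeric() || c == '_' || c == '.');
    if !is_ident {
        return None;
    }

    // `rest` starts at the opening quote and must end with a closing quote.
    let body = rest.strip_prefix('"')?.strip_suffix('"')?;
    Some((format!("{}\"", prefix), body.to_string()))
}

fn main() {
    // Would correspond to calling the function named `g"` on the string body.
    println!("{:?}", parse_prefixed_string("g\"1 + 1 = {1 + 1}\""));
    println!("{:?}", parse_prefixed_string("\"plain\""));
}
```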

true, false are displayed as "true", "false" instead of TRUE, FALSE

I tried my best to locate the bug, but could not :(.

To reproduce:

$ cargo run
> true
[1] true
> false
[1] false
> c(true, false)
[1]  true false
> c(true, false, "a")
[1]  "true" "false"     "a"
> TRUE
[1] true
> FALSE
[1] false
> c(TRUE, FALSE, "a")
[1]  "true" "false"     "a"
> 

Let's build stack traces!

We currently panic and abort without any traceback of where the error originated. Instead, we should have a call stack, which we can then use to start building sys_calls and calling frames.

I'd like error messages to be reported with a nice call stack, so let's see if we can get that working.
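As a starting point for discussion, here is a minimal sketch of a call stack that records each call as a frame and attaches a traceback to errors instead of panicking. The `CallStack`, `Frame` and `RError` types are hypothetical and much simpler than what sys_calls and calling frames would eventually need.

```rust
#[derive(Debug)]
struct Frame {
    /// The call as written, e.g. "f()".
    call: String,
}

#[derive(Debug)]
struct RError {
    message: String,
    /// Calls active when the error was raised, outermost first.
    traceback: Vec<String>,
}

struct CallStack {
    frames: Vec<Frame>,
}

impl CallStack {
    /// Run `body` with a new frame pushed, popping it again afterwards.
    fn with_frame<T>(
        &mut self,
        call: &str,
        body: impl FnOnce(&mut CallStack) -> Result<T, RError>,
    ) -> Result<T, RError> {
        self.frames.push(Frame { call: call.to_string() });
        let result = body(self);
        self.frames.pop();
        result
    }

    /// Build an error that captures the current stack.
    fn error(&self, message: &str) -> RError {
        RError {
            message: message.to_string(),
            traceback: self.frames.iter().map(|f| f.call.clone()).collect(),
        }
    }
}

fn main() {
    let mut stack = CallStack { frames: Vec::new() };
    let result = stack.with_frame("f()", |s| {
        s.with_frame("g(x)", |s| Err::<(), RError>(s.error("object 'x' not found")))
    });

    if let Err(err) = result {
        println!("Error: {}", err.message);
        for (depth, call) in err.traceback.iter().enumerate() {
            println!("{}: {}", depth + 1, call);
        }
    }
}
```

Capturing the traceback when the error is created means the frames can be popped normally while the error propagates upward.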
