Giter Club home page Giter Club logo

tokyo-rust-meetup-2023-12-12's Introduction

Presentation Notes

This presentation exists to challenge you, to have you experiment.

Although I hope that what I present will be useful, but more importantly, will have you question how your code should or could be written in the implementation, while maintaining a clean API.

If there is one word you should consider mantra for the following, it is:

  • Locality

As a general rule, Rust will give you more expressivity when you make your reasoning more local.

#![no_implicit_prelude]

When you want to either make your code more explicit, or want to better understand the structure of the standard library, consider using :

#![no_implicit_prelude]

This will do 2 major things. First is that std::prelude::* will it no longer implicitly be imported into your project in all files (with #![no_std] it would be core::prelude::*).

Second is that Cargo will no longer implicitly add extern crate with each dependency in your Cargo.toml manifest

There are 3 major crates that you can already use with the complier. If you use them all it looks like this :

#![no_implicit_prelude]
extern crate core;  // fundamentals, lang items, intrinsics
extern crate alloc; // fundamental data structures that require a heap
extern crate std;   // re-exports `core` and `alloc`, 
                    //   but also extends them with OS specific things

One might be surprised by how much can be done with just core and alloc.

You then need to explicitly import anything you need. Give it a try! you'll get to see funny things like alloc::alloc::alloc.

Labels

LABELS 1

Labels were initially introduced for nested loops

Using the label we know explicitly where the value breaks to :

let [mut i, mut j] = [5, 0];
let _loop_output = 'loop_label: loop {
  [i,j] = [i+j+1,j+1];
  if i > 20 {
    break 'loop_label j;
  }
  '_inner: loop {
    if i == 0 {
      continue 'loop_label;
    }
    i -= 1
  }
};

LABELS 2

Labels could also be used for other loops, I recommend naming the labels on what that are computing.

let mut i = 0..15;
'find_10: while let Some(val) = i.next() {
  if val == 10 {
    break 'find_10;
  }
}
'find_10: for each in 0..15 {
  if each == 10 {
    break 'find_10;
  }
}

LABELS 3

Labels have also been added to blocks recently, This is very powerful for "happy path programming".

let other = 5;
let ascii = b"martians";
let _out = 'check: {
  let [b'm', b'a', b'r', rest @ ..] = ascii else { break 'check None };
  let Some(thing) = rest.get(2) else { break 'check None };
  Some(*thing + other)
};

LABELS 4

We can factor out the incrementally with move closures.

let check = move |ascii:&[u8]| 'check: {
  let [b'm', b'a', b'r', rest @ ..] = ascii else { break 'check None };
  let Some(thing) = rest.get(2) else { break 'check None };
  Some(*thing + other)
};
let _out = check(ascii);

LABELS 5

We can change break to return

let check = move |ascii:&[u8]| {
  let [b'm', b'a', b'r', rest @ ..] = ascii else { return None };
  let Some(thing) = rest.get(2) else { return None };
  Some(*thing + other)
};
let _out = check(ascii);

LABELS 6

it's now posible to turn into a function, and we learn that we need to pass in other

fn check_(ascii:&[u8], other: u8)-> Option<u8> {
  let [b'm', b'a', b'r', rest @ ..] = ascii else { return None };
  let Some(thing) = rest.get(2) else { return None };
  Some(*thing + other)
}
let _out = check_(ascii, other);

It's now possible to extract the function.

Labels Conclusion

  • Labels give access to powerful control flow.
    • consider the use of labels for fallible operations instead of making a fancy error type.

Local Declarative Macros

MACRO 1

let's analyze this toy program.

let mut vector = alloc::vec::Vec::new();
let mut program_counter = 1;

'till_ten : loop{
  if vector.len() > 10 { break 'till_ten vector }
  program_counter = match program_counter {
    1 => {vector.push(1); /* more code*/ ; 2}
    2 => {/* more code*/ ; 3}
    _ => { std::println!("now loop"); vector.push(1); /* more code*/ ; 2}
  }
}

MACRO 2

We see there is some repetition so we try to abstract out the the repetition.

let mut vector = alloc::vec::Vec::new();
let p1 = || {vector.push(1); /* more code*/ ; 2};

let mut program_counter = 1;

'till_ten : loop{
  if vector.len()>10 { break 'till_ten vector }
  program_counter = match program_counter {
    1 => p1(),
    2 => {/* more code*/ ; 3}
    _ => {std::println!("now loop"); p1()}
  }
}

We will find that we can't abstract this out with a closure because of the capture of a &mut alloc::vec::Vec

MACRO 3

Interestingly, We could abstract this out with a declarative macro.

let mut vector = alloc::vec::Vec::new();
let mut program_counter = 1;

#[cfg_attr(rustfmt, rustfmt_skip)]
macro_rules! p1 {() => {  {vector.push(1); /* more code*/ ; 2}  };}

'till_ten : loop{
  if vector.len() > 10 { break 'till_ten vector }
  program_counter = match program_counter {
    1 => p1!(),
    2 => {/* more code*/ ; 3}
    _ => { std::println!("now loop"); p1!()}
  }
}

This might seem trivial, but note how we have worked around the type system.

A more hairy example

MACRO 4

Let's formulate a more hairy example.

pub async fn use_mut() -> Result<(), ()> {
  const MAX_OPS: usize = 10;
  let mut num = Wrapped { val: 0_i32 };
  let operation_count = &mut 0_usize;

  let scope = 'operate: {
    #[cfg_attr(rustfmt, rustfmt_skip)]

    macro_rules! op {($($PUNCT:tt $NUM:literal)*) => {$(

      if *operation_count > MAX_OPS { 
        break 'operate *num.inspect() 
      }
      *operation_count+=1;
      num = num.horror({ 
        |x: &mut i32| { *x $PUNCT $NUM; Ok(()) }  
      }).await?;

    )*};}

    op! {+=2 -=1 *=7 +=1 /=2 +=4 +=1 +=5}
    *num.inspect()
  };

  std::println!("{}", scope);
  Ok(())
}

MACRO 5

Let's formulate a more hairy example. Lets break it down.

first we have the elements before the macro in lexical scope

pub async fn use_mut() -> Result<(), ()> {
  const MAX_OPS: usize = 10;
  let mut num = Wrapped { val: 0_i32 };
  let operation_count = &mut 0_usize;

  let scope = 'operate: {
    /*
      note how op can refer to earlier symbols including :
      `operation_count`, `MAX_OPS`, `num`, and `'operate`
    
      we can shortcut localy using our `break 'operate` or
      return early on errors with `?`, and `.await` futures.
    */
  };

  std::println!("{}", scope);
  Ok(())
}

MACRO 6

Lets now look at the macro and it's use.

Try to imagine the function signature you would need, and how many arguments you would need to pass...

macro_rules! op {($($PUNCT:tt $NUM:literal)*) => {$(

  if *operation_count > MAX_OPS { 
    break 'operate *num.inspect() 
  }
  *operation_count+=1;
  num = num.horror({ 
    |x: &mut i32| { *x $PUNCT $NUM; Ok(()) }  
  }).await?;

)*};}

op! {+=2 -=1 *=7 +=1 /=2 +=4 +=1 +=5}
*num.inspect()

If it was a closure, imagine the lifetime issues you could have. And you wouldn't have access to the labels in either case.

Local Macro Conclusions

  • Consider using macros as locally as possible.
    • lexical scope is powerful, it can reduces overall complexity. The deeper the macro, the more lexical scope you have access to.
    • It's okay if a macro is only ever called once.

First steps into unsafe code

UNSAFE 1

Unsafe code has many forms, the the one that is most relevant has to do with memory allocation.

To explore this lets make the most basic data structure. A user defined box.

pub(crate) struct MyBox<T> {
  /// the field is private to ensure control of the allocation.
  ptr: *mut T,
}

This type has one job, own a value on the heap. It holds a pointer to that value.

UNSAFE 2

Let's consider the signatures and behavior of functions we want.

  • pub fn new(val: T) -> Self
    • allocates space and move val : T into the allocation
  • pub fn unbox(self: Self) -> T
    • consume self and it's allocation, moving the inner T out

We will want to guarantee that when do not unbox it deallocates safely, for this we need an implementation of core::ops::Drop

We also want to make accessing the inner T, cheap. For that we will have core::ops::Deref, and core::ops::DerefMut.

UNSAFE 3

We start by implementing core::ops::Drop. The idea is that it should safely drop, even before it can even be made.

use alloc::alloc::Layout;
impl<T> MyBox<T> {
  /// Size of the struct in bytes and it's aligment
  const LAYOUT: Layout = Layout::new::<T>();
  /*
    ... other associated constants and methods
  */
}

// The core::ops::Drop::drop gets called first,
//   then each field of the struct will be dropped after.
impl<T> core::ops::Drop for MyBox<T> {
  // Note how we have exclusive access with &mut
  fn drop(self: &mut Self) {
    unsafe {
      self.ptr.drop_in_place();
      // SAFETY: The `self.ptr` does not have a 
      //         drop implementation, so we need to 
      //         drop what it points to it ourselves.
      alloc::alloc::dealloc(
        self.ptr as *mut u8, 
        Self::LAYOUT,
      );
    }
  }
}

UNSAFE 4

We can now allocate with new. knowing it will get dropped.

pub fn new(val: T) -> Self {
  unsafe {
    let ptr = alloc::alloc::alloc(Self::LAYOUT) as *mut T;
    ptr.write(val);
    MyBox { ptr }
  }
}

UNSAFE 5

We might feel confident about our code, but we should do some sanity checks. This can be done by running tests with the miri tool.

You can get details on installation here : https://github.com/rust-lang/miri

If you have the nightly toolchain with rustup, you need only run this.

rustup +nightly component add miri

With miri installed, we can now make tests with miri, to check for undefined behavior.

UNSAFE 6

We can first start with something we know will blow up. Let's dereference a null pointer!

  #[test]
  #[cfg(miri)]
  fn obviously_fails() {
    let null = core::ptr::null();
    let _x: i32 = unsafe { *null };
  }

run the tests

cargo miri test

And it fails!

test basic_box::obviously_fails ... error: Undefined Behavior: memory access failed: null pointer is a dangling pointer (it has no provenance)

UNSAFE 7

We can now try with our functions.

#[test]
#[cfg(miri)]
fn constructor_destructor() {
  let x = MyBox::<i32>::new(5); // constructor
  core::mem::drop(x) // destructor
}

and ...

test basic_box::constructor_destructor ... ok

This passes!

UNSAFE 8

But we should try the base case. A zero sized type.

#[test]
#[cfg(miri)]
fn constructor_zst() {
  let x = MyBox::<()>::new(()); // constructor
  core::mem::drop(x) // destructor
}

and ...

test basic_box::constructor_zst ... error: Undefined Behavior: creating allocation with size 0

UNSAFE 9

We could make this work, but instead we are going to just define it out of existence.

use alloc::alloc::Layout;
impl<T> MyBox<T> {
  /// Size of the struct in bytes and it's aligment
  const LAYOUT: Layout = Layout::new::<T>();
  /// We don't support zero sized types
  const NOT_ZST: () = core::assert!(Self::LAYOUT.size() != 0);
  pub fn new(val: T) -> Self {
    // since all values must be created with 
    // `new` this constant will always be 
    // checked at compile time
    let _ = Self::NOT_ZST;
    unsafe {
      let ptr = alloc::alloc::alloc(Self::LAYOUT) as *mut T;
      ptr.write(val);
      MyBox { ptr }
    }
  }
  /*
    ... other associated methods
  */
}

UNSAFE 10

We can now ignore the test, as it's unreachable

#[test]
#[cfg_attr(miri, ignore)]
#[cfg(miri)]
fn constructor_zst() { /* ... */ }

UNSAFE 11

We can now work on unbox.

Since we deallocate the ptr, we don't want to drop Self, as it would "double free" by calling alloc::alloc::dealloc again. core::mem::ManuallyDrop stops the drop from happening

impl<T> MyBox<T> {
  const LAYOUT: alloc::alloc::Layout = /* ... */;
  const NOT_ZST: () = /* ... */;
  pub fn new(val: T) -> Self { /* ... */ }

  /// the standard library box is "magic" 
  /// when using `*` deref,
  /// we need this to pull out the value.
  pub fn unbox(self: Self) -> T {
    unsafe {
      let out = self.ptr.read();
      let this = core::mem::ManuallyDrop::new(self);
      // SAFETY: We have and owned `Self`, 
      //         the `self.ptr` is not shared.
      alloc::alloc::dealloc(
        this.ptr as *mut u8, 
        Self::LAYOUT,
      );

      out
    }
  }
}

UNSAFE 12

We add another miri tests.

#[test]
#[cfg(miri)]
fn box_unbox() {
  let x = MyBox::<i32>::new(5); // box
  x.unbox(); // unbox
}

It passes.

UNSAFE 13

Now we just need to do the same with core::ops::Deref and core::ops::DerefMut.

impl<T> core::ops::Deref for MyBox<T> {
  type Target = T;
  fn deref(self: &Self) -> &Self::Target {
    unsafe { &*self.ptr }
  }
}
impl<T> core::ops::DerefMut for MyBox<T> {
  fn deref_mut(self: &mut Self) -> &mut Self::Target {
    unsafe { &mut *self.ptr }
  }
}

UNSAFE 14

And their tests.

#[test]
#[cfg(miri)]
fn deref() {
  let x = MyBox::<i32>::new(5);
  assert!(*x == 5);
}
#[test]
#[cfg(miri)]
fn deref_mut() {
  let mut x = MyBox::<i32>::new(5);
  *x = 25;
  assert!(*x == 25);
}

And we are done!

Unsafe Code conclusions

  • Unsafe code can be reasoned about.
    • most Rust programmers do not need to write any.
    • But all Rust developers should experiment with it.
    • Use miri!

Final thoughts

  • Make use of code patterns that are effective locally.

  • Consider the cost of encoding everything into the type system.

    • If one is not careful the cost can be very high.
    • The added complexity can be pernicious.

BONUS ROUND 1

SEXPR 1

Here is an example of a function that transforms a sexpr(prefix notation) into stack based bytecode (postfix). The arity is encodes as a number.

Note how the label 'dfs says what process the loop is doing (depth first search).

#[derive(Debug, PartialEq)]
enum Atom { Plus, Mul, Num(i32) }

#[derive(PartialEq)]
enum Sexpr { Atom(Atom), List(Vec<Sexpr>) }

fn sexpr_to_postfix(sexpr: Sexpr) -> Result<Vec<Atom>, ()> {
  let mut postfix = Vec::new();
  let mut stack = Vec::new();
  stack.push(sexpr);
  'dfs: loop {
    match stack.pop() {
      None => break 'dfs Result::Ok(postfix),
      Some(val) => match val {
        Sexpr::Atom(atom) => postfix.push(atom),
        Sexpr::List(mut list) => {
          if let Some(Sexpr::Atom(Atom::Num(_)) | Sexpr::List(_)) = list.get(0) {
            break 'dfs Result::Err(());
          }

          let arity = (list.len() - 1) as i32;
          list.push(Sexpr::Atom(Atom::Num(arity)));
          let op = list.swap_remove(0);

          stack.push(op);
          stack.extend(list);
        }
      }
    }
  }
}

SEXPR 2

We expect this transformation :

(+ 2 4 (* 3 7) 4) => [4 7 3 2 * 4 2 4 +]

First some helper definitions

#[test]
#[cfg(test)]
fn use_sexpr_to_postfix() {
  // some helper definitions
  const PLUS: Sexpr = Sexpr::Atom(Atom::Plus);
  const MUL: Sexpr = Sexpr::Atom(Atom::Mul);
  const NUM: fn(i32) -> Sexpr = |n| Sexpr::Atom(Atom::Num(n));

  /// local declarative macro, talk about it later.
  #[cfg_attr(rustfmt, rustfmt_skip)]
  macro_rules! S {($($EXPR:expr)*) => { 
    Sexpr::List(alloc::vec![$($EXPR),*])  };
  }

  type A = Atom;

  /*
    the test code
  */
}

SEXPR 3

Now the tests

#[test]
#[cfg(test)]
fn use_sexpr_to_postfix() {
  /*
    the helper definitions
  */

  #[cfg_attr(rustfmt, rustfmt_skip)]
  let sexpr = 
    S!(PLUS NUM(2) 
            NUM(4) 
            S!(MUL NUM(3) 
                   NUM(7)) 
            NUM(4));

  #[cfg_attr(rustfmt, rustfmt_skip)]
  let expected = &[
    A::Num(4), 
      A::Num(7), 
      A::Num(3), 
      A::Num(2), A::Mul, 
    A::Num(4), 
    A::Num(2), 
    A::Num(4), A::Plus
  ];

  let Result::Ok(postfix) = sexpr_to_postfix(sexpr) else { core::panic!() };
  core::assert_eq!(&postfix[..], expected)
}

SEXPR 4

Earlier we used a macro.

#[cfg_attr(rustfmt, rustfmt_skip)]
macro_rules! S {($($EXPR:expr)*) => { 
  Sexpr::List(alloc::vec![$($EXPR),*])  };
}

This macro exists to simplify the reading of the Sexpr test. The intent is to make it as similar to a lisp S-expression.

#[cfg_attr(rustfmt, rustfmt_skip)]
let sexpr = 
  S!(PLUS NUM(2)         // (+ 2
          NUM(4)         //    4
          S!(MUL NUM(3)  //    (* 3
                 NUM(7)) //       7)
          NUM(4));       //    4)

SEXPR 5

The use of a macro raises questions. Why not just a function? Won't this complicate the API? Do I have to learn the DSL for this macro? Won't my coworkers complain?

Why not a function?

fn s<const LEN : usize>(list : [Sexpr; LEN]) -> Sexpr {
  let mut v = Vec::new();
  for each in list {
    v.push(each);
  }
  Sexpr::List(v)
}
/*
  ...
*/
#[cfg_attr(rustfmt, rustfmt_skip)]
let sexpr =
  s([PLUS, NUM(2),           // (+ 2
           NUM(4),           //    4
           s([MUL, NUM(3),   //    (* 3
                   NUM(7)]), //       7)
           NUM(4)]);         //    4)

SEXPR 5

Let's remember, we just wanted to have something for a test. To do so we had to define a type signature. If I used &[Sexpr] I would need to use Clone, forcing me to implement Clone. To avoid this I've used const Generics, increasing complexity.

  • What about the API?

    • Simple, it's used one locally. It's not exported, no dependent even has to know about it's existence.
  • But it's a DSL!

    • Yes, but it's also brain-dead simple, thread the arguments to alloc::vec!.
  • Won't my coworkers complain?

    • If they do, and can't be reasoned with, just inline it, and then see what they think.

BONUS ROUND 2 (ASYNC)

ASYNC 1

We start by procuring an async executor using only the standard library.

pub(crate) fn blocks<T>(
  a: impl core::future::Future<Output = T>
) -> T {
  use core::task::*;
  fn make_raw_waker() -> RawWaker {
    RawWaker::new(
      core::ptr::null(), 
      &RawWakerVTable::new(
        |_| make_raw_waker(), 
        |_| (), 
        |_| (), 
        |_| (),
      ),
    )
  }
  let waker = unsafe { 
    core::task::Waker::from_raw(make_raw_waker()) 
  };
  let mut context = Context::from_waker(&waker);
  let mut pinned = core::pin::pin!(a);
  'execute: loop {
    match pinned.as_mut().poll(&mut context) {
      Poll::Ready(out) => break 'execute out,
      Poll::Pending => continue 'execute,
    }
  }
}

Our Slide-ware executor (don't think too hard about it ... ask me about it later).

ASYNC 2

let's now consider this type that gives limited access to it's inner value.

The important thing to note is the hairy type signature of horror, it only takes an exclusive reference to the inner value, meaning if you want to take arguments, you are going to need curry values into a closure. Because it returns a future, lifetimes could get real hairy.

use core::result::Result::{self, *};

struct Wrapped<T> {
  val: T,
}
impl<T> Wrapped<T> {
  pub fn inspect(self: &Self) -> &T {
    &self.val
  }
  pub async fn horror<F>(
    mut self: Self, 
    mut f: F
  ) -> Result<Self, ()>
  where
    F: core::ops::FnMut(&mut T) -> Result<(), ()>,
  {
    /*
      ...
    */
  }
}

ASYNC 3

The horror method impl for the curious uses a zero sized future.

  impl<T> Wrapped<T> {
    pub fn inspect(self: &Self) -> &T {
      &self.val
    }
    pub async fn horror<F>(mut self: Self, mut f: F) -> Result<Self, ()>
    where
      F: core::ops::FnMut(&mut T) -> Result<(), ()>,
    {
      /*
        `Stutter` future implementation  
      */


      Stutter.await;
      f(&mut self.val)?;
      Stutter.await;
      Ok(self)
    }
  }

ASYNC 4

What is Stutter? is a zero sized future! (lets not loiter though ...)

use core::{clone::Clone, sync::atomic, task::*};
/// spuriously yields
struct Stutter;
static WAIT: atomic::AtomicBool = atomic::AtomicBool::new(true);

impl core::future::Future for Stutter {
  type Output = ();
  fn poll(
    self: core::pin::Pin<&mut Self>, 
    cx: &mut Context<'_>,
  ) -> Poll<Self::Output> {

    let wait = WAIT.load(atomic::Ordering::Relaxed);
    WAIT.store(!wait, atomic::Ordering::Relaxed);

    cx.waker().clone().wake();

    if wait {
      Poll::Pending
    } else {
      Poll::Ready(())
    }
  }
}

tokyo-rust-meetup-2023-12-12's People

Contributors

clarkeremy avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.