higherorderco / hvm-core Goto Github PK

View Code? Open in Web Editor NEW

494.0 17.0 34.0 2.71 MB

License: Apache License 2.0

Rust 98.98% JavaScript 0.33% Shell 0.10% Haskell 0.20% Nix 0.39%

hvm-core's People

Contributors

Stargazers

Watchers

hvm-core's Issues

side effects?

with the removal of constructors, and use of GPU, how can HVM be used for general purpose programming?

in HVM1 the a constructor could be implemented by the runtime as a built-in function, which could then do side effects, so without them how are things like syscalls called?

in fact by reducing the network on the GPU you might not be able to do side effects at all since you don't have access to the CPU OS from the GPU.

Discord invitation expired

Hello! It seems like the discord invitation link expired. Is this intentional?

Build failed on the current main branch

With cargo +nightly bench command I get the following error: https://gist.github.com/zamazan4ik/3d5de124d7e2e92dd1c86f04afb3ba7d

HVM core version: main branch, 57570ec6e9a2282c2cf51d7ebd90ce8b20a9a72f commit
Rustc compiler: 1.76.0-nightly
OS: macOS 13.6

`hvmc compile` throws error "No such file or directory"

This is on hvm-core commit 7d8053d.

Note this seems to be related to an output file, because if I do use a non-.hvmc file, it shows a different error:

Cannot compile .hvmc file into a binary when a string is too long

I got this:

error: reached the recursion limit while instantiating `<(for<'a, 'b> fn(&'a mut run::Net<'b, Strict>, ...) {call_Table_at_S83____ce760551ccb52d84::<...>}, ...) as AsDef>::call::<...>`
    |
note: `<(F, G) as AsDef>::call` defined here
   --> src/run.rs:631:3
    |
631 |   unsafe fn call<M: Mode>(slf: *const Def<Self>, net: &mut Net<M>, port: Port) {
    |   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: the full type name has been written to '/Users/martinjaniczek/Localhost/scratch/hvm-crc32/.hvm/target/release/deps/hvmc-946b8138f3fad536.long-type.txt'

error: could not compile `hvm-core` (lib) due to 1 previous error

when compiling my CRC32 file (https://gist.github.com/Janiczek/fa91fadd53d29ef0e1bbcf8e034a519b) with this string:

testString = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678"

(It's 2048 characters long.)

Later I tried to make this smaller program:

str = "12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678"

main = (String.len str)
  
String.len list = (String.len.go list 0)
String.len.go (String.nil) count = count
String.len.go (String.cons x xs) count = (String.len.go xs (+ count 1))

which hvm-lang runs fine, compiles fine, hvm-core runs fine but the compilation gets stuck like this (it's compiling like that for a good few minutes now):

The hvmc file (note -O all was used) contains a lot of duplications it seems:

sscce.hvmc.txt

add support for floats

Depends on #109

I'm making this a separate issue because I think there are more open questions in the design of this.

IMO, floats should have a separate tag at runtime, so that they can be read back correctly. Blurring the edges with the numeric types slightly is fine, IMO, because converting between the representations is easy, but reading back floats as integers is unacceptably opaque IMO.

Luckily, there is room for another tag in the tag enum, so I'd propose we add an F32 tag.

The floating point ops should also be represented as separate ops.

I'm flexible on what the behavior is for using ints in float ops and vice versa; whether that's reinterpreting the bits or casting the value between the types.

"Remove when #73 is merged" hasn't been removed

This line says to remove the associated code when #73 is merged. #73 has been merged, so it might be appropriate to remove it now.

add support for more numeric operations

Add support for u8/i8/u16/i16/u32/i32 numbers and operations (currently, there's just u60, which should be kept).

The relevant code will mostly be in ops.rs and a little in ast.rs.

Number nodes should stay largely as-is; the type of a number will not be explicitly stored in the node; instead, operations will be typed. The one change I would suggest though is switching the representation of a number value from a u64 to an i64, to make sign-extension stuff easy. I don't have any strong opinions on how the Op type is represented, but it needs to be convertible to/from a 16-bit integer. Currently, it's just a simple enum; I'd probably suggest making it a #[repr(transparent)] struct(Type, OpKind), with two separate enums that each fit in a byte, but I'm flexible.

The numbers should be represented such that they can sensibly be implicitly casted between; i.e. 1 should have the same representation in all the integer types, and -1 should have the same representation in all signed integer types (and be uN::MAX in all the unsigned types).

For the syntax, I'd suggest type.op (e.g. u8.-), with op being a shorthand for u60.op (for back-compact). Also, IMO support for negative numbers would be good in the syntax, so #-1, and I think numbers should be printed as signed (so #-1 is printed as that instead of #1152921504606846975; makes the u60 readback weird, but that's fine IMO).

ROCm/HIP support

Great project. I want to help with ROCm/HIP if I have some time. Is PR welcomed?

OOM in a program that shouldn't grow

From Discord.
I would think this is a tail-recursive program that can run with any N, just growing the CPU time but not any memory usage:

Main n = (Sum (PowOfTwo n))

PowOfTwo n = (PowOfTwoHelp 1 n)
PowOfTwoHelp acc 0 = acc
PowOfTwoHelp acc e = (* 2 (PowOfTwoHelp acc (- e 1)))

Sum n = (SumHelp 0 n)
SumHelp acc 0 = acc
SumHelp acc n = (SumHelp (+ acc 1) (- n 1))

But it fails around 2^28

$ hvml run -m 32G -O all -s count-sequential.hvml 28

with

OOM
stack backtrace:
   0: rust_begin_unwind
   1: core::panicking::panic_fmt
   2: core::option::expect_failed
   3: hvmc::run::allocator::Allocator::alloc
   4: <hvmc::run::def::InterpretedDef as hvmc::run::def::AsDef>::call

The compiled routines dilemma. When should we implement it?

Consider the following programs:

// sum_ctr.hvm1

(Sum (Leaf x)) = x
(Sum (Both a b)) = (+ (Sum a) (Sum b))

(Gen 0 x) = (Leaf x)
(Gen n x) = (Both (Gen (- n 1) (* x 2)) (Gen (- n 1) (+ 1 (* x 2))))

Main = (% (Sum (Gen 22 0)) (<< 1 24))

// sum_lam.hvm1

(Match 0 case_zero case_succ) = case_zero
(Match n case_zero case_succ) = (case_succ (- n 1))

Leaf = λx    λleaf λnode (leaf x)
Node = λa λb λleaf λnode (node a b)

Sum = λtree
  let case_leaf = λx x
  let case_node = λa λb (+ ((Sum) a) ((Sum) b))
  (tree case_leaf case_node)

Gen = λn
  let case_zero =    λx ((Leaf) x)
  let case_succ = λm λx ((Node) ((Gen) m (* x 2)) ((Gen) m (+ (* x 2) 1)))
  (Match n case_zero case_succ)

Main = (% ((Sum) ((Gen) 22 0)) (<< 1 24))

// sum_lam.hvm2

leaf = λx    λleaf λnode (leaf x)
node = λa λb λleaf λnode (node a b)

sum = λtree
  let case_leaf = λx x
  let case_node = λa λb (+ (sum a) (sum b))
  (tree case_leaf case_node)

gen = λn
  let case_zero =    λx (leaf x)
  let case_succ = λm λx (node (gen m (* x 2)) (gen m (+ (* x 2) 1)))
  match n { 0: case_zero; 1+m: (case_succ m) }

Main = (sum (gen 22 0))

We implement a micro-benchmark that allocates and sums all elements of a perfect binary tree, in 3 different ways:

1. Using constructors + equations on HVM1
1. Using definitions + lambdas on HVM1
1. Using definitions + lambdas on HVM2

Benchmarking both cases on Apple M1, we have:

v@v ~/vic/dev/hvm-test$ time hvm run -f sum_lam.hvm -t 1 "Main"
14680064
hvm run -f sum_lam.hvm -t 1 "Main"  3.69s user 0.61s system 99% cpu 4.309 total

v@v ~/vic/dev/hvm-test/sum_lam$ time ./target/release/sum_lam run -t 1 "Main"
14680064
./target/release/sum_lam run -t 1 "Main"  2.43s user 0.57s system 99% cpu 3.007 total

v@v ~/vic/dev/hvm-test$ time hvm run -f sum_ctr.hvm -t 1 "Main"
14680064
hvm run -f sum_ctr.hvm -t 1 "Main"  1.22s user 0.37s system 99% cpu 1.596 total

v@v ~/vic/dev/hvm-test/sum_ctr$ time ./target/release/sum_ctr run -t 1 "Main"
14680064
./target/release/sum_ctr run -t 1 "Main"  0.25s user 0.19s system 98% cpu 0.447 total

v@v ~/vic/dev/hvm-test$ hvm-lang compile sum_lam.hvm2 >> sum_lam.hvmc; time hvmc run sum_lam.hvmc
#14680064
hvmc run sum_lam.hvmc  1.23s user 0.29s system 99% cpu 1.524 total

So, we can make the following ranking:

- HVM1-COMP, sum-ctr: 0.447 s
- HVM2-INTR, sum-lam: 1.524 s
- HVM1-INTR, sum-ctr: 1.596 s
- HVM1-COMP, sum-lam: 3.007 s
- HVM1-INTR, sum-lam: 4.309 s

So, as we can see, on the identical sum-lam program, HVM2 (interpreted) outperforms HVM1 (interpreted and compiled) by 2x-3x, all in a single core computations. Since the current CUDA implementation is ~35x faster than the single core implementation, we can expect HVM2 to complete this task in about 0.43s, i.e., almost 100x faster than HVM1. That is really solid and promising.

Yet, thanks to its equational notation, HVM1 allowed us to dispatch calls by pattern-matching on constructors, which, in turn, allowed us to compile routines. This wasn't explored to its full potential, and already made a huge difference, as can be seen on the compiled sum-ctr case, which outperforms all others, because all the numeric operations are compiled to actual machine code, never allocating nodes (like OP1, OP2, etc.). On its current design, HVM2 does not feature constructors nor pattern-matching, which means compiling isn't possible. While this won't make much difference for highly symbolic computations, for most real-world applications the impact would be huge.

The question is: when should we introduce some form of compiled routines to HVM2? For one it would likely result in massive (2x to 100x?) speedups, depending on the workload. In fact, without compiling routines, HVM2 is kinda sabotaging itself, as it would be restricted to being a massively parallel interpreter. Yet, how do we deal with the huge complexity bump that such a thing would entail? Implementing HVM2 as an interpreter with just interaction nodes and numbers in all the target environments (Rust, CUDA, Metal, Vulkan...), in sync and correctly, is already a colossal tas. As such, sadly, I do not think myself or the current team has the scale to implement a compiled version for HVM2 across all these targets.

When we release HVM2, we want people to be able to use it to run their programs in a massively parallel fashion and get a feeling of what is to come. Yet, if it turns out HVM needs 10 cores to do what GHC does in one, that will not paint a good picture. So, what do we do? There are 3 options that make logical sense:

We implement HVM2 as an interpreter on CPU+GPU, release it, and make clear the first version is interpreter only and, even though it is fast, massive speedups can be expected when we compile it, in the future. Due to lacking compilation, numbers wouldn't be nearly as great as they could, and this greatly impacts the public perception and value of the project.
We implement a compiler and restrict ourselves to a single implementation in Rust. That would make the complexity manageable, but would mean we will not be shipping with GPU support, and GPUs are clearly vastly superior than CPUs for interaction net reduction. Due to lacking GPU support, numbers wouldn't be nearly as great as they could.
We somehow raise significant funds (in the order of 100m or so?) and hire to let us build a fully featured compiler and target the GPU. I don't think that's a likely option.

So, I'm leaving this thread as a testament of the current situation and the reality that we must pick one of 2 for the initial release: compiled version or GPU version; and for my own acceptance that we won't have both for a while, sadly.

tl;dr on identical programs, HVM1 is 2-3 faster than HVM2 in a single core, and up to 100x faster on GPUs, but the fact it doesn't support constructors means we can't implement a compiler, which may leave us with interpreter-only, which may completely overshadow the difference in many cases. Sadly, we don't have the budget to have a compilation+GPU on HVM2's first release, so we must pick one.

hvmc compile on the example program results in Rust compile error

Given:
Running example program from the README using latest master (8a9a009)

@add = (<+ a b> (a b))

@sum = (?<(#1 @sumS) a> a)

@sumS = ({2 a b} c)
  & @add ~ (e (d c))
  & @sum ~ (a d)
  & @sum ~ (b e)

@main = a
  & @sum ~ (#24 a)

I get the following error:

error[E0382]: use of moved value: `def_sum`
  --> src/gen.rs:16:79
   |
11 |   let def_sum = Port::new_ref(&host.defs[r#"sum"#]);
   |       ------- move occurs because `def_sum` has type `port::Port`, which does not implement the `Copy` trait
...
14 |   host.get_mut::<HostedDef<Def_main>>(r#"main"#).data.0 = Def_main { def_sum };
   |                                                                      ------- value moved here
15 |   host.get_mut::<HostedDef<Def_sum>>(r#"sum"#).data.0 = Def_sum { def_sumS };
16 |   host.get_mut::<HostedDef<Def_sumS>>(r#"sumS"#).data.0 = Def_sumS { def_add, def_sum };
   |                                                                               ^^^^^^^ value used here after move
   |
help: consider cloning the value if the performance cost is acceptable
   |
14 |   host.get_mut::<HostedDef<Def_main>>(r#"main"#).data.0 = Def_main { def_sum: def_sum.clone() };
   |

Applying the suggested fix resolves the issue and program works correctly

snapshot of branches

[I'm deleting old branches; saving links here for posterity]

26c48fb chore/sc-484/optimize-pre-reduce-pass
d755c94 direct_deref_experiment
a7bf4c1 feature/sc-173/add-n-ary-ctrs-to-hvm-core
c03e4a3 feature/sc-177/implement-parallel-rust-cpu-runtime
8c8dbc5 feature/sc-213/implement-numeric-operations-in-the-hvm-core
2bd8d8e feature/sc-84/describe-the-semantics-of-the-runtime-in
a870726 flat-heap
f3135d7 main
54b4cf1 negative_cost_abstractions
65a6b5a parallel-locking
d322f5a parallel-olivenke
3e82b9b parallel-work-stealing-fast
d3e23f4 parallel-work-stealing-queue
473298e parallel-work-stealing-randomized-order
a61bd56 parallel-work-stealing-randomized-order-add-redex-count-diff-to-total
5fd6ef3 parallel-work-stealing-randomized-order-atomic-total_redex_count
24b5bfd parallel_cpu_wip
9a6ba4c parallel_fast_incomplete
3110de1 parallel_slow_complete
74ae424 ptr-refactor-alt-link
ea9385f redirect_free
335e40a rework-expand
50c7188 rust_transmute_not_zero_cost
17d2f3d sequential
495ac80 tmp_debug

support compilation/loading of dynamic libraries

#111 proposes a way to use IO functions in compiled mode, but it is limited to using hvm-core-defined IO, hvm-core readback, etc. Currently, if we wanted to support compiling hvm-lang programs to binaries, hvm-lang would need to have a comparable copy-the-whole-source-of-the-library-into-a-rust-crate-and-build-it script, which isn't really tenable.

Instead, it would be much more desirable if the hvm-core compilation could be modular – i.e. one could compile an hvm-lang book to a native form and then use it in the hvm-lang CLI. Only the definitions in the compiled book would need to be included in the compilation output, and the hvm-lang CLI would supply extraneous things like IO, lambda calc readback, etc.

Thus, I would propose that we support compiling hvm-core code to dynamic libraries (i.e. #![crate_type = "dylib"] in rust terms), and loading dynamic libraries to extend Hosts.

I don't think this would require too many changes. Building off of #111, it would roughly just replace the main.rs file with:

#![crate_type = "dylib"]

#[no_mangle]
pub fn hvmc_dylib_v0__insert_host(host: &mut Host) {
  gen::insert_into_host(host)
}

(it could also disable the cli feature)

Then loading one of these compiled dylibs would look roughly like:

let lib = libloading::Library::new("/path/to/dylib")?;
let insert_host = unsafe { lib.get::<fn(&mut Host)>(b"hvmc_dylib_v0__insert_host\0") };
insert_host(&mut host);
mem::forget(lib); // not sure if this is necessary, but I think it may need to be leaked

support IO in compiled binaries

Currently, compiled binaries are generated in a way that assumes all definitions are knowable at compile-time, but this is not the case for IO defs.

Right now the compiled output looks roughly like:

@foo = *
@bar = @foo

// gen.rs

pub fn host() -> Host {
  let mut host = Host::default();
  host.insert_def(r#"foo"#, DefRef::Static(unsafe { &*DEF_foo }));
  host.insert_def(r#"bar"#, DefRef::Static(unsafe { &*DEF_bar }));
  host
}

pub const DEF_foo: *const Def = const { &Def::new(LabSet::NONE, (call_foo, call_foo)) }.upcast();
pub const DEF_bar: *const Def = const { &Def::new(LabSet::NONE, (call_bar, call_bar)) }.upcast();

pub fn call_foo<M: Mode>(net: &mut Net<M>, to: Port) {
  let t0 = Trg::port(to);
  net.link_trg(t0, Trg::port(Port::ERA));
}

pub fn call_bar<M: Mode>(net: &mut Net<M>, to: Port) {
  let t0 = Trg::port(to);
  net.link_trg(t0, Trg::port(Port::new_ref(unsafe { &*DEF_foo })));
}

Instead, it should look like:

// gen.rs

pub fn insert_into_host(host: &mut Host) {
  host.insert_def(r#"foo"#, unsafe { HostedDef::new_hosted(LabSet::NONE, Def_foo::default()) });
  host.insert_def(r#"bar"#, unsafe { HostedDef::new_hosted(LabSet::NONE, Def_bar::default()) });

  let def_foo = Port::new_ref(&host.defs[r#"foo"#]);

  host.get_mut::<HostedDef<Def_bar>>(r#"bar"#).data.0 = Def_bar { def_foo };
}

#[derive(Default)]
struct Def_foo {}

impl AsHostedDef for Def_foo {
  fn call<M: Mode>(slf: &Def<Self>, net: &mut Net<M>, port: Port) {
    let t0 = Trg::port(port);
    net.link_trg(t0, Trg::port(Port::ERA));
  }
}

#[derive(Default)]
struct Def_bar {
  def_foo: Port,
}

impl AsHostedDef for Def_bar {
  fn call<M: Mode>(slf: &Def<Self>, net: &mut Net<M>, port: Port) {
    let t0 = Trg::port(port);
    net.link_trg(t0, Trg::port(slf.data.def_foo.clone()));
  }
}

(using this Host::get_mut api)

The insert_into_host function is somewhat peculiar because it needs to be able to handle circular references; thus, first, all of the definitions are inserted with meaningless default state, and then are all initialized with their dependencies.

This naturally supports using pre-defined IO defs with minimal modifications.

Then, in the compiled binary, one would simply write:

let mut host = create_host(&Book::default());
gen::insert_into_host(&mut host);

This would, of course, only support the IO operations hvm-core supports; I'll be opening another issue shortly for how that could be addressed.

"Entered unreachable code" when applying Op2 to non-numeric argument

data Foo.T = Foo
main = (^ 28416 Foo)

⬇️

01800004000000d8 01800005000000d8
thread '<unnamed>' panicked at 'internal error: entered unreachable code', /Users/martinjaniczek/.cargo/git/checkouts/hvm-core-31580e46fc731f4f/7d8053d/src/run.rs:729:67
stack backtrace:
   0:        0x100342e74 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h6d1ef757ca6aa08c
   1:        0x1002731f4 - core::fmt::write::h24aade456e61cbc1
   2:        0x100323a9c - std::io::Write::write_fmt::h5df313f4d7305ca8
   3:        0x1003466a8 - std::sys_common::backtrace::print::h509ed65c6d882fc0
   4:        0x1003462e0 - std::panicking::default_hook::{{closure}}::h9d25327d11b3fd1b
   5:        0x1003471cc - std::panicking::rust_panic_with_hook::h1d6008cc2f3fe794
   6:        0x100346d24 - std::panicking::begin_panic_handler::{{closure}}::hd15752c0e75f10f5
   7:        0x100346cb8 - std::sys_common::backtrace::__rust_end_short_backtrace::h6334164110c20096
   8:        0x100346cac - _rust_begin_unwind
   9:        0x100348d60 - core::panicking::panic_fmt::h09e6dc0b209e1ff9
  10:        0x100348e88 - core::panicking::panic::hd967f53605ad7120
  11:        0x100284f6c - std::sys_common::backtrace::__rust_begin_short_backtrace::haf3e1d75ee7a05a4
  12:        0x100285e5c - core::ops::function::FnOnce::call_once{{vtable.shim}}::hb62ee96d3c082907
  13:        0x100347aec - std::sys::unix::thread::Thread::new::thread_start::h4b97637403528bf6
  14:        0x18def6034 - __pthread_joiner_wake

(and the hvml run process hangs, unless it's ran with -1)

higherorderco / hvm-core Goto Github PK

hvm-core's People

Contributors

Stargazers

Watchers

Forkers

hvm-core's Issues

Recommend Projects

Recommend Topics

Recommend Org