calebwin / emu Goto Github PK

View Code? Open in Web Editor NEW

1.6K 38.0 54.0 350.1 MB

The write-once-run-anywhere GPGPU library for Rust

Home Page: https://calebwin.github.io/emu

License: MIT License

Rust 76.94% Shell 23.06%

emu gpu gpgpu gpu-computing gpu-acceleration gpu-programming rust

emu's People

Contributors

Stargazers

Watchers

emu's Issues

Initial overhead

Hello !

First, thanks for this crate and your contribution to the Rust community. This is amazingly simple to use, even for me who has not ever touch OpenCL.

I tried running a simple benchmark program and I found the result unsatisfying.
I suppose this example is so trivial, the initial cost of initializing the opencl environment each time is heavy and this is slowing down the entire function call.

The program is (based on your example) :

#![feature(test)]

extern crate em;
extern crate ocl;

extern crate test;

use em::emu;

emu! {
    function logistic(x [f32]) {
        x[..] = 1 / (1 + pow(E, -x[..]));
    }

    pub fn logistic(x: &mut Vec<f32>);
}

pub fn logistic_cpu(x: &mut Vec<f32>) {
    let mut result = Vec::new();

    for value in x {
        result.push(1.0 / (1.0 + 2.71828182846_f32.powf(-*value)))
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use test::Bencher;

    #[bench]
    fn logistic_opencl(b: &mut Bencher) {
        let mut test_data = vec![0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16];
        b.iter(|| logistic(&mut test_data));
        println!("OpenCL : {:?}", test_data);
    }

    #[bench]
    fn logistic_non_opencl(c: &mut Bencher) {
        let mut test_data = vec![0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16, 81.20, -16.0, 0.9, 4.9, 4.8, 3.9, 1.3, 4.8, 9.13, -0.16];
        c.iter(|| logistic_cpu(&mut test_data));
        println!("non OpenCL : {:?}", test_data);
    }
}

And the result is :

test tests::logistic_non_opencl ... bench:         561 ns/iter (+/- 66)
test tests::logistic_opencl     ... bench:  72,081,552 ns/iter (+/- 4,863,815)

My initial intention was to write a recurrent network as efficiently as possible. Do you think using Emu is a good choice ?

Unable to run on Ubuntu 20.04 due to python command

Hey! I noticed that when I tried to run the basic example that I was unable to. The reason seems to be that it runs the python command. Unfortunately, on my system (Ubuntu 20.04), I didn't have this command set, as I only had python3. I got around this by using update-alternatives as per this documentation.

Here is the error I got:

cargo run --features glsl-compile --release --example basic
   Compiling shaderc-sys v0.6.2
The following warnings were emitted during compilation:

warning: Checking for system installed libraries.  Use --features = build-from-source to disable this behavior
warning: System installed library not found.  Falling back to build from source

error: failed to run custom build command for `shaderc-sys v0.6.2`

Caused by:
  process didn't exit successfully: `/home/vax/code/emu/target/release/build/shaderc-sys-53b173bc91346eab/build-script-build` (exit code: 101)
--- stdout
cargo:warning=Checking for system installed libraries.  Use --features = build-from-source to disable this behavior
cargo:warning=System installed library not found.  Falling back to build from source

--- stderr
thread 'main' panicked at '

couldn't find required command: "python"

', /home/vax/.cargo/registry/src/github.com-1ecc6299db9ec823/shaderc-sys-0.6.2/build/cmd_finder.rs:50:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I've written an article about Emu on Russian

Link: https://habr.com/en/post/454678/

I'll translate it to English in the near future, but someone else can do it instead (just write a comment before you'll do it).

`.finish()` stage of the shader compilation segfaults on NVIDIA Vulkan driver

I am running on Ubuntu 22.04 with emu_core 0.1.1, info()?.name is "NVIDIA GeForce RTX 3050 Ti Laptop GPU", the driver is version 515 of the official NVIDIA Linux driver, installed through APT.

The problem seems to be related to the presence of a storage buffer, the one called prec_mat: if I remove it in both in the shader and in the SpirvBuilder, the issue does not manifest. I am using rust-gpu to write my shader. Note that if my integrated AMD GPU is selected, the code runs fine.

Below is a comprehensive stack trace:

___lldb_unnamed_symbol462 (@___lldb_unnamed_symbol462:301)
___lldb_unnamed_symbol11106 (@___lldb_unnamed_symbol11106:2200)
___lldb_unnamed_symbol11107 (@___lldb_unnamed_symbol11107:19)
___lldb_unnamed_symbol16036 (@___lldb_unnamed_symbol16036:120)
___lldb_unnamed_symbol11528 (@___lldb_unnamed_symbol11528:60)
___lldb_unnamed_symbol11308 (@___lldb_unnamed_symbol11308:258)
_nv002nvvm (@_nv002nvvm:11)
___lldb_unnamed_symbol58166 (@___lldb_unnamed_symbol58166:66)
___lldb_unnamed_symbol58168 (@___lldb_unnamed_symbol58168:583)
___lldb_unnamed_symbol58169 (@___lldb_unnamed_symbol58169:146)
___lldb_unnamed_symbol58181 (@___lldb_unnamed_symbol58181:164)
___lldb_unnamed_symbol58182 (@___lldb_unnamed_symbol58182:8)
___lldb_unnamed_symbol58172 (@___lldb_unnamed_symbol58172:148)
___lldb_unnamed_symbol58204 (@___lldb_unnamed_symbol58204:91)
___lldb_unnamed_symbol57964 (@___lldb_unnamed_symbol57964:70)
___lldb_unnamed_symbol57965 (@___lldb_unnamed_symbol57965:28)
ash::vk::features::DeviceFnV1_0::create_compute_pipelines (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/ash-0.31.0/src/vk/features.rs:5094)
gfx_backend_vulkan::device::<impl gfx_hal::device::Device<gfx_backend_vulkan::Backend> for gfx_backend_vulkan::Device>::create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/gfx-backend-vulkan-0.5.11/src/device.rs:1044)
wgpu_core::device::<impl wgpu_core::hub::Global<G>>::device_create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-core-0.5.6/src/device/mod.rs:1932)
wgpu_device_create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-native-0.5.1/src/device.rs:347)
wgpu::Device::create_compute_pipeline (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.5.2/src/lib.rs:906)
emu_core::device::Device::compile (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_core-0.1.1/src/device.rs:611)
emu_core::compile::SpirvOrFinished<P,C>::finish (/home/mikidep/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_core-0.1.1/src/compile.rs:305)
scene_emu::main (/home/mikidep/Documenti/Codice/scene-emu/src/main.rs:104)
core::ops::function::FnOnce::call_once (@core::ops::function::FnOnce::call_once:6)
std::sys_common::backtrace::__rust_begin_short_backtrace (@std::sys_common::backtrace::__rust_begin_short_backtrace:6)
std::rt::lang_start::{{closure}} (@std::rt::lang_start::{{closure}}:7)
core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once (@std::rt::lang_start_internal:184)
std::panicking::try::do_call (@std::rt::lang_start_internal:183)
std::panicking::try (@std::rt::lang_start_internal:183)
std::panic::catch_unwind (@std::rt::lang_start_internal:183)
std::rt::lang_start_internal::{{closure}} (@std::rt::lang_start_internal:183)
std::panicking::try::do_call (@std::rt::lang_start_internal:183)
std::panicking::try (@std::rt::lang_start_internal:183)
std::panic::catch_unwind (@std::rt::lang_start_internal:183)
std::rt::lang_start_internal (@std::rt::lang_start_internal:183)
std::rt::lang_start (@std::rt::lang_start:13)
main (@main:10)
__libc_start_call_main (@__libc_start_call_main:29)
__libc_start_main_impl (@__libc_start_main@@GLIBC_2.34:43)
_start (@_start:15)

I am also attaching relevant Rust code and disassembled shader SPIR-V code:

main.rs.txt - This is the entrypoint of my main crate, which uses Emu;
stack.rs.txt - This is a module shared between my shader and my main crate, which is used in shader parameter and DeviceBox definitions;
lib.rs.txt - This is the entrypoint of my shader, where the accepted storage buffers are listed;
main_shader_disass.txt - This is the disassembled version of the SPIR-V shader compiled with rust-gpu.

Below are extracts from the above source files, in which the incriminated parameter is declared:

(in main.rs)

    let spirv = SpirvBuilder::new()
        .set_entry_point_name("main")
        .add_param_mut::<[u32]>() // alpha
        .add_param_mut::<[StackSym]>() // stack
        .add_param_mut::<[usize]>() // gives_stack
        .add_param_mut::<[u32]>() // prec_mat
        .add_param::<usize>() // length
        .add_param::<usize>() // chunk_size
        .add_param::<u32>() // term_thresh
        .set_code_with_u8(std::io::Cursor::new(code))?
        .build();
    let c = compile::<Spirv<_>, SpirvCompile, _, GlobalCache>(spirv)?.finish()?;

Segfault happens on the last line.

(in lib.rs)

#[spirv(compute(threads(4)))]
pub fn main(
    #[spirv(global_invocation_id)] id: UVec3,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 0)] alpha: &mut [u32],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 1)] stack: &mut [StackSym],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 2)] gives_stack: &mut [usize],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 3)] prec_mat: &mut [u32],
    #[spirv(storage_buffer, descriptor_set = 0, binding = 4)] length: &mut usize,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 5)] chunk_size: &mut usize,
    #[spirv(storage_buffer, descriptor_set = 0, binding = 6)] term_thresh: &mut u32,
) { // ...

I understand that the issue should be related to NVIDIA's Vulkan implementation, but maybe you know something about this kind of issue. Thank you in advance.

Example fails to compile, because of the quote crate

Hello,

Your project looks very interesting and I wanted to give it ago. I copied your example code on the README and I couldn't compile it. After solving the issue in ticket #27 by using:
em = { git = "https://github.com/calebwin/emu", branch = "dev" }

I get the below output when doing cargo check:

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> C:\Users\Pedro\.cargo\git\checkouts\emu-7973979264d9dc07\095942b\emu_macro\src\accelerating.rs:123:66
    |
123 | ...                   .is_ident(&Ident::new("load", quote::__rt::Span::call_site()))
    |                                                            ^^^^ could not find `__rt` in `quote`

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> C:\Users\Pedro\.cargo\git\checkouts\emu-7973979264d9dc07\095942b\emu_macro\src\accelerating.rs:169:66
    |
169 | ...                   .is_ident(&Ident::new("read", quote::__rt::Span::call_site()))
    |                                                            ^^^^ could not find `__rt` in `quote`

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> C:\Users\Pedro\.cargo\git\checkouts\emu-7973979264d9dc07\095942b\emu_macro\src\accelerating.rs:193:68
    |
193 | ...                   .is_ident(&Ident::new("launch", quote::__rt::Span::call_site()))
    |                                                              ^^^^ could not find `__rt` in `quote`

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> C:\Users\Pedro\.cargo\git\checkouts\emu-7973979264d9dc07\095942b\emu_macro\src\accelerating.rs:259:64
    |
259 |                     let ident = Ident::new(&param.name, quote::__rt::Span::call_site());
    |                                                                ^^^^ could not find `__rt` in `quote`

I tried hard-setting the 'quote' crate to a specific version (1.0.1) and got the following message:

error: failed to select a version for `quote`.
    ... required by package `emu_macro v0.1.0 (https://github.com/calebwin/emu?branch=dev#095942ba)`
    ... which is depended on by `em v0.3.0 (https://github.com/calebwin/emu?branch=dev#095942ba)`
    ... which is depended on by `emu-test v0.1.0 (D:\Code\Rust\emu-test)`
versions that meet the requirements `^1.0.2` are: 1.0.3

all possible versions conflict with previously selected packages.

  previously selected package `quote v1.0.1`
    ... which is depended on by `emu-test v0.1.0 (D:\Code\Rust\emu-test)`

failed to select a version for `quote` which could resolve this conflict

Setting quote to 1.0.3 doesn't solve the issue, but it's interesting that 1.0.2 seems to have been yanked from crates.io. Is it possible emu_macro depends on code that is no longer present in 1.0.3?

OS X example build failed

Hello, I am trying the example:

#[macro_use]
extern crate em;
use em::*;

#[gpu_use]
fn main() {
    let mut x = vec![0.0; 1000];

    gpu_do!(load(x)); // move data to the GPU
    
    gpu_do!(launch()); // off-load to run on the GPU

    for i in 0..1000 {
        x[i] = x[i] * 10.0;
    }

    gpu_do!(read(x)); // move data back from the GPU
    
    println!("{:?}", x);
}

Here's the error.

   ...
   Compiling emu_macro v0.1.0
error[E0277]: the trait bound `syn::Expr: std::convert::From<quote::__rt::TokenStream>` is not satisfied
   --> /Users/mrrobb/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/passing.rs:337:50
    |
337 |                     let gpu_ident = quote! {gpu}.into();
    |                                                  ^^^^ the trait `std::convert::From<quote::__rt::TokenStream>` is not implemented for `syn::Expr`
    |
    = help: the following implementations were found:
              <syn::Expr as std::convert::From<syn::ExprArray>>
              <syn::Expr as std::convert::From<syn::ExprAssign>>
              <syn::Expr as std::convert::From<syn::ExprAssignOp>>
              <syn::Expr as std::convert::From<syn::ExprAsync>>
            and 35 others
    = note: required because of the requirements on the impl of `std::convert::Into<syn::Expr>` for `quote::__rt::TokenStream`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0277`.
error: Could not compile `emu_macro`.
warning: build failed, waiting for other jobs to finish...
error: build failed

Am I doing something wrong? Thank you.

Example code given in readme doesn't seem to work

Test project: emu_test.zip

Errors (and warning) produced:

error[E0433]: failed to resolve: use of undeclared type or module `futures`
  --> src\main.rs:15:5
   |
15 |     futures::executor::block_on(assert_device_pool_initialized());
   |     ^^^^^^^ use of undeclared type or module `futures`

error[E0433]: failed to resolve: use of undeclared type or module `GlslBuilder`
  --> src\main.rs:23:9
   |
23 |         GlslBuilder::new()
   |         ^^^^^^^^^^^ use of undeclared type or module `GlslBuilder`

error[E0433]: failed to resolve: use of undeclared type or module `futures`
  --> src\main.rs:70:22
   |
70 |     println!("{:?}", futures::executor::block_on(x.get())?);
   |                      ^^^^^^^ use of undeclared type or module `futures`

error[E0412]: cannot find type `GlslCompile` in this scope
  --> src\main.rs:22:31
   |
22 |     let c = compile::<String, GlslCompile, _, GlobalCache>(
   |                               ^^^^^^^^^^^ not found in this scope

warning: unused import: `emu_glsl::*`
 --> src\main.rs:1:5
  |
1 | use emu_glsl::*;
  |     ^^^^^^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

error[E0277]: the trait bound `Rectangle: zerocopy::AsBytes` is not satisfied
  --> src\main.rs:18:71
   |
18 |     let mut x: DeviceBox<[Rectangle]> = vec![Default::default(); 128].as_device_boxed()?;
   |                                                                       ^^^^^^^^^^^^^^^ the trait `zerocopy::AsBytes` is not implemented for `Rectangle`
   |
help: trait impl with same name found
  --> src\main.rs:6:10
   |
6  | #[derive(AsBytes, FromBytes, Copy, Clone, Default, Debug)]
   |          ^^^^^^^
   = note: Perhaps two different versions of crate `zerocopy` are being used?
   = note: required because of the requirements on the impl of `zerocopy::AsBytes` for `[Rectangle]`
   = note: required because of the requirements on the impl of `emu_core::boxed::AsDeviceBoxed<[Rectangle]>` for `std::vec::Vec<Rectangle>`

error[E0599]: no method named `get` found for type `emu_core::device::DeviceBox<[Rectangle]>` in the current scope
  --> src\main.rs:70:52
   |
70 |     println!("{:?}", futures::executor::block_on(x.get())?);
   |                                                  --^^^
   |                                                  | |
   |                                                  | this is an associated function, not a method
   |                                                  help: use associated function syntax instead: `emu_core::device::DeviceBox::<[Rectangle]>::get`
   |
   = note: found the following associated functions; to be used as methods, functions must have a `self` parameter
note: the candidate is defined in the trait `emu_core::cache::Cache`
  --> C:\Users\jonat\.cargo\registry\src\github.com-1ecc6299db9ec823\emu_core-0.1.1\src\cache.rs:20:5
   |
20 |     fn get(key: u64) -> Arc<DeviceFnMut>;
   |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   = help: to disambiguate the method call, write `emu_core::cache::Cache::get(x)` instead
   = note: the method `get` exists but the following trait bounds were not satisfied:
           `Rectangle : zerocopy::FromBytes`

error: aborting due to 6 previous errors

Some errors have detailed explanations: E0277, E0412, E0433, E0599.
For more information about an error, try `rustc --explain E0277`.
error: could not compile `emu_test`.

To learn more, run the command again with --verbose.

Questions regarding the project

I am interested in using emu but I have a few questions regarding this project:

Emu is a modern library for low-level, cross-platform GPGPU enabling on-device, reproducible, privacy-focused compute

What does it mean by being privacy-focused compute?

Since this project uses WebGPU, does this run on a web browser? Can it run as a software without running on a web browser?
Is this project stable or Beta?
When using WebGPU, does it have native performance or does it lose performance as it is WebGPU (I apologise for my lack of knowledge in this field)?

Add badges to readme.md

For example:

Benchmarks

Could there be standard benchmarks that also compare against raw WebGPU peeformance? This is important for adoption.

Benchmarks + against WebGPU

Could there be standard benchmarks that also compare against raw WebGPU performance? This is important for adoption.

support for processing strings/byte arrays within the gpu

Are there any plans to provide string or byte array handling capability from within emu kernels?
I believe it would be feasible if there was more support for integer types within emu.
I understand both cuda/opencl provide integer support within kernels.

Thank you for listening.

Add target/ dir in .gitignore

emu/examples/multiply_by_scalar/target/.rustc_info.json

Line 1 in d26bfe7

 {"rustc_fingerprint":15037809574512656097,"outputs":{"1617349019360157463":["___\nlib___.rlib\nlib___.so\nlib___.so\nlib___.a\nlib___.so\n/home/calebwin/.rustup/toolchains/stable-x86_64-unknown-linux-gnu\ndebug_assertions\nproc_macro\ntarget_arch=\"x86_64\"\ntarget_endian=\"little\"\ntarget_env=\"gnu\"\ntarget_family=\"unix\"\ntarget_feature=\"fxsr\"\ntarget_feature=\"sse\"\ntarget_feature=\"sse2\"\ntarget_os=\"linux\"\ntarget_pointer_width=\"64\"\ntarget_vendor=\"unknown\"\nunix\n",""],"15337506775154344876":["___\nlib___.rlib\nlib___.so\nlib___.so\nlib___.a\nlib___.so\n/home/calebwin/.rustup/toolchains/stable-x86_64-unknown-linux-gnu\ndebug_assertions\nproc_macro\ntarget_arch=\"x86_64\"\ntarget_endian=\"little\"\ntarget_env=\"gnu\"\ntarget_family=\"unix\"\ntarget_feature=\"fxsr\"\ntarget_feature=\"sse\"\ntarget_feature=\"sse2\"\ntarget_os=\"linux\"\ntarget_pointer_width=\"64\"\ntarget_vendor=\"unknown\"\nunix\n",""],"1164083562126845933":["rustc 1.33.0 (2aa4c46cf 2019-02-28)\nbinary: rustc\ncommit-hash: 2aa4c46cfdd726e97360c2734835aa3515e8c858\ncommit-date: 2019-02-28\nhost: x86_64-unknown-linux-gnu\nrelease: 1.33.0\nLLVM version: 8.0\n",""]},"successes":{}} 

You don't need to store examples/multiply_by_scalar/target

Unable to run example at readme

Hi! I tried with the example at readme, and it failed to compile with stable rustc.
I tried adding em version 0.3 and from git.

    |
338 |                     i.args.insert(0, gpu_ident);
    |                                      ^^^^^^^^^ the trait `std::convert::From<quote::__rt::TokenStream>` is not implemented for `syn::Expr`
    |
    = help: the following implementations were found:
              <syn::Expr as std::convert::From<syn::ExprArray>>
              <syn::Expr as std::convert::From<syn::ExprAssign>>
              <syn::Expr as std::convert::From<syn::ExprAssignOp>>
              <syn::Expr as std::convert::From<syn::ExprAsync>>
            and 35 others
    = note: required because of the requirements on the impl of `std::convert::Into<syn::Expr>` for `quote::__rt::TokenStream`

CUDA Support

Hey

is it planned to support binding to CUDA in the future?
It would be a great enhancement I think :)

Best regards
Chips

`X server found. dri2 connection failed!`

I'm trying to run the tests for the arithmetic example (env RUST_BACKTRACE=1 cargo test), but am getting undefined behaviour. In most cases a double free is detected and the program crashes:

running 2 tests
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
X server found. dri2 connection failed!
double free or corruption (top)
error: process didn't exit successfully: `/home/cedwards/git/emu/examples/arithmetic/target/debug/deps/arithmetic-8e93e645dd4fd0c6` (signal: 6, SIGABRT: process abort signal)

futhark

Are you aware of Futhark ?

They let the user write GPU code with highlevel functions (in a dedicated language) that are then optimized and compiled down to openCL or Cuda.

As a long term goal, using a similar approach (or integrating with their optimizer), you could let the user use an iterator (gpu_iter() ?) and optimize the code at compile time.

Example panics at runtime (`COPY_DST` flag)

Hello, running the compute example:

View full code

use emu_core::prelude::*;
use emu_glsl::*;
use zerocopy::*;

#[repr(C)]
#[derive(AsBytes, FromBytes, Copy, Clone, Default, Debug, GlslStruct)]
struct Shape {
  x: u32,
  y: u32,
  w: i32,
  h: i32,
  r: [i32; 2],
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // ensure that a device pool has been initialized
    // this should be called before every time when you assume you have devices to use
    // that goes for both library users and application users
    futures::executor::block_on(assert_device_pool_initialized());

    println!("{:?}", take()?.lock().unwrap().info.as_ref().unwrap());

    // create some data on GPU
    // even mutate it once loaded to GPU
    let mut shapes: DeviceBox<[Shape]> = vec![Default::default(); 1024].as_device_boxed_mut()?;
    let mut x: DeviceBox<[i32]> = vec![0; 1024].as_device_boxed_mut()?;
    shapes.set(vec![
        Shape {
            x: 0,
            y: 0,
            w: 100,
            h: 100,
            r: [2, 9]
        };
        1024
    ])?;

    // compile GslKernel to SPIR-V
    // then, we can either inspect the SPIR-V or finish the compilation by generating a DeviceFnMut
    // then, run the DeviceFnMut
    let c = compile::<GlslKernel, GlslKernelCompile, Vec<u32>, GlobalCache>(
        GlslKernel::new()
            .spawn(64)
            .share("float stuff[64]")
            .param_mut::<[Shape], _>("Shape[] shapes")
            .param_mut::<[i32], _>("int[] x")
            .param::<i32, _>("int scalar")
            .with_struct::<Shape>()
            .with_const("int c", "7")
            .with_helper_code(
                r#"
Shape flip(Shape s) {
    s.x = s.x + s.w;
    s.y = s.y + s.h;
    s.w *= -1;
    s.h *= -1;
    s.r = ivec2(5, 3);
    return s;
}
"#,
    )
    .with_kernel_code(
        "shapes[gl_GlobalInvocationID.x] = flip(shapes[gl_GlobalInvocationID.x]); x[gl_GlobalInvocationID.x] = scalar + c + int(gl_WorkGroupID.x);",
    ),
)?.finish()?;
    unsafe {
        spawn(16).launch(call!(c, &mut shapes, &mut x, &DeviceBox::new(10)?))?;
    }

    // download from GPU and print out
    println!("{:?}", futures::executor::block_on(shapes.get())?);
    println!("{:?}", futures::executor::block_on(x.get())?);
    Ok(())
}

$ cargo run

yields

    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/emu_test`
Limits {
    max_bind_groups: 4,
    max_dynamic_uniform_buffers_per_pipeline_layout: 8,
    max_dynamic_storage_buffers_per_pipeline_layout: 4,
    max_sampled_textures_per_shader_stage: 16,
    max_samplers_per_shader_stage: 16,
    max_storage_buffers_per_shader_stage: 4,
    max_storage_textures_per_shader_stage: 4,
    max_uniform_buffers_per_shader_stage: 12,
    max_uniform_buffer_binding_size: 16384,
    max_push_constant_size: 0,
}
{ name: "Intel(R) Iris(TM) Plus Graphics 655", vendor_id: 0, device_id: 0, device_type: IntegratedGpu }
wgpu error: Validation Error

Caused by:
    In CommandEncoder::copy_buffer_to_buffer
    Copy error
    destination buffer/texture is missing the `COPY_DST` usage flag
      note: destination = `<Buffer-(4, 1, Metal)>`


thread 'main' panicked at 'Handling wgpu errors as fatal by default', /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
stack backtrace:
   0: std::panicking::begin_panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:616:12
   1: wgpu::backend::direct::default_error_handler
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1896:5
   2: core::ops::function::Fn::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:70:5
   3: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/alloc/src/boxed.rs:1875:9
   4: wgpu::backend::direct::ErrorSinkRaw::handle_error
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1883:9
   5: wgpu::backend::direct::Context::handle_error
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:109:9
   6: wgpu::backend::direct::Context::handle_error_nolabel
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:121:9
   7: <wgpu::backend::direct::Context as wgpu::Context>::command_encoder_copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/backend/direct.rs:1542:13
   8: wgpu::CommandEncoder::copy_buffer_to_buffer
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/wgpu-0.7.0/src/lib.rs:1954:9
   9: emu_core::device::Device::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/device.rs:391:9
  10: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  11: emu_core::boxed::<impl emu_core::device::DeviceBox<[T]>>::get::{{closure}}
             at /Users/wbrickner/.cargo/git/checkouts/emu-7973979264d9dc07/9fe3db3/emu_core/src/boxed.rs:298:23
  12: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/future/mod.rs:91:19
  13: futures_executor::local_pool::block_on::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:23
  14: futures_executor::local_pool::run_executor::{{closure}}
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:90:37
  15: std::thread::local::LocalKey<T>::try_with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:442:16
  16: std::thread::local::LocalKey<T>::with
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/thread/local.rs:418:9
  17: futures_executor::local_pool::run_executor
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:86:5
  18: futures_executor::local_pool::block_on
             at /Users/wbrickner/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-executor-0.3.21/src/local_pool.rs:315:5
  19: emu_test::main
             at ./src/main.rs:71:22
  20: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

my understanding is that buffers must have their usage declared correctly (with some amount of detail) at construction time through wgpu.

Unable to get platform id list after 10 seconds of waiting

This code waits 10 seconds and prints the error:

thread 'main' panicked at 'Platform::default(): Unable to get platform id list after 10 seconds of waiting.', src/libcore/result.rs:999:5
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Code:

use em::{build, emu};

extern crate ocl;
use ocl::{flags, Buffer, Context, Device, Kernel, Platform, Program, Queue};

emu! {
    multiply(global_vector [f32], scalar f32) {
        global_vector[get_global_id(0)] *= scalar;
    }
}

build! { multiply [f32] f32 }

fn main() {
    let args = std::env::args().collect::<Vec<String>>();
    if args.len() < 3 {
        panic!("cargo run -- <SCALAR> <NUMBERS>...");
    }

    let scalar = args[1].parse::<f32>().unwrap();

    let vector = args[2..]
        .into_iter()
        .map(|string| string.parse::<f32>().unwrap())
        .collect();

    let result = multiply(vector, scalar).unwrap();
    dbg!(result);
}

My operating system is Ubuntu and I installed OpenCL by the following commands:

$ sudo apt update
$ sudo apt install ocl-icd-opencl-dev

Rustdoc documentation search bar does not work

The search bar in the rusdoc documentation does not work.

Update ocl to 0.19.3

Because several bugs and UBs were fixed:
cogciprocate/ocl#150
cogciprocate/ocl#142

Selecting OpenCL device?

Hello!

Is there currently any way to manually list and select the OpenCL device?

Thanks! :)

Typo

emu/src/lib.rs

Line 130 in d26bfe7

/// Represents an Emu kerenl within an Emu program

em compilation fails, traits not implemented, and some missing imports?

rustc 1.53.0 (53cb7b09b 2021-06-17)
cargo 1.53.0 (4369396ce 2021-04-27)

Same happens under Windows and Linux (given the errors, I doubted it was an OS level issue)

Steps to recreate:

cargo new emdemo
cd emdemo
echo 'em = "0.3.0"' >> Cargo.toml
cargo build

There's a few errors being reported,
Errors messages follow. The first 4 are similar, all in accelerating.rs:

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> /home/paul/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/accelerating.rs:123:72
    |
123 | ...                   .is_ident(&Ident::new("load", quote::__rt::Span::call_site()))
    |                                                                  ^^^^ not found in `quote::__rt`
    |
help: consider importing one of these items
    |
3   | use crate::Span;
    |
3   | use crate::__private::Span;
    |
3   | use proc_macro2::Span;
    |
3   | use proc_macro::Span;
    |
      and 3 other candidates

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> /home/paul/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/accelerating.rs:169:72
    |
169 | ...                   .is_ident(&Ident::new("read", quote::__rt::Span::call_site()))
    |                                                                  ^^^^ not found in `quote::__rt`
    |
help: consider importing one of these items
    |
3   | use crate::Span;
    |
3   | use crate::__private::Span;
    |
3   | use proc_macro2::Span;
    |
3   | use proc_macro::Span;
    |
      and 3 other candidates

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> /home/paul/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/accelerating.rs:193:74
    |
193 | ...                   .is_ident(&Ident::new("launch", quote::__rt::Span::call_site()))
    |                                                                    ^^^^ not found in `quote::__rt`
    |
help: consider importing one of these items
    |
3   | use crate::Span;
    |
3   | use crate::__private::Span;
    |
3   | use proc_macro2::Span;
    |
3   | use proc_macro::Span;
    |
      and 3 other candidates

error[E0433]: failed to resolve: could not find `__rt` in `quote`
   --> /home/paul/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/accelerating.rs:259:70
    |
259 |                     let ident = Ident::new(&param.name, quote::__rt::Span::call_site());
    |                                                                      ^^^^ not found in `quote::__rt`
    |
help: consider importing one of these items
    |
3   | use crate::Span;
    |
3   | use crate::__private::Span;
    |
3   | use proc_macro2::Span;
    |
3   | use proc_macro::Span;
    |
      and 3 other candidates

   Compiling num-bigint v0.1.44

Then the 5th is quite different:

error[E0277]: the trait bound `syn::Expr: From<TokenStream2>` is not satisfied
   --> /home/paul/.cargo/registry/src/github.com-1ecc6299db9ec823/emu_macro-0.1.0/src/passing.rs:337:50
    |
337 |                     let gpu_ident = quote! {gpu}.into();
    |                                                  ^^^^ the trait `From<TokenStream2>` is not implemented for `syn::Expr`
    |
    = help: the following implementations were found:
              <syn::Expr as From<ExprArray>>
              <syn::Expr as From<ExprAssign>>
              <syn::Expr as From<ExprAssignOp>>
              <syn::Expr as From<ExprAsync>>
            and 35 others
    = note: required because of the requirements on the impl of `Into<syn::Expr>` for `TokenStream2`

Some tests are not passing on my system

Hi all,

I am trying to execute the tests in the examples. I did tried running tests in integration, arithmetic example folders, in both the cases my system is failing the tests.

Is this anything to do with the crate or am I missing some library?

Compiling error

Hi everyone! This project ia awesome. I was tring run example code, but I got error:
let gpu_ident = quote! {gpu}.into(); ->
^^^^ the trait std::convert::From<quote::__private::TokenStream> is not implemented for syn::Expr
this error was in passing.rs file.
Help me please to understand how it works?
p.s. my OS is Windows 10 64 bit, Rust has 1.42.0 version, and I have installed OpenCL driver (my video card is Nvidia GeForce 1070).

README instructions have various issues

As mentioned in #31 , I'm migrating my project to use the new emu_core GLSL abstraction layer and I am encountering a few documentation issues getting started.

In the dependencies section, it is said that this is how you add emu to a project:

emu_core = {
    git = "https://github.com/calebwin/emu/tree/master/emu_core.git",
    rev = "265d2a5fb9292e2644ae4431f2982523a8d27a0f"
}

Newlines inside of a dep key in Cargo are invalid, and also this isn't a valid Git URL. Currently, the only way I see to use emu_core is to clone the whole emu repository and then use a path dependency. In workspace setups, typically the individual crates are published to crates.io individually, avoiding this issue.

Additionally, there's a discord link at the bottom of the README that I was going to use to address the above question, but the Discord invite is invalid.

The only person who can fix these issues would be @calebwin so that's why I'm filing an issue instead of a PR. Thanks again for the crate -- I'll be using it with the cloned dependency for now.

Logo is an ostrich not an emu

Unfortunately the logo is clearly an ostrich not an emu.

Implementing more traits for `DeviceBox`

We should implement several additional traits for DeviceBox<T> to make it more useful.

Clone - for duplicating data on the GPU
Hash - for parallel hashing
Eq and PartialEq - for parallel equality checking
AcceleratedIterator - we should define a new trait for doing high-level iterator-based computation

Work group sizes greater than 32 don't work on Intel integrated GPU

I don't know if other people can replicate this. But at least on my system, I can't use a local work group size of anything greater than 32.

How a user runs an Emu function

A function in Emu operates on a "work-item" (work-item is a term OpenCL uses; I loosely use it here but we can refer to it differently if we come up with a better name).

multiply(global_buffer [f32], scalar f32) {
	global_buffer[get_global_id(0)] *= scalar;
}

With the above function, a work-item corresponds to a particular index in global_buffer. So the work can be thought of as a 1d grid with dimensions equal to the length of global_buffer. Let's consider another function.

multiply_matrices(m i32, n i32, k i32, global_a [f32], global_b [f32], global_c [f32]) {
	let row: i32 = get_global_id(0);
	let col: i32 = get_global_id(1);

	let acc: f32 = 0.0;
 
	for i in 0..k {
		acc += global_a[i*m + row] * global_b[col*k + i];
	}
     
	global_c[col * m + row] = acc;
}

When this function is run, a work-item corresponds to a pair of indices - one in global_a and one in global_b. So the work in this case is a 2d grid with dimensions equal to the product of the lengths of global_a and global_b.

Now here's the thing - both of these functions can be ultimately run with a binding to OpenCL. But only the first function can be run with the build! macro. This is because functions you intend to run with the build! macro operate on 1d grids of work where the dimension is by default the length of the first parameter to the function.

This is an important thing to note and I think it can help us answer the following key questions.

How should Emu functions be ultimately called by a user?
How should a user be using get_global_id()?
A user has a bunch of data - how do we support mapping and filtering and reducing?

Use gitflow

Hi!
Your library is getting more and more popular.
Would you please use gitflow? At least would you please protect master branch from pushing into it without pull requests even for administrators? It will help people to track your progress and it will help you not to screw up undeliberately by pushing unwanted code into master branch.

Best regards

Current Project Status Update?

Hey there! I was curious what the current project status is, as it hasn't been updated in a few years.

I totally understand if you just ran out of time for the project or something like that, but I was curious what your thoughts on the current code state is. Do you feel like the direction was right, and might just need to be updated to later versions of WGPU or something, or do you feel like there are better options now, or maybe the original plan didn't go as well as immagined?

That kind of stuff. :)

I'm thinking of adding a WGPU-powered backend to burn for machine learning, and Emu seems to have a lot of the right goals, but I'm not sure if it'd be better to use WGPU directly, since there hasn't been a lot of activity here for a while.

How to pass a 2D array of floats?

How can I pass a 2D array of floats?

Preparing any sort of DeviceBox from a Vec<Vec<f32>> is a no-go it seems.

The dimensions of the vector are compile-time constants from the perspective of GLSL (they get formatted in).
The dimensions are determined at runtime on the rust side.

Can I just flatten into a single buffer and the GLSL code won't notice?

Support for structs and typechecking?

I wanted to ping you and make you aware of this project having raised quite some interest on reddit. There are two interesting questions, one about potential struct support, and the other about type checking/inference before translation.

Maybe they are of interest to you? See the reddit thread here: https://www.reddit.com/r/rust/comments/bvwvpd/emu_gpu_programming_language_for_rust/

Structs as input to functions

The only things Emu functions can really accept right now are vectors (technically arrays/pointers) and primitive data (f32 or i32). Simple structures could be accepted with 2 changes.

-A change to the language so you can declare what kind of structs you accept and how to unpackage primitive data from them in the declaration of the Emu function
-A change to the build! macro to generate a function that can accept structs of a certain type and unpackage them into primitive data to send to the Emu function.

Before these changes are implemented, we should think about how the general interface to an Emu user should change. How should they pass structs to functions in a way that is most seamless.

Important internal optimizations, potential bug

There are several internal things that make Emu's performance potentially suboptimal. This issue is a place to discuss them.

wgpu::Device::poll is used here and right now it blocks in an async context. I'm not sure what the solution is but there is some discussion here.
right now, we use a different staging buffer for every DeviceBox. This should be replaced by some sort of global pool of staging buffers which is shared by all DeviceBoxs.

Edit: the Device::poll thing is probably not an optimization but a bug in the library

Khronos ML summit

Hello,

I am the chair of a working group at Khronos developping standards for graphics and compute hardware accelerators (as well as a representative for Nvidia in those groups).

We are organizing a summit to gather feedback from the ecosystem, and influence the design of the next round of improvements, and I thought that you may be interested in being represented:
https://www.khronos.org/events/2021-invitation-to-the-khronos-machine-learning-summit

This summit is IP free, and will let you present your project and your needs for improvements in the ML ecosystem, as well as hearing from other companies.

regards,

Pierre Boudier
Software architect at Nvidia
chair of the Machine Learning TSG at Khronos

Lost [] in readme example

emu/README.md

Line 16 in d26bfe7

 move_particles(global_particles_x [f32], global_particles_y f32, global_particles_z f32) { 

For global_particles_y and global_particles_z.

coreaudio-sys AudioUnit compile error on Windows & Linux

I get this in WSL2 Debian on Windows

error[E0455]: native frameworks are only available on macOS targets
    --> /home/walther/.cargo/registry/src/github.com-1ecc6299db9ec823/coreaudio-sys-0.1.2/src/audio_unit.rs:6380:1
     |
6380 | #[link(name = "AudioUnit", kind = "framework")]
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error[E0455]: native frameworks are only available on macOS targets
    --> /home/walther/.cargo/registry/src/github.com-1ecc6299db9ec823/coreaudio-sys-0.1.2/src/audio_unit.rs:6739:1
     |
6739 | #[link(name = "AudioUnit", kind = "framework")]
     | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0455`.
error: could not compile `coreaudio-sys`.

As well as on Windows itself

file io operations from within emu kernel functions

I read about gpufs. It has a capability to access files from within gpu kernel functions. Are there any plans to provide file io apis from within emu's gpu kernel functions? i.e. open file, create file, write/read, close.

Thank you for listening.

calebwin / emu Goto Github PK

emu's People

Contributors

Stargazers

Watchers

Forkers

emu's Issues

Recommend Projects

Recommend Topics

Recommend Org