
Stealth-paint

A library for common image operations, so quick and embeddable that you might barely notice it running. At least, that's the goal. The main idea is to use pre-built GPU pipelines, the same kind you will find in video games, but to wrap them in an interface more familiar from CPU-based methods.¹

How to run

Warning: The test suite checks for pixel-accurate results by running a CRC check on the contents of images. It is somewhat likely that these checks fail on your machine, as there are allowed differences in the exact floating point math. This affects, for example, rotated images sampled with Nearest filtering, as well as buffer stores that rely on the driver to perform sRGB encoding, float scaling, interpolation of vertex attributes across fragments, etc.
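For reference, such a pixel-exact check boils down to hashing the raw pixel bytes; a minimal sketch using CRC-32 (the exact hash the test suite uses may differ):

```rust
/// Bitwise CRC-32 (ISO-HDLC polynomial, reflected), enough to
/// fingerprint a buffer of raw pixel bytes for exact comparison.
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            // mask is 0xFFFFFFFF if the low bit is set, else 0.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

fn main() {
    // Standard check value for this CRC-32 variant.
    assert_eq!(crc32(b"123456789"), 0xCBF43926);
    // Two renders differing in a single pixel byte hash differently.
    let a = [0u8; 16];
    let mut b = a;
    b[7] = 1;
    assert_ne!(crc32(&a), crc32(&b));
    println!("ok");
}
```

Any single-bit difference in driver math therefore flips the checksum, which is why the suite needs the blessing step below.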

On the first run, bless the reference images; afterwards, run the tests as usual:

STEALTH_PAINT_BLESS=1 cargo test --release
cargo test

As an added benefit, the first call will produce a debug version of all test results in the form of png images within the tests/debug folder.

Otherwise, see the documentation at: http://docs.rs/

Project goals

AM/FM: an engineer's term distinguishing the inevitable clunky real-world faultiness of "Actual Machines" from the power-fantasy techno-dreams of "Fucking Magic". (Source: Turkey City Lexicon)

Without naming specific other solutions, relying on magic for your image processing needs invites the risk of many CVEs, a world of painful configuration, and overall slowness. Avoid all of this by relying on safe Rust, nice embedding, hardware acceleration and an optimizing execution engine.

Initially, and for the foreseeable future, we will require at least some device and driver supported by one of wgpu's backends. We will try to keep the basic interface agnostic of that specific choice, though, and might later offer a pure CPU-based solution or SIMD acceleration, essentially emulating the Vulkan API and forgoing the shader compilation and upload to GPU memory. But then again, we might instead expect a general-purpose CPU-based drop-in Vulkan implementation on the platform. That's not yet decided.

The other, more interesting point to resolve in some future version is the case of external memory/pixel containers, as necessary for pipelines operating on images too large for any of the computer's memories.

We will also try to make it 'nearly realtime' (cough), in that the main execution of a program should be free of infinite cycles and instead based on step-by-step advances (hopefully with time bounded by the underlying driver) and fuel-based methods, as well as providing ahead-of-time estimates and bounds on our resource usage. While we're not close to this yet, it is at least part of the API reasoning, and it's worthy of a bug report if some system actively makes it impossible.

How it works

This project never had the goals of being a cairo alternative. Learning from graphics interfaces, we rely on a declarative and ahead-of-time specification of your operations pipeline. We also assume that any resources (except temporary inputs and outputs byte buffers) are owned by a library object. This has the advantage that we might reuse memory, have intermediate results that are never backed by CPU accessible memory, may change certain layouts and sampling descriptors on the fly, and can plan execution steps and resource utilization transparently.
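To illustrate the shape of such an interface (with entirely hypothetical names, not the crate's actual API), here is a toy model of the record-then-compile pattern: operations are declared up front, so the compilation step sees the whole pipeline before anything executes and can plan resources accordingly:

```rust
// Hypothetical names; a toy model of declaring a pipeline ahead of
// time and only then validating/planning it, before any execution.
#[derive(Debug, Clone)]
enum Command {
    Input { register: usize },
    Blur { src: usize, dst: usize, radius: u32 },
    Output { register: usize },
}

struct Commands(Vec<Command>);

impl Commands {
    fn new() -> Self {
        Commands(Vec::new())
    }

    fn push(&mut self, cmd: Command) {
        self.0.push(cmd);
    }

    /// "Compile": with the full command list known, we can validate
    /// the whole pipeline before a single pixel is touched.
    fn compile(self) -> Result<Program, String> {
        let outputs = self
            .0
            .iter()
            .filter(|c| matches!(c, Command::Output { .. }))
            .count();
        if outputs == 0 {
            return Err("pipeline produces no output".into());
        }
        Ok(Program { steps: self.0 })
    }
}

struct Program {
    steps: Vec<Command>,
}

fn main() {
    let mut commands = Commands::new();
    commands.push(Command::Input { register: 0 });
    commands.push(Command::Blur { src: 0, dst: 1, radius: 2 });
    commands.push(Command::Output { register: 1 });
    let program = commands.compile().expect("valid pipeline");
    assert_eq!(program.steps.len(), 3);
    // An empty pipeline is rejected at compile time, not at run time.
    assert!(Commands::new().compile().is_err());
    println!("compiled {} steps", program.steps.len());
}
```

The point of the pattern is that errors and resource planning happen once, at compile time, rather than being rediscovered on every frame.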

What it is not

¹Such as ImageMagick, whose enormous mix of decoding/encoding/computing tasks and half-baked HW acceleration the author personally views as a crime against color science accuracy (if your composition library starts out with 'gamma correction' as an available color operation, you have no idea what you're doing), resource-efficient computation, web-server security, and several software principles (though not against doing a decent job at doing tons of stuff).

In particular, there will be no IO done by this library. Unless we get an OS-agnostic disk/memory-to-GPU-memory API, in which case we might relax this to perform some limited amount of pre-declared decoding work in compute shaders, if this leads to overwhelming efficiency gains in terms of saved CPU cycles or parallelism. Even then, we will require the caller to very strictly set up the transfer channel itself, such as the binding of file descriptors, and we will avoid any operation requiring direct permission checks (i.e. use of our process personality and credentials).

Future ideas

I'd be really grateful if you picked up any of the below. Feel free to grab one that tickles your interest. (And see 'Project goals' for less concrete tasks.)

Cool stuff with WASM as computation

ImageMagick offers generic formula application through a custom language. I don't want to do this, but I see the purpose of an image transformation language that is not specific to any API. Why not use WASM for this? It would allow writing and compiling such code in any other (runtime-free) language first. If you feel like inventing such a language and writing a compiler from it to SPIR-V, feel free to build it and open a PR.

Similarity measures

Edge detection

Noise constructors

Just about any. Keep in mind to use a deterministic method in order to stay reproducible, though. Ever wanted to create megabytes of pseudo-randomness with a single fragment shader call?
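For instance, any good integer hash makes a deterministic white-noise source, since the same (seed, coordinate) pair always yields the same value. A sketch of the PCG-style hash commonly used for GPU noise, shown here in Rust although the real thing would live in a fragment shader:

```rust
/// PCG-style integer hash, a common choice for shader-side noise;
/// fully deterministic for a given input state.
fn pcg_hash(mut state: u32) -> u32 {
    state = state.wrapping_mul(747796405).wrapping_add(2891336453);
    let word = ((state >> ((state >> 28) + 4)) ^ state).wrapping_mul(277803737);
    (word >> 22) ^ word
}

/// White noise in [0, 1) for a pixel coordinate and seed.
fn noise(seed: u32, x: u32, y: u32) -> f32 {
    let h = pcg_hash(seed ^ pcg_hash(x ^ pcg_hash(y)));
    // Keep 24 bits so the value is exactly representable in f32.
    (h >> 8) as f32 / (1u32 << 24) as f32
}

fn main() {
    // Deterministic: the same inputs always reproduce the same texel.
    assert_eq!(noise(42, 10, 20), noise(42, 10, 20));
    // Values stay in [0, 1).
    let v = noise(7, 3, 5);
    assert!((0.0..1.0).contains(&v));
    println!("ok");
}
```

Because every texel is a pure function of (seed, x, y), the same shader invocation reproduces the same megabytes of noise on every run, which is what the CRC-based test suite needs.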

Contributors

heroickatora, worldsender
Issues

Output and input a texture without moving it to host memory

The Pool's images have already been written with the capability of administering GPU-owned textures and buffers in mind, but there is no user-accessible interface to construct those yet. It's also unclear how and when we need to ensure that the device owning such buffers is the same as the one launching a program, and what we can do in case we would need to transfer them. There also needs to be a separate Output operation, distinguished from the existing one in that one can neither save the image to disk nor readily retrieve a host buffer, as doing so would require a command queue call.

This might pave the path for supporting GPU-only (DRM-encumbered) textures, but I'm not sure whether I like that, or whether the overhead of handling the additional flags is worth it.

The output function ignores its argument

It only supports one actual argument now, which was good enough for testing but does not behave as documented. It should instead keep the map between output registers and internal buffers and return the correct buffer.

Allow restarting of a compiled Execution

One of the initial goals was, in particular, an efficient pipeline by performing some initialization and compilation optimization ahead of time. As part of this we should allow restarting a compiled Execution with renewed inputs, skipping the last compilation step. This is specifically meant to target batch processing/post processing pipelines where all input descriptors are preserved. Due to the design it does not work if any of the sizes change.

Support planar binary image layouts

Support planar layouts as input/output. It's explicitly not the goal to support them as the internal representation. Currently, each texture can be represented with up to five buffers/textures on the GPU side, with possible data transfers as indicated below.

inp_buffer (host mappable)
  |*
  v
quantized_buffer <---> staging texture <---> linear access texture
  |*
  v
out_buffer (host mappable)

Each of the textures on the way performs some form of normalization. The copy between inp_buffer and quantized_buffer (and respectively the other way to out_buffer) will currently always involve a copy_buffer_to_buffer command. However, we could add another intermediate buffer and a compute shader to the two operations marked with (*). (Note: if we have the MAPPABLE_PRIMARY_BUFFERS feature, this isn't necessary, as we can mark the inp/out buffers attachable while using them for their primary purpose of host IO. In those cases we might also have elided the original input and output buffers, because we similarly don't need them for transferring simple textures.)

This compute shader would have the primary purpose of normalizing any planar texture to a block-based rectangular texture. This implies that each supported planar layout must have an equivalent block layout where the only difference is the order of texel components in memory, with the exact same bit width per component. This restriction doesn't seem too bad. For byte-sized components such a layout should almost always exist anyway, and for layouts where one particular plane contains very compressed parts (e.g. 1-bit alpha or 1-bit b/w), we could also introduce additional block layouts with a small number of uninterpreted bits. The architecture comfortably permits such uint32-encoded blocks to be added to stage.frag without too much hassle.
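On the CPU side, the normalization in question is just a component permutation. A sketch for planar RGB with byte components (the actual conversion would happen in the compute shader above; the function name is illustrative):

```rust
/// Interleave planar R, G, B planes into packed RGB texels.
/// Per the restriction above, the block layout has the same
/// component widths as the planar one; only the order changes.
fn planar_to_interleaved(r: &[u8], g: &[u8], b: &[u8]) -> Vec<u8> {
    assert_eq!(r.len(), g.len());
    assert_eq!(g.len(), b.len());
    let mut out = Vec::with_capacity(r.len() * 3);
    for i in 0..r.len() {
        out.extend_from_slice(&[r[i], g[i], b[i]]);
    }
    out
}

fn main() {
    // Two pixels: (1,2,3) and (4,5,6), stored as three planes.
    let r = [1, 4];
    let g = [2, 5];
    let b = [3, 6];
    assert_eq!(planar_to_interleaved(&r, &g, &b), vec![1, 2, 3, 4, 5, 6]);
    println!("ok");
}
```

Because only the order of components changes, the transformation is trivially invertible, which is what lets the same shader serve both the input and the output direction.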

Reuse of buffers and textures

In the first stage of compilation (Commands to Program) we perform lifetime analysis on registers. We could use this information to determine a set of buffers and textures that may be reused by later operations, avoiding the overhead of allocating them separately.
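As a sketch of the idea (with made-up structures, not the actual Program representation): compute each register's last use, then hand freed slots to registers defined later:

```rust
use std::collections::HashMap;

/// Toy register-reuse pass: given, per step, the registers read and
/// the register written, assign each register a buffer slot,
/// recycling slots whose register is past its last use.
fn assign_slots(steps: &[(Vec<usize>, usize)]) -> HashMap<usize, usize> {
    // Last step index at which each register is read.
    let mut last_use = HashMap::new();
    for (i, (reads, _)) in steps.iter().enumerate() {
        for &r in reads {
            last_use.insert(r, i);
        }
    }

    let mut slot_of = HashMap::new();
    let mut free = Vec::new();
    let mut next_slot = 0;
    for (i, (reads, write)) in steps.iter().enumerate() {
        // Assign the written register its slot first, so a step never
        // writes in place over one of its own inputs.
        let slot = free.pop().unwrap_or_else(|| {
            next_slot += 1;
            next_slot - 1
        });
        slot_of.insert(*write, slot);
        // Registers read for the last time by this step free their slot.
        for &r in reads {
            if last_use[&r] == i {
                free.push(slot_of[&r]);
            }
        }
    }
    slot_of
}

fn main() {
    // r0 -> r1 -> r2: r0 dies after step 1, so r2 can reuse its slot.
    let steps = vec![
        (vec![], 0),  // produce r0
        (vec![0], 1), // read r0 (last use), produce r1
        (vec![1], 2), // read r1, produce r2
    ];
    let slots = assign_slots(&steps);
    assert_eq!(slots[&2], slots[&0]);
    println!("ok");
}
```

Real lifetime analysis also has to respect differing sizes, formats, and usage flags of the underlying buffers, which this toy ignores.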

Probably broken wasm execution

stealth-paint/src/program.rs

Lines 1121 to 1183 in 6ce2e54

pub(crate) fn block_on<F, T>(future: F, device: Option<&wgpu::Device>) -> T
where
    F: Future<Output = T> + 'static,
    T: 'static,
{
    #[cfg(target_arch = "wasm32")]
    {
        use core::cell::RefCell;
        use std::rc::Rc;

        async fn the_thing<T: 'static, F: Future<Output = T> + 'static>(
            future: F,
            buffer: Rc<RefCell<Option<T>>>,
        ) {
            let result = future.await;
            *buffer.borrow_mut() = Some(result);
        }

        let result = Rc::new(RefCell::new(None));
        let mover = Rc::clone(&result);
        wasm_bindgen_futures::spawn_local(the_thing(future, mover));

        match Rc::try_unwrap(result) {
            Ok(cell) => match cell.into_inner() {
                Some(result) => result,
                None => unreachable!("In this case we shouldn't have returned here"),
            },
            _ => unreachable!("There should be no reference to mover left"),
        }
    }

    #[cfg(not(target_arch = "wasm32"))]
    {
        if let Some(device) = device {
            // We have to manually poll the device. That is, we ensure that it keeps being polled
            // and each time will also poll the device. This isn't super efficient but a dirty way
            // to actually finish this future.
            struct DevicePolled<'dev, F> {
                future: F,
                device: &'dev wgpu::Device,
            }

            impl<F: Future> Future for DevicePolled<'_, F> {
                type Output = F::Output;
                fn poll(
                    self: core::pin::Pin<&mut Self>,
                    ctx: &mut core::task::Context,
                ) -> core::task::Poll<F::Output> {
                    self.as_ref().device.poll(wgpu::Maintain::Poll);
                    // Ugh, noooo...
                    ctx.waker().wake_by_ref();
                    let future = unsafe { self.map_unchecked_mut(|this| &mut this.future) };
                    future.poll(ctx)
                }
            }

            async_io::block_on(DevicePolled { future, device })
        } else {
            async_io::block_on(future)
        }
    }
}

According to its source, spawn_local returns immediately and we can't wait for its completion, so the result cell will still be shared (and empty) when we try to unwrap it.
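For contrast, the primitive that block_on fundamentally needs, parking the current thread until the waker fires, can be sketched with the standard library alone; it is exactly this blocking wait that is unavailable on the browser's main thread:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-threaded block_on: poll once, then park the thread
/// until the waker fires. On wasm32 there is no thread to park, which
/// is why no equivalent of this function can exist there.
fn block_on<F: Future>(future: F) -> F::Output {
    let mut future = pin!(future);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut ctx = Context::from_waker(&waker);
    loop {
        match future.as_mut().poll(&mut ctx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let answer = block_on(async { 40 + 2 });
    assert_eq!(answer, 42);
    println!("{answer}");
}
```

This is a sketch, not the crate's implementation (which additionally interleaves wgpu device polling, as shown in the snippet above).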
