vivekpanyam / carton
Run any ML model from any programming language.
Home Page: https://carton.run
License: Apache License 2.0
Looks like a very interesting project, especially if it ends up with bindings for most popular programming languages.
I saw on the site that you're planning to add C# bindings. I've never built a project binding Rust to C#, but I'd love to give it a shot.
I cannot seem to run any of the examples shown on https://carton.pub/.
For example, I tried bert-base-uncased:
import cartonml as carton

# A permalink to the model
MODEL_URL = "https://carton.pub/google-research/bert-base-uncased/5f26d87c5d82b7c37ebf92fcb38788a063d49a64cfcf1f9d118b3b710bb88005"

async def main():
    # Load the model
    model = await carton.load(MODEL_URL)

    # Set up inputs
    inputs = {
        "input": await model.info.examples[0].inputs["input"].get(),
    }

    # Run the model
    results = await model.infer(inputs)

    # Print the results
    print(results)

    # Print the expected results
    print({
        "tokens": await model.info.examples[0].sample_out["tokens"].get(),
        "scores": await model.info.examples[0].sample_out["scores"].get(),
    })

import asyncio
asyncio.run(main())
And the result I get is:
(carton-tests) ➜ carton-tests python main.py
Request: 0 67584
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("invalid value: integer `1281`, expected variant index 0 <= i < 6")', /app/source/carton-runner-interface/src/do_not_modify/framed.rs:41:59
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I also tried distilbert-base-cased-distilled-squad, and it gave the same error.
Maybe I'm doing something wrong or missing some dependencies, but I haven't been able to figure it out yet.
Please vote in the poll in #157!
Maybe also make Handle serialize differently based on the size of the tensor: shared memory for large tensors, inline serialization for small ones. Making Handle a variant would let us choose dynamically.
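The size-based split could look roughly like the following Python sketch. Everything here (the threshold value, the InlineHandle/ShmHandle names, the shared-memory transport) is a hypothetical stand-in for the Rust Handle type, not carton's actual implementation:

```python
from dataclasses import dataclass
from multiprocessing import shared_memory

# Hypothetical cutoff: tensors at or below this size are serialized inline.
INLINE_THRESHOLD = 1024  # bytes

@dataclass
class InlineHandle:
    """Tensor bytes serialized directly into the message."""
    payload: bytes

@dataclass
class ShmHandle:
    """Reference to a shared-memory segment holding the tensor bytes."""
    shm_name: str
    size: int

def make_handle(tensor_bytes: bytes):
    """Choose a transport variant based on tensor size."""
    if len(tensor_bytes) <= INLINE_THRESHOLD:
        return InlineHandle(payload=tensor_bytes)
    # Large tensor: copy into shared memory and send only a reference.
    shm = shared_memory.SharedMemory(create=True, size=len(tensor_bytes))
    shm.buf[: len(tensor_bytes)] = tensor_bytes
    return ShmHandle(shm_name=shm.name, size=len(tensor_bytes))
```

The receiving side would match on the variant: read the inline payload directly, or attach to the named segment for the shared-memory case.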
I'm actively working on C++ bindings and will update this issue once PRs are ready.
Feel free to subscribe to this issue to be notified of progress.
While I am working on finishing up #173, I drafted up some stuff here which allows us to wrap the user's infer implementation with custom logic. I'm not sure yet whether it would actually work. For the user it would look like:
use carton_wasm_interface::infer;
#[infer]
fn infer(in_: Vec<(String, Tensor)>) -> Vec<(String, Tensor)> {
    // ...
}
The reason this workaround is needed is that the .wit interface needs to be implemented in the same place the bindings are generated. Now we can easily implement things like conversions for candle and returning a pointer (and managing its lifetime).
Let me know if you have any thoughts! I'm hoping it makes developing wasm models more ergonomic.
Adding support for Ludwig seems like it should be fairly straightforward. Ludwig supports export to TorchScript and we already have a TorchScript runner.
Specifically, it would probably make sense to create something similar to the export_neuropod utility in Ludwig.
This would involve:
This can be done entirely in Python, so if you want to contribute but aren't familiar with Rust, this is a good option!
In some cases, worker threads panic when we try to send log messages to the main process during runner shutdown.
This doesn't really have a practical impact (as it's contained to the runner process and only happens after communication with the main process stops), but it can make stdout/stderr confusing because it looks like something important broke.
How easy would it be to add support for Rust-based libraries like Candle and Burn? I'd like to implement this if you aren't already working on it. I'd also appreciate your thoughts on whether this integration is even necessary or useful, since both libraries let you compile everything down. Maybe it would make more sense to instead create runners for the artifacts those libraries can produce, like binaries, WASM modules, and executables.
[dependencies]
carton = "0.1.0"
$ cargo run
Updating crates.io index
error: no matching package named `carton_window` found
location searched: registry `crates-io`
required by package `carton v0.1.0`
... which satisfies dependency `carton = "^0.1.0"` of package ...
It seems like some logs below INFO don't go to the Python logging system (or aren't appearing on the Python end for some reason) even though others do. Dig into this more.
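One possible (unconfirmed) culprit is level handling on the Python side: Python's logging defaults to an effective level of WARNING, so forwarded DEBUG/TRACE records are silently dropped unless the logger is configured. The sketch below shows what a level bridge might look like; forward_record and the mapping table are hypothetical, not carton's actual bridge code:

```python
import logging

# Sketch: map Rust `log` crate levels to Python logging levels.
# Python has no TRACE level, so trace folds into DEBUG.
RUST_TO_PY = {
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
    "trace": logging.DEBUG,
}

def forward_record(logger: logging.Logger, level: str, target: str, msg: str) -> bool:
    """Forward a runner log record to Python logging.

    Returns True if the logger's effective level would actually emit it;
    with the default WARNING level, debug/trace records vanish silently.
    """
    py_level = RUST_TO_PY.get(level, logging.INFO)
    logger.log(py_level, "[%s] %s", target, msg)
    return logger.isEnabledFor(py_level)
```

If this is the cause, the records are reaching Python but being filtered, and the fix is on the configuration side rather than in the transport.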
XLA is an ML "compiler for GPUs, CPUs, and ML accelerators."
Carton support for XLA would primarily be used to provide JAX support, but in theory it could also support some PyTorch and TensorFlow models.
Here's a guide on how to export a JAX model from Python and run it from C++ using XLA: google/jax#5337 (comment).
This is an example of the above in the JAX codebase.
I've explored doing this in the past (outside of Carton), but there weren't XLA prebuilt binaries available and it required building from source in the TensorFlow repo. Now, with OpenXLA and prebuilt binaries, this is a lot easier.
@LaurentMazare created rust bindings to XLA that include a straightforward example of loading the HLO IR generated by the JAX export code. That should make it fairly easy to prototype an integration with Carton if anyone is interested in doing so.
Concretely, this could be implemented as follows:
Write an export utility that does what jax_to_ir.py does and calls jax.xla_computation
Recommended reading:
Add buffering to uses of tokio::io::copy that aren't yet using it. Also explore places where we're using slowlog to make sure we're not missing buffering somewhere.
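As a rough analogy for why the buffering matters, here is a chunked copy loop in Python; buffered_copy is an illustrative sketch, not carton code, and the chunk size is arbitrary:

```python
def buffered_copy(src, dst, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst through fixed-size chunks rather than many tiny
    reads/writes, returning the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The same principle applies to tokio::io::copy: wrapping the endpoints in buffered readers/writers amortizes the per-call overhead.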
This makes the first-run experience better
This lets us handle positional args for frameworks that support it
There are many different ways of running an ONNX model from Rust:

- tract: "Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference".
- wonnx: "A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web". Notes: wgpu supports Vulkan and there are software implementations of it (e.g. SwiftShader), but not sure how plug-and-play it is.
- ort: "A Rust wrapper for ONNX Runtime".

If we're going to have one "official" ONNX runner, it should probably use ort. Unfortunately, since ort doesn't have WASM support, we need another solution for running from WASM environments. This could be:

- ort on desktop, tract on WASM without GPU, and wonnx on WASM with GPU. This seems like a complex solution, especially because they don't all support the same set of ONNX operators.
- tract everywhere, but without GPU support.
- wonnx everywhere, but require GPU/WebGPU.

@kali @pixelspark @decahedron1 If you get a chance, I'd really appreciate any thoughts you have on the above. Thank you!
Most of the code in language bindings just does type conversion. If you squint a bit, this fits into serde's definition of serialization/deserialization.
You could implement a serde::Serializer that "serializes" things into PyO3/Neon objects and a serde::Deserializer that does the opposite.
People have done this and implemented things like neon-serde and pythonize. This would significantly simplify boilerplate code in the language bindings, make things easier to maintain, and also make it easier to add support for new languages.
Actually use this field when loading a model:
carton/docs/specification/format.md
Line 52 in 8005f38
{"tokens": String([["day"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2), "scores": Float([[14.551311]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2)}
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', /app/source/carton-runner-interface/src/do_not_modify/framed.rs:71:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: SendError(RPCResponse { id: 0, complete: true, data: LogMessage { record: LogRecord { metadata: LogMetadata { level: Trace, target: "mio::poll" }, args: "deregistering event source from poller", module_path: Some("mio::poll"), file: Some("/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-0.8.5/src/poll.rs"), line: Some(663) } } })', /app/source/carton-runner-interface/src/server.rs:299:36
Any hint?
The initial implementation of the C bindings is in #169
Feel free to subscribe to this issue/any of the PRs above to be notified of progress.
@arthurmelton is working on OCaml bindings in #166
Feel free to subscribe to this issue to be notified of progress.
The zip file library we previously used during packing required complete files to be available before they could be stored, which forced us to load large (possibly multi-GB) files into memory.
This is no longer required. The following two places within the packing code can be refactored to read, compute sha256, and store files in a streaming/incremental fashion:
carton/source/carton/src/format/v1/save.rs
Lines 259 to 279 in 33cc183
carton/source/carton/src/format/v1/save.rs
Lines 354 to 390 in 33cc183
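The streaming approach can be illustrated with Python's zipfile, which supports incremental writes; add_file_streaming is a hypothetical sketch of the technique, while the real change belongs in the Rust packing code referenced above:

```python
import hashlib
import zipfile

CHUNK = 64 * 1024

def add_file_streaming(zf: zipfile.ZipFile, arcname: str, src) -> str:
    """Stream `src` into the open archive chunk by chunk, updating the
    sha256 as we go, so the full file never has to sit in memory.
    Returns the hex digest."""
    hasher = hashlib.sha256()
    with zf.open(arcname, "w") as dst:  # streaming write (Python >= 3.6)
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            hasher.update(chunk)
            dst.write(chunk)
    return hasher.hexdigest()
```

A single pass over the file produces both the archive entry and its hash, which is exactly the read/hash/store fusion the refactor describes.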
Now that Carton is open source (and the websites are public), we can remove this special case for carton.pub:
carton/source/carton/src/http.rs
Lines 88 to 93 in 33cc183
That change lets us remove this test as well:
carton/source/carton/src/carton.rs
Lines 353 to 369 in 33cc183
If you're looking for a quick task to get started with the codebase, this is a good option!
When trying the quickstart with Rust, I got several compilation errors:
error[E0433]: failed to resolve: use of undeclared crate or module `ndarray`
--> src/main.rs:14:15
|
14 | let arr = ndarray::ArrayD::from_shape_vec(
| ^^^^^^^ use of undeclared crate or module `ndarray`
error[E0433]: failed to resolve: use of undeclared type `Tensor`
--> src/main.rs:22:37
|
22 | .infer([("input_sequences", Tensor::<GenericStorage>::String(arr))])
| ^^^^^^ use of undeclared type `Tensor`
|
help: consider importing this enum
|
1 + use carton::types::Tensor;
|
error[E0412]: cannot find type `GenericStorage` in this scope
--> src/main.rs:22:46
|
22 | .infer([("input_sequences", Tensor::<GenericStorage>::String(arr))])
| ^^^^^^^^^^^^^^ not found in this scope
|
help: consider importing this struct
|
1 + use carton::types::GenericStorage;
|
error[E0433]: failed to resolve: use of undeclared crate or module `ndarray`
--> src/main.rs:15:9
|
15 | ndarray::IxDyn(&[1]),
| ^^^^^^^ use of undeclared crate or module `ndarray`
error[E0752]: `main` function is not allowed to be `async`
--> src/main.rs:4:1
|
4 | async fn main() {
| ^^^^^^^^^^^^^^^ `main` function is not allowed to be `async`
Some errors have detailed explanations: E0412, E0433, E0752.
For more information about an error, try `rustc --explain E0412`.
error: could not compile `carton-test2` (bin "carton-test2") due to 5 previous errors
To resolve this, I had to add a couple of crates:
cargo add -F macros,rt-multi-thread tokio
cargo add ndarray
I also needed to make a few small code changes:
@@ -1,6 +1,9 @@
use carton::Carton;
+use carton::types::GenericStorage;
use carton::types::LoadOpts;
+use carton::types::Tensor;
+#[tokio::main]
async fn main() {
// Load the model
let model = Carton::load(
e.g. a standard definition for a model that does translation, one for summarization, one for image infill, etc.
This way, models can be drop in replacements of each other (at least on the inference path; loading may require different options).
Versioning for definitions? Do the standard definitions have to be "special" (i.e. specially handled in the library or the registry website) or can we handle any definitions?
Maybe each task in the public registry has a "standard interface" that models can adopt
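One way to prototype such a task-level interface is a structural protocol: any model exposing the right method shape counts as implementing the task, with no registration needed. The names below (TranslationModel, MyTranslator) are hypothetical illustrations, not an actual carton API:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TranslationModel(Protocol):
    """Hypothetical 'standard interface' for the translation task: any
    model exposing infer() with this shape is a drop-in replacement on
    the inference path."""

    def infer(self, inputs: dict) -> dict: ...

class MyTranslator:
    """Toy model satisfying the interface (uppercasing stands in for an
    actual translation model)."""

    def infer(self, inputs: dict) -> dict:
        return {"translated": inputs["text"].upper()}
```

Because the check is structural, any conforming model from the registry could be swapped in without inheriting from a special base class; loading options could still differ per model, as noted above.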
When building on a 2 vCPU instance (both arm and x86) using buildkite and agents on AWS, build and test takes ~55 min even with sccache.
This is much slower than GH actions builds were. Explicitly caching the target dir and some of .cargo might make it a lot faster, but it would be simpler if just sccache worked.
As a workaround, linux CI now runs on 32 vCPU instances. This isn't ideal, but it's good enough for now.
A good first step to improve this might be to build with --timings in CI and explore from there.