vivekpanyam / carton
Run any ML model from any programming language.
Home Page: https://carton.run
License: Apache License 2.0
Looks like a very interesting project, especially if it ends up with bindings for most popular programming languages.
I saw on the site that you're planning to add C# bindings. I've never built a project binding Rust to C#, but I'd love to give it a shot.
I cannot seem to run any of the examples shown on https://carton.pub/.
For example, I tried bert-base-uncased:
import cartonml as carton

# A permalink to the model
MODEL_URL = "https://carton.pub/google-research/bert-base-uncased/5f26d87c5d82b7c37ebf92fcb38788a063d49a64cfcf1f9d118b3b710bb88005"

async def main():
    # Load the model
    model = await carton.load(MODEL_URL)

    # Set up inputs
    inputs = {
        "input": await model.info.examples[0].inputs["input"].get(),
    }

    # Run the model
    results = await model.infer(inputs)

    # Print the results
    print(results)

    # Print the expected results
    print({
        "tokens": await model.info.examples[0].sample_out["tokens"].get(),
        "scores": await model.info.examples[0].sample_out["scores"].get(),
    })

import asyncio
asyncio.run(main())
And the result I get is:
(carton-tests) ➜ carton-tests python main.py
Request: 0 67584
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("invalid value: integer `1281`, expected variant index 0 <= i < 6")', /app/source/carton-runner-interface/src/do_not_modify/framed.rs:41:59
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
I also tried distilbert-base-cased-distilled-squad, and it gave the same error.
Maybe I'm doing something wrong or missing some dependencies, but I haven't been able to figure it out yet.
Please vote in the poll in #157!
Maybe also make Handle serialize differently based on the size of the tensor: shared memory for large tensors, inline serialization for small ones. Making Handle a variant would let us choose dynamically.
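The size-based split could look roughly like the following Python sketch. Everything here (the threshold value, the InlineHandle/ShmHandle names, the shared-memory transport) is a hypothetical stand-in for the Rust Handle type, not carton's actual implementation:

```python
from dataclasses import dataclass
from multiprocessing import shared_memory

# Hypothetical cutoff: tensors at or below this size are serialized inline.
INLINE_THRESHOLD = 1024  # bytes

@dataclass
class InlineHandle:
    """Tensor bytes serialized directly into the message."""
    payload: bytes

@dataclass
class ShmHandle:
    """Reference to a shared-memory segment holding the tensor bytes."""
    shm_name: str
    size: int

def make_handle(tensor_bytes: bytes):
    """Choose a transport variant based on tensor size."""
    if len(tensor_bytes) <= INLINE_THRESHOLD:
        return InlineHandle(payload=tensor_bytes)
    # Large tensor: copy into shared memory and send only a reference.
    shm = shared_memory.SharedMemory(create=True, size=len(tensor_bytes))
    shm.buf[: len(tensor_bytes)] = tensor_bytes
    return ShmHandle(shm_name=shm.name, size=len(tensor_bytes))
```

The receiving side would match on the variant: read the inline payload directly, or attach to the named segment for the shared-memory case.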
I'm actively working on C++ bindings and will update this issue once PRs are ready.
Feel free to subscribe to this issue to be notified of progress.
While I am working on finishing up #173, I drafted up some stuff here which allows us to wrap the user's infer implementation with custom logic. I'm not sure yet whether it would actually work. For the user it would look like:
use carton_wasm_interface::infer;
#[infer]
fn infer(in_: Vec<(String, Tensor)>) -> Vec<(String, Tensor)> {
    // ...
}
The reason this workaround is needed is that the .wit interface needs to be implemented in the same place the bindings are generated. Now we can easily implement things like conversions for candle and returning a pointer (and managing its lifetime).
Let me know if you have any thoughts! I'm hoping it makes developing wasm models more ergonomic.
Adding support for Ludwig seems like it should be fairly straightforward. Ludwig supports export to TorchScript and we already have a TorchScript runner.
Specifically, it would probably make sense to create something similar to the export_neuropod utility in Ludwig.
This would involve:
This can be done entirely in Python, so if you want to contribute but aren't familiar with Rust, this is a good option!
In some cases, worker threads panic when we try to send log messages to the main process during runner shutdown.
This doesn't really have a practical impact (as it's contained to the runner process and only happens after communication with the main process stops), but it can make stdout/stderr confusing because it looks like something important broke.
How easy would it be to add support for Rust-based libraries like Candle and Burn? I'd like to implement this if you aren't already working on it. I'd also appreciate your thoughts on whether this integration is even necessary or useful, since both libraries let you compile everything down. Maybe it would make more sense to instead create runners for the artifacts those libraries can produce, like binaries, WASM modules, and executables.
[dependencies]
carton = "0.1.0"
$ cargo run
Updating crates.io index
error: no matching package named `carton_window` found
location searched: registry `crates-io`
required by package `carton v0.1.0`
... which satisfies dependency `carton = "^0.1.0"` of package ...
It seems like some logs below INFO don't go to the Python logging system (or aren't appearing on the Python end for some reason) even though others do. Dig into this more.
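One possible (unconfirmed) culprit is level handling on the Python side: Python's logging defaults to an effective level of WARNING, so forwarded DEBUG/TRACE records are silently dropped unless the logger is configured. The sketch below shows what a level bridge might look like; forward_record and the mapping table are hypothetical, not carton's actual bridge code:

```python
import logging

# Sketch: map Rust `log` crate levels to Python logging levels.
# Python has no TRACE level, so trace folds into DEBUG.
RUST_TO_PY = {
    "error": logging.ERROR,
    "warn": logging.WARNING,
    "info": logging.INFO,
    "debug": logging.DEBUG,
    "trace": logging.DEBUG,
}

def forward_record(logger: logging.Logger, level: str, target: str, msg: str) -> bool:
    """Forward a runner log record to Python logging.

    Returns True if the logger's effective level would actually emit it;
    with the default WARNING level, debug/trace records vanish silently.
    """
    py_level = RUST_TO_PY.get(level, logging.INFO)
    logger.log(py_level, "[%s] %s", target, msg)
    return logger.isEnabledFor(py_level)
```

If this is the cause, the records are reaching Python but being filtered, and the fix is on the configuration side rather than in the transport.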
XLA is an ML "compiler for GPUs, CPUs, and ML accelerators."
Carton support for XLA would primarily be used to provide JAX support, but in theory it could also support some PyTorch and TensorFlow models.
Here's a guide on how to export a JAX model from Python and run it from C++ using XLA: google/jax#5337 (comment).
This is an example of the above in the JAX codebase.
I've explored doing this in the past (outside of Carton), but there weren't XLA prebuilt binaries available and it required building from source in the TensorFlow repo. Now, with OpenXLA and prebuilt binaries, this is a lot easier.
@LaurentMazare created rust bindings to XLA that include a straightforward example of loading the HLO IR generated by the JAX export code. That should make it fairly easy to prototype an integration with Carton if anyone is interested in doing so.
Concretely, this could be implemented as follows:
Write an export utility that does what jax_to_ir.py does and calls jax.xla_computation
Recommended reading:
Add buffering to uses of tokio::io::copy that aren't yet using it. Also explore places where we're using slowlog to make sure we're not missing buffering somewhere.
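As a rough analogy for why the buffering matters, here is a chunked copy loop in Python; buffered_copy is an illustrative sketch, not carton code, and the chunk size is arbitrary:

```python
def buffered_copy(src, dst, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst through fixed-size chunks rather than many tiny
    reads/writes, returning the number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        total += len(chunk)
    return total
```

The same principle applies to tokio::io::copy: wrapping the endpoints in buffered readers/writers amortizes the per-call overhead.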
This makes the first-run experience better
This lets us handle positional args for frameworks that support it
There are many different ways of running an ONNX model from Rust:

- tract: "Tiny, no-nonsense, self-contained, Tensorflow and ONNX inference".
- wonnx: "A WebGPU-accelerated ONNX inference run-time written 100% in Rust, ready for native and the web". Notes: wgpu supports Vulkan and there are software implementations of it (e.g. SwiftShader), but not sure how plug-and-play it is.
- ort: "A Rust wrapper for ONNX Runtime".

If we're going to have one "official" ONNX runner, it should probably use ort. Unfortunately, since ort doesn't have WASM support, we need another solution for running from WASM environments. This could be:

- ort on desktop, tract on WASM without GPU, and wonnx on WASM with GPU. This seems like a complex solution, especially because they don't all support the same set of ONNX operators.
- tract everywhere, but without GPU support.
- wonnx everywhere, but require GPU/WebGPU.

@kali @pixelspark @decahedron1 If you get a chance, I'd really appreciate any thoughts you have on the above. Thank you!
Most of the code in language bindings just does type conversion. If you squint a bit, this fits into serde's definition of serialization/deserialization.
You could implement a serde::Serializer that "serializes" things into PyO3/Neon objects and a serde::Deserializer that does the opposite.
People have done this and implemented things like neon-serde and pythonize. This would significantly simplify boilerplate code in the language bindings, make things easier to maintain, and also make it easier to add support for new languages.
Actually use this field when loading a model:
carton/docs/specification/format.md
Line 52 in 8005f38
{"tokens": String([["day"]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2), "scores": Float([[14.551311]], shape=[1, 1], strides=[1, 1], layout=CFcf (0xf), dynamic ndim=2)}
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', /app/source/carton-runner-interface/src/do_not_modify/framed.rs:71:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()` on an `Err` value: SendError(RPCResponse { id: 0, complete: true, data: LogMessage { record: LogRecord { metadata: LogMetadata { level: Trace, target: "mio::poll" }, args: "deregistering event source from poller", module_path: Some("mio::poll"), file: Some("/root/.cargo/registry/src/index.crates.io-6f17d22bba15001f/mio-0.8.5/src/poll.rs"), line: Some(663) } } })', /app/source/carton-runner-interface/src/server.rs:299:36
Any hint?
The initial implementation of the C bindings is in #169
Feel free to subscribe to this issue/any of the PRs above to be notified of progress.
@arthurmelton is working on OCaml bindings in #166
Feel free to subscribe to this issue to be notified of progress.
The zip file library we previously used during packing required complete files to be available before they could be stored, which forced us to load large (possibly multi-GB) files into memory.
This is no longer required. The following two places within the packing code can be refactored to read, compute sha256, and store files in a streaming/incremental fashion:
carton/source/carton/src/format/v1/save.rs
Lines 259 to 279 in 33cc183
carton/source/carton/src/format/v1/save.rs
Lines 354 to 390 in 33cc183
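The streaming approach can be illustrated with Python's zipfile, which supports incremental writes; add_file_streaming is a hypothetical sketch of the technique, while the real change belongs in the Rust packing code referenced above:

```python
import hashlib
import zipfile

CHUNK = 64 * 1024

def add_file_streaming(zf: zipfile.ZipFile, arcname: str, src) -> str:
    """Stream `src` into the open archive chunk by chunk, updating the
    sha256 as we go, so the full file never has to sit in memory.
    Returns the hex digest."""
    hasher = hashlib.sha256()
    with zf.open(arcname, "w") as dst:  # streaming write (Python >= 3.6)
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            hasher.update(chunk)
            dst.write(chunk)
    return hasher.hexdigest()
```

A single pass over the file produces both the archive entry and its hash, which is exactly the read/hash/store fusion the refactor describes.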
Now that Carton is open source (and the websites are public), we can remove this special case for carton.pub:
carton/source/carton/src/http.rs
Lines 88 to 93 in 33cc183
That change lets us remove this test as well:
carton/source/carton/src/carton.rs
Lines 353 to 369 in 33cc183
If you're looking for a quick task to get started with the codebase, this is a good option!
When trying the quickstart with Rust, I got several compilation errors:
error[E0433]: failed to resolve: use of undeclared crate or module `ndarray`
--> src/main.rs:14:15
|
14 | let arr = ndarray::ArrayD::from_shape_vec(
| ^^^^^^^ use of undeclared crate or module `ndarray`
error[E0433]: failed to resolve: use of undeclared type `Tensor`
--> src/main.rs:22:37
|
22 | .infer([("input_sequences", Tensor::<GenericStorage>::String(arr))])
| ^^^^^^ use of undeclared type `Tensor`
|
help: consider importing this enum
|
1 + use carton::types::Tensor;
|
error[E0412]: cannot find type `GenericStorage` in this scope
--> src/main.rs:22:46
|
22 | .infer([("input_sequences", Tensor::<GenericStorage>::String(arr))])
| ^^^^^^^^^^^^^^ not found in this scope
|
help: consider importing this struct
|
1 + use carton::types::GenericStorage;
|
error[E0433]: failed to resolve: use of undeclared crate or module `ndarray`
--> src/main.rs:15:9
|
15 | ndarray::IxDyn(&[1]),
| ^^^^^^^ use of undeclared crate or module `ndarray`
error[E0752]: `main` function is not allowed to be `async`
--> src/main.rs:4:1
|
4 | async fn main() {
| ^^^^^^^^^^^^^^^ `main` function is not allowed to be `async`
Some errors have detailed explanations: E0412, E0433, E0752.
For more information about an error, try `rustc --explain E0412`.
error: could not compile `carton-test2` (bin "carton-test2") due to 5 previous errors
To resolve this, I had to add a couple of crates:
cargo add -F macros,rt-multi-thread tokio
cargo add ndarray
I also needed to make a few small code changes:
@@ -1,6 +1,9 @@
use carton::Carton;
+use carton::types::GenericStorage;
use carton::types::LoadOpts;
+use carton::types::Tensor;
+#[tokio::main]
async fn main() {
// Load the model
let model = Carton::load(
e.g. a standard definition for a model that does translation, one for summarization, one for image infill, etc.
This way, models can be drop in replacements of each other (at least on the inference path; loading may require different options).
Versioning for definitions? Do the standard definitions have to be "special" (i.e. specially handled in the library or the registry website) or can we handle any definitions?
Maybe each task in the public registry has a "standard interface" that models can adopt
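One way to prototype such a task-level interface is a structural protocol: any model exposing the right method shape counts as implementing the task, with no registration needed. The names below (TranslationModel, MyTranslator) are hypothetical illustrations, not an actual carton API:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class TranslationModel(Protocol):
    """Hypothetical 'standard interface' for the translation task: any
    model exposing infer() with this shape is a drop-in replacement on
    the inference path."""

    def infer(self, inputs: dict) -> dict: ...

class MyTranslator:
    """Toy model satisfying the interface (uppercasing stands in for an
    actual translation model)."""

    def infer(self, inputs: dict) -> dict:
        return {"translated": inputs["text"].upper()}
```

Because the check is structural, any conforming model from the registry could be swapped in without inheriting from a special base class; loading options could still differ per model, as noted above.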
When building on a 2 vCPU instance (both arm and x86) using buildkite and agents on AWS, build and test takes ~55 min even with sccache.
This is much slower than GH actions builds were. Explicitly caching the target dir and some of .cargo might make it a lot faster, but it would be simpler if just sccache worked.
As a workaround, linux CI now runs on 32 vCPU instances. This isn't ideal, but it's good enough for now.
A good first step to improve this might be to build with --timings in CI and explore from there.