Bench I've created this simple benchmark with the MNIST model to a

Broadcasting is consuming a lot of memory in Gorgonnx/Gorgonia about onnx-go HOT 7 CLOSED

owulveryck commented on July 22, 2024

Broadcasting is consuming a lot of memory in Gorgonnx/Gorgonia

from onnx-go.

Comments (7)

owulveryck commented on July 22, 2024

I just ran a quick test by "hacking" the tensor package.
I implemented a "lazy initialization" of the array: the value is only populated on a call to Data(), which does not happen often.
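
To make the idea concrete, here is a minimal sketch of lazy initialization in Go. It is not the actual change made to the tensor package: the lazyArray type, its fields, and the allocated helper are hypothetical names chosen for illustration, and the sketch is not safe for concurrent use.

```go
package main

import "fmt"

// lazyArray is a hypothetical stand-in for a tensor's backing array.
// It records the requested length but defers the allocation until
// the data is actually read.
type lazyArray struct {
	size int
	data []float32 // nil until Data is called for the first time
}

// newLazyArray records the size without allocating the backing slice.
func newLazyArray(size int) *lazyArray {
	return &lazyArray{size: size}
}

// Data allocates the backing slice on first use and returns it.
// Later calls return the same slice.
func (a *lazyArray) Data() []float32 {
	if a.data == nil {
		a.data = make([]float32, a.size)
	}
	return a.data
}

// allocated reports whether the backing slice has been materialized.
func (a *lazyArray) allocated() bool { return a.data != nil }

func main() {
	a := newLazyArray(1 << 20)
	fmt.Println(a.allocated()) // false: nothing allocated yet
	_ = a.Data()
	fmt.Println(a.allocated()) // true: allocated on first access
}
```

Tensors that are created but never read through Data() then cost only the small header, which is consistent with the drop in B/op and allocs/op reported in this comment.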

The results look promising:

Normal bench:

➜  onnx-go git:(benchmarks) ✗  go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out -benchtime=10s
goos: darwin
goarch: amd64
pkg: github.com/owulveryck/onnx-go
BenchmarkUnmarshalBinary-4          2000          10688594 ns/op         3906741 B/op      67107 allocs/op
PASS
ok      github.com/owulveryck/onnx-go   22.620s

bench with the hack:

➜  onnx-go git:(benchmarks) ✗  go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out -benchtime=10s
goos: darwin
goarch: amd64
pkg: github.com/owulveryck/onnx-go
BenchmarkUnmarshalBinary-4          3000           6136003 ns/op         2642474 B/op      27664 allocs/op
PASS
ok      github.com/owulveryck/onnx-go   19.169s

owulveryck commented on July 22, 2024

This commit from the tensor package drastically improves performance and reduces memory consumption.

However, I will keep this issue open for now to investigate the broadcasting mechanism further.

owulveryck commented on July 22, 2024

In Gorgonia, the broadcast mechanism is based on the repeatOp, which itself triggers a call to Repeat(...) in the Dense implementation of the tensor package.
This mechanism calls copyDenseSliced many times inside two nested for loops. A single call to copyDenseSliced creates two new objects:

d := dst.arr().slice(dstart, dend)
s := src.arr().slice(sstart, send)

Across the two loops, we are therefore creating i x j x 2 slice objects.
We could reduce the number of allocations, and the resulting garbage collection, by moving the creation of the s and d slices out of copyDenseSliced.
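
The following sketch illustrates the proposal under simplifying assumptions. It is not the tensor package's code: the view type, newView, reset, repeatNaive, and repeatHoisted are hypothetical names, and the sketch assumes len(src) is a multiple of chunk. The naive version builds two new header objects for every copy, while the hoisted version declares them once and re-points them.

```go
package main

import "fmt"

// view is a hypothetical stand-in for the header object returned by
// arr().slice(...) in the tensor package.
type view struct{ data []float64 }

// newView builds a fresh header for data[start:end] on every call.
func newView(data []float64, start, end int) *view {
	return &view{data: data[start:end]}
}

// reset re-points an existing header instead of creating a new one.
func (v *view) reset(data []float64, start, end int) {
	v.data = data[start:end]
}

// repeatNaive repeats each chunk of src reps times and creates two
// new headers per copy, mirroring the pattern described above.
func repeatNaive(src []float64, chunk, reps int) []float64 {
	dst := make([]float64, len(src)*reps)
	pos := 0
	for i := 0; i < len(src); i += chunk {
		for j := 0; j < reps; j++ {
			d := newView(dst, pos, pos+chunk)
			s := newView(src, i, i+chunk)
			copy(d.data, s.data)
			pos += chunk
		}
	}
	return dst
}

// repeatHoisted produces the same result but declares the two headers
// once, outside the loops, and reuses them for every copy.
func repeatHoisted(src []float64, chunk, reps int) []float64 {
	dst := make([]float64, len(src)*reps)
	var d, s view
	pos := 0
	for i := 0; i < len(src); i += chunk {
		s.reset(src, i, i+chunk)
		for j := 0; j < reps; j++ {
			d.reset(dst, pos, pos+chunk)
			copy(d.data, s.data)
			pos += chunk
		}
	}
	return dst
}

func main() {
	src := []float64{1, 2, 3, 4}
	fmt.Println(repeatNaive(src, 2, 2))   // [1 2 1 2 3 4 3 4]
	fmt.Println(repeatHoisted(src, 2, 2)) // [1 2 1 2 3 4 3 4]
}
```

Whether the per-call headers actually reach the heap depends on the compiler's escape analysis, so any gain should be confirmed with -benchmem rather than assumed.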

owulveryck commented on July 22, 2024

With the PR 43 from the tensor package, the results are now:

➜  onnx-go git:(benchmarks) ✗ go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile profile.out -benchtime=10s
goos: darwin
goarch: amd64
pkg: github.com/owulveryck/onnx-go
BenchmarkUnmarshalBinary-4          3000           4208506 ns/op         2042788 B/op      18273 allocs/op
PASS
ok      github.com/owulveryck/onnx-go   13.320s

Compared with the measurements from the initial investigation of this issue, the results are:

benchmark                      old ns/op     new ns/op     delta
BenchmarkUnmarshalBinary-4     9457554       4528740       -52.12%

benchmark                      old allocs     new allocs     delta
BenchmarkUnmarshalBinary-4     67120          18272          -72.78%

benchmark                      old bytes     new bytes     delta
BenchmarkUnmarshalBinary-4     3910542       2042637       -47.77%
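
The delta column is the relative change from the old value to the new one, (new - old) / old x 100. As a quick check of the three figures above, here is a small sketch; the percentDelta helper is a hypothetical name introduced for illustration.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldV to newV in percent.
// A negative result means the new value is smaller than the old one.
func percentDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

func main() {
	fmt.Printf("ns/op:  %.2f%%\n", percentDelta(9457554, 4528740)) // -52.12%
	fmt.Printf("allocs: %.2f%%\n", percentDelta(67120, 18272))     // -72.78%
	fmt.Printf("bytes:  %.2f%%\n", percentDelta(3910542, 2042637)) // -47.77%
}
```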

Once the PR is merged, that will be enough to close this issue.

owulveryck commented on July 22, 2024

Closed thanks to PR #43 of the tensor package

owulveryck commented on July 22, 2024

I am reopening this issue: on neural networks involving small tensors, broadcasting is fine, but on bigger tensors it is still too slow.

owulveryck commented on July 22, 2024

PR 299 from Gorgonia should improve things
