
Comments (13)

bitnick10 commented on July 22, 2024

I wrote a simple one:

import numpy as np
import onnxmltools
import stopwatch  # third-party timing helper used for the measurements below
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

data_size = 100000
x_train = np.array(np.random.rand(data_size, 39), dtype='float32')
y_train = np.zeros(shape=(data_size, 2), dtype='float32')

model = Sequential()
model.add(Dense(units=22, input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

# One stopwatch for training, one for prediction.
sws = []
for i in range(2):
    sw = stopwatch.Stopwatch()
    sw.reset()
    sw.stop()
    sws.append(sw)

for i in range(100):
    sws[0].start()
    model.train_on_batch(x_train, y_train)
    sws[0].stop()
    sws[1].start()
    y_predict = model.predict(x_train, batch_size=len(x_train))
    sws[1].stop()
    # Print the running average time for training and for prediction.
    print("{0} {1:.3f} {2:.3f}".format(i, sws[0].duration / (i + 1), sws[1].duration / (i + 1)))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model, "model.onnx")

Hope it helps.

owulveryck commented on July 22, 2024

Hello,

TL;DR: no, you are not; onnx-go is not (yet) optimized for performance.

Longer answer:
onnx-go is not optimized for performance (yet). In your case, it may be because of the way the Gorgonia backend itself is implemented. Launching backend.Run() actually initiates a lot of work under the hood (a gorgonia.TapeMachine is created each time, for example). It is something we are aware of, and hiding all of this within the Run() method should allow performance tuning without breaking the API (for example, there could be a reset method to recycle some elements, or a worker pool of machines to execute the code; these are just ideas, and a rough sketch of the pool idea is below).
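
To illustrate the worker-pool idea, here is a rough Go sketch (this is not existing onnx-go code: the pool size, the input shape, and the driver loop are made up for the example; only the gorgonnx/onnx/tensor calls already shown in this thread are assumed):

package main

import (
	"io/ioutil"
	"log"
	"os"
	"sync"

	"github.com/owulveryck/onnx-go"
	"github.com/owulveryck/onnx-go/backend/x/gorgonnx"
	"gorgonia.org/tensor"
)

func main() {
	b, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	const workers = 4 // arbitrary pool size for the example
	jobs := make(chan tensor.Tensor)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		// One backend/model pair per worker, decoded once and reused for every input.
		backend := gorgonnx.NewGraph()
		model := onnx.NewModel(backend)
		if err := model.UnmarshalBinary(b); err != nil {
			log.Fatal(err)
		}
		wg.Add(1)
		go func() {
			defer wg.Done()
			for input := range jobs {
				model.SetInput(0, input)
				if err := backend.Run(); err != nil {
					log.Println(err)
				}
				// Reading the output tensors is omitted in this sketch.
			}
		}()
	}
	// Feed a few dummy inputs; the shape must match what the exported model expects.
	for i := 0; i < 10; i++ {
		jobs <- tensor.New(tensor.WithShape(1, 39), tensor.Of(tensor.Float32), tensor.WithBacking(make([]float32, 39)))
	}
	close(jobs)
	wg.Wait()
}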

Anyway, this is pure speculation; to improve anything we need to measure the performance and find out where the bottleneck is. Your example could be a perfect starting point for that analysis.
Maybe we can turn your example into a _test.go file and run some benchmarks and profiling; a possible skeleton is sketched below.
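
Such a benchmark could look roughly like this (a sketch only: the file name model.onnx, the input shape, and placing it in a main_test.go next to the example are assumptions):

package main

import (
	"io/ioutil"
	"testing"

	"github.com/owulveryck/onnx-go"
	"github.com/owulveryck/onnx-go/backend/x/gorgonnx"
	"gorgonia.org/tensor"
)

func BenchmarkRun(b *testing.B) {
	buf, err := ioutil.ReadFile("model.onnx") // assumed path to the exported model
	if err != nil {
		b.Fatal(err)
	}
	backend := gorgonnx.NewGraph()
	model := onnx.NewModel(backend)
	if err := model.UnmarshalBinary(buf); err != nil {
		b.Fatal(err)
	}
	// A zero-valued input with the same shape as in the example above.
	input := tensor.New(tensor.WithShape(100000, 39), tensor.Of(tensor.Float32), tensor.WithBacking(make([]float32, 100000*39)))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		model.SetInput(0, input)
		if err := backend.Run(); err != nil {
			b.Fatal(err)
		}
	}
}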

Do you mind sharing your complete example with us?

On top of that, we can also try to run your tests concurrently via some goroutines to see how it behaves. Concurrency is also a goal, as @blackrez and I would like to be able to run onnx-go inside a web service.

bitnick10 commented on July 22, 2024

@owulveryck, it's OK to share the model and data; it's only for study. Can you give me an email address or something?

owulveryck commented on July 22, 2024

Is it huge? Can you copy/paste the Python code here?

owulveryck commented on July 22, 2024

A basic performance analysis has started in issue #68

owulveryck commented on July 22, 2024

I ran a very simple test on my machine without actually checking the results; Gorgonia is winning against TensorFlow (if my test is OK):

Python

import time

import numpy as np
import onnxmltools
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import SGD

data_size = 100000
x_train = np.array(np.random.rand(data_size, 39), dtype='float32')
y_train = np.zeros(shape=(data_size, 2), dtype='float32')

model = Sequential()
model.add(Dense(units=22, input_shape=(39,), activation='tanh'))
model.add(Dense(units=22, activation='tanh'))
model.add(Dense(units=2, activation='tanh'))
sgd = SGD(lr=0.01, momentum=0.9)
model.compile(loss='categorical_crossentropy', optimizer=sgd)

model.train_on_batch(x_train, y_train)

total = 0
for i in range(100):
    x_test = np.array(np.random.rand(data_size, 39), dtype='float32')
    start = time.time()
    y_predict = model.predict(x_test)
    end = time.time()
    total += (end - start)
    # Print the running average prediction time.
    print(total / (i + 1))

onnx_model = onnxmltools.convert_keras(model, target_opset=7)
onnxmltools.save_model(onnx_model, "model.onnx")

The prediction takes 870ms on average.

Go

This code should do something similar:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"os"
	"time"

	"github.com/owulveryck/onnx-go"
	"github.com/owulveryck/onnx-go/backend/x/gorgonnx"
	"gorgonia.org/tensor"
)

func main() {
	datasize := 100000
	backend := gorgonnx.NewGraph()
	model := onnx.NewModel(backend)
	// Read the ONNX model exported by the Keras script above.
	b, err := ioutil.ReadFile(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	err = model.UnmarshalBinary(b)
	if err != nil {
		log.Fatal(err)
	}
	var d time.Duration
	for i := 0; i < 100; i++ {
		input := tensor.New(tensor.WithShape(datasize, 39), tensor.Of(tensor.Float32), tensor.WithBacking(tensor.Random(tensor.Float32, datasize*39)))
		model.SetInput(0, input)
		t := time.Now()
		err = backend.Run()
		if err != nil {
			log.Fatal(err)
		}
		d += time.Since(t)
		// Print the running average inference time.
		fmt.Println(time.Duration(float64(d) / float64(i+1)))
	}
}

It gives 320ms on average (and around 200ms with a patch that I will commit to the tensor package soon).

bitnick10 commented on July 22, 2024

Try this Keras predict, it's faster:

y_predict = model.predict(x_train, batch_size=len(x_train))

owulveryck commented on July 22, 2024

A patch has been committed to the tensor package; it really changes the performance and memory consumption.
Can you update your Go workspace and give it a new try?

bitnick10 commented on July 22, 2024

onnx-go has improved. My ratio is now 0.07s (Keras) vs 0.3s (onnx-go).

owulveryck commented on July 22, 2024

@bitnick10, would you be kind enough to provide the Python code and the Go code you are actually executing to get your results? This would allow us to run the exact same test as you do.

Maybe a gist.github.com can do the job.

Thanks

bitnick10 commented on July 22, 2024

I used my Keras code and your Go code. I changed one line that your Keras code was missing: y_predict = model.predict(x_train, batch_size=len(x_train)). Adding batch_size=len(x_train) makes Keras a lot faster. My Keras settings (keras.json) are:

 {
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_last"
}

owulveryck commented on July 22, 2024

You're right, setting the batch_size improves the performance of Keras and makes it more efficient.
One thing I've noticed is that the tensor.Random(tensor.Float32, datasize*39) call in the Go file generates numbers that are not between 0 and 1; this makes the computation of the Tanh expensive.
I've tried with a fixed small value, and I am down to around 100ms on average for Go, while I get around 50ms for the Python version (see the sketch below for generating inputs in [0,1)).
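
For reference, here is one way to build an input in [0,1) on the Go side, mirroring np.random.rand (a sketch that assumes "math/rand" is imported and that datasize is defined as in the example above):

// Fill a backing slice with uniform values in [0, 1) instead of using tensor.Random.
backing := make([]float32, datasize*39)
for i := range backing {
	backing[i] = rand.Float32() // uniform in [0, 1)
}
input := tensor.New(tensor.WithShape(datasize, 39), tensor.Of(tensor.Float32), tensor.WithBacking(backing))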

I did some profiling, and the next move will be to enhance the broadcasting (see issue #68):

(profiling screenshot: small_numbers)

Or maybe we can try to generate a model that does not "compact the tensor" and use broadcasting for testing. But I don't know if it's possible.

owulveryck commented on July 22, 2024

This PR from the tensor package gives good results, so I am closing this issue.
A new issue can be raised to work on performance again.
