Comments (16)
I extracted the inputs and outputs from the 2nd conv node and ran some tests on gorgonia's conv node. It didn't produce the correct result, so I suspect a bug in gorgonia's conv layer.
@owulveryck @chewxy
To reproduce the tests, follow this README.
https://github.com/lynic/gorgonnx/tree/test_conv/eltest
Submitted a related issue to gorgonia: gorgonia/gorgonia#268
from onnx-go.
First of all: many thanks for your help.
I have just tested it with a fresh and empty GOPATH; this seems to work:
GOPATH=/tmp go get -v github.com/owulveryck/onnx-go
cd /tmp/src/github.com/owulveryck/onnx-go
git checkout b7af8bd
cd example/gorgonia
GOPATH=/tmp go get -v ./...
GOPATH=/tmp go run mnist.go
This is weird; just by looking at the test file on line 463, it looks like the add operator is a no-op.
I have placed a couple of markers (log.Println-based debugging). The Add operation calls Add from the tensor package.
The inputs of the tensor.Add operation are:
[ 55.45495 984.50616 -1191.5568 -652.15924 ... -303.621 952.82043 -233.81728 -672.868]
R[-0.044856027 0.007791661 0.06810082 0.02999374 ... -0.055284902 -0.049383815 0.08432205 -0.054540414]
and the output value is:
[55.41009 984.514 -1191.4886 -652.1293 802.4857 497.57553 -303.6763 952.77106 -233.73296 -672.92255]
which is correct (55.45495 - 0.044856027 = 55.41009, and so on).
In the graph, the value carried by the first input of the operator is:
[55.41009 984.514 -1191.4886 -652.1293 802.4857 497.57553 -303.6763 952.77106 -233.73296 -672.92255]
Maybe a problem with pointers somewhere....
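The arithmetic above can be double-checked in plain Python. This is only a sanity check on the logged values, using the elements visible in both input vectors of the excerpt:

```python
# Values copied from the log excerpt above (only the elements visible in
# both input vectors); plain-Python element-wise addition as a sanity check.
a = [55.45495, 984.50616, -1191.5568, -652.15924]
b = [-0.044856027, 0.007791661, 0.06810082, 0.02999374]
out = [x + y for x, y in zip(a, b)]
# out[0] is 55.45495 + (-0.044856027) ≈ 55.410094, matching the logged
# 55.41009 up to float32 rounding
```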
Hello,
I wanted to help, but I encountered a bug while following the reproduction steps to debug with delve (https://github.com/go-delve/delve):
➜ gorgonia git:(b7af8bd) go run mnist.go
# github.com/owulveryck/onnx-go/example/gorgonia/vendor/gorgonia.org/gorgonia/debugger/dot
vendor/gorgonia.org/gorgonia/debugger/dot/encode.go:15:25: not enough arguments in call to dot.Marshal
have (graph.Graph, string, string, string)
want (graph.Graph, string, string, string, bool)
# github.com/owulveryck/onnx-go/internal/pb-onnx
../../internal/pb-onnx/onnx.proto3.pb.go:22:11: undefined: proto.ProtoPackageIsVersion3
I confirm, my GOPATH was the root cause of my issue.
From what I could tell, the Add operation is performing correctly. It would appear that the numpy generation is not.
I re-added the VMOpt to gorgonia/onnx/machine.go (essentially a redirect to engine.VMOpt), then I augmented the machine in mnist.go with the correct VMOpts:
machine := gorgonnx.NewTapeMachine(graph,
gorgonnx.WithLogger(log.New(os.Stderr, "", 0)), // log execution
gorgonnx.WithWatchlist(), // watching all nodes
gorgonnx.WithValueFmt("%#1.6f")) // log with the following format for values
This is the result, which looks correct to me:
PC 49
Executing + false [CPU32 CPU8] CPU32 false true false. Node is: 14
Inputs:
R[ 55.454948 984.506165 -1191.556763 -652.159241 802.612122 497.435303 -303.621002 952.820435 -233.817276 -672.867981]
R[-0.044856 0.007792 0.068101 0.029994 -0.126410 0.140219 -0.055285 -0.049384 0.084322 -0.054540]
Result:
R[ 55.410091 984.513977 -1191.488647 -652.129272 802.485718 497.575531 -303.676300 952.771057 -233.732956 -672.922546]
Written To: CPU32
R[ 55.410091 984.513977 -1191.488647 -652.129272 802.485718 497.575531 -303.676300 952.771057 -233.732956 -672.922546]
Important to note is the instruction itself:
Executing + false [CPU32 CPU8] CPU32 false true false. Node is: 14
It reads from CPU32, CPU8 and then it overwrites CPU32.
I have no idea how the mnist test numbers are generated.
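Why register reuse matters for the test data: if a node's output register is later reused as the destination of another instruction, reading that register after the run no longer returns the node's own output. A minimal sketch of this effect (a hypothetical toy register machine, not gorgonia's actual VM):

```python
# Hypothetical toy register machine: each instruction reads source
# registers and writes a destination register. Register reuse means a
# node's output can be overwritten by a later instruction.
registers = {}

def execute(program):
    trace = {}  # node id -> value captured at execution time
    for node_id, op, srcs, dst in program:
        result = op(*(registers[s] for s in srcs))
        registers[dst] = result
        trace[node_id] = result
    return trace

program = [
    (0, lambda: 55.454948, (), "CPU32"),
    (1, lambda: -0.044856, (), "CPU8"),
    # "+" reads CPU32, CPU8 and overwrites CPU32, like instruction 14 above
    (14, lambda x, y: x + y, ("CPU32", "CPU8"), "CPU32"),
]
trace = execute(program)
# Reading node 0's register after the run yields node 14's value instead:
assert registers["CPU32"] == trace[14] != trace[0]
```

This is consistent with extracting node values after execution producing "wrong" reference data even though every operator computed correctly.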
I may be going down the wrong path; let me know.
I think that you are on the right path, but I am not :D
Thank you for your help.
The Add operation is actually OK. Which is coherent: in the previous Gorgonnx implementation, I made functional tests of every operator, and they all passed.
To generate the numpy test data, what I do is:
- run the graph;
- loop over all the nodes;
- extract the Value().Data().
From your analysis, I see that some registers are overwritten.
Question for you @chewxy: can this have an impact on the underlying backing value? Maybe one of the registers is wrongly overwritten at a certain point. Can we avoid this behavior for testing purposes (by playing with isPointer or anything else)?
I have two possibilities to continue the debugging process now:
- extend the debug process and use the VMOpts to investigate more (and maybe couple it with the old work on stepped execution);
- build a new MNIST graph in Gorgonia (based on the model I have extracted), set the weights' values with the data from the initializer, then play with the model (mostly with the shape of the tensors, to avoid broadcasting) and see if I can get the correct result.
Hi, I would like to help, but I'm new to this project. My thought was to check the outputs node by node and see the difference between gorgonia and onnxruntime. Below is the script I used to check the output of the "Convolution28" node. But I still don't know how to check the output of the same node in onnx-go.
import onnx
import os
import glob
import onnxruntime as onnxrt
import onnxruntime.backend as backend
# import onnx_tf.backend as backend
# import caffe2.python.onnx.backend as backend
# import cntk as C
import numpy as np
from onnx import numpy_helper, helper
from onnx import TensorProto
print("after import")
model = onnx.load('mnist/model.onnx')
onnx.checker.check_model(model)
test_data_dir = 'mnist/test_data_set_0'
# Load inputs
inputs = []
inputs_num = len(glob.glob(os.path.join(test_data_dir, 'input_*.pb')))
for i in range(inputs_num):
    input_file = os.path.join(test_data_dir, 'input_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(input_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    inputs.append(numpy_helper.to_array(tensor))
# Load reference outputs
ref_outputs = []
ref_outputs_num = len(glob.glob(os.path.join(test_data_dir, 'output_*.pb')))
for i in range(ref_outputs_num):
    output_file = os.path.join(test_data_dir, 'output_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(output_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    ref_outputs.append(numpy_helper.to_array(tensor))
# print(ref_outputs)
mg = model.graph
n1 = mg.node[1]
# n1.attribute[2].s = "NOTSET".encode()
# inp0 = mg.input[0]
# inp1 = mg.input[1]
# out0 = helper.make_tensor_value_info('Convolution28_Output_0', TensorProto.FLOAT, [1])
graph_def = helper.make_graph(
[
n1,
],
"MLP",
[
helper.make_tensor_value_info('Input3', TensorProto.FLOAT, [1,1,28,28]),
helper.make_tensor_value_info('Parameter5', TensorProto.FLOAT, [8,1,5,5]),
# inp0,
# inputs,
# inp1,
],
[
helper.make_tensor_value_info('Convolution28_Output_0', TensorProto.FLOAT, [1]),
]
)
graph_def.initializer.extend([mg.initializer[2]])
import ipdb; ipdb.set_trace()
model_def = helper.make_model(graph_def, producer_name='onnx-example')
onnx.checker.check_model(model_def)
pm = backend.prepare(model_def)
outs = list(pm.run(inputs))
oo = np.asarray(outs[0])
print(oo[0][0])
# for ref_o, o in zip(ref_outputs, outs):
# np.testing.assert_almost_equal(ref_o, o)
# ro = onnxrt.RunOptions()
# ro.run_log_verbosity_level = 1
# ro.run_tag = "testtag123"
# import ipdb; ipdb.set_trace()
# model.graph.node[1].attribute[2].s = "NOTSET".encode()
# Run the model on the backend
# prep_model = backend.prepare(model, session_log_verbosity_level=1)
# outputs = list(prep_model.run(inputs, run_options=ro))
prep_model = backend.prepare(model)
outputs = list(prep_model.run(inputs))
# outputs = list(backend.run(model, inputs, run_log_verbosity_level=1))
# print(outputs)
# import ipdb; ipdb.set_trace()
# Compare the results with reference outputs.
for ref_o, o in zip(ref_outputs, outputs):
    np.testing.assert_almost_equal(ref_o, o)
You can run this script in my pre-built docker image "elynn/onnxrt:latest".
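Alongside the onnxruntime check, a framework-independent reference for the Conv computation can help decide which side is wrong. The sketch below is a naive pure-Python single-channel 2D convolution (cross-correlation, as ONNX Conv computes it) with zero padding; it is an illustrative reference, not onnxruntime's implementation:

```python
def conv2d(x, w, pads=(0, 0, 0, 0), strides=(1, 1)):
    """Single-channel 2D cross-correlation with zero padding.
    x: H x W input, w: kH x kW kernel, pads: (top, left, bottom, right)."""
    t, l, b, r = pads
    h, wd = len(x), len(x[0])
    # zero-pad the input
    padded = [[0.0] * (wd + l + r) for _ in range(h + t + b)]
    for i in range(h):
        for j in range(wd):
            padded[i + t][j + l] = x[i][j]
    kh, kw = len(w), len(w[0])
    sh, sw = strides
    oh = (h + t + b - kh) // sh + 1
    ow = (wd + l + r - kw) // sw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            out[i][j] = sum(
                padded[i * sh + di][j * sw + dj] * w[di][dj]
                for di in range(kh) for dj in range(kw)
            )
    return out

# 3x3 input of ones, 3x3 kernel of ones, pad 1 on each side => SAME output:
# corners see a 2x2 window (4), edges a 2x3 window (6), center a 3x3 window (9)
x = [[1.0] * 3 for _ in range(3)]
w = [[1.0] * 3 for _ in range(3)]
y = conv2d(x, w, pads=(1, 1, 1, 1))
```

Feeding a few rows of the extracted Pooling66 output through such a reference makes it possible to tell whether onnxruntime or gorgonia disagrees with the definition.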
Thanks @lynic for your help.
With the help of @chewxy, I have realized that my test files were not OK. Actually, I was extracting the values from the nodes after the execution on the tape machine, but some values are de facto incorrect due to optimization (some nodes can have their values overwritten).
I have started a new branch to track this issue.
So far, I have inserted a channel inside the tapeMachine. I can grab the instructions and the associated tensors at runtime.
The code here demonstrates how to get the values (forgive me for its ugliness, but it's 30 minutes of work between two meetings).
With this code I am sure that I can get the exact values. I will try to analyze them manually, or take inspiration from your Python code to generate a test file for every operation. I may then see if one is behaving badly (I have some doubts about the Maxpool operator).
I have generated a sort of sequence graph of the register usage in the tapeMachine of gorgonia.
I was looking for something weird, such as a register that could have been wrongly overwritten.
But I didn't find anything strange in the graph.
I copy/paste the graph here only for the record; on each edge, the (number) is the execution order.
I compared the outputs node by node and found that the output of the 2nd conv node is not correct. That's weird, since the conv node passed my tests. The "maxpool" node before the 2nd "conv" node actually produced the correct matrix.
import onnx
import os
import glob
import onnxruntime as onnxrt
import onnxruntime.backend as backend
# import onnx_tf.backend as backend
# import caffe2.python.onnx.backend as backend
# import cntk as C
import numpy as np
from onnx import numpy_helper, helper
from onnx import TensorProto
print("after import")
model = onnx.load('mnist/model.onnx')
onnx.checker.check_model(model)
test_data_dir = 'mnist/test_data_set_1'
# Load inputs
inputs = []
inputs_num = len(glob.glob(os.path.join(test_data_dir, 'input_*.pb')))
for i in range(inputs_num):
    input_file = os.path.join(test_data_dir, 'input_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(input_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    inputs.append(numpy_helper.to_array(tensor))
# Load reference outputs
ref_outputs = []
ref_outputs_num = len(glob.glob(os.path.join(test_data_dir, 'output_*.pb')))
for i in range(ref_outputs_num):
    output_file = os.path.join(test_data_dir, 'output_{}.pb'.format(i))
    tensor = onnx.TensorProto()
    with open(output_file, 'rb') as f:
        tensor.ParseFromString(f.read())
    ref_outputs.append(numpy_helper.to_array(tensor))
# print(ref_outputs)
mg = model.graph
n1 = mg.node[1]
# n1.attribute[2].s = "NOTSET".encode()
# inp0 = mg.input[0]
# inp1 = mg.input[1]
# out0 = helper.make_tensor_value_info('Convolution28_Output_0', TensorProto.FLOAT, [1])
graph_def = helper.make_graph(
[
# mg.node[1], # Convolution28
onnx.helper.make_node(
"Conv",
name='Convolution28',
inputs=['Input3', 'Parameter5'],
outputs=['Convolution28_Output_0'],
kernel_shape=[5, 5],
pads=[2, 2, 2, 2],
# auto_pad="SAME_UPPER",
strides=[1, 1], # Default values for other attributes: dilations=[1, 1], groups=1
group=1,
dilations=[1,1],
), # Convolution28
mg.node[2], # Plus30
mg.node[3], # ReLU32
mg.node[4], # Pooling66
# mg.node[5], # Convolution110
onnx.helper.make_node(
"Conv",
name='Convolution110',
inputs=['Pooling66_Output_0', 'Parameter87'],
outputs=['Convolution110_Output_0'],
kernel_shape=[5, 5],
pads=[2, 2, 2, 2],
# auto_pad="SAME_UPPER",
strides=[1, 1], # Default values for other attributes: dilations=[1, 1], groups=1
group=1,
dilations=[1,1],
), # Convolution28
],
"test_mnist",
[
mg.input[0], # Input3
mg.input[1], # Parameter5
mg.input[2], # Parameter6
mg.input[3], # Parameter87
],
[
# mg.value_info[1], # Convolution28_Output_0
# mg.value_info[2], # Plus30_Output_0
# mg.value_info[3], # ReLU32_Output_0
# mg.value_info[4], # Pooling66_Output_0
mg.value_info[5], # Convolution110_Output_0
],
initializer = [
mg.initializer[2], # Parameter5
mg.initializer[3], # Parameter6
mg.initializer[1], # Parameter87
],
value_info= [
mg.value_info[1], # Convolution28_Output_0
mg.value_info[2], # Plus30_Output_0
mg.value_info[3], # ReLU32_Output_0
mg.value_info[4], # Pooling66_Output_0
]
)
import ipdb; ipdb.set_trace()
model_def = helper.make_model(graph_def, producer_name='onnx-example')
onnx.checker.check_model(model_def)
pm = backend.prepare(model_def)
outs = list(pm.run(inputs))
oo = np.asarray(outs[0])
print(oo[0][0])
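The hand-written Conv nodes above replace the model's auto_pad="SAME_UPPER" with explicit pads=[2, 2, 2, 2]. That these are equivalent for a 5x5 kernel with stride 1 can be checked against the ONNX SAME_UPPER rule, sketched below (for SAME_UPPER, any odd leftover padding goes to the end side):

```python
import math

def same_upper_pads(in_size, kernel, stride=1, dilation=1):
    # ONNX SAME_UPPER: output size = ceil(in / stride);
    # total padding makes the window arithmetic come out to that size,
    # with the extra pixel (if total is odd) placed on the end side.
    out_size = math.ceil(in_size / stride)
    eff_kernel = (kernel - 1) * dilation + 1
    total = max((out_size - 1) * stride + eff_kernel - in_size, 0)
    begin = total // 2
    end = total - begin
    return begin, end

# 28x28 input, 5x5 kernel, stride 1 -> pads of 2 on each side
assert same_upper_pads(28, 5) == (2, 2)
# 14x14 input (after the 2x2 maxpool), same kernel -> still (2, 2)
assert same_upper_pads(14, 5) == (2, 2)
```

So pads=[2, 2, 2, 2] matches what SAME_UPPER would compute for both conv layers of this model.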
My aim for today is to fix the conv and maxpool - will be available on Slack in 4 hrs to discuss
My aim for today is to fix the conv and maxpool - will be available on Slack in 4 hrs to discuss
Hi @chewxy, if you need any help from me, I would like to join the discussion. Which Slack channel are you in?
The graph-builder branch has been updated.
The vendored version of Gorgonia uses the code from @lynic referenced in this PR.
The execution now produces a good result:
$ go run mnist.go
2019/02/25 09:12:02 [5041.889 -3568.877 -187.82419 -1685.7964 -1183.323 -614.4293 892.66394 -373.65866 -290.2622 -111.176735]
Many thanks to all of you for your help.