Comments (20)
Hi there,
I believe the workaround described at tensorflow/tensorflow#582, i.e. increasing the protocol message size limit, will work for TF-Serving as well.
I'm not sure exactly what caused you to exceed the limit, but FYI a common cause of large models is if you are serializing the model weights as part of the graph-def (vs. a separate parameter saver file).
-Chris
from serving.
Thanks @chrisolston
I used TF-Serving to export the model successfully. The error is generated when loading the exported model. Yes, I serialize all the model weights as part of the graph, and the exported model file is less than 200MB.
I don't know why exporting is OK while loading fails without changing the protobuf limit.
TensorFlow uses use_fast_cpp_protos=true and allow_oversize_protos=true by default. You can try running:
bazel build -c opt --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true tensorflow_serving/...
which should work with protos >64MB.
Since they put those flags in their bazel.rc by default, we'll probably do the same, but I need to double-check.
@kirilg: Users need to install a protobuf package we've prepared here to get the >64MiB python protobuf support: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#protobuf-library-related-issues.
Also, those flags only apply to python protobuf parsing, not C++, which has a higher limit.
@vrv After installing the >64MB version of the Python protobuf from https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl, TensorFlow (v0.8.0rc0) can no longer be used: import tensorflow core-dumps after installing that specific protobuf version.
The core info is:
(gdb) bt
#0  0x00007f9606e2a2f1 in std::__detail::_Map_base<google::protobuf::Descriptor const*, std::pair<google::protobuf::Descriptor const* const, google::protobuf::DynamicMessage::TypeInfo const*>, std::allocator<std::pair<google::protobuf::Descriptor const* const, google::protobuf::DynamicMessage::TypeInfo const*> >, std::__detail::_Select1st, std::equal_to<google::protobuf::Descriptor const*>, google::protobuf::hash<google::protobuf::Descriptor const*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](google::protobuf::Descriptor const* const&) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#1  0x00007f9606e2a3d3 in google::protobuf::DynamicMessageFactory::GetPrototypeNoLock(google::protobuf::Descriptor const*) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#2  0x00007f9606e2b02a in google::protobuf::DynamicMessageFactory::GetPrototype(google::protobuf::Descriptor const*) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#3  0x00007f95ee4f5129 in google::protobuf::python::cmessage::New (cls=<optimized out>, unused_args=<optimized out>, unused_kwargs=<optimized out>) at google/protobuf/pyext/message.cc:1255
#4  0x00007f9618131d23 in type_call () from /lib64/libpython2.7.so.1.0
#5  0x00007f96180dc0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#6  0x00007f961817025c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x00007f96181740bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8  0x00007f96181741c2 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#9  0x00007f9618183fac in PyImport_ExecCodeModuleEx () from /lib64/libpython2.7.so.1.0
#10 0x00007f9618184228 in load_source_module () from /lib64/libpython2.7.so.1.0
@keveman: any ideas?
By the way, if I upgrade protobuf on my Mac, import tensorflow also errors:

In [1]: import tensorflow
KeyError                                  Traceback (most recent call last)
in <module>()
----> 1 import tensorflow

/Library/Python/2.7/site-packages/tensorflow/__init__.py in <module>()
     21 from __future__ import print_function
     22
---> 23 from tensorflow.python import *

/Library/Python/2.7/site-packages/tensorflow/python/__init__.py in <module>()
     47
     48 try:
---> 49   from tensorflow.core.framework.graph_pb2 import *
     50 except ImportError:
     51   msg = """%s\n\nError importing tensorflow. Unless you are using bazel,

/Library/Python/2.7/site-packages/tensorflow/core/framework/graph_pb2.py in <module>()
      8 from google.protobuf import reflection as _reflection
      9 from google.protobuf import symbol_database as _symbol_database
---> 10 from google.protobuf import descriptor_pb2
     11 # @@protoc_insertion_point(imports)
     12

/Library/Python/2.7/site-packages/google/protobuf/descriptor_pb2.py in <module>()
   1493   message_type=None, enum_type=None, containing_type=None,
   1494   is_extension=False, extension_scope=None,
-> 1495   options=None),
   1496 _descriptor.FieldDescriptor(
   1497   name='source_file', full_name='google.protobuf.GeneratedCodeInfo.Annotation.source_file', index=1,

/Library/Python/2.7/site-packages/google/protobuf/descriptor.pyc in __new__(cls, name, full_name, index, number, type, cpp_type, label, default_value, message_type, enum_type, containing_type, is_extension, extension_scope, options, has_default_value, containing_oneof)
    503       return _message.default_pool.FindExtensionByName(full_name)
    504     else:
--> 505       return _message.default_pool.FindFieldByName(full_name)
    506
    507   def __init__(self, name, full_name, index, number, type, cpp_type, label,

KeyError: "Couldn't find field google.protobuf.GeneratedCodeInfo.Annotation.path"
I've tried the solutions listed in tensorflow/tensorflow#582: removing tensorflow and protobuf and reinstalling from source, and reinstalling protobuf with pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl. There is no problem importing tensorflow. However, when following the serving tutorial at https://tensorflow.github.io/serving/serving_basic with my own network, the same error occurred:
[libprotobuf ERROR external/protobuf/src/google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
I agree with @dzhyeon; I reinstalled protobuf-3.0.0b2.post2 but that did not help.
The way I fixed the issue was to go to external/protobuf/src/google/protobuf/io/coded_stream.h and change the constant kDefaultTotalBytesLimit from 64 to 256 (the value is written as 64 << 20, so it becomes 256 << 20, i.e. 256MB).
The file coded_stream.h is located within the ~/.cache/bazel/_bazel_root//execroot/serving/ folder.
Now you can bazel build the inference server and run it from bazel-bin according to the tutorials.
Ankur
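For orientation: the 67108864 bytes quoted in the libprotobuf error is exactly the 64 << 20 default, so the edit above raises the ceiling fourfold. A quick sanity check of the arithmetic (plain Python, nothing TF-specific; the variable names are mine, not protobuf's):

```python
old_limit = 64 << 20    # protobuf's stock kDefaultTotalBytesLimit
new_limit = 256 << 20   # the raised value suggested in this thread

print(old_limit)        # 67108864 -- the number quoted in the error message
print(new_limit)        # 268435456

# A ~200MB exported graph overflows the old limit but fits under the new one.
model_bytes = 200 << 20
print(model_bytes > old_limit, model_bytes < new_limit)  # True True
```

So for the <200MB model discussed earlier in the thread, 256MB leaves comfortable headroom.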
@ank286 I changed kDefaultTotalBytesLimit from 64 to 256 and reinstalled protobuf from source, but it didn't work.
Do you know which directory holds the libprotobuf that TensorFlow actually uses? And what did you mean by "the file coded_stream.h is located within the ~/.cache/bazel/_bazel_root//execroot/serving/ folder"? Please help, I'm going crazy now... :(
@dzhyeon Have you fixed the issue? I face the same problem as you... I reinstalled protobuf after changing the limit to 256 << 20 in coded_stream.h, but it didn't work.
Find all instances of coded_stream.h on the machine. If you installed TensorFlow, it will be linked against one version of coded_stream.h, but you may have changed another. In my experience, TensorFlow builds with bazel, and bazel places a copy of coded_stream.h in a (temporary) cache folder, so that is the one that needs to be changed to 256.
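One way to do that audit is a small script that walks the likely roots and reports the limit each copy declares. The roots mentioned in this thread (the bazel cache, /usr/include, site-packages) vary per machine, so this is only a sketch; the demo writes a fake header into a temp directory so the example runs anywhere:

```python
import os
import re
import tempfile

def audit_coded_stream(roots):
    """Find every coded_stream.h under the given roots and report the
    kDefaultTotalBytesLimit (in MB) that each copy declares."""
    pattern = re.compile(r"kDefaultTotalBytesLimit\s*=\s*(\d+)\s*<<\s*20")
    hits = []
    for root in roots:
        for dirpath, _, files in os.walk(root):
            if "coded_stream.h" in files:
                path = os.path.join(dirpath, "coded_stream.h")
                with open(path) as f:
                    m = pattern.search(f.read())
                hits.append((path, int(m.group(1)) if m else None))
    return hits

# Demo with a fake header; on a real machine you would pass roots such as
# os.path.expanduser("~/.cache/bazel") and "/usr/include" instead.
tmp = tempfile.mkdtemp()
fake_dir = os.path.join(tmp, "google", "protobuf", "io")
os.makedirs(fake_dir)
with open(os.path.join(fake_dir, "coded_stream.h"), "w") as f:
    f.write("static const int kDefaultTotalBytesLimit = 64 << 20;\n")

for path, mb in audit_coded_stream([tmp]):
    print(f"{path}: {mb} MB")   # 64 means this copy still has the old limit
```

Any copy still reporting 64 after your edit is a candidate for the one TF is actually reading.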
See: https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#protobuf-library-related-issues
@vrv I have tried to update protobuf as https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#protobuf-library-related-issues describes, but I get the segmentation fault like the others. My steps were pip install tensorflow, then pip install --upgrade protobuf.
Is there something wrong?
@ank286 Do you mean that I have to reinstall tensorflow with bazel from source after changing the limit to 256MB in coded_stream.h? I only reinstalled protobuf from source after changing the limit to 256MB in every coded_stream.h.
I would find all instances of coded_stream.h on your machine and see if the value has been changed to 256. For me, TF was reading coded_stream.h from a different location that was not in the protobuf source.
@ank286
I changed coded_stream.h in each of these locations:
/home/usr/.cache/bazel/_bazel_scw4150/9f70318cfa7ecd7a7b579a16191209d1/external/protobuf/src/google/protobuf/io/coded_stream.h
/home/usr/.cache/bazel/_bazel_scw4150/9f70318cfa7ecd7a7b579a16191209d1/external/grpc/third_party/protobuf/src/google/protobuf/io/coded_stream.h
/home/usr/.local/share/Trash/files/protobuf-3.0.0/src/google/protobuf/io/coded_stream.h
/home/scw4150/.local/share/Trash/files/Untitled Folder.2/protobuf-3.0.0/protobuf-3.0.0/src/google/protobuf/io/coded_stream.h
/usr/lib/python2.7/site-packages/tensorflow/include/google/protobuf/io/coded_stream.h
/usr/include/google/protobuf/io/coded_stream.h
but nothing helped. I think I need to uninstall tensorflow and reinstall it from source, but I don't know whether that will work.
Sad.
Did the reinstallation work?
As @ank286 suggested, changing coded_stream.h in the .cache folder solved the problem.
Just to be sure, I also ran bazel clean and rebuilt.
The latest protobuf version has already raised the hard limit to 2GB, but there's another place to modify if the model is over 1GB: (tfserving root)/tensorflow/tensorflow/core/platform/env.cc, line 422:
coded_stream.SetTotalBytesLimit(1024LL << 20, 512LL << 20);
I changed 1024LL to 1500LL and my model loaded successfully. (But 2048LL caused a 0-limit error. You may try 2047LL or INT_MAX if your model is as big as 2GB.)
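The shift arithmetic explains the odd behavior at 2048LL: SetTotalBytesLimit takes plain C++ int parameters in that protobuf generation, so 2048 << 20 == 2^31 lands one past INT_MAX and overflows a signed 32-bit int (my reading of the "0 limit" symptom, not confirmed from the serving source). A quick check in Python:

```python
MB = 1 << 20            # the "<< 20" in the C++ source means megabytes
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit int can hold

for n in (64, 512, 1024, 1500, 2047, 2048):
    limit = n * MB      # equivalent to n << 20
    print(f"{n:>4} << 20 = {limit:>10} bytes  fits in int32: {limit <= INT32_MAX}")

# 2048 << 20 equals 2**31, exactly one past INT32_MAX, which would explain
# the "0 limit" error; 2047 << 20 (or INT_MAX itself) is the practical ceiling.
```

So 1500LL works, 2047LL should be the largest whole-megabyte value, and anything at or above 2048LL silently wraps.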