Comments (20)
Hi there,
I believe the workaround described at tensorflow/tensorflow#582, i.e. increasing the protocol message size limit, will work for TF-Serving as well.
I'm not sure exactly what caused you to exceed the limit, but FYI a common cause of large models is if you are serializing the model weights as part of the graph-def (vs. a separate parameter saver file).
-Chris
from serving.
Thanks @chrisolston
I used TF-Serving to export the model successfully. The error is generated when loading the exported model. Yes, I serialize all the model weights as part of the graph, and the exported model file is less than 200MB.
I don't know why exporting is OK while loading fails without changing the protobuf limit.
TensorFlow uses use_fast_cpp_protos=true and allow_oversize_protos=true by default. You can try running:
bazel build -c opt --define=use_fast_cpp_protos=true --define=allow_oversize_protos=true tensorflow_serving/...
which should work with protos >64MB.
Since they put those flags in their bazel.rc by default, we'll probably do the same, but I need to double-check.
@kirilg: Users need to install a protobuf package we've prepared here to get the >64MiB python protobuf support: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/get_started/os_setup.md#protobuf-library-related-issues.
Also, those flags only apply to python protobuf parsing, not C++, which has a higher limit.
@vrv After installing the >64MB version of the Python protobuf from https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl, TensorFlow (v0.8.0rc0) can no longer be used: import tensorflow core-dumps after installing that specific protobuf version.
The core info is:
(gdb) bt
#0  0x00007f9606e2a2f1 in std::__detail::_Map_base<google::protobuf::Descriptor const*, std::pair<google::protobuf::Descriptor const* const, google::protobuf::DynamicMessage::TypeInfo const*>, std::allocator<std::pair<google::protobuf::Descriptor const* const, google::protobuf::DynamicMessage::TypeInfo const*> >, std::__detail::_Select1st, std::equal_to<google::protobuf::Descriptor const*>, google::protobuf::hash<google::protobuf::Descriptor const*>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true>, true>::operator[](google::protobuf::Descriptor const* const&) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#1  0x00007f9606e2a3d3 in google::protobuf::DynamicMessageFactory::GetPrototypeNoLock(google::protobuf::Descriptor const*) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#2  0x00007f9606e2b02a in google::protobuf::DynamicMessageFactory::GetPrototype(google::protobuf::Descriptor const*) () from /usr/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
#3  0x00007f95ee4f5129 in google::protobuf::python::cmessage::New (cls=<optimized out>, unused_args=<optimized out>, unused_kwargs=<optimized out>) at google/protobuf/pyext/message.cc:1255
#4  0x00007f9618131d23 in type_call () from /lib64/libpython2.7.so.1.0
#5  0x00007f96180dc0b3 in PyObject_Call () from /lib64/libpython2.7.so.1.0
#6  0x00007f961817025c in PyEval_EvalFrameEx () from /lib64/libpython2.7.so.1.0
#7  0x00007f96181740bd in PyEval_EvalCodeEx () from /lib64/libpython2.7.so.1.0
#8  0x00007f96181741c2 in PyEval_EvalCode () from /lib64/libpython2.7.so.1.0
#9  0x00007f9618183fac in PyImport_ExecCodeModuleEx () from /lib64/libpython2.7.so.1.0
#10 0x00007f9618184228 in load_source_module () from /lib64/libpython2.7.so.1.0
@keveman: any ideas?
By the way, if I upgrade protobuf on my Mac, import tensorflow also errors:

In [1]: import tensorflow
KeyError                                  Traceback (most recent call last)
in <module>()
----> 1 import tensorflow

/Library/Python/2.7/site-packages/tensorflow/__init__.py in <module>()
     21 from __future__ import print_function
     22
---> 23 from tensorflow.python import *

/Library/Python/2.7/site-packages/tensorflow/python/__init__.py in <module>()
     47
     48 try:
---> 49   from tensorflow.core.framework.graph_pb2 import *
     50 except ImportError:
     51   msg = """%s\n\nError importing tensorflow. Unless you are using bazel,

/Library/Python/2.7/site-packages/tensorflow/core/framework/graph_pb2.py in <module>()
      8 from google.protobuf import reflection as _reflection
      9 from google.protobuf import symbol_database as _symbol_database
---> 10 from google.protobuf import descriptor_pb2
     11 # @@protoc_insertion_point(imports)
     12

/Library/Python/2.7/site-packages/google/protobuf/descriptor_pb2.py in <module>()
   1493   message_type=None, enum_type=None, containing_type=None,
   1494   is_extension=False, extension_scope=None,
-> 1495   options=None),
   1496 _descriptor.FieldDescriptor(
   1497   name='source_file', full_name='google.protobuf.GeneratedCodeInfo.Annotation.source_file', index=1,

/Library/Python/2.7/site-packages/google/protobuf/descriptor.pyc in __new__(cls, name, full_name, index, number, type, cpp_type, label, default_value, message_type, enum_type, containing_type, is_extension, extension_scope, options, has_default_value, containing_oneof)
    503       return _message.default_pool.FindExtensionByName(full_name)
    504     else:
--> 505       return _message.default_pool.FindFieldByName(full_name)
    506
    507   def __init__(self, name, full_name, index, number, type, cpp_type, label,

KeyError: "Couldn't find field google.protobuf.GeneratedCodeInfo.Annotation.path"
I've tried the solutions listed in tensorflow/tensorflow#582: removing tensorflow and protobuf and reinstalling from source, and reinstalling protobuf with pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/protobuf-3.0.0b2.post2-cp27-none-linux_x86_64.whl. There is no problem importing tensorflow. However, when following the serving tutorial at https://tensorflow.github.io/serving/serving_basic with my own network, the same error occurred:
[libprotobuf ERROR external/protobuf/src/google/protobuf/io/coded_stream.cc:207] A protocol message was rejected because it was too big (more than 67108864 bytes). To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
I agree with @dzhyeon; I reinstalled protobuf-3.0.0b2.post2 but that did not help.
The way I fixed the issue was to go to external/protobuf/src/google/protobuf/io/coded_stream.h and change the constant kDefaultTotalBytesLimit from 64 to 256 (the value is written as 64 << 20, so it becomes 256 << 20, i.e. 256MB).
The file coded_stream.h is located within the ~/.cache/bazel/_bazel_root//execroot/serving/ folder.
Now you can bazel build the inference server and run it from bazel-bin according to the tutorials.
Ankur
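For orientation: the 67108864 bytes quoted in the libprotobuf error is exactly the 64 << 20 default, so the edit above raises the ceiling fourfold. A quick sanity check of the arithmetic (plain Python, nothing TF-specific; the variable names are mine, not protobuf's):

```python
old_limit = 64 << 20    # protobuf's stock kDefaultTotalBytesLimit
new_limit = 256 << 20   # the raised value suggested in this thread

print(old_limit)        # 67108864 -- the number quoted in the error message
print(new_limit)        # 268435456

# A ~200MB exported graph overflows the old limit but fits under the new one.
model_bytes = 200 << 20
print(model_bytes > old_limit, model_bytes < new_limit)  # True True
```

So for the <200MB model discussed earlier in the thread, 256MB leaves comfortable headroom.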
@ank286 I changed kDefaultTotalBytesLimit from 64 to 256 and reinstalled protobuf from source, but it didn't work.
Do you know which directory holds the libprotobuf that TensorFlow actually uses? And what did you mean by "the file coded_stream.h is located within the ~/.cache/bazel/_bazel_root//execroot/serving/ folder"? Please help, I'm going crazy now... :(
@dzhyeon Have you fixed the issue? I face the same problem as you... I reinstalled protobuf after changing the limit to 256 << 20 in coded_stream.h, but it didn't work.
Find all instances of coded_stream.h on the machine. If you installed TensorFlow, it will be linked against one version of coded_stream.h, but you may have changed another. In my experience, TensorFlow builds with bazel, and bazel places a copy of coded_stream.h in a (temporary) cache folder, so that is the one that needs to be changed to 256.
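One way to do that audit is a small script that walks the likely roots and reports the limit each copy declares. The roots mentioned in this thread (the bazel cache, /usr/include, site-packages) vary per machine, so this is only a sketch; the demo writes a fake header into a temp directory so the example runs anywhere:

```python
import os
import re
import tempfile

def audit_coded_stream(roots):
    """Find every coded_stream.h under the given roots and report the
    kDefaultTotalBytesLimit (in MB) that each copy declares."""
    pattern = re.compile(r"kDefaultTotalBytesLimit\s*=\s*(\d+)\s*<<\s*20")
    hits = []
    for root in roots:
        for dirpath, _, files in os.walk(root):
            if "coded_stream.h" in files:
                path = os.path.join(dirpath, "coded_stream.h")
                with open(path) as f:
                    m = pattern.search(f.read())
                hits.append((path, int(m.group(1)) if m else None))
    return hits

# Demo with a fake header; on a real machine you would pass roots such as
# os.path.expanduser("~/.cache/bazel") and "/usr/include" instead.
tmp = tempfile.mkdtemp()
fake_dir = os.path.join(tmp, "google", "protobuf", "io")
os.makedirs(fake_dir)
with open(os.path.join(fake_dir, "coded_stream.h"), "w") as f:
    f.write("static const int kDefaultTotalBytesLimit = 64 << 20;\n")

for path, mb in audit_coded_stream([tmp]):
    print(f"{path}: {mb} MB")   # 64 means this copy still has the old limit
```

Any copy still reporting 64 after your edit is a candidate for the one TF is actually reading.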
See: https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#protobuf-library-related-issues
@vrv I have tried to update protobuf as https://www.tensorflow.org/versions/r0.10/get_started/os_setup.html#protobuf-library-related-issues describes, but I get the segmentation fault like the others. My steps were pip install tensorflow, then pip install --upgrade protobuf.
Is there something wrong?
@ank286 Do you mean that I have to reinstall tensorflow with bazel from source after changing the limit to 256MB in coded_stream.h? I only reinstalled protobuf from source after changing the limit to 256MB in every coded_stream.h.
I would find all instances of coded_stream.h on your machine and see if the value has been changed to 256. For me, TF was reading coded_stream.h from a different location that was not in the protobuf source.
@ank286
I changed coded_stream.h in each of these locations:
/home/usr/.cache/bazel/_bazel_scw4150/9f70318cfa7ecd7a7b579a16191209d1/external/protobuf/src/google/protobuf/io/coded_stream.h
/home/usr/.cache/bazel/_bazel_scw4150/9f70318cfa7ecd7a7b579a16191209d1/external/grpc/third_party/protobuf/src/google/protobuf/io/coded_stream.h
/home/usr/.local/share/Trash/files/protobuf-3.0.0/src/google/protobuf/io/coded_stream.h
/home/scw4150/.local/share/Trash/files/Untitled Folder.2/protobuf-3.0.0/protobuf-3.0.0/src/google/protobuf/io/coded_stream.h
/usr/lib/python2.7/site-packages/tensorflow/include/google/protobuf/io/coded_stream.h
/usr/include/google/protobuf/io/coded_stream.h
but nothing helped. I think I need to uninstall tensorflow and reinstall it from source, but I don't know whether that will work.
Sad.
Did the reinstallation work?
As @ank286 suggested, changing coded_stream.h in the .cache folder solved the problem.
Just to be sure, I also ran bazel clean and rebuilt.
The latest protobuf version has already raised the hard limit to 2GB, but there's another place to modify if the model is over 1GB: (tfserving root)/tensorflow/tensorflow/core/platform/env.cc, line 422:
coded_stream.SetTotalBytesLimit(1024LL << 20, 512LL << 20);
I changed 1024LL to 1500LL and my model loaded successfully. (But 2048LL caused a 0-limit error. You may try 2047LL or INT_MAX if your model is as big as 2GB.)
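The shift arithmetic explains the odd behavior at 2048LL: SetTotalBytesLimit takes plain C++ int parameters in that protobuf generation, so 2048 << 20 == 2^31 lands one past INT_MAX and overflows a signed 32-bit int (my reading of the "0 limit" symptom, not confirmed from the serving source). A quick check in Python:

```python
MB = 1 << 20            # the "<< 20" in the C++ source means megabytes
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit int can hold

for n in (64, 512, 1024, 1500, 2047, 2048):
    limit = n * MB      # equivalent to n << 20
    print(f"{n:>4} << 20 = {limit:>10} bytes  fits in int32: {limit <= INT32_MAX}")

# 2048 << 20 equals 2**31, exactly one past INT32_MAX, which would explain
# the "0 limit" error; 2047 << 20 (or INT_MAX itself) is the practical ceiling.
```

So 1500LL works, 2047LL should be the largest whole-megabyte value, and anything at or above 2048LL silently wraps.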