Comments (9)
The issue was tested on the latest version, v1.2-rc1.
Can you try the current master?
from graphene.
Yes, I've just tried; the issue still exists.
I've found another issue. When the transmitted data is large enough (about 300 MB), the gRPC client sometimes outputs errors like these:
[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'helloworld.HelloReply.message' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
ERROR:grpc._common:Exception deserializing message!
Traceback (most recent call last):
File "/venv/lib/python3.6/site-packages/grpc/_common.py", line 86, in _transform
return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
File "scripts/grpc_client.py", line 27, in <module>
run()
File "scripts/grpc_client.py", line 19, in run
response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Exception deserializing response!"
debug_error_string = "None"
or
Traceback (most recent call last):
File "scripts/grpc_client.py", line 27, in <module>
run()
File "scripts/grpc_client.py", line 19, in run
response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Failed parsing HTTP/2"
debug_error_string = "{"created":"@1630602970.948493000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Failed parsing HTTP/2","grpc_status":14}"
(The above errors happen when the client sends 300 MB of data to the server.)
or
Traceback (most recent call last):
File "scripts/grpc_client.py", line 26, in <module>
run()
File "scripts/grpc_client.py", line 18, in run
response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Stream removed"
debug_error_string = "{"created":"@1630517301.980850000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
(The above error happens when the server replies with 300 MB of data to the client.)
To reproduce, we first need to raise gRPC's maximum message size. The grpc_server.py and grpc_client.py code from the example zip file is as below:
grpc_server.py:

from concurrent import futures
import logging

import grpc

from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


class Greeter(helloworld_pb2_grpc.GreeterServicer):
    def SayHello(self, request, context):
        print(len(request.name))
        data = (300 * 1024 * 1024) * 'a'
        return helloworld_pb2.HelloReply(message=data)


def serve():
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=options)
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    logging.basicConfig()
    serve()
grpc_client.py:

from __future__ import print_function
import logging

import grpc

from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


def run():
    # NOTE(gRPC Python Team): .close() is possible on a channel and should be
    # used in circumstances in which the with statement does not fit the needs
    # of the code.
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    with grpc.insecure_channel('10.67.125.170:50051', options=options) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        data = (300 * 1024 * 1024) * 'a'
        response = stub.SayHello(helloworld_pb2.HelloRequest(name=data))
        print(len(response.message))
        with open('/tcf/message.txt', 'wb') as f:
            f.write(response.message.encode())


if __name__ == '__main__':
    logging.basicConfig()
    run()
Interesting, it seems like large sends/recvs/reads/writes get mangled. I'd expect gRPC to have a proper retry loop for partial operations, but on the other hand I think we also have this part done properly :)
Could you try to reproduce this issue in C, without gRPC? This would require checking which syscalls gRPC uses for the buggy transmission (you can do this in Graphene with the "trace" log level) and then trying to do the same in C and verifying whether it works. But remember to implement a proper retry loop, as partial reads/writes are perfectly valid on Linux.
I ran Graphene with the trace log level, but I couldn't find which syscall caused the error...
server log:
...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563450) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
...
client log when everything is normal:
...
[P88349:T3:python3] trace: ---- shim_madvise(0x5eedbc29f000, 0x1fb000, MADV_DONTNEED) = -38
[P88349:T3:python3] debug: ---- shim_exit (returning 0)
[P88349:T3:python3] debug: Installed async event at 1630577724179050
[P88349:shim] debug: Thread exited, cleaning up
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_close(4) = 0x0
[P88349:T1:python3] trace: ---- shim_close(5) = 0x0
[P88349:T1:python3] trace: ---- shim_close(3) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eedbc6b1000, 0xb0000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_mmap(0x0, 0x12c01000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
[P88349:T1:python3] trace: ---- return from shim_mmap(...) = 0x5eed73399000
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed73399000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed85f9a000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_write(1, 0x5eedc04a9c90, 0xa) ...
314572800
[P88349:T1:python3] trace: ---- return from shim_write(...) = 0xa
[P88349:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x5eedc1b89970, 0x5eedc1b89a10, 0x8) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x5eedc1b89c00) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P88349:T1:python3] trace: ---- shim_futex(0x5eedbe348948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x5eedbd9f6260, 1) ...
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c30) = 0x0
[P88349:T1:python3] debug: ---- shim_exit_group (returning 0)
[P88349:T1:python3] debug: clearing POSIX locks for pid 1
[P88349:T1:python3] debug: sync client shutdown: closing handles
[P88349:T1:python3] debug: sync client shutdown: waiting for confirmation
[P88349:T1:python3] debug: sync client shutdown: finished
[P88349:shim] debug: Async worker thread terminated
[P88349:shim] debug: IPC worker: exiting worker thread
[P88349:T1:python3] debug: process 88349 exited with status 0
client log when error happens:
...
[P91774:T1:python3] trace: ---- shim_openat(AT_FDCWD, "/venv/lib/python3.6/site-packages/grpc/_channel.py", O_RDONLY|0x80000, 0000) = 0x3
[P91774:T1:python3] trace: ---- shim_fstat(3, 0x330b48526470) = 0x0
[P91774:T1:python3] trace: ---- shim_ioctl(3, TCGETS, 0x330b48526440) ...
[P91774:T1:python3] trace: ---- return from shim_ioctl(...) = -38
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_fcntl(3, OP 1030, 0x0) = 0x4
[P91774:T1:python3] trace: ---- shim_fcntl(4, F_GETFL, 0x7146cc) = 0x80000
[P91774:T1:python3] trace: ---- shim_newfstatat(4, "", 0x330b485262b0, 4096) = 0x0
[P91774:T1:python3] trace: ---- shim_read(4, 0x330b46b66090, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_close(4) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_SET) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: sync client: destroying handle: 0x10000015c
[P91774:T1:python3] trace: ---- shim_close(3) = 0x0
[P91774:T1:python3] trace: ---- shim_write(2, 0x330b46e6b320, 0x36c) ...
Traceback (most recent call last):
File "scripts/grpc_client.py", line 26, in <module>
run()
File "scripts/grpc_client.py", line 18, in run
response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Stream removed"
debug_error_string = "{"created":"@1630578115.422138000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
>
[P91774:T1:python3] trace: ---- return from shim_write(...) = 0x36c
[P91774:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x330b48527970, 0x330b48527a10, 0x8) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x330b48527c00) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P91774:T1:python3] trace: ---- shim_futex(0x330b44ca6948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x330b43204ec0, 1) ...
[P91774:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c30) = 0x0
[P91774:T1:python3] debug: ---- shim_exit_group (returning 1)
[P91774:T1:python3] debug: clearing POSIX locks for pid 1
[P91774:T1:python3] debug: sync client shutdown: closing handles
[P91774:T1:python3] debug: sync client shutdown: waiting for confirmation
[P91774:T1:python3] debug: sync client shutdown: finished
[P91774:shim] debug: Async worker thread terminated
[P91774:shim] debug: IPC worker: exiting worker thread
[P91774:T1:python3] debug: process 91774 exited with status 1
Here is the detailed full log: grpc-log.zip
I've tried a TCP socket with sendmsg and recvmsg in C outside Graphene, but when the transmitted data is more than about 5 MB, the program core dumps. It seems hard to get to 300 MB.
Root-caused the issue and fixed it. I will submit the PR on Monday. Please assign the bug to me and add the appropriate bug tag and priority.
@jkr0103 Thanks! Could you briefly describe what the issue was (or give the offending file+line, if it was a simple issue)?
If one of the `DkStreamWrite` invocations in sendmsg (https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1157) only did a partial write, the code simply continues writing from the next buffer (buf[i+1]), which is wrong. We should bail out as we do in recvmsg: https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1484-L1487
@jkr0103 Yeah, this must be it, thanks for debugging it.
P.S. Don't forget to add "Fixes #number" to the PR fixing it.