Comments (9)

mkow commented on July 22, 2024

> The issue was tested on the latest version v1.2-rc1.

Can you try the current master?

Kanakoo commented on July 22, 2024

> The issue was tested on the latest version v1.2-rc1.
>
> Can you try the current master?

Yes, I've just tried it; the issue still exists.

Kanakoo commented on July 22, 2024

I've found another issue. When the transmitted data is large enough, about 300 MB, the gRPC client sometimes outputs errors like these:

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'helloworld.HelloReply.message' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
ERROR:grpc._common:Exception deserializing message!
Traceback (most recent call last):
  File "/venv/lib/python3.6/site-packages/grpc/_common.py", line 86, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "scripts/grpc_client.py", line 27, in <module>
    run()
  File "scripts/grpc_client.py", line 19, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Exception deserializing response!"
        debug_error_string = "None"

or

Traceback (most recent call last):
  File "scripts/grpc_client.py", line 27, in <module>
    run()
  File "scripts/grpc_client.py", line 19, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Failed parsing HTTP/2"
        debug_error_string = "{"created":"@1630602970.948493000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Failed parsing HTTP/2","grpc_status":14}"

(The above errors happen when the client sends 300 MB of data to the server.)
or

Traceback (most recent call last):
  File "scripts/grpc_client.py", line 26, in <module>
    run()
  File "scripts/grpc_client.py", line 18, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Stream removed"
        debug_error_string = "{"created":"@1630517301.980850000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"

(The above error happens when the server replies with 300 MB of data to the client.)

To reproduce, we first need to raise gRPC's maximum message size. The grpc_server.py and grpc_client.py code from the example zip file is as below:
grpc_server.py:

from concurrent import futures
import logging

import grpc
from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

# Allow messages up to 1 GB (gRPC's default receive limit is 4 MB).
MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


class Greeter(helloworld_pb2_grpc.GreeterServicer):

    def SayHello(self, request, context):
        print(len(request.name))
        # Reply with a 300 MB payload.
        data = (300 * 1024 * 1024) * 'a'
        return helloworld_pb2.HelloReply(message=data)


def serve():
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=options)
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    logging.basicConfig()
    serve()

grpc_client.py:

from __future__ import print_function

import logging

import grpc
from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

# Allow messages up to 1 GB (gRPC's default receive limit is 4 MB).
MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


def run():
    # NOTE(gRPC Python Team): .close() is possible on a channel and should be
    # used in circumstances in which the with statement does not fit the needs
    # of the code.
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    with grpc.insecure_channel('10.67.125.170:50051', options=options) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # Send a 300 MB payload; the server replies with another 300 MB.
        data = (300 * 1024 * 1024) * 'a'
        response = stub.SayHello(helloworld_pb2.HelloRequest(name=data))
        print(len(response.message))
    with open('/tcf/message.txt', 'wb') as f:
        f.write(response.message.encode())


if __name__ == '__main__':
    logging.basicConfig()
    run()

mkow commented on July 22, 2024

Interesting; it seems like large sends/recvs/reads/writes get mangled. I'd expect gRPC to have a proper retry loop for partial operations, but on the other hand, I think we also have this part done properly :)

Could you try to reproduce this issue in C, without gRPC? This would require checking which syscalls gRPC uses for the buggy transmission (you can do this in Graphene with the "trace" log level) and then doing the same in C and verifying whether it works. But remember to implement a proper retry loop, as partial reads/writes are perfectly valid on Linux.
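For illustration, here is a minimal sketch of such a retry loop in C (hypothetical helper names; error handling trimmed to the essentials). write() and read() may legitimately transfer fewer bytes than requested, so the caller must loop until the whole buffer has been handled:

#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly `len` bytes, retrying on partial writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal; retry */
            return -1;
        }
        buf += n; /* partial write: advance past the bytes already sent */
        len -= n;
    }
    return 0;
}

/* Read exactly `len` bytes, retrying on partial reads and EINTR.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_all(int fd, char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* EOF before the full buffer arrived */
        buf += n;
        len -= n;
    }
    return 0;
}

A transmission built on such helpers is immune to partial-operation mangling; if the bug still reproduces with them, the problem is below the syscall interface.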

Kanakoo commented on July 22, 2024

> Interesting; it seems like large sends/recvs/reads/writes get mangled. I'd expect gRPC to have a proper retry loop for partial operations, but on the other hand, I think we also have this part done properly :)
>
> Could you try to reproduce this issue in C, without gRPC? This would require checking which syscalls gRPC uses for the buggy transmission (you can do this in Graphene with the "trace" log level) and then doing the same in C and verifying whether it works. But remember to implement a proper retry loop, as partial reads/writes are perfectly valid on Linux.

I ran Graphene at the trace log level, but I couldn't figure out which syscall caused the error...
Server log:

...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563450) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
...

Client log when everything is normal:

...
[P88349:T3:python3] trace: ---- shim_madvise(0x5eedbc29f000, 0x1fb000, MADV_DONTNEED) = -38
[P88349:T3:python3] debug: ---- shim_exit (returning 0)
[P88349:T3:python3] debug: Installed async event at 1630577724179050
[P88349:shim] debug: Thread exited, cleaning up
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_close(4) = 0x0
[P88349:T1:python3] trace: ---- shim_close(5) = 0x0
[P88349:T1:python3] trace: ---- shim_close(3) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eedbc6b1000, 0xb0000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_mmap(0x0, 0x12c01000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
[P88349:T1:python3] trace: ---- return from shim_mmap(...) = 0x5eed73399000
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed73399000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed85f9a000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_write(1, 0x5eedc04a9c90, 0xa) ...
314572800
[P88349:T1:python3] trace: ---- return from shim_write(...) = 0xa
[P88349:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x5eedc1b89970, 0x5eedc1b89a10, 0x8) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x5eedc1b89c00) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P88349:T1:python3] trace: ---- shim_futex(0x5eedbe348948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x5eedbd9f6260, 1) ...
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c30) = 0x0
[P88349:T1:python3] debug: ---- shim_exit_group (returning 0)
[P88349:T1:python3] debug: clearing POSIX locks for pid 1
[P88349:T1:python3] debug: sync client shutdown: closing handles
[P88349:T1:python3] debug: sync client shutdown: waiting for confirmation
[P88349:T1:python3] debug: sync client shutdown: finished
[P88349:shim] debug: Async worker thread terminated
[P88349:shim] debug: IPC worker: exiting worker thread
[P88349:T1:python3] debug: process 88349 exited with status 0

Client log when the error happens:

...
[P91774:T1:python3] trace: ---- shim_openat(AT_FDCWD, "/venv/lib/python3.6/site-packages/grpc/_channel.py", O_RDONLY|0x80000, 0000) = 0x3
[P91774:T1:python3] trace: ---- shim_fstat(3, 0x330b48526470) = 0x0
[P91774:T1:python3] trace: ---- shim_ioctl(3, TCGETS, 0x330b48526440) ...
[P91774:T1:python3] trace: ---- return from shim_ioctl(...) = -38
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_fcntl(3, OP 1030, 0x0) = 0x4
[P91774:T1:python3] trace: ---- shim_fcntl(4, F_GETFL, 0x7146cc) = 0x80000
[P91774:T1:python3] trace: ---- shim_newfstatat(4, "", 0x330b485262b0, 4096) = 0x0
[P91774:T1:python3] trace: ---- shim_read(4, 0x330b46b66090, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_close(4) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_SET) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: sync client: destroying handle: 0x10000015c
[P91774:T1:python3] trace: ---- shim_close(3) = 0x0
[P91774:T1:python3] trace: ---- shim_write(2, 0x330b46e6b320, 0x36c) ...
Traceback (most recent call last):
  File "scripts/grpc_client.py", line 26, in <module>
    run()
  File "scripts/grpc_client.py", line 18, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Stream removed"
        debug_error_string = "{"created":"@1630578115.422138000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
>
[P91774:T1:python3] trace: ---- return from shim_write(...) = 0x36c
[P91774:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x330b48527970, 0x330b48527a10, 0x8) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x330b48527c00) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P91774:T1:python3] trace: ---- shim_futex(0x330b44ca6948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x330b43204ec0, 1) ...
[P91774:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c30) = 0x0
[P91774:T1:python3] debug: ---- shim_exit_group (returning 1)
[P91774:T1:python3] debug: clearing POSIX locks for pid 1
[P91774:T1:python3] debug: sync client shutdown: closing handles
[P91774:T1:python3] debug: sync client shutdown: waiting for confirmation
[P91774:T1:python3] debug: sync client shutdown: finished
[P91774:shim] debug: Async worker thread terminated
[P91774:shim] debug: IPC worker: exiting worker thread
[P91774:T1:python3] debug: process 91774 exited with status 1

Here is the detailed full log: grpc-log.zip

I've tried a TCP socket with sendmsg and recvmsg in C outside Graphene, but when more than about 5 MB is transmitted, the program core dumps. It seems hard to get to 300 MB.
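As a side note, that core dump may come from the test program itself rather than the kernel: one common cause at these sizes is declaring the buffer as a local (stack) array, which can overflow the default stack limit. A minimal sketch that heap-allocates the buffer and retries partial sends (assuming an already-connected socket fd; sendmsg_all is a hypothetical helper):

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send `len` bytes of 'a' via sendmsg(), retrying on partial sends.
 * The buffer lives on the heap: a 300 MB local array would overflow
 * the stack. Returns 0 on success, -1 on error. */
static int sendmsg_all(int fd, size_t len) {
    char* buf = malloc(len);
    if (!buf)
        return -1;
    memset(buf, 'a', len);

    size_t sent = 0;
    while (sent < len) {
        struct iovec iov = {
            .iov_base = buf + sent,     /* resume from the unsent remainder */
            .iov_len  = len - sent,
        };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = sendmsg(fd, &msg, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1;
        }
        sent += n;
    }
    free(buf);
    return 0;
}

The receive side would mirror this with recvmsg(), accumulating data until the expected length has arrived.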

jkr0103 commented on July 22, 2024

Root-caused the issue and fixed it. I will submit the PR on Monday. Please assign the bug to me and add the appropriate bug tag and priority.

boryspoplawski commented on July 22, 2024

@jkr0103 Thanks! Could you briefly describe what the issue was (or point to the offending file and line, if it was a simple issue)?

jkr0103 commented on July 22, 2024

If one of the `DkStreamWrite` invocations in `sendmsg` (https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1157) only did a partial write, the code simply continues writing from the next buffer (`buf[i+1]`), which is wrong. We should bail out as we do in `recvmsg`: https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1484-L1487
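To make the failure mode concrete, here is a simplified sketch of the described pattern (illustrative only, not the actual Graphene source; stream_write is a hypothetical stand-in for DkStreamWrite):

#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for Graphene's DkStreamWrite. */
extern ssize_t stream_write(void* handle, const void* buf, size_t len);

/* Simplified sketch of sending a msghdr's iovec array, with the fix applied. */
static ssize_t send_iov(void* handle, const struct iovec* iov, size_t iov_count) {
    ssize_t total = 0;
    for (size_t i = 0; i < iov_count; i++) {
        ssize_t written = stream_write(handle, iov[i].iov_base, iov[i].iov_len);
        if (written < 0)
            return total > 0 ? total : written;

        total += written;

        /* The bug: on a partial write (written < iov[i].iov_len) the old
         * code continued with iov[i + 1] anyway, leaving a gap in the byte
         * stream and corrupting everything after it. The fix, mirroring
         * recvmsg: stop and report only the bytes actually transferred,
         * so the caller can retry from the correct offset. */
        if ((size_t)written < iov[i].iov_len)
            break;
    }
    return total;
}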

boryspoplawski commented on July 22, 2024

@jkr0103 Yeah, this must be it; thanks for debugging it.

P.S. Don't forget to add "Fixes #number" to the PR fixing it.
