Comments (9)

mkow commented on July 22, 2024

> The issue was tested on the latest version v1.2-rc1.

Can you try the current master?

Kanakoo commented on July 22, 2024

> The issue was tested on the latest version v1.2-rc1.
>
> Can you try the current master?

Yes, I've just tried it; the issue still exists.

Kanakoo commented on July 22, 2024

I've found another issue. When the transmitted data is large enough, about 300 MB, the gRPC client sometimes outputs errors like these:

[libprotobuf ERROR google/protobuf/wire_format_lite.cc:577] String field 'helloworld.HelloReply.message' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes.
ERROR:grpc._common:Exception deserializing message!
Traceback (most recent call last):
  File "/venv/lib/python3.6/site-packages/grpc/_common.py", line 86, in _transform
    return transformer(message)
google.protobuf.message.DecodeError: Error parsing message
Traceback (most recent call last):
  File "scripts/grpc_client.py", line 27, in <module>
    run()
  File "scripts/grpc_client.py", line 19, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.INTERNAL
        details = "Exception deserializing response!"
        debug_error_string = "None"

or

Traceback (most recent call last):
  File "scripts/grpc_client.py", line 27, in <module>
    run()
  File "scripts/grpc_client.py", line 19, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name = data))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "Failed parsing HTTP/2"
        debug_error_string = "{"created":"@1630602970.948493000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Failed parsing HTTP/2","grpc_status":14}"

(The above errors happen when the client sends 300 MB of data to the server.)
or

Traceback (most recent call last):
  File "scripts/grpc_client.py", line 26, in <module>
    run()
  File "scripts/grpc_client.py", line 18, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Stream removed"
        debug_error_string = "{"created":"@1630517301.980850000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"

(The above error happens when the server replies with 300 MB of data to the client.)

To reproduce, we first need to raise gRPC's maximum message size. The grpc_server.py and grpc_client.py code from the example zip file is as below:
grpc_server.py:

from concurrent import futures
import logging

import grpc
from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

# Allow messages up to 1 GB (gRPC's default receive limit is 4 MB).
MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


class Greeter(helloworld_pb2_grpc.GreeterServicer):

    def SayHello(self, request, context):
        print(len(request.name))
        # Reply with a 300 MB payload.
        data = (300 * 1024 * 1024) * 'a'
        return helloworld_pb2.HelloReply(message=data)


def serve():
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10), options=options)
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    server.wait_for_termination()


if __name__ == '__main__':
    logging.basicConfig()
    serve()

grpc_client.py:

from __future__ import print_function

import logging

import grpc
from protos import helloworld_pb2
from protos import helloworld_pb2_grpc

# Allow messages up to 1 GB (gRPC's default receive limit is 4 MB).
MAX_MESSAGE_LENGTH = 1024 * 1024 * 1024


def run():
    # NOTE(gRPC Python Team): .close() is possible on a channel and should be
    # used in circumstances in which the with statement does not fit the needs
    # of the code.
    options = [('grpc.max_send_message_length', MAX_MESSAGE_LENGTH),
               ('grpc.max_receive_message_length', MAX_MESSAGE_LENGTH)]
    with grpc.insecure_channel('10.67.125.170:50051', options=options) as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        # Send a 300 MB payload; the server replies with another 300 MB.
        data = (300 * 1024 * 1024) * 'a'
        response = stub.SayHello(helloworld_pb2.HelloRequest(name=data))
        print(len(response.message))
    with open('/tcf/message.txt', 'wb') as f:
        f.write(response.message.encode())


if __name__ == '__main__':
    logging.basicConfig()
    run()

mkow commented on July 22, 2024

Interesting; it seems like large sends/recvs/reads/writes get mangled. I'd expect gRPC to have a proper retry loop for partial operations, but on the other hand, I think we also have this part done properly :)

Could you try to reproduce this issue in C, without gRPC? This would require checking which syscalls gRPC uses for the buggy transmission (you can do this in Graphene with the "trace" log level) and then doing the same in C and verifying whether it works. But remember to implement a proper retry loop, as partial reads/writes are perfectly valid on Linux.
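For illustration, here is a minimal sketch of such a retry loop in C (hypothetical helper names; error handling trimmed to the essentials). write() and read() may legitimately transfer fewer bytes than requested, so the caller must loop until the whole buffer has been handled:

#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly `len` bytes, retrying on partial writes and EINTR.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal; retry */
            return -1;
        }
        buf += n; /* partial write: advance past the bytes already sent */
        len -= n;
    }
    return 0;
}

/* Read exactly `len` bytes, retrying on partial reads and EINTR.
 * Returns 0 on success, -1 on error or premature EOF. */
static int read_all(int fd, char* buf, size_t len) {
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* EOF before the full buffer arrived */
        buf += n;
        len -= n;
    }
    return 0;
}

A transmission built on such helpers is immune to partial-operation mangling; if the bug still reproduces with them, the problem is below the syscall interface.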

Kanakoo commented on July 22, 2024

> Interesting; it seems like large sends/recvs/reads/writes get mangled. I'd expect gRPC to have a proper retry loop for partial operations, but on the other hand, I think we also have this part done properly :)
>
> Could you try to reproduce this issue in C, without gRPC? This would require checking which syscalls gRPC uses for the buggy transmission (you can do this in Graphene with the "trace" log level) and then doing the same in C and verifying whether it works. But remember to implement a proper retry loop, as partial reads/writes are perfectly valid on Linux.

I ran Graphene at the trace log level, but I couldn't figure out which syscall caused the error...
Server log:

...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563450) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
[P14582:T1:python3] warning: Ignoring FUTEX_CLOCK_REALTIME flag
[P14582:T5:python3] trace: ---- return from shim_epoll_wait(...) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563270) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563370) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(0, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae4563240) = 0x0
[P14582:T5:python3] trace: ---- shim_getcpu(0x2c8ae45631cc, 0x0, 0x0) = 0x0
[P14582:T5:python3] trace: ---- shim_clock_gettime(1, 0x2c8ae45631c0) = 0x0
[P14582:T5:python3] trace: ---- shim_epoll_wait(3, 0x2c8ae70f4d04, 100, 201) ...
[P14582:T1:python3] trace: ---- return from shim_futex(...) = -110
[P14582:T1:python3] trace: ---- shim_clock_gettime(1, 0x2c8aea004790) = 0x0
[P14582:T1:python3] trace: ---- shim_gettimeofday(0x2c8aea004770, 0x0) = 0x0
[P14582:T1:python3] trace: ---- shim_futex(0x2c8ae61000c0, FUTEX_PRIVATE|FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET, 0, 0x2c8aea004760, 0x0, -1) ...
...

Client log when everything is normal:

...
[P88349:T3:python3] trace: ---- shim_madvise(0x5eedbc29f000, 0x1fb000, MADV_DONTNEED) = -38
[P88349:T3:python3] debug: ---- shim_exit (returning 0)
[P88349:T3:python3] debug: Installed async event at 1630577724179050
[P88349:shim] debug: Thread exited, cleaning up
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_close(4) = 0x0
[P88349:T1:python3] trace: ---- shim_close(5) = 0x0
[P88349:T1:python3] trace: ---- shim_close(3) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eedbc6b1000, 0xb0000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_mmap(0x0, 0x12c01000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0x0) ...
[P88349:T1:python3] trace: ---- return from shim_mmap(...) = 0x5eed73399000
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed73399000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_munmap(0x5eed85f9a000, 0x12c01000) ...
[P88349:T1:python3] trace: ---- return from shim_munmap(...) = 0x0
[P88349:T1:python3] trace: ---- shim_write(1, 0x5eedc04a9c90, 0xa) ...
314572800
[P88349:T1:python3] trace: ---- return from shim_write(...) = 0xa
[P88349:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x5eedc1b89970, 0x5eedc1b89a10, 0x8) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x5eedc1b89c00) = 0x0
[P88349:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P88349:T1:python3] trace: ---- shim_futex(0x5eedbe348948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x5eedbd9f6260, 1) ...
[P88349:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c20) = 0x0
[P88349:T1:python3] trace: ---- shim_fstat(0, 0x5eedc1b89c30) = 0x0
[P88349:T1:python3] debug: ---- shim_exit_group (returning 0)
[P88349:T1:python3] debug: clearing POSIX locks for pid 1
[P88349:T1:python3] debug: sync client shutdown: closing handles
[P88349:T1:python3] debug: sync client shutdown: waiting for confirmation
[P88349:T1:python3] debug: sync client shutdown: finished
[P88349:shim] debug: Async worker thread terminated
[P88349:shim] debug: IPC worker: exiting worker thread
[P88349:T1:python3] debug: process 88349 exited with status 0

Client log when the error happens:

...
[P91774:T1:python3] trace: ---- shim_openat(AT_FDCWD, "/venv/lib/python3.6/site-packages/grpc/_channel.py", O_RDONLY|0x80000, 0000) = 0x3
[P91774:T1:python3] trace: ---- shim_fstat(3, 0x330b48526470) = 0x0
[P91774:T1:python3] trace: ---- shim_ioctl(3, TCGETS, 0x330b48526440) ...
[P91774:T1:python3] trace: ---- return from shim_ioctl(...) = -38
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_fcntl(3, OP 1030, 0x0) = 0x4
[P91774:T1:python3] trace: ---- shim_fcntl(4, F_GETFL, 0x7146cc) = 0x80000
[P91774:T1:python3] trace: ---- shim_newfstatat(4, "", 0x330b485262b0, 4096) = 0x0
[P91774:T1:python3] trace: ---- shim_read(4, 0x330b46b66090, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_close(4) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_SET) = 0x0
[P91774:T1:python3] trace: ---- shim_lseek(3, 0x0, SEEK_CUR) = 0x0
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b432a8d00, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: ---- shim_read(3, 0x330b46b660b0, 0x2000) ...
[P91774:T1:python3] trace: ---- return from shim_read(...) = 0x2000
[P91774:T1:python3] trace: sync client: destroying handle: 0x10000015c
[P91774:T1:python3] trace: ---- shim_close(3) = 0x0
[P91774:T1:python3] trace: ---- shim_write(2, 0x330b46e6b320, 0x36c) ...
Traceback (most recent call last):
  File "scripts/grpc_client.py", line 26, in <module>
    run()
  File "scripts/grpc_client.py", line 18, in run
    response = stub.SayHello(helloworld_pb2.HelloRequest(name='hello'))
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/venv/lib/python3.6/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNKNOWN
        details = "Stream removed"
        debug_error_string = "{"created":"@1630578115.422138000","description":"Error received from peer ipv4:10.67.125.170:50051","file":"src/core/lib/surface/call.cc","file_line":1069,"grpc_message":"Stream removed","grpc_status":2}"
>
[P91774:T1:python3] trace: ---- return from shim_write(...) = 0x36c
[P91774:T1:python3] trace: ---- shim_rt_sigaction([SIGINT], 0x330b48527970, 0x330b48527a10, 0x8) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0x0, 0x330b48527c00) = 0x0
[P91774:T1:python3] trace: ---- shim_sigaltstack(0xa76c90, 0x0) = 0x0
[P91774:T1:python3] trace: ---- shim_futex(0x330b44ca6948, FUTEX_PRIVATE|FUTEX_WAKE, 2147483647, 0x0, 0x330b43204ec0, 1) ...
[P91774:T1:python3] trace: ---- return from shim_futex(...) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c20) = 0x0
[P91774:T1:python3] trace: ---- shim_fstat(0, 0x330b48527c30) = 0x0
[P91774:T1:python3] debug: ---- shim_exit_group (returning 1)
[P91774:T1:python3] debug: clearing POSIX locks for pid 1
[P91774:T1:python3] debug: sync client shutdown: closing handles
[P91774:T1:python3] debug: sync client shutdown: waiting for confirmation
[P91774:T1:python3] debug: sync client shutdown: finished
[P91774:shim] debug: Async worker thread terminated
[P91774:shim] debug: IPC worker: exiting worker thread
[P91774:T1:python3] debug: process 91774 exited with status 1

Here is the detailed full log: grpc-log.zip

I've tried a TCP socket with sendmsg and recvmsg in C outside Graphene, but when more than about 5 MB is transmitted, the program core dumps. It seems hard to get to 300 MB.
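As a side note, that core dump may come from the test program itself rather than the kernel: one common cause at these sizes is declaring the buffer as a local (stack) array, which can overflow the default stack limit. A minimal sketch that heap-allocates the buffer and retries partial sends (assuming an already-connected socket fd; sendmsg_all is a hypothetical helper):

#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send `len` bytes of 'a' via sendmsg(), retrying on partial sends.
 * The buffer lives on the heap: a 300 MB local array would overflow
 * the stack. Returns 0 on success, -1 on error. */
static int sendmsg_all(int fd, size_t len) {
    char* buf = malloc(len);
    if (!buf)
        return -1;
    memset(buf, 'a', len);

    size_t sent = 0;
    while (sent < len) {
        struct iovec iov = {
            .iov_base = buf + sent,     /* resume from the unsent remainder */
            .iov_len  = len - sent,
        };
        struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
        ssize_t n = sendmsg(fd, &msg, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            free(buf);
            return -1;
        }
        sent += n;
    }
    free(buf);
    return 0;
}

The receive side would mirror this with recvmsg(), accumulating data until the expected length has arrived.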

jkr0103 commented on July 22, 2024

Root-caused the issue and fixed it. I will submit the PR on Monday. Please assign the bug to me and add the appropriate bug tag and priority.

boryspoplawski commented on July 22, 2024

@jkr0103 Thanks! Could you briefly describe what the issue was (or point to the offending file and line, if it was a simple issue)?

jkr0103 commented on July 22, 2024

If one of the `DkStreamWrite` invocations in `sendmsg` (https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1157) only did a partial write, the code simply continues writing from the next buffer (`buf[i+1]`), which is wrong. We should bail out as we do in `recvmsg`: https://github.com/oscarlab/graphene/blob/1a2275e00d00a29de12e2d7de840141ce264b2cb/LibOS/shim/src/sys/shim_socket.c#L1484-L1487
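To make the failure mode concrete, here is a simplified sketch of the described pattern (illustrative only, not the actual Graphene source; stream_write is a hypothetical stand-in for DkStreamWrite):

#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical stand-in for Graphene's DkStreamWrite. */
extern ssize_t stream_write(void* handle, const void* buf, size_t len);

/* Simplified sketch of sending a msghdr's iovec array, with the fix applied. */
static ssize_t send_iov(void* handle, const struct iovec* iov, size_t iov_count) {
    ssize_t total = 0;
    for (size_t i = 0; i < iov_count; i++) {
        ssize_t written = stream_write(handle, iov[i].iov_base, iov[i].iov_len);
        if (written < 0)
            return total > 0 ? total : written;

        total += written;

        /* The bug: on a partial write (written < iov[i].iov_len) the old
         * code continued with iov[i + 1] anyway, leaving a gap in the byte
         * stream and corrupting everything after it. The fix, mirroring
         * recvmsg: stop and report only the bytes actually transferred,
         * so the caller can retry from the correct offset. */
        if ((size_t)written < iov[i].iov_len)
            break;
    }
    return total;
}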

boryspoplawski commented on July 22, 2024

@jkr0103 Yeah, this must be it; thanks for debugging it.

P.S. Don't forget to add "Fixes #number" to the PR fixing it.
