Description of problem
The JSON RPC server aquadoggo
(hosted on https://welle.liebechaos.org) hangs and no longer responds after receiving a well formed and correct request. Requests time out with an 503
status code (Gateway timeout). It returns correct error responses to malformed requests (for example invalid payload, missing JSON RPC method etc.). When it "hangs" it also doesn't respond anymore on malformed requests.
Important: This issue doesn't appear on local environments (MacOS & Linux) but only when deployed on remote servers (vServers deployed via docker or installed directly).
After multiple debugging sessions we could identify the problem being part of an unbounded async channel which does not receive anything even though a message was sent through this channel. Increasing the server to have 4 CPUs solves the issue.
This is the part in the code where the channel receiver never gets notified (causing the server to hang):
https://github.com/p2panda/aquadoggo/blob/debug-doggo/aquadoggo/src/rpc/api.rs#L58-L69
This is where the request begins (start reading from here to follow request):
https://github.com/p2panda/aquadoggo/blob/debug-doggo/aquadoggo/src/rpc/api.rs#L121
For debugging we created a branch debug-doggo
with additional logging (compare logs below) and simplified logic (removing database handling etc.).
Steps to Reproduce
- Build binary file for
aquadoggo_cli
locally and test it is working by making a curl request to local host
- Copy the binary to a suitable remote host, run the process and test once again with curl request
- The curl request can be made locally or remotely
OR
- Install rust and cargo and any other required dependencies on a remote host
- Build aquadoggo with
cargo build
- Run it with
cargo run
- Test with curl requests
OR
- Use docker image: https://hub.docker.com/r/p2panda/aquadoggo
- Start aquadoggo on local and remote machine via Docker
- Test with curl requests
Example for an "invalid" JSON RPC request via curl:
Request:
curl -H "Content-Type: application/json" -X POST -d '{"jsonrpc":"2.0","id":"1","method":"panda_getEntryArguments","params":{"author":"bla"}}' https://welle.liebechaos.org
Response:
{
"jsonrpc": "2.0",
"error": {
"code": -32602,
"message": "Invalid params: missing field `schema`."
},
"id": "0"
}
Example for an "valid" JSON RPC request via curl which makes the server hang:
Request:
curl -H "Content-Type: application/json" -X POST -d '{"jsonrpc":"2.0","id":"2","method":"panda_getEntryArguments","params":{"author":"fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8","schema":"0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702"}}' https://welle.liebechaos.org
Response:
No response, request times out (504) after a few seconds
Environment Information
============FAILING SYSTEM===============
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 40 bits physical, 48 bits virtual
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: DO-Regular
Stepping: 2
CPU MHz: 2494.138
BogoMIPS: 4988.27
Virtualization: VT-x
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 8 MiB
===========WORKING SYSTEM===============
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 36 bits physical, 48 bits virtual
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Model name: Intel(R) Core(TM) i5-2540M CPU @ 2.60GHz
Stepping: 7
CPU MHz: 1844.795
CPU max MHz: 3300.0000
CPU min MHz: 800.0000
BogoMIPS: 5183.44
Virtualisation: VT-x
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 3 MiB
NUMA node0 CPU(s): 0-3
Debug information
Logs with trace
flag running on local machine (and working without problems / not hanging):
[2021-04-14T08:55:28Z TRACE polling::epoll] wait: epoll_fd=3, timeout=Some(569.999073677s)
[2021-04-14T08:55:28Z TRACE polling::epoll] modify: epoll_fd=3, fd=5, ev=Event { key: 18446744073709551615, readable: true, writable: false }
[2021-04-14T08:55:28Z TRACE mio::poll] registering with poller
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] Conn::read_head
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Busy }
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] Conn::read_head
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::io] read 424 bytes
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] parse_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] -> parse_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] Request.parse([Header; 100], [u8; 424])
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] Request.parse Complete(132)
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] <- parse_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] -- parse_headers
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::io] parsed 5 headers
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::conn] incoming body is content-length (292 bytes)
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::decode] decode; state=Length(292)
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::conn] incoming body completed
[2021-04-14T08:55:28Z TRACE jsonrpc_core::io] Request: {"jsonrpc":"2.0","id":"2","method":"panda_getEntryArguments","params":{"author":"fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8","schema":"0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702"}}.
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] get_entry_args
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] validated! EntryArgsRequest { author: Author("fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8"), schema: Hash("0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702") }
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] create back channel
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] sent to backend
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }
[2021-04-14T08:55:28Z TRACE async_io::driver] block_on: sleep until notification
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] backend: inner task message=GetEntryArgs(EntryArgsRequest { author: Author("fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8"), schema: Hash("0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702") }, Sender { inner: Inner { complete: false, data: Lock { locked: false, data: UnsafeCell }, rx_task: Lock { locked: false, data: UnsafeCell }, tx_task: Lock { locked: false, data: UnsafeCell } } })
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] backend: GetEntryArgs request
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::methods::entry_args] method: start
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] backend: GetEntryArgs request done
[2021-04-14T08:55:28Z TRACE async_io::driver] block_on: sleep until notification
[2021-04-14T08:55:28Z DEBUG aquadoggo::rpc::api] received to backend
[2021-04-14T08:55:28Z DEBUG jsonrpc_core::io] Response: {"jsonrpc":"2.0","result":{"entryHashBacklink":null,"entryHashSkiplink":null,"lastSeqNum":null,"logId":12345},"id":"2"}.
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] encode_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] -> encode_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] Server::encode status=200, body=Some(Known(120)), req_method=Some(POST)
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] <- encode_headers
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::role] -- encode_headers
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::io] flushed 244 bytes
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Idle }
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] Conn::read_head
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::io] read 0 bytes
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::io] parse eof
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] State::close_read()
[2021-04-14T08:55:28Z DEBUG hyper::proto::h1::conn] read eof
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] State::close_write()
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] State::close_read()
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] State::close_write()
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: Closed, writing: Closed, keep_alive: Disabled }
[2021-04-14T08:55:28Z TRACE hyper::proto::h1::conn] shut down IO complete
[2021-04-14T08:55:28Z TRACE mio::poll] deregistering handle with poller
Logs with trace
flag running on remote machine (not working / hanging):
[2021-04-14T09:01:40Z TRACE polling::epoll] wait: epoll_fd=3, timeout=Some(599.948964038s)
[2021-04-14T09:01:40Z TRACE polling::epoll] modify: epoll_fd=3, fd=5, ev=Event { key: 18446744073709551615, readable: true, writable: false }
[2021-04-14T09:01:52Z TRACE mio::poll] registering with poller
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::conn] Conn::read_head
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: Init, writing: Init, keep_alive: Busy }
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::conn] Conn::read_head
[2021-04-14T09:01:52Z DEBUG hyper::proto::h1::io] read 424 bytes
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] parse_headers
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] -> parse_headers
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] Request.parse([Header; 100], [u8; 424])
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] Request.parse Complete(132)
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] <- parse_headers
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::role] -- parse_headers
[2021-04-14T09:01:52Z DEBUG hyper::proto::h1::io] parsed 5 headers
[2021-04-14T09:01:52Z DEBUG hyper::proto::h1::conn] incoming body is content-length (292 bytes)
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::decode] decode; state=Length(292)
[2021-04-14T09:01:52Z DEBUG hyper::proto::h1::conn] incoming body completed
[2021-04-14T09:01:52Z TRACE jsonrpc_core::io] Request: {"jsonrpc":"2.0","id":"2","method":"panda_getEntryArguments","params":{"author":"fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8","schema":"0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702"}}.
[2021-04-14T09:01:52Z DEBUG aquadoggo::rpc::api] get_entry_args
[2021-04-14T09:01:52Z DEBUG aquadoggo::rpc::api] validated! EntryArgsRequest { author: Author("fddf32769f21f1cc9b32d91370aa8cdb598b51fb40732353cd3b8fe6cdad6ae8"), schema: Hash("0040cf94f6d605657e90c543b0c919070cdaaf7209c5e1ea58acb8f3568fa2114268dc9ac3bafe12af277d286fce7dc59b7c0c348973c4e9dacbe79485e56ac2a702") }
[2021-04-14T09:01:52Z DEBUG aquadoggo::rpc::api] create back channel
[2021-04-14T09:01:52Z DEBUG aquadoggo::rpc::api] sent to backend
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }
[2021-04-14T09:01:52Z TRACE hyper::proto::h1::conn] flushed({role=server}): State { reading: KeepAlive, writing: Init, keep_alive: Busy }
// from here on the server hangs ..