This issue should be fixed in unstable.
from nimbus-eth2.
Sure, thanks. Is there any explanation of what was actually happening? I see nothing in the PR or the commit.
It was a small race condition that caused the accept loop to exit, so once the bug triggered, no further connections were accepted or processed. The bug has existed for more than 7 months, but it appears to surface only when the host is very busy.
We might be seeing some kind of issue with sockets not being freed.
It appears the unstable node on linux-01.ih-eu-mda1.nimbus.mainnet for Mainnet has the same issue.
Though the socket graph looks a bit different.
Same behavior again:
[email protected]:~ % curl --max-time 5 0:9302/eth/v1/node/version
curl: (28) Operation timed out after 5000 milliseconds with 0 bytes received
[email protected]:~ % curl --max-time 10 0:9302/eth/v1/node/version
{"data":{"version":"Nimbus/v24.1.1-0e63f8-stateofus"}}%
Sometimes it does work, though.
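Because the endpoint only hangs some of the time, it can help to measure how often. Below is a minimal sketch, assuming bash and curl. probe_rate is a hypothetical helper, not part of any Nimbus tooling. It relies on curl's exit code 28 ("Operation timed out"), the same code seen in the output above.

```shell
# Hypothetical helper (not part of Nimbus): hit a URL several times and
# tally how many attempts succeed, time out, or fail in some other way.
# Usage: probe_rate URL [ATTEMPTS] [MAX_TIME_SECONDS]
probe_rate() {
  local url=$1 attempts=${2:-10} max_time=${3:-5}
  local ok=0 timeout=0 other=0 i=0 rc
  while [ "$i" -lt "$attempts" ]; do
    rc=0
    curl --silent --output /dev/null --max-time "$max_time" "$url" || rc=$?
    case $rc in
      0)  ok=$((ok + 1)) ;;           # any HTTP response was received
      28) timeout=$((timeout + 1)) ;; # curl exit code 28: timed out
      *)  other=$((other + 1)) ;;     # refused connection, DNS error, ...
    esac
    i=$((i + 1))
  done
  echo "ok=$ok timeout=$timeout other=$other"
}
```

For example, probe_rate http://127.0.0.1:9302/eth/v1/node/version 20 5 makes 20 attempts with a 5-second limit each. Note that curl exits with 0 on any HTTP response, including error statuses, so "ok" here only means the server answered.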
Not sure whether this growth in socket usage is at fault, but it is possible.
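One way to check whether sockets are piling up is to count TCP sockets on the REST port by state with ss from iproute2. The following is a sketch under stated assumptions: socket_states is a hypothetical helper, and the port is only an example. On Linux, ss reports the current accept-queue length of a LISTEN socket in Recv-Q and its backlog limit in Send-Q. A queue that stays full would be consistent with connections no longer being accepted.

```shell
# Hypothetical helper: count TCP sockets on a local port by state, and show
# the LISTEN socket's accept queue (Recv-Q) against its backlog (Send-Q).
# Usage: socket_states PORT
socket_states() {
  local port=$1
  # -t TCP, -a all states, -n numeric; the filter matches the local port.
  ss -tan "( sport = :$port )" |
    awk 'NR > 1 {                             # skip the header line
           count[$1]++                        # $1 is the socket state
           if ($1 == "LISTEN") q = $2 "/" $3  # last LISTEN socket wins
         }
         END {
           for (s in count) print s, count[s]
           if (q != "") print "LISTEN-QUEUE", q
         }' |
    LC_ALL=C sort
}
```

Running socket_states 9302 periodically and comparing the counts over time would show whether, for example, CLOSE-WAIT sockets or the accept queue keep growing.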
It's not the open-files limit, since the number currently in use is low:
[email protected]:~ % sudo ls /proc/$(systemctl show --property MainPID --value beacon-node-mainnet-unstable-01)/fd/ | wc -l
385
[email protected]:~ % s cat beacon-node-mainnet-unstable-01 | grep LimitNOFILE
LimitNOFILE=16384
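The same check can also be made against the soft limit the kernel actually applies to the running process, rather than the value in the unit file. Below is a minimal sketch, assuming Linux /proc and sufficient privileges (sudo, as above). fd_usage is a hypothetical helper.

```shell
# Hypothetical helper: compare a process's open file descriptors with the
# soft "Max open files" limit the kernel applies to it (Linux /proc).
# Usage: fd_usage PID   (reading another user's /proc entries needs sudo)
fd_usage() {
  local pid=$1 used limit
  used=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  # Line format: "Max open files   <soft>   <hard>   files"
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "used=$used soft_limit=$limit"
}
```

For example: fd_usage "$(systemctl show --property MainPID --value beacon-node-mainnet-unstable-01)".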
It doesn't seem to be a firewall issue because I can see the packet attempting to open the connection pass through correctly:
PACKET: 2 42017d99 IN=lo LOOPBACK SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x0 TTL=64 ID=17113DF SPORT=33346 DPORT=9304 SYN
TRACE: 2 42017d99 raw:PREROUTING:rule:0x2:CONTINUE -4 -t raw -A PREROUTING -p tcp -m tcp --dport 9304 -j TRACE
TRACE: 2 42017d99 raw:PREROUTING:return:
TRACE: 2 42017d99 raw:PREROUTING:policy:ACCEPT
PACKET: 2 42017d99 IN=lo LOOPBACK SRC=127.0.0.1 DST=127.0.0.1 LEN=60 TOS=0x0 TTL=64 ID=17113DF SPORT=33346 DPORT=9304 SYN
TRACE: 2 42017d99 filter:INPUT:rule:0xc:ACCEPT -4 -t filter -A INPUT -i lo -m comment --comment "loopback interface" -j ACCEPT
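For reference, a trace like the one above can be set up and torn down with a raw-table TRACE rule. This sketch assumes root privileges and the nf_tables-based iptables, where xtables-monitor --trace prints PACKET/TRACE lines like those shown. The helper names are hypothetical. TRACE reports every matching packet, so the rule should be removed once debugging is done.

```shell
# Hypothetical helpers: add or remove a raw-table TRACE rule for one TCP
# port, equivalent to the rule visible in the trace output above
# (iptables adds "-m tcp" itself when --dport is used with -p tcp).
# Usage (as root): trace_port_on 9304 ... trace_port_off 9304
trace_port_on() {
  iptables -t raw -A PREROUTING -p tcp --dport "$1" -j TRACE
}
trace_port_off() {
  iptables -t raw -D PREROUTING -p tcp --dport "$1" -j TRACE
}
# On the nf_tables backend, traced packets can then be watched with:
#   xtables-monitor --trace
```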
I stopped all the other nodes on linux-01 on Mainnet to see whether freeing up CPU and RAM would make a difference, but it doesn't:
[email protected]:~ % c --max-time 10 0:9304/eth/v1/node/version
curl: (28) Operation timed out after 10001 milliseconds with 0 bytes received
So I don't think it's related to high system load or low memory.
We identified that the Prater node doesn't have the --rest-allow-origin="*" flag, so we've disabled it on Mainnet:
infra-nimbus#cb517650
- nimbus.master: disable wildcard REST API origin
I doubt it will do anything, though.
I currently cannot see any issues with the public REST API endpoints. It is possible that removing --rest-allow-origin="*" did fix the issue for the publicly exposed nodes, but that does not address the underlying problem with the REST API endpoint.
We will continue to monitor for API problems like this, but this particular issue is not resolved, only mitigated on the infra side.
Actually, scratch that. The testing node on geth-02.ih-eu-mda1.nimbus.holesky is now showing the same symptoms:
[email protected]:~ % c --max-time 5 0:9302/eth/v1/node/syncing
curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received
[email protected]:~ % c --max-time 10 0:9302/eth/v1/node/syncing
curl: (28) Operation timed out after 10000 milliseconds with 0 bytes received
[email protected]:~ % c --max-time 30 0:9302/eth/v1/node/syncing
curl: (28) Operation timed out after 30001 milliseconds with 0 bytes received
And it is exposed publicly:
'geth-02.ih-eu-mda1.nimbus.holesky': # 1 each
- { branch: 'stable', start: 0, end: 1, el: 'geth', vc: true }
- { branch: 'testing', start: 1, end: 2, el: 'geth', vc: false, public_api: true }
- { branch: 'unstable', start: 2, end: 3, el: 'geth', vc: false }
- { branch: 'libp2p', start: 3, end: 4, el: 'geth', vc: false }
And it does not have the --rest-allow-origin flag:
[email protected]:~ % s cat beacon-node-holesky-testing | grep rest-
--rest-address=0.0.0.0 \
--rest-port=9302 \
--rest-max-body-size=16384 \
--rest-max-headers-size=128 \
This suggests that the --rest-allow-origin flag is a red herring.
Another instance of the issue on linux-02.ih-eu-mda1.nimbus.mainnet, node beacon-node-mainnet-testing-01:
[email protected]:~ % for port in $(seq 9300 9305); do c 0:$port/eth/v1/node/version | jq -c; done
{"data":{"version":"Nimbus/v24.2.0-742f15-stateofus"}}
{"data":{"version":"Nimbus/v24.2.0-742f15-stateofus"}}
curl: (28) Connection timed out after 5001 milliseconds
{"data":{"version":"Nimbus/v24.2.0-742f15-stateofus"}}
{"data":{"version":"Nimbus/v24.2.0-81b849-stateofus"}}
{"data":{"version":"Nimbus/v24.2.0-81b849-stateofus"}}
Almost all Holesky beacon nodes built from the nim-libp2p-auto-bump-unstable branch showed this issue this weekend:
Get "http://localhost:9304/eth/v1/node/version": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Could you please provide a commit hash or something similar? The branch moves all the time, so right now it's impossible to verify whether it included the fixes from unstable.
Thanks for the explanation and more importantly the fix.
I reviewed some of our alerts and I do think this has been resolved. If need be, we can always reopen.