erikd avatar erikd commented on May 27, 2024 1

Ok, profiling got me my first traceback:

*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:
  Pos.DB.Block.Internal.putSerializedBlunds.\,
  called from Pos.DB.Block.Internal.putSerializedBlunds,
  called from Pos.DB.Block.Internal.dbPutSerBlundsRealDefault,
  called from Cardano.Wallet.Kernel.Mode.dbPutSerBlunds,
  called from Pos.DB.Block.Load.putBlunds,
  called from Pos.DB.Block.Slog.Logic.slogApplyBlocks,
  called from Pos.DB.Block.Logic.Internal.applyBlocksDbUnsafeDo,
  called from Pos.DB.Block.Logic.Internal.applyBlocksUnsafe.app,
  called from Pos.DB.Block.Logic.Internal.applyBlocksUnsafe,
  called from Pos.DB.Block.Logic.VAR.rollingVerifyAndApply.\,
  called from Pos.DB.Block.Logic.VAR.rollingVerifyAndApply,
  called from Pos.DB.Block.Logic.VAR.verifyAndApplyBlocks,
  called from Pos.Network.Block.Logic.applyWithoutRollback.applyWithoutRollbackDo,
  called from Pos.DB.GState.Lock.stateLockHelper.\,

Going to run it a couple more times to make sure it crashes the same way each time.

from cardano-sl.

erikd avatar erikd commented on May 27, 2024 1

So, is cardano-node expected to be a de-facto replacement for cardano-sl?

Yes!


erikd avatar erikd commented on May 27, 2024

Please cut and paste the actual log output rather than including a screenshot. Screenshots are unreadable for people with high DPI monitors.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
    hash: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
    previous block: 1b3c31eb41b0d38af18d9cf908a7c1848911d081271fb29b88d5b010932b2eba
    slot: 8364th slot of 152nd epoch
    difficulty: 3290025
    leader: pub:993a8f05
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:993a8f05, dPk = pub:89c29f8c } }
    block: v0.2.0
    software: cardano-sl:1
, MainBlockHeader:
    hash: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
    previous block: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
    slot: 8365th slot of 152nd epoch
    difficulty: 3290026
    leader: pub:0bdb1f5e
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
    block: v0.2.0
    software: cardano-sl:1
, MainBlockHeader:
    hash: f25b190e0f961f05b111952b72a8cba6b30cffa4caac4c60eda64697471dc606
    previous block: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
    slot: 8366th slot of 152nd epoch
    difficulty: 3290027
    leader: pub:1bc97a2f
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:1bc97a2f, dPk = pub:61261a95 } }
    block: v0.2.0
    software: cardano-sl:1
]
Last 3: [MainBlockHeader:
    hash: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
    previous block: 14b798e52f1b215d77b8be0ff1315ca25885f37dc27cacb7ac6db897d877f8a1
    slot: 8425th slot of 152nd epoch
    difficulty: 3290086
    leader: pub:9a6fa343
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
    block: v0.2.0
    software: cardano-sl:1
, MainBlockHeader:
    hash: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
    previous block: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
    slot: 8426th slot of 152nd epoch
    difficulty: 3290087
    leader: pub:0bdb1f5e
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
    block: v0.2.0
    software: cardano-sl:1
, MainBlockHeader:
    hash: d571750aee77c352ae4a3be20b1f229d4e3d6c549668a06b2a19b3b8bc301843
    previous block: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
    slot: 8427th slot of 152nd epoch
    difficulty: 3290088
    leader: pub:0bdb1f5e
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
    block: v0.2.0
    software: cardano-sl:1
]
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics wait: ApplyBlock queue length is 1
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics acquire: ApplyBlock wait time was 12mcs
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Verifying and applying blocks...
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] slogVerifyBlocks: Consensus era is Original
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.71 UTC] Rolling: Verification done, applying unsafe block
[cardano-sl.node:Debug:ThreadId 316] [2020-01-17 07:31:07.72 UTC] applying some blocks (non-rollback)
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Verifying and applying blocks done
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] MemPool metrics release: ApplyBlock modify time was 258859mcs size is 0
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Not relaying block in recovery mode
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Blocks have been adopted: [19b1f1eec6f9abb1, a7b1b58758880db3, f25b190e0f961f05, 0e907e6cf3f6ca4d, d4f74c5d312b0549, e4d06fea6f988ce9, ee279270b61135e9, 6c7998c9ba54d17e, 94853e29ce7200e6, f08ed74a4fcfd8e3, 834b772bd7901ffb, fb514730332114d2, c8a4b3a5697044ee, 5c9a27718384810e, 98573eeb0e53dbd4, a3be980a5027b156, 3713b244551e732b, ff9fa80350026e9e, 1704bdb34a16c6be, 229b40d03c219294, dfcbec3a426b4ca3, 860e4a48c46e67fa, 40b75fe17d58e8bf, faddbd98928a5527, 7b1d29bc099b5b19, ab08ff7a02399e92, ae699dcf30f5ef08, a6d99576b5002d9a, 62b679c913e1c9ee, 4d01d5340c42fdb3, 196d33de29e35ae9, 98ee850aac9a1b9d, 4f3037f91ccd71fd, 24fbc16a4479d02f, 53ac78eb4d6ab7f2, d48b6feb41138da5, 676d449b1ca1fb74, 3327651659423dee, 1a3c9a6c2f095cd7, 16b36ed4da3e05ff, 2223acc65da5f5db, e8d4e12a616f4227, 443ff6e4fffeac0b, 22db605e0059c4d8, 643d9645b54e09b5, fdf7bfe14b34aed0, 6a1650622dec4d07, 1af30b82a5ee01a8, 1e585328b2eda019, db55785d3ea21869, 0ac37f9d03244062, cc9d0a297ea9af53, ebffb3fc15d7b62b, e7b81541b23ace3a, a48604d65cef0180, 7af2bd4c948d283f, 013dc0ea380f67a1, de88e7e0c354e93a, a762148ba5fbf0a6, 432652c976cbbdc4, 14b798e52f1b215d, 197d5cfea25e990f, 5a0795022c478619, d571750aee77c352]
^Z[9]   Segmentation fault      nohup sudo ./mainnet.sh > log.log 2>&1  (wd: /data/ada/cardano-sl)
(wd now: /data/ada/cardano-sl/state-wallet-mainnet/logs/pub)


erikd avatar erikd commented on May 27, 2024

What version (use the git hash) is this?

Is this repeatable? I.e., if you run it again, do you get the same error?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

master 1a792d7 [origin/master] Merge #4242
Yes, this is repeatable.
@erikd


erikd avatar erikd commented on May 27, 2024

That git hash is from Sep 2019.

Try checking out tag 3.2.0 which is from Nov 22, 2019.

If that does not fix it, I will look at this on Monday morning. It's currently 9pm Friday here.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

thank you @erikd


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

@erikd
I use "nix-build -A connectScripts.mainnet.wallet -o mainnet.sh" command to build.
mainnet.sh:
#!/nix/store/vlb7kcc1k035vpyrgsj9kk7380yh68wd-bash-4.4-p23/bin/bash

set -euo pipefail

if [[ "${1-}" == "--delete-state" ]]; then
  echo "Deleting state-wallet-mainnet ... "
  rm -Rf state-wallet-mainnet
  shift
fi
if [[ "${1-}" == "--runtime-args" ]]; then
  RUNTIME_ARGS="${2-}"
  shift 2
else
  RUNTIME_ARGS=""
fi

echo "Keeping state in state-wallet-mainnet"
mkdir -p state-wallet-mainnet/logs

echo "Launching a node connected to 'mainnet' ..."
export LC_ALL=en_GB.UTF-8
export LANG=en_GB.UTF-8

if [ ! -d state-wallet-mainnet/tls ]; then
  mkdir -p state-wallet-mainnet/tls/server && mkdir -p state-wallet-mainnet/tls/client
  /nix/store/ra45xgy1ngy9bpn12h5fib7m81925i80-cardano-sl-tools-3.2.0-exe-cardano-x509-certificates/bin/cardano-x509-certificates \
    --server-out-dir state-wallet-mainnet/tls/server \
    --clients-out-dir state-wallet-mainnet/tls/client \
    --configuration-file /nix/store/r02jsbcld1cmy47y1cxr8c9l6y9z7a8n-tls-config-mainnet.yaml \
    --configuration-key mainnet_full
fi
ln -sf /nix/store/0gzajk6rskv7xigvwhgly1zrn3m75d4r-curl-wallet-mainnet state-wallet-mainnet/curl

exec /nix/store/ya8iqz0l34w9mszd06ir3pchasryqz4a-cardano-wallet-3.2.0-exe-cardano-node/bin/cardano-node \
  --configuration-file /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/lib/configuration.yaml --configuration-key mainnet_full \
  --tlscert state-wallet-mainnet/tls/server/server.crt \
  --tlskey state-wallet-mainnet/tls/server/server.key \
  --tlsca state-wallet-mainnet/tls/server/ca.crt \
  --log-config /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/log-configs/connect-to-cluster.yaml \
  --topology "/nix/store/kiwxslk8q90j8rrjj4vqnnc9np5a9bhy-topology-mainnet" \
  --logs-prefix "state-wallet-mainnet/logs" \
  --db-path "state-wallet-mainnet/db" \
  --wallet-db-path 'state-wallet-mainnet/wallet-db' \
  --no-client-auth \
  --keyfile state-wallet-mainnet/secret.key \
  --wallet-address 0.0.0.0:8090 \
  --wallet-doc-address 127.0.0.1:8091 \
  --ekg-server 127.0.0.1:8000 --metrics \
  +RTS -N2 -qg -A1m -I0 -T -RTS \
  $RUNTIME_ARGS


erikd avatar erikd commented on May 27, 2024

I use "nix - build - A connectScripts.mainnet.wallet - o mainnet.sh" command to build.

There seem to be some extra spaces around the - character in that. It should be nix-build -A and -o.

I just did:

> git checkout 3.2.0 -b tag-3.2.0
> nix-build -A connectScripts.mainnet.wallet -o mainnet.sh

and it worked as expected.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

@erikd
That probably happened when I copied it. The spaces were added for me automatically.

The process ended abruptly, with the same error:

[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1689] [2020-01-20 02:54:16.92 UTC] slogVerifyBlocks: Consensus era is Original
^Z[5]   Segmentation fault      sudo nohup ./mainnet.sh > log.log 2>&1

[6]+  Stopped                 tail -200f log.log

Do I need to upgrade my server? It currently has 2 cores and 4GB of RAM.


erikd avatar erikd commented on May 27, 2024

4G should be enough RAM, especially if nothing else is happening on that machine.

I am currently running the ./mainnet.sh script. Any idea what epoch you are getting up to when it segfaults?


erikd avatar erikd commented on May 27, 2024

And now it segfaults for me too! At 5046th slot of 3rd epoch.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

I think this error occurs when the process reaches 1GB of memory. Because I restarted the process, it can continue to synchronize. I rebooted once, and it is now at the 10365th slot of the 13th epoch.


erikd avatar erikd commented on May 27, 2024

I am running on a 16G VM, and I was able to recreate that problem so that is not it.

Oh, hang on, you are running on a 64 bit CPU aren't you?


erikd avatar erikd commented on May 27, 2024

I also checked out the HEAD of the develop branch and that synced to epoch 4 without a problem.

Then I switched back to the 3.2.0 tag and synced from scratch to epoch 5, again without a problem.

I wonder if there is a peer somewhere on the network that is serving up corrupted blocks.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

admin@ada:/data/ada$ getconf LONG_BIT
64

On Friday, I restarted the node over and over until it finished synchronizing. On Monday, the process was killed.


erikd avatar erikd commented on May 27, 2024

What base OS and OS version are you running this on?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Ubuntu 18.04


erikd avatar erikd commented on May 27, 2024

Ubuntu 18.04 should be fine.

I would try deleting the ./state-wallet-mainnet directory, and then try resyncing. Each time it segfaults, record the slot and epoch number and restart it. When you have about 10 entries, post the list here.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

8280th slot of 8th epoch
18274th slot of 18th epoch
20293rd slot of 19th epoch
7566th slot of 31st epoch
1618th slot of 33rd epoch
4089th slot of 36th epoch
15837th slot of 55th epoch


erikd avatar erikd commented on May 27, 2024

Ok, now delete the ./state-wallet-mainnet directory, run it again, and list the first 10 entries here.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

What do you mean?


erikd avatar erikd commented on May 27, 2024

Delete ./state-wallet-mainnet directory and do the same test again. Would be useful to know if we get the same results.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Do I need to delete this directory when I run it again?


erikd avatar erikd commented on May 27, 2024

Yes. That is the state directory where the node stores blocks.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024
[cardano-sl.*production*:Debug:1689] [2020-01-20 04:56:29.75 UTC] Handling block w/ LCA, which is a07d3104
[cardano-sl.*production*:Info:1689] [2020-01-20 04:56:29.75 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
    hash: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
    previous block: a07d310498f2417e1d6ade2dd2e3da8f802995407845d164616053dfc683a44d
    slot: 18033rd slot of 26th epoch
    difficulty: 579559
    leader: pub:9a6fa343
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
    block: v0.1.0
    software: cardano-sl:0
, MainBlockHeader:
    hash: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
    previous block: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
    slot: 18034th slot of 26th epoch
    difficulty: 579560
    leader: pub:0bdb1f5e
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
    block: v0.1.0
    software: cardano-sl:0
, MainBlockHeader:
    hash: d8462f486a46f0688786e3880c7982a6f69ce8f0adcf18c1e1089bb9480b7490
    previous block: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
    slot: 18035th slot of 26th epoch
    difficulty: 579561
    leader: pub:9a6fa343
    signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
    block: v0.1.0
    software: cardano-sl:0

The log is not finished. The process was killed, but this time no errors were reported.


erikd avatar erikd commented on May 27, 2024

@shenyaqi9527 I need a list of where (i.e., epoch and slot number) the process gets killed, starting from scratch. Please run it again so I can compare it with the last list.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Do I need to delete "./state-wallet-mainnet"?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024
[cardano-sl.*production*:Debug:1686] [2020-01-20 05:11:51.96 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1686] [2020-01-20 05:11:51.96 UTC] slogVerifyBlocks: Consensus era is Original
cardano-node: internal error: evacuate: strange closure type 0
    (GHC version 8.4.4 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug


erikd avatar erikd commented on May 27, 2024

I am really beginning to suspect that your machine is having hardware issues. Can you run some form of diagnostic on it?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

How do I run a diagnostic?
This is an Aliyun server.


erikd avatar erikd commented on May 27, 2024

I would try memtest86+ first and then contact your provider.


erikd avatar erikd commented on May 27, 2024

As a point of reference, I have seen this issue exactly once. I have since restarted and synced to the 135th epoch (and it's still going) without a recurrence of the segfault.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

I changed the server and got the same error


erikd avatar erikd commented on May 27, 2024

What is the git hash?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

tag-3.2.0 5d0a227 Merge #4252
Are there any commands that need to be executed before creation?


erikd avatar erikd commented on May 27, 2024

No commands are required other than what you have been running.

Mine is running quite happily, using up to about 15% of my RAM on a 16G VM.

Maybe try running it on an 8G or 16G machine.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Does nix limit memory usage? I used 8GB of memory with the same result: 7832nd slot of 8th epoch.


erikd avatar erikd commented on May 27, 2024

Does nix limit memory usage?

I don't think so.

I would still like a list of the epoch/slot info for the first 10 failures.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

I tried a few times
18035th slot of 26th epoch
6841st slot of 3rd epoch
16250th slot of 3rd epoch
15769th slot of 4th epoch
20023rd slot of 1st epoch
7832nd slot of 8th epoch


erikd avatar erikd commented on May 27, 2024

So it's not deterministic.

When you moved machines, did you keep the same disk image or reinstall from scratch?
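
The comparison between the two posted lists can be done mechanically. The following is a minimal sketch, assuming the failure positions are recorded in the "Nth slot of Mth epoch" form used in this thread; the function name `parse_positions` is made up for illustration.

```python
import re

# Matches lines such as "8280th slot of 8th epoch".
POSITION = re.compile(r"(\d+)(?:st|nd|rd|th) slot of (\d+)(?:st|nd|rd|th) epoch")

def parse_positions(text):
    """Return a list of (epoch, slot) tuples found in the text."""
    return [(int(epoch), int(slot)) for slot, epoch in POSITION.findall(text)]

# The two lists as posted earlier in this thread.
first_run = """
8280th slot of 8th epoch
18274th slot of 18th epoch
20293rd slot of 19th epoch
7566th slot of 31st epoch
1618th slot of 33rd epoch
4089th slot of 36th epoch
15837th slot of 55th epoch
"""

second_run = """
18035th slot of 26th epoch
6841st slot of 3rd epoch
16250th slot of 3rd epoch
15769th slot of 4th epoch
20023rd slot of 1st epoch
7832nd slot of 8th epoch
"""

a = parse_positions(first_run)
b = parse_positions(second_run)
common = set(a) & set(b)
print("crash positions seen in both lists:", sorted(common))
```

No position appears in both lists, which is consistent with the crash not being tied to a particular block.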


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

reinstall from scratch


erikd avatar erikd commented on May 27, 2024

I have just about run out of ideas 😢 .


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

I can't do anything about it now.


erikd avatar erikd commented on May 27, 2024

@shenyaqi9527 Does the dmesg output on your machine list any segfaults?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

How do I use this command?


erikd avatar erikd commented on May 27, 2024

sudo dmesg | grep segfault


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024
admin@ada:/$ sudo dmesg | grep segfault
[ 1547.424449] cardano-node:w[5554]: segfault at 840bffe2a0 ip 00007fae70921c55 sp 00007fae6d919a98 error 6 in libc-2.27.so[7fae707d0000+1aa000]
[ 1910.998425] cardano-node:w[5786]: segfault at 8402c16940 ip 00007fca29cefc55 sp 00007fca05bd4a98 error 6 in libc-2.27.so[7fca29b9e000+1aa000]
[ 1991.020628] cardano-node:w[5883]: segfault at 84044bfac0 ip 00007fc7aee9ec55 sp 00007fc77e7f7a98 error 6 in libc-2.27.so[7fc7aed4d000+1aa000]
[ 2427.408873] cardano-node:w[5942]: segfault at 840c2d55c0 ip 00007ff7f4bcfc55 sp 00007ff7dbffaa98 error 6 in libc-2.27.so[7ff7f4a7e000+1aa000]


erikd avatar erikd commented on May 27, 2024

I got two more in the last 30 minutes. I am seeing something similar to you:

[6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
[6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
[6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]

I'm running this on Debian and I have just noticed there is a libc-2.29 available, so I am going to try upgrading to that.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

What should I do?


erikd avatar erikd commented on May 27, 2024

Wait for me to report back after I do a complete upgrade of my system, reboot and retest?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

ok


erikd avatar erikd commented on May 27, 2024

It's the end of my day here so you may not hear from me until tomorrow.


erikd avatar erikd commented on May 27, 2024

@shenyaqi9527 is there any chance you are running out of disk space?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

my disk is 1TB


erikd avatar erikd commented on May 27, 2024

Ok, probably not running out of disk space, but df -h will tell you for sure.
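
For a scripted check, roughly the same information that df -h reports is available from Python's standard library. This is only a sketch; the path "." is a placeholder for whichever directory holds state-wallet-mainnet, and `free_fraction` is a made-up helper name.

```python
import shutil

def free_fraction(path="."):
    """Return (free_bytes, fraction_free) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    fraction = usage.free / usage.total if usage.total else 0.0
    return usage.free, fraction

# "." is a placeholder; point this at the directory holding the node state.
free_bytes, fraction = free_fraction(".")
print(f"{free_bytes / 2**30:.1f} GiB free ({fraction:.0%} of the filesystem)")
```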


erikd avatar erikd commented on May 27, 2024

Since I started investigating this, I have encountered 5 instances of it. From kern.log:

# grep cardano-node /var/log/kern.log
Jan 20 14:01:36 nix kernel: [6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
Jan 20 18:00:47 nix kernel: [6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
Jan 20 18:19:44 nix kernel: [6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]
Jan 20 19:17:49 nix kernel: [ 2608.843672] traps: cardano-node:w[1150] general protection fault ip:7fae31864d6e sp:7fadfcff4a98 error:0 in libc-2.27.so[7fae31713000+1aa000]
Jan 21 07:34:36 nix kernel: [46815.966810] cardano-node:w[5470]: segfault at 1dbf3fffff8 ip 0000000000413bd5 sp 00007fbe14ff88a0 error 4 in cardano-node[400000+1b00000]
Jan 21 11:03:19 nix kernel: [59339.472653] cardano-node:w[6310]: segfault at 84109bc549 ip 00007fb3d2482b24 sp 00007fb3a27f7a98 error 6 in libc-2.27.so[7fb3d2331000+1aa000]

The last instance happened after a complete apt update && apt upgrade followed by a reboot.

I do notice that the libc version is still 2.27 which seems to come from Nix.
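
The raw ip values in these lines are hard to compare directly because the library is mapped at a different base address in each process. Subtracting the base address printed in the square brackets from the ip gives an offset into the mapped object that can be compared across runs. A minimal sketch, assuming the kernel log format shown above (the helper name `fault_offsets` is made up):

```python
import re

# Kernel segfault line: "... ip <hex> sp <hex> error N in <object>[<base>+<size>]"
SEGFAULT = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ "
    r"error (?P<error>\d+) in (?P<object>[^\[\s]+)"
    r"\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def fault_offsets(log_text):
    """Return (object, offset, error) for each segfault line in log_text."""
    results = []
    for match in SEGFAULT.finditer(log_text):
        # Offset of the faulting instruction relative to the mapping base.
        offset = int(match["ip"], 16) - int(match["base"], 16)
        results.append((match["object"], hex(offset), int(match["error"])))
    return results

# A few of the lines quoted above, with the syslog prefix dropped.
kern_log = """\
[6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
[6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
[6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]
[46815.966810] cardano-node:w[5470]: segfault at 1dbf3fffff8 ip 0000000000413bd5 sp 00007fbe14ff88a0 error 4 in cardano-node[400000+1b00000]
"""

for obj, offset, error in fault_offsets(kern_log):
    print(obj, offset, "error", error)
```

For these four lines the libc offsets come out as 0x151d6e, 0x151b24 and 0x151d6e, and the fault inside the cardano-node binary is at 0x13bd5. Repeated offsets suggest the faulting instruction is at the same place in libc each time; which function that is would have to be looked up in that exact libc build.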


erikd avatar erikd commented on May 27, 2024

IOHK devops have checked all the production/staging/testing instances under their control and they do not see any cardano-node segfaults in the kernel logs. One thing to notice is that the devops machines are all running version 3.1.0 rather than 3.2.0.

I am going to try 3.1.0.


erikd avatar erikd commented on May 27, 2024

And I got a segfault on the 3.1.0 tag.

Jan 21 12:01:33 nix kernel: [62833.180323] cardano-node:w[7063]: segfault at 8410e19440 ip 00007f3228456d6e sp 00007f3217ffaa98 error 6 in libc-2.27.so[7f3228305000+1aa000]


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Is there a solution to this?


erikd avatar erikd commented on May 27, 2024

I do not even know what is causing this, so obviously there is not yet a solution.

I also cannot recreate this reliably enough. How reliably can you recreate it?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Can't IOHK developers fix it?


erikd avatar erikd commented on May 27, 2024

I am an IOHK developer!


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Please help me think of something to solve this problem. Do I need to change to another OS?


erikd avatar erikd commented on May 27, 2024

You do not need to change OS. The cardano node is developed on and for Linux.

This is an obscure and difficult to reproduce problem. You are going to need to show patience.


erikd avatar erikd commented on May 27, 2024

I am currently running the node under valgrind to see if that can help me find the cause.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

I'm running 3.1.0 and so far this has not been a problem.


erikd avatar erikd commented on May 27, 2024

What epoch/slot are you up to?


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

14651st slot of 54th epoch
root 20 0 1.001t 959396 21924 S 120.6 23.8 31:46.79 cardano-node


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

17635th slot of 67th epoch. This is where the process was killed.

[74788.914053] cardano-node:w[8914]: segfault at 840ccae7a0 ip 00007f4de11d9c55 sp 00007f4dc59cea98 error 6 in libc-2.27.so[7f4de1088000+1aa000]


erikd avatar erikd commented on May 27, 2024

The node has been running for over 20 hours under valgrind and has not yet failed.

Unfortunately, valgrind runs things at least 10 times slower, so it has only managed to sync from zero to epoch 59 in that time.

I am also a bit concerned that valgrind itself may reduce the chance of the bug being triggered.


erikd avatar erikd commented on May 27, 2024

I gave up on that valgrind run so I could try running it under gdb. Unfortunately the heavy use of exceptions within the node app means gdb cannot be used.

Trying without valgrind again.


erikd avatar erikd commented on May 27, 2024

Another segfault (without valgrind):

Jan 22 12:30:26 nix kernel: [150967.689171] cardano-node:w[26733]: segfault at 841eca1560 ip 00007f47db188d6e sp 00007f47a6ff8a98 error 6 in libc-2.27.so[7f47db037000+1aa000]
Jan 22 12:30:26 nix kernel: [150967.689177] Code: ff ff 0f 18 89 40 fe ff ff c5 fe 6f 01 c5 fe 6f 49 e0 c5 fe 6f 51 c0 c5 fe 6f 59 a0 48 81 e9 80 00 00 00 48 81 ea 80 00 00 00 <c4> c1 7d e7 01 c4 c1 7d e7 49 e0 c4 c1 7d e7 51 c0 c4 c1 7d e7 59

According to this, error 6 means "(data) write to an unmapped area".

This is probably due to some C code accessed via the C FFI.
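
The "error N" value in these kernel lines is the x86 page-fault error code, a small bit field. A minimal sketch of decoding its low bits (the function name `decode_page_fault` is made up for illustration):

```python
def decode_page_fault(error_code):
    """Describe the low bits of an x86 page-fault error code."""
    return {
        # Bit 0: 0 = page not present, 1 = protection violation.
        "cause": "protection violation" if error_code & 0x1 else "page not present",
        # Bit 1: 0 = read access, 1 = write access.
        "access": "write" if error_code & 0x2 else "read",
        # Bit 2: 0 = kernel mode, 1 = user mode.
        "mode": "user" if error_code & 0x4 else "kernel",
        # Bit 4: the fault was caused by an instruction fetch.
        "instruction_fetch": bool(error_code & 0x10),
    }

for code in (4, 6):
    print(code, decode_page_fault(code))
```

Under this decoding, error 6 is a user-mode write to a page that is not present, and error 4 (seen once in the kern.log lines earlier in this thread) is a user-mode read from a page that is not present.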


erikd avatar erikd commented on May 27, 2024

I noticed that the Nix build that we are using is linking to version 5.11 of RocksDB whereas Debian has version 5.17 of that library.

Now trying a native Debian build of cardano-node rather than the Nix build.


erikd avatar erikd commented on May 27, 2024

And I almost immediately got a segfault with the version I built without Nix. 😢


erikd avatar erikd commented on May 27, 2024

Currently building a profiled version of cardano-node under Debian. Hoping that gives a proper traceback. 🤞


erikd avatar erikd commented on May 27, 2024

Have run this a number of times and always get a traceback starting with putSerializedBlunds.

@dcoutts asked why there is no C code in that traceback. I think that is because of incompatible debug formats. GHC's profiling uses DWARF debugging symbols, and the debug information for the C code in the backtrace may either not be enabled or be in an incompatible format.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

Has the problem been dealt with? @erikd


erikd avatar erikd commented on May 27, 2024

It has not. I have been assigned to other higher priority work.


shenyaqi9527 avatar shenyaqi9527 commented on May 27, 2024

No one is dealing with this problem right now? @erikd


erikd avatar erikd commented on May 27, 2024

No one that I am aware of. It seems that you are one of the few people who has been hitting this problem, and hence the priority has been downgraded.

My advice is to set the node up as a systemd service or something and have it restart automatically.
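
A systemd unit with Restart=on-failure is probably the more robust route. As an illustration of the "or something" option only, here is a minimal supervisor loop in Python that restarts a command whenever it exits abnormally. It is a sketch, not a replacement for a proper service manager; `run_with_restart` is a made-up name, and the failing child process below is a stand-in for ./mainnet.sh.

```python
import subprocess
import sys
import time

def run_with_restart(command, max_restarts=5, delay_seconds=1.0):
    """Run command, restarting it after each non-zero exit.

    Stops after a clean exit or after max_restarts restarts, and returns
    the list of exit statuses observed. On Linux a negative status means
    the process was killed by a signal (for example -11 for SIGSEGV).
    """
    statuses = []
    for _ in range(max_restarts + 1):
        status = subprocess.call(command)
        statuses.append(status)
        if status == 0:
            break
        time.sleep(delay_seconds)
    return statuses

# Hypothetical stand-in for ./mainnet.sh: a child process that always fails.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_with_restart(failing, max_restarts=2, delay_seconds=0.1))
```

For the real node the command would be something like ["./mainnet.sh"] with a much larger max_restarts. Since the node keeps its state in ./state-wallet-mainnet, each restart should continue syncing from where the previous run stopped, as reported earlier in this thread.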


DZDomi avatar DZDomi commented on May 27, 2024

We are also running into the same issue with all 3 of our nodes. This is the Dockerfile we are using to run our node:

FROM nixos/nix:2.3

ENV CARDANO_VERSION 3.2.0

RUN apk update && \
    apk add git curl bzip2 bash

ADD nix.conf /etc/nix/nix.conf

WORKDIR /opt

RUN git clone https://github.com/input-output-hk/cardano-sl.git

WORKDIR /opt/cardano-sl

RUN git checkout $CARDANO_VERSION
RUN nix-build -A connectScripts.mainnet.wallet -o connect-to-mainnet && \
    nix-build -A connectScripts.mainnet.explorer -o connect-explorer-to-mainnet && \
    nix-build -A connectScripts.testnet.wallet -o connect-to-testnet && \
    nix-build -A connectScripts.testnet.explorer -o connect-explorer-to-testnet

ADD entrypoint.sh .

ENTRYPOINT ["./entrypoint.sh"]

entrypoint.sh:

#!/usr/bin/env bash

if [[ "$1" == "explorer" ]]; then
  if [[ -n "$TESTNET" ]]; then
    exec ./connect-explorer-to-testnet --runtime-args --web-port="$RPC_PORT"
  else
    exec ./connect-explorer-to-mainnet --runtime-args --web-port="$RPC_PORT"
  fi
fi

if [[ "$1" == "node" ]]; then
  if [[ -n "$TESTNET" ]]; then
    sed -i "s/--wallet-address 127.0.0.1:8090/--wallet-address 0.0.0.0:$RPC_PORT/g" connect-to-testnet
    exec ./connect-to-testnet --runtime-args --no-tls
  else
    sed -i "s/--wallet-address 127.0.0.1:8090/--wallet-address 0.0.0.0:$RPC_PORT/g" connect-to-mainnet
    exec ./connect-to-mainnet --runtime-args --no-tls
  fi
fi

nix.conf:

cores = 0
max-jobs = auto
sandbox = false
substituters = https://hydra.iohk.io https://cache.nixos.org
trusted-substituters =
trusted-public-keys  = hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ= cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=

@erikd


michdr avatar michdr commented on May 27, 2024

I can confirm that it happens both in v3.2.0 and v3.1.0.
Any estimate of when this bug is going to be taken care of? @erikd


erikd avatar erikd commented on May 27, 2024

The code base in this repo has been in maintenance mode for close to a year. This bug is difficult to reproduce, suggesting it is machine specific (insufficient memory?). As such, it almost certainly will never be fixed.

However, the new code base in the cardano-node repository is nearing completion and can actually connect to mainnet and operate as a full node. It does not yet work with Daedalus. I am not sure if it currently connects to Daedalus.

To provide the best advice on your best path forward, it would be useful if you could tell me your goals.


michdr avatar michdr commented on May 27, 2024

Thanks @erikd . So, is cardano-node expected to be a de-facto replacement for cardano-sl?

The goal is to be able to operate a full and reliable node for various operations on the mainnet (create transactions, verify existing ones, etc.).

