Comments (86)
Ok, profiling got me my first traceback:
*** Exception (reporting due to +RTS -xc): (THUNK), stack trace:
Pos.DB.Block.Internal.putSerializedBlunds.\,
called from Pos.DB.Block.Internal.putSerializedBlunds,
called from Pos.DB.Block.Internal.dbPutSerBlundsRealDefault,
called from Cardano.Wallet.Kernel.Mode.dbPutSerBlunds,
called from Pos.DB.Block.Load.putBlunds,
called from Pos.DB.Block.Slog.Logic.slogApplyBlocks,
called from Pos.DB.Block.Logic.Internal.applyBlocksDbUnsafeDo,
called from Pos.DB.Block.Logic.Internal.applyBlocksUnsafe.app,
called from Pos.DB.Block.Logic.Internal.applyBlocksUnsafe,
called from Pos.DB.Block.Logic.VAR.rollingVerifyAndApply.\,
called from Pos.DB.Block.Logic.VAR.rollingVerifyAndApply,
called from Pos.DB.Block.Logic.VAR.verifyAndApplyBlocks,
called from Pos.Network.Block.Logic.applyWithoutRollback.applyWithoutRollbackDo,
called from Pos.DB.GState.Lock.stateLockHelper.\,
Going to run it a couple more times to make sure it crashes the same way each time.
from cardano-sl.
So, is cardano-node expected to be a de-facto replacement for cardano-sl?
Yes!
from cardano-sl.
Please cut and paste the actual log output rather than including a screenshot. Screenshots are unreadable for people with high DPI monitors.
from cardano-sl.
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
hash: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
previous block: 1b3c31eb41b0d38af18d9cf908a7c1848911d081271fb29b88d5b010932b2eba
slot: 8364th slot of 152nd epoch
difficulty: 3290025
leader: pub:993a8f05
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:993a8f05, dPk = pub:89c29f8c } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
previous block: 19b1f1eec6f9abb145114bfeda8cad76f2ff9fda4ff3e4ccdaf369f4052f9c3b
slot: 8365th slot of 152nd epoch
difficulty: 3290026
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: f25b190e0f961f05b111952b72a8cba6b30cffa4caac4c60eda64697471dc606
previous block: a7b1b58758880db395796ce8b8cba290de717097a3a1de5b50e0a9923a2941f0
slot: 8366th slot of 152nd epoch
difficulty: 3290027
leader: pub:1bc97a2f
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:1bc97a2f, dPk = pub:61261a95 } }
block: v0.2.0
software: cardano-sl:1
]
Last 3: [MainBlockHeader:
hash: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
previous block: 14b798e52f1b215d77b8be0ff1315ca25885f37dc27cacb7ac6db897d877f8a1
slot: 8425th slot of 152nd epoch
difficulty: 3290086
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
previous block: 197d5cfea25e990f6893e1250ea248ac25c0465db43abef95677b32c0d3ebbff
slot: 8426th slot of 152nd epoch
difficulty: 3290087
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
, MainBlockHeader:
hash: d571750aee77c352ae4a3be20b1f229d4e3d6c549668a06b2a19b3b8bc301843
previous block: 5a0795022c4786191d90eaf83f0a58c927d399a4779b1a61b78a0188de439c3a
slot: 8427th slot of 152nd epoch
difficulty: 3290088
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.2.0
software: cardano-sl:1
]
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics wait: ApplyBlock queue length is 1
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] MemPool metrics acquire: ApplyBlock wait time was 12mcs
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Verifying and applying blocks...
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.48 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.48 UTC] slogVerifyBlocks: Consensus era is Original
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.71 UTC] Rolling: Verification done, applying unsafe block
[cardano-sl.node:Debug:ThreadId 316] [2020-01-17 07:31:07.72 UTC] applying some blocks (non-rollback)
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Verifying and applying blocks done
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] MemPool metrics release: ApplyBlock modify time was 258859mcs size is 0
[cardano-sl.*production*:Debug:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Not relaying block in recovery mode
[cardano-sl.*production*:Info:ThreadId 382] [2020-01-17 07:31:07.74 UTC] Blocks have been adopted: [19b1f1eec6f9abb1, a7b1b58758880db3, f25b190e0f961f05, 0e907e6cf3f6ca4d, d4f74c5d312b0549, e4d06fea6f988ce9, ee279270b61135e9, 6c7998c9ba54d17e, 94853e29ce7200e6, f08ed74a4fcfd8e3, 834b772bd7901ffb, fb514730332114d2, c8a4b3a5697044ee, 5c9a27718384810e, 98573eeb0e53dbd4, a3be980a5027b156, 3713b244551e732b, ff9fa80350026e9e, 1704bdb34a16c6be, 229b40d03c219294, dfcbec3a426b4ca3, 860e4a48c46e67fa, 40b75fe17d58e8bf, faddbd98928a5527, 7b1d29bc099b5b19, ab08ff7a02399e92, ae699dcf30f5ef08, a6d99576b5002d9a, 62b679c913e1c9ee, 4d01d5340c42fdb3, 196d33de29e35ae9, 98ee850aac9a1b9d, 4f3037f91ccd71fd, 24fbc16a4479d02f, 53ac78eb4d6ab7f2, d48b6feb41138da5, 676d449b1ca1fb74, 3327651659423dee, 1a3c9a6c2f095cd7, 16b36ed4da3e05ff, 2223acc65da5f5db, e8d4e12a616f4227, 443ff6e4fffeac0b, 22db605e0059c4d8, 643d9645b54e09b5, fdf7bfe14b34aed0, 6a1650622dec4d07, 1af30b82a5ee01a8, 1e585328b2eda019, db55785d3ea21869, 0ac37f9d03244062, cc9d0a297ea9af53, ebffb3fc15d7b62b, e7b81541b23ace3a, a48604d65cef0180, 7af2bd4c948d283f, 013dc0ea380f67a1, de88e7e0c354e93a, a762148ba5fbf0a6, 432652c976cbbdc4, 14b798e52f1b215d, 197d5cfea25e990f, 5a0795022c478619, d571750aee77c352]
^Z[9] Segmentation fault nohup sudo ./mainnet.sh > log.log 2>&1 (wd: /data/ada/cardano-sl)
(wd now: /data/ada/cardano-sl/state-wallet-mainnet/logs/pub)
from cardano-sl.
What version (use the git hash) is this?
Is this repeatable? I.e., if you run it again, do you get the same error?
from cardano-sl.
master 1a792d7 [origin/master] Merge #4242
Yes, this is repeatable.
@erikd
from cardano-sl.
That git hash is from Sep 2019.
Try checking out tag 3.2.0 which is from Nov 22, 2019.
If that does not fix it, I will look at this on Monday morning. It's currently 9pm Friday here.
from cardano-sl.
thank you @erikd
from cardano-sl.
@erikd
I use "nix-build -A connectScripts.mainnet.wallet -o mainnet.sh" command to build.
mainnet.sh:
#!/nix/store/vlb7kcc1k035vpyrgsj9kk7380yh68wd-bash-4.4-p23/bin/bash
set -euo pipefail
if [[ "${1-}" == "--delete-state" ]]; then
  echo "Deleting state-wallet-mainnet ... "
  rm -Rf state-wallet-mainnet
  shift
fi
if [[ "${1-}" == "--runtime-args" ]]; then
  RUNTIME_ARGS="${2-}"
  shift 2
else
  RUNTIME_ARGS=""
fi
echo "Keeping state in state-wallet-mainnet"
mkdir -p state-wallet-mainnet/logs
echo "Launching a node connected to 'mainnet' ..."
export LC_ALL=en_GB.UTF-8
export LANG=en_GB.UTF-8
if [ ! -d state-wallet-mainnet/tls ]; then
  mkdir -p state-wallet-mainnet/tls/server && mkdir -p state-wallet-mainnet/tls/client
  /nix/store/ra45xgy1ngy9bpn12h5fib7m81925i80-cardano-sl-tools-3.2.0-exe-cardano-x509-certificates/bin/cardano-x509-certificates \
    --server-out-dir state-wallet-mainnet/tls/server \
    --clients-out-dir state-wallet-mainnet/tls/client \
    --configuration-file /nix/store/r02jsbcld1cmy47y1cxr8c9l6y9z7a8n-tls-config-mainnet.yaml \
    --configuration-key mainnet_full
fi
ln -sf /nix/store/0gzajk6rskv7xigvwhgly1zrn3m75d4r-curl-wallet-mainnet state-wallet-mainnet/curl
exec /nix/store/ya8iqz0l34w9mszd06ir3pchasryqz4a-cardano-wallet-3.2.0-exe-cardano-node/bin/cardano-node \
  --configuration-file /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/lib/configuration.yaml --configuration-key mainnet_full \
  --tlscert state-wallet-mainnet/tls/server/server.crt \
  --tlskey state-wallet-mainnet/tls/server/server.key \
  --tlsca state-wallet-mainnet/tls/server/ca.crt \
  --log-config /nix/store/j4rz117v3paa3ys3abfkxacvghyd7chn-cardano-sl-config/log-configs/connect-to-cluster.yaml \
  --topology "/nix/store/kiwxslk8q90j8rrjj4vqnnc9np5a9bhy-topology-mainnet" \
  --logs-prefix "state-wallet-mainnet/logs" \
  --db-path "state-wallet-mainnet/db" \
  --wallet-db-path 'state-wallet-mainnet/wallet-db' \
  --no-client-auth \
  --keyfile state-wallet-mainnet/secret.key \
  --wallet-address 0.0.0.0:8090 \
  --wallet-doc-address 127.0.0.1:8091 \
  --ekg-server 127.0.0.1:8000 --metrics \
  +RTS -N2 -qg -A1m -I0 -T -RTS \
  $RUNTIME_ARGS
from cardano-sl.
I use "nix - build - A connectScripts.mainnet.wallet - o mainnet.sh" command to build.
There seem to be some extra spaces around the - character in that. It should be nix-build, -A and -o.
I just did:
> git checkout 3.2.0 -b tag-3.2.0
> nix-build -A connectScripts.mainnet.wallet -o mainnet.sh
and it worked as expected.
from cardano-sl.
@erikd
That probably happened when I copied it; the spaces were added automatically.
The process ended abruptly.
Same error:
[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] Rolling: verifying
[cardano-sl.*production*:Debug:1689] [2020-01-20 02:54:16.92 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1689] [2020-01-20 02:54:16.92 UTC] slogVerifyBlocks: Consensus era is Original
^Z[5] Segmentation fault sudo nohup ./mainnet.sh > log.log 2>&1
[6]+ Stopped tail -200f log.log
Do I need to upgrade my server? It currently has 2 cores and 4 GB of RAM.
from cardano-sl.
4G should be enough RAM, especially if nothing else is happening on that machine.
I am currently running the ./mainnet.sh script. Any idea what epoch you are getting up to when it segfaults?
from cardano-sl.
And now it segfaults for me too! At 5046th slot of 3rd epoch.
from cardano-sl.
I think this error occurs when the process reaches 1 GB of memory, because after I restart the process it can continue to synchronize. I restarted it once and it is now at 10365th slot of 13th epoch.
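One way to check the memory hypothesis is to log the node's resident memory until it dies and see whether the crashes cluster around one value. The following is a minimal sketch, assuming a Linux /proc filesystem and a single cardano-node process; the function names and the rss.log file are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of cardano-sl): sample a process's
# resident set size (RSS) until the process exits.

# Print the RSS of PID $1 in kB, read from /proc (Linux only).
# Prints nothing if the process no longer exists.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Append "UTC-timestamp rss_kb" to file $2 every $3 seconds (default 5)
# for as long as PID $1 still reports an RSS.
watch_rss() {
    local pid=$1 out=$2 interval=${3:-5} rss
    while rss=$(rss_kb "$pid") && [ -n "$rss" ]; do
        printf '%s %s\n' "$(date -u +%FT%TZ)" "$rss" >> "$out"
        sleep "$interval"
    done
}

# Usage (assumption: pgrep -o picks the oldest matching process):
#   watch_rss "$(pgrep -o cardano-node)" rss.log 5
```

The last line of rss.log after each crash gives the memory footprint just before the process died, which can be compared across runs.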
from cardano-sl.
I am running on a 16G VM, and I was able to recreate that problem so that is not it.
Oh, hang on, you are running on a 64 bit CPU aren't you?
from cardano-sl.
I also checked out the HEAD of the develop branch and that synced to epoch 4 without a problem.
Then I switched back to the 3.2.0 tag and synced from scratch to epoch 5, again without a problem.
I wonder if there is a peer somewhere on the network that is serving up corrupted blocks.
from cardano-sl.
admin@ada:/data/ada$ getconf LONG_BIT
64
On Friday, I restarted the node countless times and it finished synchronizing. On Monday, the process was killed.
from cardano-sl.
What base OS and OS version are you running this on?
from cardano-sl.
Ubuntu 18.04
from cardano-sl.
Ubuntu 18.04 should be fine.
I would try deleting the ./state-wallet-mainnet directory, and then try resyncing. Each time it segfaults, record the slot and epoch number and restart it. When you have about 10 entries, post the list here.
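Recording the slot and epoch by hand gets tedious, so the procedure above can be scripted. This is a rough sketch, assuming the node's output goes to a single log file in the format shown earlier in this thread ("slot: 8364th slot of 152nd epoch"); the function names and the crashes.txt file are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: run the node once, then record where it stopped.

# Print the last "Nth slot of Mth epoch" string found in log file $1.
last_slot() {
    grep -Eo '[0-9]+(st|nd|rd|th) slot of [0-9]+(st|nd|rd|th) epoch' "$1" | tail -n 1
}

# Run the command given after the first two arguments, sending its output
# to log file $1. When it exits, append a line with the UTC time, the exit
# status and the last slot/epoch seen to record file $2.
# A process killed by SIGSEGV normally shows up as exit status 139.
run_and_record() {
    local log=$1 record=$2 status
    shift 2
    "$@" > "$log" 2>&1
    status=$?
    printf '%s exit=%s %s\n' "$(date -u +%FT%TZ)" "$status" "$(last_slot "$log")" >> "$record"
    return "$status"
}

# Usage: delete ./state-wallet-mainnet once, then keep restarting:
#   while ! run_and_record log.log crashes.txt ./mainnet.sh; do sleep 5; done
```

After about 10 crashes, crashes.txt holds the list to post here.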
from cardano-sl.
8280th slot of 8th epoch
18274th slot of 18th epoch
20293rd slot of 19th epoch
7566th slot of 31st epoch
1618th slot of 33rd epoch
4089th slot of 36th epoch
15837th slot of 55th epoch
from cardano-sl.
Ok, can you delete the ./state-wallet-mainnet directory, run it again, and list the first 10 entries here?
from cardano-sl.
What do you mean?
from cardano-sl.
Delete the ./state-wallet-mainnet directory and do the same test again. It would be useful to know if we get the same results.
from cardano-sl.
Do I need to delete this directory when I run it again?
from cardano-sl.
Yes. That is the state directory where the node stores blocks.
from cardano-sl.
[cardano-sl.*production*:Debug:1689] [2020-01-20 04:56:29.75 UTC] Handling block w/ LCA, which is a07d3104
[cardano-sl.*production*:Info:1689] [2020-01-20 04:56:29.75 UTC] Trying to apply blocks w/o rollback. First 3: [MainBlockHeader:
hash: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
previous block: a07d310498f2417e1d6ade2dd2e3da8f802995407845d164616053dfc683a44d
slot: 18033rd slot of 26th epoch
difficulty: 579559
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.1.0
software: cardano-sl:0
, MainBlockHeader:
hash: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
previous block: 92d68c2ba61d115b3c53f1c857c77355062c2ee76680e8afed94980e7fbba239
slot: 18034th slot of 26th epoch
difficulty: 579560
leader: pub:0bdb1f5e
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:0bdb1f5e, dPk = pub:5fddeeda } }
block: v0.1.0
software: cardano-sl:0
, MainBlockHeader:
hash: d8462f486a46f0688786e3880c7982a6f69ce8f0adcf18c1e1089bb9480b7490
previous block: c5342a48f472640576a177d2992b77f7939e406e463d00bc480ddb339747e85f
slot: 18035th slot of 26th epoch
difficulty: 579561
leader: pub:9a6fa343
signature: BlockPSignatureHeavy: Proxy signature { psk = ProxySk { w = #0, iPk = pub:9a6fa343, dPk = pub:8b532076 } }
block: v0.1.0
software: cardano-sl:0
The log ends here, unfinished. The process was killed, but this time no errors were reported.
from cardano-sl.
@shenyaqi9527 I need a list of where (i.e. epoch and slot number) the process gets killed, starting from scratch. Please run it again so I can compare it with the last list.
from cardano-sl.
Do I need to delete "./state-wallet-mainnet"?
from cardano-sl.
[cardano-sl.*production*:Debug:1686] [2020-01-20 05:11:51.96 UTC] verifyBlocksPrefix: 64
[cardano-sl.*production*:Info:1686] [2020-01-20 05:11:51.96 UTC] slogVerifyBlocks: Consensus era is Original
cardano-node: internal error: evacuate: strange closure type 0
(GHC version 8.4.4 for x86_64_unknown_linux)
Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
from cardano-sl.
I am really beginning to suspect that your machine is having hardware issues. Can you run some form of diagnostic on it?
from cardano-sl.
How do I diagnose it?
This is an Aliyun server.
from cardano-sl.
I would try memtest86+ first and then contact your provider.
from cardano-sl.
As a point of reference, I have seen this issue exactly once. I have since restarted and synced to the 135th epoch (and it's still going) without a recurrence of the segfault.
from cardano-sl.
I changed the server and got the same error
from cardano-sl.
What is the git hash?
from cardano-sl.
tag-3.2.0 5d0a227 Merge #4252
Are there any commands that need to be executed before creation?
from cardano-sl.
No commands are required other than what you have been running.
Mine is running quite happily using up to about 15% of my RAM on a 16G VM.
Maybe try running it on an 8G or 16G machine.
from cardano-sl.
Does nix limit memory usage? I used 8GB of memory with the same result. slot: 7832nd slot of 8th epoch.
from cardano-sl.
Does nix limit memory usage?
I don't think so.
I would still like a list of the epoch/slot info for the first 10 failures.
from cardano-sl.
I tried a few times
18035th slot of 26th epoch
6841st slot of 3rd epoch
16250th slot of 3rd epoch
15769th slot of 4th epoch
20023rd slot of 1st epoch
7832nd slot of 8th epoch
from cardano-sl.
So it's not deterministic.
When you moved machines, did you keep the same disk image or reinstall from scratch?
from cardano-sl.
reinstall from scratch
from cardano-sl.
I have just about run out of ideas 😢 .
from cardano-sl.
I can't do anything about it now.
from cardano-sl.
@shenyaqi9527 Does the dmesg output on your machine list any segfaults?
from cardano-sl.
How do I use this command?
from cardano-sl.
sudo dmesg | grep segfault
from cardano-sl.
admin@ada:/$ sudo dmesg | grep segfault
[ 1547.424449] cardano-node:w[5554]: segfault at 840bffe2a0 ip 00007fae70921c55 sp 00007fae6d919a98 error 6 in libc-2.27.so[7fae707d0000+1aa000]
[ 1910.998425] cardano-node:w[5786]: segfault at 8402c16940 ip 00007fca29cefc55 sp 00007fca05bd4a98 error 6 in libc-2.27.so[7fca29b9e000+1aa000]
[ 1991.020628] cardano-node:w[5883]: segfault at 84044bfac0 ip 00007fc7aee9ec55 sp 00007fc77e7f7a98 error 6 in libc-2.27.so[7fc7aed4d000+1aa000]
[ 2427.408873] cardano-node:w[5942]: segfault at 840c2d55c0 ip 00007ff7f4bcfc55 sp 00007ff7dbffaa98 error 6 in libc-2.27.so[7ff7f4a7e000+1aa000]
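Because of address space randomisation, the raw ip values differ between runs, but subtracting the mapping base (the first number in the square brackets after libc-2.27.so) gives an offset that can be compared across crashes. Below is a small sketch of that arithmetic; the function name is made up, and it assumes the kernel message format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: given one kernel "segfault at ..." or
# "general protection fault" line, print the faulting instruction's offset
# inside the mapped object, i.e. ip minus the mapping base address, in hex.
fault_offset() {
    local fields ip base
    # Pull out the ip value and the base address from "[base+size]".
    fields=$(printf '%s\n' "$1" |
        sed -nE 's/.* ip[ :]([0-9a-f]+) .*\[([0-9a-f]+)\+[0-9a-f]+\].*/\1 \2/p')
    [ -n "$fields" ] || return 1
    ip=${fields% *}
    base=${fields#* }
    printf '%x\n' "$(( 0x$ip - 0x$base ))"
}

# Usage:
#   sudo dmesg | grep cardano-node | while read -r line; do fault_offset "$line"; done
```

For the first line above (ip 00007fae70921c55, base 7fae707d0000) this prints 151c55. Identical or nearby offsets across runs suggest the crashes happen in the same libc routine, even though the absolute addresses differ.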
from cardano-sl.
I got two more in the last 30 minutes. I am seeing something similar to you:
[6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
[6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
[6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]
I'm running this on Debian and I have just noticed there is a libc-2.29 available, so I am going to try upgrading to that.
from cardano-sl.
What am I going to do?
from cardano-sl.
Wait for me to report back after I do a complete upgrade of my system, reboot and retest?
from cardano-sl.
ok
from cardano-sl.
It's the end of my day here so you may not hear from me until tomorrow.
from cardano-sl.
@shenyaqi9527 is there any chance you are running out of disk space?
from cardano-sl.
my disk is 1TB
from cardano-sl.
Ok, probably not running out of disk space, but df -h will tell you for sure.
from cardano-sl.
Since starting the investigation of this I have encountered 5 instances of this. From kern.log:
# grep cardano-node /var/log/kern.log
Jan 20 14:01:36 nix kernel: [6124226.626857] cardano-node:w[13755]: segfault at 840554dfc0 ip 00007fa4abdd9d6e sp 00007fa49bffaa98 error 6 in libc-2.27.so[7fa4abc88000+1aa000]
Jan 20 18:00:47 nix kernel: [6138577.260721] cardano-node:w[24086]: segfault at 84079fd1d0 ip 00007f5a12efeb24 sp 00007f5a0a7f7a98 error 6 in libc-2.27.so[7f5a12dad000+1aa000]
Jan 20 18:19:44 nix kernel: [6139714.043880] cardano-node:w[24190]: segfault at 84009cd480 ip 00007fb987390d6e sp 00007fb97e7f7a98 error 6 in libc-2.27.so[7fb98723f000+1aa000]
Jan 20 19:17:49 nix kernel: [ 2608.843672] traps: cardano-node:w[1150] general protection fault ip:7fae31864d6e sp:7fadfcff4a98 error:0 in libc-2.27.so[7fae31713000+1aa000]
Jan 21 07:34:36 nix kernel: [46815.966810] cardano-node:w[5470]: segfault at 1dbf3fffff8 ip 0000000000413bd5 sp 00007fbe14ff88a0 error 4 in cardano-node[400000+1b00000]
Jan 21 11:03:19 nix kernel: [59339.472653] cardano-node:w[6310]: segfault at 84109bc549 ip 00007fb3d2482b24 sp 00007fb3a27f7a98 error 6 in libc-2.27.so[7fb3d2331000+1aa000]
The last instance happened after a complete apt update && apt upgrade followed by a reboot.
I do notice that the libc version is still 2.27, which seems to come from Nix.
from cardano-sl.
IOHK devops have checked all the production/staging/testing instances under their control and they do not see any cardano-node segfaults in the kernel logs. One thing to notice is that the devops machines are all running version 3.1.0 rather than 3.2.0.
I am going to try 3.1.0.
from cardano-sl.
And I got a segfault on the 3.1.0 tag.
Jan 21 12:01:33 nix kernel: [62833.180323] cardano-node:w[7063]: segfault at 8410e19440 ip 00007f3228456d6e sp 00007f3217ffaa98 error 6 in libc-2.27.so[7f3228305000+1aa000]
from cardano-sl.
Is there a solution to this?
from cardano-sl.
I do not even know what is causing this, so obviously there is not yet a solution.
I also cannot recreate this reliably enough. How reliably can you recreate it?
from cardano-sl.
Can't IOHK developers fix it?
from cardano-sl.
I am an IOHK developer!
from cardano-sl.
Please help me think of something to solve this problem. Do I need to change to another OS?
from cardano-sl.
You do not need to change OS. The cardano node is developed on and for Linux.
This is an obscure and difficult to reproduce problem. You are going to need to show patience.
from cardano-sl.
I am currently running the node under valgrind to see if that can help me find the cause.
from cardano-sl.
I'm running 3.1.0 and so far this has not been a problem.
from cardano-sl.
What epoch/slot are you up to?
from cardano-sl.
14651st slot of 54th epoch
root 20 0 1.001t 959396 21924 S 120.6 23.8 31:46.79 cardano-node
from cardano-sl.
17635th slot of 67th epoch. This is where the process is killed.
[74788.914053] cardano-node:w[8914]: segfault at 840ccae7a0 ip 00007f4de11d9c55 sp 00007f4dc59cea98 error 6 in libc-2.27.so[7f4de1088000+1aa000]
from cardano-sl.
The node has been running for over 20 hours under valgrind and has not yet failed.
Unfortunately, valgrind runs things at least 10 times slower, so it has only managed to sync from zero to epoch 59 in that time.
I am also a bit concerned that valgrind itself may reduce the chance of the bug being triggered.
from cardano-sl.
I gave up on that valgrind run so I could try running it under gdb. Unfortunately the heavy use of exceptions within the node app means gdb cannot be used.
Trying without valgrind again.
from cardano-sl.
Another segfault (without valgrind):
Jan 22 12:30:26 nix kernel: [150967.689171] cardano-node:w[26733]: segfault at 841eca1560 ip 00007f47db188d6e sp 00007f47a6ff8a98 error 6 in libc-2.27.so[7f47db037000+1aa000]
Jan 22 12:30:26 nix kernel: [150967.689177] Code: ff ff 0f 18 89 40 fe ff ff c5 fe 6f 01 c5 fe 6f 49 e0 c5 fe 6f 51 c0 c5 fe 6f 59 a0 48 81 e9 80 00 00 00 48 81 ea 80 00 00 00 <c4> c1 7d e7 01 c4 c1 7d e7 49 e0 c4 c1 7d e7 51 c0 c4 c1 7d e7 59
According to this, error 6 means "(data) write to an unmapped area".
This is probably due to some C code accessed via the C FFI.
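For reference, the error value in these kernel lines is the x86 page-fault error code, a bit mask: bit 0 is set for a protection violation (clear for a page that is not present), bit 1 for a write (clear for a read), and bit 2 for a user-mode access. A small decoder sketch follows; the function name is made up, and only those three low bits are handled.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode the three low bits of an x86 page-fault
# error code as printed in kernel "segfault at ... error N" lines.
decode_pf_error() {
    local code=$1 cause access mode
    # Bit 0: 0 = page not present, 1 = protection violation.
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    # Bit 1: 0 = read, 1 = write.
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    # Bit 2: 0 = kernel mode, 1 = user mode.
    if [ $(( code & 4 )) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    printf '%s, %s, %s\n' "$mode" "$access" "$cause"
}

# decode_pf_error 6   prints: user mode, write, page not present
# decode_pf_error 4   prints: user mode, read, page not present
```

So error 6 is a user-mode write to an unmapped page, matching the description above, and the single error 4 entry seen earlier in kern.log is a user-mode read from an unmapped page.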
from cardano-sl.
I noticed that the Nix build that we are using is linking to version 5.11 of RocksDB whereas Debian has version 5.17 of that library.
Now trying a native Debian build of cardano-node rather than the Nix build.
from cardano-sl.
And I almost immediately got a segfault with the version I built without Nix. 😢
from cardano-sl.
Currently building a profiled version of cardano-node under Debian. Hoping that gives a proper traceback. 🤞
from cardano-sl.
Have run this a number of times and always get a traceback starting with putSerializedBlunds.
@dcoutts asked why there is no C code in that traceback. I think that is because of incompatible debug formats. GHC's profiling uses DWARF debugging symbols, and debug information for the C code in the backtrace may either not be enabled or be in an incompatible format.
from cardano-sl.
Has the problem been dealt with? @erikd
from cardano-sl.
It has not. I have been assigned to other higher priority work.
from cardano-sl.
No one is dealing with this problem right now? @erikd
from cardano-sl.
No one that I am aware of. It seems that you are one of the few people who has been hitting this problem, and hence the priority has been downgraded.
My advice is to set the node up as a systemd service or something and have it restart automatically.
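As a rough sketch of that workaround, the snippet below writes a minimal unit file that restarts the node after a crash. The unit name cardano-sl.service and the function name are made up, and the /data/ada/cardano-sl path in the usage line is taken from the logs earlier in this thread; adjust both to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal systemd unit to path $1 that runs
# the mainnet.sh connect script from working directory $2.
# Restart=on-failure covers non-zero exits and deaths by signal (SIGSEGV).
write_unit() {
    local unit_path=$1 workdir=$2
    cat > "$unit_path" <<EOF
[Unit]
Description=cardano-sl wallet node (auto-restart sketch)
After=network-online.target
Wants=network-online.target

[Service]
WorkingDirectory=$workdir
ExecStart=$workdir/mainnet.sh
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_unit /etc/systemd/system/cardano-sl.service /data/ada/cardano-sl
#   systemctl daemon-reload && systemctl enable --now cardano-sl.service
```

This does not fix the segfault; it only keeps the node syncing, since it resumes from the blocks already stored in the state directory after each restart.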
from cardano-sl.
We are also running into the same issue with all of our 3 nodes. This is the Dockerfile we are using to run our node:
FROM nixos/nix:2.3
ENV CARDANO_VERSION 3.2.0
RUN apk update && \
apk add git curl bzip2 bash
ADD nix.conf /etc/nix/nix.conf
WORKDIR /opt
RUN git clone https://github.com/input-output-hk/cardano-sl.git
WORKDIR /opt/cardano-sl
RUN git checkout $CARDANO_VERSION
RUN nix-build -A connectScripts.mainnet.wallet -o connect-to-mainnet && \
nix-build -A connectScripts.mainnet.explorer -o connect-explorer-to-mainnet && \
nix-build -A connectScripts.testnet.wallet -o connect-to-testnet && \
nix-build -A connectScripts.testnet.explorer -o connect-explorer-to-testnet
ADD entrypoint.sh .
ENTRYPOINT ["./entrypoint.sh"]
entrypoint.sh:
#!/usr/bin/env bash
if [[ "$1" == "explorer" ]]; then
  if [[ -n "$TESTNET" ]]; then
    exec ./connect-explorer-to-testnet --runtime-args --web-port="$RPC_PORT"
  else
    exec ./connect-explorer-to-mainnet --runtime-args --web-port="$RPC_PORT"
  fi
fi
if [[ "$1" == "node" ]]; then
  if [[ -n "$TESTNET" ]]; then
    sed -i "s/--wallet-address 127.0.0.1:8090/--wallet-address 0.0.0.0:$RPC_PORT/g" connect-to-testnet
    exec ./connect-to-testnet --runtime-args --no-tls
  else
    sed -i "s/--wallet-address 127.0.0.1:8090/--wallet-address 0.0.0.0:$RPC_PORT/g" connect-to-mainnet
    exec ./connect-to-mainnet --runtime-args --no-tls
  fi
fi
nix.conf:
cores = 0
max-jobs = auto
sandbox = false
substituters = https://hydra.iohk.io https://cache.nixos.org
trusted-substituters =
trusted-public-keys = hydra.iohk.io:f/Ea+s+dFdN+3Y/G+FDgSq+a5NEWhJGzdjvKNGv0/EQ= cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=
from cardano-sl.
I can confirm that it happens both in v3.2.0 and v3.1.0.
Any estimate of when this bug is going to be taken care of? @erikd
from cardano-sl.
The code base in this repo has been in maintenance mode for close to a year. This bug is difficult to reproduce, suggesting it is machine specific (insufficient memory?). As such, it almost certainly will never be fixed.
However, the new code base in the cardano-node repository is nearing completion and can actually connect to mainnet and operate as a full node. It does not yet work with Daedalus. I am not sure if it currently connects to Daedalus.
To provide the best advice on your best path forward, it would be useful if you could tell me your goals.
from cardano-sl.
Thanks @erikd. So, is cardano-node expected to be a de-facto replacement for cardano-sl?
The goal is to be able to operate a full and reliable node for various operations on the mainnet (create transactions, verify existing ones, etc.).
from cardano-sl.