
Comments (11)

tchaikov commented on June 2, 2024

scylla-376.log

see https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5868

from scylladb.

denesb commented on June 2, 2024

Seen again: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5922/testReport/junit/(root)/non-boost%20tests/Tests___Unit_Tests___topology_test_coordinator_queue_management_dev_2/


tchaikov commented on June 2, 2024

A similar issue was found when running topology_experimental_raft/test_tablets.py::test_tablet_cleanup:

    @pytest.mark.asyncio
    async def test_tablet_cleanup(manager: ManagerClient):
        cmdline = ['--smp=2', '--commitlog-sync=batch']
...
        await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
>       partitions_after_loss = (await cql.run_async("SELECT COUNT(*) FROM test.test"))[0].count
E       cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.69.70.49:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">, <Host: 127.69.70.12:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">})

INFO  2024-01-27 18:28:31,359 [shard 0:stmt] cql_server_controller - Setting up maintenance socket on /scylladir/testlog/x86_64/dev/scylla-1771/cql.m
INFO  2024-01-27 18:28:31,359 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on /scylladir/testlog/x86_64/dev/scylla-1771/cql.m (unencrypted, non-shard-aware)
WARN  2024-01-27 18:28:31,359 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.69.70.49, application_states = {{RPC_READY -> Value(1,1)}})
INFO  2024-01-27 18:28:31,359 [shard 0:comp] compaction - [Compact system.group0_history 1bdcb3c0-bd31-11ee-8786-dfb990458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/group0_history-027e42f5683a3ed7b404a0100762063c/me-3gd4_19rj_24awx2271dhw7a7qf7-big-Data.db:level=0]. 19kB to 13kB (~70% of original) in 2ms = 9MB/s. ~256 total partitions merged to 1.
INFO  2024-01-27 18:28:31,359 [shard 0:comp] compaction - [Compact system.local 1bdd28f0-bd31-11ee-8786-dfb990458783] Compacting [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/local-7ad54392bcdd35a684174e047860b377/me-3gd4_19rj_01xg02pffkgzp6tkao-big-Data.db:level=0:origin=memtable,/scylladir/testlog/x86_64/dev/scylla-1771/data/system/local-7ad54392bcdd35a684174e047860b377/me-3gd4_19rg_4blup2pffkgzp6tkao-big-Data.db:level=0:origin=compaction]
INFO  2024-01-27 18:28:31,359 [shard 1:comp] compaction - [Compact system.truncated 1bdcb3c0-bd31-11ee-b3fb-dfb890458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3gd4_19rj_24awx2qj7zo4ducrcz-big-Data.db:level=0]. 148kB to 74kB (~50% of original) in 3ms = 49MB/s. ~256 total partitions merged to 6.
INFO  2024-01-27 18:28:31,359 [shard 1:comp] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 1bdd28f0-bd31-11ee-b3fb-dfb890458783] Compacting [/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19rf_4vjg02vwmhl9zj6sqo-big-Data.db:level=0:origin=repair,/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19ri_5y4c02vwmhl9zj6sqo-big-Data.db:level=0:origin=memtable]
INFO  2024-01-27 18:28:31,360 [shard 0:main] raft_group0 - setup_group0: group 0 ID present. Starting existing Raft server.
INFO  2024-01-27 18:28:31,360 [shard 0:main] raft_group0 - Server e4726541-3d82-47f0-99b6-989415a34417 is starting group 0 with id 187d8331-bd31-11ee-8483-432ac45b5434
INFO  2024-01-27 18:28:31,362 [shard 1:comp] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 1bdd28f0-bd31-11ee-b3fb-dfb890458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19rj_255s02qj7zo4ducrcz-big-Data.db:level=0]. 12kB to 37kB (~294% of original) in 3ms = 4MB/s. ~256 total partitions merged to 2.
DEBUG 2024-01-27 18:28:31,363 [shard 0:main] raft_topology - reload raft topology state
WARN  2024-01-27 18:28:31,364 [shard 0:main] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.69.70.49, application_states = {{STATUS -> Value(NORMAL,8055936798076345848,2)}})

see https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6149/artifact/testlog/x86_64/dev/topology_experimental_raft.test_tablets.4.log


tchaikov commented on June 2, 2024

spotted again at https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6131/testReport/junit/(root)/non-boost%20tests/Tests___Unit_Tests___auth_cluster_test_password_login_message_debug_3/

see https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6131/artifact/testlog/x86_64/debug/scylla-27.log

INFO  2024-01-26 12:48:12,781 [shard 0:stmt] cql_server_controller - Setting up maintenance socket on /scylladir/testlog/x86_64/debug/scylla-27/cql.m
INFO  2024-01-26 12:48:12,783 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on /scylladir/testlog/x86_64/debug/scylla-27/cql.m (unencrypted, non-shard-aware)
WARN  2024-01-26 12:48:12,783 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.204.200.11, application_states = {{RPC_READY -> Value(1,1)}})
INFO  2024-01-26 12:48:12,787 [shard 0:main] raft_group0 - Disabling migration_manager schema pulls because Raft is enabled and we're bootstrapping.
INFO  2024-01-26 12:48:12,788 [shard 0:strm] messaging_service - Starting Messaging Service on address 127.204.200.11 port 7000


kbr-scylla commented on June 2, 2024

@tchaikov the 3 different failures you reported are... different


In the first one we see that the joining node got stuck on

INFO  2024-01-18 10:48:12,180 [shard 0:strm] storage_service - raft topology: join: request to join placed, waiting for the response from the topology coordinator

so it was stuck waiting for the topology coordinator to join it.
Unfortunately, the artifacts are gone, so I cannot identify what happened to the coordinator.


In the second one, we see a failure not when trying to join a server, but when trying to run a query. We have a dedicated issue for that, #17029, and the solution is known.


In the third one we see that the server successfully started:

INFO  2024-01-26 12:48:25,346 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on 127.204.200.11:9042 (unencrypted, non-shard-aware)
INFO  2024-01-26 12:48:25,346 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on 127.204.200.11:19042 (unencrypted, shard-aware)
INFO  2024-01-26 12:48:25,383 [shard 0:main] init - serving
INFO  2024-01-26 12:48:25,384 [shard 0:main] init - Scylla version 5.5.0~dev-0.20240126.c02893f7ef92 initialization completed.

but the driver couldn't connect -- part of the test framework's server start function is to check CQL connectivity, and that part apparently timed out.

This looks like a new one. @patjed41 also encountered it in his PR and is going to open a new issue.
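The CQL connectivity check described in the third case can be pictured with a minimal sketch. This is not the test framework's actual code: `wait_for_port` is a hypothetical helper that only polls until the CQL port (9042 in the log above) accepts a TCP connection or a deadline passes. That is weaker than the driver-level connection the framework attempts, but it shows the same shape: the server can log "initialization completed" while the readiness check still runs out of time.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float, interval: float = 0.1) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass.

    Hypothetical helper: it only proves the port accepts connections, which is
    weaker than a full CQL handshake performed through the driver.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False  # deadline passed without a successful connection
        try:
            # Bound each attempt by the time left so the overall timeout holds.
            with socket.create_connection((host, port), timeout=min(remaining, 1.0)):
                return True
        except OSError:
            # Refused, unreachable, or timed out: back off briefly, then retry.
            time.sleep(min(interval, max(deadline - time.monotonic(), 0.0)))
```

For example, `wait_for_port("127.204.200.11", 9042, timeout=60)` would return `False` after roughly 60 seconds if nothing ever accepted connections on that address.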


tchaikov commented on June 2, 2024

Thank you, Kamil! I was using "endpoint_state_map does not contain endpoint" as the only feature when clustering the failures, but it seems I should have used feature vectors with more elements. I will try to do better next time.
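To illustrate the point about richer feature vectors, here is a minimal sketch (hypothetical, not an existing ScyllaDB tool) that turns each failure log into a boolean vector over a few signature strings taken from the log excerpts quoted in this thread, then groups failures whose vectors are identical. The choice of signatures and the exact-match grouping are assumptions; a real triage script might weight features or use a similarity threshold instead.

```python
from collections import defaultdict

# Signature strings taken from the log excerpts in this thread. Treating each
# one as a binary feature is an illustrative assumption.
SIGNATURES = (
    "endpoint_state_map does not contain endpoint",
    "waiting for the response from the topology coordinator",
    "No live endpoint available",
    "initialization completed",
)


def feature_vector(log_text: str) -> tuple[bool, ...]:
    """One boolean per signature: does this failure's log contain it?"""
    return tuple(sig in log_text for sig in SIGNATURES)


def cluster_failures(logs: dict[str, str]) -> dict[tuple[bool, ...], list[str]]:
    """Group failure IDs whose logs produce exactly the same feature vector."""
    clusters: dict[tuple[bool, ...], list[str]] = defaultdict(list)
    for failure_id, text in logs.items():
        clusters[feature_vector(text)].append(failure_id)
    return dict(clusters)
```

With only the first signature in `SIGNATURES`, the three failures discussed above would all share one vector and collapse into a single cluster; the extra elements are what let them separate.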


kbr-scylla commented on June 2, 2024

The third one is #17041


nyh commented on June 2, 2024

I'm not sure this should really be a single issue (?), but anyway, I saw the exact same test_tablet_cleanup failure that @tchaikov reported above again, in another CI failure: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6201/testReport/junit/(root)/test_tablets/Tests___Unit_Tests___test_tablet_cleanup/

        await manager.server_start(servers[1].server_id)
        await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
>       partitions_after_loss = (await cql.run_async("SELECT COUNT(*) FROM test.test"))[0].count
E       cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.136.164.36:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">, <Host: 127.136.164.2:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">})


nyh commented on June 2, 2024

Oh, I guess it's #17029?


kbr-scylla commented on June 2, 2024

Yes, @nyh, it looks like your failure happened before #17040.


mykaul commented on June 2, 2024

Closing this one, as we seem to have a separate issue for each different failure here.
