Comments (11)
see https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/5868
from scylladb.
A similar issue is found when running topology_experimental_raft/test_tablets.py::test_tablet_cleanup:
@pytest.mark.asyncio
async def test_tablet_cleanup(manager: ManagerClient):
cmdline = ['--smp=2', '--commitlog-sync=batch']
...
await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
> partitions_after_loss = (await cql.run_async("SELECT COUNT(*) FROM test.test"))[0].count
E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.69.70.49:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">, <Host: 127.69.70.12:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">})
INFO 2024-01-27 18:28:31,359 [shard 0:stmt] cql_server_controller - Setting up maintenance socket on /scylladir/testlog/x86_64/dev/scylla-1771/cql.m
INFO 2024-01-27 18:28:31,359 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on /scylladir/testlog/x86_64/dev/scylla-1771/cql.m (unencrypted, non-shard-aware)
WARN 2024-01-27 18:28:31,359 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.69.70.49, application_states = {{RPC_READY -> Value(1,1)}})
INFO 2024-01-27 18:28:31,359 [shard 0:comp] compaction - [Compact system.group0_history 1bdcb3c0-bd31-11ee-8786-dfb990458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/group0_history-027e42f5683a3ed7b404a0100762063c/me-3gd4_19rj_24awx2271dhw7a7qf7-big-Data.db:level=0]. 19kB to 13kB (~70% of original) in 2ms = 9MB/s. ~256 total partitions merged to 1.
INFO 2024-01-27 18:28:31,359 [shard 0:comp] compaction - [Compact system.local 1bdd28f0-bd31-11ee-8786-dfb990458783] Compacting [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/local-7ad54392bcdd35a684174e047860b377/me-3gd4_19rj_01xg02pffkgzp6tkao-big-Data.db:level=0:origin=memtable,/scylladir/testlog/x86_64/dev/scylla-1771/data/system/local-7ad54392bcdd35a684174e047860b377/me-3gd4_19rg_4blup2pffkgzp6tkao-big-Data.db:level=0:origin=compaction]
INFO 2024-01-27 18:28:31,359 [shard 1:comp] compaction - [Compact system.truncated 1bdcb3c0-bd31-11ee-b3fb-dfb890458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system/truncated-38c19fd0fb863310a4b70d0cc66628aa/me-3gd4_19rj_24awx2qj7zo4ducrcz-big-Data.db:level=0]. 148kB to 74kB (~50% of original) in 3ms = 49MB/s. ~256 total partitions merged to 6.
INFO 2024-01-27 18:28:31,359 [shard 1:comp] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 1bdd28f0-bd31-11ee-b3fb-dfb890458783] Compacting [/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19rf_4vjg02vwmhl9zj6sqo-big-Data.db:level=0:origin=repair,/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19ri_5y4c02vwmhl9zj6sqo-big-Data.db:level=0:origin=memtable]
INFO 2024-01-27 18:28:31,360 [shard 0:main] raft_group0 - setup_group0: group 0 ID present. Starting existing Raft server.
INFO 2024-01-27 18:28:31,360 [shard 0:main] raft_group0 - Server e4726541-3d82-47f0-99b6-989415a34417 is starting group 0 with id 187d8331-bd31-11ee-8483-432ac45b5434
INFO 2024-01-27 18:28:31,362 [shard 1:comp] compaction - [Compact system_distributed.cdc_streams_descriptions_v2 1bdd28f0-bd31-11ee-b3fb-dfb890458783] Compacted 2 sstables to [/scylladir/testlog/x86_64/dev/scylla-1771/data/system_distributed/cdc_streams_descriptions_v2-0bf73fd765b236b085e5658131d5df36/me-3gd4_19rj_255s02qj7zo4ducrcz-big-Data.db:level=0]. 12kB to 37kB (~294% of original) in 3ms = 4MB/s. ~256 total partitions merged to 2.
DEBUG 2024-01-27 18:28:31,363 [shard 0:main] raft_topology - reload raft topology state
WARN 2024-01-27 18:28:31,364 [shard 0:main] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.69.70.49, application_states = {{STATUS -> Value(NORMAL,8055936798076345848,2)}})
from scylladb.
INFO 2024-01-26 12:48:12,781 [shard 0:stmt] cql_server_controller - Setting up maintenance socket on /scylladir/testlog/x86_64/debug/scylla-27/cql.m
INFO 2024-01-26 12:48:12,783 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on /scylladir/testlog/x86_64/debug/scylla-27/cql.m (unencrypted, non-shard-aware)
WARN 2024-01-26 12:48:12,783 [shard 0:stmt] gossip - Fail to apply application_state: std::runtime_error (endpoint_state_map does not contain endpoint = 127.204.200.11, application_states = {{RPC_READY -> Value(1,1)}})
INFO 2024-01-26 12:48:12,787 [shard 0:main] raft_group0 - Disabling migration_manager schema pulls because Raft is enabled and we're bootstrapping.
INFO 2024-01-26 12:48:12,788 [shard 0:strm] messaging_service - Starting Messaging Service on address 127.204.200.11 port 7000
from scylladb.
@tchaikov the 3 different failures you reported are... different
In the first one we see that the joining node got stuck on
INFO 2024-01-18 10:48:12,180 [shard 0:strm] storage_service - raft topology: join: request to join placed, waiting for the response from the topology coordinator
so it was stuck waiting for the topology coordinator to join it.
Unfortunately, the artifacts are gone, so I cannot identify what happened to the coordinator.
In the second one we see a failure not when trying to join a server, but when trying to run a query. We have a dedicated issue for that, #17029, and the solution is known.
In the third one we see that the server successfully started:
INFO 2024-01-26 12:48:25,346 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on 127.204.200.11:9042 (unencrypted, non-shard-aware)
INFO 2024-01-26 12:48:25,346 [shard 0:stmt] cql_server_controller - Starting listening for CQL clients on 127.204.200.11:19042 (unencrypted, shard-aware)
INFO 2024-01-26 12:48:25,383 [shard 0:main] init - serving
INFO 2024-01-26 12:48:25,384 [shard 0:main] init - Scylla version 5.5.0~dev-0.20240126.c02893f7ef92 initialization completed.
but the driver couldn't connect -- part of the test framework's server start function is to check CQL connectivity, and that part apparently timed out.
This looks like a new one. @patjed41 also encountered it in his PR and is going to open a new issue.
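The kind of connectivity check described above can be sketched as a poll-until-deadline loop. This is only an illustration, not the test framework's actual code: `wait_for_port` is a hypothetical helper, and it probes plain TCP reachability rather than performing a real CQL handshake through the driver. The deadline is passed as an absolute timestamp, mirroring the `time.time() + 60` style used in the test above.

```python
import socket
import time


def wait_for_port(host: str, port: int, deadline: float, interval: float = 0.1) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the deadline passes.

    Hypothetical helper: returns True on the first successful connect,
    False if the absolute deadline (a time.time() value) is reached first.
    """
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False
        try:
            # Cap each attempt so a single hung connect cannot overshoot the deadline by much.
            with socket.create_connection((host, port), timeout=min(remaining, 1.0)):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly, then retry.
            time.sleep(min(interval, max(0.0, deadline - time.time())))
```

A check like this returning False while the server log already shows "initialization completed" would match the symptom in the third failure: the server is up from its own point of view, yet the client side gives up at the deadline.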
from scylladb.
Thank you, Kamil! I was using "endpoint_state_map does not contain endpoint" as the only feature when clustering the failures, but it seems I should have used feature vectors with more elements. I will try to do better next time.
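A minimal sketch of clustering failures on more than one log feature, assuming each failure is available as a block of log text. The feature names and the `signature` / `cluster` helpers are hypothetical and chosen for illustration; the patterns are taken from messages quoted in this thread.

```python
import re
from collections import defaultdict

# Hypothetical feature set: each pattern marks one symptom quoted in this thread.
FEATURES = {
    "gossip_missing_endpoint": re.compile(r"endpoint_state_map does not contain endpoint"),
    "no_live_endpoint": re.compile(r"No live endpoint available"),
    "join_waiting_for_coordinator": re.compile(
        r"waiting for the response from the topology coordinator"
    ),
    "init_completed": re.compile(r"initialization completed"),
}


def signature(log_text: str) -> frozenset:
    """Return the set of feature names whose pattern occurs in the log text."""
    return frozenset(name for name, pat in FEATURES.items() if pat.search(log_text))


def cluster(failures: dict) -> dict:
    """Group failure names by their full feature signature."""
    groups = defaultdict(list)
    for name, text in failures.items():
        groups[signature(text)].append(name)
    return dict(groups)
```

With only the gossip warning as a feature, all three failures discussed here would fall into one group; with the extra patterns they would separate, because each one carries a different second symptom.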
from scylladb.
The third one is #17041
from scylladb.
I'm not sure this should really be a single issue (?), but anyway, I again saw the exact same test_tablet_cleanup failure that @tchaikov reported above, in another CI failure https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/6201/testReport/junit/(root)/test_tablets/Tests___Unit_Tests___test_tablet_cleanup/:
await manager.server_start(servers[1].server_id)
await wait_for_cql_and_get_hosts(cql, servers, time.time() + 60)
> partitions_after_loss = (await cql.run_async("SELECT COUNT(*) FROM test.test"))[0].count
E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.136.164.36:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">, <Host: 127.136.164.2:9042 datacenter1>: <Error from server: code=0000 [Server error] message="No live endpoint available">})
from scylladb.
Oh, I guess it's #17029?
from scylladb.
Yes @nyh, it looks like your failure happened before #17040.
from scylladb.
Closing this one, as we seem to have a separate issue for each of the different failures here.
from scylladb.