I am trying to write a code for a cluster, where only one node would periodially download a file and store it for everyone else. I don't want to put the file via promote
because I want to know who is the leader first, so that only that node does work. I plan to write the file to another key and let everyone observe. I could probably still use promote, but the file should still be readable if it is stale and I'll probably have to put an empty value during initial campaign
call.
Anyway, it seems like I cannot implement a continious master election. I am having spurious wake-up from client.campaign(...)
for both leaders and not leaders. This is due to lease expiry during the campaign
call.
use etcd_client::*;
use tracing::info;
async fn elect_and_work(
idx: i32,
mut etcd_client: Client,
etcd_path: String,
) -> anyhow::Result<()> {
loop {
let resp = etcd_client.lease_grant(10, None).await?;
let lease_id = resp.id();
// info!("grant ttl:{:?}, id:{:?}", resp.ttl(), resp.id());
let election_path = format!("{}/test_election", etcd_path);
// campaign
let resp = etcd_client
.campaign(election_path.as_str(), "123", lease_id)
.await?;
let leader = resp.leader().unwrap();
info!(
"Won election name: idx={}, {:?}, leaseId:{:?}, my_lease: {:?}",
idx,
leader.name_str(),
leader.lease(),
lease_id,
);
let (mut keeper, mut keeper_responses) = etcd_client.lease_keep_alive(lease_id).await?;
for _ in 0..5 {
info!("Keeping alive (idx={})...", idx);
keeper.keep_alive().await?;
if let Some(resp) = keeper_responses.message().await? {
info!(
"Lease on idx={}, {:?} kept alive, new ttl {:?}",
idx,
resp.id(),
resp.ttl()
);
}
tokio::time::delay_for(tokio::time::Duration::from_secs(3)).await;
}
}
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
::std::env::set_var("RUST_LOG", "election_repro=info");
let endpoints = vec!["http://127.0.0.1:2379"];
tracing_subscriber::fmt::init();
let etcd_client = etcd_client::Client::connect(endpoints, None).await?;
let mut join_handles = vec![];
for idx in 0..5 {
let client = etcd_client.clone();
let join = tokio::spawn(async move {
elect_and_work(idx, client, "/election_repro".into())
.await
.unwrap();
});
join_handles.push(join);
}
futures::future::join_all(join_handles).await;
Ok(())
}
Dec 28 16:55:42.986 INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607485, my_lease: 7587851380158607485
Dec 28 16:55:42.987 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:42.988 INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:45.989 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:45.991 INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:48.992 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:48.993 INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:51.994 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:51.995 INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:55:54.996 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:55:54.998 INFO election_repro: Lease on idx=0, 7587851380158607485 kept alive, new ttl 10
Dec 28 16:56:05.031 INFO election_repro: Won election name: idx=4, Ok("/election_repro/test_election"), leaseId:7587851380158607493, my_lease: 7587851380158607493
Dec 28 16:56:05.031 INFO election_repro: Won election name: idx=1, Ok("/election_repro/test_election"), leaseId:7587851380158607487, my_lease: 7587851380158607487
Dec 28 16:56:05.032 INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607512, my_lease: 7587851380158607512
Dec 28 16:56:05.032 INFO election_repro: Won election name: idx=2, Ok("/election_repro/test_election"), leaseId:7587851380158607489, my_lease: 7587851380158607489
Dec 28 16:56:05.032 INFO election_repro: Won election name: idx=3, Ok("/election_repro/test_election"), leaseId:7587851380158607491, my_lease: 7587851380158607491
Dec 28 16:56:05.033 INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:05.034 INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:05.034 INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:05.034 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:05.034 INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:05.035 INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:05.035 INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:05.035 INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:05.035 INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:05.035 INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:08.037 INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:08.037 INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:08.037 INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:08.037 INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:08.037 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:08.038 INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:08.038 INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:08.038 INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:08.038 INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:08.038 INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:11.039 INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:11.039 INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:11.039 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:11.039 INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:11.039 INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:11.040 INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:11.040 INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:11.040 INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:11.040 INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:11.040 INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:14.042 INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:14.042 INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:14.042 INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:14.042 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:14.042 INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:14.043 INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:14.043 INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:14.043 INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:14.043 INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:14.043 INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:17.044 INFO election_repro: Keeping alive (idx=2)...
Dec 28 16:56:17.044 INFO election_repro: Keeping alive (idx=4)...
Dec 28 16:56:17.044 INFO election_repro: Keeping alive (idx=3)...
Dec 28 16:56:17.044 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:17.045 INFO election_repro: Keeping alive (idx=1)...
Dec 28 16:56:17.046 INFO election_repro: Lease on idx=0, 7587851380158607512 kept alive, new ttl 10
Dec 28 16:56:17.046 INFO election_repro: Lease on idx=4, 7587851380158607493 kept alive, new ttl 0
Dec 28 16:56:17.046 INFO election_repro: Lease on idx=2, 7587851380158607489 kept alive, new ttl 0
Dec 28 16:56:17.046 INFO election_repro: Lease on idx=3, 7587851380158607491 kept alive, new ttl 0
Dec 28 16:56:17.046 INFO election_repro: Lease on idx=1, 7587851380158607487 kept alive, new ttl 0
Dec 28 16:56:27.537 INFO election_repro: Won election name: idx=0, Ok("/election_repro/test_election"), leaseId:7587851380158607519, my_lease: 7587851380158607519
Dec 28 16:56:27.539 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:27.539 INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:30.541 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:30.542 INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:33.543 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:33.544 INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Dec 28 16:56:36.546 INFO election_repro: Keeping alive (idx=0)...
Dec 28 16:56:36.547 INFO election_repro: Lease on idx=0, 7587851380158607519 kept alive, new ttl 10
Notice how first election runs correctly. Only one client gets to win the election. Everyone else waits.
The leader sends keep-alives as expected. To simulate failure it stops sending keep-alive after 5 times. Now the interesting part begins.
New election runs. Someone wins, but everyone's campaign
call completes. I'd expect an error to be returned at this point, but instead everyone gets a LeaderKey
with their own lease_id
. Did everyone just won and lost instantly? I am not sure, but it does not look like it. Only one can successfully update keep-alive on the lease and this is idx=0
because this client got a chance to refresh the lease due to completing the loop. Still, I don't really known the leader.
Eventually they all complete this strange loop, refresh their grants and participate in the correct election again.