
Comments (8)

diman-io commented on June 12, 2024

> out of curiosity could you describe your use case for calling set-identity during startup? I was under the impression it was mainly used for hot swap setups on already running validators

I don't store keys with providers. Unfortunately, I've sometimes received "new" machines that had recoverable partitions from previous users. Also, this eliminates, for example, the risk associated with the unknown fate of a disk if it fails. Overall, I sleep better this way :)
Plus, it allows viewing primary and backup machines as truly equivalent (for instance, using completely identical configurations, etc.), with the actual assignment of roles (primary/backup) happening exclusively through external tools.

Not all developers are aware that you can set up identity in this way:

ssh node /path/to/solana-validator -l /path/to/ledger set-identity < ~/keys/identity.json

here, ~/keys/identity.json is on a local machine

In reality, of course, it's more complicated.

from solana.

t-nelson commented on June 12, 2024

proposal seems reasonable for a stopgap. i think we'd ideally want to support set-identity during WFSM for the sake of convenience


AshwinSekar commented on June 12, 2024

agreed, we can have WFSM reinitialize the tower if set-identity is detected, or make ReplayStage smart enough to reinitialize the tower before the first iteration.

However that would clash with #33865. @ryleung-solana is it feasible to initialize the admin rpc service before the tpu?


t-nelson commented on June 12, 2024

theoretically it should be feasible to start an admin interface immediately. the way we have it designed today turns it into a dependency nightmare (both package-wise and initialization-wise). it'd be more flexible if we redesigned it around channels rather than actually holding Arcs
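
A rough illustration of the channel-based idea, not the actual admin RPC design: the admin side holds only a sender, so it can be created before any other subsystem exists, and the owning subsystem drains queued requests once it is up. `AdminRequest`, `AdminHandle`, `admin_channel`, and `apply_pending` are hypothetical names, and identities are plain strings here to keep the sketch self-contained.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical request type; not the real admin RPC API.
#[derive(Debug, PartialEq)]
pub enum AdminRequest {
    SetIdentity(String),
}

/// Handle held by the admin interface. It owns only a sender, so it does not
/// need an Arc to any validator subsystem and can be created first.
pub struct AdminHandle {
    tx: Sender<AdminRequest>,
}

impl AdminHandle {
    /// Queue an identity change; fails only if the receiving side is gone.
    pub fn set_identity(&self, pubkey: &str) -> Result<(), String> {
        self.tx
            .send(AdminRequest::SetIdentity(pubkey.to_string()))
            .map_err(|e| e.to_string())
    }
}

/// Create the admin handle and the receiver that the owning subsystem keeps.
pub fn admin_channel() -> (AdminHandle, Receiver<AdminRequest>) {
    let (tx, rx) = channel();
    (AdminHandle { tx }, rx)
}

/// Called by the owning subsystem once it is running: drain everything that
/// was queued and return the identity to use (the last request wins).
pub fn apply_pending(rx: &Receiver<AdminRequest>, current: String) -> String {
    let mut identity = current;
    while let Ok(AdminRequest::SetIdentity(new_id)) = rx.try_recv() {
        identity = new_id;
    }
    identity
}
```

With this shape, requests sent before the consumer starts are simply buffered rather than racing with initialization.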


AshwinSekar commented on June 12, 2024

that would definitely save us some hassle. having to poll around when cluster_info changes is less than ideal in this spot.

In terms of a minimum changeset to backport, i'm thinking we:

  • initialize admin service after wait for supermajority
  • use the initial identity from Validator::new in ReplayStage for the first identity change comparison, to update the tower if necessary. happens right before the first vote is sent out.
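
The second bullet could look roughly like the following sketch. It is only an illustration under assumptions: `Tower` here is a hypothetical stand-in with a single `node_pubkey` field, identities are plain strings, and `refresh_tower_for_identity` is an invented helper, not a function from the validator codebase.

```rust
/// Hypothetical stand-in for the vote tower; only the field used here.
#[derive(Debug, Clone, PartialEq)]
pub struct Tower {
    pub node_pubkey: String,
}

/// Compare the identity the tower was built with against the current one.
/// If they differ, rebuild the tower for the current identity and return
/// true; otherwise leave it untouched and return false.
pub fn refresh_tower_for_identity(tower: &mut Tower, current_identity: &str) -> bool {
    if tower.node_pubkey == current_identity {
        return false;
    }
    // Assumption: real code would restore or create the tower for the new
    // identity here; this sketch only swaps the pubkey.
    *tower = Tower {
        node_pubkey: current_identity.to_string(),
    };
    true
}
```

Running this check right before the first vote is generated means a tower built with the startup identity is never used to sign for a different one.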


ryleung-solana commented on June 12, 2024

> agreed, we can have WFSM reinitialize the tower if set-identity is detected, or make ReplayStage smart enough to reinitialize the tower before the first iteration.
>
> However that would clash with #33865. @ryleung-solana is it feasible to initialize the admin rpc service before the tpu?

I mean, we could initialize it once before the TPU is initialized, then reinitialize it afterwards... I guess the only thing this means is that set-identity calls before the second initialization will not get propagated to the quic layer. Then again, right now, any calls to set-identity will not get propagated to the quic layer before the initialization anyway...


diman-io commented on June 12, 2024

> However this still creates a small gap of time from when the rpc service is initialized to the first iteration of ReplayStage is executed in which the identity could be changed from under us.

Actually, this is a significant problem in the everyday use of the validator. Until the replay_stage starts, changing the identity leads to a crash. The main issue here is that the startup script should check whether the replay_stage has begun before setting a new identity.

If any validators are looking for a solution right now, they can use this patch. I'm using it, and it prevents the crash (it actually saved me during the last restart, though I still lost several hundred credits). Afterwards you just need to change the identity twice more: switch back to the fake one, then set the real one.

         generate_time.stop();
         replay_timing.generate_vote_us += generate_time.as_us();
         if let Some(vote_tx) = vote_tx {
+            if tower.node_pubkey != identity_keypair.pubkey() {
+                error!(
+                    "Most likely, the identity was changed from {} to {} before the voting started.",
+                    tower.node_pubkey,
+                    identity_keypair.pubkey()
+                );
+                return;
+            }
             tower.refresh_last_vote_tx_blockhash(vote_tx.message.recent_blockhash);
 
             let saved_tower = SavedTower::new(tower, identity_keypair).unwrap_or_else(|err| {

> A more complete solution would be to fail set-identity until we are sure ReplayStage is running.

To be honest, I thought about saving the desired identity in a new cluster_info variable and then triggering the change from the replay_stage, because, as I understand, there is still a very small (in time) window when changing the identity can lead to such an error. I don't remember why I didn't implement it, either because of time or I didn't like the architecture of the solution.
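
For illustration only, the "save the desired identity and let the replay stage apply it" idea might be sketched like this. `PendingIdentity` is a hypothetical type, not an existing cluster_info field, and identities are plain strings. The point is that the admin side only records the request, while the replay loop is the single place that applies it, so the change cannot land in the middle of vote generation.

```rust
use std::sync::Mutex;

/// Hypothetical slot holding an identity change that has been requested
/// but not yet applied.
pub struct PendingIdentity {
    slot: Mutex<Option<String>>,
}

impl PendingIdentity {
    pub fn new() -> Self {
        Self {
            slot: Mutex::new(None),
        }
    }

    /// Admin side: record the desired identity. A later request overwrites
    /// an earlier one that has not been applied yet.
    pub fn request(&self, pubkey: &str) {
        *self.slot.lock().unwrap() = Some(pubkey.to_string());
    }

    /// Replay side: take the pending request, if any, at a point where it
    /// is safe to switch (for example at the top of a loop iteration).
    pub fn take(&self) -> Option<String> {
        self.slot.lock().unwrap().take()
    }
}
```

In such a design the replay loop would call `take()` once per iteration and rebuild the tower when it returns `Some`.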


AshwinSekar commented on June 12, 2024

@diman-io ack will think about this some more

out of curiosity could you describe your use case for calling set-identity during startup? I was under the impression it was mainly used for hot swap setups on already running validators

