Hello: I've been testing the replication options available in NFSdb.

Advise on building a fail-over solution on top of NFSdb (with two+ nodes)

questdb commented on May 22, 2024

Advise on building a fail-over solution on top of NFSdb (with two+ nodes)

from questdb.

Comments (7)

bluestreak01 commented on May 22, 2024

Hi Venkatt,

The bad news is that at present NFSdb can only run in-process, so its failover is whatever failover mechanism the parent process has. There is an ongoing effort to have NFSdb run out-of-process, in which case it will have its own client with failover built in.

The good news is that getting a client to be a server at the same time can be done pretty easily. I'll write up an example very shortly.

Server recovery after failover is relatively easy. Because updates are incremental, it is possible to wrap a journal in a client instance and have it replicate from the former client, which is now the server.
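
A minimal sketch of the incremental catch-up idea, for illustration only. `TxLog` is a hypothetical append-only log standing in for a journal; it is not the NFSdb journal or client API. The recovering node reports how many entries it already holds and receives only the missing tail.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical append-only log standing in for a journal; not the NFSdb API. */
final class TxLog {
    private final List<String> entries = new ArrayList<>();

    void append(String entry) {
        entries.add(entry);
    }

    int size() {
        return entries.size();
    }

    /** Returns only the entries a replica is missing, given how many it already holds. */
    List<String> deltaSince(int replicaSize) {
        if (replicaSize < 0 || replicaSize > entries.size()) {
            throw new IllegalArgumentException("replica is not a prefix of this log");
        }
        return new ArrayList<>(entries.subList(replicaSize, entries.size()));
    }

    /** Pulls the missing tail from the current server into this log. */
    void catchUpFrom(TxLog server) {
        entries.addAll(server.deltaSince(entries.size()));
    }
}
```

The sketch assumes the recovering node's data is a prefix of the new server's data. A node that accepted writes the other side never received would need separate handling, which this example does not attempt.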

Multicast is not supported for data, not yet anyway. This is partly because the NFSdb protocol allows each client to have different state, and replication is tailored to the state of each client. But in BAU over a dedicated network link I guess multicast would have an advantage. Maybe one day :)

bluestreak01 commented on May 22, 2024

Hi Venkatt,

I had a recap on replication and failover, and it isn't possible to fail over the writer automatically. A client can reconnect if the server goes down, but that is all that is automatic in the current version.

Making automated failover for your scenario is not difficult, and there is a plan to do it now! Here is the sample logic:

Node 1 and Node 2 are identical; both run "ClusterNode", which is both server and client.
On startup the nodes automatically decide which one is master; this depends on startup order.
On the master, "ClusterNode" will signal the application that it is the master and provide you with a way to get JournalWriter(s).
On the client, "ClusterNode" will signal your code that it is in standby mode.
Both server and client will maintain a heartbeat; once it is lost, the "client" will become the server and notify your code of the state change.
When the old server is restarted, it will assume the client role automatically because the other node is present.
The client will automatically recover itself and start replicating from the server node.
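
The logic above could look roughly like the following sketch. `FailoverNode` and everything in it are hypothetical names used for illustration; "ClusterNode" had not been written at this point in the thread, so nothing here is the NFSdb API. Time is passed in explicitly so the promotion rule is easy to follow; a real node would drive `tick` from a timer and `onHeartbeat` from the network.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: a node that promotes itself to master when heartbeats stop. */
final class FailoverNode {
    enum Role { MASTER, STANDBY }

    private final long heartbeatTimeoutMillis;
    private final Consumer<Role> listener; // application callback for role changes
    private Role role;
    private long lastHeartbeatMillis;

    FailoverNode(boolean peerAlreadyRunning, long heartbeatTimeoutMillis,
                 long nowMillis, Consumer<Role> listener) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
        this.listener = listener;
        // Startup order decides the role: the first node up becomes master.
        this.role = peerAlreadyRunning ? Role.STANDBY : Role.MASTER;
        this.lastHeartbeatMillis = nowMillis;
        listener.accept(role);
    }

    /** Called whenever a heartbeat arrives from the current master. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Called periodically; promotes a standby once the master has been silent too long. */
    void tick(long nowMillis) {
        if (role == Role.STANDBY && nowMillis - lastHeartbeatMillis > heartbeatTimeoutMillis) {
            role = Role.MASTER;
            listener.accept(role);
        }
    }

    Role role() {
        return role;
    }
}
```

The application would only request writers after the listener reports `MASTER`. A restarted old master would be constructed with `peerAlreadyRunning` set to true and so come back as standby.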

Let me know if this works for you.

Vlad

vguhesan commented on May 22, 2024

Vlad:

I believe that the last model you described with "ClusterNode" will work. So is "ClusterNode" a class that you will be adding in an upcoming release, or is this something I can develop with your guidance and/or examples? Please advise.
What I can do on my application side is programmatically determine whether the underlying instance is running in master or standby mode. If it is running in standby mode, I can have my web application send an HTTP 302 redirect for the REST API to the master server, which will consume the POST data normally.
Question: in the example you described, is there a way I could get a list of all other nodes participating in this group? For example, if I POST to the client, can it determine the IP of the master and send the redirect URL with the correct master IP?
Please advise on how I can proceed forward with the "ClusterNode" implementation.
Thanks in advance.

Venkatt
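
The redirect idea in this comment can be sketched as follows. `WriteRouter` and `onMasterChanged` are hypothetical; the sketch assumes the cluster code can tell the application the master's base URL, which is the open question being asked here.

```java
import java.net.URI;

/** Hypothetical helper: decides where a write request should go based on this node's role. */
final class WriteRouter {
    // Null while this node is itself the master.
    private volatile URI masterBaseUri;

    /** Assumed callback: invoked whenever the cluster reports a new master (null if it is us). */
    void onMasterChanged(URI newMasterBaseUri) {
        this.masterBaseUri = newMasterBaseUri;
    }

    /** Returns the redirect target for a request path, or null if this node should handle it. */
    URI redirectFor(String path) {
        URI master = masterBaseUri;
        return master == null ? null : master.resolve(path);
    }
}
```

A standby would put the returned URI in the `Location` header of its redirect response. One design point to weigh: many HTTP clients turn a redirected POST into a GET when they receive a 302, whereas a 307 requires the client to repeat the same method and body, so 307 may suit a POST-based REST API better.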

bluestreak01 commented on May 22, 2024

Implementing the cluster will require changes in both server and client code, so I'll do that. The changes are not very complex, so it won't be long.

It should be possible to announce the cluster winner to the other nodes. After voting for a master, all remaining nodes will have to connect their clients to it, and this information can be published to the app code.

I'll post more details on the usage model very soon; I need to prove that all the parts work first.

bluestreak01 commented on May 22, 2024

Hi Venkatt,

I have an example of creating a cluster of producers for you: ClusteredProducerMain.java

Although it is for two producers, you can extend it to three or more as you need. The important thing to be aware of is that each cluster node must have its own unique integer instance id. It is used in logging and also for tie-break voting in case two nodes start up at the same time.

As things stand, it is safe to have nodes started by either monitoring tools or schedulers; if they come up at the same time, they will resolve their roles automatically.
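
As an illustration of why the instance ids must be unique, here is a minimal tie-break sketch. The thread does not say which id wins, so "lowest id wins" is purely an assumption, and `TieBreak` is a hypothetical class rather than part of NFSdb.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;

/** Hypothetical tie-break for nodes that start at the same time. */
final class TieBreak {
    /** Picks the master among simultaneous starters. ASSUMPTION: the lowest instance id wins. */
    static int electMaster(Collection<Integer> candidateInstanceIds) {
        if (candidateInstanceIds.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        // Duplicate ids would make the vote ambiguous, so reject them up front.
        if (new HashSet<>(candidateInstanceIds).size() != candidateInstanceIds.size()) {
            throw new IllegalArgumentException("instance ids must be unique");
        }
        return Collections.min(candidateInstanceIds);
    }
}
```

Because every node applies the same deterministic rule to the same set of ids, each one reaches the same answer without further coordination.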

The shutdown procedure is as graceful as possible and will wait for all in-flight network transmissions before cutting the wire. I will expose a timeout API, though, in case waiting is not an option. In that case in-flight transactions may be lost.
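
The trade-off between waiting and a timeout can be sketched with the standard `java.util.concurrent` API. This is not the NFSdb shutdown or timeout API, which had not been exposed at this point; `BoundedShutdown` is a hypothetical helper, and it assumes in-flight transmissions run as tasks on an `ExecutorService`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: graceful shutdown with an upper bound on the wait. */
final class BoundedShutdown {
    /**
     * Waits up to the timeout for in-flight work, then cancels the rest.
     * Returns the number of queued tasks that never started.
     */
    static int shutdown(ExecutorService sender, long timeout, TimeUnit unit) {
        sender.shutdown(); // stop accepting new transmissions
        try {
            if (sender.awaitTermination(timeout, unit)) {
                return 0; // everything drained in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        // Interrupts running tasks and returns the ones that were still queued.
        return sender.shutdownNow().size();
    }
}
```

A non-zero return value, or a task interrupted mid-send, corresponds to the case where in-flight work may be lost.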

More work is needed to make readers fail over between cluster nodes and automatically error-correct. But that should not take long at all.

Let me know if you think current API can improve in some way or if anything doesn't work for you.

Regards,
Vlad

vguhesan commented on May 22, 2024

Hi Vlad,

Thank you very much for devising this solution. I will try this out either
tonight or in the next few days and get back to you.

Best Regards,
Venkatt Guhesan

On Thu, Jan 22, 2015 at 12:07 PM, Vlad Ilyushchenko wrote:

Hi Venkatt,

I have an example of creating a cluster of producers for you:
ClusteredProducerMain.java
https://github.com/NFSdb/nfsdb/blob/master/nfsdb-examples/src/main/java/org/nfsdb/examples/network/cluster/ClusteredProducerMain.java

bluestreak01 commented on May 22, 2024

This feature is complete; let's open another issue should we discover defects with it.
