Comments (7)
Hi Venkatt,
The bad news is that at present NFSdb can only run in-process, so its failover is whatever failover mechanism the parent process has. There is an ongoing effort to have NFSdb run out-of-process, in which case it will have its own client with failover built in.
The good news is that getting a client to be a server at the same time can be done pretty easily. I'll write up an example very shortly.
Server recovery after failover is relatively easy. Because updates are incremental, it is possible to wrap a journal in a client instance and have it replicate from the former client that is now the server.
Multicast is not supported for data, not yet anyway. This is partly because the NFSdb protocol allows each client to have a different state, and replication is tailored to the state of each client. But in BAU over a dedicated network link I guess multicast would have an advantage. Maybe one day :)
from questdb.
Hi Venkatt,
I had a recap on replication and failover, and it isn't possible to fail over the writer automatically. A client can reconnect if the server goes down, but that is all that is automatic in the current version.
Making automated failover for your scenario is not difficult, and there is a plan to do it now! Here is the sample logic:
- Node 1 and Node 2 are identical; both run "ClusterNode", which is both server and client.
- On startup the nodes automatically decide which one is master; this depends on startup order.
- On the master, "ClusterNode" will signal the application that it is master and provide you with a way to get JournalWriter(s).
- On the client, "ClusterNode" will signal your code that it is in standby mode.
- Both server and client maintain a heartbeat; once it is lost, the "client" becomes the server and notifies your code of the state change.
- When the old server is restarted, it assumes the role of client automatically because the other node is present.
- The client automatically recovers itself and starts replicating from the server node.
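To make the role transitions above concrete, here is a minimal sketch of the state machine in plain Java. It is not the NFSdb API: `FailoverNode`, `Role`, `onHeartbeat`, and `tick` are hypothetical names, and the clock is passed in explicitly so the logic can be followed without any networking.

```java
import java.util.function.Consumer;

/** Hypothetical sketch of the failover logic; not the actual "ClusterNode" class. */
final class FailoverNode {
    enum Role { MASTER, STANDBY }

    private final long heartbeatTimeoutMillis;
    private final Consumer<Role> listener;
    private Role role;
    private long lastHeartbeatMillis;

    FailoverNode(boolean peerAlreadyRunning, long heartbeatTimeoutMillis,
                 long nowMillis, Consumer<Role> listener) {
        this.heartbeatTimeoutMillis = heartbeatTimeoutMillis;
        this.listener = listener;
        this.lastHeartbeatMillis = nowMillis;
        // Startup order decides the role: the first node up becomes master,
        // a node that finds a running peer becomes standby.
        this.role = peerAlreadyRunning ? Role.STANDBY : Role.MASTER;
        listener.accept(role);
    }

    Role role() {
        return role;
    }

    /** Record a heartbeat received from the peer. */
    void onHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    /** Called periodically; promotes a standby whose master has gone silent. */
    void tick(long nowMillis) {
        if (role == Role.STANDBY
                && nowMillis - lastHeartbeatMillis > heartbeatTimeoutMillis) {
            role = Role.MASTER;
            listener.accept(role); // notify application code of the state change
        }
    }
}
```

Application code would register the listener and only request writers once it has been told it is `MASTER`. A restarted old master would be constructed with `peerAlreadyRunning = true` and so come back as standby.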
Let me know if this works for you.
Vlad
Vlad:
I believe that the last model you have described with "ClusterNode" will work. So is "ClusterNode" a code/class that you will be adding in an upcoming release or is this something I can develop with your guidance and/or examples? Please advise.
What I can do on my application side is programmatically determine whether the underlying instance is running in master or standby mode. If it is running in standby mode, I can have my web application send an HTTP 302 redirect for the REST API to the master server, which will consume the POST data normally.
Question: in the example you described, is there a way I could get a list of all the other nodes participating in this group? For example, if I POST to the client, can it determine what the IP of the master is and send the redirect URL with the correct master IP?
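A rough sketch of the redirect decision described above, assuming the application already knows its own role and the master's address (how it learns them is exactly the open question). `MasterRedirect` and its parameters are hypothetical names, not part of NFSdb.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper: decides whether a request must be redirected to the master. */
final class MasterRedirect {
    /**
     * Returns the Location to send with the redirect, or empty if this node
     * is the master and should handle the request itself.
     */
    static Optional<URI> redirectTarget(boolean isMaster, String masterHost,
                                        int masterPort, String requestPath) {
        if (isMaster) {
            return Optional.empty();
        }
        return Optional.of(URI.create("http://" + masterHost + ":" + masterPort + requestPath));
    }
}
```

One thing to watch: many HTTP clients re-issue a redirected POST as a GET when they receive a 302, while a 307 response preserves the method and body, so 307 may be the safer status code for this use.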
Please advise on how I can proceed forward with the "ClusterNode" implementation.
Thanks in advance.
Venkatt
Implementing a cluster will require changes in both server and client code, so I'll do that. The changes are not very complex, so it won't be long.
It should be possible to announce the cluster winner to the other nodes. After voting for a master, all remaining nodes will have to connect their clients to it, and this information can be published to the app code.
I'll post more details on the usage model very soon; I need to prove that all the parts work first.
Hi Venkatt,
I have an example of creating a cluster of producers for you: ClusteredProducerMain.java
Although it is for two producers, you can extend it to three or more as you need. The important thing to be aware of is that each cluster node must have its own unique integer instance id. It is used in logging and also for tie-break voting in case two nodes start up at the same time.
As things stand, it is safe to have nodes started by either monitoring tools or schedulers; if they come up at the same time they will resolve their roles automatically.
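To illustrate why the instance ids must be unique, here is a small sketch of a tie break over ids. The rule used (lowest id wins) is an assumption made purely for illustration; the thread does not say which rule NFSdb applies, and `TieBreak` is a hypothetical name.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;

/** Hypothetical tie break for nodes that start at the same time. */
final class TieBreak {
    /**
     * Picks the winner among simultaneously starting nodes.
     * Assumption: the lowest instance id wins; the real rule may differ.
     */
    static int electMaster(Collection<Integer> instanceIds) {
        if (instanceIds.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        // Duplicate ids would make the vote ambiguous, hence the uniqueness requirement.
        if (new HashSet<>(instanceIds).size() != instanceIds.size()) {
            throw new IllegalArgumentException("instance ids must be unique");
        }
        return Collections.min(instanceIds);
    }
}
```

Because every node evaluates the same deterministic rule over the same set of ids, they all reach the same answer without further coordination.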
The shutdown procedure is as graceful as possible and will wait for all in-flight network transmissions before cutting the wire. I will expose a timeout API, though, in case waiting is not an option. In that case in-flight transactions may be lost.
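The trade-off between waiting and a timeout can be sketched with the standard `java.util.concurrent` executor API. This is only an analogy for the behaviour described, not the NFSdb shutdown code; `BoundedShutdown` is a hypothetical name and the executor stands in for whatever sends data over the network.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: graceful shutdown that gives up after a timeout. */
final class BoundedShutdown {
    /** Returns the number of queued tasks that were abandoned (0 if fully graceful). */
    static int shutdown(ExecutorService sender, long timeout, TimeUnit unit) {
        sender.shutdown(); // stop accepting new work; queued sends still run
        try {
            if (sender.awaitTermination(timeout, unit)) {
                return 0; // everything in flight completed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        // Timed out or interrupted: interrupt running tasks and drop queued ones.
        List<Runnable> dropped = sender.shutdownNow();
        return dropped.size();
    }
}
```

A non-zero return value corresponds to the case mentioned above where in-flight work may be lost.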
There is more work needed to make readers fail over between cluster nodes and error-correct automatically. But that should not take long at all.
Let me know if you think current API can improve in some way or if anything doesn't work for you.
Regards,
Vlad
Hi Vlad,
Thank you very much for devising this solution. I will try this out either tonight or in the next few days and get back to you.
Best Regards,
Venkatt Guhesan
On Thu, Jan 22, 2015 at 12:07 PM, Vlad Ilyushchenko wrote:
Hi Venkatt,
I have an example of creating a cluster of producers for you:
ClusteredProducerMain.java
https://github.com/NFSdb/nfsdb/blob/master/nfsdb-examples/src/main/java/org/nfsdb/examples/network/cluster/ClusteredProducerMain.java
This feature is complete; let's open another issue if we discover defects with it.