
Comments (5)

mna commented on August 24, 2024

Hello,

Thanks for the kind words! I looked at your diff, it's an impressive amount of work.

I'm not sure if this matches the Redisc design philosophy.

That's great of you to wonder about this, indeed I'm not sure the feature belongs inside the redisc package. Often times, when designing an app for redis cluster, care is taken in the design of redis keys to ensure that related keys live on the same node by using key hash tags (https://redis.io/docs/reference/cluster-spec/#hash-tags). Doing so ensures that you can use a single connection or a standard pipeline of commands and all of them will work on the same connection (or fail with a MOVED, but all of them will be moved to the same node).
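To illustrate the hash-tag rule referenced above, here is a standalone Go sketch of how the cluster spec extracts the hashed portion of a key (this is an illustrative reimplementation of the spec's rule, not redisc's API):

```go
package main

import (
	"fmt"
	"strings"
)

// hashTag returns the portion of a key that Redis Cluster hashes for slot
// assignment: if the key contains a "{...}" section with non-empty content,
// only that content is hashed; otherwise the whole key is hashed.
func hashTag(key string) string {
	open := strings.IndexByte(key, '{')
	if open == -1 {
		return key
	}
	end := strings.IndexByte(key[open+1:], '}')
	if end <= 0 { // no '}' after '{', or an empty "{}"
		return key
	}
	return key[open+1 : open+1+end]
}

func main() {
	// Both keys hash on "user:1000", so they map to the same slot/node.
	fmt.Println(hashTag("{user:1000}.name"))  // user:1000
	fmt.Println(hashTag("{user:1000}.email")) // user:1000
	fmt.Println(hashTag("plainkey"))          // plainkey
	fmt.Println(hashTag("foo{}bar"))          // foo{}bar (empty tag is ignored)
}
```

Because the two `{user:1000}` keys hash to the same value, they are guaranteed to live on the same node, which is what makes a single connection or standard pipeline work.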

Similarly, when running a pipeline of commands, I think there is an expectation that they will run in the order provided. From a caller's perspective I think it would be surprising that behind the scenes it gets broken out as batches run concurrently on different nodes with no ordering, though if that's a requirement in some cases it makes sense to have some higher-level code like you have, but I think it makes more sense as something outside the core library.

More generally, I don't think it's a good idea (or something I want to go out of my way to support with redisc) to send a random batch of commands and keys with no special care given for things like hash tag and expect a single call to run them all.

Again, if it's something you need for some reason in a specific case and you can't control or design the keys in a different way, I think it's totally fine to build that feature, but as I said, I prefer for this to belong outside the redisc package (by the way I think something similar can be built completely on top of redisc without access to internal private functions and methods, with things like https://pkg.go.dev/github.com/mna/redisc#SplitBySlot).

Hopefully that makes sense!

Thanks and have fun with your project,
Martin


rmker commented on August 24, 2024

Hello,
Thanks for the detailed feedback.

I completely understand your thoughts on the design. Perhaps I didn't explain some key mechanisms clearly earlier:

  1. The PipeLiner splits a pipeline request into several requests based on the slot of each command's key, so the hash-tag mechanism still works here. All commands located on the same node are put into the same request on the same connection, in the original order. Finally, all responses are returned to the caller in the original order even though the requests run concurrently. If the caller sends commands that share a hash tag, or whose slots all belong to a single node, only one request is sent, in order.

  2. Keys on different nodes don't need to run in order, because a Redis pipeline doesn't support dependencies between commands on different keys. For example, in a pipeline containing "GET keyA; SET keyB valueB", you can't set valueB to the value read for keyA; the application should use a Lua script in a single EVAL command for that. So preserving the original order across different nodes is not a good choice for a pipeline library: it would reduce throughput significantly. In my view, since Redis Cluster supports pipelining, drivers and Redis proxies should too. As far as I know, the official proxy "RedisLabs/redis-cluster-proxy" and Twitter's "twitter/twemproxy" use a similar mechanism; the latter is very popular and is used by many companies and cloud providers.

  3. If the above is reasonable, batch commands (MGET/MSET) built on the PipeLiner are feasible as well.
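The slot computation such a splitter relies on is defined by the cluster spec as CRC16 (XModem variant) of the key's hash tag, modulo 16384. A standalone Go sketch of that computation (an illustrative reimplementation of the spec, not code from the PipeLiner diff):

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 implements CRC16-CCITT (XModem), the checksum Redis Cluster uses
// for key slot assignment (polynomial 0x1021, initial value 0).
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = crc<<1 ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// slot returns the cluster hash slot for a key: CRC16 of the key's hash
// tag (or of the whole key when no non-empty tag is present), mod 16384.
func slot(key string) uint16 {
	if open := strings.IndexByte(key, '{'); open != -1 {
		if end := strings.IndexByte(key[open+1:], '}'); end > 0 {
			key = key[open+1 : open+1+end]
		}
	}
	return crc16([]byte(key)) % 16384
}

func main() {
	// Standard CRC16-XModem check value.
	fmt.Printf("%#04x\n", crc16([]byte("123456789"))) // 0x31c3
	// Keys sharing a hash tag land on the same slot, so a slot-based
	// splitter would keep them together in a single per-node request.
	fmt.Println(slot("{user:1000}.name") == slot("{user:1000}.email")) // true
}
```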

In any case, handling pipelines and batch commands outside of the redisc library is also a reasonable approach. But applications wouldn't need to write similar implementations themselves if pipelines and batch commands were supported directly in the library.

Many thanks:)


mna commented on August 24, 2024

Hey, thanks for the follow-up! A lot of very useful feedback for me in there.

  • About the PipeLiner, thanks for the clarification, this is pretty much what I thought it was doing - what I meant regarding the key hashes was not so much that it wouldn't support them, but that the reason for using the PipeLiner in the first place would be because the application is not designed with key hashes/cluster support in mind (I'll expand on this a bit later).
  • About the ordering, you're absolutely right, across different nodes there wouldn't be an ordering concern, I did miss the fact that you ordered the results as per the original pipeline. Thanks for the heads-up about the redis-cluster-proxy and the other one, I'll take a look at those eventually but I have some thoughts on pipelining across nodes (again, more on this in a bit).
  • MGET and MSET exist strictly because of their atomic properties, otherwise sending multiple GET/SET in a pipeline achieves exactly the same result (MGET/MSET save a few bytes but it's minimal). Breaking down an MSET into multiple commands due to the keys living on different nodes breaks that guarantee, so it is not an MSET at this point. I don't think it's a good idea to hide that fact, IMO having the calling code do something like conn.Do("MSET", ...) actually being turned into a series of non-atomic commands on different nodes is a bug, not a feature, so I won't add this support to redisc. The proper way to use those is to design your keys in a way that they live in the same hash slot, or use SplitBySlot and send the resulting batches of MSET commands on their own appropriate connection (or using a RetryConn). Yes it's a bit more work, but it's explicit at the callsite and doesn't break the atomic guarantees.
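The explicit splitting described above can be sketched as follows. This is a hypothetical helper for illustration only, not redisc's API: `splitMSETBySlot` and the toy `slotOf` function are assumptions, and in real code the slot function would be the CRC16-based one from the cluster spec (or the grouping would come from redisc's SplitBySlot):

```go
package main

import "fmt"

// splitMSETBySlot groups the key/value pairs of a would-be MSET into
// per-slot batches; each batch can then be sent as its own MSET on the
// connection bound to that slot, preserving per-batch atomicity.
// slotOf stands in for the real cluster hash-slot function.
func splitMSETBySlot(pairs []string, slotOf func(string) int) map[int][]string {
	if len(pairs)%2 != 0 {
		panic("MSET requires an even number of arguments")
	}
	batches := make(map[int][]string)
	for i := 0; i < len(pairs); i += 2 {
		s := slotOf(pairs[i]) // only the key determines the slot
		batches[s] = append(batches[s], pairs[i], pairs[i+1])
	}
	return batches
}

func main() {
	// Toy slot function for illustration: first byte mod 4.
	slotOf := func(key string) int { return int(key[0]) % 4 }
	batches := splitMSETBySlot(
		[]string{"a1", "v1", "b1", "v2", "a2", "v3"}, slotOf)
	for s, b := range batches {
		fmt.Println(s, b) // each batch becomes one per-slot MSET
	}
}
```

The point of doing this at the callsite is that the loss of cross-batch atomicity is visible to the caller, rather than hidden behind a single conn.Do("MSET", ...).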

I have been thinking lately of some improvements I want to add to redisc, and hopefully I'll get them out soon as proposal issues to think them through and see if they make sense and cover a good number of use cases, but to get back about the pipelines:

  • redisc already supports pipelining, the only constraint at the moment is that a RetryConn cannot use it. This is why I said that people who would use your pipeline-across-nodes are more likely to be sending keys that haven't been properly thought for cluster use, because otherwise they can already use pipelining (albeit without automatic retry).
  • One of the things I'd like to look into is to add something like SplitByNode, which is like SplitBySlot but uses the current mapping of slots to node (as known by the client) to optimistically split the keys by node that serves them, and thus optimizing the number of connections/pipelines one has to send to run the commands. If the cluster remains stable, then those mappings are still good. This would help sending more efficient pipelines assuming cluster stability (of course it would return the same MOVED errors if somehow the mapping was wrong or the cluster is unstable).
  • One of the other things I want to look into is to relax the restriction of no pipeline support for RetryConn. I created the library with this restriction at first as I wasn't sure how it could be supported safely, and I still have some concerns that I need to investigate and test out and make sure it doesn't turn into a footgun.

So that's more where I'm thinking of going regarding pipeline support. It's not completely unlike what you suggested, I think, except with less "magic", in the sense that it's not "send anything and it'll spin goroutines and retry behind the scenes" and there's no special-casing of MGET/MSET. Not that there's anything wrong with doing this (well, I do think it's wrong to break the atomic guarantees of MGET/MSET, but I mean the goroutines/retry part), but to me it belongs outside the package as it's more a decision that the caller should make.


rmker commented on August 24, 2024

Many thanks for the in-depth discussion. I totally agree that the app should take care of cluster-specific handling, but it seems we don't quite share the same view on the atomicity of batch operations. I think there is no need to worry about guaranteeing atomicity in the cluster library because:

  1. MGET/MSET are atomic operations in non-cluster mode. Apps can design their algorithms around this property. In that case they should not choose a cluster library; they should use a non-cluster driver directly instead, like redigo.

  2. MGET/MSET are possibly not atomic in cluster mode. Apps should learn all the rules of cluster mode and check whether they need to update their algorithms when they switch to it. They don't need an atomicity guarantee from the library, because they would change their strategy before switching to cluster mode. Generally, they need a complete cluster library to handle those non-atomic batch operations, instead of doing a lot of work to handle them in the application layer. As for the cases where batch commands must be atomic in cluster mode: as you mentioned, apps should guarantee that all keys land on the same node by using hash tags, and such commands obviously run fine in the proposed code because they are all routed to the same node.

  3. It doesn't reduce the library's flexibility, since apps still have the chance to handle batch commands with their own strategies, such as splitting the commands into different requests and handling them in their own way.

Anyway, no offense meant toward the design of redisc; I only have reservations about which layer the handling of batch commands belongs in. Everything I mentioned above happened in my project, which is why I wrote the code and raised the issue here. The issue can be closed later if there is no further discussion :)


mna commented on August 24, 2024

I think there is no need to worry about the guarantee for atomicity in the cluster library because:

Yeah, as you say this is where we disagree.

MGET/MSET are possibly not atomic operations in cluster.

This is not so - MGET/MSET are atomic all the time, even in cluster mode. You can try it by starting a cluster and running them with keys that share the same hash tag: they work atomically just as in non-cluster mode, and if you send keys that don't share the same hash tag, the commands return a CROSSSLOT error. Some libraries may choose to break that guarantee, but that's their choice and is a bug IMHO. MGET/MSET are Redis commands and their behaviour is defined by Redis; their guarantees do not change (or should not, anyway) because of the library you use.

MGET/MSET are atomic operations in non-cluster. The apps can design their algorithm based on this feature. They should not choose a cluster library, a non-cluster driver directly instead, like redigo.

Some applications use redisc as a way to support both Redis Standalone and Redis Cluster. This is a goal of this package, an important feature that explains why the Cluster type shares a common interface with the redigo.Pool type, to make this abstraction easier. In such applications, when you encounter something like conn.Do("MSET", key1, val1, key2, val2), you should be able to assume that this is an atomic command, as this may be crucial to the correctness of the program. It shouldn't depend on whether the connection is to a Redis Cluster or Redis Standalone.

All such things I mentioned above happened in my project, and that's why I wrote the code and raise the issue here.

And that's fine and totally ok that you built some higher-level logic that works for your specific case, and thanks for raising the issue too as it's good to know about use cases and have the chance to challenge the design and direction of the package! In this case, I think the improvements I have in mind for the package will go a long way toward supporting a good portion of what you proposed (although in a different manner, with more left to the discretion of the caller), but regarding MGET/MSET specifically, we can agree to disagree if my points above didn't change your opinion.

The issue could be closed later if no further discussion on it:)

Sure thing, I'll close but do not hesitate if you have more to discuss on this!

Thanks,
Martin

