Comments (26)

jongillham commented on May 26, 2024

Implementing a CachePutMulti function would likely only save two memcache calls and one datastore.GetMulti call in the best-case scenario, compared to using a PutMulti immediately followed by a GetMulti to prime the cache.
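
For reference, the existing workaround looks something like this (a sketch; c, keys and vals are whatever you already pass to the datastore calls, and error handling is elided):

    nds.PutMulti(c, keys, vals) // write through to the datastore
    nds.GetMulti(c, keys, vals) // read back immediately so nds primes memcache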

Interestingly, I recently got a request to have PutMulti not insert items into memcache, until the requester realised that it doesn't do that in the first place.

I am not sure if it is worth saving the cost of one datastore call, on the occasions a primed cache is needed, versus keeping the API completely in line with appengine/datastore. I assume latency is not the most important thing in your data processing pipeline, but that you want processed data to be quickly accessible via the cache from the frontend?

derekperkins commented on May 26, 2024

In my pipeline, I'm processing millions of records a day. As each of those records comes in, I'm sending it to an external task for additional processing, with a callback URL for when it completes. That task will take anywhere from seconds to minutes, but I know that I'm immediately going to need that data again. I can't reasonably use the memcache API myself because, in the event that the data has been evicted, it will need to fall back to the datastore. I'm mostly looking at the unnecessary cost of:

  1. millions of daily reads
  2. extra instance hours (if I save 20 ms per call, 1M calls = 20,000 s ≈ 5.5 instance hours, divided by the number of concurrent requests)

Adding the functionality should be fairly simple, won't change how the package works for existing users, and makes it more useful. I don't think it confuses the package for new users either. It's still a drop-in replacement, and I think having some memcache-specific calls is expected. I envision simply calling PutMulti like normal, then taking the same data and immediately caching it:

// data processing
nds.PutMulti(c, keys, vals)
nds.CachePutMulti(c, keys, vals) // or nds.PrimeCache(c, keys, vals)

I'd also be fine if CachePutMulti did the actual datastore.PutMulti and then inserted it into memcache.

jongillham commented on May 26, 2024

You make a good case! I'll leave this issue open.

While you're here: I committed some updates to the package today to improve RunInTransaction and reduce the likelihood of one possible inconsistency edge case occurring. I can't see any practical way to remove the edge case though, and it must also be present in Python's NDB.

There's also one bug I noticed when using Get, but not GetMulti, in RunInTransaction, so I encourage you to update to the latest commit of this package.

jongillham commented on May 26, 2024

@derekperkins are the entities that you want to use with CachePutMulti & CacheGetMulti immutable? If so then there is a much simpler solution (which I also require in my own code). We wouldn't need any of the locking mechanics at all and could use the CloudFlare LRU cache you mentioned to keep some of them in memory.

derekperkins commented on May 26, 2024

Some of them are immutable and some of them aren't. I would love to have the LRU cache in local instance memory as an option though.

jongillham commented on May 26, 2024

I still can't figure out how to do this safely in transactions.

What about one of these two options?

  1. Expose methods:

         MemcacheKey(key *datastore.Key) string
         MemcacheMarshal(val interface{}) ([]byte, error)
         MemcacheUnmarshal(data []byte, val interface{}) error

     to allow you to add/get/set/delete nds memcache entities independently of the nds package, while still allowing nds to access the cached entities because they will be marshalled in a format that nds recognises.

  2. Add methods that only put things in memcache:

         PutCacheMulti(c appengine.Context, keys []*datastore.Key, vals interface{}) ([]*memcache.Item, error)
         PutCache(c appengine.Context, key *datastore.Key, val interface{}) (*memcache.Item, error)

In your case PutCacheMulti could be called after PutMulti to prime the cache:

    PutMulti(c, keys, vals)
    PutCacheMulti(c, keys, vals)

I am still a bit concerned as both methods enable the user to completely destroy the strong cache consistency guarantees - which is the whole point of this package.

derekperkins commented on May 26, 2024

I would prefer option 2, since I don't need much insight into the innards of Memcache.

I think some strong documentation warnings would work fine. If users want to hang themselves, there's no way for you to completely stop them.

An alternative would be to use the ndsutil package for advanced things, starting with this and hopefully eventually including the LRUCache. That way the nds package maintains its strong consistency guarantee, and you aren't going to "accidentally" include ndsutil if you don't know what you're doing.

derekperkins commented on May 26, 2024

@jongillham I started investigating Redis and some other stores when it hit me what exactly I'm looking for from nds, and I think it can best be provided here. Currently, the paradigm of the package is that the datastore is the center of the universe and memcache is simply there to speed it up.

What I'm really looking for is a solution that has memcache at the center of my universe, with the datastore used merely as a persistence layer in case my data is evicted. Normally when using nds this way, I'm inside some data processing and expect to use the data immediately, hence wanting to put it directly into memcache rather than waiting on the first datastore.Get, and then deleting the data once I've processed it.

I think the package is already 80-90% there, and in my use cases I'm not doing transactional work. It's just a simple in-memory cache with a persistence guarantee. Does that make sense? Is this CacheMulti attempt even the right way to go about this? Maybe it is; I'm just reopening the discussion.

jongillham commented on May 26, 2024

@derekperkins you are right that with nds the datastore is the center of the universe. It is really difficult, at least for me, to maintain cache consistency with functions like CacheMulti. On the surface it seems straightforward to just add CacheMulti, but as you dig deeper all sorts of edge case consistency issues arise. The caching strategy nds is based on had to be invented by Guido van Rossum, the creator of Python. I tried a few times with your suggestion but ran into hurdles. I'm certainly open to someone making it work though.

However, for your use case it should be really easy to create a wrapper around datastore.Get/Put/Delete that uses the cache. You could even do some interesting things if you are really concerned with latency at the expense of cost, such as firing off datastore.Get and memcache.Get requests at the same time and using whichever one returns first. The only 'gotcha' I would envisage is the one your colleague found, whereby memcache keys need to be SHA-1 hashed if they are too long.
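
A minimal sketch of that key-length gotcha, assuming the google.golang.org/appengine packages (memcacheKey is a made-up helper name; memcache keys are limited to 250 bytes):

    import (
        "crypto/sha1"
        "encoding/hex"

        "google.golang.org/appengine/datastore"
    )

    // memcacheKey returns a memcache-safe identifier for a datastore key,
    // falling back to a SHA-1 hash when the encoded key is too long.
    func memcacheKey(k *datastore.Key) string {
        const maxLen = 250 // memcache's documented key-length limit
        enc := k.Encode()
        if len(enc) <= maxLen {
            return enc
        }
        sum := sha1.Sum([]byte(enc))
        return hex.EncodeToString(sum[:]) // always 40 hex characters
    }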

derekperkins commented on May 26, 2024

The only thing I am looking to do is have the option to put data into the cache on nds.Put instead of having to wait until nds.Get. I don't understand how prefilling the cache in exactly the same way, just with different timing, could affect consistency. Is it only edge cases with regard to transactions?

jongillham commented on May 26, 2024

I've thought about this some more and might have a really simple solution. I'll implement it after I have stabilised the context branch, as it will be easier to do after the refactoring.

So I don't forget:
In a transaction, nds.Put must work exactly as it does now.
Outside a transaction, nds.PutCacheMulti should (see the sketch after this list):

  1. Lock memcache keys with set method.
  2. Get memcache locks with get method.
  3. Put keys and values with datastore put.
  4. Remove memcache locks with CAS, replacing them with values.
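
A rough sketch of those four steps against the google.golang.org/appengine APIs (lockItemValue is a hypothetical stand-in for nds's internal lock token, memcacheKey is the hashing helper sketched earlier, and encoded[i] is assumed to be entity i already marshalled in the cache format nds recognises):

    import (
        "context"

        "google.golang.org/appengine/datastore"
        "google.golang.org/appengine/memcache"
    )

    // lockItemValue stands in for nds's internal lock token.
    func lockItemValue() []byte { return []byte("nds-lock") }

    func putCacheMulti(c context.Context, keys []*datastore.Key, vals interface{}, encoded [][]byte) error {
        // 1. Lock the memcache keys with set.
        ids := make([]string, len(keys))
        locks := make([]*memcache.Item, len(keys))
        for i, k := range keys {
            ids[i] = memcacheKey(k)
            locks[i] = &memcache.Item{Key: ids[i], Value: lockItemValue()}
        }
        if err := memcache.SetMulti(c, locks); err != nil {
            return err
        }

        // 2. Get the locks back with get so each item carries a CAS ID.
        items, err := memcache.GetMulti(c, ids)
        if err != nil {
            return err
        }

        // 3. Put the keys and values with the datastore put.
        if _, err := datastore.PutMulti(c, keys, vals); err != nil {
            return err
        }

        // 4. Swap each lock for the real value with CAS. A CAS conflict means
        // another request touched the key, so leave it for nds.Get to repopulate.
        for i, id := range ids {
            item, ok := items[id]
            if !ok {
                continue // lock already evicted
            }
            item.Value = encoded[i]
            if err := memcache.CompareAndSwap(c, item); err != nil && err != memcache.ErrCASConflict {
                return err
            }
        }
        return nil
    }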

derekperkins commented on May 26, 2024

That sounds great.

jongillham commented on May 26, 2024

Instead of creating new methods it might be cleaner to pass a new context with different caching policies:

    type CachePolicy int

    const (
        NoCache CachePolicy = iota
        CacheOnGet
        CacheOnPut
    )

    type policyKey struct{}

    func NewContext(ctx context.Context, policy CachePolicy) context.Context {
        return context.WithValue(ctx, policyKey{}, policy)
    }
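
The read side would then be something like this (a sketch; assuming CacheOnGet stays the default so existing callers see no change):

    // cachePolicyFromContext is a hypothetical helper nds.Get/Put could consult.
    func cachePolicyFromContext(ctx context.Context) CachePolicy {
        if p, ok := ctx.Value(policyKey{}).(CachePolicy); ok {
            return p
        }
        return CacheOnGet // default: today's cache-on-read behaviour
    }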

derekperkins commented on May 26, 2024

I think that using Context that way is perfect. This is a much better solution than different methods, and I'm 100% behind implementing it. It's also great because it's a completely non-breaking change - in fact, it's completely transparent to anyone else using the package.

derekperkins commented on May 26, 2024

Any thoughts on putting this into production?

jongillham commented on May 26, 2024

@derekperkins sorry, I know you have been asking for this for a long time. I'll get it done in the next three weeks, but will document a proviso that using the feature weakens the cache consistency guarantee. One day, when I figure out how to do it properly, I can remove the proviso.

derekperkins commented on May 26, 2024

@jongillham - Thanks, I'm not trying to put undue pressure on you. We're scaling up now and starting to see the impact of extra reads that I know we don't need. I really appreciate your help on this.

We're not using transactions anywhere, so that's not an issue for our use case. I imagine that anyone willing to change the context from the default should also be wearing their big boy/girl pants and paying attention to the proviso. :)

derekperkins commented on May 26, 2024

@jongillham: Did you have a specific implementation in mind? I could also take a stab at it if you're swamped.

jongillham commented on May 26, 2024

@derekperkins I was thinking about this earlier, and maybe the whole context approach would over-engineer the package. For example, I now need to cache my datastore calls in memory, which means I would need to add a memory cache context to the package and implement a specific cache strategy.

Instead, why don't I just expose createMemcacheKey and marshalPropertyList as public functions so that users can do what they like with memcache? For your case, you could just manually save your key and entity into memcache after your nds.Put so that nds.Get will be able to read it next time (see the sketch below).

That would be the easiest change and would take two seconds. What do you think? Should the names of these functions change?
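
A rough sketch of that manual approach, assuming the two helpers were exported under the hypothetical names CreateMemcacheKey and MarshalPropertyList:

    import (
        "context"

        "github.com/qedus/nds"
        "google.golang.org/appengine/datastore"
        "google.golang.org/appengine/memcache"
    )

    func putAndCache(c context.Context, key *datastore.Key, entity interface{}) error {
        key, err := nds.Put(c, key, entity)
        if err != nil {
            return err
        }
        props, err := datastore.SaveStruct(entity) // struct -> property list
        if err != nil {
            return err
        }
        value, err := nds.MarshalPropertyList(datastore.PropertyList(props)) // hypothetical export
        if err != nil {
            return err
        }
        // Store it exactly where nds.Get will look for it next time.
        return memcache.Set(c, &memcache.Item{
            Key:   nds.CreateMemcacheKey(key), // hypothetical export
            Value: value,
        })
    }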

derekperkins commented on May 26, 2024

I don't really see any reason why I would need either of those functions separately from wanting the Put to also cache the record. Is your hesitation about adding a PutAndCache-type function solely based on the edge case with transactions where the cache might lose consistency?

If you don't want to pollute the main package with an extra two functions, what about putting them into an nds/ndsutil package? Then they could live there until the consistency issue is taken care of, at which point you could promote them to the main package.

jongillham commented on May 26, 2024

I created this ndsutil package as an experiment with CreateMemcacheItem but it still doesn't feel right.

Maybe you should be wrapping your calls to the datastore in your own cache:

func PutEntity(c context.Context, key *datastore.Key, entity *Entity) error {
    if _, err := nds.Put(c, key, entity); err != nil { // put in datastore
        return err
    }
    cache.Set(key.Encode(), entity) // put in your cache (any LRU will do)
    return nil
}

func GetEntity(c context.Context, key *datastore.Key) (*Entity, error) {
    if v, ok := cache.Get(key.Encode()); ok { // get from your cache
        return v.(*Entity), nil
    }
    entity := new(Entity) // if not in cache, get from nds.Get
    err := nds.Get(c, key, entity)
    return entity, err
}

derekperkins commented on May 26, 2024

Thanks for giving that a shot. I think we're going to have to migrate away from the datastore. We're doing lots of small reads/writes, which is killing us from a cost perspective. I think we've spent over $500 in the last 10 days just on datastore reads/writes, and that's with no indexes on any of our structs. Even if we got this to work, which should cut our reads in half, it still isn't a sustainable solution.

I'm currently looking at deploying Couchbase on Container Engine as a replacement. It's been great working with you and I think nds is a fantastic package!

jongillham commented on May 26, 2024

Ouch! It sucks that you have to add complexity. Does your data need to be durable? Could you use a large dedicated memcache and refetch the odd piece of data from its source?

derekperkins commented on May 26, 2024

I have a few different use cases, some of which I investigated using dedicated memcache for. I have one high-throughput case that might work, but another requires a few hundred GB of storage that needs to be semi-durable. I have that data sitting in BigQuery, which I then pull out, transform, and make available as JSON for client consumption. BQ isn't fast enough, nor does it allow enough concurrent connections, to be safely used for the frontend connections. I also want to add syncing for offline access by mobile clients, which is what led me to Couchbase.

I figure that if I'm going to have to add complexity for one of those scenarios, I might as well use it for all of them. I also looked at using Cloud Bigtable, but that has a $1500/mo minimum, and it doesn't support a few of the aggregations that I anticipate running.

It's been a hellish weekend, leaving code for server admin tasks, but I can't think of any better solutions. Datastore is great for the right purposes, but the pricing is all wrong for our use case.

BTW, have you tried Managed VMs yet? They take significantly longer to deploy for me, but they give you a little more freedom both in code choices and in machine types. I'm just getting started, but I think I'm going to migrate our entire application.

jongillham commented on May 26, 2024

What you are building sounds pretty powerful!

I've played around with Managed VMs when they were in trusted tester status ages ago and they were great. However, I have a small team and always try to stay as far away from any form of sysadmin creep as possible. Having said that, it seems feasible to set up Managed VMs so you can just forget them once they are deployed.

jongillham commented on May 26, 2024

Closing this because the OP doesn't require it any more and it is probably cleaner to wrap the nds package in a custom cache if this feature is truly needed.
