Comments (26)
Implementing a `CachePutMulti` function would likely only save two memcache calls and one `datastore.GetMulti` call in the best case scenario, compared to using a `PutMulti` immediately followed by a `GetMulti` to prime the cache.
Interesting that I recently got a request to not have `PutMulti` insert items into memcache, until the requester realised that it doesn't do that in the first place.
I am not sure if it is worth saving the cost of one datastore call on the occasions a primed cache is needed, versus keeping the API completely in line with `appengine/datastore`. I assume latency is not the most important factor in your data processing pipeline, but you want processed data to be quickly accessible via the cache from the frontend?
from nds.
In my pipeline, I'm processing millions of records a day. As each of those records comes in, I'm sending it to an external task for additional processing, with a callback URL for when it completes. That task will take anywhere from seconds to minutes, but I know that I'm immediately going to need that data again. I can't reasonably use the memcache API myself because in the event that the data has been evicted, it will need to fall back to the datastore. I'm mostly looking at the unnecessary cost of:
- millions of daily reads
- extra instance hours (if I save 20ms per call, 1M calls = ~5.5 instance hours / # of concurrent requests)
Adding the functionality should be fairly simple, won't change how it works for any existing users and it makes the package more useful. I don't think it confuses the package for new users either. It's still a drop-in replacement, and I think having some memcache specific calls is expected. I envision simply calling PutMulti like normal, then taking the same data and immediately caching it.
```go
// data processing
nds.PutMulti()
nds.CachePutMulti() // or nds.PrimeCache()
```
I'd also be fine if CachePutMulti did the actual datastore.PutMulti and then inserted it into memcache.
You make a good case! I'll leave this issue open.
While you're here: I committed some updates to the package today to improve `RunInTransaction` and reduce the likelihood of one possible inconsistency edge case occurring. I can't see any practical way to remove the edge case though, and it must also be present in Python's NDB.
There's also one bug I noticed when using `Get`, but not `GetMulti`, in `RunInTransaction`, so I encourage you to update to the latest commit of this package.
@derekperkins are the entities that you want to use with `CachePutMulti` & `CacheGetMulti` immutable? If so then there is a much simpler solution (which I also require in my code). We wouldn't need any of the locking mechanics at all and could use the CloudFlare LRU cache you mentioned to save some of them in memory.
Some of them are immutable and some of them aren't. I would love to have the LRU cache in local instance memory as an option though.
I still can't figure out how to do this safely in transactions.
What about one of these two options?
- Expose methods:

  ```go
  MemcacheKey(key *datastore.Key) string
  MemcacheMarshal(val interface{}) ([]byte, error)
  MemcacheUnmarshal(data []byte, val interface{}) error
  ```

  This would allow you to add/get/set/delete nds memcache entities independently of the nds package, but still allow nds to access the cached entities because they would be marshalled in a format that nds recognises.
- Add methods that only put things into memcache:

  ```go
  PutCacheMulti(c appengine.Context, keys []*datastore.Key, vals interface{}) ([]*memcache.Item, error)
  PutCache(c appengine.Context, key *datastore.Key, val interface{}) (*memcache.Item, error)
  ```

  In your case `PutCacheMulti` could be called after `PutMulti` to prime the cache:

  ```go
  PutMulti(c, keys, vals)
  PutCacheMulti(c, keys, vals)
  ```
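As a rough illustration of the option-1 marshal/unmarshal pair: nds actually stores a `datastore.PropertyList`, but gob-encoding a plain struct stands in for the real wire format here. The `Entity` type and the use of gob are assumptions for the sketch, not the package's actual behaviour.

```go
package main

import (
	"bytes"
	"encoding/gob"
)

// Entity is a placeholder for whatever nds would actually marshal
// (a datastore.PropertyList in the real package).
type Entity struct {
	Name  string
	Count int
}

// MemcacheMarshal stands in for the proposed exported function;
// gob is an assumed encoding, not nds's real format.
func MemcacheMarshal(val interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(val); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// MemcacheUnmarshal is the inverse: it decodes bytes produced by
// MemcacheMarshal back into the caller's value.
func MemcacheUnmarshal(data []byte, val interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(val)
}
```

The point of exposing such a pair is that anything a user writes to memcache themselves stays readable by nds, because both sides agree on the encoding.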
I am still a bit concerned as both methods enable the user to completely destroy the strong cache consistency guarantees - which is the whole point of this package.
I would prefer option 2, since I don't need much insight into the innards of Memcache.
I think some strong documentation warnings would work fine. If users want to hang themselves, there's no way for you to completely stop them.
An alternative would be to use the ndsutil package for advanced things, starting with this and hopefully eventually including the LRUCache. That way the nds package maintains its strong consistency guarantee, and you aren't going to "accidentally" include ndsutil if you don't know what you're doing.
@jongillham I started investigating using Redis or some other store when it hit me what exactly I'm looking for in nds, and I think it can be best provided here. Currently, the paradigm of the package is that the datastore is the center of the universe and memcache is simply there to speed it up.
What I'm really looking for is a solution that has memcache as the center of my universe, with the datastore used merely as a persistence layer in the event that my data is evicted. Normally when using nds this way, I'm inside some data processing and expect to use the data immediately, hence wanting to put it directly into memcache rather than waiting on the first `datastore.Get`, and then delete the data once I've processed it.
I think the package is already 80-90% there, and in my use cases, I'm not doing transactional work. It's just a simple in-memory cache with a persistence guarantee. Does that make sense? Is this `CacheMulti` attempt even the right way to go about this? Maybe it is; I'm just reopening the discussion.
@derekperkins you are right that with nds the datastore is the center of the universe. It is really difficult, at least for me, to maintain cache consistency with functions like `CacheMulti`. On the surface it seems straightforward to just add `CacheMulti`, but as you dig deeper all sorts of edge-case consistency issues arise. The cache strategy nds is based on had to be invented by Guido van Rossum, the creator of Python. I tried a few times with your suggestion but ran into hurdles. I'm certainly open to someone making it work though.
However, with your use case it should be really easy to create a wrapper around `datastore.Get/Put/Delete` to use the cache. You could even do some interesting things if you are really concerned with latency at the expense of cost, such as firing off `datastore.Get` and `memcache.Get` requests at the same time and using whichever one returns first. The only 'gotcha' I would envisage is the one your colleague found, whereby memcache keys need to be SHA-1 hashed if they are too long.
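The parallel-fetch idea can be sketched with two goroutines racing into a buffered channel; stub fetch functions stand in for `memcache.Get` and `datastore.Get` here, and `cacheKey` mirrors the long-key gotcha (memcache keys are capped at 250 bytes, so overlong ones get replaced by a SHA-1 hash). This is a sketch under those assumptions, not the package's actual code.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
)

// cacheKey replaces keys longer than memcache's 250-byte limit with
// their SHA-1 hash, rendered as 40 hex characters.
func cacheKey(key string) string {
	if len(key) <= 250 {
		return key
	}
	sum := sha1.Sum([]byte(key))
	return hex.EncodeToString(sum[:])
}

// raceFetch fires both fetchers concurrently and returns whichever
// result arrives first; if the winner errored, it waits for the other.
// The channel is buffered so the losing goroutine's send never blocks.
func raceFetch(fromCache, fromStore func() (string, error)) (string, error) {
	type result struct {
		val string
		err error
	}
	ch := make(chan result, 2)
	go func() { v, err := fromCache(); ch <- result{v, err} }()
	go func() { v, err := fromStore(); ch <- result{v, err} }()
	r := <-ch
	if r.err != nil {
		r = <-ch
	}
	return r.val, r.err
}
```

Note this pattern trades cost for latency: every read issues both RPCs even when the cache would have answered.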
The only thing that I am looking to do is to have the option of putting data into the cache on `nds.Put` instead of having to wait until `nds.Get`. I don't understand how prefilling the cache in exactly the same way, just with different timing, could affect consistency. Is it only edge cases with regard to transactions?
I've thought about this some more and might have a really simple solution. I'll implement after I have stabilised the context branch as it will be easier to do after the refactoring.
So I don't forget:
In a transaction `nds.Put` must work exactly as it does now.
Outside a transaction `nds.PutCacheMulti` should:
- Lock memcache keys with set method.
- Get memcache locks with get method.
- Put keys and values with datastore put.
- Remove memcache locks with CAS, replacing them with values.
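The four steps above can be sketched against a toy in-memory stand-in for memcache. The lock value and compare-and-swap semantics are simplified here; the real implementation would use `memcache.Item` and `memcache.CompareAndSwap`, and the datastore put is stubbed with a map.

```go
package main

import (
	"errors"
	"sync"
)

// fakeCache is a toy in-memory stand-in for memcache, with just enough
// behaviour to demonstrate the set → get → put → CAS sequence.
type fakeCache struct {
	mu    sync.Mutex
	items map[string]string
	casID map[string]int // bumped on every write, like a CAS handle
}

func newFakeCache() *fakeCache {
	return &fakeCache{items: map[string]string{}, casID: map[string]int{}}
}

func (c *fakeCache) set(key, val string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.items[key] = val
	c.casID[key]++
}

func (c *fakeCache) get(key string) (string, int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.items[key]
	return v, c.casID[key], ok
}

// cas writes only if the item is unchanged since the matching get.
func (c *fakeCache) cas(key, val string, id int) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.casID[key] != id {
		return errors.New("cas conflict: item changed since get")
	}
	c.items[key] = val
	c.casID[key]++
	return nil
}

const lockValue = "__nds_lock__"

// putCache follows the four listed steps: lock the memcache key, re-read
// it to obtain a CAS handle, write to the (stubbed) datastore, then swap
// the lock for the real value only if nothing touched it in between.
func putCache(c *fakeCache, store map[string]string, key, val string) error {
	c.set(key, lockValue)      // 1. lock memcache key with set
	_, id, _ := c.get(key)     // 2. get memcache lock (and CAS handle)
	store[key] = val           // 3. datastore put
	return c.cas(key, val, id) // 4. replace lock with value via CAS
}
```

A concurrent `nds.Get` that lands between steps 1 and 4 would see the lock value and fall through to the datastore, which is what preserves consistency.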
That sounds great.
Instead of creating new methods it might be cleaner to pass a new context with different caching policies:

```go
type CachePolicy int

const (
	NoCache CachePolicy = iota
	CacheOnGet
	CacheOnPut
)

func NewContext(ctx context.Context, policy CachePolicy) context.Context {
	return context.WithValue(ctx, cachePolicyKey, policy)
}
```
I think that using `Context` that way is perfect. This is a much better solution than different methods. I'm 100% behind implementing this. It's also great because it's a completely non-breaking change, in fact completely transparent to anyone else using the package.
Any thoughts on putting this into production?
@derekperkins sorry I know you have been asking for this for a long time. I'll get it done in the next three weeks but will document a proviso that using the feature will decrease the strength of cache consistency. One day when I figure out how to do it properly I can remove the proviso.
@jongillham - Thanks, I'm not trying to put undue pressure on you. We're scaling up now and starting to see the impact of extra reads that I know we don't need. I really appreciate your help on this.
We're not using transactions anywhere, so that's not an issue for our use case. I imagine that anyone willing to change the context from the default should also be wearing their big boy/girl pants and pay attention to the proviso. :)
@jongillham: Did you have a specific implementation in mind? I could also take a stab at it if you're swamped.
@derekperkins I was thinking about this earlier, and maybe the whole context approach will over-engineer the package. For example, I now need to cache my datastore calls in memory, which means I would need to add a memory-cache context to the package and implement a specific cache strategy.
Instead, why don't I just expose `createMemcacheKey` and `marshalPropertyList` as public functions so that users can do what they like with memcache? For your case, you could just manually save your key and entity into memcache after your `nds.Put` so that `nds.Get` will be able to read it next time.
That would be the easiest change and would take two seconds. What do you think? Do you think the names of these functions should change?
I don't really see any reason why I would need either of those functions separately from wanting the `Put` to also cache the record. Is your hesitation about adding a `PutAndCache`-type function solely based on the edge case with transactions where the cache might lose consistency?
If you don't want to pollute the main package with an extra two functions, what about putting them into an `nds/ndsutil` package? Then they could live there until the consistency issue is taken care of, at which point you could promote them to the main package.
I created this `ndsutil` package as an experiment with `CreateMemcacheItem`, but it still doesn't feel right.
Maybe you should be wrapping your calls to the datastore in your own cache:

```go
func PutEntity(key *datastore.Key, entity Entity) error {
	// Put in datastore.
	// Put in your cache.
}

func GetEntity(key *datastore.Key) (Entity, error) {
	// Get from your cache.
	// If not in cache, get from nds.Get.
}
```
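That wrapper suggestion, fleshed out as a runnable sketch: maps stand in for both the cache and the datastore (which would really be `nds.Put`/`nds.Get`), and the `Entity` and `wrapper` names are illustrative only.

```go
package main

import "errors"

// Entity is a placeholder type for whatever you store.
type Entity struct{ Value string }

type wrapper struct {
	cache map[string]Entity // your cache (memcache, an LRU, ...)
	store map[string]Entity // stands in for nds.Put / nds.Get
}

// PutEntity writes through: datastore first, then cache, so a cache
// read never sees data the datastore write might still reject.
func (w *wrapper) PutEntity(key string, e Entity) error {
	w.store[key] = e // datastore put (stubbed)
	w.cache[key] = e // prime the cache immediately
	return nil
}

// GetEntity checks the cache first, falling back to the datastore and
// re-priming the cache on a miss.
func (w *wrapper) GetEntity(key string) (Entity, error) {
	if e, ok := w.cache[key]; ok {
		return e, nil
	}
	e, ok := w.store[key]
	if !ok {
		return Entity{}, errors.New("no such entity")
	}
	w.cache[key] = e
	return e, nil
}
```

This keeps nds's own consistency guarantees untouched while giving the caller exactly the put-then-cache behaviour discussed above.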
Thanks for giving that a shot. I think we're going to have to migrate away from datastore. We're doing lots of small reads / writes which is killing us from a cost perspective. I think we've spent over $500 in the last 10 days just on datastore reads/writes, and that's with no indexes on any of our structs. Even if we got this to work, which should drop our reads in half, it still isn't a sustainable solution.
I'm currently looking at deploying Couchbase on Container Engine as a replacement. It's been great working with you and I think nds is a fantastic package!
Ouch! It sucks you have to add complexity. Does your data need to be durable? Could you use a large dedicated memcache and refetch the odd piece of data from its source?
I have a few different use cases, some of which I investigated using dedicated memcache for. I have one high-throughput case that might work, but another will require a few hundred GB of storage that needs to be semi-durable. I have that data sitting in BigQuery, which I then pull out, transform, and make available as JSON for client consumption. BQ isn't fast enough, nor does it allow enough concurrent connections, to be safely used for the front-end connections. I also want to add syncing for offline access by mobile clients, which is what led me to Couchbase.
I figure that if I'm going to have to add complexity for one of those scenarios, I might as well use it for all of them. I also looked at using Cloud Bigtable, but that's a $1500 / mo minimum, and it doesn't support a few of the aggregations that I anticipate running.
It's been a hellish weekend, leaving code for server admin tasks, but I can't think of any better solutions. Datastore is great for the right purposes, but the pricing is all wrong for our use case.
BTW, have you tried Managed VMs yet? They take significantly longer to deploy for me, but they give you a little more freedom both in code choices and in machine types. I'm just getting started, but I think I'm going to migrate our entire application.
What you are building sounds pretty powerful!
I've played around with Managed VMs when they were in trusted-tester status ages ago and they were great. However, I have a small team and always try to stay as far away from any form of sysadmin creep as possible. Having said that, it seems feasible to set up Managed VMs so you can just forget them once they are deployed.
Closing this because the OP doesn't require it any more, and it is probably cleaner to wrap the nds package in a custom cache if this feature is truly needed.