
DataLoader

DataLoader is a generic utility to be used as part of your application's data fetching layer. It provides a simplified and consistent API over various remote data sources, such as databases or web services, via batching and caching.


A port of the "Loader" API originally developed by @schrockn at Facebook in 2010 as a simplifying force to coalesce the sundry key-value store back-end APIs which existed at the time. At Facebook, "Loader" became one of the implementation details of the "Ent" framework, a privacy-aware data entity loading and caching layer within web server product code. This ultimately became the underpinning for Facebook's GraphQL server implementation and type definitions.

DataLoader is a simplified version of this original idea implemented in JavaScript for Node.js services. DataLoader is often used when implementing a graphql-js service, though it is also broadly useful in other situations.

This mechanism of batching and caching data requests is certainly not unique to Node.js or JavaScript; it is also the primary motivation for Haxl, Facebook's data loading library for Haskell. More about how Haxl works can be read in this blog post.

DataLoader is provided so that it may be useful not just to build GraphQL services for Node.js but also as a publicly available reference implementation of this concept in the hopes that it can be ported to other languages. If you port DataLoader to another language, please open an issue to include a link from this repository.

Getting Started

First, install DataLoader using npm.

npm install --save dataloader

To get started, create a DataLoader. Each DataLoader instance represents a unique cache. Typically instances are created per request when used within a web-server like express if different users can see different things.

Note: DataLoader assumes a JavaScript environment with global ES6 Promise and Map classes, available in all supported versions of Node.js.

Batching

Batching is not an advanced feature, it's DataLoader's primary feature. Create loaders by providing a batch loading function.

const DataLoader = require('dataloader');

const userLoader = new DataLoader(keys => myBatchGetUsers(keys));

A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values or Error instances.

Then load individual values from the loader. DataLoader will coalesce all individual loads which occur within a single frame of execution (a single tick of the event loop) and then call your batch function with all requested keys.

const user = await userLoader.load(1);
const invitedBy = await userLoader.load(user.invitedByID);
console.log(`User 1 was invited by ${invitedBy}`);

// Elsewhere in your application
const user = await userLoader.load(2);
const lastInvited = await userLoader.load(user.lastInvitedID);
console.log(`User 2 last invited ${lastInvited}`);

A naive application may have issued four round-trips to a backend for the required information, but with DataLoader this application will make at most two.

DataLoader allows you to decouple unrelated parts of your application without sacrificing the performance of batch data-loading. While the loader presents an API that loads individual values, all concurrent requests will be coalesced and presented to your batch loading function. This allows your application to safely distribute data fetching requirements throughout your application and maintain minimal outgoing data requests.

Batch Function

A batch loading function accepts an Array of keys, and returns a Promise which resolves to an Array of values or Error instances. The loader itself is provided as the this context.

async function batchFunction(keys) {
  const results = await db.fetchAllKeys(keys);
  return keys.map(key => results[key] || new Error(`No result for ${key}`));
}

const loader = new DataLoader(batchFunction);

There are a few constraints this function must uphold:

  • The Array of values must be the same length as the Array of keys.
  • Each index in the Array of values must correspond to the same index in the Array of keys.

For example, if your batch function was provided the Array of keys: [ 2, 9, 6, 1 ], and loading from a back-end service returned the values:

{ id: 9, name: 'Chicago' }
{ id: 1, name: 'New York' }
{ id: 2, name: 'San Francisco' }

Our back-end service returned results in a different order than we requested, likely because it was more efficient for it to do so. Also, it omitted a result for key 6, which we can interpret as no value existing for that key.

To uphold the constraints of the batch function, it must return an Array of values the same length as the Array of keys, and re-order them to ensure each index aligns with the original keys [ 2, 9, 6, 1 ]:

[
  { id: 2, name: 'San Francisco' },
  { id: 9, name: 'Chicago' },
  null, // or perhaps `new Error()`
  { id: 1, name: 'New York' },
];
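One way to satisfy both constraints is a small helper that indexes the back-end results by key and then maps the original keys over that index. A sketch, assuming each result carries its key in an id field:

```javascript
// Re-order back-end results to align index-for-index with the requested keys.
// Missing keys become Error instances so every index is accounted for.
function alignResults(keys, results, keyField = 'id') {
  const byKey = new Map(results.map(result => [result[keyField], result]));
  return keys.map(key => byKey.get(key) || new Error(`No result for ${key}`));
}

const aligned = alignResults(
  [2, 9, 6, 1],
  [
    { id: 9, name: 'Chicago' },
    { id: 1, name: 'New York' },
    { id: 2, name: 'San Francisco' },
  ],
);
// aligned is now ordered: San Francisco, Chicago, Error, New York
```

A batch function can return alignResults(keys, rows) directly, since the helper already substitutes an Error for any key the back-end omitted.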

Batch Scheduling

By default DataLoader will coalesce all individual loads which occur within a single frame of execution before calling your batch function with all requested keys. This ensures no additional latency while capturing many related requests into a single batch. In fact, this is the same behavior used in Facebook's original PHP implementation in 2010. See enqueuePostPromiseJob in the source code for more details about how this works.

However sometimes this behavior is not desirable or optimal. Perhaps you expect requests to be spread out over a few subsequent ticks because of an existing use of setTimeout, or you just want manual control over dispatching regardless of the run loop. DataLoader allows providing a custom batch scheduler to provide these or any other behaviors.

A custom scheduler is provided as batchScheduleFn in options. It must be a function which is passed a callback and is expected to call that callback in the immediate future to execute the batch request.

As an example, here is a batch scheduler which collects all requests over a 100ms window of time (and as a consequence, adds 100ms of latency):

const myLoader = new DataLoader(myBatchFn, {
  batchScheduleFn: callback => setTimeout(callback, 100),
});

As another example, here is a manually dispatched batch scheduler:

function createScheduler() {
  let callbacks = [];
  return {
    schedule(callback) {
      callbacks.push(callback);
    },
    dispatch() {
      callbacks.forEach(callback => callback());
      callbacks = [];
    },
  };
}

const { schedule, dispatch } = createScheduler();
const myLoader = new DataLoader(myBatchFn, { batchScheduleFn: schedule });

myLoader.load(1);
myLoader.load(2);
dispatch();

Caching

DataLoader provides a memoization cache for all loads which occur in a single request to your application. After .load() is called once with a given key, the resulting value is cached to eliminate redundant loads.

Caching Per-Request

DataLoader caching does not replace Redis, Memcache, or any other shared application-level cache. DataLoader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your Application. To do this, it maintains a simple in-memory memoization cache (more accurately: .load() is a memoized function).

Avoid letting multiple requests from different users share the same DataLoader instance, which could result in cached data incorrectly appearing in each request. Typically, DataLoader instances are created when a Request begins, and are not used once the Request ends.

For example, when using with express:

function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
  };
}

const app = express();

app.get('/', function (req, res) {
  const authToken = authenticateUser(req);
  const loaders = createLoaders(authToken);
  res.send(renderPage(req, loaders));
});

app.listen();

Caching and Batching

Subsequent calls to .load() with the same key will result in that key not appearing in the keys provided to your batch function. However, the resulting Promise will still wait on the current batch to complete. This way both cached and uncached requests will resolve at the same time, allowing DataLoader optimizations for subsequent dependent loads.

In the example below, User 1 happens to be cached. However, because User 1 and 2 are loaded in the same tick, they will resolve at the same time. This means both user.bestFriendID loads will also happen in the same tick which results in two total requests (the same as if User 1 had not been cached).

userLoader.prime(1, { bestFriendID: 3 });

async function getBestFriend(userID) {
  const user = await userLoader.load(userID);
  return await userLoader.load(user.bestFriendID);
}

// In one part of your application
getBestFriend(1);

// Elsewhere
getBestFriend(2);

Without this optimization, if the cached User 1 resolved immediately, this could result in three total requests since each user.bestFriendID load would happen at different times.

Clearing Cache

In certain uncommon cases, clearing the request cache may be necessary.

The most common case is a mutation or update within the same request, after which a cached value could be out of date and future loads should not use any possibly stale cached value.

Here's a simple example using SQL UPDATE to illustrate.

// Request begins...
const userLoader = new DataLoader(...)

// And a value happens to be loaded (and cached).
const user = await userLoader.load(4)

// A mutation occurs, invalidating what might be in cache.
await sqlRun('UPDATE users SET username="zuck" WHERE id=4')
userLoader.clear(4)

// Later the value is loaded again so the mutated data appears.
const updatedUser = await userLoader.load(4)

// Request completes.

Caching Errors

If a batch load fails (that is, a batch function throws or returns a rejected Promise), then the requested values will not be cached. However if a batch function returns an Error instance for an individual value, that Error will be cached to avoid frequently loading the same Error.

In some circumstances you may wish to clear the cache for these individual Errors:

try {
  const user = await userLoader.load(1)
} catch (error) {
  if (/* determine if the error should not be cached */) {
    userLoader.clear(1)
  }
  throw error
}

Disabling Cache

In certain uncommon cases, a DataLoader which does not cache may be desirable. Calling new DataLoader(myBatchFn, { cache: false }) will ensure that every call to .load() will produce a new Promise, and requested keys will not be saved in memory.

However, when the memoization cache is disabled, your batch function will receive an array of keys which may contain duplicates! Each key appears once per corresponding call to .load(), and your batch loader should provide a value for each instance of the requested key.

For example:

const myLoader = new DataLoader(
  keys => {
    console.log(keys);
    return someBatchLoadFn(keys);
  },
  { cache: false },
);

myLoader.load('A');
myLoader.load('B');
myLoader.load('A');

// > [ 'A', 'B', 'A' ]

More complex cache behavior can be achieved by calling .clear() or .clearAll() rather than disabling the cache completely. For example, this DataLoader will provide unique keys to a batch function due to the memoization cache being enabled, but will immediately clear its cache when the batch function is called so later requests will load new values.

const myLoader = new DataLoader(keys => {
  myLoader.clearAll();
  return someBatchLoadFn(keys);
});

Custom Cache

As mentioned above, DataLoader is intended to be used as a per-request cache. Since requests are short-lived, DataLoader uses an infinitely growing Map as a memoization cache. This should not pose a problem as most requests are short-lived and the entire cache can be discarded after the request completes.

However this memoization caching strategy isn't safe when using a long-lived DataLoader, since it could consume too much memory. If using DataLoader in this way, you can provide a custom Cache instance with whatever behavior you prefer, as long as it follows the same API as Map.

The example below uses an LRU (least recently used) cache, via the lru_map npm package, to hold at most 100 cached values.

import { LRUMap } from 'lru_map';

const myLoader = new DataLoader(someBatchLoadFn, {
  cacheMap: new LRUMap(100),
});

More specifically, any object that implements the get(), set(), delete(), and clear() methods can be provided. This allows for custom Maps which implement various cache algorithms.
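As an illustration of that contract, here is a minimal Map-compatible cache that delegates to a plain Map while counting cache hits. The hit counter is purely illustrative; DataLoader only needs the four methods:

```javascript
// A minimal cacheMap implementation: the four methods DataLoader calls,
// delegating to a plain Map, plus an illustrative hit counter.
class CountingCache {
  constructor() {
    this.map = new Map();
    this.hits = 0;
  }
  get(key) {
    if (this.map.has(key)) this.hits += 1;
    return this.map.get(key);
  }
  set(key, value) {
    this.map.set(key, value);
    return this;
  }
  delete(key) {
    return this.map.delete(key);
  }
  clear() {
    this.map.clear();
  }
}

// Usage sketch (assuming DataLoader and someBatchLoadFn are in scope):
// const loader = new DataLoader(someBatchLoadFn, { cacheMap: new CountingCache() });
```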

API

class DataLoader

DataLoader creates a public API for loading data from a particular data back-end with unique keys such as the id column of a SQL table or document name in a MongoDB database, given a batch loading function.

Each DataLoader instance contains a unique memoized cache. Use caution in long-lived applications or those which serve many users with different access permissions, and consider creating a new instance per web request.

new DataLoader(batchLoadFn [, options])

Create a new DataLoader given a batch loading function and options.

  • batchLoadFn: A function which accepts an Array of keys, and returns a Promise which resolves to an Array of values.

  • options: An optional object of options:

  • batch: Boolean, default true. Set to false to disable batching, invoking batchLoadFn with a single load key. This is equivalent to setting maxBatchSize to 1.
  • maxBatchSize: Number, default Infinity. Limits the number of items that get passed in to the batchLoadFn. May be set to 1 to disable batching.
  • batchScheduleFn: Function, default described in Batch Scheduling. A function to schedule the later execution of a batch. The function is expected to call the provided callback in the immediate future.
  • cache: Boolean, default true. Set to false to disable memoization caching, creating a new Promise and new key in the batchLoadFn for every load of the same key. This is equivalent to setting cacheMap to null.
  • cacheKeyFn: Function, default key => key. Produces a cache key for a given load key. Useful when objects are keys and two structurally-equivalent objects should be considered the same.
  • cacheMap: Object, default new Map(). An instance of Map (or an object with a similar API) to be used as the cache. May be set to null to disable caching.
  • name: String, default null. The name given to this DataLoader instance. Useful for APM tools.

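As an example of cacheKeyFn, a deterministic serialization lets two structurally-equal object keys share one cache entry. A sketch, where sortedJSONKey is a hypothetical helper that assumes flat, JSON-serializable keys:

```javascript
// Produce a deterministic cache key for flat object keys by sorting
// properties before serializing, so { a: 1, b: 2 } and { b: 2, a: 1 }
// map to the same cache entry.
function sortedJSONKey(key) {
  if (key === null || typeof key !== 'object') return String(key);
  const sorted = {};
  for (const prop of Object.keys(key).sort()) {
    sorted[prop] = key[prop];
  }
  return JSON.stringify(sorted);
}

// Usage sketch (assuming DataLoader and batchFn are in scope):
// const loader = new DataLoader(batchFn, { cacheKeyFn: sortedJSONKey });
```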
load(key)

Loads a key, returning a Promise for the value represented by that key.

  • key: A key value to load.

loadMany(keys)

Loads multiple keys, promising an array of values:

const [a, b] = await myLoader.loadMany(['a', 'b']);

This is similar to the more verbose:

const [a, b] = await Promise.all([myLoader.load('a'), myLoader.load('b')]);

However it is different in the case where any load fails. Where Promise.all() would reject, loadMany() always resolves; however, each result is either a value or an Error instance.

const [a, b, c] = await myLoader.loadMany(['a', 'b', 'badkey']);
// c instanceof Error
  • keys: An array of key values to load.
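Since loadMany() mixes values and Error instances in one array, a small helper can partition its result. This is a sketch, not part of the DataLoader API:

```javascript
// Split a loadMany() result into successfully loaded values and per-key errors.
function partitionResults(keys, results) {
  const values = [];
  const errors = [];
  results.forEach((result, i) => {
    if (result instanceof Error) {
      errors.push({ key: keys[i], error: result });
    } else {
      values.push(result);
    }
  });
  return { values, errors };
}

const { values, errors } = partitionResults(
  ['a', 'b', 'badkey'],
  ['valueA', 'valueB', new Error('No value for badkey')],
);
// values holds the two successes; errors[0].key is 'badkey'
```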
clear(key)

Clears the value at key from the cache, if it exists. Returns itself for method chaining.

  • key: A key value to clear.
clearAll()

Clears the entire cache. To be used when some event results in unknown invalidations across this particular DataLoader. Returns itself for method chaining.

prime(key, value)

Primes the cache with the provided key and value. If the key already exists, no change is made. (To forcefully prime the cache, clear the key first with loader.clear(key).prime(key, value).) Returns itself for method chaining.

To prime the cache with an error at a key, provide an Error instance.

Using with GraphQL

DataLoader pairs nicely with GraphQL. GraphQL fields are designed to be stand-alone functions. Without a caching or batching mechanism, it's easy for a naive GraphQL server to issue new database requests each time a field is resolved.

Consider the following GraphQL request:

{
  me {
    name
    bestFriend {
      name
    }
    friends(first: 5) {
      name
      bestFriend {
        name
      }
    }
  }
}

Naively, if me, bestFriend and friends each need to request the backend, there could be at most 13 database requests!

When using DataLoader, we could define the User type using the SQLite example with clearer code and at most 4 database requests, and possibly fewer if there are cache hits.

const UserType = new GraphQLObjectType({
  name: 'User',
  fields: () => ({
    name: { type: GraphQLString },
    bestFriend: {
      type: UserType,
      resolve: user => userLoader.load(user.bestFriendID),
    },
    friends: {
      args: {
        first: { type: GraphQLInt },
      },
      type: new GraphQLList(UserType),
      resolve: async (user, { first }) => {
        const rows = await queryLoader.load([
          'SELECT toID FROM friends WHERE fromID=? LIMIT ?',
          user.id,
          first,
        ]);
        return rows.map(row => userLoader.load(row.toID));
      },
    },
  }),
});

Common Patterns

Creating a new DataLoader per request.

In many applications, a web server using DataLoader serves requests to many different users with different access permissions. It may be dangerous to use one cache across many users, so it is encouraged to create a new DataLoader per request:

function createLoaders(authToken) {
  return {
    users: new DataLoader(ids => genUsers(authToken, ids)),
    cdnUrls: new DataLoader(rawUrls => genCdnUrls(authToken, rawUrls)),
    stories: new DataLoader(keys => genStories(authToken, keys)),
  };
}

// When handling an incoming web request:
const loaders = createLoaders(request.query.authToken);

// Then, within application logic:
const user = await loaders.users.load(4);
const pic = await loaders.cdnUrls.load(user.rawPicUrl);

Creating an object where each key is a DataLoader is one common pattern which provides a single value to pass around to code which needs to perform data loading, such as part of the rootValue in a graphql-js request.

Loading by alternative keys.

Occasionally, some kind of value can be accessed in multiple ways. For example, perhaps a "User" type can be loaded not only by an "id" but also by a "username" value. If the same user is loaded by both keys, then it may be useful to fill both caches when a user is loaded from either source:

const userByIDLoader = new DataLoader(async ids => {
  const users = await genUsersByID(ids);
  for (let user of users) {
    usernameLoader.prime(user.username, user);
  }
  return users;
});

const usernameLoader = new DataLoader(async names => {
  const users = await genUsernames(names);
  for (let user of users) {
    userByIDLoader.prime(user.id, user);
  }
  return users;
});

Freezing results to enforce immutability

Since DataLoader caches values, it's typically assumed these values will be treated as if they were immutable. While DataLoader itself doesn't enforce this, you can create a higher-order function to enforce immutability with Object.freeze():

function freezeResults(batchLoader) {
  return keys => batchLoader(keys).then(values => values.map(Object.freeze));
}

const myLoader = new DataLoader(freezeResults(myBatchLoader));

Batch functions which return Objects instead of Arrays

DataLoader expects batch functions which return an Array of the same length as the provided keys. However, this is not always the return format of other libraries. A higher-order function can convert from one format to another. The example below converts a { key: value } result to the format DataLoader expects.

function objResults(batchLoader) {
  return keys =>
    batchLoader(keys).then(objValues =>
      keys.map(key => objValues[key] || new Error(`No value for ${key}`)),
    );
}

const myLoader = new DataLoader(objResults(myBatchLoader));

Common Back-ends

Looking to get started with a specific back-end? Try the loaders in the examples directory.

Other Implementations

Listed in alphabetical order

Video Source Code Walkthrough

DataLoader Source Code Walkthrough (YouTube):

A walkthrough of the DataLoader v1 source code. While the source has changed since this video was made, it is still a good overview of the rationale of DataLoader and how it works.

Contributing to this repo

This repository is managed by EasyCLA. Project participants must sign the free GraphQL Specification Membership agreement before making a contribution. You only need to do this one time, and it can be signed by individual contributors or their employers.

To initiate the signature process please open a PR against this repo. The EasyCLA bot will block the merge if we still need a membership agreement from you.

You can find detailed information here. If you have issues, please email [email protected].

If your company benefits from GraphQL and you would like to provide essential financial support for the systems and people that power our community, please also consider membership in the GraphQL Foundation.


dataloader's Issues

Object/array as key

Would you consider supporting an Object/array as a key? Currently that works apart from caching, because Map uses === identity.

There are situations when I need to pass more than just simple number or string. In that case I have to stringify it and parse at the moment to be able use cache. Is that something that could be part of dataloader?

Flow support

This library compiles itself into dist/index.js using Babel which strips out the Flow annotations, but what about those of us using this in a flowtyped project that are using babel to compile later down the line? Is there any way to get autocompletion/type checks?

The only thing I can think of is writing a separate flowtype interface file and including it in my project, but given the definitions are already written it seems like there should be a way to load them from the node module.

Add support for asynchronous cacheMap

When dealing with large datasets it would be nice to be able to use also other cache drivers as Redis, memcache etc. as it could be too much for just memory storage, so DataLoader should be able to handle Promise specifically when using get() to retrieve cached Promise on cacheMap ...

Q: batching assumption

@leebyron this lib looks great, thanks for sharing!

Is my assumption correct that dataloader batches requests within one event loop and send request to db on nextTick or something? I guess its only way to say when batching is finished.

If so, maybe would be good to explicitly mention that in docs as it can get important when designing backend flow.

Lets say that I have sql backend without parallelize - it seems that in that case it still might make sense to batch sql queries there, so they resolve at the same event loop, therefore they have higher chance to batch together in subsequent queries and result in less requests. Does that sounds like reasonable thought process? :-)

Reconciling array sort order in your batch load function

I was looking at the documentation and though it doesn't explicitly state this, I've reviewed the code and have come to this understanding so please feel free to correct any of my misinterpretations.

The keys parameter for the batchLoadFn will be ordered according to the order in which load or loadMany was called - i.e. Developers should probably treat it as a random order.

The return value of the Promise from the batch load function is expected to be an array of values ordered corresponding to the array of keys parameter.

i.e. If the keys parameter of the batch load function is
[1, 9, 5, 3]
then the returned array of values should be
[
{ id: 1, name: 'Stevo' }
{ id: 9, name: 'Marco' }
{ id: 5, name: 'Warro' }
{ id: 3, name: 'Damo' }
]

What is the recommended approach for achieving this from within the batch load function?
As far as I can tell, you would need to do something like this which seems non-performant:
https://gist.github.com/panetta-net-au/c68fe9fcdc0de93c1e3d51099de45573

Wouldn't it be more performant to provide a mechanism to sort the keys before calling the batch load function? The same sort function could then optionally be used inside the batch load function before returning the results to ensure correctness.

Comparison of "ordered keys versus randomly sorted keys" performance here:
https://gist.github.com/panetta-net-au/ebf42b169ae59a80f9e4aa015f41e269

Example implementation of my suggestion here:
panetta-net-au@c1eb122

Apologies if I've totally missed the mark. If this is the case, I would love an explanation of or link to best practices on how to integrate with a DB.

Thanks for your time.

Question: Does async/await remove the benefits of DataLoader?

In my implementation of GraphQL & DataLoader, I often use async/await to get the result of a database query. Then I pass the result into a model where I control the ACL (access control).

By doing this, does it remove the benefits of DataLoader? i.e. am I waiting for each DataLoader query to finish one-by-one by doing async/await on every query? I ask because if I don't do async/await and return a promise as an output (without my model ACL layer), the GraphQL APIs continue to work, I'd like to know if my usage of the ACL model is eliminating the benefits of DataLoader..

LoadMany spawns Promise for each key

Hi! Now loadMany calls load inside for each key which spawns lots of promises (it is cheap, but we spawning lots of them), but it can just return array from bulk function. Can/Should this be improved?

DataLoader doesn't seem to be clearing cache

I noticed that GraphQL hasn't been updating a few records, after some digging I tried to use clear() and clearAll(), unfortunately none of them worked.

I use DataLoader as follows:
https://github.com/sibelius/graphql-dataloader-boilerplate/blob/master/src/loader/UserLoader.js#L21-L23

And a function for clearing DataLoader which is something like this:

static clearCache(id) {
  return User.userLoader.clear(id.toString());
}

The .toString() function is because sometimes the id passed is a Mongoose ObjectId, which I also thought that could be a problem so I tried instantiating DataLoader like this:

static orderLoader = new DataLoader(
    async ids => Promise.all(
      ids.map(id =>
        OrderModel.findOne({
          _id: id,
        }),
      ),
    ), { cacheKeyFn: key => key.toString() },
  );

Which also did not work.

I also have a few workers that updates records based on their statuses, so I need to call UserLoader.clearCache(userId) whenever I update an User record, the same goes for a mutation.

DataLoader cache works like a charm but it keeps hitting the cache after clearing, even using clearAll(), I'm not sure what the problem is.

Is it safe to create an app-level DataLoader instance that isn't cached?

I'm looking for more code examples of DataLoader.

I understand why there should be a new data loader for every request for cached data loaders per data type.

However for QueryLoaders, where the data is not cached, is it safe to make it an app-level data loader where its instance is created only once in app start?

Suggestion: Cache TTL

I absolutely love DataLoader so far. One thing that would make it even better is if there were a way to specify a TTL. It could be as simple as running a setTimeout with a call to clear inside it at the ttl. Would this be possible? And I'm happy to help contribute to this too!

Promise.all() should better be a join()

This is not really an issue, but more of an idea to consider.

The implementation uses ES6 Promise.all that resolves when all promises are successful. To handle partial failure each promise either returns the result or an 'Error'.
When porting to Java this means I would return an instance of a RuntimeException or otherwise serialized error information. This would also mean the value would be type Object.

Better would be if each Promise itself reported result or failure through a flag (similar to how Vert.x handles it with CompositeFuture (https://github.com/eclipse/vert.x/blob/master/src/main/java/io/vertx/core/CompositeFuture.java#L200). Unfortunately this is not standard ES6 Promise, so would need extra code.

For Vert.x I have proposed adding overloads to 'CompositeFuture.all()' and 'CompositeFuture.any()' that resolve only when all constituent futures / promises either succeed or fail. The end result still depends on the ANY and ALL predicate, but when the composite is resolved all individual results are also present.
I found http://bluebirdjs.com/docs/api/promise.join.html but don't know if that's similar in idea.

Hope I explained clearly. Full discussion on Vert.x here eclipse-vertx/vert.x#1534

Java port

I'm very impressed by DataLoader and trying to port the DataLoader into Java world but I got stuck with this.

If I understand correctly, the key principle of DataLoader is JavaScript eventloop. It can automatically determine when to run batch function just by adding it's execution on next tick.

But in Java there are no evenloop or something similar. The loader code just can't know when to run this batching function.

I tried to do something with CompletableFuture but with no luck. It doable but only with manual execution of batching function of all loaders which were used to load something.

Something like this:

DataLoader<Integer, User> userLoader = new DataLoader<>(keys -> { /* return values */ });
CompletableFuture<List<User>> result = CompletableFuture.supplyAsync(() -> {
  List<User> users = new ArrayList<>();
  userLoader.load(1).thenApply(user -> users.add(user));
  userLoader.load(2).thenApply(user -> users.add(user));
  return users;
}).thenApply(users -> {
  userLoader.batchLoad(); // we need to run this new function to process the queue
  return users;
});

I just want to ask for suggestions. I know this library is a port of the Haxl library for Haskell, but I know nothing about Haskell. Does it have an event loop too? Or maybe there's some other way to implement this that I can't see?

If this repo is not suitable for these kinds of questions (I understand it's Java and not JS) then I apologize. I'm just hoping that maybe someone has already tried something like this, or that the DataLoader author suddenly turns out to know Java. :)

Add option to skip cache if result not found

If you have the following events:

DataLoader.load(1) // returns null
// some other code then inserts record 1
DataLoader.load(1) // still returns null, even though the record now exists

What to do here? Is there an elegant way to clear the cache for keys that returned null?
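One possible workaround, sketched below under the assumption that `loader` is any DataLoader-like object exposing `load` and `clear`, is a small wrapper that evicts null results so a later load re-fetches:

```javascript
// Sketch, not part of DataLoader's API: if the value came back null,
// clear the cached promise so the next load(key) hits the backend again.
async function loadSkippingNullCache(loader, key) {
  const value = await loader.load(key);
  if (value == null) {
    loader.clear(key); // drop the cached promise for this key
  }
  return value;
}
```

The trade-off is that every null result causes a fresh fetch on the next call, which may be undesirable for keys that legitimately have no record.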

Incorrect logic?

Here is some logic for determining dispatch necessity that I find confusing.

Shouldn't it be:

var needsDispatch = this._queue.length > 0;

Would be happy to make a PR if I'm right.

Knowing when to wait before fetching

Suppose you have two loaders, one which loads some featured photos and one which loads photos for a particular user:

async function loadFeaturedPhotos() {
  let ids = SV_FEATURED_PHOTO_IDS;
  let photos = await photoLoader.loadMany(ids);
  return photos;
}

async function loadUserPhotos(userId) {
  let user = await userLoader.load(userId);
  let photos = await photoLoader.loadMany(user.photoIds);
  return photos;
}

Let's say you run them at the same time:

let ... = Promise.all([
  loadFeaturedPhotos(),
  loadUserPhotos(4),
]);

If my understanding here is right, the featured photos and current user will fetch simultaneously, then the photos loader will fetch separately to get the user's photos. If the photos loader was really expensive and you'd save significantly by batching the two photos requests together, is there any recommendation for how to make that happen using DataLoader? Specifically, somehow realizing that there'd be another photos request and deferring the fetch until then.

It seems like this would be hard to codify without the full data dependencies known statically but I'm hoping there's a smart technique I haven't thought of.

Race condition

Hello. I have a dataloader in my application which fetches the current user's profile. I am passing 'me' as the key and I want my dataloader to return the same Promise from every load call.

   UserProfile: new DataLoader(async (keys) => {
      return Promise.all(keys.map(async () => {
          // fetch and return data; hit the /api/user endpoint
      }));
   }),

So it's working, but the loader still sends two requests to my API server. I found that the first two times the load function is invoked, this._promiseCache.entries() returns an empty object, and only on the third invocation does it return MapIterator { [ 'me', Promise { _d: [Object] } ] }. I also tried adding the key to a custom object of my own at the beginning of the load function and printing it every time load is invoked, and that worked perfectly: the first time it was undefined, but on subsequent invocations it was defined. Is there any way to fix this? Am I doing something wrong?

Redis async cacheMap

I am trying to implement a cacheMap that is async, meaning that the "get" function reports asynchronously whether the specified key is cached or not. But that is not working.

My code is here: https://gist.github.com/rbpinheiro/942218abe33a6b533d37bbd9e70e129c

It seems that returning a promise from "get" is treated the same as a cache hit.
If you uncomment line 9, which returns false, it will always assume a cache miss.
How to implement the cache miss asynchronously?

I investigated dataloader's code and found this on line 87:

var cachedPromise = this._promiseCache.get(cacheKey);
if (cachedPromise) {
  return cachedPromise;
}

That "if" seems to be the only check for cache miss, which doesn't cover the asynchronous case.
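One way to work around the synchronous cacheMap contract is to check the async cache outside DataLoader before falling back to the loader. A sketch, where `asyncCache` is an assumed Redis-style object with promise-returning `get`/`set`, and `loader` is an assumed DataLoader-like object:

```javascript
// Sketch: consult an async (e.g. Redis-backed) cache first; on a miss,
// fall back to the loader (which still batches) and write the result back.
// `asyncCache` and the helper itself are assumptions, not DataLoader APIs.
async function loadWithAsyncCache(asyncCache, loader, key) {
  const cached = await asyncCache.get(key);
  if (cached !== undefined && cached !== null) {
    return cached; // async cache hit
  }
  const value = await loader.load(key); // batched load on cache miss
  await asyncCache.set(key, value);
  return value;
}
```

This keeps DataLoader's own cacheMap synchronous while layering the distributed cache on top.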

Manually clear key when transient errors are encountered?

The README says:

Transient Errors: A load may fail because it simply can't be loaded (a permanent issue) or it may fail because of a transient issue such as a down database or network issue. For transient errors, clear the cache:

and gives a code snippet on how to deal with this situation:

userLoader.load(1).catch(error => {
  if (/* determine if error is transient */) {
    userLoader.clear(1);
  }
  throw error;
});

It is recommended to call clear(key) manually, but after reading the code, it seems that DataLoader already takes care of this. Am I missing something, or does the doc need updating?

synchronously get the data or null?

Did you consider providing a get(key) which would behave like load(key) but would synchronously return the data, or null if it's not loaded yet?

With current API we can do this

class Foo extends Component {
  state = {
    foo: null,
  };
  componentWillMount () {
    FooStore.load(1).then(foo => this.setState({ foo }));
  }
  render () {
    const {foo} = this.state;
    return !foo ? <div>loading...</div> : <div>{JSON.stringify(foo)}</div>;
  }
}

The only downside is that there is a brief moment where the component first renders "loading..." even if the data is already loaded, because of the nature of Promises (they always resolve on the next tick).

Here is how we could rewrite with a get() API:

class Foo extends Component {
  state = {
    foo: FooStore.get(1),
  };
  componentWillMount () {
    FooStore.load(1).then(foo => {
      if (foo !== this.state.foo) this.setState({ foo });
    });
  }
  render () {
    const {foo} = this.state;
    return !foo ? <div>loading...</div> : <div>{JSON.stringify(foo)}</div>;
  }
}

Option to specify maximum batch sizes

There are situations where it makes sense to cap batch size, be it for performance reasons or for technical limitations (such as URL length when using a REST endpoint). It'd be very useful to be able to specify the maximum size for a batch when using dataloader, which would result in multiple promises being created for the tick, rather than just one.

Example:

new DataLoader(batchLoadFn, {maxBatchSize: 100});
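As a userland fallback while such an option is unavailable, the batch function itself could split its keys into bounded chunks and issue one request per chunk. A sketch, where `max` is an assumed limit:

```javascript
// Sketch: split a key list into chunks of at most `max` keys, so a batch
// function can issue one request per chunk (e.g. to respect URL length limits).
function chunkKeys(keys, max) {
  const chunks = [];
  for (let i = 0; i < keys.length; i += max) {
    chunks.push(keys.slice(i, i + max));
  }
  return chunks;
}
```

Inside a batch function this might be used as `Promise.all(chunkKeys(keys, 100).map(fetchChunk)).then(parts => parts.flat())`, where `fetchChunk` is a hypothetical per-chunk request helper.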

Proposal: Batching requests in an interval of time

I would like to propose considering an interval option for dataloaders.

This option would allow to batch together requests spanning multiple ticks of the event loop.

A value of 0 (the default) would mean to batch only requests in the same tick, that is, the current behaviour.

It is useful when you have many requests close enough but not exactly in the same tick, and a small latency for individual requests is okay. For example, apollo-client library supports this approach to batch graphql queries in a window of, by default, 10ms.
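A userland sketch of the idea, assuming nothing about DataLoader's internals: collect keys for a fixed window after the first call, then invoke the batch function once with all of them.

```javascript
// Sketch of a time-windowed batcher (hypothetical helper, not DataLoader API):
// keys queued within `ms` milliseconds of the first call are dispatched
// together in a single call to batchFn.
function createIntervalBatcher(batchFn, ms) {
  let keys = [];
  let resolvers = [];
  let timer = null;
  return function load(key) {
    return new Promise((resolve) => {
      keys.push(key);
      resolvers.push(resolve);
      if (timer === null) {
        timer = setTimeout(async () => {
          const batchKeys = keys;
          const batchResolvers = resolvers;
          keys = [];
          resolvers = [];
          timer = null;
          const values = await batchFn(batchKeys);
          batchResolvers.forEach((res, i) => res(values[i]));
        }, ms);
      }
    });
  };
}
```

With `ms = 0` this degenerates to roughly the current same-tick behaviour, which is why a single option could cover both cases.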

Duplicate keys in "IN", relational calls

Hi there,

I'm currently working on a GraphQL version of our API and I must say it's a joy to work with your dataloader. However, I recently noticed that it's not deduplicating keys in my "IN" select, possibly making my database look up the same thing 18 times and compare it each time.

As an example I'm trying to pull out 18 cases from our database, each of them with a customer attached.

The SQL that gets executed looks like this:

SELECT "company_name" AS "company", "customer_id" AS "id", "company_name" AS "company", "unik_id" AS "APIKey" FROM "complex_modules_realtor_customer_settings" AS "customer" WHERE "customer"."customer_id" IN (810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810, 810);

Rather than just :

SELECT "company_name" AS "company", "customer_id" AS "id", "company_name" AS "company", "unik_id" AS "APIKey" FROM "complex_modules_realtor_customer_settings" AS "customer" WHERE "customer"."customer_id" IN (810);

Is this intentional, or am I doing something wrong? (I'm using sequelize and graphql-sequelize, so help me out.)
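Until the library handles this, the batch function can deduplicate keys itself and fan the results back out. A sketch, where `fetchByIds` is an assumed query helper returning rows that each carry an `id` field:

```javascript
// Sketch: query with distinct keys only, then map every original (possibly
// repeated) key back to its row, so the returned array still matches the
// keys array length, as DataLoader's contract requires.
async function batchWithDedupe(keys, fetchByIds) {
  const uniqueKeys = [...new Set(keys)];
  const rows = await fetchByIds(uniqueKeys); // e.g. WHERE id IN (:uniqueKeys)
  const byId = new Map(rows.map((row) => [row.id, row]));
  return keys.map((key) => byId.get(key) || null);
}
```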

Syntax errors in README

  1. A comma and a colon are missing near here in the GraphQL example:
      type: new GraphQLList(UserType)
      resolve (user, { first }) => queryLoader.load([
  2. The last line of the Redis example }); is missing one closing parenthesis.

  3. This indentation is probably a mistake:

      id => rows.find(
        row => row.id === id) || new Error(`Row not found: ${id}`
      )

Custom cache examples

Are there examples of custom cache implementations with dataloader? For example using redis to provide a LRU cache with expiry? Or in-memory cache with upper memory bound.

facebook/Haxl

Looks like dataloader shares some ideology with Haxl, would be cool to add a note about this into readme too.

Distinct keys batching with cache disabled

Hello guys.

This lib is very valuable! Great job.

I am encountering an issue when calling the load function with the same key and having the cache disabled. I expected that batching would only call my getBatch with distinct keys but instead I got repeated keys.

This use case generates issues when selecting data from a DB, for example with a WHERE IN statement: the query will only return one value if we pass the same key more than once, and DataLoader will then throw an exception.

I believe we could use a temporary cache during batching to know whether the same key actually needs to be added to the batch again, without caching the promise.

I'd love to make a PR if everyone feels it's a good idea

What do you think?

License question

I am intending to write a Java port of dataloader. It will be Apache 2.0 licensed.
The code will be in Java and using Vert.x.

I want to rewrite the tests in Java, matching the JS as closely as possible. This should be no problem licensing-wise I assume, but just asking anyway.

How does pagination work with dataloader?

I've been having trouble wrapping my head around how pagination works with a batched data loader.

scalar Cursor # an opaque string which identifies a resource

query GetUsers($first: Int, $after: Cursor) {
  users(first: $first, after: $after) {
   name
  }
}

Do the same pagination parameters $first and $after get passed into every possible batch? Is there some page accounting that has to happen a layer above the dataloader layer? Or is dataloader batching incompatible with pagination?

Would love to get your thoughts @leebyron (cc @nicksrandall)

Promise.all(...load) returns different order when batched

Given the following GraphQL resolver that does a Promise.all on several loader.load(id) calls:

/**
 * GraphQL resolver that, given an ugly "posts" feed fetches
 * the pretty version from a loader.
 */
posts(ugly, args, context) {
  console.log("ugly", ugly.posts.map((post) => post.id));

  // Wait on all posts to load from the prettier source
  return Promise.all(
    ugly.posts.map((post) => loader.load(post.id))
  ).then((posts) => {
    console.log("pretty", posts.map((post) => post.id));

    return posts;
  });
},

When batch: false, the result is correct:

// ugly [ 7, 82, 102, 9, 11, 6, 201, 22, 75, 327 ]
// pretty [ 7, 82, 102, 9, 11, 6, 201, 22, 75, 327 ]

However, when batch: true, the output is somehow ordered:

// ugly [ 7, 82, 102, 9, 11, 6, 201, 22, 75, 327 ]
// pretty [ 6, 7, 9, 11, 22, 75, 82, 102, 201, 327 ]

(Note that the value of cache has no impact on this whatsoever).

I couldn't find any place in the code where ordering occurs based on batch.

This behavior is worrisome as it doesn't match expectations with example code:

var [ a, b ] = await Promise.all([
  myLoader.load('a'),
  myLoader.load('b')
]);
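The usual cause of this symptom is a batch function returning rows in whatever order the backend produced them (databases don't guarantee IN-clause order), while DataLoader expects the returned array to line up index-for-index with the keys. A sketch of re-ordering inside the batch function, assuming each row carries its own key:

```javascript
// Sketch: re-order backend rows to match the requested key order before
// returning them from the batch function.
function orderByKeys(keys, rows, getKey) {
  const byKey = new Map(rows.map((row) => [getKey(row), row]));
  return keys.map((key) => byKey.get(key));
}
```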

Talking of Dataloader as a Cache confuses developers

Hi, I'll keep this fairly brief for now, but try to expand it later.

A common issue I've seen with people using Dataloader is that they either use it as an application-wide cache, or they don't understand how dataloader works at a lower-level, and misunderstand why they are using it.

I feel like this is mainly a documentation issue. Dataloader is a data loading mechanism; where you load that data from is arbitrary. All that Dataloader ensures is that a single value by a given key always resolves to the same value.

That is, I've seen people create a dataloader instance at application level and think that this is safe because they're calling userLoader.clear() after changing a user. The problem is that userLoader.clear() isn't broadcast to all other instances in your cluster.

Say you have a node.js server using dataloader and you horizontally scale it. If you retain data between requests, you can get a case where server1's dataloader has a different local state from server2's, which could in turn differ from the database (e.g., server3 has written to the database). This issue is encountered as soon as you go from one process to more than one, because the dataloader "cache" is in-process memory, not some sort of shared memory for all processes (using memcached for caching, by contrast, would share that memory between servers/processes).

I think there's a better way to communicate what dataloader is doing, and it should be strongly emphasized that dataloader is best suited to per-request caching, so that all loads of a given key resolve to the same value within one request and you don't get data-drift within that request (sort of a snapshot of the world at that given time).

if the storage for the "cached" state for dataloader actually came from a shared datasource (memcached, redis, etc), then this issue would be avoided completely; but then you're also by-passing the point of dataloader in some ways.

I'd be willing to restructure/rewrite the documentation to be clearer, splitting the "what it does" from the "API for doing it". Then again, I'd be fairly inclined to say clear and reset are anti-patterns and shouldn't generally be used; i.e., don't use them to clear state between requests, only within the lifetime of that object in one request, e.g.,

1 request start
2  -> userLoader.get(1)
3    -> Users.update(1, { new: "data" })
4       -> userLoader.clear(1);
5          respond with userLoader.get(1)
6          -> request end

i.e., that second userLoader.get call shouldn't return the result from line 2; instead, it should return the fresh result from the database.

Perhaps a better API would be userLoader.get() and userLoader.getFresh() — i.e., we're not talking about a cache, as it's not a "cache" but more a call identity memoization.

I'd be interested in hearing others thoughts here, in particular, let me know if this makes any sense at all, and whether you'd be interested in a PR to redocument this project to help users avoid misunderstandings

Dataloader for each field

Please refer to: graphql/graphql-js#623 first, these issues are linked.
Let's say I'm querying this:

viewer { 
  posts(id:1) { 
   id
   title
   comments(first: 10) { 
    edges { 
    node { 
      text 
       name
        ...  
     }
   }
  }
 }
}

What happens on server:

  1. post id is fetched
  2. In "PostType", title is fetched ( through dataloader? ) and then comments
  3. etc..

How would you use dataloader for loading data for each field? Create one instance per field, e.g. const title = new Dataloader( ...load post's title... )?
But in this case title would be cached, and if the user changes data, logic would have to be applied to update each loader separately (based on the fields defined in the type that use dataloader).

And where do you store all the loaders? It seems like a common practice to store them in an object e.g. const loaders = { .. loaders .. }

@leebyron @josephsavona


Relay/graphql boilerplate has code that lightly touches on this topic: https://github.com/codefoundries/UniversalRelayBoilerplate/blob/master/graphql/ObjectManager.js#L162 check it out.

It would be nice if devs could provide some docs on this topic

Expose list of keys that are cached by a loader

// naive cache invalidation strategy

myLoader.clearAll();
myLoader.loadMany([1, 2, 3]); // list of keys is computed

// ... later
myLoader.clearAll();
myLoader.loadMany([2, 3, 4]); // compute new list of keys

The list of keys that are loaded would need to be tracked externally to determine the set of keys to clear:

// want to get this:
[1, 2, 3] \ [2, 3, 4] = [1]

For this to work, this._promiseCache would need to be officially exposed (e.g. via a .cachedKeys() method).

Thoughts?

populating cache from non-key queries

Are there any patterns to deal with mixing results from a non-key query into a dataloader which uses the key?

For example I have a userLoader which takes the id.

SELECT * FROM users WHERE id IN (1, 2, 3)

Then I have a childrenLoader which takes a user id and returns all of its children.

SELECT * FROM users WHERE parentId IN (13)

I'd like to populate the userLoader cache with the results from childrenLoader. I don't know if this is a good idea, just wondering if this has come up / any suggestions.

I'm guessing the pattern is just:

SELECT id FROM users WHERE parentId = 13

And then use them with loadMany from userLoader.
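DataLoader's prime(key, value) exists for seeding a cache, which fits this case. The sketch below mimics that behaviour with a plain Map standing in for the loader's cache; with a real DataLoader you would call userLoader.prime(child.id, child) for each row returned by childrenLoader:

```javascript
// Sketch: seed a per-id cache from rows fetched by a non-key query.
// `cache` is a plain Map standing in for the loader's cache; DataLoader's
// prime() likewise leaves already-cached keys untouched.
function primeFromRows(cache, rows, getKey) {
  for (const row of rows) {
    const key = getKey(row);
    if (!cache.has(key)) {
      cache.set(key, row);
    }
  }
  return cache;
}
```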

Typings after installation via npm

Hello.

After installation via npm I don't see the typings. Just dist/index.js:

$ ls -la node_modules/dataloader/dist
total 20
drwxr-xr-x 3 501 dialout   102 Apr  5  2016 .
drwxr-xr-x 7 501 dialout   238 Jan 15 11:10 ..
-rw-r--r-- 1 501 dialout 10343 Apr  5  2016 index.js

Thanks in advance!

[Custom cache] Can't use promise

I want to implement my own cache based on Redis in order to have a shared, distributed cache, unlike the current one which is coupled to each node instance (a Map object). Obviously the Redis client lib is async and returns promises, but the DataLoader cache methods (get(), set(), delete(), clear()) are synchronous only.

Any ideas on how to solve this situation?

Implementation in Golang

I have a WIP implementation in Golang.

The concurrency/async primitives in JavaScript don't translate very well to Golang but I have done my best to take the core themes of DataLoader and implement them in Go.

I would love to get feedback from others who are familiar with Golang and DataLoader so I'm posting this here to solicit feedback.

https://github.com/nicksrandall/dataloader

Return null as valid value

Is it possible to return null or undefined value as correct value?
I always get the following error: DataLoader must be constructed with a function which accepts Array<key> and returns Promise<Array<value>>, but the function did not return a Promise of an Array of the same length as the Array of keys.

I am trying to handle the case where a user does not exist in the table, but this state is OK.
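Returning null per key is valid; that error usually means the returned array's length doesn't match the keys array (e.g. the query simply omits missing rows). A sketch of aligning results with keys, where `users` is assumed to contain only the rows that exist:

```javascript
// Sketch: align results with keys so missing users become null entries,
// keeping the returned array the same length as the keys array.
function alignByKeys(keys, users, getKey) {
  const byKey = new Map(users.map((u) => [getKey(u), u]));
  return keys.map((key) => (byKey.has(key) ? byKey.get(key) : null));
}
```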

Confusion about key.length === value.length

So I have been writing a GraphQL API and loving it!
We have run into an N+1 issue, and after watching Lee's video,
we have decided to try out Dataloader.

There is an error thrown about the key length and value length matching.

Say for instance we have an ArtistType. An ArtistType has a GraphQLList of ArticleType.

// ArtistType
export default new GraphQLObjectType({
    name: 'Artist',
    fields: () => ({
                ...
                articles: {
                       type: new GraphQLList(ArticleType),
                       resolve: (artist, args, ast, { rootValue }) =>  {
                               return ArtistArticlesLoader.load(artist.id)
                       }
                }
        })
})

My dataloader is something similar to this (note that I'm using Sequelize):

export const ArtistArticlesLoader = new DataLoader((ids) => new Promise((resolve, reject) => {
    sequelizeDB.query(
        `SELECT a2.*
         FROM ARTIST a1
         LEFT JOIN ARTICLES a2 ON a2.subjectId = a1.id
         WHERE a1.id IN (:ids)`,
        {
            type: Sequelize.QueryTypes.SELECT,
            replacements: {
                ids: ids
            },
            raw: true
        }
    ).then((results) => {
        let res = results.map(result => {
            return {
                ...result
            }
        })
        resolve(res)
    })
}))

Let's say I throw a query like this:

query {
    artists {
        articles {
            title
        }
    }
}

Now since Artists' Articles are m:n, the keys and values lengths don't match and I'm getting that error.
Am I using this in the wrong way?
I've checked out examples like this, but I don't know how the lengths match.
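Since this is a one-to-many relation, each key should map to an *array* of articles (possibly empty), which keeps the returned array the same length as the keys array. A sketch of grouping the joined rows by artist id, assuming each row carries the `subjectId` column selected by the query above:

```javascript
// Sketch: group article rows by artist id so the batch function returns
// exactly one array of articles per requested id (empty array if none).
function groupRowsByKey(ids, rows, getKey) {
  const groups = new Map(ids.map((id) => [id, []]));
  for (const row of rows) {
    const bucket = groups.get(getKey(row));
    if (bucket) bucket.push(row);
  }
  return ids.map((id) => groups.get(id));
}
```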

I'm seeing two queries for a single value despite using a DataLoader

I've just started to implement DataLoader within my project. My project involves GraphQL and Relay. I've used a few third party libraries that provide a custom Relay Network Layer that supports batching queries within a single HTTP request. My GraphQL server instantiates DataLoaders once per request.

With that said, I'm seeing Load execute the same batch loading function twice, despite being given the same key. It's my understanding that a DataLoader is given an array of keys and should return a promise that resolves to an array of values (not an array of promises, but an array of values). With that knowledge, I've got the following DataLoader ...

const gameBySlugs = new DataLoader(Games.getBySlugs);
export function getBySlugs(slugs) {
  const query = `
    SELECT *
    FROM game
    WHERE slug = ANY($1)
  `;
  const params = [
    slugs
  ];
  return new Promise((resolve, reject) => {
    pg.connect(settings.SQL_HOST, (err, db, done) => {
      if (err) { return reject(err); }
      db.query(query, params, (err, result) => {
        done();
        if (err) {
          return reject(err);
        } else {
          let games = result.rows.map(r => Game(r));
          return resolve(games);
        }
      });
    });
  });
};

And here is the resolve function that uses this DataLoader ...

let game = () => ({
  type: GameType,
  args: {
    slug: {
      type: GraphQLString
    }
  },
  resolve: (source, {slug}, context, {rootValue}) => {
    const {gameBySlugs} = rootValue.loaders;
    return gameBySlugs.load(slug);
  }
});

Using this though, I'm seeing that query executed more than once even when given the same single key (slug).

Am I doing something wrong? Should the batch loading function return a promise that resolves to an array of promises, instead?

is loadAll operation outside of dataloader?

First, thanks for the great lib!

I'm learning how to use dataloader in Relay and I got it working for loading a singular object by id. Now I'm wondering if it's possible to use dataloader to do something like lessonLoader.loadAll() in my root "viewer" query. Not sure if that's a really stupid question completely outside the scope of this lib, or whether I should do two-phase loading, i.e. first query all ids from the lessons table and then use loadMany afterwards?

Loading by an alternative key

I imagine this is a well-understood problem in GraphQL-land: when you can query a user object by userid and vanity (both of which are fields of a User object), you still want just one user cache. Do you think it's worth complicating the DataLoader for this use case or just caching the vanity => userid mapping separately?

Cache expiration

Hi, is it possible to set an interval after which data should be purged from the cache?
Thanks

Handling fetch all

For the case of loading all rows from a table, you don't really care what the ID is, but you may want to cache all responses with their IDs for future requests. This would also be applicable when you have top-level queries.

It might make sense for there to be a way to manually cache results.
