
Comments (4)

kofrasa commented on July 18, 2024

In view of #59 and #60 I think this no longer applies. $lookup does not perform a deep clone on the inputs.

Please let me know if I have missed something. If not, feel free to close the issue.

Thanks

from mingo.

Redsandro commented on July 18, 2024

I'm not sure. First, let's establish that we use the same semantics: "inputs" is ambiguous when there are two collections involved.

Let's assume the collection from the (previous step in the) pipeline is called the pipeline collection and the collection from the lookup is called the lookup collection.

I'm assuming (correct me if I'm wrong) that now:

  • pipeline collection is shallow-cloned.
  • lookup collection is referenced.

I would like to see the option to modify rather than shallow-clone the pipeline collection.

(Or if that is already the case, I think you should shallow-clone by default, not for me, but in order to "guarantee that the underlying collection is not changed".)

With such an option, pipelines on big collections could be optimized further programmatically.


kofrasa commented on July 18, 2024

Shallow-cloning the pipeline-collection is the default here.

Admittedly, there is a performance hit, but I would argue this addresses 99% of use cases. The clone is just a new object populated with the fields and values of the old object. The lookup collection should be referenced (I forgot to remove the clone; must fix).

In the thread for #59, I mentioned that modifying the pipeline collection in place would break other operators such as $out. The result there would no longer make sense unless $out detects that we are not cloning and deep-clones the current result into a new array.

I think the cost of the edge cases introduced, plus the lack of resilience to future operators (each operator, including custom ones, would have to be aware of the option), outweighs the benefit.

If your collection is big enough that this is an issue, then mingo may not be the right tool.

Thoughts?


Redsandro commented on July 18, 2024

Fair enough. I think you are correct. 👍

With the side note that you probably mean 99% of use cases within the current audience, i.e. small and medium datasets, given how hugely popular high-performance in-memory datastores like Redis are.

For high-performance collection querying in Node.js without separate servers (Redis), there is LokiJS. However, it is nowhere near as convenient as full MongoDB pipeline syntax support, and it attaches data to the original collection.

mingo has some sought-after functionality that LokiJS lacks, and vice versa. If at any point you're interested in making mingo a candidate for audiences that work with heavy loads, it would be worth revisiting these "edge case" optimizations, as they quickly add up. Imagine sending 1,000,000 documents with 100 root properties each through the pipeline: memory usage would multiply the collection size by the number of $lookups for the duration of the call, since the garbage collector only (eventually) removes the pre-clone collections once they fall out of scope.

I can easily work around the memory problem with the subdocuments/references described elsewhere, but ultimately you want to be able to query mingo the same way as MongoDB, without having to change the data layout between the two.

As for your comment:

I mentioned that modifying the pipeline-collection in place would break other operators such as $out.

I'd argue that's why it would need to be an option, only for people who know what they are doing.

    mingo.setup({
        lookupClone: false // defaults to true
    });

and/or for single shots:

    $lookup: {
        from: 'collection1',
        localField: 'id',
        foreignField: 'id',
        as: 'lookup1',
        _noClone: true
    }

