
Comments (4)

kofrasa commented on July 18, 2024

In view of #59 and #60 I think this no longer applies. $lookup does not perform a deep clone on the inputs.

Please let me know if I have missed something. If not, feel free to close the issue.

Thanks

from mingo.

Redsandro commented on July 18, 2024

I'm not sure. First, let's establish that we use the same semantics: "inputs" is ambiguous when there are two collections involved.

Let's assume the collection from the (previous step in the) pipeline is called the pipeline collection and the collection from the lookup is called the lookup collection.

I'm assuming (correct me if I'm wrong) that now:

  • pipeline collection is shallow-cloned.
  • lookup collection is referenced.

I would like to see the option to modify rather than shallow-clone the pipeline collection.

(Or if that is already the case, I think you should shallow-clone by default, not for me, but in order to "guarantee that the underlying collection is not changed".)

With such an option, pipelines on big collections could be optimized further programmatically.


kofrasa commented on July 18, 2024

Shallow-cloning the pipeline-collection is the default here.

Admittedly, there is a performance hit, but I would argue this addresses 99% of use cases. The clone is just a new object populated with the fields and values of the old object. The lookup collection should be referenced (I forgot to remove the clone; must fix).

In the thread for #59, I mentioned that modifying the pipeline collection in place would break other operators such as $out. The result there would no longer make sense unless $out detects that we are not cloning and deep-clones the current result into a new array.

I think the cost of the edge cases introduced, plus the lack of resilience to future operators (each operator, including custom ones, would have to be aware of the option), outweighs the benefit.

If your collection is big enough that this is an issue, then mingo may not be the right tool.

Thoughts?


Redsandro commented on July 18, 2024

Fair enough. I think you are correct. 👍

With the side note that you probably mean 99% of use cases within the current audience, i.e. small and medium datasets, given how hugely popular high-performance in-memory datastores like Redis are.

For high-performance collection querying in Node.js without separate servers (Redis), there is LokiJS. However, it is nowhere near as convenient as full MongoDB pipeline syntax support, and it attaches data to the original collection.

mingo has some sought-after functionality that LokiJS lacks, and vice versa. If at any point you're interested in making mingo a candidate for audiences that work with heavy loads, it would be worth revisiting these "edge case" optimizations, as they quickly add up. Imagine sending 1,000,000 documents with 100 root properties each through the pipeline: memory usage would multiply the collection size by the number of $lookups for the duration of the call, since the garbage collector only (eventually) removes the pre-clone collections once they fall out of scope.

I can easily work around the memory problem with the subdocuments/references described elsewhere, but ultimately you want to be able to query mingo the same way as MongoDB, without having to change the data layout between the two.

As for your comment:

I mentioned that modifying the pipeline-collection in place would break other operators such as $out.

I'd argue that's why it would need to be an option, only for people who know what they are doing.

    mingo.setup({
        lookupClone: false // defaults to true
    });

and/or for single shots:

    $lookup: {
        from: 'collection1',
        localField: 'id',
        foreignField: 'id',
        as: 'lookup1',
        _noClone: true
    }

