Description: For one to many relationship, after migrating to data

Bulk Fetch : Low performance SQL are generated for Objects that are having lists of other objects

datanucleus commented on July 19, 2024

Bulk Fetch : Low performance SQL are generated for Objects that are having lists of other objects

from datanucleus-rdbms.

Comments (8)

andyjefferson commented on July 19, 2024

You mean BULK FETCH, which was introduced in DN 4.x. It now generates SQL to attempt to load the objects, whereas earlier versions simply ignored the user's fetch plan and you always had lazy loading. If you think the SQL could or should be improved, you are welcome to go to the code and contribute improvements. Here is the starting point:
https://github.com/datanucleus/datanucleus-rdbms/blob/master/src/main/java/org/datanucleus/store/rdbms/query/BulkFetchExistsHelper.java

andreiIfrim commented on July 19, 2024

Thank you for your fast response, Andy. I will have a look at how this can be done better!

andyjefferson commented on July 19, 2024

No comment on how this can be improved, nor a test that demonstrates the problem, so closing. Feel free to comment here and it could be reopened, or raise a new issue with some specific improvement and pull request that provides it.

jonathanvx commented on July 19, 2024

The way to fix this is choosing JOIN over EXISTS (SELECT ..FROM).
As in, use BulkFetchJoinHandler.java instead of BulkFetchExistsHandler.java
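To make the two query shapes concrete, here is a minimal sketch of what an EXISTS-style and a JOIN-style fetch of the list elements could look like. The table and column names (OWNER, ELEMENT, OWNER_ID) and the class name are hypothetical, and the strings only illustrate the general shape of each strategy, not the exact SQL that DataNucleus generates.

```java
// Sketch only: OWNER, ELEMENT and OWNER_ID are hypothetical names, and these
// strings show the general shape of each strategy, not DataNucleus's real output.
final class BulkFetchSqlShapes {

    /** EXISTS form: element rows are kept when a subquery correlated on the owner key matches. */
    static String existsForm(String ownerFilter) {
        return "SELECT E.* FROM ELEMENT E WHERE EXISTS ("
             + "SELECT 1 FROM OWNER O WHERE O.ID = E.OWNER_ID AND " + ownerFilter + ")";
    }

    /** JOIN form: the element rows are reached through an inner join on the owner key. */
    static String joinForm(String ownerFilter) {
        return "SELECT E.* FROM ELEMENT E INNER JOIN OWNER O ON O.ID = E.OWNER_ID WHERE "
             + ownerFilter;
    }

    public static void main(String[] args) {
        // The owner filter stands in for whatever WHERE clause the user's query had.
        String filter = "O.NAME = 'x'";
        System.out.println(existsForm(filter));
        System.out.println(joinForm(filter));
    }
}
```

Assuming OWNER.ID is a primary key, each element row matches at most one owner, so both statements should select the same element rows; the difference is only in how the database is asked to find them.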

andyjefferson commented on July 19, 2024

Actually not totally correct; the OP is talking about loading objects into memory, NOT the actual SQL invoked. No demonstration of this was presented.

A comparison of different methods for bulk-fetching is available from the authors of EclipseLink, https://java-persistence-performance.blogspot.co.uk/2010/08/batch-fetching-optimizing-object-graph.html, and while JOIN actually comes out best with a simple query, it comes out poorest when the query it is utilised with becomes complex.

Clearly if someone is particularly interested in this feature for their projects they can contribute time to provide the JOIN bulk-fetch implementation, as well as investigate what this issue is actually about.

jonathanvx commented on July 19, 2024

Unfortunately, I am a MySQL performance engineer and not a developer. Therefore, I am only able to analyse the SQL that's hitting the database and make recommendations on how to fix it. I can tell you that the vast majority of cases would benefit (sometimes greatly) from JOIN over EXISTS. In the event where you have many JOINs, this would be a more complicated issue that may be solved with better indexes, index hints... but it is more of a specialist issue at that point.
Your decision to use EXISTS means that you usually do a full table scan on the main table while using a subquery whose data is dependent on data in the main table (a correlated subquery). This is really the worst sort of performance (at least for MySQL).
Bringing a lot of blocks into memory and then filtering them - instead of using indexes - leads to high IO and high CPU which is what the OP was complaining about. Either way, it is well worth looking at it.

jonathanvx commented on July 19, 2024

It would be great to know whether there is a query hint or some annotation that lets us tell DataNucleus whether to use a JOIN or an EXISTS, so that we can see for ourselves if there is a performance improvement.

andyjefferson commented on July 19, 2024

As my comment above said, JOIN is not yet implemented; needs resource to do it and not a priority for me. So if your project/company needs it then it is there to do, and any comments relating to it should be put against the correct issue ... #171 and the query extension is documented at http://www.datanucleus.org/products/accessplatform_5_0/jdo/query.html#FetchPlan as datanucleus.rdbms.query.multivaluedFetch with values of "exists" and "none" currently.
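As a rough illustration of how that extension might be set, the sketch below builds the extension entry with only the JDK. The extension name and the values "exists" and "none" are taken from the comment above; the class and method names are hypothetical, and the resulting map is assumed to be passed to the standard JDO `Query.setExtensions(Map)` (or the single pair to `Query.addExtension(String, Object)`) when JDO is on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch only: a hypothetical helper that builds the query extension map.
final class MultivaluedFetchExtension {

    // Extension name and allowed values as quoted in the comment above.
    static final String KEY = "datanucleus.rdbms.query.multivaluedFetch";
    static final Set<String> ALLOWED = Set.of("exists", "none");

    /** Returns a one-entry extension map, rejecting values the comment does not list. */
    static Map<String, Object> extensionsFor(String mode) {
        if (!ALLOWED.contains(mode)) {
            throw new IllegalArgumentException("Unsupported multivaluedFetch value: " + mode);
        }
        Map<String, Object> extensions = new LinkedHashMap<>();
        extensions.put(KEY, mode);
        return extensions;
    }

    public static void main(String[] args) {
        // With JDO available, this map would be handed to query.setExtensions(...)
        // before executing the query (assumption: standard javax.jdo.Query API).
        System.out.println(extensionsFor("none"));
    }
}
```

Running the same query once with "exists" and once with "none" would be one way to compare the bulk-fetch SQL against the behaviour without it.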
