Giter Club home page Giter Club logo

Comments (14)

eywalker avatar eywalker commented on June 26, 2024 1

This issue needs more thorough coverage in Wiki with graphics to explain - it will be to much to ask the users to read through this issue and to understand.

from datajoint-matlab.

peabody124 avatar peabody124 commented on June 26, 2024

I don't have a good answer for this but I do have a work around. First though be sure that you haven't been debugging recently because I've definitely found accidentally breaking out of a process and not canceling the transaction makes things much worse.

The workaround though is to change the granularity of the tasks you are computing. For example if it is a cell level analysis then make the parpopulate poprel run at the level of SpikeSets and then inside the populate function loop over the cells below that. It means that each cluster node will be more rows apart and decrease the frequency of the problem.

Can you try and get more information? E.g. can you query the innodb status and try and see which table is getting locked? It would be information to know if it's the jobs table (bug in parpopulate) or the transaction locks for the populate. If the later and you are not doing anything with subtables I still don't think the lock in that case is beneficial enough for the downsides.

from datajoint-matlab.

aecker avatar aecker commented on June 26, 2024

Ok, good to know that' I'm not the only one that has the problem.

It's not because of uncanceled transactions. It occurs even if I restart the database server and create the tables from scratch before doing anything. I'm pretty sure it's the row-level locks of InnoDB.

It's definitely not the jobs table that is being locked. I can't change the granularity easily since it's a population analysis and the computation is being done on the SpikeSet level.

I now have a hypothesis how I can get around it (below) but I would also like to understand what's causing the lock so I can prevent it in the future.

The situation is as follows: I'm fitting a model on the entire population using different values for the parameters and multiple cross-validation runs. So I have a ModelSet table as the parent, a Model table containing the individual fits, and a ModelUnits table to store which cells went into the model. I'm doing the inserts into each table as I go along and I guess for some reason those inserts are causing some kind of lock (no idea why). Since the fitting takes quite a while each ModelSet makeTuples runs for a few minutes, during which the ModelSet table seems to be locked and concurrent makeTuples get the timeouts.

I can see two fixes, both of which are not very satisfactory:

  1. Increase the lock wait timeout. Would cause less errors but still mean that the parallelization is pretty inefficient since it would spent a lot of time waiting for locks.
  2. Compute all the tuples and do the inserts at the end. Would make the code much less clean.

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

Have you tried MySQL 5.6 yet? It is supposed to be much faster.

from datajoint-matlab.

aecker avatar aecker commented on June 26, 2024

No I haven't but I should probably give that a shot, too

It looks like after flattening the joins and moving all restrictions as far to the end out as possible I got rid of most of my problems.

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

Since this is not a DataJoint-specific problem, I propose writing a wiki page on query optimization and closing this issue, unless you think that there is something specific to DataJoint that could be improved. Performance issues will always exist for complex queries.

from datajoint-matlab.

aecker avatar aecker commented on June 26, 2024

Haha, I thought this was the issue on performance problems when I answered to your comment... of course this has nothing to do with query optimization.

The solution was indeed to do all the computations first and then insert all tuples at the end. Makes the makeTuples a bit less clean but has worked every time so far.

You have a point that it's not really DJ related, so I'll put together a wiki page as soon as I get to it. Let's leave the issue open for the time being as a reminder.

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

I still have never encountered this problem without an obvious explanation. I can reproduce this problem in several ways, which result from incorrect populate calls or when makeTuples tries to insert something that it shouldn't. This may happen when another process is computing the same thing. This should not happen when parpopulate is used.

Do you still see run into this problem? How often? Is it reproducible? Could I examine the schemas and classes where this problem occurs?

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

okay, i have reproduced this consistently and I think understand when this happens. It only happens when multiple processes populate an overlapping range of primary key values. This usually happens when makeTuples fill out a primary key attribute that is not last in the the primary key. Transactions block continuous ranges of indices.

The solution is to design the schemas so that makeTuples only adds primary key attributes at the end of the primary key. The reason I have not seen this problem before is that I have typically designed tables this way anyway. You could probably reduce this problem by changing the order of the primary foreign keys or changing the granularity of your popRel.

The workaround above is good: delay the insertions until all the tuples are computed. This shortens the total time that the index range is locked by any one process.

I think we should modify populate to recognize timeouts and not log them as errors. Instead, it should try the same call again later.

from datajoint-matlab.

aecker avatar aecker commented on June 26, 2024

Can you post your specific example to reproduce the problem somewhere? I've always been pretty sure that it's related to locking of index ranges, but could never figure out why it would do it. What do you mean by changing the order of primary foreign keys?

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

Ideally, popRel's primary key comprises the top attributes of the table. Then makeTuples fills out the remaining primary attributes. However, it is easy to create cases when popRel's primary key includes the first and third attributes of the primary key and makeTuples must fill the second. This will result in timeouts and that's the expected behavior. A timeout will occur anytime when process y attempts to insert a tuple between two tuples inserted by process x before x completes its transaction.

One such example is reso.PeriStimTrace.
Its primary key is defined as follows:

->reso.Sync
->reso.Trace
->psy.Trial

which produces the following primary key (animal_id,tp_session,scan_idx,slice_num,trace_id,psy_id,trial_idx).

Its popRel is reso.Sync*reso.Segment*psy.Session whose primary key is (animal_id,tp_session,scan_idx,slice_num,psy_id).

Then makeTuples must fill out trial_idx and trace_id. Because trace_id comes before psy_id, timeouts ensue.

We could attempt to solve the problem by changing the order of the primary foreign keys. For example,

reso.PeriStimTrace (computed) # calcium signals around the stimulus
->reso.Sync
->psy.Trial
->reso.Trace

Then the primary key will be (animal_id,tp_session,scan_idx,psy_id,trial_idx,slice_num,trace_id. The problem remains unsovled.

Instead I could keep the original order but change the popRel to include the trace: popRel = reso.Sync*reso.Trace*psy.Session. Then makeTuples only needs to fill out trial_idx, which is last in the primary key. I would then also need to change the makeTuples to process only one trace at a time.

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

in light of the above, I am making the following two changes in populate in dj.version 2.6.5.

  1. If popRel's primary key attributes are not top in the primary key list, parpopulate will display a warning that timeouts may occur.
  2. If a timeout does occur, the offending key will be saved and attempted again after all other keys have been processed up to maxTimeouts=5 attempts. This will not always solve the problem, but will make tasks more likely to complete, though less efficiently.

from datajoint-matlab.

dimitri-yatsenko avatar dimitri-yatsenko commented on June 26, 2024

We need to remember to describe this more concisely on wiki.

from datajoint-matlab.

mjlm avatar mjlm commented on June 26, 2024

I agree with @eywalker ... I run into the timeouts a lot but I don't understand what the problem is and how to solve it.

from datajoint-matlab.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.