As per the mailing list at: https://groups.google.com/forum/#!topic/elm-discuss/JJaWcxKy6L4
This might be a bug in core or even in the compiler, I do not know, but this is where it showed up.
I am still trying to reduce it, but so far, in my giant application, using animation-frame makes it die at random in Chrome.
Do note, I subscribe to and unsubscribe from AnimationFrame fairly rapidly, via code kind of like:
    subscriptions model =
        Sub.batch
            [ if List.isEmpty model.onNextFrame then
                Sub.none
              else
                AnimationFrame.times HelpersMsg_Frame
            ]
So I am guessing that something is not being cleaned up in the right order, maybe? I really, really do not want to keep it always subscribed when there are no updates that need to be done...
I tested it by removing the conditional subscription and instead always subscribing (ohcrapmylog) and I've not hit the issue yet. I need to switch it back though as this constant polling is hurting the site performance...
Here is the overall message, lot of hairy Elm internals, do not know enough about it to simplify yet:
The message that contains the one that gets wiped is id: 38. As stated before, it is a "_Task_andThen" with a task of what will become id: 39.
When id: 38 gets processed it enqueues id: 39. At this time the cancel key on the task that becomes Process id: 39 is null, but walking back up the parent stack of id: 38 shows it calls the callback on id: 4 while creating the cancel key in the step function, in the branch for if (ctor === '_Task_nativeBinding') {.
Traced through the entire id: 38 path: it ends up creating id: 39 when it calls the Native_AnimationFrame native callback on rAF, which then ends up calling the callback given to requestAnimationFrame, which then calls callback(_elm_lang$core$Native_Scheduler.succeed(timeNow));, which then causes "_Task_succeed" to be called on id: 38, which then calls the callback on its 'stack' key, which ends up calling sendToSelf with the animation frame time value. id: 38 gets called more times on the work loop because it ended up getting queued up a few times earlier, but these next ones do not do anything of importance since its stack is null (it breaks early). Process id: 4 appears to get fed the message to queue up the task for the animationframe callback.
When id: 39 does work and there is no exception in a run, it is filled with the animation frame callback succeed task and then message.
When id: 39 does not work, i.e. root is null during the 'work' call, root is not null and is assigned an object when id: 39 is initially created, and that object is a "_Task_andThen" that will do a "_Task_nativeBinding" to the animation frame 'callback' function.
It gets cleared during the step of id: 4, which involves a "_Task_succeed" that then calls the callback on the stack, which is a "_Task_andThen", which then immediately gets called in the step loop again (it did not exit, as it was via the internal loop), which is then processed as a "_Task_andThen" this time, which then bumps on a "_Task_succeed", which calls the list 'Cons' operator (empty list and a tuple0) and gives the result back as a "_Task_succeed", which then gets called via the id: 4 loop to get wrapped up in a 'Just' and sent to the animation frame native code, which then packages up the time information into another "_Task_succeed", which then loops again and passes that value into the main application spawnLoop loop function and stuffs things into a ready-to-handle onMessage.
Then id: 4 loops again to handle the "_Task_andThen" that the loop put into it just now, which then stuffs the loop callback back onto itself and loops again to handle the "_Task_receive" for the onMessage callbacks (the id: 4 process at this point has 3 things in its mailbox). The first message is the animationframe callback, which ends up calling down to _elm_lang$animation_frame$AnimationFrame$onEffects, into which is passed a subscription object that holds id: 39 (whoo found it! ow...), which stuffs our id: 39 object into a return object on the request._0 key, which all gets bundled into a "_Task_succeed" and processed via the loop for id: 4, passing that to the main loop callback again, which stuffs a "_Task_receive" onto id: 4 (the elm compiler could really optimize a LOT of this, maybe translate elm to llvm, optimize it, then translate to javascript as a start, like holy heck..., hmm webassembly is not a bad idea for elm at all...).
So then id: 4 handles the "_Task_andThen", which then puts on and then handles the "_Task_receive". Then it pops the second of the originally 3 messages on id: 4 and gives it to the callback that then calls onMessage with the animationframe time stuff again, which calls back into _elm_lang$animation_frame$AnimationFrame$onEffects again and OhHeyLook id: 39 Again (>.>), which stuffs it into the request._0 key again on the return object, so we loop around id: 4 a couple more times while a "_Task_receive" shuffles to the top yet again (really, elm compiler, optimization, maybe llvm to output to both javascript and webassembly for options...).
We then receive the last message on the mailbox, which is all the same stuff through _elm_lang$animation_frame$AnimationFrame$onEffects yet again, except this last time instead of returning a "_Task_succeed" it goes down the other path, where the kill function is called with the process of id: 39 (Oh hey there it is again!) and the native binding it returns is then pushed on to be called. So yet again id: 4 gets looped around to handle a "_Task_andThen" that then handles the prior "_Task_nativeBinding" that builds the cancel function on the root via the callback function on that same root (which is the callback function returned by the prior kill call on id: 39). This function does:
    function kill(process) {
        return nativeBinding(function (callback) {
            var root = process.root;
            if (root.ctor === '_Task_nativeBinding' && root.cancel) {
                root.cancel();
            }
            process.root = null;
            callback(succeed(_elm_lang$core$Native_Utils.Tuple0));
        });
    }
Where 'process' is the process with id: 39. So you see here that the function (callback) function that was passed into nativeBinding (which returned the nativeBinding task message that is 'now' being processed) is being called. First it does var root = process.root, so far so good. It then calls root.cancel if root.ctor === '_Task_nativeBinding', which it is not (it is a "_Task_andThen"), so it skips that if entirely and continues on down to process.root = null, and BOOM, there id: 39 was just corrupted. When id: 39 is run through later, its root being null causes it to die when it is trying to figure out what to do.
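The path described above can be reproduced in isolation. This is only a sketch: the task constructors below are simplified stand-ins written for illustration (they are not Elm's real scheduler code), and the process object is hand-built to have the same shape as id: 39 at the time it was enqueued. The kill function has the same logic as the one quoted above, with Tuple0 replaced by a plain object.

```javascript
// Simplified stand-ins for the scheduler's task constructors
// (hypothetical, written for this sketch only).
function succeed(value) {
  return { ctor: '_Task_succeed', value: value };
}

function nativeBinding(callback) {
  return { ctor: '_Task_nativeBinding', callback: callback, cancel: null };
}

function andThen(task, callback) {
  return { ctor: '_Task_andThen', task: task, callback: callback };
}

// Same logic as the kill function quoted above.
function kill(process) {
  return nativeBinding(function (callback) {
    var root = process.root;
    if (root.ctor === '_Task_nativeBinding' && root.cancel) {
      root.cancel();
    }
    process.root = null;
    callback(succeed({ ctor: '_Tuple0' }));
  });
}

// A process shaped like id: 39: an andThen wrapping a native binding
// that carries its own cancel function.
var cancelCalls = 0;
var inner = nativeBinding(function (callback) {});
inner.cancel = function () { cancelCalls += 1; };
var process39 = {
  ctor: '_Process',
  id: 39,
  root: andThen(inner, function (time) { return succeed(time); }),
  stack: null,
  mailbox: []
};

// Run the kill task's binding by hand, the way step would.
var killResult = null;
kill(process39).callback(function (task) { killResult = task; });

// This is the access that the first line of step's loop performs.
var stepError = null;
try {
  var ctor = process39.root.ctor;
} catch (e) {
  stepError = e;
}

console.log(cancelCalls, process39.root, stepError instanceof TypeError);
```

In this model the inner cancel is never invoked, because the root's ctor is "_Task_andThen" rather than "_Task_nativeBinding", yet the root is still set to null, so the later property access throws.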
So yes, this is a bug, probably in AnimationFrame (maybe in core, I don't know).
And this hurts... How do I work around this bug for the time being until it is fixed?
On Wednesday, August 3, 2016 at 12:28:47 PM UTC-6, OvermindDL1 wrote:
Sometime between when id: 39 and id: 40 are created during rawSpawn, the root key in the object goes null. Still trying to find what code is wiping it...
On Wednesday, August 3, 2016 at 12:21:18 PM UTC-6, OvermindDL1 wrote:
The task object with id: 39 does not have a null root at the time it is put into workQueue. At the time it is put into the workQueue it is: {callback : function(b), ctor: "_Task_andThen", task: {callback: function(callback), cancel: function(), ctor: "_Task_nativeBinding"}}
The root.task.callback function seems to have one interesting closure that has a key/value of navStart:1470247637381, and there is another link in the cancel to the process with id: 38, so it appears to be a continuation of that one. I am not sure of the internal structure of Elm so I am not sure what Native Binding it is called, and I use no Native Bindings in my project (only what comes with Elm core libraries is what exists here at all).
A little further digging and I am seeing _elm_lang$animation_frame$Native_AnimationFrame in the stack. Further digging makes it seem (although I am unsure) that the root.task.callback is the same function as defined at: https://github.com/elm-lang/animation-frame/blob/master/src/Native/AnimationFrame.js#L13
I am not yet seeing how the root key on this "_Process" object is getting cleared before it has a chance to be processed, still tracing...
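The observation in this message (root is non-null when the process is put into workQueue, yet null by the time it is stepped) is what one would expect if something mutates the process between enqueue and work, since the queue only holds a reference. A minimal model of that ordering, assuming a reduced queue that is drained by hand rather than via setTimeout (the visit parameter is hypothetical and stands in for step):

```javascript
// Hypothetical reduced queue: work is invoked by hand instead of via setTimeout.
var workQueue = [];

function enqueue(proc) {
  workQueue.push(proc); // stores a reference, not a snapshot
}

function work(visit) {
  var proc;
  while ((proc = workQueue.shift())) {
    visit(proc);
  }
}

// Hand-built process with the same shape as the object logged above.
var proc39 = {
  ctor: '_Process',
  id: 39,
  root: { ctor: '_Task_andThen', task: { ctor: '_Task_nativeBinding' } },
  stack: null,
  mailbox: []
};

enqueue(proc39);
var rootAtEnqueue = proc39.root.ctor;

// Something else clears the root before the queue is drained.
proc39.root = null;

var rootsSeenByWork = [];
work(function (proc) { rootsSeenByWork.push(proc.root); });

console.log(rootAtEnqueue, rootsSeenByWork);
```

The queue never copies the process, so whatever clears root after the enqueue is visible to the later work pass.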
On Wednesday, August 3, 2016 at 11:59:23 AM UTC-6, OvermindDL1 wrote:
Debugged into it and caught the exception at the point to get the stack values:
numSteps = 403
process = Object {ctor: "_Process", id: 39, root: null, stack: null, mailbox: Array[0]}
So root is null, why would it be trying to access a null value without checking if null?
Any ideas how to work around this in this project so I can at least keep working in chrome?
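One stop-gap that the question about null checking suggests is a guard at the top of step. The sketch below is a hypothetical local patch to the generated JavaScript, not an upstream fix and not something confirmed by the Elm maintainers; it could also mask the underlying problem. The step body is heavily reduced, and MAX_STEPS is given an assumed value purely for illustration.

```javascript
// Heavily reduced model of the step loop; only the parts needed to show
// the null dereference are kept.
var MAX_STEPS = 10000; // assumed value, for illustration only

function stepUnguarded(numSteps, proc) {
  while (numSteps < MAX_STEPS) {
    var ctor = proc.root.ctor; // same access that throws in the report
    if (ctor === '_Task_succeed' && proc.stack === null) {
      break; // nothing left to run
    }
    ++numSteps;
    break; // other task kinds are omitted in this sketch
  }
  return numSteps;
}

function stepGuarded(numSteps, proc) {
  // Hypothetical guard: a process whose root was cleared has nothing to run.
  if (proc.root === null) {
    return numSteps;
  }
  return stepUnguarded(numSteps, proc);
}

// Same values as the debugger output above.
var killed = { ctor: '_Process', id: 39, root: null, stack: null, mailbox: [] };

var threw = false;
try {
  stepUnguarded(403, killed);
} catch (e) {
  threw = e instanceof TypeError;
}

var guardedSteps = stepGuarded(403, killed);
console.log(threw, guardedSteps); // true 403
```

Skipping a process with a null root treats it as already finished, which matches what kill appears to intend, but that reading of the intent is an assumption.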
On Wednesday, August 3, 2016 at 11:52:36 AM UTC-6, OvermindDL1 wrote:
I keep getting this exception thrown from inside Elm, so far only from Chrome Version 51.0.2704.103 m
elm.js:2417 Uncaught TypeError: Cannot read property 'ctor' of null
Where that line and the surrounding context is purely Elm generated code, and is:
    // STEP PROCESSES // Line 2411
    function step(numSteps, process)
    {
        while (numSteps < MAX_STEPS)
        {
            var ctor = process.root.ctor; // Line 2417 -- This is the error: Uncaught TypeError: Cannot read property 'ctor' of null
            if (ctor === '_Task_succeed')
            {
                while (process.stack && process.stack.ctor === '_Task_onError')
                {
                    process.stack = process.stack.rest;
                }
                if (process.stack === null)
The same JavaScript seems to run fine in Firefox, IE, and Edge; this only seems to happen in Chrome and only 'sometimes'. It seems to happen pretty quickly during loading, and if it does not happen at the start then it does not seem to happen at all. I have not been able to whittle the code down to something that still lets my app do anything while still causing this error.
The 'step' function is being called from the 'work' function (shown here with context):
    // WORK QUEUE
    var working = false;
    var workQueue = [];
    function enqueue(process) {
        workQueue.push(process);
        if (!working) {
            setTimeout(work, 0);
            working = true;
        }
    }
    function work() {
        var numSteps = 0;
        var process;
        while (numSteps < MAX_STEPS && (process = workQueue.shift())) {
            numSteps = step(numSteps, process); // This is the place in the callstack before step
        }
        if (!process) {
            working = false;
            return;
        }
        setTimeout(work, 0);
    }
Chrome is not reporting anything in the stack below work, so this appears to be during the setTimeout callback, set a few lines prior to work, that calls work.
Any thoughts as to the cause?