
Comments (14)

Gbury commented on May 14, 2024

That's interesting! I'll look at it as soon as I can.

from dolmen.

Gbury commented on May 14, 2024

Ah... after an absurdly long time spent fetching the absurdly large problem file, I think I see the problem.

Currently, dolmen does the following for model checking:

  • parse the input problem file (the .smt2 one)
  • type the input problem statements
  • store the input problem definitions in a list
  • store the input asserts in a list
  • once reaching the check-sat statement, parse and type the model definitions (from the .rsmt2 file)
  • evaluate and record the input problem definition from the list saved previously
  • finally, evaluate each assert that was stored in the list earlier

I think the large memory consumption comes from the list of stored asserts.
The reason for this order of computation (and thus the need to store a list of asserts) is that: 1) definitions from the input problem file can contain references to values that will be defined by the model (see the discussion in #98), and 2) when seeing the first assert in the input problem file, we have no guarantee that there won't be new definitions after it (e.g. define-fun; assert; declare-fun; define-fun; assert). We therefore need to wait for the check-sat statement before trying to parse and type the model (because typing the model requires that we have parsed the type declarations from the input problem).

My current plan to solve this is to add an option to type the entire model when we reach the first assert, which would then allow us to evaluate each assert right after typing it (and thus skip the need to store them in a list). However, that requires the input problem to have all its declarations and definitions before the first assert (which I think is the case for most problems).
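For illustration, here is a minimal OCaml sketch of the two strategies; the stmt type and the "evaluation" of asserts (modelled as collecting them into a list) are stand-ins, not Dolmen's actual API:

```ocaml
(* Hypothetical statement type; not Dolmen's. *)
type stmt = Def of string | Assert of string | Check_sat

(* Current behaviour: buffer every assert, evaluate them all at
   [Check_sat]; peak memory grows with the number of asserts. *)
let buffered stmts =
  let pending = ref [] and evaluated = ref [] in
  List.iter
    (function
      | Def _ -> ()                              (* recorded for later *)
      | Assert a -> pending := a :: !pending     (* stored, kept live *)
      | Check_sat -> evaluated := List.rev !pending)
    stmts;
  !evaluated

(* Proposed option: type the model at the first assert, then evaluate
   each assert immediately and forget it; no pending list is kept. *)
let streaming stmts =
  let model_typed = ref false and evaluated = ref [] in
  List.iter
    (function
      | Def _ | Check_sat -> ()
      | Assert a ->
        if not !model_typed then model_typed := true;  (* type model here *)
        evaluated := a :: !evaluated)                  (* evaluate, then drop *)
    stmts;
  List.rev !evaluated
```

Both produce the same results; the difference is only in how long each assert stays live in memory.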


bobot commented on May 14, 2024

I'm quite surprised that just keeping the assertions takes so much space, but it is indeed the case. It is worrying that the typed AST takes so much space.


Gbury commented on May 14, 2024

Sorry for the delay. I'll try and fix that asap.

About the memory consumption, I'd expect at least a 10x (maybe closer to 20x?) factor in size between the text representation and the typed dolmen AST. For instance, consider the following term: (+ a 1). It takes 7 bytes to represent as text, so a bit less than 1 memory word on a 64-bit machine, whereas for the typed representation we need at the very least:

  • a node for the + application with pointers to its arguments, so at least 1 word (header) + 1 (pointer to the '+' symbol) + 2 (pointers to the arguments) + 1 (pointer to the int type)
  • a node for the 'a' symbol, so at least 1 (header) + 1 (pointer to the 'a' symbol) + 1 (pointer to the int type)
  • a node for the integer 1, so at least 1 (header) + at least 2 (to represent the fact that this node is the integer 1)
  • some more space to actually store the strings for a and +, each of which uses 2 words of memory.

That sums up to at least a dozen memory words, and I've not counted some of the fields of the typed representation (e.g. the memoized hash and the custom tag map, each of which adds 2 memory words per term). So I'm not surprised that if we need to keep the whole AST in memory, we need quite a lot of RAM for big problems.
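For what it's worth, this kind of accounting can be checked experimentally with Obj.reachable_words (available since OCaml 4.04) on a toy term type; the type below is a hypothetical stand-in, not Dolmen's actual AST:

```ocaml
(* Toy typed-term representation, roughly mirroring the accounting
   above: each node carries its symbol and a (string) type. *)
type term =
  | App of string * term list * string   (* symbol, arguments, type *)
  | Sym of string * string               (* symbol name, type *)
  | Int of int                           (* integer literal *)

(* (+ a 1): 7 bytes of text, but many memory words once built as a tree. *)
let t = App ("+", [Sym ("a", "Int"); Int 1], "Int")

(* Count every word reachable from [t]: headers, pointers, list cells,
   and the string payloads. *)
let () =
  Printf.printf "(+ a 1) uses %d memory words\n"
    (Obj.reachable_words (Obj.repr t))
```

Even this stripped-down representation, without memoized hashes or tag maps, already lands well above a dozen words for a 7-byte term.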

In short, the textual representation is insanely compact, and the AST takes quite a lot of space, because in OCaml the base unit of representation is the memory word rather than the byte, and we need to store metadata as well as pointers (whereas text just concatenates everything, so it doesn't pay for the tree structure).


Gbury commented on May 14, 2024

To be clear, I have some plans and ideas to reduce the memory used to store the ASTs in Dolmen, but I don't think we can ever realistically go below a ~10x factor between text size and AST size, mainly because we have to pay for the metadata associated with typed terms, as well as for the convenience of having a tree instead of a linear term: the ability to go directly to the n-th argument of a function application requires some space to store pointers (and/or offsets, but those have the same cost).


c-cube commented on May 14, 2024

Gbury commented on May 14, 2024

> A small nitpick, but if you store offsets in an int32 bigarray, you use half the space you'd use with pointers :-).

Sure, but then we'd need to re-implement a GC on top of this bigarray (to correctly handle the cases where we can treat each assert one at a time and forget it before moving on to the next one, which currently works quite well) ^^
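For reference, here's a rough sketch of what offsets in an int32 bigarray could look like (all names hypothetical, and with none of the GC machinery this approach would actually need):

```ocaml
open Bigarray

(* Flat term arena: nodes live side by side in an int32 bigarray and
   refer to each other by offset (4 bytes) instead of by pointer
   (8 bytes on 64-bit). Fixed capacity and no reclamation: freed nodes
   are never reused, which is exactly the problem discussed above. *)
let arena = Array1.create int32 c_layout 1024
let next = ref 0

(* Allocate a binary application node: [sym_id; left_off; right_off]. *)
let mk_app2 sym_id left right =
  let off = !next in
  next := !next + 3;
  Array1.set arena off (Int32.of_int sym_id);
  Array1.set arena (off + 1) (Int32.of_int left);
  Array1.set arena (off + 2) (Int32.of_int right);
  off

(* Read back the offset of the first argument of a node. *)
let arg1 node = Int32.to_int (Array1.get arena (node + 1))
```

Following a child is then an index computation rather than a pointer dereference, at half the per-edge cost.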


c-cube commented on May 14, 2024

For sure, it's difficult. I think hashconsing with a weak table is the reasonable compromise. Compared to the text representation, it's possible that not all terms are in memory at once, and sharing is super important. How many distinct terms does a problem contain?
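A minimal sketch of what such hashconsing could look like with the stdlib Weak.Make functor, over a toy term type (not Dolmen's):

```ocaml
(* Toy term type; stand-in for a real AST. *)
type term = Var of string | App of string * term list

(* Weak hash set of terms: entries that are no longer referenced
   anywhere else can be garbage collected. *)
module W = Weak.Make (struct
  type t = term
  let equal = (=)          (* structural equality; fine for a sketch *)
  let hash = Hashtbl.hash
end)

let table = W.create 251

(* [merge] returns the already-stored copy if a structurally equal
   term exists, otherwise stores and returns the argument. *)
let hashcons t = W.merge table t

(* Smart constructors: always go through the table. *)
let var x = hashcons (Var x)
let app f args = hashcons (App (f, args))
```

After this, structurally equal terms come back physically identical, so a subterm that appears many times across asserts is stored only once (for as long as it stays reachable).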


Gbury commented on May 14, 2024

I'll try and experiment with hashconsing when I have the time, but I'm not really convinced there are that many duplicated terms.


c-cube commented on May 14, 2024

We're talking about assertions, right? I think they often have a ton of duplicated terms, especially if you rely on hashconsing to expand let :).


Gbury commented on May 14, 2024

Dolmen doesn't expand let-bindings, ^^


c-cube commented on May 14, 2024

My point is that maybe, for checking models, it should.


Gbury commented on May 14, 2024

Hm.. if the size of the env during typechecking becomes a problem, maybe we'll need to consider that, but I don't think that's a problem for now.


Gbury commented on May 14, 2024

@bobot I have a fix for the memory consumption on your example. I'll explain a bit more in a forthcoming PR, but basically, with the fix, dolmen manages to check the model while staying at ~500MB of RAM. Unfortunately, I didn't have time to include the fix in the 0.8.1 release, but it will be part of the next release (i.e. 0.9).

By the way, the model file you provided actually has the model duplicated, i.e. the whole model definition is present twice in the file; I don't know whether that's intended.

