
Comments (14)

Gbury commented on May 14, 2024

That's interesting! I'll look at it as soon as I can.

from dolmen.

Gbury commented on May 14, 2024

Ah... after an absurdly long time spent fetching the absurdly large problem file, I think I see the problem.

Currently, dolmen does the following for model checking:

  • parse the input problem file (the .smt2 one)
  • type the input problem statements
  • store the input problem definitions in a list
  • store the input asserts in a list
  • once reaching the check-sat statement, parse and type the model definitions (from the .rsmt2 file)
  • evaluate and record the input problem definition from the list saved previously
  • finally, evaluate each assert that was stored in the list earlier

I think the large memory consumption comes from the list of stored asserts.
The reason for this order of computation (and thus the need to store a list of asserts) is that: 1) definitions from the input problem file can contain references to values that will be defined by the model (see the discussion in #98), and 2) when seeing the first assert in the input problem file, we have no guarantee that there won't be new definitions after it (e.g. define-fun; assert; declare-fun; define-fun; assert). We therefore need to wait for the check-sat statement before trying to parse and type the model (because typing the model requires that we have parsed the type declarations from the input problem).

My current plan to solve this is to add an option to type the entire model when we reach the first assert, which would then allow us to evaluate each assert right after typing it (and thus skip the need to store them in a list). However, that requires the input problem to have all its declarations and definitions before the first assert (which I think is the case for most problems).
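For illustration, here is a minimal OCaml sketch of the two strategies; the stmt type and the "evaluation" of asserts (modelled as collecting them into a list) are stand-ins, not Dolmen's actual API:

```ocaml
(* Hypothetical statement type; not Dolmen's. *)
type stmt = Def of string | Assert of string | Check_sat

(* Current behaviour: buffer every assert, evaluate them all at
   [Check_sat]; peak memory grows with the number of asserts. *)
let buffered stmts =
  let pending = ref [] and evaluated = ref [] in
  List.iter
    (function
      | Def _ -> ()                              (* recorded for later *)
      | Assert a -> pending := a :: !pending     (* stored, kept live *)
      | Check_sat -> evaluated := List.rev !pending)
    stmts;
  !evaluated

(* Proposed option: type the model at the first assert, then evaluate
   each assert immediately and forget it; no pending list is kept. *)
let streaming stmts =
  let model_typed = ref false and evaluated = ref [] in
  List.iter
    (function
      | Def _ | Check_sat -> ()
      | Assert a ->
        if not !model_typed then model_typed := true;  (* type model here *)
        evaluated := a :: !evaluated)                  (* evaluate, then drop *)
    stmts;
  List.rev !evaluated
```

Both produce the same results; the difference is only in how long each assert stays live in memory.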


bobot commented on May 14, 2024

I'm quite surprised that just keeping the assertions takes so much space, but it is indeed the case. It is worrying that the typed AST takes so much space.


Gbury commented on May 14, 2024

Sorry for the delay. I'll try and fix that asap.

About the memory consumption, I'd expect at least a 10x (maybe closer to 20x?) factor in size between the text representation and the typed dolmen AST. For instance, consider the following term: (+ a 1). It takes 7 bytes to represent as text, so a bit less than 1 memory word on a 64-bit machine, whereas for the typed representation we need at the very least:

  • a node for the + application with pointers to its arguments, so at least 1 word (header) + 1 (pointer to the '+' symbol) + 2 (pointers to the arguments) + 1 (pointer to the int type)
  • a node for the 'a' symbol, so at least 1 (header) + 1 (pointer to the 'a' symbol) + 1 (pointer to the int type)
  • a node for the integer 1, so at least 1 (header) + at least 2 (to represent the fact that this node is the integer 1)
  • some more space to actually store the strings for a and +, each of which uses 2 words of memory.

That sums up to at least a dozen memory words, and I've not counted some of the fields of the typed representation (e.g. the memoized hash and the custom tag map, each of which adds 2 memory words per term). So I'm not surprised that if we need to keep the whole AST in memory, we need quite a lot of RAM for big problems.
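For what it's worth, this kind of accounting can be checked experimentally with Obj.reachable_words (available since OCaml 4.04) on a toy term type; the type below is a hypothetical stand-in, not Dolmen's actual AST:

```ocaml
(* Toy typed-term representation, roughly mirroring the accounting
   above: each node carries its symbol and a (string) type. *)
type term =
  | App of string * term list * string   (* symbol, arguments, type *)
  | Sym of string * string               (* symbol name, type *)
  | Int of int                           (* integer literal *)

(* (+ a 1): 7 bytes of text, but many memory words once built as a tree. *)
let t = App ("+", [Sym ("a", "Int"); Int 1], "Int")

(* Count every word reachable from [t]: headers, pointers, list cells,
   and the string payloads. *)
let () =
  Printf.printf "(+ a 1) uses %d memory words\n"
    (Obj.reachable_words (Obj.repr t))
```

Even this stripped-down representation, without memoized hashes or tag maps, already lands well above a dozen words for a 7-byte term.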

In short, the textual representation is insanely compact, and the AST takes quite a lot of space, because in OCaml the base unit of representation is the memory word rather than the byte, and we need to store metadata as well as pointers (whereas text just concatenates everything, so it doesn't pay for the tree structure).


Gbury commented on May 14, 2024

To be clear, I have some plans and ideas to reduce the memory used to store the ASTs in Dolmen, but I don't think we can ever realistically go below a ~10x factor between text size and AST size, mainly because we have to pay for the metadata associated with typed terms, as well as for the convenience of having a tree instead of a linear term: the ability to go directly to the n-th argument of a function application requires some space to store pointers (and/or offsets, but those have the same cost).


c-cube commented on May 14, 2024

Gbury commented on May 14, 2024

> A small nitpick, but if you store offsets in an int32 bigarray, you use half the space you'd use with pointers :-).

Sure, but then we'd need to re-implement a GC on top of this bigarray (to correctly handle the cases where we can treat each assert one at a time and forget it before moving on to the next one, which currently works quite well) ^^
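For reference, here's a rough sketch of what offsets in an int32 bigarray could look like (all names hypothetical, and with none of the GC machinery this approach would actually need):

```ocaml
open Bigarray

(* Flat term arena: nodes live side by side in an int32 bigarray and
   refer to each other by offset (4 bytes) instead of by pointer
   (8 bytes on 64-bit). Fixed capacity and no reclamation: freed nodes
   are never reused, which is exactly the problem discussed above. *)
let arena = Array1.create int32 c_layout 1024
let next = ref 0

(* Allocate a binary application node: [sym_id; left_off; right_off]. *)
let mk_app2 sym_id left right =
  let off = !next in
  next := !next + 3;
  Array1.set arena off (Int32.of_int sym_id);
  Array1.set arena (off + 1) (Int32.of_int left);
  Array1.set arena (off + 2) (Int32.of_int right);
  off

(* Read back the offset of the first argument of a node. *)
let arg1 node = Int32.to_int (Array1.get arena (node + 1))
```

Following a child is then an index computation rather than a pointer dereference, at half the per-edge cost.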


c-cube commented on May 14, 2024

For sure, it's difficult. I think hashconsing with a weak table is the reasonable compromise. Compared to the text representation, it's possible that not all terms are in memory at once, and sharing is super important. How many distinct terms does a problem contain?
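A minimal sketch of what such hashconsing could look like with the stdlib Weak.Make functor, over a toy term type (not Dolmen's):

```ocaml
(* Toy term type; stand-in for a real AST. *)
type term = Var of string | App of string * term list

(* Weak hash set of terms: entries that are no longer referenced
   anywhere else can be garbage collected. *)
module W = Weak.Make (struct
  type t = term
  let equal = (=)          (* structural equality; fine for a sketch *)
  let hash = Hashtbl.hash
end)

let table = W.create 251

(* [merge] returns the already-stored copy if a structurally equal
   term exists, otherwise stores and returns the argument. *)
let hashcons t = W.merge table t

(* Smart constructors: always go through the table. *)
let var x = hashcons (Var x)
let app f args = hashcons (App (f, args))
```

After this, structurally equal terms come back physically identical, so a subterm that appears many times across asserts is stored only once (for as long as it stays reachable).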


Gbury commented on May 14, 2024

I'll try and experiment with hashconsing when I have the time, but I'm not really convinced there are that many duplicated terms.


c-cube commented on May 14, 2024

We're talking about assertions, right? I think they often have a ton of duplicated terms, especially if you rely on hashconsing to expand let :).


Gbury commented on May 14, 2024

Dolmen doesn't expand let-bindings, ^^


c-cube commented on May 14, 2024

My point is that maybe, for checking models, it should.


Gbury commented on May 14, 2024

Hm.. if the size of the env during typechecking becomes a problem, maybe we'll need to consider that, but I don't think that's a problem for now.


Gbury commented on May 14, 2024

@bobot I have a fix for the memory consumption on your example. I'll explain a bit more in a forthcoming PR, but basically, with the fix, dolmen manages to check the model while staying at ~500MB of RAM. Unfortunately, I didn't have time to include the fix in the 0.8.1 release, but it will be part of the next release (i.e. 0.9).

By the way, the model file you provided actually has the model duplicated, i.e. the whole model definition is present twice in the file; I don't know whether that's intended.

