ssoelvsten / adiar Goto Github PK

View Code? Open in Web Editor NEW

20.0 5.0 13.0 12.38 MB

An I/O-efficient implementation of (Binary) Decision Diagrams

Home Page: https://ssoelvsten.github.io/adiar/

License: MIT License

Makefile 0.20% C++ 99.53% CMake 0.27%

binary-decision-diagrams bdd external-memory zero-suppressed-decision-diagrams zdd

adiar's Introduction

adiar's People

Contributors

Stargazers

Watchers

Forkers

jaschdoc tah125 logsem vichetking rit12345678 anbclausen halvko bartoszkedziorek rugveth1210 maximsmolskiy itsmeyo92 lvzhichao0 nhusung

adiar's Issues

Generalise API to use Label Predicates

Currently the Quantification and the ZDD subset functions take a label_file as an input. This works but it feels quite clunky. One can keep this interface, but provide more general and easy-to-use functions.

Provide a predicate, which is a function of the type label → bool. Common examples of such a predicate would be odd, even, eq(arg), and lt(arg) / gt(arg).
- bdd_exists
- bdd_forall
- zdd_project

To bucket or not to bucket

Currently we always use a single bucket for the levelized priority queue. Experimentally this does provide a performance increase of 25% or more for reasonably large instances. Yet, this idea should not work well in practice, if the levels are very narrow. In that case, it would work better with a normal priority queue.

In #98 , #168 , #165 we derive estimates on the maximal size of the priority queue. This is also an upper bound on the size of each sorter in the levelized priority queue. If this estimate falls below some specific ε value, then we should use a non-bucketed version instead.

Implement a non-bucketed levelized priority queue as a separate template class. This one should have a similar public interface to the bucketed version we currently provide.

This also needs to be thoroughly unit tested to have similar behaviour to the bucketed one.

The question is: what value of ε should be used?

Some experiments are needed to figure out what value of ε should be used, but I imagine between 10 and 100 is a good place to start. One may be able to look at the Queens and Knight's Tour examples in this repository, but I think the most usable results would be the Picotrav benchmark here; just change the branch Adiar in the external/adiar folder.
An even better value is probably by basing (also) on the number of elements that fits into a few cache-lines (64 bytes each). That is, whether on average, it is so small that everything fits into one or two cache lines.

Another question is: can we not do better than just reusing the above bounds? Can we weigh that value against one or more of the following?

Average level size
We can compute the average elements per level based on the given max_size. If we have more than one BDD, then we take the maximum number of levels (i.e. assuming the worst-case and average-case of as much overlap as possible)
Use max bucket size
We can have an upper bound on the number of elements of each level precomputed based on 1-level cuts. This is a better value to pick than the given max size that is based on 2-level cuts and the input size.

Move COOM_DEBUG precompiler flag usage inside the debug functions

Currently the code is littered with statements like this:

#if COOM_DEBUG
    print_node(node)
#endif

which quickly gets unwieldy. Maybe we should move the #if COOM_DEBUG inside the print functions and place them in a debug namespace (inside the coom namespace), such that the line would become

    debug::print_node(node)

Then the C++ compiler should be able to merely remove the unnecessary function call if it is empty.

Make label_t type only uint32_t to safe space

The type uint16_t only occupies 16 bits, whereas uint64_t occupies 64 bits. Since we only use 16 bits for the label anyway, we may as well reduce the size of label_t, which will impact the size of the meta file and the stack; the latter probably is not too important.

This changes to uint32_t that only occupies 32 bits as discussed in issue #40 / changed in pull request #107 . The nice thing about 16 bits was, that the type/compiler directly could help with overflows etc.

Move COOM_ASSERT Compiler flag inside assertion functions

Following up on #16 we could also clean up the code heavily by moving the COOM_ASSERT compiler flag behind a forceinline function. As it is inlined, we know that the compiler is able to remove all computations related to these assertions in the production version.

Remove _debug_ files and their output

We already provide a dot output that can be used before and after reduction, and most debugging anyways uses specifically added console logging that is not the standard. So, we might as well remove the complexity and clean up the code.

Explicitly handle unary Apply as an O(1) algorithm

Currently, the Apply algorithm treats being given the same underlying node_file twice the same way as being given two different files. Since the refactor we have begun returning the same file in multiple places, so we would gain from identifying this case and treat the cases.

One will notice, that the result is at this point independant of the given BDD, except for whether an odd number of the two have been negated.

If no or both inputs are negated, then the sinks will always be the same at the end of recursion.
Cases to check for (T,T) and (F,F)
If one of the inputs are negated, then the sinks will always be different.
Cases to check for (T,F) and (F,T)

All of these cases can only result in one of the following four cases

The true sink
The false sink
The node_file unnegated
The node_file negated

Checking all of this takes only O(1) work, and all of that would be hidden behind a cheap node_file pointer comparison guard.

DOT output has swapped styling as in litterature

As pointed out by @AnnaBlume99 in Logic in Computer Science the low edge is dashed while the high edge is solid. The DOT output has it the other (wrong) way around. Double checking with the papers of Tom van Dijk the same is the case.

Create an easy to use interface

Many pieces of software that use an OBDD make use of the CUDD Package. To better interface and ease transitioning to this library (for whatsoever reason anyone would like to do so), one might want to also provide the same interface for COOM. This is most likely best done with an adapter.

`bdd_pathcount`, `bdd_satcount`, and `zdd_size` can overflow in their output

Our current examples do not show this problem, but the bdd_pathcount and bdd_satcount functions can potentially output values much larger than 2⁶⁴-1. There are three possible solutions

Add the option to use double instead of uint64_t for a "non-exact" output.
Use a fixed-width unsigned integer that is std::trivially_copyable. This could be with a fixed width of 1024 bits. This effectively postpones the problem to 2¹⁰²⁴ number of paths and satisfiable solutions.
- Should the number of bits then be chosen at compile-time?
Use boost::cpp_int or boost::gmp_int. These support any arbitrary number of bits. This means, that unlike in (1) and (2) we do not just delay when the problem happens.

To this end, we need to create a custom serialize and deserialize function for these variable-length entities needed. TPIE already supports a lot of this out-of-the-box when it comes to tpie::file_stream and tpie::merge_sorter but not (yet) their tpie::priority queue. So, we would need to first add support for such non-trivial elements in a priority queue.
- Alternatively to extending TPIE, we can add a fully bucketed levelized priority queue, and so abuse the fact that tpie::merge_sorter already supports variable sized elements.

Optimise multi-label quantification on rvalues

The quantify(const bdd&, const label_file&) functions are good for when the argument is an lvalue and should be kept. But if the value is the result of some other computation (i.e. is an rvalue), then we can optimise the garbage collection a bit, by accumulating onto the input bdd rather than the new variable out.

Move memory management into one place: memory.h

Currently we have memory-based computations in three places:

adiar/adiar.h: Sets the maximum memory
adiar/internal/reduce.h: Computes the size of each sorter based on the given memory
adiar/internal/levelized_priority_queue.h: Computes the size of each bucket based on the given memory

The last two even contain some code duplication. So instead, I would like a adiar/memory.h file that has these functions:

void set_limit(size_t memory_limit_bytes) from adiar/adiar.h .
Functions for tpie::merge_sorter:
Alternatively, these can be made static functions on the wrapper classes described in #98 .
- merge_sorter_min():
  Returns a struct { size_t phase_1, size_t phase_2, size_t phase_3 } that provides the minimal amount of memory for each phase. The phase_1 and phase_3 values are to be copied from the m_single_block() function in adiar/levelized_priority_queue.h and the sorter_phase_1 variable in adiar/reduce.cpp. The phase_2 value should be twice the phase_ value.
- merge_sorter_parallel(size_t memory_bytes, size_t sorters = 1, size_t max_elem = UINT_MAX):
  Derives the best values to be used for sorters number of tpie::merge_sorters that should exist in parallel and share some memory (e.g. the levelized priority queue and the Reduce algorithm).
  
  The max_elem argument may be skipped until we look at #152 .
Functions for tpie::file_stream:
- file_stream_memory_usage():
  Computes the amount of memory used by a tpie::file_stream with T typed elements. This is necessary to implement #98 .

This file should probably only be used internally but then be called from adiar/adiar.h. Maybe a new nested namespace adiar::memory is needed to resolve name conflicts. That is, if we want to provide an adiar::set_limit function in adiar/adiar.h that calls something in adiar/memory.cpp.

Add level size to meta file

Adding the level size in the meta file provides multiple benefits

The equality checking can catch more cases on "obviously not equal instances". The expensive homomorphism check should only be done, if there are the exact same number of levels of exactly the same size.
It brings us closer to resolving #98

Also, should the meta file maybe be renamed to level file?

Extend usage of GitHub actions

The current set of GitHub actions try to help ensure the correctness of the BDD library. To this end, we may also want to look into one or more of the following.

Code quality and reliability

Unit test coverage: CodeCov is free to use for open source projects and works easily with the lcov reports one can generate.

Added with pull request #104 and #105
Linters: Examples could be one of these actions: 1, 2.
Code checkers: Examples could be SonarQube, but it requires a server of our own to run the software.
- cppcheck included in #121
- include guards included in #121
- CodeImprover Duplication . This has since been removed again, since it was complaining too much about duplication since we have so many quite similar, but not entirely the same, algorithms.
- PREfix: Finds bugs with symbolic execution.
Other things I've looked at already.
- Codacy looks to be easy to get started with and has a lot of things it complains about. Quite a few of them I recognise from Cppcheck. One should double check what options are available first.
- Qodana looks promising, but does (yet) not support C++.
- SoftaCheck does not work well, since it does not think to include submodules. I got 247 errors, most of which were due to the Bandit framework not being in scope.

Compiler and OS

We should have a simple action for each OS, such that we can check whether it truly compiles on most common configurations. One can copy over most of it from the Actions on my programming challenge repository.

Ubuntu: GCC and Clang
MacOS: GCC and Clang
Windows: see thomasmoelhave/tpie#257

Performance

Benchmark change in performance: Whenever a pull request changes a prior algorithm, then I have to manually run a benchmark to gauge whether there was any dangerous slowdown in performance. The only preexisting Action I have found is Benchmark Action.

But, it is not sufficient for my cases. To minimise the possible error in the results I have to run the same benchmark on the main and then the feature branch interchangeably. We are talking about so precise measurements, that we cannot use numbers that are on a different machine several months ago. Furthermore, the disk/CPU of the computer may slow down unexpectedly while it is running.

The relative performance difference only makes sense for a computer in the same state. So, I run my experiments as follows:
1. Keep running tests switching between both branches, until you get three or more values in the same state.
2. Compute the minimum of each branch and finally the relative difference.
Added with #176

Other quality of life actions

Skip jobs for unrelated changes.

Added with #117
Cancel workflow actions: This is useful when force-pushing to fix some minor things on a pull request. For example, look into using this GitHub Action.

Added in #128
Use of the Release Drafter action to automate/keep track of the changes since last release.

Shortcut merge-conditional at compile-time when sink-arcs are known to be in-order

Context

With #128 the sink arcs in an arc_file was split in two: the ones given in-order and the ones given out-of-order. This reduces the number of elements involved in the sorting of sink arcs before calling adiar::reduce. Yet, this also introduces an overhead when writing and reading from these files, since the two sink arc files need to be merged.

This seems to result in a 1% or 2% relative slowdown in performance. When we are have implemented many of the internal memory milestone issues, then these "few" percent performance decrease can be quite noticeable.

Tasks

We can reuse the idea from the rest of the internal memory milestone and change the type of the arc_stream and arc_writer at the start of the each algorithm.

Template the arc_stream and arc_writer classes with a boolean sort_sinks. Based on this boolean, they run the current merging/splitting logic or (at compile time) go-to one of the branches.
Use policies to derive the sort_sinks parameter in a top-down sweep.
At the start of reduce, check whether there are any out-of-order sinks that needs to be merged in. Call __reduce with the boolean moved into a template parameter.

Alternatively, we may also consider whether we can redesign the if-statement to favour the no-sorting case.

Additional Context

The following is a list whether an operation outputs edges in order ( 🟢 ) or not ( 🔴 )

BDD
- 🟢 bdd_apply
- 🟢 bdd_ite
- 🔴 bdd_restrict
- 🔴 bdd_exists / bdd_forall
ZDD
- 🔴 zdd_binop
- 🔴 zdd_change
- 🟢 zdd_complement
- 🟢 zdd_expand
- 🔴 zdd_offset
- 🔴 zdd_onset
- 🔴 zdd_project

Linker is unable to handle use of header files.

If one tries to compile the test_coom_data.cpp test cases with #include <coom/data.h> rather than #include <coom/data.cpp>, then compilation fails with the following errors:

test.cpp:(.text+0x648): undefined reference to `coom::value_of(unsigned long)'
test.cpp:(.text+0x659): undefined reference to `coom::value_of(unsigned long)'

After messing around a lot it is very much evident, that the coom/data.h path is correctly placed and the compiler happily does all its work. The linker on the other hand is unable to find the implementation of coom::value_of(unsigned long) in the object file of data.cpp.

This also seems to be due to the fact, that I tried to have the header file include the source file. Apparently one should do it the other way around (as we currently already almost do). So, the linker may work, if we actually do as expected.

This would also heavily improve the compilation time...

Make version number obtainable within C++

The BuDDy package has functions within C++ to obtain the library version.

One can place CMake variables inside of the source code such that we can merely place define and bump the project version inside of CMake, and then C++ follows along nicely. Here is another source of how to do so. Most reasonable place to put it would be adiar/adiar.h.in or in a new file adiar/version.h.in.

Describe edge-based algorithms

In [Arge96] he describes the possibility to use an edge-based representation (source, data, target) which immediately makes the output of Apply in reverse sorted order for the next Reduce. This also happens when one has a node-based representation with the outgoing edges containing all the information of the children.

We notice though, that there also on top of this observation of Arge is another important benefit to using an edge-based representation.

The node-based representation spends 3 · 64 bits on each node and has the following properties:
- Easy to traverse
- Can at the same time be used as the dependency graph in Reduce
The edge-based representation spends 4 · 64 bits on each node and has the following properties:
- Easy to traverse
- Can at the same time be used as the dependency graph in Reduce

The increase in size is completely covered by the need for the dependency graph.

Remove Terminate called without an active exception during error unit tests

The unit tests for temporary files has since 655aa7b been passing, though not for the GitHub Action, since it sees an exit code of 134 due to a "terminate called without an active exception"

Support storing and reading ASCII format for BDDs

File Format

Tom van Dijk has suggested the following format for a BDD:

Options are (optional) key-value pairs; each written on a single line as follows
.<option> <value>
Possible options are:

option name count order

type string uint "pre", "post"

This is then followed by a set of nodes, starting with
.nodes
Terminals are given with the t key and are of the form.
t "type" "value"
Nodes are of the form
b <label> '-'?<then> '-'?<else>
where then is also known as high and else is also known as low. The optional '-' character marks an edge to be complemented, which depends on the attributed edge feature. Every node is identified by referencing its linenumber or the name of another bdd in the same file.
Finally, the list is ended with
.end

option	name	count	order
*type*	string	uint	"pre", "post"

Parsing a file

This feature might be blocked until we support attributed edges.

The simplest way to implement this would be to do the following:

Read the entire bdd from a file and push it as edges { source_line, target_line } to a tpie::merge_sorter.
Run a bottom-up version of Reduce that merely puts it into the normal node based form.

By use of a b-tree we could also store the id of all prior nodes and then that way skip the merge sorter. Most likely this will not run as fast as using the sorter. Furthermore, it also takes up much more space

The "reference another bdd" also makes things much more complicated. At that point one probably has to first check whether some reference like this exists to then "inline" the other BDD inside of this one and then run the actual Reduce algorithm.

Comments

Notice, while this is quite close to #126 , it also is quite disjoint (both in terms of how to do these and the usecase they are solving. The purpose of this is to have a common file format that can be used between different BDD packages, while the other is supposed to be used for Adiar alone by using the least amount of space.

quantify_resolve_request is missing sink-sink base case

I wonder why it works anyways...

Remove DOT test file outputting

Currently, if no files are provided, then make dot will create an input for a filestream<node> and create an output of filestream<arc> for testing. This should be removed.

Resource memory limit exceeded on 15-Queens and less than 2GB of memory

When running the 15-Queens benchmark on the Grendel-S cluster with varying amount of memory we get a Resource memory limit exceeded. Specifically, we get one for 128, 256, 512, and 1024 MB of memory.

Presumably, this is because we somewhere are allocating more memory than we actually are given. Maybe due to some stream object being created after the priority queues.

Add gathering of (optional) statistics

The branch feature/statistics adds gathering and printing of statistics ( see also pull request #141 ). This has been requested by Tom van Dijk.

Specifically, this adds the two CMake options:

ADIAR_STATS: Gather the statistics that have a very low O(1) overhead in every function.
ADIAR_STATS_EXTRA: Gather much more verbose statistics for every recursion step.

Both add some overhead, but I hope that the hierarchical gathering of statistics allows you to choose a balance between the fidelity of your statistics and the impact on running time. One can argue whether the naming of these options could be better?

It then adds the two new functions for Adiar (names derived from BuDDy).

adiar::adiar_stats() obtains a struct with all the gathered raw data.
adiar::adiar_printstat(std::ostream &o) pretty prints the set of statistics to o (default std::cout).

This was done on the main branch before the ZDD refactor in #128 . The feature/statistics branch should be made anew on top of the ZDD branch while also adding the remaining set of interesting statistics.

create_node function with nodes as argument

We can abstract the create_node function to also automatically extract the pointer to its children.

Switch to internal memory levelized priority queue for top-down algorithms based on input size

WIth #169 and #168 we added support for the levelized_priority_queue switching to internal memory if its max_size fits into the given amount of main memory. We now similarly want to derive a max_size for the levelized priority queues in all other algorithms and then switch to an internal memory variant if possible.

We'll start with using the simple bounds below. We will later ( #165 ) use some (most often) much better bounds. So, it is a good idea to already now add an (inline) helper function to compute each max_size below.

Worst-case bounds for top-down Levelized Priority Queue (and the others)

The following are some simple O(1) derivable bounds on the levelized priority queues. Based on these, we can then be used to switch between the different versions.

Pretty print for DOT files

This way we can graphically draw the output

Create TPIE decorators

With #14 an O(N) time and O(N/B) I/O algorithm was created for the not operator. One will notice though, that it is merely a mapping operator on each element. That means one could merely map to the negated node inside the other algorithms at no extra cost to the number of I/O's and merely a very small O(N) amount of time spent.

The most elegant solution would be some kind of simplistic decorator pattern on top of the tpie::file_stream creating a coom::node_stream. With this we can then do the following

Negation becomes a mere flag that turns on the mapping on in every read(), read_back(), and peek().
- The not operator at that point becomes an O(1) operation
One can implement a peek_back(), which might be useful.
We have one single place to open and close a file_stream, meaning noone will have to remember whether it was compressed or not.
One can implement a seek_node(node_ptr) function to abstract away quite a lot. This will on the other hand also possibly result in a considerable slowdown if used incorrectly.
One can hide that the fact the file_stream is in reverse. Then one would want a seek_root() / seek_start() method together with a seek_end(), which internally do the exact opposite thing. Similarly, read() would internally call read_back().
- This would force it to be read only, but since they are supposed to be immutable that is fine.
One can implement an index (don't know why we'd need it though)

The only problem would be that one should vary not to parse the same internal tpie::file_stream to the same algorithm twice, since that would result in the file stream going out of sync with the algorithms. But that by itself already is an issue at the current moment.

Compress Apply layer-priority queue

Currently, there are places one can optimise the appD_data priority queue within Apply.

It uses 64 bits for the entire from_1 bit-flag. Yet, all flags are already occupied for is_high (on source) or reserved for complement edges (t1 and t2). The sink-flag on t1 and t2 are also in use, so that only leaves the sink flag on source as the only place to put it.

EDIT: The from_1 flag can be removed. See #101 .
It stores from_1, data_low, and data_high for all requests forwarded across the layer. Yet, two requests to the same t1 and t2 do not need to store this same data multiple times. We could split appD_data into two priority queues: One still to pick the next request from and another in sync with it with all other requests to the same (and without duplication of data).

Enforce TPIE does not claim more space than was given

While working on #71 , I introduced by accident some Resource Limit exceeded warnings, where it then claimed more memory than was intended. For competitive comparison with other libraries, it's fine to just be sure, that this case does not occur.

We can also enforce it to crash, if this happens with tpie::resource_manager::set_enforcement(ENFORCE_THROW). The question is, if that is really of interest?

bdd_sink, bdd_ithvar, ... are not canonical

All the functions in bdd/build.h are not canonical in the sense, that their id are not from MAX_LABEL and decreasing, and they are (maybe?) not sorted based on their children.

Add handler for running out of space on disk

TPIE handles possible errors by throwing exceptions, In the case of running out of space on the disk, it throws an we internally catch the tpie::out_of_space_exception .

In this case, the user may want to do one of the following things (specified in the exec_policy)

Rethrow it as an adiar::out_of_space_exception (which is just an alias for TPIE's exception. Otherwise, we'd introduce breaking changes).
Return a null_ptr Decision Diagram (similar to CUDD) to silently propagate the error.

bdd_ite does not set canonicity in zip inputs case

For the cases of adiar::bdd_ite where the inputs have disjunct levels they are all merely zipped together. Here the canonicity is not set correctly in the case that all three BDDs are canonical (which is the usual case).

Pop ingoing arcs before pushing new recursion requests

Some of the worst-case bounds in #98 have a +2 which could be removed by popping all ingoing arcs from the priority queue before pushing the new recursion requests for the children.

Yet, removing the +2 is not really worth making the algorithms more incomprehensible; they already are hard to get your head around as of now. So, if this makes the code much worse, then one may want to just ignore it.

Remove `setup` targets

Before publication of this repository, then the setup and setup-* targets should be removed from the makefile. Instead one should ensure that all these dependencies are mentioned in README.md

Incompatible types in ternaries: 'bdd' and '__bdd'

The bdd and __bdd classes work really well to completely hide the file management, and garbage collection from both the clients and the developers point of view. The copy-constructor of the bdd class does not kick in though in ternaries.

bdd a = bdd_ithvar(0);
bdd b = bdd_nithvar(1);
bdd c = a == b ? a : bdd_xor(a,b);

Will give an "operands to ?: have different types ‘adiar::bdd’ and ‘adiar::__bdd" error.

Move is-high bit flag into node_ptr

Currently the arc-based representation uses 6 bytes while we could sacrifice one more bit in the node_ptr representation to reduce down to only 4 bytes. This will result in only a 4/3 increase in size due to the change of representation.

No sorting is by is_high first but always second; either after source or target. So, if we place it on the lowest bit on source (or target), then we incorporate sorting secondly by is_high directly within the integer comparison.

Cleanup in examples

Currently, there is a lot of code duplication between all of the examples, such as the statistics update and timing functions. This only got worse as I added these examples to the benchmarking repository.

One thing we'd need to investigate is whether we can do the following:

Create a GitHub action that clones the benchmarking repository and replaces the Adiar external dependency with the pull request branch.
See: https://github.com/actions/checkout
See: https://stackoverflow.com/questions/58033366/how-to-get-current-branch-within-github-actions

If we can do that, then I'd propose that we

Do no code cleanup to remove code duplication in example, as it will be removed when moving to bdd-benchmark.
Move all examples to the benchmarking repository
Only provide the examples in the GitHub pages documentation.

bdd_varcount function

We have a bdd_nodecount to obtain the number of nodes in the BDD, but from the Sylvan Bdd class it also looks like we may want to provide a bdd_varcount for the number of variables (i.e. the size of the meta file)

Sort unit tests into groups

Currently, each file is a set of unit tests by itself. The test output is quite hard to look around in. Together with #69 we may also want to sort the unit tests into data structures, bdd algorithms, and other

Design specialized priority queue exploiting the structure of OBDDs

Since all algorithms are layer-by-layer, the graph is acyclic, and all nodes within a layer are independent, then one may have a priority queue aware of the current layer number i and have k buckets for the layers i+1, i+2, ..., i+k. Each bucket is then sorted, when one finally reaches layer j > i. If a request is made to a layer j > i + k, then it is placed within a spillover priority queue that one merges with the buckets.

Mathias Rav mentions, that when all layers are not very wide (L = Θ(N)), then the mere TPIE priority queue is the best bet. So, we might want to be able to differentiate between "good" input OBDDs where we have k = 0 and "bad" inputs, where k > 0.

Provided all algorithms output a secondary file_stream of layer-sizes, one can make the priority queue able to dynamically infer the number of buckets needed to optimise its behaviour by looking ahead on the size of the layers?
As long as a single layer can stay in memory, then nothing is specifically gained by use of tpie::merge_sorter, but when the layer has to go to disk (the total size of all buckets exceed M), then the tpie::merge_sorter already sorts the base cases, before they go out to the disk; this will safe N/B very heavy I/Os during the sorting. For k small enough and each sorter given M/k memory this will definitely have quite a positive impact on performance - for larger k maybe also.

As mentioned by Gerth Stølting Brodal, to further optimise it we may want to manually manage the blocks used for the k buckets, as long as k blocks may exist in memory at any given point in time.

Have one block in memory for the bucket for the current layer i in memory to pop from. When it is empty, get the next block. This will then only use O(N_i/B) where N_i is the size of the i'th bucket.
Have k blocks in memory for each bucket. When the block is full, then push it out of memory to the bucket. Similar to the merge_sorter, we may already have this sorted before pushing.

I'm not sure whether there is a major difference between the above or just using the tpie::merge_sorter, but it is worth thinking about and have written down.

coom_assert gets deactivated with NDEBUG

Currently we use coom_assert to primarily check user input. We should have it split coom_assert up for internal and external usage.

Apply return early on sink-only BDD that leaves the other unchanged

Currently we already use can_left_shortcut and can_right_shortcut to prune the generated temporary result. We also shortcut on the root, but we can also inverse shortcut the root. For example, if one is computing the AND of two bdd's and one of the two BDDs is a true sink, then the result is exactly the other one. Similarly, the OR has a false sink as being irrelevant, and XOR merely negates the other input.

Move BDD related files into src/coom/bdd

Right now all source files are in the src/coom/ folder. The evaluate, apply, restrict, and quantify algorithms and more all could be moved to a bdd/ subfolder. This both increases how legible the files are to find, but also ensures we can differentiate between files for _BDD_s and _ZDD_s, should that become a Bachelor's project.

Persistable Decision Diagrams

Despite the fact that Adiar works by storing everything on disk, we currently do not allow the user to truly save/load files
on disk.

Solution

Add move member function to file<elem_t> and levelized_file<elem_t> ( Done in #397 )
Move meta information of levelized_file<elem_t> into a __levelized_file_stats<elem_t> struct that levelized_file<elem_t> publicly inherits from.
Add make_persistent member function to file<elem_t> and levelized_file<elem_t> ( Done in #397 )
- Store the __levelized_file_stats<elem_t> in the file header of the levels file, as shown in the External memory queue example.
Store (and load) the non-shareable information of the BDD, e.g. the negate flag.
Since persisted files might be reused, the above values should instead be placed in a text file. If so, we should store the path to the persisted file.
- .version = <adiar::version()> ( see #140 )
  With the version number we can warn the user for the need to migrate older files. This will be useful if we ever implement complement edges.
- .diagram_type = <bdd / zdd>
  With this we know what type of decision diagram to return (or can verify it indeed contains the requested type).
- .negate = <negation flag>
  Should be used with the constructor of the decision_diagram
- elem_size = <sizeof(node)>
  Just to double-check it (probably) matches the node type of the decision diagram.
- nodes_path = <path/to/nodes>
- levels_path = <path/to/levels>
void bdd_save(const bdd &f, std::string file_path):
Move the underlying temporary file and mark it as a persistent file instead. This way we do not duplicate the entire (possibly several GiB sized) file.
1. Use (move(prefix_path)) to place the files at the requested place.
  - If it already was persistent then use copy(file) + move(prefix_path) . Alternatively, if we want to reuse persisted files, then we can abort early and just goto step 4.
2. Set it to be persistent (make_persistent())
3. Store the non-shareable information.
bdd bdd_load(std::string file_path):
1. Make a new node_file using the non-default path-specific constructor.
2. Using the above constructor, create a node_file and then use the current constructor for a bdd that takes both a node_file and a negate boolean and return that.

These functions can be defined in adiar/bdd/io.cpp and adiar/zdd/io.cpp

Human readable error messages for the core data types

There are functions within data.h and data.cpp that we'd like the end-user to use. When they do, we'd also like to run a few sanity checks on their input and provide error messages, if they use them incorrectly. But, we don't want them to be run when within any of the COOM algorithms.

The best solution at this point seems to be a template function with a bool check and a constexpr if. This will also make the functions inlinable again, which might be of benefit.

Generalise `bdd_counter` for other comparators

Also, we should find a better name than bdd_counter. Maybe bdd_threshold... or? Or, maybe instead of exposing the templated function, we should maybe just provide a set of functions; one for each operator.

Make bdd_sink and bdd_ithvar files cached

The bdd_sink and bdd_ithvar functions will be used often. We can safe space and improve computation on them by ensuring the reuse of said files.

bdd_sink: We can have a node_file initiated on the first call; in fact, that way we can make both the true and false sinks the very same file.

See #87 for an attempt to do so.
bdd_ithvar: We could have a hash-table of the prior created ithvars.

Add `bdd_compose`

The Composition of two BDDs f and g for some label i ∊ [n] is f _{i = g} is to be interpreted as f(x₁, ..., x_i-1, g(x₁, ..., x_n), x_i+1, ..., x_n), i.e. g dictates the value of the ith input variable. This function is used in some model checking algorithms.

Based on BuDDy, the declarations in adiar/bdd.h should be

  //////////////////////////////////////////////////////////////////////////////
  /// \brief Functional composition.
  ///
  /// \param f   Outer function
  ///
  /// \param g   Inner function
  ///
  /// \param var The variable that \c g replaces in the input of \c f
  /// 
  /// \returns \f$ f|_{x_{\mathit{var}} = g} \f$
  //////////////////////////////////////////////////////////////////////////////
  __bdd bdd_compose(const bdd &f, const bdd &g, label_t var);

  inline __bdd bdd_compose(const bdd &f, const bdd &g, const bdd &var)
  { bdd_compose(f, g, min_label(v)); }

Implementation

This can be implemented with a single sweep through f and g by reusing the ideas in the Quantification and the If-Then-Else algorithms. A priority queue contains requests on triples t1, t2, t3 where t2 and t3 are nodes from f and t1 is from g. Here, similar to If-Then-Else, t2 represents the true (high) branch on variable i and t3 the false branch (low).

For level i we deal with it somewhat similar to Quantify where a single node of f is turned into two. To this end, we need to introduce the same idea as we did for the product construction etc. with the std::variant<skipto, output> and the policy::may_skip optimisation, where we use skipto if g is not on level i.

The pruning of the If-Then-Else still apply. Ideas from Quantification also somewhat applies here.

If any branches of f become NIL (due to a sink in g), then we should not differentiate between (..., t, NIL) and (..., NIL, t). As a convention, just always have the NIL in the last of the two.
If both branches are for the same node in f then set both the g and the second f node to NIL.
Unlike in Quantification we cannot sort f's targets! This is due to us having to remember which one belongs to the true and which one to the false branch.

All of this should be possible to do with the policy pattern on the algorithm in the adiar/bdd/if_then_else.cpp.

Move the main logic in adiar/bdd/if_then_else.cpp to a templated function in adiar/internal/prod3.h and turn bdd_ite into a call to prod3. The following things should stay in bdd_ite
- Collapse to the Apply operation.
- Zipping of disjunct levels case.

When this is done, then slowly add enough hooks in the policy of prod3 to vary the behaviour between the two algorithms.

More options in adiar::adiar_init

Partially in the same spirit as #15 it would be of use to us to hide initialisation of TPIE in an init function for Adiar. This function can provide one or more of the following options:

Specify the memory allowed to be used.
Implemented in #64
Specify the folder to place the temporary files.
Implemented in #93
Specify the number of bits used for the label, i.e. the varcount.
Not implemented, but increased to 24 bits in #107 . See the comment further below.