tudocomp / tudocomp
TU DOrtmund lossless COMPression framework
Home Page: http://tudocomp.org/
License: Apache License 2.0
The unmaintained `public` branch is currently the default branch.
However, this branch does not offer the 0-escaping needed for compressing binary files.
I therefore suggest migrating everything to the master branch, keeping the master branch as the current stable branch, and having a develop branch for more experimental work.
In `LZ78Compressor.hpp` and `LZWCompressor.hpp`, the root nodes defined via the API call `add_rootnode` seem unnecessary. Calling `add_rootnode(c)` just adds the character `c` into the LZ78 trie and gives it the LZ node ID `c`. Both compressors query these added nodes with `get_rootnode(c)`, but the call always returns `c`. So there is no need to pollute the LZ78 tries with these 'root nodes', as the compressors can take care of them by themselves.
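As a rough illustration of the idea, the following sketch keeps the single-character nodes out of the trie entirely: IDs 0..255 are reserved for the byte values and only longer phrases are stored. `ImplicitRootTrie` and `lzw_parse` are hypothetical names made up for this example and are not part of the tudocomp API.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal trie: it stores only edges created during parsing.
// Node IDs 0..255 stand for single characters and are never inserted.
class ImplicitRootTrie {
public:
    using node_t = std::uint32_t;
    static constexpr node_t first_free_id = 256;

    // Returns (child id, true) if a new child was inserted,
    // or (existing child id, false) if the edge already existed.
    std::pair<node_t, bool> find_or_insert(node_t parent, std::uint8_t c) {
        auto result = edges_.emplace(std::make_pair(parent, c), next_id_);
        if (result.second) ++next_id_;
        return {result.first->second, result.second};
    }

private:
    std::map<std::pair<node_t, std::uint8_t>, node_t> edges_;
    node_t next_id_ = first_free_id;
};

// LZW parse that emits one node ID per factor.
inline std::vector<std::uint32_t> lzw_parse(const std::string& text) {
    std::vector<std::uint32_t> out;
    if (text.empty()) return out;
    ImplicitRootTrie trie;
    // The "root node" of a character is simply its byte value.
    std::uint32_t node = static_cast<std::uint8_t>(text[0]);
    for (std::size_t i = 1; i < text.size(); ++i) {
        const auto c = static_cast<std::uint8_t>(text[i]);
        const auto found = trie.find_or_insert(node, c);
        if (found.second) {
            out.push_back(node);  // new edge: the current factor ends here
            node = c;             // restart at the implicit root node of c
        } else {
            node = found.first;   // extend the current factor
        }
    }
    out.push_back(node);
    return out;
}
```

In this sketch the compressor never asks the trie for a root node; it casts the character to a node ID itself.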
Some implementations of LZ78 tries could exploit knowing whether they build an LZ78 or an LZW trie.
However, this is not possible in the current API. Can we somehow propagate the type (LZ78 or LZW) to the LZ78 tries?
Is there a reason behind the `]]` in line 20 of /src/tudocomp_driver/CMakeLists.txt?
The closing brackets match the opening brackets in line 1, but these are commented out.
Interestingly, cmake only compiles the CMakeLists.txt if either both brackets are deleted, or both are present.
Is this a cmake bug?
Is it possible to simplify the "stats" array, which currently consists of {"key": "key", "value": "value"} dictionaries as elements, into a simple dictionary of the form {"key": "value"}?
Is there then a limitation on which characters can be used for "key"?
Motivation: It would make it easy to extract measured stats with the command line tool `jq`.
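A minimal sketch of what the flattening could look like, assuming the stats are available as a list of string pairs. `json_escape` and `stats_to_json_object` are hypothetical helpers written for this example, not existing tudocomp functions. JSON itself allows any character in a key as long as it is escaped, so the only practical limitation is on the `jq` side, where keys that are not identifier-like have to be addressed as `.["some key"]` instead of `.some_key`.

```cpp
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Escapes a string for use inside a JSON string literal.
inline std::string json_escape(const std::string& s) {
    std::string out;
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes.
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Serializes key/value pairs as one flat JSON object {"key": "value", ...}.
// Duplicate keys are not detected in this sketch.
inline std::string stats_to_json_object(
        const std::vector<std::pair<std::string, std::string>>& stats) {
    std::string out = "{";
    bool first = true;
    for (const auto& kv : stats) {
        if (!first) out += ", ";
        first = false;
        out += "\"" + json_escape(kv.first) + "\": \"" + json_escape(kv.second) + "\"";
    }
    out += "}";
    return out;
}
```

With such an object, a single stat could be read with a filter like `jq '.factor_count'`.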
I would like to have the following selectable parameters for the precomputed data structures
for all lexicographic parses (including lexparse and plcpcomp):
The current naming of C++ namespaces, header files and class names developed ad hoc and is now highly confusing.
I would like to have an extra program that can check my input text for bad characters that make my selected compression algorithms abort. For instance, some require that the text ends with the \0 byte. Others additionally require that no \0 byte appears inside the text, so that such occurrences have to be escaped.
Currently, `tdc` just aborts with a `segmentation fault` in the latter case.
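The core of such a checker could be as small as the sketch below. `NulReport` and `scan_for_nul` are hypothetical names chosen for this example; which of the two conditions counts as an error would depend on the selected compressor.

```cpp
#include <cstddef>
#include <string>

// Result of scanning an input buffer for \0-related problems.
struct NulReport {
    bool ends_with_nul = false;  // the last byte is \0
    std::size_t inner_nuls = 0;  // number of \0 bytes before the last position
};

// Scans the whole buffer once; std::string may hold embedded \0 bytes.
inline NulReport scan_for_nul(const std::string& data) {
    NulReport report;
    if (data.empty()) return report;
    report.ends_with_nul = (data.back() == '\0');
    for (std::size_t i = 0; i + 1 < data.size(); ++i) {
        if (data[i] == '\0') ++report.inner_nuls;
    }
    return report;
}
```

A front end could run this scan before compression and print a readable error instead of crashing.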
The implemented LZ77 rules are actually rules for the ESA-comp compressor.
ESA-comp uses rules of the form (src, target, num), but the LZSS variant of LZ77 uses just (src, num).
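The difference between the two rule shapes can be sketched with two structs. The type and function names are hypothetical and only mirror the tuples named above. In the LZSS case the target is not stored because it is implied by where the factor occurs in the output.

```cpp
#include <cstddef>

// LZSS-style rule: copy `num` characters starting at text position `src`.
// The target is implicit: it is the current position in the output.
struct LzssRule {
    std::size_t src;
    std::size_t num;
};

// ESA-comp-style rule: the target position is stored explicitly.
struct EsaCompRule {
    std::size_t src;
    std::size_t target;
    std::size_t num;
};

// Makes the implicit target of an LZSS rule explicit, given the output
// position (`cursor`) at which the rule is encountered.
inline EsaCompRule with_target(const LzssRule& rule, std::size_t cursor) {
    return EsaCompRule{rule.src, cursor, rule.num};
}
```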
The `stoi` call for prefix lengths fails for 2 GiB and above. Can you use `strtoul` or `strtoull` depending on `len_t`?
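A possible replacement could look like the sketch below. It assumes `len_t` is an unsigned integer type; the alias here is only a stand-in, and `parse_prefix_length` is a hypothetical helper name. `std::stoi` returns an `int`, so any value of 2^31 or more is out of range on common platforms, whereas `std::strtoull` covers the full `unsigned long long` range.

```cpp
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <stdexcept>
#include <string>

// Stand-in for tudocomp's len_t (assumption; the real typedef may differ).
using len_t = std::uint64_t;

// Parses a non-negative decimal length and rejects junk or overflow.
inline len_t parse_prefix_length(const std::string& text) {
    // strtoull would silently accept a leading '-' or whitespace, so
    // require the input to start with a digit.
    if (text.empty() || text[0] < '0' || text[0] > '9') {
        throw std::invalid_argument("prefix length must be a non-negative number");
    }
    errno = 0;
    char* end = nullptr;
    const unsigned long long value = std::strtoull(text.c_str(), &end, 10);
    if (errno == ERANGE || *end != '\0') {
        throw std::invalid_argument("invalid or too large prefix length: " + text);
    }
    // Guards against a len_t that is narrower than unsigned long long.
    if (value > std::numeric_limits<len_t>::max()) {
        throw std::out_of_range("prefix length exceeds len_t: " + text);
    }
    return static_cast<len_t>(value);
}
```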
... and after a brief look at the test source, it appears that the assertion expects a wrong value and the actual value is in fact correct. This needs investigation.
When using `./compare.py --suite ../etc/compare-suites/default.suite [filename]`, I get the following error message:
ERROR: Failed to load suite '../etc/compare-suites/default.suite'
__new__() got an unexpected keyword argument 'args'
This message does not help me in finding the error at all.
Similar to PLCP, we can store Φ succinctly, see https://dx.doi.org/http://dx.doi.org/10.17877/DE290R-20775
Since we already have rank/select data structures, reducing Φ to its succinct variant should be easy.
Change the doxygen doc generator such that the sdsl docs are no longer built in-tree, and instead link to the official ones on the main page.
The current default implementation for requesting the PLCP array uses code from `include/tudocomp/ds/providers/PhiAlgorithm.hpp`, which builds the PLCP as a plain array, although we have the succinct `include/tudocomp/ds/LCPSada.hpp`.
The SDSL, glog and boost dependencies now behave the same way:
`${FOO_include_dirs}` and similar cmake variables that `find_package()` generates.
The LZ77 and LZ78 compressors define different stat keys for the same thing ("factor_count" vs "num_factors"). Can we unify this?
`LZ77Classic` for (back_relative_pos, len, next_char) rules
`LZ77SS` for (back_relative_pos, len) rules embedded in plain text
`Esacomp` for (absolute_pos, absolute_target, len) rules
`LZ78` for (prev_dict_idx, next_char) rules.
`LZW` for (prev_dict_idx) rules.
The class `NoKGrow` has the meta name `no_kv_grow`, while the class `NoKVGrow` has the meta name `no_k_grow`.
It is not clear which class does which -> add some comments?
Issuing `./tdc --list=bi` gives
There is more than algorithm named 'bi'. Please specify using one of the following canonical IDs:
bi:lzss_bidirectional_coder
bi:lzss_bidirectional_coder
It would also be nice for `lcpcomp` to list all parameters for compression/decompression at the same time instead of outputting
There is more than algorithm named 'lcpcomp'. Please specify using one of the following canonical IDs:
lcpcomp:decompressor
lcpcomp:compressor
Can we push default parameters into the coder `bi` such that we do not have to write `bi(binary,binary,binary)`,
and use as default strategy the best one, i.e., plcpcomp?
The `Config` class is not described in the documentation, but seems essential for writing more complex compressors.