LCG : Locally-consistent grammar compression

This repository contains the implementation of LCG, a scalable text compressor that relies on the concept of locally-consistent grammars. The main features of LCG are:

  1. Efficient compression of terabytes of text

Third-party libraries

  1. xxHash
  2. CLI11

Prerequisites

  1. C++ >= 17
  2. CMake >= 3.7

The xxHash and CLI11 libraries are already included in the source files of this repository.

DISCLAIMER

This implementation is still under development. Not all the features described in the help have been tested, and some of them are only partially implemented.

For the moment, use LCG only to measure compression ratios.

Installation

Clone the repository, enter the project folder, and execute the following commands:

mkdir build
cd build
cmake ..
make

Compressing text

./lcg comp sample_file.txt

Input

Our tool currently assumes the input is a concatenated collection of one or more strings, where every string ends with the same separator symbol. The tool assumes the last symbol in the file is the separator.

For collections of ASCII characters (e.g., regular text, DNA, or protein sequences), inputs in one-string-per-line format should work just fine.
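
For a one-string-per-line file, the separator assumption can be checked by looking at the last byte. The helper below is a minimal sketch and is not part of LCG: the function name `ends_with_newline` is hypothetical, and it assumes that newline is the separator and that the file contains no NUL bytes.

```shell
# Hypothetical helper (not part of LCG): succeeds if the file is non-empty
# and its last byte is a newline, i.e. the assumed separator symbol.
ends_with_newline() {
    # Command substitution strips a trailing newline, so an empty result
    # means the last byte was '\n'. Assumes the file has no NUL bytes.
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

If the check fails, appending a single newline to the file (for example with `printf '\n' >> file`) should make the last string end with the same separator as the others.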

Long and short strings

The compression algorithm of LCG is optimized for collections whose strings do not exceed 4 GB in length. This cap is enough for most practical applications. However, if your collection contains strings longer than that, pass the flag -l/--long-strings. Note that the 4 GB cap applies to the length of each string, not to the size of the collection. For instance, a 1 TB collection does not need the -l flag as long as every string is shorter than 4 GB.
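
To decide whether the flag is needed, you can measure the longest string in a one-string-per-line file. The following is a rough sketch, not part of LCG: the function names are hypothetical, the default limit of 4294967296 bytes (2^32) is an assumption about what "4 GB" means here, and some awk implementations may struggle with extremely long lines.

```shell
# Hypothetical helper (not part of LCG): prints the length in bytes of the
# longest line, excluding the newline separator.
longest_line_bytes() {
    # LC_ALL=C makes awk count bytes instead of multibyte characters.
    LC_ALL=C awk '{ if (length($0) > max) max = length($0) }
                  END { printf "%.0f\n", max }' "$1"
}

# Hypothetical helper: succeeds if some line is longer than the limit.
# The default limit (2^32 bytes) is an assumed reading of the 4 GB cap.
exceeds_limit() {
    limit=${2:-4294967296}
    [ "$(longest_line_bytes "$1")" -gt "$limit" ]
}
```

If `exceeds_limit sample_file.txt` succeeds, the collection contains at least one string above the assumed cap, and -l/--long-strings would be the flag to pass.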

Bugs

This tool still contains experimental code, so you will probably find bugs. Please report them in this repository.

Author

This implementation was written by Diego Díaz.
