
Comments (8)

pps83 commented on July 28, 2024

Btw, what type of data are you using?
Is it possible to upload a large dataset (1..10MB) in raw or text format (for icapp) from your data for testing?

I added 4 sample files of the data that I use:
https://github.com/pps83/TurboPFor/tree/sample-data/data

Data is in 7*390 blocks of uint32_t, and I added simple code to read the blocks from these files:

typedef uint32_t datablock[7 * 390]; // one block: 7 * 390 values
struct file_fmt
{
    uint32_t count;
    datablock blocks[1]; // actual count is not 1, but indicated by count member
};

static file_fmt* read_data_file(const char* fileName);

The full code is here:
https://github.com/pps83/TurboPFor/blob/sample-data/data/read-data.h
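As a rough illustration of the layout described above, here is a minimal sketch of a loader. It is not the code from read-data.h: the name load_blocks is hypothetical, and it assumes the file is a native-endian uint32_t count followed directly by that many 7*390-value blocks.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { BLOCK_VALUES = 7 * 390 };           /* values per block, as stated above */
typedef uint32_t datablock[BLOCK_VALUES];

/* Hypothetical loader (not the one in read-data.h). Assumes a native-endian
   uint32_t count followed by `count` blocks. Returns a malloc'ed array of
   blocks and stores the count in *count_out, or returns NULL on failure. */
static datablock *load_blocks(const char *file_name, uint32_t *count_out)
{
    FILE *f = fopen(file_name, "rb");
    if (!f)
        return NULL;

    uint32_t count = 0;
    datablock *blocks = NULL;
    if (fread(&count, sizeof count, 1, f) == 1 && count > 0) {
        blocks = malloc((size_t)count * sizeof *blocks);
        if (blocks && fread(blocks, sizeof *blocks, count, f) != count) {
            free(blocks);                  /* file shorter than count claims */
            blocks = NULL;
        }
    }
    fclose(f);
    if (blocks)
        *count_out = count;
    return blocks;
}
```

The caller owns the returned array and releases it with free(); block i is then simply blocks[i].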

For example, file.01 contains 89 such blocks, and after I compress them I need to be able to read them in random order; that is, I encode each block separately. I've read somewhere in the docs (correct me if I'm wrong) that with turbop4 I can encode all the blocks at once and then have some sort of random-access reading to "seek" within the compressed file and start decoding somewhere in the middle, so that only the relevant block is decoded.

from turbopfor-integer-compression.

powturbo commented on July 28, 2024

Thanks for reporting. By definition the results for a zero argument are undefined, but AFAIR gcc returns 32 for __builtin_ctz and 64 for __builtin_ctzll.
This is why I'm making the check for Windows.
However, I'm not using this fact in TurboPFor.


pps83 commented on July 28, 2024

This is what gcc/clang/mscl generate for x86/x64: https://godbolt.org/z/4erge8
bsf is undefined for 0, so the results would match only if the opcode itself happens to do something for that input; otherwise, all compilers generate identical code.


powturbo commented on July 28, 2024

You're right. It is better to make these functions consistent, in case somebody else wants to use "conf.h". I'll change this in the next version. You can also make a pull request.
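One way such consistency could look is sketched below. The names ctz32_defined and ctz64_defined are hypothetical and are not taken from "conf.h"; the sketch simply handles zero explicitly, so that no compiler intrinsic is ever called with an argument for which its result is undefined.

```c
#include <stdint.h>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

/* Hypothetical wrappers (not the conf.h names): count trailing zeros with a
   defined result for x == 0 on every compiler. */
static inline unsigned ctz32_defined(uint32_t x)
{
    if (x == 0)
        return 32;                      /* chosen convention: the bit width */
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, x);
    return (unsigned)idx;
#else
    return (unsigned)__builtin_ctz(x);
#endif
}

static inline unsigned ctz64_defined(uint64_t x)
{
    if (x == 0)
        return 64;
#if defined(_MSC_VER) && defined(_WIN64)
    unsigned long idx;
    _BitScanForward64(&idx, x);
    return (unsigned)idx;
#elif defined(_MSC_VER)
    /* 32-bit MSVC has no _BitScanForward64: scan the two halves. */
    uint32_t lo = (uint32_t)x;
    return lo ? ctz32_defined(lo) : 32 + ctz32_defined((uint32_t)(x >> 32));
#else
    return (unsigned)__builtin_ctzll(x);
#endif
}
```

The explicit zero check costs one branch, which callers that can guarantee a non-zero argument may prefer to skip by calling the intrinsic directly.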

Is it possible to put the "_sse" and "_avx2" files under the directory "vs"? These files are only needed by the Microsoft compiler, and I want to keep the number of files in the main directory as small as possible.

Btw, what type of data are you using?
Is it possible to upload a large dataset (1..10MB) in raw or text format (for icapp) from your data for testing?


pps83 commented on July 28, 2024

Regarding _sse/_avx, I'll move them and make a pull request. By the way, by including .c files from a .c file I discovered a curious bug in Visual Studio dependency tracking, and when I showed the problem to a couple of people, pretty much everybody had a strong opinion that a project that does "that" is completely broken. "That" is the way TurboPFor does compilation, where a single source file needs to be compiled with different compilation flags to produce different obj files.
It seems the best way would be to keep these _sse/_avx files and rename the file that actually contains all the code to something other than .c (either .inl or simply .h).

I'll try to upload a couple of data samples of the data that I test with.


powturbo commented on July 28, 2024

Opinions without any argumentation are subjective. Some programmers love to create a file for every few lines of code, spending most of their time opening and closing files. The project is not broken: it compiles with a simple makefile (no autoconf, no cmake, ...) on every platform (Linux, Windows, iOS, ARM aarch64) and compiler (gcc, mingw64, clang, Intel icc, gcc on ARM, ...).
I can better manage the complexity when the number of files (and lines) is kept to a minimum without sacrificing modularity.


powturbo commented on July 28, 2024

Thank you for your valuable contribution and for taking the time to post your pull requests.


powturbo commented on July 28, 2024

I can encode all the blocks at once and then have some sort of random-access reading to "seek" within the compressed file and start decoding somewhere in the middle, so that only the relevant block is decoded.

You must use an offset array, storing the start of each 128/256 block in the compressed buffer. This is used in the "inverted index" demo idxcr.c. Only one block then needs to be decoded.

It is also possible to use the direct-access to a single value : after determining the block start from the offset array, you can directly access single values with the p4encx16/p4encx32 functions. See vp4.h and the usage in vp4.c. In this case, the block decoding is omitted.
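The offset-array idea can be sketched as follows. This is not TurboPFor code: to keep the example self-contained, a simple LEB128 varint codec stands in for the real block encoder, and the names put_block, get_block, encode_with_offsets and decode_block are hypothetical. Only the bookkeeping (one stored start position per block, then decoding a single block) is the point.

```c
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 128 };   /* values per block (128/256 in the comment above) */

/* Stand-in codec (NOT TurboPFor): LEB128 varint, 1..5 bytes per value. */
static unsigned char *put_block(const uint32_t *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i];
        while (v >= 0x80) { *out++ = (unsigned char)(v | 0x80); v >>= 7; }
        *out++ = (unsigned char)v;
    }
    return out;
}

static void get_block(const unsigned char *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        unsigned shift = 0;
        while (*in & 0x80) { v |= (uint32_t)(*in++ & 0x7f) << shift; shift += 7; }
        out[i] = v | ((uint32_t)*in++ << shift);
    }
}

/* Compress n values block by block; offsets[b] receives the byte position
   where block b starts. `out` needs up to 5 * n bytes and `offsets` needs
   (n + BLOCK - 1) / BLOCK entries. Returns the total compressed size. */
static size_t encode_with_offsets(const uint32_t *in, size_t n,
                                  unsigned char *out, size_t *offsets)
{
    unsigned char *p = out;
    for (size_t b = 0, i = 0; i < n; b++, i += BLOCK) {
        size_t len = n - i < BLOCK ? n - i : BLOCK;
        offsets[b] = (size_t)(p - out);
        p = put_block(in + i, len, p);
    }
    return (size_t)(p - out);
}

/* Random access: decode only block b of a stream holding n values in total.
   Returns the number of values written to `out` (BLOCK, or fewer for the
   last block). */
static size_t decode_block(const unsigned char *buf, const size_t *offsets,
                           size_t n, size_t b, uint32_t *out)
{
    size_t first = b * BLOCK;
    size_t len = n - first < BLOCK ? n - first : BLOCK;
    get_block(buf + offsets[b], len, out);
    return len;
}
```

With a real codec, the calls to put_block and get_block would be replaced by the library's block encode and decode functions; the offset array itself can be stored alongside the compressed buffer.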

