Btw, what type of data are you using?
Is it possible to upload a large dataset (1..10MB) in raw or text format (for icapp) from your data for testing?
I added 4 sample data files that I use:
https://github.com/pps83/TurboPFor/tree/sample-data/data
The data is stored in blocks of 7*390 uint32_t values, and I added simple code to read the blocks from these files:
#include <stdint.h>
typedef uint32_t datablock[7 * 390];
typedef struct file_fmt
{
uint32_t count;
datablock blocks[1]; // actual count is not 1, but indicated by the count member
} file_fmt;
static file_fmt* read_data_file(const char* fileName);
The full code is here:
https://github.com/pps83/TurboPFor/blob/sample-data/data/read-data.h
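For illustration, read_data_file could be implemented roughly as follows. This is a minimal sketch, not the code in the linked read-data.h: it assumes each file is simply a raw image of file_fmt (a uint32_t count followed by count blocks) written in the reader's byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t datablock[7 * 390];

typedef struct file_fmt
{
    uint32_t count;
    datablock blocks[1]; /* actual count is given by the count member */
} file_fmt;

/* Sketch only: assumes the file is a raw file_fmt image (count, then
 * count blocks) in host byte order. Returns NULL on error; caller frees. */
static file_fmt* read_data_file(const char* fileName)
{
    FILE* f = fopen(fileName, "rb");
    if (!f)
        return NULL;

    uint32_t count;
    if (fread(&count, sizeof count, 1, f) != 1) {
        fclose(f);
        return NULL;
    }

    /* Header plus count blocks, but never less than sizeof(file_fmt). */
    size_t size = offsetof(file_fmt, blocks) + (size_t)count * sizeof(datablock);
    if (size < sizeof(file_fmt))
        size = sizeof(file_fmt);

    file_fmt* data = malloc(size);
    if (!data) {
        fclose(f);
        return NULL;
    }
    data->count = count;

    /* Read all blocks in one call; a short read means a truncated file. */
    if (fread(data->blocks, sizeof(datablock), count, f) != count) {
        free(data);
        fclose(f);
        return NULL;
    }
    fclose(f);
    return data;
}
```

The blocks[1] trailing array is the classic "struct hack": the allocation is sized for count blocks, so data->blocks[i] is usable for every i below data->count.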
For example, file.01 contains 89 such blocks. After compressing them, I need to be able to read them in random order, so I encode each block separately. I've read somewhere in the docs (correct me if I'm wrong) that with turbop4 I can encode all the blocks at once and then use some sort of random access to "seek" within the compressed file and start decoding from the middle, decoding only the relevant block.
from turbopfor-integer-compression.
Thanks for reporting. By definition the result is undefined for a zero argument, but AFAIR gcc returns 32 for __builtin_ctz and 64 for __builtin_ctzll.
This is why I'm doing the check for Windows.
However, I'm not using this fact in TurboPFor.
This is what gcc/clang/MSVC generate for x86/x64: https://godbolt.org/z/4erge8
The result of bsf is undefined for 0. All compilers generate identical code here, so gcc does not produce a special result for 0: whatever you get comes from the instruction itself, not from the compiler.
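One way to make the zero case consistent across compilers is to test for 0 explicitly and call the intrinsic only for non-zero input. A minimal sketch; ctz32_safe and ctz64_safe are hypothetical names, not the functions in TurboPFor's conf.h:

```c
#include <stdint.h>
#if defined(_MSC_VER) && !defined(__clang__)
#include <intrin.h>
#endif

/* Count trailing zeros; defined to return 32 for x == 0 on every compiler. */
static inline unsigned ctz32_safe(uint32_t x)
{
    if (x == 0)
        return 32; /* the intrinsics below are undefined for 0 */
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctz(x);
#elif defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward(&idx, x);
    return (unsigned)idx;
#else
    unsigned n = 0; /* portable fallback */
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
#endif
}

/* Count trailing zeros; defined to return 64 for x == 0 on every compiler. */
static inline unsigned ctz64_safe(uint64_t x)
{
    if (x == 0)
        return 64;
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctzll(x);
#elif defined(_MSC_VER) && (defined(_M_X64) || defined(_M_ARM64))
    unsigned long idx;
    _BitScanForward64(&idx, x);
    return (unsigned)idx;
#else
    unsigned n = 0; /* portable fallback, also used for 32-bit MSVC */
    while ((x & 1u) == 0) { x >>= 1; n++; }
    return n;
#endif
}
```

The explicit branch costs a compare, but it turns "whatever the instruction happens to do" into a documented contract for anyone reusing the helpers.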
You're right. It is better to make these functions consistent, in case somebody else wants to use "conf.h". I'll change this in the next version. You can also make a pull request.
Could you put the "_sse" and "_avx2" files under the "vs" directory? These files are only needed by the Microsoft compiler, and I want to keep the number of files in the main directory as small as possible.
Btw, what type of data are you using?
Is it possible to upload a large dataset (1..10MB) in raw or text format (for icapp) from your data for testing?
Regarding _sse/_avx, I'll move them and make a pull request. By the way, while including .c files from another .c file I discovered a curious bug in Visual Studio's dependency tracking, and when I showed the problem to a couple of people, pretty much everybody had a strong opinion that a project that does "that" is completely broken. "That" is the way TurboPFor compiles: a single source file is compiled with different compiler flags to produce different object files.
It seems the best approach would be to keep these _sse/_avx files and rename the file that actually contains all the code to something other than .c (either .inl or simply .h).
I'll try to upload a couple of samples of the data that I test with.
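The layout being discussed writes the shared code once and builds it several times, each time under a different name suffix and, in the real build, with different compiler flags. The sketch below is a single-file stand-in that uses a macro in place of a separate .inl file, so here both copies are compiled with the same flags; sum32_sse/sum32_avx2 and the file names in the comments are hypothetical, not TurboPFor's.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the shared implementation file (e.g. a hypothetical
 * "sum_impl.inl"): the body is written once, named by SUFFIX. */
#define DEFINE_SUM32(SUFFIX)                                \
    uint64_t sum32##SUFFIX(const uint32_t *in, size_t n)    \
    {                                                       \
        uint64_t s = 0;                                     \
        for (size_t i = 0; i < n; i++)                      \
            s += in[i];                                     \
        return s;                                           \
    }

/* With separate files, each instantiation would be its own thin wrapper,
 * e.g. sum_sse.c built with -msse4.1 and sum_avx2.c built with -mavx2,
 * each defining SUFFIX and then including the shared file. */
DEFINE_SUM32(_sse)
DEFINE_SUM32(_avx2)
```

Giving the shared file a non-.c extension keeps IDE build systems from trying to compile it on its own, while the wrappers stay ordinary .c files.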
Opinions without any argumentation are subjective. Some programmers love to create a file for every few lines of code and spend most of their time opening and closing files. The project is not broken: it compiles with a simple makefile (no autoconf, no cmake, ...) on every platform (Linux, Windows, iOS, ARM aarch64) and with every compiler (gcc, mingw64, clang, Intel icc, gcc on ARM, ...).
I can manage the complexity better when the number of files (and lines) is kept to the minimum possible without sacrificing modularity.
Thank you for your valuable contribution and for taking the time posting your pull requests.
I can encode all the blocks at once and then have some sort of random access reading to "seek" within the compressed file and start decoding somewhere from the middle to decode only the relevant block.
You must use an offset array that stores the start of each 128/256-integer block in the compressed buffer. This is done in the "inverted index" demo idxcr.c. Then only one block needs to be decoded.
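The offset-array approach can be sketched as follows. To keep it self-contained, a simple varint (LEB128) codec stands in for TurboPFor's block encoder and decoder; enc_block, dec_block, enc_all and dec_one are hypothetical names. Blocks are encoded back to back, the starting byte offset of each one is recorded, and reading block b decodes only the bytes starting at offsets[b].

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in codec, NOT TurboPFor: any per-block encoder that returns the
 * number of bytes written fits the same pattern. Worst case 5 bytes/value. */
static size_t enc_block(const uint32_t *in, size_t n, unsigned char *out)
{
    unsigned char *p = out;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = in[i];
        while (v >= 0x80) {            /* 7 payload bits + continuation bit */
            *p++ = (unsigned char)(v | 0x80);
            v >>= 7;
        }
        *p++ = (unsigned char)v;
    }
    return (size_t)(p - out);
}

static void dec_block(const unsigned char *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        unsigned shift = 0;
        unsigned char byte;
        do {
            byte = *in++;
            v |= (uint32_t)(byte & 0x7f) << shift;
            shift += 7;
        } while (byte & 0x80);
        out[i] = v;
    }
}

/* Encode nblocks blocks of bs values into out (at least 5*nblocks*bs bytes)
 * and record each block's start in offsets[0..nblocks]; the last entry is
 * the total compressed size. */
static size_t enc_all(const uint32_t *in, size_t nblocks, size_t bs,
                      unsigned char *out, size_t *offsets)
{
    size_t pos = 0;
    for (size_t b = 0; b < nblocks; b++) {
        offsets[b] = pos;
        pos += enc_block(in + b * bs, bs, out + pos);
    }
    offsets[nblocks] = pos;
    return pos;
}

/* Random access: jump straight to block b and decode only that block. */
static void dec_one(const unsigned char *buf, const size_t *offsets,
                    size_t b, size_t bs, uint32_t *out)
{
    dec_block(buf + offsets[b], bs, out);
}
```

The block size is a parameter, so 7*390-value blocks like those in file.01 fit the same scheme; the offset array has to be stored alongside the compressed buffer.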
It is also possible to access a single value directly: after determining the block start from the offset array, you can access single values directly with the p4encx16/p4encx32 functions. See vp4.h and the usage in vp4.c. In this case, block decoding is skipped entirely.
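Direct access works because, in a bit-packed block where every value uses the same width b, value i starts at bit i*b and can be extracted without decoding the rest. The sketch below shows only that principle with plain fixed-width packing; it is not TurboPFor's p4encx format, which also has to handle exceptions, and bitpack32/bitget32 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack n values of b bits each (1 <= b <= 32), LSB-first, into out.
 * out must hold (n*b + 63)/64 words and be zero-initialized. */
static void bitpack32(const uint32_t *in, size_t n, unsigned b, uint64_t *out)
{
    const uint64_t mask = (1ULL << b) - 1; /* b <= 32, so the shift is valid */
    for (size_t i = 0; i < n; i++) {
        uint64_t v = in[i] & mask;
        size_t bit = i * b;
        size_t w = bit / 64;
        unsigned s = (unsigned)(bit % 64);
        out[w] |= v << s;
        if (s + b > 64)                    /* value straddles two words */
            out[w + 1] |= v >> (64 - s);
    }
}

/* Read value i directly: no other value in the block is decoded. */
static uint32_t bitget32(const uint64_t *in, size_t i, unsigned b)
{
    const uint64_t mask = (1ULL << b) - 1;
    size_t bit = i * b;
    size_t w = bit / 64;
    unsigned s = (unsigned)(bit % 64);
    uint64_t v = in[w] >> s;
    if (s + b > 64)
        v |= in[w + 1] << (64 - s);
    return (uint32_t)(v & mask);
}
```

Each lookup touches at most two 64-bit words, which is why a single value can be read without the cost of decoding the whole block.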