powturbo / turbopfor-integer-compression
License: GNU General Public License v2.0
Fastest Integer Compression
@powturbo Can you release a binary file for Windows for us to download?
TurboPFor: IcApp Integer Compression Benchmark - Skylake i7-6700 3.4GHz
docs: Document Ids
./icapp test_collection.docs
file: max bits histogram:
00: 0.002%
01:## 1.7%
02:# 0.7%
03: 0.4%
04: 0.2%
05: 0.2%
06:# 0.5%
07:# 1.4%
08:## 1.6%
09:#### 3.6%
10:###### 6.4%
11:######## 8.1%
12:################# 17%
13:############################################# 45%
14:############# 13%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
1038.82  3017800  21.93%  2661.63  vszenc32  TurboVSimple zigzag
1139.82  3147043  22.87%  4950.83  p4nzenc128v32  TurboPForV zigzag
1076.60  3147043  22.87%  3642.05  p4nzenc32  TurboPFor zigzag
964.09  3186374  23.15%  2922.15  p4nzzenc128v32  TurboPFor zzag/delta
887.67  3190622  23.18%  1954.18  bitshuffleZ+lz  Transpose+zigzag+lz
1088.78  3228329  23.46%  2693.93  tpnibbleZ+lz  Transpose+zigzag+lz
1122.80  3239377  23.54%  5056.32  p4nd1enc128v32  TurboPForV delta1
1075.09  3239377  23.54%  3680.03  p4nd1enc32  TurboPFor delta1
1331.20  3246518  23.59%  6489.07  p4nzenc256v32  TurboPFor256 zigzag
1034.37  3277695  23.81%  2621.08  tpbyteZ+lz  Transpose+zigzag+lz
1135.96  3357179  24.39%  5334.62  p4ndenc128v32  TurboPForV delta
1075.59  3357179  24.39%  3644.94  p4ndenc32  TurboPFor delta
1305.82  3423417  24.87%  6158.08  p4nd1enc256v32  TurboPFor256 delta1
1346.84  3515568  25.54%  1550.80  vbddenc32  TurboVByte zzag delt
1311.54  3528896  25.64%  6330.87  p4ndenc256v32  TurboPFor256 delta
920.32  3531916  25.66%  1058.47  bvzzenc32  bitio zigzag/delta
1094.85  3673776  26.69%  3244.53  tpnibble+lz  Transpose+lz
1079.98  3676707  26.71%  3193.34  tpbyte+lz  Transpose+lz
981.41  3701682  26.90%  2531.42  tpnibbleX+lz  Transpose+xor+lz
983.37  3738695  27.16%  2506.98  tpbyteX+lz  Transpose+xor+lz
8699.94  3925193  28.52%  11144.38  v8nd1enc128v32  TByte+TPackV delta1
9571.15  3945921  28.67%  11644.09  v8ndenc128v32  TByte+TPackV delta
2644.76  3948726  28.69%  2783.84  vbzenc32  TurboVByte zigzag
7443.65  4014799  29.17%  9577.81  v8nzenc128v32  TByte+TPackV zigzag
11010.65  4173952  30.33%  12030.87  v8ndenc256v32  TByte+TPackV delta
10286.48  4161012  30.23%  11723.43  v8nd1enc256v32  TByte+TPackV delta1
3054.44  4180490  30.37%  3489.68  vbd1enc32  TurboVByte delta1
3230.82  4181163  30.38%  3716.80  vbdenc32  TurboVByte delta
8755.29  4215033  30.63%  10769.41  v8nzenc256v32  TByte+TPackV zigzag
867.36  4228629  30.72%  1893.95  bitshuffleX+lz  Transpose+xor+lz
970.41  4260129  30.95%  2366.86  bitshuffle+lz  Transpose+lz
1062.80  4329918  31.46%  1271.79  bvzenc32  bitio zigzag
8738.61  4406850  32.02%  10031.57  bitnzpack128v32  TurboPackV zigzag
4255.82  4406850  32.02%  5017.61  bitnzpack32  TurboPack zigzag
11063.76  4724718  34.33%  10735.81  v8zenc32  TurboByte zigzag
3235.38  4724718  34.33%  4183.38  streamvbyte zzag  StreamVByte zigzag
11926.61  4819041  35.01%  11644.09  bitnzpack256v32  TurboPack256 zigzag
12041.39  4889190  35.52%  12223.19  v8d1enc32  TurboByte delta1
12661.74  4889479  35.53%  12850.90  v8denc32  TurboByte delta
10490.33  4889479  35.53%  14101.75  streamvbyte delt  StreamVByte delta
385.16  5050551  36.70%  423.20  SPDP  SPDP Floating Point
1931.69  5265005  38.25%  8086.55  fpxenc32  TurboFloat XOR
1350.93  5265005  38.25%  1524.01  fpfcmenc32  TurboFloat FCM
997.49  5310026  38.58%  1184.65  fpgenc32  bitio TurboGorilla
1657.63  5552401  40.34%  11926.61  p4nenc128v32  TurboPForV
1578.90  5552401  40.34%  7583.09  p4nenc32  TurboPFor
1883.32  5626136  40.88%  7616.66  FastPFor  FastPFor
45.75  5626576  40.88%  4884.07  SimdOptPFor  FastPFor SIMD
2247.07  5626832  40.88%  13818.59  SimdFastPFor  FastPFor SIMD
1903.38  5638279  40.97%  13045.79  p4nenc256v32  TurboPFor256
8291.15  5653941  41.08%  15343.71  v8nenc128v32  TurboByte+TbPackV
18400.15  5661003  41.13%  15275.60  bitnpack128v32  TurboPackV
10458.44  5661003  41.13%  10263.47  bitnpack32  TurboPack
9331.06  5745570  41.75%  15429.72  v8nenc256v32  TurboByte+TbPackV
19115.71  5748842  41.77%  15464.40  bitnpack256v32  TurboPack256
2062.54  6226583  45.24%  6671.50  vsenc32  TurboVSimple
777.85  6577162  47.79%  1149.34  fpdfcmenc32  TurboFloat DFCM
784.99  6658332  48.38%  1150.49  fp2dfcmenc32  TurboFloat DFCM 2D
2628.09  6682141  48.55%  4124.46  vbenc32  TurboVByte scalar
3850.95  6705452  48.72%  6507.48  maskeydvbyte  MasedVByte SIMD
10949.33  6790511  49.34%  11527.06  bitnd1pack128v32  TurboPackV delta1
6674.74  6790511  49.34%  8212.00  bitnd1pack32  TurboPack delta1
12169.15  6816223  49.52%  11854.70  bitndpack128v32  TurboPackV delta
7004.23  6816223  49.52%  9001.51  bitndpack32  TurboPack delta
11498.17  7511530  54.58%  14442.09  v8enc32  TurboByte SIMD
10769.41  7511530  54.58%  14321.86  streamvbyte  StreamVByte SIMD
13681.22  8062094  58.58%  10710.75  bitnd1pack256v32  TurboPack256 delta1
14262.50  8076334  58.68%  10854.35  bitndpack256v32  TurboPack256 delta
475.40  9044107  65.71%  2907.33  lz  lz
1427.58  10833105  78.71%  5124.09  srlez32  TurboRLE32 ESC zzag
1681.94  13712109  99.63%  5869.22  srlex32  TurboRLE32 ESC xor
1869.00  13763312  100.00%  17532.88  srle32  TurboRLE32 ESC
298.29  13763312  100.00%  17532.88  trle  TurboRLE
216.18  13763312  100.00%  17532.88  trlex  TurboRLE xor
200.00  13763312  100.00%  17532.88  trlez  TurboRLE zigzag
17334.15  13763312  100.00%  17510.58  memcpy  memcpy
9151.14  13763312  100.00%  11595.04  tpenc  Byte transpose
8891.03  13763312  100.00%  11054.87  tp4enc  Nibble transpose
4135.61  13763312  100.00%  4112.13  bitshuffle  Bit transpose
* : external library
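The "zigzag", "delta" and "delta1" labels in the table name transforms applied to the integers before they are packed. As a rough illustration only, the sketch below shows the textbook form of these transforms. The function names are hypothetical, and "delta1" is assumed here to mean a delta with an extra 1 subtracted (suited to strictly increasing lists); this is not TurboPFor's own code, which may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zigzag maps signed values to unsigned ones so that small magnitudes
   stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ... */
static uint32_t zigzag_enc32(int32_t x) {
    uint32_t u = (uint32_t)x;
    return (u << 1) ^ (0u - (u >> 31));   /* all-ones mask when x is negative */
}

static int32_t zigzag_dec32(uint32_t u) {
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Delta coding stores the gap to the previous value.
   step = 0 gives a plain delta, step = 1 the assumed "delta1" variant.
   Unsigned wrap-around keeps the transform reversible for any input. */
static void delta_enc32(const uint32_t *in, size_t n, uint32_t start,
                        uint32_t step, uint32_t *out) {
    assert(step <= 1);
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev - step;
        prev = in[i];
    }
}

static void delta_dec32(const uint32_t *in, size_t n, uint32_t start,
                        uint32_t step, uint32_t *out) {
    assert(step <= 1);
    uint32_t prev = start;
    for (size_t i = 0; i < n; i++) {
        prev = prev + in[i] + step;
        out[i] = prev;
    }
}
```

Sorted document ids turn into small gaps under delta coding, which is one plausible reason the delta and zigzag variants rank near the top for the docs file while plain packing does not.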
freqs: Term Frequencies
./icapp test_collection.freqs
file: max bits histogram:
00: 0.001%
01:########################### 27%
02:############ 12%
03:###### 6.0%
04:### 3.1%
05:## 1.6%
06:# 1.0%
07:# 1.0%
08:# 0.9%
09:## 1.9%
10:### 3.2%
11:#### 4.1%
12:######## 8.5%
13:####################### 23%
14:###### 6.5%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
91.61  1692992  12.30%  4615.46  SimdOptPFor  FastPFor SIMD
1204.45  1722849  12.52%  4061.17  vsenc32  TurboVSimple
1402.56  1730336  12.57%  8397.38  p4nenc128v32  TurboPForV
1339.76  1730336  12.57%  4979.49  p4nenc32  TurboPFor
1343.42  1778796  12.92%  6106.17  FastPFor  FastPFor
1447.85  1779168  12.93%  12169.15  SimdFastPFor  FastPFor SIMD
1662.23  1780534  12.94%  12603.76  p4nenc256v32  TurboPFor256
1701.06  1907713  13.86%  3908.92  tpnibble+lz  Transpose+lz
1084.24  1922544  13.97%  2792.88  vszenc32  TurboVSimple zigzag
1435.32  1955549  14.21%  2904.88  tpnibbleX+lz  Transpose+xor+lz
1023.67  1958057  14.23%  2334.74  bitshuffleX+lz  Transpose+xor+lz
1194.83  1962095  14.26%  3022.91  bitshuffle+lz  Transpose+lz
1185.37  1996059  14.50%  5303.78  p4nzenc128v32  TurboPForV zigzag
1126.29  1996059  14.50%  3584.19  p4nzenc32  TurboPFor zigzag
1364.32  2087485  15.17%  7105.47  p4nzenc256v32  TurboPFor256 zigzag
1081.60  2130744  15.48%  4695.77  fpxenc32  TurboFloat XOR
877.76  2130744  15.48%  1330.30  fpfcmenc32  TurboFloat FCM
1326.33  2145784  15.59%  2916.57  tpnibbleZ+lz  Transpose+zigzag+lz
1293.42  2219153  16.12%  3770.77  tpbyte+lz  Transpose+lz
1015.97  2259515  16.42%  1055.63  bvzenc32  bitio zigzag
1003.67  2300817  16.72%  2364.83  bitshuffleZ+lz  Transpose+zigzag+lz
1128.69  2381528  17.30%  2856.05  tpbyteX+lz  Transpose+xor+lz
1021.17  2398296  17.43%  2984.24  p4nzzenc128v32  TurboPFor zzag/delta
1137.28  2407925  17.50%  2936.48  tpbyteZ+lz  Transpose+zigzag+lz
1004.84  2764046  20.08%  1203.82  bvzzenc32  bitio zigzag/delta
16483.00  2784387  20.23%  17312.33  v8nenc128v32  TurboByte+TbPackV
17488.32  2804702  20.38%  17118.54  bitnpack128v32  TurboPackV
9268.22  2804702  20.38%  10522.40  bitnpack32  TurboPack
1670.91  2883341  20.95%  1931.15  vbddenc32  TurboVByte zazg delt
1110.66  3029119  22.01%  1114.80  fpgenc32  bitio TurboGorilla
8271.22  3146929  22.86%  10202.60  v8nzenc128v32  TByte+TPackV zigzag
20948.71  3150687  22.89%  16764.07  v8nenc256v32  TurboByte+TbPackV
22824.72  3189005  23.17%  16764.07  bitnpack256v32  TurboPack256
8591.33  3189535  23.17%  10340.57  bitnzpack128v32  TurboPackV zigzag
4135.61  3189535  23.17%  5150.94  bitnzpack32  TurboPack zigzag
11546.40  3451880  25.08%  11365.24  vbenc32  TurboVByte scalar
8661.61  3457754  25.12%  16846.15  maskeydvbyte  MasedVByte SIMD
5340.82  3475286  25.25%  5265.23  vbzenc32  TurboVByte zigzag
10611.65  3502875  25.45%  12959.80  v8nzenc256v32  TByte+TPackV zigzag
11733.42  3589870  26.08%  13349.47  bitnzpack256v32  TurboPack256 zigzag
810.18  3925934  28.52%  3365.11  lz  lz
14087.31  4307847  31.30%  16562.34  v8enc32  TurboByte SIMD
11517.41  4307847  31.30%  16825.56  streamvbyte  StreamVByte SIMD
11099.44  4323652  31.41%  10744.19  v8zenc32  TurboByte zigzag
4017.31  4323652  31.41%  4233.56  streamvbyte zzag  StreamVByte zigzag
1048.79  5742202  41.72%  5785.33  p4ndenc128v32  TurboPForV delta
1001.91  5742202  41.72%  3535.40  p4ndenc32  TurboPFor delta
1258.30  5773355  41.95%  7098.15  p4ndenc256v32  TurboPFor256 delta
11793.75  7227089  52.51%  12432.98  v8denc32  TurboByte delta
10271.12  7227089  52.51%  12062.49  streamvbyte delt  StreamVByte delta
8705.44  7231250  52.54%  11703.49  v8ndenc256v32  TByte+TPackV delta
8077.06  7234307  52.56%  10949.33  v8ndenc128v32  TByte+TPackV delta
1408.88  7343213  53.35%  1508.14  vbdenc32  TurboVByte delta
1277.10  10319134  74.98%  5386.81  srle32  TurboRLE32 ESC
735.30  10350497  75.20%  1170.55  fpdfcmenc32  TurboFloat DFCM
731.12  10690707  77.68%  1161.46  fp2dfcmenc32  TurboFloat DFCM 2D
1124.82  10963622  79.66%  4061.17  srlex32  TurboRLE32 ESC xor
853.54  10981857  79.79%  4564.94  p4nd1enc128v32  TurboPForV delta1
840.30  10981857  79.79%  3156.72  p4nd1enc32  TurboPFor delta1
1024.06  11014587  80.03%  5551.96  p4nd1enc256v32  TurboPFor256 delta1
1348.15  11113500  80.75%  4623.21  srlez32  TurboRLE32 ESC zzag
169.10  11621965  84.44%  191.26  SPDP  SPDP Floating Point
7451.71  11667495  84.77%  9261.98  v8nd1enc128v32  TByte+TPackV delta1
8029.93  11691692  84.95%  9712.99  v8nd1enc256v32  TByte+TPackV delta1
10016.96  11716644  85.13%  10434.65  v8d1enc32  TurboByte delta1
1137.46  13329244  96.85%  1150.97  vbd1enc32  TurboVByte delta1
11614.60  13727224  99.74%  10458.44  bitndpack128v32  TurboPackV delta
8464.52  13727224  99.74%  9054.81  bitndpack32  TurboPack delta
11527.06  13747047  99.88%  9465.82  bitndpack256v32  TurboPack256 delta
16542.43  13763304  100.00%  16116.28  memcpy  memcpy
412.58  13763304  100.00%  16116.28  trle  TurboRLE
246.88  13763304  100.00%  16116.28  trlez  TurboRLE zigzag
361.16  13763304  100.00%  16097.43  trlex  TurboRLE xor
10719.08  13763304  100.00%  11663.82  tpenc  Byte transpose
8931.41  13763304  100.00%  11001.84  tp4enc  Nibble transpose
4247.93  13763304  100.00%  5147.08  bitshuffle  Bit transpose
11693.55  13776743  100.10%  8954.65  bitnd1pack256v32  TurboPack256 delta1
11374.63  13790184  100.20%  10450.50  bitnd1pack128v32  TurboPackV delta1
8251.38  13790184  100.20%  8925.62  bitnd1pack32  TurboPack delta1
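The "ratio" column appears to be the compressed size as a percentage of the uncompressed size (the memcpy row, 13763304 bytes for this file), so values above 100% mean the codec expanded the data. A minimal sketch of that arithmetic, with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Compressed size as a percentage of the original size.
   Values above 100.0 indicate expansion rather than compression. */
static double ratio_percent(size_t compressed_bytes, size_t original_bytes) {
    assert(original_bytes > 0);
    return 100.0 * (double)compressed_bytes / (double)original_bytes;
}
```

For example, `ratio_percent(1692992, 13763304)` reproduces the 12.30% of the first row, and `ratio_percent(13790184, 13763304)` the 100.20% of the last one.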
Synthetic data: zipfian distribution
./icapp -a1.5 -m0 -M255 -n100M ZIPF
bits histogram:
00:######################################## 40%
01:############## 14%
02:############# 13%
03:########## 10%
04:######## 7.8%
05:###### 5.7%
06:#### 4.1%
07:### 2.9%
08:## 2.1%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
2368.80  62939886  15.73%  10950.80  p4nenc256v32  TurboPFor256
1327.02  63392759  15.85%  7702.23  p4nenc128v32  TurboPForV
1321.43  63392759  15.85%  4556.85  p4nenc32  TurboPFor
66.39  65060504  16.27%  3077.49  SimdOptPFor  FastPFor SIMD
606.96  73459928  18.36%  5349.24  FastPFor  FastPFor
631.90  73469416  18.37%  10197.58  SimdFastPFor  FastPFor SIMD
1010.83  76345141  19.09%  2866.89  vsenc32  TurboVSimple
1745.84  79163645  19.79%  3332.50  tpnibble+lz  Transpose+lz
1464.36  80509600  20.13%  2494.28  tpnibbleX+lz  Transpose+xor+lz
1097.09  82870974  20.72%  2541.51  bitshuffle+lz  Transpose+lz
975.88  83384370  20.85%  2073.53  bitshuffleX+lz  Transpose+xor+lz
1321.55  85243365  21.31%  6370.95  p4nzenc256v32  TurboPFor256 zigzag
1178.72  85546946  21.39%  4991.20  p4nzenc128v32  TurboPForV zigzag
1104.59  85546946  21.39%  3735.66  p4nzenc32  TurboPFor zigzag
1334.30  88589767  22.15%  2936.28  tpbyte+lz  Transpose+lz
902.66  89283071  22.32%  2240.92  vszenc32  TurboVSimple zigzag
1010.73  94657651  23.66%  4570.28  fpxenc32  TurboFloat XOR
823.29  94657651  23.66%  1348.86  fpfcmenc32  TurboFloat FCM
1284.95  95923933  23.98%  2332.20  tpbyteX+lz  Transpose+xor+lz
1376.65  97552451  24.39%  2321.01  tpnibbleZ+lz  Transpose+zigzag+lz
1011.41  98127316  24.53%  2129.66  bitshuffleZ+lz  Transpose+zigzag+lz
888.50  98371167  24.59%  926.12  bvzenc32  bitio zigzag
17298.05  99910930  24.98%  12408.10  v8nenc128v32  TurboByte+TbPackV
17356.59  99910930  24.98%  12362.47  bitnpack128v32  TurboPackV
11693.51  99910930  24.98%  10137.62  bitnpack32  TurboPack
17057.57  100332929  25.08%  11994.72  v8nenc256v32  TurboByte+TbPackV
17076.50  100332929  25.08%  11170.06  bitnpack256v32  TurboPack256
1013.86  100628462  25.16%  3083.11  p4nzzenc128v32  TurboPFor zzag/delta
11191.32  101015650  25.25%  10332.45  vbenc32  TurboVByte scalar
6689.75  102074663  25.52%  9524.04  maskeydvbyte  MasedVByte SIMD
1161.18  103144618  25.79%  1263.64  fpgenc32  bitio TurboGorilla
1164.71  105990559  26.50%  2202.97  tpbyteZ+lz  Transpose+zigzag+lz
3857.24  106284616  26.57%  4009.10  vbzenc32  TurboVByte zigzag
9820.77  112368050  28.09%  10469.01  bitnzpack128v32  TurboPackV zigzag
9686.17  112368050  28.09%  10426.44  v8nzenc128v32  TByte+TPackV zigzag
4189.84  112368050  28.09%  5317.24  bitnzpack32  TurboPack zigzag
12560.05  112825409  28.21%  11054.92  v8nzenc256v32  TByte+TPackV zigzag
12680.30  112825409  28.21%  11036.61  bitnzpack256v32  TurboPack256 zigzag
1650.88  116367689  29.09%  1882.50  vbddenc32  TurboVByte zazg delt
832.48  119294130  29.82%  973.89  bvzzenc32  bitio zigzag/delta
13107.45  125000000  31.25%  12375.09  v8enc32  TurboByte SIMD
11186.62  125000000  31.25%  12123.78  streamvbyte  StreamVByte SIMD
10956.20  128705458  32.18%  10696.90  v8zenc32  TurboByte zigzag
3208.47  128705458  32.18%  3850.37  streamvbyte zzag  StreamVByte zigzag
656.49  140353625  35.09%  2416.76  lz  lz
1002.48  231440944  57.86%  5208.47  p4ndenc128v32  TurboPForV delta
959.54  231440944  57.86%  3770.10  p4ndenc32  TurboPFor delta
1180.67  231486347  57.87%  5985.96  p4ndenc256v32  TurboPFor256 delta
488.03  239203583  59.80%  2701.28  trle  TurboRLE
10108.16  245851337  61.46%  10481.08  v8denc32  TurboByte delta
9441.09  245851337  61.46%  10096.17  streamvbyte delt  StreamVByte delta
8420.70  246241959  61.56%  10398.52  v8ndenc256v32  TByte+TPackV delta
7895.00  246632587  61.66%  9510.45  v8ndenc128v32  TByte+TPackV delta
1142.53  262025557  65.51%  1198.36  vbdenc32  TurboVByte delta
436.08  263233919  65.81%  1532.21  trlex  TurboRLE xor
315.02  263233919  65.81%  1053.77  trlez  TurboRLE zigzag
1085.73  291716271  72.93%  5591.59  p4nd1enc256v32  TurboPFor256 delta1
918.79  291995716  73.00%  4686.75  p4nd1enc128v32  TurboPForV delta1
897.86  291995716  73.00%  3595.60  p4nd1enc32  TurboPFor delta1
9353.88  304132355  76.03%  9687.11  v8d1enc32  TurboByte delta1
7881.46  304522974  76.13%  8913.05  v8nd1enc256v32  TByte+TPackV delta1
7441.72  304913602  76.23%  8818.73  v8nd1enc128v32  TByte+TPackV delta1
1026.88  339718007  84.93%  1063.92  vbd1enc32  TurboVByte delta1
168.67  374829697  93.71%  189.64  SPDP  SPDP Floating Point
1374.41  384560608  96.14%  9134.30  srle32  TurboRLE32 ESC
709.62  387715234  96.93%  1186.37  fpdfcmenc32  TurboFloat DFCM
710.10  388257392  97.06%  1175.15  fp2dfcmenc32  TurboFloat DFCM 2D
1174.42  392073698  98.02%  5061.88  srlex32  TurboRLE32 ESC xor
1433.31  393849411  98.46%  5286.18  srlez32  TurboRLE32 ESC zzag
14002.66  400000000  100.00%  14043.46  memcpy  memcpy
3876.57  400000000  100.00%  3789.60  bitshuffle  Bit transpose
8518.79  400000000  100.00%  9106.43  tpenc  Byte transpose
8304.44  400000000  100.00%  9598.08  tp4enc  Nibble transpose
9278.16  400390622  100.10%  7273.65  bitnd1pack256v32  TurboPack256 delta1
9303.19  400390622  100.10%  7575.18  bitndpack256v32  TurboPack256 delta
7950.55  400781247  100.20%  7644.97  bitnd1pack32  TurboPack delta1
8069.89  400781247  100.20%  7650.09  bitndpack32  TurboPack delta
9284.62  400781247  100.20%  8515.17  bitnd1pack128v32  TurboPackV delta1
9294.98  400781247  100.20%  8564.58  bitndpack128v32  TurboPackV delta
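The bits histogram printed before this table looks like a count of values grouped by the number of bits each one needs (bucket 00 holding the zeros, bucket 08 the values 128 to 255). Assuming that reading is right, a histogram of this kind can be sketched as follows; the function names are hypothetical and this is not icapp's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Number of bits needed to represent v: 0 for 0, 1 for 1, 2 for 2..3, ... */
static unsigned bit_width32(uint32_t v) {
    unsigned bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Fill hist[b] with the number of input values that need exactly b bits.
   hist must have room for 33 counters (widths 0 through 32). */
static void bits_histogram32(const uint32_t *in, size_t n, size_t hist[33]) {
    assert(in != NULL || n == 0);
    memset(hist, 0, 33 * sizeof hist[0]);
    for (size_t i = 0; i < n; i++)
        hist[bit_width32(in[i])]++;
}
```

Dividing each counter by `n` gives the percentages shown next to the bars.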
I use p4ndec256v32 and p4nzdec256v32 in my project, and I'm not sure whether they require any data alignment for the input and output buffers, or whether there are any padding requirements. For example, can these functions read past the end of the buffer (and if so, by how much)? Do the trailing bytes have to be zero-filled, or can they contain arbitrary data?
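Until the question is answered, one conservative option is to give every buffer more than it could plausibly need: 32-byte alignment (the width of an AVX2 register), 32 spare elements at the end (the slack used by the hello-world snippet in this thread), and zeroed memory. All three choices are assumptions made for safety, not documented requirements of TurboPFor, and `alloc_padded_u32` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { BUF_ALIGN = 32, PAD_ELEMS = 32 };  /* assumed, not documented, values */

/* Allocate room for n uint32_t values plus PAD_ELEMS spare elements,
   aligned to BUF_ALIGN bytes and zero-filled. Release with free(). */
static uint32_t *alloc_padded_u32(size_t n) {
    assert(n <= SIZE_MAX / sizeof(uint32_t) - PAD_ELEMS - BUF_ALIGN);
    size_t bytes = (n + PAD_ELEMS) * sizeof(uint32_t);
    /* C11 aligned_alloc expects the size to be a multiple of the alignment */
    bytes = (bytes + BUF_ALIGN - 1) / BUF_ALIGN * BUF_ALIGN;
    uint32_t *p = aligned_alloc(BUF_ALIGN, bytes);
    if (p != NULL)
        memset(p, 0, bytes);
    return p;
}
```

`aligned_alloc` is C11; on toolchains without it, an over-sized `calloc` gives the padding and zero-fill but not the alignment.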
Hi,
We want to use TurboPFor for our large scale applications.
We already use turboRLE which is very nice, thank you for your work.
We also want to use the p4enc* functions, but unfortunately we find them hard to compile.
Our code is (as expected) very similar to the "hello world" you provided:
#include <stdio.h>
#include <stdint.h>   /* uint32_t */
#include <stdlib.h>   /* malloc, free */
#define NTURBOPFOR_DAC
#include "vp4.h"
#define P4NENC_BOUND(n) ((n+127)/128+(n+32)*sizeof(uint32_t))
int main(int argc, char* argv[]) {
  printf("Hello TurboPFor\n");
  uint32_t ar[32] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  unsigned elnum = 10;
  /* output buffer sized with the bound macro; decode buffer padded by 32 elements */
  unsigned char* compress_buf = malloc(P4NENC_BOUND(elnum));
  uint32_t *uncompress_buf = malloc((elnum+32)*sizeof(ar[0]));
  if (!compress_buf || !uncompress_buf) return 1;
  size_t compress_size = p4nenc32(ar, elnum, compress_buf);
  printf("compress size is %zu\n", compress_size);
  size_t uncompress_size = p4ndec32(compress_buf, elnum, uncompress_buf);
  printf("uncompress size is %zu\n", uncompress_size);
  free(compress_buf);
  free(uncompress_buf);
  return 0;
}
Could you provide a corresponding minimal makefile that we could adapt? The repository makefile is quite dense and complex.
Thank you for your time.
Hi, I am very interested in your PFor and RLE repos. What is the licence of PFor and RLE, and may I use them in an open source project?
Just to make sure it's tracked properly: decoding speed dropped by more than 10% for p4nzdec256v32 (my personal project seems to get its best results with this function).
9dad490 doesn't have the problem, 5dff9d3f has the problem.
The entire test is clearly almost 10% slower, and p4nzdec256v32 consumes perhaps 25% of the CPU time of the entire test.
Times of my tests with these TurboPFor versions:
9dad490 - 16.605sec ~MyAlgo
profiler data:
p4nzdec256v32 roughly takes 3.560sec (21.44% of 16.605sec)
5dff9d3 - 17.636sec ~MyAlgo
profiler data:
p4nzdec256v32 roughly takes 4.213sec (23.89% of 17.636sec)
These tests were performed with absolutely identical code; only the TurboPFor version changed.
Also, the test uses 3 threads (according to the debugger), so the actual performance hit in p4nzdec256v32 seems to be well over 10% and might easily be in the range of 20%-30%. To be clear, the times were measured without the profiler, on fully optimized 64-bit builds of TurboPFor.
Unfortunately, due to the way TurboPFor is managed, it's in a permanently broken state and I wasn't able to bisect to find the offending commit: if I try to check out anything in between, it's always broken and fails to build with all kinds of compilation errors. I'd strongly recommend adjusting your approach: it's not suitable for projects that are worked on and used by more than one person.
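For reference, the slowdown can be recomputed directly from the timings posted above. The sketch below uses two hypothetical helpers; the inputs are only the four figures from the report (16.605 s and 17.636 s for the whole test, 3.560 s and 4.213 s for p4nzdec256v32).

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}

/* Share of `part` in `total`, in percent. */
static double share_percent(double part, double total) {
    assert(total > 0.0);
    return part / total * 100.0;
}
```

With the posted numbers, `percent_change(16.605, 17.636)` is about 6.2% for the whole test and `percent_change(3.560, 4.213)` is about 18.3% for p4nzdec256v32 alone, while `share_percent` reproduces the 21.44% and 23.89% profiler shares.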
Is there any plan to support direct access for delta compression?
Is there any guidance on estimating the maximum output buffer size required for N values of type T?
When the bitpack algorithm runs on a big-endian machine, the decompressed output is incorrect. Does the algorithm account for big-endian byte order?
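The only bound that appears in this thread is the P4NENC_BOUND macro from the hello-world snippet, written for 32-bit values. The sketch below restates it as a function and generalizes the element size; the generalization to other widths is an assumption made for illustration, and `enc_bound_bytes` is a hypothetical name, not part of the library.

```c
#include <assert.h>
#include <stddef.h>

/* Output buffer size in bytes for n values of elem_size bytes each,
   following the pattern (n+127)/128 + (n+32)*sizeof(T) from the
   P4NENC_BOUND macro quoted in this thread. */
static size_t enc_bound_bytes(size_t n, size_t elem_size) {
    assert(elem_size == 1 || elem_size == 2 || elem_size == 4 || elem_size == 8);
    return (n + 127) / 128 + (n + 32) * elem_size;
}
```

For the ten 32-bit values of the hello-world example this gives 169 bytes: one byte per block of 128 values plus room for 42 elements.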
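A common source of this kind of failure is code that writes multi-byte words to the compressed stream through pointer casts, which stores them in host byte order. Whether that is what happens in the bitpack code is not established here; the sketch below only shows the usual byte-order-independent alternative, with hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store v as 4 little-endian bytes, regardless of host byte order. */
static void store_le32(unsigned char *p, uint32_t v) {
    assert(p != NULL);
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)((v >> 8) & 0xFF);
    p[2] = (unsigned char)((v >> 16) & 0xFF);
    p[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Load 4 little-endian bytes into a host-order value. */
static uint32_t load_le32(const unsigned char *p) {
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 on a big-endian host, 0 on a little-endian one. */
static int host_is_big_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 0;
}
```

Data written with `store_le32` on one machine reads back correctly with `load_le32` on any other, which is the property a portable compressed format needs.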
TurboPFor: IcApp Integer Compression Benchmark ARM A73-ODROID-N2 1.8GHz
docs: Document Ids
./icapp test_collection.docs
file: max bits histogram:
00: 0.002%
01:## 1.7%
02:# 0.7%
03: 0.4%
04: 0.2%
05: 0.2%
06:# 0.5%
07:# 1.4%
08:## 1.6%
09:#### 3.6%
10:###### 6.4%
11:######## 8.1%
12:################# 17%
13:############################################# 45%
14:############# 13%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
363.95  3017800  21.93%  860.74  vszenc32  TurboVSimple zigzag
342.10  3147043  22.87%  918.90  p4nzenc128v32  TurboPForV zigzag
337.64  3147043  22.87%  802.48  p4nzenc32  TurboPFor zigzag
311.39  3186374  23.15%  802.15  p4nzzenc128v32  TurboPFor zzag/delta
68.69  3190622  23.18%  82.02  bitshuffleZ+lz  Transpose+zigzag+lz
282.14  3228322  23.46%  542.80  tpnibbleZ+lz  Transpose+zigzag+lz
321.63  3239377  23.54%  995.61  p4nd1enc128v32  TurboPForV delta1
319.97  3239377  23.54%  868.51  p4nd1enc32  TurboPFor delta1
282.14  3277684  23.81%  604.98  tpbyteZ+lz  Transpose+zigzag+lz
334.27  3357179  24.39%  1029.96  p4ndenc128v32  TurboPForV delta
357.76  3357179  24.39%  946.45  p4ndenc32  TurboPFor delta
630.16  3515568  25.54%  757.64  vbddenc32  TurboVByte zzag delt
347.56  3531916  25.66%  463.71  bvzzenc32  bitio zigzag/delta
290.61  3673765  26.69%  703.59  tpnibble+lz  Transpose+lz
290.51  3676694  26.71%  792.91  tpbyte+lz  Transpose+lz
263.52  3701675  26.90%  541.95  tpnibbleX+lz  Transpose+xor+lz
268.20  3738684  27.16%  660.43  tpbyteX+lz  Transpose+xor+lz
1457.05  3925193  28.52%  1964.78  v8nd1enc128v32  TByte+TPackV delta1
1706.13  3945921  28.67%  2083.77  v8ndenc128v32  TByte+TPackV delta
1040.31  3948726  28.69%  1175.05  vbzenc32  TurboVByte zigzag
1271.56  4014799  29.17%  1743.96  v8nzenc128v32  TByte+TPackV zigzag
1144.65  4180490  30.37%  1303.84  vbd1enc32  TurboVByte delta1
1247.24  4181163  30.38%  1413.51  vbdenc32  TurboVByte delta
68.63  4228629  30.72%  82.93  bitshuffleX+lz  Transpose+xor+lz
70.33  4260129  30.95%  86.04  bitshuffle+lz  Transpose+lz
346.33  4329918  31.46%  513.06  bvzenc32  bitio zigzag
1541.42  4406850  32.02%  1699.80  bitnzpack128v32  TurboPackV zigzag
1231.07  4406850  32.02%  1548.00  bitnzpack32  TurboPack zigzag
1702.75  4724718  34.33%  2228.52  v8zenc32  TurboByte zigzag
734.63  4724718  34.33%  1160.19  streamvbyte zzag  StreamVByte zigzag*
1762.95  4889190  35.52%  2551.60  v8d1enc32  TurboByte delta1
778.78  4889479  35.53%  827.47  streamvbyte delt  StreamVByte delta*
1882.81  4889479  35.53%  2793.45  v8denc32  TurboByte delta
110.53  5050551  36.70%  117.81  SPDP  SPDP Floating Point*
505.37  5265005  38.25%  1337.54  fpxenc32  TurboFloat XOR
396.96  5265005  38.25%  463.69  fpfcmenc32  TurboFloat FCM
375.79  5310026  38.58%  445.96  fpgenc32  bitio TurboGorilla
489.19  5552401  40.34%  1348.02  p4nenc128v32  TurboPForV
490.20  5552401  40.34%  1091.29  p4nenc32  TurboPFor
1401.27  5653941  41.08%  1464.96  v8nenc128v32  TurboByte+TbPackV
3249.13  5661003  41.13%  1419.34  bitnpack128v32  TurboPackV
2705.05  5661003  41.13%  949.46  bitnpack32  TurboPack
469.37  6226583  45.24%  2044.46  vsenc32  TurboVSimple
216.40  6577162  47.79%  381.73  fpdfcmenc32  TurboFloat DFCM
217.09  6655645  48.36%  379.77  fp2dfcmenc32  TurboFloat DFCM 2D
1276.51  6682141  48.55%  1103.98  vbenc32  TurboVByte scalar
1503.53  6790511  49.34%  2060.99  bitnd1pack32  TurboPack delta1
1836.09  6790511  49.34%  2000.77  bitnd1pack128v32  TurboPackV delta1
1873.83  6816223  49.52%  2227.79  bitndpack32  TurboPack delta
1965.63  6816223  49.52%  2152.20  bitndpack128v32  TurboPackV delta
1926.83  7511530  54.58%  4486.09  v8enc32  TurboByte SIMD
1347.23  7511530  54.58%  3263.77  streamvbyte  StreamVByte SIMD*
129.33  9044107  65.71%  913.17  lz  lz
549.52  10833105  78.71%  918.17  srlez32  TurboRLE32 ESC zzag
527.51  13712109  99.63%  1268.04  srlex32  TurboRLE32 ESC xor
4050.42  13763312  100.00%  4135.61  memcpy  memcpy
562.11  13763312  100.00%  4128.17  srle32  TurboRLE32 ESC
112.36  13763312  100.00%  4121.99  trle  TurboRLE
79.66  13763312  100.00%  4118.29  trlex  TurboRLE xor
69.91  13763312  100.00%  4118.29  trlez  TurboRLE zigzag
1360.01  13763312  100.00%  2218.82  tp4enc  Nibble transpose
1774.54  13763312  100.00%  3416.91  tpenc  Byte transpose
87.95  13763312  100.00%  90.10  bitshuffle  Bit transpose
* : external library
freqs: Term Frequencies
./icapp test_collection.freqs
file: max bits histogram:
00: 0.001%
01:########################### 27%
02:############ 12%
03:###### 6.0%
04:### 3.1%
05:## 1.6%
06:# 1.0%
07:# 1.0%
08:# 0.9%
09:## 1.9%
10:### 3.2%
11:#### 4.1%
12:######## 8.5%
13:####################### 23%
14:###### 6.5%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
428.07  1722849  12.52%  1974.37  vsenc32  TurboVSimple
420.52  1730336  12.57%  1524.34  p4nenc32  TurboPFor
427.48  1730336  12.57%  1359.47  p4nenc128v32  TurboPForV
423.36  1907705  13.86%  934.50  tpnibble+lz  Transpose+lz
314.49  1922544  13.97%  962.94  vszenc32  TurboVSimple zigzag
383.56  1955547  14.21%  656.15  tpnibbleX+lz  Transpose+xor+lz
72.45  1958057  14.23%  83.00  bitshuffleX+lz  Transpose+xor+lz
74.52  1962095  14.26%  86.33  bitshuffle+lz  Transpose+lz
338.18  1996059  14.50%  960.12  p4nzenc128v32  TurboPForV zigzag
336.61  1996059  14.50%  801.40  p4nzenc32  TurboPFor zigzag
326.98  2130744  15.48%  932.79  fpxenc32  TurboFloat XOR
277.91  2130744  15.48%  402.28  fpfcmenc32  TurboFloat FCM
351.60  2145784  15.59%  562.34  tpnibbleZ+lz  Transpose+zigzag+lz
375.51  2219138  16.12%  977.37  tpbyte+lz  Transpose+lz
388.77  2259515  16.42%  513.15  bvzenc32  bitio zigzag
73.34  2300817  16.72%  83.59  bitshuffleZ+lz  Transpose+zigzag+lz
321.11  2381520  17.30%  702.96  tpbyteX+lz  Transpose+xor+lz
325.91  2398296  17.43%  815.65  p4nzzenc128v32  TurboPFor zzag/delta
326.45  2407923  17.50%  602.12  tpbyteZ+lz  Transpose+zigzag+lz
367.16  2764046  20.08%  499.67  bvzzenc32  bitio zigzag/delta
3449.45  2784387  20.23%  1747.06  v8nenc128v32  TurboByte+TbPackV
2828.46  2804702  20.38%  2064.70  bitnpack32  TurboPack
3340.61  2804702  20.38%  1735.16  bitnpack128v32  TurboPackV
738.89  2883341  20.95%  909.79  vbddenc32  TurboVByte zzag delt
472.07  3029119  22.01%  503.62  fpgenc32  bitio TurboGorilla
1507.65  3146929  22.86%  1760.24  v8nzenc128v32  TByte+TPackV zigzag
1576.91  3189535  23.17%  1773.85  bitnzpack128v32  TurboPackV zigzag
1242.85  3189535  23.17%  1511.45  bitnzpack32  TurboPack zigzag
2596.85  3451880  25.08%  2620.08  vbenc32  TurboVByte scalar
1428.47  3475286  25.25%  1662.03  vbzenc32  TurboVByte zigzag
354.90  3925934  28.52%  1038.90  lz  lz
2179.81  4307847  31.30%  4552.86  v8enc32  TurboByte SIMD
1369.07  4307847  31.30%  3289.51  streamvbyte  StreamVByte SIMD*
1702.33  4323652  31.41%  2237.57  v8zenc32  TurboByte zigzag
734.98  4323652  31.41%  1158.43  streamvbyte zzag  StreamVByte zigzag*
269.10  5742202  41.72%  991.38  p4ndenc128v32  TurboPForV delta
271.95  5742202  41.72%  832.53  p4ndenc32  TurboPFor delta
1744.18  7227089  52.51%  2805.97  v8denc32  TurboByte delta
525.42  7227089  52.51%  619.72  streamvbyte delt  StreamVByte delta*
1217.67  7234307  52.56%  1996.42  v8ndenc128v32  TByte+TPackV delta
707.26  7343213  53.35%  732.71  vbdenc32  TurboVByte delta
583.04  10319134  74.98%  1371.67  srle32  TurboRLE32 ESC
192.83  10350497  75.20%  385.57  fpdfcmenc32  TurboFloat DFCM
191.72  10690588  77.67%  383.11  fp2dfcmenc32  TurboFloat DFCM 2D
477.76  10963622  79.66%  904.17  srlex32  TurboRLE32 ESC xor
241.74  10981857  79.79%  858.44  p4nd1enc128v32  TurboPForV delta1
244.88  10981857  79.79%  703.11  p4nd1enc32  TurboPFor delta1
528.46  11113500  80.75%  910.03  srlez32  TurboRLE32 ESC zzag
70.58  11621965  84.44%  71.31  SPDP  SPDP Floating Point*
790.72  11667495  84.77%  1868.49  v8nd1enc128v32  TByte+TPackV delta1
1043.94  11716644  85.13%  2462.57  v8d1enc32  TurboByte delta1
534.52  13329244  96.85%  545.36  vbd1enc32  TurboVByte delta1
1899.70  13727224  99.74%  2905.49  bitndpack32  TurboPack delta
1614.84  13727224  99.74%  2089.78  bitndpack128v32  TurboPackV delta
4155.59  13763304  100.00%  4040.90  memcpy  memcpy
1693.11  13763304  100.00%  3253.74  tpenc  Byte transpose
1328.63  13763304  100.00%  2247.44  tp4enc  Nibble transpose
122.10  13763304  100.00%  4042.09  trlex  TurboRLE xor
83.47  13763304  100.00%  4042.09  trlez  TurboRLE zigzag
152.58  13763304  100.00%  4043.27  trle  TurboRLE
88.06  13763304  100.00%  90.70  bitshuffle  Bit transpose
1179.37  13790184  100.20%  3577.67  bitnd1pack32  TurboPack delta1
1952.24  13790184  100.20%  1982.33  bitnd1pack128v32  TurboPackV delta1
sizes: Number of terms
./icapp test_collection.sizes
file: max bits histogram:
00: 0.001%
01:########################### 27%
02:############ 12%
03:###### 6.0%
04:### 3.1%
05:## 1.6%
06:# 1.0%
07:# 1.1%
08:# 1.0%
09:## 1.9%
10:### 3.2%
11:#### 4.1%
12:######## 8.5%
13:####################### 23%
14:###### 6.5%
15: 0.002%
16: 0.001%
E MB/s  size  ratio  D MB/s  function (integer size=32 bits)
380.99  15077  37.69%  1904.95  p4nenc128v32  TurboPForV
377.40  15077  37.69%  1538.62  p4nenc32  TurboPFor
373.87  15485  38.71%  1600.16  vsenc32  TurboVSimple
79.22  15970  39.92%  89.29  bitshuffle+lz  Transpose+lz
77.23  15984  39.96%  87.54  bitshuffleX+lz  Transpose+xor+lz
333.37  16009  40.02%  1025.74  p4nzenc128v32  TurboPForV zigzag
330.61  16009  40.02%  833.42  p4nzenc32  TurboPFor zigzag
493.88  16101  40.25%  930.33  tpnibble+lz  Transpose+lz
400.04  16159  40.25%  769.31  tpnibbleX+lz  Transpose+xor+lz
327.90  16268  40.67%  909.18  vszenc32  TurboVSimple zigzag
373.87  16918  42.29%  678.03  tpnibbleZ+lz  Transpose+zigzag+lz
77.23  17022  42.55%  85.85  bitshuffleZ+lz  Transpose+zigzag+lz
312.53  17254  43.13%  930.33  p4nzzenc128v32  TurboPFor zzag/delta
701.82  17391  43.47%  769.31  vbzenc32  TurboVByte zigzag
459.82  17399  43.49%  1111.22  tpbyte+lz  Transpose+lz
1481.63  17467  43.66%  5714.86  v8nenc128v32  TurboByte+TbPackV
380.99  17625  44.06%  909.18  tpbyteX+lz  Transpose+xor+lz
1176.59  17632  44.08%  1000.10  vbenc32  TurboVByte scalar
3636.73  17899  44.74%  6667.33  bitnpack128v32  TurboPackV
3333.67  17899  44.74%  5714.86  bitnpack32  TurboPack
465.16  17958  44.89%  1666.83  fpxenc32  TurboFloat XOR
370.41  17958  44.89%  465.16  fpfcmenc32  TurboFloat FCM
412.41  18004  45.01%  519.53  fpgenc32  bitio TurboGorilla
380.99  18141  45.35%  833.42  tpbyteZ+lz  Transpose+zigzag+lz
888.98  18225  45.56%  2000.20  v8nzenc128v32  TByte+TPackV zigzag
1739.30  18768  46.92%  2222.44  v8zenc32  TurboByte zigzag
740.81  18768  46.92%  1142.97  streamvbyte zzag  StreamVByte zigzag*
2222.44  18909  47.27%  5000.50  v8enc32  TurboByte SIMD
1428.71  18909  47.27%  3333.67  streamvbyte  StreamVByte SIMD*
1250.12  19101  47.75%  2222.44  bitnzpack32  TurboPack zigzag
1481.63  19101  47.75%  1818.36  bitnzpack128v32  TurboPackV zigzag
597.07  20093  50.23%  625.06  vbddenc32  TurboVByte zzag delt
300.78  20339  50.84%  408.20  bvzenc32  bitio zigzag
287.80  22371  55.92%  400.04  bvzzenc32  bitio zigzag/delta
136.53  28940  72.34%  655.80  lz  lz
266.69  29111  72.77%  1052.74  p4ndenc128v32  TurboPForV delta
270.30  29111  72.77%  816.41  p4ndenc32  TurboPFor delta
2000.20  29559  73.89%  2857.43  v8denc32  TurboByte delta
400.04  29559  73.89%  454.59  streamvbyte delt  StreamVByte delta*
264.93  29583  73.95%  1000.10  p4nd1enc128v32  TurboPForV delta1
268.48  29583  73.95%  754.79  p4nd1enc32  TurboPFor delta1
1250.12  29637  74.09%  2666.93  v8ndenc128v32  TByte+TPackV delta
1818.36  30085  75.20%  2666.93  v8d1enc32  TurboByte delta1
1111.22  30163  75.40%  2353.18  v8nd1enc128v32  TByte+TPackV delta1
519.53  32431  81.07%  512.87  vbdenc32  TurboVByte delta
493.88  33131  82.82%  519.53  vbd1enc32  TurboVByte delta1
80.82  38749  96.86%  80.65  SPDP  SPDP Floating Point*
645.23  39842  99.60%  3077.23  srle32  TurboRLE32 ESC
526.37  39884  99.70%  1818.36  srlex32  TurboRLE32 ESC xor
548.00  39891  99.72%  1250.12  srlez32  TurboRLE32 ESC zzag
10001.00  40004  100.00%  10001.00  memcpy  memcpy
115.95  40004  100.00%  10001.00  trle  TurboRLE
85.85  40004  100.00%  10001.00  trlex  TurboRLE xor
71.56  40004  100.00%  10001.00  trlez  TurboRLE zigzag
5000.50  40004  100.00%  8000.80  tpenc  Byte transpose
1818.36  40004  100.00%  2353.18  tp4enc  Nibble transpose
92.18  40004  100.00%  92.60  bitshuffle  Bit transpose
256.44  40063  100.15%  416.71  fpdfcmenc32  TurboFloat DFCM
2353.18  40081  100.19%  5714.86  bitndpack32  TurboPack delta
2000.20  40081  100.19%  2666.93  bitnd1pack128v32  TurboPackV delta1
2222.44  40081  100.19%  3077.23  bitndpack128v32  TurboPackV delta
1739.30  40081  100.19%  4000.40  bitnd1pack32  TurboPack delta1
259.77  40124  100.30%  421.09  fp2dfcmenc32  TurboFloat DFCM 2D
#include <stdio.h>    /* printf */
#include <stdlib.h>   /* malloc */
#include <stdint.h>   /* uint64_t */
#include "conf.h"
#include "bitpack.h"
#include "vint.h"
#include "bitutil.h"
#include "vp4.h"
typedef uint64_t LONG;
int main(int argc, char** argv) {
  /* five 64-bit input values */
  LONG *array = (LONG *)malloc(sizeof(LONG)*5);
  array[0] = 24520120;
  array[1] = 29620120;
  array[2] = 42420120;
  array[3] = 20124222;
  array[4] = 4294967295;
  /* 50-byte output buffer, as in the original report */
  unsigned char* out = (unsigned char *)malloc(5*10*sizeof(unsigned char));
  unsigned char * op = p4enc64(array, 5, out);
  printf("%d\n",(int)(op-out) );
  /* decode buffer with room for exactly five values */
  LONG *capacity = (LONG *)malloc(sizeof(LONG)*5);
  unsigned char * op2 = p4dec64(out, 5, capacity);
  printf("%d\n",(int)(op2-out) );
  return 0;
}
When attempting to use p4enc64 in the above code, the following error is raised.
malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff700dc37 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
No issue occurs with uint32_t and also when malloc is not used.
Please correct me if there is any issue with the code. I am trying to use TurboPFor in my code for compressing 64-bit unsorted integer arrays. I am also having trouble allocating the right buffer size for the compressed array. What is the minimum buffer size I can safely allocate when using p4enc64 for an integer array of size n?
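On the buffer-size question, here is a conservative sketch. It borrows the shape of the `P4NENC_BOUND` macro that appears in the `ictest.c` example on this page (written there for 32-bit input) and generalises it to any element width. Applying it to `p4enc64` is an assumption, not a documented guarantee, and the helper name `enc_bound` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: conservative output-buffer size for n elements of
 * elem_size bytes each. The formula mirrors the P4NENC_BOUND macro quoted
 * on this page for 32-bit data; using it for 64-bit data is an assumption. */
static size_t enc_bound(size_t n, size_t elem_size)
{
    assert(elem_size > 0);
    /* one byte per (possibly partial) 128-element block, plus the raw
     * data and 32 elements of slack */
    return (n + 127) / 128 + (n + 32) * elem_size;
}
```

For the five-element array in the report, `enc_bound(5, sizeof(uint64_t))` gives 297 bytes, whereas the report allocates 50 bytes for 40 bytes of raw input. If the encoder writes past the end of such a small buffer, heap corruption of the kind shown in the assertion message would be a plausible result.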
It's extreme how much trailing whitespace tp4 has all over the place.
While making some changes in my code, I tried to verify that the resulting binary data (compressed with p4nzenc256v32) is identical. To my surprise, the data had many changes as if something corrupted it. I tried to dump hex of compressed data and it clearly had a few modified bytes here and there. What's even more surprising, when I uncompressed the data using p4nzdec256v32 output from these two different buffers was identical.
Any idea why, is that normal? If so, what makes it produce different outputs with identical inputs?
I'm getting an error (with gcc (GCC) 4.9.0):
CMakeFiles/myproject.dir/TurboPFor/vint.c.o: In function `vbd1dec32':
vint.c:(.text+0xb472): undefined reference to `BITDIZERO32'
collect2: error: ld returned 1 exit status
Of note, I'm compiling just bitutil.c, vint.c, and vsimple.c.
I think it has to do with the last #else fork in bitutil.h: defined(__SSE2__) && defined(USE_SSE) / defined(__AVX2__) && defined(USE_AVX2)
I am trying to compile using MinGW on Windows, but it throws the following error:
g++ eliasfano.o vsimple.o transpose.o transpose_sse.o bitpack.o bitpack_sse.o bitunpack.o bitunpack_sse.o vp4c.o vp4c_sse.o vp4d.o vp4d_sse.o bitutil.o fp.o vint.o ext/trlec.o ext/trled.o icbench.o plugins.o -o icbench
icbench.o:icbench.c:(.text.startup+0xbf2): undefined reference to `fseeko'
icbench.o:icbench.c:(.text.startup+0xbfa): undefined reference to `ftello'
icbench.o:icbench.c:(.text.startup+0xc1f): undefined reference to `fseeko'
collect2.exe: error: ld returned 1 exit status
makefile:291: recipe for target 'icbench' failed
mingw32-make: *** [icbench] Error 1
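The undefined references to `fseeko` and `ftello` suggest that the MinGW runtime in use does not provide the POSIX names. One possible workaround, sketched below, is a thin wrapper that picks the Microsoft C runtime's 64-bit functions on Windows and the POSIX ones elsewhere. The `ic_fseek`, `ic_ftell` and `ic_file_size` names are hypothetical and not part of TurboPFor; whether a given MinGW version exposes `_fseeki64`/`_ftelli64` is an assumption to check.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* make fseeko/ftello visible on POSIX systems */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#if defined(_WIN32)
  /* 64-bit seek/tell from the Microsoft C runtime (assumed available) */
  #define ic_fseek(f, off, whence) _fseeki64((f), (__int64)(off), (whence))
  #define ic_ftell(f)              ((int64_t)_ftelli64(f))
#else
  #include <sys/types.h>
  #define ic_fseek(f, off, whence) fseeko((f), (off_t)(off), (whence))
  #define ic_ftell(f)              ((int64_t)ftello(f))
#endif

/* Return the size of an open file in bytes, or -1 on error.
 * The file position is reset to the start afterwards. */
static int64_t ic_file_size(FILE *f)
{
    assert(f != NULL);
    if (ic_fseek(f, 0, SEEK_END) != 0) return -1;
    int64_t size = ic_ftell(f);
    if (ic_fseek(f, 0, SEEK_SET) != 0) return -1;
    return size;
}
```

Replacing the direct `fseeko`/`ftello` calls in icbench.c with wrappers like these would keep the same source building on both platforms.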
Tested TP and 7z on 16-bit unsigned data. It seems that the best TP algorithm's compression ratio is worse than 7z's.
TP: 33.1, 7z: 32.7
TP: 55.2, 7z: 51.2
Undefined symbols for architecture x86_64:
"_aligned_alloc", referenced from:
_main in icbench.o
"_aligned_free", referenced from:
_afree in icbench.o
_main in icbench.o
"_fopen64", referenced from:
_main in icbench.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [icbench] Error 1
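The linker output shows that `aligned_alloc`, `aligned_free` and `fopen64` are not available in this macOS toolchain (the leading underscore is added by the platform's symbol naming). A minimal sketch of a replacement for the two allocation calls, using POSIX `posix_memalign`, is given below; the `ic_` names are hypothetical and this is not how the project itself resolves the problem.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* make posix_memalign visible */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the missing aligned allocation call.
 * Memory returned by posix_memalign is released with plain free(). */
static void *ic_aligned_alloc(size_t alignment, size_t size)
{
    void *p = NULL;
    /* alignment must be a power of two and at least sizeof(void *) */
    assert(alignment >= sizeof(void *) && (alignment & (alignment - 1)) == 0);
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical stand-in for the missing aligned_free symbol. */
static void ic_aligned_free(void *p)
{
    free(p);
}
```

`fopen64` can probably be mapped to plain `fopen` on macOS, since `off_t` is already 64 bits wide there.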
Hi,
I am trying to test the inverted-index tools idxcr/idxqry with the TurboPFor-compressed version of the index format, as a code comment claims this results in smaller index sizes (perhaps also faster speed?).
However, I seem unable to compile and link them. I'm on Ubuntu 16.04 with gcc 7.2.0, Intel Core i7-7500U.
First, idxcr.c:123 misses some function arguments - I tried to supply them heuristically, not sure this is correct:
b = _p4dec32(ip+1, n-1, &bx);
b = _p4dec32(ip+1, n-1, out, b, &bx);
Second, even after that I get linker errors when I modify the makefile to define the necessary
C-preprocessor symbol and then issue $ make search AVX2=1:
ifeq ($(UNAME), Linux)
search: CFLAGS += -D_TURBOPFOR=1
search: idxcr
endif
The linker errors are:
In function `main':
idxcr.c:123: undefined reference to `_p4dec32'
idxcr.c:146: undefined reference to `_p4dec32'
idxcr.c:146: undefined reference to `_p4dec128v32'
collect2: error: ld returned 1 exit status
idxqry is similar.
Any fixes/patches to make this work?
Thanks a lot in advance, Markus
save the file as ictest.c in the TurboPFor directory and type
make ictest
./ictest
output:
compress size is 6
uncompress size is 6
#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#define NTURBOPFOR_DAC
#include "vp4.h"
#define P4NENC_BOUND(n) ((n+127)/128+(n+32)*sizeof(uint32_t))

int main(int argc, char* argv[]) {
    printf("Hello TurboPFor\n");
    int ar[32] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    unsigned elnum = 10;
    /* worst-case output buffer, and a decode buffer with 32 elements of slack */
    unsigned char* compress_buf = malloc(P4NENC_BOUND(elnum));
    int *uncompress_buf = malloc((elnum+32)*sizeof(ar[0]));
    size_t compress_size = p4nenc32((uint32_t*)ar, elnum, compress_buf);
    printf("compress size is %zu\n", compress_size);
    size_t uncompress_size = p4ndec32(compress_buf, elnum, (uint32_t*)uncompress_buf);
    printf("uncompress size is %zu\n", uncompress_size);
    free(compress_buf);
    free(uncompress_buf);
    return 0;
}
When using icapp to test encoding values of type double, they lose their precision because they get cast to uint64_t via IPUSH, and the benchmark becomes incorrect, particularly the ratio%.
https://github.com/powturbo/TurboPFor/blob/master/icapp.c#L206
Which gets called here:
https://github.com/powturbo/TurboPFor/blob/master/icapp.c#L290
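To illustrate the difference the report describes, the sketch below contrasts a value conversion from double to uint64_t with a bit-pattern copy. It assumes a 64-bit IEEE 754 double; the function names are hypothetical and it does not reproduce what IPUSH actually does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value conversion: truncates toward zero, so the fraction is lost. */
static uint64_t double_to_u64_by_value(double d)
{
    return (uint64_t)d;
}

/* Bit-pattern copy: keeps all 64 bits of the representation. */
static uint64_t double_to_u64_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Inverse of double_to_u64_bits. */
static double u64_bits_to_double(uint64_t u)
{
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}
```

With these helpers, 3.75 becomes 3 under the value conversion, while the bit-pattern copy round-trips to exactly 3.75. A benchmark that wants to measure floating-point compression would need the second form.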
Hi,
I wanted to play around with Floating Point compression and I thought I could just simply do:
#include "fp.h"
#include "fp.c"
to an existing project, but then I get:
undefined reference to `p4enc64'
undefined reference to `p4enc64'
I had a look around the code but couldn't seem to find p4enc64 or p4dec64. So I assume I am doing something wrong. I'm using the latest code from GitHub, with Mingw64.
Is TurboPFor a library for int compression? If yes, why does the page look like some comparison tool for different libs? Put the docs/hello world right there, not the comparison results. It's extremely painful to find anything about how to use the lib (I hope it wasn't assumed that users should read icbench or the source of some other tool to decipher how to use the lib).
I use visual studio, and VS build doesn't work at all. Also, that approach that some files result in multiple different output obj files doesn't help overall: it would be better if the lib had tiny wrappers that included those c-files and set required defines to modify compilation.
Even after I made it build on VS it absolutely doesn't work and completely fails for me.
I'm not even sure what function I need to use to compress. After that huge table of irrelevant info, if anybody ever gets as far as the "Function usage" section, they'll see some cryptic explanation that doesn't seem to be current anymore. From that section, it seems that I need p4fmencXXX, but that doesn't exist (nothing named p4fm exists at all). Why is that pack/unpack thing there, to confuse people? From the docs, it seems like p4packXXX should be a valid function, but it's not.
After digging, it seems that I should try p4enc32, but so far I get stack corruption, and it doesn't seem like it could ever work at all. It crashes for me in _p4enc32 on the line while(i != n) MISS;
In my case I call p4enc32 with an array of 2731 uints. That while loop will clearly corrupt the stack, because MISS expands to _in[i] = in[i], that is, it will attempt to assign _in[2730], because the loop runs up to the input length n=2731. I don't get how that code could even work, as it writes to a stack array of 287 elements! Also, it seems there are no checks: I don't see a single assert anywhere to check or show conditions and expectations, while that function had to make that check!
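If p4enc32 is indeed a single-block encoder, as the 287-element stack array suggests, one way to feed it a longer array is to split the input into blocks. The sketch below shows that driver loop with a stand-in encoder. The callback signature is modelled on the p4enc64 call quoted earlier on this page (it returns a pointer just past the bytes written); the safe block length is an assumption to verify against the library headers, and `encode_in_blocks` and `raw_copy_enc` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encoder signature assumed for this sketch: consume n values, return a
 * pointer just past the bytes written to out. */
typedef unsigned char *(*block_enc_fn)(uint32_t *in, unsigned n,
                                       unsigned char *out);

/* Hypothetical driver: hand the encoder at most block_len values per call. */
static unsigned char *encode_in_blocks(block_enc_fn enc, uint32_t *in,
                                       unsigned n, unsigned block_len,
                                       unsigned char *out)
{
    assert(enc != NULL && block_len > 0);
    for (unsigned i = 0; i < n; i += block_len) {
        unsigned len = (n - i < block_len) ? n - i : block_len;
        out = enc(in + i, len, out);
    }
    return out;
}

/* Stand-in encoder for demonstration only: copies the values verbatim. */
static unsigned char *raw_copy_enc(uint32_t *in, unsigned n,
                                   unsigned char *out)
{
    memcpy(out, in, n * sizeof *in);
    return out + n * sizeof *in;
}
```

The decoder has to walk the stream with the same block lengths. The ictest.c example quoted on this page calls p4nenc32 instead, which may already handle arbitrary lengths internally.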
Hi Hamid,
after building the icbench tool via the makefile, I tried to build libic.so by following the instructions in java/jicbench.java with some small adaptations detailed below. I also found and tried #5 but no luck. I always get a compilation error for libic.so.
My adaptations to the build instructions
1 - generate header jic.h
$ cd ~/TurboPFor/java
$ javah -jni jic
$ cp jtrle.h .. => cp jic.h ..
2 - Compile jic and jicbench
$ javac jic.java
$ javac jicbench.java
3 - compile & link a shared library
$ cd ~/TurboPFor
$ gcc -O3 -march=native -fstrict-aliasing -m64 -shared -fPIC -I/usr/lib/jvm/default-java/include -I/usr/lib/jvm/default-java/include/linux bitpack.c bitunpack.c bitpackv.c bitunpackv.c vp4dc.c vp4dd.c bitpack.c bitunpack.c vp4c.c vp4d.c vsimple.c vint.c bitutil.c jic.c -o libic.so
==>Bunch of (repeating) warnings and errors: compile_errors.txt
Virtual machine running Ubuntu, 3 virtual CPUs, on a Haswell-based host.
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty
$ gcc --version
gcc (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406
$ java -version
openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-0ubuntu1.17.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
I recommend you benchmark with the TREC Million Query Track (1MQT) instead of the AOL query log.
There are a few reasons :
Cheers and keep up the good work!
Where can we find your open source for LZTURBO?
Are you still enhancing it?
TurboPFor256N a.k.a. "PFor (AVX2) large blocks" nears memcpy, again stunning speeds and ratios, congratulations!
Out of curiosity, tried ranges 0..1, 0..255, 0..16777215 with 1.0 and 1.5 skewness.
D:\2tb>dir ic*
Volume in drive D is SANMAYCE
Volume Serial Number is EE4D-4AE3
Directory of D:\2tb
03/03/2018 04:44 PM 5,550,385 icappavx2.exe
03/03/2018 04:37 PM 4,320,044 icapp.exe
2 File(s) 9,870,429 bytes
0 Dir(s) 2,902,048,768 bytes free
D:\2tb>timer64 icappavx2.exe -a1.5 -m0 -M1 -n100M ZIPF
zipf alpha=1.50 range[0..1].n=100000000
bits histogram:0:73.87% 1:26.13%
E MB/s size ratio D MB/s function
1991.11 13281250 3.32% 8211.02 p4nenc32 ZIPF
2136.20 13281250 3.32% 11108.33 p4nenc128v32 ZIPF
3126.56 12890625 3.22% 10612.89 p4nenc256v32 ZIPF
840.22 101692201 25.42% 3269.31 p4ndenc32 ZIPF
858.04 101692201 25.42% 5181.35 p4ndenc128v32 ZIPF
5950.61 400390622 100.10% 6285.65 bitndpack256v32 ZIPF
728.98 336853585 84.21% 5002.56 p4nd1enc32 ZIPF
671.33 336853585 84.21% 6126.04 p4nd1enc128v32 ZIPF
814.40 336072335 84.02% 6428.18 p4nd1enc256v32 ZIPF
1070.80 23943088 5.99% 4127.88 p4nzenc32 ZIPF
1025.09 23943088 5.99% 5162.76 p4nzenc128v32 ZIPF
1063.31 23066760 5.77% 5643.90 p4nzenc256v32 ZIPF
820.67 102473451 25.62% 2836.84 p4nsenc32 ZIPF
7887.37 13281250 3.32% 8224.36 bitnpack32 ZIPF
10003.75 13281250 3.32% 11110.80 bitnpack128v32 ZIPF
12790.59 12890625 3.22% 10636.60 bitnpack256v32 ZIPF
4584.42 400781247 100.20% 6117.99 bitndpack32 ZIPF
5965.07 400781247 100.20% 6993.50 bitndpack128v32 ZIPF
5950.70 400390622 100.10% 6295.94 bitndpack256v32 ZIPF
4497.06 400781247 100.20% 6037.74 bitnd1pack32 ZIPF
5849.24 400781247 100.20% 6615.73 bitnd1pack128v32 ZIPF
5676.34 400390622 100.10% 6174.75 bitnd1pack256v32 ZIPF
3690.10 25781251 6.45% 4521.36 bitnzpack32 ZIPF
6980.07 25781251 6.45% 9690.39 bitnzpack128v32 ZIPF
6598.59 25390626 6.35% 10504.75 bitnzpack256v32 ZIPF
4695.88 100000000 25.00% 4637.95 vbzenc32 ZIPF
1221.11 70405794 17.60% 1382.05 vbddenc32 ZIPF
1625.10 14240039 3.56% 8543.54 vsenc32 ZIPF
909.95 63145848 15.79% 987.28 bvzzenc32 ZIPF
852.03 46249480 11.56% 987.35 bvzenc32 ZIPF
963.67 22151115 5.54% 913.63 fpgenc32 ZIPF
930.45 34790081 8.70% 2549.04 fpzzenc32 ZIPF
868.69 23943239 5.99% 1137.39 fpfcmenc32 ZIPF
827.65 32408928 8.10% 971.27 fpdfcmenc32 ZIPF
826.35 32408029 8.10% 959.39 fp2dfcmenc32 ZIPF
669.97 78383804 19.60% 4168.79 trle ZIPF
1118.99 254647894 63.66% 3086.63 srle32 ZIPF
149.68 272484724 68.12% 159.57 SPDP ZIPF
1722.85 46095125 11.52% 1966.00 tpbyte+lz ZIPF
2140.57 28153817 7.04% 2386.01 tpnibble+lz ZIPF
1609.33 28296871 7.07% 1932.43 tpnibbleZ+lz ZIPF
1634.09 28352181 7.09% 1877.49 tpnibbleX+lz ZIPF
905.38 83618404 20.90% 1583.90 lz ZIPF
1970.04 14378586 3.59% 1986.59 bitshuffle+lz ZIPF
11465.26 400000000 100.00% 11149.51 memcpy
Kernel Time = 0.828 = 0%
User Time = 231.937 = 98%
Process Time = 232.765 = 99% Virtual Memory = 3458 MB
Global Time = 234.710 = 100% Physical Memory = 1539 MB
D:\2tb>timer64 icappavx2.exe -a1.0 -m0 -M1 -n100M ZIPF
zipf alpha=1.00 range[0..1].n=100000000
bits histogram:0:66.66% 1:33.34%
E MB/s size ratio D MB/s function
2033.53 13281250 3.32% 8223.18 p4nenc32 ZIPF
2231.52 13281250 3.32% 11084.32 p4nenc128v32 ZIPF
2459.40 12890625 3.22% 10613.18 p4nenc256v32 ZIPF
838.16 113039100 28.26% 3247.07 p4ndenc32 ZIPF
849.96 113039100 28.26% 5099.50 p4ndenc128v32 ZIPF
5993.95 400390622 100.10% 6264.58 bitndpack256v32 ZIPF
736.08 325158377 81.29% 4978.10 p4nd1enc32 ZIPF
680.11 325158377 81.29% 6089.76 p4nd1enc128v32 ZIPF
827.41 324377127 81.09% 6456.72 p4nd1enc256v32 ZIPF
1231.59 25131150 6.28% 4204.33 p4nzenc32 ZIPF
1240.93 25131150 6.28% 6237.23 p4nzenc128v32 ZIPF
1176.89 24467543 6.12% 6237.04 p4nzenc256v32 ZIPF
817.36 113820350 28.46% 2829.87 p4nsenc32 ZIPF
7887.21 13281250 3.32% 8230.11 bitnpack32 ZIPF
10056.32 13281250 3.32% 11095.70 bitnpack128v32 ZIPF
12740.88 12890625 3.22% 10613.46 bitnpack256v32 ZIPF
4591.00 400781247 100.20% 6119.86 bitndpack32 ZIPF
5960.54 400781247 100.20% 6984.09 bitndpack128v32 ZIPF
5996.91 400390622 100.10% 6291.29 bitndpack256v32 ZIPF
4491.96 400781247 100.20% 6032.82 bitnd1pack32 ZIPF
5716.08 400781247 100.20% 6607.20 bitnd1pack128v32 ZIPF
5688.28 400390622 100.10% 6162.95 bitnd1pack256v32 ZIPF
3686.09 25781251 6.45% 4516.35 bitnzpack32 ZIPF
6144.77 25781251 6.45% 9675.39 bitnzpack128v32 ZIPF
6604.26 25390626 6.35% 10467.92 bitnzpack256v32 ZIPF
4692.52 100000000 25.00% 4634.24 vbzenc32 ZIPF
1184.03 79178092 19.79% 1391.53 vbddenc32 ZIPF
1609.87 14289474 3.57% 8896.80 vsenc32 ZIPF
868.72 70840784 17.71% 947.27 bvzzenc32 ZIPF
737.00 51391735 12.85% 817.76 bvzenc32 ZIPF
811.78 23613016 5.90% 783.79 fpgenc32 ZIPF
1100.02 37282301 9.32% 2724.76 fpzzenc32 ZIPF
978.90 25130914 6.28% 1165.77 fpfcmenc32 ZIPF
814.67 33765409 8.44% 954.54 fpdfcmenc32 ZIPF
816.68 33670343 8.42% 942.89 fp2dfcmenc32 ZIPF
597.00 100024919 25.01% 3852.19 trle ZIPF
914.61 293876335 73.47% 3322.62 srle32 ZIPF
139.91 300800299 75.20% 149.69 SPDP ZIPF
1678.62 48982325 12.25% 1990.31 tpbyte+lz ZIPF
2084.02 28987471 7.25% 2396.36 tpnibble+lz ZIPF
1527.84 29191560 7.30% 1992.22 tpnibbleZ+lz ZIPF
1593.02 29225794 7.31% 1878.71 tpnibbleX+lz ZIPF
837.11 93244426 23.31% 1592.22 lz ZIPF
1942.52 14392723 3.60% 1974.67 bitshuffle+lz ZIPF
10647.93 400000000 100.00% 11036.61 memcpy
Kernel Time = 0.796 = 0%
User Time = 226.609 = 98%
Process Time = 227.406 = 99% Virtual Memory = 3458 MB
Global Time = 229.155 = 100% Physical Memory = 1539 MB
D:\2tb>timer64 icappavx2.exe -a1.5 -m0 -M255 -n100M ZIPF
zipf alpha=1.50 range[0..255].n=100000000
bits histogram:0:40.20% 1:14.21% 2:12.77% 3:10.28% 4:7.77% 5:5.69% 6:4.09% 7:2.92% 8:2.08%
E MB/s size ratio D MB/s function
1036.21 63397240 15.85% 3775.44 p4nenc32 ZIPF
1042.24 63397240 15.85% 6210.60 p4nenc128v32 ZIPF
1234.61 62943776 15.74% 9086.16 p4nenc256v32 ZIPF
781.03 231475421 57.87% 3076.83 p4ndenc32 ZIPF
778.34 231475421 57.87% 4213.14 p4ndenc128v32 ZIPF
6016.58 400390622 100.10% 6326.91 bitndpack256v32 ZIPF
732.52 291998562 73.00% 2850.20 p4nd1enc32 ZIPF
713.03 291998562 73.00% 3889.42 p4nd1enc128v32 ZIPF
884.92 291718032 72.93% 4638.38 p4nd1enc256v32 ZIPF
903.89 85552166 21.39% 3043.21 p4nzenc32 ZIPF
912.01 85552166 21.39% 4153.08 p4nzenc128v32 ZIPF
1048.24 85248159 21.31% 5277.74 p4nzenc256v32 ZIPF
775.93 232256671 58.06% 2718.83 p4nsenc32 ZIPF
6425.08 99908242 24.98% 8632.78 bitnpack32 ZIPF
9796.48 99908242 24.98% 9953.47 bitnpack128v32 ZIPF
12148.82 100329537 25.08% 9604.07 bitnpack256v32 ZIPF
4605.80 400781247 100.20% 6124.64 bitndpack32 ZIPF
6000.42 400781247 100.20% 6995.09 bitndpack128v32 ZIPF
5998.62 400390622 100.10% 6324.61 bitndpack256v32 ZIPF
4509.94 400781247 100.20% 6049.88 bitnd1pack32 ZIPF
5899.88 400781247 100.20% 6609.28 bitnd1pack128v32 ZIPF
5689.41 400390622 100.10% 6192.53 bitnd1pack256v32 ZIPF
3694.19 112367090 28.09% 4470.57 bitnzpack32 ZIPF
5568.01 112367090 28.09% 8686.78 bitnzpack128v32 ZIPF
6227.04 112823649 28.21% 9472.61 bitnzpack256v32 ZIPF
3206.67 106294820 26.57% 3269.79 vbzenc32 ZIPF
1320.87 116386412 29.10% 1570.33 vbddenc32 ZIPF
877.49 76359142 19.09% 2377.81 vsenc32 ZIPF
705.40 119303834 29.83% 810.31 bvzzenc32 ZIPF
726.65 98385955 24.60% 781.02 bvzenc32 ZIPF
1013.01 103153709 25.79% 1040.94 fpgenc32 ZIPF
802.12 100632423 25.16% 2068.50 fpzzenc32 ZIPF
747.01 85551816 21.39% 1015.80 fpfcmenc32 ZIPF
726.65 98463575 24.62% 905.67 fpdfcmenc32 ZIPF
728.04 98328582 24.58% 896.84 fp2dfcmenc32 ZIPF
418.86 239221544 59.81% 2212.16 trle ZIPF
1220.42 384575089 96.14% 4981.63 srle32 ZIPF
143.23 374799918 93.70% 153.36 SPDP ZIPF
1210.95 88598454 22.15% 2182.41 tpbyte+lz ZIPF
1515.51 79169862 19.79% 2452.75 tpnibble+lz ZIPF
1199.57 97566061 24.39% 1820.98 tpnibbleZ+lz ZIPF
1267.69 80520988 20.13% 1919.03 tpnibbleX+lz ZIPF
581.00 140363018 35.09% 1764.07 lz ZIPF
924.96 82886520 20.72% 1648.21 bitshuffle+lz ZIPF
11471.18 400000000 100.00% 11186.00 memcpy
Kernel Time = 0.906 = 0%
User Time = 230.281 = 98%
Process Time = 231.187 = 99% Virtual Memory = 3458 MB
Global Time = 232.968 = 100% Physical Memory = 1539 MB
D:\2tb>timer64 icappavx2.exe -a1.0 -m0 -M255 -n100M ZIPF
zipf alpha=1.00 range[0..255].n=100000000
bits histogram:0:16.33% 1:8.16% 2:9.53% 3:10.36% 4:10.82% 5:11.07% 6:11.19% 7:11.25% 8:11.29%
E MB/s size ratio D MB/s function
974.61 85947474 21.49% 3814.68 p4nenc32 ZIPF
962.15 85947474 21.49% 5574.21 p4nenc128v32 ZIPF
1153.58 85277810 21.32% 7808.38 p4nenc256v32 ZIPF
783.55 257508731 64.38% 3109.28 p4ndenc32 ZIPF
772.81 257508731 64.38% 4509.48 p4ndenc128v32 ZIPF
6004.65 400390622 100.10% 6301.99 bitndpack256v32 ZIPF
771.69 270630517 67.66% 2996.50 p4nd1enc32 ZIPF
747.61 270630517 67.66% 4168.14 p4nd1enc128v32 ZIPF
917.49 269850532 67.46% 4838.98 p4nd1enc256v32 ZIPF
854.17 106938431 26.73% 2934.96 p4nzenc32 ZIPF
853.79 106938431 26.73% 3888.06 p4nzenc128v32 ZIPF
985.82 106246317 26.56% 4863.46 p4nzenc256v32 ZIPF
776.86 258289981 64.57% 2825.68 p4nsenc32 ZIPF
6462.45 100781234 25.20% 8611.78 bitnpack32 ZIPF
9988.26 100781234 25.20% 9956.44 bitnpack128v32 ZIPF
12100.68 100390625 25.10% 9601.31 bitnpack256v32 ZIPF
4607.18 400781247 100.20% 6133.18 bitndpack32 ZIPF
5962.85 400781247 100.20% 6982.02 bitndpack128v32 ZIPF
6009.53 400390622 100.10% 6300.70 bitndpack256v32 ZIPF
4506.74 400781247 100.20% 6045.86 bitnd1pack32 ZIPF
5738.88 400781247 100.20% 6607.31 bitnd1pack128v32 ZIPF
5709.96 400390622 100.10% 6202.70 bitnd1pack256v32 ZIPF
3723.04 113281202 28.32% 4462.49 bitnzpack32 ZIPF
6344.47 113281202 28.32% 8745.65 bitnzpack128v32 ZIPF
6217.36 112890625 28.22% 9475.98 bitnzpack256v32 ZIPF
1363.23 125035755 31.26% 1427.88 vbzenc32 ZIPF
906.99 154221057 38.56% 1038.42 vbddenc32 ZIPF
857.41 105321194 26.33% 2455.42 vsenc32 ZIPF
714.25 158182587 39.55% 857.88 bvzzenc32 ZIPF
702.59 134817601 33.70% 774.70 bvzenc32 ZIPF
1294.62 120079490 30.02% 1684.39 fpgenc32 ZIPF
793.07 120978001 30.24% 2076.72 fpzzenc32 ZIPF
720.93 106939168 26.73% 1011.42 fpfcmenc32 ZIPF
775.36 118485007 29.62% 965.32 fpdfcmenc32 ZIPF
779.20 118532992 29.63% 958.70 fp2dfcmenc32 ZIPF
375.92 334760892 83.69% 2143.26 trle ZIPF
1944.92 399499744 99.87% 5706.13 srle32 ZIPF
144.90 374393035 93.60% 155.71 SPDP ZIPF
4050.14 101569805 25.39% 4231.55 tpbyte+lz ZIPF
2368.39 100796897 25.20% 2834.43 tpnibble+lz ZIPF
1658.18 122666071 30.67% 1931.03 tpnibbleZ+lz ZIPF
2059.41 101369011 25.34% 2188.04 tpnibbleX+lz ZIPF
430.90 173718053 43.43% 1917.06 lz ZIPF
1254.67 101508819 25.38% 1982.39 bitshuffle+lz ZIPF
11598.57 400000000 100.00% 11179.43 memcpy
Kernel Time = 0.812 = 0%
User Time = 223.640 = 98%
Process Time = 224.453 = 99% Virtual Memory = 3458 MB
Global Time = 226.318 = 100% Physical Memory = 1539 MB
D:\2tb>timer64 icappavx2.exe -a1.5 -m0 -M16777215 -n100M ZIPF
zipf alpha=1.50 range[0..16777215].n=100000000
bits histogram:0:38.28% 1:13.53% 2:12.16% 3:9.79% 4:7.40% 5:5.42% 6:3.90% 7:2.78% 8:1.97% 9:1.40% 10:0.99% 11:0.70% 12:0.50% 13:0.35% 14:0.25% 15:0.17% 16:0.12% 17:0.09% 18:0.06% 19:0.04% 20:0.03% 21:0.02% 22:0.02% 23:0.01% 24:0.01%
E MB/s size ratio D MB/s function
939.46 86091479 21.52% 3663.91 p4nenc32 ZIPF
952.36 86091479 21.52% 5286.04 p4nenc128v32 ZIPF
1166.66 88072061 22.02% 7426.11 p4nenc256v32 ZIPF
758.13 248012100 62.00% 2978.72 p4ndenc32 ZIPF
744.46 248012100 62.00% 4078.59 p4ndenc128v32 ZIPF
6010.97 400390622 100.10% 6296.93 bitndpack256v32 ZIPF
716.29 302333932 75.58% 2807.90 p4nd1enc32 ZIPF
697.99 302333932 75.58% 3747.92 p4nd1enc128v32 ZIPF
864.39 302336977 75.58% 4420.92 p4nd1enc256v32 ZIPF
833.38 115851801 28.96% 2856.76 p4nzenc32 ZIPF
847.26 115851801 28.96% 3747.28 p4nzenc128v32 ZIPF
1019.84 119554815 29.89% 4812.96 p4nzenc256v32 ZIPF
751.83 248793350 62.20% 2642.64 p4nsenc32 ZIPF
5387.50 189928210 47.48% 8046.67 bitnpack32 ZIPF
7201.89 189928210 47.48% 9089.05 bitnpack128v32 ZIPF
9001.91 212105153 53.03% 8790.63 bitnpack256v32 ZIPF
4604.37 400781247 100.20% 6114.15 bitndpack32 ZIPF
6005.01 400781247 100.20% 6988.85 bitndpack128v32 ZIPF
6010.88 400390622 100.10% 6302.29 bitndpack256v32 ZIPF
4511.36 400781247 100.20% 6047.41 bitnd1pack32 ZIPF
5754.48 400781247 100.20% 6604.47 bitnd1pack128v32 ZIPF
5724.92 400390622 100.10% 6175.98 bitnd1pack256v32 ZIPF
3316.34 202671090 50.67% 4052.73 bitnzpack32 ZIPF
4736.25 202671090 50.67% 6430.45 bitnzpack128v32 ZIPF
5543.47 224721248 56.18% 6920.77 bitnzpack256v32 ZIPF
2116.82 116901527 29.23% 2144.61 vbzenc32 ZIPF
1117.27 132703387 33.18% 1269.38 vbddenc32 ZIPF
758.57 94288260 23.57% 2128.92 vsenc32 ZIPF
638.88 140306377 35.08% 724.22 bvzzenc32 ZIPF
632.87 114478985 28.62% 671.63 bvzenc32 ZIPF
767.43 123291402 30.82% 760.99 fpgenc32 ZIPF
750.16 134183406 33.55% 1990.01 fpzzenc32 ZIPF
696.08 115855322 28.96% 1005.43 fpfcmenc32 ZIPF
685.28 135283348 33.82% 903.21 fpdfcmenc32 ZIPF
685.72 135537115 33.88% 897.72 fp2dfcmenc32 ZIPF
393.49 248798855 62.20% 2159.71 trle ZIPF
1282.04 387188596 96.80% 5076.34 srle32 ZIPF
145.41 382964658 95.74% 154.18 SPDP ZIPF
897.27 112509483 28.13% 1804.07 tpbyte+lz ZIPF
1047.15 105310206 26.33% 1994.48 tpnibble+lz ZIPF
861.71 122108089 30.53% 1651.97 tpnibbleZ+lz ZIPF
909.48 107623191 26.91% 1670.74 tpnibbleX+lz ZIPF
471.35 153956085 38.49% 1731.01 lz ZIPF
683.11 116999441 29.25% 1366.40 bitshuffle+lz ZIPF
11276.50 400000000 100.00% 11254.92 memcpy
Kernel Time = 0.906 = 0%
User Time = 231.234 = 98%
Process Time = 232.140 = 99% Virtual Memory = 3458 MB
Global Time = 234.076 = 100% Physical Memory = 1539 MB
D:\2tb>timer64 icappavx2.exe -a1.0 -m0 -M16777215 -n100M ZIPF
zipf alpha=1.00 range[0..16777215].n=100000000
bits histogram:0:5.81% 1:2.90% 2:3.39% 3:3.68% 4:3.85% 5:3.94% 6:3.98% 7:4.01% 8:4.01% 9:4.02% 10:4.02% 11:4.02% 12:4.03% 13:4.02% 14:4.03% 15:4.03% 16:4.03% 17:4.03% 18:4.02% 19:4.03% 20:4.02% 21:4.03% 22:4.03% 23:4.03% 24:4.03%
E MB/s size ratio D MB/s function
853.25 234524047 58.63% 3611.41 p4nenc32 ZIPF
834.63 234524047 58.63% 5076.01 p4nenc128v32 ZIPF
1051.56 234585143 58.65% 7011.39 p4nenc256v32 ZIPF
717.99 357972550 89.49% 2776.60 p4ndenc32 ZIPF
704.82 357972550 89.49% 3656.27 p4ndenc128v32 ZIPF
5964.27 400390622 100.10% 6327.91 bitndpack256v32 ZIPF
713.43 358789013 89.70% 2754.31 p4nd1enc32 ZIPF
696.99 358789013 89.70% 3626.51 p4nd1enc128v32 ZIPF
872.49 358770020 89.69% 4458.46 p4nd1enc256v32 ZIPF
765.67 277519798 69.38% 2927.64 p4nzenc32 ZIPF
754.60 277519798 69.38% 3632.63 p4nzenc128v32 ZIPF
911.08 277568871 69.39% 4594.16 p4nzenc256v32 ZIPF
712.55 358753800 89.69% 2472.46 p4nsenc32 ZIPF
5367.47 300715346 75.18% 7604.27 bitnpack32 ZIPF
7620.64 300715346 75.18% 8231.13 bitnpack128v32 ZIPF
8265.83 300390465 75.10% 8023.91 bitnpack256v32 ZIPF
4605.38 400781247 100.20% 6114.53 bitndpack32 ZIPF
5999.52 400781247 100.20% 6990.93 bitndpack128v32 ZIPF
5977.82 400390622 100.10% 6347.80 bitndpack256v32 ZIPF
4518.19 400781247 100.20% 6040.20 bitnd1pack32 ZIPF
5752.75 400781247 100.20% 6607.31 bitnd1pack128v32 ZIPF
5673.76 400390622 100.10% 6194.35 bitnd1pack256v32 ZIPF
3367.94 313214128 78.30% 4109.65 bitnzpack32 ZIPF
5204.40 313214128 78.30% 6588.92 bitnzpack128v32 ZIPF
5324.60 312890495 78.22% 6687.84 bitnzpack256v32 ZIPF
647.97 306367133 76.59% 674.85 vbzenc32 ZIPF
722.18 364087555 91.02% 764.38 vbddenc32 ZIPF
577.34 297093559 74.27% 1492.32 vsenc32 ZIPF
681.34 364823129 91.21% 651.31 bvzzenc32 ZIPF
586.81 325210884 81.30% 581.57 bvzenc32 ZIPF
488.85 294061054 73.52% 510.17 fpgenc32 ZIPF
691.01 300186099 75.05% 1973.00 fpzzenc32 ZIPF
655.94 277546018 69.39% 984.83 fpfcmenc32 ZIPF
640.01 297992703 74.50% 893.96 fpdfcmenc32 ZIPF
640.57 298483029 74.62% 887.78 fp2dfcmenc32 ZIPF
196.93 382657677 95.66% 4332.85 trle ZIPF
2332.69 399991823 100.00% 5752.58 srle32 ZIPF
244.97 391918149 97.98% 250.82 SPDP ZIPF
777.42 267766854 66.94% 2307.98 tpbyte+lz ZIPF
946.71 275614096 68.90% 2257.60 tpnibble+lz ZIPF
866.65 300097431 75.02% 1838.28 tpnibbleZ+lz ZIPF
879.41 279554067 69.89% 1916.04 tpnibbleX+lz ZIPF
401.74 337515225 84.38% 1647.17 lz ZIPF
843.76 286695704 71.67% 1751.57 bitshuffle+lz ZIPF
10544.35 400000000 100.00% 10764.26 memcpy
Kernel Time = 0.968 = 0%
User Time = 262.765 = 98%
Process Time = 263.734 = 99% Virtual Memory = 3458 MB
Global Time = 265.572 = 100% Physical Memory = 1539 MB
D:\2tb>
In all the cases, TurboPFor256N/p4nenc256v32 outperforms the rest - bigtime.
I like a lot the way you presented the whole page, allow me just a playful counter - TurboPFor: The new synonym for "integer compression" is an understatement, it could be stated more accurately by replacing the weak 'synonym' with 'definition' or 'mainstay,pillar,backbone,anchor,lynchpin,kingpin,MVP'.
Both in American and British English, 'mainstay' simply means 'a chief support', so it fits nicely.
The widespread MVP (Most-Valuable-Player) has a ring to it when combined with TurboPFor - the MVP (Most-Valuable-...Performer/Packer), 'linchpin' in itself is quite powerful replacement, meaning, a person or thing vital to an enterprise or organization.
Also, one superb synonym to 'mainstay' is 'fulcrum', funny, they are used in air forces as designations not just for superuseful aircrafts, but for Most-Valuable-...Planes.
Example sentences containing 'mainstay':
He is a fantastic footballer and a mainstay in our team. /The Sun (2015)/
They are now in their forties and the mainstay of the economy. /Times, Sunday Times (2009)/
My English is forever buggy, yet, I constantly explore its diversity while not fearing going against/across the mainstreamish dogma, e.g. ask a specialist not well, but wonderfully-versed in English, with how many 'dragon+suffixes' words he/she can come up with. Such hidden versatility/vividness lies in those formations! Even the very suffixes/postfixes are not in-depth explored/catalogued! I mention this to stress how rigid and unyielding are the people upholding the status quo - they refuse to accept the new vivid "coinages/etudes" - as if they have the last word. Preaching to the choir I am, you good well know it.
I have just seen your project and I see that you are using Blosc for comparison. For completeness, I would recommend to include other Blosc codecs as they give different results that may be a good fit in different scenarios.
Below, I have selected several combinations that work well for this problem, and here are my results (Linux, GCC 4.9.2, Xeon E3-1240 v3 @ 3.40GHz):
$ time LD_LIBRARY_PATH=/home/francesc/c-blosc/build/blosc ./icbench -a1.5 -m0 -M255 -n100m
zipf alpha=1.50 range[0..255].n=100000000
bits histogram:0:40.20% 1:14.22% 2:12.75% 3:10.28% 4:7.77% 5:5.69% 6:4.09% 7:2.92% 8:2.07%
62064288 15.52 4.97 27.85 130.24 blosc (zlib, nthreads=1, clevel=3)
62064288 15.52 4.97 42.85 214.16 blosc (zlib, nthreads=4, clevel=3)
63392801 15.85 5.07 396.74 1412.46 TurboPFor
77187330 19.30 6.17 51.44 663.17 blosc (lz4hc, nthreads=1, clevel=3)
77187330 19.30 6.17 78.52 984.59 blosc (lz4hc, nthreads=4, clevel=3)
101473443 25.37 8.12 862.60 896.15 blosc (lz4, nthreads=1, clevel=3)
101473443 25.37 8.12 1193.86 1392.87 blosc (lz4, nthreads=2, clevel=3)
101473443 25.37 8.12 1608.41 1549.34 blosc (lz4, nthreads=4, clevel=3)
102491006 25.62 8.20 595.36 1815.32 blosc (blosclz, nthreads=1, clevel=3)
102491006 25.62 8.20 949.23 2049.91 blosc (blosclz, nthreads=2, clevel=3)
102491006 25.62 8.20 1293.24 1873.03 blosc (blosclz, nthreads=4, clevel=3)
The timings with clang 3.5 are very close to these, so I am not reproducing them.
By the way, very nice work! I did not realize that compressing integers was that important!
Hello,
First of all, sorry for my english,
Currently, I am finishing a thesis, and your implementation is cited in the bibliography. For doing this I would like to know some data about your code,
Thank-you,
Julian
If you remove USE_SSE & USE_AVX2, why don't you grep entire repo to see if you have left overs here and there?..
Hello,
is it possible to actually save the compressed output in a file? And to decompress it afterwards?
thanks.
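Since the encoders shown on this page write into a plain byte buffer, saving the result is ordinary file I/O. The sketch below stores the element count and compressed length ahead of the payload so that a decoder can be called with the right count later. The record layout and both function names are invented for this example, and the header is written in native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write the element count and compressed length, then the payload.
 * Returns 0 on success, -1 on failure. */
static int save_compressed(FILE *f, uint32_t n_values,
                           const unsigned char *buf, uint32_t n_bytes)
{
    assert(f != NULL && buf != NULL);
    if (fwrite(&n_values, sizeof n_values, 1, f) != 1) return -1;
    if (fwrite(&n_bytes, sizeof n_bytes, 1, f) != 1) return -1;
    if (n_bytes && fwrite(buf, 1, n_bytes, f) != n_bytes) return -1;
    return 0;
}

/* Read one record back. On success returns a malloc'd payload (the caller
 * frees it) and fills in both counts; returns NULL on failure. */
static unsigned char *load_compressed(FILE *f, uint32_t *n_values,
                                      uint32_t *n_bytes)
{
    assert(f != NULL && n_values != NULL && n_bytes != NULL);
    if (fread(n_values, sizeof *n_values, 1, f) != 1) return NULL;
    if (fread(n_bytes, sizeof *n_bytes, 1, f) != 1) return NULL;
    unsigned char *buf = malloc(*n_bytes ? *n_bytes : 1);
    if (buf == NULL) return NULL;
    if (*n_bytes && fread(buf, 1, *n_bytes, f) != *n_bytes) {
        free(buf);
        return NULL;
    }
    return buf;
}
```

After loading, the payload and the stored element count can be passed to the matching decode function. The ictest.c example on this page allocates 32 extra elements for the decode buffer, so similar slack is probably wise here.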
Hi - if I've compressed a large number of integers with one of the n functions, is there any way to directly access a block of integers in the middle? For example, suppose I have 1 million integers and I want to read the integers starting at offset 500,000.
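One common approach, sketched here under assumptions, is to compress in fixed-size blocks and keep a side table of the byte offset where each block starts. Locating a value is then simple arithmetic, and only the containing block needs decoding. Nothing below is TurboPFor API: the `block_pos` type, the `locate` function and the offsets table are all hypothetical, and delta-coded variants would also need each block's starting value.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a lookup in the hypothetical block index. */
typedef struct {
    size_t block;      /* which block holds the value            */
    size_t within;     /* position of the value inside the block */
    size_t byte_start; /* where that block starts in the stream  */
} block_pos;

/* offsets[k] is the byte position in the compressed stream where block k
 * starts; every block except possibly the last holds block_len values. */
static block_pos locate(const size_t *offsets, size_t n_blocks,
                        size_t block_len, size_t value_index)
{
    block_pos p;
    assert(offsets != NULL && block_len > 0);
    p.block = value_index / block_len;
    assert(p.block < n_blocks);
    p.within = value_index % block_len;
    p.byte_start = offsets[p.block];
    return p;
}
```

With 128-value blocks, the value at offset 500,000 falls in block 3906 at position 32, so decoding would start at `offsets[3906]`.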
I got the following error for both gcc-7 and gcc-5 while executing the make command:
/usr/lib/gcc/x86_64-linux-gnu/5/include/bmi2intrin.h:62:1: error: inlining failed in call to always_inline ‘_bzhi_u64’: target specific option mismatch
_bzhi_u64 (unsigned long long __X, unsigned long long __Y)
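GCC reports this "target specific option mismatch" when a BMI2 intrinsic is used in a translation unit compiled without BMI2 enabled (for example without -mbmi2 or a -march value that includes it). Besides adjusting the compiler flags, the call can be guarded so that a portable fallback is used when BMI2 is off. The sketch below shows that pattern; the `ic_bzhi64` name is hypothetical and this is not taken from the TurboPFor sources.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__BMI2__) && defined(__x86_64__)
  #include <immintrin.h>
  /* BMI2 is enabled at compile time, so the intrinsic can be inlined */
  #define ic_bzhi64(x, n) _bzhi_u64((x), (n))
#else
  /* Portable fallback: keep the low n bits of x, for n in 0..64 */
  static inline uint64_t ic_bzhi64(uint64_t x, unsigned n)
  {
      return n >= 64 ? x : x & ((UINT64_C(1) << n) - 1);
  }
#endif
```

Both paths give the same result for n between 0 and 64, so the same source builds with or without the BMI2 option.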
Hi,
when I'm building the project on Ubuntu 16.04 (the distribution / version probably doesn't really matter, anyway) using just "make", it ends with errors like these (all of them are undefined references):
eliasfano.o: In function `efano1enc256v32':
eliasfano.c:(.text+0x1438): undefined reference to `bitpack256v32'
but "make AVX2=1" builds it successfully. Seems like there could be some missing ifdef's. For example in bitpack.c:
#if defined(__AVX2__) && defined(AVX2_ON)
[...]
unsigned char *bitpack256v32(unsigned *__restrict in, unsigned n, unsigned char *__restrict out, unsigned b) { unsigned char *pout = out+PAD8(256*b); BITPACK256V32(in, b, out, 0); return pout; }
[...]
#endif
which in "non-AVX2 mode" removes bitfpack256v32 function, but it's still used somewhere else (probably in plugins.cc).
I've just found this library and it looks very promising. Thank you for your work!
https://github.com/powturbo/TurboPFor/blob/56e510087bde39bfadf55a3128f0751433c50e2b/vp4c.c#L53
That's simply confusing for anybody tracking the code. Split these files into .c files that change defines and then include .inl or better simply .h files that contain the code. Self including .c files is just weird.
I would like to point out that an identifier like "__bx" possibly does not fit the naming convention expected by the C language standard.
Would you like to adjust your selection of unique names?
It is not rare that we have blocks of data in hand which aren't of length P4DSIZE. Since the lib already has this functionality, it should be easy to just add several function declarations in the header.
Could you please suggest
Hi,
Congratulations on building this library. I currently use Lemire's FastPfor, and after reading the information on your page I decided to try it out, since Lemire's FastPFor does not offer integer intersection out of the box.
1. I am in the process of building a Ruby port for Lemire's FastPFor library and I want to build one for your library as well, so I would really appreciate a few details, such as which headers should be included and which functions need to be referenced. (While building a Ruby port I need to make one C file which specifies a single function that is called from Ruby. That function must then do all the interaction with your library. So, for example, if I pass in a list of integers from Ruby, which function should I call from your library to compress, and which function to decompress?)
2. Is it possible to use Redis as a storage medium?
Many thanks,
RRPhotosoft.
There are padfloat32() and padfloat64(), but no padfloat16(). Is it straightforward to create a float16 version based on the float32 version?
Also, what if I do the lossy compression for float16 data using icapp -f4 -g.001 float16data?
I suggest adding the keyword "const" to the type specifiers of parameters like "in" (function "p4denc32").
Would you like to apply the advice from an article in more places in your source files?
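A minimal sketch of what the suggestion means in practice: the input pointer is declared pointer-to-const because the function only reads from it, while the output pointer stays writable. The function copy_nonzero32 is hypothetical and is not part of TurboPFor; it only illustrates the shape of an encoder-style signature with a const-qualified "in" parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT a TurboPFor function.
 * `in` is only read, so it is pointer-to-const; `out` is written.
 * Returns the number of values stored in `out`. */
static size_t copy_nonzero32(const uint32_t *in, size_t n, uint32_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] != 0) {
            out[k++] = in[i];   /* writing through `in` here would not compile */
        }
    }
    return k;
}

/* Small self-check showing that a const array can be passed directly. */
static void copy_nonzero32_selfcheck(void) {
    static const uint32_t src[4] = { 0, 5, 0, 7 };
    uint32_t dst[4];
    assert(copy_nonzero32(src, 4, dst) == 2);
    assert(dst[0] == 5 && dst[1] == 7);
}
```

With the const qualifier, callers can pass read-only buffers without a cast, and the compiler rejects accidental writes to the input.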
Hi, I've been using "JavaFastPFOR", but "TurboPFor" seems much faster.
So I tried to use "TurboPFor", but when I built a .so file, the following error occurred.
git clone --recursive git://github.com/powturbo/TurboPFor.git
cd TurboPFor/java
javah -jni jic
cp jic.h ..
javac jic.java
javac jicbench.java
cd ..
gcc -O3 -w -march=native -fstrict-aliasing -m64 -shared -fPIC -I/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/include -I/Library/Java/JavaVirtualMachines/jdk1.8.0_66.jdk/Contents/Home/include/darwin bitpack.c bitunpack.c vp4c.c vp4d.c vsimple.c vint.c bitutil.c jic.c -o libic.so
Undefined symbols for architecture x86_64:
"_JNNIDEC", referenced from:
_Java_jic_p4ndec32 in jic-97920b.o
_Java_jic_p4ndec128v32 in jic-97920b.o
_Java_jic_p4ndec256v32 in jic-97920b.o
_Java_jic_p4nddec32 in jic-97920b.o
_Java_jic_p4nd1dec32 in jic-97920b.o
"_JNNIENC", referenced from:
_Java_jic_p4nenc32 in jic-97920b.o
_Java_jic_p4nenc128v32 in jic-97920b.o
_Java_jic_p4nenc256v32 in jic-97920b.o
_Java_jic_p4ndenc32 in jic-97920b.o
_Java_jic_p4nd1enc32 in jic-97920b.o
"_bitd1pack128v32", referenced from:
_Java_jic_bitd1pack128v32 in jic-97920b.o
_JavaCritical_jic_bitd1pack128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitd1pack128v32, _Java_jic_bitd1pack128v32 )
"_bitd1pack256v32", referenced from:
_Java_jic_bitd1pack256v32 in jic-97920b.o
_JavaCritical_jic_bitd1pack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitd1pack256v32, _Java_jic_bitd1pack256v32 )
"_bitd1unpack128v32", referenced from:
_Java_jic_bitd1unpack128v32 in jic-97920b.o
_JavaCritical_jic_bitd1unpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitd1unpack128v32, _JavaCritical_jic_bitd1unpack128v32 )
"_bitd1unpack256v32", referenced from:
_Java_jic_bitd1unpack256v32 in jic-97920b.o
_JavaCritical_jic_bitd1unpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitd1unpack256v32, _Java_jic_bitd1unpack256v32 )
"_bitdpack128v32", referenced from:
_Java_jic_bitdpack128v32 in jic-97920b.o
_JavaCritical_jic_bitdpack128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitdpack128v32, _Java_jic_bitdpack128v32 )
"_bitdpack256v32", referenced from:
_Java_jic_bitdpack256v32 in jic-97920b.o
_JavaCritical_jic_bitdpack256v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitdpack256v32, _JavaCritical_jic_bitdpack256v32 )
"_bitdunpack128v32", referenced from:
_Java_jic_bitdunpack128v32 in jic-97920b.o
_JavaCritical_jic_bitdunpack128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitdunpack128v32, _Java_jic_bitdunpack128v32 )
"_bitdunpack256v32", referenced from:
_Java_jic_bitdunpack256v32 in jic-97920b.o
_JavaCritical_jic_bitdunpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitdunpack256v32, _Java_jic_bitdunpack256v32 )
"_bitnd1pack128v32", referenced from:
_Java_jic_bitnd1pack128v32 in jic-97920b.o
_JavaCritical_jic_bitnd1pack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitnd1pack128v32, _JavaCritical_jic_bitnd1pack128v32 )
"_bitnd1pack256v32", referenced from:
_Java_jic_bitnd1pack256v32 in jic-97920b.o
_JavaCritical_jic_bitnd1pack256v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitnd1pack256v32, _JavaCritical_jic_bitnd1pack256v32 )
"_bitnd1unpack128v32", referenced from:
_Java_jic_bitnd1unpack128v32 in jic-97920b.o
_JavaCritical_jic_bitnd1unpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitnd1unpack128v32, _JavaCritical_jic_bitnd1unpack128v32 )
"_bitnd1unpack256v32", referenced from:
_Java_jic_bitnd1unpack256v32 in jic-97920b.o
_JavaCritical_jic_bitnd1unpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitnd1unpack256v32, _Java_jic_bitnd1unpack256v32 )
"_bitndpack128v32", referenced from:
_Java_jic_bitndpack128v32 in jic-97920b.o
_JavaCritical_jic_bitndpack128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitndpack128v32, _Java_jic_bitndpack128v32 )
"_bitndpack256v32", referenced from:
_Java_jic_bitndpack256v32 in jic-97920b.o
_JavaCritical_jic_bitndpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitndpack256v32, _Java_jic_bitndpack256v32 )
"_bitndunpack128v32", referenced from:
_Java_jic_bitndunpack128v32 in jic-97920b.o
_JavaCritical_jic_bitndunpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitndunpack128v32, _JavaCritical_jic_bitndunpack128v32 )
"_bitndunpack256v32", referenced from:
_Java_jic_bitndunpack256v32 in jic-97920b.o
_JavaCritical_jic_bitndunpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitndunpack256v32, _Java_jic_bitndunpack256v32 )
"_bitnpack128v32", referenced from:
_Java_jic_bitnpack128v32 in jic-97920b.o
_JavaCritical_jic_bitnpack128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitnpack128v32, _Java_jic_bitnpack128v32 )
"_bitnpack256v32", referenced from:
_Java_jic_bitnpack256v32 in jic-97920b.o
_JavaCritical_jic_bitnpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitnpack256v32, _Java_jic_bitnpack256v32 )
"_bitnunpack128v32", referenced from:
_Java_jic_bitnunpack128v32 in jic-97920b.o
_JavaCritical_jic_bitnunpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitnunpack128v32, _JavaCritical_jic_bitnunpack128v32 )
"_bitnunpack256v32", referenced from:
_Java_jic_bitnunpack256v32 in jic-97920b.o
_JavaCritical_jic_bitnunpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitnunpack256v32, _Java_jic_bitnunpack256v32 )
"_bitpack128v32", referenced from:
_Java_jic_bitpack128v32 in jic-97920b.o
_JavaCritical_jic_bitpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitpack128v32, _JavaCritical_jic_bitpack128v32 )
"_bitpack256v32", referenced from:
_Java_jic_bitpack256v32 in jic-97920b.o
_JavaCritical_jic_bitpack256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_bitpack256v32, _Java_jic_bitpack256v32 )
"_bitunpack128v32", referenced from:
_Java_jic_bitunpack128v32 in jic-97920b.o
_JavaCritical_jic_bitunpack128v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitunpack128v32, _JavaCritical_jic_bitunpack128v32 )
"_bitunpack256v32", referenced from:
_Java_jic_bitunpack256v32 in jic-97920b.o
_JavaCritical_jic_bitunpack256v32 in jic-97920b.o
(maybe you meant: _Java_jic_bitunpack256v32, _JavaCritical_jic_bitunpack256v32 )
"_p4dec128v32", referenced from:
_Java_jic_p4dec128v32 in jic-97920b.o
_JavaCritical_jic_p4dec128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4dec128v32, _Java_jic_p4dec128v32 )
"_p4dec256v32", referenced from:
_Java_jic_p4dec256v32 in jic-97920b.o
_JavaCritical_jic_p4dec256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4dec256v32, _Java_jic_p4dec256v32 )
"_p4enc128v32", referenced from:
_Java_jic_p4enc128v32 in jic-97920b.o
_JavaCritical_jic_p4enc128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4enc128v32, _Java_jic_p4enc128v32 )
"_p4enc256v32", referenced from:
_Java_jic_p4enc256v32 in jic-97920b.o
_JavaCritical_jic_p4enc256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4enc256v32, _Java_jic_p4enc256v32 )
"_p4ndec128v32", referenced from:
_Java_jic_p4ndec128v32 in jic-97920b.o
_JavaCritical_jic_p4ndec128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4ndec128v32, _Java_jic_p4ndec128v32 )
"_p4ndec256v32", referenced from:
_Java_jic_p4ndec256v32 in jic-97920b.o
_JavaCritical_jic_p4ndec256v32 in jic-97920b.o
(maybe you meant: _Java_jic_p4ndec256v32, _JavaCritical_jic_p4ndec256v32 )
"_p4nenc128v32", referenced from:
_Java_jic_p4nenc128v32 in jic-97920b.o
_JavaCritical_jic_p4nenc128v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4nenc128v32, _Java_jic_p4nenc128v32 )
"_p4nenc256v32", referenced from:
_Java_jic_p4nenc256v32 in jic-97920b.o
_JavaCritical_jic_p4nenc256v32 in jic-97920b.o
(maybe you meant: _JavaCritical_jic_p4nenc256v32, _Java_jic_p4nenc256v32 )
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
I was doing some benchmarking with your code and was very interested in p4nzenc16 and the other algorithms defined in vp4.h. Below are the test results with our own data:
icapp.exe -s2 ..\data\data
E MB/s size ratio D MB/s function
291.27 109183 83.30% 1213.63 p4nzenc16 ..\data\data
296.54 109183 83.30% 1472.72 p4nzenc128v16 ..\data\data
13107.20 131072 100.00% 11915.64 memcpy
As shown, the encoding and decoding speeds are much lower than in your benchmarks. I think the reason lies in the data characteristics. Our data looks like the following int32 array:
3956541806 | 4021030862 | 4156167980 | 4292351536 | 122688988 | 306712316 | 340264262 | 429391892 | 4293721652 | 363856276 | 311752484 | 339799824| 4235594008 | 4524368 | 77400290 | 119604839 | 147588075 | 156565166 | 123075266 | 129103883 | 4290838339 | 114620947 | 47183905 | 13301076
4285659166 | 4236114193 | 4166384448 | 4157799805 | 4175364326 | 4183426246 | 4234216941 | 4256303196 | 13369437 | 4252698291 | 12388188 | 56166432
254473376 | 190640480 | 41675844 | 4243778848 | 4123848842 | 4021352164 | 4006412090 | 3940549509 | 2883989 | 4012638792 | 4077652372 | 4104654984
264900636 | 329517310 | 402456453 | 331937716 | 279507270 | 124447292 | 4257934256 | 4158318916 | 4276027312 | 4211534776 | 4081119688 | 3936678884
53674504 | 59572339 | 54197952 | 55901532 | 23461120 | 7207976 | 4273994955 | 4246338828 | 4284022907 | 4266917110 | 4243455526 | 4239261366
4124374812 | 4083874473 | 4060086691 | 4094035282 | 4155902560 | 4226682114 | 48958370 | 109907428 | 19136432 | 67767338 | 140248949 | 223478996
4278517578 | 4282253572 | 18153714 | 22609897 | 4287561290 | 4280549154 | 4287430841 | 8192371 | 983184 | 4294639905 | 6422579 | 15335227
4287365347 | 15073760 | 26148779 | 7601913 | 4272029419 | 4277928110 | 10486103 | 30277608 | 4292804642 | 25100410 | 10944260 | 4273405619
4290183352 | 4285399269 | 4980480 | 3211078 | 4284940362 | 4283039840 | 6488321 | 10158232 | 786522 | 9306149 | 4259602 | 4291821626
4288020496 | 4278124841 | 17105015 | 7012364 | 1179425 | 4279107371 | 4287561847 | 3539276 | 4292739123 | 786612 | 9109491 | 14483363
4278583511 | 721358 | 22741000 | 1048332 | 4281401193 | 4281597998 | 4291166556 | 22347807 | 8257644 | 20512778 | 13107137 | 4284219109
4290576569 | 15663574 | 35323624 | 3276322 | 4261543643 | 4276158780 | 13304084 | 34340846 | 12451881 | 39059472 | 6094518 | 4272881238
Question 1: How should we compress this kind of data to achieve the results shown in your benchmarks (> 1.3 GB/s encoding, > 5 GB/s decoding, > 2x compression ratio)?
Question 2: Do you have any documentation on how p4nzenc16 works?
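Many of the sample values above sit just below 2^32 (for example 4293721652), which would be consistent with small negative numbers stored in unsigned 32-bit form. That reading is an assumption; the report does not confirm it. Under that assumption, the sketch below shows how a generic zigzag mapping turns such values into small codes that need far fewer bits. The helper names are illustrative and are not TurboPFor functions, and the sketch makes no claim about how p4nzenc16 is implemented internally.

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag-encode the two's-complement bit pattern `u` of a signed 32-bit value:
 * 0, -1, 1, -2, ... map to 0, 1, 2, 3, ...  (illustrative helper) */
static uint32_t zigzag_enc32(uint32_t u) {
    return (u << 1) ^ (0u - (u >> 31));
}

/* Inverse mapping: returns the original two's-complement bit pattern. */
static uint32_t zigzag_dec32(uint32_t z) {
    return (z >> 1) ^ (0u - (z & 1u));
}

/* Number of bits needed to store `u` (0 for u == 0). */
static unsigned bits_needed32(uint32_t u) {
    unsigned b = 0;
    while (u != 0) { b++; u >>= 1; }
    return b;
}

/* Self-check using one value from the sample data above. */
static void zigzag_selfcheck(void) {
    uint32_t raw = 4293721652u;            /* bit pattern of -1245644 */
    uint32_t z = zigzag_enc32(raw);
    assert(z == 2491287u);                 /* 2 * 1245644 - 1 */
    assert(zigzag_dec32(z) == raw);        /* round trip */
    assert(bits_needed32(raw) == 32);      /* raw value needs all 32 bits */
    assert(bits_needed32(z) == 22);        /* zigzag code needs 22 bits */
}
```

Even after the mapping, this sample value needs 22 bits, so values of this magnitude do not fit a 16-bit integer size; whether that explains the reported ratio would have to be checked against the actual file.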
Hi, thanks for your great work on all these algorithms. I noticed that almost none of your repositories contain a LICENSE file, so I wonder whether I can use your code directly inside my commercial products?
Testing TurboPFor, I've made some experiments with nanopore signal data.
File: all 16-bits signal data extracted from multi_fast5_zip.fast5
./icapp sig.u16 -Elzturbo,39 -Fs -e83,77,11,39,51,30,76
file: max bits histogram:
04: 0.000%
06: 0.000%
07: 0.002%
08:###### 6.0%
09:######################################################################################### 89%
10:##### 5.5%
11: 0.000%
12: 0.000%
16: 0.001%
file: delta max bits histogram:
00:## 2.4%
01:## 2.4%
02:##### 4.7%
03:######### 9.2%
04:################# 17%
05:########################### 27%
06:######################## 24%
07:######### 9.0%
08:#### 3.7%
09:# 0.8%
10: 0.012%
11: 0.002%
12: 0.000%
13: 0.000%
14: 0.000%
Filesize: 3.097.862 bytes
CPU: Skylake i7-6700 3.4GHz
E MB/s size ratio D MB/s function (integer size=16 bits)
94.48 1275037 41.16% 1740.37 Lztp4z Nibble Transpose+zigzag+turboanx,64
81.99 1279609 41.31% 1512.63 Lztpz Byte Transpose+zzag+turboanx,64
1.81 1283751 41.44% 2201.98 Lztp4z Nibble Transpose+zigzag+lzturbo,39
1.13 1289797 41.64% 1912.47 Lztpz Byte Transpose+zzag+lzturbo,39
3.06 1292571 41.72% 1627.03 Lztpz Byte Transpose+zzag+zstd,22
120.73 1293228 41.75% 2010.29 lzv8zenc TurboByte+zzag+turboanx
7.32 1296883 41.86% 1946.86 lzv8zenc TurboByte+zzag+lzturbo,39
7.48 1310074 42.29% 1690.97 lzv8zenc TurboByte+zzag+zstd,22
6.44 1333780 43.05% 1103.62 vbz vbz_compression
614.78 1432663 46.25% 4419.20 p4nzenc128v16 TurboPForV zigzag
547.91 1523046 49.16% 752.46 lzv8zenc TurboByte+zzag+fse
1.55 1577260 50.91% 5779.59 LztpzByte Transpose+zzag+lzturbo,19
11.68 1577173 50.91% 1789.64 LztpzByte Transpose+zzag+lz4,12
10.80 1583766 51.12% 6387.34 lzv8zenc TurboByte+zzag+lzturbo,19
21.06 1587076 51.23% 4863.21 lzv8zenc TurboByte+zzag+lz4,12
367.44 1632853 52.71% 573.68 Lztpz Byte Transpose+zzag+fse
6749.15 1676704 54.12% 8775.81 v8nzenc128v16 TByte+TPackV zigzag
6992.92 1676746 54.13% 8927.56 bitnzpack128v16 TurboPackV zigzag
68.73 1705144 55.04% 959.98 lzv8enc TurboByte+turboanx
ctz64 for non-Windows builds is, for example:
#define ctz64(x) __builtin_ctzll(x)
while for Windows it's:
static inline int ctz64(uint64_t x) { unsigned long z; _BitScanForward64(&z, x); return x?z:64; }
__builtin_ctzll isn't defined for x=0; the Windows build, however, does the check x?z:64.
https://github.com/powturbo/TurboPFor/blob/4df4bcea29b670dab3acb985aac83a7562bfa2eb/conf.h#L65
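One way to make both builds agree is to test for zero before calling the intrinsic. The sketch below is an illustration only: ctz64_safe is a hypothetical name and not something defined in conf.h, and the extra branch may cost a little on hot paths where the caller already guarantees x != 0.

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* _BitScanForward64 (64-bit MSVC targets) */
#endif

/* Hypothetical wrapper: count trailing zero bits, returning 64 for x == 0
 * on every platform instead of relying on undefined behaviour. */
static inline int ctz64_safe(uint64_t x) {
    if (x == 0) {
        return 64;                    /* __builtin_ctzll(0) is undefined */
    }
#if defined(_MSC_VER)
    unsigned long z;
    _BitScanForward64(&z, x);         /* x != 0, so z is always written */
    return (int)z;
#else
    return __builtin_ctzll(x);        /* GCC/Clang builtin */
#endif
}

/* Self-check covering the zero case and both ends of the range. */
static void ctz64_selfcheck(void) {
    assert(ctz64_safe(0) == 64);
    assert(ctz64_safe(1) == 0);
    assert(ctz64_safe(8) == 3);
    assert(ctz64_safe((uint64_t)1 << 63) == 63);
}
```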
If I understood correctly, vsenc32/vsdec32 does not support zero values in the input stream. Is this correct? I encounter weird crashes (memory corruption), which go away as long as I avoid zero elements in the input stream. If this is the case, please add an assert and/or a short comment in the documentation.
// vsencNN: compress array with n unsigned (NN bits in[n]) values to the buffer out. Return value = end of compressed output buffer out
// vsdecNN: decompress buffer into an array of n unsigned values. Return value = end of compressed input buffer in
Hi
I work on an open-source analytics storage engine (Apache Kudu) and was pondering using this library.
Is this released with an apache-friendly license (MIT, BSD, Apache 2.)?
If not, could it be?
Thanks
-david
Running $ javah -jni jic or $ javah -jni jic.java fails: no such file. Could you please add jic.h to the repository?
So far, I've tried to build a shared library without it and succeeded with the following makefile.
The differences are the added -fPIC flag and the libic link line:
#$(CC) $(OBJS) -lm -o -shared libic.so $(LFLAGS)
$(CC) -shared -Wl,-soname,libic.so -o libic.so.1 -lm $(LFLAGS) $(OBJS)
# Linux: "export CC=clang" windows mingw: "set CC=gcc" or uncomment one of following lines
# CC=clang
# CC=gcc
MARCH=-march=native
#MARCH=-msse2
CFLAGS=-DNDEBUG -fstrict-aliasing -m64 $(MARCH) -Iext -fPIC
UNAME := $(shell uname)
ifeq ($(UNAME), Linux)
LIBTHREAD=-lpthread
LIBRT=-lrt
else
CC=gcc
endif
BIT=./
all: icbench idxcr idxqry idxseg libic
bitpack.o: $(BIT)bitpack.c $(BIT)bitpack.h $(BIT)bitpack64_.h
$(CC) -O2 $(CFLAGS) -c $(BIT)bitpack.c
bitpackv.o: $(BIT)bitpackv.c $(BIT)bitpack.h $(BIT)bitpackv32_.h
$(CC) -O2 $(CFLAGS) -c $(BIT)bitpackv.c
vp4dc.o: $(BIT)vp4dc.c
$(CC) -O3 $(CFLAGS) -funroll-loops -c $(BIT)vp4dc.c
vp4dd.o: $(BIT)vp4dd.c
$(CC) -O3 $(CFLAGS) -funroll-loops -c $(BIT)vp4dd.c
varintg8iu.o: $(BIT)ext/varintg8iu.c $(BIT)ext/varintg8iu.h
$(CC) -O2 $(CFLAGS) -c -funroll-loops -std=c99 $(BIT)ext/varintg8iu.c
idxqryp.o: $(BIT)idxqry.c
$(CC) -O3 $(CFLAGS) -c $(BIT)idxqry.c -o idxqryp.o
SIMDCOMPD=ext/simdcomp/
SIMDCOMP=$(SIMDCOMPD)bitpacka.o $(SIMDCOMPD)src/simdintegratedbitpacking.o $(SIMDCOMPD)src/simdcomputil.o $(SIMDCOMPD)src/simdbitpacking.o
#LIBFOR=ext/for/for.o
MVB=ext/MaskedVByte/src/varintencode.o ext/MaskedVByte/src/varintdecode.o
# Lzturbo not included
#LZT=../lz/lz8c0.o ../lz/lz8d.o ../lz/lzbc0.o ../lz/lzbd.o
# blosc. Set the env. variable "EXT=blosc" to include
#EXT=blosc
ifeq ($(EXT), blosc)
B=ext/
CFLAGS+=-DSHUFFLE_SSE2_ENABLED -DHAVE_LZ4 -DHAVE_ZLIB -Iext/
LFLAGS+=-lpthread
BLOSC=$(B)lz4hc.o $(B)c-blosc/blosc/blosc.o $(B)c-blosc/blosc/blosclz.o $(B)c-blosc/blosc/shuffle.o $(B)c-blosc/blosc/shuffle-generic.o $(B)c-blosc/blosc/shuffle-sse2.o
endif
LZ4=ext/lz4.o
#ZLIB=-lz
#BSHUFFLE=ext/bitshuffle/src/bitshuffle.o
OBJS=icbench.o bitutil.o vint.o bitpack.o bitunpack.o eliasfano.o vsimple.o vp4dd.o vp4dc.o varintg8iu.o bitpackv.o bitunpackv.o $(TRANSP) ext/simple8b.o transpose.o $(BLOSC) $(SIMDCOMP) $(LIBFOR) $(LZT) $(LZ4) $(MVB) $(ZLIB) $(BSHUFFLE)
icbench: $(OBJS)
$(CC) $(OBJS) -lm -o icbench $(LFLAGS)
libic: $(OBJS)
#$(CC) $(OBJS) -lm -o -shared libic.so $(LFLAGS)
$(CC) -shared -Wl,-soname,libic.so -o libic.so.1 -lm $(LFLAGS) $(OBJS)
idxseg: idxseg.o
$(CC) idxseg.o -o idxseg
ifeq ($(UNAME), Linux)
para: CFLAGS += -DTHREADMAX=32
para: idxqry
endif
idxcr: idxcr.o bitpack.o vp4dc.o bitutil.o
$(CC) idxcr.o bitpack.o bitpackv.o vp4dc.o bitutil.o -o idxcr $(LFLAGS)
idxqry: idxqry.o bitunpack.o vp4dd.o bitunpackv.o bitutil.o
$(CC) idxqry.o bitunpack.o bitunpackv.o vp4dd.o bitutil.o $(LIBTHREAD) $(LIBRT) -o idxqry $(LFLAGS)
.c.o:
$(CC) -O3 $(CFLAGS) $< -c -o $@
.cc.o:
$(CXX) -O3 -DNDEBUG -std=c++11 $< -c -o $@
.cpp.o:
$(CXX) -O3 -DNDEBUG -std=c++11 $< -c -o $@
clean:
@find . -type f -name "*\.o" -delete -or -name "*\~" -delete -or -name "core" -delete
cleanw:
del /S ..\*.o
del /S ..\*~