jermp / sshash
A compressed, associative, exact, and weighted dictionary for k-mers.
License: MIT License
The output is described here: https://github.com/COMBINE-lab/cuttlefish#cuttlefish-1-output.
We want to parse the .cf_seg files output by Cuttlefish1 to build the unitigs from a reference dBG.
Hi @jermp,
Right now, compilation fails on M1 and M2 Macs for several reasons. First, the current version of Xcode doesn't support -march=native on these machines. Upstream Clang does, so this will go away eventually, but it would be good for the CMake files that use it to have an option one could pass to remove this flag.
Second, there are several places where x86 intrinsics are used (and at least one place where inline ASM is used). The intrinsics could be made portable with simde, and the assembly could likely be tested too (not sure if that actually causes a problem or not but I can't get to it yet because of the other intrinsics).
I think it would be worth the (hopefully) minor changes to be able to support sshash (and hence piscem) compilation directly on M1/M2 hardware!
Best,
Rob
Hello,
Can you give me some advice on how to change the code so that, when I query the index, it writes the hashes to stdout or to a file? Thanks for any advice.
Store the minimizer's absolute positions rather than those of super-kmers.
Align the query kmer to the minimizer's position at query time.
This will simplify the code logic avoiding the scan of the super-kmers.
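A minimal sketch of the alignment step described above, under loud assumptions: the minimizer is taken to be the leftmost lexicographically smallest m-mer (SSHash itself uses a hash-based order), the indexed sequence is a plain string, and `minimizer_offset` and `lookup_aligned` are hypothetical names that do not exist in SSHash.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Offset of the minimizer inside the k-mer. Assumption for illustration:
// the minimizer is the leftmost lexicographically smallest m-mer.
inline uint64_t minimizer_offset(std::string_view kmer, uint64_t m) {
    assert(m >= 1 && m <= kmer.size());
    uint64_t best = 0;
    for (uint64_t i = 1; i + m <= kmer.size(); ++i) {
        if (kmer.substr(i, m) < kmer.substr(best, m)) best = i;
    }
    return best;
}

// Given the absolute position of one minimizer occurrence in the indexed text,
// align the query k-mer to it and compare directly, with no super-k-mer scan.
// Returns the absolute start of the k-mer, or -1 if it does not match there.
inline int64_t lookup_aligned(std::string_view text, std::string_view kmer,
                              uint64_t m, uint64_t minimizer_pos) {
    const uint64_t offset = minimizer_offset(kmer, m);
    if (minimizer_pos < offset) return -1;  // k-mer would start before the text
    const uint64_t start = minimizer_pos - offset;
    if (start + kmer.size() > text.size()) return -1;  // k-mer would run past the end
    return text.compare(start, kmer.size(), kmer) == 0 ? static_cast<int64_t>(start) : -1;
}
```

For example, with text "ACGTTGCAAC", k-mer "GTTGCA" and m = 3, the minimizer "GCA" sits at offset 3 in the k-mer and at absolute position 5 in the text, so the k-mer is checked at position 5 - 3 = 2.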
Hi, I'm getting this Segmentation fault. Am I doing something stupid or is it a bug?
./build/build test.fa 31 13 -o ./index/out.sshash
k = 31, m = 13, seed = 1, l = 6, c = 3, canonical_parsing = false, weighted = false
reading file 'test.fa'...
m_buffer_size 29411764
sorting buffer...
saving to file './sshash.tmp.run_1653987257158295118.minimizers.0.bin'...
read 1 sequences, 63 bases, 33 kmers
num_kmers 33
num_super_kmers 4
num_pieces 2 (+3.63636 [bits/kmer])
=== step 1: 'parse_file' 7.9e-05 [sec] (2393.94 [ns/kmer])
Segmentation fault
This is the file test.fna (I had to change the extension to .txt for github because github did not allow .fa file extension).
We have already tested a prototype here: https://github.com/jermp/sshash/blob/statistics/include/util.hpp.
Hi Giulio,
Thank you for developing and providing this tool!
I was testing it out on some small data sets and noticed that for even values of m it fails to build an index.
I'm testing with the following sequence inputs:
>1
CATGTACTAGCTGATCGTAGCTAGCTAGC
>2
AAAAAAAAAAAA
and the following build command
./sshash build -i dump.fa -k 12 -m 6 -o dump_sshash
The above command produces this output:
k = 12, m = 6, seed = 1, l = 6, c = 3, canonical_parsing = false, weighted = false
reading file 'dump.fa'...
m_buffer_size 29411764
sorting buffer...
saving to file './sshash.tmp.run_1701436753917127000.minimizers.0.bin'...
read 2 sequences, 41 bases, 19 kmers
num_kmers 19
num_super_kmers 5
num_pieces 3 (+3.47368 [bits/kmer])
=== step 1: 'parse_file' 0.000847 [sec] (44578.9 [ns/kmer])
terminate called after throwing an instance of 'std::runtime_error'
what(): mmap failed
Abort trap: 6
I don't see any errors when I run it using valgrind, but the backtrace shows that it's aborting after this line:
mm::file_source<unsigned long>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) (mm_file.hpp:133)
It does work, however, if I set m = 1, 3, 5, 7; for m >= 9 it segfaults or produces other errors. Strangely, I get different behaviour in different environments. It fails for all values of m on my Mac with g++12, but works with the values mentioned on a Linux system with g++9... Also, while trying to explore this, I replaced CATGTACTAGCTGATCGTAGCTAGCTAGC with CATGTAGCTGATCGTAGCTAGCTAGC, and it works for even but not odd values of m on my Mac....
Please let me know if I can provide any more information to help debug this.
Continuing from #22
The problem is, that doesn’t work with this code. We immediately get an illegal instruction error. You can see this, e.g. if you pull down the latest piscem from bioconda.
One question I had is, what compiler flags do we actually need? For example, rosetta 2 handles many intrinsics fine (e.g. SSE4.2), but not others (e.g. AVX512). I am not certain if it handles BMI2 or not.
My thought is, if we figured out what instructions it can handle — we could perhaps add a “CONDA_BUILD” flag to the CMake scripts that would skip march=native, but would still pass the most essential performance flags to the compiler. This would produce (ideally) an x86_64 executable that may not be quite as optimized, but which would run on both x86_64 and M1/2 Macs. Let me know if you have any questions or thoughts on this.
Yeah, we should get an idea of what instructions are not permitted there... but anyway I do not think an x86 binary would execute well on ARM.
Ok, we are mixing two separate (although related) things in this thread:
- support SSHash on ARM (which should now be done!);
- distribute SSHash via bioconda.
For point 2, I do not know, since I haven't used bioconda and I'm not familiar with it. But if they do not have any M1/M2 build hosts... then it's not SSHash's (hence, nor our) fault :D Eventually they should get some build environment, I suppose, no?
For the moment, we could just warn the users about this bioconda's limitation and ask them to download SSHash directly from GitHub and compile it manually (which is trivial).
The discussion here is in regard to 2. I have two thoughts here.
I absolutely agree that, until native M1/2 builds are available from bioconda, it would be better for folks to compile themselves, and for that to be made as easy as possible (could we provide pre-compiled binaries?)
Regarding performance, actually, rosetta 2 is pretty amazing in my experience. Even through translation, the M1 (Pro/Max) often outperforms the previous top-end MacBook Pros running i9. My understanding is that rosetta 2 directly translates many of the x86 intrinsics to native Neon intrinsics (or whatever special instructions the M architecture has). While I agree that compilation isn't difficult, I also have a lot of prior experience telling me that my making that statement is very different than the experience a biologist who doesn't focus on software/methods trying to build my tool will have.
Of course, I absolutely understand if you think supporting bioconda builds that run on M1/2 isn't of sufficient priority to warrant effort at this point; we could ask the bioconda people what their path forward and intended timeline are. On the other hand, it would be nice to know the delta between what march=native offers and what instructions are actually useful / necessary. It may be that we can remove that flag in Conda builds, explicitly specify the instructions we want, and get little-to-no performance degradation and the ability to distribute something via Conda that works on all platforms (which makes it trivial for people to use both locally and on a cluster).
Hello @jermp: thank you for designing this awesome data structure! We at @COMBINE-lab (tag: @rob-p) are finding SSHash very useful.
We are interested in using SSHash from within C++ code, and were wondering if you could provide a C++ API / header for that. We are, for now, interested in the following basic functionalities. I'm using the "unitig" terminology in the following, but the input sequence type (i.e. unitigs / maximal paths) does not matter.
Given a k-mer x, find its:
- lookup(x)
- unitig(x)
- unitig_offset(x)
Given a unitig ID u, find its:
- size(u)
- neighbors(u): this likely consists of a query of 2 * 4 (canonical) k-mers.
We hope that these might be feasible for you to provide.
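To make the requested operations concrete, here is a toy, uncompressed stand-in built on a hash map. It is not SSHash's API: `toy_unitig_index` and `kmer_hit` are hypothetical names, every unitig is assumed to have at least k bases, k-mers are assumed distinct across unitigs, and canonical k-mers are ignored, so neighbors(u) issues 2 * 4 forward-strand lookups only.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Answer to a k-mer query in this toy model.
struct kmer_hit {
    uint64_t kmer_id;        // lookup(x)
    uint64_t unitig_id;      // unitig(x)
    uint64_t unitig_offset;  // unitig_offset(x)
};

class toy_unitig_index {
public:
    toy_unitig_index(std::vector<std::string> unitigs, uint64_t k)
        : m_unitigs(std::move(unitigs)), m_k(k) {
        uint64_t next_id = 0;
        for (uint64_t u = 0; u != m_unitigs.size(); ++u) {
            const std::string& s = m_unitigs[u];
            for (uint64_t i = 0; i + m_k <= s.size(); ++i) {
                m_hits.emplace(s.substr(i, m_k), kmer_hit{next_id++, u, i});
            }
        }
    }

    // Returns an empty optional when the k-mer is not in the index.
    std::optional<kmer_hit> lookup(const std::string& kmer) const {
        auto it = m_hits.find(kmer);
        if (it == m_hits.end()) return std::nullopt;
        return it->second;
    }

    // size(u): number of k-mers in unitig u.
    uint64_t size(uint64_t u) const { return m_unitigs[u].size() - m_k + 1; }

    // neighbors(u): unitigs reached by extending the last k-mer forward or
    // the first k-mer backward by one base (2 * 4 lookups in total).
    std::vector<uint64_t> neighbors(uint64_t u) const {
        static const char alphabet[4] = {'A', 'C', 'G', 'T'};
        const std::string& s = m_unitigs[u];
        const std::string suffix = s.substr(s.size() - (m_k - 1));
        const std::string prefix = s.substr(0, m_k - 1);
        std::vector<uint64_t> out;
        for (char c : alphabet) {
            if (auto hit = lookup(suffix + c)) out.push_back(hit->unitig_id);
            if (auto hit = lookup(c + prefix)) out.push_back(hit->unitig_id);
        }
        return out;
    }

private:
    std::vector<std::string> m_unitigs;
    uint64_t m_k;
    std::unordered_map<std::string, kmer_hit> m_hits;
};
```

With unitigs {"ACGT", "GTAA", "GTCC"} and k = 3, looking up "CGT" reports unitig 0 at offset 1, and the neighbors of unitig 0 are unitigs 1 and 2.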
Thanks again!
Hi @jermp,
I'm trying out the new external memory construction feature of sshash (super-exciting!). I've noticed that the memory usage seems surprisingly high. For example, I ran on the compacted dBG (not the UST) of the human genome with:
/usr/bin/time ./build unitigs.fa 31 19 -d /scratch/shash_tmp -o grch38_k31_seg.sshash --verbose
and I am getting ~12G of memory usage. The time is ~16m, but I'd expect this given it's running on spinning rust rather than SSD.
2316.51user 47.52system 16:27.23elapsed 239%CPU (0avgtext+0avgdata 12690096maxresident)k
8155944inputs+50695376outputs (1major+8309647minor)pagefaults 0swaps
Any idea why the memory usage might be like this? I noticed that your initial mention was that it was ~4G for the human genome with k=31. Is it because of my minimizer size? When I use a minimizer of 15, this goes down to ~9G, which is better, but still larger than I'd have anticipated. Any thoughts on what might be going on here?
Thanks!
Rob
Hello,
I'm trying your software to measure its memory consumption on a metagenomics dataset I have. The problem is that I can't use bcalm to generate unitigs for it (GATB/bcalm#71) because of the large number of k-mers. Do you have any software recommendation for transforming a large number of reads into an input format that works with your software?
Hi @jermp!
I'm currently working on adopting SSHash in MetaGraph, as we want to see if we can improve our performance using it as an internal structure for indexing and traversing DBGs.
As a part of the process, we want to generalize SSHash in a way that allows larger values of k, with the k-mer operations provided as methods of a kmer_t class, rather than utils:: functions, as they mostly are now.
So, I plan to write a simple base kmer_t class with virtual functions that provides contracts for what such methods should look like, and then we will inherit from this base class in MetaGraph to provide implementations for our k-mer types. Then, all the methods that currently rely on kmer_t would take a template argument instead, which should be a k-mer class derived from the base class described above. I think this way we can ensure that MetaGraph and SSHash stay reasonably separated.
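A rough sketch of the template side of this idea, not taken from the fork: algorithms take the k-mer type as a template argument and rely only on a small implicit contract (`max_k`, `append`). The type `packed_kmer`, the helper `base_code`, and the function `encode` are hypothetical, and the virtual base class from the proposal is left out for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical k-mer type: 2 bits per base, packed into `Words` 64-bit words,
// so it can hold up to 32 * Words bases. words[0] holds the lowest bits.
template <std::size_t Words>
struct packed_kmer {
    static constexpr std::size_t max_k = 32 * Words;
    std::array<uint64_t, Words> words{};

    // Shift the whole k-mer left by one base and store `code` in the low 2 bits.
    void append(uint64_t code) {
        for (std::size_t i = Words; i-- > 1;) {
            words[i] = (words[i] << 2) | (words[i - 1] >> 62);
        }
        words[0] = (words[0] << 2) | (code & 3);
    }

    bool operator==(const packed_kmer& other) const { return words == other.words; }
};

// A = 0, C = 1, G = 2, anything else is treated as T = 3 (toy behaviour).
inline uint64_t base_code(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default: return 3;
    }
}

// Generic code depends only on Kmer::max_k and Kmer::append.
template <typename Kmer>
Kmer encode(std::string_view bases) {
    assert(bases.size() <= Kmer::max_k);
    Kmer x;
    for (char c : bases) x.append(base_code(c));
    return x;
}
```

Here `encode<packed_kmer<1>>` handles k up to 32 and `encode<packed_kmer<2>>` up to 64 without any change to the generic function.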
I started working in this direction in our fork of SSHash (9624612), but before going further I wanted to get in touch and collect some feedback on your side. That being said, I have a couple of questions:
util:: or dictionary methods?
If merging it upstream in the future is of interest to you, I will try to take your opinion into consideration while working in the fork, so it'd be easier to merge afterwards. If it's not a priority for you, no problem, we can keep working on the fork, but we will still highly appreciate any insight and advice you might have for these changes.
Hi again,
I'm trying to run queries on a small sshash index, but the program reports that zero queries were run. What could be the issue?
niklas@phoenix:~/code/SBWT_experiments$ ./sshash/build/query index.sshash test.fna
2022-06-01 19:02:46: loading index from file 'index.sshash'...
index size: 2.13636 [MB] (7.09913 [bits/kmer])
2022-06-01 19:02:46: performing queries from file 'test.fna'...
2022-06-01 19:02:46: DONE
==== query report:
num_kmers = 0
num_valid_kmers = 0 (-nan% of kmers)
num_positive_kmers = 0 (-nan% of valid kmers)
num_searches = 140332537301184/0 (inf%)
num_extensions = 2/0 (inf%)
elapsed = 0.005 millisec / 5e-06 sec / 8.33333e-08 min / inf ns/kmer
The index and the queries are attached here. I'm not attaching the original sequences because they are over 30GB.
data.zip
See the discussion here COMBINE-lab/piscem#1.
This means to default to: https://github.com/jermp/sshash/blob/master/include/hash_util.hpp#L51.
Hello,
I get this error:
terminate called after throwing an instance of 'std::runtime_error'
what(): Using 64-bit hash codes with more than 2^30 keys can be dangerous due to collisions: use 128-bit hash codes instead.
and I can't find how to use 128-bit hash codes when building.
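For intuition about the warning, the standard birthday estimate gives the expected number of colliding pairs among n keys hashed uniformly to b-bit codes as n(n-1)/2 divided by 2^b. The helper below, with the hypothetical name `expected_colliding_pairs`, only evaluates that formula; it assumes ideal uniform hashing and says nothing about how the check is actually implemented.

```cpp
#include <cmath>
#include <cstdint>

// Birthday estimate: expected number of colliding pairs among n keys
// hashed uniformly at random to codes of `bits` bits.
inline long double expected_colliding_pairs(uint64_t n, int bits) {
    if (n < 2) return 0.0L;
    const long double pairs =
        static_cast<long double>(n) * static_cast<long double>(n - 1) / 2.0L;
    return std::ldexp(pairs, -bits);  // pairs / 2^bits
}
```

With n = 2^30 keys, 64-bit codes give roughly 2^-5 (about 0.03) expected colliding pairs, while 128-bit codes give roughly 2^-69, which is why the larger codes are recommended at this scale.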
I'm trying to install Fulgor, but it seems that sshash, as a dependency, fails to compile.
Environments:
$ uname -a
Linux mBio 6.1.55-1-MANJARO #1 SMP PREEMPT_DYNAMIC Sat Sep 23 12:13:56 UTC 2023 x86_64 GNU/Linux
$ gcc --version
gcc (GCC) 13.2.1 20230801
zlib: 1.3
$ cmake ..
-- The C compiler identification is GNU 13.2.1
-- The CXX compiler identification is GNU 13.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_BUILD_TYPE: Release
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/shenwei/Downloads/fulgor/sshash/build
$ make
[ 3%] Building CXX object CMakeFiles/sshash_static.dir/include/gz/zip_stream.cpp.o
In file included from /home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:1:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:121:5: error: ‘uint32_t’ does not name a type
121 | uint32_t get_crc() const;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:67:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
66 | #include <vector>
+++ |+#include <cstdint>
67 |
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:124:5: error: ‘uint32_t’ does not name a type
124 | uint32_t get_in_size() const;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:124:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:137:5: error: ‘uint32_t’ does not name a type
137 | uint32_t crc_;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:137:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:180:5: error: ‘uint32_t’ does not name a type
180 | uint32_t get_crc() const;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:180:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:186:5: error: ‘uint32_t’ does not name a type
186 | uint32_t get_in_size() const;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:186:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:200:5: error: ‘uint32_t’ does not name a type
200 | uint32_t crc_;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:200:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:285:5: error: ‘uint32_t’ does not name a type
285 | uint32_t gzip_crc_;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:285:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:286:5: error: ‘uint32_t’ does not name a type
286 | uint32_t gzip_data_size_;
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:286:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_zip_streambuf<CharT, Traits>::basic_zip_streambuf(ostream_reference, int, ZipStrategy, int, int, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:37:83: error: class ‘basic_zip_streambuf<CharT, Traits>’ does not have any field named ‘crc_’
37 | : ostream_(ostream), output_buffer_(buffer_size, 0), buffer_(buffer_size, 0), crc_(0) {
| ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘std::streamsize basic_zip_streambuf<CharT, Traits>::flush()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:101:5: error: ‘crc_’ was not declared in this scope
101 | crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
| ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:101:24: error: ‘uint32_t’ does not name a type
101 | crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:5:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
4 | #include <cstring>
+++ |+#include <cstdint>
5 |
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: At global scope:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:136:1: error: ‘uint32_t’ does not name a type
136 | uint32_t basic_zip_streambuf<CharT, Traits>::get_crc() const {
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:136:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:141:1: error: ‘uint32_t’ does not name a type
141 | uint32_t basic_zip_streambuf<CharT, Traits>::get_in_size() const {
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:141:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_streambuf<CharT, Traits>::zip_to_stream(char*, std::streamsize)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:5: error: ‘crc_’ was not declared in this scope
163 | crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
| ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:24: error: ‘uint32_t’ does not name a type
163 | crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:24: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_unzip_streambuf<CharT, Traits>::basic_unzip_streambuf(istream_reference, int, size_t, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:193:87: error: class ‘basic_unzip_streambuf<CharT, Traits>’ does not have any field named ‘crc_’
193 | : istream_(istream), input_buffer_(input_buffer_size), buffer_(read_buffer_size), crc_(0) {
| ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: At global scope:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:259:1: error: ‘uint32_t’ does not name a type
259 | uint32_t basic_unzip_streambuf<CharT, Traits>::get_crc() const {
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:259:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:269:1: error: ‘uint32_t’ does not name a type
269 | uint32_t basic_unzip_streambuf<CharT, Traits>::get_in_size() const {
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:269:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘std::streamsize basic_unzip_streambuf<CharT, Traits>::unzip_from_stream(char_type*, std::streamsize)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:5: error: ‘crc_’ was not declared in this scope
304 | crc_ = static_cast<uint32_t>(crc32(crc_, reinterpret_cast<byte_type*>(buffer), (uInt)theSize));
| ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:24: error: ‘uint32_t’ does not name a type
304 | crc_ = static_cast<uint32_t>(crc32(crc_, reinterpret_cast<byte_type*>(buffer), (uInt)theSize));
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:24: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘basic_zip_ostream<CharT, Traits>& basic_zip_ostream<CharT, Traits>::add_footer()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:391:5: error: ‘uint32_t’ was not declared in this scope
391 | uint32_t crc = this->get_crc();
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:391:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:393:51: error: ‘crc’ was not declared in this scope; did you mean ‘crc32’?
393 | this->get_ostream().put(static_cast<char>(crc & 0xff));
| ^~~
| crc32
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:397:13: error: expected ‘;’ before ‘length’
397 | uint32_t length = this->get_in_size();
| ^~~~~~~
| ;
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:399:51: error: ‘length’ was not declared in this scope
399 | this->get_ostream().put(static_cast<char>(length & 0xff));
| ^~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_zip_istream<CharT, Traits>::basic_zip_istream(istream_reference, int, size_t, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:417:7: error: class ‘basic_zip_istream<CharT, Traits>’ does not have any field named ‘gzip_crc_’
417 | , gzip_crc_(0)
| ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:418:7: error: class ‘basic_zip_istream<CharT, Traits>’ does not have any field named ‘gzip_data_size_’
418 | , gzip_data_size_(0) {
| ^~~~~~~~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_istream<CharT, Traits>::check_crc()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:434:31: error: ‘gzip_crc_’ was not declared in this scope
434 | return this->get_crc() == gzip_crc_;
| ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_istream<CharT, Traits>::check_data_size() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:439:36: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
439 | return this->get_out_size() == gzip_data_size_;
| ^~~~~~~~~~~~~~~
| get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘long int basic_zip_istream<CharT, Traits>::get_gzip_crc() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:444:12: error: ‘gzip_crc_’ was not declared in this scope
444 | return gzip_crc_;
| ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘long int basic_zip_istream<CharT, Traits>::get_gzip_data_size() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:449:12: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
449 | return gzip_data_size_;
| ^~~~~~~~~~~~~~~
| get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘void basic_zip_istream<CharT, Traits>::read_footer()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:511:9: error: ‘gzip_crc_’ was not declared in this scope
511 | gzip_crc_ = 0;
| ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:513:39: error: ‘uint32_t’ does not name a type
513 | gzip_crc_ += (static_cast<uint32_t>(this->get_istream().get()) & 0xff) << (8 * n);
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:513:39: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:515:9: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
515 | gzip_data_size_ = 0;
| ^~~~~~~~~~~~~~~
| get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:517:45: error: ‘uint32_t’ does not name a type
517 | gzip_data_size_ += (static_cast<uint32_t>(this->get_istream().get()) & 0xff) << (8 * n);
| ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:517:45: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In instantiation of ‘bool basic_zip_istream<CharT, Traits>::check_crc() [with CharT = char; Traits = std::char_traits<char>]’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:527:16: required from here
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:434:18: error: ‘class basic_zip_istream<char>’ has no member named ‘get_crc’
434 | return this->get_crc() == gzip_crc_;
| ~~~~~~^~~~~~~
make[2]: *** [CMakeFiles/sshash_static.dir/build.make:76: CMakeFiles/sshash_static.dir/include/gz/zip_stream.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:93: CMakeFiles/sshash_static.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
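The compiler notes in the log point at the fix: with GCC 13, some standard headers no longer pull in <cstdint> transitively, so any file that uses uint32_t has to include it explicitly. The snippet below is a stand-alone illustration of that pattern, not the contents of zip_stream.hpp; `toy_checksum` is a made-up class and its arithmetic is not a real CRC.

```cpp
#include <cstdint>  // declares uint32_t; do not rely on <vector> to bring it in
#include <vector>

// Made-up class that mirrors the shape of the failing declarations
// (a uint32_t member plus a uint32_t getter).
class toy_checksum {
public:
    void update(const std::vector<unsigned char>& bytes) {
        for (unsigned char b : bytes) crc_ = crc_ * 31u + b;  // toy mixing, not CRC-32
    }
    uint32_t get_crc() const { return crc_; }

private:
    uint32_t crc_ = 0;
};
```

Adding the same include line near the other includes of the affected header is what the "did you forget to '#include <cstdint>'?" notes suggest.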
This is needed to index and query large collections containing more than 2^32 contigs.
An application of this is in Fulgor, when indexing very heterogeneous collections.
Tagging @rickbeeloo here and this issue jermp/fulgor#16.
It looks like all that is needed is to refactor these two small points a little bit.
Another point: for very large-scale indexing, I would suggest using a partitioned PTHash,
rather than a single PTHash, to lower the construction time.
Hey,
I am using sshash as a submodule in my project. It would make my life much easier if you added an empty constructor for dictionary::iterator.
I am getting this error "error: no matching function for call to ‘sshash::dictionary::iterator::iterator()’". I am not using the constructor directly but I am keeping an iterator as a member in my classes.
Thanks,
Moustafa
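Until such a constructor exists, one possible workaround on the caller's side is to hold the member in a std::optional, which can start empty and be filled later. The sketch below uses a stand-in type `cursor` in place of dictionary::iterator, and `scanner` in place of the user's class; both names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Stand-in for a type that has no default constructor.
class cursor {
public:
    explicit cursor(const std::vector<int>& data) : m_data(&data), m_pos(0) {}
    bool has_next() const { return m_pos < m_data->size(); }
    int next() { return (*m_data)[m_pos++]; }

private:
    const std::vector<int>* m_data;
    std::size_t m_pos;
};

// A class that keeps the cursor as a member but is still default-constructible,
// because the optional starts out empty.
class scanner {
public:
    scanner() = default;
    void attach(const std::vector<int>& data) { m_it.emplace(data); }
    bool has_next() const { return m_it.has_value() && m_it->has_next(); }
    int next() { return m_it->next(); }  // precondition: has_next() is true

private:
    std::optional<cursor> m_it;
};
```

The cost is one extra flag per member and a check before use; the data passed to attach must outlive the scanner.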
I tried to compile the package to give it a try but ran into a problem when compiling:
/usr/lib/gcc/x86_64-linux-gnu/11/include/popcntintrin.h: In function ‘uint64_t pthash::util::popcount(uint64_t)’:
/usr/lib/gcc/x86_64-linux-gnu/11/include/popcntintrin.h:42:1: error: inlining failed in call to ‘always_inline’ ‘long long int _mm_popcnt_u64(long long unsigned int)’: target specific option mismatch
42 | _mm_popcnt_u64 (unsigned long long __X)
| ^~~~~~~~~~~~~~
I googled for a while and found out that the problem is related to the -mpopcnt option, but I am not familiar with CMake or with the _mm_popcnt_u64() function.
My system is Linux 5.13.0-24-generic #24-Ubuntu-21.10
I appreciate any help to resolve this issue.
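The error arises because _mm_popcnt_u64 can only be inlined when the popcnt target feature is enabled (for example with -mpopcnt or -march=native). As an illustration of a code-level alternative, and not what PTHash does, a population count can be written so that it compiles with or without that flag. `portable_popcount` is a hypothetical name; on GCC and Clang it uses the __builtin_popcountll builtin, which becomes a single popcnt instruction when the feature is enabled and a software routine otherwise.

```cpp
#include <cstdint>

// Counts the set bits of x without requiring any target-specific compiler flag.
inline uint64_t portable_popcount(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
    return static_cast<uint64_t>(__builtin_popcountll(x));
#else
    // Fallback for other compilers: clear the lowest set bit until none remain.
    uint64_t count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
#endif
}
```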
As per the title, try to avoid that. So annoying.
Perhaps a simple linear search over the buckets' thresholds is fast anyway since
these are supposed to be very few.
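A minimal sketch of such a linear search, assuming the thresholds are stored in ascending order and that a value belongs to the first bucket whose threshold is strictly greater than it. The name `find_bucket` and this exact convention are assumptions for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the index of the first threshold strictly greater than `value`,
// or thresholds.size() if there is none. `thresholds` must be sorted ascending.
inline std::size_t find_bucket(const std::vector<uint64_t>& thresholds, uint64_t value) {
    std::size_t i = 0;
    while (i != thresholds.size() && value >= thresholds[i]) ++i;
    return i;
}
```

With only a handful of thresholds the loop touches a single cache line and has no setup cost, so it is likely to be competitive with a binary search.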