sshash's People

Contributors

adamant-pwn, hmusta, jermp, rob-p

sshash's Issues

Support for compiling on M1/M2 Macs

Hi @jermp,

Right now, compilation fails on M1 and M2 Macs for several reasons. First, the current version of Xcode doesn't support -march=native on these machines. Upstream Clang does, so this will go away eventually, but it would be good for the CMake files that use this flag to have an option one could pass to disable it.

Second, there are several places where x86 intrinsics are used (and at least one place where inline ASM is used). The intrinsics could be made portable with simde, and the assembly could likely be tested too (not sure if that actually causes a problem or not but I can't get to it yet because of the other intrinsics).

I think it would be worth the (hopefully) minor changes to be able to support sshash (and hence piscem) compilation directly on M1/M2 hardware!

Best,
Rob
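
One way to keep such code portable without a full rewrite, sketched here under assumptions: guard each x86 intrinsic behind the macro that the compiler defines when that instruction set is enabled, and fall back to a compiler builtin or plain C++ everywhere else. The helper name lsb_index is made up for illustration, and this thread does not say whether sshash uses this particular intrinsic.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__BMI__) && defined(__x86_64__)
#include <immintrin.h>  // x86 only; never included when targeting ARM
#endif

// Index of the least significant set bit of x (x must be non-zero).
// Hypothetical helper, not taken from the sshash sources.
inline uint64_t lsb_index(uint64_t x) {
    assert(x != 0);
#if defined(__BMI__) && defined(__x86_64__)
    return _tzcnt_u64(x);  // BMI1 instruction, available only if enabled at compile time
#elif defined(__GNUC__) || defined(__clang__)
    return static_cast<uint64_t>(__builtin_ctzll(x));  // GCC/Clang builtin, any target
#else
    uint64_t i = 0;  // plain C++ fallback
    while ((x & 1) == 0) {
        x >>= 1;
        ++i;
    }
    return i;
#endif
}
```

With this pattern the same source builds on x86_64 and on Apple silicon; a library such as simde automates the same idea for a much larger set of intrinsics.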

Writing hashes to output file

Hello,

Can you give me some advice on how to change the code so that, when I query the index, it writes the hashes to stdout or to a file?
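
A minimal sketch of what such a change could look like, assuming the index exposes some function that maps a k-mer string to a 64-bit identifier. Here that function is passed in as the callable lookup, and write_kmer_ids is a made-up name; neither is taken from the sshash code.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Writes one "<kmer>\t<id>" line per k-mer of `sequence` to `out` and returns
// the number of lines written. `lookup` is any callable mapping a k-mer string
// to a 64-bit id; it stands in for a query against a loaded index (assumption).
template <typename Lookup>
uint64_t write_kmer_ids(std::string const& sequence, uint64_t k, Lookup lookup,
                        std::ostream& out) {
    assert(k > 0);
    uint64_t written = 0;
    for (uint64_t i = 0; i + k <= sequence.size(); ++i) {
        std::string kmer = sequence.substr(i, k);
        out << kmer << '\t' << lookup(kmer) << '\n';
        ++written;
    }
    return written;
}
```

Passing std::cout as out writes to stdout, and passing a std::ofstream writes to a file.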

Minimizer absolute position in sequences

Store the minimizer's absolute positions rather than those of super-kmers.
Align the query kmer to the minimizer's position at query time.

This will simplify the code logic avoiding the scan of the super-kmers.
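
The alignment step can be sketched as follows. This is an illustration only: it picks the minimizer by lexicographic order for readability, whereas the build logs in other issues show a seed parameter, so the real ordering is presumably hash-based. The names minimizer_offset and matches_at are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Offset, within `kmer`, of its smallest m-mer (leftmost on ties).
// Lexicographic order is an assumption made for readability.
inline uint64_t minimizer_offset(std::string const& kmer, uint64_t m) {
    assert(m > 0 && m <= kmer.size());
    uint64_t best = 0;
    for (uint64_t i = 1; i + m <= kmer.size(); ++i) {
        if (kmer.compare(i, m, kmer, best, m) < 0) best = i;
    }
    return best;
}

// Given the absolute position of a minimizer occurrence in `text`, checks
// whether the query k-mer occurs with its minimizer aligned to that position.
inline bool matches_at(std::string const& text, uint64_t minimizer_pos,
                       std::string const& kmer, uint64_t m) {
    uint64_t offset = minimizer_offset(kmer, m);
    if (minimizer_pos < offset) return false;  // k-mer would start before the text
    uint64_t start = minimizer_pos - offset;
    if (start + kmer.size() > text.size()) return false;
    return text.compare(start, kmer.size(), kmer) == 0;
}
```

With the absolute position stored, a single subtraction gives the only candidate start of the k-mer, so no scan over the super-k-mer is needed.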

Segmentation fault

Hi, I'm getting this Segmentation fault. Am I doing something stupid or is it a bug?

./build/build test.fa 31 13 -o ./index/out.sshash

k = 31, m = 13, seed = 1, l = 6, c = 3, canonical_parsing = false, weighted = false
reading file 'test.fa'...
m_buffer_size 29411764
sorting buffer...
saving to file './sshash.tmp.run_1653987257158295118.minimizers.0.bin'...
read 1 sequences, 63 bases, 33 kmers
num_kmers 33
num_super_kmers 4
num_pieces 2 (+3.63636 [bits/kmer])
=== step 1: 'parse_file' 7.9e-05 [sec] (2393.94 [ns/kmer])
Segmentation fault

This is the file test.fa (I had to change the extension to .txt because GitHub does not allow the .fa file extension).

test.txt

Errors while indexing small test sequence sets with even values of m for small k

Hi Giulio,

Thank you for developing and providing this tool!

I was testing it out on some small data sets and noticed that for even values of m it fails to build an index.

I'm testing with the following sequence inputs:

>1
CATGTACTAGCTGATCGTAGCTAGCTAGC
>2
AAAAAAAAAAAA

and the following build command

./sshash build -i dump.fa -k 12 -m 6 -o dump_sshash

The above command produces this output:

k = 12, m = 6, seed = 1, l = 6, c = 3, canonical_parsing = false, weighted = false
reading file 'dump.fa'...
m_buffer_size 29411764
sorting buffer...
saving to file './sshash.tmp.run_1701436753917127000.minimizers.0.bin'...
read 2 sequences, 41 bases, 19 kmers
num_kmers 19
num_super_kmers 5
num_pieces 3 (+3.47368 [bits/kmer])
=== step 1: 'parse_file' 0.000847 [sec] (44578.9 [ns/kmer])
terminate called after throwing an instance of 'std::runtime_error'
  what():  mmap failed
Abort trap: 6

I don't see any errors when I run it using valgrind, but the backtrace shows that it's aborting after this line:

mm::file_source<unsigned long>::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) (mm_file.hpp:133)

It does work, however, if I set m = 1, 3, 5, or 7; for m >= 9 it segfaults or produces other errors. Strangely, I get different behaviour in different environments: it fails for all values of m on my Mac with g++12, but works with the values mentioned above on a Linux system with g++9. Also, while trying to explore this, I replaced CATGTACTAGCTGATCGTAGCTAGCTAGC with CATGTAGCTGATCGTAGCTAGCTAGC, and it then works for even but not odd values of m on my Mac.

Please let me know if I can provide any more information to help debug this.

[Discussion] : Distribution via Bioconda

Continuing from #22

The problem is, that doesn’t work with this code. We immediately get an illegal instruction error. You can see this, e.g. if you pull down the latest piscem from bioconda.
One question I had is, what compiler flags do we actually need? For example, rosetta 2 handles many intrinsics fine (e.g. SSE4.2), but not others (e.g. AVX512). I am not certain if it handles BMI2 or not.
My thought is, if we figured out what instructions it can handle — we could perhaps add a “CONDA_BUILD” flag to the CMake scripts that would skip march=native, but would still pass the most essential performance flags to the compiler. This would produce (ideally) an x86_64 executable that may not be quite as optimized, but which would run on both x86_64 and M1/2 Macs. Let me know if you have any questions or thoughts on this.
Yeah, we should get an idea of which instructions are not permitted there... but anyway, I do not think an x86 binary would execute well on ARM.

Ok, we are mixing two separate (although related) things in this thread:

  1. support SSHash on ARM (which should now be done!);
  2. distribute SSHash via bioconda.
    For point 2, I do not know, since I haven't used bioconda and I'm not familiar with it. But if they do not have any M1/M2 build hosts... then it's not SSHash's (nor our) fault :D Eventually they should get some build environment, I suppose, no?
    For the moment, we could just warn users about this bioconda limitation and ask them to download SSHash directly from GitHub and compile it manually (which is trivial).

The discussion here is in regard to 2. I have two thoughts here.

  1. I absolutely agree that, until native M1/2 builds are available from bioconda, it would be better for folks to compile themselves, and for that to be made as easy as possible (could we provide pre-compiled binaries?)

  2. Regarding performance, actually, rosetta 2 is pretty amazing in my experience. Even through translation, the M1 (Pro/Max) often outperforms the previous top-end MacBook Pros running i9. My understanding is that rosetta 2 directly translates many of the x86 intrinsics to native Neon intrinsics (or whatever special instructions the M architecture has). While I agree that compilation isn't difficult, I also have a lot of prior experience telling me that my making that statement is very different than the experience a biologist who doesn't focus on software/methods trying to build my tool will have.

Of course, I absolutely understand if you think supporting bioconda builds that run on M1/2 isn't of sufficient priority to warrant effort at this point — we could ask the bioconda people what their path forward and intended timeline is. On the other hand, it would be nice to know what is the delta between what march=native offers and what instructions are actually useful / necessary. It may be that we can remove that flag in Conda builds, explicitly specify the instructions we want, and get little-to-no performance degradation and the ability to distribute something via Conda that works on all platforms (which makes it trivial for people to use both locally and on a cluster).
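
One low-effort way to measure that delta, sketched under assumptions: compile a small probe once with -march=native and once with an explicit flag list, and compare which instruction-set macros the compiler defines. The macros below are the ones GCC and Clang predefine; the function name enabled_isa_extensions is made up, and the probe says nothing about what Rosetta 2 can translate.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Names of the instruction-set extensions this translation unit was compiled
// with, based on GCC/Clang predefined macros (other compilers may differ).
inline std::vector<std::string> enabled_isa_extensions() {
    std::vector<std::string> out;
#if defined(__SSE4_2__)
    out.push_back("SSE4.2");
#endif
#if defined(__POPCNT__)
    out.push_back("POPCNT");
#endif
#if defined(__BMI2__)
    out.push_back("BMI2");
#endif
#if defined(__AVX2__)
    out.push_back("AVX2");
#endif
#if defined(__AVX512F__)
    out.push_back("AVX512F");
#endif
#if defined(__ARM_NEON)
    out.push_back("NEON");
#endif
    return out;
}
```

Printing the returned list from both builds shows which extensions -march=native adds on a given build host, which is the input needed to choose an explicit flag list for a Conda build.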

SSHash as a C++ API / library

Hello @jermp: thank you for designing this awesome data structure! We at @COMBINE-lab (tag: @rob-p) are finding SSHash very useful.

We are interested in using SSHash from within C++ code, and were wondering if you could provide a C++ API / header for that. For now, we are interested in the following basic functionalities. I'm using the "unitig" terminology in the following, but the input sequence type (i.e. unitigs / maximal paths) does not matter.

Given a k-mer x, find its:

  • hash value determined by SSHash, i.e. lookup(x)
  • unitig ID, i.e. unitig(x)
  • offset within the unitig that contains it, i.e. unitig_offset(x)

Given a unitig ID u, find its:

  • size, i.e. size(u)
  • list of neighbor unitigs, neighbors(u): this likely consists of a query of 2 * 4 (canonical) k-mers.
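
To make the requested operations concrete, here is a toy reference sketch backed by a plain hash map. It is not sshash's API: the class name toy_index, the not_found sentinel, and the choice to report size(u) in bases are all assumptions, every k-mer is assumed to occur once across the unitigs, and canonical k-mers are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr uint64_t not_found = std::numeric_limits<uint64_t>::max();

// Toy, hash-map-backed stand-in for the requested interface (NOT sshash's API).
class toy_index {
public:
    toy_index(std::vector<std::string> unitigs, uint64_t k)
        : m_unitigs(std::move(unitigs)), m_k(k) {
        assert(k > 0);
        uint64_t id = 0;  // k-mer ids are assigned in order of first appearance
        for (uint64_t u = 0; u != m_unitigs.size(); ++u) {
            std::string const& s = m_unitigs[u];
            for (uint64_t i = 0; i + m_k <= s.size(); ++i) {
                if (m_map.emplace(s.substr(i, m_k), location{id, u, i}).second) ++id;
            }
        }
    }

    uint64_t lookup(std::string const& x) const { return get(x).kmer_id; }
    uint64_t unitig(std::string const& x) const { return get(x).unitig_id; }
    uint64_t unitig_offset(std::string const& x) const { return get(x).offset; }
    uint64_t size(uint64_t u) const { return m_unitigs.at(u).size(); }  // in bases

    // Unitigs reached by the 2 * 4 single-character extensions of u's two ends.
    std::set<uint64_t> neighbors(uint64_t u) const {
        std::string const& s = m_unitigs.at(u);
        std::set<uint64_t> result;
        if (s.size() < m_k) return result;
        std::string suffix = s.substr(s.size() - (m_k - 1));
        std::string prefix = s.substr(0, m_k - 1);
        for (char c : std::string("ACGT")) {
            uint64_t next = unitig(suffix + c);  // successor k-mer
            uint64_t prev = unitig(c + prefix);  // predecessor k-mer
            if (next != not_found) result.insert(next);
            if (prev != not_found) result.insert(prev);
        }
        return result;
    }

private:
    struct location {
        uint64_t kmer_id, unitig_id, offset;
    };
    location get(std::string const& x) const {
        auto it = m_map.find(x);
        return it == m_map.end() ? location{not_found, not_found, not_found} : it->second;
    }
    std::vector<std::string> m_unitigs;
    uint64_t m_k;
    std::unordered_map<std::string, location> m_map;
};
```

For the unitigs "ACGT" and "GTTA" with k = 3, the k-mer CGT ends the first unitig and GTT starts the second, so each unitig is reported as the other's neighbor.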

We hope that these might be feasible for you to provide.

Thanks again!

External memory construction seems too high

Hi @jermp,

I'm trying out the new external memory construction feature of sshash (super-exciting!). I've noticed that the memory usage seems surprisingly high. For example, I ran on the compacted dBG (not the UST) of the human genome with:

/usr/bin/time ./build  unitigs.fa 31 19 -d /scratch/shash_tmp -o grch38_k31_seg.sshash --verbose

and I am getting ~12G of memory usage. The time is ~16m, but I'd expect this given it's running on spinning rust rather than SSD.

2316.51user 47.52system 16:27.23elapsed 239%CPU (0avgtext+0avgdata 12690096maxresident)k
8155944inputs+50695376outputs (1major+8309647minor)pagefaults 0swaps

Any idea why the memory usage might be like this? I noticed that your initial mention was that it was ~4G for the human genome with k=31. Is it because of my minimizer size? When I use a minimizer of 15, this goes down to ~9G, which is better, but still larger than I'd have anticipated. Any thoughts on what might be going on here?

Thanks!
Rob

question about input data

Hello,

I'm trying your software to see its memory consumption on a metagenomics dataset I have. The problem is that I can't use bcalm to generate unitigs for it (GATB/bcalm#71) because of the large number of k-mers. Do you have any software recommendation for transforming a large number of reads into an input format that works with your software?

Larger k and generic alphabets in SSHash

Hi @jermp!

I'm currently working on adopting SSHash in MetaGraph, as we want to see if we can improve our performance using it as an internal structure for indexing and traversing DBGs.

As a part of the process, we want to generalize SSHash in a way that allows larger values of $k$ and potentially different alphabets. My general idea here is to rewrite the code base, so that the methods that previously relied on some properties of k-mers (the alphabet used, methods to convert between k-mers/individual characters and their bit representation, etc) would now be implemented as members of the kmer_t class, rather than utils:: functions, as they mostly are now.

So, I plan to write a simple base kmer_t class with virtual functions that provides contracts for what such methods should look like, and then we will inherit from this base class in MetaGraph to provide implementations for our k-mer types. Then, all the methods that currently rely on kmer_t would take a template argument instead, which should be a k-mer implementation class derived from the base class described above. I think this way we can ensure that MetaGraph and SSHash stay reasonably separated.
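
A rough sketch of the template-argument part of this idea, with every name hypothetical and none of it taken from the fork: the alphabet-specific conversions live in the k-mer type, and generic code only touches them through the template parameter. The virtual base class mentioned above is left out to keep the sketch short, and the 0/1/2/3 encoding of A/C/G/T is arbitrary.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical 2-bit DNA k-mer type.
struct dna_kmer {
    static constexpr uint64_t bits_per_char = 2;
    static constexpr uint64_t max_k = 64 / bits_per_char;
    uint64_t bits = 0;
    static uint64_t char_to_uint(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default: assert(c == 'T'); return 3;
        }
    }
    static char uint_to_char(uint64_t x) { return "ACGT"[x]; }
};

// Hypothetical 5-bit k-mer type over the letters 'A'..'Z'.
struct letter_kmer {
    static constexpr uint64_t bits_per_char = 5;
    static constexpr uint64_t max_k = 64 / bits_per_char;
    uint64_t bits = 0;
    static uint64_t char_to_uint(char c) {
        assert(c >= 'A' && c <= 'Z');
        return static_cast<uint64_t>(c - 'A');
    }
    static char uint_to_char(uint64_t x) { return static_cast<char>('A' + x); }
};

// Generic code: knows nothing about the alphabet beyond what Kmer provides.
template <typename Kmer>
Kmer encode(std::string const& s) {
    assert(s.size() <= Kmer::max_k);
    Kmer x;
    for (uint64_t i = 0; i != s.size(); ++i) {
        x.bits |= Kmer::char_to_uint(s[i]) << (i * Kmer::bits_per_char);
    }
    return x;
}

template <typename Kmer>
std::string decode(Kmer x, uint64_t k) {
    uint64_t const mask = (uint64_t(1) << Kmer::bits_per_char) - 1;
    std::string s(k, ' ');
    for (uint64_t i = 0; i != k; ++i) {
        s[i] = Kmer::uint_to_char((x.bits >> (i * Kmer::bits_per_char)) & mask);
    }
    return s;
}
```

Supporting larger k in this scheme would mean swapping the uint64_t storage for a wider integer type inside the k-mer class, without touching encode or decode.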

I started working in this direction in our fork of SSHash (9624612), but before going further I wanted to get in touch and collect some feedback on your side. That being said, I have a couple of questions:

  1. Would it be of interest for SSHash to merge these changes upstream in the future?
  2. Do you have any advice on how this should be approached? Does the outlined approach make sense to you?
  3. It seemed to me that most of the stuff that heavily relies on the alphabet being ACGT is currently encapsulated in util.hpp, but there are also some methods in e.g. dictionary.cpp. Are there any other places where you rely on the alphabet being ACGT directly, rather than implicitly via calls to util:: or dictionary methods?

If merging it upstream in the future is of interest to you, I will try to take your opinion into consideration while working in the fork, so it'd be easier to merge afterwards. If it's not a priority for you, no problem, we can keep working on the fork, but still will highly appreciate any insight and advice you might have for these changes.

No queries being run

Hi again,

I'm trying to run queries on a small sshash index, but the program reports that zero queries were run. What could be the issue?

niklas@phoenix:~/code/SBWT_experiments$ ./sshash/build/query index.sshash test.fna
2022-06-01 19:02:46: loading index from file 'index.sshash'...
index size: 2.13636 [MB] (7.09913 [bits/kmer])
2022-06-01 19:02:46: performing queries from file 'test.fna'...
2022-06-01 19:02:46: DONE
==== query report:
num_kmers = 0
num_valid_kmers = 0 (-nan% of kmers)
num_positive_kmers = 0 (-nan% of valid kmers)
num_searches = 140332537301184/0 (inf%)
num_extensions = 2/0 (inf%)
elapsed = 0.005 millisec / 5e-06 sec / 8.33333e-08 min / inf ns/kmer

The index and the queries are attached here. I'm not attaching the original sequences because they are over 30GB.
data.zip
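
As a side note on the report itself: the "-nan%" and "inf" entries are what floating-point division by a zero count produces. A small guard like the one below would make an empty run easier to read. It is a hypothetical helper, not code from the query tool, and it does not explain why no k-mers were read from the file.

```cpp
#include <cassert>
#include <cstdint>

// Percentage of `part` in `whole`, defined as 0 when `whole` is 0 so that an
// empty query file does not print "nan" or "inf". Hypothetical helper.
inline double percent(uint64_t part, uint64_t whole) {
    if (whole == 0) return 0.0;
    return 100.0 * static_cast<double>(part) / static_cast<double>(whole);
}
```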

error with 64-bit hash code

Hello,

I get this error:

terminate called after throwing an instance of 'std::runtime_error'
what(): Using 64-bit hash codes with more than 2^30 keys can be dangerous due to collisions: use 128-bit hash codes instead.

and I can't find how to use 128-bit hash codes when building.
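
For context on why the key count matters, the standard birthday estimate can be sketched as below. It assumes uniformly distributed hash codes; the error message does not state how the 2^30 threshold was chosen, so this is background arithmetic rather than the library's own criterion. The function name expected_collisions is made up.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Expected number of colliding pairs among n keys hashed uniformly to `bits`
// bits: C(n, 2) / 2^bits (birthday approximation).
inline double expected_collisions(uint64_t n, int bits) {
    assert(bits > 0);
    double pairs = 0.5 * static_cast<double>(n) * static_cast<double>(n > 0 ? n - 1 : 0);
    return std::ldexp(pairs, -bits);  // pairs * 2^(-bits)
}
```

With 64-bit codes this gives about 0.03 expected colliding pairs at 2^30 keys and about 0.5 at 2^32 keys, while with 128-bit codes the value is negligible at either size.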

Failed to compile with v3.0.0 and the latest commit

I'm trying to install Fulgor, but sshash, one of its dependencies, fails to compile.

Environments:

$ uname -a
Linux mBio 6.1.55-1-MANJARO #1 SMP PREEMPT_DYNAMIC Sat Sep 23 12:13:56 UTC 2023 x86_64 GNU/Linux

$ gcc --version
gcc (GCC) 13.2.1 20230801

zlib: 1.3
$ cmake ..
-- The C compiler identification is GNU 13.2.1
-- The CXX compiler identification is GNU 13.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_BUILD_TYPE: Release
-- Configuring done (0.3s)
-- Generating done (0.0s)
-- Build files have been written to: /home/shenwei/Downloads/fulgor/sshash/build

$ make
[  3%] Building CXX object CMakeFiles/sshash_static.dir/include/gz/zip_stream.cpp.o
In file included from /home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:1:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:121:5: error: ‘uint32_t’ does not name a type
  121 |     uint32_t get_crc() const;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:67:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
   66 | #include <vector>
  +++ |+#include <cstdint>
   67 | 
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:124:5: error: ‘uint32_t’ does not name a type
  124 |     uint32_t get_in_size() const;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:124:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:137:5: error: ‘uint32_t’ does not name a type
  137 |     uint32_t crc_;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:137:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:180:5: error: ‘uint32_t’ does not name a type
  180 |     uint32_t get_crc() const;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:180:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:186:5: error: ‘uint32_t’ does not name a type
  186 |     uint32_t get_in_size() const;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:186:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:200:5: error: ‘uint32_t’ does not name a type
  200 |     uint32_t crc_;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:200:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:285:5: error: ‘uint32_t’ does not name a type
  285 |     uint32_t gzip_crc_;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:285:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:286:5: error: ‘uint32_t’ does not name a type
  286 |     uint32_t gzip_data_size_;
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.hpp:286:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_zip_streambuf<CharT, Traits>::basic_zip_streambuf(ostream_reference, int, ZipStrategy, int, int, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:37:83: error: class ‘basic_zip_streambuf<CharT, Traits>’ does not have any field named ‘crc_’
   37 |     : ostream_(ostream), output_buffer_(buffer_size, 0), buffer_(buffer_size, 0), crc_(0) {
      |                                                                                   ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘std::streamsize basic_zip_streambuf<CharT, Traits>::flush()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:101:5: error: ‘crc_’ was not declared in this scope
  101 |     crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
      |     ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:101:24: error: ‘uint32_t’ does not name a type
  101 |     crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
      |                        ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:5:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
    4 | #include <cstring>
  +++ |+#include <cstdint>
    5 | 
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: At global scope:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:136:1: error: ‘uint32_t’ does not name a type
  136 | uint32_t basic_zip_streambuf<CharT, Traits>::get_crc() const {
      | ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:136:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:141:1: error: ‘uint32_t’ does not name a type
  141 | uint32_t basic_zip_streambuf<CharT, Traits>::get_in_size() const {
      | ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:141:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_streambuf<CharT, Traits>::zip_to_stream(char*, std::streamsize)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:5: error: ‘crc_’ was not declared in this scope
  163 |     crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
      |     ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:24: error: ‘uint32_t’ does not name a type
  163 |     crc_ = static_cast<uint32_t>(crc32(crc_, zip_stream_.next_in, zip_stream_.avail_in));
      |                        ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:163:24: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_unzip_streambuf<CharT, Traits>::basic_unzip_streambuf(istream_reference, int, size_t, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:193:87: error: class ‘basic_unzip_streambuf<CharT, Traits>’ does not have any field named ‘crc_’
  193 |     : istream_(istream), input_buffer_(input_buffer_size), buffer_(read_buffer_size), crc_(0) {
      |                                                                                       ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: At global scope:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:259:1: error: ‘uint32_t’ does not name a type
  259 | uint32_t basic_unzip_streambuf<CharT, Traits>::get_crc() const {
      | ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:259:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:269:1: error: ‘uint32_t’ does not name a type
  269 | uint32_t basic_unzip_streambuf<CharT, Traits>::get_in_size() const {
      | ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:269:1: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘std::streamsize basic_unzip_streambuf<CharT, Traits>::unzip_from_stream(char_type*, std::streamsize)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:5: error: ‘crc_’ was not declared in this scope
  304 |     crc_ = static_cast<uint32_t>(crc32(crc_, reinterpret_cast<byte_type*>(buffer), (uInt)theSize));
      |     ^~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:24: error: ‘uint32_t’ does not name a type
  304 |     crc_ = static_cast<uint32_t>(crc32(crc_, reinterpret_cast<byte_type*>(buffer), (uInt)theSize));
      |                        ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:304:24: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘basic_zip_ostream<CharT, Traits>& basic_zip_ostream<CharT, Traits>::add_footer()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:391:5: error: ‘uint32_t’ was not declared in this scope
  391 |     uint32_t crc = this->get_crc();
      |     ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:391:5: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:393:51: error: ‘crc’ was not declared in this scope; did you mean ‘crc32’?
  393 |         this->get_ostream().put(static_cast<char>(crc & 0xff));
      |                                                   ^~~
      |                                                   crc32
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:397:13: error: expected ‘;’ before ‘length’
  397 |     uint32_t length = this->get_in_size();
      |             ^~~~~~~
      |             ;
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:399:51: error: ‘length’ was not declared in this scope
  399 |         this->get_ostream().put(static_cast<char>(length & 0xff));
      |                                                   ^~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In constructor ‘basic_zip_istream<CharT, Traits>::basic_zip_istream(istream_reference, int, size_t, size_t)’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:417:7: error: class ‘basic_zip_istream<CharT, Traits>’ does not have any field named ‘gzip_crc_’
  417 |     , gzip_crc_(0)
      |       ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:418:7: error: class ‘basic_zip_istream<CharT, Traits>’ does not have any field named ‘gzip_data_size_’
  418 |     , gzip_data_size_(0) {
      |       ^~~~~~~~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_istream<CharT, Traits>::check_crc()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:434:31: error: ‘gzip_crc_’ was not declared in this scope
  434 |     return this->get_crc() == gzip_crc_;
      |                               ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘bool basic_zip_istream<CharT, Traits>::check_data_size() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:439:36: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
  439 |     return this->get_out_size() == gzip_data_size_;
      |                                    ^~~~~~~~~~~~~~~
      |                                    get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘long int basic_zip_istream<CharT, Traits>::get_gzip_crc() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:444:12: error: ‘gzip_crc_’ was not declared in this scope
  444 |     return gzip_crc_;
      |            ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘long int basic_zip_istream<CharT, Traits>::get_gzip_data_size() const’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:449:12: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
  449 |     return gzip_data_size_;
      |            ^~~~~~~~~~~~~~~
      |            get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In member function ‘void basic_zip_istream<CharT, Traits>::read_footer()’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:511:9: error: ‘gzip_crc_’ was not declared in this scope
  511 |         gzip_crc_ = 0;
      |         ^~~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:513:39: error: ‘uint32_t’ does not name a type
  513 |             gzip_crc_ += (static_cast<uint32_t>(this->get_istream().get()) & 0xff) << (8 * n);
      |                                       ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:513:39: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:515:9: error: ‘gzip_data_size_’ was not declared in this scope; did you mean ‘get_gzip_data_size’?
  515 |         gzip_data_size_ = 0;
      |         ^~~~~~~~~~~~~~~
      |         get_gzip_data_size
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:517:45: error: ‘uint32_t’ does not name a type
  517 |             gzip_data_size_ += (static_cast<uint32_t>(this->get_istream().get()) & 0xff) << (8 * n);
      |                                             ^~~~~~~~
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:517:45: note: ‘uint32_t’ is defined in header ‘<cstdint>’; did you forget to ‘#include <cstdint>’?
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp: In instantiation of ‘bool basic_zip_istream<CharT, Traits>::check_crc() [with CharT = char; Traits = std::char_traits<char>]’:
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:527:16:   required from here
/home/shenwei/Downloads/fulgor/sshash/include/gz/zip_stream.cpp:434:18: error: ‘class basic_zip_istream<char>’ has no member named ‘get_crc’
  434 |     return this->get_crc() == gzip_crc_;
      |            ~~~~~~^~~~~~~
make[2]: *** [CMakeFiles/sshash_static.dir/build.make:76: CMakeFiles/sshash_static.dir/include/gz/zip_stream.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:93: CMakeFiles/sshash_static.dir/all] Error 2
make: *** [Makefile:91: all] Error 2
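
The compiler notes in the log point at the likely fix: include <cstdint> in the files that use uint32_t. Starting with GCC 13, several standard library headers no longer include <cstdint> transitively, so code that relied on that breaks in exactly this way. The sketch below uses a made-up class, crc_holder, as a stand-in for the declarations in zip_stream.hpp.

```cpp
#include <cassert>
#include <cstdint>  // declares uint32_t explicitly instead of relying on other headers
#include <vector>

// Hypothetical stand-in for a class that stores a 32-bit checksum.
class crc_holder {
public:
    uint32_t get_crc() const { return crc_; }
    void add_bytes(std::vector<unsigned char> const& bytes) {
        for (unsigned char b : bytes) crc_ += b;  // placeholder sum, NOT a real CRC
    }

private:
    uint32_t crc_ = 0;
};
```

All the later errors about crc_, gzip_crc_ and gzip_data_size_ follow from the first one, since the member declarations using uint32_t were rejected.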

Support 64-bit contig_id

This is needed to index and query large collections containing more than 2^32 contigs.
One application of this is in Fulgor, when indexing very heterogeneous collections.
Tagging @rickbeeloo here and this issue jermp/fulgor#16.
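
The limit can be stated as a small check, sketched here with a hypothetical helper name (ids_fit) and assuming contig ids are assigned consecutively from 0:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if the ids 0 .. num_contigs-1 all fit in IdType. Hypothetical helper.
template <typename IdType>
bool ids_fit(uint64_t num_contigs) {
    return num_contigs == 0 || num_contigs - 1 <= std::numeric_limits<IdType>::max();
}
```

ids_fit<uint32_t> holds for exactly 2^32 contigs and fails for one more, which is where a 64-bit id type becomes necessary.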

It looks like all that is needed is to refactor these two small points a little bit.

Another point: for very large-scale indexing, I would suggest using a partitioned PTHash
rather than a single PTHash, to lower the construction time.

Default constructor to the iterator

Hey,
I am using sshash as a submodule in my project. It would make my life much easier if you added a default constructor for dictionary::iterator.
I am getting this error: "error: no matching function for call to ‘sshash::dictionary::iterator::iterator()’". I am not calling the constructor directly, but I keep an iterator as a member in my classes.

Thanks,
Moustafa
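
Until such a constructor exists, one user-side workaround is to wrap the member in std::optional (C++17), which starts out empty and so does not need the wrapped type to be default-constructible. The sketch below uses a made-up iterator_like class in place of dictionary::iterator, whose real constructor arguments are not shown in this issue.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an iterator type that has no default constructor.
class iterator_like {
public:
    explicit iterator_like(uint64_t begin) : m_pos(begin) {}
    uint64_t position() const { return m_pos; }
    void next() { ++m_pos; }

private:
    uint64_t m_pos;
};

// User class that keeps the iterator as a member and is still default-constructible.
class scanner {
public:
    scanner() = default;  // works: the optional member starts out empty
    void start(uint64_t begin) { m_it.emplace(begin); }
    bool started() const { return m_it.has_value(); }
    uint64_t position() const {
        assert(m_it.has_value());
        return m_it->position();
    }
    void next() {
        assert(m_it.has_value());
        m_it->next();
    }

private:
    std::optional<iterator_like> m_it;
};
```

The iterator is constructed in place by emplace once its arguments are known, and the enclosing class pays only for one extra flag.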

Error: inlining failed in call to ‘always_inline’

I tried to compile the package to give it a try, but ran into a problem during compilation:

/usr/lib/gcc/x86_64-linux-gnu/11/include/popcntintrin.h: In function ‘uint64_t pthash::util::popcount(uint64_t)’:
/usr/lib/gcc/x86_64-linux-gnu/11/include/popcntintrin.h:42:1: error: inlining failed in call to ‘always_inline’ ‘long long int _mm_popcnt_u64(long long unsigned int)’: target specific option mismatch
   42 | _mm_popcnt_u64 (unsigned long long __X)
      | ^~~~~~~~~~~~~~

After searching for a while, I found out that the problem is related to the -mpopcnt option, but I am not familiar with CMake or with the _mm_popcnt_u64() function.
My system is Linux 5.13.0-24-generic #24-Ubuntu-21.10.
I appreciate any help to resolve this issue.
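
The error means that _mm_popcnt_u64 was called in a translation unit compiled without the POPCNT target option, so the build-side fix is to pass -mpopcnt (or -march=native on a CPU that supports it) to the compiler. A source-side alternative, sketched below with a hypothetical function name and not taken from PTHash, is to call the intrinsic only when the compiler reports that the option is enabled:

```cpp
#include <cassert>
#include <cstdint>

#if defined(__POPCNT__) && defined(__x86_64__)
#include <nmmintrin.h>  // declares _mm_popcnt_u64
#endif

// Number of set bits in x. Hypothetical helper.
inline uint64_t popcount64(uint64_t x) {
#if defined(__POPCNT__) && defined(__x86_64__)
    return static_cast<uint64_t>(_mm_popcnt_u64(x));  // hardware instruction
#elif defined(__GNUC__) || defined(__clang__)
    return static_cast<uint64_t>(__builtin_popcountll(x));  // compiler picks the best code
#else
    // Portable bit-twiddling fallback.
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
    return (x * 0x0101010101010101ULL) >> 56;
#endif
}
```

Without -mpopcnt the builtin branch is taken and the code still compiles, only without the dedicated instruction.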
