google / hashr
License: Apache License 2.0
I suspect that the cache map starts consuming a very large amount of memory after a while. I ran it on a machine with 32 GB of RAM, with nothing other than Postgres and hashr running, and the OOM killer terminated hashr.
Should we consider adding support for offloading the cache to the database or to something like Redis?
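One way to make offloading possible is to put the cache behind a small interface, so the in-memory map could later be swapped for a Redis- or Postgres-backed implementation without touching its callers. This is a minimal sketch that assumes the cache only answers "have we already seen this hash?"; HashCache and memoryCache are hypothetical names, not hashr's actual types.

```go
package main

import "sync"

// HashCache is a hypothetical abstraction over hashr's cache.
// A Redis- or Postgres-backed type could implement the same methods,
// moving the cache out of the hashr process.
type HashCache interface {
	// Contains reports whether the hash has already been seen.
	Contains(hash string) (bool, error)
	// Add records the hash as seen.
	Add(hash string) error
}

// memoryCache keeps every entry in process memory, which is the
// behaviour that grows without bound on very large runs.
type memoryCache struct {
	m sync.Map // safe for concurrent use by multiple goroutines
}

func newMemoryCache() *memoryCache { return &memoryCache{} }

// Contains looks the hash up in the in-memory map.
func (c *memoryCache) Contains(hash string) (bool, error) {
	_, ok := c.m.Load(hash)
	return ok, nil
}

// Add stores the hash with an empty value; only the key matters.
func (c *memoryCache) Add(hash string) error {
	c.m.Store(hash, struct{}{})
	return nil
}

// Compile-time check that memoryCache satisfies HashCache.
var _ HashCache = (*memoryCache)(nil)
```

A Redis-backed type could implement the same two methods with SISMEMBER and SADD on a set, trading a network round trip per lookup for bounded hashr memory.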
It would be helpful to create the DB schema from the hashr binary if it doesn't exist yet.
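A minimal sketch of creating the schema on startup, assuming a Postgres backend reached through database/sql. The table and column names are hypothetical placeholders (only the VARCHAR(100) hash type is taken from this tracker), so hashr's real schema may differ. CREATE TABLE IF NOT EXISTS makes the call idempotent, so it can run on every launch.

```go
package main

import (
	"database/sql"
	"fmt"
)

// execer is satisfied by *sql.DB and *sql.Tx.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// schemaStatements is a hypothetical, simplified schema; hashr's real
// tables and columns may differ.
var schemaStatements = []string{
	`CREATE TABLE IF NOT EXISTS samples (
		sha256 VARCHAR(100) PRIMARY KEY,
		mimetype TEXT
	)`,
	`CREATE TABLE IF NOT EXISTS sources (
		sha256 VARCHAR(100) PRIMARY KEY,
		source_id TEXT
	)`,
}

// ensureSchema creates any missing tables. Because every statement uses
// IF NOT EXISTS, calling it against an existing database is a no-op.
func ensureSchema(db execer) error {
	for _, stmt := range schemaStatements {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("creating schema: %w", err)
		}
	}
	return nil
}
```

A caller would open the database with sql.Open and a Postgres driver, then call ensureSchema(db) before any importer runs.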
It would be good to have a small binary to query the DB for a single hash, or to bulk-query hashes from a flat file.
Running hashr on Linux with SELinux in enforcing mode results in an access violation during preprocessing.
This is due to an incompatibility between the SELinux contexts of /tmp and Docker:
AVC Events:
scontext=system_u:system_r:container_t:
tcontext=unconfined_u:object_r:user_tmp_t
General solutions (maybe there are more):
I ran hashr on 1.2 TB of input (400k .deb files), and after 3 days, having processed about 120k packages, it crashed with a segfault. I have attached the log.
Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.043981 1362 hashr.go:200] Preprocessing linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb
Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.044380 1362 common.go:142] Copying linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb to /tmp/hashr-linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb-309297636/linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb
Apr 03 13:30:00 hashr2 bash[1362]: E0403 13:30:00.045723 1362 hashr.go:308] deb: skipping source linux-tools-4.15.0-101-generic_4.15.0-101.102_amd64.deb: error while preprocessing: error while opening tar archive in deb package: xz: data is truncated or corrupt
Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.046763 1362 hashr.go:233] Deleting
The "Deleting" log line is missing a path.
Line 233 in 8c20ba5
The --rm flag is missing when running the plaso container, which leaves a bunch of stopped containers behind after using that importer.
hashr/processors/local/local.go
Line 74 in 8fdb572
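A minimal sketch of building the docker run invocation with --rm, which tells Docker to remove the container once it exits. dockerRunArgs and plasoCommand are hypothetical helpers; the image name and container arguments are left to the caller instead of guessing the ones used in local.go.

```go
package main

import "os/exec"

// dockerRunArgs builds the arguments for `docker run`. "--rm" makes Docker
// delete the container when it exits, so repeated importer runs do not
// leave stopped containers behind.
func dockerRunArgs(image string, volumes []string, cmdArgs ...string) []string {
	args := []string{"run", "--rm"}
	for _, v := range volumes {
		args = append(args, "-v", v)
	}
	args = append(args, image)
	return append(args, cmdArgs...)
}

// plasoCommand returns an unstarted *exec.Cmd; the caller supplies the
// image name and whatever arguments the container should receive.
func plasoCommand(image string, volumes []string, cmdArgs ...string) *exec.Cmd {
	return exec.Command("docker", dockerRunArgs(image, volumes, cmdArgs...)...)
}
```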
I ran hashr using the new deb importer to hash a large number of files, specifically every Ubuntu 22 package, and after about 8 hours of runtime it crashed with the following message:
Nov 13 09:19:43 deb-hashing bash[29850]: fatal error: concurrent map read and map write
I have attached the log file, which shows what happened before the crash along with the full stack trace.
hashr-crash.log
Currently, some hashr importers (e.g. the windows and iso importers) rely on mounting ISO files directly inside Docker containers. Docker does not generally allow this because of its security restrictions. Explore and implement alternatives that eliminate the need for the --privileged flag (see #61).
Currently, when cleaning up the temp files, we use sudo:
Line 235 in 8c20ba5
Currently, all hashes are stored as hex-encoded VARCHAR(100) in the database. This means every hash takes up roughly twice as much space as it would need if stored as raw bytes.
Unfortunately, Postgres does not have a fixed-length binary data type, so the closest option is BYTEA. I think we should investigate how switching hashes to BYTEA affects storage requirements and lookup times.
It seems that .xz-compressed contents in .deb packages are not supported by the .deb importer. We should add support for this.
Trying to hash several Windows ISOs, I get "mount failed" every time, even when running with sudo.
docker run -it --network hashr_net -v /opt/hashr/windows_iso:/data/windows us-docker.pkg.dev/osdfir-registry/hashr/release/hashr -storage postgres -postgres_host hashr_postgresql -postgres_port 5432 -postgres_user XXX -postgres_password XXX -postgres_db hashr -importers windows -windows_iso_repo_path /data/windows -exporters postgres
error:
Stderr: mount: /tmp/hashr-server2022.iso-3119761407/mnt: mount failed: Operation not permitted.
thanks