
hashr's Issues

Large memory consumption with a large number of input files

I suspect that the cache map starts consuming a very large amount of memory after a while. I ran it on a machine with 32 GB of RAM, with nothing other than postgres and hashr running, and hashr was killed by the OOM killer.

Should we consider adding support for offloading the cache to the database or something like redis?

hashr-crash3.log

Add SELinux policy

Running hashr on Linux with SELinux in enforcing mode causes an access violation during preprocessing.
This is due to incompatible SELinux contexts between /tmp and Docker.

AVC Events:
scontext=system_u:system_r:container_t:
tcontext=unconfined_u:object_r:user_tmp_t

General solutions (maybe there are more):

  1. Custom SELinux policy
  2. Change default preprocessing directory or provide flag to do so
  3. Disable SELinux: Don't think this should be a general solution

Properly clean up temporary files on importer error

Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.043981    1362 hashr.go:200] Preprocessing linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb
Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.044380    1362 common.go:142] Copying linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb to /tmp/hashr-linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb-309297636/linux-tools-4.15.0-106-generic_4.15.0-106.107_amd64.deb
Apr 03 13:30:00 hashr2 bash[1362]: E0403 13:30:00.045723    1362 hashr.go:308] deb: skipping source linux-tools-4.15.0-101-generic_4.15.0-101.102_amd64.deb: error while preprocessing: error while opening tar archive in deb package: xz: data is truncated or corrupt
Apr 03 13:30:00 hashr2 bash[1362]: I0403 13:30:00.046763    1362 hashr.go:233] Deleting

The "Deleting" log line is missing a path:

glog.Infof("Deleting %s", path)

The empty path is probably also why the removal never happens. This could have been dangerous as well, considering the command below it is "sudo rm -rf ".
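
A minimal sketch of a guard that would make an empty or unexpected path harmless. removeTempDir and its tmpRoot parameter are hypothetical names used for illustration, not hashr's actual cleanup code; the idea is simply to refuse to delete anything that is empty or not strictly below the expected temporary root.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// removeTempDir is a hypothetical helper (not part of hashr). It deletes path
// only if path is non-empty and located strictly below tmpRoot.
func removeTempDir(tmpRoot, path string) error {
	// An empty path must be rejected first: filepath.Abs("") would silently
	// resolve to the current working directory.
	if strings.TrimSpace(path) == "" {
		return errors.New("refusing to delete: empty path")
	}
	root, err := filepath.Abs(tmpRoot)
	if err != nil {
		return err
	}
	target, err := filepath.Abs(path)
	if err != nil {
		return err
	}
	rel, err := filepath.Rel(root, target)
	if err != nil {
		return err
	}
	// "." means the root itself; a leading ".." means outside the root.
	if rel == "." || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return fmt.Errorf("refusing to delete %q: not below %q", target, root)
	}
	return os.RemoveAll(target)
}

func main() {
	dir, err := os.MkdirTemp("", "hashr-example-")
	if err != nil {
		fmt.Println("could not create temp dir:", err)
		return
	}
	fmt.Println(removeTempDir(os.TempDir(), ""))  // rejected with an error
	fmt.Println(removeTempDir(os.TempDir(), dir)) // removed, prints <nil>
}
```

Note that os.RemoveAll runs with the privileges of the calling process, so it would still fail on files owned by root.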

Add --rm when running plaso container

The --rm flag is missing when running the plaso container, which leaves a bunch of stopped containers behind after using that importer.

args := []string{"run", "-v", "/tmp/:/tmp", "log2timeline/plaso", "image_export", "--logfile", logFile, "--partitions", "all", "--volumes", "all", "-w", exportDir, sourcePath}
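
A sketch of the same argument list with --rm added directly after run, so Docker removes the container once it exits. plasoArgs is a hypothetical helper name introduced for this example; all other arguments are copied from the line above.

```go
package main

import "fmt"

// plasoArgs is a hypothetical helper that builds the docker arguments for
// image_export, with --rm so the container is removed when it exits.
func plasoArgs(logFile, exportDir, sourcePath string) []string {
	return []string{
		"run", "--rm",
		"-v", "/tmp/:/tmp",
		"log2timeline/plaso", "image_export",
		"--logfile", logFile,
		"--partitions", "all",
		"--volumes", "all",
		"-w", exportDir,
		sourcePath,
	}
}

func main() {
	// The paths below are placeholders for illustration only.
	fmt.Println(plasoArgs("/tmp/export.log", "/tmp/export", "/tmp/disk.img"))
}
```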

Crash with concurrent map read/write

I ran hashr using the new deb importer to hash a large number of files, specifically every single Ubuntu 22 package, and after about 8 hours of runtime it crashed with the following message:

Nov 13 09:19:43 deb-hashing bash[29850]: fatal error: concurrent map read and map write

I have attached the log file, which shows what happened before the crash and the full stack trace.
hashr-crash.log
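
The Go runtime raises this fatal error when one goroutine writes to a plain map while another reads it. One common fix is to guard the map with a sync.RWMutex. The sketch below assumes a string-keyed cache; safeCache and its methods are hypothetical names, and the real structure of the map in hashr may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCache is a hypothetical wrapper: a map guarded by a read/write mutex so
// that concurrent readers and writers cannot race.
type safeCache struct {
	mu sync.RWMutex
	m  map[string]string
}

func newSafeCache() *safeCache {
	return &safeCache{m: make(map[string]string)}
}

// Get takes the read lock, so many readers may proceed in parallel.
func (c *safeCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.m[key]
	return v, ok
}

// Set takes the write lock, which excludes all readers and other writers.
func (c *safeCache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.m[key] = value
}

// Len reports the number of entries under the read lock.
func (c *safeCache) Len() int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return len(c.m)
}

func main() {
	c := newSafeCache()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			key := fmt.Sprintf("key-%d", i)
			c.Set(key, "seen")
			c.Get(key)
		}(i)
	}
	wg.Wait()
	fmt.Println(c.Len()) // 8
}
```

sync.Map is an alternative when keys are mostly written once and read many times; the mutex version keeps the ordinary map semantics.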

Explore alternatives to `mount` for Docker compatibility in importers

Currently, some hashr importers rely on directly mounting ISO files within Docker containers (e.g. windows or iso importer). This approach is not generally supported by Docker due to security restrictions. Explore and implement alternatives to eliminate the need for the --privileged flag. (see #61)

Considerations:

  • Tools: Investigate libraries or tools like 7z, xorriso, or others that can extract or access contents of ISO files without requiring a direct mount.
  • Performance Impact: Evaluate any trade-offs in terms of performance or resource usage between the mounting approach and potential alternatives.
  • Importer Scope: Identify the specific importers within hashr that rely on mount and will need to be refactored.
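
As a starting point for the tools consideration, extracting an ISO with 7z instead of mounting it could look roughly like the sketch below. sevenZipExtractArgs is a hypothetical helper, the paths are placeholders, and it assumes a 7z binary is available inside the container; whether 7z handles every ISO variant the importers need would still have to be verified.

```go
package main

import (
	"fmt"
	"os/exec"
)

// sevenZipExtractArgs is a hypothetical helper that builds arguments for
// "7z x": extract with full paths (x), answer yes to all prompts (-y), and
// write into outDir (-o, with no space before the directory).
func sevenZipExtractArgs(isoPath, outDir string) []string {
	return []string{"x", "-y", "-o" + outDir, isoPath}
}

func main() {
	args := sevenZipExtractArgs("/data/example.iso", "/tmp/hashr-example/extracted")
	cmd := exec.Command("7z", args...)
	// Only print the command here; calling cmd.Run() would perform the
	// extraction without mount and therefore without --privileged.
	fmt.Println(cmd.Args)
}
```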

Run containers as hashr user to remove need for "sudo rm"

Currently when cleaning up the temp files we use sudo:

cmd := exec.Command("sudo", "rm", "-rf", path)

I presume this is because some of the importers run a docker image (as root), which means the produced files are owned by root. If we run that container as the same user hashr is running as, the files will be owned by that user and we won't need "sudo rm ...", which could be dangerous if there is a bug.
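
A sketch of how the container could be started as the current user with docker's --user flag. dockerRunArgs is a hypothetical helper and the image name in main is a placeholder; it also assumes the image works when run as a non-root user, which would need to be checked per importer.

```go
package main

import (
	"fmt"
	"os"
)

// dockerRunArgs is a hypothetical helper that builds "docker run" arguments
// so the container process runs with the given UID and GID. Files it writes
// to bind mounts are then owned by that user instead of root.
func dockerRunArgs(uid, gid int, image string, cmd ...string) []string {
	args := []string{"run", "--user", fmt.Sprintf("%d:%d", uid, gid), image}
	return append(args, cmd...)
}

func main() {
	// os.Getuid and os.Getgid return the IDs of the user running hashr.
	fmt.Println(dockerRunArgs(os.Getuid(), os.Getgid(), "example/image", "some-command"))
}
```

With files owned by the hashr user, cleanup could use os.RemoveAll directly instead of shelling out to sudo.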

Consider storing hashes as binary data in SQL database

Currently, all hashes are stored as hex-encoded VARCHAR(100) in the database. This means that every hash takes up roughly twice as much space as it should need if stored optimally.

Unfortunately, Postgres does not have a fixed-length binary data type, so the closest option would be the BYTEA data type. I think we should investigate how switching hashes to BYTEA affects storage requirements and lookup times.
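
To illustrate the size difference, the sketch below converts a hex-encoded SHA-256 digest back to raw bytes, which is the form that would go into a BYTEA column. hexToBytes is a hypothetical helper name; the example does not touch the database.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hexToBytes is a hypothetical helper that decodes a hex-encoded hash into
// the raw bytes that could be stored in a binary column.
func hexToBytes(h string) ([]byte, error) {
	return hex.DecodeString(h)
}

func main() {
	sum := sha256.Sum256([]byte("example input"))
	hexStr := hex.EncodeToString(sum[:])

	raw, err := hexToBytes(hexStr)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	// A SHA-256 digest is 64 hex characters but only 32 raw bytes.
	fmt.Println(len(hexStr), len(raw)) // 64 32
}
```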

mount failed: Operation not permitted

I am trying to hash several Windows ISOs and get "mount failed" every time, even when running with sudo:
docker run -it --network hashr_net -v /opt/hashr/windows_iso:/data/windows us-docker.pkg.dev/osdfir-registry/hashr/release/hashr -storage postgres -postgres_host hashr_postgresql -postgres_port 5432 -postgres_user XXX -postgres_password XXX -postgres_db hashr -importers windows -windows_iso_repo_path /data/windows -exporters postgres

error:
Stderr: mount: /tmp/hashr-server2022.iso-3119761407/mnt: mount failed: Operation not permitted.

thanks
