fox-it / acquire

acquire is a tool to quickly gather forensic artifacts from disk images or a live system into a lightweight container.

License: GNU Affero General Public License v3.0

Python 99.74% Makefile 0.26%

acquire's Issues

Collect CarbonBlack logs

CarbonBlack logs can contain interesting information. On Windows they reside in the following directory:

c:\ProgramData\CarbonBlack\Logs

Some example log files in this directory:
confer.log and confer.log.*.zip
cblr.log
SensorAlarms.log

However, all logs in this directory can be of interest.

Name services configuration + binaries

Name services configuration + binaries
** /etc/nsswitch.conf
** all referred libnss_* modules

Parse contents of {{nsswitch.conf}} to records. Ideally, collect the paths found in this config file with {{acquire}} automagically.
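A minimal sketch of what "parse to records, then derive the paths" could look like. Everything here is an assumption for illustration: the NsswitchRecord type, the function names and the sample config are made up, and the code does not use any dissect or acquire API. The only outside fact relied on is the glibc convention that a service named X is served by a library called libnss_X.so.<version>.

```python
import re
from typing import NamedTuple


class NsswitchRecord(NamedTuple):
    """Hypothetical record type: one line of nsswitch.conf."""

    database: str   # e.g. "passwd"
    sources: tuple  # e.g. ("files", "sss")


def parse_nsswitch(text: str) -> list:
    """Turn the text of nsswitch.conf into a list of NsswitchRecord."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # strip comments
        if not line or ":" not in line:
            continue
        database, _, rest = line.partition(":")
        # Drop action items such as [NOTFOUND=return]; they are not services.
        rest = re.sub(r"\[[^\]]*\]", " ", rest)
        records.append(NsswitchRecord(database.strip(), tuple(rest.split())))
    return records


def libnss_globs(records) -> list:
    """Return file name patterns for every service the config refers to."""
    services = sorted({source for record in records for source in record.sources})
    return [f"libnss_{service}.so*" for service in services]


SAMPLE = """
# /etc/nsswitch.conf
passwd:     files sss
hosts:      files mdns4_minimal [NOTFOUND=return] dns
"""

if __name__ == "__main__":
    for record in parse_nsswitch(SAMPLE):
        print(record)
    print(libnss_globs(parse_nsswitch(SAMPLE)))
```

The patterns are bare file names on purpose. Which library directories to search differs per distribution and is not specified in this issue.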

Make acquire modules OS aware

Design requirements:

Collection should be done for the whole set of OS levels (e.g. Linux and FortiGate when collecting from a FortiGate machine). The set of levels could be determined by looking at the (super)classes of OSPlugin for the specific target at hand.
Determining this set of OS levels is best implemented in dissect.target, so there is a single interface for this type of information. Take a sneak peek at how to replace the current system of determining OSes (linux, windows, mac, unknown) with this interface.
A Module should be able to service multiple OSes. This means the SPECS etc. in a Module subclass should be divided based on OS.
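To make the requirements above concrete, here is a rough sketch of both ideas: deriving the OS levels from a class hierarchy, and a Module whose SPECS are keyed by OS. All classes below are stand-ins written for this example (they are not the real dissect.target OSPlugin or acquire Module classes), the os_name attribute is an assumption, and the paths are placeholders.

```python
class OSPlugin:
    """Stand-in base class; no OS name of its own."""


class LinuxPlugin(OSPlugin):
    os_name = "linux"


class FortiGatePlugin(LinuxPlugin):
    os_name = "fortigate"


def os_levels(plugin_cls) -> list:
    """Walk the class hierarchy and collect every OS level, most specific first."""
    return [
        cls.__dict__["os_name"]
        for cls in plugin_cls.__mro__
        if "os_name" in cls.__dict__
    ]


class Module:
    """Stand-in module base class with SPECS divided per OS level."""

    SPECS = {}

    @classmethod
    def specs_for(cls, levels) -> list:
        return [spec for level in levels for spec in cls.SPECS.get(level, [])]


class ExampleModule(Module):
    SPECS = {
        "linux": [("dir", "/example/linux-path")],          # placeholder path
        "fortigate": [("dir", "/example/fortigate-path")],  # placeholder path
        "windows": [("dir", "sysvol/example-path")],        # placeholder path
    }


if __name__ == "__main__":
    levels = os_levels(FortiGatePlugin)
    print(levels)
    print(ExampleModule.specs_for(levels))
```

With this shape a FortiGate target collects both its own specs and the generic Linux ones, while the Windows specs are skipped.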

Acquire Docker container metadata and filesystems

We recently had a case where relevant logs (and other traces) were stored in Docker volumes. It would be nice to have a way (a {{docker}} plugin?) to acquire the container and image metadata from {{/var/lib/docker}} as well as the overlayfs layers of the actual containers (i.e. not those of the images, since that would be both less interesting as well as a lot of data).

Skip sparse runs when collecting /var/log/lastlog in acquire

Similar to how the usnjrnl is collected.

Would no longer be an issue with ASDF.

Mac OS typo Applications Support

The acquire globs for Mac OS files related to Google Chrome and Firefox use the wrong folder name:

Applications Support -> Application Support

On Mac OS Monterey version 12.6

ls /Users/*/Library/Applications\ Support/
ls: /Users/*/Library/Applications Support/: No such file or directory

ls /Users/*/Library/Application\ Support/
Google
[...]


Make compression algorithms for Acquire output configurable

Currently compression algorithms for acquire outputs are hardcoded. Making them configurable is more desirable to avoid issues such as described in #182.
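One possible shape for this, as a sketch only: a command line option that maps onto the write modes of Python's tarfile module. The option name --compress-method and the helper names are hypothetical and do not exist in acquire today. The tarfile mode strings themselves are standard library behaviour.

```python
import argparse
import io
import tarfile

# Map a user-facing name to a tarfile write mode.
TAR_MODES = {
    "none": "w",
    "gzip": "w:gz",
    "bzip2": "w:bz2",
    "xz": "w:xz",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="compression option sketch")
    parser.add_argument(
        "--compress-method",  # hypothetical flag name
        choices=sorted(TAR_MODES),
        default="gzip",
        help="compression algorithm for the tar output",
    )
    return parser


def open_tar(fileobj, method: str) -> tarfile.TarFile:
    """Open a tar archive for writing with the selected compression."""
    return tarfile.open(fileobj=fileobj, mode=TAR_MODES[method])


if __name__ == "__main__":
    args = build_parser().parse_args(["--compress-method", "bzip2"])
    buf = io.BytesIO()
    with open_tar(buf, args.compress_method) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = 5
        tar.addfile(info, io.BytesIO(b"hello"))
    print(args.compress_method, len(buf.getvalue()), "bytes")
```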

Collect coredump partitions in ESXi module

Collect additional Anydesk paths

This concerns the remoteaccess Dissect plugin.

According to https://support.anydesk.com/knowledge/trace-files#trace-file-locations

Linux
~/.anydesk*/*.trace
/var/log/anydesk*.trace

MacOS
~/.anydesk*/*.trace
/var/log/anydesk*.trace

Windows (might be already in the plugin!)
%appdata%\AnyDeskd*.trace
%programdata%\AnyDeskd*.trace
%AllUsersProfile%\AnyDeskd*.trace

Skip and log when attempting to upload empty files with acquire

Research alternative way to get the RDP cache

When running qwinsta on Windows 11, it seems that it isn't compatible with the version of Windows used, which is odd.

Check whether we can run the executable in compatibility mode using cmd.

Pyoxidizer musl execution on VMware ESXi

Using the Pyoxidizer configuration (#109) I was able to build a static musl binary, but when executing the binary on VMware ESXi 7 the execution fails while it works on the Docker image (quay.io/pypa/manylinux2014_x86_64).

The error shown below is triggered because the path of the current executable cannot be obtained: /proc/self/exe (https://github.com/indygreg/PyOxidizer/blob/b78b0cb75f4317c45408bbc9a569c062c482c679/pyembed/src/config.rs#L478) is not available on VMware ESXi 7.

I don't understand well enough how Pyoxidizer works to determine what causes this error and how this issue can be resolved, but I will look into this.

*VMWare ESXi 7*

vmware -v
VMware ESXi 7.0.3 build-21930508

./acquire --help
error instantiating embedded Python interpreter: could not obtain current executable

strace ./acquire
[...]
mmap(0x7221612000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7221612000
readlink("/proc/self/exe", 0x7221612000, 256) = -1 EINVAL (Invalid argument)
[...]

ls -al /proc/self/exe
ls: /proc/self/exe: No such file or directory

*quay.io/pypa/manylinux2014_x86_64*

acquire --help
[...]
If no options are given, the collection profile 'default' is used.

ls -al /proc/self/exe
lrwxrwxrwx 1 root root 0 Nov 20 16:23 /proc/self/exe -> /usr/bin/ls

*Alpine Linux 3.18*

acquire --help
[...]
If no options are given, the collection profile 'default' is used.

strace ./acquire --help
[...]
readlink("/proc/self/exe", "/tmp/acquire", 256) = 12
open("/tmp/acquire", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_PATH) = 3
readlink("/proc/self/fd/3", "/tmp/acquire", 4095) = 12
fstat(3, {st_mode=S_IFREG|0750, st_size=112942000, ...}) = 0
stat("/tmp/acquire", {st_mode=S_IFREG|0750, st_size=112942000, ...}) = 0
close(3)
[...]


ls -al /proc/self/exe
lrwxrwxrwx    1 root     root             0 Nov 20 16:37 /proc/self/exe -> /bin/busybox

Allow skipping acquire of host with --children

{{--children}} automatically also collects the host itself. There should be a flag to skip this and only collect the children.

Inconsistent output paths when acquiring Windows container from Linux

When acquiring any non-live Windows container (HDD, VM image) from Linux with a case-sensitive filesystem, the output tar/directory contains duplicate directories with mixed case.

For example, running acquire windows-vm.qcow2 on Linux with btrfs gives the following directories (truncated for readability):

$ tree
.
└── C:
    ├── $Recycle.bin
    ├── $Recycle.Bin
    ├── windows
    │   ├── appcompat
    │   ├── system32
    │   │   ├── config
    │   │   ├── drivers
    │   │   ├── sru
    │   │   ├── tasks
    │   │   ├── wbem
    │   │   └── winevt
    │   └── tasks
    └── Windows
        └── System32
            └── WDI

Notice the duplicated $Recycle.Bin, Windows and System32 directories with different case.
I managed to somewhat fix it by replacing all sysvol/windows/ and /sysvol/windows/system32 strings in acquire.py with the proper case, but this method also requires similar changes in other dissect libraries, since acquire calls them to get collection paths. Surely there is a better fix for this than specifying the correct case in collection paths, e.g. using the proper path from the filesystem for the output path.
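The last suggestion (take the output path from the filesystem rather than from the collection spec) could look roughly like the sketch below. It resolves each path component case-insensitively against a directory listing and returns the spelling that is actually stored. The nested dict is a mock filesystem for the example; a real implementation would list directories through the target's filesystem instead, and the function name is made up.

```python
def resolve_case(tree: dict, path: str) -> str:
    """Return `path` spelled the way the (mock) filesystem stores it.

    `tree` is a nested dict: keys are entry names, values are sub-dicts.
    """
    node = tree
    resolved = []
    for part in path.strip("/").split("/"):
        # Case-insensitive match against the entries that really exist.
        match = next((name for name in node if name.lower() == part.lower()), None)
        if match is None:
            raise FileNotFoundError(path)
        resolved.append(match)
        node = node[match]
    return "/".join(resolved)


# Mock of the on-disk names from the tree output above.
MOCK_FS = {
    "sysvol": {
        "$Recycle.Bin": {},
        "Windows": {
            "System32": {"config": {}, "WDI": {}},
        },
    },
}

if __name__ == "__main__":
    print(resolve_case(MOCK_FS, "sysvol/windows/system32/config"))
    print(resolve_case(MOCK_FS, "sysvol/$recycle.bin"))
```

Normalising every collected path this way before it is handed to the output would make differently cased specs land in the same directory.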

acquire produces corrupt tar file when trying to collect a file which cannot be read for its whole length

I am using acquire on a raw image to get out some files. Within this image there are a couple of special (sparse? corrupt?) files which originate from a Firefox profile. The files in question are all sqlite files:

/root/.mozilla/firefox/[some-id].default-esr/cookies.sqlite and some other sqlite files found in this profile folder.

This file (cookies.sqlite) is reported to be 524288 bytes in size by ls or stat. But when actually read it stops after 98304 bytes (In my experiments cp, hexdump and md5sum all did not go beyond). After that it is only zero bytes anyway, so I doubt this is a coincidence.

Now when using the dir output type it seems to work without errors:


  Module |    Success |    Failure |    Missing |      Empty
------------------------------------------------------------
cli-args |      11477 |            |            |
------------------------------------------------------------
   Total |      11477 |          0 |          0 |          0
------------------------------------------------------------

It should be noted though that the file cookies.sqlite for example was only 98304 bytes at destination as a result. I guess this behavior is ok depending on how you look at it. But a warning message might be appropriate.

The more serious problem happened when using the tar output type. Judging by the log this works mostly fine. In my example there are 11477 files of which only 5 failed:


Done collecting artifacts:
------------------------------------------------------------
  Module |    Success |    Failure |    Missing |      Empty
------------------------------------------------------------
cli-args |      11472 |          5 |            |
------------------------------------------------------------
   Total |      11472 |          5 |          0 |          0
------------------------------------------------------------

The 5 failed files were the sqlite files mentioned before.

But now when inspecting the tar file it seems to be corrupted:

tar -tvf out.tar | wc -l
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
7152

It seems to only contain 7152 of the 11472 files. 7-Zip on Windows opens the tar even without complaints but many files are missing / not shown.

When using the tar switch -i/--ignore-zeros it finds at least all the files that were successfully collected.

tar -itvf out.tar | wc -l
tar: Skipping to next header
tar: Skipping to next header
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
11469

What I learned is that if the file to be collected cannot be read for the length attributed to it, adding it to the tar file will fail, and the tar becomes somewhat corrupted.

What the log says:


[2023-05-02 17:09:16,850] [ERROR] Failed to collect file
Traceback (most recent call last):
  File "/root/py3venv_dissect/lib/python3.11/site-packages/acquire/collector.py", line 279, in collect_file
    self.output.write_entry(outpath, entry, size)
  File "/root/py3venv_dissect/lib/python3.11/site-packages/acquire/outputs/base.py", line 54, in write_entry
    self.write(output_path, fh, entry, size)
  File "/root/py3venv_dissect/lib/python3.11/site-packages/acquire/outputs/tar.py", line 103, in write
    self.tar.addfile(info, fh)
  File "/usr/lib/python3.11/tarfile.py", line 2030, in addfile
    copyfileobj(fileobj, self.fileobj, tarinfo.size, bufsize=bufsize)
  File "/usr/lib/python3.11/tarfile.py", line 250, in copyfileobj
    raise exception("unexpected end of data")
OSError: unexpected end of data
[2023-05-02 17:09:16,853] [INFO ] - Collecting file /root/.mozilla/firefox/[some-id].default-esr/cookies.sqlite: OSError('unexpected end of data')

I tried to reproduce this error by manually building a sparse file with truncate -s 1M file.img and adding it to a tar using tarfile, but to my surprise that worked. Maybe Firefox's sqlite files are created in some special way?

Let me know if I could provide more info.
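For what it is worth, the traceback suggests why the archive breaks: tarfile.addfile writes the member header (with the declared size) first and only then copies the data, so a source that ends early leaves a header followed by too little data. A possible workaround, sketched below under that assumption, is to wrap the source so it always yields exactly the declared number of bytes and pads with zeros after a premature EOF. ZeroPaddedReader and add_padded are hypothetical names, not part of acquire, and padding with zeros is a choice that happens to match the observation above that the unreadable tail was zero bytes anyway.

```python
import io
import tarfile


class ZeroPaddedReader:
    """Wrap a file object so it always yields exactly `size` bytes.

    If the underlying object reaches EOF early, the rest is zero-filled.
    """

    def __init__(self, fh, size):
        self._fh = fh
        self._remaining = size
        self.padded = 0  # number of zero bytes that had to be added

    def read(self, n=-1):
        if n is None or n < 0 or n > self._remaining:
            n = self._remaining
        chunks, got = [], 0
        while got < n:
            chunk = self._fh.read(n - got)
            if not chunk:  # premature EOF on the source
                break
            chunks.append(chunk)
            got += len(chunk)
        self._remaining -= n
        self.padded += n - got
        return b"".join(chunks) + b"\x00" * (n - got)


def add_padded(tar, name, fh, size):
    """Add a member of `size` bytes; return how many bytes were padding."""
    info = tarfile.TarInfo(name)
    info.size = size
    reader = ZeroPaddedReader(fh, size)
    tar.addfile(info, reader)
    return reader.padded


if __name__ == "__main__":
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        short_source = io.BytesIO(b"x" * 98304)  # claims 524288, delivers 98304
        padded = add_padded(tar, "cookies.sqlite", short_source, 524288)
        print("padded bytes:", padded)
```

The returned padding count makes it possible to log a warning for such files instead of silently storing a shortened or padded copy.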

Inconsistent flag name in error message

In utils.py there are two instances where --output_file is used instead of --output-file. This needs to be changed for consistency.

https://github.com/fox-it/acquire/blob/8a3a0b5eaf3d6e251aa52b5cad7e0b49a22cf7cd/acquire/utils.py#L302

Duplicate entries for McAfee Endpoint security logs

acquire/acquire/acquire.py

Line 1007 in 42f9ce0

("dir", "sysvol/ProgramData/McAfee/Endpoint Security/Logs"),

and

acquire/acquire/acquire.py

Line 1010 in 42f9ce0

("dir", "sysvol/ProgramData/McAfee/Endpoint Security/Logs"),

contain the same entries. Unsure if this would ever lead to any issues.
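A quick way to spot this kind of duplicate is sketched below. It assumes specs are plain tuples like the ones quoted above; the sample list is made up from entries quoted on this page and the helper names are hypothetical.

```python
from collections import Counter


def duplicate_specs(specs) -> list:
    """Return every spec that occurs more than once, in first-seen order."""
    counts = Counter(specs)
    return [spec for spec, count in counts.items() if count > 1]


def dedupe_specs(specs) -> list:
    """Drop repeated specs while keeping the original order."""
    return list(dict.fromkeys(specs))


SAMPLE_SPECS = [
    ("dir", "sysvol/ProgramData/McAfee/Endpoint Security/Logs"),
    ("glob", "sysvol/Program Files/TeamViewer/*.log"),
    ("dir", "sysvol/ProgramData/McAfee/Endpoint Security/Logs"),
]

if __name__ == "__main__":
    print(duplicate_specs(SAMPLE_SPECS))
    print(dedupe_specs(SAMPLE_SPECS))
```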

Read scratch directory from config in the ESXi module

Collect `$Secure:$SII` NTFS file

To make security descriptor lookups faster.

Extend method for NamedObject.from_directory_information

# TODO: Let this method generate different types of NamedObjects according to its type
# Ideally they each handle their own behaviour themselves, and clean up after themselves.
# NOTE: this does require some additional methods to get added, such as maybe open, close, and details(?)
# Then it would also be nice to have a contextmanager that handles the open and closing of the object.
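A very rough sketch of the direction the TODO describes: a factory that picks a subclass per object type, open/close/details methods on each object, and a context manager that handles opening and closing. None of this reflects the real NamedObject class; the signature of from_directory_information, the registry and the type names are all assumptions made for the example.

```python
from contextlib import contextmanager


class NamedObject:
    """Stand-in base class; also used as fallback for unknown types."""

    type_name = "Unknown"
    _registry = {}

    def __init__(self, name):
        self.name = name
        self.is_open = False

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Every subclass registers itself under its type name.
        NamedObject._registry[cls.type_name] = cls

    @classmethod
    def from_directory_information(cls, name, type_name):
        """Hypothetical signature: build the object matching `type_name`."""
        return cls._registry.get(type_name, NamedObject)(name)

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def details(self):
        return {"name": self.name, "type": self.type_name}


class SymbolicLink(NamedObject):
    type_name = "SymbolicLink"

    def details(self):
        info = super().details()
        info["target"] = None  # a real implementation would resolve the link here
        return info


@contextmanager
def opened(obj):
    """Open the object for the duration of the block and always close it."""
    obj.open()
    try:
        yield obj
    finally:
        obj.close()


if __name__ == "__main__":
    link = NamedObject.from_directory_information("Global", "SymbolicLink")
    with opened(link) as handle:
        print(handle.details())
```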

Collect Zeek logs on Windows

(Path to be supplied)

Zeek will be implemented in Windows Defender so will run on all endpoints.

OSX module InstallHistory.plist file treated as a directory

acquire/acquire/acquire.py

Line 1410 in 0053395

("dir", "/Library/Receipts/InstallHistory.plist"),

This file is wrongly marked as a directory. It is, in fact, a regular file:

$ ls -lah
total 40
drwxrwxr-x   4 root        admin   128B Mar 18 21:58 .
drwxr-xr-x  72 root        wheel   2.3K Mar 15 11:21 ..
-rw-rw-r--   1 root        admin    20K Mar 18 21:58 InstallHistory.plist

Make acquire subprocess output capture non-blocking

There is no default way to do this on both Windows & Linux.

The best solution is to have a thread do the reading and let that block. An example can be found here:

[http://eyalarubas.com/python-subproc-nonblock.html]

Another way to do it is to make the way the winpmem module does it the default, with intermediate files where stdout and stderr are stored. This circumvents the need to implement non-blocking pipes.
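The thread-based option could look roughly like this: a background thread does the blocking readline calls and pushes lines onto a queue, and the main code takes whatever is available without waiting. This is a generic standard library sketch, not acquire's code, and the function names are made up.

```python
import queue
import subprocess
import sys
import threading


def _pump(stream, out_queue):
    """Blocking reader; runs in its own thread until the stream hits EOF."""
    for line in iter(stream.readline, b""):
        out_queue.put(line)
    stream.close()


def start_with_reader(args):
    """Start a process and a thread that collects its stdout and stderr."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = queue.Queue()
    thread = threading.Thread(target=_pump, args=(proc.stdout, lines), daemon=True)
    thread.start()
    return proc, lines, thread


def drain(lines):
    """Return the lines that are available right now, without blocking."""
    collected = []
    while True:
        try:
            collected.append(lines.get_nowait())
        except queue.Empty:
            return collected


if __name__ == "__main__":
    proc, lines, thread = start_with_reader([sys.executable, "-c", "print('hello')"])
    proc.wait()
    thread.join()
    print(drain(lines))
```

Because only threads and pipes are involved, the same code should behave the same way on Windows and Linux.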

Add option to acquire to collect UEFI

The UEFI partition is FAT based, and dissect.fat should just work. Might need some investigation into the differences between Windows and Linux based systems.

Add temporary suffix while Acquire is still running

During historical engagements, there have been cases where Acquire was deployed to many systems, with all resulting .tar.gz output files being written to a single fileshare. Although the .log file indicates the completion status of the Acquire run, it is hard to assess the status without opening and inspecting this file. As a result, Acquire output files are sometimes copied before the Acquire process has finished, leading to corrupt and/or incomplete output files.

Would it be an idea to add an additional suffix to the output archive/file/directory while Acquire is still running? E.g. start by writing all results to this file:

HOSTNAME_20240625102705.tar.gz.running

and once the process has finished with success, rename this file to

HOSTNAME_20240625102705.tar.gz

In this way, it's easier to spot which Acquires have not yet finished and should not be copied. Obviously the .running suffix can be anything, maybe .tmp or .active are more suitable.
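A minimal sketch of the proposed behaviour for a single output file, assuming the writer can be pointed at an arbitrary path. The helper name and the write_func callback are made up for the example.

```python
import os
from pathlib import Path

RUNNING_SUFFIX = ".running"  # could just as well be ".tmp" or ".active"


def write_with_running_suffix(final_path: Path, write_func) -> Path:
    """Write to `<final_path>.running` and rename only after success.

    `write_func` is a callable that receives the temporary path and writes
    the output to it. If it raises, the rename never happens, so the
    leftover .running file marks the run as unfinished.
    """
    tmp_path = final_path.with_name(final_path.name + RUNNING_SUFFIX)
    write_func(tmp_path)
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / "HOSTNAME_20240625102705.tar.gz"
        write_with_running_suffix(target, lambda path: path.write_bytes(b"data"))
        print(sorted(entry.name for entry in Path(tmp_dir).iterdir()))
```

A rename within the same directory is normally cheap, so the extra step should not add noticeable time even for large archives.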

Use volume index in Bootbanks module

Add Linux and macOS paths for TeamViewer

acquire/acquire/acquire.py

Lines 1197 to 1200 in f5b50b6

 # teamviewer
 ("glob", "sysvol/Program Files/TeamViewer/*.log"),
 ("glob", "sysvol/Program Files (x86)/TeamViewer/*.log"),
 ("glob", "AppData/Roaming/TeamViewer/*.log", from_user_home),

Only the Windows paths are collected at the moment.

Different consideration of letter case for windows output folders by the acquire plugins

When using acquire on a Linux distribution (in my case Ubuntu 22.04.3 LTS) to collect data from a Windows image (in my case in EWF format), different plugins seem to use different letter case for the output folders. Some plugins work case-sensitively and some case-insensitively.
Unfortunately this leads, for example, to two folders with the name Windows being created in the output directory on my Linux system: one in mixed case and one in lower case only. Which folder the output is written to probably depends on the plugin.
This affects not only folders directly under fs/sysvol but also subfolders. For example, two System32 folders are created as well: one System32 and one system32.
Here is an example of my output (I worked with the full profile):

/fs/sysvol $ tree -L 3 -d .
.
├── $Extend
├── $Recycle.bin
├── $Recycle.Bin
├── ProgramData
│   └── Microsoft
│       ├── Network
│       ├── Search
│       ├── Windows
│       └── Windows Defender
├── Users
├── windows
│   ├── appcompat
│   │   ├── appraiser
│   │   ├── Programs
│   │   └── UA
│   ├── inf
│   ├── prefetch
│   ├── system32
│   │   ├── config
│   │   ├── drivers
│   │   ├── sru
│   │   ├── tasks
│   │   ├── wbem
│   │   └── winevt
│   └── tasks
└── Windows
    ├── Logs
    │   ├── CBS
    │   └── WindowsUpdate
    ├── ServiceProfiles
    │   ├── LocalService
    │   └── NetworkService
    ├── system32
    │   └── config
    ├── System32
    │   ├── WDI
    │   └── winevt
    └── Temp

It would be nice if the plugins all used the same upper and lower case for the respective output folders. Preferably the Windows standard, i.e. what was found on the Windows image.

Add Atera/Splashtop to Acquire

During a CERT case it was observed that the actors were using the Atera Management Agent. This agent seems to use the Splashtop remote access tool under the hood. We'll need to add these locations to acquire so we can query this data with target-query.

File locations: C:\Program Files (x86)\Splashtop\Splashtop Remote\Server\log\

svcinfo.txt -> Splashtop service information logging;
agent_log.txt -> agent output, generic information;
sysinfo.txt -> information about server and session startups;
SPLog.00x -> information about clipboard, transferred files, etc;

Add option to select specific children in acquire

I.e. a specific child may error or for some other reason you want to exclude it.

Maybe nice to draw inspiration from (or just use) rdump selectors e.g.

"t.os == 'windows'" or "'Windows Server' in t.version"

Error: Module name must be provided or Collector needs to be bound to a module

I am having trouble using Dissect's acquire. When using it with the file option everything is fine:

*** Acquiring specified paths
- Collecting file /root/.bashrc: OK

Done collecting artifacts:
------------------------------------------------------------
  Module |    Success |    Failure |    Missing |      Empty
------------------------------------------------------------
cli-args |          1 |            |            |
------------------------------------------------------------
   Total |          1 |          0 |          0 |          0
------------------------------------------------------------

But when using the glob option I can't seem to get any files out of the image. The following error suggests providing a module name which I am not sure how to do. Am I missing something? Any advice is much appreciated.

*** Acquiring specified paths
- Collecting glob /root/.*shrc
- Failed to collect glob /root/.*shrc
Traceback (most recent call last):
  File "/root/py3venv_dissect/lib/python3.10/site-packages/acquire/collector.py", line 361, in collect_glob
    self.collect_path(entry)
  File "/root/py3venv_dissect/lib/python3.10/site-packages/acquire/collector.py", line 381, in collect_path
    raise ValueError("Module name must be provided or Collector needs to be bound to a module")
ValueError: Module name must be provided or Collector needs to be bound to a module

Done collecting artifacts:
------------------------------------------------------------
  Module |    Success |    Failure |    Missing |      Empty
------------------------------------------------------------
cli-args |            |          1 |            |
------------------------------------------------------------
   Total |          0 |          1 |          0 |          0
------------------------------------------------------------

ESXi memory manager should always be used when running on an ESXi host system

Currently it’s only used when the target is ESXi, not necessarily the host system. This can give issues when trying to acquire an offline VM from an ESXi shell directly, without going through the ESXi target.

Make the acquire collection modules more declarative

Instead of a python function for everything

Additional NTFS Artefact Collection

The following files would be beneficial when collecting data with Acquire.

{code:java}
C:\$LogFile
C:\$Extend\$UsnJrnl:$Max
C:\$Extend\$RmMetadata\$TxfLog\$Tops:$T
C:\$Extend\$RmMetadata\$TxfLog\$T{code}

Collect all nginx&apache logs in Acquire

For IIS we parse the config (using dissect.target’s IIS plugin) to find additional log directories.

A similar thing can be done for NginX and Apache. Their respective plugins already have such a “give-me-all-log-paths” function, so that can be used.

All this functionality (including collecting the default paths, but those are probably also emitted by the plugin function) should be put in a WebserverLog module in acquire, also moving the IIS stuff there.

The IIS module should log & print a deprecation warning and forward to the WebserverLog module.

Also add acquire-test tests, with default logs & logs configured in a mock config.

Feature Request: allow custom report names

Currently, when acquire is run, the tool writes a report to the output directory whose name consists of the hostname of the machine, a timestamp and the suffix .report.json. However, when a custom output file is specified, the report does not adhere to the same naming scheme. I looked for an option to change this, but could only find one for disabling the report altogether.

This mainly becomes tedious when every single machine that is acquire'd has the exact same hostname, making reports hard to distinguish from each other.

Would it be possible to either have the report be written to a similarly named file, or have an extra option which lets users specify their own report path?
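The first option (a report named after the output file) could be derived along these lines. This is only a sketch: the function name is made up, and the list of archive suffixes is an assumption based on the .tar.gz outputs mentioned elsewhere on this page.

```python
from pathlib import Path

# Assumed archive suffixes; longest first so ".tar.gz" wins over ".gz".
ARCHIVE_SUFFIXES = (".tar.gz", ".tar")


def report_path_for(output: str) -> Path:
    """Derive `<output name>.report.json` next to a custom output path."""
    path = Path(output)
    name = path.name
    for suffix in ARCHIVE_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    return path.with_name(name + ".report.json")


if __name__ == "__main__":
    print(report_path_for("cases/host-a.tar.gz"))
    print(report_path_for("cases/custom-output"))
```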

Acquire output does not contain a clear ending message

Acquire can give confusing output that does not make it obvious whether it exited cleanly or not. Even after the summary it sometimes still prints confusing output.
This makes it unclear whether the acquisition process ended due to a bug or completed the happy flow.

I strongly recommend ending cleanly, for example by repeating the command line arguments as is done at the start, followed by a clear final message. For example:

[2023-11-02 16:46:26,193] [INFO ] Finishing acquisition
[2023-11-02 16:46:26,367] [INFO ] Arguments: --children -o /vmfs/volumes/NFS_storetemp/server
[2023-11-02 16:46:26,367] [INFO ] Default Arguments: --compress --ntds --active-directory --profile full
[2023-11-02 16:46:26,573] [INFO ] Exiting with status code 0 (SUCCESS)

Note that this can also be confusing for individual child acquisitions with --children. The individual log files can also contain unclear beginning and ending log lines.

This should be a fairly simple fix with relatively high value. It can really help troubleshooting.
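A sketch of how such closing lines could be produced with the standard logging module. The log format string is an approximation of the lines shown above, and log_exit is a hypothetical helper, not an existing acquire function.

```python
import logging
import sys

# Approximates the "[timestamp] [LEVEL] message" layout from the example.
LOG_FORMAT = "[%(asctime)s] [%(levelname)-5s] %(message)s"


def log_exit(log, argv, exit_code):
    """Repeat the arguments and state the final status in one clear line."""
    log.info("Finishing acquisition")
    log.info("Arguments: %s", " ".join(argv))
    if exit_code == 0:
        log.info("Exiting with status code %d (SUCCESS)", exit_code)
    else:
        log.error("Exiting with status code %d (FAILURE)", exit_code)
    return exit_code


if __name__ == "__main__":
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    log = logging.getLogger("acquire-exit-sketch")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    sys.exit(log_exit(log, sys.argv[1:], 0))
```

Calling the helper once per child acquisition as well would give the individual --children log files the same clear ending.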
