Digital Forensics Virtual File System (dfVFS)
License: Apache License 2.0
dfVFS, or Digital Forensics Virtual File System, provides read-only access to file-system objects from various storage media types and file formats. The goal of dfVFS is to provide a generic interface for accessing file-system objects, for which it uses several back-ends that provide the actual implementation of the various storage media types, volume systems and file systems.
For more information see:
* Project documentation: https://dfvfs.readthedocs.io/en/latest
Unable to decompress zlib compressed stream with error: Error -3 while decompressing: invalid distance too far back.
zcat test.gz
<HTML>
...
gzip: test.gz: invalid compressed data--crc error
gzip: test.gz: invalid compressed data--length error
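This class of failure can be reproduced directly with the standard zlib module. The sketch below is illustrative only, not the dfVFS code path: corrupting a single byte of a zlib stream makes decompression raise zlib.error, either during inflation ("invalid ..." messages such as the one above) or at the final adler-32 check.

```python
# Illustrative sketch (not the dfVFS code path): corrupting one byte of a
# zlib stream makes zlib.decompress() raise zlib.error, which dfVFS then
# surfaces as "Unable to decompress zlib compressed stream with error: ...".
import zlib

compressed = zlib.compress(b'some test data ' * 64)
# Flip every bit of one byte in the middle of the stream.
corrupted = (compressed[:10] + bytes([compressed[10] ^ 0xff]) +
             compressed[11:])
try:
    zlib.decompress(corrupted)
except zlib.error as error:
    print('zlib error:', error)
```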
Add volume system support for:
source scanner auto-recurse does not find file system while non-auto-recurse does:
Using TSK for file system detection seems to have some unwanted side effect: log2timeline/plaso#229
See if: https://github.com/log2timeline/dfvfs/blob/master/dfvfs/analyzer/tsk_analyzer_helper.py can be changed to use pysigscan instead.
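pysigscan matches known format signatures at fixed offsets instead of letting TSK probe the data. A pure-Python sketch of the idea follows; the signature table is illustrative (only the NTFS OEM identifier and the generic boot-sector marker), not the table dfVFS would actually use.

```python
# Pure-Python sketch of signature-based detection, the approach pysigscan
# provides natively: check known magic bytes at fixed offsets rather than
# letting TSK probe the volume. The signature table here is illustrative.
_SIGNATURES = [
    ('ntfs', 3, b'NTFS    '),  # NTFS OEM identifier at offset 3
    ('fat', 510, b'\x55\xaa'),  # generic boot sector marker at offset 510
]

def scan_signatures(data):
  """Returns the identifiers of all signatures that match the data."""
  matches = []
  for identifier, offset, pattern in _SIGNATURES:
    if data[offset:offset + len(pattern)] == pattern:
      matches.append(identifier)
  return matches

boot_sector = bytearray(512)
boot_sector[3:11] = b'NTFS    '
boot_sector[510:512] = b'\x55\xaa'
print(scan_signatures(bytes(boot_sector)))  # ['ntfs', 'fat']
```

Note the boot-sector marker matches any bootable sector, so a real signature table would need more discriminating patterns per format.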
Revisit the current resolver context caching strategy, see if it can be improved.
Initial version: https://codereview.appspot.com/201840043/ changes:
Think about:
After care:
Latest version of plaso
2015-02-25 17:45:27,094 [WARNING] (Worker_2 ) PID:13288 <worker> [skydrive_log] Unable to process file: type: OS, location: /PATH/bde/bde1
type: RAW
type: TSK, inode: 118222, location: /Users/<FOO>/AppData/Local/Microsoft/Windows/Temporary Internet Files/Low/Content.IE5/0VUCKP34/AccountChooser[1].htm
type: GZIP
with error: Unable to decompress zlib compressed stream with error: Error -3 while decompressing: invalid distance too far back..
2015-02-25 17:45:27,094 [ERROR] (Worker_2 ) PID:13288 <worker> Unable to decompress zlib compressed stream with error: Error -3 while decompressing: invalid distance too far back.
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/plaso/engine/worker.py", line 146, in _ParseFileEntryWithParser
parser_object.UpdateChainAndParse(self._parser_mediator)
File "/usr/lib/python2.7/dist-packages/plaso/parsers/interface.py", line 71, in UpdateChainAndParse
self.Parse(parser_mediator, **kwargs)
File "/usr/lib/python2.7/dist-packages/plaso/parsers/text_parser.py", line 741, in Parse
text_file_object = text_file.TextFile(file_object)
File "/usr/lib/python2.7/dist-packages/dfvfs/helpers/text_file.py", line 32, in __init__
self._file_object_size = file_object.get_size()
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/file_object_io.py", line 151, in get_size
return self._file_object.get_size()
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/compressed_stream_io.py", line 314, in get_size
self._uncompressed_stream_size = self._GetUncompressedStreamSize()
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/compressed_stream_io.py", line 90, in _GetUncompressedStreamSize
read_count = self._ReadCompressedData(self._COMPRESSED_DATA_BUFFER_SIZE)
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/compressed_stream_io.py", line 173, in _ReadCompressedData
self._decompressor.Decompress(self._compressed_data))
Improve archive file support for:
Add modular modifier support to dfVFS.
This would be a feature request for a way to indicate that the data should be manipulated/decoded in a certain way before being read, something like base64, ROT13, XOR with key "0xdd", etc. This could also expand to decryption support for file-level encryption.
First step, create the plugin framework, have that in dfVFS and implement something simple like base64 support.
I have a drive that has geometry with 4096 bytes per sector.
When I create a TSKVolumeSystem and feed it the volume_system_path_spec of TYPE_INDICATOR_TSK_PARTITION, the TSKVolumeSystem.bytes_per_sector attribute shows as 512. This in turn yields the incorrect offset for the partitions.
Is there a way to find and set the bytes per sector for a TSKVolumeSystem?
Here is what FTK Imager shows for the drive:
Partition Start Sector [2048]:
The offset to the partition is then 2048 * 4096 = 8388608.
I see in dfvfs TSKVolumeSystem._Parse() there is:
self.bytes_per_sector = tsk_partition.TSKVolumeGetBytesPerSector(tsk_volume)
Am I not passing something or are the BytesPerSector not being found correctly?
Here is output and an example code I used.
###########################################################
#Output
###########################################################
bytes_per_sector: 512
address: 2; description: NTFS (0x07); offset: 1048576; size 250048817664
###########################################################
#Example
###########################################################
import sys
from dfvfs.path import factory as path_spec_factory
from dfvfs.lib import definitions as dfvfs_definitions
from dfvfs.volume import tsk_volume_system
from dfvfs.lib import tsk_image
os_path_spec = path_spec_factory.Factory.NewPathSpec(
    dfvfs_definitions.TYPE_INDICATOR_OS,
    location='\\\\.\\PHYSICALDRIVE1')
volume_system_path_spec = path_spec_factory.Factory.NewPathSpec(
    dfvfs_definitions.TYPE_INDICATOR_TSK_PARTITION,
    start_offset=0,
    parent=os_path_spec)
image_system = tsk_volume_system.TSKVolumeSystem()
image_system.Open(volume_system_path_spec)
print 'bytes_per_sector: {}'.format(image_system.bytes_per_sector)
for volume in image_system.volumes:
  volume_identifier = getattr(volume, 'identifier', None)
  if volume_identifier:
    info = {}
    for attrib in volume.attributes:
      info[attrib.identifier] = attrib.value
    for extent in volume.extents:
      info['offset'] = extent.offset
      info['size'] = extent.size
    print 'address: {}; description: {}; offset: {}; size {}'.format(
        info['address'], info['description'], info['offset'], info['size'])
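For reference, the mismatch reduces to a quick calculation. This is a back-of-the-envelope sketch, not a dfVFS API: the reported offset of 1048576 corresponds to start sector 2048 at 512 bytes per sector, while the FTK Imager geometry gives 2048 * 4096 = 8388608.

```python
# Back-of-the-envelope check of the mismatch (not a dfVFS API): recompute
# the partition byte offset using the actual drive geometry instead of the
# 512 bytes per sector that TSK reported.
REPORTED_BYTES_PER_SECTOR = 512
ACTUAL_BYTES_PER_SECTOR = 4096

def corrected_offset(reported_offset):
    start_sector = reported_offset // REPORTED_BYTES_PER_SECTOR
    return start_sector * ACTUAL_BYTES_PER_SECTOR

print(corrected_offset(1048576))  # start sector 2048 -> 8388608
```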
After:
Traceback (most recent call last):
File "/usr/bin/coveralls", line 5, in <module>
from pkg_resources import load_entry_point
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 3018, in <module>
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 612, in _build_master
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 918, in require
File "build/bdist.linux-x86_64/egg/pkg_resources/__init__.py", line 805, in resolve
pkg_resources.DistributionNotFound: requests>=1.0.0
Looks like we need to add: https://pypi.python.org/pypi/requests
install_requires=['docopt>=0.6.1', 'coverage>=3.6', 'requests>=1.0.0'],
Check why the rietveld close issue is still not working.
The Python base64 implementation has unwanted behavior:
Replace the base decoders with a strict stream replacement (libuna?).
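One example of the unwanted behavior for the record: by default b64decode silently discards characters outside the base64 alphabet, which hides corruption; passing validate=True gives strict behavior.

```python
# Demo of the lenient default: b64decode() silently discards characters
# outside the base64 alphabet unless validate=True is passed.
import base64
import binascii

data = b'aGV!sbG8='  # '!' is not in the base64 alphabet
print(base64.b64decode(data))  # junk is dropped, decodes to b'hello'
try:
    base64.b64decode(data, validate=True)
except binascii.Error as error:
    print('strict decode failed:', error)
```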
I have noticed when I set the credential for a BDE volume, I cannot set it again. The following test code shows that if you use the correct password first, the incorrect password also appears to work. If you use the incorrect password first, the correct password does not work.
With PASSWORD_SET = password_set1 both passwords work.
With PASSWORD_SET = password_set2 both passwords fail.
#!/usr/bin/python
# -*- coding: utf-8 -*-
'''Test BDE Cred Issue
Test uses dfvfs test image 'bdetogo.raw'
'''
import os
from dfvfs.lib import definitions as dfvfs_definitions
from dfvfs.analyzer import analyzer as dfvfs_analyzer
from dfvfs.resolver import resolver as dfvfs_resolver
from dfvfs.resolver import context as dfvfs_context
from dfvfs.path import factory as dfvfs_factory
from dfvfs.volume import tsk_volume_system
from dfvfs.vfs import bde_file_system
source_file = 'bdetogo.raw'
#Using this password set both passwords pass#
password_set1 = [u'bde-TEST',u'bad-password']
#Using this password set both passwords fail#
password_set2 = [u'bad-password',u'bde-TEST']
PASSWORD_SET = password_set1
for password in PASSWORD_SET:
  print 'Using password: {}'.format(password)
  os_path_spec = dfvfs_factory.Factory.NewPathSpec(
      dfvfs_definitions.TYPE_INDICATOR_OS, location=source_file)
  volume_bde_path_spec = dfvfs_factory.Factory.NewPathSpec(
      dfvfs_definitions.TYPE_INDICATOR_BDE, parent=os_path_spec)
  dfvfs_resolver.Resolver.key_chain.SetCredential(
      volume_bde_path_spec, 'password', password)
  volume_fs_path_spec = dfvfs_factory.Factory.NewPathSpec(
      dfvfs_definitions.TYPE_INDICATOR_TSK, location=u'/',
      parent=volume_bde_path_spec)
  try:
    file_system = dfvfs_resolver.Resolver.OpenFileSystem(
        volume_fs_path_spec)
    print 'password pass: {}'.format(password)
  except Exception as e:
    print 'no go on password: {}. Error: {}'.format(password, str(e))
print 'end'
Here is output with PASSWORD_SET = password_set1 where both passwords work.
Using password: bde-TEST
password pass: bde-TEST
Using password: bad-password
password pass: bad-password
end
Here is output with PASSWORD_SET = password_set2 where both passwords fail.
Using password: bad-password
no go on password: bad-password. Error: 'pybde_volume_seek_offset: unable to seek offset. libbde_internal_volume_seek_offset: invalid volume - volume is locked.'
FS_Info_Con: (tsk3.c:207) Unable to open the image as a filesystem: Cannot determine file system type
Using password: bde-TEST
no go on password: bde-TEST. Error: 'pybde_volume_seek_offset: unable to seek offset. libbde_internal_volume_seek_offset: invalid volume - volume is locked.'
FS_Info_Con: (tsk3.c:207) Unable to open the image as a filesystem: Cannot determine file system type
end
Any ideas?
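The symptoms are consistent with a resolver-side cache: the volume opened with the first password is cached keyed by path spec, so later SetCredential calls never reach the back-end. The following is a minimal sketch of that behavior, not the dfVFS implementation (all names illustrative).

```python
# Minimal sketch (not the dfVFS implementation) of why the first credential
# "sticks": a resolver-style cache keyed by path spec returns the object
# opened with the first password, so later SetCredential calls have no
# effect until the cached object is released.
class CachingResolver(object):
  def __init__(self):
    self._cache = {}
    self._credentials = {}

  def set_credential(self, path_spec, password):
    self._credentials[path_spec] = password

  def open_volume(self, path_spec, correct_password):
    if path_spec not in self._cache:
      password = self._credentials.get(path_spec)
      self._cache[path_spec] = (
          'unlocked' if password == correct_password else 'locked')
    return self._cache[path_spec]

resolver = CachingResolver()
resolver.set_credential('bde1', 'bad-password')
print(resolver.open_volume('bde1', 'bde-TEST'))  # locked
resolver.set_credential('bde1', 'bde-TEST')
print(resolver.open_volume('bde1', 'bde-TEST'))  # still locked (cached)
```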
Change:
def _Open(self, path_spec=None, mode='rb'):
to:
def _Open(self, path_spec, mode='rb'):
I just did a quick test of the 20121220 release and it has a couple examples in it.
Unfortunately they are being installed to:
/usr/lib/python2.7/site-packages/examples/__init__.py
/usr/lib/python2.7/site-packages/examples/__init__.pyc
/usr/lib/python2.7/site-packages/examples/recursive_hasher.py
/usr/lib/python2.7/site-packages/examples/recursive_hasher.pyc
/usr/lib/python2.7/site-packages/examples/recursive_hasher2.py
/usr/lib/python2.7/site-packages/examples/recursive_hasher2.pyc
They should go into a less generic examples folder.
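One possible fix, sketched below under the assumption that setup.py uses setuptools: exclude the examples directory when collecting packages, so it no longer installs into the generic site-packages/examples. The throwaway package tree here only demonstrates the exclusion.

```python
# Hypothetical setup.py fix sketch: exclude the examples directory from
# find_packages() so it does not install into site-packages/examples.
# Demonstrated on a throwaway package tree, not the real dfvfs tree.
import os
import tempfile

from setuptools import find_packages

with tempfile.TemporaryDirectory() as root:
    for package in ('dfvfs', 'examples'):
        os.makedirs(os.path.join(root, package))
        open(os.path.join(root, package, '__init__.py'), 'w').close()

    packages = find_packages(where=root, exclude=['examples', 'examples.*'])
    print(sorted(packages))  # ['dfvfs'] - examples is left out
```

In setup.py this would read `packages=find_packages(exclude=['examples', 'examples.*'])`; alternatively the examples could ship as data files under a project-specific directory.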
improve test coverage
We're trying to transition to the 64 bit version of dfvfs. When installing the 64 bit Windows dependencies on a test machine from https://github.com/log2timeline/l2tbinaries the check for the following modules fails with the following output
[FAILURE] missing: pytsk3.
[FAILURE] missing: pybde.
[FAILURE] missing: pyewf.
[FAILURE] missing: pyqcow.
[FAILURE] missing: pysigscan.
[FAILURE] missing: pysmdev.
[FAILURE] missing: pysmraw.
[FAILURE] missing: pyvhdi.
[FAILURE] missing: pyvmdk.
[FAILURE] missing: pyvshadow.
To follow up I opened a Python session to see if the modules could be imported. construct, six, sqlite3, and dfvfs all import with no problems; all the modules listed above result in the following error:
>>> import pytsk3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: DLL load failed: The specified module could not be found.
The test machine is a 64 bit Windows 7 Pro. To verify it wasn't an issue with the commit from 5 days ago I installed the 32 bit version of python and dependencies on a copy of the test image and had no similar problems running the tester or importing the modules. If there is any other information you need I'd be happy to provide it.
I have been working to get plaso and its dependencies into Debian and was asked to check whether test_data/bdetogo.raw
(or its contents) are actually licensed under the Apache-2.0 software license.
I did notice that, upon decryption using bdemount, there is a FAT image with some directories and seven files in it, four of which contain only zeroes. Am I missing something?
Also see: log2timeline/plaso#279
/dev/rdisk1
Source type : storage media image
OS: location: /dev/rdisk1
RAW:
TSK: location: /
TSK: location: /
OS: location: image.qcow2
QCOW:
TSK_PARTITION: location: /
TSK_PARTITION: 0, start offset: 0 (0x00000000)
TSK: location: /
TSK_PARTITION: 1, start offset: 0 (0x00000000)
TSK: location: /
TSK_PARTITION: 2, start offset: 32256 (0x00007e00), location: /p1
NTFS: location: \
TSK_PARTITION: 3, start offset: 16096872960 (0x3bf72ca00)
stat object is a grab bag of things, move common elements to file entry attributes. Keep a stat attribute for back-ends where it makes sense.
https://github.com/log2timeline/dfvfs/blob/master/dfvfs/vfs/vfs_stat.py#L7
- _type as public attribute
- _GetStatAttribute support for CPIO - #614
- _GetStatAttribute support for TAR - #614
- _GetStatAttribute support for ZIP - #615
- _GetStatAttribute support for NTFS - #615
- _GetStatAttribute file entry interface method - #615
The test: ./dfvfs/serializer/protobuf_serializer_test.py
Seems to fail for long CWD paths.
raw or ewf images with more than 128 segment files trigger a CacheFullError
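A minimal sketch of the failure mode follows; the class and method names are illustrative, not the dfVFS implementation. A fixed-size object cache raises when full instead of evicting, so an image whose segment files outnumber the cache slots exhausts it.

```python
# Minimal sketch of the failure mode (names illustrative, not the dfVFS
# implementation): a fixed-size object cache that raises when full instead
# of evicting, exhausted by an image with more than 128 segment files.
class CacheFullError(Exception):
  pass

class ObjectCache(object):
  def __init__(self, maximum_number_of_items=128):
    self._maximum_number_of_items = maximum_number_of_items
    self._items = {}

  def cache_object(self, key, value):
    if len(self._items) >= self._maximum_number_of_items:
      raise CacheFullError('cannot cache: {0:s}'.format(key))
    self._items[key] = value

cache = ObjectCache()
for index in range(128):
  cache.cache_object('segment.E{0:02d}'.format(index), object())
try:
  cache.cache_object('segment.E128', object())  # the 129th segment file
except CacheFullError as error:
  print(error)
```

An eviction policy (e.g. least-recently-used) or a larger per-image cache would avoid the hard failure.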
From: https://docs.python.org/2/library/bz2.html
Note: This class does not support input files containing multiple streams (such as those produced by the pbzip2 tool). When reading such an input file, only the first stream will be accessible. If you require support for multi-stream files, consider using the third-party bz2file module (available from PyPI). This module provides a backport of Python 3.3's BZ2File class, which does support multi-stream files.
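The limitation is easy to demonstrate: concatenating two bz2 streams produces the multi-stream layout pbzip2 emits. Python 3's bz2 module (and the bz2file backport the docs mention) reads both streams; the Python 2 BZ2File described in the quote stops after the first.

```python
# Demo of multi-stream bz2 data: two concatenated streams, as produced by
# pbzip2. Python 3's BZ2File reads both; the Python 2 BZ2File the quoted
# docs describe would only return the first stream.
import bz2
import io

data = bz2.compress(b'first stream\n') + bz2.compress(b'second stream\n')
with bz2.open(io.BytesIO(data), 'rb') as bz2_file:
    content = bz2_file.read()
print(content)  # b'first stream\nsecond stream\n'
```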
Traceback (most recent call last):
File "/Projects/plaso/plaso/engine/worker.py", line 126, in _ParseFileEntryWithParser
parser_object.Parse(self._parser_context, file_entry)
File "/Projects/plaso/plaso/parsers/text_parser.py", line 471, in Parse
row = reader.next()
File "/usr/lib/python2.7/csv.py", line 108, in next
row = self.reader.next()
File "/usr/lib/python2.7/dist-packages/dfvfs/helpers/text_file.py", line 71, in __iter__
line = self.readline()
File "/usr/lib/python2.7/dist-packages/dfvfs/helpers/text_file.py", line 179, in readline
lines_data = self._ReadLinesData(size)
File "/usr/lib/python2.7/dist-packages/dfvfs/helpers/text_file.py", line 121, in _ReadLinesData
read_buffer = self._file_object.read(read_size)
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/file_object_io.py", line 132, in read
return self._file_object.read(size)
File "/usr/lib/python2.7/dist-packages/dfvfs/file_io/compressed_stream_io.py", line 370, in read
self._uncompressed_data[self._uncompressed_data_offset]])
Need to have an independent MFT parser that can be used to parse an extracted $MFT file.
Several options are available, including importing analyzeMFT as a library.
The current implementation uses pyewf to open E01 files and then uses pytsk3 to manage the file system. However, this doesn't work for L01s even though pyewf supports them because pytsk3 does not.
I suggest either adding support for L01s in pytsk3 or developing EWF file entry and file system objects to manage these directly. The pyewf library has all the necessary function calls to handle this.
2015-06-09 21:59:48,193 [ERROR] (Collector ) PID:15849 <multi_process> 'ascii' codec can't decode byte 0xee in position 1: ordinal not in range(128)
Traceback (most recent call last):
File "/home/onager/code/plaso/plaso/multi_processing/multi_process.py", line 274, in _Main
self._collector.Collect(self._source_path_specs)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 113, in Collect
source_path_spec, find_specs=self._filter_find_specs)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 93, in _ProcessPathSpec
self._ProcessFileSystem(path_spec, find_specs=find_specs)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 59, in _ProcessFileSystem
self._fs_collector.Collect(file_system, path_spec, find_specs=find_specs)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 298, in Collect
self._ProcessDirectory(file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 272, in _ProcessDirectory
self._ProcessDirectory(sub_file_entry)
File "/home/onager/code/plaso/plaso/engine/collector.py", line 220, in _ProcessDirectory
for sub_file_entry in file_entry.sub_file_entries:
File "/usr/lib/python2.7/dist-packages/dfvfs/vfs/os_file_entry.py", line 209, in sub_file_entries
for path_spec in self._directory.entries:
File "/usr/lib/python2.7/dist-packages/dfvfs/vfs/file_entry.py", line 42, in entries
for entry in self._EntriesGenerator():
File "/usr/lib/python2.7/dist-packages/dfvfs/vfs/os_file_entry.py", line 45, in _EntriesGenerator
location, directory_entry])
File "/usr/lib/python2.7/dist-packages/dfvfs/vfs/os_file_system.py", line 184, in JoinPath
segment.split(self.PATH_SEPARATOR) for segment in path_segments]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xee in position 1: ordinal not in range(128)
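The underlying failure mode, sketched below: a directory entry read as a byte string gets combined with a unicode path, and Python 2 implicitly decodes the bytes as ASCII, which fails for any byte >= 0x80 (such as the 0xee above). The implicit coercion is shown here as an explicit decode.

```python
# Sketch of the failure mode (explicit here; Python 2 did this implicitly
# when joining a unicode path with a byte-string directory entry): decoding
# a byte >= 0x80 as ASCII raises UnicodeDecodeError.
directory_entry = b'f\xeele'  # a file name containing byte 0xee at position 1
try:
    directory_entry.decode('ascii')
except UnicodeDecodeError as error:
    print(error)
```

Decoding directory entries with the file system encoding (or treating OS paths as unicode end to end) avoids the implicit ASCII coercion.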
Path spec and file IO support: https://codereview.appspot.com/200110043/
To do: file system and file entry support
Add support for:
While parsing the dblake win81 test image.
[chrome_cache] Unable to process file:
with error:
Traceback (most recent call last):
File "build/bdist.linux-x86_64/egg/plaso/engine/worker.py", line 123, in _ParseFileEntryWithParser
parser_object.Parse(self._parser_context, file_entry)
File "build/bdist.linux-x86_64/egg/plaso/parsers/chrome_cache.py", line 376, in Parse
data_block_file_path_spec)
File "build/bdist.linux-x86_64/egg/dfvfs/resolver/resolver.py", line 74, in OpenFileEntry
return file_system.GetFileEntryByPathSpec(path_spec)
File "build/bdist.linux-x86_64/egg/dfvfs/vfs/tsk_file_system.py", line 117, in GetFileEntryByPathSpec
tsk_file = self._tsk_file_system.open(location)
RuntimeError: 'pyvshadow_store_read_buffer: unable to read data. libvshadow_store_descriptor_read_buffer: unable to read buffer from file IO handle. libvshadow_internal_store_read_buffer_from_file_io_handle: unable to read buffer from store descriptor: 2. libvshadow_store_read_buffer: unable to read buffer from store descriptor: 2.'
FS_Info_open: (tsk3.c:253) Unable to open file: Error reading image file (ntfs_dinode_lookup: Error reading MFT Entry at 3227558912)
The disk image can be shared upon request.
Currently the Travis build is broken due to a missing dependency.
Set up gift PPA to hold the necessary packages and change Travis configuration.
Add encoded stream support for:
Improve file system support: