rifiuti2's Introduction

Introduction

Rifiuti2 is a tool for analyzing Windows Recycle Bin INFO2 files. Analysis of the Windows Recycle Bin is usually carried out during Windows computer forensics. Rifiuti2 can extract the deletion time, original path and size of deleted files, and whether the trashed files have been permanently removed.

For those interested in what it does and what functionality it provides, please check out the official site for more information.

Special notes

The latest features and changes can be found in the NEWS file.

0.8.1

JSON output format, WSL v2 support, and improved robustness when reading broken data.

0.8.0

Usage

rifiuti2 is designed to be portable (just download and use, with no installation needed) and runs in a command line environment. Although the utilities provide a -h option for a brief help message, consulting the Wiki page for full details on all options is recommended. Following are a few examples of how to use them:

  • rifiuti-vista.exe -x -z -o result.xml \case\S-1-2-3\

Scan for index files under \case\S-1-2-3\, adjust all deletion times to the local time zone, and write XML output to result.xml.

  • rifiuti -l CP932 -t "\n" INFO2

Assume the INFO2 file was generated on Japanese Windows (codepage 932), and display each field on its own line instead of separating fields with tabs.

Download

Supported platforms

rifiuti2 is guaranteed to be usable on Windows, Linux and FreeBSD, with success reports for macOS (using brew). Some testing on big endian platforms is done with the QEMU emulator. More compatibility fixes for other architectures are welcome.

Windows

Windows binaries are officially provided on the GitHub release page. Some information about ancient Windows versions is available on the wiki.

Unix packages

Most Linux and FreeBSD users can use pre-packaged software for convenience. Check out the status here.

Others

For operating systems where rifiuti2 is not readily available, it is always possible to compile from source.

rifiuti2's People

Contributors

abelcheung, anthonywong

rifiuti2's Issues

Need to support Windows 95 and ME format

The 95 format has version = 0, with other properties like the Windows 98 one. There are some extra fields in the preamble; figuring out their purpose should not be difficult.

ME has version = 5 but record size = 0x118, so the program logic needs some changes to cope with that.

XML presentation of invalid file names

In rare circumstances file names might be invalid at the Windows layer (e.g. Windows Explorer would choke), yet still valid for the underlying filesystem. For example, broken file names with a single surrogate are not unheard of.

According to XML 1.0 rev 5 spec, control characters and surrogates are forbidden:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Presenting those characters would result in invalid XML. XML 1.1 is more permissive about control characters, but still forbids surrogates, and the standard is not widely adopted. In particular, xmllint (used in the test suite) would never support XML 1.1.
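To make the Char production above concrete, here is a minimal Python sketch that flags characters in a path which XML 1.0 would reject. The function names are hypothetical and only illustrate the check; this is not rifiuti2's own code.

```python
def is_xml10_char(ch):
    """Return True if a single character matches the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_xml_chars(path):
    """List (index, 'U+XXXX') for every character that XML 1.0 forbids."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(path)
        if not is_xml10_char(ch)
    ]
```

A path containing a lone surrogate or a control character such as U+0001 is reported, while tab, line feed and carriage return pass.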

Further action and consideration pending.

Tab-delimited output is not well-formed

TSV output is not properly escaped and quoted, especially if users specify their own record delimiter. It is possible to introduce libgsf to format the output properly, but I wonder whether introducing an extra (and exotic) dependency for such trivial usage is worth the effort.
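As an illustration of what "properly escaped and quoted" could mean, the following Python sketch uses the standard csv module (swapped in here for libgsf, purely for demonstration) to quote fields that contain the delimiter or a quote character. The function name and the sample rows are hypothetical, and the delimiter is assumed to be a single character.

```python
import csv
import io


def format_delimited(rows, delimiter="\t"):
    """Write rows as delimiter-separated text, quoting fields only when needed."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,        # must be a single character
        quoting=csv.QUOTE_MINIMAL,  # quote fields containing delimiter, quote or newline
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


def parse_delimited(text, delimiter="\t"):
    """Read the text back, undoing the quoting applied by format_delimited."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))
```

With this kind of quoting, a path that happens to contain a tab survives a round trip instead of being split into two columns.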

Crash on multiple tests for sparc64 platform

Multiple tests were failing on the SPARC64 architecture. All of them seem to point to a path conversion failure in ucs2_strnlen():

Reading symbols from rifiuti-vista...
(gdb) set args ../test/samples/dir-sample1
(gdb) run
Starting program: /home/debian/rifiuti2/src/rifiuti-vista ../test/samples/dir-sample1

Program received signal SIGBUS, Bus error.
0x00000100000050c8 in ucs2_strnlen (str=0x10000208887, max_sz=260) at utils.c:429
429              for (i=0; (i<max_sz) && str[i]; i++) {}
(gdb) bt full
#0  0x00000100000050c8 in ucs2_strnlen (str=0x10000208887, max_sz=260) at utils.c:429
        i = 0
#1  0x00000100000057dc in conv_path_to_utf8_with_tmpl (path=0x10000208887 "C", from_enc=0x0, tmpl=0x10000008530 "<\\u%04X>",
    read=0x7feffffee10, st=0x10000200ee0 <exit_status>) at utils.c:552
        u8_path = 0x6542b679 <error: Cannot access memory at address 0x6542b679>
        i_ptr = 0x1 <error: Cannot access memory at address 0x1>
        o_ptr = 0x2 <error: Cannot access memory at address 0x2>
        result = 0x0
        len = 8791798050296
        r_total = 172000000
        rbyte = 22517998136852480
        wbyte = 0
        status = 336
        in_ch_width = 2
        out_ch_width = 8
        conv = 0x1000000347c <populate_record_data+744>
        __func__ = "conv_path_to_utf8_with_tmpl"
#2  0x0000010000003540 in populate_record_data (buf=0x10000208870, version=1, pathlen=260, erraneous=1) at rifiuti-vista.c:198
        record = 0x1000020ad20
        read = 8791798050488
        __func__ = "populate_record_data"
#3  0x00000100000037fc in parse_record_cb (index_file=0x10000208130 "../test/samples/dir-sample1/$IYAR1YY.exe",
    recordlist=0x7fefffff0b0) at rifiuti-vista.c:252
        record = 0x1000020a940
        basename = 0x1000020abc0 "$IYAR1YY.exe"
        version = 1
        pathlen = 260
        bufsize = 543
        buf = 0x10000208870
        validate_st = R2_OK
        __func__ = "parse_record_cb"
#4  0xfffff80100389790 in ?? ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p str
$1 = (const gunichar2 *) 0x10000208887
(gdb) p sizeof(str)
$2 = 8

Most likely the string was treated as (char *) so it was truncated early, and fails conversion to (gunichar *) later.

This is very likely the reason the relevant Debian package stays at 0.6.1 and has not been upgraded to 0.7.0.

Migrate build system to CMake

autoconf / automake should have been phased out a long time ago; almost nobody uses them for new projects.

Building the binaries is relatively easy, but converting the autotest test suites to CTest is likely to be painful.

  • Basic GitHub actions to build binaries on various platforms
  • Binaries can be stripped
  • Static build on Windows
  • Migrate test suite to CTest
  • Create packages with CPack
    • Source tarballs on Ubuntu
    • Create binary zip archive for Windows

Binary name = (null) in help message

On most platforms the name of the executable is not displayed properly:

$ ./src/rifiuti
Usage:
  (null) [OPTION...] INFO2

MinGW is a notable exception here.

charset in locale variables no longer respected under Windows

Before the switch to GNU gettext, charsets in locale variables were honored when displaying translations (such as $LANG=zh_HK.UTF-8). After the switch they no longer have any effect. Right now the only way to override the charset is to use the CHARSET variable.

Actually, this change is probably better for normal users (who wouldn't change the console codepage). It remains to be considered whether a better compromise can be achieved, or whether we just accept the status quo.

Windows 10 Recycle Bin changes - rifiuti2 patch attached

Hi there,

Windows 10 is bringing some changes to Recycle Bin, including a different file header for index files and variable index file size.

I put some spare time into patching rifiuti2 code to address that. I've compiled it and tested on Linux. Please see the attached file for updated source code. I put some comments too.

Original issue reported on code.google.com by [email protected] on 20 Feb 2015 at 7:45

Convert manpage to online doc

Maintaining a complete and detailed roff document is becoming a burden. Providing an online document is easier and more aligned with current use cases, no matter how certain Linux distros insist otherwise. Either a GitHub page or the repo wiki would be fine.

  • Migrate manpage content to Wiki
  • Manpage declared obsolete and point people to Wiki
  • Merge some README sections to Wiki
  • Provide doc pointer for Windows binary dist

Add option for displaying NCR and UCN

(Using character é for examples below)

It would be nice to allow displaying non-ASCII path in NCR (&#xE9;) in XML mode, and either hex (\xE9) or UCN (\u00E9) in tab-delimited mode, instead of just "File can't be displayed in foobar encoding".

This is a tracking task with subtasks below:

  • Some characters inside path need to be converted to XML entities #13
  • XML presentation of invalid file names #14
  • Allow partially displayable characters in INFO2 legacy paths #15
  • More permissive handling of unicode path #16
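The two notations proposed in this issue can be sketched in Python as follows. This is only an illustration of the output forms (NCR for XML mode, UCN for tab-delimited mode); the function name and the style argument are hypothetical, and the 8-digit \U form for code points beyond U+FFFF follows the C universal character name convention.

```python
def escape_non_ascii(path, style):
    """Replace non-ASCII characters with NCR ('ncr') or UCN ('ucn') notation."""
    if style not in ("ncr", "ucn"):
        raise ValueError("style must be 'ncr' or 'ucn'")
    out = []
    for ch in path:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                # plain ASCII is kept as-is
        elif style == "ncr":
            out.append("&#x%X;" % cp)     # numeric character reference
        elif cp <= 0xFFFF:
            out.append("\\u%04X" % cp)    # 4-digit UCN
        else:
            out.append("\\U%08X" % cp)    # 8-digit UCN beyond the BMP
    return "".join(out)
```

For the example character used above, "café" becomes "caf&#xE9;" in NCR style and "caf\u00E9" in UCN style.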

Simple option to inspect live system

Perhaps it is possible to launch rifiuti to scan all recycle bins in all drives for current user and list all related files instantly with a simple option, instead of requiring users to go through all the recycle bin folders themselves. This would be a Windows-only feature.

Merge both programs or rename them

The current program naming is no longer intuitive. It is a very old artifact that aimed at compatibility with the ancient rifiuti.

There are 2 possible routes:

The easy way

Original        Rename to
rifiuti         rifiuti-legacy
rifiuti-vista   rifiuti

The hard way

Merge them together as rifiuti for the benefit of users. The features of both programs have been kept more or less in sync, so this is probably manageable.

Allow partially displayable characters in INFO2 legacy path

Currently, if users supply the wrong encoding to the -l option, legacy paths are completely replaced by another string denoting failure. It would be nice to at least show the ASCII parts, with the remaining parts shown as hex instead. Whether users can toggle this behavior will be decided later.
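The desired behavior resembles what Python's built-in "backslashreplace" error handler does when decoding. The sketch below is only an analogy to show the intended output shape; the function name is hypothetical and this is not how rifiuti2 itself is implemented.

```python
def decode_legacy_path(raw, encoding):
    """Decode a legacy path, showing undecodable bytes as \\xNN instead of failing."""
    # 'backslashreplace' keeps every decodable part and substitutes an
    # escaped hex value for each byte the chosen encoding cannot decode.
    return raw.decode(encoding, errors="backslashreplace")
```

Decoding the bytes of C:\café.txt (with é stored as the single byte 0xE9) as ASCII would keep "C:\caf" and ".txt" readable and render the offending byte as \xe9.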

Migrate to GitHub action

Previous releases depended on AppVeyor and Travis-CI for building the Windows binary and the Linux tarball respectively. Recent practice encourages the use of GitHub Actions (especially when Travis-CI is still susceptible to attacks on credentials and secrets).

However, care must be taken because support for the MingW32/MingW64 targets in MSYS2 is uncertain -- their deprecation has already been planned, so building for Windows 7 will no longer be viable in the future. It needs to be checked whether the MSYS2 pre-installed on GitHub runners, or the archive installed by the setup-msys2 action, still contains the MingW targets.

Extra header info for 95/NT (possibly other OS too?)

Tests seem to indicate that the header of the INFO file (first 20 bytes) contains extra data, while the corresponding fields in INFO2 are filled with junk. They are:

Byte offset   Purpose
4 – 7         Total entries still in recycle bin
8 – 11        Total entries ever existed
16 – 19       Total sum of cluster size

It is probably worthwhile to display this data, or to use it as extra validation.
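A rough Python sketch of reading those fields from the 20-byte header follows. Only the offsets 4, 8 and 16 come from the table above; the little-endian unsigned 32-bit layout, the version field at offset 0 and the dictionary key names are assumptions made for illustration, and bytes 12 to 15 are left uninterpreted.

```python
import struct


def parse_info_header(header):
    """Split the first 20 bytes of an INFO file into five 32-bit fields."""
    if len(header) < 20:
        raise ValueError("header must be at least 20 bytes")
    # Assumption: five little-endian unsigned 32-bit integers.
    version, in_bin, ever, bytes_12_15, cluster_sum = struct.unpack(
        "<5I", header[:20]
    )
    return {
        "version": version,                # assumed to sit at offset 0
        "entries_in_bin": in_bin,          # offset 4 - 7
        "entries_ever": ever,              # offset 8 - 11
        "bytes_12_15": bytes_12_15,        # not described above
        "cluster_size_sum": cluster_sum,   # offset 16 - 19
    }


def header_looks_consistent(info):
    """One possible sanity check: entries still present cannot exceed the total."""
    return info["entries_in_bin"] <= info["entries_ever"]
```

A check like header_looks_consistent() is one way the extra fields might serve as validation; whether it holds for every real-world file is untested here.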

Salvage a partial record

If not too much information is lost from a record (say, only part of the path is chopped), it may still be possible to display the rest of the data, instead of discarding the whole record as is done currently.

That requires a more robust path string converter that still does its job when fed very broken string data.

Unicode not displayed properly on Windows console

fprintf() from the Windows C runtime fails to display Unicode properly on the Windows console, even when the codepage is set to 65001 (UTF-8), as if the console were still in the original OEM codepage.
The wide char API is needed to display Unicode properly in such cases (it works regardless of the console codepage!).

More permissive handling of unicode path

Currently rifiuti2 doesn't allow any error while converting a Windows Unicode path (in UTF-16) to UTF-8, such as encountering a single surrogate code point (instead of a pair). Though it is unclear whether they can make it into the recycle bin, these erroneous file names are not unheard of (e.g. created by buggy software). It would be nice if rifiuti2 could handle them in a more robust way, displaying an escaped Unicode code point instead.

g_convert_with_fallback() handles such case, but doesn't allow one to specify a template for escape sequence.
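The idea of a template-driven fallback can be sketched in Python as below: valid surrogate pairs are combined, while a lone surrogate is rendered through a printf-style template. The function name is hypothetical, and the default template "<\u%04X>" is borrowed from the gdb backtrace in the SPARC64 issue; this is an illustration, not the project's converter.

```python
import struct


def utf16le_to_text(raw, template="<\\u%04X>"):
    """Decode UTF-16LE bytes, escaping lone surrogates with the given template."""
    count = len(raw) // 2                       # ignore a trailing odd byte
    units = struct.unpack("<%dH" % count, raw[: count * 2])
    out = []
    i = 0
    while i < count:
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        if is_high and i + 1 < count and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Lone surrogate: show it escaped instead of failing.
            out.append(template % u)
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out)
```

Passing a different template, such as "&#x%X;", would yield an NCR-style escape for the same broken input.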

JSON output format

It should be trivial to add JSON output format, even without any JSON parsing / writing library (same situation as XML output).

For testing, it might be easier to rely on simple output comparison instead of parsing the JSON object and comparing it with a reference object, because the latter needs an extra external dependency (likely Python?).
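To show why no JSON library is strictly needed, here is a small Python sketch that escapes strings by hand and emits a flat object. The function names and the record key used in the usage note are hypothetical and do not reflect the actual output schema.

```python
def json_string(text):
    """Quote a string for JSON without using a JSON library."""
    out = ['"']
    for ch in text:
        if ch == '"':
            out.append('\\"')
        elif ch == "\\":
            out.append("\\\\")
        elif ord(ch) < 0x20:
            out.append("\\u%04x" % ord(ch))  # control characters must be escaped
        else:
            out.append(ch)
    out.append('"')
    return "".join(out)


def record_to_json(record):
    """Serialize a flat dict of string values as a single-line JSON object."""
    pairs = [
        json_string(key) + ": " + json_string(str(value))
        for key, value in record.items()
    ]
    return "{" + ", ".join(pairs) + "}"
```

Because the output is deterministic, a test can compare it byte for byte with a stored reference, for example by checking the result of record_to_json({"path": "C:\\a"}) against an expected line.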

Support WSL for live system inspection

It should be possible to support running under WSL (v2 at least) for the --live option.

  • Run whoami.exe to retrieve current user SID
whoami.exe /user /fo csv
"User Name","SID"
"machine\user","S-1-5-21-..."
  • Get mounted drives from /proc/mounts with fstype drvfs
  • Build the possible recycle bin folders from above data
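The three steps above can be sketched in Python as pure text-processing functions. The function names are hypothetical, the whoami.exe output is assumed to have the exact shape shown above, and the $Recycle.Bin/<SID> folder layout is an assumption about the drive contents rather than something stated in this issue.

```python
import csv
import io


def parse_whoami_sid(csv_text):
    """Extract the SID column from 'whoami.exe /user /fo csv' output."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, values = rows[0], rows[1]
    return values[header.index("SID")]


def drvfs_mount_points(proc_mounts_text):
    """Return mount points whose filesystem type is drvfs."""
    points = []
    for line in proc_mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, ...
        if len(fields) >= 3 and fields[2] == "drvfs":
            points.append(fields[1])
    return points


def recycle_bin_dirs(mount_points, sid):
    """Build candidate recycle bin folders (assumed layout) for each drive."""
    return [m.rstrip("/") + "/$Recycle.Bin/" + sid for m in mount_points]
```

In real use the first function would receive the captured output of whoami.exe and the second the contents of /proc/mounts; both are kept as plain strings here so the logic can be checked without a WSL environment.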

For detection of WSL environment, probably this comment would be one of the most comprehensive.

Error not displayed properly on Windows console

This is related to #12, but concerns error messages instead of normal output. Right now the error message depends on the console codepage; once it is set to CP65001, all errors are garbled. The Windows console API (WriteConsoleW) should be used instead.

Point to specific folder?

I don't often examine live machines. Can I extract $I files from an image and then use this application to process just those files?

XML entities in path

In XML output mode, characters inside paths need to be escaped as entities, but this is not done right now:

Character   Entity
&           &amp;
"           &quot;
'           &apos;
<           &lt;
>           &gt;

For reference, Microsoft doc mentions file names with these characters are not allowed:

  • <>:"\/|?*
  • [\x0-\x1F]

On the other hand NTFS layer seems to be very liberal and has no disallowed character value except \x0.
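For illustration, the five replacements listed in this issue can be expressed in Python with the standard xml.sax.saxutils.escape() helper, which handles &, < and > by default and accepts extra mappings for the two quote characters. The wrapper name is hypothetical, and this only demonstrates the mapping, not rifiuti2's implementation.

```python
from xml.sax.saxutils import escape


def xml_escape_path(path):
    """Escape the five characters that have predefined XML entities."""
    # escape() replaces '&' first, so entities added later are not re-escaped.
    return escape(path, {'"': "&quot;", "'": "&apos;"})
```

A path such as C:\Tom & Jerry's <draft>.txt would come out with &amp;, &apos;, &lt; and &gt; in place of the raw characters.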

Spreadsheet programs do not recognise new time format

When importing tab-delimited results, spreadsheet programs like Microsoft Office or LibreOffice don't recognize the ISO 8601 date/time format. Since this is likely to be an important use case, perhaps the old behavior (no timezone) should be restored for TSV output. XML output would use the full ISO 8601 date/time.
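The difference between the two forms can be shown with a short Python sketch. The exact layout of the old timezone-less format is an assumption here ("YYYY-MM-DD HH:MM:SS"), and the function name, flag and sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def format_deletion_time(ts, for_spreadsheet):
    """Format a timestamp for TSV (no timezone) or XML (full ISO 8601)."""
    if for_spreadsheet:
        # Assumed spreadsheet-friendly form: no 'T' separator, no UTC offset.
        return ts.strftime("%Y-%m-%d %H:%M:%S")
    return ts.isoformat(timespec="seconds")


# Hypothetical sample: a deletion time recorded at UTC+8.
sample = datetime(2015, 2, 20, 7, 45, 0, tzinfo=timezone(timedelta(hours=8)))
print(format_deletion_time(sample, for_spreadsheet=True))   # 2015-02-20 07:45:00
print(format_deletion_time(sample, for_spreadsheet=False))  # 2015-02-20T07:45:00+08:00
```

Dropping the offset loses information, so a report using the first form would need to state the time zone elsewhere.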

Rifiuti does not work on XP

Rifiuti does not work on XP 32bit - I type "rifiuti INFO2" and get: 'INFO2' does not exist. I try "rifiuti -l INFO2" and get: "Must specify exactly one INFO2 file as argument.", but rifiuti -? correctly displays.

My system is on disk I: not C: - Can this be a problem?
