
pack2's Issues

feature: unhex of in-line HEX

Support in-context unhex of outfiles / potfiles / etc.

For example, this input:

96a1bbb41c713dce96b49dd13b6f6d07:$HEX[636f6c6f6e3a636f6c6f6e]

... should produce this output:

96a1bbb41c713dce96b49dd13b6f6d07:colon:colon
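
A minimal Rust sketch of the decode step (the function name is illustrative, not pack2's actual API):

// Decode a single field of the form $HEX[...] into raw bytes.
// Returns None if the field is not $HEX-encoded or the hex is malformed.
fn unhex_field(field: &str) -> Option<Vec<u8>> {
    let inner = field.strip_prefix("$HEX[")?.strip_suffix("]")?;
    if inner.len() % 2 != 0 {
        return None;
    }
    inner
        .as_bytes()
        .chunks(2)
        .map(|pair| {
            let hi = (pair[0] as char).to_digit(16)?;
            let lo = (pair[1] as char).to_digit(16)?;
            Some((hi * 16 + lo) as u8)
        })
        .collect()
}

For the input above, unhex_field("$HEX[636f6c6f6e3a636f6c6f6e]") yields the bytes of colon:colon; fields without the $HEX[ prefix would be passed through untouched.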

feature: operating on multiple files

For the subcommands that take input (pack2 cgrams, etc.), it would be handy to read multiple files directly, instead of piping them in with cat.
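
A hedged sketch of how the input side could be generalized, chaining the given files into one stream and falling back to stdin (reader setup only; names are illustrative):

use std::fs::File;
use std::io::{self, BufRead, BufReader, Read};

// Build one sequential reader over all given paths, or stdin if none.
fn open_inputs(paths: &[String]) -> io::Result<Box<dyn BufRead>> {
    if paths.is_empty() {
        return Ok(Box::new(BufReader::new(io::stdin())));
    }
    let mut readers: Vec<Box<dyn Read>> = Vec::new();
    for path in paths {
        readers.push(Box::new(File::open(path)?));
    }
    // Chain the files so they read like one concatenated stream (cat-style).
    let chained = readers
        .into_iter()
        .reduce(|a, b| Box::new(a.chain(b)) as Box<dyn Read>)
        .unwrap(); // non-empty: checked above
    Ok(Box::new(BufReader::new(chained)))
}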

option to extract exhausted masks from a hashcat logfile

Option to extract masks with a specific exit status from a hashcat logfile. The exit status could be an optional parameter, with exhausted masks (status 5) the default.

In other words, the Rust equivalent of this crude shell example, which extracts all exhausted masks (status 5):

$ egrep -h "mask_ctx->mask|status-after-work" ~/.hashcat/sessions/blah*.log \
    | cut -f3- \
    | egrep -B 1 'status-after-work.*5$' \
    | grep '>mask' \
    | awk '{print $2}'

It would also be useful if multiple logfiles (globbed) could be accepted.
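
A hedged Rust sketch of the same extraction (the log line layout is assumed from the shell pipeline above; field positions may differ in real logs):

use std::io::{self, BufRead};

// Remember the most recent mask line; when the following
// status-after-work line reports the wanted status (5 = exhausted),
// emit the remembered mask.
fn extract_masks(wanted_status: &str) -> io::Result<()> {
    let stdin = io::stdin();
    let mut last_mask: Option<String> = None;
    for line in stdin.lock().lines() {
        let line = line?;
        if line.contains("mask_ctx->mask") {
            // Assumption: the mask is the second whitespace field.
            last_mask = line.split_whitespace().nth(1).map(str::to_string);
        } else if line.contains("status-after-work")
            && line.trim_end().ends_with(wanted_status)
        {
            if let Some(mask) = last_mask.take() {
                println!("{}", mask);
            }
        }
    }
    Ok(())
}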

option: sort a wordlist by mask, then by natural sort order

It would be very useful to be able to sort a wordlist first by length, then by mask, then by natural sort order.

Example:

2345
z5678
1234
C5309
9999a
2345a
1234a
!6666
z9933
a1234

... would sort as:

1234
2345
1234a
2345a
9999a
a1234
z5678
z9933
C5309
!6666

Suggested mask order: ?d?l?u?s, to match the rough order of frequency groups.

It would be useful for this option to support sorting both raw wordlists and potfiles. Automatic interpretation of $HEX would also be good.
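
A minimal sketch of the proposed sort key, with class ranks in the suggested ?d?l?u?s order ("natural" order is approximated here by plain byte order):

// Rank a byte by character class, in the suggested ?d?l?u?s order.
fn class_rank(b: u8) -> u8 {
    match b {
        b'0'..=b'9' => 0, // ?d
        b'a'..=b'z' => 1, // ?l
        b'A'..=b'Z' => 2, // ?u
        _ => 3,           // ?s and everything else
    }
}

fn sort_wordlist(words: &mut Vec<String>) {
    words.sort_by_key(|w| {
        let mask: Vec<u8> = w.bytes().map(class_rank).collect();
        // Length first, then mask, then the word itself.
        (w.len(), mask, w.clone())
    });
}

Applied to the example list above, this key reproduces the expected ordering exactly.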

new tool: analyze and split strings on character-class changes

As an aid to extracting likely base words, it would be very useful to split strings on character-class changes. I've been calling the resulting strings 'tokens', but I think there's probably a better word. :)

An optional flag to consider changes in case to be significant could be useful.

For example, this list:

Hello123
PaSsWoRd$
hashes4evar

... might produce the following output, if case were treated as a character-class change:

Hello
123
Pa
Ss
Wo
Rd
$
hashes
4
evar

... and might produce this output, if case were not treated as a character-class change:

Hello
123
PaSsWoRd
$
hashes
4
evar

I'd argue that optionally normalizing the strings on the fly would also be useful, such that it might produce the output below. This somewhat artificially inflates the significance of the lower-case version of the word, but since the lower-case form is likely the most "basic" / "proto" version of a given base word, it could be argued that this is a feature, not a bug :)

Hello
hello
123
Pa
Ss
Wo
Rd
PaSsWoRd
password
$
hashes
4
evar

Since a common use case for this is to obtain frequency counts, an optional flag to accumulate frequency counts at the same time would be ideal (while preserving the ability to skip this, to support larger data sets).

Either way, finding a way to do this efficiently (in terms of both memory and speed) would be highly useful.

How to handle the long tail of non-ASCII / non-Unicode strings is up for discussion.
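
A minimal sketch of the splitter (names are illustrative). Judging from the examples, a "case change" boundary means a lower-to-upper transition, which keeps Hello whole while splitting PaSsWoRd:

// Split a string wherever the character class changes; with
// split_case set, a lower-to-upper transition inside a letter run
// also counts as a boundary.
fn split_tokens(s: &str, split_case: bool) -> Vec<String> {
    #[derive(PartialEq)]
    enum Class { Digit, Letter, Other }

    let classify = |c: char| {
        if c.is_ascii_digit() {
            Class::Digit
        } else if c.is_ascii_alphabetic() {
            Class::Letter
        } else {
            Class::Other
        }
    };

    let mut tokens: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut prev: Option<char> = None;
    for c in s.chars() {
        if let Some(p) = prev {
            let class_change = classify(p) != classify(c);
            let case_change =
                split_case && p.is_ascii_lowercase() && c.is_ascii_uppercase();
            if class_change || case_change {
                tokens.push(std::mem::take(&mut current));
            }
        }
        current.push(c);
        prev = Some(c);
    }
    if !current.is_empty() {
        tokens.push(current);
    }
    tokens
}

The normalized variants from the third example could then be produced by additionally emitting the lowercased form of each letter token (and, in case-split mode, of the whole letter run) alongside each split.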

define and document default behavior

As we add more functionality, I'd like to have some clearly defined default behavior.
The ones I have in mind right now are:

  • Unless a file is specified, it must read from stdin
  • Unless a file is specified, it must write to stdout
  • Info output (e.g. stats) must always be written to stderr
  • Input lines in the $HEX[] format must always be decoded before processing
  • Output must always be encoded using the $HEX[] encoding if at least one character is outside of \x20 - \x7e

Of course there will be the occasional exception. For example, formatting the output of the unhex tool in $HEX[] would be pointless, as its purpose is to decode such lines.

I'm open to ideas and suggestions, hence the discussion label.
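
A minimal sketch of the output-encoding rule from the list above, assuming the decode side already exists (helper name illustrative):

// Encode an output line as $HEX[...] if any byte falls outside the
// printable ASCII range \x20 - \x7e, per the proposed default.
fn encode_line(line: &[u8]) -> String {
    if line.iter().all(|&b| (0x20..=0x7e).contains(&b)) {
        String::from_utf8_lossy(line).into_owned()
    } else {
        let hex: String = line.iter().map(|b| format!("{:02x}", b)).collect();
        format!("$HEX[{}]", hex)
    }
}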

feature: boundary window for cgrams

It would be useful to optionally produce cgrams within a window of X characters "beyond" the character change boundary.

For example, the default behavior of pack2 cgrams would have a range value of 0. With a specified range of '1', this input:

abcd1234

... would produce:

abcd
abcd1
d1234
1234

... and a range of '2' would produce:

abcd
abcd1
abcd12
cd1234
d1234
1234

In other words, this would produce a focused subset of what would be generated by window/slider tools, or tools like cutb, still informed by character-change boundaries.
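
A hedged sketch of the windowing step, assuming the class-boundary segments are already computed (as cgrams does today); byte indexing assumes ASCII input, and output order may differ from the listings above:

// For each boundary segment, also emit variants extended by up to
// `range` bytes past the leading and trailing boundaries.
fn windowed_cgrams(word: &str, segments: &[(usize, usize)], range: usize) -> Vec<String> {
    let len = word.len();
    let mut out = Vec::new();
    for &(start, end) in segments {
        out.push(word[start..end].to_string());
        for r in 1..=range {
            if end + r <= len {
                out.push(word[start..end + r].to_string()); // extend right
            }
            if start >= r {
                out.push(word[start - r..end].to_string()); // extend left
            }
        }
    }
    out
}

For "abcd1234" with segments (0,4) and (4,8), a range of 2 yields exactly the six strings listed above.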

Properly handle UTF-8 characters

Currently we treat any byte outside of 0x20 - 0x7e as the mask character ?b. This is not ideal, as we already know we don't have to check ?a (which is, of course, also part of ?b). (Still more accurate than PACK, which uses ?s.)
Rust has native support for UTF-8 strings, but it's too slow for us. The current idea is to check if at least one byte is outside of the ?a range and handle those lines in a slow path.
Once we have a validated UTF-8 character, we map it to its Unicode block.
Mapping a Unicode block to a mask is possible using custom charsets in combination with the --hex-charset flag.
Example input: Röschti
ö is part of the Latin-1 Supplement block.
This block in UTF-8 encoding ranges over [c2,c3] [80-bf], therefore our custom charsets would be ?1 c2c3 and ?2 808182...bf.
Full mask:

c2c3,808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebf,?u?1?2?l?l?l?l?l

We could even go further, detect that it falls in the letters "sub-block", and use only that range in our mask.
This is a very basic example of how I think about this problem. I'm totally aware there will be cases which aren't this simple. This whole idea isn't set in stone and I'm open to any ideas and suggestions.
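
A hedged sketch of the fast/slow split (only two blocks shown; a real implementation would table all Unicode blocks and emit the matching custom-charset byte ranges):

// Returns None when the ASCII fast path applies or the line is not
// valid UTF-8; otherwise lists the block of each non-ASCII char.
fn non_ascii_blocks(line: &[u8]) -> Option<Vec<&'static str>> {
    if line.is_ascii() {
        return None; // fast path: the existing ?a handling applies
    }
    let s = std::str::from_utf8(line).ok()?; // slow path: validate UTF-8
    Some(
        s.chars()
            .filter(|c| !c.is_ascii())
            .map(|c| match c as u32 {
                0x0080..=0x00ff => "Latin-1 Supplement",
                0x0100..=0x017f => "Latin Extended-A",
                _ => "other",
            })
            .collect(),
    )
}

For "Röschti" this reports Latin-1 Supplement for the ö, which maps to the ?1/?2 charsets above.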

new tool: merge frequency counts

Expanding to the more general case mentioned in #8 (comment), it would be very useful to have an optimized tool to efficiently merge frequency-count data.

The use case is merging frequency counts across large datasets, and incrementally adding new frequency counts over time as new data is discovered. Calculating a frequency count for a delta or a new data source, and then merging it with an existing frequency count, is significantly more efficient than recalculating the entire frequency count.

The uniq -c format (integer frequency count, a space, and the item being counted) is the most obvious case, but other formats could be supported.

It would be nice to be able to assume that the list is sorted by the item being counted, but the implementation should assume that it's not. Or, perhaps, like rli vs rli2, one version that does not assume sorting but is memory-bound, and another version that has no size limits but requires sorted input (or a flag to switch between the two).

Reference awk implementation is here.
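
A minimal sketch of the memory-bound variant (the rli-style one), assuming uniq -c input on stdin:

use std::collections::HashMap;
use std::io::{self, BufRead};

// Accumulate counts per item across all concatenated inputs; uniq -c
// pads counts with leading spaces, hence the trim.
fn merge_counts() -> io::Result<HashMap<String, u64>> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    let stdin = io::stdin();
    for line in stdin.lock().lines() {
        let line = line?;
        if let Some((count, item)) = line.trim_start().split_once(' ') {
            if let Ok(n) = count.parse::<u64>() {
                *totals.entry(item.to_string()).or_insert(0) += n;
            }
        }
    }
    Ok(totals)
}

The rli2-style variant would instead stream two sorted inputs in lockstep, summing counts when the items match, so memory stays constant.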

option: mask list set operations

Would be super useful to have a general way to merge, split, expand, and detect overlap in lists of masks.

For example, these masks:

?l?l?l?l?l?l?l
?l?l?l?l?l?l?u
?l?l?l?l?l?l?d
?l?l?l?l?l?l?s

... could be merged to:

?l?l?l?l?l?l?a

"Splitting" would be the opposite - turning ?a into its components.

Expanding could also be useful, perhaps with thresholding that is a little more sophisticated, based on target keyspace or runtime. For example:

?l?l?l?l?l

... could be expanded to:

?l?l?l?la
?l?l?l?lb
?l?l?l?lc
[etc]

... to fit a specific target runtime (--PPS).
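
A hedged sketch of the merge direction only, representing a mask as a vector of tokens (not pack2's internal representation); a real tool would also carry unmerged masks through:

use std::collections::{HashMap, HashSet};

// Group masks that are identical except for one position; if that
// position covers ?l, ?u, ?d and ?s, collapse the group into a
// single mask with ?a there.
fn merge_masks(masks: &[Vec<String>]) -> Vec<Vec<String>> {
    let mut groups: HashMap<(usize, Vec<String>), HashSet<String>> = HashMap::new();
    for mask in masks {
        for pos in 0..mask.len() {
            let mut hole = mask.clone();
            hole[pos] = String::new(); // blank out this position
            groups
                .entry((pos, hole))
                .or_default()
                .insert(mask[pos].clone());
        }
    }
    let full: HashSet<String> =
        ["?l", "?u", "?d", "?s"].iter().map(|s| s.to_string()).collect();
    let mut merged = Vec::new();
    for ((pos, mut hole), tokens) in groups {
        if tokens.is_superset(&full) {
            hole[pos] = "?a".to_string();
            merged.push(hole);
        }
    }
    merged
}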

Add a command line parser

We want to have options and a nice CLI interface. Most likely we will be using the structopt crate for this task.
It supports subcommands, options (both short and long), and a help generator, so our interface would look like this:

$ pack2 statsgen --option=value
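
A minimal sketch of what that could look like (the --option flag is just the placeholder from the example invocation):

use structopt::StructOpt;

#[derive(StructOpt)]
#[structopt(name = "pack2")]
enum Pack2 {
    /// Generate mask statistics from a wordlist.
    Statsgen {
        /// Placeholder option from the example above.
        #[structopt(long)]
        option: Option<String>,
    },
}

fn main() {
    match Pack2::from_args() {
        Pack2::Statsgen { option } => {
            // dispatch to the statsgen implementation here
            println!("statsgen, option = {:?}", option);
        }
    }
}

Doc comments double as the generated --help text, which is one of the main draws of structopt.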

Add an option for the separator between mask and count

A sensible default separator will be used, but allowing the user to choose a different one would be nice.
The comma will be blacklisted, because it would make parsing harder: it's already used for specifying the custom charsets described in #2.

ability to compare two mask stats file

Compare the percentages in a stats file from statsgen against a reference file (e.g. rockyou.masks), then sort from highest to lowest difference.
This could be helpful to pick some low-hanging fruit and get insight into which masks to try first.
To make it even more powerful, we can add the keyspace of each mask into the equation.
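
A minimal sketch of the comparison, assuming both stats files have been parsed into mask -> percentage maps:

use std::collections::HashMap;

// Diff two statsgen outputs and sort by largest gap, i.e. masks
// over-represented in the target list relative to the reference.
fn diff_stats(
    target: &HashMap<String, f64>,
    reference: &HashMap<String, f64>,
) -> Vec<(String, f64)> {
    let mut diffs: Vec<(String, f64)> = target
        .iter()
        .map(|(mask, &pct)| {
            let ref_pct = reference.get(mask).copied().unwrap_or(0.0);
            (mask.clone(), pct - ref_pct)
        })
        .collect();
    // Descending by difference; percentages are assumed non-NaN.
    diffs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    diffs
}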

feature: allow hard-coded characters in filtermask masks

filtermask doesn't appear to properly process hard-coded strings in masks:

$ pack2 filtermask ?l0 test.list
[snip]
a
c
d
e
f
g
h
l
m
n
p
q
s
t
u
v
x
y
z
$ pack2 filtermask ?l00 test.list
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', /rustc/8d69840ab92ea7f4d323420088dd8c9775f180cd/src/libcore/slice/mod.rs:2842:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
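
The panic suggests the parser indexes the mask by charset position only. A generic sketch of tokenization that treats any non-? character as a hard-coded literal (not pack2's actual parser, just the direction of the fix):

// One mask position: either a built-in charset or a literal char.
enum MaskToken {
    Charset(char), // ?l, ?u, ?d, ?s, ?a, ?b, ...
    Literal(char), // hard-coded character such as '0'
}

fn parse_mask(mask: &str) -> Option<Vec<MaskToken>> {
    let mut tokens = Vec::new();
    let mut chars = mask.chars();
    while let Some(c) = chars.next() {
        if c == '?' {
            // '?' must be followed by an identifier; "??" is a literal '?'.
            match chars.next()? {
                '?' => tokens.push(MaskToken::Literal('?')),
                id => tokens.push(MaskToken::Charset(id)),
            }
        } else {
            tokens.push(MaskToken::Literal(c));
        }
    }
    Some(tokens)
}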

Error message displayed when piping into the head command

When running the following command

./pack2 statsgen plains.txt | head

The following error message is displayed. It appears to be cosmetic.

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/statsgen.rs:174:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Add an option to choose the encoding of the input list

The default encoding will always be UTF-8, but we are not living in a perfect world, so there will be lists with a different encoding. Converting to UTF-8 isn't a problem, but there are some open questions.

  • Do we always use UTF-8 as the output encoding for the approach described in #2?
  • If not, we would have to implement the same for each and every encoding (that's a nightmare).
  • Do we simply ignore/drop invalidly encoded input? If not, what should be the fallback method? (applies to any encoding)
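
A hedged sketch of the conversion step using the encoding_rs crate (an assumed dependency), leaving the drop-vs-replace policy to the caller:

use encoding_rs::Encoding;

// Decode a line from a user-selected encoding label (e.g.
// "windows-1252") to UTF-8; None means unknown label or invalid input.
fn to_utf8(bytes: &[u8], label: &str) -> Option<String> {
    let enc = Encoding::for_label(label.as_bytes())?;
    let (text, _, had_errors) = enc.decode(bytes);
    if had_errors {
        return None; // caller decides: drop, or keep replacement chars
    }
    Some(text.into_owned())
}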

avoid panic if output terminates due to a downstream pipe

It would be clean to handle an expected end of output due to a downstream pipe closing:

$ tail -n 100000 potfile | cut -d: -f2- | pack2 unhex | head
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
[redacted]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }', src/unhex.rs:15:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
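
A minimal sketch of the fix, applicable to unhex, statsgen, and any other writer: treat BrokenPipe as a normal end of output instead of unwrap()ing it.

use std::io::{self, Write};

// Exit quietly when the downstream pipe (e.g. `head`) closes;
// propagate every other error.
fn write_line(out: &mut impl Write, line: &[u8]) -> io::Result<()> {
    match out.write_all(line).and_then(|_| out.write_all(b"\n")) {
        Err(e) if e.kind() == io::ErrorKind::BrokenPipe => {
            std::process::exit(0);
        }
        other => other,
    }
}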
