leakdb's Introduction

Principal Consultant & Red Team Lead at Bishop Fox. I mostly work on security-related projects in Go, Python, and TypeScript.

leakdb's People

Contributors

moloch--

leakdb's Issues

Documentation error

Hi there,
I'm trying to use your tool and I'm stuck after the normalization step.

Your docs suggest running a command like this:

  $> leakdb-curator --json file.json

but it doesn't work. The program returns:

  Applying bloom filter ...[!] open file.json: not a directory

It looks like the example in the docs is just missing a word, flag, or subcommand needed to make it work :)

Thanks in advance.

Search getting 0 results

Describe the bug
I'm trying to match 3M lines of emails against a 700M-line dataset (roughly 50 GB). Everything appears to run smoothly, but after a bunch of tests I can't get a single match returned, even for emails/users that I know for sure are in my dataset.
All the processes are running on an AWS instance (so I followed the server deployment steps). I tried both building from source and using the released version, and I tried splitting my data into smaller files, but I still get no results. I also tried launching the server version and querying it over HTTP.

The really weird thing is that when I run a search against the test folder you provide, it works properly with your provided indexes.
But when I regenerate the indexes for small.txt following the docs from the wiki, I get no results, and when I diff my generated index against the one you provide, they differ. So I'm guessing it has something to do with how the index is generated or sorted.

To Reproduce
Steps to reproduce the behavior:

  1. ./leakdb-curator --format colon-newline --recursive --target ./large-folder-containing-all --output normalized.json
  2. ./leakdb-curator --json normalized.json
  3. ./leakdb-curator search -i leakdb/email.idx -j leakdb/bloomed.json -v "[email protected]"
    Response : Found 0 results ..
  4. grep -F "[email protected]" bloomed.json
    Response : {"email": "xxx", "user": "xxx", "domain": "gmail.com", "password": "xxx"}

I really wish I could get this to work because it looks amazing. I'm at your disposal for any questions or tests you want me to run.

Enzyro

"source" field in normalized JSON?

Would it be feasible to add a "source" field to the JSON/indexed data, so you could "tag" entries as coming from certain leaks?

This could be very useful when going back later to attribute where a piece of data came from, but I'm unsure whether it would have a performance impact.
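
To make the request concrete, here is a minimal sketch of what tagging could look like on a normalized record. It is written in Python purely for illustration and is not leakdb's code: the record fields are the ones shown in the bloomed.json output in the "Search getting 0 results" issue, while the "source" key, the tag_record helper, and the sample values are all assumptions.

```python
import json


def tag_record(record: dict, source: str) -> dict:
    """Return a copy of a normalized record with a hypothetical "source" tag."""
    tagged = dict(record)        # copy so the original record is left untouched
    tagged["source"] = source    # assumed field name, not part of the current format
    return tagged


# Sample record using the existing normalized fields (values are made up).
record = {
    "email": "alice@example.com",
    "user": "alice",
    "domain": "example.com",
    "password": "hunter2",
}

# One JSON object per line, as in the normalized output.
line = json.dumps(tag_record(record, "example-leak"))
print(line)
```

On the performance question, the obvious cost is size: every line grows by the length of the tag, so a short identifier mapped to a leak name elsewhere would presumably be cheaper than repeating a long name on each record.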

[Feature] Monitor list of keywords

Feature request
Would it be feasible to supply a list of keywords so that, while a database is being indexed, the tool sends a notification to Slack/Telegram/email whenever one of those keywords matches? This would allow active monitoring, rather than searching for those keywords only once the database has been created. For example, if the keyword apple.com is on the monitoring list and a leak or database contains multiple entries for that domain, it becomes easy to remediate and request password resets for the leaked credentials. What do you think?
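
A rough sketch of the idea, in Python for illustration only. Nothing here is an existing leakdb feature: find_keyword_hits, monitor, and the notify callback are hypothetical names, and the callback stands in for whatever Slack/Telegram/email integration would actually be used.

```python
def find_keyword_hits(record: dict, keywords) -> list:
    """Return the keywords that appear (case-insensitively) in any field of a record."""
    values = [str(value).lower() for value in record.values()]
    return [kw for kw in keywords if any(kw.lower() in value for value in values)]


def monitor(records, keywords, notify) -> int:
    """Check each record as it is indexed and call notify(record, matched) on a hit.

    Returns the number of records that matched at least one keyword.
    """
    hits = 0
    for record in records:
        matched = find_keyword_hits(record, keywords)
        if matched:
            notify(record, matched)   # hypothetical hook: Slack, Telegram, email, ...
            hits += 1
    return hits


# Made-up records in the normalized field layout.
records = [
    {"email": "bob@apple.com", "user": "bob", "domain": "apple.com", "password": "x"},
    {"email": "eve@example.org", "user": "eve", "domain": "example.org", "password": "y"},
]

alerts = []
hit_count = monitor(records, ["apple.com"], lambda rec, kws: alerts.append((rec["email"], kws)))
print(hit_count, alerts)
```

Substring matching on every field is the simplest option; matching only on the domain field would be cheaper and would avoid accidental hits inside passwords.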

Bloom Filter

Hi @moloch-- !

I saw this project today after reviewing some of your other work. I was wondering if you had considered using a cuckoo filter instead? This structure would support entry deletion, as well as growth. Bloom filter size parameters would not be required.
https://github.com/seiflotfy/cuckoofilter
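
For readers unfamiliar with the structure, here is a toy cuckoo filter showing the deletion property mentioned above. It is a self-contained Python sketch, not the API of the linked library and not leakdb code; the class name, parameters, and hashing choices are assumptions, and this simplified version has a fixed capacity.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter: stores short fingerprints in one of two candidate buckets."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count keeps the XOR-based alternate index reversible.
        assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: str) -> int:
        return self._hash(b"fp:" + item.encode()) % 255 + 1   # value in 1..255

    def _index(self, item: str) -> int:
        return self._hash(item.encode()) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # Applying this twice with the same fingerprint returns the original index.
        return (index ^ self._hash(bytes([fp]))) % self.num_buckets

    def insert(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random fingerprint and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False   # filter is considered full; the last evicted fingerprint is dropped

    def contains(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)   # removes a single copy of the fingerprint
                return True
        return False


cf = CuckooFilter()
cf.insert("alice@example.com")
present_before = cf.contains("alice@example.com")
cf.delete("alice@example.com")
present_after = cf.contains("alice@example.com")
print(present_before, present_after)
```

Like a Bloom filter, lookups can return false positives but never false negatives for items that were inserted and not deleted. Deletion is only safe for items that were actually inserted, since removing a colliding fingerprint would drop someone else's entry.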
