Comments (7)

KevinHock avatar KevinHock commented on July 23, 2024

There are also some AKIAblahblah keys whose entropy is not high enough to be flagged, but I suppose that should be a separate issue :)

from detect-secrets.

mohit-surana avatar mohit-surana commented on July 23, 2024

I would like to work on this issue. After a little digging, I came up with two proposed solutions and would really appreciate any comments on them:

  1. Can we expect the client to include a hyphen within the charset? If yes, then I believe we just need to use re.escape(charset) instead of charset on this line
  2. If clients should continue to use the existing charset, we need to either enrich just the regex, or both the regex and charset (internally). Unless we append the hyphen to the charset in the constructor, the entropy calculation will not use hyphens. So should hyphens be included in the entropy calculation?
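
A minimal sketch of the first option, assuming the plugin interpolates the charset into a regex character class that matches quoted strings. The regex shape, the `CHARSET` value and the helper name `build_quoted_string_regex` are assumptions made for illustration, not code taken from the repository:

```python
import re
import string

# Hypothetical charset: base64 characters with a hyphen placed mid-string.
CHARSET = string.ascii_letters + string.digits + '+-/='


def build_quoted_string_regex(charset):
    """Match a quoted run of characters drawn only from `charset`."""
    # re.escape keeps '-' literal. Unescaped, '+-/' inside [...] would be
    # read as a range from '+' to '/', which also admits ',' and '.'.
    return re.compile(r'([\'"])([%s]+)(\1)' % re.escape(charset))


regex = build_quoted_string_regex(CHARSET)
match = regex.search('token = "abc-DEF+123/="')
print(match.group(2))
```

Note that escaping only changes what the regex matches; whether the hyphen counts toward the entropy score still depends on the charset passed to the entropy calculation.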

Finally, I believe tests should go here, right?
Thanks!

domanchi avatar domanchi commented on July 23, 2024

Those are good questions @mohit-surana! The short answers are:

  1. No (we should be able to support both, holistically)
  2. Yes?

Based on the entropy algorithm, it seems that the more characters in the charset, the higher the entropy can be.

Following this logic, a more liberal charset may require a different entropy configuration level, since the same level may produce more false positives.
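
To make the charset dependence concrete, here is a rough sketch of a Shannon entropy calculation that only counts symbols in the charset. The function name `shannon_entropy` and the sample strings are assumptions for illustration and may differ from the plugin's actual implementation:

```python
import math
import string


def shannon_entropy(data, charset):
    """Shannon entropy in bits per character, counting only `charset` symbols."""
    if not data:
        return 0.0
    entropy = 0.0
    for symbol in charset:
        p = data.count(symbol) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy


# The same string scores differently depending on whether '-' is counted.
print(shannon_entropy('a-b-', 'ab'))
print(shannon_entropy('a-b-', 'ab-'))

# The score can never exceed log2 of the charset size, so a larger
# charset raises the ceiling that a fixed limit is compared against.
BASE64 = string.ascii_letters + string.digits + '+/='
print(math.log2(len(string.hexdigits)), math.log2(len(BASE64)))
```

Under this formulation, symbols outside the charset contribute nothing to the score, which is why adding a symbol can push a string over a limit that it previously stayed under.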

However, if this is true, then any addition to the charset (e.g. adding hyphens and percent signs, -%) would require a completely separate plugin, and the maintenance of these potential plugins could get very messy.

Any thoughts on this?

mohit-surana avatar mohit-surana commented on July 23, 2024

Theoretically, yes. It would increase the false positive rate while reducing the false negative rate, so ultimately it is a trade-off between false positives and false negatives. Do we have any statistics on the current system's false positive rate?

How can we design tests with enough coverage to measure the new FP/FN rates?

As for new plugins, it seems to me that changing the entropy calculation itself is a big NO, as it may affect current clients. And we would end up with a combinatorial number of plugins if we made one for each small difference. Would it be better to let clients pass an additional argument indicating whether they would like to include additional preset or client-specified symbols?

Bottom line: If FP increases a lot, we need to have the client make a conscious decision to move into a new version that supports hyphens.
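
One way the additional argument could look, as a sketch only. The class `EntropyScanner`, the parameter `extra_symbols` and the limit of 3.0 are all hypothetical and are not part of the detect-secrets API:

```python
import math
import re
import string


class EntropyScanner:
    """Hypothetical scanner for illustration; not the detect-secrets API."""

    def __init__(self, charset, limit, extra_symbols=''):
        # With the default extra_symbols='', behaviour is unchanged, so
        # existing clients are unaffected unless they opt in.
        for symbol in extra_symbols:
            if symbol not in charset:
                charset += symbol
        self.charset = charset
        self.limit = limit
        # re.escape keeps symbols such as '-' literal inside the class.
        self.regex = re.compile(r'([\'"])([%s]+)(\1)' % re.escape(charset))

    def entropy(self, data):
        """Shannon entropy in bits per character over self.charset."""
        total = 0.0
        for symbol in self.charset:
            p = data.count(symbol) / len(data)
            if p > 0:
                total -= p * math.log2(p)
        return total

    def scan(self, line):
        """Return quoted strings whose entropy exceeds the limit."""
        candidates = (m.group(2) for m in self.regex.finditer(line))
        return [c for c in candidates if self.entropy(c) > self.limit]


LINE = 'key = "a1b2-c3d4-e5f6"'
print(EntropyScanner(string.hexdigits, 3.0).scan(LINE))
print(EntropyScanner(string.hexdigits, 3.0, extra_symbols='-').scan(LINE))
```

Because the extra symbols feed both the regex and the entropy calculation, a client who opts in gets consistent matching and scoring, and may still need to choose a different limit.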

domanchi avatar domanchi commented on July 23, 2024

I'm in favor of the additional argument, but I don't know what that might look like in the user interface. It would certainly increase the scope of this issue (and it would perhaps no longer be a "good first issue")! If you still want to take it on, we'd more than welcome the contribution!

Otherwise, the AKIA-prefixed issue that @KevinHock mentioned may be a good start. Though it doesn't strictly find the AWS secret, it gives a good indication that there might be a secret there, on the same principle as "where there's smoke, there's fire".
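
A prefix-based check could be sketched as below. It assumes the widely used pattern of `AKIA` followed by 16 uppercase alphanumeric characters for AWS access key IDs; the helper name `find_aws_key_ids` is hypothetical, and the sample key is the placeholder used in AWS documentation:

```python
import re

# Assumed shape of an AWS access key ID: 'AKIA' plus 16 characters from
# [0-9A-Z]. This flags the key ID, not the secret key itself.
AWS_ACCESS_KEY_ID = re.compile(r'AKIA[0-9A-Z]{16}')


def find_aws_key_ids(line):
    """Return every substring of `line` that looks like an access key ID."""
    return AWS_ACCESS_KEY_ID.findall(line)


print(find_aws_key_ids('aws_access_key_id = AKIAIOSFODNN7EXAMPLE'))
print(find_aws_key_ids('AKIAblahblah'))
```

A pattern match like this needs no entropy limit at all, which is what makes it useful for keys that a purely entropy-based check would miss.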

As for testing FP/FN rates, we are building a large internal collection of various different secrets that we use to experiment with our new plugins. We can certainly run your plugin on our corpus, and help tweak its default sensitivity.
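
Measuring FP/FN rates against a labeled corpus could look roughly like this. The function `fp_fn_rates`, the toy detector and the four samples are made up for illustration and say nothing about the internal corpus itself:

```python
def fp_fn_rates(detector, labeled_samples):
    """Return (false positive rate, false negative rate) for `detector`.

    `labeled_samples` is an iterable of (text, is_secret) pairs and
    `detector` is any callable returning a truthy value when it flags text.
    """
    fp = fn = positives = negatives = 0
    for text, is_secret in labeled_samples:
        flagged = bool(detector(text))
        if is_secret:
            positives += 1
            fn += not flagged
        else:
            negatives += 1
            fp += flagged
    fp_rate = fp / negatives if negatives else 0.0
    fn_rate = fn / positives if positives else 0.0
    return fp_rate, fn_rate


# Toy corpus and a deliberately naive detector, both hypothetical.
SAMPLES = [
    ('AKIAIOSFODNN7EXAMPLE', True),
    ('hunter2', True),
    ('hello', False),
    ('configuration', False),
]
print(fp_fn_rates(lambda text: len(text) > 8, SAMPLES))
```

Running the same corpus before and after a charset change would show how much the rates move at a given limit.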

mohit-surana avatar mohit-surana commented on July 23, 2024

Hey @domanchi. I am interested in implementing the additional argument version of the solution. I am a bit caught up with stuff at the moment and I'll get back to it as soon as time permits!

The internal corpus sounds like a really good idea, and in general will help attract more users as well. As for the AKIA prefix, I will need to think further to understand how we can incorporate patterns along with the entropy calculation. Let me get back to you!

lorenzodb1 avatar lorenzodb1 commented on July 23, 2024

We're going to close this issue as it hasn't received any updates in a very long time. Feel free to re-open it if you think it's still relevant.