Giter Club home page Giter Club logo

Comments (32)

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 3

I just had a discussion with the OpenSSF Security Tooling Working Group members. Everyone was very positive.

YES! LET'S ADD SARIF!

So what's next?

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 1

I'm definitely open to it. I want to support "the standard way to get vulnerability" data. It hasn't been obvious what that is, e.g., there's SARIF, MITRE's Heimdall Data Format (HDF) for exchanging data between vulnerability tools, and probably others.

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 1

Those are very sensible arguments. I've emailed the OpenSSF Security Tooling Working Group (openssf-wg-security-tooling AT lists.openssf.org), with a pointer to the comments you just made, to hear their viewpoints.

Unless I hear a good counter-argument soon, I expect the answer to be "yes!", because I do want to encourage the use of standards to exchange this data. I just want to give people time to present a counter-argument first.

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024 1

We've got your sample CSV and are putting together a proposed rendering of the same contents in SARIF. Thanks very much for providing it!

btw - coincidentally, we've started a conversation with the Mitre Heimdall team on the best way to collaborate on interop between our mutual formats. Looks like we'll start with a SARIF -> HDF converter, which means that FlawFinder SARIF will be able to naturally flow from your reports into the HDF viewers.

HDF requires a mapping between a tool's rule ids and the compliance standards it is oriented on. So there may a useful discussion soon on how to create/maintain this mapping. One step at a time, though! :)

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 1

There's probably a million ways to map things. It's unfortunate SARIF only has 3 levels.

I'd probably map as follows... 4 and 5 : error, 3: warning, 0..2: note.

Of course, flawfinder (like most tools) has false positives and false negatives, so an "error" or "warning" may not really be a problem at all.

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024 1

Hey, @eddynaka and other MS people, the next priority according to the roadmap I outlined above is the following. How's the work going here?

Making the taxonomy web-accessible will allow us to update Flawfinder to produce its mappings to CWEs.

The SARIF converter is not really a desirable option for users, we want users to be able to use FlawFinder directly without pulling in other utilities or dependencies.

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 1

I just merged #42 to add native SARIF output. I've added more information to the documentation about it. I expect it can be improved, improvements welcome. However, very soon I plan to close this issue (once the SARIF-output version is released). Improvements can be their own issues and/or pull requests.

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024 1

Flawfinder version 2.0.16 has been released, with a new --sarif option. My thanks! I'm closing this, I think it's done. We can always improve it as a separate issue, or even better, a new pull request :-).

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

HDF is new to me and I'm having trouble finding information on it to comment on this specific format. If you have pointers to anything useful, let me know. My sense is that its orientation is around sharing security/compliance relevant reporting weighted towards (web?) dynamic analysis tools. There appears to be an eco-system of several converters from existing tool formats.

Some outwards trappings of a standard way: a finished public standard, thorough documentation, and industry/other uptake. SARIF has all these. At least one other gov't effort (the SAMATE project which previously produced a SCARF vulnerability format) has adopted SARIF as an alternate, it was supported as a submittal format for SATE VI (you need to search this page for SARIF to see the reference). The DHS/STAMP contributed to funding/design. The DHS-funded SWAMP project also converted from SCARF to SARIF.

Direct uptake includes companies like Microsoft, GrammaTech, Semmle/Code QL, MicroFocus, Codacy, Kestrel, etc. There's a Clang analyzer extension for it and the MS native/managed compilers emit it. Open source direct support includes detekt, spotbugs, eslint, brakeman. There's an eco-system of open source converters from other formats and this is how Microsoft integrates FlawFinder results today.

The above are all the outwards trappings of adoption (and if you have similar information on HDF or other format, I'd be interested in it). Equally or more important to examine the SARIF schema and/or the specification to consider the topics covered by the format. It was a five year design effort, start-to-finish, with contributions from people knowledgeable in the domain. It is a complex format, unfortunately, due to including concepts relevant to more advanced static analysis efforts (e.g., the persistence of code flows, call graphs, results from concurrency analysis, binary-level analysis, web requests/responses, etc.). But MS usage to date demonstrates that the core format for driving IDE integration, bug filing, etc., is suited-to-purpose.

Spec is here
Schema here

If you have any time to take a look, let us know what you think.

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

That's great to hear! An easy way to make progress would be to:

  1. Generate substantive FlawFinder CSV files that comprehensively exercise this existing report format.
  2. Run those CSV files through the SARIF-SDK Multitool 'convert' command in order to transform them into SARIF. This output will be a real assist to you in identifying the core data to emit (without requiring anyone to absorb a 220 page spec).
  3. Identify and code an appropriate approach in FlawFinder, command-line arg update, etc., to incorporate exporting the new format.
  4. Review other FlawFinder concepts that may not be expressed in the CSV report today but can be emitted to SARIF and make a call on whether to export them. SARIF can accept computed fingerprints for baselining, for example. SARIF can accept 'logical locations' which are identifiers that aren't tied to source artifacts (file name, line locations), e.g., a C++ undecorated name.
  5. Run the candidate FlawFinder SARIF through our online validation tools (or use the multitool again to perform this analysis) to verify everything is in good shape. We (GitHub and Microsoft) have a special interest in ensuring that FlawFinder SARIF works very well when ingested into systems GitHun Advanced Security, and have authored some special checks for these.

I suppose the immediate step is to bless a contributor. We are glad to provide SARIF expertise and tooling support through-out the process. If you'd like us to pull together a contribution, we'd be glad to do that, too. :)

Michael

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

I can generate a CSV file. Would you like to run this "SARIF-SDK multitool" on it?

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

Yes, glad to. If you'd like to send to me directly at microsoft.com, my alias is 'mikefan'. That would be a good opportunity for me to ask you a few questions about your reproducible build work as well. :)

@eddynaka

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

@michaelcfanning
converted the flawfinder csv result into sarif file using SARIF.multitool converter. Attached the result file, please review.

flawfinder-all.zip

Validated the result using SARIF validator, it reports 2 types of errors:

  • SARIF2005: runs[0].tool.driver: The tool 'FlawFinder' does not provide any of the version-related properties 'version', 'semanticVersion', 'dottedQuadFileVersion'.
  • SARIF2005: runs[0].tool.driver: The tool 'FlawFinder' does not provide 'informationUri'.
  • SARIF2004: The 'artifacts' array at 'runs[0].artifacts' contains no information beyond the locations of the artifacts.

SARIF2005 is because the csv doesn't have information of tool version/information uri.
SARIF2004 does it mean runs[0].artifacts missing index attribute?

from flawfinder.

eddynaka avatar eddynaka commented on July 2, 2024

@david-a-wheeler , attached you can find a sarif file with a few modifications:

  1. added tool.driver.version and tool.driver.informationUri
  2. added tool.driver.rule.helpUri. With that, you can show to the user and point to a specific documentation (if you have)
  3. added an object called originalUriBaseIds that contain the base of your paths. And, with that, you can simplify the usage in the results (check that now it contains the relative path + uriBaseId)

Let us know what do you think!

flawfinder-all.zip

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

This is a great start! Here's some detailed feedback on a next revision:

  1. Please define a SARIF friendly opaque id, with a prefix that David suggests. As a convention, I suggest starting rule ids at 1001 and increasing them monotonically. Tools often reserve ids in the range of 0XXX for internal/engine errors. FF1001, FF1002, etc., would be a good choice. David might prefer a language prefix, such as CPP1001, CPP1002, etc.
  2. Please make sure all message strings end with a period. This is done in order to create some consistency in look-and-feel when SARIF aggregating systems display results.
  3. Please remove the leading spaces from the code snippets in the log file. I don't see that the CSV report David provided contains these. Does the inclusion of this whitespace indicate a problem in our SARIF converter? Or is it actually true that the FF regions specified in the CSV include this data?
  4. Please populate the fingerprints property with the FF match hash, it is not actually a partialFingerprint. Read the spec to learn more. :) Also, please version this hash, i.e., use the key name contextHash/v1. I use the term contextHash based on reading David's documentation, he refers to this value as a SHA256 hash of relevant context. The versioning allows some future-proofing in case FF changes its hashing approach and emits multiple version in future for compatibility. This is typically done to allow for result matching across multiple SARIF log files where the fingerprint algorithm has changed in the tool.
  5. Please find a way to put the FF Level data into the SARIF ranking property. Then eliminate the level data from the log file. Generally, we drop information from the property bag if it can be directly emitted into the SARIF.
  6. If you provide help URIs in your sample, put them everywhere. You can use a fake URL from example.com or a speculative URL pointing within the FF repository.
  7. Finally, for this v1, it is fine to append the CWE data into the user-facing message. Moving forward, however, we should look at capturing this data in the SARIF taxonomies design (read the spec! ). The most sophisticated utilization would be: 1) we define and maintain a SARIF format taxonomy that exhaustively describes the common weakness enumeration. 2) we update FF SARIF production such that its rules table properly associates various rules to the relevant CWE mapping.

Again, for item 7, this can probably wait. It's clear we will want to make sure that FF results can move into Heimdall/HDF viewers (we will work on a SARIF -> HDF converter next), and this will require an explicit mapping from FF -> CWE (and a separate NIST taxonomy). More on this later...

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

An "FF" prefix works for me!

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

hi @david-a-wheeler, SARIF uses level property to specify severity level (§3.27.10) of the result (hit in Flawfinder) with below 3 values.

  • "error": a serious problem was found.
  • "warning": a problem was found.
  • "note": a minor problem or an opportunity to improve the code was found.
    How should we map the level of hit in Flawfinder (from 0 to 5) to above SARIF level?
    Can we assume level 4 & 5 in Flawfinder map to error in SARIF, and others levels map to warning?

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

Thank you @david-a-wheeler for clarification about the level mapping.

Another question regarding adding SARIF friendly opaque rule id to each rule. E.g. assign FF1001 to rule "strncpy"
I see the existing rules defined in dictionary c_ruleset and the rule name can combine multiple function names.
e.g. blow rule definition will be split into 3 rules "memcpy", "CopyMemory", "bcopy", these 3 rules share other rule data except name.

    "memcpy|CopyMemory|bcopy":
    (c_memcpy, 2,  # I've found this to have a lower risk in practice.
     "Does not check for buffer overflows when copying to destination (CWE-120)",
     "Make sure destination can always hold the source data",
     "buffer", "", {}),

In this case, should we assign 1 rule id ,e.g. FF1002, to these 3 rules statically?
or independent rule id for each rule e.g.FF1002, FF1003, FF1004, dynamically?
If we assign rule id dynamically, in the case that need to extend existing rules with more function names
we need to make sure the id is unique and not used by other rules and keep it stable (rule id should not change over time).

@david-a-wheeler @michaelcfanning @eddynaka pls let me know what do you think?

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

I prefer assigning a rule id per rule in c_ruleset, because each rule describes 1 issue, it can related to many functions.
This way its easier to maintain the rule id.
I have created a draft PR #35 can you pls review?

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

Three things:

  1. If someone could provide an updated SARIF file that we think represents our current target, I'd appreciate it.
  2. You might read the spec re: SARIF sub-ids. A sub-id is a qualifier that marks a check as being incorporated within a larger issue (i.e., shares a rule id + rule metadata) but has some other distinguishing characters. The typical principle we use or deciding whether a rule should be its own id (or not) is to think about whether a user might want to selectively enable/disable analyses. A second test is whether each topic might warrant its own help topic (due to having some distinct technical/conceptual complexity).

David, you're right, there are many ways to categorize checks! When SARIF was designed, a goal was to define something that could generally be used to express the existing body of checkers and so our approach is, admittedly a lowest common denominator, arguably. On the other hand, our categories, in practice, tend to be what teams mostly need: 1) error, always displayed, blocks things like build or check-in, 2) warning, always displays, but doesn't block developer activity, 3) note, does not display by default. No one likes to lose items in #3 but few people look at them. :)

I mention this standard in case it helps you evaluate your bucketing of FF levels 0 - 5 differently than what you decided above. According to SARIF principles, levels of 0 - 2 (Note) would not display in a SARIF viewer like Visual Studio unless a developer explicitly opted in (by in this example, clicking a VS checkbox which happens to be named 'Messages', which is disabled by default.

We do have the SARIF 'rank' property that can be used to render all your levels. This property will be useful to sort your notes and errors to maintain FF's actual prioritization.

Here's what it would look like currently, seem right to you?

SARIF rank FF Level SARIF level Default Viewer Action
0.0 0 note Does not display by default
0.2 1 note Does not display by default
0.4 2 note Does not display by default
0.6 3 warning Displays by default, does not break build/other processes
0.8 4 error Displays by default, breaks build/other processes
1.0 5 error Displays by default, breaks build/other processes

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

Attached the latest Flawfinder csv and Sarif log generated by converter. Please review:
Results.zip

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

I merged in the change from @yongyan-gh . What's next?

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

At the moment I'm waiting for future pull requests to review & merge. If something else needs doing, please let me know!

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

Okay... I'm waiting for more pull requests. If that's not what I should be expecting, please let me know!

from flawfinder.

eddynaka avatar eddynaka commented on July 2, 2024

Hi @david-a-wheeler , just a quick update. We are working in the taxonomies part. So we can link FlawFinder with it. As soon as we finish it, we will be able to come back and link it!

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

Great! I wasn't sure if you were waiting for me to do something :-). Let me know how I can help get this completed.

from flawfinder.

michaelcfanning avatar michaelcfanning commented on July 2, 2024

So, we first needed to author a SARIF taxonomy for CWE and that's under final review. Next, we need some short-term place to publish this content, perhaps in the OASIS SARIF repo, where we publish the finished specifications. Making the taxonomy web-accessible will allow us to update Flawfinder to produce its mappings to CWEs. In parallel, we can (and should) be authoring a SARIF (+CWE mappings) to Heimdall converter. We also need to nail down an official published location for the CWE data. Conveniently, an OASIS SARIF technical committee member is also on the CWE board, so will ask for his help getting a discussion. Your suggestion also appreciated on a suitable place to publish/maintain this data, if you have them.

So that's the rough roadmap! @eddynaka, this little thinking exercise have been helpful in making me understand that we are ready to start planning the HDF converter work. I believe HDF depends on a second NIST standard (in addition to CWE) that we likely need to author a taxonomy for.

from flawfinder.

eddynaka avatar eddynaka commented on July 2, 2024

@david-a-wheeler , just a quick update.
We finished some work in the taxonomies: http://github.com/sarif-standard/taxonomies/
For now, we are hosting those in the github repository, but we are also looking where is the best place to host it.

We also updated the sarif-flawfinder converter to show the CWE taxonomies: microsoft/sarif-sdk#2332

from flawfinder.

david-a-wheeler avatar david-a-wheeler commented on July 2, 2024

I appreciate that! I'm currently waiting for one of you to send me the actual patch set to generate at least a starting SARIF. Note that this would also implement #39 .

from flawfinder.

eddynaka avatar eddynaka commented on July 2, 2024

Hi @david-a-wheeler ,

below the commands:

  1. first, let's install the tool:
dotnet tool install sarif.multitool --global
  1. let's convert:
sarif convert c:\flawfinder --tool FlawFinder --output c:\flawfinder.sarif

To have the taxonomies, you will need at least version 2.4.6.

from flawfinder.

yongyan-gh avatar yongyan-gh commented on July 2, 2024

created a PR #42 Add native sarif output, includes CWE taxonomies supports. Please review.

from flawfinder.

eddynaka avatar eddynaka commented on July 2, 2024

Hi @david-a-wheeler , thanks for reviewing and approving!

from flawfinder.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.