Comments (32)
I just had a discussion with the OpenSSF Security Tooling Working Group members. Everyone was very positive.
YES! LET'S ADD SARIF!
So what's next?
from flawfinder.
I'm definitely open to it. I want to support "the standard way to get vulnerability" data. It hasn't been obvious what that is, e.g., there's SARIF, MITRE's Heimdall Data Format (HDF) for exchanging data between vulnerability tools, and probably others.
from flawfinder.
Those are very sensible arguments. I've emailed the OpenSSF Security Tooling Working Group (openssf-wg-security-tooling AT lists.openssf.org), with a pointer to the comments you just made, to hear their viewpoints.
Unless I hear a good counter-argument soon, I expect the answer to be "yes!", because I do want to encourage the use of standards to exchange this data. I just want to give people time to present a counter-argument first.
from flawfinder.
We've got your sample CSV and are putting together a proposed rendering of the same contents in SARIF. Thanks very much for providing it!
btw - coincidentally, we've started a conversation with the Mitre Heimdall team on the best way to collaborate on interop between our mutual formats. Looks like we'll start with a SARIF -> HDF converter, which means that FlawFinder SARIF will be able to naturally flow from your reports into the HDF viewers.
HDF requires a mapping between a tool's rule ids and the compliance standards it is oriented on. So there may a useful discussion soon on how to create/maintain this mapping. One step at a time, though! :)
from flawfinder.
There's probably a million ways to map things. It's unfortunate SARIF only has 3 levels.
I'd probably map as follows... 4 and 5 : error, 3: warning, 0..2: note.
Of course, flawfinder (like most tools) has false positives and false negatives, so an "error" or "warning" may not really be a problem at all.
from flawfinder.
Hey, @eddynaka and other MS people, the next priority according to the roadmap I outlined above is the following. How's the work going here?
Making the taxonomy web-accessible will allow us to update Flawfinder to produce its mappings to CWEs.
The SARIF converter is not really a desirable option for users, we want users to be able to use FlawFinder directly without pulling in other utilities or dependencies.
from flawfinder.
I just merged #42 to add native SARIF output. I've added more information to the documentation about it. I expect it can be improved, improvements welcome. However, very soon I plan to close this issue (once the SARIF-output version is released). Improvements can be their own issues and/or pull requests.
from flawfinder.
Flawfinder version 2.0.16 has been released, with a new --sarif
option. My thanks! I'm closing this, I think it's done. We can always improve it as a separate issue, or even better, a new pull request :-).
from flawfinder.
HDF is new to me and I'm having trouble finding information on it to comment on this specific format. If you have pointers to anything useful, let me know. My sense is that its orientation is around sharing security/compliance relevant reporting weighted towards (web?) dynamic analysis tools. There appears to be an eco-system of several converters from existing tool formats.
Some outwards trappings of a standard way: a finished public standard, thorough documentation, and industry/other uptake. SARIF has all these. At least one other gov't effort (the SAMATE project which previously produced a SCARF vulnerability format) has adopted SARIF as an alternate, it was supported as a submittal format for SATE VI (you need to search this page for SARIF to see the reference). The DHS/STAMP contributed to funding/design. The DHS-funded SWAMP project also converted from SCARF to SARIF.
Direct uptake includes companies like Microsoft, GrammaTech, Semmle/Code QL, MicroFocus, Codacy, Kestrel, etc. There's a Clang analyzer extension for it and the MS native/managed compilers emit it. Open source direct support includes detekt, spotbugs, eslint, brakeman. There's an eco-system of open source converters from other formats and this is how Microsoft integrates FlawFinder results today.
The above are all the outwards trappings of adoption (and if you have similar information on HDF or other format, I'd be interested in it). Equally or more important to examine the SARIF schema and/or the specification to consider the topics covered by the format. It was a five year design effort, start-to-finish, with contributions from people knowledgeable in the domain. It is a complex format, unfortunately, due to including concepts relevant to more advanced static analysis efforts (e.g., the persistence of code flows, call graphs, results from concurrency analysis, binary-level analysis, web requests/responses, etc.). But MS usage to date demonstrates that the core format for driving IDE integration, bug filing, etc., is suited-to-purpose.
If you have any time to take a look, let us know what you think.
from flawfinder.
That's great to hear! An easy way to make progress would be to:
- Generate substantive FlawFinder CSV files that comprehensively exercise this existing report format.
- Run those CSV files through the SARIF-SDK Multitool 'convert' command in order to transform them into SARIF. This output will be a real assist to you in identifying the core data to emit (without requiring anyone to absorb a 220 page spec).
- Identify and code an appropriate approach in FlawFinder, command-line arg update, etc., to incorporate exporting the new format.
- Review other FlawFinder concepts that may not be expressed in the CSV report today but can be emitted to SARIF and make a call on whether to export them. SARIF can accept computed fingerprints for baselining, for example. SARIF can accept 'logical locations' which are identifiers that aren't tied to source artifacts (file name, line locations), e.g., a C++ undecorated name.
- Run the candidate FlawFinder SARIF through our online validation tools (or use the multitool again to perform this analysis) to verify everything is in good shape. We (GitHub and Microsoft) have a special interest in ensuring that FlawFinder SARIF works very well when ingested into systems GitHun Advanced Security, and have authored some special checks for these.
I suppose the immediate step is to bless a contributor. We are glad to provide SARIF expertise and tooling support through-out the process. If you'd like us to pull together a contribution, we'd be glad to do that, too. :)
Michael
from flawfinder.
I can generate a CSV file. Would you like to run this "SARIF-SDK multitool" on it?
from flawfinder.
Yes, glad to. If you'd like to send to me directly at microsoft.com, my alias is 'mikefan'. That would be a good opportunity for me to ask you a few questions about your reproducible build work as well. :)
from flawfinder.
@michaelcfanning
converted the flawfinder csv result into sarif file using SARIF.multitool converter. Attached the result file, please review.
Validated the result using SARIF validator, it reports 2 types of errors:
- SARIF2005: runs[0].tool.driver: The tool 'FlawFinder' does not provide any of the version-related properties 'version', 'semanticVersion', 'dottedQuadFileVersion'.
- SARIF2005: runs[0].tool.driver: The tool 'FlawFinder' does not provide 'informationUri'.
- SARIF2004: The 'artifacts' array at 'runs[0].artifacts' contains no information beyond the locations of the artifacts.
SARIF2005 is because the csv doesn't have information of tool version/information uri.
SARIF2004 does it mean runs[0].artifacts missing index attribute?
from flawfinder.
@david-a-wheeler , attached you can find a sarif file with a few modifications:
- added tool.driver.version and tool.driver.informationUri
- added tool.driver.rule.helpUri. With that, you can show to the user and point to a specific documentation (if you have)
- added an object called originalUriBaseIds that contain the base of your paths. And, with that, you can simplify the usage in the results (check that now it contains the relative path + uriBaseId)
Let us know what do you think!
from flawfinder.
This is a great start! Here's some detailed feedback on a next revision:
- Please define a SARIF friendly opaque id, with a prefix that David suggests. As a convention, I suggest starting rule ids at 1001 and increasing them monotonically. Tools often reserve ids in the range of 0XXX for internal/engine errors. FF1001, FF1002, etc., would be a good choice. David might prefer a language prefix, such as CPP1001, CPP1002, etc.
- Please make sure all message strings end with a period. This is done in order to create some consistency in look-and-feel when SARIF aggregating systems display results.
- Please remove the leading spaces from the code snippets in the log file. I don't see that the CSV report David provided contains these. Does the inclusion of this whitespace indicate a problem in our SARIF converter? Or is it actually true that the FF regions specified in the CSV include this data?
- Please populate the
fingerprints
property with the FF match hash, it is not actually apartialFingerprint
. Read the spec to learn more. :) Also, please version this hash, i.e., use the key namecontextHash/v1
. I use the termcontextHash
based on reading David's documentation, he refers to this value as a SHA256 hash of relevant context. The versioning allows some future-proofing in case FF changes its hashing approach and emits multiple version in future for compatibility. This is typically done to allow for result matching across multiple SARIF log files where the fingerprint algorithm has changed in the tool. - Please find a way to put the FF
Level
data into the SARIFranking
property. Then eliminate the level data from the log file. Generally, we drop information from the property bag if it can be directly emitted into the SARIF. - If you provide help URIs in your sample, put them everywhere. You can use a fake URL from example.com or a speculative URL pointing within the FF repository.
- Finally, for this v1, it is fine to append the CWE data into the user-facing message. Moving forward, however, we should look at capturing this data in the SARIF taxonomies design (read the spec! ). The most sophisticated utilization would be: 1) we define and maintain a SARIF format taxonomy that exhaustively describes the common weakness enumeration. 2) we update FF SARIF production such that its
rules
table properly associates various rules to the relevant CWE mapping.
Again, for item 7, this can probably wait. It's clear we will want to make sure that FF results can move into Heimdall/HDF viewers (we will work on a SARIF -> HDF converter next), and this will require an explicit mapping from FF -> CWE (and a separate NIST taxonomy). More on this later...
from flawfinder.
An "FF" prefix works for me!
from flawfinder.
hi @david-a-wheeler, SARIF uses level property to specify severity level (§3.27.10) of the result (hit in Flawfinder) with below 3 values.
- "error": a serious problem was found.
- "warning": a problem was found.
- "note": a minor problem or an opportunity to improve the code was found.
How should we map the level of hit in Flawfinder (from 0 to 5) to above SARIF level?
Can we assume level 4 & 5 in Flawfinder map to error in SARIF, and others levels map to warning?
from flawfinder.
Thank you @david-a-wheeler for clarification about the level mapping.
Another question regarding adding SARIF friendly opaque rule id to each rule. E.g. assign FF1001 to rule "strncpy"
I see the existing rules defined in dictionary c_ruleset and the rule name can combine multiple function names.
e.g. blow rule definition will be split into 3 rules "memcpy", "CopyMemory", "bcopy", these 3 rules share other rule data except name.
"memcpy|CopyMemory|bcopy":
(c_memcpy, 2, # I've found this to have a lower risk in practice.
"Does not check for buffer overflows when copying to destination (CWE-120)",
"Make sure destination can always hold the source data",
"buffer", "", {}),
In this case, should we assign 1 rule id ,e.g. FF1002, to these 3 rules statically?
or independent rule id for each rule e.g.FF1002, FF1003, FF1004, dynamically?
If we assign rule id dynamically, in the case that need to extend existing rules with more function names
we need to make sure the id is unique and not used by other rules and keep it stable (rule id should not change over time).
@david-a-wheeler @michaelcfanning @eddynaka pls let me know what do you think?
from flawfinder.
I prefer assigning a rule id per rule in c_ruleset, because each rule describes 1 issue, it can related to many functions.
This way its easier to maintain the rule id.
I have created a draft PR #35 can you pls review?
from flawfinder.
Three things:
- If someone could provide an updated SARIF file that we think represents our current target, I'd appreciate it.
- You might read the spec re: SARIF sub-ids. A sub-id is a qualifier that marks a check as being incorporated within a larger issue (i.e., shares a rule id + rule metadata) but has some other distinguishing characters. The typical principle we use or deciding whether a rule should be its own id (or not) is to think about whether a user might want to selectively enable/disable analyses. A second test is whether each topic might warrant its own help topic (due to having some distinct technical/conceptual complexity).
David, you're right, there are many ways to categorize checks! When SARIF was designed, a goal was to define something that could generally be used to express the existing body of checkers and so our approach is, admittedly a lowest common denominator, arguably. On the other hand, our categories, in practice, tend to be what teams mostly need: 1) error, always displayed, blocks things like build or check-in, 2) warning, always displays, but doesn't block developer activity, 3) note, does not display by default. No one likes to lose items in #3 but few people look at them. :)
I mention this standard in case it helps you evaluate your bucketing of FF levels 0 - 5 differently than what you decided above. According to SARIF principles, levels of 0 - 2 (Note) would not display in a SARIF viewer like Visual Studio unless a developer explicitly opted in (by in this example, clicking a VS checkbox which happens to be named 'Messages', which is disabled by default.
We do have the SARIF 'rank' property that can be used to render all your levels. This property will be useful to sort your notes and errors to maintain FF's actual prioritization.
Here's what it would look like currently, seem right to you?
SARIF rank |
FF Level | SARIF level |
Default Viewer Action |
---|---|---|---|
0.0 | 0 | note |
Does not display by default |
0.2 | 1 | note |
Does not display by default |
0.4 | 2 | note |
Does not display by default |
0.6 | 3 | warning |
Displays by default, does not break build/other processes |
0.8 | 4 | error |
Displays by default, breaks build/other processes |
1.0 | 5 | error |
Displays by default, breaks build/other processes |
from flawfinder.
Attached the latest Flawfinder csv and Sarif log generated by converter. Please review:
Results.zip
from flawfinder.
I merged in the change from @yongyan-gh . What's next?
from flawfinder.
At the moment I'm waiting for future pull requests to review & merge. If something else needs doing, please let me know!
from flawfinder.
Okay... I'm waiting for more pull requests. If that's not what I should be expecting, please let me know!
from flawfinder.
Hi @david-a-wheeler , just a quick update. We are working in the taxonomies part. So we can link FlawFinder with it. As soon as we finish it, we will be able to come back and link it!
from flawfinder.
Great! I wasn't sure if you were waiting for me to do something :-). Let me know how I can help get this completed.
from flawfinder.
So, we first needed to author a SARIF taxonomy for CWE and that's under final review. Next, we need some short-term place to publish this content, perhaps in the OASIS SARIF repo, where we publish the finished specifications. Making the taxonomy web-accessible will allow us to update Flawfinder to produce its mappings to CWEs. In parallel, we can (and should) be authoring a SARIF (+CWE mappings) to Heimdall converter. We also need to nail down an official published location for the CWE data. Conveniently, an OASIS SARIF technical committee member is also on the CWE board, so will ask for his help getting a discussion. Your suggestion also appreciated on a suitable place to publish/maintain this data, if you have them.
So that's the rough roadmap! @eddynaka, this little thinking exercise have been helpful in making me understand that we are ready to start planning the HDF converter work. I believe HDF depends on a second NIST standard (in addition to CWE) that we likely need to author a taxonomy for.
from flawfinder.
@david-a-wheeler , just a quick update.
We finished some work in the taxonomies: http://github.com/sarif-standard/taxonomies/
For now, we are hosting those in the github repository, but we are also looking where is the best place to host it.
We also updated the sarif-flawfinder converter to show the CWE taxonomies: microsoft/sarif-sdk#2332
from flawfinder.
I appreciate that! I'm currently waiting for one of you to send me the actual patch set to generate at least a starting SARIF. Note that this would also implement #39 .
from flawfinder.
Hi @david-a-wheeler ,
below the commands:
- first, let's install the tool:
dotnet tool install sarif.multitool --global
- let's convert:
sarif convert c:\flawfinder --tool FlawFinder --output c:\flawfinder.sarif
To have the taxonomies, you will need at least version 2.4.6.
from flawfinder.
created a PR #42 Add native sarif output, includes CWE taxonomies supports. Please review.
from flawfinder.
Hi @david-a-wheeler , thanks for reviewing and approving!
from flawfinder.
Related Issues (20)
- How i can get an output with .csv format? In python,i use "flawfinder ./test" HOT 1
- Add GitHub Actions integration HOT 31
- `c_printf` possible false positive for format macro constant HOT 1
- Add an svg icon file for GitHub actions HOT 56
- Consider rewriting this to use joern or alternative HOT 1
- Only output CSV when using CSV option HOT 1
- std::istream::read() reports security issue, false alert? HOT 4
- Feature Request: Support Stream Use
- Presence of ioctl
- Add a --ignore option
- Invalid helpUri generated HOT 1
- SARIF artifact location paths HOT 3
- Character Encoding Error on UTF-8 Encoded Source File with U+0441 HOT 18
- Warn when PQExec is called with a non-constant to warn about SQL injection in PostgreSQL
- --csv option wont output hits to csv file from mac terminal
- FF1057 is missing CWE attribution in the warning text HOT 1
- Flawfinder does scan the directory with symlinks and exits quietly with error code HOT 1
- binary/hex integer literals with separators lead to parse error HOT 2
- Flawfinder reports abseil::StrCat the same as std:strcat HOT 1
- Can I Modfy more CWE? HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from flawfinder.