oasis-tcs / sarif-spec

OASIS SARIF TC: Repository for development of the draft standard, where requests for modification should be made via GitHub Issues

Home Page: https://github.com/oasis-tcs/sarif-spec

License: Other

HTML 97.13% Python 2.54% Shell 0.01% CSS 0.31% Makefile 0.02%

sarif-spec's Introduction

README

Members of the OASIS Static Analysis Results Interchange Format (SARIF) TC create and manage technical content in this TC GitHub repository ( https://github.com/oasis-tcs/sarif-spec ) as part of the TC's chartered work (i.e., the program of work and deliverables described in its charter).

OASIS TC GitHub repositories, as described in GitHub Repositories for OASIS TC Members' Chartered Work, are governed by the OASIS TC Process, IPR Policy, and other policies, similar to TC Wikis, TC JIRA issues tracking instances, TC SVN/Subversion repositories, etc. While they make use of public GitHub repositories, these TC GitHub repositories are distinct from OASIS Open Repositories, which are used for development of open source licensed content.

Description

The purpose of the SARIF TC is to define a standard output format for static analysis tools, which will be called the Static Analysis Results Interchange Format (SARIF). This GitHub repository supports development of the draft SARIF standard. Requests for modification should be made via GitHub Issues.

A static analysis tool is a program that examines programming artifacts in order to detect problems, without executing the program. Software developers use a variety of static analysis tools to assess the quality of their programs. To form an overall picture of program quality, developers must often aggregate the results produced by all of these tools. This aggregation is more difficult if each tool produces output in a different format. A standard output format would make it feasible for developers and teams to view, understand, interact with, and manage the results produced by all the tools that they use.

Submission request from David Keaton (SARIF TC Co-Chair): we expect to populate this repository from the existing one found here: https://github.com/sarif-standard/sarif-spec.

Contributions

As stated in this repository's CONTRIBUTING file, contributors to this repository are expected to be Members of the OASIS SARIF TC, for any substantive change requests. Anyone wishing to contribute to this GitHub project and participate in the TC's technical activity is invited to join as an OASIS TC Member. Public feedback is also accepted, subject to the terms of the OASIS Feedback License.

Licensing

Please see the LICENSE file for description of the license terms and OASIS policies applicable to the TC's work in this GitHub project. Content in this repository is intended to be part of the SARIF TC's permanent record of activity, visible and freely available for all to use, subject to applicable OASIS policies, as presented in the repository LICENSE file.

Further Description of this Repository

[Any narrative content may be provided here by the TC, for example, if the Members wish to provide an extended statement of purpose.]

Contact

Please send questions or comments about OASIS TC GitHub repositories to Robin Cover and Chet Ensign. For questions about content in this repository, please contact the TC Chair or Co-Chairs as listed on the SARIF TC's home page.

sarif-spec's Issues

Should annotatedCodeLocation have a formattedMessage property?

Copied from sarif-standard/sarif-spec-v1#251, created by @lgolding:

The reason we haven't done it so far is that we didn't know where to put the message formats (it wasn't obvious that they belonged in rule.messageFormats, since they weren't necessarily rule-specific).

Cite source for list of hash algorithms

Copied from sarif-standard/sarif-spec-v1#13, created by @lgolding:

The list comes from https://en.wikipedia.org/wiki/List_of_hash_functions. Is this ok?

Add ACL.annotations member

Copied from sarif-standard/sarif-spec-v1#267, created by @michaelcfanning:

An ACL could include arbitrary source annotations, which consist of an array of messages + regions. This would allow squigglies + popups to be applied on clicking any arbitrary ACL.

In our proposal, we would assume that the regions are within the same file referenced in the ACL.location. Alternately, we could include a full PLC rather than a region.
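To make the proposal concrete, here is a minimal TypeScript sketch of how a viewer might consume an `annotations` array on an ACL. The `Annotation` and `Region` shapes and the `annotationsToRender` helper are hypothetical, and the sketch assumes, as the proposal does, that every annotation region lies in the file referenced by `ACL.location`.

```typescript
// Hypothetical shapes for the proposed ACL.annotations member; these names are
// illustrative and not part of any published SARIF schema.
interface Region {
  startLine: number;
  startColumn?: number;
  endLine?: number;
  endColumn?: number;
}

interface Annotation {
  message: string;
  region: Region; // assumed to lie in the file named by ACL.location
}

interface AnnotatedCodeLocation {
  location: { uri: string; region?: Region };
  annotations?: Annotation[];
}

// Return what a viewer would render (squiggle + popup) when the user clicks
// this ACL; every annotation is paired with the ACL's own file URI.
function annotationsToRender(
  acl: AnnotatedCodeLocation
): Array<{ uri: string; line: number; message: string }> {
  return (acl.annotations ?? []).map((a) => ({
    uri: acl.location.uri,
    line: a.region.startLine,
    message: a.message,
  }));
}

const acl: AnnotatedCodeLocation = {
  location: { uri: "file:///src/app.c", region: { startLine: 40 } },
  annotations: [
    { message: "buffer allocated here", region: { startLine: 12 } },
    { message: "size computed here", region: { startLine: 18 } },
  ],
};
console.log(annotationsToRender(acl));
```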

Provide a URI that can be used to retrieve rule metadata

Copied from sarif-standard/sarif-spec-v1#175, created by @michaelcfanning:

Remember @rtaket's fine suggestion around providing a URI member for rule metadata. In the event this item is populated, the uri represents an endpoint that can be used to retrieve the rule SARIF. Not a blocker for v1.

microsoft/sarif-sdk#124

Can we cite ECMA-404 as normative?

Copied from sarif-standard/sarif-spec-v1#10, created by @lgolding:

CONSIDER: Adding a triageState property to result

Copied from sarif-standard/sarif-spec-v1#279, created by @lgolding:

Some customers want to be able to put a "check mark" next to an item in a results viewer, to indicate "reviewed". Is this purely a function of a SARIF viewer, or should we add a field like triageState with values like New and Reviewed to persist this?

should we provide 'suppressionJustification'?

Copied from sarif-standard/sarif-spec-v1#254, created by @michaelcfanning:

Many in-source suppression mechanisms provide a justification field. This information may be useful to persist to a log file, for auditing purposes.

If we provide a field, we need to be cognizant that a result can be suppressed in multiple locations, e.g., both in-source and as part of an external baseline. If we add something, is it a single member relevant to in-source suppressions only? Is there a member for each suppression state? Do we provide an array of justification strings, with no particular ordering or association?

Or take @rtaket's original suggestion: we should define a suppressions array on a result, each element of which might include information such as its justification and persistence location. We could add this data as an additional member (deprecating suppressionState).
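A minimal TypeScript sketch of @rtaket's `suppressions` array, under stated assumptions: the `location` values `"inSource"` and `"baseline"` and the `auditJustifications` helper are illustrative names, not spec text. Keeping one entry per suppression preserves the association between each justification and the place it was applied, which a flat array of justification strings would lose.

```typescript
// Hypothetical: where a suppression was recorded (values are assumptions).
type SuppressionLocation = "inSource" | "baseline";

interface Suppression {
  location: SuppressionLocation;
  justification?: string; // not every mechanism records one
}

interface Result {
  ruleId: string;
  suppressions?: Suppression[]; // one entry per place the result is suppressed
}

// Collect justifications for an audit, keeping each tied to its location.
function auditJustifications(result: Result): string[] {
  return (result.suppressions ?? [])
    .filter((s) => s.justification !== undefined)
    .map((s) => `${s.location}: ${s.justification}`);
}

const r: Result = {
  ruleId: "EX0001",
  suppressions: [
    { location: "inSource", justification: "Input is validated by caller." },
    { location: "baseline" },
  ],
};
console.log(auditJustifications(r));
```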

Consider restructuring SARIF to be location, not results-focused

Today, SARIF is primarily designed to render flat lists of results. The format has the beginnings of some relational structure/linkage (for example, the 'rules' and 'files' objects). If we continue down this path, you could imagine a 'locations' object to hold all referenced code locations. These locations could themselves be annotated with additional information, such as static analysis results, but also code metrics. Annotations (such as static analysis results) could themselves refer to other locations in the set, for example, by constructing a code flow associated with a result that chains a set of code locations together.

This design change is a significant departure from the current spec and would require all existing SARIF SDK/other support to be reimplemented. A more relational design (rather than being oriented towards a flat list) would make log files much less readable. SARIF already suffers from this problem to some degree (messages can refer to a formatting string, for example, that isn't inlined with the message). At a high-level, this proposal moves the SARIF standard more towards being a rich mechanism to annotate code locations and describe relationships between them, as opposed to more strictly serving the static analysis reporting scenario.
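As a rough illustration of the readability trade-off, the hypothetical TypeScript sketch below stores each code location once and has annotations (a result, a metric) refer to it by index. None of these type names come from the spec; the point is that a reader, human or program, must resolve an index to see where an annotation applies.

```typescript
// Hypothetical relational layout: each code location is stored once, and
// annotations (results, metrics, ...) refer to it by index.
interface CodeLocation {
  uri: string;
  startLine: number;
}

interface LocationAnnotation {
  kind: "result" | "metric";
  message: string;
  locationIndex: number; // index into RelationalLog.locations
}

interface RelationalLog {
  locations: CodeLocation[];
  annotations: LocationAnnotation[];
}

// Rendering an annotation requires resolving its index: the extra lookup
// that makes a relational log harder to read than a flat list of results.
function renderAnnotation(log: RelationalLog, a: LocationAnnotation): string {
  const loc = log.locations[a.locationIndex];
  if (loc === undefined) {
    throw new Error(`dangling location index ${a.locationIndex}`);
  }
  return `${loc.uri}(${loc.startLine}): [${a.kind}] ${a.message}`;
}

const log: RelationalLog = {
  locations: [{ uri: "file:///src/a.c", startLine: 10 }],
  annotations: [
    { kind: "result", message: "Possible null dereference", locationIndex: 0 },
    { kind: "metric", message: "Cyclomatic complexity: 14", locationIndex: 0 },
  ],
};
for (const a of log.annotations) console.log(renderAnnotation(log, a));
```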

Consider adding 'rank' or 'probability' property

Some tools produce an issue rank. It is difficult to normalize rank across tools. One idea is to normalize all of these to a value from 0 to 1 (0% to 100% certainty). Some tools, however, provide a numeric rank with no known upper bound.
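One way to normalize bounded ranks, sketched in TypeScript under the assumption that a tool documents its minimum and maximum rank and that higher ranks mean higher certainty. `normalizeRank` is a hypothetical helper; it deliberately returns `undefined` for tools with no known upper bound, since no linear mapping onto 0 to 1 exists for them.

```typescript
// Linear min-max scaling of a tool-specific rank onto [0, 1].
// Assumes higher rank = higher certainty and that the tool documents its bounds.
function normalizeRank(rank: number, min: number, max: number | undefined): number | undefined {
  // No known upper bound: refuse to guess rather than invent a scale.
  if (max === undefined || !Number.isFinite(max)) return undefined;
  if (max <= min) throw new RangeError("max must be greater than min");
  // Clamp out-of-range values so the result always stays in [0, 1].
  const clamped = Math.min(Math.max(rank, min), max);
  return (clamped - min) / (max - min);
}

console.log(normalizeRank(3, 1, 5)); // 0.5
console.log(normalizeRank(42, 0, undefined)); // undefined
```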

Define driving principles for SARIF effort

We should articulate and maintain a set of driving principles for the SARIF format. These principles should define a vision for the format in general and be useful for resolving difficult design decisions. Below is a starter list that we should refine, add to or subtract from.

SARIF is primarily designed to advance the industry by providing the best direct production format possible. Aggregating results from other formats is another important scenario but secondary to direct production.
SARIF defines a range of data that shall be expressed in order to best support static analysis tooling. The specification describes a JSON implementation of this standard. It should be possible to define other implementations (such as XML).
SARIF is designed for static analysis tools and any concept that generally applies for this scenario shall be considered for the format. SARIF can clearly be used for many dynamic analysis scenarios and we should consider augmenting the format for this class of tooling, but not in cases where what is proposed is applicable to the dynamic analysis domain only.
SARIF is domain-agnostic; that is, it does not contain objects or properties that are specific to a single domain, such as security or compliance. However, SARIF might define specific values for properties that are specific to a single domain. For example, the proposed result.taxonomies property might define a dictionary entry whose key invokes a standard classification for memory safety issues only.
The SARIF design is focused on expressing results as produced by a tool at a specific point in time and currently excludes detailed thinking related to results management (associated result work items, false positive evaluation, etc.). These concepts may be addressed by defining or proposing 'profiles' that broaden SARIF's design surface area, contingent on progress with core work.

Introduce object-valued rule.configuration

Copied from sarif-standard/sarif-spec-v1#266, created by @lgolding:

"enabled", "disabled", or "unknown". Default: "unknown".

If any rule has "enabled" or "disabled", all rules must be mentioned, and all rules must be specified as either "enabled" or "disabled" (a smart validator can enforce the latter).

@michaelcfanning FYI
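Because the issue text above lost the name of the three-valued property, the TypeScript sketch below uses a placeholder name, `state`, inside `rule.configuration`. It shows the check a "smart validator" could perform: once any rule is explicitly enabled or disabled, report every rule still left at the default `"unknown"`.

```typescript
type RuleState = "enabled" | "disabled" | "unknown";

// "state" is a placeholder name; the real property name is not given above.
interface Rule {
  id: string;
  configuration?: { state?: RuleState };
}

// Returns the ids of rules that violate the all-or-nothing requirement:
// once any rule is explicitly enabled or disabled, none may remain "unknown".
function findUnspecifiedRules(rules: Record<string, Rule>): string[] {
  const stateOf = (r: Rule): RuleState => r.configuration?.state ?? "unknown"; // default per the issue
  const all = Object.values(rules);
  const anyExplicit = all.some((r) => stateOf(r) !== "unknown");
  if (!anyExplicit) return []; // nothing explicit: the requirement does not apply
  return all.filter((r) => stateOf(r) === "unknown").map((r) => r.id);
}

const rules: Record<string, Rule> = {
  EX0001: { id: "EX0001", configuration: { state: "enabled" } },
  EX0002: { id: "EX0002" },
};
console.log(findUnspecifiedRules(rules)); // → ["EX0002"]
```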

Need a compliance test suite

Copied from sarif-standard/sarif-spec-v1#5, created by @lgolding:

Jeffrey van Gogh suggested:

When the doc is more stable, it would be great to have a compliance test suite so both producers and consumers can be sure that they're generating the right format.

Document how converters should provide notifications

Copied from sarif-standard/sarif-spec-v1#144, created by @michaelcfanning:

Should we provide a way for a converter to communicate that something strange happened during conversion? What about tools that process existing log files and add some notes/observations? Not a breaking change to extend the enum.

Should we allow markdown in messages?

Copied from sarif-standard/sarif-spec-v1#275, created by @lgolding:

... for example, bold, italic, and hyperlinks?

One possibility is to explicitly allow Markdown. Another is to add a property that specifies one of an enumerated list of allowed formats ("Markdown", "HTML", ...). Keep in mind:

Security issues
There are many dialects of Markdown

End-to-end results management concepts

The TC has determined that for the immediate future, we will focus strictly on SARIF as a format for expressing immediate tool results. We will track a set of additional concepts, however, that relate to managing results end-to-end (from production, through baselining, triage, scheduling, resolution, etc.). If we have sufficient time/inclination in the TC, we may address these concepts or propose a new plan to address. We will maintain a list of concepts to address in the root message of this thread. Current concepts include:

Triage state: 'Not reviewed', 'Won't fix', 'Fix', etc.
Validation state: 'Not validated', 'False positive', 'True positive'.
Suppression justification (string that describes why an item is baselined/suppressed)
Priority, Certainty, Severity, Rank, etc.

Can we cite FIPS PUB 180-4, Secure Hash Standard, as normative?

Copied from sarif-standard/sarif-spec-v1#11, created by @lgolding:

Specify that nested paths shall be absolute

Copied from sarif-standard/sarif-spec-v1#258, created by @lgolding:

In the section on run.files, add the requirement that when a URI-valued property refers to a nested file whose location can be specified by a path relative to its parent (for example, a file within a compressed archive), the nested path should be specified as an absolute path (that is, it should begin with a slash).

Right:

"uri": "file:///C:/packages/foo.zip#/bar/baz/qux.cpp"

Wrong:

"uri": "file:///C:/packages/foo.zip#bar/baz/qux.cpp"

It doesn't actually matter, but the idea is that this is a location starting from the "root" of the parent file.
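A validator could check this convention with the standard WHATWG `URL` class, available in browsers and Node.js. In this TypeScript sketch (the helper name `isAbsoluteNestedPath` is hypothetical), a URI without a fragment is treated as outside the rule's scope.

```typescript
// True if a nested-file URI uses an absolute path in its fragment ("#/..."),
// i.e. the nested path starts from the root of the parent file.
function isAbsoluteNestedPath(uri: string): boolean {
  const fragment = new URL(uri).hash; // includes the leading "#", or "" if absent
  if (fragment === "") return true; // not a nested-file URI; rule does not apply
  return fragment.startsWith("#/");
}

console.log(isAbsoluteNestedPath("file:///C:/packages/foo.zip#/bar/baz/qux.cpp")); // true
console.log(isAbsoluteNestedPath("file:///C:/packages/foo.zip#bar/baz/qux.cpp")); // false
```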

Consider: Adding CWE property to result object

Copied from sarif-standard/sarif-spec-v1#278, created by @lgolding:

At least one static analysis tool (Semmle) emits a CWE (Common Weakness Enumeration) value for the results it finds. Should we add a cwe property to the result object?

A possible counter-argument is that CWE is domain-specific (it applies only to security), whereas SARIF is domain-neutral (it applies to any static analysis result).

Support graphs and graph traversals

There are many graph structures in static analysis which are useful to preserve in results. For example:

A value flow graph
- Show multiple values contributing to a value
A call graph
- Thanks to the graph structure, we can expand callsites to see more information about their side effects

Several of the existing properties of result could be abstracted and generalized in this manner:

codeFlows property
stacks property
relatedLocations property
Note that all of these have properties in common: location and message. These would be the vertices in the graph. (In the case of stacks, the stackFrame objects are the vertices and the stacks property provides some of the graph edges/structure.)

This would also allow the format to support other information which generally fits into a graph.
The existence of the codeFlows and stacks properties shows the desire for this generalization/extensibility. What other similar properties will be wanted in the future that are not currently specified?

Each vertex would need some tag to identify what it means (i.e. this vertex is a stackFrame, this vertex is a value flow at an addition) and how vertices are expected to fit together (a stackFrame cannot flow into an addition, these should not appear in the same graph).
Tools doing their own graphs (not specified in SARIF) could still have a graph of vertices with a location and a message and their own meaning.
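A minimal TypeScript sketch of the generalization, assuming a flat list of tagged vertices plus directed edges. The kind names and the compatibility table are hypothetical; they encode the example constraint that a stack frame cannot flow into a value-flow vertex, so such edges are flagged as invalid.

```typescript
// Hypothetical vertex tags; a real design would standardize these.
type VertexKind = "stackFrame" | "valueFlow" | "callSite";

interface Vertex { id: string; kind: VertexKind; location: string; message: string; }
interface Edge { from: string; to: string; } // directed
interface Graph { vertices: Vertex[]; edges: Edge[]; }

// Hypothetical compatibility table: which kinds may follow which.
const allowed: Record<VertexKind, VertexKind[]> = {
  stackFrame: ["stackFrame"],
  valueFlow: ["valueFlow"],
  callSite: ["callSite"],
};

// Report edges that dangle or that connect incompatible vertex kinds.
function invalidEdges(g: Graph): Edge[] {
  const byId = new Map(g.vertices.map((v) => [v.id, v] as const));
  return g.edges.filter((e) => {
    const from = byId.get(e.from);
    const to = byId.get(e.to);
    return !from || !to || !allowed[from.kind].includes(to.kind);
  });
}

const g: Graph = {
  vertices: [
    { id: "v1", kind: "valueFlow", location: "a.c:3", message: "x read here" },
    { id: "v2", kind: "valueFlow", location: "a.c:5", message: "x + y computed here" },
    { id: "f1", kind: "stackFrame", location: "a.c:9", message: "in main()" },
  ],
  edges: [
    { from: "v1", to: "v2" }, // value flows into an addition: allowed
    { from: "f1", to: "v2" }, // stack frame into a value flow: rejected
  ],
};
console.log(invalidEdges(g));
```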

Consider: adding field for cryptographically secure digital signature

The discussion on "provenance" at the last meeting got me thinking about whether SARIF ought to allow for cryptographic digital signatures associated with analysis runs. This would yield confidence to the user that the results had not been tampered with.

We had a situation a few years ago where a user of CodeSonar wished to produce reports that were to go to a certifying authority in support of a safety assurance argument. They were worried that the HTML output was too easy to edit, and they did not trust their developers not to do so.

(In the end they settled for unsigned PDF because they thought it was sufficiently more difficult to edit that it was enough of a barrier to discourage tampering! Go figure.)

Define and document issue workflow

A preliminary proposal for driving issue workflow for spec changes. We should refine, add to and subtract from this content in discussion here.

All activity that may impact spec content will be tracked as issues, e.g., #1
We encourage discussion on the public issues. Editors will ‘curate’ by applying specific tags, ensuring content from other sources (email) is recorded, etc.
Editors will prepare proposed changes in separate repository branches. A clean document with revisions tracking enabled will be used for driving specific textual proposals.
Any issues that require time-sensitive discussion will be driven through the mailing list (this should be rare).
Issues that are ready for final approval or which warrant discussion on the telecon will be tagged in advance of each meeting.
After telecon approval/rejection, PRs will be merged or closed unmerged as appropriate.
This process (with additional details) will be driven by an open issue and documented in a file persisted to the repository (to be approved at the next TC meeting).

Typo: "the either" -> "either the" in Sec. 5.5

Copied from sarif-standard/sarif-spec-v1#292, created by @lgolding:

Consider URL protocol to reference internal files and provide an associated region

We may support markdown in results messages. If so, it would be useful if markdown's embedded links could refer to a file referenced within the log file itself (that file could be embedded in the log file).

For example:
sarif-log://somefilename.cpp/?startLine=1&startColumn=7
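If such a scheme were adopted, a viewer could resolve links with the standard `URL` class. This TypeScript sketch assumes the query parameters are separated by `&` as in an ordinary URL query string, and that the file name may span the URL's host and path; `parseSarifLogLink` is a hypothetical helper.

```typescript
interface LogLink {
  file: string;
  startLine?: number;
  startColumn?: number;
}

// Parse a hypothetical sarif-log: link into an embedded file name and region.
function parseSarifLogLink(href: string): LogLink {
  const url = new URL(href);
  if (url.protocol !== "sarif-log:") throw new Error(`not a sarif-log link: ${href}`);
  // For this scheme the first path component lands in url.host; rejoin it
  // with the rest of the path and drop any trailing slash.
  const file = decodeURIComponent((url.host + url.pathname).replace(/\/+$/, ""));
  const num = (key: string): number | undefined => {
    const v = url.searchParams.get(key);
    return v === null ? undefined : Number(v);
  };
  return { file, startLine: num("startLine"), startColumn: num("startColumn") };
}

console.log(parseSarifLogLink("sarif-log://somefilename.cpp/?startLine=1&startColumn=7"));
```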

Can we cite RFC 3986 as normative?

Copied from sarif-standard/sarif-spec-v1#12, created by @lgolding:

... and figure out if we can cite it as normative.

Incorporate concepts from SATE format

Copied from sarif-standard/sarif-spec-v1#282, created by @lgolding:

https://samate.nist.gov/SATE5.html

Represent "triage status"

Copied from sarif-standard/sarif-spec-v1#22, created by @lgolding:

Do we need a field to represent an issue's triage status, e.g., "fixed", "false positive"?

One argument goes that tools would never emit this field, and we're defining a tool output format, so no. Another argument goes that the lifecycle of an issue extends beyond the moment it is emitted by the tool, and we'd like to use the same format to represent the issue throughout its lifecycle, so yes.

If the second argument wins, we need to extend the introduction to say that we want the format to represent an issue throughout its lifecycle.

Add invocation.toolConfiguration property

Copied from sarif-standard/sarif-spec-v1#276, created by @lgolding:

Should there be a way for a tool whose operation is driven by a configuration (either read from a file, or a default configuration) to write that configuration to the SARIF log? If so, then one way to do it is to put it in the run object's property bag:

log.runs[0].properties["toolConfiguration"] = configurationObject;

... which will work because property bags can hold arbitrary objects.

Another option is to decide that this is a common scenario, and add it as a first-class, arbitrary-object-valued property of the run object:

log.runs[0].toolConfiguration = configurationObject;
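The two options side by side, as a TypeScript sketch with deliberately minimal, hypothetical types (`Run`, `SarifLog`, and the sample configuration are illustrative only). The consumer function shows the practical difference: with the property bag, the key name `toolConfiguration` is only a convention that producers and consumers must agree on.

```typescript
interface Run {
  properties?: Record<string, unknown>; // property bag: may hold arbitrary objects
  toolConfiguration?: unknown;          // proposed first-class property
}

interface SarifLog {
  runs: Run[];
}

// Made-up configuration, for illustration only.
const configurationObject = { maxWarnings: 100, rulesets: ["default"] };

// Option 1: put the configuration in the run's property bag.
const viaBag: SarifLog = { runs: [{ properties: { toolConfiguration: configurationObject } }] };

// Option 2: make it a first-class property of the run.
const firstClass: SarifLog = { runs: [{ toolConfiguration: configurationObject }] };

// A consumer that accepts either; with option 1 the key is merely a convention.
function readConfiguration(log: SarifLog): unknown {
  const run = log.runs[0];
  return run.toolConfiguration ?? run.properties?.["toolConfiguration"];
}

console.log(readConfiguration(viaBag) === readConfiguration(firstClass)); // true
```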

Should physicalLocation disappear?

Copied from sarif-standard/sarif-spec-v1#218, created by @lgolding:

We'd replace location.analysisTarget with location.analysisTargetUri, replace location.resultFile with location.resultFileUri, and introduce property location.region.

The rationale is that you never need a region on both the analysis target and the result file.

The initial motivation for considering this change was the introduction of run.analysisTargetUri. We originally considered calling it run.analysisTarget and making it a physicalLocation object whose region would never be used.

@michaelcfanning FYI

Represent exceptions in code flows

Copied from sarif-standard/sarif-spec-v1#265, created by @lgolding:

Enable SARIF to represent exceptions in code flows, perhaps by introducing new values for annotatedCodeLocation.kind. Take into account at least the following issues:

An exception may be thrown implicitly (for example, a ThreadAbortException), without a throw statement appearing in the source code.
An exception may be thrown explicitly.
An exception may be typed (e.g., throw new Blah()) and the thrown object may be parameterized (e.g., throw new Blah(42)).
An exception may satisfy an exception filter.
An exception may be caught by a catch clause.
An exception may be rethrown (bare throw statement in a catch clause).
An exception may cause program termination (for example, if thrown from a C++ destructor).
A finally clause may execute as control leaves a function which did not catch an exception (or, for that matter, if the function did catch the exception).

Should we allow file identity to be specified by reference to a commit...

Copied from sarif-standard/sarif-spec-v1#130, created by @lgolding:

... (in a specific repo) rather than by a hash?

This is related to sarif-standard/sarif-spec/#28, but it's about not wanting to have to hash even the few files that are mentioned in the results, whereas sarif-standard/sarif-spec/#28 is about not wanting to have to mention all the files.

CONSIDER: Adding a "project" property to the "file" object

Copied from sarif-standard/sarif-spec-v1#283, created by @lgolding:

... to represent the project to which the file belongs.

Formulate design goals w.r.t. compactness and readability

Copied from sarif-standard/sarif-spec-v1#126, created by @lgolding:

Introduce result.taxonomies

As discussed in the TC meeting of 2017/09/20: Introduce a new property result.taxonomies, as follows:

result object

taxonomies property

The result object MAY contain a property taxonomies whose value is an object. Each property name in the object SHALL denote a particular taxonomy for classifying the result. The corresponding property value SHALL be an array of one or more classification objects which specify the classification of this result with respect to the specified taxonomy.

EXAMPLE:

"results": [
  {
    ...
    "taxonomies": {
      "CWE": [                                        # an array of classification objects
        {                                             # a classification object
          "id": "CWE-369",
          "description": "Divide by zero"
        }
      ]
    }
  }
]

SARIF defines the following property names for the object contained in the taxonomies property, with the following meanings.

CWE: The Common Weakness Enumeration taxonomy [link, cite as normative]

Any property name not appearing in this list MUST begin with the prefix x-. This prefix means that the property name is not defined in the SARIF standard, and that its meaning, and the meaning of any associated classification objects, must be agreed to by convention between the producers and consumers of any SARIF log files in which it appears.

classification object

General

A classification object specifies the classification of a result with respect to a particular taxonomy. Some results might have multiple classifications with respect to a single taxonomy. See Section <result.taxonomies> for an example.

id property

A classification object SHALL contain a property named id whose value is a string that uniquely identifies this classification within the taxonomy.

description property

A classification object MAY have a property named description whose value is a string that describes this classification.
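The naming and cardinality rules above are mechanical enough to check automatically. This TypeScript sketch (`checkTaxonomies` is a hypothetical name) reports undefined taxonomy names without the `x-` prefix, empty or non-array values, and classification objects missing `id`. It cannot check that an `id` is unique within its taxonomy, since that depends on the taxonomy itself.

```typescript
interface Classification {
  id: string;
  description?: string;
}

type Taxonomies = Record<string, Classification[]>;

// Property names the draft defines; anything else must start with "x-".
const DEFINED_TAXONOMIES = new Set(["CWE"]);

// Returns a list of problems; an empty list means the object follows the draft rules.
function checkTaxonomies(taxonomies: Taxonomies): string[] {
  const problems: string[] = [];
  for (const [name, classifications] of Object.entries(taxonomies)) {
    if (!DEFINED_TAXONOMIES.has(name) && !name.startsWith("x-")) {
      problems.push(`${name}: undefined taxonomy names must start with "x-"`);
    }
    if (!Array.isArray(classifications) || classifications.length === 0) {
      problems.push(`${name}: value must be an array of one or more classification objects`);
      continue;
    }
    classifications.forEach((c, i) => {
      if (typeof c.id !== "string") problems.push(`${name}[${i}]: id is required`);
    });
  }
  return problems;
}

console.log(checkTaxonomies({ CWE: [{ id: "CWE-369", description: "Divide by zero" }] })); // []
console.log(checkTaxonomies({ MyTaxonomy: [{ id: "A1" }] }));
```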

should we require 'id' on a 'rule' instance when the 'rules' key matches it?

Copied from sarif-standard/sarif-spec-v1#210, created by @michaelcfanning:

We don't require this recapitulated information for the 'uri' in a file object where the key matches...

One of result.{message,formattedRuleMessage} is required

Copied from sarif-standard/sarif-spec-v1#259, created by @lgolding:

The section on result.formattedRuleMessage correctly says:

If the formattedRuleMessage property is present on a result, the message property (Sec. 5.14.5) shall be absent. If the message property is present on a result, the formattedRuleMessage property shall be absent.

But the section on result.message inconsistently says that result.message shall be present.
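The consistent rule, exactly one of the two properties, is easy to state as a check. In this TypeScript sketch, the shape of `formattedRuleMessage` (`formatId` plus `arguments`) is an assumption made for illustration, and `checkMessage` is a hypothetical helper.

```typescript
// Assumed shape of formattedRuleMessage, for illustration only.
interface FormattedRuleMessage {
  formatId: string;
  arguments?: string[];
}

interface Result {
  message?: string;
  formattedRuleMessage?: FormattedRuleMessage;
}

// Returns an error string, or undefined if exactly one of the two is present.
function checkMessage(result: Result): string | undefined {
  const hasMessage = result.message !== undefined;
  const hasFormatted = result.formattedRuleMessage !== undefined;
  if (hasMessage && hasFormatted) return "message and formattedRuleMessage are mutually exclusive";
  if (!hasMessage && !hasFormatted) return "one of message or formattedRuleMessage is required";
  return undefined;
}

console.log(checkMessage({ message: "Variable 'x' is unused." })); // undefined
console.log(checkMessage({})); // one of message or formattedRuleMessage is required
```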

Should result have an exception property?

Copied from sarif-standard/sarif-spec-v1#171, created by @lgolding:

... to handle the case where a dynamic analysis tool encounters an exception while executing the analysis target.

@michaelcfanning FYI

Clarify requirement for format of URI-valued properties for nested files

Copied from sarif-standard/sarif-spec-v1#257, created by @lgolding:

In the section on run.files we specify the format of the property names in the files object, including the case where the property describes a nested file. In that case, we explain how to use the URI fragment to specify the location of the nested file with respect to its parent.

But nowhere do we state that when a URI-valued property specifies a location in a nested file, the value of that property should be specified in that same way.

TypeScript files with Sarif Object declarations

Copied from sarif-standard/sarif-spec-v1#291, created by @joshbw:

It would be quite useful to have a couple of TypeScript files with the SARIF objects declared. That would make working with SARIF output super easy to script in Node.js, and, selfishly, would save me some time adding SARIF output support in DevSkim.
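A partial sketch of what such declarations might look like, limited to properties discussed on this page plus `version` and `ruleId`. It is not generated from the schema and is far from complete; the `countByRule` function shows the kind of small Node.js script the declarations would make easy to write.

```typescript
// Partial, hand-written declarations; not generated from the SARIF schema.
interface SarifLog {
  version: string;
  runs: Run[];
}

interface Run {
  rules?: Record<string, Rule>;     // keyed by rule id
  files?: Record<string, FileData>; // keyed by URI
  results?: Result[];
  properties?: Record<string, unknown>;
}

interface Rule { id: string; name?: string; helpUri?: string; }
interface FileData { mimeType?: string; contents?: string; }

interface Result {
  ruleId?: string;
  message?: string;
  formattedRuleMessage?: { formatId: string; arguments?: string[] };
  taxonomies?: Record<string, { id: string; description?: string }[]>;
}

// Example script: count results per rule across all runs in a log.
function countByRule(log: SarifLog): Map<string, number> {
  const counts = new Map<string, number>();
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      const key = result.ruleId ?? "(no rule)";
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return counts;
}

const log: SarifLog = {
  version: "1.0.0",
  runs: [{ results: [{ ruleId: "EX0001", message: "a" }, { ruleId: "EX0001", message: "b" }, { message: "c" }] }],
};
console.log(countByRule(log));
```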

Do we want an array of fingerprint contributions on result?

Copied from sarif-standard/sarif-spec-v1#15, created by @lgolding:

Action: @michaelcfanning

Register SARIF MIME type with IANA

Copied from sarif-standard/sarif-spec-v1#268, created by @lgolding:

Possibly application/sarif+json.

@michaelcfanning FYI.

Consider adding namespaces to tags

If we define a separator for tags, we can allow tool producers to serialize a hierarchical set of tags. For example, by using a forward slash, a user could tag a result with multiple CWE designations:

["CWE/973", "CWE/435"]

Similarly, a SARIF converter that transforms Polyspace Verifier output to SARIF might preserve validation-relevant annotations produced by its abstract interpretation engine like so:

"Polyspace/Dead"
"Polyspace/Open"
etc.
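Assuming `/` as the separator, a consumer could group tags by namespace as in this TypeScript sketch; `groupTags` is hypothetical. It splits only at the first separator, so deeper hierarchies stay intact in the tag value, and tags without a separator land under the empty namespace.

```typescript
// Group namespaced tags ("namespace/value") by namespace.
function groupTags(tags: string[], separator = "/"): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const tag of tags) {
    const i = tag.indexOf(separator);
    const ns = i < 0 ? "" : tag.slice(0, i); // "" = tag with no namespace
    const value = i < 0 ? tag : tag.slice(i + separator.length);
    const list = groups.get(ns) ?? [];
    list.push(value);
    groups.set(ns, list);
  }
  return groups;
}

console.log(groupTags(["CWE/973", "CWE/435", "Polyspace/Dead", "security"]));
```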

Should we add a "priority" property to result?

Copied from sarif-standard/sarif-spec-v1#243, created by @lgolding:

We've certainly discussed this before; I don't remember why we didn't add it. It doesn't sound like something a tool would be able to populate; that's what severity is for. (The tool can't know the business priorities that would lead a team to decide "Let's fix these 5 bugs first".) But it's something that a downstream triage process might populate. So far we haven't added anything like that to the format. I suggest that we consider the whole question of "properties populated after the scan is over" for a future revision of SARIF.

@TikiWan @michaelcfanning

Do we want to define a "conceptual standard"?

Copied from sarif-standard/sarif-spec-v1#280, created by @lgolding:

Do we want to separately define

an "object model", and
a definitive JSON serialization

... allowing other serializations such as XML, OR do we want to specify JSON as the only valid serialization?

Add file.contents property

Copied from sarif-standard/sarif-spec-v1#260, created by @lgolding:

To hold MIME Base-64 encoded file contents.
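A minimal TypeScript sketch of producing and reading such a value, assuming the standard `btoa`/`atob` globals (available in browsers and in Node.js 16 and later). The helper names are hypothetical; because `btoa` operates on one-byte characters, the bytes are converted to a binary string first.

```typescript
// Encode raw file bytes as Base64 for a proposed file.contents property.
function encodeContents(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode a file.contents value back into raw bytes.
function decodeContents(encoded: string): Uint8Array {
  const binary = atob(encoded);
  const out = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) out[i] = binary.charCodeAt(i);
  return out;
}

const encoded = encodeContents(new TextEncoder().encode("hi"));
console.log(encoded); // aGk=
```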

Consider: Remove Dynamic specific properties and terminology

Properties such as threadId are only relevant to dynamic analysis.
Should they be first class properties in this format, or left for users to put in catch-all property bags?

Also, I think one could argue the whole stack object and stackFrame object are both dynamic/runtime concepts. The stackFrame having properties like threadId and address also suggests these structures are for dynamic analysis results.

I'd also note that, as far as I'm aware, the terms "stack" and "stack frame" refer only to runtime data structures. I think it would be inaccurate to use them to present the static concept of calling context.

Add a help property to rule

Copied from sarif-standard/sarif-spec-v1#261, created by @lgolding:

The rule object has a helpUri property. For scenarios where a rule's help is not available via a URI, it would be nice if rule had a help property whose value was an object specifying the MIME type and text of the help content, for example:

"rules": {
  "EX0001": {
    "id": "EX0001",
    "name": "DontDoBadThings",
    "help": {
      "mimeType": "text/markdown",
      "text": "# Rule EX0001: Don't do bad things\nIt is _really_ not nice to do bad things."
    }
  }
}

"Properties" property missing from "run" object in schema

Copied from sarif-standard/sarif-spec-v1#274, created by @mj1856:

Section 5.12.15 describes a Properties property for the run object. This doesn't exist in the JSON Schema.

However, the run object contains an invocation property, and there is a Properties property on that object that is described as "Key/value pairs that provide additional information about the run." in the schema description. That one should probably say "... about the invocation".

metric type (should include a unit, such as seconds)
location object (may be logical or physical location)
metric value (may be integer or floating point)
