
sarif-spec's Introduction

README

Members of the OASIS Static Analysis Results Interchange Format (SARIF) TC create and manage technical content in this TC GitHub repository ( https://github.com/oasis-tcs/sarif-spec ) as part of the TC's chartered work (i.e., the program of work and deliverables described in its charter).

OASIS TC GitHub repositories, as described in GitHub Repositories for OASIS TC Members' Chartered Work, are governed by the OASIS TC Process, IPR Policy, and other policies, similar to TC Wikis, TC JIRA issues tracking instances, TC SVN/Subversion repositories, etc. While they make use of public GitHub repositories, these TC GitHub repositories are distinct from OASIS Open Repositories, which are used for development of open source licensed content.

Description

The purpose of the SARIF TC is to define a standard output format for static analysis tools, which will be called the Static Analysis Results Interchange Format (SARIF). This GitHub repository supports development of the draft SARIF standard. Requests for modification should be made via GitHub Issues.

A static analysis tool is a program that examines programming artifacts in order to detect problems, without executing the program. Software developers use a variety of static analysis tools to assess the quality of their programs. To form an overall picture of program quality, developers must often aggregate the results produced by all of these tools. This aggregation is more difficult if each tool produces output in a different format. A standard output format would make it feasible for developers and teams to view, understand, interact with, and manage the results produced by all the tools that they use.

Submission request from David Keaton (SARIF TC Co-Chair): we expect to populate this repository from the existing one found here: https://github.com/sarif-standard/sarif-spec.

Contributions

As stated in this repository's CONTRIBUTING file, contributors to this repository are expected to be Members of the OASIS SARIF TC, for any substantive change requests. Anyone wishing to contribute to this GitHub project and participate in the TC's technical activity is invited to join as an OASIS TC Member. Public feedback is also accepted, subject to the terms of the OASIS Feedback License.

Licensing

Please see the LICENSE file for description of the license terms and OASIS policies applicable to the TC's work in this GitHub project. Content in this repository is intended to be part of the SARIF TC's permanent record of activity, visible and freely available for all to use, subject to applicable OASIS policies, as presented in the repository LICENSE file.

Further Description of this Repository

[Any narrative content may be provided here by the TC, for example, if the Members wish to provide an extended statement of purpose.]

Contact

Please send questions or comments about OASIS TC GitHub repositories to Robin Cover and Chet Ensign. For questions about content in this repository, please contact the TC Chair or Co-Chairs as listed on the SARIF TC's home page.

sarif-spec's People

Contributors

dmk42, eddynaka, michaelcfanning, motional-charles-wilson, planetlevel, robincover, shiningmassxacc, sthagen

sarif-spec's Issues

Add ACL.annotations member

Copied from sarif-standard/sarif-spec-v1#267, created by @michaelcfanning:

An ACL could include arbitrary source annotations, which consist of an array of messages + regions. This would allow squigglies + popups to be applied on clicking any arbitrary ACL.

In our proposal, we would assume that the regions are within the same file referenced in the ACL.location. Alternatively, we could include a full PLC rather than a region.

Should we provide 'suppressionJustification'?

Copied from sarif-standard/sarif-spec-v1#254, created by @michaelcfanning:

Many in-source suppression mechanisms provide a justification field. This information may be useful to persist to a log file, for auditing purposes.

If we provide a field, we need to be cognizant that a result can be suppressed in multiple locations, e.g., both in-source and as part of an external baseline. If we add something, is it a single member relevant to in-source suppressions only? Is there a member for each suppression state? Do we provide an array of justification strings, with no particular ordering or association?

Or take @rtaket's original suggestion: we should define a suppressions array on a result, each element of which might include information such as its justification and persistence location. We could add this data as an additional member (deprecating suppressionState).
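
To make the suppressions-array option concrete, here is a minimal Python sketch of what such a result might look like and how a consumer could read it. The member names "suppressions", "kind", "justification" and "location", and both helper functions, are assumptions made for illustration; none of them is defined by the spec.

```python
# Hypothetical shape for illustration only: "suppressions", "kind",
# "justification" and "location" are assumed names, not spec-defined members.
result = {
    "ruleId": "EX0001",
    "suppressions": [
        {"kind": "inSource", "justification": "Input is validated by the caller."},
        {
            "kind": "external",
            "justification": "Accepted in the release baseline.",
            "location": "baseline.sarif",
        },
    ],
}


def is_suppressed(result):
    """Treat a result as suppressed if it carries at least one suppression."""
    return bool(result.get("suppressions"))


def justifications(result):
    """Collect every justification, keeping each tied to its suppression kind."""
    return [
        (s.get("kind"), s["justification"])
        for s in result.get("suppressions", [])
        if "justification" in s
    ]
```

Because each justification lives on its own suppression element, the ordering and association questions raised above do not arise in this shape.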

Consider restructuring SARIF to be location, not results-focused

Today, SARIF is primarily designed to render flat lists of results. The format has the beginnings of some relational structure/linkage (for example, the 'rules' and 'files' objects). If we continue down this path, you could imagine a 'locations' object to hold all referenced code locations. These locations could themselves be annotated with additional information, such as static analysis results, but also code metrics. Annotations (such as static analysis results) could themselves refer to other locations in the set, for example, by constructing a code flow associated with a result that chains a set of code locations together.

This design change is a significant departure from the current spec and would require all existing SARIF SDK/other support to be reimplemented. A more relational design (rather than being oriented towards a flat list) would make log files much less readable. SARIF already suffers from this problem to some degree (messages can refer to a formatting string, for example, that isn't inlined with the message). At a high-level, this proposal moves the SARIF standard more towards being a rich mechanism to annotate code locations and describe relationships between them, as opposed to more strictly serving the static analysis reporting scenario.

Consider adding 'rank' or 'probability' property

Some tools produce an issue rank. It is difficult to normalize rank across tools. One idea is to normalize all of these to a value from 0 to 1 (0 to 100% certainty of value). Some tools, however, provide a numeric rank with no known upper bound.
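
As a sketch of the normalization idea, the Python function below linearly rescales a rank onto the range 0 to 1. The function name and the assumption that a tool documents a finite range are mine; a rank with no known upper bound cannot be handled this way, which is exactly the difficulty noted above.

```python
def normalize_rank(rank, lower, upper):
    """Map a tool-specific rank onto [0.0, 1.0] by linear scaling.

    Assumes the tool documents a finite range [lower, upper]; ranks with no
    known upper bound cannot be normalized this way.
    """
    if upper <= lower:
        raise ValueError("upper must be greater than lower")
    clamped = min(max(rank, lower), upper)  # guard against out-of-range input
    return (clamped - lower) / (upper - lower)
```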

Define driving principles for SARIF effort

We should articulate and maintain a set of driving principles for the SARIF format. These principles should define a vision for the format in general and be useful for resolving difficult design decisions. Below is a starter list that we should refine, add to or subtract from.

  1. SARIF is primarily designed to advance the industry by providing the best direct production format possible. Aggregating results from other formats is another important scenario but secondary to direct production.

  2. SARIF defines a range of data that shall be expressed in order to best support static analysis tooling. The specification describes a JSON implementation of this standard. It should be possible to define other implementations (such as XML).

  3. SARIF is designed for static analysis tools and any concept that generally applies for this scenario shall be considered for the format. SARIF can clearly be used for many dynamic analysis scenarios and we should consider augmenting the format for this class of tooling, but not in cases where what is proposed is applicable to the dynamic analysis domain only.

  4. SARIF is domain-agnostic; that is, it does not contain objects or properties that are specific to a single domain, such as security or compliance. However, SARIF might define specific values for properties that are specific to a single domain. For example, the proposed result.taxonomies property might define a dictionary entry whose key invokes a standard classification for memory safety issues only.

  5. The SARIF design is focused on expressing results as produced by a tool at a specific point-in-time and currently excludes detailed thinking related to results management (associated result work item, false positive evaluation, etc.). These concepts may be addressed by defining or proposing 'profiles' that broaden SARIF's design surface area, contingent on progress with core work.

End-to-end results management concepts

The TC has determined that for the immediate future, we will focus strictly on SARIF as a format for expressing immediate tool results. We will track a set of additional concepts, however, that relate to managing results end-to-end (from production, through baselining, triage, scheduling, resolution, etc.). If we have sufficient time/inclination in the TC, we may address these concepts or propose a new plan to address them. We will maintain a list of concepts to address in the root message of this thread. Current concepts include:

  • Triage state. 'Not reviewed', 'Won't fix', 'Fix', etc.
  • Validation state. 'Not validated', 'False positive', 'True positive'.
  • Suppression justification (string that describes why an item is baselined/suppressed)
  • Priority, Certainty, Severity, Rank, etc.

Specify that nested paths shall be absolute

Copied from sarif-standard/sarif-spec-v1#258, created by @lgolding:

In the section on run.files, add the requirement that when a URI-valued property refers to a nested file whose location can be specified by a path relative to its parent (for example, a file within a compressed archive), the nested path should be specified as an absolute path (that is, it should begin with a slash).

Right:

"uri": "file:///C:/packages/foo.zip#/bar/baz/qux.cpp"

Wrong:

"uri": "file:///C:/packages/foo.zip#bar/baz/qux.cpp"

It doesn't actually matter, but the idea is that this is a location starting from the "root" of the parent file.
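
A consumer could check this convention with a few lines of Python. This is only a sketch: the function name is hypothetical, and it relies on the standard library's urlsplit to isolate the URI fragment that carries the nested path.

```python
from urllib.parse import urlsplit


def nested_path_is_absolute(uri):
    """Return True if the URI has no nested path or its nested path starts with '/'."""
    fragment = urlsplit(uri).fragment
    if not fragment:
        return True  # no nested file is referenced, so there is nothing to check
    return fragment.startswith("/")
```

Applied to the two URIs above, the check accepts the first and rejects the second.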

Support graphs and graph traversals

There are many graph structures in static analysis which are useful to preserve in results. For example:

  • A value flow graph
    • Show multiple values contributing to a value
  • A call graph
    • Thanks to the graph structure, we can expand callsites to see more information about their side effects

Several of the existing properties of result could be abstracted and generalized in this manner:

  • codeFlows property
  • stacks property
  • relatedLocations property
    Note that all of these have properties in common: location and message. These would be the vertices in the graph (in the case of stacks, the stackFrame objects are the vertices and the stack object provides some of the graph edges/structure).

This would also allow the format to support other information which generally fits into a graph.
Having both the codeFlows and stacks properties shows the desire for this generalization/extensibility. What other similar properties will be wanted in the future that are not currently specified?

Each vertex would need some tag to identify what it means (for example, this vertex is a stackFrame, this vertex is a value flow at an addition) and how vertices are expected to fit together (a stackFrame cannot flow into an addition, so these should not appear in the same graph).
Tools doing their own graphs (not specified in SARIF) could still have a graph of vertices with a location and a message and their own meaning.
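
The following Python sketch shows one way the tagged-vertex idea could be modeled. All class and field names are hypothetical, the location is simplified to a string, and the rule that a graph accepts only one vertex kind is a deliberate simplification of "how vertices are expected to fit together", not something the proposal specifies.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: str
    kind: str      # hypothetical tag, e.g. "stackFrame" or "valueFlow"
    location: str  # simplified to a string for this sketch
    message: str


@dataclass
class Graph:
    kind: str                                     # the one vertex kind this graph accepts
    vertices: dict = field(default_factory=dict)  # vertex id -> Vertex
    edges: list = field(default_factory=list)     # (source id, target id) pairs

    def add_vertex(self, vertex):
        # Reject vertices whose tag does not fit this graph.
        if vertex.kind != self.kind:
            raise ValueError(f"{vertex.kind} vertex not allowed in {self.kind} graph")
        self.vertices[vertex.id] = vertex

    def add_edge(self, source, target):
        if source not in self.vertices or target not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((source, target))

    def successors(self, vertex_id):
        """Return the ids of vertices directly reachable from vertex_id."""
        return [t for s, t in self.edges if s == vertex_id]
```

A tool-specific graph would fit the same shape by choosing its own kind string, since every vertex still carries only a location and a message.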

Consider: adding field for cryptographically secure digital signature

The discussion on "provenance" at the last meeting got me thinking about whether SARIF ought to allow for cryptographic digital signatures associated with analysis runs. This would yield confidence to the user that the results had not been tampered with.

We had a situation a few years ago where a user of CodeSonar wished to produce reports that were to go to a certifying authority in support of a safety assurance argument. They were worried that the HTML output was too easy to edit, and they did not trust their developers not to do so.

(In the end they settled for unsigned PDF because they thought it was sufficiently more difficult to edit that it was enough of a barrier to discourage tampering! Go figure.)

Define and document issue workflow

A preliminary proposal for driving issue workflow for spec changes. We should refine, add to and subtract from this content in discussion here.

  • All activity that may impact spec content will be tracked as issues, e.g., #1
  • We encourage discussion on the public issues. Editors will ‘curate’ by applying specific tags, ensuring content from other sources (email) is recorded, etc.
  • Editors will prepare proposed changes in separate repository branches. A clean document with revisions tracking enabled will be used for driving specific textual proposals.
  • Any issues that require time-sensitive discussion will be driven through the mailing list (this should be rare)
  • Issues that are ready for final approval or which warrant discussion on the telecon will be tagged in advance of each meeting.
  • After telecon approval/rejection, PRs will be merged or closed unmerged as appropriate.
  • This process (with additional details) will be driven by an open issue and documented in a file persisted to the repository (and will be approved at the next TC meeting)

Represent "triage status"

Copied from sarif-standard/sarif-spec-v1#22, created by @lgolding:

Do we need a field to represent an issue's triage status, e.g., "fixed", "false positive"?

One argument goes that tools would never emit this field, and we're defining a tool output format, so no. Another argument goes that the life cycle of an issue extends beyond the moment it is emitted by the tool, and we'd like to use the same format to represent the issue throughout its lifecycle, so yes.

If the second argument wins, we need to extend the introduction to say that we want the format to represent an issue throughout its lifecycle.

Add invocation.toolConfiguration property

Copied from sarif-standard/sarif-spec-v1#276, created by @lgolding:

Should there be a way for a tool whose operation is driven by a configuration (either read from a file, or using a default configuration) to write that configuration to the SARIF log? If so, then one way to do it is to put it in the run object's property bag:

log.runs[0].properties["toolConfiguration"] = configurationObject;

... which will work because property bags can hold arbitrary objects.

Another option is to decide that this is a common scenario, and add it as a first-class, arbitrary-object-valued property of the run object:

log.runs[0].toolConfiguration = configurationObject;

Should physicalLocation disappear?

Copied from sarif-standard/sarif-spec-v1#218, created by @lgolding:

We'd replace location.analysisTarget with location.analysisTargetUri, replace location.resultFile with location.resultFileUri, and introduce property location.region.

The rationale is that you never need a region on both the analysis target and the result file.

The initial motivation for considering this change was the introduction of run.analysisTargetUri. We originally considered calling it run.analysisTarget and making it a physicalLocation object whose region would never be used.

@michaelcfanning FYI

Represent exceptions in code flows

Copied from sarif-standard/sarif-spec-v1#265, created by @lgolding:

Enable SARIF to represent exceptions in code flows, perhaps by introducing new values for annotatedCodeLocation.kind. Take into account at least the following issues:

  • An exception may be thrown implicitly (for example, a ThreadAbortException), without a throw statement appearing in the source code.
  • An exception may be thrown explicitly.
  • An exception may be typed (e.g., throw new Blah()) and the thrown object may be parameterized (e.g., throw new Blah(42)).
  • An exception may satisfy an exception filter.
  • An exception may be caught by a catch clause.
  • An exception may be rethrown (bare throw statement in a catch clause).
  • An exception may cause program termination (for example, if thrown from a C++ destructor).
  • A finally clause may execute as control leaves a function which did not catch an exception (or, for that matter, if the function did catch the exception).

Introduce result.taxonomies

As discussed in the TC meeting of 2017/09/20: Introduce a new property result.taxonomies, as follows:

result object

taxonomies property

The result object MAY contain a property taxonomies whose value is an object. Each property name in the object SHALL denote a particular taxonomy for classifying the result. The corresponding property value SHALL be an array of one or more classification objects which specify the classification of this result with respect to the specified taxonomy.

EXAMPLE:

results: [
  {
    ...
    taxonomies: {
      "CWE": [                                        # an array of classification objects
        {                                             # a classification object
          "id": "CWE-369",
          "description": "Divide by zero"
        }
      ]
    }
  }
]

SARIF defines the following property names for the object contained in the taxonomies property, with the following meanings.

  • CWE: The Common Weakness Enumeration taxonomy [link, cite as normative]

Any property name not appearing in this list MUST begin with the prefix x-. This prefix means that the property name is not defined in the SARIF standard, and that its meaning, and the meaning of any associated classification objects, must be agreed to by convention between the producers and consumers of any SARIF log files in which it appears.

classification object

General

A classification object specifies the classification of a result with respect to a particular taxonomy. Some results might have multiple classifications with respect to a single taxonomy. See Section <result.taxonomies> for an example.

id property

A classification object SHALL contain a property named id whose value is a string that uniquely identifies this classification within the taxonomy.

description property

A classification object MAY have a property named description whose value is a string that describes this classification.
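
The requirements in this proposal can be checked mechanically. The Python sketch below is one possible reading of them: the function name, the error strings, and the choice to return a list of errors are assumptions, while the rules themselves (defined names or an x- prefix, a non-empty array, a required string id, an optional string description) come from the text above.

```python
KNOWN_TAXONOMIES = {"CWE"}  # the only property name the proposal defines so far


def validate_taxonomies(taxonomies):
    """Return a list of human-readable problems; an empty list means valid."""
    if not isinstance(taxonomies, dict):
        return ["taxonomies must be an object"]
    errors = []
    for name, classifications in taxonomies.items():
        # Names outside the defined list must carry the x- prefix.
        if name not in KNOWN_TAXONOMIES and not name.startswith("x-"):
            errors.append(f"{name}: unknown taxonomy names must begin with 'x-'")
        # Each value is an array of one or more classification objects.
        if not isinstance(classifications, list) or not classifications:
            errors.append(f"{name}: value must be a non-empty array")
            continue
        for index, item in enumerate(classifications):
            if not isinstance(item, dict) or not isinstance(item.get("id"), str):
                errors.append(f"{name}[{index}]: 'id' must be present and a string")
            elif "description" in item and not isinstance(item["description"], str):
                errors.append(f"{name}[{index}]: 'description' must be a string")
    return errors
```

For instance, validate_taxonomies({"CWE": [{"id": "CWE-369", "description": "Divide by zero"}]}) reports no problems, while a taxonomy named "MyTaxonomy" would be flagged for lacking the x- prefix.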

Clarify requirement for format of URI-valued properties for nested files

Copied from sarif-standard/sarif-spec-v1#257, created by @lgolding:

In the section on run.files we specify the format of the property names in the files object, including the case where the property describes a nested file. In that case, we explain how to use the URI fragment to specify the location of the nested file with respect to its parent.

But nowhere do we state that when a URI-valued property specifies a location in a nested file, the value of that property should be specified in that same way.

Consider adding namespaces to tags

If we define a separator for tags, we can allow tool producers to serialize a hierarchical set of tags. For example, by using a forward slash, a user could tag a result with multiple CWE designations:

["CWE/973", "CWE/435"]

Similarly, a SARIF converter that transforms Polyspace Verifier output to SARIF might preserve validation-relevant annotations produced by its abstract interpretation engine like so:

"Polyspace/Dead"
"Polyspace/Open"
etc.
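
A consumer could recover the hierarchy from such tags with a simple split. The Python sketch below assumes a single level of nesting and uses an empty-string key for tags that have no namespace; both choices, and the function name, are illustrative rather than part of the proposal.

```python
from collections import defaultdict


def group_tags(tags, separator="/"):
    """Group tags by the namespace that precedes the first separator."""
    grouped = defaultdict(list)
    for tag in tags:
        namespace, found, value = tag.partition(separator)
        if found:
            grouped[namespace].append(value)
        else:
            grouped[""].append(tag)  # tag without a namespace
    return dict(grouped)
```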

Should we add a "priority" property to result?

Copied from sarif-standard/sarif-spec-v1#243, created by @lgolding:

We've certainly discussed this before; I don't remember why we didn't add it. It doesn't sound like something a tool would be able to populate; that's what severity is for. (The tool can't know the business priorities that would lead a team to decide "Let's fix these 5 bugs first".) But it's something that a downstream triage process might populate. So far we haven't added anything like that to the format. I suggest that we consider the whole question of "properties populated after the scan is over" for a future revision of SARIF.

@TikiWan @michaelcfanning

Consider: Remove Dynamic specific properties and terminology

Properties such as threadId are only relevant to dynamic analysis.
Should they be first class properties in this format, or left for users to put in catch-all property bags?

Also, I think one could argue the whole stack object and stackFrame object are both dynamic/runtime concepts. The stackFrame having properties like threadId and address also suggests these structures are for dynamic analysis results.

I'd also note, as far as I'm aware, the terms "stack" and "stack frame" only refer to runtime data structures. I think it would be inaccurate to use them as a way of presenting the static concept of calling context.

Add a help property to rule

Copied from sarif-standard/sarif-spec-v1#261, created by @lgolding:

The rule object has a helpUri property. For scenarios where a rule's help is not available via a URI, it would be nice if rule had a help property whose value was an object specifying the MIME type and text of the help content, for example:

rules: {
  "EX0001": {
    "id": "EX0001",
    "name": "DontDoBadThings",
    "help": {
      "mimeType": "text/markdown",
      "text": "# Rule EX0001: Don't do bad things\nIt is _really_ not nice to do bad things."
    }
  }
}

"Properties" property missing from "run" object in schema

Copied from sarif-standard/sarif-spec-v1#274, created by @mj1856:

Section 5.12.15 describes a Properties property for the run object. This doesn't exist in the JSON Schema.

However, the run object contains an invocation property, and there is a Properties property on that object that is described as "Key/value pairs that provide additional information about the run." in the schema description. That one should probably say "... about the invocation".

Consider: Adding support for metrics

A metric would consist of three things:

  • metric type (should include a unit, such as seconds)
  • location object (may be logical or physical location)
  • metric value (may be integer or floating point)
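
The three parts listed above map naturally onto a small record type. In the Python sketch below, the class name, the field names, and the decision to carry the unit as a separate field alongside the metric type are all assumptions; the location is simplified to a string.

```python
from dataclasses import dataclass, asdict
from typing import Union


@dataclass(frozen=True)
class Metric:
    metric_type: str           # what is measured, e.g. "analysisTime"
    unit: str                  # unit of the value, e.g. "seconds"
    location: str              # logical or physical location, simplified to a string
    value: Union[int, float]   # integer or floating point


def to_json_object(metric):
    """Convert a Metric into a plain dict suitable for json.dumps."""
    return asdict(metric)
```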
