
spdx-3-model's Introduction

SPDX 3 model

The System Package Data Exchange® (SPDX®) is a standard format for communicating information about components associated with systems.

Components can include software, AI/ML models, and data today. More component types that make up modern systems are planned to be included in subsequent releases.

The prior version of this format was focused on Software, is an ISO/IEC standard (ISO/IEC 5962:2021) and has wide industry adoption as a standardized Software Bill of Materials (SBOM). All use cases supported by the prior version are supported here as well.

This repository holds the model for the information captured in SPDX version 3 standard.

Branches and Formats

The editable files are written in a constrained subset of Markdown and are stored in the main branch.

These files are automatically processed by spec-parser and the following are generated:

People who want to read the current version of the information should view the generated files, while anyone who wants to edit it should work on the Markdown files in the main branch.

Model

The SPDX model is described using profiles related to the software application. The profiles are organized as sub-directories under the ‘model’ directory.

Note:

  1. The ‘Licensing’ profile has three categories (sub-directories): ‘Licensing’, ‘SimpleLicensing’, and ‘ExpandedLicensing’.
  2. The ‘extension’ namespace (sub-directory) provides for adding information about the software application which is not otherwise covered under the SPDX model.

Profiles of the model

AI

The AI profile describes the characteristics and capabilities of the AI component of the software application. Fields include the domain of the application (banking, telecom, …), type of AI model (neural networks, logistic regression, …), industry standards compliance (ISO/IEC 42001, …), information about how the AI model is used within the application, energy consumed by the AI model, limitations of the AI model (dataset of a specific demography cannot be used by the model, …), how the model would be trained, hyperparameters, how data would be preprocessed, whether the model can explain its reasoning, whether sensitive personal information is used during the model training, metrics used for measuring model performance, decision thresholds for those metrics, autonomy type of the model (supervised, unsupervised, …), and safety risk assessment information.

Build

The Build profile contains information about the build done for the software application. Fields include the build type URI (of toolchain, platform, or infrastructure), a locally unique build identifier assigned by the developer, the entry point used to start the build, the URI of the build configuration source (if any), the digest of the build configuration source (if any), build parameters, the start and end times of the build, and the system’s environment variables at the time of the build.

Core

The Core profile describes the foundational classes and properties that are used by all profiles of the SPDX model.

Dataset

The Dataset profile describes the characteristics of the dataset(s) used by the AI system. Fields include type of the dataset (image, text, relational DB, …), how the data is collected, how the collected data would be used, size of the dataset, information about noise in the fields of the dataset and/or noise that affects the entire dataset, information about preprocessing before the dataset is constructed, sensors used to collect data, known biases in the data of the dataset, information about whether the dataset contains sensitive or identifiable personal information, anonymization methods used to mask sensitive or identifiable personal information, confidentiality levels of data points in the dataset as defined in the Traffic Light Protocol, information about the dataset update mechanism, and information about availability aspects of the dataset (public, restricted, …).

Licensing

The Licensing profile describes the aspects of licensing for the software application under three categories (sub-directories) - Licensing, SimpleLicensing, and ExpandedLicensing.

The Licensing category describes information about declared licenses and concluded (detected) licenses. The SimpleLicensing category describes information about text-formatted licenses. The ExpandedLicensing category describes information about parseable and machine-readable licenses.

Lite

The SPDX Lite profile defines a subset of the SPDX specification for use cases and workflows in some industries.

Security

The Security profile contains information about vulnerabilities and their assessments based on CVSS (versions 2, 3, and 4), EPSS, Exploit Catalog, SSVC, and VEX (affected, not affected, under investigation, and fixed categories).

Software

The Software profile contains information about files, packages, SBOMs, snippets, and artifacts of the software application.

Contribute!

For information about how to contribute to a specific profile, please see Contributing.md.

Feel free to join us and contribute!

The discussions are happening on the spdx-tech mailing list and during our weekly meetings.

All the details are in: https://spdx.dev/participate/tech/

spdx-3-model's People

Contributors

armintaenzertng, bact, bigbluehat, davaya, david-a-wheeler, dependabot[bot], edelsohn, goneall, iamwillbar, jeff-schutt, jlovejoy, jpewdev, kestewart, lumjjb, maxhbr, meretp, mkdolan, mordodemaru, nishakm, puerco, rgopikrishnan91, rnjudge, sbarnum, swinslow, timothygillespie, tsteenbe, vargenau, venkattechnologist, willarmiros, zvr


spdx-3-model's Issues

Proposal: move Extensions to their own profile

I would like to move the extensions property of elements to its own Extensions object in a new profile.


This has several advantages:

  • a consumer can decide at the profile level whether to support extensions
  • an extension can have more fields, for example pointing to documentation or carrying comments, ...
  • extensions do not lose any functionality (one can still add extensions to arbitrary elements), but can still be separated from core discussions

Punch List: Collection

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Is Collection a peer or a container of its members?

Punch List: IntegrityMethod

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Do we want to have signatures as integrity methods?
  • How are signatures going to be used (for example, element vs artifact)?
  • How can this model fit in with best practices for signing and verification?
  • What is the minimum list of capabilities that we should include?
  • How does integrity apply to element vs. collections vs. documents?
  • Can integrity be applied to anything other than sequence of bytes?
  • How can we be sure that references to other SPDX documents can keep integrity intact?

Serialization: lists of constants

The element "profile" property is a list of profile names, e.g., "core", "software", etc.

The conventional way of serializing a list of string values is as a list:

  "profile": ["core", "software"]

It has been suggested to instead require all list items to be serialized as objects with repeated constant property names:

  "profile": [
    {
      "name": "core"
    },
    {
      "name": "software"
    }
  ]

Proposal: Serialize lists with their actual values (e.g., IRIs, or enumerations such as profile name, SoftwarePurpose, etc.) rather than injecting an artificial object wrapper around those values.

Rationale: an object wrapper makes serialized data both more verbose and more difficult to read. But more importantly, the extra data provides no information - the receiver knows nothing more after seeing it than it does without seeing it. Proof: a receiver, after receiving the native list, has everything it needs to locally transform it into the list of objects if that format is needed for some reason.
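To illustrate the rationale, here is a minimal Python sketch showing that the wrapped form can be derived locally from the native list and converted back without loss. The helper names `wrap_profiles` and `unwrap_profiles` are hypothetical and not part of any SPDX tooling.

```python
from typing import Dict, List


def wrap_profiles(profiles: List[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: turn a native list into the object-wrapped form."""
    return [{"name": p} for p in profiles]


def unwrap_profiles(wrapped: List[Dict[str, str]]) -> List[str]:
    """Hypothetical helper: recover the native list from the wrapped form."""
    return [item["name"] for item in wrapped]


native = ["core", "software"]
wrapped = wrap_profiles(native)
print(wrapped)  # [{'name': 'core'}, {'name': 'software'}]
assert unwrap_profiles(wrapped) == native  # the round trip loses nothing
```

Because each direction is a pure function of the other form, the wrapper carries no extra information.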

Decision: ?

Punch List: ContextualCollection

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Does contextual collection really need a namespace map and an external map?
  • Context is an arbitrary string that describes why these files are related. Should the context be an enumeration rather than a string?
  • If BOM & SBOM have a fixed context, do we need to have them as separate classes?

Punch List: ExternalMap

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Should we change the cardinality of verifiedUsing to optional (to record where something resides without wanting to verify its integrity)? Possibly using an integrity profile?
  • Can a single element be serialized into a document with just the bytes of that document? Do you need the bytes of two elements in serialized data, or just the bytes of one element?
  • Is Document only for serialization?
  • In git, content goes through a filter for normalization before it is hashed.
  • Do we need to distinguish between external map (which SPDX understands) and external reference (which is opaque to SPDX and is not followed further)? Should ExternalMap be renamed ExternalSPDXMap or something similar to imply that understanding?
  • Some confusion about external id vs. Element URL
  • Should document hang off element and namespacemap and externalmap hang off document?

Punch List: Annotation

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Should annotation inherit profile from element?
  • Have we articulated all the annotation types for 3.0 (currently Review and Other)?

Just for inspiration: a (maybe) equivalent model using composition instead of inheritance

(This is very WIP)

Hey,

just some brainstorming that I am doing in https://github.com/maxhbr/spdx-3-model.compositional. Just for fun, I tried multiple times to implement the current state of the SPDX3 model (92c25fc) and failed because of the complicated inheritance. It fell apart especially when I tried to add a second profile. I had more success with the following compositional model (Element is a sealed trait, an enum, or an algebraic data type):
[Figure: compositional model]
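The linked repository's model is not reproduced here. As a rough, hypothetical illustration of the idea (all class and field names are assumptions, not the SPDX model), composition can mean that an element is one variant of a closed set of kinds, and the shared core fields live in a separate record that each variant holds, rather than being inherited:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class CoreInfo:
    # Fields every element carries, held by composition rather than inheritance.
    spdx_id: str
    name: Optional[str] = None


@dataclass
class Package:
    core: CoreInfo
    version: Optional[str] = None


@dataclass
class File:
    core: CoreInfo
    path: str = ""


@dataclass
class Sbom:
    core: CoreInfo
    elements: List["Element"] = field(default_factory=list)


# A closed union of element kinds, approximating a sealed trait or enum.
Element = Union[Package, File, Sbom]


def describe(e: Element) -> str:
    # Dispatch on the concrete variant instead of walking a class hierarchy.
    if isinstance(e, Package):
        return f"package {e.core.spdx_id}"
    if isinstance(e, File):
        return f"file {e.core.spdx_id} at {e.path}"
    return f"sbom {e.core.spdx_id} with {len(e.elements)} element(s)"
```

Adding a profile in this style would mean adding a variant or an optional component record, without touching an inheritance chain.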

Is there a datatype for SPDXIDs?

I see that on element SPDXID: IRI, but not every IRI is an SPDXID, so I wonder whether there should be a special datatype with a defined structure. A definition would be especially helpful if someone substitutes relations like rootElement with an IRI instead of the whole Element.
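One way to make the distinction explicit is a dedicated wrapper type that validates on construction. The sketch below is hypothetical: the SPDX model does not define these rules, and both checks (the IRI must be absolute, and it must contain no whitespace) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SpdxId:
    """Hypothetical SPDXID datatype: an IRI plus extra, assumed constraints."""

    iri: str

    def __post_init__(self) -> None:
        # Assumed rule 1: must be absolute, i.e. have a scheme such as urn or https.
        if not urlsplit(self.iri).scheme:
            raise ValueError(f"not an absolute IRI: {self.iri!r}")
        # Assumed rule 2: no whitespace anywhere in the identifier.
        if any(ch.isspace() for ch in self.iri):
            raise ValueError(f"whitespace in SPDXID: {self.iri!r}")
```

A property such as rootElement could then be typed as `SpdxId` rather than as a bare IRI, so that a reference is validated in the same way as an element's own identifier.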

Handling NONE and NOASSERTION values in a consistent manner

In the SPDX 2.X spec, several fields accept the values NONE and NOASSERTION, each with a specific semantic meaning, in addition to either a text value or a class.

This is defined in RDF as a superclass restriction - you can find an example for SPDX Item in the License Concluded property. This may not be the cleanest approach since we're mixing individual values with other types (e.g. xsd:string) and the reasoning is rather clunky IMHO.

@swinslow is proposing having an abstract class for each of the properties where the values can include the NONE and NOASSERTION values which may yield a slightly different RDF model.

There may also be other alternatives.

Whichever approach we take, we should probably be consistent.
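To make "consistent" concrete, here is a hypothetical Python sketch of one uniform encoding. The special values are members of a small enumeration, and every property that admits them is typed as the union of that enumeration and its ordinary value type. The names are illustrative only. This is not the RDF restriction approach, although it is close in spirit to the abstract-class proposal.

```python
from enum import Enum
from typing import Union


class Special(Enum):
    NONE = "NONE"                # asserted: there is no value
    NOASSERTION = "NOASSERTION"  # no claim is made either way


# The same pattern would be reused for every property that admits special values.
LicenseValue = Union[Special, str]


def parse_license(raw: str) -> LicenseValue:
    # Map the reserved strings to enum members; everything else stays as text.
    try:
        return Special(raw)
    except ValueError:
        return raw


def is_known(value: LicenseValue) -> bool:
    # Only an ordinary value carries actual license information.
    return not isinstance(value, Special)
```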

Related to issue #71

@sbarnum @davaya @zvr - Interested in any thoughts you have on this issue.

Question: How would "inferred" or "implicit" creationInfo in a serialization work?

(I know that this is a serialization question. But we are right now acting under the assumption that serialization can answer this question so I think it is fair to ask it already)

Hey, I came up with the following example:

Let's say you have an SPDXDocument called SPDXDocument containing an SBOM called SBOM, which in turn contains a Package called Package.

This can be encoded either completely inlined or the Package can be pulled one level up and just be referenced in the SBOM element. And both are parsed to the same internal representation.

So, in some kind of pseudo code it could look like:

SPDXDocument {
  SBOM {
    Package
  }
}

or

SPDXDocument {
  SBOM {
    PackageRef
  },
  Package
}

Now, my question is the following:

  • Assuming that SPDXDocument and SBOM have different creationInfos, which of them would be inferred for Package?

And as a potential follow up / equivalent question:

  • If a package, or a reference to a package, appears in multiple collections, which creationInfo should be inferred?
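The ambiguity can be made concrete. The sketch below assumes one hypothetical rule, "inherit from the nearest enclosing container", and applies it to both encodings. The dictionaries and the rule are illustrative and are not part of any SPDX serialization.

```python
from typing import Dict, Optional


def infer_creation_info(node: Dict, target: str,
                        inherited: Optional[str] = None) -> Optional[str]:
    """Return the creationInfo the target would get under a hypothetical
    nearest-enclosing-container rule (None if the target is not found)."""
    current = node.get("creationInfo", inherited)
    if node.get("id") == target:
        return current
    for child in node.get("children", []):
        found = infer_creation_info(child, target, current)
        if found is not None:
            return found
    return None


inlined = {"id": "SPDXDocument", "creationInfo": "doc-info", "children": [
    {"id": "SBOM", "creationInfo": "sbom-info", "children": [{"id": "Package"}]}]}

referenced = {"id": "SPDXDocument", "creationInfo": "doc-info", "children": [
    {"id": "SBOM", "creationInfo": "sbom-info",
     "children": [{"id": "PackageRef", "ref": "Package"}]},
    {"id": "Package"}]}

# Same logical content, different answers under this rule.
print(infer_creation_info(inlined, "Package"))     # sbom-info
print(infer_creation_info(referenced, "Package"))  # doc-info
```

Under this rule, the result depends on the shape of the serialization and not only on its logical content, which is the inconsistency the question points at.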

Punch List: Extension

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Should there be a map of extension id to class?
  • How do extensions relate to profiles?
  • Does an extension inherit the data license field?
  • How do you serialize an extension without knowing its type? This will need to be explained explicitly and documented.

Cyclic dependency of definitions in the Model

In 92c25fc there is the following cyclic dependency, which is annoying when implementing:

  • the Element definition depends on the definition of CreationInformation
  • the CreationInformation contains a createdBy field of type Actor and thus depends on its definition
  • Actor itself derives from Element so it is dependent on its definition

This causes a cyclic dependency between implementing modules which is really bad in some programming languages and at least annoying in others.
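In languages that allow forward references within one module, a common workaround is to keep the mutually dependent definitions together. Below is a minimal Python sketch with class and field names simplified from the issue. It is not a complete model.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CreationInformation:
    # Refers to Actor, which is defined later; postponed annotations allow this.
    created_by: List[Actor] = field(default_factory=list)


@dataclass
class Element:
    spdx_id: str
    creation_info: Optional[CreationInformation] = None


@dataclass
class Actor(Element):
    # Actor derives from Element, closing the cycle inside a single module.
    pass
```

Splitting these three definitions across modules would bring back the import cycle described above. Another option, in principle, is to have createdBy hold Actor identifiers (IRIs) instead of Actor objects, which removes the type-level dependency.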

Can one parse a JSON containing Elements extended by an unsupported / unknown profile?

Hey, I have looked at the following example taken from the current model diagram (PNG), and I have a question about the statement "one does not need to support every profile and just loses information that is not relevant".

{
  "@type": "SBOM",
  "@id": "urn:spdx.dev:null-sbom",
  "creationInfo": {
    "specVersion": "3.0",
    "created": "2022-05-02T20:28:00.000Z",
    "profile": ["core","software"],
    "dataLicense": "CC0",
    "createdBy": ["urn:spdx.dev:iamwillbar"]
  },
  "rootElements": ["urn:spdx.dev:spdx-tools-3.0.1"],
  "externalMap": [ <...> ],
  "elements": [ <...> ]
}

For this example, let's assume that I do not support the software profile. Since an SBOM is basically a BOM, I would assume that I should still be able to parse it (despite losing some information that is not relevant to me).

But how do I know that it is basically a BOM, the JSON does not contain that information?

(A rather ugly workaround would be to have something like "@type": "SBOM:BOM:Bundle:Collection:Element" in the serialization.)
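If that workaround were adopted, a consumer could walk the type chain until it finds a class it supports. The following is a hypothetical Python sketch. The colon-separated `@type` convention is the workaround described above and is not part of the SPDX serialization.

```python
from typing import Iterable, Optional


def resolve_type(type_chain: str, supported: Iterable[str]) -> Optional[str]:
    """Return the most specific supported class in a colon-separated
    chain such as "SBOM:BOM:Bundle:Collection:Element"."""
    known = set(supported)
    for name in type_chain.split(":"):
        if name in known:
            return name
    return None  # nothing in the chain is understood


print(resolve_type("SBOM:BOM:Bundle:Collection:Element", ["BOM", "Element"]))  # BOM
```

With a plain `"@type": "SBOM"`, the chain has only one entry, so `resolve_type("SBOM", ["BOM"])` has nothing to fall back to unless the consumer can look up the class hierarchy elsewhere. That is the problem raised here.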

SPDX 2->3 conversion of ExternalPackageReferences

This is part of #74:

Edit: updated after I understood that ExternalPackageReference will be split into two new properties. Some references will convert to ExternalIdentifier, others to ExternalReference. See the migration documentation for a conversion guideline.

There remain two problems when the category is OTHER:

  • In this case, type can be String-valued. Should this then be converted to SPDX3 ExternalIdentifierType.other or ExternalReferenceType.other?
  • Also in this case, locator can be an arbitrary String (without spaces) in SPDX2, but the SPDX3 ExternalReference.locator must be a URI. ExternalIdentifier.identifier would be a String, but can we be sure that the SPDX2 reference of category OTHER is actually an identifier?
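One possible heuristic is sketched below. It is an assumption, not the rule in the migration documentation. An OTHER locator becomes an ExternalReference only when it parses as an absolute URI; otherwise it is kept as an ExternalIdentifier string. The returned dictionaries are hypothetical shapes, not the SPDX 3 serialization, and the original SPDX 2 type string is carried along so that it is not lost.

```python
from typing import Dict
from urllib.parse import urlsplit


def convert_other_ref(ref_type: str, locator: str) -> Dict[str, str]:
    """Hypothetical SPDX 2 -> 3 mapping for a reference of category OTHER."""
    parts = urlsplit(locator)
    if parts.scheme and (parts.netloc or parts.path):
        # Parses as an absolute URI, so it can serve as ExternalReference.locator.
        return {"kind": "ExternalReference", "type": "other",
                "locator": locator, "originalType": ref_type}
    # Otherwise keep it as an opaque string identifier.
    return {"kind": "ExternalIdentifier", "type": "other",
            "identifier": locator, "originalType": ref_type}
```

The heuristic cannot answer the second question with certainty. A string such as `acme:1234` parses as a URI with scheme `acme`, even if its author meant it as an identifier.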

Move "verifiedUsing" from Element to Artifact

Based on snippet punch list discussion:

  1. "verifiedUsing" always applies to content (sequence of bytes), never to metadata (element properties)
  2. only Artifact element types have content. (Annotation, Relationship, Collection, Actor, and Identity do not.)
  3. gitoid is a hash algorithm and can be added to verifiedUsing algorithm list (hashes can be used as content unique ids, but not as artifact ids)
  4. multiple artifacts that have the identical content / hash can be linked using a COPY relationship
  5. verifiedUsing is not a content unique identifier because different signatures apply to the same content. But it does apply only to content (including hardware), not to metadata.

Since "verifiedUsing" does not apply to most elements, move it from Element to Artifact, removing confusion that it might apply to metadata. SpdxDocument has content; it must either extend Artifact or have a "verifiedUsing" property.
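For reference, a gitoid for a blob is the hash that git computes over a short header followed by the content: `blob <size>\0` and then the bytes. Below is a minimal Python sketch of the SHA-1 variant. Git also supports SHA-256 object IDs, which are computed the same way with a different hash function.

```python
import hashlib


def gitoid_sha1(content: bytes) -> str:
    """Compute git's blob object ID (SHA-1) for the given content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has the same, well-known ID in every SHA-1 git repository.
print(gitoid_sha1(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The value depends only on the content. Two artifacts with identical bytes therefore get the same gitoid, which fits point 4 (linking copies) and is why it cannot serve as an artifact ID (point 3).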

Punch List: BOM & SBOM

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Why do we need these? Marketing/expectation management.
    • Package artifact is always the top level. Everything that follows from it is a SBOM.
      SBOM is about the package. Package is about the package, it's also not the package.
  • Are SBOM and Package the same thing?
  • Does having an SBOM replace the need for document DESCRIBES relationship?
  • How does SBOM relate to document?
  • Is SBOM the same as artifact? What does an artifact describe?
  • Artifact could be a service, device, feature.
  • Is defining feature of BOM that it is comprehensive?
    • Is this BOM ready to ship, does it capture everything it is meant to provide? SBOM would just be software?

Punch List: ExternalReference

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Do we need Comments on External Reference?
  • We need to define the types of external references.

Punch List: Artifact

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Should this have a Created property, distinct from the creation time of the Element?
  • Should we separate create time of actual file, vs creation of meta data (element) about the actual file?
  • Should created by
  • Created time on reproducible builds? New SBOM for exactly same set of inputs? What are the use cases where it is used?
  • How are elements created related to contextual collection? Creation date on collection.
  • Could an SBOM be considered an Artifact?
  • How do ranges of artifacts get represented? Is it only packages?
  • Do artifacts have to be "real"? Expressing version ranges that don't yet exist? ie. future?
  • It depends on a version range, but we can't tell you what it is? PURLS, CPEs, wildcards, how handle.
  • Common set of relationships between artifacts - ie. files associated with package. Overlapping with relationships - document? handle in serialization specification? How do we model this? Example: Snippet is from file becomes "contains". Direct properties vs. handle. Example if creation an SBOM that only had an annotation, need from as well as to.

How to add profile data to an existing SBOM or Element

This scenario came up in the Asia SPDX tech call on Jan 9, 2023.

An entity which is in the middle of a software supply chain receives an SPDX document and wishes to add profile information (e.g. the Build profile) before passing the information on to downstream consumers.

In SPDX 2.X, you would produce an SPDX Document which has an AMENDS relationship to the supplied SPDX document.

Since each element has creator information and can be independent of the Document, I would like to document the best practice for this scenario.

A proposal would be that each Element (be it a Package, SBOM, File, etc.) would be updated in the new document with the Profile information. The Profile would also be added to the Creator information for the element. An AMENDS relationship would be added from the new Element to the previous Element using the ExternalMap to reference the previous Element.

There are a couple of downsides to the above proposal:

  • It is difficult to tell exactly what has changed; you would essentially need to "diff" the Elements
  • You end up copying all the information from one element to another

I'm open to any other proposals or variations on a theme.

Below is an example:

Original Supplier element:

{
  ...
  "elements": [
    {
      "@type": "Package",
      "@id": "urn:originalsupplier.org:packagea#elementid",
      "creationInfo": { ..., "profile": ["core", "software"], ... },
      "name": "myPackage",
      ...
    }
  ],
  ...
}

Intermediate supplier element

{
  ...
  "externalMap": [{"elementId": "urn:originalsupplier.org:packagea#elementid"}],
  "elements": [
    {
      "@type": "Package",
      "@id": "urn:intermediateSupplier.org:packagea#uniqueid2",
      "creationInfo": { ..., "profile": ["core", "software", "build"], ... },
      "name": "myPackage",
      "buildDate": "2022-12-13T18:36:37.793Z",
      ...
    }
  ],
  ...
}
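Tooling can partly mitigate the first downside. The hypothetical Python sketch below reports which properties an amending element adds, removes, or changes relative to the original. It ignores `@id` and `creationInfo`, on the assumption that those are expected to differ.

```python
from typing import Any, Dict, Tuple

IGNORED = {"@id", "creationInfo"}


def diff_elements(old: Dict[str, Any],
                  new: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each changed property to (old value, new value).
    None stands for "absent" (so an explicit null would look the same)."""
    keys = (old.keys() | new.keys()) - IGNORED
    return {k: (old.get(k), new.get(k))
            for k in sorted(keys) if old.get(k) != new.get(k)}


original = {"@type": "Package",
            "@id": "urn:originalsupplier.org:packagea#elementid",
            "name": "myPackage"}
amended = {"@type": "Package",
           "@id": "urn:intermediateSupplier.org:packagea#uniqueid2",
           "name": "myPackage",
           "buildDate": "2022-12-13T18:36:37.793Z"}

print(diff_elements(original, amended))
# {'buildDate': (None, '2022-12-13T18:36:37.793Z')}
```

This addresses only the comparison. The second downside (copying all properties into the new element) remains.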

Punch List: Package

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Review 2.2 Package and make sure not dropping use cases?
  • Should download location be considered as an external reference?
  • Should homepage be considered as an external reference or a property?
  • Is package-file-name a relationship or a property?
  • Is supplier a relationship? (Note: this may require defining some new relationships.)
  • Is files-analyzed replaced by profiles?

Punch List: Meta

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Can you serialize a single element with nothing else?
  • Should we have properties that duplicate relationships? And impact on round-tripping?

Serialization: Creation Information

Some time ago we moved creation-related properties from Element to the CreationInformation class because 1) they are related in purpose and 2) it makes Element easier on the eyes. But that raises the question of default values - if Element has a creationInfo property of type CreationInformation, then its value is treated as a single unit, not five or six separate property values. That won't work because the requirement is for each property to have an individual default value that doesn't need to be sent even if another property is not defaulted.
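The requirement can be illustrated with a small, hypothetical Python sketch. When defaults are merged property by property, a sender can omit any subset of the creation fields. When creationInfo is treated as a single unit, sending one non-default field means every other field must be sent as well. The field names and default values here are placeholders.

```python
from typing import Any, Dict

# Hypothetical document-level defaults for creation-related properties.
DEFAULTS: Dict[str, Any] = {
    "specVersion": "3.0",
    "created": "2022-05-02T20:28:00.000Z",
    "createdBy": ["urn:spdx.dev:iamwillbar"],
}


def merge_per_property(sent: Dict[str, Any]) -> Dict[str, Any]:
    # Each property falls back to its own default independently.
    return {**DEFAULTS, **sent}


def merge_as_unit(sent: Dict[str, Any]) -> Dict[str, Any]:
    # The object is either fully defaulted or fully replaced: no partial defaults.
    return dict(sent) if sent else dict(DEFAULTS)


sent = {"created": "2023-01-09T00:00:00Z"}
print(merge_per_property(sent)["specVersion"])  # 3.0
print("specVersion" in merge_as_unit(sent))     # False
```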

Two potential solutions:

  1. Move the individual creation-related properties back into Element
  2. Define a "macro" modeling notation that substitutes one value for another, e.g.:
        Element
          + SPDXID: IRI
          + name: String [0..1]
          + summary: String [0..1]
          + description: String [0..1]
          + comment: String [0..1]
          #include CreationInformation
          + verifiedUsing: IntegrityMethod [0..1]
          ...

where the #include macro substitutes the group of properties called CreationInformation into Element, and there is no actual CreationInformation class.

If we did that, I'd shorten Element even more by defining a DescriptiveInformation macro with the summary, description, and comment properties.

Pros of #1: doesn't need any new modeling conventions or tooling support
Cons of #1: Element is bloated
Pros of #2: it's cool
Cons of #2: needs work to invent, macro properties must be de-duplicated (e.g. creationComment)

SPDX 2->3 conversion of PackageFileName

Part of #74:

  • Will this be a Relationship to a file? packageFileName may also point to a folder, though.
  • If a future integrity profile requires a checksum, it may not be possible to translate to a file relationship since the checksum is optional in SPDX2.

Gary's comment from #74:

We were thinking of using a relationship. If it is a folder, then we would probably create a separate "package" definition to represent the folder? The folder scenario is one I have not thought through and probably deserves more discussion.

This has been discussed in the tech team call on 2023-02-21.

This has been discussed in the tech team call on 2023-03-07:
Gary's migration document describes the conversion process.
Open issue is still the case that packageFileName may also point to a folder.

Punch List: Document

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Are we losing any information compared to 2.2 when serialized?
  • Should it be considered as an Artifact? Document artifact is a collection of elements and can refer to as an artifact.
  • Is it to describe the document or is it a document?
  • Can you serialize a single element by itself?
  • Should namespace be in a collection or a Document? A document represents a serialization of one or more elements and is namespace specific.
  • External map - how does it relate to a collection. Document inherits collection, Collection is abstract.
  • Namespace map should go on document because they are specific to serialization.
  • Why would you have a BOM rather than a ContextualCollection?
  • Henk: Document and contextual collection have no semantic difference, when doing the model. Parsers trying to automate may have concerns. William: the semantic is the definition.
  • Contextual collection: shared context. Does not represent a serialization.
  • Document: no implication that there is a shared context. Represents a serialization. You can checksum it. It is physical, and namespaces. Does not need to contain an SBOM at all - it could be a list of licenses.
  • Does the document represent some form of serialization for verification? So that it can be referred to in the future? Round tripping (meta data about the serialization is important).
  • A Document, being a serialized representation of elements, makes more sense to show as an artifact. Could have a stream artifact. Collection says that this set of bytes has these elements in it. Document is the stream of bytes.
  • An element called document does not appear in a serialized set of bytes. License elements.
  • An annotation has creator, summary, etc.
  • A single Annotation Element has creation info. Five annotation Elements each have their creation info that may be the same or different. One or more Elements can be serialized into document bytes. Those bytes are described by a File (Artifact) Element that has creation info for the aggregation separate from its members.
  • Reasons for metadata about serializations: data licensing, copyright, round tripping, ...
  • Is it necessary for data licenses to be involved in the document? (related to serialized bytes).
    • The element license applies to the element, not to the document collection. Not comfortable that this can be solved in a contextual collection.
  • Can the license info for the serialized data be a property of the collection?

Serialization: friendly or unfriendly properties?

IBM coined the term "unfriendly" for serializations that use repeated constant property names, and "friendly" where property names directly identify the property. For example, an unfriendly email header would be:

"header": [
  {"header_name": "from", "value": "[email protected]"},
  {"header_name": "to", "value": "[email protected]"}
]

The friendly alternative uses names to directly identify values:

"header": {
  "from": "[email protected]",
  "to": "[email protected]"
}

In SPDX this affects the serialization of types such as Hash ("algorithm", "hashValue") and NamespaceMap ("prefix", "namespace"). The logical model is unaffected; this issue concerns only serialization.

The unfriendly serialization of NamespaceMap is:

"namespace": [
  {"prefix": "acme", "namespace": "http://sboms.acme.org"}
]

and the friendly version is:

"namespace": {
  "acme": "http://sboms.acme.org"
}

Proposal: where repeated constant property names appear in lists of {"tag": x, "value": y} pairs, serialize them as object properties {x: y}.
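A minimal Python sketch of the proposed transformation follows; the function names are hypothetical. It also brings out a constraint the decision should cover: the friendly form works only when the tag values are unique within a list. That holds for NamespaceMap prefixes, but it would need checking for every type converted this way (email headers, for example, can legitimately repeat).

```python
from typing import Dict, List


def to_friendly(pairs: List[Dict[str, str]], tag: str, value: str) -> Dict[str, str]:
    """Convert [{tag: x, value: y}, ...] into {x: y, ...}."""
    result: Dict[str, str] = {}
    for item in pairs:
        key = item[tag]
        if key in result:
            # Object keys must be unique, so repeated tags cannot be represented.
            raise ValueError(f"duplicate {tag}: {key!r}")
        result[key] = item[value]
    return result


def to_unfriendly(obj: Dict[str, str], tag: str, value: str) -> List[Dict[str, str]]:
    """Inverse conversion back to a list of tag/value objects."""
    return [{tag: k, value: v} for k, v in obj.items()]


ns = [{"prefix": "acme", "namespace": "http://sboms.acme.org"}]
print(to_friendly(ns, "prefix", "namespace"))  # {'acme': 'http://sboms.acme.org'}
```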

Decision: ?

drop `Payload --element--> Element`, since it is redundant?

Why is there an element relation between Payload and Element? Why is the element relation between Collection and Element not sufficient?

Furthermore, this element relation is n-to-m, and since everything derives from Payload, arbitrary element relations between arbitrary Elements are possible. That is even more flexible than Collection, which feels wrong.


Punch List: File

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Review 2.x File and make sure not dropping use cases?
  • Are there any media types missing?
  • Review Software purpose list and have definitions?

Punch List: Hash

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Do we have all the hash algorithms we want?
  • Should hash value be string? (should serializations supporting binary data be optimized?)
  • Do we want to include interpretations in the strings in the logical model?
  • What are the serialization rules? (keeping it distinct from serialization - byte level)
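On the string question: a digest is raw bytes, and common text encodings such as lowercase hex or base64 round-trip those bytes without loss. A string-typed hash value therefore loses nothing, provided the encoding is fixed. A short illustration using only the Python standard library:

```python
import base64
import hashlib

digest = hashlib.sha256(b"").digest()                 # 32 raw bytes
hex_value = digest.hex()                              # 64-char lowercase hex string
b64_value = base64.b64encode(digest).decode("ascii")  # 44-char base64 string

# Both encodings decode back to the identical bytes.
assert bytes.fromhex(hex_value) == digest
assert base64.b64decode(b64_value) == digest
print(hex_value)  # e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
```

The serialization rules would still need to fix the encoding, and for hex also the letter case. Otherwise equal digests could be serialized as different strings.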

SPDX 2->3 conversion of filesAnalyzed

Part of #74:

There is no integrity profile to take the place of requiring checksums for files. Some uses of filesAnalyzed indicate that a certain level of tooling was applied to the source. We would lose this information if this field is removed.

Gary's comment from #74:

I wonder if the build profile would service this purpose?

Brandon Lum's comment from #74:

The build profile has some established relationships on the files themselves. So the information would be encodable by the build profile relationships. However, that is just the ability to express the relationship. I'd be curious to discuss what the integrity story would look like (as it ties into verification). I think that's probably something to discuss... since with the build profile, identities may be more fine-grained and harder to manage.

Punch List: NamespaceMap

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Is there a default namespace? Will empty prefix suffice?
  • Namespace map can have explicit or blank prefix - can a blank prefix refer to all the documents in the document? Elements in a document can be from any namespace.
  • How does the namespace map interact with the external map? External map has integrity map. Validation
  • Should namespace map be on collection or document?
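To make the default-namespace question concrete, here is a hypothetical Python sketch in which an empty prefix serves as the default namespace for bare identifiers. The `prefix:local` syntax, the empty-prefix rule, and the pass-through of unknown prefixes are all assumptions for illustration, not decisions of the model. The namespace values here end with a separator so that plain concatenation produces a usable IRI.

```python
from typing import Dict


def expand_id(compact: str, namespaces: Dict[str, str]) -> str:
    """Expand "prefix:local" (or a bare "local") into a full IRI."""
    prefix, sep, local = compact.partition(":")
    if not sep:
        # No colon: use the assumed default (empty-prefix) namespace.
        prefix, local = "", compact
    if prefix not in namespaces:
        # Unknown prefix: assume the value is already a full IRI.
        return compact
    return namespaces[prefix] + local


ns = {"acme": "http://sboms.acme.org/", "": "http://example.org/doc#"}
print(expand_id("acme:pkg1", ns))  # http://sboms.acme.org/pkg1
print(expand_id("pkg2", ns))       # http://example.org/doc#pkg2
```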

SPDX 2->3 conversion of excludedFiles in PackageVerificationCode

In SPDX2 we could express a list of excluded files, how is this possible in SPDX3?

Gary's comment from #74:

The idea is we would have a separate verification implementation for packages that would include this information. The work for this hasn't been done.

This has been discussed in the tech team call on 2023-02-28. It is not clear yet whether PackageVerificationCode will be carried over to SPDX3.
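For context on why the excluded list matters: as we understand the SPDX 2.x algorithm, the verification code is the SHA1 of the sorted, concatenated SHA1 values of the package's files, with excluded files left out. A consumer can only recompute the code if it knows which files were excluded, so SPDX3 would need somewhere to carry that list if PackageVerificationCode is carried over. The Python sketch below is an illustration, not a reference implementation; the SPDX 2.3 specification holds the normative definition.

```python
import hashlib
from typing import Dict, Set

def package_verification_code(files: Dict[str, bytes], excluded: Set[str]) -> str:
    """SPDX 2.x-style package verification code (sketch).

    `files` maps a file path to its content; paths in `excluded` are
    skipped, which is the information excludedFiles recorded in SPDX 2.
    """
    sha1s = [
        hashlib.sha1(content).hexdigest()
        for path, content in files.items()
        if path not in excluded
    ]
    # Sort the per-file SHA1 values ascending, concatenate them without
    # separators, then take the SHA1 of the resulting string.
    return hashlib.sha1("".join(sorted(sha1s)).encode("ascii")).hexdigest()
```

Because of the sort, the result does not depend on file order, but it does depend on exactly which files were excluded.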

Punch List: Snippet

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Define byte range, continue to follow W3C standard?
  • Define snippet location - possibly as abstract class: byte range, line range, W3C, etc.
  • Will the snippet location class be able to model W3C?
  • Review 2.2 Snippet to make sure not dropped.
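One way to read the second bullet is an abstract location class with concrete subclasses. The Python sketch below shows that shape with byte-range and line-range subclasses; the class names and the 1-based, inclusive offsets are assumptions for illustration, and a W3C-based location would be another subclass.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

class SnippetLocation(ABC):
    """Hypothetical abstract base for the ways a snippet can be located."""

    @abstractmethod
    def extract(self, content: bytes) -> bytes:
        """Return the snippet's bytes from the containing file's content."""

@dataclass(frozen=True)
class ByteRange(SnippetLocation):
    begin: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def extract(self, content: bytes) -> bytes:
        return content[self.begin - 1 : self.end]

@dataclass(frozen=True)
class LineRange(SnippetLocation):
    begin: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive

    def extract(self, content: bytes) -> bytes:
        lines = content.splitlines(keepends=True)
        return b"".join(lines[self.begin - 1 : self.end])

content = b"line1\nline2\nline3\n"
print(ByteRange(1, 5).extract(content))  # b'line1'
```

Having both subclasses on one snippet raises the question of which wins when they disagree; one answer is to treat the byte range as authoritative.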

Serialization needs property and enumeration ids

While the logical model is intended to be independent of serialization, serialized data needs to be defined in a way that can be hashed consistently. For example: Class files define properties:

SPDX-License-Identifier: Community-Spec-1.0

# Annotation

## Summary

An assertion made in relation to one or more elements.

## Description

An Annotation is an assertion made in relation to one or more elements.

## Metadata

- name: Annotation
- SubclassOf: Element
- Instantiability: Concrete

## Properties

- annotationType
  - type: AnnotationTypeVocab
  - minCount: 1
  - maxCount: 1
- contentType
  - type: MediaType
- statement
  - type: xsd:string
  - minCount: 0
  - maxCount: 1
- subject
  - type: Element
  - minCount: 1
  - maxCount: 1

The properties logically have no ordering, but when serializing the order matters:

{ "subject": "http://acme.com/sboms/1948294/package59", "annotationType": "REVIEW", "statement": "Awesome!" }

is a different serialized value than:

{"statement": "Awesome!", "annotationType": "REVIEW", "subject": "http://acme.com/sboms/1948294/package59" }

and a different value than:

[ "http://acme.com/sboms/1948294/package59", "REVIEW", "Awesome!" ]

even though all are equivalent JSON serializations of the identical annotation.
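The ordering problem can be demonstrated in a few lines of Python: hashing the two object serializations gives different results, although they parse to the same value. Sorting keys with fixed separators is a simple normalization shown only for comparison; it is not a full canonicalization scheme such as RFC 8785 (JCS), and it does not help with the array form, which needs an agreed property order.

```python
import hashlib
import json

a = '{ "subject": "http://acme.com/sboms/1948294/package59", "annotationType": "REVIEW", "statement": "Awesome!" }'
b = '{"statement": "Awesome!", "annotationType": "REVIEW", "subject": "http://acme.com/sboms/1948294/package59" }'

def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def canonical(text: str) -> str:
    # Sorted keys and fixed separators: a simple normalization, not JCS.
    return json.dumps(json.loads(text), sort_keys=True, separators=(",", ":"), ensure_ascii=False)

assert sha256(a) != sha256(b)                        # raw serializations differ
assert json.loads(a) == json.loads(b)                # same logical annotation
assert sha256(canonical(a)) == sha256(canonical(b))  # normalized forms agree
```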

PROPOSAL:
Add an id field to all Property definitions to enable the model files to be the single source of truth for both the logical model and the information/serialization model:

## Properties

- subject
  - type: Element
  - minCount: 1
  - maxCount: 1
  - id: 1
  - link: true
- annotationType
  - type: AnnotationTypeVocab
  - minCount: 1
  - maxCount: 1
  - id: 2
- statement
  - type: xsd:string
  - minCount: 0
  - maxCount: 1
  - id: 3
- contentType
  - type: MediaType
  - id: 4

The id field would serve two purposes in a serialization model:

  1. as a column number/position when serializing as table rows
  2. as a compressed property name when serializing properties and enumerated values in concise data formats
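A minimal Python sketch of those two uses, taking the ids from the proposal for Annotation; the function names are hypothetical, and compressing enumerated values is not shown.

```python
from typing import Any, Dict, List, Optional

# Property ids as proposed above for Annotation.
ANNOTATION_PROPERTY_IDS: Dict[str, int] = {
    "subject": 1,
    "annotationType": 2,
    "statement": 3,
    "contentType": 4,
}

def to_row(obj: Dict[str, Any]) -> List[Optional[Any]]:
    """Use 1: the id gives the column position in a table row."""
    row: List[Optional[Any]] = [None] * max(ANNOTATION_PROPERTY_IDS.values())
    for name, value in obj.items():
        row[ANNOTATION_PROPERTY_IDS[name] - 1] = value  # ids are 1-based
    return row  # absent optional properties stay None (empty cells)

def to_compact(obj: Dict[str, Any]) -> Dict[int, Any]:
    """Use 2: the id replaces the property name, emitted in id order."""
    ordered = sorted(obj.items(), key=lambda kv: ANNOTATION_PROPERTY_IDS[kv[0]])
    return {ANNOTATION_PROPERTY_IDS[name]: value for name, value in ordered}

annotation = {
    "statement": "Awesome!",
    "annotationType": "REVIEW",
    "subject": "http://acme.com/sboms/1948294/package59",
}
print(to_row(annotation))
# ['http://acme.com/sboms/1948294/package59', 'REVIEW', 'Awesome!', None]
print(to_compact(annotation))
# {1: 'http://acme.com/sboms/1948294/package59', 2: 'REVIEW', 3: 'Awesome!'}
```

Because both forms are driven by the ids, the key order of the input no longer affects the output, which is the property the proposal is after.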

Punch List: Collection

This is a punch list of open questions from the 2021-12-14 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Does collection really need a namespace map and an external map?

SPDX 2->3 conversion of FileTypes

Part of #74:

SPDX 2 fileType will be converted to contentType. Unlike contentType, however, fileType is not restricted to a single value, so a file with several fileTypes might call for several different contentTypes, which would be problematic.

This topic has been discussed in the tech team call on 2023-02-21.

An example document "from the wild" with multiple file types per file can be found here.
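The Python sketch below makes the conflict concrete. The mapping table is an assumption for illustration only, not the conversion rule agreed in the tech team, and raising an error is just one possible policy.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from SPDX 2 fileType values to media types. Types
# such as SOURCE have no single obvious media type and are left out.
FILE_TYPE_TO_MEDIA_TYPE: Dict[str, str] = {
    "TEXT": "text/plain",
    "BINARY": "application/octet-stream",
    "ARCHIVE": "application/zip",  # only true for zip archives
}

def candidate_content_types(file_types: List[str]) -> Set[str]:
    """Media types suggested by a file's SPDX 2 fileType values."""
    return {FILE_TYPE_TO_MEDIA_TYPE[t] for t in file_types if t in FILE_TYPE_TO_MEDIA_TYPE}

def convert_file_types(file_types: List[str]) -> Optional[str]:
    """Pick a single contentType, or fail if the fileTypes disagree."""
    candidates = candidate_content_types(file_types)
    if len(candidates) > 1:
        # contentType holds one value, but the fileTypes suggest several.
        raise ValueError(f"conflicting contentTypes: {sorted(candidates)}")
    return next(iter(candidates), None)
```

A file tagged with both TEXT and ARCHIVE, for example, yields two candidates and cannot be converted without losing information.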

Punch List: Identity

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Is there value in separating an identity for things that represent an identity?
  • Should identities be artifacts?
  • How do we separate the identity of an entity from a proxy?
  • AI: Sean to provide the questions his proposal intended to address.
  • What purpose does userAgent property on Tool serve given other properties inherited from Element?
  • What elements are relevant for representing identity?

NOTE: We didn't discuss project, tool, organization, person because they are dependent on answers to the above.

Serialization: Payload needs individual Sbom example

The model defines a Payload interface, indicating that it can include a single serialized element, as well as multiple elements serialized together. The model diagram includes a few examples, but is missing an example of an individual Sbom element.

This is an example of the individual SBOM element shown on the diagram without nesting other element content:

{
  "id": "urn:spdx.dev:null-sbom",
  "type": {
    "sbom": {
      "element": [
        "urn:spdx.dev:iamwillbar",
        "urn:spdx.dev:spdx-tools-3.0.1",
        "urn:spdx.dev:project",
        "urn:spdx.dev:doc"
      ],
      "import": [
        {
          "externalId": "urn:spdx.dev:project",
          "verifiedUsing": {"hash": {"sha256": "14a657a7118a333cc1fdc6af05071a59cda067fd11130d4ee5d6d47c26e7863f"}},
          "locationHint": "https://spdx.dev/projects/v1.0.json"
        },
        {
          "externalId": "urn:spdx.dev:doc",
          "verifiedUsing": {"hash": {"sha256": "14a657a7118a333cc1fdc6af05071a59cda067fd11130d4ee5d6d47c26e7863f"}},
          "locationHint": "https://spdx.dev/docs/v1.0.json"
        }
      ]
    }
  },
  "creator": ["urn:spdx.dev:iamwillbar"],
  "created": "2022-05-02T20:28:00.000Z",
  "specVersion": "3.0",
  "profile": ["core", "software"],
  "dataLicense": "CC0-1.0"
}
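As a sketch of how a consumer might use the verifiedUsing entries in the import list: after fetching the content from locationHint, compare its SHA-256 with the recorded value. Fetching is left out, the function name is hypothetical, and which bytes exactly get hashed (the canonicalization question raised in the serialization discussion) is assumed away here.

```python
import hashlib
from typing import Any, Dict

def verify_import(import_entry: Dict[str, Any], content: bytes) -> bool:
    """Return True if `content` matches the entry's verifiedUsing SHA-256.

    Assumes the shape used in the example above:
    {"verifiedUsing": {"hash": {"sha256": "<hex digest>"}}}
    """
    expected = import_entry["verifiedUsing"]["hash"]["sha256"].lower()
    return hashlib.sha256(content).hexdigest() == expected

entry = {
    "externalId": "urn:spdx.dev:project",
    "verifiedUsing": {"hash": {"sha256": hashlib.sha256(b"{}").hexdigest()}},
    "locationHint": "https://spdx.dev/projects/v1.0.json",
}
print(verify_import(entry, b"{}"))  # True
```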

Topics for discussion include:

  1. Is CreationInformation serialized as a single property or a collection of properties? Either is valid, but this example shows individual properties to indicate that they aren't defaulted/inherited as a unit; each one is an individual default value.
  2. Does rootElements serve any purpose?
    a. The SBOM element is its own root, as is every individually-serialized element.
    b. In an element store/graph there are no roots because any element that might have been called a root can be referenced by other elements.
    c. What purpose does rootElements serve (how does it help the consumer) in any Payload with multiple elements?

Punch List from Aug 30 2022 tech call

Below are the items raised in the August 30, 2022 tech call along with pointers to any previous discussions or decisions directly related to the item.

The table has the following columns:

  • Class - The model class the item relates to, or Overview if it is a general item pertaining to all classes
  • Item - The punch list item; text taken verbatim from the minutes as of 31 August 2022
  • Cat. - A category indicating if this item has been previously discussed. The category is one of:
    • NPD - Not previously discussed - no evidence was found in prior meetings of this item being discussed
    • DD - Discussed and decided - there are minutes or notes in a GitHub issue that this item was discussed and a decision made
    • DND - Discussed and not decided - There are minutes or issue notes where this was discussed, but no clear decision was recorded in the minutes
  • References - Links to GitHub issues or minutes relevant to the item
| Class | Item | Cat. | References | Status |
| --- | --- | --- | --- | --- |
| Snippet | Snippet is line range and byte range. Yes. Punch: Document beginning & end of snippet in model. | DND | Issue #11 | Resolved, added line and byte ranges to model. Byte range takes precedence if in conflict. |
| Snippet | Discuss and determine if we should add a "data class" and relationship between data class, file and snippet. Ping Alexios. We may already have a relationtype suitable. Content of a file is one object, Metadata of a file is another object. | NPD | Although this was described as a previous suggestion on the call, I could not find mention in prior meeting minutes | Not a blocker for 3.0, Sebastian and Jeff are working on related proposals. |
| Snippet | clarify location cardinality in model | DND | Issue #11 | Resolved. Keeping single location for 3.0, content proposal may amend. |
| File | Clarify definition of Package and File and how they relate to each other (including if download location makes sense for a file) | NPD | Lots of discussions on package download location, but no evidence of file download location being discussed | Resolved. These are properties on Package but available via ExternalReferences for other artifacts. |
| Package | Revisit cardinality of Download location. 0..* | DD | Decision documented in 2021-10-05.md | Resolved. Primary attached to Package, additional can be added with ExternalReferences. |
| Package | Revisit cardinality of Home Page (depends on purpose - i.e. marketing, developer) | DND | Cardinality not specifically addressed, but discussed on issue #13, 2022-03-01.md, 2021-12-07.md, 2020-10-13.md | Resolved. Primary attached to Package, additional can be added with ExternalReferences. |
| Package | Clause 7 fields reviewed for coverage. Some should 2.3 become elements? or more fields in the Package class? | DND | Several discussions on individual fields found, but not as an entire group | Open issue. |
| Package | Should Package URL be a property on package? | DND | Discussed on 2021-11-09.md - line 74 looks like a decision | Resolved. Added as a property on Package. |
| Artifact | Should Artifact URI be moved back into Artifact class. (one form of an external identifier) | DD? | 2020-10-13.md, 2020-09-14.md, 2020-10-13.md - decision to make property? 2020-11-09.md | Resolved. Decided not to add this since no universal URI for all artifact types. Added content identifiers to File and Snippet to resolve one of the needs for this. |
| Artifact | Should location be added to Artifact or at the Package/File/Snippet level? | DND | 2022-08-16.md | Resolved. Did not add property but available via ExternalReferences. |
| Element | Should external locator and external identifier be called out in Element? | DND | 2022-08-16.md | Resolved. ExternalIdentifiers on element meets this need. |
| Element | External Identifiers need to be better defined | DND | Note that we've had several discussions on external identifiers - too many to list here | Pending. Need to document in spec text but concept is defined (identifier from an external system that uniquely identifies the subject of the element). |
| Element | Gain consensus and clarity on type of extension element. (clarify - Map<IRI, any> Structure or dictionary map? | DND | Discussed in 2022-03-22.md and marked as closed but specific type was not documented. Also discussed in 2021-12-07.md, 2021-04-20.md, 2021-03-02.md, and 2021-02-23.md | Pending. There are still some active concerns about the existence of this concept. |
| Element | Assess if data model proposal (Alexios) works with VerifiedUsing (example SPDX document element) | See above | Data model proposal under snippet | |
| Element | Clarify if VerifiedUsing refers to integrity of the 'thing' described by an Element, or the integrity of the Element data itself (its canonical representation)? Capture and enable both with separate "targetVerifiedUsing" and "elementVerifiedUsing" properties? targetVerifiedUsing would go in Artifact, not Element. | DND | Partial decision in 2021-11-09.md, does not look like it was discussed in 2021-11-16.md | Resolved. verifiedUsing is how to verify the subject of the element. Non-blocking for 3.0: Canonicalization WG to propose what is needed for canonicalization. |
| Element | On ExternalReference and ExternalIdentifier intended to be abstract and if so, how does it work | NPD | | Resolved. Not abstract. |
| Element | Provide a complete list of annotation types. | DD | Decision documented in 2022-03-29.md | Resolved. List updated. |
| Enumerations | Need to be made complete with values pulling forward from 2.3 | ? | more of an action than a decision | Resolved? Please verify. |
| Enumerations | Should they be closed, or allow folks to add things. (Folks can always add other things to their software, but can those things claim to be SPDX-conforming?) | NPD | Could only find one discussion on enumerations for profiles | Resolved. They are closed but software wanting to maintain forward compatibility should expect new values. |
| Actor | should Actor have its own identifier property? See ArtifactURI. | DD? | Line 70 in 2022-04-05.md appears to be a decision | Pending. Ongoing discussion about identities. |
| Relationship | what are the use-cases for relationship cardinalities (from+to) and directions | DD? | Discussed in issue #5 | Documentation? |
| Relationship | Revisit cardinality of relationshipCompleteness. (In context of serialization.) Default, optional are conflicting. | NPD | I couldn't find this specific discussion | Resolved. Logically it must be present, but there is a common default value for serializations and they are recommended to make it optional. |
| Relationship | Articulate all the Enumerated relationship types for 3.0 | ?? | More of an action than a decision | Resolved? Please verify. |
| Checksum | Enumeration: Consider adding "OTHER" | NPD | | Resolved. Added. |
| Overview | Create a spreadsheet Review 2.3 and 3.0 | ?? | More of an action than a decision | Pending. |
| Overview | Do we want to restrict Enumerations or allow extensions? Historically we've kept it restricted for Consumer benefit. | NPD | Same as enumerations issue listed above | Resolved. Duplicate. |
| Overview | Should Profiles be closed, or enable extending definitions. | DD | Discussed and decided in 2021-03-30.md, 2021-04-06.md, 2021-04-20.md, 2021-04-27.md | Resolved. |
| Overview | Anything in 2.3 we need to justify why we deprecate. Already deprecated, we don't need to revisit. | ?? | More of an action | Pending. |
| Overview | How do we handle optional fields with default values in model? Should optionality be part of data model, not in information model? Can you answer the question is this complete at logical model level? Do we want to have consistent approach across all fields. When not present stating default value, etc. | NPD | similar to other punch list items above | Resolved. Duplicate. |

`NOASSERTION` should be a general concept and not element of the enum values

Currently the .png shows the ENUM values of DependencyScope:

noAssertion [default]
static
dynamic
tool
other

I think this should not include noAssertion and that instead there should be a general concept of optional values. And, if an optional value is not present, that can be serialized as NOASSERTION or null or ...

Further, the default value and cardinality of an enum should be decided by the user and not by the enum.

(It might also be an issue that some programming languages have global namespaces for enum values, and noAssertion appears in multiple enums.)
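A minimal Python sketch of the suggestion: the enum lists only real values, and "not present" is represented by the absence of a value (None), which each serialization may render as NOASSERTION, null, or something else. The enum values are copied from the list above; the function and its absent_as parameter are hypothetical.

```python
import json
from enum import Enum
from typing import Optional

class DependencyScope(Enum):
    # No noAssertion member: absence is handled outside the enum.
    STATIC = "static"
    DYNAMIC = "dynamic"
    TOOL = "tool"
    OTHER = "other"

def serialize_scope(scope: Optional[DependencyScope], absent_as: Optional[str] = "NOASSERTION") -> Optional[str]:
    """Serialize an optional enum value; the serialization decides how 'absent' looks."""
    return scope.value if scope is not None else absent_as

print(json.dumps({"scope": serialize_scope(None)}))                  # {"scope": "NOASSERTION"}
print(json.dumps({"scope": serialize_scope(None, absent_as=None)}))  # {"scope": null}
```

This also sidesteps the global-namespace concern, since no language binding has to define a noAssertion member for every enum.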

Potential punch list for issues regarding the SPDX2->3 conversion

This is a punch list of problems we are currently facing when converting SPDX2 to SPDX3. See also this spreadsheet for an overview of the current state.
Please feel free to comment on any of the points below. Small remarks can go right here in this issue, but if you want to start an in-depth discussion on any of the points, please open and link a new issue (edit this post) to keep this issue comprehensive.

CreationInformation properties:

File properties:

Package properties:

Relationship properties:

  • relatedSpdxElement
    • This gets converted to the (mandatory) "to" property of the SPDX3 relationship, whose type is Element. But relatedSpdxElement can also take the values NONE and NOASSERTION. How should these be converted to an Element?
    • This is covered by the completeness relation

Snippet properties:

  • SnippetFromFile
    • Should this be converted to a contains relationship from the containing file to the contained snippet?
    • Gary remembers that it was decided to have it as a relationship.
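A Python sketch of the two answers recorded above: relatedSpdxElement values NONE and NOASSERTION expressed through a completeness value instead of a target Element, and snippetFromFile turned into a contains relationship. The SPDX 2 input keys follow SPDX 2 JSON; the output dictionaries, the completeness values, and allowing an empty "to" list are assumptions for illustration (an empty list would conflict with "to" being mandatory if that means at least one value).

```python
from typing import Any, Dict

def convert_relationship(rel: Dict[str, Any]) -> Dict[str, Any]:
    """SPDX 2 relationship -> hypothetical SPDX 3 relationship dict."""
    out: Dict[str, Any] = {
        "from": rel["spdxElementId"],
        # Value copied unchanged; SPDX 3 vocabulary naming may differ.
        "relationshipType": rel["relationshipType"],
    }
    target = rel["relatedSpdxElement"]
    if target == "NONE":
        # Known to have no targets: nothing to list, and the list is complete.
        out["to"] = []
        out["completeness"] = "complete"
    elif target == "NOASSERTION":
        # Targets unknown: no statement is made about them.
        out["to"] = []
        out["completeness"] = "noAssertion"
    else:
        out["to"] = [target]
    return out

def convert_snippet_from_file(snippet: Dict[str, Any]) -> Dict[str, Any]:
    """snippetFromFile -> a 'contains' relationship from the file to the snippet."""
    return {
        "from": snippet["snippetFromFile"],
        "relationshipType": "contains",
        "to": [snippet["SPDXID"]],
    }
```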

Punch List: Element

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Should we take all creation related elements and put in a separate class? Problem to solve at logical model level - there may be a set of elements created at same time with same tool. Concern about serialization compactness. Seems simpler to have Creation Info.
  • Should we be describing creation as an event?
  • Is created data type to be made up?
  • What is the impact of element created date on determinism for reproducible builds? What are the use cases for created date?
  • What is the precise type of sem-ver?
  • For profile, if a collection has one profile, but element only applies to a base, do you put integrity on it? Alternately element has list of multiple profile, vs. collection has multiple elements with different profiles?
  • Does data license need to be constrained to CC0?

Typos

A few errors came up when parsing the Core files:

Classes / Document / Properties / externalMap / type: the separator in "type; ExternalMap" should be a colon
Classes / Extension - has no Properties section
Classes / IntegrityMethod - has no Metadata or Properties sections
Classes / Package / Metadata - subclass is not a list item
Classes / Package - has no Properties section
Classes / PublicKey - has no Properties
Vocabularies / HashAlgorithmVocab - description section should be capitalized

Some of these are easily-fixed typos. Others require the model itself to be developed before the template can include the information. And Classes/Agent and related elements don't exist in the current model, indicating the template should be updated.
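Several of these errors are missing or miscapitalized sections, which a simple check could catch before parsing. The Python sketch below checks a class file for the four "## " sections used by the Annotation class file shown earlier; it is an illustration only, not how spec-parser works, and vocabulary files may need a different section list.

```python
import re
from typing import List

# Section headings used by the Annotation class file shown earlier.
REQUIRED_CLASS_SECTIONS = ("Summary", "Description", "Metadata", "Properties")

def missing_sections(markdown: str) -> List[str]:
    """List required '## ' sections absent from a model class file.

    Matching is case-sensitive, so '## description' is reported as missing.
    """
    found = set(re.findall(r"^## +(.+?)[ \t]*$", markdown, flags=re.MULTILINE))
    return [name for name in REQUIRED_CLASS_SECTIONS if name not in found]

extension_file = "# Extension\n\n## Summary\n\nx\n\n## Description\n\ny\n\n## Metadata\n\n- name: Extension\n"
print(missing_sections(extension_file))  # ['Properties']
```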

What is the difference between rootElement and element?

...and how do I determine them correctly? For example, an SpdxDocument is a subclass of Bundle, which is a subclass of SpdxCollection. An SpdxCollection has at least one element and at least one rootElement. If I migrate a document from SPDX 2.3 into an SpdxDocument in SPDX 3.0, I would assume that Packages, Files, Snippets, Relationships and Annotations are collected in the list of elements, but what is the rootElement in that case?
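One plausible answer, offered as an assumption rather than the model's definition: the rootElement values are whatever the SPDX 2.3 document describes (its documentDescribes list and DESCRIBES relationships from the document), while every converted Package, File, and Snippet goes into element. A Python sketch over SPDX 2.3 JSON input, with a hypothetical function name:

```python
from typing import Any, Dict, List, Tuple

def split_elements(doc: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (element ids, rootElement ids) for an SPDX 2.3 JSON document."""
    doc_id = doc["SPDXID"]
    roots: List[str] = list(doc.get("documentDescribes", []))
    for rel in doc.get("relationships", []):
        # DESCRIBES relationships whose source is the document itself.
        if rel["spdxElementId"] == doc_id and rel["relationshipType"] == "DESCRIBES":
            if rel["relatedSpdxElement"] not in roots:
                roots.append(rel["relatedSpdxElement"])
    elements = [
        item["SPDXID"]
        for key in ("packages", "files", "snippets")
        for item in doc.get(key, [])
    ]
    return elements, roots
```

Relationships and annotations also become elements in SPDX 3 but carry no SPDXID in SPDX 2.3, so a converter would have to assign them identifiers; they are left out here, as are inverse DESCRIBED_BY relationships.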

Punch List: Relationship

This is a punch list of open questions from the 2021-12-07 Tech Team meeting. Please comment on this issue with any discussion, proposed answers, or additional questions you have:

  • Do One-To-Many relationships have any benefits to offset them being harder to parse as opposed to One-to-One?
  • Do we want to simplify relationships to one direction? Are the semantics documented? Look at the from (single) / to (one or many) - keep cardinality? Do we need to retain both directions in the 1:many case?
  • Do we have a way to indicate whether it's directional or bidirectional?
  • Should we add properties for conveying temporality?
  • Is the list of relationship types we have today complete?
  • Are there open questions from linking profile discussion?
  • How can we express a range of dependencies?
  • How can we express more general conditions? subclass of relationship with additional semantics?
  • Do we have the relationships we need to describe services?
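For the first question, a one-to-many relationship can always be split mechanically into one-to-one relationships. The Python sketch below (hypothetical class and function names) shows that split.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Relationship:
    from_: str  # "from" is a Python keyword, hence the trailing underscore
    relationship_type: str
    to: Tuple[str, ...]

def to_one_to_one(rel: Relationship) -> List[Relationship]:
    """Split a one-to-many relationship into one relationship per target."""
    return [Relationship(rel.from_, rel.relationship_type, (target,)) for target in rel.to]

contains = Relationship("SPDXRef-Pkg", "contains", ("SPDXRef-File1", "SPDXRef-File2"))
for r in to_one_to_one(contains):
    print(r)
```

The split preserves every target, but a statement about the set as a whole (for example, that the list of contained files is complete) has no obvious home in the one-to-one form, which is one argument for keeping one-to-many.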
