cyclonedx / cyclonedx-rust-cargo
Creates CycloneDX Software Bill of Materials (SBOM) from Rust (Cargo) projects
Home Page: https://cyclonedx.org/
License: Apache License 2.0
Similar to #410, but for the url::Url type and the internal Uri type: https://docs.rs/cyclonedx-bom/0.4.0/cyclonedx_bom/external_models/uri/struct.Uri.html
Better not to reinvent the wheel with a limited custom type.
The Cargo manifest format supports a [package.metadata] table to be used by external tools. We should allow a section in the following style:
[package.metadata.cyclonedx]
all = true
format = "json"
This should allow the user to specify how they want their SBOM files to be generated in a consistent manner without requiring them to specify CLI arguments every time.
manifest-path doesn't make sense to support here, because we are reading the values from the manifest. Likewise, quiet and verbose control CLI behavior and make sense to be specified by the specific CLI invocation.
Cargo.toml file with a relevant package.metadata.cyclonedx section and specify configurations for all and format
Cargo.toml with format = "json", but a CLI invocation of cargo cyclonedx --format xml should produce XML output
README.md
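The precedence described above (CLI arguments override manifest values) could be sketched roughly as follows. All type and function names here are hypothetical and are not part of cargo-cyclonedx; the built-in defaults (top-level only, XML) are assumptions made for the sake of the example.

```rust
// Hypothetical sketch: none of these names come from cargo-cyclonedx itself.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Format {
    Json,
    Xml,
}

/// Values read from `[package.metadata.cyclonedx]`; every key is optional.
#[derive(Debug, Default)]
struct ManifestConfig {
    all: Option<bool>,
    format: Option<Format>,
}

/// Values given on the command line; `None` means the flag was not passed.
#[derive(Debug, Default)]
struct CliArgs {
    all: Option<bool>,
    format: Option<Format>,
}

/// The settings actually used for SBOM generation.
#[derive(Debug, PartialEq)]
struct Effective {
    all: bool,
    format: Format,
}

/// CLI values win over manifest values, which win over the assumed defaults.
fn resolve(cli: &CliArgs, manifest: &ManifestConfig) -> Effective {
    Effective {
        all: cli.all.or(manifest.all).unwrap_or(false),
        format: cli.format.or(manifest.format).unwrap_or(Format::Xml),
    }
}

fn main() {
    // Manifest asks for JSON, but the invocation passes `--format xml`.
    let manifest = ManifestConfig { all: Some(true), format: Some(Format::Json) };
    let cli = CliArgs { all: None, format: Some(Format::Xml) };
    println!("{:?}", resolve(&cli, &manifest));
}
```

Keeping every field optional on both sides makes it possible to tell "not specified" apart from "explicitly set to the default", which is what lets the CLI override only the keys it actually mentions.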
Using serde it is possible to parse from BLOBs and strings, but also from an existing JSON value.
This crate currently only allows parsing from a reader.
I think it would make sense to allow parsing from an existing serde_json::Value using serde_json::from_value too.
Use something like cargo-udeps to ensure we remove any dependencies that we no longer require. This should run as an action during pull requests and against the main branch.
Hello, I have noticed that licensing information is formatted in an odd manner. For example, when using the rust-cargo scanner, the licensing information is formatted in the following way:
{
"name": "byte-unit",
"version": "4.0.13",
"description": "A library for interaction with units of bytes.",
"purl": "pkg:cargo/byte-unit@4.0.13",
"type": "library",
"scope": "required",
"licenses": [
{
"expression": "MIT"
}
],
"externalReferences": [
{
"type": "website",
"url": "https://magiclen.org/byte-unit"
},
{
"type": "vcs",
"url": "https://github.com/magiclen/byte-unit"
}
]
},
However, when using the golang scanner from https://github.com/CycloneDX/cyclonedx-gomod the licensing information is formatted differently.
{
"bom-ref": "pkg:golang/github.com/klauspost/compress@v1.13.6?type=module",
"type": "library",
"name": "github.com/klauspost/compress",
"version": "v1.13.6",
"scope": "required",
"hashes": [
{
"alg": "SHA-256",
"content": "3fbe82a292442d2d1388eda679b9a7ce059a6a3b2c3ff12cce996db604317207"
}
],
"purl": "pkg:golang/github.com/klauspost/compress@v1.13.6?type=module",
"externalReferences": [
{
"url": "https://github.com/klauspost/compress",
"type": "vcs"
}
],
"evidence": {
"licenses": [
{
"license": {
"id": "Apache-2.0"
}
}
]
}
},
As you can see, the difference is in the license JSON object. The rust-cargo scanner formats licensing information using:
"licenses": [
{
"expression": "MIT"
}
],
while the golang scanner formats licensing information by wrapping it inside an evidence JSON object:
"evidence": {
"licenses": [
{
"license": {
"id": "Apache-2.0"
}
}
]
}
Why is there a mismatch? Shouldn't the licensing structure be the same? Also, when uploading an SBOM generated by the rust scanner to Dependency-Track, the licensing information is not shown in Dependency-Track, while for SBOMs generated by the golang scanner the licensing information is present. It seems like Dependency-Track doesn't like the following format:
"licenses": [
{
"expression": "MIT"
}
],
But prefers this one:
"evidence": {
"licenses": [
{
"license": {
"id": "Apache-2.0"
}
}
]
}
Is there an option to combat this behaviour? Or is there a possibility to add a CLI switch for licensing formatting?
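One possible shape for such a switch is sketched below: emit a license object with an id when the Cargo license field looks like a single identifier, and fall back to an expression otherwise. This is a hypothetical illustration, not how cargo-cyclonedx behaves; the function names are invented, the identifier check does not consult the SPDX license list, and the source does not confirm that this would change what Dependency-Track displays.

```rust
/// Returns true when `s` looks like one bare identifier such as "MIT",
/// rather than a compound expression such as "MIT OR Apache-2.0".
/// Assumption: this is a purely syntactic check, not an SPDX list lookup.
fn is_single_identifier(s: &str) -> bool {
    !s.is_empty()
        && s.chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '-' || c == '.' || c == '+')
}

/// Minimal JSON string escaping for quotes and backslashes only.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out
}

/// Builds one entry of the `licenses` array as a JSON string.
fn license_entry(license: &str) -> String {
    let license = license.trim();
    if is_single_identifier(license) {
        format!(r#"{{"license":{{"id":"{}"}}}}"#, license)
    } else {
        format!(r#"{{"expression":"{}"}}"#, escape_json(license))
    }
}

fn main() {
    println!("{}", license_entry("MIT"));
    println!("{}", license_entry("MIT OR Apache-2.0"));
}
```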
Currently the BOM is generated de novo for every dependency, but some crates (e.g. crates that include native code via a build.rs) really should produce their own BOM that's carried forward by downstream users. This could be done by e.g. adding necessary metadata into a package.metadata.bom section that tells us that a BOM is present inside the cargo package/source repository and where to find it.
Assume the following Cargo.toml:
[package]
name = "cyclonedx-rust-repro-1"
version = "0.1.0"
edition = "2021"
[dependencies]
base64 = "0.21.0"
+[patch.crates-io]
+base64 = { git = "https://github.com/marshallpierce/rust-base64" }
When the patch is missing, the bom renders as:
<components>
<component type="library" bom-ref="pkg:cargo/base64@0.21.0">
<name>base64</name>
<version>0.21.0</version>
<description>encodes and decodes base64 as bytes or utf8</description>
<scope>required</scope>
<licenses>
<expression>MIT OR Apache-2.0</expression>
</licenses>
<purl>pkg:cargo/base64@0.21.0</purl>
<externalReferences>
<reference type="documentation">
<url>https://docs.rs/base64</url>
</reference>
<reference type="vcs">
<url>https://github.com/marshallpierce/rust-base64</url>
</reference>
</externalReferences>
</component>
</components>
When the patch is enabled, it is missing:
<components />
As part of the library-ification project (#68), we want to be lenient in what we accept and strict in what we execute. In order to fulfill that philosophy, we want to implement validation functionality, so that a user can ensure the data they have received conforms to the specification, if that is important to them.
This validation should live on the internal model and should share functionality with the logic used to ensure that library-consumer generated SBOMs are correct by construction. A potential library to aid with this purpose is validator, but other alternatives may also be appropriate.
A hypothetical workflow is
let bom = cyclonedx_bom::parse_xml(bom_file)?;
match bom.validate() {
Ok(bom) => todo!("add vulnerability data"),
Err(validation_errors) => todo!("Display invalid data errors to the user"),
}
This will be done based on the work in #105
.validate() function is called
model::Bom and related values
JSON Timestamps in cyclone-dx use the date-time format:
"timestamp": {
"type": "string",
"format": "date-time",
"title": "Timestamp",
"description": "The timestamp in which the action occurred"
}
The date-time format is described here:
"date-time": Date and time together, for example, 2018-11-13T20:20:39+00:00
Using OWASP tools:
cyclonedx-win-x64.exe validate --input-file .\bom.json
Unable to validate against any JSON schemas.
BOM is not valid.
The BOM will validate using both tools if I remove the trailing zeros from the fractional digits in output manually:
E.g. going from:
2022-12-21T23:54:20.218381200Z
to
2022-12-21T23:54:20.2183812Z
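The manual fix described above can be sketched as a small post-processing step on the timestamp string. This is only an illustration of the workaround with a hypothetical function name; the source does not establish whether the validator objects to the trailing zeros themselves or to the number of fractional digits, so trimming may not be sufficient for every timestamp.

```rust
/// Trims trailing zeros from the fractional-seconds part of an
/// RFC 3339 style timestamp and leaves everything else untouched.
fn trim_fractional_zeros(ts: &str) -> String {
    // Locate the '.' that starts the fractional seconds, if there is one.
    let dot = match ts.find('.') {
        Some(index) => index,
        None => return ts.to_string(),
    };
    // The fraction is the run of ASCII digits directly after the dot.
    let frac_len = ts[dot + 1..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    let frac = &ts[dot + 1..dot + 1 + frac_len];
    // Whatever follows the digits: "Z" or a "+hh:mm" style offset.
    let rest = &ts[dot + 1 + frac_len..];
    let trimmed = frac.trim_end_matches('0');
    if trimmed.is_empty() {
        // The fraction was all zeros: drop it together with the dot.
        format!("{}{}", &ts[..dot], rest)
    } else {
        format!("{}.{}{}", &ts[..dot], trimmed, rest)
    }
}

fn main() {
    println!("{}", trim_fractional_zeros("2022-12-21T23:54:20.218381200Z"));
}
```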
Replace xml_writer with yaserde.
xml_writer is unmaintained and requires a lot of manual work to ensure the XML is structured well. We should replace it with yaserde, which uses serde-style attribute macros to handle the XML-specific details.
Running cargo cyclonedx I get tons of errors like this:
Package x509-parser has an invalid license expression, trying lax parsing (MIT/Apache-2.0): Invalid SPDX expression: invalid character(s)
But how do I enable "lax parsing"?
cargo-cyclonedx
Creates a CycloneDX Software Bill-of-Materials (SBOM) for Rust project
USAGE:
cargo cyclonedx [OPTIONS]
OPTIONS:
-a, --all List all dependencies instead of only top-level ones
-f, --format <FORMAT> Output BOM format: json, xml
-h, --help Print help information
--manifest-path <PATH> Path to Cargo.toml
--output-cdx Prepend file extension with .cdx
--output-pattern <PATTERN> Prefix patterns to use for the filename: bom, package
--output-prefix <FILENAME_PREFIX> Custom prefix string to use for the filename
-q, --quiet No output printed to stdout
--top-level List only top-level dependencies (default)
-v, --verbose Use verbose output (-vv very verbose/build.rs output)
The Command Line Applications in Rust book has a page on suggested approaches to testing command line applications. In particular, it suggests assert_cmd, predicates, and assert_fs as potential crates to cover testing the functionality of a command line application.
As a general testing philosophy, we want a few end-to-end tests, several integration tests, and a majority of unit tests. Things worth testing via this approach include:
Cargo.toml metadata arguments
-q quiet output
The goal of this issue is to lay the foundation for a good test suite rather than implementing one all at once.
I would like to use cyclonedx-bom to construct my own custom BOMs. There are some limitations with the existing cargo tool that are pushing me in this direction.
I'm struggling to figure out how to construct certain types in the model however. Namely:
These newtypes have pub(crate) members, no constructors, and I can't find any conversion traits like from/try_from to create them.
I figured I could just look at the cargo application, but then I discovered that it doesn't even use the cyclonedx-bom crate in the same workspace... it has its own parallel implementation of the data structures! What's the thinking there?
The current system only supports v1.3. Is there a roadmap for v1.4 support?
This section contains data that's available from Cargo.toml:
Adding authors and component are the more difficult of the two, due to cargo workspaces and multiple binaries.
In the simple case, cargo cyclonedx is invoked on a single crate, which is either a library or a binary. In that case, we can generate exactly 1 BOM with all four of the above fields populated.
If cargo cyclonedx is invoked on a crate which is both a library and a binary, or declares multiple targets (I think these are the binaries, but need to test that), then should cargo cyclonedx generate 1 BOM per binary target? That seems reasonable as long as it doesn't produce one for every example and integration test. Another thing to check here would be that the cargo crate surfaces auto-discovered targets properly.
If cargo cyclonedx is invoked on a workspace, what's the desired behavior? Some options would include:
/crate1.bom.xml
/crate1/bom.xml
boms/crate1.xml
Since we are splitting the internal data-model from the serialization / deserialization code in #77, we should consider auto-generating the implementations based on the schema definitions in the CycloneDX/specification repository.
lumeohq/xsd-parser-rs is the most complete option found during initial research
schemafy for JSON Schema
protobuf for Protobuf
A common practice for Cargo workspace projects is to include both a version and path for inter-project dependencies (example: tokio-stream depends on tokio). This allows local development to make changes across the project without publishing, while also allowing for an easy publish workflow when crates depend on each other.
Right now we do
// Filter out our own workspace crates from dependency list
for member in members {
dependencies.remove(member);
}
and
if members.contains(&package) {
// Skip listing our own packages in our workspace
continue;
}
which would mean that tokio would be excluded from the BOM for tokio-stream. This doesn't match the tokio-stream dependencies list, which might be confusing to users in the context of the changes for PR #33 (create a separate BOM for each cargo workspace member).
Add a CLI argument to override the current behavior for when projects want to explicitly list dependencies from the same project. Something like --include-inter-workspace-dependencies
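A minimal sketch of how such a flag might gate the existing filtering is shown below. It uses plain package names in a BTreeSet as a stand-in for the real package types, and the function name and boolean parameter are hypothetical.

```rust
use std::collections::BTreeSet;

/// Returns the dependencies to list in the BOM.
/// When `include_inter_workspace` is false, workspace members are filtered
/// out, mirroring the behavior of the snippets quoted above.
fn listed_dependencies(
    dependencies: &BTreeSet<String>,
    members: &BTreeSet<String>,
    include_inter_workspace: bool,
) -> BTreeSet<String> {
    dependencies
        .iter()
        .filter(|dep| include_inter_workspace || !members.contains(*dep))
        .cloned()
        .collect()
}

fn main() {
    // tokio-stream depends on tokio, and both live in the same workspace.
    let members: BTreeSet<String> =
        ["tokio", "tokio-stream"].iter().map(|s| s.to_string()).collect();
    let dependencies: BTreeSet<String> =
        ["tokio", "pin-project-lite"].iter().map(|s| s.to_string()).collect();

    println!("{:?}", listed_dependencies(&dependencies, &members, false));
    println!("{:?}", listed_dependencies(&dependencies, &members, true));
}
```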
Several design decisions have been discussed, including:
cyclonedx-bom into models and specs
May be able to reuse code from https://github.com/sensorfu/cargo-bom (MIT license)
And generate purl with https://github.com/package-url/packageurl-rs
Commands can be added to cargo when an executable named cargo-<command> is in path. For example, cargo cyclonedx will call cargo-cyclonedx. The generated executable name could be changed by adding the following section in Cargo.toml.
[[bin]]
name = "cargo-cyclonedx"
path = "src/main.rs"
That said, it would be even better to use the package name cargo-cyclonedx; it is available on crates.io.
Replace lazy_static with once_cell.
lazy_static is unmaintained and the API used by once_cell is being integrated into the standard library (tracking issue). Switching to this library now will hopefully make the adjustment to using the standard library version require minimal work.
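For illustration, the same lazy-initialization pattern is sketched here with the standard library's std::sync::OnceLock instead of once_cell, assuming a toolchain where OnceLock is available (Rust 1.70 or later). The lookup table is a stand-in for the regular expressions mentioned in this issue, and the function name is hypothetical.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Lazily built lookup table. It stands in for an expensive value such as
/// a compiled regular expression and is initialized at most once.
fn default_file_names() -> &'static HashMap<&'static str, &'static str> {
    static NAMES: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    NAMES.get_or_init(|| {
        let mut map = HashMap::new();
        map.insert("json", "bom.json");
        map.insert("xml", "bom.xml");
        map
    })
}

fn main() {
    // Both calls return a reference to the same, once-initialized map.
    println!("{:?}", default_file_names().get("json"));
    println!("{:?}", default_file_names().get("xml"));
}
```

Keeping the static inside the accessor function means callers never see an uninitialized value, and a fallible initializer can be surfaced as a Result at the call site instead of panicking inside a macro.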
.expect calls for the regular expressions and converts any errors to Result style code.
I have a project which has the dependency bluer = { version = "0.15", features = ["bluetoothd"] }. It builds and tests successfully. I cannot generate an SBOM though:
Updating crates.io index
Error: Could not process the cargo config: /home/marcel/dev/git/foo/Cargo.toml
Caused by:
failed to select a version for the requirement `bluer = "^0.15"`
candidate versions found which didn't match: 0.13.3, 0.13.2, 0.13.1, ...
location searched: crates.io index
required by package `foo v0.2.3 (/home/marcel/dev/git/foo)`
The cargo-cyclonedx version used was 0.3.4.
According to the recognized file patterns documentation, the following patterns are allowed:
*.cdx.json for JSON encoded CycloneDX BOM files.
*.cdx.xml for XML encoded CycloneDX BOM files.
Currently, we always output bom.json/bom.xml
<package name>.cdx.json/<package name>.cdx.xml
<prefix name>.cdx.json/<prefix name>.cdx.xml
Cargo.toml (#103) approaches to configuration.
In preparation of v1.2 of the spec we need to support JSON output.
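The file naming patterns listed above (bom, package name, or a custom prefix, each optionally with .cdx before the extension) could be combined along the following lines. This is a sketch under assumptions: the names are hypothetical, and the rule that a custom prefix takes priority over the pattern is a guess rather than documented behavior.

```rust
/// Which base name to use when no custom prefix is given.
enum Pattern {
    Bom,
    Package,
}

/// Builds the output file name, e.g. "bom.json" or "my-crate.cdx.xml".
/// Assumption: an explicit prefix overrides the pattern.
fn output_file_name(
    pattern: Pattern,
    prefix: Option<&str>,
    package: &str,
    cdx: bool,
    extension: &str,
) -> String {
    let base = match (prefix, pattern) {
        (Some(custom), _) => custom,
        (None, Pattern::Package) => package,
        (None, Pattern::Bom) => "bom",
    };
    if cdx {
        format!("{}.cdx.{}", base, extension)
    } else {
        format!("{}.{}", base, extension)
    }
}

fn main() {
    println!("{}", output_file_name(Pattern::Bom, None, "my-crate", false, "json"));
    println!("{}", output_file_name(Pattern::Package, None, "my-crate", true, "xml"));
    println!("{}", output_file_name(Pattern::Bom, Some("release"), "my-crate", true, "json"));
}
```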
The Authors struct is currently responsible for parsing the authors text from the Cargo.toml file and is a bit extraneous at the moment. The work done on PR #243 removed all the other CLI library structs except for this one. We can simply move its functionality into a function in the generator.rs file for now, and refactor generator.rs later.
generator.rs to parse authors text from Cargo.toml
authors.rs
The URLs are currently placed directly inside the reference tag; they should be inside a <url> tag.
Example: the output is
<reference type="documentation">https://docs.rs/base64</reference>
instead of
<reference type="documentation"><url>https://docs.rs/base64</url></reference>
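The expected structure can be illustrated with a small standalone writer that wraps the URL in its own element. This is not the project's XML code; the helper names are hypothetical and the escaping covers only the five predefined XML entities.

```rust
/// Escapes the characters that are special in XML text and attributes.
fn escape_xml(text: &str) -> String {
    text.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
        .replace('\'', "&apos;")
}

/// Renders one external reference with the URL nested in a <url> element.
fn reference_xml(kind: &str, url: &str) -> String {
    format!(
        "<reference type=\"{}\"><url>{}</url></reference>",
        escape_xml(kind),
        escape_xml(url)
    )
}

fn main() {
    println!("{}", reference_xml("documentation", "https://docs.rs/base64"));
}
```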
In order to ensure we support as many development platforms as possible, we should support as old a compiler version as possible while still enabling ourselves to use the libraries and language features we require. Research various stable Linux distros and other places where rustc is packaged and determine the right version for us to use.
1.41.1
Cargo.toml files contain the rust-version field with the agreed version
.github/workflows/rust.yml is updated to specifically test with that version
PR #243 adds a new() constructor to the Metadata struct but the error for timestamp generation is currently being ignored. Find a better approach for this constructor to handle the possible timestamp error.
The auditable-info crate allows the retrieval of dependency information embedded in a Rust binary via cargo-auditable.
The related quitters crate tries to reconstruct dependency information from panic messages in the binary (the file paths to be precise).
For details on both see this reddit post
It would be useful to be able to generate SBOMs from this information to allow the use of standard CycloneDX tools, e.g. Dependency Track, with previously compiled binaries. The quitters approach could be especially useful with older binaries which do not compile any more to allow a better decision on rewrite vs. update.
I'm afraid this PR broke the top-level dependency stuff entirely (at least for all the projects I tried, one being cyclonedx-rust-cargo itself)
0.3.8 doesn't produce a component list at all anymore.
I added debugging like this:
.filter(move |r| {
println!("{} == {}", r.0, m.package_id());
r.0 == m.package_id()
})
and if I try to generate a BOM for the cyclonedx-bom project itself this is what I get:
cyclonedx-bom v0.4.1 (/home/lars/dev/external/cyclonedx-rust-cargo/cyclonedx-bom) == cargo-cyclonedx v0.3.8 (/home/lars/dev/external/cyclonedx-rust-cargo/cargo-cyclonedx)
I tried various other projects and all of them return an empty component list.
The same works on 0.3.7.
While migrating cargo-cyclonedx to use the cyclonedx-bom library, we discovered that several Rust libraries have data in the license field of the Cargo.toml that is not a valid SPDX expression. The Cargo documentation for the field indicates that crates.io will interpret the field as an SPDX expression. From reading their source code, they accept a / character as OR, even though that is not a strict adherence to the specification.
Additionally, since the SPDX requirement is for packages published to crates.io, it is possible to use the license field in a different way when publishing to a private registry.
/ as OR (and also the GPL + setting for consistency with the Rust ecosystem)
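The lenient handling of / described above could look roughly like the sketch below, which rewrites a slash-separated license field into an OR expression before strict parsing. It is an illustration with a hypothetical function name, not the implementation used by this project or by crates.io, and it does not cover the GPL + case.

```rust
/// Rewrites "MIT/Apache-2.0" style fields into "MIT OR Apache-2.0".
/// Fields without a slash are returned unchanged apart from trimming.
fn lax_license(field: &str) -> String {
    field
        .split('/')
        .map(str::trim)
        .filter(|part| !part.is_empty())
        .collect::<Vec<_>>()
        .join(" OR ")
}

fn main() {
    println!("{}", lax_license("MIT/Apache-2.0"));
    println!("{}", lax_license("MIT OR Apache-2.0"));
}
```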
From #226's
- Is the purl standardized for Rust somewhere in the spec? What should the purl be for crates not distributed via crates.io?
The purl specification does not indicate a required type specific to Rust, beyond
type: the package "type" or package "protocol" such as maven, npm, nuget, gem, pypi, etc. Required.
Investigate what other CycloneDX tools are doing, particularly if they support private repositories. Rust supports private registries and indicates at a per-dependency level what registry it comes from, so we should be able to access this information if we want to use that as the purl's type for a dependency component and the [package]'s publish list for the package's component
As far as I can tell, there's no valid way to extract the dependency information in a Bom. The dependencies field is public, but it's a newtype struct with a pub(crate) field, meaning it's private for other crates. There are no methods to access the internal Vec of dependencies, so there's no way to access this field.
I suppose there's a workaround of using the JSON output to get a serde_json::Value and then using the pointer method to access the field, but this feels excessive for accessing data that's present but unnecessarily private.
Hi, I have a small project I was trying to run this on. A dummy one with only a handful of dependencies.
I installed the tool with cargo and as per the generated bom it is the 0.3.8 version. But this is presenting the exact problem stated here: #443 I don't see the dependencies in the resulting bom file.
I noticed this is supposed to be fixed, and so I tried cloning the repo, compiling and ran the resulting release build, this worked and didn't present the problem shown above. I can see the version is also shown as 0.3.8. So I'm wondering if there's some mismatch between the version installed by cargo install and what is currently on the main branch.
Currently, after parsing an SBOM, the specVersion field is not available.
However, what might happen is that the 1.3 parser parses a 1.4 version successfully. Without reparsing and inspecting the raw JSON (or XML), it seems impossible to check if this was actually a 1.3 or 1.4 document, which might be necessary to know in some cases.
I can come up with a PR for this if that helps.
The CLI currently references the BOM library dependency via path as the BOM library is not yet published as a crate. Once the library is published as a crate, some work will need to be done to manage the dependency between the CLI and the library.
Right now the crate uses its own Purl type, with only hidden fields (which also makes it not terribly useful as-is, except for validation; use of its contents requires Display-ing and then reparsing with packageurl::PackageUrl or url::Url anyway). It's probably better to transition this to packageurl::PackageUrl to enable easier interoperation and enable extraction of the purl's components.
The following arguments are supported via the command-line, but are not used:
target_dir
verbose
quiet
color
frozen
locked
offline
unstable_flags
config_args
They are passed to cargo::util::Config::configure, which is only used to get the current working directory in
let manifest = args
.manifest_path
.unwrap_or_else(|| config.cwd().join("Cargo.toml"));
I think we probably want to support:
verbose
quiet
and probably drop the rest, as they don't have an effect on SBOM generation (e.g. we're always offline, because we don't download anything from the internet).
We should also consider what other CLI arguments we want to support.
--output would be useful so the user can specify a particular location to store the SBOM rather than defaulting to the current working directory
--force flag would be a good idea to allow users more control on overwriting existing output.
--include-dev-dependencies and --include-license-text seem like good options as well, from the CycloneDX Node repository
Right now, the project is published under a single crate cyclonedx-sbom. The dependencies required for the full functionality of the CLI result in a very large dependency tree and large compile times (mostly due to the dependency on cargo due to some things not being available via cargo_metadata).
cyclonedx-sbom: A lightweight library that implements the specification and provides small utilities (e.g. proposed merge algorithm implementations would live here)
cargo-cyclonedx: The current functionality, but structured in a way that makes the library easy to integrate with other rust applications and makes the binary as the first-party implementation on top.
cargo-cyclonedx crate
cdx-automation user with publish access to that crate
cargo-cyclonedx/
cargo-cyclonedx and other Rust applications
Keats/validator
publish crates action we currently use, but more research needs to be done to determine specifying major / minor / patch versions.
README.md for each crate
cargo install command and might cause problems with users who have already installed the CLI tool
1.0.0 version
Take inspiration from the actions-rs organization and actions we use such as katyo/publish-crates to create a GitHub Action that generates SBOM files. The intent is to use it with something like EndBug/add-and-commit to automatically generate SBOM files on merge
bom.{xml,json} files to be committed to the repository
From #226's
Can the tool exclude build-only dependencies? I've noticed that by default it includes them.
Investigate and implement a way to handle the various types of dependencies (documentation reference). In particular, we should allow customization of:
[build-dependencies]
[dev-dependencies]
optional dependencies
scope on the Component
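One way the customization listed above might be modeled is with a small set of include switches applied to each dependency. The types, field names, and the choice to exclude build, dev, and optional dependencies unless asked are all assumptions for this sketch; real Cargo metadata is richer (for example, optional dependencies are tied to features).

```rust
/// The dependency tables a Cargo manifest distinguishes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Kind {
    Normal,
    Build,
    Dev,
}

/// Simplified stand-in for a resolved dependency.
struct Dependency {
    name: String,
    kind: Kind,
    optional: bool,
}

/// Hypothetical user-facing switches; everything defaults to excluded.
#[derive(Default)]
struct Include {
    build: bool,
    dev: bool,
    optional: bool,
}

/// Convenience constructor used by the example.
fn dep(name: &str, kind: Kind, optional: bool) -> Dependency {
    Dependency { name: name.to_string(), kind, optional }
}

/// Returns the names of the dependencies that should appear in the BOM.
fn selected<'a>(deps: &'a [Dependency], include: &Include) -> Vec<&'a str> {
    deps.iter()
        .filter(|d| match d.kind {
            Kind::Normal => true,
            Kind::Build => include.build,
            Kind::Dev => include.dev,
        })
        .filter(|d| !d.optional || include.optional)
        .map(|d| d.name.as_str())
        .collect()
}

fn main() {
    let deps = vec![
        dep("serde", Kind::Normal, false),
        dep("cc", Kind::Build, false),
        dep("assert_cmd", Kind::Dev, false),
    ];
    println!("{:?}", selected(&deps, &Include::default()));
}
```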
My use case is building FFI libraries that can be distributed as C binary headers, but also embedded in Java Jars and NuGet packages.
A couple of questions:
Hello! I'm looking for a CycloneDX crate and found this, but I don't see any commits since January or any recent progress towards supporting 1.4. Is this project still maintained?
As the title says.
If I have a project with multiple packages (e.g. this one, cyclonedx-rust-cargo) then the components for all packages will include all components from all packages combined.
We use StructOpt for our CLI parser library. Its functionality has been added to one of its dependencies, clap, in its 3.0.0 release. StructOpt is now in maintenance mode, so we should switch over to clap. This should mostly translate directly, but there might be a few changes required.
structopt dependency is replaced with clap = "3.0.10"
I am working on the enarx project, which supports different feature sets/dependency graphs based on which features and/or binary targets we are compiling for (e.g. for Windows/MacOS/AARCH64 we do not include the enarx-shim-sgx and enarx-shim-kvm binary-dependency subcrates from our workspace). One way to see the differences in the dependencies used for the various targets/enabled features is to run cargo build -Z unstable-options --build-plan | jq '.inputs', which outputs a list of paths to the Cargo.toml files used for building a binary.
I would recommend a mode for cargo cyclonedx which could take parameters that would be used when running cargo build (target, features, etc), use that to generate a build-plan, and use the inputs from that build plan to construct a BOM.
Thoughts?
Assuming one uses e.g. rdkafka, this pulls in rdkafka-sys, which actually contains librdkafka (a C library).
There are a bunch of dependencies which follow a -sys style pattern, adding C code to the Rust build. Such crates can sometimes be linked dynamically (re-using a system library), but sometimes they bring their own library.
In the case that such crates bring their own compiled version, I would expect to see the dependency on the SBOM too.
However, when generating the SBOM for a Rust application with this crate (using --all), nothing indicates that it contains the code of librdkafka too:
$ cat bom.xml | grep kafka
<component type="library" bom-ref="pkg:cargo/[email protected]">
<name>rdkafka</name>
<description>Rust wrapper for librdkafka</description>
<purl>pkg:cargo/[email protected]</purl>
<url>https://github.com/fede1024/rust-rdkafka</url>
<component type="library" bom-ref="pkg:cargo/rdkafka-sys@4.3.0+1.9.2">
<name>rdkafka-sys</name>
<description>Native bindings to the librdkafka library</description>
<purl>pkg:cargo/rdkafka-sys@4.3.0+1.9.2</purl>
<url>rdkafka</url>
<url>https://github.com/fede1024/rust-rdkafka</url>
The rdkafka-sys dependency looks like this:
<component type="library" bom-ref="pkg:cargo/rdkafka-sys@4.3.0+1.9.2">
<name>rdkafka-sys</name>
<version>4.3.0+1.9.2</version>
<description>Native bindings to the librdkafka library</description>
<scope>required</scope>
<licenses>
<expression>MIT</expression>
</licenses>
<purl>pkg:cargo/rdkafka-sys@4.3.0+1.9.2</purl>
<externalReferences>
<reference type="other">
<url>rdkafka</url>
</reference>
<reference type="vcs">
<url>https://github.com/fede1024/rust-rdkafka</url>
</reference>
</externalReferences>
</component>
It might be hard to provide an automatic way to discover this, so I think it might make sense to have the following two features: