cdevents / spec

A common specification for Continuous Delivery events

License: Apache License 2.0
I propose that the subject object within each event type definition be split into two objects - subject and predicate - where the new subject object would contain items that are identical for all event types on the same subject, and the predicate would contain the attributes that differ between events with different predicates but the same subject.
At the same time, I believe the "content" object could be collapsed.
I think this provides a clearer structure for the event. A drawback with this approach is that it might sometimes be hard to determine whether a field belongs in the predicate object or the subject object. One such example is the url in the example below. It could be argued that it is the same for both predicates defined ("started" and "finished"), but I'd say it could differ, since the "finished" event could link to an archived record of the taskRun execution while the "started" event would link to an ongoing execution.
Example for the dev.cdevents.taskrun.finished event, that should be changed from:
{
  "context": {
    ...
  },
  "subject": {
    "id": "mySubject123",
    "source": "/event/source/123",
    "type": "taskRun",
    "content": {
      "taskName": "myTask",
      "url": "https://www.example.com/mySubject123",
      "pipelineRun": {
        "id": "mySubject123"
      },
      "outcome": "failure",
      "errors": "Something went wrong\nWith some more details"
    }
  }
}
to:
{
  "context": {
    ...
  },
  "subject": {
    "id": "mySubject123",
    "source": "/event/source/123",
    "type": "taskRun",
    "name": "myTask",
    "pipelineRun": {
      "id": "mySubject123"
    }
  },
  "predicate": {
    "type": "finished",
    "url": "https://www.example.com/mySubject123",
    "outcome": "failure",
    "errors": "Something went wrong\nWith some more details"
  }
}
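To make the proposed split concrete, here is a minimal sketch of a converter from the current layout to the proposed subject/predicate layout. Which content fields belong to the subject versus the predicate is my assumption for illustration, not something the spec defines.

```python
# Hypothetical converter from the current "content" layout to the proposed
# subject/predicate split. SUBJECT_FIELDS is an assumed classification.
SUBJECT_FIELDS = {"taskName", "pipelineRun"}

def split_subject(event, predicate_type):
    old_subject = event["subject"]
    content = old_subject.get("content", {})
    new_subject = {k: old_subject[k] for k in ("id", "source", "type")}
    predicate = {"type": predicate_type}
    for key, value in content.items():
        if key in SUBJECT_FIELDS:
            # "taskName" becomes plain "name" in the proposed layout
            new_subject["name" if key == "taskName" else key] = value
        else:
            predicate[key] = value
    return {"context": event["context"], "subject": new_subject, "predicate": predicate}

old = {
    "context": {},
    "subject": {
        "id": "mySubject123",
        "source": "/event/source/123",
        "type": "taskRun",
        "content": {
            "taskName": "myTask",
            "url": "https://www.example.com/mySubject123",
            "pipelineRun": {"id": "mySubject123"},
            "outcome": "failure",
            "errors": "Something went wrong\nWith some more details",
        },
    },
}
new = split_subject(old, "finished")
```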
This issue was raised as a result of a discussion during the CDEvents WG on Jan 30th, 2023
These event subjects all represent executions of some kind, as they can all be started and finished:
The protocol is not consistent regarding those subjects or their predicates though:
Furthermore, none of the subjects above have a predicate that would signal that the subject has left the queue. A subject that has been queued would eventually get started, but there seems to be a need to also signal that the subject has been dequeued and is thus no longer expected to be started. I propose either dequeued or canceled as the predicate here.
Note: Some of this issue is already considered in #105
I tried to write some applications that send CDEvents via CloudEvents, but it is really difficult to find sample events in JSON (even in the SDK repos).
The schema is good for defining an event, but sample JSON helps end users understand and use CDEvents.
So please add samples of what CDEvents look like in JSON format, similar to https://github.com/cloudevents/spec/blob/main/cloudevents/formats/json-format.md
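As a sketch of what such a sample could look like, here is a hypothetical CDEvent carried as a structured-mode CloudEvent in JSON. Field names follow the examples elsewhere in this thread; this is not a normative sample from the spec.

```python
import json

# Hypothetical CDEvent wrapped in a structured-mode CloudEvent (JSON format).
# The context/subject layout mirrors the taskrun example in this thread.
cdevent_over_cloudevent = {
    "specversion": "1.0",
    "type": "dev.cdevents.taskrun.finished",
    "source": "/event/source/123",
    "id": "A234-1234-1234",
    "time": "2018-04-05T17:31:00Z",
    "datacontenttype": "application/json",
    "data": {
        "context": {"version": "0.1.0"},  # assumed context content
        "subject": {
            "id": "mySubject123",
            "source": "/event/source/123",
            "type": "taskRun",
            "content": {
                "taskName": "myTask",
                "url": "https://www.example.com/mySubject123",
                "outcome": "failure",
            },
        },
    },
}

serialized = json.dumps(cdevent_over_cloudevent, indent=2)
```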
Many projects and systems already produce and consume CloudEvents.
When a producer starts producing CDEvents in addition to existing CloudEvents, not all consumers may be ready to consume CDEvents.
How to solve this?
Following from cdfoundation/sig-events#110
Quoting initial description:
Create an extension for the CloudEvents Java SDK that provides CD Events definitions and types:
This SDK can follow different approaches, by using just the Java SDK or the Spring integration.
We should have basic coverage for events defined in the vocabulary and at least an example app producing and consuming these events.
I attended cdCon this week, and had a lovely chat with @afrittoli and wanted to highlight a couple issues we discussed.
Currently the spec makes the payload a nested object, but it needs to be a first-class citizen. The reason is that anyone adopting this spec would have to break all of the customers using their service, who may or may not be following the spec. Further, anything that uses plugins, gRPC, or something similar would be completely broken by the current standard. This is a non-starter for anyone who guarantees there are never any breaking changes for their customers, such as AWS, Google, Azure.
I propose three alternative approaches.
What the spec currently has is this
{
  "specversion" : "1.0",
  "type" : "com.github.pull_request.opened",
  "source" : "https://github.com/cloudevents/spec/pull",
  "subject" : "123",
  "id" : "A234-1234-1234",
  "time" : "2018-04-05T17:31:00Z",
  "comexampleextension1" : "value",
  "comexampleothervalue" : 5,
  "datacontenttype" : "text/xml",
  "data" : "<much wow=\"xml\"/>" // Notice the nesting of the data
}
So instead we'd have headers, X-CloudEvents-Id for example. This leaves the payload unchanged, which is huge for any end user. Looking at logs, among other things, also remains unchanged; otherwise these changes may surprise and break a user, especially if they are doing anything with those logs, like log parsing.
And now our example above looks like
POST /cgi-bin/process.cgi HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE5.01; Windows NT)
Host: some-host
Content-Type: text/xml
Content-Length: length
Accept-Language: en-us
Accept-Encoding: gzip, deflate
Connection: Keep-Alive
X-CloudEvents-Id: A234-1234-1234
X-CloudEvents-Spec-Version: 1.0
X-CloudEvents-Type: com.github.pull_request.opened
X-CloudEvents-Source: https://github.com/cloudevents/spec/pull
X-CloudEvents-Subject: 123
X-CloudEvents-Time: 2018-04-05T17:31:00Z
X-CloudEvents-Metadata: "comexampleextension1=value; comexampleothervalue=5"
<much wow=\"xml\"/>
The X-CloudEvents-Metadata header would carry any custom key-values needed by the service. The foo=bar; baz=qux format is widely used in HTTP headers when multiple key-values are needed within a single header. See Strict-Transport-Security.
The major benefit of this approach is that the payload is COMPLETELY untouched, making it completely backwards compatible with any pre-existing service.
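A consumer-side parser for that metadata header format could be sketched as follows (the header name and the exact quoting rules are assumptions based on the example above):

```python
def parse_metadata(header_value):
    """Parse a 'key=value; key2=value2' header value (the
    Strict-Transport-Security style suggested above) into a dict.
    Keys without '=' map to an empty string."""
    pairs = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip().strip('"')
    return pairs

metadata = parse_metadata("comexampleextension1=value; comexampleothervalue=5")
```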
Another approach is to make pre-existing customer payloads first-class citizens - leaving them untouched - and to add a field within the payload that would house all the event information.
<much_wow>
  <somefield>hello world!</somefield>
  <cloud_events_metadata>
    <type>com.github.pull_request.opened</type>
    <source>https://github.com/cloudevents/spec/pull</source>
    <subject>123</subject>
    <time>2018-04-05T17:31:00Z</time>
    <comexampleextension1>value</comexampleextension1>
    <comexampleothervalue>5</comexampleothervalue>
    <id>A234-1234-1234</id>
    <spec-version>1.0</spec-version>
  </cloud_events_metadata>
</much_wow>
In this approach we use headers as well as a new cloud events field. Things like X-CloudEvents-Id could be a header, and customer-exclusive data like comexampleextension1 could live in the payload field.
I know none of this is currently in the spec/standard, but I definitely think it's worth discussing. Basically how do you version and support interoperability between all the possibilities?
For instance, some service A could be on version 1, and some service B could be on version 2. Are major versions always going to be backwards compatible, so sending requests between these services should always work?
Further, types have to have a version as well. The com.github.pull_request.opened type may have a completely different payload years later. That adds a whole layer of complexity. In addition, adding a new type to the spec would have to be a major version bump of the spec: if someone is using version 1.x and 1.y adds a new type, then 1.x will have no idea how to deal with it.
What does major versioning look like with interoperability? Does EVERY service need to update to the new major version? How do we plan on handling that?
CDEvents include a source field that matches the source in CloudEvents (type: URI-reference). source + id should be globally unique, which means that source should identify the producer of events using a URI-reference.
I'm using Tekton events as an example of the possible options.
In Tekton today the source looks like:
"Ce-Id": "77f78ae7-ff6d-4e39-9d05-b9a0b7850527",
"Ce-Source": "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
"Ce-Specversion": "1.0",
"Ce-Subject": "curl-run-6gplk",
"Ce-Time": "2021-01-29T14:47:58.157819Z",
"Ce-Type": "dev.tekton.event.taskrun.unknown.v1",
However this is problematic because the source is different for every subject, which makes filtering events harder.
An alternative could be to use in the source the website of the platform that produces events, e.g.
"Ce-Id": "77f78ae7-ff6d-4e39-9d05-b9a0b7850527",
"Ce-Source": "https://tekton.dev/",
"Ce-Specversion": "1.0",
"Ce-Subject": "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
"Ce-Time": "2021-01-29T14:47:58.157819Z",
"Ce-Type": "dev.tekton.event.taskrun.unknown.v1",
That means that all events generated by any instance of Tekton would have the same source though.
I'm not aware of any standard way of identifying an instance of an application via a URI.
We could define something like <org-namespace>/tekton/<instance-descriptor>:
"Ce-Source": "tekton.dev/tekton/dogfooding",
This could identify the dogfooding Tekton instance used by the Tekton community.
Thoughts?
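As a sketch, the proposed <org-namespace>/<platform>/<instance-descriptor> scheme could be built and sanity-checked with a small helper; the validation rules here are my assumption, not part of any proposal.

```python
def make_source(org_namespace, platform, instance_descriptor):
    """Build a source following the proposed
    <org-namespace>/<platform>/<instance-descriptor> scheme.
    Rejecting empty components and embedded slashes is an assumed rule."""
    for part in (org_namespace, platform, instance_descriptor):
        if not part or "/" in part:
            raise ValueError(f"invalid source component: {part!r}")
    return f"{org_namespace}/{platform}/{instance_descriptor}"

source = make_source("tekton.dev", "tekton", "dogfooding")
```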
Quoting from #40 (thank you @xibz):
How do you version and support interoperability between all the possibilities?
For instance, some service A could be on version 1, and some service B could be on version 2. Are major versions always going to be backwards compatible, so sending requests between these services should always work?
Further, types have to have a version as well. The com.github.pull_request.opened type may have a completely different payload years later. That adds a whole layer of complexity... In addition, adding a new type to the spec would have to be a major version bump of the spec, since anyone using version 1.x and 1.y adds a new type, then 1.x will have no idea on how to deal with that.
How does major versioning look like with interoperability? Does EVERY service need to update to the new major version? How do we plan on handling that???
Define a schema document, hosted by cdevents, which specifies the structure of the payload of CDEvents when transported over CloudEvents. The schema shall be specified in the dataschema attribute of CloudEvents.
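A producer-side sketch of setting that attribute: the URL layout below mirrors the $id example mentioned elsewhere in this thread (https://cdevents.dev/0.1.2/schema/artifact-packaged-event) and is an assumption, not a normative location.

```python
def with_dataschema(cloudevent, spec_version, event_name):
    """Return a copy of the CloudEvent with its `dataschema` attribute
    pointing at an assumed cdevents-hosted schema document."""
    cloudevent = dict(cloudevent)
    cloudevent["dataschema"] = f"https://cdevents.dev/{spec_version}/schema/{event_name}"
    return cloudevent

event = with_dataschema(
    {"specversion": "1.0", "type": "dev.cdevents.taskrun.finished"},
    "0.1.2",
    "taskrun-finished-event",
)
```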
Disclaimer: This issue should be put in a general cdevents community repo, but as we lack such I put it here instead.
I'd like to see a PoC on dependency updates through events. The idea is that e.g. artifact published event should trigger a new pull request being created in a repository that depends on that "upstream" repo. It could be a lib->component relation or a component->application relation or similar. The new pull request in that "downstream" repo should then have a change created event sent for it, which somehow relates to the artifact being the cause of that update.
Note: The functionality of triggering downstream dependency updates is today handled by for example Dependabot if both repos are in GitHub, but what if not? And sending such events would also make it possible to visualize and measure on such dependency updates in a generic manner
As discussed last week, a global ID is needed to ensure quick and efficient look ups for a series of events.
This new global ID would be a requirement for the links spec #10 as this would eliminate the need for links to backtrack to figure out how a series of events occurred, and instead, a simple query of the global ID would return all events associated with that global ID. Further this would give a logical grouping to the events that occurred for a particular job.
I propose the name span id and further I imagine the SDKs would need to change a little bit. I would imagine if no span ID is present, simply create one, and that would then be passed to all children events. I haven't spent too much time on thinking about this, but is that all that is needed to determine a span of events?
I'll spend some more time thinking about this, and see if I am able to get a proposal out.
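The "create one if absent, otherwise propagate" behavior proposed for the SDKs could be sketched like this; the field name "spanid" is my assumption.

```python
import uuid

def ensure_span_id(consumed_event=None):
    """Sketch of the proposed SDK behavior: inherit the span id from the
    consumed event when present, otherwise create a new one.
    The "spanid" field name is an assumption."""
    if consumed_event and consumed_event.get("spanid"):
        return consumed_event["spanid"]
    return str(uuid.uuid4())

# A root event creates a span; every child event inherits it.
parent = {"id": "A234-1234-1234", "spanid": ensure_span_id()}
child = {"id": "B567-5678-5678", "spanid": ensure_span_id(parent)}
```

With this, "a simple query of the global ID" reduces to selecting all events whose spanid matches.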
Today all event schemas contain the fields id, source and type in their subject objects, but they are not all described in the corresponding event-specific md file. They are explained in the spec.md file, but that is not obvious when reading the event-specific md.
It should also be clarified which fields in the subject object are mandatory and which are not, and which fields have default values.
The type field should maybe have the format of an enum with only one valid value, which should then also be the default value of the type.
When working with the spec and the current vocabulary, it gets really difficult to integrate tools based on the events that we have defined.
One thing that becomes quite clear when integrating different projects is that we don't have events to express when things are requested, which is a step before other things can happen.
The simplest example is the PipelineRunStarted event. For that event to happen, someone needs to trigger a pipeline, at which point the pipeline engine can easily emit a PipelineRunStarted event. What is missing is the integration point for when we want to start pipelines from outside the pipeline engine (one of the main integration use cases); hence adding events like PipelineRunRequested will enable, in an event-driven way, a different project to accept requests for actions. How these requests are mapped to the different tools is out of the scope of the vocabulary/spec.
@afrittoli @m-linner-ericsson @e-backmark-ericsson Folks.. I think that we need this kinda urgently to make sure that the integrations that we are building make sense. Can the group discuss this in the next meeting?
The overarching issue that links the different streams of work about connecting events.
Design document: https://hackmd.io/-Or6hobHSLWVj4duAWX7nA
Related issues:
Prepare a v0.1 release announcement
According to our roadmap we should define the requirements for v1.0 during 2022.
A common use case in proof of concepts is to start from a change in the code.
Having a re-usable GitHub -> CDEvents translation layer would enable PoC authors to kickstart the work on their demo much faster.
Add CDEvents v0.1 to the tools terminology table in the vocabulary doc.
Follow up to the discussion in #30 (comment)
We should clarify whether we want to version events individually or not, and what that means in terms of compliance to the spec for producers and consumers.
Based on the updated binding document and schema, we shall create the go SDK under cdevents/sdk-go.
The new SDK is required to start building a proof of concept.
This includes investigating if/how the SDK can be generated from the json schema.
Follow-up to the discussion on #79.
In Eiffel we have the following design rule:
Do not use variable key names: For purposes of automated validation, analysis and search, custom key names shall be avoided. Consequently, for custom key-value pairs, { "key": "customKeyName", "value": "customValue" } shall be used instead of { "customKeyName": "customValue" }.
And our customData is defined like this in json schema:
"customData": {
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "key": {
        "type": "string"
      },
      "value": {}
    },
    "required": [
      "key",
      "value"
    ],
    "additionalProperties": false
  }
}
Do you think that should be a valid design rule for CDEvents as well?
It would then be something like this instead:
"customData": {
  "oneOf": [
    {
      "type": "string",
      "contentEncoding": "base64"
    },
    {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "key": {
            "type": "string"
          },
          "value": {}
        },
        "required": [
          "key",
          "value"
        ],
        "additionalProperties": false
      }
    }
  ]
}
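A producer following the design rule above would encode its custom mapping into the fixed-key form; a minimal sketch:

```python
def to_custom_data(mapping):
    """Encode a plain mapping as the fixed-key form required by the
    design rule above, avoiding variable key names so that automated
    validation and search stay possible."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

custom_data = to_custom_data({"buildHost": "vm-42", "retries": 3})
```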
There was an old PR in the old sig-events repo on introducing the term "proposal" for some of our SCM event(s). Much has happened in the spec since then and the spec is moved to cdevents, so a new PR needs to be created based on the current codebase. I've closed that old PR in favor of this issue.
Is that proposal still relevant @erkist ?
The root specification document spec.md includes aspects of CDEvents which are common across all events.
We shall restructure and expand it a bit, following the example from CloudEvents: https://github.com/cloudevents/spec/blob/v1.0.1/spec.md#notations-and-terminology
It was apparent in PR #86 that the way event vs spec versions are currently handled is problematic. The current versioning scheme requires that all event schemas are updated for any change to any individual event schema, since the spec version is updated for each event update and the spec version is part of each event's individual schema. I believe an individual event schema should not need to be updated unless its own contents change.
One way to handle this would be to not restrict the possible versions that can be set on the context.version property, but that has other drawbacks, since events could then be sent with an invalid spec version set.
Alternatively we could remove the spec version from the context object, but it is not exactly known what consequences that would have for our SDKs etc.
Build and artifact events today include an "artifact id".
This is identified as a unique identifier of an artifact.
There are open questions about which ID is available to which event though:
We should be more specific on what the artifact ID is to better guide implementors (event producers and consumers alike):
What extra standard data shall we include for artifact?
Follow-up to the discussion in #26 (comment)
The spec now distinguishes subjects as:
The terminology does not work for all subjects though; for instance, artifacts are not "running".
Some alternatives:
As noted in this comment, our usage of universal resource locators/references is not really aligned across our json schemas and markdown docs. We should decide on a format to use and then make sure it is updated in the relevant files.
The $id references in the json schemas for the events are not valid anymore since the rendering of cdevents.dev was changed recently.
Example:
This event schema: https://github.com/cdevents/spec/blob/spec-v0.1/schemas/artifactpackaged.json
has an $id reference to https://cdevents.dev/0.1.2/schema/artifact-packaged-event
which is not valid
CDEvents should allow carrying extra data as an extension of the standard CDEvent.
CDEvents cover the shared part of the data model, but each application has its own extra data which a consumer may decide to benefit from.
Based on the WG discussions so far, the data shall be hosted in a root-level element, in parallel to context and subject.
We will need extra fields in the context to describe the type and format of the data. The data MUST be compatible with the hosting JSON format, so it could be nested JSON, or base64-encoded for anything else.
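The "nested JSON or base64 for anything else" rule could be sketched as a small encoder; the returned format labels and the helper itself are assumptions for illustration.

```python
import base64
import json

def encode_custom_data(data):
    """Sketch of the rule above: JSON-compatible data is nested as-is,
    raw bytes are base64-encoded. The ("json"/"base64") format labels
    are assumptions, standing in for the extra context fields."""
    if isinstance(data, (bytes, bytearray)):
        return base64.b64encode(bytes(data)).decode("ascii"), "base64"
    json.dumps(data)  # raises TypeError if not JSON-compatible
    return data, "json"

payload, payload_format = encode_custom_data({"traceId": "abc123"})
blob, blob_format = encode_custom_data(b"\x00\x01binary")
```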
To make it easy to mention event types, both in written text and in graphics, it would be beneficial to define abbreviations for each of the CDEvents event types.
Example:
RepCr - Repository Created Event
ChaRe - Change Reviewed Event
BuiSt - Build Started Event
ArtPa - Artifact Packaged Event
etc
This issue is to receive nominations for the CDEvents Most Valuable Contributor (MVC) Award 2022.
This award recognizes excellence of contributions to the CDEvents project, which may be any or a combination of contributions to the specification, work on the SDKs, implementation of proofs of concepts, advocacy and fostering adoption in other communities.
To nominate someone, reply to this issue with the following:
Full name of the person you’re nominating
Short description of their contributions to the CDEvents project and why they should win.
Nomination Deadline: Tuesday, April 11, 2022
You may "vote" on existing nominations by adding emojis to them ^_^
Voting Deadline: Monday, April 11, 2022
More details are available here: https://cd.foundation/cdf-community-awards/
Make it clearer in the spec that CDEvents are descriptive and not directed to a specific client.
We can use a diagram to represent this:
something along the lines of https://raw.githubusercontent.com/cdfoundation/sig-events/main/poc/sig-events-spinnaker-functional.png
Make it clearer in the spec that the subscriber model is the primary use case
It's in the primer today but we need to make it more relevant
- https://cdevents.dev/docs/primer/#why-not-point-to-point-communication
Document decision (from CDEvents community summit) about prescriptive events in the spec.
The project defines a specification for continuous delivery events on top of CloudEvents.
CloudEvents event types must be in the format <reverse-dns-domain>.something.something.
We'd like to define event types like:
cd.events.pipeline.started
cd.events.artifact.published
which means we'd need the DNS domain events.cd to be owned by the project, or at least owned by the CDF and assigned to the project. We would also like to use the same domain events.cd to host a website for the project.
See: cdfoundation/sig-interoperability#81
In the working Event Vocabulary document under ["Events Categories"](https://hackmd.io/lBlDCrL7TvmtNOjxdopJ5g) we have a list of event types. Consider using the same terminology as that proposed for the Interoperability SIG's CI/CD Vocabulary.
For example, you have "Build", "Test", and "Release" events, while the Interoperability SIG vocabulary document has "Build", "Test", "Tag", "Publish" and "Deploy". ("Release" can mean "Tag", "Publish" and/or "Deploy" depending on who's using the term; are they talking about tagging the current commit in Github as version 3.7, publishing a new version of a package to NPM, or deploying a new running version of software to your production environment?)
Codify different types of tests in CDEvents, for instance
The version field (https://github.com/cdevents/spec/blob/main/spec.md#version) is specified as defining the version of the specification, while https://github.com/cdevents/spec/blob/main/spec.md#type-context says it should contain the version of the event. When looking at an event this is super confusing.
The proposal is to rename the version field to something more descriptive that makes clear it carries the specification version, maybe spec_version.
Expand the spec documents for each bucket to include some extra information:
- Environment has an ID. A Deployment may refer to an environment by ID.

Examples in the spec may also refer to specific platforms. Fields should include a type, examples of values and a link to use cases.
Investigate S3C aspects for CDEvents.
Examples: artifact SBOM, signature, artifact verified event
We should document in the CloudEvents binding document how the subject should be populated when sending CDEvents.
All CDEvents are related to an activity and an entity - like taskrun started or artifact published.
Entities are identified by an ID which we could use to fill in the subject of the CloudEvents.
That would allow CloudEvents consumers to easily filter events based on the subject.
Example:
{
  "specversion" : "1.0",
  "type" : "cd.events.taskrun.started",
  "source" : "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
  "subject" : "my-taskrun-123", <=====
  "id" : "A234-1234-1234",
  "time" : "2018-04-05T17:31:00Z",
  "cdeventssource" : "tekton_ci_prod/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
  "cdeventsplatform" : "tekton_node123_clusterABC",
  "cdeventssourceevent": "A123-5678-9012@jenkins_ci_prod/pipeline/my-source-pipeline",
  "datacontenttype" : "text/json",
  "data" : {
    "taskrun" : {
      "id": "my-taskrun-123", <=====
      "task": "my-task",
      "status": "Running",
      "URL": "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk"
    }
  }
}
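The filtering this enables on the consumer side can be sketched in a few lines: matching on the CloudEvents subject attribute alone, without parsing the CDEvents payload in data.

```python
def by_subject(events, subject):
    """Filter CloudEvents on the `subject` context attribute alone,
    without parsing the CDEvents payload carried in `data`."""
    return [event for event in events if event.get("subject") == subject]

events = [
    {"subject": "my-taskrun-123", "type": "cd.events.taskrun.started"},
    {"subject": "other-taskrun-456", "type": "cd.events.taskrun.started"},
]
matching = by_subject(events, "my-taskrun-123")
```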
When evaluating the "lead time for changes" metric, we need to be able to correlate change events to build events, so that we may discover which builds include a specific change. From the build we can then find out the associated deployment.
Build events today are relatively bare; only the type and an optional artifact id are specified.
This extra data is needed:
- repository, like the one already included in change events
- last change, which could be an attribute of the repository or separate. It must be an identifier that can be passed to the SCM to ask whether a specific change is included or not.

The CDEvents specification should allow propagating identifiers from consumed events into produced events.
When an activity is directly started as a consequence of a consumed CDEvents, an identifier of the consumed event must be included in the produced events related to the activity.
When an activity is directly started as a consequence of a consumed event (non CDEvent), an identifier of the consumed event may be included in the produced events related to the activity.
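The MUST/MAY split above could be sketched as follows; the "sourceevent" field name is my assumption, chosen to echo the cdeventssourceevent extension used in examples elsewhere in this thread.

```python
def link_produced_event(produced, consumed, consumed_is_cdevent=True):
    """Propagate an identifier of the consumed event into a produced event.
    A consumed CDEvent MUST carry an id to propagate; for other events the
    identifier MAY be absent. The "sourceevent" field name is an assumption."""
    identifier = consumed.get("id") if consumed else None
    if consumed_is_cdevent and identifier is None:
        raise ValueError("a consumed CDEvent must carry an id to propagate")
    if identifier is not None:
        produced["sourceevent"] = identifier
    return produced

produced = link_produced_event({"id": "B567-5678-5678"}, {"id": "A234-1234-1234"})
```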
When a pipeline runs, it generates events as different tasks are started and finished. Children pipelines may be triggered by these events (whether in the same CD system or a different one), and they will in turn send events.
As a DevOps engineer I want to be able to discover children pipelines from the events they generated.
These are the spec fields that would be required for CDEvents.
Field Name | Type | Description | CloudEvents Binding | Notes |
---|---|---|---|---|
id | string | Event identifier | id | Unique for a source |
source | URI-reference | Producer of the event | source | May be globally unique? |
subject | string | Subject of the event | subject | |
cdevents_platform | string | Platform namespace for sources | <extension> | Only needed if source is not globally unique |
cdevents_source | URI | Globally unique source | <extension> | Only needed if source is not globally unique |
cdevents_source-event | string | Globally unique source event identifier | <extension> | Serialised source + id / cdevents_platform + source + id / cdevents_source |
We already use id and source in our PoC. However, we do not specify them in our spec, and we should add them.
The source-event field is a globally unique identifier of another event, which was the trigger of the activity identified by "source + subject".
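The serialisation hinted at in the table above could look like this; the "<id>@<source>" shape mirrors the cdeventssourceevent example earlier in this thread, and the helper itself is hypothetical.

```python
def source_event_id(event_id, source, platform=None):
    """Serialise a globally unique event reference as "<id>@<source>",
    optionally qualifying the source with a platform namespace, as in the
    "A123-5678-9012@jenkins_ci_prod/pipeline/..." example in this thread."""
    qualified = f"{platform}{source}" if platform else source
    return f"{event_id}@{qualified}"

ref = source_event_id(
    "A123-5678-9012", "/pipeline/my-source-pipeline", platform="jenkins_ci_prod"
)
```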
CloudEvents have a Source and ID attached to them. Each instance of each platform is responsible for generating unique Source + IDs for its events, but it's not required for one platform to guarantee global uniqueness.
If we enforce global uniqueness, it would be the responsibility of the administrator of the overall system to ensure each source is configured with a globally unique value (e.g. a DNS name) that can be used to generate the various sources.
If we do not enforce global uniqueness, we could introduce the cdevents_platform extension or the cdevents_source extension to provide a globally unique value. Again, it would be the responsibility of the overall administrator to configure a globally unique seed. Alternatively, platforms could use some strategy to pick a name that is likely to be globally unique. Examples:
{
  "specversion" : "1.0",
  "type" : "cd.events.taskrun.started",
  "source" : "/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
  "subject" : "my-taskrun-123",
  "id" : "A234-1234-1234",
  "time" : "2018-04-05T17:31:00Z",
  "cdeventssource" : "tekton_ci_prod/apis/tekton.dev/v1beta1/namespaces/default/taskruns/curl-run-6gplk",
  "cdeventsplatform" : "tekton_node123_clusterABC",
  "cdeventssourceevent": "A123-5678-9012@jenkins_ci_prod/pipeline/my-source-pipeline",
  "datacontenttype" : "text/json",
  "data" : "{}"
}
It may not be possible for a platform to implement the source-event extension in all outgoing events, for a few reasons:
- when cdevents are implemented via an adaptation layer, the information about the source event may not be there at all

This means that a consumer of events that would like to visualise the end-to-end workflow triggered by an initial event will have to use a combination of source-event, subject and event type to discover all relevant events.
Capture our key use cases and design decisions in the CDEvents primer document
To simplify things for event consumers we should include a schema uri property in the cdevents context object. With that, the consumer could use the schema for validation, parsing and data extraction purposes. It might be obvious that an event sent using a certain version of the spec would be possible to retrieve from https://cdevents.dev//schema/.json, but there are use cases where that might not be enough:
- main in the spec repo to the new -draft release