
automation-working-group's Introduction

Introduction

The Automation Working Group (AWG) seeks to leverage automation technologies to reduce the workload of reporting and managing vulnerabilities with an open data model. This repository mostly holds "logistics" related to the execution of the AWG objectives, specifically the AWG Charter and meeting minutes (beginning from February/2024).

Prior to February/2024, meeting notes were recorded on the AWG email list, which is hosted by the Groups.io group facilitation service. In the interest of greater transparency to the broader community (and to encourage participation within the community), meeting notes are now hosted in this repository with public read access. If you wish to review meeting minutes prior to February/2024, join the AWG and you will have access to its email archives.

Joining the AWG

The AWG meets virtually (i.e., via Microsoft Teams) every Tuesday at 4:00 PM Eastern Time. If you are interested in joining the AWG, you can send an email request to [email protected] or submit a request to the CVE Program Secretariat using this web form. (Choose "other" for Request Type and "other" for "Type of Comment", and request to join the AWG in the "Comment Section".)

CVE Schema Content changes

The CVE schema and schema discussion have been migrated to their own repository under the CVE Project: https://github.com/CVEProject/cve-schema

automation-working-group's People

Contributors

ahouseholder, angrymute, anthnysingleton, ccoffin, chandanbn, cristina479, csj0, cve-team, dadinolfi, david-waltermire, jdaigneau5, jwhitmore-mitre, kernelsmith, kurtseifried, marka63, mattrbianchi, mprpic, rbrittonmitre, theall38103, zmanion


automation-working-group's Issues

define version with use cases and proposed json

Information about product versions is critical for users to identify whether their products are affected. To include such information in the reporting format, the use cases need to be investigated.

Two lists are provided for these use cases: one for common use cases (which will be included in the reporting format), and the other for uncommon use cases (which won't be included). New cases can be added to the latter list, and a case can be moved from it to the former list if the AWG agrees.

The list of common use cases:

  • individual versions
    For example, CVE-2017-3240, RDBMS Security component of Oracle Database Server: the supported version that is affected is 12.1.0.2.
    "version": {
        "individuals": [
            "strings of versions", "strings separated by commas"
        ]
    }

  • prior to, including all the releases
    For example, CVE-2016-4694, Apache HTTP Server in Apple OS X before 10.12
    "version": {
        "priortoall": "string of priortoall version"
    }

  • prior to, including the specified release
    For example, CVE-2016-6307, OpenSSL 1.1.0 before 1.1.0a
    "version": {
        "priortoone": [
            {
                "branch": "string of branch",
                "release": "string of release"
            }
        ]
    }

  • intervals
    For example, CVE-2016-8740, Apache HTTP Server 2.4.17 through 2.4.23
    "version": {
        "interval": [
            {
                "startrelease": "string of release",
                "endrelease": "string of release"
            }
        ]
    }

  • earlier for all the releases
    For example, MySQL Enterprise Monitor component of Oracle MySQL (subcomponent: Monitoring: Agent). Supported versions that are affected are 3.1.3.7856 and earlier
    "version": {
        "earliertoall": "string of release"
    }

  • earlier for the specified release
    For example, CVE-2016-6307, MySQL Server component of Oracle MySQL (subcomponent: Server: Security: Encryption). Supported versions that are affected are 5.6.34 and earlier and 5.7.16 and earlier
    "version": {
        "earliertoone": [
            {
                "branch": "string of branch",
                "release": "string of release"
            }
        ]
    }

The list of uncommon use cases:
??

JSON schema for version inside product:

"version": {
    "type": "object",
    "properties": {
        "individuals": {
            "type": "array",
            "items": { "type": "string" }
        },
        "priortoall": {
            "type": "string"
        },
        "priortoone": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "branch": { "type": "string" },
                    "release": { "type": "string" }
                },
                "required": ["branch", "release"]
            }
        },
        "interval": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "startrelease": { "type": "string" },
                    "endrelease": { "type": "string" }
                },
                "required": ["startrelease", "endrelease"]
            }
        },
        "earliertoall": {
            "type": "string"
        },
        "earliertoone": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "branch": { "type": "string" },
                    "release": { "type": "string" }
                },
                "required": ["branch", "release"]
            }
        }
    }
}
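
To make the proposed fields more concrete, here is a minimal Python sketch of how a consumer might test an installed version against a "version" object. The helper names (parse_release, is_affected) and the dotted-numeric comparison are assumptions made for illustration only. Real version strings such as 1.1.0a would need vendor-specific ordering, and the branch-scoped fields (priortoone, earliertoone) are left out.

```python
def parse_release(release):
    """Split a dotted numeric release such as '2.4.17' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


def is_affected(installed, version):
    """Return True if `installed` matches any entry of the proposed object."""
    current = parse_release(installed)
    # "individuals": exact string match against the listed versions.
    if installed in version.get("individuals", []):
        return True
    # "priortoall": every release strictly before the given one.
    if "priortoall" in version and current < parse_release(version["priortoall"]):
        return True
    # "earliertoall": the given release and everything before it.
    if "earliertoall" in version and current <= parse_release(version["earliertoall"]):
        return True
    # "interval": inclusive start/end pairs.
    for span in version.get("interval", []):
        start = parse_release(span["startrelease"])
        end = parse_release(span["endrelease"])
        if start <= current <= end:
            return True
    return False


# Interval taken from the Apache HTTP Server example above.
httpd = {"interval": [{"startrelease": "2.4.17", "endrelease": "2.4.23"}]}
print(is_affected("2.4.20", httpd))
print(is_affected("2.4.25", httpd))
```

Comparing integer tuples avoids the string-ordering trap where "2.4.9" would sort after "2.4.17".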

DOCS - Provide guidance on how to format the affected/notaffected JSON data

So when filling out the affected/notaffected data in the JSON file the two main options are:

  1. explicitly list every version (1.0, 1.0.1, 1.0.2, 1.0.3, etc.)
  2. list ranges, e.g. ">= 1.0.3" or "1.0 through 1.0.3"

There are advantages and disadvantages to both.
Explicitly listing:

con: Explicitly listing all affected versions assumes that no new vulnerable versions will be released; if they are, the JSON data needs to be updated ASAP.

pro: Explicitly listing all the affected versions makes matching versions easy; consumers of CVE data don't need to guess at and expand ranges. For example, to pattern-match "5.6 through 8.2" we would want "^5.6.*", "^6.*", "^7.*", "^8.2.*" (these are regexes, so .* means any characters, not a literal dot followed by a star), which is annoying with numbers, but some projects use named releases now.

Range listing:
con: People have to parse the range to some degree. We use "through", "prior to", "before", "after", "=/>/</!/?/??????", which is tough for humans to parse, let alone software (we'd want to document this).
pro: It covers software released in the meantime that may still be vulnerable (e.g. if you list "1.2 through 2.4" and you release 1.6... is it vuln or not?).

One major way to help solve this is to start listing the fixed versions explicitly as notaffected.

Can anyone else list any other major pros/cons? Once that is done, we can add this to the docs.
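
The two options can be compared side by side. This is a sketch only: it assumes dotted numeric versions, and the helper names and sample version set are made up. It shows that an unescaped regex dot over-matches, and that a range check covers a release (1.6) that an explicit list written earlier would miss.

```python
import re

# Unescaped: "." matches any character, so "5x6" is accepted by mistake.
loose = re.compile(r"^5.6.*")
# Escaped: a literal "5.6" followed by "." or the end of the string.
strict = re.compile(r"^5\.6(\.|$)")


def as_tuple(version):
    """Turn '1.6' into (1, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, first, last):
    """Inclusive range check, i.e. 'first through last'."""
    return as_tuple(first) <= as_tuple(version) <= as_tuple(last)


# Hypothetical set of versions known when the entry was written.
explicit = {"1.2", "1.4", "2.0", "2.4"}

print(bool(loose.match("5x6")), bool(strict.match("5x6")))
print("1.6" in explicit)
print(in_range("1.6", "1.2", "2.4"))
```

Whether 1.6 is actually vulnerable is still the open question raised above; the range only expresses the claim, it does not verify it.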

Discussion: allowed "markup" in description

The description is usually just ASCII text; moving forward it will also be Unicode. Do we also want to allow formatting characters? E.g. I'm already using newlines (especially when I CVE MERGE a bunch of entries: rather than rewrite all the descriptions, I just mash them together for now with line breaks). I can foresee problems with things like tabs and other special characters. Do we simply want to assume it's a free-for-all, anything goes, or do we want some guidelines / rules around this? I'm inclined to allow anything so people are encouraged to build robust parsers (especially as we get more CNAs and weirder data, like that regex that triggered a CVE handling system).

The CVE_ prefix for reserved or documented keywords is unnecessary

The CVE_ prefix for reserved or documented keywords seems unnecessary, (except for CVE_ID). Similar to how HTTP headers and extensions are named, anything on the documented list should be used as documented. An HTTP_ prefix is not required to make it official.

https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

It is preferable to keep the names of the most commonly used keywords simple, just as we do not have HTTP_GET, HTTP_Accept-Encoding, SMTP_To, or SMTP_From. Instead, reserve a prefix for custom or vendor-specific keywords, e.g., X-OpenSSL-Severity-Rating or simply OpenSSL-Severity-Rating.

Document How to Join AWG

Currently there is no documentation to tell CNAs -- or other interested parties -- how to join the CVE Automation Working Group. There should be.

Nested structure versus flat structure

Interesting work.

Looking at the DRAFT document, I was wondering why there is so much nesting in the JSON format. It's usually cleaner to have a non-nested document and make references among the objects where needed. Are there specific advantages to such a nested structure?

Add data_type field to data

Currently we only have CVE JSON blobs in here, and that is likely a safe assumption but @dwf we're using JSON for a number of other things (CNAs, CVE Mentors, etc.). Having a data_type field would be handy to declare what this JSON blob is:

"data_type": "CVE"
"data_type": "CNA"
"data_type": "CVE_MENTOR"

and so on, it would make detection of what the JSON is a lot easier.

version_value "-"

Hello,
I have a question about version_value "-": how should it be interpreted?
For example, in a CVE in the official JSON database from NIST I found this section:
"vendor_name" : "name",
"product" : {
"product_data" : [ {
"product_name" : "name",
"version" : {
"version_data" : [ {
"version_value" : "-",
"version_affected" : "="
}, {
"version_value" : "0.1",
"version_affected" : "="
}, {
"version_value" : "0.2",
"version_affected" : "="
}, {
"version_value" : "0.3",
"version_affected" : "="
}, {
"version_value" : "0.4",
"version_affected" : "="
}, {
"version_value" : "0.5",
"version_affected" : "="
}
Does the "-" mean that versions < 0.1 are also affected? If it does, why doesn't the CVE use <= in the "version_affected" field instead?
Or does the "-" mean "if you don't have a version number in your installed packages, then your installed package is affected"?
Or does it simply mean "we don't know"?

thank you

require STATE?

Should STATE be required? The current v4 min schema does not enforce it, and there is some ongoing discussion about states, but we could at least enforce the current states.

When submitting records, we're changing state from RESERVED to PUBLIC, is that appropriate?

Clarify how non-ASCII email addresses should be handled

The three state-specific schemas currently test that email addresses match the pattern "^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$". This won't work for IDNs (Internationalized Domain Names) or Emoji domains. Should the draft spec be updated to require addresses be transcoded to ASCII using Punycode, say?
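
As a rough illustration of the Punycode idea, the sketch below uses Python's built-in "idna" codec to transcode only the domain part of an address and then re-applies the pattern quoted above. The helper name to_ascii_address and the sample address are made up. Note also that the built-in codec implements the older IDNA 2003 rules and that non-ASCII local parts are not handled, so treat this as a sketch rather than a recommendation.

```python
import re

# Pattern quoted in the issue text above.
EMAIL_RE = re.compile(r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$")


def to_ascii_address(address):
    """Transcode only the domain part to its ASCII (Punycode) form."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("idna").decode("ascii")


raw = "info@bücher.de"          # made-up address with an IDN domain
converted = to_ascii_address(raw)

print(EMAIL_RE.match(raw) is not None)
print(converted, EMAIL_RE.match(converted) is not None)
```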

Clarification on SERIAL

The 3.1 and 4.0 schemas both have an integer named "serial" or "SERIAL" that is intended to be incremented each time an artifact is updated. full_example.json shows it quoted :

"serial":"INT",

and some DWF artifact files follow that format; eg,

https://github.com/distributedweaknessfiling/DWF-Database-Artifacts/blob/master/DWF/2016/1000307/CVE-2016-1000307.json

The cmdlinejsonvalidator.py tool does not allow for that, though; if run against the CVE-2016-1000307 artifact, it complains:

u'1' is not of type u'integer'

because the value is a quoted integer instead of an integer itself.

I recommend either updating the artifact files to remove the quoting around the integer for the variable or changing the tool.
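
The difference between the two forms is easy to see once the JSON is parsed. The following sketch uses only the standard library; the function name serial_problem and the lowercase "serial" key are assumptions for illustration, not part of the existing tool.

```python
import json


def serial_problem(record):
    """Return a message if 'serial' is not a JSON integer, else None."""
    serial = record.get("serial")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(serial, int) and not isinstance(serial, bool):
        return None
    return "serial is %r (%s), expected an integer" % (serial, type(serial).__name__)


quoted = json.loads('{"serial": "1"}')   # parsed as the string "1"
bare = json.loads('{"serial": 1}')       # parsed as the integer 1

print(serial_problem(quoted))
print(serial_problem(bare))
```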

Discussion for changelog container

It has been suggested we have a changelog in the CVE format, I would suggest we make it a container so it can be part of the root (e.g. global) or a specific section (e.g. Affected-Red Hat-RHEL). It would need some things like a timestamp, changelog content, importance metric maybe? type of changelog? (informational, correction, etc.). Please discuss in this issue.

Error validating 'nvdcve-1.0-recent.json' using CVE_JSON_4.0_min.schema (Python + jsonschema)

The validation of 'https://static.nvd.nist.gov/feeds/json/cve/1.0/nvdcve-1.0-recent.json' is failing when using the 'CVE_JSON_4.0_min.schema' schema, in python 3.5 with jsonschema lib.

Code snippet:

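import json
import jsonschema

# Load the NVD feed and the schema; "with" closes the files afterwards.
with open("nvdcve-1.0-recent.json") as feed_file:
    feed = json.load(feed_file)
with open("CVE_JSON_4.0_min.schema") as schema_file:
    schema = json.load(schema_file)

try:
    jsonschema.validate(feed, schema)
except jsonschema.ValidationError as ex:
    # Raised when the feed does not conform to the schema.
    print(ex)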

Thrown exception:

'data_type' is a required property

Failed validating 'required' in schema:
[schema file content]
[feed file content]
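
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the feed file is a wrapper around many records, so the top-level object has no data_type of its own and the schema would have to be applied to each record instead. The sketch below uses only the standard library and a tiny made-up feed. The nesting under "CVE_Items" and "cve", and the data_format and data_version keys, are assumptions that should be checked against the real feed and schema.

```python
# Keys assumed to be required of each record: "data_type" comes from the
# error above, the other two are assumptions to verify against the schema.
REQUIRED = ("data_type", "data_format", "data_version")


def missing_keys(record):
    """List the assumed-required keys that a JSON object lacks."""
    return [key for key in REQUIRED if key not in record]


def check_feed(feed):
    """Check each wrapped record rather than the feed object itself."""
    problems = {}
    # ASSUMPTION: the feed nests records as feed["CVE_Items"][i]["cve"].
    for index, item in enumerate(feed.get("CVE_Items", [])):
        missing = missing_keys(item.get("cve", {}))
        if missing:
            problems[index] = missing
    return problems


# Tiny made-up feed, shaped according to the assumption above.
sample_feed = {
    "CVE_data_type": "CVE",
    "CVE_Items": [
        {"cve": {"data_type": "CVE", "data_format": "MITRE", "data_version": "4.0"}},
        {"cve": {"data_format": "MITRE"}},
    ],
}

print(missing_keys(sample_feed))   # the wrapper itself lacks the record keys
print(check_feed(sample_feed))
```

If the assumption holds, the same loop could call jsonschema.validate on each item's "cve" object in place of the key check.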

SAWG Charter: Goal: Reduce barriers to entry

Suggestion:

Goal: Automation should help reduce the barriers for participation in the CVE Program.

Rationale: Automation should help reduce barriers for new CNAs and existing CNAs who want to make use of automation for participating in the CVE program. Barriers may include costs, fees, time, effort, and technical expertise required to participate in the CVE program.

I see the phrase "reduce the amount of human intervention"; perhaps that could be expanded to include other barriers, or generalized as "barriers to entry", whatever they may be.

Embargo meta data container

It would be nice to have embargo data, such as:

  1. the expected embargo date/time
  2. who is in charge of the embargo (e.g. email address/org)
  3. TLP for the embargo (e.g. red/amber/etc.). https://www.us-cert.gov/tlp

This should probably be done as a container so the global CVE can be embargoed, or specific things (e.g. exploit code).

"strings separated by commas" needs clarification

This kind of thing appears several times in the spec+examples. I'm not clear what it means:

"cwe": ["string of cwes","strings separated by commas"]

Is the idea to support either one of:
"cwe": ["CWE-123", "CWE-456"]
or:
"cwe": "CWE-123,CWE-456"
? Or something else?
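
Until the spec settles on one form, a consumer could accept both. This is a minimal sketch with a made-up helper name (as_list); it does not reflect a decision by the working group.

```python
def as_list(value):
    """Accept a list, or a comma-separated string, and return a list of strings."""
    if isinstance(value, str):
        # Split on commas and drop surrounding whitespace and empty pieces.
        return [part.strip() for part in value.split(",") if part.strip()]
    return [str(part) for part in value]


print(as_list(["CWE-123", "CWE-456"]))
print(as_list("CWE-123,CWE-456"))
```

Both calls yield the same two-element list, which is one argument for requiring the array form in the first place: it needs no splitting rules at all.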

This might be related to (final dot point first comment) issues/12 and perhaps indirectly to issues/8

Need a way to represent ’unknown’ as a vulnerability status

In general, the applicability of a vulnerability to a given version can be any of 'unknown', 'affected' or 'unaffected'. If this is not stated for some versions, there is scope for misinterpretation.

Vendors typically have blanket statements about unsupported versions in EoL policies - that the vendor will not investigate or fix vulnerabilities in such versions. However when a CVE description is read without reading a vendor's EoL policy, people and machines often wrongly conclude that the unsupported versions are unaffected. This is a major hurdle for automated consumption of CVE entries and NIST NVD data.

To make a complete, self-standing statement without having to depend on a vendor's EoL policies, the statements about versions should be able to cover all previous and future versions.

The complete list of states I can think of is:

  1. unknown (== not investigated)
  2. affected
  3. affected but not exploitable (distinguished from 2 by setting CVSS score to zero)
  4. unaffected (== fixed)
  5. unaffected (== fixed) but needs intervention (where merely moving to a fixed version is not sufficient to resolve the vulnerability; consumers would have to change things over and above what is done by the vendor, such as using a new library call, a new parameter or a new configuration).

For example:

This is an incomplete or half-true statement:

"Affected: 1.0.1a to 1.0.1c"

This is complete:

"Unknown: prior to 1.0.1a
Affected: 1.0.1a to 1.0.1c
Unaffected: 1.0.1d and subsequent."

If the CVE was requested by a researcher who only looked into some versions but did not research others, and the information about a vendor fix is unknown, creating a CVE entry from a statement like "Affected: 1.0.1a to 1.0.1c" should have this information:

"Unknown: prior to 1.0.1a
Affected: 1.0.1a to 1.0.1c
Unknown: 1.0.1d and subsequent."
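
The two complete statements above can be expressed as a small lookup that never falls back to "unaffected" silently. This is a sketch under stated assumptions: the function names are made up, and the version ordering only handles strings of the form digits-and-dots plus an optional single trailing letter, as in the 1.0.1a examples.

```python
import re


def key(version):
    """Order versions like '1.0.1c' as ((1, 0, 1), 'c'); not a general parser."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)([a-z]?)", version)
    if match is None:
        raise ValueError("unsupported version string: %r" % version)
    numbers, letter = match.groups()
    return tuple(int(n) for n in numbers.split(".")), letter


def status_of(version, affected_from, affected_to, fix_known):
    """Classify a version as 'unknown', 'affected' or 'unaffected'."""
    current = key(version)
    if current < key(affected_from):
        return "unknown"          # earlier versions were not investigated
    if current <= key(affected_to):
        return "affected"
    # Later versions are only 'unaffected' if a fix is actually known.
    return "unaffected" if fix_known else "unknown"


print(status_of("1.0.0", "1.0.1a", "1.0.1c", fix_known=True))
print(status_of("1.0.1b", "1.0.1a", "1.0.1c", fix_known=True))
print(status_of("1.0.1d", "1.0.1a", "1.0.1c", fix_known=False))
```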

multiple notes

we should probably support multiple notes entries, in fact we should ideally support multiple entries of pretty much everything with the exception of "canonical" data such as CVE, description and so on.

Better tags for reference URLs

Under "Minimal example needed for CVE [single entry]", the URLs have a "MISC" tag before http, but all other examples lack such a tag.

CVE entries on cve.mitre.org currently have tags such as CONFIRM or MISC along with URLs.
Looks like we will be missing them in JSON unless we have some way to encode them.

Specifying an optional 'type' field with URL would be useful with possible values like 'advisory', 'test', 'code-change', 'exploit', etc., for automation and readability.
Though we can overload 'description' field to describe what the URL is, having a common set of terms for describing different types of URLs would be good. Consider 'advisory' instead of 'bulletin', or 'code-change' vs 'commit'.

SAWG Charter: Clarify open source, standards

Re:

Use open source solutions where possible
Promote standards and best practices for automated information exchange

Open source does not necessarily mean 'free'. Open source solutions may have restrictions on use, and may prevent some or all of the CVE participants from using them.

Similarly not all standards are free and fair.
Standards may require fees to access them, such as those from ISO.
Some standards may require fees for creation and use, such as SWID tags.

Use of encumbered solutions or standards would be detrimental to other stated charter goals.

Suggestion:


Use free and open source solutions where possible. Avoid solutions that require proprietary, closed systems, or that are not compatible with the CVE terms of use.

Promote free and open standards and best practices for automated information exchange. Avoid standards that are not free and not open.


We should design solutions keeping in mind that github.com and git itself may go away at some point in the future. We should plan ahead and be ready to migrate the CVE corpus and automation infrastructure to somewhere else.

cmdlinejsonvalidator.py improvement

Currently, cmdlinejsonvalidator.py only validates a file against the JSON schema. It doesn't check the data itself; it would be helpful to validate the data/format in the JSON file as well.

George provided a schema for the 4.0 minimal spec, which could provide some basic checks. The format of the output could be improved.

Example of the validator output:

$ cmdlinejsonvalidator.py CVE-2016-test.txt CVE_JSON_4.0_min_v3.schema
Record passed validation
$ cmdlinejsonvalidator.py CVE-2016-test.txt CVE_JSON_4.0_min_v3.schema
Record did not pass:
u'CVE-206-7541' does not match u'^CVE-[0-9]{4}-[0-9]{4,}$'
$ cmdlinejsonvalidator.py CVE-2016-test.txt CVE_JSON_4.0_min_v3.schema
Traceback (most recent call last):
  File "C:\Users\jim\Documents\psirt\cmdlinejsonvalidator.py", line 82, in <module>
    jsonvalidation(sys.argv[1],sys.argv[2])
  File "C:\Users\jim\Documents\psirt\cmdlinejsonvalidator.py", line 67, in jsonvalidation
    json_load = json.loads(json_string)
  File "C:\Python27\lib\json\__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "C:\Python27\lib\json\decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Python27\lib\json\decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 28 column 7 (char 598)
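
One way the output could be improved is to catch the parse error and report it in the same style as the schema failures, with a data check added on top. This is a sketch, not the existing tool: the function name check_text is made up, and the location of the ID under "CVE_data_meta" is an assumption. The ID pattern is the one shown in the output above.

```python
import json
import re

CVE_ID_RE = re.compile(r"^CVE-[0-9]{4}-[0-9]{4,}$")  # pattern from the output above


def check_text(text):
    """Return a list of human-readable problems instead of raising."""
    try:
        record = json.loads(text)
    except ValueError as error:          # json.JSONDecodeError is a ValueError
        return ["Record is not valid JSON: %s" % error]
    if not isinstance(record, dict):
        return ["Record must be a JSON object"]
    problems = []
    # ASSUMPTION: the ID is stored at record["CVE_data_meta"]["ID"].
    meta = record.get("CVE_data_meta")
    cve_id = meta.get("ID", "") if isinstance(meta, dict) else ""
    if not CVE_ID_RE.match(str(cve_id)):
        problems.append("%r does not look like a CVE ID" % cve_id)
    return problems


print(check_text('{"CVE_data_meta": {"ID": "CVE-2016-7541"}}'))
print(check_text('{"CVE_data_meta": {"ID": "CVE-206-7541"}}'))
print(check_text('{"a": 1 "b": 2}'))
```

Returning a list keeps a malformed file from ending the run with a traceback and lets the caller print every problem in one pass.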

CVE - automation

Hello,
I am trying to get and filter data from CVE (http://cve.mitre.org/data/downloads/index.html), but I have no experience with this topic.

I see two possibilities. One way is to write a macro for a Google Docs spreadsheet (is there an example?), and the other seems to be to get the data and convert it from JSON to CSV. How can I do this with Python?
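
For the JSON-to-CSV route, the standard library is enough. The sketch below is only a starting point: the flat "id" and "description" keys and the sample record are made up for illustration, so the field lookups would have to be adapted to the structure of the file actually downloaded.

```python
import csv
import io
import json


def records_to_csv(records, out):
    """Write one CSV row per record to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["id", "description"])
    for record in records:
        # ASSUMPTION: hypothetical flat records with "id" and "description" keys.
        writer.writerow([record.get("id", ""), record.get("description", "")])


# Made-up sample data; a real script would use json.load() on the downloaded file.
sample = json.loads('[{"id": "CVE-0000-0001", "description": "Example, with a comma"}]')

buffer = io.StringIO()
records_to_csv(sample, buffer)
print(buffer.getvalue())
```

The csv module quotes fields that contain commas, which matters here because descriptions are free text.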

Decide on upper case or lower case

MITRE uses lower case JSON tags; I use upper case (e.g. data_version vs DATA_VERSION). I personally find ALL UPPERCASE more readable and easier to differentiate, but obviously this is just an opinion. I do think we should decide on one or the other, though.

Moving Bruce Lowenthal comments here for group tracking.

• General:
o There should be a specification of the language being used for the entire document and the text of that language field should be specified from a standard. Yes, the language is specified for a few properties but, for example, product names have different text in different languages. Also, language specification is needed for many properties. Finally, while ISO 639-2 is specified in the example, this does not allow specifications of US vs UK English or Brazil vs Portugal Portuguese.
o The generator of the report should be specified. It should be either a human or organization (or both).
o There are lots of name references. What is the format? There should be a general format when humans or organizations are being referenced that should have aliases (e.g. for different languages - many people have a Chinese name and an English name) and should be a standard eMail address (e.g. John Smith [email protected]).
o More work is needed to specify which fields and combinations are optional and which are mandatory and defaults where not specified (e.g. English)
o It would be nice if arrays of CVE specifications were allowed in one document.
o CVSS V2 should not be part of this for any new report.
o Sometimes the "product" is a standard protocol. (e.g. MD5 is weak or SSL V3 is broken). It is not clear how a protocol would be specified as a "product"
• "data_version" should be specified. What is the allowed text? Is it specified by Mitre? Is this the version of the schema? If not, the version of the schema should be included in the schema. (calling it schema version would help, if that is your intent)
• "updated" general comment on date.
• "updated by" should be included
• There should be a section discussing the different modifications of different versions of a CVE report
• "serial": Why is this an integer?
• "date_requested": See general discussion above regarding dates
• "date_public": Is there a date when discussion between affected organizations begins but before there is public announcement? Also, it is important to know the timezone for "date_public" (See general date discussion). Finally, it should be specified if this can be in the future, or not or (better) there should be a specific "embargo_date".
• "requester": What is a requester id? A general name reference would be better (see general comment on name/organization reference)
• "state": What is this? From the example it is the "state of the CVE", whatever that means. Enumerated text is required.
• "vendor": There should be an Internet accessible list of allowed vendors and their aliases. Otherwise an organization processing a CVE report will not know if the report applies to their deployed products or not. Again, standard form eMail address is recommended.
• "product" There should be an Internet accessible list of allowed products specific to vendors and their aliases. This should include version information and how it is distributed between "product_name" and "version". Note that some products "change names" with different versions which is one reason alias information is necessary. Same is true of "swid", a list of allowed values needs to be Internet accessible.
• "version" is unclear in places where "version" is allowed. Are "wild-cards" allowed or is "before version x" allowed? Also, for the most part, these should be lists. Also, "versions" should have restrictions (e.g. All versions supported by the vendor on a specific date)
• "cpe": Same problems as "vendor" and "product". cpe defines a format but text of fields is not specified (e.g. IBM versus International Business Machines).
• "problem_types"
o The language should be specified (general comment above)
o OWASP should have a version
o CWE probably should have a version
• "attack"
o All the CVSS V3 factors, with their definitions, should be here. This section should be used to enhance those definitions, not replicate (or reduce) them. For example, authentication might include the role needed not just "authenticated or not". (e.g. Authenticated Page Admin role is required).
o "privileges_required" should be part of this. Of course "privileges_required" might be part of "conditions" but it is unclear what "conditions" means (which is a general comment on the schema).
• "files": it is not clear what this is, especially when considering "import_time" and "local_name".
• "exploitation": It is not clear what this is and why it is not an array.
• "time_line":
o It is not clear what this is.
o The format of "time_stamp" is not provided (see general date comment, above).
o Note that there are multiple "publish dates". There is publishing for the coordinating organizations as well as publishing for the "public"
• "source"
o "discovered_by" format needs to be specified (hopefully a standard form of an email address). This should be an array, not a comma separated list (with comma separated list issues).
o "discovered_with" is unclear. Same comments as "discovered_by".
o Something is needed for brokers. (e.g. Mary Smith via Zero Day Initiative)
o The person discovering the issue should be separate from that person's organization, which in turn should be separate from the "credit name" used when the person is mentioned in published text. (I assume that "credits" is where this belongs, but we have the same issues - Mary Smith via Zero Day Initiative.) It would also be nice to positively link the "credits" names with the "source" names.
• "credit": Similar comments as source. Credit needs both an id (hopefully a standard eMail address) and the text to be used when publishing credit information as noted above. Also, the same comments about brokers and discoverer's organization as "source"
• "conditions": It is not clear what this means? Does this include "timing" information?
• Missing items:
o Contact information is needed if people need more information or for coordination. Contact information might be "the discoverer", "a broker", the "discoverer's organization" or something else such as an email list for discussing fixes for protocol vulnerabilities.

replaced_by needs merged_to and split_from siblings

perhaps they all belong together:

"see_also": {
"replaced_by" : "CVE-ID", // only if status=REPLACED_BY
"merged_to": "CVE-ID", // only if status=MERGED_TO
"split_from": "CVE-ID", // only if status=SPLIT_FROM
"see_also": ["CVE-ID", "CVE-ID"] // random informative references
}

Opposites may be useful as optional entries: replaced_by/replaces; merged_to/merged_from; split_from/split_into.

Some of these need multiple values - up to taste whether they should:

  • require a list where multiple values make sense
  • accept a string for a single value, or a list for multiple values
    eg: "see_also":"CVE-1234" or "see_also":["CVE-1234", "CVE-4321"]
  • require a string, but accept multiple CVEs as a comma-separated list:
    eg: "see_also":"CVE-1234,CVE-4321"

Create schema for RESERVED state and REJECTED state CVE

I suspect it will be easier to create separate schemas for RESERVED and REJECTED state CVEs rather than try to create an all-encompassing schema with logic for all states. @theall38103 what say you? We'll need to update a bunch of docs and figure out how to break this up. We also need to do this before we can have the CI checking the schema/etc =).

UUencode vs base64?

This choice looks like it could be considered: UU includes permissions (useless here?) and filename (perhaps redundant as it looks like that is included in the JSON container object?). Base64 is marginally simpler and more common today.

Also, the UU charset includes " and \ which need to be escaped in JSON strings
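
The escaping point can be checked with the standard library. In this sketch the payload bytes are arbitrary and chosen only because their UU encoding happens to contain a double quote; base64.b64encode and binascii.b2a_uu are the standard-library encoders.

```python
import base64
import binascii
import json

payload = b"\x08\x00\x00"  # arbitrary bytes chosen for illustration

b64_text = base64.b64encode(payload).decode("ascii")
uu_text = binascii.b2a_uu(payload).decode("ascii")

# json.dumps shows what each string looks like once embedded in JSON.
print(json.dumps({"base64": b64_text}))
print(json.dumps({"uu": uu_text}))
```

The base64 alphabet (letters, digits, "+", "/", "=") contains nothing that JSON needs to escape, while the UU line here needs its quote and trailing newline escaped.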

SAWG Charter: Goal: Automation should help improve CVE coverage.

Suggestion:

Goal: Automation should help improve CVE coverage.

Rationale: Automation should make it easier to correctly assign CVE IDs to any and all vulnerabilities that deserve CVE IDs. I did not see anything in the charter that tries to address this, which is one of the big challenges facing the CVE program.

Size Limits for JSON Data Elements

In practice, having size limits on JSON data elements would be useful for those working with the JSON files and help avoid problems. For example, based on entries in the CVE List currently, I find the following maximums :

Longest description : 3300 bytes (CVE-2015-2165)

Largest number of URLs in a CVE entry: 306 (CVE-2009-3555)

Longest single URL: 335 bytes (https://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/kb/docDisplay/?spf_p.tpst=kbDocDisplay&spf_p.prp_kbDocDisplay=wsrp-navigationalState%3DdocId%253Demr_na-c04260637-4%257CdocLocale%253Den_US%257CcalledBy%253DSearch_Result&javax.portlet.begCacheTok=com.vignette.cachetoken&javax.portlet.endCacheTok=com.vignette.cachetoken, in CVE-2014-0160)

Perl script is splitting internal url field on commas vs whitespace etc

Admittedly, my Perl is terrible, but I believe cna-assignment-info-to-json.pl splits the url field on commas (that is, commas within a CSV file's field), which is inherently adventurous: https://github.com/CVEProject/automation-working-group/blob/master/tools/cna-assignment-info-to-json.pl#L104. More importantly, commas are valid URL characters (https://tools.ietf.org/html/rfc3986#section-2), so the Perl script could produce JSON files where one URL is incorrectly split into two or more URLs. I suggest whitespace as the internal field separator, since it's one of the few illegal characters.

Suggested fix:

push(@urls, split(/\s+/, $fields[4]));

I can confirm that this change fixes the problem, but I realize many other people may not be expecting this change so it may require some coordination. I can submit a pull request if desired.
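
The effect is easy to reproduce outside Perl. In this Python sketch the field value and URLs are made up; it only illustrates why splitting on commas breaks a URL that legitimately contains one, while splitting on whitespace does not.

```python
import re

# Made-up field holding two URLs, the first of which contains a comma.
field = "https://example.com/a?ids=1,2 https://example.com/b"

by_comma = field.split(",")
by_space = re.split(r"\s+", field.strip())

print(by_comma)   # the first URL is cut in half
print(by_space)   # both URLs survive intact
```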

"Ensure backwards compatibility"

Do we want to commit to this with a blanket statement, or perhaps refine it a bit to include a time frame, such as "current and last version"?

Schema Updates to Support Values of STATE Other than PUBLIC

Currently, the schema requires that an entry has information about the affected vendor / product / version, the problem type, and at least one reference. That information isn't available for all CVE ids. For example, when a block reservation happens, MITRE creates ids in the RESERVED state with a templated description. These ids do not, though, have references or information about the affected product(s) and problem type and, as a result, JSON files created for them will fail schema validation currently.

Make statement of change control/versioning

We should post an official statement about versioning, e.g. that within a major version (3.x) we will maintain compatibility (e.g. only adding fields), but if we modify fields or drop them we'll need to bump the revision. Essentially if the JSON schema needs a change then we should increment the major version number (e.g. 3 to 4).

Discussion for confidence container

It would be nice to be able to add a confidence data container to either the global CVE or specific sub sections (e.g. to indicate the confidence within the CVSS scoring, affected, workaround, etc.). Please discuss here.

Some simplifications

It looks like there are a few containers (ending with *_data) that add unnecessary complexity.
Version 3.1 did not have this complexity; it may have crept in due to how the schema was copied forward to 4.0. The containers may be required as references within the schema definition, but are not really needed in the JSON itself. Any of the nesting aimed at encoding complex product/version situations or content from multiple editors/CNAs can still be done without them.

The complexity is obvious when you see the mindmap https://atlas.mindmup.com/2017/04/4995ec901bcc11e7a8271fd548147e6d/cve/index.html

This is a GUI Form auto generated from the current schema:
http://bit.ly/2nMpgcS

In comparison, see this GUI generated from a simplified schema (my suggested schema is further down that page):
http://bit.ly/2oaQvBr

This is an example that uses current schema:

affects: {
    vendor: {
        vendor_data: [
            {
                vendor_name: IETF,
                product: {
                    product_data: [
                        {
                            product_name: IP,
                            version: {
                                version_data: [
                                    {version_value: v4, impact: none},
                                    {version_value: v6, impact: high}
                                ]
                            }
                        },
                        {
                            product_name: DHCP,
                            impact: none,
                            version: {
                                version_data: [
                                    {version_value: v4, impact: none},
                                    {version_value: v6, impact: none}
                                ]
                            }
                        }
                    ]
                }
            }
        ]
    }
}

This is a simplified version with no loss of information or accuracy:


affects: [
    {
        vendor: IETF,
        products: [
            {
                product: IP,
                versions: [
                    { version: v4, impact: none },
                    { version: v6, impact: high }
                ]
            },
            {
                product: DHCP,
                impact: none,
                versions: [
                    { version: v4, impact: none },
                    { version: v6, impact: none }
                ]
            }
        ]
    }
]
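
To back up the "no loss of information" claim, the mapping between the two shapes can be written mechanically. This is a rough Python sketch, assuming the key names used in the two examples above; simplify_affects is a hypothetical helper, not part of any existing tool.

```python
def simplify_affects(affects: dict) -> list[dict]:
    """Flatten the *_data containers into plain lists."""
    vendors = []
    for vendor in affects["vendor"]["vendor_data"]:
        products = []
        for product in vendor["product"]["product_data"]:
            versions = []
            for ver in product["version"]["version_data"]:
                # Carry over every other key (e.g. impact) unchanged.
                extra = {k: v for k, v in ver.items() if k != "version_value"}
                versions.append({"version": ver["version_value"], **extra})
            extra = {k: v for k, v in product.items()
                     if k not in ("product_name", "version")}
            products.append(
                {"product": product["product_name"], **extra, "versions": versions}
            )
        vendors.append({"vendor": vendor["vendor_name"], "products": products})
    return vendors


# The example from this issue, written out as a Python dict.
CURRENT = {"vendor": {"vendor_data": [{
    "vendor_name": "IETF",
    "product": {"product_data": [
        {"product_name": "IP",
         "version": {"version_data": [
             {"version_value": "v4", "impact": "none"},
             {"version_value": "v6", "impact": "high"}]}},
        {"product_name": "DHCP", "impact": "none",
         "version": {"version_data": [
             {"version_value": "v4", "impact": "none"},
             {"version_value": "v6", "impact": "none"}]}},
    ]},
}]}}

if __name__ == "__main__":
    import json
    print(json.dumps(simplify_affects(CURRENT), indent=4))
```

Every value in the nested form ends up in the flat form, which is the sense in which the intermediate containers carry no information of their own.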

unicode is not an encoding - specify utf-8?

CVE JSON files are Unicode, encoded as UTF-8. At least the following fields may contain non-ASCII text: titles, descriptions, researcher names, and version numbers (people use alphabetical versioning, so we should expect the same in other character sets/languages). Data should no longer be assumed to be plain ASCII.
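
As a small illustration of what "Unicode, encoded as UTF-8" means for tooling, here is a Python sketch that serializes a record to UTF-8 bytes and reads it back. The record contents and key names are invented for the example.

```python
import json

# Hypothetical record with non-ASCII text in a name and a version string.
RECORD = {"researcher": "山田太郎", "version": "α-1"}

# ensure_ascii=False keeps the characters as-is instead of \uXXXX escapes;
# the explicit encode() is what makes the byte stream UTF-8.
ENCODED = json.dumps(RECORD, ensure_ascii=False).encode("utf-8")

# A consumer must decode with the same encoding before parsing.
DECODED = json.loads(ENCODED.decode("utf-8"))


def decodes_as_ascii(data: bytes) -> bool:
    """Report whether the bytes would survive an ASCII-only reader."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(DECODED == RECORD)
    print(decodes_as_ascii(ENCODED))
```

A reader that assumes ASCII fails on this data, which is why naming the encoding explicitly matters.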

specify date format as iso8601

It's obvious to most, but best to be explicit!

Not sure if a comment about TZ belongs here ... if I create a CVE in the morning Australian time but it's still yesterday in the US? yesterday in Greenwich? That's a mild/benign source of confusion in many places today - perhaps specifying that it's UTC-based "unless otherwise specified" is a good default.
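
The time zone concern is easy to show concretely. A minimal Python sketch, assuming a fixed UTC+10 offset for the Australian example and an arbitrary date chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset for eastern Australia (ignores daylight saving).
AEST = timezone(timedelta(hours=10))

# A CVE created at 08:30 local time on 5 April.
local_time = datetime(2017, 4, 5, 8, 30, tzinfo=AEST)

# Normalizing to UTC before serializing removes the ambiguity.
utc_time = local_time.astimezone(timezone.utc)
iso_utc = utc_time.isoformat()

# ISO 8601 strings with an offset can be parsed back without guessing.
parsed = datetime.fromisoformat(iso_utc)

if __name__ == "__main__":
    print(local_time.isoformat())
    print(iso_utc)
```

The local date is 5 April while the UTC date is still 4 April, so a bare date without a stated time zone would be read differently depending on who reads it.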

add data_format field

Much like data_version, I think we should have a data_format field, e.g.:

"data_format": "MITRE"

or

"data_format": "DWF"

I will add this to mine so you don't have to guess which one I used =).
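
On the consuming side, the field would let a tool decide how to read a file instead of guessing. A rough Python sketch, assuming only the two values shown above; detect_format and KNOWN_FORMATS are hypothetical names.

```python
# The two values mentioned in this issue; any others are unknown here.
KNOWN_FORMATS = {"MITRE", "DWF"}


def detect_format(entry: dict) -> str:
    """Return the declared data_format, or fail loudly if it is unusable."""
    fmt = entry.get("data_format")
    if fmt is None:
        raise ValueError("no data_format field; the consumer would have to guess")
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unrecognized data_format: {fmt!r}")
    return fmt


if __name__ == "__main__":
    print(detect_format({"data_format": "DWF"}))
```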
