ldbcollector
This is a rewrite of the old ldbcollector, which is found in ./old-ldbcollector or in the branch v1.
This rewrite is not yet stable; for stable use, the old version is preferred.
This is a rewrite, it contains
A small application (which needs a better name) that collects OSS license metadata and combines it.
License: Other
It would be nice to have the contained data as eclipse:approved / eclipse:restricted categories in the generated license-classifications.yml for ORT.
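A minimal sketch of what the requested entries could look like, assuming the Eclipse approval status is already available as a plain mapping from SPDX id to "approved" or "restricted". The function names and the input mapping are hypothetical; the output only mirrors the id/categories shape of ORT's categorizations, and the new category names would presumably also need to be declared in the file's categories list.

```python
from typing import Dict, List

# Hypothetical input: Eclipse approval status per SPDX id.
# The status values "approved" / "restricted" are assumptions for this sketch.
ALLOWED_STATUSES = {"approved", "restricted"}


def eclipse_categorizations(status_by_id: Dict[str, str]) -> List[dict]:
    """Build ORT-style categorization entries carrying an "eclipse:<status>" category."""
    entries = []
    for license_id in sorted(status_by_id):
        status = status_by_id[license_id]
        if status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown Eclipse status {status!r} for {license_id}")
        entries.append({"id": license_id, "categories": [f"eclipse:{status}"]})
    return entries


def eclipse_categories() -> List[dict]:
    """Category declarations that the entries above refer to."""
    return [{"name": f"eclipse:{s}"} for s in sorted(ALLOWED_STATUSES)]


if __name__ == "__main__":
    # Placeholder ids, not real Eclipse data.
    print(eclipse_categorizations({"License-B": "restricted", "License-A": "approved"}))
```

Rejecting unknown statuses keeps a typo in the source data from silently producing a category that ORT has never heard of.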
The license-classifications.yml file, as advertised in ORT's repo, contains a few duplicate entries that make ORT fail in the evaluator phase.
The full exception is reproduced below:
12:51:48.726 [main] INFO org.ossreviewtoolkit.cli.commands.EvaluatorCommand - Read ORT result from 'advisor-result.json' (3.35 MiB) in 1.405892857s.
Exception in thread "main" com.fasterxml.jackson.databind.exc.ValueInstantiationException: Cannot construct instance of `org.ossreviewtoolkit.model.licenses.LicenseClassifications`, problem: Found multiple license categorizations with the same id: [LGPL-3.0-only, AGPL-3.0-only, ICU, GPL-3.0-only, PSF-2.0]
at [Source: (File); line: 3264, column: 1]
at com.fasterxml.jackson.databind.exc.ValueInstantiationException.from(ValueInstantiationException.java:47)
at com.fasterxml.jackson.databind.DeserializationContext.instantiationException(DeserializationContext.java:2047)
at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapAsJsonMappingException(StdValueInstantiator.java:587)
at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.rewrapCtorProblem(StdValueInstantiator.java:610)
at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:293)
at com.fasterxml.jackson.module.kotlin.KotlinValueInstantiator.createFromObjectWith(KotlinValueInstantiator.kt:125)
at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:202)
at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:444)
at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1405)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:352)
at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:185)
at com.fasterxml.jackson.databind.deser.DefaultDeserializationContext.readRootValue(DefaultDeserializationContext.java:323)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4674)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3514)
at org.ossreviewtoolkit.cli.commands.EvaluatorCommand.run(EvaluatorCommand.kt:804)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:198)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:211)
at com.github.ajalt.clikt.parsers.Parser.parse(Parser.kt:18)
at com.github.ajalt.clikt.core.CliktCommand.parse(CliktCommand.kt:400)
at com.github.ajalt.clikt.core.CliktCommand.parse$default(CliktCommand.kt:397)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:415)
at com.github.ajalt.clikt.core.CliktCommand.main(CliktCommand.kt:440)
at org.ossreviewtoolkit.cli.OrtMainKt.main(OrtMain.kt:83)
Caused by: java.lang.IllegalArgumentException: Found multiple license categorizations with the same id: [LGPL-3.0-only, AGPL-3.0-only, ICU, GPL-3.0-only, PSF-2.0]
at org.ossreviewtoolkit.model.licenses.LicenseClassifications.<init>(LicenseClassifications.kt:86)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:128)
at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:291)
... 18 more
I've fixed (removed) these duplicates and am willing to submit a PR; however, I understand that this file is generated.
Does it make any sense to include a fix to the generated file? At least people coming from ORT's documentation would not struggle with the same issue.
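Since the file is generated, one option is to catch the duplicates at generation time instead of patching the output. A rough sketch, assuming the categorizations have already been parsed into a list of dicts with "id" and "categories" keys (for example with yaml.safe_load from the third-party PyYAML package). Merging by taking the union of categories is an alternative to simply removing entries, not necessarily what either project does; the helper names are hypothetical.

```python
from collections import Counter
from typing import List


def duplicate_ids(categorizations: List[dict]) -> List[str]:
    """Return the ids that occur more than once, sorted (the kind of list ORT reports)."""
    counts = Counter(entry["id"] for entry in categorizations)
    return sorted(license_id for license_id, n in counts.items() if n > 1)


def merge_duplicates(categorizations: List[dict]) -> List[dict]:
    """Collapse entries that share an id into one entry.

    The merged entry sits where the id first appeared, and its categories are the
    union of all duplicates' categories in first-seen order. Only "id" and
    "categories" are kept.
    """
    merged = {}
    for entry in categorizations:
        target = merged.setdefault(entry["id"], {"id": entry["id"], "categories": []})
        for category in entry.get("categories", []):
            if category not in target["categories"]:
                target["categories"].append(category)
    return list(merged.values())


if __name__ == "__main__":
    data = [
        {"id": "ICU", "categories": ["permissive"]},
        {"id": "MIT", "categories": ["permissive"]},
        {"id": "ICU", "categories": ["notice", "permissive"]},  # hypothetical duplicate
    ]
    print(duplicate_ids(data))
    print(merge_duplicates(data))
```

Running duplicate_ids on the generator's output before writing the file would turn ORT's runtime failure into a build-time check.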
Currently, entries look like
LDBcollector/ort/license-classifications.yml
Lines 51 to 57 in 4f5f327
As LDBcollector aggregates classifications from several origins, it's unclear where, e.g., "maybe-rating:Stop" comes from. Could we extend the "colon syntax" to add another <origin>: prefix to the categorization?
For "common" categorizations like "permissive", that approach would have the additional advantage of showing which origins all agree on the "permissive" categorization.
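The proposed <origin>: prefix could work roughly like this sketch. It assumes origin names never contain a colon, so splitting on the first colon recovers the origin while categories that already use the colon syntax (such as "maybe-rating:Stop") stay intact. The origin names and function names are hypothetical.

```python
from typing import Dict, List, Set


def prefix_with_origin(categories_by_origin: Dict[str, List[str]]) -> List[str]:
    """Turn {"OriginA": ["permissive"]} into ["OriginA:permissive"], sorted."""
    return sorted(
        f"{origin}:{category}"
        for origin, categories in categories_by_origin.items()
        for category in categories
    )


def origins_agreeing(prefixed: List[str], category: str) -> Set[str]:
    """Return the origins whose prefixed entries carry exactly this category.

    Splits on the first colon only (assumes origins contain no colon), so the
    remainder may itself contain colons.
    """
    result = set()
    for item in prefixed:
        origin, _, rest = item.partition(":")
        if rest == category:
            result.add(origin)
    return result


if __name__ == "__main__":
    p = prefix_with_origin({
        "OriginB": ["permissive"],
        "OriginA": ["permissive", "maybe-rating:Stop"],
    })
    print(p)
    print(origins_agreeing(p, "permissive"))
```

Keeping the unprefixed category as the suffix means existing tooling that looks for "permissive" could still match on the part after the origin.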
Is there a way I can get hold of a released/fixed version of the JSON file containing data from the various sources?
I am currently looking into adding license "translations" (e.g. "BSD 3 Clause" to "BSD-3-Clause") to flict via a separate file. Basing this on "__impliedNames" seems to be a good idea.
Trying to generate the files myself, I get the following:
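A translation table built from "__impliedNames" could look like the sketch below. It assumes the data is available as a mapping from license id to its implied names, normalizes case and whitespace, and keeps names claimed by more than one id apart instead of translating them blindly. The normalization rule, function names and sample data are assumptions, not taken from ldbcollector or flict.

```python
from typing import Dict, List, Optional, Set, Tuple


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so "BSD  3 clause" matches "BSD 3 Clause"."""
    return " ".join(name.split()).casefold()


def build_translations(
    implied_names: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Map normalized names to license ids.

    Returns (unambiguous, conflicts): names claimed by exactly one id, and names
    claimed by several ids together with the sorted list of those ids.
    """
    claims: Dict[str, Set[str]] = {}
    for license_id, names in implied_names.items():
        for name in [license_id, *names]:
            claims.setdefault(normalize(name), set()).add(license_id)
    unambiguous = {n: next(iter(ids)) for n, ids in claims.items() if len(ids) == 1}
    conflicts = {n: sorted(ids) for n, ids in claims.items() if len(ids) > 1}
    return unambiguous, conflicts


def translate(name: str, unambiguous: Dict[str, str]) -> Optional[str]:
    """Look up a free-form license name; None if unknown or ambiguous."""
    return unambiguous.get(normalize(name))


if __name__ == "__main__":
    # Hypothetical sample data.
    table, clashes = build_translations({
        "BSD-3-Clause": ["BSD 3 Clause"],
        "License-A": ["Shared Name"],
        "License-B": ["shared  name"],
    })
    print(translate("bsd 3 CLAUSE", table))
    print(clashes)
```

Reporting conflicts separately matters for short names like "BSL", which could plausibly refer to more than one license.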
$ bash ./run.sh
Downloading lts-17.9 build plan ...
RedownloadInvalidResponse Request {
host = "raw.githubusercontent.com"
port = 443
secure = True
requestHeaders = []
path = "/fpco/lts-haskell/master//lts-17.9.yaml"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
"/home/hesa/.stack/build-plan/lts-17.9.yaml" (Response {responseStatus = Status {statusCode = 404, statusMessage = "Not Found"}, responseVersion = HTTP/1.1, responseHeaders = [("Connection","keep-alive"),("Content-Length","14"),("Content-Security-Policy","default-src 'none'; style-src 'unsafe-inline'; sandbox"),("Strict-Transport-Security","max-age=31536000"),("X-Content-Type-Options","nosniff"),("X-Frame-Options","deny"),("X-XSS-Protection","1; mode=block"),("Content-Type","text/plain; charset=utf-8"),("X-GitHub-Request-Id","A53E:C95B:FA0E5:102A12:60913E8B"),("Accept-Ranges","bytes"),("Date","Tue, 04 May 2021 12:31:07 GMT"),("Via","1.1 varnish"),("X-Served-By","cache-cph20636-CPH"),("X-Cache","MISS"),("X-Cache-Hits","0"),("X-Timer","S1620131467.330890,VS0,VE178"),("Vary","Authorization,Accept-Encoding"),("Access-Control-Allow-Origin","*"),("X-Fastly-Request-ID","5e461a55f828e57f3f4af153b449f84384f5ecd1"),("Expires","Tue, 04 May 2021 12:36:07 GMT"),("Source-Age","0")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose})
Do you have any ideas?
Host
Ubuntu 20.04
LDBCollector Version
$ git log | head -7
commit bd40c3e6134b921fee359b7e476d363ba49edbc2
Author: Maximilian Huber <[email protected]>
Date: Fri Apr 23 09:21:36 2021 +0200
fix Flict exporter
Signed-off-by: Maximilian Huber <[email protected]>
$ git rev-parse --short HEAD
bd40c3e61
Just a visual thing, but currently entries look like
LDBcollector/ort/license-classifications.yml
Lines 51 to 57 in 4f5f327
and it always confuses me that the id comes after the categories. Could we change the order to match the one shown in https://github.com/oss-review-toolkit/ort-config/blob/eaff9f3ceff12724069e6c9d6ca3394402c77153/license-classifications.yml#L36-L39?
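If the entries pass through a Python step before being written, the id-first order requested above could be enforced with something like this sketch. The helper names are hypothetical and the emitter is deliberately minimal: it assumes every value is a string or a list of strings and does no escaping. As an aside, PyYAML's yaml.dump sorts keys by default, which on its own would put categories before id; passing sort_keys=False keeps insertion order instead.

```python
from typing import List


def id_first(entry: dict) -> dict:
    """Return a copy of the entry with "id" first; other keys keep their order."""
    ordered = {"id": entry["id"]}
    ordered.update((k, v) for k, v in entry.items() if k != "id")
    return ordered


def emit_categorizations(entries: List[dict]) -> str:
    """Hand-rolled emitter for the categorizations list only.

    Assumes values are strings or lists of strings without embedded quotes.
    """
    lines = ["categorizations:"]
    for entry in entries:
        for i, (key, value) in enumerate(id_first(entry).items()):
            prefix = "- " if i == 0 else "  "  # the first key opens the list item
            if isinstance(value, list):
                lines.append(f"{prefix}{key}:")
                lines.extend(f'  - "{item}"' for item in value)
            else:
                lines.append(f'{prefix}{key}: "{value}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(emit_categorizations([{"categories": ["permissive"], "id": "MIT"}]), end="")
```

The output layout (id on the "- " line, categories indented beneath it) follows the ort-config example linked above.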
"__impliedNames": [
  "BSL-1.0",
  "Boost Software License 1.0",
  "boost-1.0",
  "Boost 1.0",
  "bsl-1.0",
  "Business Source License 1.0",
  "boost-1.0",
  "Boost 1.0",
  "Boost Software License 1.0 (BSL-1.0)",
  "BSL (v1.0)",
  "BSL (v1)"
],
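Repeated entries in an "__impliedNames" list like the one above could be dropped with an order-preserving pass such as this sketch. The function name and the case_sensitive switch are hypothetical. Deduplication is only a cleanup step: it would not catch a name that belongs to a different license altogether.

```python
from typing import List


def dedupe_names(names: List[str], case_sensitive: bool = True) -> List[str]:
    """Drop repeated names, keeping each first occurrence in its original position.

    With case_sensitive=False, "BSL-1.0" and "bsl-1.0" count as the same name and
    the first spelling encountered is kept.
    """
    seen = set()
    result = []
    for name in names:
        key = name if case_sensitive else name.casefold()
        if key not in seen:
            seen.add(key)
            result.append(name)
    return result


if __name__ == "__main__":
    names = ["BSL-1.0", "BSL-1.0", "bsl-1.0", "Boost 1.0", "boost-1.0", "Boost 1.0"]
    print(dedupe_names(names))
    print(dedupe_names(names, case_sensitive=False))
```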