package-url / packageurl-java
Java/JVM implementation of the package url spec
License: MIT License
This issue is created following package-url/purl-spec#293
The slash character in qualifier values is percent-encoded by the current implementation. For example, the following code
//DEPS com.github.package-url:packageurl-java:1.5.0
import java.util.TreeMap;
import com.github.packageurl.PackageURL;

public class purl {
    public static void main(String[] args) throws Exception {
        final TreeMap<String, String> qualifiers = new TreeMap<>();
        qualifiers.put("type", "jar");
        qualifiers.put("repository_url", "https://maven.repository.redhat.com/ga/");
        var purl = new PackageURL(PackageURL.StandardTypes.MAVEN,
                "org.apache.james",
                "apache-mime4j-storage",
                "0.8.9.redhat-00001",
                qualifiers, null);
        System.out.println(purl);
    }
}
results in
pkg:maven/org.apache.james/apache-mime4j-storage@0.8.9.redhat-00001?repository_url=https%3A%2F%2Fmaven.repository.redhat.com%2Fga%2F&type=jar
while following the spec it should be
pkg:maven/org.apache.james/apache-mime4j-storage@0.8.9.redhat-00001?repository_url=https://maven.repository.redhat.com/ga/&type=jar
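The expected output above can be illustrated with a small stand-alone encoder. This is only a sketch: the class QualifierEncodingSketch, the method encodeQualifierValue and the set of characters left unencoded (unreserved characters plus ':' and '/') are assumptions made for illustration, not the library's code or the spec's exact rule set.

```java
import java.nio.charset.StandardCharsets;

final class QualifierEncodingSketch {

    // Assumption: unreserved characters plus ':' and '/' are left as-is.
    private static boolean isSafe(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~' || c == ':' || c == '/';
    }

    // Percent-encodes every UTF-8 byte that is not in the safe set.
    static String encodeQualifierValue(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            if (isSafe(c)) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The URL passes through unchanged because ':' and '/' are in the safe set.
        System.out.println(encodeQualifierValue("https://maven.repository.redhat.com/ga/"));
        // Characters outside the safe set are still encoded: prints a%20b%26c
        System.out.println(encodeQualifierValue("a b&c"));
    }
}
```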
Identified following the analysis in jeremylong/DependencyCheck#6688 (comment)
Maven supports the newline character in the version as per its XSD. When instantiating a PackageURL with such a character in its version, calling PackageURL.canonicalize() returns an invalid URL.
PackageURL url = new PackageURL("maven", "com.google.summit", "summit-ast", "2.2.0\n", null, null);
String canonicalize = url.canonicalize();
Expected: canonicalize is equal to pkg:maven/com.google.summit/summit-ast@2.2.0%0A
Actual: canonicalize is equal to pkg:maven/com.google.summit/summit-ast@2.2.0%A
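A percent-escape needs exactly two hex digits, so a newline (byte 0x0A) must appear as %0A. The sketch below shows one plausible way a single-digit escape such as %A can arise, and a padded alternative. Both helper methods are hypothetical and are not taken from the library's source.

```java
final class PercentEncodeNewlineSketch {

    // Unpadded variant: Integer.toHexString drops the leading zero for bytes below 0x10.
    static String encodeUnpadded(byte b) {
        return "%" + Integer.toHexString(b & 0xFF).toUpperCase();
    }

    // Padded variant: always emits two upper-case hex digits.
    static String encodePadded(byte b) {
        return String.format("%%%02X", b & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(encodeUnpadded((byte) '\n')); // prints %A
        System.out.println(encodePadded((byte) '\n'));   // prints %0A
    }
}
```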
While parsing pkg:maven//a@v?&type=e
one would expect an exception to be thrown, stating either that the namespace (groupId) is required for maven or that a namespace is expected. Instead, the purl string is parsed and regenerated with encoded characters: pkg:maven/%2Fa@v?type=e
There are inconsistencies with colon encoding in different languages.
For the following input:
type:docker
name:cassandra
version: sha256:244fd47e07d1004f0aed9c
output:
java implementation: pkg:docker/cassandra@sha256%3A244fd47e07d1004f0aed9c
go implementation: pkg:docker/cassandra@sha256:244fd47e07d1004f0aed9c
python implementation: pkg:docker/cassandra@sha256:244fd47e07d1004f0aed9c
As we can see, the colon : is encoded as %3A in the Java implementation, but not in the other languages.
According to the purl specification:
the '#', '?', '@' and ':' characters must NOT be encoded when used as separators. They may need to be encoded elsewhere
the ':' scheme and type separator does not need to and must NOT be encoded. It is unambiguous unencoded everywhere
I think : must NOT be encoded.
Maven Central requires all published artifacts to be signed using PGP. If a publisher provides their key ID to PGP keys map then end users can use the Verify PGP signatures plugin to validate that the artifact has not been altered or replaced as part of a supply-chain attack.
Hi,
I'm waiting for a patch release 1.1.2 that includes the equals/hashcode addition. Is that already planned?
With kind regards
Thomas von Siebenthal
When I have a package URL like the following (copied from Dependency-Track):
pkg:deb/debian/mailutils@1%3A3.10-3%20b1?arch=amd64&distro=debian-11&upstream=mailutils%401%3A3.10-3
This matches the mailutils package with version 1:3.10-3+b1.
However, when I try to decode the URL with this library, it doesn't translate to this version, but rather to 1:3.10-3 b1. That is, it converts (by URL decoding) the %20 to a space, which is correct, but I would have expected a + sign.
This can be seen from this unit test too:
// when
PackageURL purl = new PackageURL("pkg:deb/debian/mailutils@1%3A3.10-3%20b1?arch=amd64&distro=debian-11&upstream=mailutils%401%3A3.10-3");
// then
assertThat(purl.getName()).isEqualTo("mailutils");
assertThat(purl.getVersion()).isEqualTo("1:3.10-3+b1");
which results in a test failure:
expected: "1:3.10-3+b1"
but was: "1:3.10-3 b1"
If I manually replace the %20 with a + sign, I get the same result.
// when
PackageURL purl = new PackageURL("pkg:deb/debian/mailutils@1%3A3.10-3+b1?arch=amd64&distro=debian-11&upstream=mailutils%401%3A3.10-3");
// then
assertThat(purl.getName()).isEqualTo("mailutils");
assertThat(purl.getVersion()).isEqualTo("1:3.10-3+b1");
which fails with the same message:
expected: "1:3.10-3+b1"
but was: "1:3.10-3 b1"
I'm now wondering: should I manually replace these spaces with a + sign in my application, or is there a bug in this library?
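The two test failures above are consistent with form-style decoding, where a literal + is read as a space. The sketch below contrasts java.net.URLDecoder with a strict percent-decoder that leaves + untouched. The percentDecode helper is hypothetical, and whether the library uses URLDecoder internally is an assumption, not something shown in the issue.

```java
import java.io.ByteArrayOutputStream;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class PlusDecodingSketch {

    // Hypothetical strict decoder: only %XX escapes are decoded, '+' is kept literally.
    static String percentDecode(String s) {
        byte[] in = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length; i++) {
            if (in[i] == '%' && i + 2 < in.length) {
                int hi = Character.digit((char) in[i + 1], 16);
                int lo = Character.digit((char) in[i + 2], 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 2;
                    continue;
                }
            }
            out.write(in[i]); // not a valid escape: copy the byte unchanged
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Form-style decoding turns '+' into a space: prints 1:3.10-3 b1
        System.out.println(URLDecoder.decode("1%3A3.10-3+b1", StandardCharsets.UTF_8));
        // Strict percent-decoding keeps the '+': prints 1:3.10-3+b1
        System.out.println(percentDecode("1%3A3.10-3+b1"));
        // %20 is a space under either decoder; a literal plus is %2B when encoded.
        System.out.println(percentDecode("1%3A3.10-3%20b1"));
        System.out.println(percentDecode("1%3A3.10-3%2Bb1"));
    }
}
```

Under either decoder %20 yields a space, so a purl that should carry 1:3.10-3+b1 would need %2B (or a literal + read by a strict decoder) in the version.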
See https://github.com/package-url/purl-spec and package-url/purl-spec#31 for the latest
To my knowledge, the current PackageURL.java implementation does not provide a simple way to retrieve the package's 'coordinates' (purl without subpath or qualifiers).
The package coordinates are useful for generic component information: pkg:deb/debian/curl@7.50.3-1 = cURL version 7.50.3-1.
Whereas the full purl is useful for specific component information: pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie&repository=... = cURL version 7.50.3-1 installed on Debian Jessie, i386 architecture, installed from this specific repository...
Proposed: a PackageURL#getCoordinates method that returns pkg:type/namespace/name@version (no qualifiers or subpath).
For example, in Dependency-Track (a project you may know about), components have separate purl and purlCoordinates fields.
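As a rough illustration of what "coordinates" means here, the sketch below cuts a canonical purl string at the first ? or #. It assumes the input is already canonical, so any ? or # inside a component is percent-encoded. The class and method names are hypothetical and this is not the library's implementation.

```java
final class CoordinatesSketch {

    // Returns the purl up to, but not including, the qualifiers ('?') or subpath ('#').
    static String coordinatesOf(String canonicalPurl) {
        int end = canonicalPurl.length();
        int q = canonicalPurl.indexOf('?');
        int h = canonicalPurl.indexOf('#');
        if (q >= 0) {
            end = Math.min(end, q);
        }
        if (h >= 0) {
            end = Math.min(end, h);
        }
        return canonicalPurl.substring(0, end);
    }

    public static void main(String[] args) {
        // prints pkg:deb/debian/curl@7.50.3-1
        System.out.println(coordinatesOf("pkg:deb/debian/curl@7.50.3-1?arch=i386&distro=jessie"));
    }
}
```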
Currently, the subpath parsing and construction does not follow the spec: the subpath is not split on '/', and empty, '.', or '..' segments are not removed.
For package URLs with qualifiers and/or subpaths, calling getCoordinates() on a PackageURL instance after canonicalize() was called on the same instance will return the entire PURL instead of just the coordinates.
This happens because both canonicalize() and getCoordinates() use the canonicalize(boolean coordinatesOnly) method, which caches its result after the first invocation:
packageurl-java/src/main/java/com/github/packageurl/PackageURL.java
Lines 389 to 392 in bd3241a
packageurl-java/src/main/java/com/github/packageurl/PackageURL.java
Lines 424 to 425 in bd3241a
In fact, the reverse is true as well: if getCoordinates() is called before canonicalize(), the result of canonicalize() will only contain coordinates instead of the complete PURL.
The behavior is easily reproducible by modifying the testGetCoordinates() test case by adding a purl.canonicalize() invocation:
@Test
public void testGetCoordinates() throws Exception {
PackageURL purl = new PackageURL("pkg:generic/acme/[email protected]?key1=value1&key2=value2");
purl.canonicalize();
Assert.assertEquals("pkg:generic/acme/[email protected]", purl.getCoordinates());
}
Expected :pkg:generic/acme/[email protected]
Actual :pkg:generic/acme/[email protected]?key1=value1&key2=value2
If caching of the canonicalized representation is required, it should be done for the "complete" and "coordinatesOnly" variants separately. The current situation can cause unpredictable behavior at runtime when objects with PackageURL fields are passed around and accessed by multiple domains or layers.
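A minimal sketch of the separate-cache idea, assuming a simplified model in which the purl is just a base string plus an optional qualifier string. The class SeparateCacheSketch and its fields are hypothetical and the example values are made up. The point is only that each variant gets its own cache field.

```java
final class SeparateCacheSketch {
    private final String base;        // coordinates part, e.g. "pkg:generic/acme/widget@1.0.0"
    private final String qualifiers;  // e.g. "key1=value1&key2=value2", or null

    // One cache per variant, so neither call can return the other's result.
    private String cachedFull;
    private String cachedCoordinates;

    SeparateCacheSketch(String base, String qualifiers) {
        this.base = base;
        this.qualifiers = qualifiers;
    }

    String canonicalize() {
        if (cachedFull == null) {
            cachedFull = render(false);
        }
        return cachedFull;
    }

    String getCoordinates() {
        if (cachedCoordinates == null) {
            cachedCoordinates = render(true);
        }
        return cachedCoordinates;
    }

    private String render(boolean coordinatesOnly) {
        if (coordinatesOnly || qualifiers == null) {
            return base;
        }
        return base + "?" + qualifiers;
    }

    public static void main(String[] args) {
        SeparateCacheSketch purl =
                new SeparateCacheSketch("pkg:generic/acme/widget@1.0.0", "key1=value1&key2=value2");
        purl.canonicalize();
        // Still prints only the coordinates, regardless of call order.
        System.out.println(purl.getCoordinates());
    }
}
```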
For the below pURL
pkg:maven/org.springframework.boot/[email protected]?type=jar
when PackageURL.getName() is called, "spring-boot-starter" is returned, but the name here is the artifact name, right?
Shouldn't it return "org.springframework.boot/spring-boot-starter" instead of just "spring-boot-starter"?
I decided to try building my latest project with the java module system so I can better understand how it works. I found myself needing to generate a package URL so I dropped in a reference to this project and a warning popped up.
Since no module name is declared in the jar manifest, it seems the module system defaults to using the jar file name as the module name. Apparently this can cause a major problem if the name changes (either because a proper name was picked, or because the name of the jar file changed) and 2 different modules try to reference it with different names. Maven even goes as far as to beg me not to publish my project until the issue is fixed.
Personally I wonder if it's really that big a deal when I could presumably just release a new version of my project referencing the new module name, but anyhow it's an easy fix. You basically just need to pick a module name and add it to the manifest, which can be done while still compiling and running on JDK 8. For instance here is the jar manifest for the apache commons codec project.
This post has a lot more info on the matter.
The validation routines decode values - which would be correct if only used during parsing. However, the validation routines are also used in the constructor. The result is values could be % decoded when they should not be. While this is not likely to cause an issue in production - this should likely be cleaned up.
I am more than willing to submit a PR for this but wanted to discuss options. Specifically, would the team prefer:
My preference would be to go with option 2.
Due to the missing implementation of equals, one is not able to compare PURLs, e.g.:
String purlString = "pkg:maven/test.groupId/[email protected]";
new PackageURL(purlString)
.equals(new PackageURL(purlString)) == false
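One common way to get value semantics is to base equals and hashCode on the canonical string. The sketch below shows that pattern on a stand-in class. PurlValueSketch is hypothetical and is not how the library necessarily implements it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class PurlValueSketch {
    private final String canonical;

    PurlValueSketch(String canonical) {
        this.canonical = Objects.requireNonNull(canonical);
    }

    // Two instances are equal when their canonical strings are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PurlValueSketch)) {
            return false;
        }
        return canonical.equals(((PurlValueSketch) o).canonical);
    }

    // hashCode must agree with equals so instances work in hash-based collections.
    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    @Override
    public String toString() {
        return canonical;
    }

    public static void main(String[] args) {
        Set<PurlValueSketch> set = new HashSet<>();
        set.add(new PurlValueSketch("pkg:maven/g/a@1"));
        set.add(new PurlValueSketch("pkg:maven/g/a@1"));
        System.out.println(set.size()); // prints 1
    }
}
```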
A plus sign is encoded as a space (%20); it should be encoded as + (%2B).
See #90
It would help if the library can be used to edit PURLs.
I.e., having a builder be created from an existing package URL, and have it being edited, and re-built into a (modified) package URL.
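The request could look roughly like the sketch below: an immutable value with a toBuilder() method that seeds a builder from the existing fields. All names here (EditablePurlSketch, toBuilder, withVersion, withQualifier) are hypothetical, and percent-encoding is deliberately left out to keep the example short.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

final class EditablePurlSketch {
    final String type, namespace, name, version;
    final Map<String, String> qualifiers;

    EditablePurlSketch(String type, String namespace, String name, String version,
                       Map<String, String> qualifiers) {
        this.type = type;
        this.namespace = namespace;
        this.name = name;
        this.version = version;
        // Defensive sorted copy so the instance stays immutable.
        this.qualifiers = Collections.unmodifiableMap(new TreeMap<>(qualifiers));
    }

    // Seeds a builder with this instance's values; the original is left untouched.
    Builder toBuilder() {
        return new Builder(this);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder("pkg:").append(type).append('/')
                .append(namespace).append('/').append(name).append('@').append(version);
        char sep = '?';
        for (Map.Entry<String, String> e : qualifiers.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }

    static final class Builder {
        private final String type, namespace, name;
        private String version;
        private final Map<String, String> qualifiers;

        private Builder(EditablePurlSketch src) {
            type = src.type;
            namespace = src.namespace;
            name = src.name;
            version = src.version;
            qualifiers = new TreeMap<>(src.qualifiers);
        }

        Builder withVersion(String v) {
            version = v;
            return this;
        }

        Builder withQualifier(String key, String value) {
            qualifiers.put(key, value);
            return this;
        }

        EditablePurlSketch build() {
            return new EditablePurlSketch(type, namespace, name, version, qualifiers);
        }
    }
}
```

For instance, calling toBuilder().withVersion("1.0.1").withQualifier("type", "jar").build() on pkg:maven/org.example/demo@1.0.0 would yield pkg:maven/org.example/demo@1.0.1?type=jar while the original stays unchanged.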
Once the PR is merged, let's see if this is ready to be published to Maven Central.
Currently, there are two projects using this:
In the near future, the OWASP Dependency-Check project will also likely adopt it.
I'd like to have the artifact in Central so we don't have to have a local copy of the code in our own repositories.
If you parse a PURL with no qualifiers, for example pkg:maven/org.apache.commons/[email protected] from test-suite-data.json, and then try to check what the qualifiers were, a NullPointerException is thrown.
PackageURLTest.java currently skips verifying the qualifiers if none are expected to be present. It should check that it can retrieve the qualifiers and that there are none.
if (qualifiers != null) {
    Assert.assertNotNull(purl.getQualifiers());
    Assert.assertEquals(qualifiers.length(), purl.getQualifiers().size());
    qualifiers.keySet().forEach((key) -> {
        String value = qualifiers.getString(key);
        Assert.assertTrue(purl.getQualifiers().containsKey(key));
        Assert.assertEquals(value, purl.getQualifiers().get(key));
    });
// New else case:
} else {
    Assert.assertEquals(0, purl.getQualifiers().size());
}
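For the new else case to pass, the getter has to return an empty map rather than null (or throw) when no qualifiers were parsed. A minimal sketch of that behavior on a stand-in class follows. EmptyQualifiersSketch is hypothetical, and where exactly the library's NullPointerException originates is not shown in the issue.

```java
import java.util.Collections;
import java.util.Map;

final class EmptyQualifiersSketch {
    private final Map<String, String> qualifiers; // null when the purl had no qualifiers

    EmptyQualifiersSketch(Map<String, String> qualifiers) {
        this.qualifiers = qualifiers;
    }

    // Never returns null: callers can always call size(), get(), etc.
    Map<String, String> getQualifiers() {
        if (qualifiers == null) {
            return Collections.<String, String>emptyMap();
        }
        return Collections.unmodifiableMap(qualifiers);
    }

    public static void main(String[] args) {
        System.out.println(new EmptyQualifiersSketch(null).getQualifiers().size()); // prints 0
    }
}
```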