(Nothing scheduled)
(Nothing scheduled)
I post about Rust and DevOps - tjtelan.com
My Rust crates:
Parser of git repo urls for Rust
License: MIT License
(Nothing scheduled)
(Nothing scheduled)
I post about Rust and DevOps - tjtelan.com
My Rust crates:
I fuzzed this crate using honggfuzz and found 2 crashes, both of which can be fixed by letting the normalize_url function check for null bytes.
You can fuzz the code yourself and further investigate the crashes with this repo.
In the main function there are also two test functions which can be used to easily reproduce the crashes.
The crashing inputs are:
////////ws///////////*,\u{0}\u{0}^\u{0}\u{0}\u{0}\u{0}@2\u{1}\u{0}\u{1d})\u{0}\u{0}\u{0}:\u{0}\u{0}\u{0}
and:
?\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{1f}s\u{3}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{5}\u{1}@\u{0}\u{0}\u{4}!e\u{0}\u{0}2\u{1c}^3106://?<!41\u{0}\u{0}\u{0}?\u{0}\u{0}\u{0}\u{0}\u{4}?
https://about.gitlab.com/solutions/subgroups/
The main points I've discovered:
Due to #10 this identification should be opt-in at parsing time.
This is a change to make the words used about URL components more technically correct. The scheme infers what protocol is being used.
I wrote the tests for https and ssh Azure DevOps repos, but support for parsing these urls hasn't been implemented yet.
$ cargo test
Compiling libc v0.2.66
Compiling quick-error v1.2.3
Compiling termcolor v1.1.0
Compiling humantime v1.3.0
Compiling atty v0.2.14
Compiling env_logger v0.7.1
Compiling git-url-parse v0.0.1 (/Users/telant/src/git-url-parse-rs)
Finished test [unoptimized + debuginfo] target(s) in 3.52s
Running target/debug/deps/git_url_parse-aa814d64454e74d6
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/git_url_parse-7ddbc2b68d180cfc
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
Running target/debug/deps/mod-5c5239488f5c6da5
running 18 tests
test normalize::git ... ok
test normalize::file_scheme ... ok
test normalize::multi_git_ssh ... ok
test normalize::https ... ok
test normalize::http ... ok
test normalize::ssh_scheme ... ok
test normalize::ssh_no_scheme_no_user ... ok
test parse::https_user_auth_bitbucket ... ok
test parse::https_user_auth_github ... ok
test parse::https_user_bitbucket ... ok
test parse::https_user_github ... ok
test parse::https_user_azure_devops ... FAILED
test parse::ssh_user_ports ... ok
test normalize::file_no_scheme ... ok
test normalize::ssh_no_scheme ... ok
test parse::ssh_user_bitbucket ... ok
test parse::ssh_user_github ... ok
test parse::ssh_user_azure_devops ... FAILED
failures:
---- parse::https_user_azure_devops stdout ----
thread 'parse::https_user_azure_devops' panicked at 'assertion failed: `(left == right)`
left: `GitUrl { href: "https://[email protected]/organization/project/_git/repo", host: Some("dev.azure.com"), name: "repo", owner: Some("_git"), organization: None, fullname: "_git/repo", protocol: Https, user: Some("organization"), token: None, port: None, path: "/organization/project/_git/repo", git_suffix: false }`,
right: `GitUrl { href: "https://[email protected]/organization/project/_git/repo", host: Some("dev.azure.com"), name: "repo", owner: Some("project"), organization: Some("organization"), fullname: "organization/project/repo", protocol: Https, user: Some("organization"), token: None, port: None, path: "/organization/project/_git/repo", git_suffix: false }`', tests/parse.rs:198:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
---- parse::ssh_user_azure_devops stdout ----
thread 'parse::ssh_user_azure_devops' panicked at 'assertion failed: `(left == right)`
left: `GitUrl { href: "[email protected]:v3/CompanyName/ProjectName/RepoName", host: Some("ssh.dev.azure.com"), name: "RepoName", owner: Some("ProjectName"), organization: None, fullname: "ProjectName/RepoName", protocol: Ssh, user: Some("git"), token: None, port: None, path: "v3/CompanyName/ProjectName/RepoName", git_suffix: false }`,
right: `GitUrl { href: "[email protected]:v3/CompanyName/ProjectName/RepoName", host: Some("ssh.dev.azure.com"), name: "RepoName", owner: Some("ProjectName"), organization: Some("CompanyName"), fullname: "CompanyName/ProjectName/RepoName", protocol: Ssh, user: Some("git"), token: None, port: None, path: "v3/CompanyName/ProjectName/RepoName", git_suffix: false }`', tests/parse.rs:176:5
failures:
parse::https_user_azure_devops
parse::ssh_user_azure_devops
test result: FAILED. 16 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out
error: test failed, to rerun pass '--test mod'
I have a use-case where I need to be able to compare git urls where the only difference between them is auth info, such as embedded username and passwords. Trimming this info in this library would make this goal easier.
Unfortunately, using the ~
specifier in Cargo.toml
makes it very difficult to use this package in repositories that share the same dependencies.
Tilde requirements specify a minimal version with some ability to update. If you specify a major, minor, and patch version or only a major and minor version, only patch-level changes are allowed. If you only specify a major version, then minor- and patch-level changes are allowed.
from Cargo reference: tilde requirements
This means that by using url = "~2.1.1"
, I cannot use any dependencies that list their dependency as url = "^2.2"
because I will get the following error:
$ cargo build
error: failed to select a version for `url`.
... required by package `git-url-parse v0.3.0`
... which is depended on by `mypackage (/path/to/mypackage)`
versions that meet the requirements `~2.1` are: 2.1.1, 2.1.0
all possible versions conflict with previously selected packages.
previously selected package `url v2.2.0`
... which is depended on by `reqwest v0.11.0`
... which is depended on by `mypackage (/path/to/mypackage)`
I'm fairly certain there isn't any reason why the more permissive ^
syntax wouldn't be the best choice for all of the dependencies listed here, unless there are specific breaking changes introduced in higher versions (which, there shouldn't be if the library authors follow semver!)
Thanks for taking the time to read through this, I'm happy to make a PR making this change ๐
PR #13 revealed that there's a breaking change when updating the version specifier on the strum dependency.
strum
+ strum_macros
to ^0.20
0.3.1
(minor version change)Right now, if we give a Windows-style path using backslashes \
, the parsing seems to succeed w/o error.
However, the normalization process converts the backslash separated paths into Unix-style paths with forward slashes /
. I'm not sure how much of this is influenced by me working in Linux/MacOS.
It makes sense that if you give a Windows-style path, you should return a Windows-style path in the parse output.
Example:
use git_url_parse::GitUrl;
fn main() {
GitUrl::parse("C:\\test_repo");
}
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/lib.rs:237:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
It just doesn't work, but I am also testing in Linux and MacOS. I'm not sure if the Url crate handles this cross-platform.
This one will need a test that runs in CI. Probably in a Windows runner, if possible.
Our Display
implementation prints the raw un-parsed version of our url. This makes comparing urls less robust, as well as keeps unnecessary duplication in our struct for a singular purpose.
We need to reimplement the Display
trait to build the printable version through its components, so the resulting string is standardized.
A specific detail I'd like regarding SSH urls is to only print the url scheme if it is provided during the parsing phase.
Hey Tj Telan,
Is there a way for You to contact me regarding a security vulnerability I discovered, so I can report it to You privately, or if You can give me your contact.
Thank You
Namespacing pattern assumptions in URLs needs a revisit.
Organizations embedded in URLs are more of an exception, for example, Azure or GitLab (depending on how subgroups are implemented. It is up to the user though).
Unfortunately, there's too much room for mis-identification, and typical users need to think too hard to override the default behavior.
Some kind of builder pattern that lets you give hints to the parser for a Git host may suffice?
I'm not sure what this looks like. I don't personally use any git providers that embed organizations in the URL, so more research or feedback from users is needed.
I think one of the main outcomes I want is to have some git provider specializations that are opt-in, since there are certain providers that use URL patterns that are unique. Perhaps this will help typical users.
Thanks for this crate! I had a stack overflow when attempting to parse a bad URL (https://github.com:crypto-browserify/browserify-rsa.git
); here's a diff with both a test case and a potential fix:
diff --git a/src/lib.rs b/src/lib.rs
index a311e1f..cb0984d 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -364,8 +364,7 @@ pub fn normalize_url(url: &str) -> Result<Url> {
}
}
}
- Err(_e) => {
- // e will most likely be url::ParseError::RelativeUrlWithoutBase
+ Err(url::ParseError::RelativeUrlWithoutBase) => {
// If we're here, we're only looking for Scheme::Ssh or Scheme::File
// Assuming we have found Scheme::Ssh if we can find an "@" before ":"
@@ -387,5 +386,8 @@ pub fn normalize_url(url: &str) -> Result<Url> {
}
}
}
+ Err(err) => {
+ return Err(eyre!("url parsing failed: {:?}", err));
+ }
})
}
diff --git a/tests/parse.rs b/tests/parse.rs
index 7619ca6..e6e6558 100644
--- a/tests/parse.rs
+++ b/tests/parse.rs
@@ -366,3 +366,15 @@ fn ssh_without_organization() {
assert_eq!(parsed, expected);
}
+
+#[test]
+fn bad_port_number() {
+ let test_url = "https://github.com:crypto-browserify/browserify-rsa.git";
+ let e = GitUrl::parse(test_url);
+
+ assert!(e.is_err());
+ assert_eq!(
+ format!("{}", e.err().unwrap()),
+ "Url normalization into url::Url failed"
+ );
+}
With that diff, cargo test
runs without failures, but based on the uncertainty in the (deleted) comment around url::ParseError::RelativeUrlWithoutBase
, there may be a reason why it wasn't done this way originally...
Thanks for this lib. I got tired of trying to support all the variants myself.
With that said, it would be nice if this lib had less dependencies. For context, I try and keep our binaries lean, but this pulls in eyre and strum, which then pulls in more stuff. We don't use eyre but now it's a part of the build.
Would be nice to be able to not have this. I know how much work managing Rust libs is, so at minimum, maybe these can be behind a feature flag?
This added debug context will be useful in my other projects using this crate.
First, thanks for your work on this crate!
I've been using it to follow projects whose repos are posted on HackerNews. Recently, someone posted the following link to HackerNews, which leads to a panic: https://www.heliosleep.com//
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', /.../.cargo/registry/src/github.com-1ecc6299db9ec823/git-url-parse-0.4.0/src/lib.rs:238:39
stack backtrace:
0: rust_begin_unwind
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/std/src/panicking.rs:584:5
1: core::panicking::panic_fmt
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/panicking.rs:142:14
2: core::panicking::panic_bounds_check
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/panicking.rs:85:5
3: <usize as core::slice::index::SliceIndex<[T]>>::index
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/slice/index.rs:189:10
4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/slice/index.rs:15:9
5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/alloc/src/vec/mod.rs:2531:9
6: git_url_parse::GitUrl::parse
at /.../.cargo/registry/src/github.com-1ecc6299db9ec823/git-url-parse-0.4.0/src/lib.rs:238:39
The affected line is this one: https://github.com/tjtelan/git-url-parse-rs/blob/main/src/lib.rs#L238
After re-reading the docs, I wasn't able to determine whether this is expected behavior or a bug. My expectation was that GitUrl::parse()
is safe to call with arbitrary URLs and it would return Ok(...)
if it could detect a GitUrl
and Err(...)
otherwise. But perhaps the intended spec is instead "you are expected to call GitUrl::parse()
only on URLs you reasonably believe are actually pointing to GitHub and similar sites" and my code is misusing that call when it attempts it on arbitrary URLs sourced from HackerNews.
Thanks again for building this crate and for looking into this!
The parser needs to add ftp://
and ftps://
to the Scheme
enum.
Need to have a CI check to ensure the crate version gets increased when the source code or Cargo.toml gets updated
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.