Giter Club home page Giter Club logo

git-url-parse-rs's Introduction

git-url-parse-rs's People

Contributors

amircodota avatar dax avatar everlastingbugstopper avatar github-actions[bot] avatar kaplanelad avatar tjtelan avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

git-url-parse-rs's Issues

normalize_url(url: &str) doesn't check for null bytes, which leads to crashes

I fuzzed this crate using honggfuzz and found 2 crashes, both of which can be fixed by letting the normalize_url function check for null bytes.

You can fuzz the code yourself and further investigate the crashes with this repo.

In the main function there are also two test functions which can be used to easily reproduce the crashes.

The crashing inputs are:
////////ws///////////*,\u{0}\u{0}^\u{0}\u{0}\u{0}\u{0}@2\u{1}\u{0}\u{1d})\u{0}\u{0}\u{0}:\u{0}\u{0}\u{0}
and:
?\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{1f}s\u{3}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{0}\u{5}\u{1}@\u{0}\u{0}\u{4}!e\u{0}\u{0}2\u{1c}^3106://?<!41\u{0}\u{0}\u{0}?\u{0}\u{0}\u{0}\u{0}\u{4}?

Azure DevOps repo support

I wrote the tests for https and ssh Azure DevOps repos, but support for parsing these urls hasn't been implemented yet.

$ cargo test
   Compiling libc v0.2.66
   Compiling quick-error v1.2.3
   Compiling termcolor v1.1.0
   Compiling humantime v1.3.0
   Compiling atty v0.2.14
   Compiling env_logger v0.7.1
   Compiling git-url-parse v0.0.1 (/Users/telant/src/git-url-parse-rs)
    Finished test [unoptimized + debuginfo] target(s) in 3.52s
     Running target/debug/deps/git_url_parse-aa814d64454e74d6

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/git_url_parse-7ddbc2b68d180cfc

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

     Running target/debug/deps/mod-5c5239488f5c6da5

running 18 tests
test normalize::git ... ok
test normalize::file_scheme ... ok
test normalize::multi_git_ssh ... ok
test normalize::https ... ok
test normalize::http ... ok
test normalize::ssh_scheme ... ok
test normalize::ssh_no_scheme_no_user ... ok
test parse::https_user_auth_bitbucket ... ok
test parse::https_user_auth_github ... ok
test parse::https_user_bitbucket ... ok
test parse::https_user_github ... ok
test parse::https_user_azure_devops ... FAILED
test parse::ssh_user_ports ... ok
test normalize::file_no_scheme ... ok
test normalize::ssh_no_scheme ... ok
test parse::ssh_user_bitbucket ... ok
test parse::ssh_user_github ... ok
test parse::ssh_user_azure_devops ... FAILED

failures:

---- parse::https_user_azure_devops stdout ----
thread 'parse::https_user_azure_devops' panicked at 'assertion failed: `(left == right)`
  left: `GitUrl { href: "https://[email protected]/organization/project/_git/repo", host: Some("dev.azure.com"), name: "repo", owner: Some("_git"), organization: None, fullname: "_git/repo", protocol: Https, user: Some("organization"), token: None, port: None, path: "/organization/project/_git/repo", git_suffix: false }`,
 right: `GitUrl { href: "https://[email protected]/organization/project/_git/repo", host: Some("dev.azure.com"), name: "repo", owner: Some("project"), organization: Some("organization"), fullname: "organization/project/repo", protocol: Https, user: Some("organization"), token: None, port: None, path: "/organization/project/_git/repo", git_suffix: false }`', tests/parse.rs:198:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

---- parse::ssh_user_azure_devops stdout ----
thread 'parse::ssh_user_azure_devops' panicked at 'assertion failed: `(left == right)`
  left: `GitUrl { href: "[email protected]:v3/CompanyName/ProjectName/RepoName", host: Some("ssh.dev.azure.com"), name: "RepoName", owner: Some("ProjectName"), organization: None, fullname: "ProjectName/RepoName", protocol: Ssh, user: Some("git"), token: None, port: None, path: "v3/CompanyName/ProjectName/RepoName", git_suffix: false }`,
 right: `GitUrl { href: "[email protected]:v3/CompanyName/ProjectName/RepoName", host: Some("ssh.dev.azure.com"), name: "RepoName", owner: Some("ProjectName"), organization: Some("CompanyName"), fullname: "CompanyName/ProjectName/RepoName", protocol: Ssh, user: Some("git"), token: None, port: None, path: "v3/CompanyName/ProjectName/RepoName", git_suffix: false }`', tests/parse.rs:176:5


failures:
    parse::https_user_azure_devops
    parse::ssh_user_azure_devops

test result: FAILED. 16 passed; 2 failed; 0 ignored; 0 measured; 0 filtered out

error: test failed, to rerun pass '--test mod'

feature request: be less restrictive with dependency versions

Unfortunately, using the ~ specifier in Cargo.toml makes it very difficult to use this package in repositories that share the same dependencies.

Tilde requirements specify a minimal version with some ability to update. If you specify a major, minor, and patch version or only a major and minor version, only patch-level changes are allowed. If you only specify a major version, then minor- and patch-level changes are allowed.
from Cargo reference: tilde requirements

This means that by using url = "~2.1.1", I cannot use any dependencies that list their dependency as url = "^2.2" because I will get the following error:

$ cargo build
error: failed to select a version for `url`.
    ... required by package `git-url-parse v0.3.0`
    ... which is depended on by `mypackage (/path/to/mypackage)`
versions that meet the requirements `~2.1` are: 2.1.1, 2.1.0

all possible versions conflict with previously selected packages.

  previously selected package `url v2.2.0`
    ... which is depended on by `reqwest v0.11.0`
    ... which is depended on by `mypackage (/path/to/mypackage)`

I'm fairly certain there isn't any reason why the more permissive ^ syntax wouldn't be the best choice for all of the dependencies listed here, unless there are specific breaking changes introduced in higher versions (which, there shouldn't be if the library authors follow semver!)

Thanks for taking the time to read through this, I'm happy to make a PR making this change ๐Ÿ˜„

Fix breaking changes from updating Strum dependency

PR #13 revealed that there's a breaking change when updating the version specifier on the strum dependency.

  • Update Cargo.toml : strum + strum_macros to ^0.20
  • Fix code break from Cargo.toml changes
  • Bump crate version to 0.3.1 (minor version change)

Consistent support for Windows-style filesystem paths (backslash separators)

Right now, if we give a Windows-style path using backslashes \, the parsing seems to succeed w/o error.

However, the normalization process converts the backslash separated paths into Unix-style paths with forward slashes /. I'm not sure how much of this is influenced by me working in Linux/MacOS.

It makes sense that if you give a Windows-style path, you should return a Windows-style path in the parse output.

Support for absolute file paths in Windows

Example:

use git_url_parse::GitUrl;

fn main() {
  GitUrl::parse("C:\\test_repo");
}
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/lib.rs:237:39
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It just doesn't work, but I am also testing in Linux and MacOS. I'm not sure if the Url crate handles this cross-platform.

This one will need a test that runs in CI. Probably in a Windows runner, if possible.

Remove duplication - GitUrl href field

Our Display implementation prints the raw un-parsed version of our url. This makes comparing urls less robust, as well as keeps unnecessary duplication in our struct for a singular purpose.

We need to reimplement the Display trait to build the printable version through its components, so the resulting string is standardized.


A specific detail I'd like regarding SSH urls is to only print the url scheme if it is provided during the parsing phase.

Security vulnerability in the latest version

Hey Tj Telan,

Is there a way for You to contact me regarding a security vulnerability I discovered, so I can report it to You privately, or if You can give me your contact.

Thank You

Organizations should be explicit instead of implicit

Namespacing pattern assumptions in URLs needs a revisit.

Organizations embedded in URLs are more of an exception, for example, Azure or GitLab (depending on how subgroups are implemented. It is up to the user though).

Unfortunately, there's too much room for mis-identification, and typical users need to think too hard to override the default behavior.

  • As a user, my git provider does not embed organization in the URL. I expect the parser to handle my URLs as-is.
  • As a user, I know my git provider uses organizations. I'd like to be able to give hints to the parser to find the organization.

Some kind of builder pattern that lets you give hints to the parser for a Git host may suffice?

I'm not sure what this looks like. I don't personally use any git providers that embed organizations in the URL, so more research or feedback from users is needed.


I think one of the main outcomes I want is to have some git provider specializations that are opt-in, since there are certain providers that use URL patterns that are unique. Perhaps this will help typical users.

stack overflow on URLs with an invalid port

Thanks for this crate! I had a stack overflow when attempting to parse a bad URL (https://github.com:crypto-browserify/browserify-rsa.git); here's a diff with both a test case and a potential fix:

diff --git a/src/lib.rs b/src/lib.rs
index a311e1f..cb0984d 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -364,8 +364,7 @@ pub fn normalize_url(url: &str) -> Result<Url> {
                 }
             }
         }
-        Err(_e) => {
-            // e will most likely be url::ParseError::RelativeUrlWithoutBase
+        Err(url::ParseError::RelativeUrlWithoutBase) => {
             // If we're here, we're only looking for Scheme::Ssh or Scheme::File
 
             // Assuming we have found Scheme::Ssh if we can find an "@" before ":"
@@ -387,5 +386,8 @@ pub fn normalize_url(url: &str) -> Result<Url> {
                 }
             }
         }
+        Err(err) => {
+            return Err(eyre!("url parsing failed: {:?}", err));
+        }
     })
 }
diff --git a/tests/parse.rs b/tests/parse.rs
index 7619ca6..e6e6558 100644
--- a/tests/parse.rs
+++ b/tests/parse.rs
@@ -366,3 +366,15 @@ fn ssh_without_organization() {
 
     assert_eq!(parsed, expected);
 }
+
+#[test]
+fn bad_port_number() {
+    let test_url = "https://github.com:crypto-browserify/browserify-rsa.git";
+    let e = GitUrl::parse(test_url);
+
+    assert!(e.is_err());
+    assert_eq!(
+        format!("{}", e.err().unwrap()),
+        "Url normalization into url::Url failed"
+    );
+}

With that diff, cargo test runs without failures, but based on the uncertainty in the (deleted) comment around url::ParseError::RelativeUrlWithoutBase, there may be a reason why it wasn't done this way originally...

Less dependencies?

Thanks for this lib. I got tired of trying to support all the variants myself.

With that said, it would be nice if this lib had less dependencies. For context, I try and keep our binaries lean, but this pulls in eyre and strum, which then pulls in more stuff. We don't use eyre but now it's a part of the build.

Would be nice to be able to not have this. I know how much work managing Rust libs is, so at minimum, maybe these can be behind a feature flag?

Panic when parsing URL with two consecutive "/" chars or empty path component

First, thanks for your work on this crate!

I've been using it to follow projects whose repos are posted on HackerNews. Recently, someone posted the following link to HackerNews, which leads to a panic: https://www.heliosleep.com//

thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', /.../.cargo/registry/src/github.com-1ecc6299db9ec823/git-url-parse-0.4.0/src/lib.rs:238:39
stack backtrace:
   0: rust_begin_unwind
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/panicking.rs:142:14
   2: core::panicking::panic_bounds_check
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/panicking.rs:85:5
   3: <usize as core::slice::index::SliceIndex<[T]>>::index
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/slice/index.rs:189:10
   4: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/core/src/slice/index.rs:15:9
   5: <alloc::vec::Vec<T,A> as core::ops::index::Index<I>>::index
             at /rustc/532d3cda90b8a729cd982548649d32803d265052/library/alloc/src/vec/mod.rs:2531:9
   6: git_url_parse::GitUrl::parse
             at /.../.cargo/registry/src/github.com-1ecc6299db9ec823/git-url-parse-0.4.0/src/lib.rs:238:39

The affected line is this one: https://github.com/tjtelan/git-url-parse-rs/blob/main/src/lib.rs#L238

After re-reading the docs, I wasn't able to determine whether this is expected behavior or a bug. My expectation was that GitUrl::parse() is safe to call with arbitrary URLs and it would return Ok(...) if it could detect a GitUrl and Err(...) otherwise. But perhaps the intended spec is instead "you are expected to call GitUrl::parse() only on URLs you reasonably believe are actually pointing to GitHub and similar sites" and my code is misusing that call when it attempts it on arbitrary URLs sourced from HackerNews.

Thanks again for building this crate and for looking into this!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.