globwalk's People

Contributors

alexanderkjall, ds-cbo, epage, gilnaa, jmcomets, killercup, lo48576, lucab

globwalk's Issues

remove public dependency on `ignore`

While looking at the API of this crate, I noticed that an ignore::Error is re-exported through a type alias. I think we should probably try to avoid that, since the ignore crate isn't something I'd consider stable yet.

This crate also has a public dependency on the walkdir crate, but this one is a bit trickier to remove. Namely, WalkError exposes walkdir's error type, but the iterator itself also exposes walkdir's DirEntry type. This public dependency might be OK though, since I largely consider walkdir's API stable and there are no current plans for releasing any new major versions.

Absolute path on Windows with ** and `glob`

Looks like the glob fn does not work on absolute paths on Windows when they contain **.
To reproduce, run the code below, changing the path to one that exists on your machine:

for file in glob(r"C:\Users\User\Documents\something\**\*.{html, xml}").unwrap() {
    println!("{:?}", file);
}

It will find the files correctly but they will all be ignored.

The reason seems to be that glob uses the GitIgnore builder under the hood which will turn the **\\**.{html, xml} glob pattern into **/**\\*.{html, xml} which will not match anything.

Doing it manually with GlobWalkerBuilder works fine, so it seems the fix is just to replace the Windows backslashes in the pattern before passing it on, since the underlying library expects /.
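A minimal sketch of the caller-side workaround described above, assuming the only problem is the separator. The helper name `normalize_separators` is made up for illustration and is not part of globwalk.

```rust
/// Replace Windows-style `\` separators with `/` before handing a pattern
/// to a glob matcher that expects forward slashes.
///
/// Caveat: in glob syntaxes where `\` is an escape character, this would
/// also rewrite the escapes, so it is only safe for plain path patterns.
fn normalize_separators(pattern: &str) -> String {
    pattern.replace('\\', "/")
}
```

The normalized string could then be passed to `glob` in place of the original pattern.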

ignore: Directory glob is too eager

For the following dir structure:

mkdir target
touch {.,target}/{a,b,c}.rs

And the following settings:

GlobWalker::from_patterns(".", &["target/"])

The following Results are expected:

Ok(DirEntry("../testg/target"))

But extra results appear:

Ok(DirEntry("../testg/target"))
Ok(DirEntry("../testg/target/c.rs"))
Ok(DirEntry("../testg/target/b.rs"))
Ok(DirEntry("../testg/target/a.rs"))

Both this and #5 stem from the use of ignore and the result negation.

(feature request) Ability to filter based on file metadata (e.g. `created_after`, `modified_after`, etc)

First, love the API and speed of this library. Thank you!

I was using https://github.com/ParthJadhav/Rust_Search previously. Believe it or not, in my testing globwalk is actually faster (I didn't do any thorough benchmarking, but for my specific use case it's faster). Moreover, the Rust_Search API doesn't support globbing for multiple file extensions.

One thing Rust_Search does have that I'm missing in globwalk is the ability to filter against file metadata.

Would it be possible to add something similar to globwalk?


Update: Just saw filter_map... perhaps it could be used for this? If so, some example code would be super helpful (I'm a total Rust newbie).
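For what it's worth, a metadata filter can be written as an ordinary predicate over paths using only the standard library. The sketch below is one possible shape for a `modified_after` check. The function name is borrowed from the issue title and is not an existing globwalk API.

```rust
use std::fs;
use std::path::Path;
use std::time::SystemTime;

/// Returns true when `path` was last modified strictly after `cutoff`.
/// Any I/O error (missing file, mtime unsupported on the platform)
/// is treated as "does not match".
fn modified_after(path: &Path, cutoff: SystemTime) -> bool {
    fs::metadata(path)
        .and_then(|meta| meta.modified())
        .map(|mtime| mtime > cutoff)
        .unwrap_or(false)
}
```

With a walker built as in the other snippets on this page, the predicate could presumably be chained after `.filter_map(Result::ok)` as `.filter(|entry| modified_after(entry.path(), cutoff))`. A `created_after` variant would swap `meta.modified()` for `meta.created()`.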

Directory Walk Order Question

In Ubuntu 18.04, Windows and MacOS the directories are walked in the same order when I run this code:

let target_dir = Path::new("/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/");
let walker = globwalk::GlobWalkerBuilder::from_patterns(target_dir, &[".sr"])
        .max_depth(100)
        .follow_links(false)
        .build()
        .expect("Could not build globwalk directory walker.")
        .into_iter()
        .filter_map(Result::ok);

for sr_file in walker {
    println!("{:?}", sr_file);
}

Here is the order:

"/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/.sr"
"/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/components/level1/.sr"
"/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/components/level1/components/level2/.sr"
"/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/components/level1/components/level2/components/level3/.sr"
"/tmp/temp_356d21aa-5320-4fc1-a6db-e47b28ed2a94/toplevel/node_modules/blink_firmware/.sr"

However, running the same code in Travis CI (which I think still uses Ubuntu 14.04 by default) or on Manjaro Linux (Arch Linux based) gives a different order:

"/tmp/temp_de230ca5-121e-4293-98f3-c599b83f18ed/toplevel/node_modules/blink_firmware/.sr"
"/tmp/temp_de230ca5-121e-4293-98f3-c599b83f18ed/toplevel/.sr"
"/tmp/temp_de230ca5-121e-4293-98f3-c599b83f18ed/toplevel/components/level1/.sr"
"/tmp/temp_de230ca5-121e-4293-98f3-c599b83f18ed/toplevel/components/level1/components/level2/.sr"
"/tmp/temp_de230ca5-121e-4293-98f3-c599b83f18ed/toplevel/components/level1/components/level2/components/level3/.sr"

I'm assuming that the difference is based on which libraries are used on the systems, but I'm not sure. Is there a way to enforce the same sort, regardless of what system the code is running on?
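The order in which a directory's entries are returned generally comes from the file system and is not guaranteed to be stable across machines. One way to get a deterministic result is to sort explicitly. The sketch below does that after the walk using only the standard library. `sorted_paths` is an illustrative name, not a globwalk function.

```rust
use std::path::PathBuf;

/// Collects paths and sorts them, so the output order no longer depends
/// on the order in which the operating system lists directory entries.
fn sorted_paths<I>(paths: I) -> Vec<PathBuf>
where
    I: IntoIterator<Item = PathBuf>,
{
    let mut collected: Vec<PathBuf> = paths.into_iter().collect();
    // `PathBuf` orders component by component, not by raw string bytes.
    collected.sort();
    collected
}
```

The builder also appears to accept a comparator directly: the `*` test case further down this page calls `.sort_by(|a, b| a.path().cmp(b.path()))` before `.build()`.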

globwalk exposes `globset::Error` in its API

This forces clients to directly depend on globset just for error handling.

I'm working on a CLI test utility for easily populating a temp dir and am looking at using globwalk because it implements "good enough" policy to expose in a simple API. Ran into this when integrating globwalk.
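One common way to avoid leaking a dependency's error type is to wrap it in a crate-owned error with a private field. The sketch below shows that pattern in isolation. `UpstreamError`, `GlobError`, `compile_pattern` and `build` are all stand-ins invented for the example and do not describe globwalk's actual types.

```rust
use std::error::Error;
use std::fmt;

/// Stand-in for an error type owned by a dependency (hypothetical).
#[derive(Debug)]
struct UpstreamError(String);

impl fmt::Display for UpstreamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl Error for UpstreamError {}

/// Crate-owned error. The upstream type sits in a private field, so it
/// never appears in a public signature.
#[derive(Debug)]
pub struct GlobError(UpstreamError);

impl fmt::Display for GlobError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid glob: {}", self.0)
    }
}

impl Error for GlobError {
    // Callers can still inspect the cause through the `Error` trait.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

impl From<UpstreamError> for GlobError {
    fn from(err: UpstreamError) -> Self {
        GlobError(err)
    }
}

/// Stand-in for the dependency's pattern compiler (hypothetical).
fn compile_pattern(pattern: &str) -> Result<(), UpstreamError> {
    if pattern.matches('[').count() != pattern.matches(']').count() {
        return Err(UpstreamError(format!("unbalanced brackets in {:?}", pattern)));
    }
    Ok(())
}

/// Public entry point: `?` converts the upstream error via `From`.
pub fn build(pattern: &str) -> Result<(), GlobError> {
    compile_pattern(pattern)?;
    Ok(())
}
```

Clients then only need the wrapper type in their own signatures and can drop the direct dependency.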

Ignore negation doesn't work on directories

To reproduce:

mkdir test_area
touch test_area/{.,Pictures}/{a,b,c}.{jpg,png,gif}

Code:

fn main() {
    let walker = globwalk::GlobWalker::from_patterns("/path/to/test_area", &["*.{png,jpg,gif}", "!Pictures"])
        .unwrap()
        .into_iter()
        .filter_map(Result::ok);
    for img in walker {
        println!("{:?}", img.path());
    }
}

Result:

"/home/gilnaa/proj/ttt/tree/a.gif"
"/home/gilnaa/proj/ttt/tree/a.jpg"
"/home/gilnaa/proj/ttt/tree/a.png"
"/home/gilnaa/proj/ttt/tree/b.gif"
"/home/gilnaa/proj/ttt/tree/b.jpg"
"/home/gilnaa/proj/ttt/tree/b.png"
"/home/gilnaa/proj/ttt/tree/c.gif"
"/home/gilnaa/proj/ttt/tree/c.jpg"
"/home/gilnaa/proj/ttt/tree/c.png"
"/home/gilnaa/proj/ttt/tree/Pictures/a.gif"
"/home/gilnaa/proj/ttt/tree/Pictures/a.jpg"
"/home/gilnaa/proj/ttt/tree/Pictures/a.png"
"/home/gilnaa/proj/ttt/tree/Pictures/b.gif"
"/home/gilnaa/proj/ttt/tree/Pictures/b.jpg"
"/home/gilnaa/proj/ttt/tree/Pictures/b.png"
"/home/gilnaa/proj/ttt/tree/Pictures/c.gif"
"/home/gilnaa/proj/ttt/tree/Pictures/c.jpg"
"/home/gilnaa/proj/ttt/tree/Pictures/c.png"

Should globwalk return only files by default?

In a lot of cases, people will do globs like *.rs which 99% of the time will only return files. In this case, if they then refactor and switch to *, they can now start to get directories.

imo it seems less surprising if globwalk only returned files by default.

I could see having a .file_type(globwalk::FileType::File | globwalk::FileType::Dir | globwalk::FileType::SymLink) to give users control over what is returned.
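To make the proposal concrete, here is a rough sketch of a combinable file-type mask built on the standard library's `std::fs::FileType`. The `FileTypes` struct and its constants are hypothetical and only illustrate the `A | B` style suggested above.

```rust
use std::fs::FileType;
use std::ops::BitOr;

/// Hypothetical bit mask of the entry kinds a walker should yield.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct FileTypes(u8);

impl FileTypes {
    const FILE: FileTypes = FileTypes(0b001);
    const DIR: FileTypes = FileTypes(0b010);
    const SYMLINK: FileTypes = FileTypes(0b100);

    /// True when the entry's type is one of the kinds set in the mask.
    fn accepts(self, ft: FileType) -> bool {
        (ft.is_file() && (self.0 & Self::FILE.0) != 0)
            || (ft.is_dir() && (self.0 & Self::DIR.0) != 0)
            || (ft.is_symlink() && (self.0 & Self::SYMLINK.0) != 0)
    }
}

impl BitOr for FileTypes {
    type Output = FileTypes;

    fn bitor(self, rhs: FileTypes) -> FileTypes {
        FileTypes(self.0 | rhs.0)
    }
}
```

A files-only default would then correspond to `FileTypes::FILE`, with `FileTypes::FILE | FileTypes::DIR` as an explicit opt-in for directories.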

Globs that start with "./" broken

Consider the following main.rs:

extern crate globwalk;
fn main() {
    println!("dot");
    for f in globwalk::glob("./globwalk/**/*.rs").unwrap() {
        println!("  {:?}", f);
    }

    println!("nodot");
    for f in globwalk::glob("globwalk/**/*.rs").unwrap() {
        println!("  {:?}", f);
    }
}

And Cargo.toml:

[package]
name = "glest"
version = "0.1.0"

[dependencies]
globwalk = "0.8.0"

Running the resulting binary in a directory containing a checkout of globwalk:

T:\glest>target\debug\glest
dot
nodot
  Ok(DirEntry(".\\globwalk\\examples\\list.rs"))
  Ok(DirEntry(".\\globwalk\\src\\doctests.rs"))
  Ok(DirEntry(".\\globwalk\\src\\lib.rs"))
  Ok(DirEntry(".\\globwalk\\tests\\docs.rs"))

Note how an otherwise identical glob which started with ./ returned no results.
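Until this is fixed, a caller could strip the redundant prefix before calling `glob`. This is a sketch under the assumption that the leading `./` is the only difference between the two patterns. `strip_cur_dir` is an illustrative helper, not part of globwalk.

```rust
/// Removes any leading `./` segments from a glob pattern, so that
/// `./globwalk/**/*.rs` and `globwalk/**/*.rs` become the same string.
fn strip_cur_dir(pattern: &str) -> &str {
    let mut rest = pattern;
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }
    rest
}
```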

Unexpected results with single `*` pattern

* seems to incorrectly match both the current directory itself and all of the recursive entries.

Here's a test case:

    #[test]
    fn test_glob_single_star() {
        let dir = TempDir::new("globset_walkdir").expect("Failed to create temporary folder");
        let dir_path = dir.path();
        create_dir_all(dir_path.join("Pictures")).expect("");

        touch(
            &dir,
            &[
                "a.png",
                "b.png",
                "c.png",
                "Pictures[/]a.png",
                "Pictures[/]b.png",
                "Pictures[/]c.png",
            ][..],
        );

        let mut actual = vec![];
        for matched_file in GlobWalkerBuilder::from_patterns(dir_path, &["*"])
            .sort_by(|a, b| a.path().cmp(b.path()))
            .build().unwrap()
            .into_iter()
            .filter_map(Result::ok)
        {
            actual.push( matched_file
                .path()
                .strip_prefix(dir_path)
                .unwrap()
                .to_str()
                .unwrap().to_owned());
        }

        assert_eq!(actual, vec!["Pictures", "a.png", "b.png", "c.png"]);

    }

It fails:

---- tests::test_glob_single_star stdout ----
thread 'tests::test_glob_single_star' panicked at 'assertion failed: `(left == right)`
  left: `["", "Pictures", "Pictures/a.png", "Pictures/b.png", "Pictures/c.png", "a.png", "b.png", "c.png"]`,
 right: `["Pictures", "a.png", "b.png", "c.png"]`', src/lib.rs:843:9

"Why not glob" entry misleading?

glob searches for files in the current working directory, whereas globwalk starts at a specified base-dir.

Both seem to default to the current directory for their simplest API, and both have the option of using an absolute path.
e.g.

for entry in glob::glob("/media/**/*.jpg")

Re-export `DirEntry`

Was just about to use the DirEntry when I realized I'd have to get it from WalkDir.

Related: #15

Improve example code in readme

It is really not a good idea to put fs::remove_file in example code.

Also, it seems that the readme has not been kept in sync with the code.

The following piece of code recursively find all mp3 and FLAC files:

globwalk looks at all files even when it doesn't need to

https://docs.rs/globwalk/0.8.0/src/globwalk/lib.rs.html#355

If the top-level directory is not a match, no subdirectories or files in the directory will be a match either. However globwalk will still look at them. If there are many files in the ignored directory, this can cause enormous slowdowns. In https://github.com/rust-lang/docs.rs/pull/861/files#diff-ee6431d852ce8913514eece9e3982d32R96-R99, we have several hundred thousand files in subdirectories, but only ~10 in the matched directory, so this causes a slowdown of several orders of magnitude.
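The idea of pruning can be illustrated with a small standard-library walk that never descends into a directory rejected by a predicate. This is only a sketch of the technique and not how globwalk or walkdir traverse. `walk_pruned` and its `enter` callback are made-up names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects the files under `dir`, but skips every directory
/// (and therefore its whole subtree) for which `enter` returns false.
fn walk_pruned(
    dir: &Path,
    enter: &dyn Fn(&Path) -> bool,
    out: &mut Vec<PathBuf>,
) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            // The pruning step: a rejected directory is never opened.
            if enter(&path) {
                walk_pruned(&path, enter, out)?;
            }
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

The cost then scales with the size of the accepted subtrees rather than with the whole tree. walkdir exposes a similar hook as `filter_entry` on its iterator.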

API feedback

I just tried globwalk after reading Keats/tera#212 and the globset PR associated.

First: the example in the README does not compile.

Second, I think an API like this would be nicer:

// Sketch only: the bodies are intentionally left unimplemented.
struct GlobWalk {}

impl GlobWalk {
    // Takes a single pattern and assumes the current dir, like `glob` and the Python `glob` (https://docs.python.org/3/library/glob.html)
    pub fn new(pat: &str) -> Result<Self, Error> { unimplemented!() }

    // Same as the current one, minus the base directory
    pub fn from_patterns<S: AsRef<str>>(pat: &[S]) -> Result<Self, Error> { unimplemented!() }

    pub fn base_directory(&mut self, dir: &str) -> Self { unimplemented!() }

    // Rest of the builder is similar to the current one
}

The main advantage of that API is simplicity for the common case: find everything matching a single given pattern in the cwd. This is how glob works in pretty much every language I know (including Rust), and it would be weird to change that.
I see "glob searches for files in the current working directory, whereas globwalk starts at a specified base-dir." in the README, but it doesn't really explain why.

I'd be happy to implement that in a PR if you agree.

Does not work on absolute paths

Hi, I ran into a surprising error today. I was using the tera template engine (which depends on globwalk), and I couldn't get it to load my templates. It turns out that while the glob crate works with absolute paths, globwalk does not. I was using the glob crate in addition to tera and was very confused about why this wasn't working.

I made this simple example program to demonstrate:

use std::fs::canonicalize;

use globwalk::glob;

fn main() {
    let relpattern = "src/*.rs";
    println!("With relative path: {}", relpattern);
    for file in glob(relpattern).unwrap() {
        println!("{:?}", file.unwrap());
    }

    let abspath = canonicalize("src").unwrap().join("*.rs");
    let abspath = abspath.to_str().unwrap();
    println!("With absolute path: {}", abspath);
    for file in glob(abspath).unwrap() {
        println!("{:?}", file.unwrap());
    }
}

This produces the following output:

With relative path: src/*.rs
DirEntry("./src/main.rs")
With absolute path: /tmp/glob-bug/src/*.rs

As you can see, with the relative path, it prints the correct entries. However, with the absolute path, nothing is printed at all.
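One possible way to support absolute patterns is to split the pattern into a literal base directory and a relative glob, then walk from that base. The sketch below shows only the splitting step, for `/`-separated patterns. `split_base` is a hypothetical helper and does not describe what globwalk currently does.

```rust
use std::path::PathBuf;

/// Splits a `/`-separated glob into (literal base directory, remaining pattern).
/// Leading components without glob metacharacters go into the base. The
/// last component always stays in the pattern.
fn split_base(pattern: &str) -> (PathBuf, String) {
    let has_meta = |s: &str| s.contains(|c: char| matches!(c, '*' | '?' | '[' | '{'));
    let parts: Vec<&str> = pattern.split('/').filter(|s| !s.is_empty()).collect();

    let mut base = PathBuf::new();
    if pattern.starts_with('/') {
        base.push("/");
    }

    let mut idx = 0;
    while idx + 1 < parts.len() && !has_meta(parts[idx]) {
        base.push(parts[idx]);
        idx += 1;
    }

    // A purely relative pattern is anchored at the current directory.
    if base.as_os_str().is_empty() {
        base.push(".");
    }
    (base, parts[idx..].join("/"))
}
```

For the pattern in the report this yields the base `/tmp/glob-bug/src` and the pattern `*.rs`, which could then be passed to a builder that takes a base directory and a list of patterns.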
