Comments (6)

gulrak commented on July 28, 2024

I found at least one of the reasons by going through the source with the call stack as a hint: evaluating the "executable" part of the permissions is quite expensive, and most of the time the permissions of a directory_entry are not used at all.

I think I'll defer the X-part of the flags until the permissions are actually used, and additionally try to optimize their evaluation.

I'm on a trip right now (browsing code on my phone), but I think a test of that should be on a branch sometime tomorrow. Not sure about the impact yet.
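A minimal sketch of the deferral idea, in C++17: the expensive "executable" check runs only when permissions() is first called, and the result is cached. The class lazy_entry and its members are hypothetical and are not ghc::filesystem's actual code; inferring the X-part from the .exe/.bat/.cmd/.com extensions is an assumption about how a Windows implementation might derive it.

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <optional>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Hypothetical entry type for illustration only; NOT ghc::filesystem's code.
class lazy_entry {
public:
    lazy_entry(fs::path p, fs::perms base_perms)
        : path_(std::move(p)), base_(base_perms) {}

    // Cheap accessor: most iteration loops only ever need this.
    const fs::path& path() const noexcept { return path_; }

    // The X-part is evaluated on first use only, then cached.
    fs::perms permissions() const {
        if (!perms_) {
            ++evaluations_;
            perms_ = looks_executable() ? (base_ | exec_bits()) : base_;
        }
        return *perms_;
    }

    // How often the expensive evaluation actually ran (for demonstration).
    int evaluation_count() const noexcept { return evaluations_; }

private:
    static fs::perms exec_bits() {
        return fs::perms::owner_exec | fs::perms::group_exec | fs::perms::others_exec;
    }

    // Assumption: executability is inferred from the (case-insensitive) extension.
    bool looks_executable() const {
        std::string ext = path_.extension().string();
        std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
            return static_cast<char>(std::tolower(c));
        });
        return ext == ".exe" || ext == ".bat" || ext == ".cmd" || ext == ".com";
    }

    fs::path path_;
    fs::perms base_;
    mutable std::optional<fs::perms> perms_;  // empty until first requested
    mutable int evaluations_ = 0;
};
```

A loop that only reads path() never pays for the extension check; only code that asks for permissions() does, and only once per entry.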

from filesystem.

gulrak commented on July 28, 2024

My work is on branch: feature-73-performance-optimization

Okay, let me start by saying that I was not able to reproduce the huge gap from your measurements. My tests were on my dev laptop with an SSD (I have no Windows system with an HDD available), using Visual Studio 2019 16.7.4, and my baseline measurements are:

Recursive iteration over C:\Windows (nested directories, 324681 entries, 247597 files) with an additional fs::status() call, as in the given code:

| Implementation | Time | Relative |
|---|---|---|
| std::filesystem | 47.2 s | 100% |
| ghc::filesystem | 51.5 s | +9% |
| ghc::filesystem with reduced path overhead | 49.7 s | +5% |

Not that impressive, but still useful, given the difference in internal implementation. I want to point out that the additional fs::status(entry_path) call is the main culprit for the time taken. The fs::directory_entry from the iterator already has a status, so using fs::file_status entry_status = entry_path.status() inside the loop instead leads to:

| Implementation | Time | Relative |
|---|---|---|
| std::filesystem | 10.5 s | 100% |
| ghc::filesystem | 13.1 s | +25% |
| ghc::filesystem with reduced path overhead | 11.8 s | +12% |

So my optimizations halved the overhead of the different internal representations, I'm happy with that.
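A sketch of the faster loop shape, assuming C++17 std::filesystem (the same calls exist in ghc::filesystem): the status is taken from the directory_entry instead of calling fs::status(entry.path()). The names count_tree and tree_stats are hypothetical, and whether entry.status() really avoids a file system query depends on what the implementation cached during iteration.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type: number of entries seen and how many are regular files.
struct tree_stats {
    std::size_t entries = 0;
    std::size_t regular_files = 0;
};

// Walks `root` recursively without throwing; stops at the first iteration error.
tree_stats count_tree(const fs::path& root) {
    tree_stats stats;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec);
    const fs::recursive_directory_iterator end;
    for (; !ec && it != end; it.increment(ec)) {
        ++stats.entries;
        std::error_code st_ec;  // kept separate so a bad entry does not end the walk
        // Fast variant: may reuse data the iterator already collected.
        const fs::file_status st = it->status(st_ec);
        // Slow variant from the original code, always a fresh query:
        //   const fs::file_status st = fs::status(it->path(), st_ec);
        if (!st_ec && fs::is_regular_file(st)) {
            ++stats.regular_files;
        }
    }
    return stats;
}
```

Calling count_tree("C:\\Windows") corresponds to the first benchmark scenario; swapping the commented line in reproduces the slower variant.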

I then also created a single test directory with 20k files, to get a comparison without the impact of the many directory_iterator creations performed by the recursive_directory_iterator while walking the C:\Windows folder:

| Implementation | Time |
|---|---|
| std::filesystem with fs::status(entry_path) | 1480 ms |
| ghc::filesystem with fs::status(entry_path) | 1570 ms |
| ghc::filesystem with fs::status(entry_path) and path optimizations | 1500 ms |
| std::filesystem with entry_path.status() | 47 ms |
| ghc::filesystem with entry_path.status() | 66 ms |
| ghc::filesystem with entry_path.status() and path optimizations | 44 ms |

So in the best case, no additional status call, just flat iteration, the optimizations make it faster than std::filesystem on average. I guess there is some non-optimal code in there as well, as it should be faster with native storage of the path.
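A rough harness for the flat-directory comparison could look like the following. It is a sketch assuming C++17 std::filesystem; populate(), time_flat_scan() and run_result are hypothetical names, and absolute timings depend heavily on the disk and on OS caching, so running each variant more than once is advisable.

```cpp
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates `count` empty files file_0.txt, file_1.txt, ... in `dir`.
void populate(const fs::path& dir, std::size_t count) {
    fs::create_directories(dir);
    for (std::size_t i = 0; i < count; ++i) {
        std::ofstream out(dir / ("file_" + std::to_string(i) + ".txt"));
    }
}

struct run_result {
    std::size_t regular_files = 0;
    std::chrono::milliseconds elapsed{0};
};

// Times one flat (non-recursive) pass over `dir`.
//   use_cached_status == true  -> entry.status()            (may reuse cached data)
//   use_cached_status == false -> fs::status(entry.path())  (always a fresh query)
run_result time_flat_scan(const fs::path& dir, bool use_cached_status) {
    run_result r;
    const auto start = std::chrono::steady_clock::now();
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code st_ec;
        const fs::file_status st = use_cached_status
            ? it->status(st_ec)
            : fs::status(it->path(), st_ec);
        if (!st_ec && fs::is_regular_file(st)) {
            ++r.regular_files;
        }
    }
    r.elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    return r;
}
```

For example, populate a scratch folder once with 20000 files, then compare time_flat_scan(dir, true).elapsed with time_flat_scan(dir, false).elapsed; both runs should report the same regular_files count.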

I hope this helps with your performance issues as well, even if I couldn't replicate your numbers.

gulrak commented on July 28, 2024

Yeah, that is more than I expected. There is an overhead in ghc::filesystem::path on Windows because of it using the generic representation as internal representation instead of the native one, but I guess this might be more the result of the directory_iterator / directory_entry workings.

I'll do some tests and try to optimize it. I'm quite busy at the moment, but I plan to work on this and on #70, the other Windows issue, in the next few days, hopefully tomorrow.
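To illustrate where the generic-representation overhead comes from, here is a simplified model of the two storage choices. Both classes are hypothetical and use std::string for brevity, whereas a real Windows implementation would typically work with wide strings; the point is only that a path stored in generic form has to be copied and have its separators rewritten every time it is handed to a native API.

```cpp
#include <algorithm>
#include <string>
#include <utility>

// Hypothetical, simplified models; NOT ghc::filesystem's real classes.

// Stores '/' separators; native() builds a new string on every call.
class generic_stored_path {
public:
    explicit generic_stored_path(std::string p) : generic_(std::move(p)) {
        std::replace(generic_.begin(), generic_.end(), '\\', '/');
    }
    std::string native() const {  // copy + rewrite each time a native API needs it
        std::string n = generic_;
        std::replace(n.begin(), n.end(), '/', '\\');
        return n;
    }
    const std::string& generic() const noexcept { return generic_; }

private:
    std::string generic_;
};

// Stores '\' separators; native() is free, generic() pays instead.
class native_stored_path {
public:
    explicit native_stored_path(std::string p) : native_(std::move(p)) {
        std::replace(native_.begin(), native_.end(), '/', '\\');
    }
    const std::string& native() const noexcept { return native_; }
    std::string generic() const {
        std::string g = native_;
        std::replace(g.begin(), g.end(), '\\', '/');
        return g;
    }

private:
    std::string native_;
};
```

Since directory iteration and status queries mostly pass paths to the OS, native storage moves the conversion cost to the rarer generic() calls.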

gulrak commented on July 28, 2024

Sorry, there is some delay before a branch for this is available.

My test used recursive_directory_iterator, as I had no sufficiently large single directory. Iterating a huge tree with ghc::filesystem and with MS std::filesystem gave different numbers of regular files and different sums of their sizes, so I spent quite some time analyzing this to find a possible hidden bug. It seems I found an issue in the std::filesystem implementation, which I'm going to report over there.

I hope to give it another shot after work today, but I wanted to report back why there is no branch yet.
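One way to track down such a discrepancy is to dump every regular file with its size from each implementation and diff the two listings. The sketch below assumes C++17; list_regular_files(), write_listing() and file_record are hypothetical names. Skipping entries whose type or size cannot be read is a design choice that can itself cause count differences, so logging those cases is worthwhile in real use.

```cpp
#include <algorithm>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>
#include <vector>

// To run the same walk with ghc::filesystem instead, swap this alias for
//   #include <ghc/filesystem.hpp>
//   namespace fs = ghc::filesystem;
namespace fs = std::filesystem;

struct file_record {
    std::string path;     // generic form, so both listings use '/'
    std::uintmax_t size;  // bytes
};

// Collects every regular file below `root` with its size, sorted by path.
// Stops at the first iteration error; unreadable entries are skipped.
std::vector<file_record> list_regular_files(const fs::path& root) {
    std::vector<file_record> out;
    std::error_code ec;
    fs::recursive_directory_iterator it(
        root, fs::directory_options::skip_permission_denied, ec), end;
    for (; !ec && it != end; it.increment(ec)) {
        std::error_code entry_ec;
        if (!it->is_regular_file(entry_ec) || entry_ec) continue;
        const std::uintmax_t size = it->file_size(entry_ec);
        if (entry_ec) continue;
        out.push_back({it->path().generic_string(), size});
    }
    std::sort(out.begin(), out.end(),
              [](const file_record& a, const file_record& b) { return a.path < b.path; });
    return out;
}

// Writes "size<TAB>path" lines plus a summary line; returns the total byte count.
std::uintmax_t write_listing(const std::vector<file_record>& files, const fs::path& out_file) {
    std::ofstream out(out_file);
    std::uintmax_t total = 0;
    for (const auto& f : files) {
        out << f.size << '\t' << f.path << '\n';
        total += f.size;
    }
    out << "# " << files.size() << " regular files, " << total << " bytes\n";
    return total;
}
```

Building this once per implementation and diffing the two output files shows exactly which entries are counted differently.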

alexbobryshev commented on July 28, 2024

OK, thanks for the information! I'm ready to test.

gulrak commented on July 28, 2024

Released with v1.3.6