Comments (3)

graveljp commented on September 29, 2024

Your upload might have the time to finish before I'm done implementing this! ;)

Multi-threading in Python usually gives questionable results because of the GIL, but in this case I think most of the time is spent waiting for SmugMug to reply, so there could be a large gain from multi-threading. I'll have a look; it could be fun to implement.
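To illustrate the reasoning above, here is a minimal sketch of why threads can help when the work is mostly waiting on the network. It does not use SmugCLI or the SmugMug API: `fake_request` is a hypothetical stand-in that simply sleeps, which releases the GIL in the same way a blocking socket read does.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request(item, delay=0.05):
    """Hypothetical stand-in for a network call; sleeping releases the GIL."""
    time.sleep(delay)
    return item * 2


def run_sequential(items):
    # One request at a time: total time is roughly len(items) * delay.
    return [fake_request(i) for i in items]


def run_threaded(items, workers=8):
    # Requests overlap while each thread waits, so the waits run concurrently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_request, items))


def timed(fn, *args):
    """Return (result, elapsed_seconds) for a single call to fn."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(run_sequential, range(8))` and `timed(run_threaded, range(8))` should return the same results, with the threaded run finishing noticeably sooner. CPU-bound work would not see this benefit under the GIL.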

from smugcli.

graveljp commented on September 29, 2024

The sync command now runs multi-threaded! I'm sorry that it took a while to implement. I needed to completely rewrite the sync algorithm and change lots of surrounding code, but before I could do that, I had to write thorough unit tests to make sure I wouldn't break anything.

Let me know how you like it, whether it runs faster for you or if you find any issues.

Because files are now processed in parallel, providing meaningful feedback on the command line posed its own challenge. I implemented a terminal-based floating text rendering, where each status update overwrites the previous printout. This allows users to follow the progress in real time without cluttering the terminal output. I tested it on Windows, Cygwin, Linux and Mac and it all worked fine (you need to pip install colorama for this to work on Windows). Let me know if that works for you.
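The overwrite-in-place idea described above can be sketched with standard ANSI escape sequences. This is an illustration under assumptions, not SmugCLI's actual renderer: the `StatusBlock` class and its `render` method are hypothetical names, and long lines that wrap in the terminal are not handled. On Windows, the comment above suggests colorama is needed for such escape sequences to take effect.

```python
import sys

CURSOR_UP = "\x1b[{n}A"  # ANSI: move the cursor up n lines
CLEAR_LINE = "\x1b[2K"   # ANSI: erase the entire current line


class StatusBlock:
    """Hypothetical helper that redraws a block of status lines in place."""

    def __init__(self, stream=sys.stdout):
        self._stream = stream
        self._height = 0  # number of lines drawn by the previous render

    def render(self, lines):
        lines = list(lines)
        parts = []
        if self._height:
            # Jump back to the first line of the previous block.
            parts.append(CURSOR_UP.format(n=self._height))
        rows = max(self._height, len(lines))
        for i in range(rows):
            # Clear each row, including stale rows when the block shrinks.
            text = lines[i] if i < len(lines) else ""
            parts.append("\r" + CLEAR_LINE + text + "\n")
        extra = rows - len(lines)
        if extra:
            # Park the cursor just below the new, shorter block.
            parts.append(CURSOR_UP.format(n=extra))
        self._stream.write("".join(parts))
        self._stream.flush()
        self._height = len(lines)
```

Each worker thread would report its state to a single owner of the `StatusBlock`, which calls `render` with one line per active task; writing from several threads at once would interleave the escape sequences.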

I implemented three levels of parallelism: folder, file and upload. You can override the default parallelism by doing:

smugcli.py sync ... --folder_threads=4 --file_threads=16 --upload_threads=3

When you are happy with a certain config, you can save it to be used as default next time by doing:

smugcli.py sync --set_defaults --folder_threads=4 --file_threads=16 --upload_threads=3

The --folder_threads parameter specifies the number of folders to process in parallel. --file_threads is the number of files to read in parallel and compare to the server-side version. If files need to be uploaded, --upload_threads controls the number of files being uploaded in parallel to the SmugMug servers. Keep in mind that with a large --file_threads, more files will have to be held in memory simultaneously. If you upload many large video files, this can add up to a very large amount of RAM. Let me know if that's a problem.
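One way to picture the three levels is as three separate thread pools, each feeding the next. The sketch below is a simplified assumption, not SmugCLI's implementation: `sync_tree`, `needs_upload` and `upload` are hypothetical names, folders are a flat dict rather than a nested hierarchy, and the thread counts are just the example values from the command above.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_tree(folders, needs_upload, upload,
              folder_threads=4, file_threads=16, upload_threads=3):
    """Process {folder_name: [file_name, ...]} and return the uploaded paths.

    needs_upload(path) stands in for the read-and-compare step;
    upload(path) stands in for the actual transfer.
    """
    with ThreadPoolExecutor(folder_threads) as folder_pool, \
         ThreadPoolExecutor(file_threads) as file_pool, \
         ThreadPoolExecutor(upload_threads) as upload_pool:

        def process_file(path):
            # File-level work: compare, then hand off to the upload pool.
            if needs_upload(path):
                return upload_pool.submit(upload, path).result()
            return None

        def process_folder(name, files):
            # Folder-level work: fan the folder's files out to the file pool.
            futures = [file_pool.submit(process_file, f"{name}/{f}")
                       for f in files]
            results = [fut.result() for fut in futures]
            return [r for r in results if r is not None]

        folder_futures = [folder_pool.submit(process_folder, name, files)
                          for name, files in folders.items()]
        uploaded = [path for fut in folder_futures for path in fut.result()]
    return sorted(uploaded)
```

Using a distinct pool per level means a task only ever waits on a lower level, so the pools cannot deadlock on each other. It also shows why memory grows with the file-level count: up to `file_threads` files can be in flight at once, whether or not an upload slot is free.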

Because many threads are now working in parallel on the same folder hierarchy, I had to implement caching to make sure that each thread won't fetch the same nodes multiple times from the server. I then had to implement a garbage collection policy to keep SmugCLI from storing the whole SmugMug folder hierarchy in cache (in your case, the JSON metadata for 6 million photos and their parent folders). I tested this on my photo library, but it's a fairly small dataset. Please keep an eye on memory usage and let me know how this works out with your image collection.
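As a rough illustration of a shared cache with bounded memory, the sketch below makes concurrent requests for the same key share a single fetch and evicts the least recently used entry once a size limit is reached. The comment above does not say which garbage collection policy SmugCLI uses, so the LRU policy here is an assumption, and `NodeCache` and `fetch` are hypothetical names.

```python
import threading
from collections import OrderedDict
from concurrent.futures import Future


class NodeCache:
    """Hypothetical thread-safe cache: one fetch per key, LRU eviction."""

    def __init__(self, fetch, max_entries):
        self._fetch = fetch              # callable: key -> node metadata
        self._max_entries = max_entries  # assumed to be at least 1
        self._lock = threading.Lock()
        self._entries = OrderedDict()    # key -> Future holding the node

    def get(self, key):
        with self._lock:
            future = self._entries.get(key)
            owner = future is None
            if owner:
                # First requester creates the Future; others will wait on it.
                future = Future()
                self._entries[key] = future
                while len(self._entries) > self._max_entries:
                    self._entries.popitem(last=False)  # drop the oldest entry
            else:
                self._entries.move_to_end(key)  # mark as recently used
        if owner:
            try:
                # Fetch outside the lock so other keys are not blocked.
                future.set_result(self._fetch(key))
            except Exception as exc:
                with self._lock:
                    # Do not keep failed fetches in the cache.
                    if self._entries.get(key) is future:
                        del self._entries[key]
                future.set_exception(exc)
        return future.result()
```

Capping the cache by entry count is the simplest bound; a cap based on the estimated size of the stored metadata would track memory more closely.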

pbtrung commented on September 29, 2024

Sorry for the late reply. I have not been on GitHub very often recently.
I just updated my local clone of your repo this morning and WOW!! It is definitely faster.
I will watch memory usage and let you know if there are any problems.
I have been using smugcli continuously for three months and so far it has uploaded 4 million files successfully. Thanks so much for this awesome tool!
