Comments (7)

willirath commented on June 23, 2024

Thanks for writing this in the first place. This tool has already helped a lot in getting rid of all my "maintain conda behind a tight firewall" problems.

from conda-mirror.

willirath commented on June 23, 2024

+1 for this feature.

It looks like the body of the loop at https://github.com/maxpoint/conda-mirror/blob/master/conda_mirror/conda_mirror.py#L359 could be rewritten as a function:

def func_instead_of_inner_loop_validate(package):
    # ensure the packages in this directory are in the upstream
    # repodata.json
    try:
        package_metadata = package_repodata[package]
    except KeyError:
        logger.warning("%s is not in the upstream index. Removing...",
                       package)
        _remove_package(os.path.join(package_directory, package),
                        reason="Package is not in the repodata index")
    else:
        # validate the integrity of the package, the size of the package and
        # its hashes
        logger.info("Validating package %s", package)
        _validate(os.path.join(package_directory, package),
                  md5=package_metadata.get('md5'),
                  size=package_metadata.get('size'))

(Note that I dropped idx for now.)

This allows for using map(func_instead_of_inner_loop_validate, sorted(local_packages)) instead of the loop.

Then, multiprocessing.Pool.map is all we need:

[...]
import multiprocessing
[...]
def _validate_packages(package_repodata, package_directory):
    [...]
    p = multiprocessing.Pool() # careful! This uses _all_ CPUs
    p.map(func_instead_of_inner_loop_validate, sorted(local_packages))
    p.close()
    p.terminate()
    p.join()
    
[...]
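One practical detail with this approach: `multiprocessing.Pool.map` pickles the function it is given, so the worker has to live at module level and cannot rely on the local variables of `_validate_packages`. Below is a minimal, self-contained sketch of one way to handle that, binding the extra arguments with `functools.partial`. The names `validate_one` and `validate_packages`, the status strings, and the `num_workers` parameter are hypothetical and are not taken from conda-mirror. The sketch also only reports a status per package rather than removing files.

```python
import hashlib
import multiprocessing
import os
from functools import partial


def validate_one(package, package_repodata, package_directory):
    """Check one local package file against its repodata entry.

    Hypothetical stand-in for the loop body; returns (package, status)
    instead of deleting anything.
    """
    metadata = package_repodata.get(package)
    if metadata is None:
        return package, "not in repodata"
    path = os.path.join(package_directory, package)
    if os.path.getsize(path) != metadata.get("size"):
        return package, "size mismatch"
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    if digest != metadata.get("md5"):
        return package, "md5 mismatch"
    return package, "ok"


def validate_packages(package_repodata, package_directory, num_workers=None):
    """Validate every file in package_directory, optionally in parallel.

    num_workers=None lets Pool use all CPUs; num_workers=1 runs serially,
    which is handy for debugging.
    """
    local_packages = sorted(os.listdir(package_directory))
    # partial() of a module-level function can be pickled, unlike a closure.
    worker = partial(validate_one,
                     package_repodata=package_repodata,
                     package_directory=package_directory)
    if num_workers == 1:
        return list(map(worker, local_packages))
    with multiprocessing.Pool(num_workers) as pool:
        return pool.map(worker, local_packages)
```

When the pool is used from a script, the call to `validate_packages` should sit under an `if __name__ == "__main__":` guard so that worker processes started with the spawn method do not re-run it on import.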

willirath commented on June 23, 2024

I am working on it: #45. I won't make it past a very rough sketch today. I'll be busy for the next two weeks but will definitely get back to this at the end of April.

ericdill commented on June 23, 2024

You are 100% correct that parallelization would dramatically reduce the runtime. I did play around with adding such a feature but was not pleased with the increased complexity. Since I run this as a nightly cron job, the extra time is not posing a real issue for me at the moment. That being said, if you are interested in parallelizing the validation, I'd be more than happy to review the pull request.

ericdill commented on June 23, 2024

@willirath Thanks for the interest. That seems like a sensible solution. Would you be able to submit a PR with this code change and we can discuss its details in that PR?

ericdill commented on June 23, 2024

Awesome 🎉 ! Thanks for the effort. Ping me on the PR when you'd like some feedback.

ericdill commented on June 23, 2024

Closed by #48. Thanks for the substantial effort here, @willirath.
