
conda-mirror's Introduction

conda-mirror


Mirrors an upstream conda channel to a local directory.

Install

conda-mirror is available on PyPI and conda-forge.

Install with:

pip install conda-mirror

or:

conda install conda-mirror -c conda-forge

Compatibility

conda-mirror is intentionally a Python 3 only package.

CLI

CLI interface for conda-mirror.py

usage: conda-mirror [-h] [--upstream-channel UPSTREAM_CHANNEL]
                    [--target-directory TARGET_DIRECTORY]
                    [--temp-directory TEMP_DIRECTORY] [--platform PLATFORM]
                    [-D] [-v] [--config CONFIG] [--pdb]
                    [--num-threads NUM_THREADS] [--version] [--dry-run]
                    [--no-validate-target]
                    [--minimum-free-space MINIMUM_FREE_SPACE] [--proxy PROXY]
                    [--ssl-verify SSL_VERIFY] [-k]
                    [--max-retries MAX_RETRIES] [--no-progress]

CLI interface for conda-mirror.py

optional arguments:
  -h, --help            show this help message and exit
  --upstream-channel UPSTREAM_CHANNEL
                        The target channel to mirror. Can be a channel on
                        anaconda.org like "conda-forge" or a full qualified
                        channel like "https://repo.continuum.io/pkgs/free/"
  --target-directory TARGET_DIRECTORY
                        The place where packages should be mirrored to
  --temp-directory TEMP_DIRECTORY
                        Temporary download location for the packages.
                        Defaults to a randomly selected temporary directory.
                        Note that you might need to specify a different
                        location if your default temp directory has less
                        available space than your mirroring target
  --platform PLATFORM   The OS platform(s) to mirror. one of: {'linux-64',
                        'linux-32','osx-64', 'win-32', 'win-64'}
  -D, --include-depends
                        Include packages matching any dependencies of
                        packages in whitelist.
  -v, --verbose         logging defaults to error/exception only. Takes up to
                        three '-v' flags. '-v': warning. '-vv': info. '-vvv':
                        debug.
  --config CONFIG       Path to the yaml config file
  --pdb                 Enable PDB debugging on exception
  --num-threads NUM_THREADS
                        Num of threads for validation. 1: Serial mode. 0: All
                        available.
  --version             Print version and quit
  --dry-run             Show what will be downloaded and what will be
                        removed. Will not validate existing packages
  --no-validate-target  Skip validation of files already present in target-
                        directory
  --minimum-free-space MINIMUM_FREE_SPACE
                        Threshold for free diskspace. Given in megabytes.
  --proxy PROXY         Proxy URL to access internet if needed
  --ssl-verify SSL_VERIFY, --ssl_verify SSL_VERIFY
                        Path to a CA_BUNDLE file with certificates of trusted
                        CAs, this may be "False" to disable verification as
                        per the requests API.
  -k, --insecure        Allow conda to perform "insecure" SSL connections and
                        transfers. Equivalent to setting 'ssl_verify' to
                        'false'.
  --max-retries MAX_RETRIES
                        Maximum number of retries before a download error is
                        reraised, defaults to 100
  --no-progress         Do not display progress bars.

Example Usage

WARNING: Invoking this command will pull ~10TB and take at least an hour

conda-mirror --upstream-channel conda-forge --target-directory local_mirror --platform linux-64

More Details

blacklist/whitelist configuration

example-conf.yaml:

blacklist:
  - license: "*agpl*"
  - license: None
  - license: ""

whitelist:
  - name: system

blacklist removes package(s) that match the condition(s) listed from the upstream repodata.

whitelist re-includes any package(s) from blacklist that match the whitelist conditions.

blacklist and whitelist both take lists of dictionaries. The keys in the dictionary need to be values in the repodata.json metadata. The values are (unix) globs to match on, but in the case of the version attribute, conda package match version specifications may also be used.

Go here for the full repodata of the upstream "defaults" channel: http://conda.anaconda.org/anaconda/linux-64/repodata.json

Here are the contents of one of the entries in repodata['packages']:

{'botocore-1.4.10-py34_0.tar.bz2': {'arch': 'x86_64',
  'binstar': {'channel': 'main',
   'owner_id': '55fc8527d3234d09d4951c71',
   'package_id': '56b88ea1be1cc95a362b218e'},
  'build': 'py34_0',
  'build_number': 0,
  'date': '2016-04-11',
  'depends': ['docutils >=0.10',
   'jmespath >=0.7.1,<1.0.0',
   'python 3.4*',
   'python-dateutil >=2.1,<3.0.0'],
  'license': 'Apache',
  'md5': 'b35a5c1240ba672e0d9d1296141e383c',
  'name': 'botocore',
  'platform': 'linux',
  'requires': [],
  'size': 1831799,
  'version': '1.4.10'}}

See implementation details in the conda_mirror:match function for more information.
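
The filtering rules described above can be approximated in a few lines of Python. This is a minimal sketch, not the conda_mirror implementation: the helper names `matches` and `select_packages` are hypothetical, the second package entry is made-up sample data, only glob matching via `fnmatch` is shown, and conda version specifiers are not handled.

```python
import fnmatch


def matches(info, condition):
    """Return True if every key in `condition` glob-matches the package metadata."""
    return all(
        fnmatch.fnmatch(str(info.get(key, "")), str(pattern))
        for key, pattern in condition.items()
    )


def select_packages(packages, blacklist=(), whitelist=()):
    """Drop blacklisted packages, then re-include any that match the whitelist."""
    selected = {}
    for filename, info in packages.items():
        blacklisted = any(matches(info, cond) for cond in blacklist)
        whitelisted = any(matches(info, cond) for cond in whitelist)
        if not blacklisted or whitelisted:
            selected[filename] = info
    return selected


# Hypothetical sample metadata in the shape of repodata['packages'].
packages = {
    "botocore-1.4.10-py34_0.tar.bz2": {"name": "botocore", "license": "Apache"},
    "example-1.0-0.tar.bz2": {"name": "example", "license": "agpl-3.0"},
}
kept = select_packages(packages, blacklist=[{"license": "*agpl*"}])
print(sorted(kept))  # ['botocore-1.4.10-py34_0.tar.bz2']
```

Treating the whitelist as an override of the blacklist, rather than as a separate filter, is what lets a blacklist of `name: "*"` combined with a narrow whitelist select a single package.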

Common usage patterns

Mirror only one specific package

If you wanted to match exactly the botocore package listed above with your config, then you could use the following configuration to first blacklist all packages and then include just the botocore packages:

blacklist:
  - name: "*"
whitelist:
  - name: botocore
    version: 1.4.10
    build: py34_0

You can use standard conda package version specifiers to filter a range of versions:

blacklist:
  - name: "*"
whitelist:
  - name: botocore
    version: ">=1.4.10,<1.5"

Mirror everything but agpl licenses

blacklist:
  - license: "*agpl*"

Mirror only python 3 packages

blacklist:
  - name: "*"
whitelist:
  - build: "*py3*"

Mirror specified packages and their dependencies

This will include all instances of botocore with at least version 1.4.10 along with any packages that match its dependencies (and likewise for dependencies of those packages).

blacklist:
  - name: "*"
whitelist:
  - name: botocore
    version: ">=1.4.10"
include_depends: True

If this includes too many package versions, you can add additional entries to the whitelist to limit what will be included.
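
The recursive inclusion of dependencies can be pictured as a breadth-first walk over the `depends` lists in the repodata. The sketch below is an illustration under simplifying assumptions, not the tool's own code: `dependency_closure` is a hypothetical helper, the jmespath entry is made-up sample data, and dependencies are matched by name only, ignoring their version constraints.

```python
from collections import deque


def dependency_closure(packages, root_names):
    """Return the set of package names reachable from `root_names` via 'depends'."""
    # Index the dependency names declared by each package name.
    depends_by_name = {}
    for info in packages.values():
        names = {spec.split()[0] for spec in info.get("depends", [])}
        depends_by_name.setdefault(info["name"], set()).update(names)

    seen = set(root_names)
    queue = deque(root_names)
    while queue:
        name = queue.popleft()
        for dep in depends_by_name.get(name, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


packages = {
    "botocore-1.4.10-py34_0.tar.bz2": {
        "name": "botocore",
        "depends": ["docutils >=0.10", "jmespath >=0.7.1,<1.0.0", "python 3.4*"],
    },
    # Hypothetical entry, added so the walk has a second level to follow.
    "jmespath-0.9.0-py34_0.tar.bz2": {"name": "jmespath", "depends": ["python 3.4*"]},
}
print(sorted(dependency_closure(packages, ["botocore"])))
# ['botocore', 'docutils', 'jmespath', 'python']
```

Because the walk matches on names alone, it will pull in every version of each dependency, which is why narrowing the whitelist may be necessary.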

Testing

Install test requirements

Note: Will install packages from pip

$ pip install -r test-requirements.txt
Requirement already satisfied: pytest in /home/edill/miniconda/lib/python3.5/site-packages (from -r test-requirements.txt (line 1))
Requirement already satisfied: coverage in /home/edill/miniconda/lib/python3.5/site-packages (from -r test-requirements.txt (line 2))
Requirement already satisfied: pytest-ordering in /home/edill/miniconda/lib/python3.5/site-packages (from -r test-requirements.txt (line 3))
Requirement already satisfied: py>=1.4.29 in /home/edill/miniconda/lib/python3.5/site-packages (from pytest->-r test-requirements.txt (line 1))

Run the tests, invoking with the coverage tool.

$ coverage run run_tests.py
sys.argv=['run_tests.py']
========================================= test session starts ==========================================
platform linux -- Python 3.5.3, pytest-3.0.6, py-1.4.31, pluggy-0.4.0 -- /home/edill/miniconda/bin/python
cachedir: .cache
rootdir: /home/edill/dev/maxpoint/github/conda-mirror, inifile:
plugins: xonsh-0.5.2, ordering-0.4
collected 4 items

test/test_conda_mirror.py::test_match PASSED
test/test_conda_mirror.py::test_cli[https://repo.continuum.io/pkgs/free-linux-64] PASSED
test/test_conda_mirror.py::test_cli[conda-forge-linux-64] PASSED
test/test_conda_mirror.py::test_handling_bad_package PASSED

======================================= 4 passed in 4.41 seconds =======================================

Show the coverage statistics

$ coverage report -m
Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
conda_mirror/__init__.py           3      0   100%
conda_mirror/conda_mirror.py     236     20    92%   203-205, 209-210, 214, 240, 249-254, 262-264, 303, 366, 497, 542-543, 629
------------------------------------------------------------
TOTAL                            239     20    92%

Other

After a new contributor's pull request is approved, we will reach out and invite them to become a maintainer of the project.

Releasing

To release you need three things:

  1. Commit rights to conda-mirror
  2. A github token
  3. The version number that you want to use for the new tag

After you have all three of these things, run the release.sh script (on a unix machine) and pass it the tag that you want to use and your github token:

GITHUB_TOKEN=<github_token> ./release.sh <tag>

conda-mirror's People

Contributors

analog-cbarber, asilenzi, diogocp, dmkent, ericdill, faustincarter, fhoehle, gmertes, goanpeca, ilanschnell, jakirkham, jneines, magnuhho, manics, mariusvniekerk, nephiaust, nicoddemus, opiethehokie, parente, scopatz, willirath, xhochy

conda-mirror's Issues

Support for using Conda/Mamba solver to recursively find dependencies

I've got a proof-of-concept of using the Mamba/Conda dependency solver to find the full set of dependencies for a set of package specifications:
https://github.com/manics/conda-mirror/tree/conda-mamba-downloader

The main advantage of this is that it only downloads one version of each package or dependency. The corresponding disadvantage is that you can't whitelist multiple versions of a package, so if you're adding to an existing set of downloaded packages you may not be able to install the new packages alongside them. This is acceptable for me and is easier than trying to restrict the number of versions of everything.

I've done this by using the mamba/conda dependency solver to provide a list of package filenames (*.tar.bz2), and passing this into conda-mirror to handle downloads, validation and updating repodata.json.

I've effectively written a wrapper script https://github.com/manics/conda-mirror/blob/conda-mamba-downloader/conda_downloader/download.py
which calls the solver: https://github.com/manics/conda-mirror/blob/bc1eca6bac0ec78ee35d2f2ce833b7fcbe794a82/conda_mamba_downloader/download.py#L122-L143
and wraps a call to conda_mirror.main(): https://github.com/manics/conda-mirror/blob/bc1eca6bac0ec78ee35d2f2ce833b7fcbe794a82/conda_mamba_downloader/download.py#L203-L229

I made one change to conda-mirror to use the filename as an additional match key:
manics@2bad966

Do you have any thoughts on how/whether this should work with conda-mirror? As a standalone change adding the filename key so conda-mirror can be used as a backend library for this, or add the full solver functionality (with limitations) to conda-mirror?

Blocking infected packages that have certain bad dependency

Background:
A few packages with the dependency "abseil-cpp =20190808" are causing build failures all over the place.

Suggestion:
It would be useful to be able to block such packages from getting mirrored into channels.

Thanks!

--version should not include log output

Currently, the --version flag includes logging output, e.g:

Log level set to ERROR
0.8.2+2.gd78be23

The log output is not useful in this case and hampers the ability for external tools to parse the version (to check for version compatibility).

Instead it should print the version and exit before initializing the logger.
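
One way to get this behaviour is argparse's built-in `version` action, which prints and exits while arguments are still being parsed. The following is a sketch only: how conda-mirror currently defines its `--version` flag is not shown here, and `make_parser`, `version_output` and the placeholder `__version__` are hypothetical names for illustration.

```python
import argparse
import contextlib
import io

__version__ = "0.0.0"  # placeholder; a real package would import its own version


def make_parser():
    parser = argparse.ArgumentParser(prog="conda-mirror")
    # The built-in "version" action prints the string and exits inside
    # parse_args(), so nothing after parsing (such as logger setup) runs.
    parser.add_argument("--version", action="version", version=__version__)
    return parser


def version_output():
    """Capture what `--version` prints, for demonstration."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            make_parser().parse_args(["--version"])
        except SystemExit:
            pass  # the version action always exits after printing
    return buffer.getvalue()


print(repr(version_output()))  # '0.0.0\n'
```

With this arrangement the only text on standard output is the version string, which keeps it easy for external tools to parse.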

Warn users about potentially breaking Anaconda's new TOS

As conda-mirror maintainers, we should probably consider warning users that they may be breaking Anaconda's new terms of service for repo.anaconda.com unless either of the following is true:

  1. They have an existing commercial relationship with Anaconda
  2. They are not a large-scale commercial entity. This refers specifically to the following line in the new TOS: "we are not granting you permission to use the Repository for large-scale commercial activities, such as downloading or mirroring the entire Repository or use of the Repository by multiple members of the business you work for."

I've got a draft PR that implements what I think is a reasonable message, though I'm sure we will need to update the language based on folks' opinions.

Before we get to the specific language in the warning / error message though, is anyone opposed to forcing users to address the TOS changes in conda-mirror itself if they are using Anaconda's channels on repo.anaconda.com or repo.continuum.io?

Create logo

We could simply have a mirror with a conda logo in front of it and one reflected in the mirror. The issue is probably that the Conda logo is trademarked, so having a package in front of and in the mirror might be a better choice from the legal side.

Options to set limit on number and size of packages to download.

It is pretty easy when mirroring from large repos to inadvertently get more than you really need.

One way to address this would be to add limits on both the number of packages and the total
size of packages to be downloaded and to not download anything if the limit is exceeded. We
can pick some suitable default value and require users to override it for larger mirroring jobs.
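
A check like this could run on the filtered package list before any download starts. The sketch below assumes the `size` field from the repodata entries (in bytes); `check_limits` and its parameter names are hypothetical and the limits shown are arbitrary example values, not proposed defaults.

```python
def check_limits(packages, max_count, max_total_bytes):
    """Raise ValueError if the selected packages exceed either limit."""
    count = len(packages)
    total_bytes = sum(info.get("size", 0) for info in packages.values())
    if count > max_count:
        raise ValueError(f"{count} packages selected, limit is {max_count}")
    if total_bytes > max_total_bytes:
        raise ValueError(f"{total_bytes} bytes selected, limit is {max_total_bytes}")
    return count, total_bytes


selected = {"botocore-1.4.10-py34_0.tar.bz2": {"size": 1831799}}
print(check_limits(selected, max_count=1000, max_total_bytes=10 * 1024**3))
# (1, 1831799)
```

Failing before the first download, rather than part-way through, means an oversized job leaves the target directory untouched.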

Intermittent internet loss causes downloads to hang indefinitely

I have unstable internet, and every so often my internet will disconnect for a second or two. When this happens right in the middle of a package download, the download will stall and hang indefinitely. The progress bar gets stuck and it will not recover by itself when connectivity is restored (ctrl+c also does nothing). The only solution is killing the process and restarting.

Is there a way to detect loss of connectivity during a transfer, or can we set a maximum time that a transfer is allowed to take before it kills itself and tries again? The current retry mechanism does not do anything for these kinds of mid-download disconnects.
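
The general idea would be to give each transfer a timeout so that a stall surfaces as an exception, and then let a retry loop handle that exception. The sketch below shows only the retry side: `download_with_retries` and the injected `fetch` callable are hypothetical, and it assumes `fetch` raises an `OSError` subclass (such as `TimeoutError`) when no data arrives within `timeout` seconds.

```python
import time


def download_with_retries(fetch, url, max_retries=3, timeout=30.0, backoff=0.0):
    """Call `fetch(url, timeout=...)`, retrying when the transfer stalls or fails.

    `fetch` is a hypothetical callable that raises OSError (which includes
    TimeoutError and socket.timeout) when the transfer cannot make progress.
    """
    last_error = None
    for attempt in range(1, max_retries + 1):
        try:
            return fetch(url, timeout=timeout)
        except OSError as error:
            last_error = error
            if attempt < max_retries:
                time.sleep(backoff * attempt)  # simple linear backoff
    raise RuntimeError(f"giving up on {url} after {max_retries} attempts") from last_error


attempts = []


def flaky_fetch(url, timeout):
    """Stand-in for a real download: stalls twice, then succeeds."""
    attempts.append(url)
    if len(attempts) < 3:
        raise TimeoutError("transfer stalled")
    return b"package bytes"


print(download_with_retries(flaky_fetch, "https://example.invalid/pkg.tar.bz2"))
# b'package bytes'
```

The important part is that the timeout applies to the transfer itself and not only to the initial connection, otherwise a connection that opens successfully and then goes silent would still hang.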

Improve command line help

The current help could use some improvement. The description of the program is simply
"CLI interface for conda-mirror.py" and the options are jumbled together in no particular order.

The description should be improved and the options could be grouped by category to make them
easier to read.

Support custom channel settings

Any custom channel aliases configured in the user's .condarc file should be recognized so that users can write the alias instead of having to write out the full URL.

E.g. If you have:

custom_channels:
   my-channel: https://my-custom-server.com/

you should be able to specify my-channel as the channel rather than https://my-custom-server.com/my-channel
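
The expansion itself is a small lookup. This sketch assumes the `custom_channels` mapping has already been read from the user's .condarc (the parsing is omitted), and `resolve_channel` is a hypothetical helper name rather than an existing conda-mirror function.

```python
def resolve_channel(channel, custom_channels):
    """Expand a channel alias using a `custom_channels` mapping, if one applies."""
    if "://" in channel:
        return channel  # already a full URL
    base = custom_channels.get(channel)
    if base is None:
        return channel  # unknown alias: leave it for the default handling
    return base.rstrip("/") + "/" + channel


custom_channels = {"my-channel": "https://my-custom-server.com/"}
print(resolve_channel("my-channel", custom_channels))
# https://my-custom-server.com/my-channel
```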

Allow diff-tar to accept a target

Currently the filepath to which conda-diff-tar writes the reference.json file, and the path to which it writes the update.tar file are hardcoded. Would be nice if these could be passed as an argument.

Progress bars

Would be nice if you added (optional) progress bars.

This is pretty easy using tqdm

--dry-run creates target directory

Not a big deal, but I was surprised to see that --dry-run will create the target directory if it does not already exist.

I usually expect --dry-run options to not modify the file system (except perhaps to create/remove temp files).

--dry-run should imply more verbosity

There isn't much point in specifying the --dry-run option without also turning the logging verbosity up to see what packages are to be downloaded. We may as well do it automatically.

Activate CI

There is currently a working Travis setup. We might want to move to a different CI or use a different setup but should first enable Travis.

@scopatz Travis isn't showing the project for me, can you have a look?

Specifying a whitelist should make blacklist default to everything.

If you have specified a whitelist, then an empty blacklist doesn't really make any sense.

Instead if there is a whitelist perhaps the blacklist should default to everything if not explicitly
specified.

To make such a change backward compatible, we could add additional names to use for
blacklist/whitelist (probably a good idea anyway) and only apply the new behavior to the
new names. E.g. 'exclude' for 'blacklist' and 'include', 'require' and/or 'depends' for the whitelist.

No release script

The release instructions in the README tell you to run the release.sh script but there is no such script currently in this repository.

The developer-specific part of the README should probably go into a separate file (e.g. CONTRIBUTING.md) in any case.

Option to only get n latest versions of matching packages.

As new versions of packages are released, mirroring a channel from the same spec using '>=' version specifiers will pull in more and more packages. But the user may be satisfied with only the more recent versions.

We could add an option to specify that you should only take the latest n versions of each package that passes the filter. We may also want to consider options to control whether to pick up dev versions (and how many).
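
As a rough illustration of the selection step, the sketch below groups packages by name and keeps the `n` newest versions of each. The version ordering here is a deliberate simplification (numeric dot-separated parts only) and is an assumption of this example; conda's real version ordering is more involved and would be needed to handle dev and pre-release versions. The helper names and the two extra botocore entries are hypothetical.

```python
from collections import defaultdict


def naive_version_key(version):
    """Order versions by their numeric dot-separated parts (a simplification)."""
    return tuple(int(part) if part.isdigit() else -1 for part in version.split("."))


def latest_n_versions(packages, n):
    """Keep only packages whose version is among the `n` newest for their name."""
    versions_by_name = defaultdict(set)
    for info in packages.values():
        versions_by_name[info["name"]].add(info["version"])

    keep = {
        name: set(sorted(versions, key=naive_version_key, reverse=True)[:n])
        for name, versions in versions_by_name.items()
    }
    return {
        filename: info
        for filename, info in packages.items()
        if info["version"] in keep[info["name"]]
    }


packages = {
    "botocore-1.4.9-py34_0.tar.bz2": {"name": "botocore", "version": "1.4.9"},
    "botocore-1.4.10-py34_0.tar.bz2": {"name": "botocore", "version": "1.4.10"},
    "botocore-1.5.0-py34_0.tar.bz2": {"name": "botocore", "version": "1.5.0"},
}
print(sorted(latest_n_versions(packages, 2)))
# ['botocore-1.4.10-py34_0.tar.bz2', 'botocore-1.5.0-py34_0.tar.bz2']
```

Selecting by version rather than by file keeps all builds of a retained version, which is probably what a user mirroring for several Python versions would expect.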

Include packages for removal when using conda-diff-tar

In order to truly manage a mirror, it is also important to know which packages have been removed from the remote mirror. The suggestion is to include these packages in updates.tar as some kind of summary.json, so one may choose to delete those packages if desired.

Support multiple input channels

I have some use cases where we would like to be able to grab packages from multiple input channels (e.g. conda-forge plus an internal private channel) and combine them in a single private channel. It would be nice if this tool could support that.

Also see #32

Is this project still being maintained?

There haven't been any commits for a long time and there are three old pending pull requests.

If pull requests aren't going to be merged, then there is really no point in going to the trouble of submitting them and I might as well just work independently on my own fork.

Tests relying on path equality fail on Windows

Some tests on Windows fail because the Windows path separator '\\' and the Unix path separator '/' are different. Moving towards using pathlib.Path objects rather than string paths will fix this. (I'll try and issue a PR for this in the near future!)
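
The difference is easy to see with the pure path classes from `pathlib`, which can be exercised on any operating system. The paths below are arbitrary examples, not paths taken from the test suite.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Comparing raw strings is sensitive to the separator.
print("pkgs/linux-64/repodata.json" == "pkgs\\linux-64\\repodata.json")  # False

# PureWindowsPath accepts either separator, so equal paths compare equal.
forward = PureWindowsPath("pkgs/linux-64/repodata.json")
backward = PureWindowsPath("pkgs\\linux-64\\repodata.json")
print(forward == backward)  # True

# Building paths from parts avoids hard-coding a separator at all.
built = PurePosixPath("pkgs") / "linux-64" / "repodata.json"
print(built.parts)  # ('pkgs', 'linux-64', 'repodata.json')
```

In tests, comparing `Path` objects (or their `.parts`) instead of strings makes the assertion independent of the platform's separator.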

Exception during validation

The validation step always ends like this. Is Python 3.8 supported?

Traceback (most recent call last):
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 650, in _validate_or_remove_package
    logger.info('Validating {:4d} of {:4d}: {}.'.format(num + 1, num_packages,
AttributeError: 'NoneType' object has no attribute 'info'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\programdata\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\programdata\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Miniconda3\Scripts\conda-mirror.exe\__main__.py", line 7, in <module>
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 350, in cli
    main(**_parse_and_format_args())
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 879, in main
    validation_results = _validate_packages(packages, download_dir,
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 604, in _validate_packages
    validation_results = p.map(_validate_or_remove_package,
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'info'

"conda-mirror --upstream-channel conda-forge ..... --platform linux-64" =10GB? More like 10TB

I am trying to create an offline Python mirror. I know nothing about Python.

conda-mirror --upstream-channel conda-forge --target-directory local_mirror --platform linux-64
WARNING: Invoking this command will pull ~10GB and take at least an hour

So I left it running and it filled up the server's temp space at 265GB, crashing the server, while it was still only on packages beginning with "a".
I noticed that there were some non-Python packages, that where a package was Python it covered all Python versions, and that it downloaded multiple package versions for the same Python version.

Is there a huge chunk of configuration missing?
I have since created a yaml file specifying build: "py36"; I am not sure if I should have added the six. I am sure my users will tell me!
Also, it is downloading several versions of the same package. Is this needed?

Dependencies Calculation

I found this tool very nice as the official mirroring tool cas-mirror is not free.

Currently only the blacklist and whitelist are considered, not dependencies. How can I mirror the minimal set of packages together with their dependencies?

Fewer required arguments

Currently the channel, target directory and platform are all required arguments that must either come from the command line or config file.

Instead, I think we should have the platform default to the current platform and the target directory should default to something like channel-name + -mirror in the current working directory.

We should also add shorter option flag aliases for these common arguments.

Script doesn't handle versions that serve both *.tar.bz2 and *.conda

From the validation function logs "Validating XXXX of YYYY", we observed that...

  • Older Python versions that only serve *.tar.bz2 are captured
  • Newer Python versions that serve both formats are captured
  • Repos like libdeflate are handled for all versions; they serve only one format for each version

Support conda package version specifiers in include/exclude (whitelist/blacklist) specs

Currently only glob expressions are supported for fields in the include/exclude lists.

It would be more useful if you could use regular conda package version specifiers
for the version field. For example:

include:
  - yaml >=0.2.3

To do this, we could consider any match spec containing one of the characters (^$=<>,|) to
be a conda version specifier rather than a glob expression.
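
The proposed detection rule is a one-line character test. The sketch below implements just that check; `looks_like_version_spec` is a hypothetical name, and the actual version comparison is deliberately left out.

```python
SPEC_CHARACTERS = set("^$=<>,|")


def looks_like_version_spec(pattern):
    """Return True if `pattern` contains a character that marks a version specifier."""
    return any(char in SPEC_CHARACTERS for char in pattern)


print(looks_like_version_spec(">=1.4.10,<1.5"))  # True
print(looks_like_version_spec("1.4.*"))          # False
```

None of the listed characters is a glob metacharacter, so existing glob patterns such as `1.4.*` would keep their current meaning under this rule.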

Since the version matching logic is non-trivial, it makes the most sense to not try to reimplement it in this project.
Either import it directly from conda.models.version (either add a dependency on conda or do a runtime check)
or copy that module here.

Adding a conda dependency would mean that this package should only be installed in the base environment,
so it might be safer to do a runtime check and raise an error if a version specifier is encountered when running
outside of base environment.

Also see #26 and #22

Allow required options to come from config file

The code currently checks for required arguments when parsing arguments but before processing the config file, so it is not possible to simply specify the channel, platform or target directory in the config file.
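
One possible ordering is to parse the command line without marking anything as required, merge the result over the config file values, and only then check for missing settings. The sketch below illustrates that ordering under stated assumptions: `parse_settings` is a hypothetical function, only three options are declared, and the config is passed in as an already-loaded dictionary rather than read from a YAML file.

```python
import argparse

REQUIRED = ("upstream_channel", "target_directory", "platform")


def parse_settings(argv, config):
    """Merge command-line arguments over `config`, then check required keys."""
    parser = argparse.ArgumentParser(prog="conda-mirror")
    # No option is marked required here; None means "not given on the command line".
    parser.add_argument("--upstream-channel")
    parser.add_argument("--target-directory")
    parser.add_argument("--platform")
    args = vars(parser.parse_args(argv))

    settings = dict(config)
    settings.update({key: value for key, value in args.items() if value is not None})

    missing = [key for key in REQUIRED if not settings.get(key)]
    if missing:
        # parser.error prints a usage message and exits with status 2.
        parser.error("missing required settings: " + ", ".join(missing))
    return settings


config = {"upstream_channel": "conda-forge", "platform": "linux-64"}
settings = parse_settings(["--target-directory", "local_mirror"], config)
print(settings["upstream_channel"], settings["target_directory"], settings["platform"])
# conda-forge local_mirror linux-64
```

Command-line values take precedence over the config file in this sketch, which matches the usual expectation that an explicit flag overrides a stored default.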
