
conda-mirror's People

Contributors

analog-cbarber, asilenzi, diogocp, dmkent, ericdill, faustincarter, fhoehle, gmertes, goanpeca, ilanschnell, jakirkham, jneines, magnuhho, manics, mariusvniekerk, nephiaust, nicoddemus, opiethehokie, parente, scopatz, willirath, xhochy

conda-mirror's Issues

Options to set limit on number and size of packages to download.

It is pretty easy when mirroring from large repos to inadvertently get more than you really need.

One way to address this would be to add limits on both the number of packages and the total
size of packages to be downloaded and to not download anything if the limit is exceeded. We
can pick some suitable default value and require users to override it for larger mirroring jobs.
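
A minimal sketch of such a pre-download check, assuming the package list is a repodata-style mapping of filename to metadata with a `size` field in bytes. The function name and the default limits are hypothetical, not existing conda-mirror options.

```python
from typing import Dict, Tuple


def check_download_limits(
    packages: Dict[str, dict],
    max_count: int = 1000,            # hypothetical default
    max_bytes: int = 10 * 1024 ** 3,  # hypothetical default (10 GiB)
) -> Tuple[int, int]:
    """Return (count, total_bytes); raise ValueError if a limit is exceeded.

    Nothing is downloaded here: the check runs on metadata only, so a job
    that is too large can be rejected before any transfer starts.
    """
    count = len(packages)
    total_bytes = sum(info.get("size", 0) for info in packages.values())
    if count > max_count:
        raise ValueError(f"{count} packages exceeds the limit of {max_count}")
    if total_bytes > max_bytes:
        raise ValueError(f"{total_bytes} bytes exceeds the limit of {max_bytes}")
    return count, total_bytes
```

A larger mirroring job would then have to raise the limits explicitly.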

Tests relying on path equality fail on Windows

Some tests on Windows fail because the Windows path separator '\\' and the Unix path separator '/' are different. Moving towards using pathlib.Path objects rather than string paths will fix this. (I'll try and issue a PR for this in the near future!)
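
A small illustration of why comparing path objects helps. `PureWindowsPath` is used so the snippet behaves the same on any operating system; the example paths are made up.

```python
from pathlib import PureWindowsPath

# The same location, once as a test might hard-code it and once as
# Windows would build it.
as_written_in_test = "local_mirror/linux-64/repodata.json"
as_built_on_windows = "local_mirror\\linux-64\\repodata.json"

# Plain string comparison is sensitive to the separator.
strings_equal = as_written_in_test == as_built_on_windows

# Path objects normalize the separator before comparing.
paths_equal = PureWindowsPath(as_written_in_test) == PureWindowsPath(as_built_on_windows)
```

Here `strings_equal` is `False` while `paths_equal` is `True`.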

"conda-mirror --upstream-channel conda-forge ..... --platform linux-64" =10GB? More like 10TB

I am trying to create an offline Python mirror. I know nothing about Python.

conda-mirror --upstream-channel conda-forge --target-directory local_mirror --platform linux-64
WARNING: Invoking this command will pull ~10GB and take at least an hour

So I left it running, and it filled up the server's temp space at 265GB, crashing the server while it was still only on packages beginning with "a".
I noticed that there were some non-Python packages, that the Python packages covered all Python versions, and that it downloaded multiple package versions for the same Python version.

Is there a huge chunk of configuration missing?
I have since created a YAML file specifying build: "py36"; I am not sure whether I should have added the six. I am sure my users will tell me!
Also, it is downloading several versions of the same package. Is this needed?

Warn users about potentially breaking Anaconda's new TOS

As conda-mirror maintainers, we should probably consider warning users that they may be breaking Anaconda's new terms of service for repo.anaconda.com unless either of the following is true:

  1. They have an existing commercial relationship with Anaconda
  2. They are not a large-scale commercial entity. This refers specifically to this line in the new TOS: "we are not granting you permission to use the Repository for large-scale commercial activities, such as downloading or mirroring the entire Repository or use of the Repository by multiple members of the business you work for."

I've got a draft PR that implements what I think is a reasonable message, though I'm sure we will need to update the language based on folks' opinions.

Before we get to the specific language in the warning / error message though, is anyone opposed to forcing users to address the TOS changes in conda-mirror itself if they are using Anaconda's channels on repo.anaconda.com or repo.continuum.io?

Exception during validation

The validation step always ends like this. Is Python 3.8 supported?

Traceback (most recent call last):
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 650, in _validate_or_remove_package
    logger.info('Validating {:4d} of {:4d}: {}.'.format(num + 1, num_packages,
AttributeError: 'NoneType' object has no attribute 'info'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\programdata\miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\programdata\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Miniconda3\Scripts\conda-mirror.exe\__main__.py", line 7, in <module>
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 350, in cli
    main(**_parse_and_format_args())
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 879, in main
    validation_results = _validate_packages(packages, download_dir,
  File "c:\programdata\miniconda3\lib\site-packages\conda_mirror\conda_mirror.py", line 604, in _validate_packages
    validation_results = p.map(_validate_or_remove_package,
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "c:\programdata\miniconda3\lib\multiprocessing\pool.py", line 771, in get
    raise self._value
AttributeError: 'NoneType' object has no attribute 'info'

Support conda package version specifiers in include/exclude (whitelist/blacklist) specs

Currently only glob expressions are supported for fields in the include/exclude lists.

It would be more useful if you could use regular conda package version specifiers
for the version field. For example:

include:
  - yaml >=0.2.3

To do this, we could consider any match spec containing one of the characters (^$=<>,|) to
be a conda version specifier rather than a glob expression.
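
The proposed heuristic can be sketched as a one-line check. The function name is hypothetical; only the character set comes from the proposal above.

```python
# Characters that mark a field as a conda version specifier
# rather than a glob expression, as proposed above.
VERSION_SPEC_CHARS = frozenset("^$=<>,|")


def is_version_specifier(version_field: str) -> bool:
    """Return True if the field should be parsed as a version specifier."""
    return any(ch in VERSION_SPEC_CHARS for ch in version_field)
```

With this rule, `>=0.2.3` is treated as a version specifier, while `0.2.*` remains a glob.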

Since the version matching logic is non-trivial, it makes the most sense to not try to reimplement it in this project.
Either import it directly from conda.models.version (by adding a dependency on conda or doing a runtime check)
or copy that module here.

Adding a conda dependency would mean that this package should only be installed in the base environment,
so it might be safer to do a runtime check and raise an error if a version specifier is encountered when running
outside of the base environment.

Also see #26 and #22

Fewer required arguments

Currently the channel, target directory and platform are all required arguments that must either come from the command line or config file.

Instead, I think we should have the platform default to the current platform and the target directory should default to something like channel-name + -mirror in the current working directory.
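
One way the target-directory default could look, as a sketch only. The helper name is hypothetical, and taking the last path component of a channel URL as the channel name is an assumption.

```python
from pathlib import Path


def default_target_directory(channel: str, cwd: str) -> Path:
    """Build '<channel-name>-mirror' under the given working directory.

    For a channel given as a URL, the last path component is used as
    the channel name (an assumption made for this sketch).
    """
    name = channel.rstrip("/").rsplit("/", 1)[-1]
    return Path(cwd) / f"{name}-mirror"
```

For example, the channel `conda-forge` would map to `conda-forge-mirror` in the working directory.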

We should also add shorter option flag aliases for these common arguments.

Option to only get n latest versions of matching packages.

As new versions of packages are released, mirroring a channel from the same spec using '>=' version specifiers will pull in more and more packages. But the user may be satisfied with only the more recent versions.

We could add an option to specify that you should only take the latest n versions of each package that passes the filter. We may also want to consider options to control whether to pick up dev versions (and how many).

Blocking infected packages that have certain bad dependency

Background:
A few packages with the dependency "abseil-cpp =20190808" are causing build failures all over the place.

Suggestion:
It would be useful to be able to block such packages from getting mirrored into channels.

Thanks!
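
A sketch of how such a block could work on repodata-style metadata, assuming each package entry carries a `depends` list of strings. The function name is hypothetical, and matching is by exact dependency string after whitespace normalization, which is simpler than real match-spec evaluation.

```python
from typing import Dict, Iterable


def drop_blocked_dependents(
    packages: Dict[str, dict], blocked_dependencies: Iterable[str]
) -> Dict[str, dict]:
    """Return only the packages that depend on none of the blocked specs."""
    blocked = {" ".join(dep.split()) for dep in blocked_dependencies}
    kept = {}
    for filename, info in packages.items():
        depends = {" ".join(dep.split()) for dep in info.get("depends", [])}
        if depends & blocked:
            continue  # at least one dependency is on the block list
        kept[filename] = info
    return kept
```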

--version should not include log output

Currently, the --version flag includes logging output, e.g:

Log level set to ERROR
0.8.2+2.gd78be23

The log output is not useful in this case and hampers the ability for external tools to parse the version (to check for version compatibility).

Instead it should print the version and exit before initializing the logger.
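
With argparse, one way to get this behavior is the built-in `version` action, which prints the version and exits while arguments are being parsed, before any later setup code runs. The `build_parser` helper is a hypothetical name for this sketch, not conda-mirror's actual parser setup.

```python
import argparse


def build_parser(version: str) -> argparse.ArgumentParser:
    """Create a parser whose --version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="conda-mirror")
    # The "version" action writes the string to stdout and raises
    # SystemExit(0) inside parse_args(), so logger initialization placed
    # after parse_args() is never reached for --version.
    parser.add_argument("--version", action="version", version=version)
    return parser
```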

No release script

The release instructions in the README tell you to run the release.sh script but there is no such script currently in this repository.

The developer-specific part of the README should probably go into a separate file (e.g. CONTRIBUTING.md) in any case.

Dependencies Calculation

I found this tool very nice as the official mirroring tool cas-mirror is not free.

Currently only the blacklist and whitelist are considered, not dependencies. How can I mirror a minimal set of packages together with their dependencies?

Support multiple input channels

I have some use cases where we would like to be able to grab packages from multiple input channels (e.g. conda-forge plus an internal private channel) and combine them in a single private channel. It would be nice if this tool could support that.

Also see #32

Specifying a whitelist should make blacklist default to everything.

If you have specified a whitelist, then an empty blacklist doesn't really make any sense.

Instead if there is a whitelist perhaps the blacklist should default to everything if not explicitly
specified.

To make such a change backward compatible, we could add additional names to use for
blacklist/whitelist (probably a good idea anyway) and only apply the new behavior to the
new names. E.g. 'exclude' for 'blacklist' and 'include', 'require' and/or 'depends' for the whitelist.
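
The proposed default could be sketched as follows, assuming the new-style `include`/`exclude` lists hold entries such as `{"name": "*"}`. The function name is hypothetical.

```python
from typing import List, Optional


def effective_exclude(
    include: Optional[List[dict]], exclude: Optional[List[dict]]
) -> List[dict]:
    """Return the exclude list to apply.

    If an include list is given and no exclude list was specified at all,
    exclude everything, so that only the included packages are mirrored.
    An explicitly specified exclude list is always respected.
    """
    if exclude is None and include:
        return [{"name": "*"}]
    return exclude or []
```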

Progress bars

Would be nice if you added (optional) progress bars.

This is pretty easy using tqdm

Improve command line help

The current help could use some improvement. The description of the program is simply
"CLU interface for conda-mirror.py" and the options are jumbled together in no particular order.

The description should be improved and the options could be grouped by category to make them
easier to read.

Intermittent internet loss causes downloads to hang indefinitely

I have unstable internet, and every so often my internet will disconnect for a second or two. When this happens right in the middle of a package download, the download will stall and hang indefinitely. The progress bar gets stuck and it will not recover by itself when connectivity is restored (ctrl+c also does nothing). The only solution is killing the process and restarting.

Is there a way to detect loss of connectivity during a transfer, or can we set a maximum time that a transfer is allowed to take before it kills itself and tries again? The current retry mechanism does not do anything for this kind of mid-download disconnect.
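
A sketch of the retry side of this, under the assumption that the download function raises an `OSError` subclass when a transfer stalls. With the requests library, for instance, passing a `timeout=` argument makes a stalled read raise an exception instead of blocking indefinitely. The helper name and its parameters are hypothetical.

```python
import time
from typing import Callable


def fetch_with_retries(
    fetch: Callable[[str], bytes],
    url: str,
    attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bytes:
    """Call fetch(url), retrying when it fails with a timeout or network error."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url)
        except OSError as err:  # covers TimeoutError and ConnectionError
            last_error = err
            time.sleep(backoff_seconds)
    raise last_error
```

The key point is that the retry loop can only help if the underlying transfer has a timeout; without one, the call never returns and nothing is retried.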

Allow diff-tar to accept a target

Currently the filepath to which conda-diff-tar writes the reference.json file, and the path to which it writes the update.tar file are hardcoded. Would be nice if these could be passed as an argument.

Script doesn't handle versions that serve both *.tar.bz2 and *.conda

From the validation function logs "Validating XXXX of YYYY", we observed that...

  • Older Python versions that are served only as *.tar.bz2 are captured
  • Newer Python versions that are served in both formats are captured
  • Repos like libdeflate are handled for all versions; they serve only one format in each version

--dry-run creates target directory

Not a big deal, but I was surprised to see that --dry-run will create the target directory if it does not already exist.

I usually expect --dry-run options to not modify the file system (except perhaps to create/remove temp files).

Support custom channel settings

Any custom channel aliases configured in the user's .condarc file should be recognized so that users can write the alias instead of having to write out the full URL.

E.g. If you have:

custom_channels:
   my-channel: https://my-custom-server.com/

you should be able to specify my-channel as the channel rather than https://my-custom-server.com/my-channel
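
The lookup described above might look like this, assuming the `custom_channels` mapping has already been read from .condarc into a dict. The function name is hypothetical, and this is not how conda itself resolves channels.

```python
from typing import Dict


def resolve_channel(channel: str, custom_channels: Dict[str, str]) -> str:
    """Expand a custom channel alias to its full URL.

    Full URLs and unknown names are returned unchanged.
    """
    if "://" in channel:
        return channel
    base = custom_channels.get(channel)
    if base is None:
        return channel
    return base.rstrip("/") + "/" + channel
```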

--dry-run should imply more verbosity

There isn't much point in specifying the --dry-run option without also turning the logging verbosity up to see what packages are to be downloaded. We may as well do it automatically.

Allow required options to come from config file

The code currently checks for required arguments when parsing arguments but before processing the config file, so it is not possible to simply specify the channel, platform or target directory in the config file.
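
A sketch of the reordered logic: merge the config file and command-line values first, and only then check for required settings. The function name and the setting keys are assumptions for illustration.

```python
from typing import Dict, Iterable, Optional

REQUIRED = ("upstream_channel", "target_directory", "platform")  # assumed keys


def merge_settings(
    cli_args: Dict[str, Optional[str]],
    config: Dict[str, str],
    required: Iterable[str] = REQUIRED,
) -> Dict[str, str]:
    """Merge settings (command line wins), then validate required ones."""
    merged = dict(config)
    # Only command-line values that were actually given override the config.
    merged.update({key: value for key, value in cli_args.items() if value is not None})
    missing = [key for key in required if not merged.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return merged
```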

Is this project still being maintained?

There haven't been any commits for a long time and there are three old pending pull requests.

If pull requests aren't going to be merged, then there is really no point in going to the trouble of submitting them and I might as well just work independently on my own fork.

Create logo

We could simply have a mirror with a conda logo in front of it and one in the mirror. The issue is probably that the Conda logo is trademarked, so having a package in front of the mirror and in the mirror itself might be a better choice from the legal side.

Activate CI

There is currently a working Travis setup. We might want to move to a different CI or use a different setup but should first enable Travis.

@scopatz Travis isn't showing the project for me, can you have a look?

Include packages for removal when using conda-diff-tar

In order to truly manage a mirror, it is also important to know which packages have been removed from the remote mirror. Suggestion is to include these packages in updates.tar as some kind of summary.json, so one may choose to delete those packages if desired.
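
A sketch of how such a summary could be computed, assuming the filenames recorded in the reference file and the filenames currently in the mirror are both available as collections. The function name and the `removed` key are hypothetical.

```python
import json
from typing import Iterable


def removal_summary(reference_files: Iterable[str], current_files: Iterable[str]) -> str:
    """Return JSON text listing files in the reference that are now gone."""
    removed = sorted(set(reference_files) - set(current_files))
    return json.dumps({"removed": removed}, indent=2)
```

The resulting text could be written into the tarball as summary.json alongside the new packages.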

Support for using Conda/Mamba solver to recursively find dependencies

I've got a proof-of-concept of using the Mamba/Conda dependency solver to find the full set of dependencies for a set of package specifications:
https://github.com/manics/conda-mirror/tree/conda-mamba-downloader

The main advantage of this is that it only downloads one version of each package or dependency. The corresponding disadvantage is that you can't whitelist multiple versions of a package, so if you're adding to an existing set of downloaded packages you may not be able to install the new packages alongside them. This is acceptable for me and is easier than trying to restrict the number of versions of everything.

I've done this by using the mamba/conda dependency solver to provide a list of package filenames (*.tar.bz2), and passing this into conda-mirror to handle downloads, validation and updating repodata.json.

I've effectively written a wrapper script https://github.com/manics/conda-mirror/blob/conda-mamba-downloader/conda_downloader/download.py
which calls the solver: https://github.com/manics/conda-mirror/blob/bc1eca6bac0ec78ee35d2f2ce833b7fcbe794a82/conda_mamba_downloader/download.py#L122-L143
and wraps a call to conda_mirror.main(): https://github.com/manics/conda-mirror/blob/bc1eca6bac0ec78ee35d2f2ce833b7fcbe794a82/conda_mamba_downloader/download.py#L203-L229

I made one change to conda-mirror to use the filename as an additional match key:
manics@2bad966

Do you have any thoughts on how/whether this should work with conda-mirror? As a standalone change adding the filename key so conda-mirror can be used as a backend library for this, or add the full solver functionality (with limitations) to conda-mirror?
