
conda-mirror's People

Contributors

diogocp, dmkent, ericdill, jakirkham, jneines, magnuhho, mariusvniekerk, nicoddemus, opiethehokie, parente, willirath

conda-mirror's Issues

Add a --dry-run mode

As the maintainer of the conda-mirror infrastructure at MaxPoint I want to be able to update my yaml config file and see what is going to happen as a result without actually running the full mirror. A --dry-run flag would be a reasonable way to do this. My thoughts are that enabling this flag would do the following:

  1. Use the target yaml config file to compute the packages that we want to have locally
  2. Compare the contents of the target mirror directory with the set of wanted packages computed in (1)
  3. If no verbosity (-v) flags are provided, assume that the user is not interested in exactly which packages are going to be mirrored, and show summary statistics instead: the number of files that will be downloaded and their total download size, plus the number of files that will be removed and their total size. The package size can be obtained from the "size" key in each package's repodata entry.
  4. If any verbosity flag is provided, show all of the packages that are going to be removed and all of the packages that are going to be downloaded, in addition to the summary from (3)
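
The four steps above could be sketched roughly as follows. This is only an illustration, not conda-mirror code: the function name `summarize_dry_run` and its arguments are hypothetical, and the only thing taken from repodata is the "size" key mentioned in (3).

```python
def summarize_dry_run(desired, local, packages):
    """Summarize what a hypothetical --dry-run would report.

    desired  -- file names the config says we want (step 1)
    local    -- file names currently in the target directory (step 2)
    packages -- repodata-style mapping of file name -> metadata with "size"
    """
    to_download = sorted(set(desired) - set(local))
    to_remove = sorted(set(local) - set(desired))

    def total_size(names):
        # A missing entry or a missing "size" key counts as 0 bytes.
        return sum(packages.get(name, {}).get("size", 0) for name in names)

    return {
        "download": to_download,
        "download_bytes": total_size(to_download),
        "remove": to_remove,
        "remove_bytes": total_size(to_remove),
    }
```

Without -v only the counts and byte totals would be printed; with -v the "download" and "remove" lists would be printed as well.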

Fetching necessary dependencies for the whitelist.

I'm using conda-mirror for air-gapped computers. (I know it's a special use case rather than just mirroring.)

The config looks like:

blacklist:
- name: '*'
platform: osx-64
target_directory: mirrored
upstream_channel: https://repo.continuum.io/pkgs/main/
whitelist:
- build: mkl
  name: _tflow_select
  version: 2.3.0
- build: py36_0
  name: absl-py
  version: 0.7.0
- build: py36_0
  name: astor
  version: 0.7.1
- ...

But after mirroring, I can't install the whitelisted packages from the mirror, because their dependencies were not mirrored along with them.
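
One possible way to work around this is to expand the whitelist with everything the whitelisted packages depend on, using the "depends" lists in the upstream repodata.json. The sketch below is an assumption about how that could be done, not something conda-mirror does; `dependency_closure` is a hypothetical helper. It ignores version constraints and simply follows package names, so it over-approximates what a real solver would pick.

```python
def dependency_closure(repodata_packages, seeds):
    """Return the seed names plus every name reachable through "depends".

    repodata_packages -- the "packages" mapping from repodata.json
    seeds             -- package names taken from the whitelist
    """
    # Collect dependency names per package name, across all builds.
    depends_by_name = {}
    for meta in repodata_packages.values():
        # A depends entry looks like "python >=3.6,<3.7.0a0"; keep the name.
        names = {spec.split()[0] for spec in meta.get("depends", [])}
        depends_by_name.setdefault(meta["name"], set()).update(names)

    needed, stack = set(), list(seeds)
    while stack:
        name = stack.pop()
        if name in needed:
            continue
        needed.add(name)
        stack.extend(depends_by_name.get(name, ()))
    return needed
```

Each resulting name could then be written out as an extra `- name: ...` whitelist entry.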

How to catch stalling HTTP connections?

I'm struggling with an uplink that has stalling HTTP connections from time to time. Currently, conda-mirror just seems to wait forever if this happens during a download. Is there any easy way of adding a timeout / retry on timeout here?
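
A minimal sketch of the timeout-and-retry idea, using only the standard library so that it stands alone. The helper name `fetch_with_retry` and its parameters are hypothetical. conda-mirror itself downloads with requests, whose `get` also accepts a `timeout` argument, so the same pattern should carry over.

```python
import socket
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, timeout=30.0, retries=3, backoff=1.0,
                     opener=urllib.request.urlopen):
    """Fetch url, retrying when the connection stalls or drops.

    timeout applies to each blocking socket operation, so a stalled
    read raises instead of waiting forever.
    """
    last_error = None
    for attempt in range(retries + 1):
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError:
            # The server answered with an error status; do not retry.
            raise
        except (socket.timeout, ConnectionError, urllib.error.URLError) as exc:
            last_error = exc
            if attempt < retries:
                # Exponential backoff between attempts.
                time.sleep(backoff * (2 ** attempt))
    raise last_error
```

The `opener` argument exists only so the retry logic can be exercised without a network connection.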

NOTYPE error of 'package'

platform: anaconda2018.12 @ win10

the error shows up in conda-mirror.py line 540:
logger.info('Validating {:4d} of {:4d}: {}.'.format(num + 1, num_packages, package))

The error (presumably a NoneType error) comes from 'package'.

Question about Incremental Syncing

Thanks! This tool is awesome! Now I want to keep my mirror up to date with the latest conda packages. Is it possible to check what has changed and download only the updated packages rather than all of them?

SSL Issue when creating mirror

On a fresh install of Miniconda on Linux Mint 18, 64 bit.

Attempting to create a mirror of conda-forge using "conda-mirror".

A 20 GB file system was prepared and mounted:

richard@goldlaptop /CondaMirror $ df -h ./
Filesystem                                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_goldlaptop_b-lv_CondaMirror   20G   44M   19G   1% /CondaMirror

Conda mirror was installed:

conda install conda-mirror -c conda-forge

A mirror configuration file was created:

richard@goldlaptop /CondaMirror $ cat conda-mirror.conf 
blacklist:
    - license: "*agpl*"

whitelist:
    - name: system

An attempt to create a mirror of conda-forge was made, which gave an SSL error.

richard@goldlaptop /CondaMirror $ conda-mirror --upstream-channel conda-forge --target-directory /CondaMirror/conda-forge --platform linux-64 --config /CondaMirror/conda-mirror.conf 
Log level set to ERROR
Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 441, in wrap_socket
	cnx.do_handshake()
  File "/home/richard/miniconda3/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1716, in do_handshake
	self._raise_ssl_error(self._ssl, result)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1449, in _raise_ssl_error
	raise SysCallError(-1, "Unexpected EOF")
OpenSSL.SSL.SysCallError: (-1, 'Unexpected EOF')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
	chunked=chunked)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
	self._validate_conn(conn)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
	conn.connect()
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
	ssl_context=context)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
	return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 448, in wrap_socket
	raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
	timeout=timeout
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
	_stacktrace=sys.exc_info()[2])
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
	raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: //conda-forge/linux-64/airflow-1.8.0-py35_1.tar.bz2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/bin/conda-mirror", line 6, in <module>
	sys.exit(conda_mirror.conda_mirror.cli())
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 261, in cli
	main(**_parse_and_format_args())
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 685, in main
	_download(url, download_dir)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 380, in _download
	ret = requests.get(url, stream=True)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/api.py", line 72, in get
	return request('get', url, params=params, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/api.py", line 58, in request
	return session.request(method=method, url=url, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
	resp = self.send(prep, **send_kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
	r = adapter.send(request, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/adapters.py", line 506, in send
	raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: //conda-forge/linux-64/airflow-1.8.0-py35_1.tar.bz2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
richard@goldlaptop /CondaMirror $ 

The connection to the internet is working well at this time.

Can you review my actions and suggest what I may have done incorrectly?

Issues with sha256 checksums

Hello,
For the past few days we have had an issue where the sha256 checksums are not correctly written into repodata.json.
This results in the following message on the client side:


CondaMultiError: ChecksumMismatchError: Conda detected a mismatch between the expected content and downloaded content
for url 'http://internal-mirror/conda/anaconda/linux-64/pyqt-5.9.2-py27h05f1152_2.tar.bz2'.
  download saved to:[...] pyqt-5.9.2-py27h05f1152_2.tar.bz2
  expected sha256: 9aa0fa86d8331b06286f0da0b1bef3d32780d65a69c29433a434b46abdb84e13
  actual sha256: 999239e84ec2163de3094909cd4c05dd18fa28182a03bead3f5f3c4e9f853f58

(This happened with several packages.)
From what we can see, the download is correct and the upstream repodata.json includes the correct sha256 checksum. The md5sum is also correct.
We have now updated from 0.7.2 to 0.8.0, but the issue remains.

Validate that there is enough space to actually perform the mirror

There has been a user report of the following stack trace:

INFO: download_url=https://anaconda.org/conda-forge/gdal/2.1.1/download/win-64/gdal-2.1.1-np111py34_4.tar.bz2
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device
 
During handling of the above exception, another exception occurred:
 
Traceback (most recent call last):
  File "/opt/miniconda3/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 183, in cli
    blacklist, whitelist)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 427, in main
    _download(url, download_dir, repodata)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device

I'm pretty sure that this is because there is not enough space in the /tmp directory on the host machine where this command was being run.

One way to fix this would be to compute the space required to mirror all packages in the to_mirror set by summing the byte counts stored in repodata[pkg_name]['size']. We would then need to check that there is enough space both in the temp directory located at download_dir and in the final location for these packages at 'local_directory'.
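
A rough sketch of that check, assuming repodata entries carry a "size" key as described. The helper names are hypothetical; `shutil.disk_usage` is the standard-library call that reports free space for the filesystem containing a path.

```python
import shutil


def required_bytes(repodata_packages, to_mirror):
    """Sum the "size" of every package we intend to download."""
    return sum(repodata_packages[name]["size"] for name in to_mirror)


def has_enough_space(directory, needed_bytes, usage=shutil.disk_usage):
    """Return True if the filesystem holding directory has room."""
    return usage(directory).free >= needed_bytes
```

The check would be run once for download_dir and once for the final directory before any download starts. If both live on the same filesystem, the requirement may need to be doubled, depending on whether files are moved or copied.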

Support for .conda package mirroring

Apparently, https://repo.anaconda.com/pkgs/main now supports a speedier version of the tarballs dubbed ".conda" packages (https://www.anaconda.com/how-we-made-conda-faster-4-7/). Curiously, the "index.html" generated by conda index does not show them, but you can download any of the packages by replacing the .tar.bz2 extension with .conda, and it works.

It would be cool if this package also mirrored those. Currently, it is bound to mirror only *.tar.bz2 packages.

sync only new packages

All,

From a quick look at the code, it looks like conda-mirror copies all of the repository (aka channel) files every time it is launched.
Is this correct?

It would be useful to download only missing/new packages in order to save bandwidth.

Thanks
GP

TypeError: 'set' object is not subscriptable

argh. introduced a new bug!

Thu Feb 16 06:00:01 CST 2017
INFO: Loading config from /opt/maxpoint/tools/conda-mirror-config.yaml
INFO: config: {'whitelist': [{'name': 'system'}], 'blacklist': [{'license': '*agpl*'}, {'license': 'None'}, {'license': ''}, {'name': 'marshmallow', 'build': 'py35_0', 'version': '2.10.4'}]}
Log level set to INFO
Traceback (most recent call last):
  File "/opt/maxpoint/envs/conda-mirror/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 227, in cli
    args.platform, blacklist, whitelist)
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 493, in main
    _validate_packages(possible_packages_to_mirror, local_directory)
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 364, in _validate_packages
    package_metadata = package_repodata[package]
TypeError: 'set' object is not subscriptable

Package license

Would be nice to include the LICENSE in the MANIFEST.in to ensure it is/can be included in sdists and related packages generated from the source.

New release/version in pypi/conda-forge?

Any way we could get a new release of conda-mirror on GitHub and the new version uploaded to PyPI and/or conda-forge? (I'd prefer PyPI myself!)

There's been a bunch of very nice/useful commits to master since the latest 0.7.4 release on September 19, 2017 (including some really nice new options to the package)...

It'd be much easier to install the latest from PyPI or conda-forge rather than pulling down the master source!

Seems like there's a nice release.sh script to make making new releases much easier too!

Really great package!!

Thanks!!

concurrent package validation

It appears that the vast majority of the run time is spent validating package digests. In my last couple of tests, it took ~1 hour 15 minutes for a single platform of pkgs/free on an EC2 c4.2xlarge instance. This is acceptable, but validation would likely see a near-linear speedup with some simple parallelization.
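
As an illustration of the kind of parallelization meant here, the sketch below hashes files on a thread pool. It is an assumption, not conda-mirror's implementation: the helper names are hypothetical and sha256 is used as the digest. Threads are a reasonable first try because hashlib releases the GIL while hashing large buffers; a process pool would be the alternative.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large packages do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_concurrently(expected_by_path, max_workers=4):
    """Return the sorted paths whose digest differs from the expected one."""
    paths = list(expected_by_path)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with paths.
        actual = list(pool.map(file_sha256, paths))
    return sorted(p for p, d in zip(paths, actual) if d != expected_by_path[p])
```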

Does not run under py 2.7

I just discovered this package, but I'm unable to get it to run on the stock el7 Python with either 0.6.5 as published on PyPI or master. This appears to be because of the use of tempfile.TemporaryDirectory, which is a py >= 3.2 construct. It seems that either a py 2.7 compatible tempdir wrapper needs to be used or the import from future can be removed.

$ python --version
Python 2.7.5
$ conda-mirror --version
Log level set to ERROR
0.6.5+10.g37f310c
$ conda-mirror --upstream-channel https://repo.continuum.io/pkgs/free/ --target-directory local_mirror --platform linux-64 -vvv
DEBUG: sys.argv: ['/home/vagrant/venv/bin/conda-mirror', '--upstream-channel', 'https://repo.continuum.io/pkgs/free/', '--target-directory', 'local_mirror', '--platform', 'linux-64', '-vvv']
DEBUG: download_template=https://repo.continuum.io/pkgs/{channel}/{platform}/{file_name}. channel=free
DEBUG: true blacklist
DEBUG: []
DEBUG: possible_packages_to_mirror
DEBUG: [u'_license-1.1-py27_0.tar.bz2',
 u'_license-1.1-py27_1.tar.bz2',
...
 u'zope.sqlalchemy-0.7.7-py36_0.tar.bz2']
DEBUG: download_template=https://repo.continuum.io/pkgs/{channel}/{platform}/{file_name}. channel=free
Log level set to DEBUG
Traceback (most recent call last):
  File "/home/vagrant/venv/bin/conda-mirror", line 9, in <module>
    load_entry_point('conda-mirror==0.6.5-10.g37f310c', 'console_scripts', 'conda-mirror')()
  File "/home/vagrant/venv/lib/python2.7/site-packages/conda_mirror/conda_mirror.py", line 227, in cli
    args.platform, blacklist, whitelist)
  File "/home/vagrant/venv/lib/python2.7/site-packages/conda_mirror/conda_mirror.py", line 513, in main
    with tempfile.TemporaryDirectory(dir=temp_directory) as download_dir:
AttributeError: 'module' object has no attribute 'TemporaryDirectory'
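
A py 2.7 compatible tempdir wrapper of the kind suggested above could look roughly like this. The name `temporary_directory` is hypothetical; it relies only on `tempfile.mkdtemp`, `shutil.rmtree` and `contextlib.contextmanager`, all of which exist on both Python 2.7 and 3.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def temporary_directory(**kwargs):
    """Create a temp dir, yield its path, and remove it on exit.

    Keyword arguments (for example dir=...) are passed to mkdtemp.
    """
    path = tempfile.mkdtemp(**kwargs)
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)
```

It would be used in place of the failing call, for example `with temporary_directory(dir=temp_directory) as download_dir:`.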

Validation errors caused by incomplete tar file

I recently hit an issue where an exception was unhandled when validating a file that
was, for unknown reasons, not a valid tar file. Traceback was:

    Traceback (most recent call last):
      File "/miniconda/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 303, in _validate
        t.extractfile('info/index.json').read().decode('utf-8')
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2066, in extractfile
        tarinfo = self.getmember(member)
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 1741, in getmember
        tarinfo = self._getmember(name)
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2321, in _getmember
        members = self.getmembers()
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 1752, in getmembers
        self._load()        # all members, we first have to
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2344, in _load
        tarinfo = self.next()
      File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2275, in next
        self.fileobj.seek(self.offset - 1)
      File "/miniconda/envs/conda-mirror/lib/python3.5/bz2.py", line 277, in seek
        return self._buffer.seek(offset, whence)
      File "/miniconda/envs/conda-mirror/lib/python3.5/_compression.py", line 143, in seek
        data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
      File "/miniconda/envs/conda-mirror/lib/python3.5/_compression.py", line 99, in read
        raise EOFError("Compressed file ended before the "
    EOFError: Compressed file ended before the end-of-stream marker was reached

There seems to have been something wrong with the file downloaded – have yet to reproduce what. Possibly our caching proxy helped make things worse…

However, the file should have failed validation and been removed, and conda-mirror should have continued rather than giving up completely.

Will submit a small patch that resolves this by handling EOFError.
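
For illustration, the behaviour described above might look like the sketch below. It is not the actual patch: the function names are hypothetical, and it catches a slightly wider set of exceptions than EOFError alone, since a truncated archive can also surface as a tarfile error depending on where the truncation falls.

```python
import json
import os
import tarfile


def read_index_json(path):
    """Return the parsed info/index.json, or None if the archive is unreadable."""
    try:
        with tarfile.open(path) as archive:
            member = archive.extractfile("info/index.json")
            return json.loads(member.read().decode("utf-8"))
    except (EOFError, tarfile.TarError, KeyError, OSError, ValueError):
        # Truncated stream, not a tar file, missing member, or bad JSON.
        return None


def validate_or_remove(path):
    """Remove an unreadable package and report False instead of raising."""
    if read_index_json(path) is None:
        if os.path.exists(path):
            os.remove(path)
        return False
    return True
```

With this shape the caller can keep iterating over the remaining packages when one of them is broken.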

Allow to filter packages based on the target Python version

In the case of packages that are built for various Python versions (e.g. botocore: py27, py35, py36, py37…), is it possible to sync only the package builds for the specified Python version? cas-mirror allows it using configuration key python_versions.
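
One heuristic way to do this is to look at the build string, which by convention often carries a tag such as py27 or py36. The sketch below assumes that convention and is not an existing conda-mirror option; `matches_python` is a hypothetical helper. Builds with no such tag are kept, since they are presumably not tied to a Python version.

```python
import re

# Matches tags such as py27, py36 or py310 inside a build string.
_PY_TAG = re.compile(r"py(\d)(\d+)")


def matches_python(build, wanted):
    """Return True if the build targets a wanted Python or has no py tag.

    build  -- the "build" field of a repodata entry, e.g. "py36_0"
    wanted -- set of versions to keep, e.g. {"3.6", "3.7"}
    """
    match = _PY_TAG.search(build)
    if match is None:
        return True
    return "{}.{}".format(*match.groups()) in wanted
```

A more reliable filter would inspect each package's "depends" entries for its python constraint instead of trusting the build string.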

[BUG] Problematic implementation of the mirror

Through reviewing #45, I've discovered a bug in the current implementation of the mirror. The problematic aspect is as follows:

  1. _validate_packages can remove packages on disk that exist in the local repodata.json file
  2. The local repodata.json file is not updated before that problematic file is removed
  3. This means that a user of the conda server can ask for a package that conda thinks it has (since it is still in repodata.json) but that it can't find (because it's been removed from disk)

As such, this code needs to be changed so that

  1. _validate_packages returns a list of packages to remove
  2. a new dict of package metadata is created that does not contain the packages that we are going to remove
  3. That new package metadata dict is written to disk as an atomic operation
  4. The problematic files are removed

With the above changes we will reduce the chance that the user will encounter an error from conda saying that the file cannot be found on the conda server. This is not a blocking issue for getting this PR merged. I'll fix this problem in a follow-on PR.
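
The ordering proposed above (write the pruned metadata first, delete files second) could be sketched like this. The function names are hypothetical. The atomic step uses a temp file in the same directory followed by `os.replace`, which is one common way to get an atomic rename on the same filesystem.

```python
import json
import os
import tempfile


def write_json_atomically(path, payload):
    """Write payload to a temp file next to path, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def prune_then_remove(repodata_path, package_dir, to_remove):
    """Drop to_remove from repodata.json first, then delete the files."""
    doomed = set(to_remove)
    with open(repodata_path) as fh:
        repodata = json.load(fh)
    repodata["packages"] = {
        name: meta
        for name, meta in repodata.get("packages", {}).items()
        if name not in doomed
    }
    write_json_atomically(repodata_path, repodata)
    for name in doomed:
        target = os.path.join(package_dir, name)
        if os.path.exists(target):
            os.remove(target)
```

A real mirror would also have to regenerate repodata.json.bz2 in the same way, which this sketch leaves out.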

Remove conda_build as a dependency

We need to reproduce parts of the read_index_tar function inside conda-build, which reads a conda package and extracts the recipe/info.json. The fields to compare from info.json are "size", plus "md5" and "sha256" where they exist.

[Optimization] Shuffle package validation order before validating

As implemented, the concurrent package validation chunks the input list of packages to validate. This generally makes package validation a whole lot faster, but it can also leave one executor stuck with a group of beefy packages. The net result is a long tail at the end of the package validation, where one executor is still working through these slow-to-validate packages. I think that shuffling the order (with random.shuffle) will distribute these beefy packages more evenly across all executors. This is definitely a much smaller optimization than the implementation of concurrent package validation itself.
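
A small sketch of the idea, assuming the package list is split into one chunk per executor. The helper name and the optional `seed` parameter are hypothetical and only there to make the behaviour reproducible.

```python
import random


def shuffled_chunks(packages, n_chunks, seed=None):
    """Shuffle the package list, then deal it out into n_chunks groups."""
    items = list(packages)
    # Same algorithm as random.shuffle, but with an optional fixed seed.
    random.Random(seed).shuffle(items)
    # Striding deals neighbouring items to different chunks.
    return [items[i::n_chunks] for i in range(n_chunks)]
```

Shuffling only makes it unlikely that the large packages land together. Sorting by the repodata "size" and dealing round-robin would spread them deterministically.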

Sort packages for validation

As the maintainer of conda-mirror infrastructure at MaxPoint, I want to see the packages get validated in alphabetical order instead of the random order that happens now.

keep old packages

Hello,
I am struggling a little with conda-mirror.
We use environments in which we sometimes pin package versions.
Unfortunately, conda-mirror deleted one of those packages because it was "too old".
Is there a way to keep old packages, or even better, to have a list of packages that must not be removed?
