vericast / conda-mirror
Mirror upstream conda channels
License: BSD 3-Clause "New" or "Revised" License
As the maintainer of the conda-mirror infrastructure at MaxPoint, I want to be able to update my YAML config file and see what is going to happen as a result without actually running the full mirror. A --dry-run flag would be a reasonable way to do this. My thoughts are that enabling this flag would do the following:
I'm using conda-mirror for air-gapped computers. (I know it's a special use case rather than just mirroring.)
My config looks like this:
blacklist:
- name: '*'
platform: osx-64
target_directory: mirrored
upstream_channel: https://repo.continuum.io/pkgs/main/
whitelist:
- build: mkl
  name: _tflow_select
  version: 2.3.0
- build: py36_0
  name: absl-py
  version: 0.7.0
- build: py36_0
  name: astor
  version: 0.7.1
- ...
But after running conda-mirror, I can't install the whitelisted packages from mirrored, because their dependencies weren't covered by the whitelist.
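One way to see which extra names the whitelist would need is to walk the "depends" field of the upstream repodata.json. The following is a minimal sketch, assuming repodata is the parsed upstream repodata.json with the standard "packages" mapping of records that carry "name" and "depends". The helper name is hypothetical, and it matches dependencies by name only, so it deliberately includes every version of each dependency rather than solving constraints.

```python
from collections import deque


def dependency_closure(repodata, whitelisted_names):
    """Return the set of package names reachable from whitelisted_names.

    Dependencies are matched by name only (version and build constraints in
    the "depends" specs are ignored), so the result is a superset of what a
    real solver would pick.
    """
    # Index every package record by name, since several builds share a name.
    by_name = {}
    for record in repodata.get("packages", {}).values():
        by_name.setdefault(record["name"], []).append(record)

    seen = set()
    queue = deque(whitelisted_names)
    while queue:
        name = queue.popleft()
        if name in seen:
            continue
        seen.add(name)
        for record in by_name.get(name, []):
            for spec in record.get("depends", []):
                # A spec looks like "python >=3.6,<3.7.0a0"; the package
                # name is the first whitespace-separated token.
                dep_name = spec.split()[0]
                if dep_name not in seen:
                    queue.append(dep_name)
    return seen
```

Each name in the returned set could then be added as a "- name: ..." whitelist entry, at the cost of mirroring more builds than strictly necessary.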
I'm struggling with an uplink that has stalling HTTP connections from time to time. Currently, conda-mirror just seems to wait forever if this happens during a download. Is there any easy way of adding a timeout / retry on timeout here?
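The SSL traceback in a later report shows conda-mirror downloading with requests.get(url, stream=True), and requests never times out unless a timeout= argument is passed. The following is a minimal standard-library sketch of the timeout-plus-retry pattern; the function names and the attempts/timeout/backoff defaults are assumptions, not conda-mirror options. The timeout bounds each blocking socket operation, so a stalled read raises instead of hanging.

```python
import time
import urllib.request


def _read_url(url, timeout):
    # urlopen's timeout bounds each blocking socket operation (connect, read),
    # so a connection that stalls mid-download raises instead of hanging.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, attempts=3, timeout=30.0, backoff=2.0, opener=_read_url):
    """Fetch url, retrying on timeouts and other network or HTTP errors.

    socket.timeout, urllib.error.URLError (including HTTPError) and
    ConnectionResetError are all subclasses of OSError, so one except
    clause covers them.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return opener(url, timeout)
        except OSError as exc:
            last_error = exc
            if attempt + 1 < attempts:
                # Exponential backoff: backoff, 2*backoff, 4*backoff, ...
                time.sleep(backoff * 2 ** attempt)
    raise last_error
```

This sketch reads the whole response into memory for brevity. With requests, the equivalent knob is requests.get(url, stream=True, timeout=(connect_timeout, read_timeout)) wrapped in the same retry loop.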
Related #63
Platform: Anaconda 2018.12 on Windows 10
The error shows up in conda-mirror.py, line 540:
logger.info('Validating {:4d} of {:4d}: {}.'.format(num + 1, num_packages, package))
The NoneType error comes from 'package'.
Thanks! This tool is awesome! Now I want to get the up-to-date versions of the conda packages. Is it possible to check for updates and then download only the new or updated packages instead of all of them?
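An incremental update reduces to a set difference between the filenames listed in the upstream repodata.json and the files already present in the local mirror. The following is a minimal sketch, assuming repodata is the parsed upstream repodata.json and local_dir is the platform subdirectory of the mirror. The function name is hypothetical, and a size comparison is used as a cheap signal that an existing file is complete.

```python
import os


def packages_to_download(repodata, local_dir):
    """Return upstream package filenames that are missing or incomplete locally."""
    needed = []
    for filename, record in sorted(repodata.get("packages", {}).items()):
        path = os.path.join(local_dir, filename)
        if not os.path.isfile(path):
            # Not mirrored yet.
            needed.append(filename)
        elif "size" in record and os.path.getsize(path) != record["size"]:
            # Present but the wrong size, e.g. a partial download.
            needed.append(filename)
    return needed
```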
On a fresh install of Miniconda on Linux Mint 18, 64-bit.
Attempting to create a mirror of conda-forge using "conda-mirror".
A 20 GB file system was prepared and mounted:
richard@goldlaptop /CondaMirror $ df -h ./
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_goldlaptop_b-lv_CondaMirror 20G 44M 19G 1% /CondaMirror
Conda mirror was installed:
conda install conda-mirror -c conda-forge
A mirror configuration file was created:
richard@goldlaptop /CondaMirror $ cat conda-mirror.conf
blacklist:
- license: "*agpl*"
whitelist:
- name: system
An attempt to create a mirror of conda-forge was made, which gave an SSL error.
richard@goldlaptop /CondaMirror $ conda-mirror --upstream-channel conda-forge --target-directory /CondaMirror/conda-forge --platform linux-64 --config /CondaMirror/conda-mirror.conf
Log level set to ERROR
Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 441, in wrap_socket
    cnx.do_handshake()
  File "/home/richard/miniconda3/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1716, in do_handshake
    self._raise_ssl_error(self._ssl, result)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/OpenSSL/SSL.py", line 1449, in _raise_ssl_error
    raise SysCallError(-1, "Unexpected EOF")
OpenSSL.SSL.SysCallError: (-1, 'Unexpected EOF')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/contrib/pyopenssl.py", line 448, in wrap_socket
    raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: SysCallError(-1, 'Unexpected EOF')",)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/richard/miniconda3/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: //conda-forge/linux-64/airflow-1.8.0-py35_1.tar.bz2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/richard/miniconda3/bin/conda-mirror", line 6, in <module>
    sys.exit(conda_mirror.conda_mirror.cli())
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 261, in cli
    main(**_parse_and_format_args())
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 685, in main
    _download(url, download_dir)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/conda_mirror/conda_mirror.py", line 380, in _download
    ret = requests.get(url, stream=True)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/home/richard/miniconda3/lib/python3.6/site-packages/requests/adapters.py", line 506, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='conda.anaconda.org', port=443): Max retries exceeded with url: //conda-forge/linux-64/airflow-1.8.0-py35_1.tar.bz2 (Caused by SSLError(SSLError("bad handshake: SysCallError(-1, 'Unexpected EOF')",),))
richard@goldlaptop /CondaMirror $
The connection to the internet is working well at this time.
Can you review my actions and suggest what I may have done incorrectly?
It seems the home URL doesn't point to this repo. Should it?
Hello,
For the past few days, we have had an issue where the sha256 checksums are not correctly written into the repodata.json.
This results in the following message on the client side:
CondaMultiError: ChecksumMismatchError: Conda detected a mismatch between the expected content and downloaded content
for url 'http://internal-mirror/conda/anaconda/linux-64/pyqt-5.9.2-py27h05f1152_2.tar.bz2'.
download saved to:[...] pyqt-5.9.2-py27h05f1152_2.tar.bz2
expected sha256: 9aa0fa86d8331b06286f0da0b1bef3d32780d65a69c29433a434b46abdb84e13
actual sha256: 999239e84ec2163de3094909cd4c05dd18fa28182a03bead3f5f3c4e9f853f58
(This happened with several packages.)
From what we can see, the download is correct and the upstream repodata.json includes the correct sha256 checksum. The md5sum is also correct.
We have now updated from 0.7.2 to 0.8.0, but the issue remains.
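To narrow down where the wrong checksum comes from, one can diff the mirror's repodata.json against the upstream one for the packages both contain. The following is a minimal sketch, assuming both files have been parsed with json.load into dicts with the usual "packages" mapping. The function name and the default list of fields are illustrative.

```python
def checksum_mismatches(local_repodata, upstream_repodata, fields=("sha256", "md5", "size")):
    """Yield (filename, field, local_value, upstream_value) for every differing field.

    Only packages present in both files are compared; a field missing on one
    side shows up as None.
    """
    local_pkgs = local_repodata.get("packages", {})
    upstream_pkgs = upstream_repodata.get("packages", {})
    for filename in sorted(set(local_pkgs) & set(upstream_pkgs)):
        for field in fields:
            local_value = local_pkgs[filename].get(field)
            upstream_value = upstream_pkgs[filename].get(field)
            if local_value != upstream_value:
                yield filename, field, local_value, upstream_value
```

If only "sha256" differs while "md5" and "size" agree, the file itself is fine and the problem lies in how the mirror writes its index.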
There has been a user report of the following stack trace:
INFO: download_url=https://anaconda.org/conda-forge/gdal/2.1.1/download/win-64/gdal-2.1.1-np111py34_4.tar.bz2
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/miniconda3/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 183, in cli
    blacklist, whitelist)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 427, in main
    _download(url, download_dir, repodata)
  File "/opt/miniconda3/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 253, in _download
    tf.write(data)
OSError: [Errno 28] No space left on device
I'm pretty sure that this is because there is not enough space in the /tmp directory on the host machine where this command was being run.
One way to fix this would be to compute the space required to mirror all packages in the to_mirror set, using the byte counts stored under the size key (repodata[pkg_name]['size']). We would then need to check that there is enough space both in the temp directory located at download_dir and in the final location for these packages, 'local_directory'.
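The check proposed above can be sketched with shutil.disk_usage. This sketch assumes repodata maps each package filename in to_mirror to a record with a "size" key in bytes (the repodata[pkg_name]['size'] layout described above). The function name is hypothetical. Checking each directory independently is sufficient whether download_dir and local_directory share a filesystem (packages are renamed into place) or not (each location needs the full amount).

```python
import shutil


def check_free_space(repodata, to_mirror, directories):
    """Raise OSError if any directory lacks room for every package in to_mirror.

    Returns the number of bytes required when all directories have enough space.
    """
    required = sum(repodata[pkg_name]["size"] for pkg_name in to_mirror)
    for directory in directories:
        free = shutil.disk_usage(directory).free
        if free < required:
            raise OSError(
                "need {} bytes in {}, only {} free".format(required, directory, free)
            )
    return required
```

Calling it as check_free_space(repodata, to_mirror, [download_dir, local_directory]) before the download loop would fail fast instead of part-way through.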
Apparently, https://repo.anaconda.com/pkgs/main now supports a speedier version of the tarballs dubbed ".conda" packages (https://www.anaconda.com/how-we-made-conda-faster-4-7/). It is curious that the "index.html" generated by conda index does not show them, but you can download any of the packages by replacing the .tar.bz2 extension with .conda, and it works.
It would be great if this package also mirrored those. Currently, it only mirrors *.tar.bz2 packages.
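In repodata.json written by conda 4.7-era tooling, .conda packages are listed under a separate "packages.conda" key next to the usual "packages" key, so a mirror that reads only "packages" never sees them. The following is a minimal sketch of collecting both sections. The helper name and the formats parameter are hypothetical.

```python
def all_package_filenames(repodata, formats=(".tar.bz2", ".conda")):
    """Return package filenames from both repodata sections, filtered by extension."""
    filenames = []
    # ".tar.bz2" records live under "packages"; ".conda" records under "packages.conda".
    for key in ("packages", "packages.conda"):
        filenames.extend(repodata.get(key, {}))
    return sorted(name for name in filenames if name.endswith(tuple(formats)))
```

Note that .conda files are zip archives, not tarballs, so any tarfile-based validation step would need a separate code path for them.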
All,
From a quick look at the code, it looks like conda-mirror copies all the repository (aka channel) files every time it is launched.
Is this correct?
It would be useful to download only missing/new packages in order to save bandwidth.
Thanks
GP
Argh, I introduced a new bug!
Thu Feb 16 06:00:01 CST 2017
INFO: Loading config from /opt/maxpoint/tools/conda-mirror-config.yaml
INFO: config: {'whitelist': [{'name': 'system'}], 'blacklist': [{'license': '*agpl*'}, {'license': 'None'}, {'license': ''}, {'name': 'marshmallow', 'build': 'py35_0', 'version': '2.10.4'}]}
Log level set to INFO
Traceback (most recent call last):
  File "/opt/maxpoint/envs/conda-mirror/bin/conda-mirror", line 11, in <module>
    sys.exit(cli())
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 227, in cli
    args.platform, blacklist, whitelist)
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 493, in main
    _validate_packages(possible_packages_to_mirror, local_directory)
  File "/opt/maxpoint/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 364, in _validate_packages
    package_metadata = package_repodata[package]
TypeError: 'set' object is not subscriptable
It would be nice to include the LICENSE in the MANIFEST.in to ensure it is (or can be) included in sdists and related packages generated from the source.
Any way we could get a new release of conda-mirror on GitHub and the new version uploaded to PyPI and/or conda-forge? (I'd prefer PyPI myself!)
There have been a bunch of very nice/useful commits to master since the latest 0.7.4 release on September 19, 2017 (including some really nice new options to the package)...
It'd be much easier to install the latest version from PyPI or conda-forge than to pull down the master source!
There seems to be a nice release.sh script that makes cutting new releases much easier, too!
Really great package!!
Thanks!!
conda-mirror unfortunately cannot be used behind a proxy.
So I had a look at your code and figured out that it's easy to patch, thanks to your use of requests.
816-8055/conda-mirror@6d31e67
It appears that the vast majority of the run time is spent validating package digests. In my last couple of tests, it took about 1 hour 15 minutes for a single platform of pkgs/free on an EC2 c4.2xlarge instance. This is acceptable, but it would likely see a near-linear speedup with some simple parallelization.
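Digest validation is embarrassingly parallel. The following is a minimal sketch with concurrent.futures, assuming each package is described by a (path, expected_sha256) pair. The helper names are hypothetical. Threads are enough here because CPython's hashlib releases the GIL while hashing large buffers; ProcessPoolExecutor is the drop-in alternative if profiling shows otherwise.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_all(packages, max_workers=8):
    """Return the paths whose SHA-256 does not match the expected digest.

    packages is an iterable of (path, expected_sha256) pairs.
    """
    packages = list(packages)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with packages.
        actual = pool.map(sha256_of, [path for path, _ in packages])
        return [
            path
            for (path, expected), got in zip(packages, actual)
            if got != expected
        ]
```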
I just discovered this package, but I'm unable to get it to run on stock EL7 Python with either 0.6.5 published on PyPI or master. This appears to be because of the use of tempfile.TemporaryDirectory, which is a Python >= 3.2 construct. It seems like a Python 2.7-compatible tempdir wrapper needs to be used, or the import from future can be removed.
$ python --version
Python 2.7.5
$ conda-mirror --version
Log level set to ERROR
0.6.5+10.g37f310c
$ conda-mirror --upstream-channel https://repo.continuum.io/pkgs/free/ --target-directory local_mirror --platform linux-64 -vvv
DEBUG: sys.argv: ['/home/vagrant/venv/bin/conda-mirror', '--upstream-channel', 'https://repo.continuum.io/pkgs/free/', '--target-directory', 'local_mirror', '--platform', 'linux-64', '-vvv']
DEBUG: download_template=https://repo.continuum.io/pkgs/{channel}/{platform}/{file_name}. channel=free
DEBUG: true blacklist
DEBUG: []
DEBUG: possible_packages_to_mirror
DEBUG: [u'_license-1.1-py27_0.tar.bz2',
u'_license-1.1-py27_1.tar.bz2',
...
u'zope.sqlalchemy-0.7.7-py36_0.tar.bz2']
DEBUG: download_template=https://repo.continuum.io/pkgs/{channel}/{platform}/{file_name}. channel=free
Log level set to DEBUG
Traceback (most recent call last):
  File "/home/vagrant/venv/bin/conda-mirror", line 9, in <module>
    load_entry_point('conda-mirror==0.6.5-10.g37f310c', 'console_scripts', 'conda-mirror')()
  File "/home/vagrant/venv/lib/python2.7/site-packages/conda_mirror/conda_mirror.py", line 227, in cli
    args.platform, blacklist, whitelist)
  File "/home/vagrant/venv/lib/python2.7/site-packages/conda_mirror/conda_mirror.py", line 513, in main
    with tempfile.TemporaryDirectory(dir=temp_directory) as download_dir:
AttributeError: 'module' object has no attribute 'TemporaryDirectory'
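A Python 2.7-compatible stand-in for tempfile.TemporaryDirectory can be built from tempfile.mkdtemp, shutil.rmtree and contextlib.contextmanager, all of which exist in both 2.7 and 3.x. The following is a minimal sketch; the name temporary_directory is hypothetical, and on Python 3 the standard class can simply be used instead.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def temporary_directory(dir=None):
    """Yield a fresh temporary directory and delete it on exit (Python 2.7 and 3.x)."""
    path = tempfile.mkdtemp(dir=dir)
    try:
        yield path
    finally:
        # ignore_errors avoids masking an in-flight exception if cleanup fails.
        shutil.rmtree(path, ignore_errors=True)
```

It would replace the failing line as "with temporary_directory(dir=temp_directory) as download_dir:".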
I recently hit an issue where an exception was unhandled when validating a file that was, for unknown reasons, not a valid tar file. The traceback was:
Traceback (most recent call last):
  File "/miniconda/envs/conda-mirror/lib/python3.5/site-packages/conda_mirror/conda_mirror.py", line 303, in _validate
    t.extractfile('info/index.json').read().decode('utf-8')
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2066, in extractfile
    tarinfo = self.getmember(member)
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 1741, in getmember
    tarinfo = self._getmember(name)
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2321, in _getmember
    members = self.getmembers()
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 1752, in getmembers
    self._load() # all members, we first have to
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2344, in _load
    tarinfo = self.next()
  File "/miniconda/envs/conda-mirror/lib/python3.5/tarfile.py", line 2275, in next
    self.fileobj.seek(self.offset - 1)
  File "/miniconda/envs/conda-mirror/lib/python3.5/bz2.py", line 277, in seek
    return self._buffer.seek(offset, whence)
  File "/miniconda/envs/conda-mirror/lib/python3.5/_compression.py", line 143, in seek
    data = self.read(min(io.DEFAULT_BUFFER_SIZE, offset))
  File "/miniconda/envs/conda-mirror/lib/python3.5/_compression.py", line 99, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
There seems to have been something wrong with the downloaded file; I have yet to reproduce what. Possibly our caching proxy made things worse…
However, the file should have failed validation and been removed, and conda-mirror should have continued rather than giving up completely.
I will submit a small patch that resolves this by handling EOFError.
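The fix described above amounts to treating any read failure as a failed validation. The following is a minimal sketch, assuming a validator that only needs info/index.json from a .tar.bz2 package (the member read in the traceback). The function name is hypothetical, and this is not the actual conda-mirror patch.

```python
import json
import tarfile


def read_index_json(path):
    """Return the parsed info/index.json of a .tar.bz2 package, or None if unreadable.

    A truncated bz2 stream raises EOFError, a corrupt archive raises
    tarfile.TarError, a missing member raises KeyError, I/O problems raise
    OSError, and bad JSON or UTF-8 raises ValueError; all of them mean the
    package failed validation.
    """
    try:
        with tarfile.open(path, "r:bz2") as t:
            member = t.extractfile("info/index.json")
            if member is None:  # the entry exists but is not a regular file
                return None
            return json.loads(member.read().decode("utf-8"))
    except (EOFError, tarfile.TarError, KeyError, OSError, ValueError):
        return None
```

A caller can then remove the file and move on whenever this returns None, instead of aborting the whole run.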
For packages that are built for various Python versions (e.g. botocore: py27, py35, py36, py37…), is it possible to sync only the package builds for a specified Python version? cas-mirror allows this with the configuration key python_versions.
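A filter like cas-mirror's python_versions can be approximated from the build string, which for Python-specific builds embeds a pyXY tag (py27, py36h05f1152, np111py34 and so on). The following is a minimal sketch under that assumption. The helper names are hypothetical, and packages without a pyXY tag (noarch or non-Python builds) are kept.

```python
import re

# Matches the CPython tag inside build strings such as "py36_0",
# "py27h05f1152_2" or "np111py34_4"; group 1 is the major version,
# group 2 the minor version.
PY_TAG = re.compile(r"py(\d)(\d+)")


def keep_for_python(record, python_versions):
    """True if the build is not Python-specific or targets one of python_versions."""
    match = PY_TAG.search(record.get("build", ""))
    if match is None:
        return True  # no pyXY tag: noarch or not tied to a Python version
    return "{}.{}".format(*match.groups()) in python_versions


def filter_by_python(packages, python_versions):
    """Keep only the repodata "packages" entries that pass keep_for_python."""
    return {
        filename: record
        for filename, record in packages.items()
        if keep_for_python(record, python_versions)
    }
```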
Is it possible (or could it be possible) to specify multiple platforms at once?
I can imagine lots of reasons why this might be inadvisable, but is there a way (or reasons why not) to mirror the Anaconda "pro" repository? It seems the "free" repository is the only one to mirror when referring to the 'defaults' or 'anaconda' repo, but when I look at anaconda.org and search for a package under "pro" (e.g. accelerate), it appears under the "anaconda" owner. Any recommendations or thoughts on working around this?
Through reviewing #45, I've discovered a bug in the current implementation of the mirror. The problematic aspect is as follows: _validate_packages can remove packages on disk that exist in the local repodata.json file.
As such, this code needs to be changed so that _validate_packages returns a list of packages to remove.
With the above changes, we will reduce the chance that the user will encounter an error from conda saying that the file cannot be found on the conda server. This is not a blocking issue for getting this PR merged. I'll fix this problem in a follow-on PR.
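The proposed change can be sketched minimally as follows: validation only reports bad packages, and a single caller removes each one from disk and from the local repodata together, so the two cannot drift apart. The function names are hypothetical, validate stands in for whatever per-package check is used, and repodata is assumed to have the usual "packages" mapping.

```python
import os


def find_invalid_packages(package_dir, filenames, validate):
    """Return the filenames that fail validate(path); nothing is deleted here."""
    return [name for name in filenames if not validate(os.path.join(package_dir, name))]


def remove_packages(package_dir, filenames, repodata):
    """Delete the given packages from disk and drop them from repodata in one place."""
    for name in filenames:
        path = os.path.join(package_dir, name)
        if os.path.exists(path):
            os.remove(path)
        repodata.get("packages", {}).pop(name, None)
```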
We need to replicate parts of the read_index_tar function inside conda-build, which reads a conda package and extracts the recipe/info.json. The things to compare from the info.json are "size", "md5" if it exists, and "sha256" if it exists.
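The comparison described above can be sketched as follows, assuming record is the package's repodata entry, which carries "size" and, when available, "md5" and "sha256". This is not conda-build's read_index_tar; it only measures and hashes the file, and the function name is hypothetical. It returns the first failing field, so a caller can delete a bad file exactly once after the checks have finished.

```python
import hashlib
import os


def check_package(path, record):
    """Return None if path matches record, else the name of the first failing field.

    "size" is checked first because it is cheap; "md5" and "sha256" are checked
    only if the record has them. Both digests are computed in a single pass.
    """
    if os.path.getsize(path) != record["size"]:
        return "size"
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
            sha256.update(chunk)
    for field, digest in (("md5", md5), ("sha256", sha256)):
        if field in record and record[field] != digest.hexdigest():
            return field
    return None
```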
As implemented, the concurrent package validation chunks the input list of packages to validate. This generally makes package validation a whole lot faster, but it also leaves one executor stuck with a group of beefy packages to validate. The net result is a long tail at the end of package validation, where one executor is still working through a batch of these slow-to-validate packages. I think that shuffling the order (with random.shuffle) will distribute these beefy packages more reliably across all executors. This is a much smaller optimization than the implementation of concurrent package validation itself.
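The following is a minimal sketch of the idea, assuming the work is split into one chunk per executor. The helper name is hypothetical. Shuffling before chunking spreads the large packages across chunks, and a seeded random.Random keeps runs reproducible.

```python
import random


def shuffled_chunks(items, n_chunks, seed=None):
    """Shuffle a copy of items, then deal them round-robin into n_chunks lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Round-robin dealing keeps chunk lengths within one of each other.
    return [items[i::n_chunks] for i in range(n_chunks)]
```

Shuffling balances the load only in expectation. Submitting packages individually to the executor (so idle workers pick up the next item) removes the tail without relying on randomness.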
As the maintainer of conda-mirror infrastructure at MaxPoint, I want to see the packages get validated in alphabetical order instead of the random order that happens now.
Hello,
I'm having a little trouble with conda-mirror:
We use environments where we sometimes pin package versions.
Unfortunately, conda-mirror deleted one of them because it was "too old".
Is there a way to keep old packages, or even better, to have a blacklist of packages not to remove?
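A protect-list could be applied as a final filter on whatever the mirror intends to delete. The following is a minimal sketch using shell-style patterns matched against package filenames. The function name and the idea of a protect-list setting are hypothetical, not existing conda-mirror options; fnmatchcase is used so matching is case-sensitive on every OS.

```python
from fnmatch import fnmatchcase


def drop_protected(to_remove, protected_patterns):
    """Return the removal candidates that match none of protected_patterns."""
    return [
        filename
        for filename in to_remove
        if not any(fnmatchcase(filename, pattern) for pattern in protected_patterns)
    ]
```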
This can be reproduced by having a package that fails size validation. The package then gets removed and the md5 check proceeds, but the file is already gone, so it blows up with a FileNotFoundError.
Expected: 'keras-applications' (1.0.7) can be fetched.
Actually happened: 'keras-applications' (1.0.7) is in 'noarch', so it can't be fetched with conda-mirror.
(https://repo.continuum.io/pkgs/main/noarch/keras-applications-1.0.7-py_0.tar.bz2)
DEFAULT_PLATFORMS needs to include noarch.