tunasync-scripts's Introduction

tunasync-scripts

Custom scripts for mirror jobs

LICENCE

This program is free software: you can redistribute it and/or modify    
it under the terms of the GNU General Public License as published by    
the Free Software Foundation, either version 3 of the License, or    
(at your option) any later version.    

This program is distributed in the hope that it will be useful,    
but WITHOUT ANY WARRANTY; without even the implied warranty of    
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    
GNU General Public License for more details.    

You should have received a copy of the GNU General Public License    
along with this program.  If not, see <http://www.gnu.org/licenses/>.

tunasync-scripts's People

Contributors

abcfy2, alienzj, bigeagle, dramforever, happyaron, harry-chen, huiyiqun, jiegec, johnnychen94, kmxz, ksqsf, njzjz, peterjc123, qwe7002, qy117121, red54, robberphex, shankerwangmiao, sparkcyf, sssxie, taoky, terrorjack, wangling12, wangyonghong, wzb198910ab, xavieryao, xfoxfu, z4yx, zenithalhourlyrate, zhenruyan

tunasync-scripts's Issues

How to set "TUNASYNC_WORKING_DIR" to a custom download folder

Hello, I cloned tunasync-scripts locally with gh, and I only want to sync some of the repos using the bash scripts inside the script tarball. For example, I want to use proxmox.sh to rsync the Proxmox repos to my local machine. How should I set the TUNASYNC_WORKING_DIR variable in the script to a custom path? Is there a configuration file that defines TUNASYNC_WORKING_DIR in one central place? Currently I set it in the proxmox.sh script itself, but I don't want to change the structure of the scripts inside the tunasync-scripts folder.

intel channel 404

My ~/.condarc is configured as follows:
--------------------------------------
channels:

show_channel_urls: true

default_channels:

custom_channels:
conda-forge: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
bioconda: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud
intel: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud

--------------------------------------
This produces a 404:

Collecting package metadata (current_repodata.json): failed

UnavailableInvalidChannel: The channel is not accessible or is invalid.
channel name: intel
channel url: https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/intel
error code: 404

You will need to adjust your conda configuration to proceed.
Use conda config --show channels to view your configuration's current state,
and use conda config --show-sources to view config file locations.

LLVM sync does not support Debian 11 (Bullseye)

Bullseye (Debian 11 - next stable) - Last update : Wed, 18 Aug 2021 01:33:06 UTC / Revision: 20210817071721+198e6771e24f
deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye main
deb-src http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye main

12

deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye-12 main
deb-src http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye-12 main

13

deb http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye-13 main
deb-src http://apt.llvm.org/bullseye/ llvm-toolchain-bullseye-13 main

Anaconda: Installer sync failed

remote_filesize = int(r.headers['content-length'])

[2020-08-13 22:36:59,004] [INFO] Syncing installers...
[2020-08-13 22:36:59,005] [INFO] Start syncing https://repo.continuum.io/archive
[2020-08-13 22:37:01,967] [ERROR] Failed to sync installers of archive
Traceback (most recent call last):
  File "/home/scripts/anaconda.py", line 270, in main
    sync_installer(remote_url, local_dir)
  File "/home/scripts/anaconda.py", line 216, in sync_installer
    remote_filesize = int(r.headers['content-length'])
  File "/usr/local/lib/python3.7/dist-packages/requests/structures.py", line 54, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-length'

ref: tuna/issues#915
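
The traceback above comes from indexing r.headers['content-length'] directly, which raises KeyError whenever the server omits that header (for example, when the response uses chunked transfer encoding). A minimal sketch of a more defensive lookup follows. The helper name remote_filesize_or_none is hypothetical and not part of anaconda.py; it accepts any mapping of headers so it can be tried without a network connection.

```python
from typing import Mapping, Optional


def remote_filesize_or_none(headers: Mapping[str, str]) -> Optional[int]:
    """Return the Content-Length as an int, or None if absent or malformed.

    Hypothetical helper: header names are compared case-insensitively, so a
    plain dict works as well as a case-insensitive header mapping.
    """
    for name, value in headers.items():
        if name.lower() == "content-length":
            try:
                size = int(value)
            except (TypeError, ValueError):
                return None
            # A negative length is meaningless; treat it as unknown.
            return size if size >= 0 else None
    return None


if __name__ == "__main__":
    print(remote_filesize_or_none({"Content-Length": "1024"}))
    print(remote_filesize_or_none({"Transfer-Encoding": "chunked"}))
```

A caller could treat None as "size unknown" and skip the size comparison for that file instead of aborting the whole installer sync; whether that is the right policy for this script is a judgment call for the maintainers.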

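Unused issue; the owner can delete it

  • If I only want to keep Ubuntu 16.04 and later, how should I approach writing gen_ubuntu_exclude?
  • Similarly, if I only want to keep centos/7, the RPM packages presumably take up the most space, right? How should I approach excluding by RPM URL path?
  • Is exclude_file matched as a suffix (endswith), or as a regular expression against the whole URL path?

A question about syncing with docker-ce.py

When syncing the CentOS mirror with rsync, incremental comparison works, but docker-ce.py has no incremental comparison. Every run of docker-ce.py starts from scratch and checks each file again (printing Skipping for files that already exist), which takes a very long time, and if the connection drops for any reason it has to start over.
Is this the expected behavior? Is there a better solution? Thanks.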

Rewrite with python

Exception handling in bash is extremely difficult and inconsistent, so keep calm and rewrite the scripts in Python.

Use createrepo_c to replace createrepo

createrepo_c is the C re-implementation of createrepo.

createrepo itself is written in Python. When the --update option is used, all nodes are read into RAM, and memory use suffers from the inefficiency of Python's object storage. This is the cause of the high memory usage of the genpkgmetadata process, which affects mysql, grafana, influxdata, kubernetes and other yum-sync based repos.

createrepo_c is first included in Debian bullseye, replacing the createrepo package in buster.

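Files are missing after syncing the flutter mirror locally; how should I handle this?

Sync scripts that need a GC feature

Some mirrors update frequently and contain many small files, so old versions cannot be cleaned up by hand. A feature that scans for and removes stale files is needed.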

  • homebrew-bottle-mirror
  • anaconda.py
  • pub-mirror
  • flutter.sh
  • nix-channels.py
  • rustup-mirror

Ref:

git-recursive script does not work properly when syncing git repositories

Configuration:

#/etc/tunasync/mirrors.conf.d/boost.conf 
[[mirrors]]
name = "boost.git"
provider = "command"
command = "/home/tunasync-scripts/git-recursive.sh"
upstream = "https://github.com/boostorg/boost.git"
docker_image = "tunathu/tunasync-scripts:latest"
size_pattern = "size-pack: ([0-9\\.]+[KMGTP])"
    [mirrors.env]
    MIRROR_BASE_URL="http://mirror.example.com/"
    WORKING_DIR_BASE="/srv/git-mirror/"
    GENERATED_SCRIPT="/srv/git-mirror/boost-git.sh"
    RECURSIVE="1"

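Log: http://fars.ee/pGah

The worker log reports the mirror job as Success, but on inspection only the repository specified in the URL was actually synced. The other repositories referenced as its submodules were not synced correctly.

Which parameters does tunasync pass in from the worker config file when using Docker?

Since I don't know which parameters are passed in, I currently have tunasync run the command on the host and then invoke docker from within the shell script. It is still fetching, but I'm not sure whether size-sum will be lost, because I don't understand how the parameters are passed in.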

/home/scripts/cs2c.sh

#!/bin/bash
# requires: docker

set -e
set -o pipefail
# Directory containing this script, used to locate the helpers
_here=$(dirname "$(realpath "$0")")

BASE_PATH="${TUNASYNC_WORKING_DIR}"
BASE_URL=${TUNASYNC_UPSTREAM_URL:-"http://update.cs2c.com.cn:8080"}

export REPO_SIZE_FILE=/tmp/reposize.$RANDOM

# Run the actual sync inside the tunasync-scripts container
docker run --rm \
    -v "$BASE_PATH":/mirrors/cs2c \
    -v /home/scripts/:/home/scripts/ \
    tunathu/tunasync-scripts \
    /bin/bash /home/scripts/cs2c_ns.sh "$BASE_URL"

echo "YUM finished"
"${_here}/helpers/size-sum.sh" "$REPO_SIZE_FILE" --rm

[linuxbrew-bottles] Failed to download resource "[email protected]"

Use create_repo and apt-mirror to sync nodesource instead of lftp

Currently, deb_14.x/pool/main/n/nodejs/nodejs_14.8.0-deb-1nodesource1_amd64.deb exists on the upstream server. However, it does not show up in https://deb.nodesource.com/node_14.x/pool/main/n/nodejs/

function sync_nodesource() {
    repo_url="$1"
    repo_dir="$2"
    lftp_opts="$3"
    [[ ! -d "$repo_dir" ]] && mkdir -p "$repo_dir"
    cd $repo_dir
    lftp "${repo_url}/" -e "mirror --verbose $lftp_opts -P 5 --delete --only-newer; bye"
}
DEB_BASE_URL="https://deb.nodesource.com"
RPM_BASE_URL="https://rpm.nodesource.com"
node_versions=("0.10" "0.12" "4.x" "6.x" "7.x" "8.x" "9.x" "10.x" "11.x" "12.x" "13.x" "14.x")
declare success=true
for ver in ${node_versions[@]}; do
    sync_nodesource "${DEB_BASE_URL}/node_${ver}" "${TUNASYNC_WORKING_DIR}/deb_${ver}" "--exclude db/ --exclude conf/" || success=false

Issues in ftpsync-wrapper.sh

Oct 27 20:47:12 ab497fe801fb ftpsync-debian[7]: Mirrorsync start
Oct 27 20:47:12 ab497fe801fb ftpsync-debian[7]: Running mirrorsync, update is required, /data/mirrors/debian//Archive-Update-Required-nanomirrors.tuna.tsinghua.edu.cn exists
tail: tail: cannot open '/home/log/tunasync/ftpsync/rsync-ftpsync-debian.log' for reading: No such file or directorycannot open '/home/log/tunasync/ftpsync/rsync-ftpsync-debian.error' for reading
: No such file or directorytail: 
no files remaining
tail: no files remaining
/home/bin/ftpsync-wrapper.sh: line 1: kill: (30) - No such process

S3 sync missing files

exec aws --no-sign-request --endpoint-url="${TUNASYNC_S3_ENDPOINT}" s3 sync ${TUNASYNC_AWS_OPTIONS} "${TUNASYNC_UPSTREAM_URL}" "${TUNASYNC_WORKING_DIR}"

ref: tuna/issues#910

The last-modified time of py3/redhat/8/x86_64/latest/repodata/repomd.xml on our server is May 4, whereas the headers of the file in question report Aug 14.

Hackage has been updated; the script needs updating

https://github.com/tuna/tunasync-scripts/blob/master/hackage.sh

This appeared after updating to cabal 2.0.0.0:

$ cabal update -v3
no user package environment file found at /Users/eccstartup
Trying to locate mirrors via DNS for initial bootstrap of secure repository
'http://hackage.haskell.org/' ...
Searching for nslookup in path.
Found nslookup at /usr/bin/nslookup
/usr/bin/nslookup '-query=TXT' _mirrors.hackage.haskell.org
located 2 mirrors for http://hackage.haskell.org/ :
- http://hackage.fpcomplete.com/
- http://objects-us-west-1.dream.io/hackage-mirror/
Selected mirror http://hackage.haskell.org/
Downloading root
Searching for curl in path.
Found curl at /usr/bin/curl
Searching for powershell in path.
Cannot find powershell on the path
Searching for wget in path.
Found wget at /usr/local/bin/wget
Selected http transport implementation: curl
/usr/bin/curl 'http://hackage.haskell.org/root.json' --output /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/transportAdapterGet24956-1 --location --write-out '%{http_code}' --user-agent 'cabal-install/2.0.0.0 (osx; x86_64)' --silent --show-error --dump-header /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/curl-headers24956-2.txt --header 'Cache-Control: no-transform'
Downloading the latest package list from hackage.haskell.org
Selected mirror http://hackage.haskell.org/
Downloading timestamp
/usr/bin/curl 'http://hackage.haskell.org/timestamp.json' --output /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/transportAdapterGet24956-4 --location --write-out '%{http_code}' --user-agent 'cabal-install/2.0.0.0 (osx; x86_64)' --silent --show-error --dump-header /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/curl-headers24956-5.txt --header 'Cache-Control: no-transform'
Downloading snapshot
/usr/bin/curl 'http://hackage.haskell.org/snapshot.json' --output /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/transportAdapterGet24956-7 --location --write-out '%{http_code}' --user-agent 'cabal-install/2.0.0.0 (osx; x86_64)' --silent --show-error --dump-header /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/curl-headers24956-8.txt --header 'Cache-Control: no-transform'
Downloading mirrors
/usr/bin/curl 'http://hackage.haskell.org/mirrors.json' --output /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/transportAdapterGet24956-10 --location --write-out '%{http_code}' --user-agent 'cabal-install/2.0.0.0 (osx; x86_64)' --silent --show-error --dump-header /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/curl-headers24956-11.txt --header 'Cache-Control: no-transform'
Cannot update index (no local copy)
Downloading index
/usr/bin/curl 'http://hackage.haskell.org/01-index.tar.gz' --output /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/transportAdapterGet24956-13 --location --write-out '%{http_code}' --user-agent 'cabal-install/2.0.0.0 (osx; x86_64)' --silent --show-error --dump-header /var/folders/tb/wpytxqpx111fsxk0tg1zsmgm0000gn/T/curl-headers24956-14.txt --header 'Cache-Control: no-transform'

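As the log above shows, new files are now requested.

Note that the last one starts with 01-, not 00-.

I haven't noticed any other problems yet; so far https://github.com/tuna/tunasync-scripts/blob/master/stackage.py does not seem to be significantly affected.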

[Julia] update julia sync script

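I recently rewrote the Julia mirror code and changed how it is invoked, so the sync script needs updating.

Changes:

  • All data is now downloaded from upstream; the requirement for /clones/registries has been removed, as has the dependency on git.
  • Added a caching mechanism (stored in /julia/static/.cache) to avoid unnecessary CPU and I/O overhead during incremental syncs.
  • Resources that fail to download are now recorded in /julia/static/failed_resources.txt. Incremental syncs within 24 hours skip the resources recorded there, which makes them much faster.
  • Timeout control now applies to each individual resource request.
  • Remains compatible with the earlier hash mechanism, avoiding a large number of hash mismatch errors.
  • Attempted to fix the bug where the job hangs.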
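
The 24-hour skip rule for failed resources described in the list above can be sketched as follows. This is only an illustration in Python: the issue does not describe the format of failed_resources.txt, so the in-memory mapping from resource name to failure time, and the names resources_to_fetch and SKIP_WINDOW_SECONDS, are assumptions made for this example.

```python
import time
from typing import Dict, Iterable, List, Optional

SKIP_WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour window mentioned above


def resources_to_fetch(
    candidates: Iterable[str],
    failed_at: Dict[str, float],
    now: Optional[float] = None,
) -> List[str]:
    """Drop candidates whose last recorded failure is under 24 hours old.

    `failed_at` maps a resource name to the Unix time of its last failed
    download; this in-memory layout is an assumption for illustration.
    """
    if now is None:
        now = time.time()
    return [
        name
        for name in candidates
        if name not in failed_at or now - failed_at[name] >= SKIP_WINDOW_SECONDS
    ]
```

Resources that failed more than 24 hours ago become eligible again, so a transient upstream failure delays a resource by at most one window rather than excluding it permanently.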

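The two folders /julia/clones/julia/registries are no longer needed, so in theory /julia/static could be mounted at /julia, though I'm not sure whether this can be done in a compatible way.

@z4yx I'm not sure how this should be integrated into the tunasync scripts, so I may need your help; #81 provides a reference.

There are now two upstream servers: https://kr.storage.juliahub.com and https://us-east.storage.juliahub.com. You can add both or choose just one. The kr server (Seoul, South Korea) uses optimized build code, so its delay in syncing from the GitHub registry is lower, whereas us-east lags by roughly 30-60 minutes. (Ref: JuliaRegistries/General#16777 (comment))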

aosp.sh git repack -a -b -d error: unknown switch `b'

When I run aosp.sh, the command 'git repack -a -b -d' fails with this error:
git repack -a -b -d error: unknown switch `b'
I think my git (git version 1.8.3.1) may be newer or older than yours, so that the '-b' option is not supported.
Why not use 'git gc'? Do you have any other considerations?
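
One possible workaround is to check the installed Git version before passing -b. The sketch below assumes that -b (--write-bitmap-index) is only available from Git 2.0 onward, which would explain the failure on 1.8.3.1; that threshold is my understanding rather than something stated in this issue. The function names are hypothetical and aosp.sh itself is a shell script, so this only illustrates the logic.

```python
import re
from typing import List, Optional, Tuple


def parse_git_version(output: str) -> Optional[Tuple[int, ...]]:
    """Parse the text printed by `git --version` into a tuple of ints."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def repack_args(version: Optional[Tuple[int, ...]]) -> List[str]:
    """Build a `git repack` command line, adding -b only on Git >= 2.0.

    Assumption: -b (--write-bitmap-index) is unavailable before Git 2.0.
    An unknown version (None) is treated conservatively, without -b.
    """
    args = ["git", "repack", "-a", "-d"]
    if version is not None and version >= (2, 0):
        args.insert(3, "-b")  # yields: git repack -a -b -d
    return args
```

The resulting list can be passed to subprocess.run; on an old Git the repack then proceeds without writing a bitmap index instead of aborting.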

nix-channel GC: dangling narinfo files

The current GC logic deletes dead narinfo files along with the nar files they refer to. However, live narinfo files can still refer to these deleted nar files, because multiple narinfo files may have the same URL.

if DELETE_OLD:
    narinfo = parse_narinfo(path.read_text())
    try:
        path.unlink()
    except:
        pass
    try:
        (working_dir / STORE_DIR / narinfo['URL']).unlink()
    except:
        pass

Narinfo file example:

# github:nixos/nixpkgs/e10da1c7f542515b609f8dfbcf788f3d85b14936#element-web
$ curl https://cache.nixos.org/hdz90ld1wwj6nwp580avv02v62cjh7h3.narinfo
StorePath: /nix/store/hdz90ld1wwj6nwp580avv02v62cjh7h3-element-web-1.10.10
URL: nar/0ji0p1g4fjwpqmf33maf7irznb2wclzi483hicyyjifybh43qxrs.nar.xz
Compression: xz
FileHash: sha256:0ji0p1g4fjwpqmf33maf7irznb2wclzi483hicyyjifybh43qxrs
FileSize: 13688456
NarHash: sha256:0r3pdgcnrlvf56cxlgxdsai5f1k9pf7cq80zssrbfbabrasbkk2v
NarSize: 43380024
References: 
Deriver: af669vfa956f7znxarma0rf1883nxy8k-element-web-1.10.10.drv
Sig: cache.nixos.org-1:T0b6vIGHgFC2E9SspFsf/MMubYixGGYna/JOssawEaQh4jdoyJNAkRNN9pJDFJDTPKER7DCXAVXE5zgzYedPBw==

# github:nixos/nixpkgs/c30945a93fbd3122a55ee6a63c9bfef7556bc82e#element-web
$ curl https://cache.nixos.org/zgwkj12lfii1ii041497bxm8rzcx23sd.narinfo
StorePath: /nix/store/zgwkj12lfii1ii041497bxm8rzcx23sd-element-web-1.10.10
URL: nar/0ji0p1g4fjwpqmf33maf7irznb2wclzi483hicyyjifybh43qxrs.nar.xz
Compression: xz
FileHash: sha256:0ji0p1g4fjwpqmf33maf7irznb2wclzi483hicyyjifybh43qxrs
FileSize: 13688456
NarHash: sha256:0r3pdgcnrlvf56cxlgxdsai5f1k9pf7cq80zssrbfbabrasbkk2v
NarSize: 43380024
References: 
Deriver: 51mif1gp7i3igq8d9a9aff073qn9drd4-element-web-1.10.10.drv
Sig: cache.nixos.org-1:kAT7/4P/GNm0kmFoTss6klYET4ZdYjNhs7OW3q3jDIQMXZqiTdXyZ8uLW0k2eiyuusgdBOSHzvZpcRjxWUFXBw==

Names of nar archives on https://cache.nixos.org are their base32 hashes. For example:

$ nix hash file <(curl https://cache.nixos.org/nar/1jh8kd7ql2v73fdspp1v08hvfdfxjvl2i86g409ckgpy8k6l41g8.nar.xz) | \
  xargs nix hash to-base32
1jh8kd7ql2v73fdspp1v08hvfdfxjvl2i86g409ckgpy8k6l41g8
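
One way to avoid the dangling references described in this issue is to delete a nar file only when no live narinfo still points at it. The sketch below works on narinfo text held in memory. The parse_narinfo here is a simplified stand-in for the function of the same name used by the script, and nar_urls and nars_safe_to_delete are hypothetical names introduced for illustration.

```python
from typing import Dict, Iterable, Set


def parse_narinfo(text: str) -> Dict[str, str]:
    """Parse 'Key: value' lines of a narinfo file into a dict (simplified)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as
        # 'sha256:...' keep their own colons intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def nar_urls(narinfos: Iterable[str]) -> Set[str]:
    """Collect the URL field of every narinfo that has one."""
    return {info["URL"] for info in map(parse_narinfo, narinfos) if "URL" in info}


def nars_safe_to_delete(dead: Iterable[str], live: Iterable[str]) -> Set[str]:
    """Nar URLs referenced by dead narinfos and by no live narinfo."""
    return nar_urls(dead) - nar_urls(live)
```

With the two element-web narinfo files shown above, if one is dead and the other live, the shared nar/0ji0p1g4...nar.xz is excluded from the result, so only the dead narinfo file itself would be unlinked.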

Request and RFC for adding Lean4 mirrors

To mirror Lean4 the task is split into:

  1. The Elan installer itself and its init scripts
  2. The Lean4 toolchains binary
  3. The Mathlib4 library and its recursive dependencies
  4. The Mathlib4 web docs
  5. The Mathlib4 cache

The Elan installer itself and its init scripts

init scripts: elan-init.sh, elan-init.ps1

  • mirror init scripts

The mirror can change the variable ELAN_UPDATE_ROOT or ElanRoot to point to the mirrored location. The request URL structure is exactly that of GitHub releases.

  • mirror Elan binary releases

The Lean4 toolchains binary

In the Elan repo, src/elan-dist/src/manifestation.rs and src/elan-dist/src/dist.rs should accept a configurable custom URL, as rustup does. (See src/config.rs)

  • make elan read env vars
  • mirror Lean4 binary releases

The Mathlib4 library and its recursive dependencies

It would be better to require the packages directly from the TUNA mirror. The dependency URLs would need to be rewritten recursively and automatically.

  • mirror Mathlib4 library and its recursive dependencies git repo

The Mathlib4 web docs

See https://github.com/leanprover-community/mathlib4#building-html-documentation

  • mirror the web docs

The Mathlib4 cache

The Mathlib4 cache is stored in Azure blob storage. It can be replaced by an Azure-compatible server.

See https://github.com/leanprover-community/mathlib4/blob/0469f845e132ccd0e56c40aafd34bd9084c104bb/Cache/Requests.lean#L14

  • make Mathlib4 cache read env var
  • set up Azure compatible server
  • mirror Mathlib4 cache

I have drafted some checkboxes above as an initial plan for mirroring the Lean4 ecosystem. It would be a great help if TUNA is willing to mirror it!

It would be better if someone more familiar with the TUNA mirror system could take this on. If no one is available, I can do most of the work above once I have learned how to debug and test the TUNA mirror system. I have basic skills in Lean4 and general programming, and I think I can handle the programming tasks on both sides, TUNA and the Lean4 ecosystem...
