pakrat's Introduction

Pakrat

A tool to mirror and version YUM repositories

What does it do?

  • You invoke pakrat and pass it some information about your repositories.
  • Pakrat mirrors the YUM repositories, and optionally arranges the data in a versioned manner.

It is easiest to demonstrate what Pakrat does by shell example:

$ pakrat --repodir /etc/yum.repos.d

  repo              done/total       complete    metadata
  -------------------------------------------------------
  base               357/6381        5%          -         
  updates            112/1100        10%         -         
  extras              13/13          100%        complete  

  total:             482/7494        6%
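
The figures in the table can be reproduced with a few lines of Python. This is only an illustrative sketch: the counts are copied from the sample output above, and truncating the percentage to a whole number is an assumption that happens to match every row shown (it is not taken from Pakrat's source). The helper name percent_complete is hypothetical.

```python
# Per-repository (done, total) package counts copied from the sample output.
progress = {
    "base": (357, 6381),
    "updates": (112, 1100),
    "extras": (13, 13),
}


def percent_complete(done, total):
    """Whole-number percentage, truncated (assumed; matches the sample rows)."""
    return done * 100 // total if total else 0


# The "total" row is the column-wise sum over all repositories.
total_done = sum(done for done, _ in progress.values())
total_pkgs = sum(total for _, total in progress.values())

for repo_id, (done, total) in progress.items():
    print("%-10s %5d/%-6d %3d%%" % (repo_id, done, total,
                                    percent_complete(done, total)))
print("%-10s %5d/%-6d %3d%%" % ("total:", total_done, total_pkgs,
                                percent_complete(total_done, total_pkgs)))
```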

Features

  • Mirror repository packages from remote sources
  • Optional repository versioning with user-defined version schema
  • Mirror YUM group metadata
  • Supports standard YUM configuration files
  • Supports YUM configuration directories (repos.d style)
  • Supports command-line repos for zero-configuration (--name and --baseurl)
  • Command-line interface with real-time progress indicator
  • Parallel repository downloads for maximum efficiency
  • Syslog integration
  • Supports user-specified callbacks

Installation

Pakrat is available in PyPI as pakrat. That means you can install it with easy_install:

# easy_install pakrat

NOTE: Installation from PyPI should work on any Linux distribution. However, Pakrat depends on YUM and Createrepo, which are not available on PyPI, so a missing copy of either will not be detected at install time. The easiest path is to install on a RHEL-like system:

# yum -y install createrepo
# easy_install pakrat

How to use it

The simplest possible example would involve mirroring a YUM repository in a very basic way, using the CLI:

$ pakrat --name centos --baseurl http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
├── Packages
└── repodata

A slightly more complex example would be to version the same repository. To do this, you must pass in a version number. An easy example is to mirror a repository daily.

$ pakrat \
    --repoversion $(date +%Y-%m-%d) \
    --name centos \
    --baseurl http://mirror.centos.org/centos/6/os/x86_64
$ tree -d centos
centos/
├── 2013-07-29
│   ├── Packages -> ../Packages
│   └── repodata
├── latest -> 2013-07-29
└── Packages

If you were to configure the above command to run on a daily schedule, eventually you would see something like:

$ tree -d centos
centos/
├── 2013-07-29
│   ├── Packages -> ../Packages
│   └── repodata
├── 2013-07-30
│   ├── Packages -> ../Packages
│   └── repodata
├── 2013-07-31
│   ├── Packages -> ../Packages
│   └── repodata
├── latest -> 2013-07-31
└── Packages
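
The layout above keeps a single shared Packages directory, with each version directory holding its own repodata plus a relative symlink back to the shared packages. The following sketch rebuilds that directory shape in a temporary directory to make the structure concrete. It is not Pakrat's implementation; the function name add_version is hypothetical and no packages or metadata are actually created.

```python
import os
import tempfile


def add_version(repo_root, version):
    """Create <repo_root>/<version> that shares the common Packages directory."""
    packages = os.path.join(repo_root, "Packages")
    version_dir = os.path.join(repo_root, version)
    os.makedirs(packages, exist_ok=True)
    os.makedirs(os.path.join(version_dir, "repodata"), exist_ok=True)

    # Relative link, so the tree keeps working if repo_root is moved.
    pkg_link = os.path.join(version_dir, "Packages")
    if not os.path.islink(pkg_link):
        os.symlink(os.path.join("..", "Packages"), pkg_link)

    # Repoint "latest" at the version that was just added.
    latest = os.path.join(repo_root, "latest")
    if os.path.islink(latest):
        os.remove(latest)
    os.symlink(version, latest)


repo_root = os.path.join(tempfile.mkdtemp(), "centos")
for day in ("2013-07-29", "2013-07-30", "2013-07-31"):
    add_version(repo_root, day)

print(sorted(os.listdir(repo_root)))
print(os.readlink(os.path.join(repo_root, "latest")))
```

Because every version links to the same Packages directory, a package that appears in several snapshots is stored on disk only once in this layout.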

You can also opt to have a combined repository for each of your repos. This is useful because you could simply point your clients to the root of your repository, and they will have access to its complete history of RPMs. You can do this by passing in the --combined option when versioning repositories.

Pakrat is also capable of handling multiple YUM repositories in the same mirror run. If multiple repositories are specified, each repository will get its own download thread. This is handy if you are syncing from a mirror that is not particularly quick. The other repositories do not need to wait on it to finish.

$ pakrat \
    --repoversion $(date +%Y-%m-%d) \
    --name centos --baseurl http://mirror.centos.org/centos/6/os/x86_64 \
    --name epel --baseurl http://dl.fedoraproject.org/pub/epel/6/x86_64
$ tree -d centos epel
centos/
├── 2013-07-29
│   ├── Packages -> ../Packages
│   └── repodata
├── latest -> 2013-07-29
└── Packages
epel/
├── 2013-07-29
│   ├── Packages -> ../Packages
│   └── repodata
├── latest -> 2013-07-29
└── Packages
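
When the list of repositories lives in a script rather than on the command line, the same invocation can be assembled programmatically. The sketch below only builds the argument list using the --repoversion, --name and --baseurl options shown above. The helper build_pakrat_command is a hypothetical name, and actually running the command requires pakrat to be installed.

```python
import datetime


def build_pakrat_command(repos, version=None):
    """Return an argv list for one pakrat run over (name, baseurl) pairs."""
    argv = ["pakrat"]
    if version is not None:
        argv += ["--repoversion", version]
    # Each repository contributes its own --name/--baseurl pair.
    for name, baseurl in repos:
        argv += ["--name", name, "--baseurl", baseurl]
    return argv


repos = [
    ("centos", "http://mirror.centos.org/centos/6/os/x86_64"),
    ("epel", "http://dl.fedoraproject.org/pub/epel/6/x86_64"),
]
today = datetime.date.today().strftime("%Y-%m-%d")
command = build_pakrat_command(repos, version=today)
print(" ".join(command))
# To run it for real (needs pakrat on PATH): subprocess.check_call(command)
```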

Configuration can also be passed in from YUM configuration files. See the CLI --help for details.
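
For reference, a YUM configuration file is an INI-style file with one section per repository. The sketch below renders such a file for the two repositories used above, using Python's standard configparser module. The function name render_repo_file is hypothetical, and only the common name, baseurl and enabled keys are written; which keys Pakrat itself reads is not stated here.

```python
import configparser
import io


def render_repo_file(repos):
    """Render {repo_id: baseurl} as text in YUM's INI-style .repo format."""
    parser = configparser.ConfigParser()
    for repo_id, baseurl in repos.items():
        parser[repo_id] = {
            "name": repo_id,
            "baseurl": baseurl,
            "enabled": "1",
        }
    buf = io.StringIO()
    # YUM repo files are conventionally written as key=value without spaces.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


repo_text = render_repo_file({
    "centos": "http://mirror.centos.org/centos/6/os/x86_64",
    "epel": "http://dl.fedoraproject.org/pub/epel/6/x86_64",
})
print(repo_text)
```

The resulting text could be saved with a .repo extension in a directory that is then handed to pakrat with --repodir, as in the first example of this README.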

Pakrat also exposes its interfaces in plain Python for integration with other projects and software. A good starting point for using Pakrat via the Python API is the pakrat.sync function. The CLI calls this function almost exclusively, so its usage should be fairly straightforward (all arguments are named and optional):

pakrat.sync(basedir, objrepos, repodirs, repofiles, repoversion, delete, callback)

Another handy Python function is pakrat.repo.factory, which creates YUM repository objects so that no file-based configuration is needed.

pakrat.repo.factory(name, baseurls=None, mirrorlist=None)
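
The two calls can be combined into a daily versioned sync. The following is a minimal sketch under stated assumptions: that baseurls accepts a list of URL strings, and that basedir is the directory the mirror is written to. Neither is confirmed by the signatures above. The wrapper name sync_daily is hypothetical, and the pakrat module is passed in as an argument so the function can be tried against a stand-in object on machines without YUM.

```python
import datetime


def sync_daily(pakrat, basedir, sources):
    """Mirror every (name, baseurl) pair in `sources` under today's date.

    `pakrat` is the imported pakrat module (or a stand-in with the same
    `repo.factory` and `sync` attributes).
    """
    version = datetime.date.today().strftime("%Y-%m-%d")
    # Assumption: `baseurls` takes a list of URL strings.
    repos = [pakrat.repo.factory(name, baseurls=[url]) for name, url in sources]
    # Assumption: `basedir` is where the mirrored repositories are written.
    pakrat.sync(basedir=basedir, objrepos=repos, repoversion=version)
    return version


# Usage on a machine with pakrat installed (path is illustrative):
#   import pakrat
#   sync_daily(pakrat, "/srv/mirror",
#              [("centos", "http://mirror.centos.org/centos/6/os/x86_64")])
```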

User-defined callbacks

Since the YUM team did a decent job at externalizing the progress data, pakrat will return the favor by exposing the same data, plus some extras via user callbacks.

A user callback is a simple class that implements some methods for handling received data. It is not mandatory to implement any of the methods.

A few of the available user callbacks in pakrat come directly from the urlgrabber interface (namely, any user callback beginning with download_). The other methods are called by pakrat itself, which explains why the interfaces vary.

The supported user callbacks are listed in the following method signatures:

""" Called when the number of packages a repository contains becomes known """
repo_init(repo_id, num_pkgs)

""" Called when 'createrepo' begins running and when it completes """
repo_metadata(repo_id, status)

""" Called when a repository finishes downloading all packages """
repo_complete(repo_id)

""" Called whenever an exception is thrown from a repo thread """
repo_error(repo_id, error)

""" Called when a package becomes known as 'already downloaded' """
local_pkg_exists(repo_id, pkgname)

""" Called when a file begins downloading (non-exclusive) """
download_start(repo_id, fpath, url, fname, fsize, text)

""" Called during downloads, 'size' is bytes downloaded """
download_update(repo_id, size)

""" Called when a file download completes, 'size' is file size in bytes """
download_end(repo_id, size)

The following is a basic example of how to use user callbacks in pakrat. Note that an instance of the class is passed into the pakrat.sync() call as the named argument callback.

import pakrat

class mycallback(object):
    def log(self, msg):
        with open('log.txt', 'a') as logfile:
            logfile.write('%s\n' % msg)

    def repo_init(self, repo_id, num_pkgs):
        self.log('Found %d packages in repo %s' % (num_pkgs, repo_id))

    def download_start(self, repo_id, _file, url, basename, size, text):
        self.fname = basename

    def download_end(self, repo_id, size):
        if self.fname.endswith('.rpm'):
            self.log('%s, repo %s, size %d' % (self.fname, repo_id, size))

    def repo_metadata(self, repo_id, status):
        self.log('Metadata for repo %s is now %s' % (repo_id, status))

myrepo = pakrat.repo.factory(
    'extras',
    mirrorlist='http://mirrorlist.centos.org/?repo=extras&release=6&arch=x86_64'
)

mycallback_instance = mycallback()
pakrat.sync(objrepos=[myrepo], callback=mycallback_instance)

If you run the above example, and then take a look in the log.txt file (which the user callbacks should have created), you will see something like:

Found 13 packages in repo extras
bakefile-0.2.8-3.el6.centos.x86_64.rpm, repo extras, size 256356
centos-release-cr-6-0.el6.centos.x86_64.rpm, repo extras, size 3996
centos-release-xen-6-2.el6.centos.x86_64.rpm, repo extras, size 4086
freenx-0.7.3-9.4.el6.centos.x86_64.rpm, repo extras, size 99256
jfsutils-1.1.13-9.el6.x86_64.rpm, repo extras, size 244104
nx-3.5.0-2.1.el6.centos.x86_64.rpm, repo extras, size 2807864
opennx-0.16-724.el6.centos.1.x86_64.rpm, repo extras, size 1244240
python-empy-3.3-5.el6.centos.noarch.rpm, repo extras, size 104632
wxBase-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 586068
wxGTK-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 3081804
wxGTK-devel-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 1005036
wxGTK-gl-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 31824
wxGTK-media-2.8.12-1.el6.centos.x86_64.rpm, repo extras, size 38644
Metadata for repo extras is now working
Metadata for repo extras is now complete
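
Since a user callback is just a class with some of the methods listed above, it can be exercised without touching the network by calling those methods directly. The sketch below keeps per-repository counters in memory instead of writing a log file. The class name ProgressTracker and its summary method are hypothetical, the calls at the bottom are simulated rather than produced by Pakrat, and the byte total may include non-package files if download_end is also reported for them.

```python
class ProgressTracker(object):
    """Collects per-repository counters from pakrat-style callback calls."""

    def __init__(self):
        self.expected = {}  # repo_id -> package count announced by repo_init
        self.cached = {}    # repo_id -> packages reported as already local
        self.bytes = {}     # repo_id -> bytes reported by download_end
        self.errors = {}    # repo_id -> list of error messages

    def repo_init(self, repo_id, num_pkgs):
        self.expected[repo_id] = num_pkgs

    def local_pkg_exists(self, repo_id, pkgname):
        self.cached[repo_id] = self.cached.get(repo_id, 0) + 1

    def download_end(self, repo_id, size):
        self.bytes[repo_id] = self.bytes.get(repo_id, 0) + size

    def repo_error(self, repo_id, error):
        self.errors.setdefault(repo_id, []).append(str(error))

    def summary(self, repo_id):
        return "%s: %d expected, %d cached, %d bytes, %d errors" % (
            repo_id,
            self.expected.get(repo_id, 0),
            self.cached.get(repo_id, 0),
            self.bytes.get(repo_id, 0),
            len(self.errors.get(repo_id, [])),
        )


# Simulated calls, standing in for what a sync run would deliver.
tracker = ProgressTracker()
tracker.repo_init("extras", 13)
tracker.local_pkg_exists("extras", "bakefile-0.2.8-3.el6.centos.x86_64.rpm")
tracker.download_end("extras", 3996)
tracker.download_end("extras", 4086)
tracker.repo_error("extras", ValueError("example failure"))
print(tracker.summary("extras"))
```

An instance of such a class would be handed to pakrat.sync() through the callback argument in the same way as mycallback_instance above.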

Building an RPM

Pakrat can be easily packaged into an RPM.

  1. Download a release and name the tarball pakrat.tar.gz:
curl -o pakrat.tar.gz -L https://github.com/ryanuber/pakrat/archive/master.tar.gz
  2. Build it into an RPM:
rpmbuild -tb pakrat.tar.gz

What's missing

  • Unit tests (preliminary work done in unit_test branch)

Thanks

Thanks to Keith Chambers for help with the ideas and useful input on CLI design.

pakrat's Issues

RHEL7

When trying to use pakrat on a fully patched RHEL 7.2 system, I'm getting the following error:

# pakrat --repofile /etc/yum.repos.d/redhat.repo --outdir /yum --repoversion 20160401

repo              done/total       complete    metadata

repo              done/total       complete    metadata
-------------------------------------------------------
rhel-ha-for-rhel-7-server-eus-rpms       error
rhel-7-server-rpms       error
rhel-7-server-eus-rpms       error
rhel-rs-for-rhel-7-server-eus-rpms       error
rhel-ha-for-rhel-7-server-rpms       error

total:               0/-13846      0%

errors(5):
start() got an unexpected keyword argument 'now'
start() got an unexpected keyword argument 'now'
start() got an unexpected keyword argument 'now'
start() got an unexpected keyword argument 'now'
start() got an unexpected keyword argument 'now'

Finished in 0:00:30

I saw #5 and wonder whether this is a similar issue.

[question] space used ?

Hi,

Glad to see this kind of project exists. I would love to see an equivalent of https://snapshot.debian.org/ some day, but that's not the purpose of my question :)

Is there some kind of deduplication of identical files (hardlinks?) to reduce the space used by periodic mirroring?

I haven't tested it yet, but it sounds very promising :)

Thanks,

Regards,

Unexpected keyword argument when updating local repo.

When initially using pakrat to create a local mirror, everything downloads and is created with no issues.

When trying to update a local repo I get the error:

errors(1):
start() got an unexpected keyword argument 'filename'

I am running the command: pakrat --repoversion=Updates-$(date +%Y-%m-%d) --name=centos7 --baseurl=http://mirror.bytemark.co.uk/centos/7/updates/x86_64/

This is being run on a CentOS 7 VM in VMware. I have also tried different mirrors. It happens with all directories from the mirrors, not just "updates" but also "centosplus", "extras", etc.

Any help in resolving the issue would be appreciated.

Support sha1 for CentOS/RHEL 5 repos

If Pakrat is run from a RHEL 6 node it uses sha256 checksum when creating repository metadata. This breaks any RHEL 5 repositories as they only support sha1. It would be really nice to have the ability to specify the checksum type Pakrat uses to create the repository metadata.

Downloads Packages but not the repo data

I've noticed that running the command:
pakrat --name 7.4.1708 --baseurl http://..*./repository/7.4.1708/os/x86_64/

only downloads the packages, not the repo data. That means I can't turn around and use this snapshot as a yum repo.

versioning only?

I would like to use pakrat to create versioned repos, but keep using mrepo to mirror (because I mirror RHN repos, which pakrat AFAIK does not support).

I can't figure out whether I can coerce it into doing only that.
