Arthur

King Arthur commands his loyal knight Perceval on the quest to fetch data from software repositories.

Arthur is a distributed job queue platform that schedules and executes Perceval jobs. The platform is composed of two components: arthurd, the server that schedules the jobs, and one or more instances of arthurw, the workhorses that run each Perceval job.

The repositories whose data will be fetched are added to the platform using a REST API. The server then transforms these repositories into Perceval jobs and distributes them among its job queues.

Workers wait for new jobs by listening on these queues, and each worker executes only one job at a time. When a new job arrives, an idle worker takes it and runs it. Once a job finishes successfully, the server re-schedules it to retrieve new data.
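The scheduling cycle above can be modelled with a toy in-memory queue. This is a hypothetical sketch for illustration only: Arthur itself keeps its queues in Redis (presumably through python3-rq), and the `MiniScheduler` class and its methods are invented here. It shows three behaviours: repositories become jobs, one job runs at a time, and only successful jobs are put back on the queue.

```python
from collections import deque


class MiniScheduler:
    """Hypothetical in-memory model of Arthur's job cycle (not Arthur's code)."""

    def __init__(self):
        self.queue = deque()  # pending jobs, one per repository origin
        self.completed = []   # log of (origin, succeeded) pairs

    def add_repository(self, origin):
        """Turn a repository into a job and append it to the queue."""
        self.queue.append(origin)

    def run_one(self, fetch):
        """An idle worker takes the next job and runs it to completion."""
        origin = self.queue.popleft()
        try:
            fetch(origin)
            ok = True
        except Exception:
            ok = False
        self.completed.append((origin, ok))
        if ok:
            # Successful jobs are re-scheduled to retrieve new data later;
            # in this sketch failed jobs are simply not re-queued.
            self.queue.append(origin)
        return ok
```

For instance, after adding `"https://github.com/grimoirelab/arthur.git"` and `"bugzilla_redhat"` and running a `fetch` that fails only for the Bugzilla origin, two calls to `run_one` leave only the Git job on the queue.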

By default, the items fetched by each job are published to a Redis queue. Additionally, they can be written to an Elasticsearch index.
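A consumer that drains such a queue from Python could look like the sketch below. It assumes the items sit in a Redis list. The key name and the way each item is encoded depend on your Arthur version and are not documented here, so both are left as parameters. `drain_items` is a hypothetical helper, not part of Arthur.

```python
def drain_items(conn, key, decode):
    """Pop every queued item from the Redis list `key` and decode it.

    `conn` is any client exposing lpop(key), such as redis.Redis from
    redis-py. `key` and `decode` must match how your Arthur installation
    publishes items; both are assumptions in this sketch.
    """
    items = []
    while True:
        raw = conn.lpop(key)
        if raw is None:  # the list is empty (or does not exist)
            break
        items.append(decode(raw))
    return items
```

With redis-py, a call might look like `drain_items(redis.Redis.from_url("redis://localhost/8"), "items", pickle.loads)`. Treat the `"items"` key and the `pickle` decoder as guesses to verify against your installation. Note that popping removes items, so other consumers of the same list will not see them.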

Usage

arthurd

usage: arthurd [-c <file>] [-g] [-h <host>] [-p <port>] [-d <database>]
               [--es-index <index>] [--log-path <path>] [--cache-path <cpath>]
               [--no-cache] [--no-daemon] | --help

King Arthur commands his loyal knight Perceval on the quest
to retrieve data from software repositories.

This command runs an Arthur daemon that waits for HTTP requests
on port 8080. Repositories to analyze are added using a REST API.
Repositories are transformed into Perceval jobs that will be
scheduled and run using a distributed job queue platform.

optional arguments:
  -?, --help            show this help message and exit
  -c FILE, --config FILE
                        set configuration file
  -g, --debug           set debug mode on
  -h, --host            set the host name or IP address on which to listen for connections
  -p, --port            set listening TCP port (default: 8080)
  -d, --database        URL database connection (default: 'redis://localhost/8')
  -s, --sync            work in synchronous mode (without workers)
  --es-index            output ElasticSearch server index
  --log-path            path where logs are stored
  --cache-path          path to cache directory
  --no-cache            do not cache fetched raw data
  --no-daemon           do not run arthur in daemon mode
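The `-d/--database` value is a Redis URL. In the default `redis://localhost/8`, the path component selects Redis database number 8 on the standard port. The hypothetical helper below uses only the standard library to show how such a URL breaks down. It is not part of Arthur.

```python
from urllib.parse import urlparse


def parse_redis_url(url):
    """Split a redis:// URL such as 'redis://localhost/8' into
    host, port and database number (the path component)."""
    parts = urlparse(url)
    if parts.scheme != "redis":
        raise ValueError("expected a redis:// URL, got %r" % url)
    path = parts.path.lstrip("/")
    return {
        "host": parts.hostname or "localhost",
        "port": parts.port or 6379,      # Redis' standard port
        "db": int(path) if path else 0,  # Redis uses db 0 when none is given
    }
```

`parse_redis_url("redis://localhost/8")` returns `{"host": "localhost", "port": 6379, "db": 8}`.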

arthurw

usage: arthurw [-g] [-d <database>] [--burst] [<queue1>...<queueN>] | --help

King Arthur's worker. It will run Perceval jobs on the quest
to retrieve data from software repositories.

positional arguments:
   queues               list of queues this worker will listen for
                        ('create' and 'update', by default)

optional arguments:
  -?, --help            show this help message and exit
  -g, --debug           set debug mode on
  -d, --database        URL database connection (default: 'redis://localhost/8')
  -b, --burst           Run in burst mode (quit after all work is done)
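python3-rq workers check the queues they are given in order, so queues listed earlier take priority. In burst mode, a worker exits once every queue is empty. The sketch below models that loop with in-memory deques. `work_burst` is a hypothetical name, and this is a simplified illustration rather than rq's or Arthur's implementation. Only burst mode is modelled, because a regular worker would block waiting for new jobs.

```python
from collections import deque


def work_burst(queues, run):
    """Process jobs one at a time from `queues`, highest priority first.

    `queues` maps queue name -> deque of jobs. Its insertion order stands
    for the order given on the command line (e.g. 'create' before 'update').
    Returns the (queue, job) pairs in the order they were run.
    """
    done = []
    while any(queues.values()):  # burst mode: stop when all queues are empty
        for name, jobs in queues.items():
            if jobs:
                job = jobs.popleft()
                run(job)  # a worker executes a single job at a time
                done.append((name, job))
                break     # restart the scan from the first queue
    return done


if __name__ == "__main__":
    qs = {"create": deque(["repo-a"]), "update": deque(["repo-b", "repo-c"])}
    print(work_burst(qs, lambda job: None))
```

Because the scan restarts from the first queue after every job, a job added to `create` while `update` jobs are pending is run next.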

Requirements

  • Python >= 3.4
  • Redis >= 2.3
  • python3-dateutil >= 2.0
  • python3-redis >= 2.10
  • python3-rq >= 0.6
  • python3-cherrypy >= 5.1.0
  • perceval >= 0.2

Installation

$ pip3 install -r requirements.txt
$ python3 setup.py install

How to run it

The first step is to run a Redis server, which Arthur's components use to communicate with each other. Optionally, an Elasticsearch server can be used to store the items generated by jobs. Refer to their respective documentation for installation and startup instructions.
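Before starting arthurd, you can check that both services accept TCP connections. The helper below is a hypothetical preflight check, not part of Arthur. It assumes the standard ports, 6379 for Redis and 9200 for Elasticsearch (the port used in the examples below).

```python
import socket


def service_is_up(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    for name, port in (("Redis", 6379), ("Elasticsearch", 9200)):
        state = "up" if service_is_up("localhost", port) else "not reachable"
        print("%s on port %d: %s" % (name, port, state))
```

A successful connection only shows that something is listening on the port. It does not confirm that the service is healthy.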

To run the Arthur server:

$ arthurd -g -d redis://localhost/8 --es-index http://localhost:9200/items --log-path /tmp/logs/arthurd --no-cache

To run a worker:

$ arthurw -d redis://localhost/8

To add jobs to Arthur, create a JSON object containing the repositories to analyze and the Perceval parameters needed for each backend.

$ cat repositories.json

{
 "repositories" : [
    {
     "origin" : "https://github.com/grimoirelab/arthur.git",
     "backend" : "git",
     "args" : {
         "gitpath" : "/tmp/git/",
         "uri" : "https://github.com/grimoirelab/arthur.git",
         "cache" : true,
         "cache_fetch" : false
         }
    },
    {
     "origin" : "bugzilla_redhat",
     "backend" : "bugzilla",
     "args" : {
         "url" : "https://bugzilla.redhat.com/",
         "from_date" : "2016-09-19",
         "cache" : true,
         "cache_fetch" : false,
         "delay" : 60
         }
    }
  ]
}
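If you generate the list of repositories from a script, you can build the same document in Python. The sketch below reproduces `repositories.json` exactly. The `repository` helper is hypothetical. Each entry carries the three keys shown above: `origin`, the Perceval `backend` name, and the backend-specific `args`. Python's `True`/`False` serialize to JSON `true`/`false`.

```python
import json


def repository(origin, backend, **args):
    """Build one entry of the 'repositories' list."""
    return {"origin": origin, "backend": backend, "args": args}


payload = {
    "repositories": [
        repository("https://github.com/grimoirelab/arthur.git", "git",
                   gitpath="/tmp/git/",
                   uri="https://github.com/grimoirelab/arthur.git",
                   cache=True, cache_fetch=False),
        repository("bugzilla_redhat", "bugzilla",
                   url="https://bugzilla.redhat.com/",
                   from_date="2016-09-19",
                   cache=True, cache_fetch=False, delay=60),
    ]
}

if __name__ == "__main__":
    # Write the file consumed by the curl command in the next step.
    with open("repositories.json", "w") as f:
        json.dump(payload, f, indent=2)
```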

Then, send this JSON document to the server by calling the add method:

$ curl -H "Content-Type: application/json" --data @repositories.json http://127.0.0.1:8080/add
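The same request can be sent from Python with only the standard library. The sketch below prepares a POST with a JSON body and a JSON content type for the `/add` endpoint, which is what the curl command does. `build_add_request` is a hypothetical helper. The server address defaults to the one used above.

```python
import json
import urllib.request


def build_add_request(payload, server="http://127.0.0.1:8080"):
    """Prepare a POST of `payload` as JSON to the server's /add endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        server.rstrip("/") + "/add",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With arthurd running, you could send it with `urllib.request.urlopen(build_add_request(json.load(open("repositories.json"))))`.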

In this example, items will be stored in the items index of the Elasticsearch server (http://localhost:9200/items).
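The `--es-index` value combines the Elasticsearch server URL and the index name. To inspect the stored items, you can query that index through Elasticsearch's standard `_search` endpoint. The hypothetical helper below, which is not part of Arthur, derives that URL from the `--es-index` value.

```python
from urllib.parse import urlsplit, urlunsplit


def es_search_url(es_index_url):
    """Turn an --es-index value like 'http://localhost:9200/items'
    into the _search URL of that index."""
    parts = urlsplit(es_index_url)
    index = parts.path.strip("/")
    if not index or "/" in index:
        raise ValueError("expected <server>/<index>, got %r" % es_index_url)
    return urlunsplit((parts.scheme, parts.netloc, "/" + index + "/_search", "", ""))
```

For the example above, `es_search_url("http://localhost:9200/items")` gives `http://localhost:9200/items/_search`, which you can open with `urllib.request.urlopen` or curl.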

License

Licensed under GNU General Public License (GPL), version 3 or later.
