

Replicator


Replicator is a CLI application for replicating a Wikibase entity base such as Wikidata.

Replicator can import entities from the Wikidata API and from Wikibase dumps in various formats. It features abort/resume, graceful error handling, progress reporting, dynamic fetching of dependencies, API batching and standalone installation (no MediaWiki or Wikibase installation of your own is required). Furthermore, it uses the same deserialization code as Wikibase itself, so it is always fully compatible with the Wikibase data format.

By default, information is written to the QueryR EntityStore and QueryR TermStore, as Replicator was created to populate the QueryR REST API. With some simple PHP additions, you can write to storage targets of your choosing.

Installation

Installation with Vagrant (inside a virtual machine)

Get a copy of the code and make sure you have Vagrant installed.

Copy config/db-sqlite-example.json to config/db.json.

cp config/db-sqlite-example.json config/db.json

Then, inside the root directory of the project, execute:

vagrant up
vagrant ssh

Once you're ssh'd into the VM, you can find Replicator fully installed in /vagrant.

cd /vagrant
./replicator

If you get database authentication errors, check that config/db.json contains the right credentials. For testing Replicator, you can switch to an in-memory SQLite database:

{
	"driver": "pdo_sqlite",
	"memory": true
}

Local installation

Make sure you have all system dependencies:

  • PHP 7
  • php7.0-mysql
  • php7.0-sqlite (only needed for running the tests)

For an always up-to-date list, see build/vagrant/install_packages.sh.

Clone the git repository and move into its directory.

Enter the details of your database in config/db.json. An example can be found in config/db-example.json. The parameters are passed directly to Doctrine DBAL; a list of the available parameters can be found in the DBAL docs: http://docs.doctrine-project.org/projects/doctrine-dbal/en/latest/reference/configuration.html
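As an illustration, the shell function below writes a MySQL connection config. It is a sketch, not a copy of config/db-example.json: driver, host, port, dbname, user and password are standard Doctrine DBAL connection parameters, but the write_db_config name and every value are placeholders to replace with your own.

```shell
# Hypothetical helper: write a Doctrine DBAL connection config for MySQL.
# The target defaults to config/db.json; all values are placeholders.
write_db_config() {
  local target=${1:-config/db.json}
  cat > "$target" <<'EOF'
{
    "driver": "pdo_mysql",
    "host": "localhost",
    "port": 3306,
    "dbname": "replicator",
    "user": "replicator",
    "password": "change-me"
}
EOF
}
```

Run it from the project root (or pass another path as the first argument), then edit the placeholder values before continuing.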

Get Composer and execute:

composer install
php replicator install

If you downloaded the composer.phar executable instead, run the install command as follows:

php composer.phar install

Replicator uses certain functions from the PHP Process Control extension (PCNTL), which some Linux distributions disable by default. You might need to remove them from the disable_functions directive in your php.ini file, in particular the pcntl_signal_dispatch function.
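To check whether a given php.ini disables pcntl_signal_dispatch, something like the following hypothetical shell function can be used. It only inspects uncommented disable_functions lines in the one file you pass it, which is an assumption: the directive can also be set in additional .ini files.

```shell
# Hypothetical helper: succeed (exit 0) if the given php.ini lists
# pcntl_signal_dispatch in an uncommented disable_functions line.
# Lines starting with ";" are comments in php.ini and are ignored.
pcntl_dispatch_disabled() {
  grep -E '^[[:space:]]*disable_functions[[:space:]]*=' "$1" \
    | grep -q 'pcntl_signal_dispatch'
}
```

`php --ini` prints the loaded configuration file and any additional .ini files that were parsed, so those are the files worth passing to the function.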

Updating

git pull
composer update

Removal

This will remove Replicator from the system, without deleting the application files themselves.

php replicator uninstall

Usage

List of commands:

php replicator

Importing extracted JSON dumps

Importing a JSON dump:

php replicator import:json tests/data/simple/five-entities.json -v

Import command help:

php replicator help import:json

The command can be aborted with Ctrl+C. It will exit gracefully and provide you with the page position marker needed to resume the import.

php replicator import:json tests/data/simple/five-entities.json -v --continue 66943

Importing compressed JSON dumps

Importing a gzipped JSON dump:

php replicator import:gz tests/data/simple/five-entities.json.gz -v

Import command help:

php replicator help import:gz

The command can be aborted with Ctrl+C. It will exit gracefully and provide you with the page position marker needed to resume the import.

php replicator import:gz tests/data/simple/five-entities.json.gz -v --continue=76071

Bzip2 support is also available via the import:bz2 command. However, beware that at the time of writing this documentation (November 2015), the Wikidata bz2 dumps have an issue that prevents PHP (and thus this library) from reading them entirely.
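One possible workaround, not provided or tested by Replicator itself, is to recompress a bz2 dump with the external bzip2 and gzip tools and import the result with import:gz. This assumes the problem lies in how PHP reads the bz2 stream rather than in the dump contents, and it needs enough disk space for the gzipped copy. The bz2_to_gz function name is hypothetical, and the sketch assumes bash.

```shell
# Hypothetical helper (bash): recompress a .bz2 dump as .gz using the bzip2 and gzip CLIs.
# bzip2 -dc also decompresses files made of several concatenated bzip2 streams.
# The output defaults to the input path with .bz2 replaced by .gz.
bz2_to_gz() {
  local src=$1
  local dst=${2:-${src%.bz2}.gz}
  # pipefail makes a bzip2 failure fail the whole pipeline, not just gzip.
  ( set -o pipefail; bzip2 -dc "$src" | gzip -c > "$dst" )
}
```

For example, after `bz2_to_gz dump.json.bz2` the file dump.json.gz can be imported with `php replicator import:gz dump.json.gz -v`.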

Importing from the Wikidata.org API

Importing entities via the web API:

php replicator import:api Q1 Q2 Q1337 -v

Including referenced entities:

php replicator import:api Q1 Q2 Q1337 -v -r

Import command help:

php replicator help import:api

It is possible to specify ID ranges:

php replicator import:api Q1-Q1000

Multiple ranges and single IDs can be specified:

php replicator import:api Q1 Q100-Q102 P43-P45 Q64
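Large ranges can also be split into several smaller runs, for example so that a failed or interrupted run only needs its own chunk repeated. The id_chunks shell function below is a hypothetical helper, not part of Replicator; it only prints ranges in the Q1-Q1000 syntax shown above and leaves batching within each run to Replicator.

```shell
# Hypothetical helper: split <prefix><start>..<prefix><end> into ranges of at most
# <size> IDs, printed one per line in the "Q1-Q1000" syntax that import:api accepts.
id_chunks() {
  local prefix=$1 from=$2 end=$3 size=$4 to
  while [ "$from" -le "$end" ]; do
    to=$((from + size - 1))
    # Clamp the last chunk to the end of the requested range.
    if [ "$to" -gt "$end" ]; then to=$end; fi
    printf '%s%d-%s%d\n' "$prefix" "$from" "$prefix" "$to"
    from=$((to + 1))
  done
}
```

For example, `for range in $(id_chunks Q 1 5000 1000); do php replicator import:api "$range" -v; done` imports Q1 to Q5000 in five separate runs.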

Importing XML dumps

Importing an XML dump:

php replicator import:xml tests/data/big/5341-revs-3-props.xml -v

Import command help:

php replicator help import:xml

The command can be aborted with Ctrl+C. It will exit gracefully and provide you with the page title needed to resume the import.

php replicator import:xml tests/data/big/5341-revs-3-props.xml --continue Q15826105 -v

Logging

All logs are written to var/log. Each import run writes a detailed log to a dedicated file named after the time the import started. Error events are written to errors.log, a general error file that every import run appends to.
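To follow a running import, a small hypothetical shell function can pick the newest per-run log in var/log. It assumes that directory contains only log files and that the most recently modified file other than errors.log belongs to the current run; since the exact naming scheme is not documented here, it sorts by modification time rather than by name.

```shell
# Hypothetical helper: print the name of the most recently modified log file in
# var/log (or the directory given as $1), skipping the shared errors.log.
# Relies on ls -t, so it assumes plain file names without newlines.
latest_run_log() {
  local dir=${1:-var/log}
  ls -t "$dir" | grep -v '^errors\.log$' | head -n 1
}
```

With it, `tail -f "var/log/$(latest_run_log)"` follows the current run, while `tail -f var/log/errors.log` shows errors from all runs.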

Running the tests

For tests only

composer test

For style checks only

composer cs

For a full CI run

composer ci

Release notes

Version 0.2 (2017-03-06)

  • Upgraded Wikibase DataModel from 4.x to 6.x (needed to work with recent data from Wikidata)
  • Added Vagrant support
  • The query store is no longer installed by default (install with composer require jeroen/query-engine)
  • PHP 7.0 or later is now required (for local installation)

Version 0.1 (2016-01-25)
