Migrate Metadata

Migrate Metadata is a tool to migrate all datasets from a CKAN instance to a metastore-lib backend such as GitHub or a local filesystem.

Installation

migrate-metadata currently supports Python 2.7 only.

First, clone the repo via git:

$ git clone https://github.com/datopian/migrate-metadata.git

Move to directory:

$ cd migrate-metadata

Install all requirements and the package (it is recommended that this is done in a virtual environment):

$ pip install -r requirements.txt .
$ python setup.py develop

Usage

To import all CKAN datasets into a metastore backend, run:

$ metastore-import-ckan -c $CKAN_API_URL -k $CKAN_API_KEY \
                        -m $METASTORE_TYPE -o $METASTORE_OPTIONS

Replace all environment variables above with relevant values.

  • $METASTORE_TYPE should be a metastore-lib backend type
  • $METASTORE_OPTIONS should be a JSON-serialized object with the configuration options expected by the specific metastore-lib backend you are using.

See the metastore-lib documentation for a list of supported backends and their respective configuration options.

Run metastore-import-ckan --help to get the full list of command line options.
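
Because the -o value has to be valid JSON, it can be convenient to generate it from a script rather than quoting it by hand. The following is a minimal sketch, not part of migrate-metadata: build_import_command is a hypothetical helper name, and only the -c, -k, -m and -o flags shown above are assumed to exist.

```python
import json


def build_import_command(ckan_url, api_key, metastore_type, options):
    """Return an argument list for metastore-import-ckan.

    `options` is a plain dict; it is serialized to JSON for the -o flag.
    This helper is illustrative only and is not shipped with the tool.
    """
    return [
        "metastore-import-ckan",
        "-c", ckan_url,
        "-k", api_key,
        "-m", metastore_type,
        "-o", json.dumps(options, sort_keys=True),
    ]


if __name__ == "__main__":
    cmd = build_import_command(
        "http://ckan:5000", "123-abc-321-xyz",
        "filesystem", {"uri": "./metastore"},
    )
    # The list can be passed as-is to subprocess.call(cmd); no shell quoting needed.
    print(cmd)
```

Passing the arguments as a list avoids a shell entirely, so the JSON string reaches the tool unmodified.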

Examples

Import all CKAN datasets from http://ckan:5000 to a local filesystem metastore in ./metastore:

$ metastore-import-ckan -c http://ckan:5000 -k 123-abc-321-xyz \
                        -m filesystem -o '{"uri":"./metastore"}'

Import all CKAN datasets from http://ckan:5000 to a private GitHub repository:

$ GITHUB_OPTS='{
    "github_options": {"login_or_token": "averylongtokenthatwasgeneratedespeciallyforthis"},
    "private":true,
    "default_owner":"myorganization"}'
$ metastore-import-ckan -c http://ckan:5000 -k 123-abc-321-xyz \
                        -m github -o "$GITHUB_OPTS"

The jq command line tool can be useful to debug the output of the migration process while working locally. For example, to list all datasets migrated you can execute:

$ cat $(ls  metastore/p/*/* | grep -v '\.') | jq '.title'
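
If jq is not available, roughly the same check can be done in Python. This is a sketch under assumptions taken from the command above: migrated datasets are JSON files under metastore/p/*/* whose names contain no dot, and each has a top-level title key. list_titles is a hypothetical helper name.

```python
import glob
import json
import os


def list_titles(metastore_dir="metastore"):
    """Return the `title` of every dataset file under <metastore_dir>/p/*/*.

    Files whose name contains a dot are skipped, mirroring `grep -v '\\.'`.
    """
    titles = []
    pattern = os.path.join(metastore_dir, "p", "*", "*")
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path) or "." in os.path.basename(path):
            continue
        with open(path) as f:
            # .get() yields None when a file has no title, like jq's null
            titles.append(json.load(f).get("title"))
    return titles


if __name__ == "__main__":
    for title in list_titles():
        print(title)
```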

Tests

To run tests:

$ make test

License

This project is licensed under the MIT License. See the LICENSE file for details.

migrate-metadata's People

Contributors

pdelboca, shevron, uwaheed88


migrate-metadata's Issues

README with instructions

Can we get some instructions on installation and usage in the README? 😄

Also suggest renaming to metastore-lib-migrator

Tool does not migrate private datasets

The API used by the tool to migrate datasets will not include any private datasets.

Acceptance

  • All datasets in a CKAN instance are migrated including private ones
  • Optionally, add a command line option to decide whether to include private datasets

Analysis

A possible solution is to switch from the package_list API to fetching the list of organizations (organization_list) and then calling group_package_show for each organization. This still needs to be validated.
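
The proposed approach could look something like the sketch below. It is untested against a real CKAN instance and is not the tool's current code. call_action is a hypothetical callable taking an action name and a parameter dict (for example, a thin wrapper around an API client), and whether group_package_show actually returns private datasets for the given API key is exactly the open question above.

```python
def list_datasets_by_org(call_action):
    """Collect datasets by walking organizations instead of package_list.

    `call_action(name, data_dict)` is an injected, hypothetical callable that
    performs a CKAN API action and returns its result.
    """
    seen = {}
    # Assumes organization_list returns organization names or ids
    for org in call_action("organization_list", {}):
        # Assumes group_package_show returns a list of dataset dicts
        for dataset in call_action("group_package_show", {"id": org}):
            # De-duplicate by dataset id, keeping the first occurrence
            seen.setdefault(dataset["id"], dataset)
    return list(seen.values())
```

Note that, by construction, this only finds datasets that belong to some organization; datasets without an owner organization would need to be fetched separately.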

Add proper requirements.txt

This package has some external requirements but no proper requirements management.
Also, the directory structure should be cleaned up, as some generated files appear to be committed (such as MANIFEST.in).

Set up CI

Probably want to have the tests and some quality checks run on push / PR.

Acceptance

  • make test target that runs all tests + quality checks and passes
  • Configured to automatically run in Travis CI on push / PR.
