OAPEN MEMO XOAI Harvester

What is it?

A harvester service that populates the OAPEN MEMO database with data from the OAPEN DSpace XOAI provider.

What does it do?

  1. Read the last harvest date;
  2. Harvest XOAI records since the last harvest date;
  3. Parse the records and save them to the database;
  4. Fetch the corresponding export chunks and save them to the database;
  5. Update the last harvest date.
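
The five steps above can be sketched as one harvest cycle. This is a minimal illustration and not the harvester's actual code: the class HarvestFlowSketch, its three interfaces and all method names are hypothetical, and records are reduced to plain identifier strings.

```java
import java.time.LocalDate;
import java.util.List;

// All names here are hypothetical; the README does not show the harvester's real classes.
final class HarvestFlowSketch {

    /** Keeps the last harvest date between runs (steps 1 and 5). */
    interface StatusStore {
        LocalDate readLastHarvestDate();
        void writeLastHarvestDate(LocalDate date);
    }

    /** Returns identifiers of records changed since the given date (step 2). */
    interface RecordSource {
        List<String> harvestSince(LocalDate from);
    }

    /** Persists parsed records and export chunks (steps 3 and 4). */
    interface Database {
        void saveRecord(String recordId);
        void saveExportChunk(String recordId);
    }

    /** Runs one harvest cycle and returns the number of records handled. */
    static int run(StatusStore status, RecordSource source, Database db, LocalDate harvestDate) {
        LocalDate last = status.readLastHarvestDate();   // 1. read the last harvest date
        List<String> ids = source.harvestSince(last);    // 2. harvest since that date
        for (String id : ids) {
            db.saveRecord(id);                           // 3. parse and save records
        }
        for (String id : ids) {
            db.saveExportChunk(id);                      // 4. fetch and save export chunks
        }
        status.writeLastHarvestDate(harvestDate);        // 5. update the last harvest date
        return ids.size();
    }
}
```

Writing the new date only after steps 3 and 4 have finished means that a run that fails halfway leaves the old date in place, so the next run would pick up the same records again. Whether the real harvester behaves this way is an assumption.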

Configuration settings are read from application.properties.

Harvesting can be offset by a number of days before the current date, so that only records that have been edited and have had time to mature are harvested.
Use app.harvest.daysBack = 7 to set an offset period of 7 days.
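
As a rough illustration of the offset, the sketch below subtracts daysBack from a given date. It assumes the result is the upper bound of the harvest window, which the README does not state explicitly; the class and method names are hypothetical.

```java
import java.time.LocalDate;

// Hypothetical helper; only the property name app.harvest.daysBack comes from the README.
final class HarvestOffsetSketch {

    /**
     * Returns the most recent date that would still be harvested,
     * assuming the offset is applied as "today minus daysBack".
     */
    static LocalDate harvestUntil(LocalDate today, int daysBack) {
        if (daysBack < 0) {
            throw new IllegalArgumentException("daysBack must not be negative");
        }
        return today.minusDays(daysBack);
    }

    public static void main(String[] args) {
        // With app.harvest.daysBack = 7, a run on 2024-03-10 would stop at 2024-03-03.
        System.out.println(harvestUntil(LocalDate.of(2024, 3, 10), 7));
    }
}
```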

How to run it?

Run as an executable jar:

./harvester-x.y.z.jar

Add an integer argument to override the app.harvest.daysBack value as set in the properties file:

./harvester-x.y.z.jar 5
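
How such an override might be resolved can be sketched as follows. The names are hypothetical, and falling back to the configured value when the argument is not an integer is an assumption; the real harvester may instead reject the argument.

```java
// Hypothetical sketch of the command-line override; not taken from the harvester's source.
final class DaysBackArgSketch {

    /**
     * Returns the first command-line argument as an integer if present and valid,
     * otherwise the value configured as app.harvest.daysBack.
     */
    static int resolveDaysBack(String[] args, int configuredDaysBack) {
        if (args.length == 0) {
            return configuredDaysBack;
        }
        try {
            return Integer.parseInt(args[0].trim());
        } catch (NumberFormatException e) {
            // Assumption: an unparsable argument is ignored rather than treated as fatal.
            return configuredDaysBack;
        }
    }

    public static void main(String[] args) {
        // 7 stands in for the value read from application.properties.
        System.out.println(resolveDaysBack(args, 7));
    }
}
```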

Typically you will want to run this as a daily cron job for the corresponding user, for example at 01:00:

0 1 * * * ~/harvester-x.y.z.jar >/dev/null 2>&1 

Anything else?

  • Application status (last harvest date, etc.) is saved to the directory [user.home]/oapenmemo;
  • Downloaded export files are saved to the directory [user.home]/oapenmemo/downloads. These files are downloaded only once, after the initial harvest; from then on, export data chunks are requested separately from the DSpace API;
  • Logs are saved to the directory [user.home]/oapenmemo/logs.
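
The directory layout above can be derived from the user.home system property. The sketch below shows one way to do this; the class and method names are hypothetical, and only the directory names come from the list above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that resolves the three directories named in the list above.
final class HarvesterDirsSketch {

    /** [user.home]/oapenmemo: application status such as the last harvest date. */
    static Path baseDir(String userHome) {
        return Paths.get(userHome, "oapenmemo");
    }

    /** [user.home]/oapenmemo/downloads: export files from the initial harvest. */
    static Path downloadsDir(String userHome) {
        return baseDir(userHome).resolve("downloads");
    }

    /** [user.home]/oapenmemo/logs: log files. */
    static Path logsDir(String userHome) {
        return baseDir(userHome).resolve("logs");
    }

    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        System.out.println(baseDir(home));
        System.out.println(downloadsDir(home));
        System.out.println(logsDir(home));
    }
}
```

Because the paths depend on user.home, a cron job run under a different user would read and write a different set of directories.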
