Hyphe Corpus Cleaner

A simple script and accompanying Docker container that automatically remove old corpora from a Hyphe instance.

Cleaner arguments

The following arguments are required for the cleaning to run:

  • HYPHE_API_URL: Hyphe API URL to clean. For example, http://HYPHE_HOST/HYPHE_PATH/api/
  • HYPHE_MONGODB_HOST: Mongo hostname.
  • HYPHE_MONGO_PORT: Mongo port.
  • HYPHE_MONGO_DBNAME: Mongo database name.
  • HYPHE_ADMIN_PASSWORD: Admin password if set in backend config.
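
As a rough illustration of how these settings fit together, here is a minimal Python sketch that checks the required variables and assembles a standard MongoDB connection URI from them. It is not the cleaner's actual code: the helper names `missing_settings` and `mongo_uri` are hypothetical, and the variable names are the ones listed above (the docker-compose sample further down spells two of them HYPHE_MONGODB_PORT and HYPHE_MONGODB_DBNAME, so check which spelling your version reads). HYPHE_ADMIN_PASSWORD is treated as optional here, on the assumption that it is only needed when the backend sets one.

```python
# Variable names as listed in this section (assumption: adjust them if
# your version of the script reads different spellings).
REQUIRED = (
    "HYPHE_API_URL",
    "HYPHE_MONGODB_HOST",
    "HYPHE_MONGO_PORT",
    "HYPHE_MONGO_DBNAME",
)


def missing_settings(env, required=REQUIRED):
    """Return the required variable names that are unset or empty, in order."""
    return [name for name in required if not env.get(name)]


def mongo_uri(env):
    """Build a standard MongoDB connection URI from the three Mongo settings."""
    port = int(env["HYPHE_MONGO_PORT"])  # raises ValueError if not numeric
    return "mongodb://%s:%d/%s" % (
        env["HYPHE_MONGODB_HOST"],
        port,
        env["HYPHE_MONGO_DBNAME"],
    )


# Example values mirroring the docker-compose sample in this README.
example = {
    "HYPHE_API_URL": "http://backend:6978/",
    "HYPHE_MONGODB_HOST": "mongo",
    "HYPHE_MONGO_PORT": "27017",
    "HYPHE_MONGO_DBNAME": "hyphe",
}

print(missing_settings(example))  # → []
print(mongo_uri(example))         # → mongodb://mongo:27017/hyphe
```

Passing `os.environ` instead of the example dictionary runs the same check against the container's real environment.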

Cleaner options

The following optional arguments let you tweak the cleaning policy and logging:

  • CRON_SCHEDULE: If defined, a built-in cron-like daemon runs the cleaning script at the intervals defined here.

This is mostly meant for docker-compose setups with an always-running cleaning job.
Do not set it if you schedule the job some other way, for instance with Kubernetes CronJobs (which launch "one-shot" jobs at the intervals defined in the CronJob parameters), or if you simply run the script manually from time to time.

A cron expression represents a set of times, using 6 space-separated fields.

Field name   | Mandatory? | Allowed values  | Allowed special characters
-------------|------------|-----------------|---------------------------
Seconds      | No         | 0-59            | * / , -
Minutes      | Yes        | 0-59            | * / , -
Hours        | Yes        | 0-23            | * / , -
Day of month | Yes        | 1-31            | * / , - ?
Month        | Yes        | 1-12 or JAN-DEC | * / , -
Day of week  | Yes        | 0-6 or SUN-SAT  | * / , - ?

For example, to clean every day at midnight sharp, set it to 0 0 0 * * *
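
To make the field order concrete, the following Python sketch splits a CRON_SCHEDULE value into named fields. It only labels the fields and does not validate ranges or special characters. The function `split_cron` is a hypothetical helper, not part of the cleaner, and it assumes that when only five fields are given, the omitted one is the optional seconds field.

```python
FIELD_NAMES = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def split_cron(expression):
    """Map each space-separated cron field to its name.

    Assumption: a five-field expression omits the optional seconds field,
    which is then reported as None.
    """
    fields = expression.split()
    if len(fields) == 5:
        fields = [None] + fields
    if len(fields) != 6:
        raise ValueError("expected 5 or 6 fields, got %d" % len(fields))
    return dict(zip(FIELD_NAMES, fields))


# The "every day at midnight" example from the text above.
midnight = split_cron("0 0 0 * * *")
print(midnight["seconds"], midnight["minutes"], midnight["hours"])  # → 0 0 0
```

Reading a schedule such as 0 30 04 * * * through this helper shows seconds 0, minutes 30 and hours 04, that is, a daily run at 04:30:00.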

  • DAYSBACK: Days to keep. (Defaults to 7.)
  • MAILER_HOST: SMTP server hostname.
  • MAILER_PORT: SMTP server port.
  • MAILER_FROM: From: email address.
  • MAILER_TO: To: email address(es). For example, [email protected],[email protected].
    If any of MAILER_HOST, MAILER_PORT, MAILER_FROM or MAILER_TO is missing, no mail will be sent and logging will only take place on the container's console.
  • MAIL_TIMEOUT: Maximum amount of time (seconds) to wait for the mail to be sent. (Defaults to 10 seconds.)
  • MAIL_AND_CONSOLE: When the "MAILER_*" variables are defined, the job's output is sent by mail; set this to 0 to stop the logs from also being displayed on the container's console. (Defaults to 1.)
  • NO_EMPTY_MAIL: If set to 1, prevents an empty mail from being sent when there is nothing to clean up and the job produces no output. (Defaults to 1.)
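
The way these options interact can be sketched in Python as follows. This is an illustrative reading of the descriptions above, not the cleaner's actual implementation: `cutoff_date` and `output_targets` are hypothetical helpers, and how the real script parses the values (or which corpus timestamp it compares against the cutoff) is an assumption.

```python
from datetime import datetime, timedelta

MAIL_KEYS = ("MAILER_HOST", "MAILER_PORT", "MAILER_FROM", "MAILER_TO")


def cutoff_date(env, now):
    """Return the date before which corpora count as old (DAYSBACK, default 7)."""
    return now - timedelta(days=int(env.get("DAYSBACK", "7")))


def output_targets(env, output):
    """Return where the job's output goes: a set of 'mail' and/or 'console'."""
    # Mail is only used when all four MAILER_* variables are set.
    if not all(env.get(key) for key in MAIL_KEYS):
        return {"console"}
    targets = set()
    # NO_EMPTY_MAIL=1 (default) skips the mail when there is no output.
    skip_empty = env.get("NO_EMPTY_MAIL", "1") == "1"
    if output.strip() or not skip_empty:
        targets.add("mail")
    # MAIL_AND_CONSOLE=0 silences the console once mail is configured.
    if env.get("MAIL_AND_CONSOLE", "1") != "0":
        targets.add("console")
    return targets


# Placeholder mail settings, for illustration only.
mail_env = {
    "MAILER_HOST": "smtp.example.org",
    "MAILER_PORT": "25",
    "MAILER_FROM": "cleaner@example.org",
    "MAILER_TO": "admin@example.org",
}

print(cutoff_date({}, datetime(2024, 1, 8)))                # → 2024-01-01 00:00:00
print(sorted(output_targets(mail_env, "removed 2 corpora")))  # → ['console', 'mail']
```

With the same mail settings plus MAIL_AND_CONSOLE=0, the sketch returns only the mail target, and with no MAILER_* variables at all it always falls back to the console.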

You also need to link this container to the mongo and backend containers.

Sample for your docker-compose.yml:

services:
  cleaner:
    image: scpomedialab/hyphe-corpus-cleaner:latest
    links:
     - "mongo:mongo"
     - "backend:backend"
    environment:
     - CRON_SCHEDULE=0 30 04 * * *
     - HYPHE_MONGODB_HOST=mongo
     - HYPHE_API_URL=http://backend:6978/
     - HYPHE_MONGODB_PORT=27017
     - HYPHE_MONGODB_DBNAME=hyphe

Contributors

boogheta, jri-sp, pipojojo
