Giter Club home page Giter Club logo

mincer's Introduction

Mincer - batch processing daemon

Summary

Listen incoming directory for new files, process them and put processing results to the output directory.

Uses inoticoming based on inotify lib. Mincer provides a few useful functions, not found in the inoticoming tool:

  • more friendly and convenient command line interface allow to start and stop Mincer daemon;
  • simple but flexible configuration file with a lot of reasonable defaults;
  • can save processed files (optionally gzipping them) in a separate directory;
  • improved logging of what is going on within the processor.

Stacking multiple instances of Mincer allows to build complex queued data processing conveyers. To do so, just configure your first Mincer instance OUTPUT_DIR to the INCOMING_DIR of the second one.

Usage

The package doesn't provide a system daemon but only a few scripts which can be used to build your own daemons.

Daemon creation mini-howto

  1. Create a root directory for your Mincer instance, say /var/mincer1;
  2. Create a callback script callback.sh inside the root directory. Take a callback.sh.skeleton as a start point. It must take an absolute file name of incoming data file as first argument. It must create result file (or files) inside its current directory (actually this can be overrided. See examples);
  3. Create your own Mincer configuration file based on mincer.conf.example, provided inside the upstream tarball. You have to define at least ROOT_DIR and FILENAME_REGEXP variables. Save the created file as /var/mincer1/mincer.conf;
  4. Start the Mincer daemon with mincer /var/mincer1/mincer.conf. Note only the very first command will start the daemon and all subsequent invocations will gracefully exit.

Now you can:

  • add your data files to the /var/mincer1/incoming directory;
  • examine what is going on in the /var/mincer1/processing.log;
  • stop the daemon with mincer /var/mincer1/mincer.conf stop.

Things you should know

  • files are processed one-by-one;
  • when incoming queue is empty, file processing begins immediately after write close;
  • during moving processing results to output directory (or directories) mincer moves files in two stages: first it copies them to $NAME.tmp, then renames $NAME.tmp to $NAME. This allow to deliver all or nothing;
  • write your FILENAME_REGEXP in such way to honor special meaning of '.tmp' extension, otherwise mincer will start processing files earlier than it should. It means you never use regexps like '.*' or '.tmp$'.
  • callback script is always started in configured WORK_DIR. All files in WORK_DIR are removed before callback start, but directories are not. So you can save some state between callback invocation;
  • callback script always invoked with the one argument - absolute path of the incoming file;
  • callback script should never move, change or remove incoming file because it will break mincer processing logic;
  • empty incoming files are not processed and are immediately removed and callback is not started;
  • incoming files without read permissions are immediately moved to the FAILED_DIR and callback is not started;
  • all configuration options except ROOT_DIR, FILENAME_REGEXP, LOGFILE and INCOMING_DIR can be safely changed in runtime without any additional efforts;
  • if you want to change one of ROOT_DIR, FILENAME_REGEXP, LOGFILE, INCOMING_DIR, you must stop running mincer instance first, then change the variables and then start mincer again;
  • symlinks are NOT processed. If you want to process files located somewhere in the filesystem, use hardlinks or tiny files with target file path.

Examine mincer.conf.example for further information about available configuration options.

mincer's People

Contributors

tuxofil avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.