knimer

knimer provides an automation mechanism for running KNIME workflows and can be used as an alternative to KNIME Server.

There are two parts to this project:

  1. Docker configuration (Dockerfile) to create an image which:
    • Downloads a zipped KNIME workflow from an AWS S3 Bucket
    • Runs that workflow in KNIME batch execution mode
    • Handles custom workflow variables
    • Handles custom workflow credentials
  2. A set of Terraform scripts which can be used as a module to create AWS infrastructure which:
    • Creates an ECS Task Definition (incl. CloudWatch logging) to run the Docker image
    • Optionally uses AWS EventBridge to schedule periodic runs of the Task
    • Optionally uses an AWS Lambda to send Slack notifications showing the Task's progress

Why was this created?

KNIME is a powerful application for automating the manipulation of data. When workflows need to run regularly, though, it's inconvenient to require a human to open the KNIME Desktop application and execute them by hand.

Even using something like Windows Task Scheduler to run the workflow in batch mode still requires the computer to be on, and is limited by the compute resources available on that machine.

KNIME Server offers scheduling but carries a hefty price tag for both licenses and hosting fees. This project focuses on the core needs of scheduling and notification, rather than on team collaboration features.

Instructions for Use

Uploading the KNIME Workflow

First, create a workflow as usual in KNIME.

If you need dynamic data passed into the workflow you can use Workflow Variables by right-clicking on the workflow in the KNIME Explorer and selecting Workflow Variables.

If you need secret data passed into the workflow you can use KNIME Credentials by adding a Credentials Configuration node.

Zip up the workflow folder (the one containing the workflow.knime file) and upload it to an S3 bucket.
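
For example, zip and upload from the directory containing the workflow folder (the file and bucket names here are illustrative):

zip -r my_workflow.zip my_workflow
aws s3 cp my_workflow.zip s3://all-the-workflows/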

Setting up Slack Notifications (optional)

Follow the instructions here to get a Webhook URL (which should be treated as confidential).

You should then manually store this URL in an AWS SSM Parameter (as a SecureString, given that it's confidential).

The name and ARN of this parameter are then passed into the Terraform module as two of its variables.
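
For example, with the AWS CLI (the parameter name and value are illustrative; the same pattern applies to the Slack Signing Secret in the next section):

aws ssm put-parameter \
   --name "slack-webhook-url" \
   --type SecureString \
   --value "https://hooks.slack.com/services/..."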

Setting up Slack Slash Command (optional)

It's also possible to trigger one-off runs of workflows via a Slack slash command.

Create a new Slack App and from within Settings -> Basic Information -> App Credentials find the Signing Secret.

You should then manually store this Signing Secret in an AWS SSM Parameter, in the same way as the webhook URL above. The name and ARN of this parameter are then passed into the Terraform module as two of its variables.

Add the following module to your Terraform configuration and customise the variables:

module "knimer_slash" {
   # Uses the terraform scripts directly from this repo (can also pin version with knimer.git?ref=<BLAH>)
   source                    = "github.com/nick-solly/knimer.git//terraform/slack-trigger"
   
   # Used for naming AWS resources
   name_prefix               = "prod"
   aws_region                = "eu-west-2"
   
   slack_signing_secret_name = "secret"
   slack_signing_secret_arn  = "arn:aws:ssm:eu-west-2:111122223333:parameter/secret"
   
   # Only allow this slash command to be run from a certain channel (optional)
   slack_channel_restriction = "tl_knimer"
   
   # ID's of the subnets to run the ECS Task on
   ecs_subnets               = ["SN1", "SN2"]
   
   # A map of the ECS Tasks which can be triggered via the slash command.
   # Key is Task Definition ARN, Value is Cluster ARN.
   knimer_ecs_tasks          = {
      "arn:aws:ecs:eu-west-2:111122223333:task-definition/asf123-kcf38k" = "arn:aws:ecs:eu-west-2:111122223333:cluster/389djd-x93kd3"
   }
}

Follow the instructions here to create the slash command for the new App.

One of the outputs of the Terraform module is the endpoint URL, which you'll need to set as the slash command's Request URL.
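
After terraform apply you can read this output with the Terraform CLI. The output name below is illustrative; check the module's outputs for the exact name:

# Re-export the module output at the root of your configuration, e.g.
#   output "knimer_slash_url" { value = module.knimer_slash.endpoint_url }
# then query it:
terraform output -raw knimer_slash_url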

Running the Workflow

Method A (preferred): AWS ECS Fargate configured via Terraform

Add the following module to your Terraform configuration and customise the variables:

module "knimer" {
   
   # Uses the terraform scripts directly from this repo (can also pin version with knimer.git?ref=<BLAH>)
   source              = "github.com/nick-solly/knimer.git//terraform/knimer"
   
   aws_region          = "eu-west-2"
   
   # Used for naming AWS resources
   name_prefix         = "my-workflow"
   
   cpu                 = 2048
   memory              = 16384
   
   # In the S3 bucket, the file should be named `my_workflow.zip`
   knime_workflow_file = "my_workflow"
   s3_bucket_name      = "all-the-workflows" # note: S3 bucket names cannot contain underscores
   
   workflow_variables  = {
      # Each value is "<value>,<type>", matching KNIME's -workflow.variable format
      variable1 = "ThisIsAValue,String"
      variable2 = "1234,int"
   }

   workflow_secrets = {
      # Each value is "<username>;<password>", matching KNIME's -credential format
      database_creds = "username;password"
   }
   
   # Contains the Slack Webhook URL (optional)
   slack_webhook_url_secret_name = "slack-webhook-url"
   slack_webhook_url_secret_arn  = "arn:aws:ssm:eu-west-2:111122223333:parameter/slack-webhook-url"
   
   # Where the ECS Task should be run
   subnet_ids                   = ["subnet-0af169a6f98a3hg34", "subnet-042b69da4001512ca"]
   
   # When you want the workflow to be run (this example: daily at 04:00 UTC)
   schedule_expressions         = ["cron(0 4 * * ? *)"]

}

For advice on cpu and memory values, see here.

For advice on Schedule Expressions, see here.

Method B: Running the Container Manually

This method does not include scheduling or Slack notifications, since those are handled via AWS services.

On a Docker host run:

docker run \
   -e S3_BUCKET_NAME=my-workflow-bucket \
   -e KNIME_WORKFLOW_FILE=workflow_file \
   -e WORKFLOW_VARIABLES="-workflow.variable=variable_a,foo,String -workflow.variable=variable_b,6,int" \
   -e WORKFLOW_SECRETS="-credential=database_creds;username;password -credential=sharepoint_creds;username;password" \
   -e AWS_ACCESS_KEY_ID=ABCDALKNCLASASASC \
   -e AWS_SECRET_ACCESS_KEY=A2309F23J02 \
   ghcr.io/nick-solly/knimer/knimer:latest

Note:

  • KNIME_WORKFLOW_FILE is without the .zip extension
  • AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are just for accessing the S3 bucket where the zipped workflow file is located
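
Under the hood, the container downloads and unzips the workflow from S3 and passes it to KNIME's batch executor, so the resulting invocation looks conceptually like the sketch below. This is illustrative only: the exact command lives in the Dockerfile, and the workspace path is an assumption.

knime -nosplash -reset -nosave \
   -application org.knime.product.KNIME_BATCH_APPLICATION \
   -workflowDir="/knime-workspace/workflow_file" \
   $WORKFLOW_VARIABLES $WORKFLOW_SECRETS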

Testing

You can build the image locally using the included Makefile.
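
The build target name below is an assumption, so check the Makefile; the direct Docker equivalent assumes the Dockerfile sits at the repository root:

make build

# or, without the Makefile:
docker build -t knimer:local .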

TODOs

  • Allow customisable KNIME extensions to be installed
  • Add a diagram of the AWS infrastructure this module creates
  • Improve handling of KNIME product versioning, as it's currently hardcoded
  • Add GitHub Releases and versioning to the Docker images
  • Add a GitHub Action to run Terraform linting/checking
  • Add a link to the CloudWatch logs to the Slack message
  • Consider moving the Docker image into AWS ECR to reduce ECS Task spin-up time
