runabol / tork Goto Github PK

View Code? Open in Web Editor NEW

492.0 7.0 24.0 7.67 MB

A distributed workflow engine

Home Page: https://tork.run

License: MIT License

Go 99.84% Makefile 0.16%

golang workflow-engine distributed go background-jobs job-queue task-queue work-queue docker

tork's Introduction

Features • Installation • Documentation • Quick Start • REST API • Web UI

Tork is a highly-scalable, general-purpose workflow engine.

Features:

REST API
Extensible
Horizontally scalable
Task isolation - tasks are executed within a container to provide isolation, idempotency, and in order to enforce resource limits
Automatic recovery of tasks in the event of a worker crash
Supports both stand-alone and distributed setup
Retry failed tasks
Middleware
Pre/Post tasks
No single point of failure
Task timeout
Full-text search
Expression Language
Conditional Tasks
Parallel Task
Each Task
Subjob Task
Web UI

Documentation

See tork.run for the full documentation.

Quick Start

Ensure you have Docker with API Version >= 1.42 (use docker version | grep API to check).
Download the binary for your system from the releases page.

Hello World

Start in standalone mode:

./tork run standalone

Submit a job in another terminal:

# hello.yaml
---
name: hello job
tasks:
  - name: say hello
    image: ubuntu:mantic #docker image
    run: |
      echo -n hello world
  - name: say goodbye
    image: ubuntu:mantic
    run: |
      echo -n bye world

JOB_ID=$(curl \
  -s \
  -X POST \
  --data-binary @hello.yaml \
  -H "Content-type: text/yaml" \
  http://localhost:8000/jobs | jq -r .id)

Query for the status of the job:

curl -s http://localhost:8000/jobs/$JOB_ID | jq .

{
  "id": "ed0dba93d262492b8cf26e6c1c4f1c98",
  "state": "COMPLETED",
  ...
  "execution": [
    {
      ...
      "state": "COMPLETED",
    }
  ],
}

A slightly more interesting example

The following job:

Downloads a remote video file using a pre task to a shared /tmp volume.
Converts the first 5 seconds of the downloaded video using ffmpeg.
Uploads the converted video to a destination using a post task.

# convert.yaml
---
name: convert a video
inputs:
  source: https://upload.wikimedia.org/wikipedia/commons/1/18/Big_Buck_Bunny_Trailer_1080p.ogv
tasks:
  - name: convert the first 5 seconds of a video
    image: jrottenberg/ffmpeg:3.4-alpine
    run: |
      ffmpeg -i /tmp/input.ogv -t 5 /tmp/output.mp4
    mounts:
      - type: volume
        target: /tmp
    pre:
      - name: download the remote file
        image: alpine:3.18.3
        env:
          SOURCE_URL: "{{ inputs.source }}"
        run: |
          wget \
          $SOURCE_URL \
          -O /tmp/input.ogv
    post:
      - name: upload the converted file
        image: alpine:3.18.3
        run: |
          wget \
          --post-file=/tmp/output.mp4 \
          https://devnull-as-a-service.com/dev/null

Submit the job in another terminal:

JOB_ID=$(curl \
  -s \
  -X POST \
  --data-binary @convert.yaml \
  -H "Content-type: text/yaml" \
  http://localhost:8000/jobs | jq -r .id)

More examples

Check out the examples folder.

REST API

See the REST API documentation.

Swagger Docs

Make sure you have CORS configured in your config file:

[middleware.web.cors]
enabled = true

Start Tork in standalone or coordinator mode.

go run cmd/main.go run standalone

Serve the Swagger Docs

docker compose up -d swagger

Visit http://localhost:9000

Web UI

Tork Web is a web based tool for interacting with Tork.

License

tork's People

Contributors

Stargazers

Watchers

Forkers

windhooked ivicac networkinss alenjohn05 v36372 duot schubter c-nv-s mefuller jrottenberg akarshitjoshi ionut-maxim shaddow7 pirosauro and9527 gtrevg yin-zt bit-wiz anhgeeky jspc

tork's Issues

Setting task working dirs

As per #245 (comment)

Moving this discussion into an issue to avoid polluting the PR.

I've been working on an implementation/ extension of Tork which allows me to persist files across tasks with a TaskMiddlewareFunc that looks a little like:

func (c coordinator) enrichTask(ctx context.Context, tt task.EventType, t *tork.Task) (err error) {
	volume, err := c.deriveVolumes(ctx)
	if err != nil {
		return
	}

	t.Mounts = []tork.Mount{
		{
			Type:   tork.MountTypeBind,
			Source: volume,
			Target: "/workdir",
		},
	}

	return
}

(I've elided some stuff from there, like checking the value of tt and so on).

What I'd like to do is set the Workdir of each task (via args to the docker runtime) to the one I create above, purely as a convenience from needing to run cd /blah.

I think from #245 I get the problem- container images with entrypoints that used relative paths would fail to start, and so the solution I'm looking for would come with caveats, but it'd be a huge QoL fix for me and, I hope, others too.

Do you know of any gotchas I might hit? Failing that, would you accept a patch to open that functionality? I'm more than happy to fork and just make the fix for me if you don't want to support this behaviour, too.

[Question] Workflow with arbitrary input

I am building a job system that accepts user input. Some of the examples accept an input section, but they are all concrete values. It would be better if we can substitute values in. Or maybe I am missing something in the docs?

Another related point is secret management. Having secrets in plain text in job yaml is frown upon for production use, is there a better way to handle secrets?

priority of a job

You can use RabbitMQ's priority to determine the priority of a job, thereby ensuring the order of jobs.

Feature Request: support allowing custom headers in webhooks

some endpoints require authorization headers or an api key e.g.:
"Authorization":"bearer myCoolJWTTokenHere" or "X-API-Key":"myCoolAPIKeyHere"
Thus it would be very useful if you could allow a user to set custom headers as part of the webhook.

cancel task when task is pending, scheduled, error,

cancel task when task is pending, scheduled, error, msg, will lost. scheduler did not handle cancel.
worker will run a task which canceled.
only can cancel running task?

Add support for release-drafter

This issue is to track adding support for release-drafter. With release drafter the changelog of esch release is automatically generated reducing the burden of creating releases. This would also allow users to know what changed during each release.

@runabol I can work on this during the weekend, I already have the templates/configs.

feature request: Download all logs for a job

Hi, I am having a need to download all logs belong to a jobs. Something like the SQL query below

SELECT
	contents
FROM
	tasks_log_parts
WHERE
	task_id in(
		SELECT
			id FROM tasks
		WHERE
			job_id = 'b9cffbdc0a24451fa09763e80eb6b4ce')
ORDER BY
	number_;

Also, may I ask why does some log lines got prepended by some random bytes like this:

Passing in JSON through task output

Hi, i'm wondering if it's possible to pass in JSON into task outputs.

{
            "name": "A",
            "image": "ubuntu:mantic",
            "run": "echo '{\"age\": \"10\"}' > $TORK_OUTPUT",
            "var": "outputA"
        },
        {
            "name": "B",
            "image": "ubuntu:mantic",
            env: {
                TEST: '{"age": 10}',
                AGE: "100",
                NUMBER: "{{ tasks.outputA}}"
            },
            if: "{{ tasks.outputA['age'] > '100' }}",
            "run": "echo {{tasks.outputA['age']}} less than 100",
        }

Would this be possible? i tried various ways such as
tasks.outputA.age and tasks.outputA['age'] and i get this error

id: c8edec21b9194e02be52bcc02689e3be
jobId: 5d885a89c697488594cbcdbb49765f1e
position: 2
name: B
state: COMPLETED
createdAt: 2024-02-28T05:20:33.298201783Z
scheduledAt: 2024-02-28T05:20:33.298334524Z
startedAt: 2024-02-28T05:20:33.298457605Z
completedAt: 2024-02-28T05:20:33.933894354Z
failedAt: 2024-02-28T05:20:33.298201783Z
run: echo {{tasks.outputA['age']}} less than 100
image: ubuntu:mantic
env:
  AGE: "100"
  NUMBER: |
    {"age": "10"}
  TEST: '{"age": 10}'
queue: default
error: >-
  error compiling expression: tasks.outputA['age'] > '100': type string[string]
  is undefined (1:14)
   | tasks.outputA['age'] > '100'
   | .............^
nodeId: EQxKvZirEWkuPHh2zGTkcN
if: "{{ tasks.outputA['age'] > '100' }}"

any thoughts on why this happens or how would this be done through the code.

Thanks

Tork API expose

Hello,
I'm trying to use Tork API for creating Jobs.
I'm using tork + tork-web and for me it was little bit tricky how to create Jobs via API besause I sent API requests to web-ui port and only after some time I realised that I have to send them to coordinator port.
In this case we have to expose 2 ports outside: coordinator port + web-ui port.
Is it possible to allow send API request to web-ui and it will transfer it to tork directly just to not expose coordinator outside?

Like how simple this application is. How to contribute?

i love how simple this library is. is there any way to see a roadmap of what needs to be done? From there it would be easier to contribute as well.

Cheers

Prune old nodes

The nodes table contain entries for both current online nodes and those that were online in the past. Over time the nodes table will grow indefinitely. A process should be put in place to prune old nodes -- say over 24 hours --- to prevent the table from getting too large.

Feedback wanted

I would love to get some feedback on how easy it is to get started with Tork when following the docs.

In particular:

Going through the Quick Start.
Installation
Submitting jobs.
Extending Tork.
Anything else you think is interesting.

Jobs with Parallel, Each, and SubJobs

I can see validations around ensuring a job doesn't contain a mix of Parallel, Each, and SubJobs, and I can see the commits that made those changes, but I can't see why that's the case. I can't see it in the docs either.

Is there a reason we can't do it?

Job Scheduling doesn't work after recoonection to queue

Hello
I faced a strange issue with task scheduling.
How to reproduce:

1 standalone tork-worker
docker-compose with coordinator/web/rabbitmq/postgres

worker registered and I can see it in nodes. Then I had to reconfigure tork coordinator and restart docker compose.
worker shows some error about rabbitmq and queue
after some time when rabbitmq is online worker stops producing errors
then I add some Job via UI or API
and worker doesn't pickup this job

It seems after reconnection to rabbitmq worker doesn't subscribe to the queue again: doesn't see any queue and I dont see this worker as subscriber.
And only after I restarted worker it starts to pick up the task from the queue

Custom Mounter for docker

I was looking at creating a custom mounter for a docker worker.

I already have a volume created with docker volume

docker volume ls
DRIVER    VOLUME NAME
local     abc_example_data
...

And I wanted to be able to attach that to a task:

...
mounts:
    - type: my_custom_mount
      source: abc_example_data
      target: /abc/data
...

But I came across this bit of code which seems to check that the mounter type is either: tork.MountTypeVolume, tork.MountTypeBind or tork.MountTypeTmpfs. All other custom types will raise an error: unknown mount type.

tork/runtime/docker/docker.go

Lines 180 to 208 in 4319d73

 for _, m := range t.Mounts { 

 var mt mount.Type 

 switch m.Type { 

 case tork.MountTypeVolume: 

 mt = mount.TypeVolume 

 if m.Target == "" { 

 return errors.Errorf("volume target is required") 

 } 

 case tork.MountTypeBind: 

 mt = mount.TypeBind 

 if m.Target == "" { 

 return errors.Errorf("bind target is required") 

 } 

 if m.Source == "" { 

 return errors.Errorf("bind source is required") 

 } 

 case tork.MountTypeTmpfs: 

 mt = mount.TypeTmpfs 

 default: 

 return errors.Errorf("unknown mount type: %s", m.Type) 

 } 

 mount := mount.Mount{ 

 Type: mt, 

 Source: m.Source, 

 Target: m.Target, 

 } 

 log.Debug().Msgf("Mounting %s -> %s", mount.Source, mount.Target) 

 mounts = append(mounts, mount) 

 }

Am I correct in saying that this switch logic would need to be updated to allow custom mounts for docker workers?

Thanks for Tork btw 😃, I really like the way it is possible to extend each component.

Add support for go-memdb

Tork has support for an in-memory datastore to store job and task metadata.

The goal of this issue is to look into replacing this custom implementation with Hashicorp's go-memdb or something similar.

Rough outline of the implementation:

Add a gomemdb.go file to the datastore packge.
Implement the Datastore interface.
Register the new implementation.
Add tests.

Add support for WASM Runtime

WebAssembly is a binary instruction format designed for safe and efficient execution on web browsers, but it can also be employed on the server to enhance the performance and security of web applications.

WebAssembly is known for its high-performance characteristics. It offers near-native execution speed, making it potentially a compelling choice as a runtime environment for Tork tasks.

To run WebAssembly on the server-side, we need a runtime environment. There are several runtime options available, such as Wasmer, Wasmtime, and V8 (with the V8 isolates feature).

The goal of this issue is to explore the pros and cons and potential viability of using WASM as an execution runtime implementation for Tork tasks.

Pre/Post task evaluate issue when used with job middleware

I'm currently having the following setup:

func (c *Coordinator) ModifyInputMiddleware(next job.HandlerFunc) job.HandlerFunc {
	return func(ctx context.Context, et job.EventType, j *tork.Job) error {
		inputs := make(map[string]string)
		inputs["Hello"] = "World"
		newJob := j.Clone()
		newJob.Inputs = inputs
		newJob.Context.Inputs = inputs
		return next(ctx, et, newJob)
	}
}
...
coordinator.SetJobMiddleware(coordinator.ModifyInputMiddleware)

I have a job that makes use of each task and also pre task:

name: Hello world
inputs:
  Hello: Abc123
tasks:
  - name: For each
    each:
      list: "{{ ['1','2','3'] }}"
      task:
        name: Each
        run: echo $HELLO > $TORK_OUTPUT
        env:
          HELLO: "{{ inputs.Hello }}"     
        image: ubuntu:mantic
        pre:
          - name: Pre
            run: |
              echo $HELLO
            image: ubuntu:mantic
            env:
              HELLO: "{{ inputs.Hello }}"
        retry:
          limit: 1

This would produce Hello: Abc123 not Hello: World. Any idea what I did wrong and how to fix it?

request: disable logging

For now as I can understand we have such flow:

workers publish logs in rabbit in queue logs
coordinator pull logs from this queue logs
coordinator saves logs into postgres DB
after I press button Logs I can see them (logs from DB)

This makes some problems - if we have a lot of workers and logs I can see more than x millions mesages in the queue logs. As I can see coordinator doesn't pull and save logs in DB in proper time and that means if I press button Logs it shows nothing. And from time to time I have to purge this queue to see logs

Possible options:

disable logging (use alternatives like ELK stack)
adjust coordinator\logging performance

Issue with rabbitmq and long tasks

Hello.
I faced with strange issue: I have a lot of long tasks > 5 hours. And after task finished I see errors in the logs:

tork[3136]: 9:52PM ERR failed to ack message error="Exception (504) Reason: \"channel/connection is not open\""

Also I see this inside rabbitmq settings:

Maybe it relates to each other but after tork-worker finishes task it receives this error ans start this task again...:

Oct 02 16:44:21 ruvds-tiy9w tork[81156]: 4:44PM DBG received task task-id=639c83e4a371467c8add77f3031806f8
Oct 02 16:44:21 ruvds-tiy9w tork[81156]: 4:44PM DBG Created workdir /tmp/tork3494944280
Oct 02 16:44:22 ruvds-tiy9w tork[81156]: 4:44PM INF Config loaded from /etc/tork/config.toml
Oct 02 16:44:22 ruvds-tiy9w tork[81156]: {"level":"debug","time":"2023-10-02T16:44:22+03:00","message":"reexecing: bash -c /tmp/tork3494944280/entrypoint as -:-"}
Oct 02 16:44:22 ruvds-tiy9w tork[81156]: Start at 2023-10-02 13:44:22.635340645 +0000 UTC
.......
Oct 02 20:13:32 ruvds-tiy9w tork[81156]: FINISHED --2023-10-02 20:13:32--
Oct 02 20:13:32 ruvds-tiy9w tork[81156]: Total wall clock time: 3h 29m 9s

and same task again:

Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM ERR failed to ack message error="Exception (504) Reason: \"channel/connection is not open\""
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM INF default channel closed. reconnecting
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM DBG created channel de321733e2b74c368ad0d3cf33c7e78f for queue: default
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM DBG received task task-id=639c83e4a371467c8add77f3031806f8
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM DBG Created workdir /tmp/tork4048566087
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: 8:13PM INF Config loaded from /etc/tork/config.toml
Oct 02 20:13:33 ruvds-tiy9w tork[81156]: {"level":"debug","time":"2023-10-02T20:13:33+03:00","message":"reexecing: bash -c /tmp/tork4048566087/entrypoint as -:-"}
Oct 02 20:13:34 ruvds-tiy9w tork[81156]: Start at 2023-10-02 17:13:34.102253289 +0000 UTC

So as you can see it receives the same task again:

task task-id=639c83e4a371467c8add77f3031806f8

Logs inside rabbitmq:

2023-10-03 19:23:01.214304+00:00 [error] <0.13814.7> Channel error on connection <0.1216.0> (ip:51850 -> ip2.3:5672, vhost: '/', user: 'rabbit'), channel 13:
2023-10-03 19:23:01.214304+00:00 [error] <0.13814.7> operation none caused a channel exception precondition_failed: delivery acknowledgement on channel 13 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more

Maybe tork-worker can send some keep-alive to rabbitmq?

Option to delete old jobs

I have a lot of Jobs in the UI and it will be nice to have an option to hide or delete old jobs.
For example:
I'm testing a new job by POST request and during this test I could create 10-15 jobs which failed or canceled by me. And I can't delete them from web UI

And regarding last changes about job logs will be nice if with deleting Job - all related logs also will be deleted

Run Tork workers on K8s environment

Statement: run N workers inside K8s as pods
Run Tork inside K8s is ideal option for me, because I can easily scale workers.
For now Tork requires Docker inside and it means I can run it only virtual machines or I have to use somehow dind images to run docker inside docker.

Is it possible to disable run tasks inside docker containers if I'm already inside K8s?
Something like skip image field which is required?
In this case we will have to options how to run tasks:

inside docker container on VM
inside K8s pod directly without docker

Load some config values from env and merge with config.toml

I'm using both config.toml and some secrets to be injected as envs.

I notice that conf package loads from the first found config file and then exit, leaving the envs loading untouched.

	// load configs from file paths
	for _, f := range paths {
		err := konf.Load(file.Provider(f), toml.Parser())
		if errors.Is(err, os.ErrNotExist) {
			continue
		}
		if err != nil {
			return errors.Wrapf(err, "error loading config from %s", f)
		}
		logger.Info().Msgf("Config loaded from %s", f)
		return nil
	}
	// load configs from env vars
	if err := konf.Load(env.Provider("TORK_", ".", func(s string) string {
		return strings.Replace(strings.ToLower(
			strings.TrimPrefix(s, "TORK_")), "_", ".", -1)
	}), nil); err != nil {
		return errors.Wrapf(err, "error loading config from env")
	}

Ideally, what I want is to be able to load some config values from env and override that value in the config.toml. Something like this:

         // load configs from file paths
	for _, f := range paths {
                 ...
		break
	}
	// load configs from env vars
        ...

Not sure if it's the correct approach to get what I want so I didn't create a PR.

quality of life: when duplicate a job, don't redact environment variables in task if it inherit from input

Hi, when I duplicate a job, the task environment variables that are matched with secret pattern will be replaced its value with "[REDACTED]", even if it only inherit the value from inputs

For example:

name: Example
inputs:
  SECRET: 12345
tasks:
  - name: Example
    run: |
      echo "hello world"
    image: python:3-slim
    env:
      SECRET: "{{ inputs.SECRET }}"

When duplicating this job, it's definition become

name: Example
inputs:
  SECRET: "[REDACTED]"
tasks:
  - name: Example
    run: |
      echo "hello world"
    image: python:3-slim
    env:
      SECRET: "[REDACTED]"

The second redaction is unnecessary.

RUNNING state change not shown on middleware

Playing with a job middleware, I met a behaviour which I'm not quite sure.
My dummy code is:

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/runabol/tork/cli"
	"github.com/runabol/tork/conf"
	"github.com/runabol/tork/engine"
	"github.com/rs/zerolog/log"
	"github.com/runabol/tork"
	"github.com/runabol/tork/middleware/job"
)

func main() {
	if err := conf.LoadConfig(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	mw := func(next job.HandlerFunc) job.HandlerFunc {
		return func(ctx context.Context, et job.EventType, j *tork.Job) error {
			log.Debug().
				Msgf("received job %s at state %s (%s)", j.ID, j.State, et)
			return next(ctx, et, j)
		}
	}

	engine.RegisterJobMiddleware(mw)

	if err := cli.New().Run(); err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
}

Following, I added a job which will fail:

name: my failing job
tasks:
  - name: my first task
    run: echo hello world
  - name: raise an error
    run: exit 1
  - name: my second task
    run: echo bye world

And, from the middleware, I got:

{"level":"debug","time":1702025778,"message":"received job ed83c87901d24b14a7ed44735b130a8a at state PENDING (STATE_CHANGE)"}
...
{"level":"debug","time":1702025778,"message":"received job ed83c87901d24b14a7ed44735b130a8a at state FAILED (STATE_CHANGE)"}
{"level":"debug","time":1702025779,"message":"received job ed83c87901d24b14a7ed44735b130a8a at state FAILED (READ)"}
...
{"level":"debug","time":1702026006,"message":"received job ed83c87901d24b14a7ed44735b130a8a at state FAILED (READ)"}

But what I'm expecting is also a RUNNING (STATE_CHANGE). Am I wrong? Does it also happen to you or am I missing something?
A similar result also got with a successful job (https://gist.github.com/Pirosauro/62545d0c957e10b52083df5a928af329)

Add task timeout

From time to time some workers hang due to execution script.

Example:
we have python scripts + playwright Firefox browser. Sometimes this Firefox hangs and task stops. In this state it could be long time before we identify that this is fail of script (and another worker already finish this task due to Queue timeout).

So we know average time of task execution and is it possible to add some value during Job creation that if task executing longer than X we can restart it?

Setting job failed state in Job HandlerFunc

Consider the job.HandlerFunc

func erroringHandlerFunc(next job.HandlerFunc) job.HandlerFunc {
	return func(ctx context.Context, et job.EventType, j *tork.Job) (err error) {
		if et == job.Read {
			return next(ctx, et, j)
		}

                return errors.New("some error")
       }
}

When this gets called the logs, as expected, show ERR unexpcted error occurred while processing task error="some error", however the status of the Job remains as PENDING, where I'd expect it to be FAILED.

I've tried to manually update the job to set the expected state, but this seems to be ignored.

Is this a bug, or am I doing something wrong?

Add support for S3 mounts in the Docker runtime

Mountpoint S3 is a file client that translates local file system API calls to S3 object API calls like GET and LIST.

The goal of this feature is to add first-class support to mounting S3-based location so users don't have to rely on using the AWS S3 client and have a more streamlined experience that resembles mounting a local directory.

Rough outline of the implementation:

Add an s3.go file to the docker packge.
Implement the Mounter interface.
Register the new mounter implementation when wiring in the Docker runtime.
Add tests.

Questions about Tork Architecture

@runabol I had a few questions after looking at the docker-compose.yml from here https://github.com/runabol/tork/blob/main/docker-compose.yaml

Does the database need be exposed to the whole network?
Which services need to access the DB? Looking at the docs it should only be the coordinator which i'm assuming is part of this repo
Why use swagger-ui instead of serving swagger directly using an Echo middleware?
What's the purpose between Tork and Tork-Web? Is the UI just a frontend with no backend?
RabbitMQ is known to be a resource hog, have you considered using Queues/Lists with Redis? Their Golang support is excellent
What's the purpose of the "migration" cmd?
The front end is using BACKEND_URL: http://host.docker.internal:8000 but there are no services running on port 8000, is that assuming that the user ran /tork outside of Docker?

Trying to get a general understanding of the architecture before doing another PR :-)

Source: https://www.tork.run/architecture

Docker compose

Is it possible to run Tork with docker compose?

For development purposes I was hoping to have a coordinator, and a worker, running via compose. However, when I try to make a request (http://localhost:8000/jobs for example) I get a socket hang up error.

Dockerfile

FROM ubuntu

WORKDIR /tork

RUN apt update
RUN apt install curl -y

ADD https://github.com/runabol/tork/releases/download/v0.1.22/default.release.tork_0.1.22_linux_amd64.tgz tork.tgz
RUN tar -zxf tork.tgz
RUN chmod +x ./tork

EXPOSE 8000

CMD ["./tork", "run", "coordinator"]

compose.yml

services:
  tork:
    build: ./
    ports:
      - "8000:8000"

Kicking jobs off with arbitrary data

Is there any way to set arbitrary data/ job context when POSTing a job yaml to tork?

Right now the flow I have is:

sequenceDiagram
    Dispatcher->>Tork: Create job
    Tork->>Dispatcher: JobSummary
    Dispatcher->>Dispatcher: Create config, persist job ID
    opt On Job Start
    Tork->>Dispatcher: Get Job Config
    Dispatcher->>Tork: map[string]string
    end

But what I'd quite like is to be able to kick a job off with some additional data.

If it can't be done, are there any other ways you can think of that would do this? I know tasks can have tags, which I could potentially mis-use

Add support for tmpfs mounts on Docker runtime

As opposed to volumes and bind mounts, a tmpfs mount is temporary, and only persisted in the host memory. When the container stops, the tmpfs mount is removed, and files written there won't be persisted.

This is useful to temporarily store sensitive files that you don't want to persist in either the host or the container writable layer.

Limitations:

Unlike volumes and bind mounts, you can't share tmpfs mounts between containers.
This functionality is only available if you're running Docker on Linux.

The goal of this issue is to add support for a tmpfs mount.

Rough outline of the implementation:

Add an tmpfs.go file to the docker packge.
Implement the Mounter interface.
Register the new mounter implementation when wiring in the Docker runtime.
Add tests.

Support of recurring jobs

Submit once:
-repeat until canceled,
-repeat X times or
-repeat till a specified date/time.

Examples of usecases:
-Folder watching
-Periodic web scraping
-IOT/network scans
-etc

Add support for Podman runtime

Tork has support for Docker as its default runtime environment. Podman and Docker share many similarities in terms of functionality and concepts, but they also have some differences.

One significant difference is that Podman operates in a daemonless mode by default. It runs containers as child processes of the Podman command, while Docker traditionally uses a daemon (dockerd) to manage containers. This can be an advantage in terms of security and resource usage for Podman.

Podman has robust support for running containers as non-root users, while Docker requires privileged access to the daemon. This is seen as a security improvement in Podman.

Podman had been gaining popularity within the containerization and DevOps communities. It had a growing and active community of users and contributors.

The goal of this feature is to add first-class support to running tasks within a Podman environment.

Rough outline of the implementation:

Create a new podman package in the runtime package.
Create a podman.go file within this package.
Create a PodmanRuntime struct to implement the Runtime interface.
Wire the runtime into the engine on startup when the runtime.type config is set to podman.
Add tests.

[feature request] If one parallel task fails - don't fail entire Job

Hello
Really important feature:
If one parallel task fails - don't fail entire Job and add possibility to re-run failed Task

Our case:
we have one big Job with 50x parallel tasks inside. Task lifetime > 5h. And If one task fails - it fails entire Job and we lose results of another workers.
If we restart this Job different workers can pickup random tasks again and it will the mess for us.

And one question: if task fails - will the post work after this?

[bug] entrypoint can't find script inside

Faces strange issue:

Job

run: |-
  ls -la && chmod +x script.sh
  ./script.sh

total 16
drwx------    2 root     root          4096 Nov 16 08:36 .
drwxrwxrwt    1 root     root          4096 Nov 16 08:36 ..
-r-xr-xr-x    1 root     root           109 Nov 16 08:36 entrypoint
-rw-r--r--    1 root     root          3146 Nov 16 08:36 script.sh
-rw----r--    1 root     root             0 Nov 16 08:36 stdout
/tmp/tork246489349/entrypoint: line 2: ./script.sh: not found

Can't understand what is going on

No env vars in pre/post tasks

There is a problem with passing env variables to pre/post tasks:

with:

[middleware.task.hostenv]
vars = [
  "AWS_ACCESS_KEY_ID", 
  "AWS_SECRET_ACCESS_KEY"
]

variables are correctly forwarded to tasks, but not to pre/post:

example:

tasks:
  - name: process
    run: aws s3 cp /tmp/master.mov s3://bucket/master.mov # will work
    image: amazon/aws-cli:latest
    post:
      - name: upload the final video to minio
        run: aws s3 cp /tmp/master.mov s3://bucket/master.mov # won't work (no AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY)
        image: amazon/aws-cli:latest

[feature] Option to stop/continue current Job with parallel tasks

Will be cool if you implement some feature to stop current job.
Example: we have a job with 1000 parallel tasks. And in some moment we want to reconfigure workers - we dont want to cancel job and restart it - because sone of parallel tasks already compleated.

If we have option Stop - tasks in job could change state to Not active for example and after we reconfigure workers or ENV or something else we can proceed Job with pressing Continue.

Sub-Job Task use mounts

Can I use mounts to share in Sub-Job Task?

Tork worker inside script doesn't see host env vars

I faced an issue when host\k8s pod has some env values and I want to use them (mount values from k8s secrets) but tork inside task doesn't see any env values at all. If I check env command I can see only this:

TORK_OUTPUT=/tmp/tork4231472022/stdout
SHLVL=2
_=/usr/bin/env
PWD=/tmp/tork4231472022

But host itself has a lot of values I want to use.
And the most strange thing: some time ago it worked fine. Maybe I'm missing something?

Task container port mapping

Hi, I'm having a need for one of my tasks to connect to a http server running on the worker node (for my usecase, task containers are spawned from the worker dockerd deamon). To do this, I have thought of several ways:

map http server port into the task container (-p 8080:8080)
add extra host param for the task container to have access to host.docker.internal hostname (--add-host=host.docker.internal:host-gateway)
use host network for task container (--net=host). This isn't a viable option for me due to security reason.

I think port mapping could be the easiest to implement

	for _, m := range t.Mounts {
	var mt mount.Type
	switch m.Type {
	case tork.MountTypeVolume:
	mt = mount.TypeVolume
	if m.Target == "" {
	return errors.Errorf("volume target is required")
	}
	case tork.MountTypeBind:
	mt = mount.TypeBind
	if m.Target == "" {
	return errors.Errorf("bind target is required")
	}
	if m.Source == "" {
	return errors.Errorf("bind source is required")
	}
	case tork.MountTypeTmpfs:
	mt = mount.TypeTmpfs
	default:
	return errors.Errorf("unknown mount type: %s", m.Type)
	}
	mount := mount.Mount{
	Type: mt,
	Source: m.Source,
	Target: m.Target,
	}
	log.Debug().Msgf("Mounting %s -> %s", mount.Source, mount.Target)
	mounts = append(mounts, mount)
	}

runabol / tork Goto Github PK

tork's Introduction

Features:

Documentation

Quick Start

Hello World

A slightly more interesting example

More examples

REST API

Swagger Docs

Web UI

License

tork's People

Contributors

Stargazers

Watchers

Forkers

tork's Issues

Recommend Projects

Recommend Topics

Recommend Org