
orkes-conductor-community's People

Contributors

boney9, cherishsantoshi, imprakharsachan, macca2317, manan164, maryamghani, meggarr, nhandt2021, nittamathew, rizafarheen, silent-lad, v1r3n


orkes-conductor-community's Issues

Swagger UI displays no operations: "No operations defined in spec!" error message

Describe the bug
When running the standalone Docker image, http://localhost:8080/swagger-ui/index.html does not display any operations; instead it shows the error "No operations defined in spec!"

Steps To Reproduce
Run the self-contained, standalone Docker image as per instructions. Navigate to http://localhost:8080/, then click on "Swagger Documentation".

Expected behavior
The usual Swagger page is displayed, listing the available operations.

Device/browser

  • OS: Centos 7
  • Browser: Firefox
  • Version: 1.0.3 (of orkes-conductor-community)

Additional context
The same issue occurs when downloading and building the server and running it locally, outside Docker, against local Redis and Postgres instances.

Conductor UI: Only workflows with status FAILED are visible

Describe the bug
Since version 1.0.7, only workflows with status FAILED are shown in the Executions view.

Steps To Reproduce
Steps to reproduce the behavior:

  1. Run orkesio/orkes-conductor-community-standalone:1.0.7 or higher in Docker as described in the README:

```shell
docker volume create postgres
docker volume create redis
docker run --init -p 8080:8080 -p 1234:5000 --mount source=redis,target=/redis \
  --mount source=postgres,target=/pgdata orkesio/orkes-conductor-community-standalone:latest
```

  2. Navigate to the Conductor UI (http://localhost:1234).
  3. Go to the Workbench and start the workflow "load_test"; it is now in status RUNNING.
  4. Start the workflow "http"; it is now in status FAILED.
  5. Go to the Executions window: only failed workflows are displayed. Even if you select "RUNNING" in the status drop-down, only failed workflows are shown.

Expected behavior
I expect all workflows to be shown.

Device/browser

  • OS: Docker on Windows (WSL2), Docker on Ubuntu 22.04
  • Browser: Chrome
  • Version: 117

Additional context
In version 1.0.6 it worked fine. The bug has been present since 1.0.7.
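To tell a UI filtering bug apart from a backend one, the same status filter can be sent straight to the server's search API. Below is a minimal sketch, assuming the server exposes `GET /api/workflow/search` with `start`, `size`, `freeText` and `query` parameters as upstream Conductor does, and that port 8080 is the API port mapped in the `docker run` command above; `WorkflowSearchUrl` and `forStatus` are hypothetical names.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class WorkflowSearchUrl {

    /**
     * Builds a search URL for workflows in one status.
     * Assumption: GET /api/workflow/search accepts start, size, freeText and
     * query parameters, as in upstream Conductor.
     */
    static String forStatus(String baseUrl, String status, int start, int size) {
        String query = "status IN (" + status + ")";
        return baseUrl + "/api/workflow/search"
                + "?start=" + start
                + "&size=" + size
                + "&freeText=" + URLEncoder.encode("*", StandardCharsets.UTF_8)
                + "&query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Assumed API port, taken from the -p 8080:8080 mapping in the docker run command.
        String url = forStatus("http://localhost:8080", "RUNNING", 0, 15);
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // If "load_test" appears in this response but not in the UI, the filtering problem is UI-side.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

`URLEncoder` encodes spaces as `+`, which the servlet container decodes back to spaces in query parameters.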

Missing curl in standalone Docker image causes the health check to always fail

Describe the bug
We run the Docker image orkesio/orkes-conductor-community-standalone:latest, and the container is always unhealthy because curl seems to be missing:

```json
"Health": {
    "Status": "unhealthy",
    "FailingStreak": 386,
    "Log": [
        {
            "Start": "2024-07-08T16:51:14.249784013Z",
            "End": "2024-07-08T16:51:14.344971847Z",
            "ExitCode": 1,
            "Output": "/bin/sh: curl: not found\n"
        },
        {
            "Start": "2024-07-08T16:52:14.346606472Z",
            "End": "2024-07-08T16:52:14.433134513Z",
            "ExitCode": 1,
            "Output": "/bin/sh: curl: not found\n"
        },
        {
            "Start": "2024-07-08T16:53:14.440914138Z",
            "End": "2024-07-08T16:53:14.549711097Z",
            "ExitCode": 1,
            "Output": "/bin/sh: curl: not found\n"
        },
        {
            "Start": "2024-07-08T16:54:14.552011055Z",
            "End": "2024-07-08T16:54:14.644104638Z",
            "ExitCode": 1,
            "Output": "/bin/sh: curl: not found\n"
        },
        {
            "Start": "2024-07-08T16:55:14.646305222Z",
            "End": "2024-07-08T16:55:14.75006993Z",
            "ExitCode": 1,
            "Output": "/bin/sh: curl: not found\n"
        }
    ]
}
```
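Since the image evidently ships a JVM (the server is a Java application) but not curl, one possible workaround is a tiny Java probe used as the health-check command. This is a minimal sketch, assuming Java 11+ inside the container and a health endpoint at `http://localhost:8080/health` (upstream Conductor serves one there; verify the path for this image). `HealthProbe` and `exitCodeFor` are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HealthProbe {

    // Assumption: the server's health endpoint; adjust if the image uses another path.
    static final String HEALTH_URL = "http://localhost:8080/health";

    /** Maps an HTTP status code to a Docker health-check exit code: 0 = healthy, 1 = unhealthy. */
    static int exitCodeFor(int httpStatus) {
        return (httpStatus >= 200 && httpStatus < 300) ? 0 : 1;
    }

    public static void main(String[] args) {
        int exitCode;
        try {
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();
            HttpRequest request = HttpRequest.newBuilder(URI.create(HEALTH_URL))
                    .timeout(Duration.ofSeconds(5))
                    .GET()
                    .build();
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            exitCode = exitCodeFor(response.statusCode());
        } catch (Exception e) {
            exitCode = 1; // connection refused, timeout, interruption: all count as unhealthy
        }
        System.exit(exitCode);
    }
}
```

On an image with a full JDK this can be run without a separate compile step as `java HealthProbe.java` (single-file source launch, Java 11+); on a JRE-only image, compile it with `javac` first and run `java HealthProbe`.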

Race condition in indexTask with Elasticsearch

Describe the bug
There is a race condition in indexTask with Elasticsearch (ES): the number of index requests sent by the Conductor server does not match the number received on the ES side.

Steps To Reproduce
Steps to reproduce the behavior:

  1. Run multiple tasks in parallel.
  2. Set the ES index log level to debug.
  3. Log the requests received by ES (we use ES7; ES6 may be affected as well).
  4. Some tasks remain IN_PROGRESS in ES instead of COMPLETED, even after the workflow has COMPLETED.
  5. Meanwhile, the status in the persistence layer is correct (we use Postgres for persistence).
  6. indexBatchSize is left at its default of 1, and asyncIndexingEnabled at its default of false.

Expected behavior
All tasks should be in a terminal status in ES, such as COMPLETED or FAILED, rather than IN_PROGRESS.

Device/browser

  • OS: Ubuntu
  • Browser: N/A
  • Version: 3.14

Additional context

  1. With debug logging enabled, the following log line is printed correctly in our environment. We have 3 index requests per task, logged in ElasticSearchRestDAOV7.java -> indexTask, and the average time per request is under 30 ms:

Time taken {} for indexing task:{} in workflow: {}

  2. On the ES side, the number of received records is randomly less than 3.

  3. There seems to be a race condition between indexObject and indexBulkRequest:

```java
private void indexObject(
        final String index, final String docType, final String docId, final Object doc) {

    byte[] docBytes;
    try {
        docBytes = objectMapper.writeValueAsBytes(doc);
    } catch (JsonProcessingException e) {
        logger.error("Failed to convert {} '{}' to byte string", docType, docId);
        return;
    }
    IndexRequest request = new IndexRequest(index);
    request.id(docId).source(docBytes, XContentType.JSON);

    if (bulkRequests.get(docType) == null) {
        bulkRequests.put(
                docType, new BulkRequests(System.currentTimeMillis(), new BulkRequest()));
    }

    bulkRequests.get(docType).getBulkRequest().add(request);
    if (bulkRequests.get(docType).getBulkRequest().numberOfActions() >= this.indexBatchSize) {
        indexBulkRequest(docType);
    }
}

private synchronized void indexBulkRequest(String docType) {
    if (bulkRequests.get(docType).getBulkRequest() != null
            && bulkRequests.get(docType).getBulkRequest().numberOfActions() > 0) {
        synchronized (bulkRequests.get(docType).getBulkRequest()) {
            indexWithRetry(
                    bulkRequests.get(docType).getBulkRequest().get(),
                    "Bulk Indexing " + docType,
                    docType);
            bulkRequests.put(
                    docType, new BulkRequests(System.currentTimeMillis(), new BulkRequest()));
        }
    }
}
```
  • No lock is taken when a request is added to the bulk request in indexObject.
  • A lock is taken in indexBulkRequest when the bulk request is sent and the local bulk request is replaced.
  • Consider this interleaving of two threads: T1 sends the bulk request -> T2 adds a request to that same bulk request -> T2 waits on the synchronized indexBulkRequest -> T1 replaces the local bulk request with an empty one -> T2 enters indexBulkRequest, fails the check, and sends nothing. T2's request is lost, because it was added after T1 sent the batch but before T1 discarded it. (Even if a third thread T3 has meanwhile added a request to the new, empty bulk request so that the check passes, T2's request is still gone.)
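One way to close the window described above is to perform the add, the size check and the hand-off of a full batch under a single lock, and to do the slow send outside it. The sketch below illustrates that idea only: it uses a plain `List<String>` in place of `BulkRequest` and a `BiConsumer` in place of `indexWithRetry`, so it is not the DAO's actual code, and `SafeBulkIndexer`, `flush` and `deliveredCount` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BiConsumer;

public class SafeBulkIndexer {
    private final int indexBatchSize;
    // Stand-in for indexWithRetry(...): receives the docType and a batch that no other thread can touch.
    private final BiConsumer<String, List<String>> sender;
    // Pending documents per docType; guarded by `lock`.
    private final Map<String, List<String>> pending = new HashMap<>();
    private final Object lock = new Object();

    public SafeBulkIndexer(int indexBatchSize, BiConsumer<String, List<String>> sender) {
        this.indexBatchSize = indexBatchSize;
        this.sender = sender;
    }

    public void indexObject(String docType, String doc) {
        List<String> batchToSend = null;
        synchronized (lock) {
            // The add and the hand-off share one lock, so no document can be
            // added to a batch that another thread has already taken for sending.
            List<String> batch = pending.computeIfAbsent(docType, k -> new ArrayList<>());
            batch.add(doc);
            if (batch.size() >= indexBatchSize) {
                batchToSend = pending.remove(docType); // the next add starts a fresh batch
            }
        }
        if (batchToSend != null) {
            sender.accept(docType, batchToSend); // network I/O happens outside the lock
        }
    }

    /** Sends whatever is left for docType, e.g. from a periodic flush timer. */
    public void flush(String docType) {
        List<String> batchToSend;
        synchronized (lock) {
            batchToSend = pending.remove(docType);
        }
        if (batchToSend != null && !batchToSend.isEmpty()) {
            sender.accept(docType, batchToSend);
        }
    }

    /** Demo: index documents from several threads and count how many reach the sender. */
    static int deliveredCount(int threads, int docsPerThread, int batchSize) {
        AtomicInteger delivered = new AtomicInteger();
        SafeBulkIndexer indexer =
                new SafeBulkIndexer(batchSize, (type, batch) -> delivered.addAndGet(batch.size()));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    indexer.indexObject("task", "task-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        indexer.flush("task"); // push out a partially filled final batch
        return delivered.get();
    }
}
```

With this structure, `deliveredCount(8, 1000, 1)` should account for all 8,000 documents. Note that it only prevents adds from being lost; it does not order concurrent sends of the same document, so an older IN_PROGRESS write could still overwrite a newer COMPLETED one if two sends race. Elasticsearch's external versioning (`version_type=external`) is one option for rejecting such stale writes.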

Thanks

conductor.db.type=redis_sentinel is being ignored

I'm trying to run the Orkes server against a Redis Sentinel cluster. I am mounting the following as /app/config/config.properties:

```properties
spring.datasource.url=jdbc:postgresql://postgres:5432/postgres
spring.datasource.username=postgres
spring.datasource.password=postgres
conductor.db.type=redis_sentinel
conductor.redis-lock.serverAddress=redis://redis:26379
conductor.redis.hosts=redis:26379:this-one
```

Below is the output from the Orkes server, grepped for "redis":

```
10:32:07.812 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_6379_TCP_PROTO, Value: tcp
10:32:07.812 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_6379_TCP_ADDR, Value: 10.43.51.71
10:32:07.812 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT, Value: tcp://10.43.51.71:6379
10:32:07.813 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_SERVICE_PORT_TCP_SENTINEL, Value: 26379
10:32:07.813 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_26379_TCP, Value: tcp://10.43.51.71:26379
10:32:07.814 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_26379_TCP_ADDR, Value: 10.43.51.71
10:32:07.814 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_26379_TCP_PORT, Value: 26379
10:32:07.814 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_SERVICE_HOST, Value: 10.43.51.71
10:32:07.815 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_SERVICE_PORT_TCP_REDIS, Value: 6379
10:32:07.815 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_26379_TCP_PROTO, Value: tcp
10:32:07.815 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_6379_TCP, Value: tcp://10.43.51.71:6379
10:32:07.815 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_PORT_6379_TCP_PORT, Value: 6379
10:32:07.816 [main] INFO io.orkes.conductor.OrkesConductorApplication - System Env Props - Key: REDIS_SERVICE_PORT, Value: 6379
10:32:07.832 [main] INFO io.orkes.conductor.OrkesConductorApplication - Setting conductor.redis-lock.serverAddress - redis://redis:26379
10:32:07.832 [main] INFO io.orkes.conductor.OrkesConductorApplication - Setting conductor.db.type - redis_sentinel
10:32:07.832 [main] INFO io.orkes.conductor.OrkesConductorApplication - Setting conductor.redis.hosts - redis:26379:this-one
2023-02-21 10:32:18,624 INFO  [main] io.orkes.conductor.queue.config.RedisQueueConfiguration: Starting conductor server using redis_standalone - use SSL? false
2023-02-21 10:32:19,055 ERROR [main] com.netflix.conductor.redis.dao.RedisMetadataDAO: refresh TaskDefs failed
redis.clients.jedis.exceptions.JedisDataException: ERR unknown command `HSCAN`, with args beginning with: `conductor.test.TASK_DEFS`, `0`,
        at redis.clients.jedis.Protocol.processError(Protocol.java:135)
        at redis.clients.jedis.Protocol.process(Protocol.java:169)
        at redis.clients.jedis.Protocol.read(Protocol.java:223)
        at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:352)
        at redis.clients.jedis.Connection.getUnflushedObjectMultiBulkReply(Connection.java:314)
        at redis.clients.jedis.Connection.getObjectMultiBulkReply(Connection.java:319)
        at redis.clients.jedis.Jedis.hscan(Jedis.java:3727)
        at redis.clients.jedis.Jedis.hscan(Jedis.java:3719)
        at com.netflix.conductor.redis.jedis.JedisStandalone.lambda$hscan$127(JedisStandalone.java:706)
        at com.netflix.conductor.redis.jedis.JedisStandalone.executeInJedis(JedisStandalone.java:59)
        at com.netflix.conductor.redis.jedis.JedisStandalone.hscan(JedisStandalone.java:706)
        at com.netflix.conductor.redis.jedis.OrkesJedisProxy.hgetAll(OrkesJedisProxy.java:148)
        at com.netflix.conductor.redis.dao.RedisMetadataDAO.getAllTaskDefs(RedisMetadataDAO.java:125)
        at com.netflix.conductor.redis.dao.RedisMetadataDAO.refreshTaskDefs(RedisMetadataDAO.java:92)
        at com.netflix.conductor.redis.dao.RedisMetadataDAO.<init>(RedisMetadataDAO.java:57)
        at com.netflix.conductor.redis.dao.OrkesMetadataDAO.<init>(OrkesMetadataDAO.java:55)
2023-02-21 10:32:19,059 INFO  [main] com.netflix.conductor.redis.dao.OrkesMetadataDAO: taskDefCacheTTL set to 1000
```

So I believe the config is loaded correctly, but the server still runs the standalone Redis configuration. It looks as though the default configuration still takes precedence, despite my attempt to override it and the log messages suggesting the override was applied.

Any ideas please?
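For reference, upstream Netflix Conductor documents `conductor.redis.hosts` as a semicolon-separated list of `host:port:rack` entries. The sketch below only shows how the value in the config above decomposes under that format; whether the third field is treated as a rack or as something else in Sentinel mode depends on the server's Redis configuration, which this sketch does not model. `RedisHostsParser` and `HostSpec` are hypothetical names, and records require Java 16+. The decomposition also makes one detail of the log easier to read: the standalone client's `HSCAN` went to port 26379, the Sentinel port, and Sentinel instances accept only a restricted command set, which is consistent with the `ERR unknown command` reply.

```java
import java.util.ArrayList;
import java.util.List;

public class RedisHostsParser {

    /** One host:port[:rack] entry; the meaning of the third field is not modelled here. */
    record HostSpec(String host, int port, String rack) {}

    /** Splits a conductor.redis.hosts-style value on ';' and each entry on ':'. */
    static List<HostSpec> parse(String hosts) {
        List<HostSpec> result = new ArrayList<>();
        for (String entry : hosts.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            String[] parts = trimmed.split(":");
            if (parts.length < 2 || parts.length > 3) {
                throw new IllegalArgumentException("expected host:port[:rack] but got: " + trimmed);
            }
            String rack = parts.length == 3 ? parts[2] : null;
            result.add(new HostSpec(parts[0], Integer.parseInt(parts[1]), rack));
        }
        return result;
    }

    public static void main(String[] args) {
        // The value from the config above; 26379 matches REDIS_SERVICE_PORT_TCP_SENTINEL in the log.
        for (HostSpec spec : parse("redis:26379:this-one")) {
            System.out.println(spec);
        }
    }
}
```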

Wrong row count in executions UI landing page

Describe the bug
In the UI, even when there are many executions, the row count and the pagination count are wrong.

Steps To Reproduce
Steps to reproduce the behavior:

  1. Make sure you already have many executions, preferably more than 15.

  2. Go to the Executions tab and keep the default page size of 15. The page navigation icons are disabled and the number of executions is displayed incorrectly.
    Screenshot 2022-09-16 at 1 09 21 PM

  3. Use the drop-down to increase the page size, and you will see additional records.
    Screenshot 2022-09-16 at 1 13 49 PM

Expected behavior
The row count should be correct, and the navigation icons should allow paging through the results.

Additional context
These workflows were started from the backend using the localhost:1234 endpoints, not via the UI Workbench. Thanks
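The expected behavior amounts to deriving the count and the navigation state from the total number of matching executions rather than from the rows on the current page. A minimal sketch, assuming the total is available (upstream Conductor's search results carry a `totalHits` field); `ExecutionsPage` and its methods are hypothetical and say nothing about how the UI is actually implemented.

```java
public class ExecutionsPage {
    private final long totalHits; // total matching executions, e.g. totalHits from the search API
    private final int start;      // zero-based offset of the first row shown
    private final int pageSize;   // rows per page; the UI default is 15 per the report

    ExecutionsPage(long totalHits, int start, int pageSize) {
        if (pageSize <= 0 || start < 0 || totalHits < 0) {
            throw new IllegalArgumentException("invalid paging state");
        }
        this.totalHits = totalHits;
        this.start = start;
        this.pageSize = pageSize;
    }

    /** Number of pages, using ceiling division. */
    long pageCount() {
        return (totalHits + pageSize - 1) / pageSize;
    }

    boolean hasPrevious() {
        return start > 0;
    }

    /** The "next page" icon should be enabled whenever rows remain beyond this page. */
    boolean hasNext() {
        return (long) start + pageSize < totalHits;
    }

    /** Label such as "1-15 of 40". */
    String rangeLabel() {
        long from = totalHits == 0 ? 0 : start + 1;
        long to = Math.min((long) start + pageSize, totalHits);
        return from + "-" + to + " of " + totalHits;
    }
}
```

For example, 40 executions at the default page size give three pages, with the "next" icon enabled on the first page.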

Conductor client does not work with spring cloud config

Describe the bug
We are working on an ad hoc task workflow. For that, we have created a new Spring Boot service using the Conductor client. We are running the Conductor server in Docker [orkesio/orkes-conductor-community-standalone:latest].
My service works fine without Spring Cloud Config: it is able to poll for tasks and execute them. However, when I add the Spring Cloud Config dependencies, it is no longer able to poll for tasks and hence does not execute them.

Steps To Reproduce
Steps to reproduce the behavior:
I have created a demo project on my GitHub.

  1. Go to https://github.com/arpitrathore/conductor-cloud-config-test
  2. Follow the steps in the README to spin up two Docker containers: one for Spring Cloud Config and one for the Conductor server.
  3. Start the service by running the main method in src/main/java/com/arpitrathore/test/Application.java
  4. Run the following curl command to submit a task:
curl -H 'Content-Type: application/json' http://localhost:8080/submit/ -d '{"someId": 123}'

Notice that the service is NOT able to poll the task and execute it.

  5. Now switch to the without-cloud-config branch. This branch does not have the Spring Cloud dependency. Run the main method in src/main/java/com/arpitrathore/test/Application.java again.
  6. Run the following curl command to submit a task:
curl -H 'Content-Type: application/json' http://localhost:8080/submit/ -d '{"someId": 123}'

Notice that the service is able to poll the task and execute it.

Expected behavior
The service should poll and execute the task with or without the Spring Cloud Config dependency.

Device/browser

  • OS: macOS (M1)
  • Browser: N/A

High redis usage caused by OrkesWorkflowSweeper

In our production use-case, we often have long running workflows that wait on human tasks.
Because we want to be able to track human tasks in our own back-office systems, we created a sub-workflow that creates and tracks human tasks for us and ends with a HUMAN task in Conductor.
We noticed an extremely high load on Redis, even when every currently incomplete workflow is idling on a sub-workflow that is itself idling on a HUMAN task. Looking into it further, we found our logs flooded with:
INFO [sweeper-thread-1] io.orkes.conductor.server.service.OrkesWorkflowSweeper: Running sweeper for workflow ***.
This constantly fetches the workflows and their tasks, and there currently seems to be no way to slow this process down.

Looking at this code and its comment, which contradict each other (the comment says 60 seconds, while the code uses 60 milliseconds): https://github.com/orkes-io/orkes-conductor-community/blob/60325ef7b196a96d1062ddfecf924c4be7866309/server/src/main/java/io/orkes/conductor/server/service/OrkesWorkflowSweeper.java#L152C4-L152C4, I'm worried a mistake was made in the implementation of the sweeper service and that workflows are being checked far more often than they should be.

I believe this to be a root cause of our production systems failing under relatively light load. Is there any way to slow down the sweeper without disabling it completely, or does a bug need to be fixed?
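To put the reported discrepancy in numbers: if each idle workflow is re-checked once per interval, a 60 ms interval means 1,000 times as many sweeps as the 60-second interval the comment describes. The sketch below only does that arithmetic; it assumes the sweeper actually keeps pace with the interval, so treat the figures as upper bounds. `SweepRate` is a hypothetical helper.

```java
import java.time.Duration;

public class SweepRate {

    /** Upper bound on sweeps per workflow per hour if it is re-checked once per interval. */
    static long sweepsPerHour(Duration interval) {
        return Duration.ofHours(1).toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        long asCommented = sweepsPerHour(Duration.ofSeconds(60)); // what the comment describes
        long asCoded = sweepsPerHour(Duration.ofMillis(60));      // what the code uses, per the report
        System.out.println("60 s interval:  " + asCommented + " sweeps/hour per workflow");
        System.out.println("60 ms interval: " + asCoded + " sweeps/hour per workflow");
        System.out.println("ratio: " + (asCoded / asCommented));
    }
}
```

This prints 60 and 60000 sweeps per hour and a ratio of 1000; each sweep, as described above, fetches the workflow and its tasks from Redis.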

RetryDelay does not work

Describe the bug
The retryDelaySeconds parameter in the task definition does not add any delay before a task is retried. The issue is observed with all tasks.

It can be reproduced with the sample workflow in this repo:
path: orkes-conductor-community-build/persistence/src/test/resources/wf2.json

Expected behavior
Failed tasks should be retried after a delay of N seconds.
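One way to confirm the symptom from execution data, assuming FIXED retry logic: each retry should not start until at least retryDelaySeconds after the previous attempt ended. `Attempt` and `honoursRetryDelay` are hypothetical; the idea is to feed them the start and end times (epoch milliseconds) of successive attempts of the same task as reported by the server. Backoff retry logics would need a larger minimum gap than this check uses.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

public class RetryDelayCheck {

    /** Start and end time (epoch millis) of one attempt of the same task; hypothetical type. */
    record Attempt(long startTimeMillis, long endTimeMillis) {}

    /**
     * True if every retry started at least retryDelaySeconds after the previous
     * attempt ended (FIXED retry logic assumed).
     */
    static boolean honoursRetryDelay(List<Attempt> attempts, int retryDelaySeconds) {
        long minGapMillis = TimeUnit.SECONDS.toMillis(retryDelaySeconds);
        for (int i = 1; i < attempts.size(); i++) {
            long gapMillis = attempts.get(i).startTimeMillis() - attempts.get(i - 1).endTimeMillis();
            if (gapMillis < minGapMillis) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Illustrative timestamps only: the retry starts 200 ms after the first attempt failed.
        List<Attempt> attempts = List.of(new Attempt(0, 1_000), new Attempt(1_200, 2_000));
        System.out.println(honoursRetryDelay(attempts, 5));
    }
}
```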

startDelay does not work

Describe the bug
I created an HTTP task and an INLINE task, intending to start them with a delay of 5 seconds using the predefined property for this feature, startDelay (in seconds). I set a delay of 5 (and also tried 5000), but it did not work: the workflows start instantaneously.

This is reproducible with all workflows; it can be tried on the sample workflow in this repo by modifying the value of startDelay:
Path: orkes-conductor-community-build/server/src/main/resources/workflows.json

Expected behavior
The workflow should start after N seconds.

Device/browser
Across all browsers
