Comments (11)

jeffpeiyt commented on May 28, 2024

@agustintorres, do you mean the count does not match?

Normally this happens with duplicated requests, since we deduplicate and use a hashmap to store the distinct requests.

Could you try executing just 100 of the clientIds and see if you miss any? Also, try putting the clientIds into a hashset first to see if there are any duplicates.
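The hashset check suggested above could look roughly like the following. This is a minimal sketch using only the Java standard library; the class name `DuplicateCheck` and the sample IDs are made up for illustration and are not part of Parallec.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Parallec.
class DuplicateCheck {
    /** Returns each ID that occurs more than once, in the order its first repeat is seen. */
    static List<String> findDuplicates(List<String> clientIds) {
        Set<String> seen = new HashSet<>();
        Set<String> reported = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String id : clientIds) {
            // Set.add returns false when the element is already present.
            if (!seen.add(id) && reported.add(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Illustrative IDs only.
        List<String> clientIds = Arrays.asList("c1", "c2", "c1", "c3", "c2", "c1");
        System.out.println("total=" + clientIds.size()
                + " unique=" + new HashSet<>(clientIds).size()
                + " duplicates=" + findDuplicates(clientIds));
    }
}
```

If the unique count equals the total count, deduplication cannot explain a mismatch in the number of results.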

from parallec.

agustintorres commented on May 28, 2024

@jeffpeiyt Yes, that's what I mean.

I have also confirmed that there are no duplicates. There are 19,749 unique requests.

If I make smaller requests (100 or even 8,000), I do not miss any. It is only with the 19,749 set that I am missing results.

jeffpeiyt commented on May 28, 2024

Interesting.

Could you please try the following:

  • Pass your hashmap to responseContext and save the responses / APIs to it in onCompleted; after the task is done, examine which APIs are missing. That tells you whether any API never reached onCompleted.
  • Debug and examine parallelTaskResult to see the entry count of this hashmap.
  • Check the logs. They show progress clearly (for example, 5000/6000 completed). Is the total number correct while the task is running?
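The first suggestion amounts to recording each request key as its callback fires and then diffing against the expected set. The sketch below uses only the standard library and does not show the Parallec wiring: `record` stands in for whatever you would call from inside onCompleted, and the class name, method names, and sample IDs are hypothetical. A ConcurrentHashMap is used on the assumption that callbacks may arrive from more than one thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Parallec.
class MissingResponseCheck {
    /** Call this from the completion callback to note that a request finished. */
    static void record(Map<String, Object> responseContext, String requestId, String body) {
        responseContext.put(requestId, body);
    }

    /** Returns the expected IDs that never showed up in the response context, sorted. */
    static Set<String> findMissing(List<String> expectedIds, Map<String, Object> responseContext) {
        Set<String> missing = new TreeSet<>(expectedIds);
        missing.removeAll(responseContext.keySet());
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative IDs only.
        List<String> expected = Arrays.asList("c1", "c2", "c3", "c4");
        Map<String, Object> responseContext = new ConcurrentHashMap<>();

        // Simulate callbacks for only some of the requests.
        record(responseContext, "c1", "ok");
        record(responseContext, "c3", "ok");

        System.out.println("received=" + responseContext.size()
                + " missing=" + findMissing(expected, responseContext));
    }
}
```

Comparing the size of this map with the total reported in the logs shows whether requests were dropped before or after the callback stage.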

Some internal teams have similar single-target usage at a similar scale, and they have not reported this. I have not encountered it before either.

agustintorres commented on May 28, 2024

@jeffpeiyt Here are some findings:

The job steadily progresses and at some point it jumps from about 54% progress to 100%. It seems that it gets terminated mid-way. These are some things I see in the logs:

9/23/2016 17:31:58:816 .c.a.OperationWorker [atcher-6] INFO  - asyncWorker has not been initilized (null). Will not tell it cancel

09/23/2016 17:31:58:826 c.a.ExecutionManager [atcher-3] INFO  - ExecutionManager sending cancelPendingRequest at time: 2016-09-23 17:31:58.826-0400

09/23/2016 17:31:58:831 c.a.ExecutionManager [atcher-3] INFO  - task.totalJobNumActual : 19749 InitCount: 19749

09/23/2016 17:31:58:833 c.a.ExecutionManager [atcher-3] INFO  - task.response received Num 10716 

09/23/2016 17:31:58:836 c.a.ExecutionManager [atcher-3] INFO  - COMPLETED_WITH_ERROR.  19749 at time: 2016.09.23.17.31.58.836-0400

09/23/2016 17:31:58:839 .ParallelTaskManager [Thread-5] INFO  - !!COMPLETED sendTaskToExecutionManager : PT_19749_20160923172157463_df655ac0-66f at 2016-09-23 17:31:58.839-0400          GenericResponseMap in future size: 10716
09/23/2016 17:31:58:839 c.a.ExecutionManager [atcher-3] INFO  - 
Time taken to get all responses back : 601.318 secs

09/23/2016 17:31:58:842 .ParallelTaskManager [Thread-5] INFO  - Removed task PT_19749_20160923172157463_df655ac0-66f from the running inprogress map... . This task should be garbage collected if there are no other pointers.

I am putting everything into the responseContext in the onCompleted method, and the size of the map at the end is 10718.

Further, task.getParallelTaskResult().keySet().size() is 19749.

Keep in mind that the job runs for about 10 minutes before it gets cancelled. Why would it terminate? Maybe there is some sort of timeout?

jeffpeiyt commented on May 28, 2024

Sorry, my bad. Internal users have reported the same issue. It is very easy to fix: a global timeout is killing the whole job.

jeffpeiyt commented on May 28, 2024

#38 has been tracking this issue. Please set the timeout to a larger value. The default is 600 seconds.

defaults:

    /**
     * The command manager internal timeout and cancel itself time in seconds
     * Note this may need to be adjusted for long polling jobs.
     */
    public static long timeoutInManagerSec = 600;

    /** The timeout the director send to the manager to cancel it from outside. */
    public static long timeoutAskManagerSec = timeoutInManagerSec + 10;
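The log earlier in the thread reports 601.318 secs, just past the 600-second default. One way to choose larger values is to size the manager timeout from the expected run time and keep the ask timeout 10 seconds above it, mirroring the default relationship quoted above. The sketch below is plain Java: the `TimeoutPlan` class and the safety-factor policy are assumptions for illustration, not part of Parallec, and the setter names in the final comment are taken from a config snippet posted later in this thread rather than verified here.

```java
// Hypothetical helper, not part of Parallec.
class TimeoutPlan {
    final long timeoutInManagerSec;
    final long timeoutAskManagerSec;

    TimeoutPlan(long timeoutInManagerSec) {
        this.timeoutInManagerSec = timeoutInManagerSec;
        // Same relationship as the quoted defaults: manager timeout + 10 seconds.
        this.timeoutAskManagerSec = timeoutInManagerSec + 10;
    }

    /** Sizes the manager timeout as expected run time times a safety factor (assumed policy). */
    static TimeoutPlan forExpectedRuntime(long expectedRuntimeSec, double safetyFactor) {
        long sec = (long) Math.ceil(expectedRuntimeSec * safetyFactor);
        return new TimeoutPlan(sec);
    }

    public static void main(String[] args) {
        // Example: a job expected to run about five hours, with 50% headroom.
        TimeoutPlan plan = forExpectedRuntime(5 * 3600, 1.5);
        System.out.println("timeoutInManagerSec=" + plan.timeoutInManagerSec
                + " timeoutAskManagerSec=" + plan.timeoutAskManagerSec);
        // These values would then be passed to the task config, e.g. via
        // setTimeoutInManagerSec(...) and setTimeoutAskManagerSec(...).
    }
}
```

Keeping the two values tied together avoids a configuration where the outside ask timeout and the manager's own timeout drift apart.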

agustintorres commented on May 28, 2024

@jeffpeiyt Thanks! That fixed my problem and it works great now. I'm actually planning to do something similar for around 300,000 clientIds, even if it takes 5+ hours. Do you foresee any problems with it running for this long?

jeffpeiyt commented on May 28, 2024

@agustintorres Great! I am updating the documentation to make this clearer.

I do not foresee any problems. We run jobs on 100,000+ hosts and they run fine. Please let us know about any issues you encounter.

jeffpeiyt commented on May 28, 2024

Updated doc at: http://www.parallec.io/docs/configurations/#long-running-jobs

harjitdotsingh commented on May 28, 2024

I'm still seeing this....

2018-06-23 00:46:03.401 INFO 20095 --- [ParallecActorSystem-akka.actor.default-dispatcher-10] io.parallec.core.actor.ExecutionManager :
[4]__RESP_RECV_IN_MGR 4 (+0) / 4 (100.00%) AFT 14.133 S @ API_2 @ 2018.06.23.00.46.03.401-0400 , TaskID : 0f39b86b-82a , CODE: NA, RESP_BRIEF: EMPTY , ERR: java.util.concurrent.TimeoutException: No response received after 14000

I have the following config set as per the doc:

    private ParallelTaskConfig genParallelTaskConfig() {
        ParallelTaskConfig config = new ParallelTaskConfig();
        config.setActorMaxOperationTimeoutSec(120);
        config.setTimeoutInManagerSec(120);
        config.setTimeoutAskManagerSec(710);
        return config;
    }

This is my task call:

    .setHttpHeaders(new ParallecHeader()
            .addPair("x-user", env.getProperty("ifi.user"))
            .addPair("x-password", env.getProperty("ifi.password")))
    .setProtocol(RequestProtocol.HTTPS)
    .setHttpPort(443)
    .setConfig(genParallelTaskConfig())
    .setTcpConnectTimeoutMillis(100*120)
    .async()
    .setReplaceVarMapToSingleTargetSingleVar("QUERY", queryList, "cdws21.ificlaims.com")
    .setResponseContext(returnMap)
    .execute((res, responseContext) -> 

jeffpeiyt commented on May 28, 2024
