Giter Club home page Giter Club logo

Comments (22)

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
additional info:
log file doesn't say anything about disconnection. maybe the worker just 
doesn't accept new jobs anymore ?!

Original comment by [email protected] on 13 Aug 2012 at 7:41

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi stiwanchinazki,

I'm looking into this. I'll keep you updated.

Thanks,
isaiah.v

Original comment by [email protected] on 13 Aug 2012 at 7:41

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
So far, I cannot reproduce this. Could you attach your client too?

Original comment by [email protected] on 14 Aug 2012 at 2:25

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Isaiah,

thanks for your effort.

I created two pastebins.

Client: http://pastebin.com/kF3HvQUP
Worker: http://pastebin.com/JHfTjK11

Both processes are still alive when gearman doesn't do anything anymore (i can 
notice this, because a lifesign is written to a mysql database)

-Stiwan

Original comment by [email protected] on 15 Aug 2012 at 8:48

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Try to set the maximum concurrency (worker) to a very low value, like 2 or 3.

i have the feeling, that after that number of worked jobs, the worker doesn't 
accept new jobs.

Original comment by [email protected] on 15 Aug 2012 at 10:42

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
in my production environment, i start about 10 workers (10 functions with 1-10 
max concurrency). in the beginning it says:

130 [gearman-7] INFO gearman - [remoteip:4730] : Connected
130 [gearman-9] INFO gearman - [remoteip:4730] : Connected
130 [gearman-8] INFO gearman - [remoteip:4730] : Connected
130 [gearman-5] INFO gearman - [remoteip:4730] : Connected
130 [gearman-6] INFO gearman - [remoteip:4730] : Connected
130 [gearman-3] INFO gearman - [remoteip:4730] : Connected
130 [gearman-1] INFO gearman - [remoteip:4730] : Connected
130 [gearman-4] INFO gearman - [remoteip:4730] : Connected
132 [gearman-10] INFO gearman - [remoteip:4730] : Connected
132 [gearman-11] INFO gearman - [remoteip:4730] : Connected
132 [gearman-9] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-8] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-5] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-6] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-1] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-4] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-3] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-10] INFO gearman - [remoteip:4730] : OUT : CAN_DO
132 [gearman-7] INFO gearman - [remoteip:4730] : OUT : CAN_DO
134 [gearman-11] INFO gearman - [remoteip:4730] : OUT : CAN_DO
134 [gearman-12] INFO gearman - [remoteip:4730] : Connected
134 [gearman-12] INFO gearman - [remoteip:4730] : OUT : CAN_DO
135 [gearman-4] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
135 [gearman-13] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
135 [gearman-15] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
136 [gearman-3] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
136 [gearman-4] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
136 [gearman-14] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
138 [gearman-4] INFO gearman - [remoteip:4730] : Connected
138 [gearman-9] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
138 [gearman-4] INFO gearman - [remoteip:4730] : OUT : CAN_DO
138 [gearman-8] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
138 [gearman-6] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
138 [gearman-14] INFO gearman - [remoteip:4730] : Connected
138 [gearman-13] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
139 [gearman-14] INFO gearman - [remoteip:4730] : OUT : CAN_DO
139 [gearman-8] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
141 [gearman-8] INFO gearman - [remoteip:4730] : Connected
141 [gearman-8] INFO gearman - [remoteip:4730] : OUT : CAN_DO
142 [gearman-14] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
156 [gearman-14] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
158 [gearman-14] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
160 [gearman-14] INFO gearman - [remoteip:4730] : IN : NO_JOB
160 [gearman-14] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP
161 [gearman-14] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
162 [gearman-14] INFO gearman - [remoteip:4730] : IN : NO_JOB
162 [gearman-14] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP
168 [gearman-5] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
173 [gearman-5] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
175 [gearman-5] INFO gearman - [remoteip:4730] : IN : NO_JOB
175 [gearman-5] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP
178 [gearman-5] INFO gearman - [remoteip:4730] : IN : NO_JOB
178 [gearman-5] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP
179 [gearman-5] INFO gearman - [remoteip:4730] : IN : JOB_ASSIGN
179 [gearman-7] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB



and in the end it's down to two:

19340115 [gearman-619] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
19340115 [gearman-614] INFO gearman - [remoteip:4730] : OUT : GRAB_JOB
19340128 [gearman-614] INFO gearman - [remoteip:4730] : IN : NO_JOB
19340128 [gearman-614] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP
19340129 [gearman-614] INFO gearman - [remoteip:4730] : IN : NO_JOB
19340129 [gearman-614] INFO gearman - [remoteip:4730] : OUT : PRE_SLEEP

VisualVM shows that there are 5 threads alive (in the beginning it were 10) and 
the number of those threads is varying between 2 and 5.

Original comment by [email protected] on 15 Aug 2012 at 2:41

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Based on the log... it looks like the worker is still asking for work, but the 
server is not giving it. Periodically, are seeing the workers sending 
"GRAB_JOB" packets? and getting a "NO_JOB" response? If so... I'm thinking the 
worker is working correctly. If that's the case, I'd make sure the server's 
queue was not empty, and the client was still successfully submitting jobs. Use 
the GearmanJobReturn returned from the call, 
client.submitBackgroundJob(functionName,new byte[0]).

Given there is a maximum of concurrency of 5 per worker and 3 workers, I would 
expect 15+ threads if all the workers are kept busy.

Original comment by [email protected] on 16 Aug 2012 at 2:54

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Isaiah,

yes, it's still periodically sending GRAB_JOB.
How can i check the server's queue?
What should i do with the GearmanJobReturn?
I see there are two methods: poll and isEOF. ´´

Currently, i'm just submitting and doing nothing with the GearmanJobReturn.

-Stiwan

Original comment by [email protected] on 16 Aug 2012 at 4:34

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Isaiah,

when restarting the client, the worker still doesn't accept new jobs. seems to 
be on the worker's end. it seems to me, that the worker doesn't mark the jobs 
as finished and doesn't accept new ones:

in production, my client submits one function at a time (let's call the 
function foo). the worker has max concurrency 10 for foo. after foo is 
finished, the client submits the next foo job. after 10 jobs, the worker 
doesn't accept new jobs.

My Client shows this when submitting a Background job (at the time, when the 
job doesn't reach the worker)

19:02:26,896 [main] INFO  - Submitted Job: PgTX2zwr
133794 [gearman-20] INFO gearman - [remoteip:4730] : OUT : SUBMIT_JOB_BG
133822 [gearman-20] INFO gearman - [remoteip:4730] : IN : JOB_CREATED
19:02:26,966 [JobStatusPrint-PgTX2zwr] INFO  - Event: GEARMAN_SUBMIT_SUCCESS

after Gearman_Submit_Success it is instantly EOF, which should be right.

I'm currently checking, if long jobs (which take about 60+ minutes) could cause 
this behaviour. i tested it in production with very long jobs, and now i'm down 
to using 1-5 minutes, which is now working fine. will get back to you about that

-Stiwan

Original comment by [email protected] on 16 Aug 2012 at 6:27

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Stiwan,

Is v0.6.5 the first version you have used? Or did you upgrade from an earlier 
version? Because between v0.6.4 and v0.6.5, I added in a check to see if a 
disconnect occurred while working on a job. If the check fails, the complete 
packet is not sent.

I'm thinking gearmand is closing the connection due to a timeout.

In versions lower than 0.6.5, if a disconnect occurred and the connection was 
re-established while the GearmanFunction was executing a job, the worker would 
still tell the server that it completed the job. I found that to be an 
undesirable behavior, so I "fixed" it.

On the server side, when a disconnect occurs, it re-assigns the job. So the job 
may go to more than one worker (or given concurrency is greater than 1, it may 
be the same worker). When the WORK_COMPLETE packet comes in, the server does 
not care if it doesn't come from the assigned worker. The server will complete 
the job. If more than one worker attempts to complete a job, it just takes the 
first one.

I think what might be happening now it, gearmand is disconnecting from all of 
the workers and reestablishing connections, the worker sees that it had 
disconnected before it finished the job, and does not sent the WORK_COMPLETE 
packet.

Does that make since. If we find this to be the case, the next step would be to 
figure out a way to make sure the worker is not disconnecting from the server.

Original comment by [email protected] on 16 Aug 2012 at 7:28

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Yes, i also think, that this behaviour is caused by a timeout.
I could observe this behaviour:
1. "short" jobs - about 1-5 min - are completed successfully and their workers 
accept new jobs
2. "long" jobs - about 20+ min - are completed successfully but their workers 
DONT accept new jobs.

I tested this twice about 4 hours. I conclude that it must have something to do 
with a timeout. I don't know if this is caused by gearmand or 
java-gearman-service.
I saw that the PHP binding has a settimeout function for worker: 
php.net/manual/de/gearmanworker.settimeout.php

Maybe it would work better if i sent a signal from the worker every once in 
awhile?
I'm not sure if WORK_COMPLETE is sent, according to my log it is sent in some 
cases, have to dig deeper into this.

I'm using 0.6.5 but was using 0.6.2 before - i don't think their are any 
remains in my project - i don't understand this comment.

I'll check the WORK_COMPLETE behaviour, will post the result shortly.

Original comment by [email protected] on 17 Aug 2012 at 6:00

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Short addition to the observed behaviour:

"long" jobs (with 20+ min) - accept only maxconcurrency-1 new jobs.
so if you had a worker with maxconcurrency 10 and it shows this behaviour, the 
worker automatically has a max concurrenvy of 9.
this explains why workers with concurrency 1 stop working after long jobs (1-1 
= 0 :D)

Original comment by [email protected] on 17 Aug 2012 at 6:06

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
[deleted comment]

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
OK i checked the long (20+ min) jobs.
It seems that they actually send WORK_COMPLETE

2040609 [MYJOB-ZYqxDpTQ] INFO gearman - [remoteip:4730] : OUT : WORK_COMPLETE

please notice, that this worker now doesn't accept new jobs.
the short jobs also send WORK_COMPLETE, but accept new jobs.

Original comment by [email protected] on 17 Aug 2012 at 7:01

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
aaaah breakthrough :-)
i see now what is causing all this!

09:23:02,530 [gearman-37] WARN  - Lost Gearman Connection: FUNCTION2 - 
UNEXPECTED_DISCONNECT (remoteip:4730)

i log this event when lostConnection in the GearmanLostConnectionPolicy is 
called for the worker.
Notice that this is another worker than the original worker (i have a setup 
where worker1 and worker2 alternate)

so there must be some kind of disconnect. my lost connection policy looks like 
this:

this.gearmanWorker.setLostConnectionPolicy(new GearmanLostConnectionPolicy() {
                @Override
                public GearmanLostConnectionAction lostConnection(GearmanServer gearmanServer, GearmanLostConnectionGrounds gearmanLostConnectionGrounds) {
                    logger.warn("Lost Gearman Connection: " + worker.getJobType() + " - " + gearmanLostConnectionGrounds + " (" + gearmanServer.getHostName() + ":" + gearmanServer.getPort() + ")");
                    return GearmanLostConnectionAction.RECONNECT;
                }

                @Override
                public void shutdownServer(GearmanServer gearmanServer) {
                    logger.warn("Shutdown Gearman Server: " + worker.getJobType() + " (" + gearmanServer.getHostName() + ":" + gearmanServer.getPort() + ")");
                }
            });

-Stiwan
and

this.gearmanWorker.setReconnectPeriod(5, TimeUnit.MINUTES);

-Stiwan

Original comment by [email protected] on 17 Aug 2012 at 7:29

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
it seems that, if i call "sendData(new byte[0])", there's no disconnect to the 
jobserver.

so, that's my temporary solution, but i hope we find the solution to this issue.

Original comment by [email protected] on 18 Aug 2012 at 10:32

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Stiwan,

From a user standpoint, you should not need to ping the server in order to keep 
the connection alive. The worker should do this automatically. There are 
"ECHO_REQ" and "ECHO_RES" packets in the gearman 
protocol(http://gearman.org/index.php?id=protocol). I think I'm going to use 
this to ping the server if communication has not occurred some period of time.

Thanks,
isaiah.v

Original comment by [email protected] on 18 Aug 2012 at 4:17

  • Changed state: Accepted
  • Added labels: Priority-High
  • Removed labels: Priority-Medium

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Stiwan,

As a side note, this has nothing to do with the issue at hand, but you might be 
able to save a few cpu cycles by getting rid of the three threads created 
created when setting up the workers.

If you want to setup 3 workers, you could do this:

public class BugWorker {
    public static void main(String[] args) throws Exception {
        Gearman gearman = Gearman.createGearman();

        GearmanWorker[] workers = new GearmanWorker[3];
        workers[0] = createWorker("function1", gearman, new WorkerObject());
        workers[1] = createWorker("function2", gearman, new WorkerObject());
        workers[2] = createWorker("function3", gearman, new WorkerObject());
    }

    public static GearmanWorker createWorker(String functionName, Gearman gearman, WorkerObject workerObject) {
        GearmanWorker worker = gearman.createGearmanWorker();
        worker.addServer(gearman.createGearmanServer("ip",1234));
        worker.addFunction(functionName, workerObject);
        worker.setMaximumConcurrency(5);
        worker.setReconnectPeriod(5, TimeUnit.MINUTES);

        return worker;
    }
}

final class WorkerObject implements GearmanFunction {
    @Override
    public byte[] work(String s, byte[] bytes, GearmanFunctionCallback gearmanFunctionCallback) throws Exception {
        // do some stuff.   for example sleep 10 seconds.
        Thread.sleep(10000);
        return new byte[0];
    }
}


The will remove the statement, "while(true);"  You may even want to use only 
one worker with all three function registered to it. If so, try this:


public class BugWorker {
    public static void main(String[] args) throws Exception {
        Gearman gearman = Gearman.createGearman();

        GearmanWorker worker = gearman.createGearmanWorker();
        worker.addServer(gearman.createGearmanServer("ip",1234));

        worker.addFunction("function1", new WorkerObject());
        worker.addFunction("function2", new WorkerObject());
        worker.addFunction("function3", new WorkerObject());

        worker.setMaximumConcurrency(15);
        worker.setReconnectPeriod(5, TimeUnit.MINUTES);
    }
}

final class WorkerObject implements GearmanFunction {
    @Override
    public byte[] work(String s, byte[] bytes, GearmanFunctionCallback gearmanFunctionCallback) throws Exception {
        // do some stuff.   for example sleep 10 seconds.
        Thread.sleep(10000);
        return new byte[0];
    }
}

Original comment by [email protected] on 18 Aug 2012 at 4:47

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Isaiah,

thanks for your help regarding the improvement of my threading.

It's great to hear that i could help a little and that this issue will be fixed 
:-)

-Stiwan

Original comment by [email protected] on 18 Aug 2012 at 5:49

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Hi Stiwan,

I really appreciate the feedback. If you find any more issues or have any 
questions, please let me know.

I'll keep you updated on the progress of this issue.

thanks,
isaiah.v

Original comment by [email protected] on 18 Aug 2012 at 7:44

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024
Fix committed to trunk. This fix will be in version 0.6.6. This should be 
released in the next week.

Original comment by [email protected] on 22 Sep 2012 at 10:26

from java-gearman-service.

GoogleCodeExporter avatar GoogleCodeExporter commented on August 12, 2024

Original comment by [email protected] on 22 Sep 2012 at 10:27

  • Changed state: Fixed

from java-gearman-service.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.