
Comments (14)

zhangsj0608 commented on July 27, 2024

Hi, there

Building the environment requires nothing more than cloning the whole repository. For reference, my TensorFlow version is 1.13 and my Python version is 3.6.


hongzimao commented on July 27, 2024

Ok, we need to debug this - it's been a while since I trained with the makespan reward. The reward calculation is here:

elif args.learn_obj == 'makespan':
    reward -= (curr_time - self.prev_time) / \
        args.reward_scale

You should check if the reward at each action checks out with this reward calculation.

The learning curve you show is helpful - it shows the agent doesn't get any learning signal: the actor loss is essentially 0 (on the order of 1e-11). It's likely that the reward the agent gets is all 0 or all constant. Somewhere the reward assignment to the actions is off.

I will try to squeeze some time to run the code myself too - but could you run it and print out the reward to start debugging? Thanks!
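As a reference point while debugging, here is a minimal standalone sketch (not the repo's code; the reward_scale value and the event times are made-up placeholders) of what the per-step makespan rewards should look like if the assignment is working:

# Standalone sketch mirroring the makespan reward form above; all values are
# placeholders for whatever your run actually produces.
reward_scale = 100000.0                          # stand-in for args.reward_scale
event_times = [0.0, 1200.0, 3400.0, 9100.0]      # hypothetical scheduling-event times

prev_time = event_times[0]
total = 0.0
for curr_time in event_times[1:]:
    reward = -(curr_time - prev_time) / reward_scale   # same form as the snippet above
    total += reward
    print("step reward:", reward)
    prev_time = curr_time

# Per-step rewards should vary from step to step; if every printed value is
# identical or zero, the reward assignment upstream is likely broken.
print("episode return:", total)   # telescopes to -(last event time) / reward_scale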


zhangsj0608 commented on July 27, 2024

Hi, there

I have tried a couple of times with careful settings on the args, but the problem persists. I suppose the reward (shown in line 33 above) is computed from the time interval between the last scheduling step and the current one. The long-term return is then the sum of these intervals, which is actually the time point of the final scheduling step. That is not the makespan of all jobs, since some jobs may still be running after the last scheduling action. My guess is that the reward therefore does not reflect the actual makespan at all. The question seems to be: what should a proper reward function that reflects the makespan metric look like?
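A toy sketch of this concern (all timestamps are made up, not from the simulator): summing the per-step intervals telescopes to the time of the last scheduling event, which can be earlier than when the last job actually finishes.

# Toy illustration (hypothetical timestamps): the summed intervals telescope to
# the time of the last scheduling event, not to the completion of the last job.
sched_times = [0.0, 10.0, 25.0, 40.0]    # times at which scheduling actions occur
last_job_finish = 55.0                   # the last job keeps running afterwards

intervals = [b - a for a, b in zip(sched_times, sched_times[1:])]
print(sum(intervals))     # 40.0 -> time of the final scheduling step
print(last_job_finish)    # 55.0 -> the true makespan, not captured by the sum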


hongzimao commented on July 27, 2024

For makespan, it only makes sense to run a fixed batch of jobs (i.e., no new job arrivals). In your settings, did you set the --num_stream_* options to 0 and only use --num_init_dags?


zhangsj0608 commented on July 27, 2024

Hi, Hongzi,

That might be the problem, as I was not aware of the effect that streaming jobs in the system have on the makespan. I actually kept them at 200 (num stream jobs) for each episode. I will quickly sort it out and see the result.


Nannnnnn commented on July 27, 2024

Hi, Hongzi

I noticed your code supports a makespan-optimized policy by setting args.learn_obj to 'makespan'. However, when trained with the recommended small-scale setting (200 stream jobs on 8 agents) for 3000 episodes, the model doesn't seem to converge the way it normally does with the average-JCT objective. The figures below show the actor_loss and average_reward_per_second collected during training. The average_reward_per_second is always around -1, because the total reward equals the negative makespan, which is essentially the same total time it gets divided by. Could you suggest a setting I may have missed to guarantee convergence?
[figures: avg_reward_per_sec, actor_loss]

Hi Zhang!
It seems that you have built the environment successfully. May I know the SW versions (e.g. TF version, Python version) you used to set it up? I tried but found some libs are missing. Thanks in advance!


zhangsj0608 commented on July 27, 2024

Hi, Hongzi

Over the past days, I retrained the model with the suggested settings, i.e. num_init_dags > 0 and num_stream_dags = 0. The exact command was as follows.

nohup python3 train.py --exec_cap 25 --num_init_dags 100 --learn_obj 'makespan' --num_stream_dags 0 --reset_prob 5e-7 --reset_prob_min 5e-8 --reset_prob_decay 4e-10 --diff_reward_enabled 1 --num_agents 4 --model_save_interval 100 --num_ep 3005 --model_folder ./models/batch_100_job_diff_reward_reset_5e-7_5e-8_makespan_ep3000/ > out.log 2>&1 &

However, the average reward collected by the agent is still -1 during training. I feel the function (lines 33-34) used by the reward calculator may just give a static signal over time. Any suggestions?


hongzimao commented on July 27, 2024

We may have to print the reward values and examine them. Start from the bare minimum: try num_init_dags = 1 and num_stream_dags = 0, and log all the reward values for the actions needed to finish this single job. Could you check whether the reward you get corresponds to this job's completion time? After checking this simple scenario, we can move on to two jobs, and then multiple jobs. Based on what you showed, there might be a bug in the current code for the makespan reward. Thanks!
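A hedged sketch of the check being suggested for the single-job case (the numbers below are placeholders; substitute the rewards you logged and the completion time the simulator reports):

# Sanity check for --num_init_dags 1 --num_stream_dags 0: if the reward
# assignment is right, the summed per-action rewards should equal
# -(job completion time) / reward_scale. All values below are placeholders.
reward_scale = 100000.0
logged_rewards = [-0.012, -0.034, -0.050]   # replace with your printed rewards
job_completion_time = 9600.0                # replace with the simulator's value

episode_return = sum(logged_rewards)
expected = -job_completion_time / reward_scale
print("episode return:", episode_return, "expected:", expected)
if abs(episode_return - expected) > 1e-6:
    print("mismatch: the makespan reward assignment is likely off")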


Nannnnnn commented on July 27, 2024

Hi, there

Building the environment requires nothing more than cloning the whole repository. For reference, my TensorFlow version is 1.13 and my Python version is 3.6.

Thanks! May I know the HW setup you have? I am trying a CPU-only run (unfortunately I don't have a qualified GPU), but it halts at the point shown in the screenshot below.
[screenshot]


Nannnnnn commented on July 27, 2024

Hi there, I have a question regarding the number of agents. What is the reason for having multiple agents, e.g. args.num_agents = 16 by default?


hongzimao commented on July 27, 2024

When the program halts, is there an error message?

Multiple agents are just for speeding up the training. Parallel agents (threads on CPUs) generate experience concurrently. You can set args.num_agents based on the number of CPUs you have on your machine.
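If it helps, a trivial way to pick that number (standard library only, nothing Decima-specific):

# Print the number of CPU cores as a starting point for --num_agents.
import os
print("suggested --num_agents:", os.cpu_count())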


Nannnnnn commented on July 27, 2024

Hi Hongzi, thanks! There is no error message at all, only some warnings (related to some Python library functions) that don't seem critical. Since I am running the CPU version, I suppose training just takes so long that the program looks like it has stopped. Could you share a bit about the training time you saw before?


hongzimao commented on July 27, 2024

You might find this useful regarding the training time #21

Also, we provided a trained model if you find the training time too long #12


jahidhasanlinix commented on July 27, 2024

@zhangsj0608 @Nannnnnn hi, would you be willing to share the code you used to plot those figures? I need help with that part to produce the figures used in the Decima paper; I haven't been able to generate any figure so far. Could you please share the plotting code? Thank you.
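Not the exact code the commenters used, but a minimal matplotlib sketch for a learning curve; it assumes you log one average-reward value per episode to a plain text file (the file name and format here are assumptions, adapt them to your own logging, e.g. a print/append in train.py or exported TensorBoard scalars):

# Minimal learning-curve plot from a hypothetical per-episode reward log
# (one float per line). Adjust the file name/format to your own logging.
import matplotlib.pyplot as plt

with open("avg_reward_per_episode.txt") as f:   # hypothetical log file
    rewards = [float(line) for line in f if line.strip()]

plt.plot(range(1, len(rewards) + 1), rewards)
plt.xlabel("training episode")
plt.ylabel("average reward")
plt.savefig("learning_curve.png")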

