
Comments (14)

zhangsj0608 commented on July 27, 2024

Hi, there

Building the environment requires nothing more than cloning the whole repository. For reference, my TensorFlow version is 1.13 and my Python version is 3.6.


hongzimao commented on July 27, 2024

Ok, we need to debug this - it's been a while since I trained with the makespan reward. The reward calculation is here:

elif args.learn_obj == 'makespan':
    reward -= (curr_time - self.prev_time) / \
        args.reward_scale

You should check if the reward at each action checks out with this reward calculation.

The learning curve you show is helpful - it shows the agent doesn't get any learning signal: the actor loss is essentially 0 (on the order of 1e-11). It's likely that the reward the agent gets is all 0 or all constant. Somewhere the reward assignment to the actions is off.

I will try to squeeze some time to run the code myself too - but could you run it and print out the reward to start debugging? Thanks!
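As a reference point while debugging, here is a minimal standalone sketch (not the repo's code; the reward_scale value and the event times are made-up placeholders) of what the per-step makespan rewards should look like if the assignment is working:

# Standalone sketch mirroring the makespan reward form above; all values are
# placeholders for whatever your run actually produces.
reward_scale = 100000.0                          # stand-in for args.reward_scale
event_times = [0.0, 1200.0, 3400.0, 9100.0]      # hypothetical scheduling-event times

prev_time = event_times[0]
total = 0.0
for curr_time in event_times[1:]:
    reward = -(curr_time - prev_time) / reward_scale   # same form as the snippet above
    total += reward
    print("step reward:", reward)
    prev_time = curr_time

# Per-step rewards should vary from step to step; if every printed value is
# identical or zero, the reward assignment upstream is likely broken.
print("episode return:", total)   # telescopes to -(last event time) / reward_scale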


zhangsj0608 commented on July 27, 2024

Hi, there

I have tried a couple of times with careful settings on the args, but the problem persists. I suppose the reward (shown in line 33 above) is computed from the time interval between the last scheduling step and the current one. The long-term return is then the sum of these intervals, which is actually the time point of the final scheduling step. That is not the makespan of all jobs, since some jobs may still be running after the last scheduling action. My guess is that the reward therefore does not reflect the actual makespan at all. The question seems to be: what should a proper reward function that reflects the makespan metric look like?
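A toy sketch of this concern (all timestamps are made up, not from the simulator): summing the per-step intervals telescopes to the time of the last scheduling event, which can be earlier than when the last job actually finishes.

# Toy illustration (hypothetical timestamps): the summed intervals telescope to
# the time of the last scheduling event, not to the completion of the last job.
sched_times = [0.0, 10.0, 25.0, 40.0]    # times at which scheduling actions occur
last_job_finish = 55.0                   # the last job keeps running afterwards

intervals = [b - a for a, b in zip(sched_times, sched_times[1:])]
print(sum(intervals))     # 40.0 -> time of the final scheduling step
print(last_job_finish)    # 55.0 -> the true makespan, not captured by the sum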


hongzimao commented on July 27, 2024

For makespan, it only makes sense to run a fixed batch of jobs (i.e., no new job arrivals). In your settings, did you set the --num_stream_* options to 0 and only use --num_init_dags?


zhangsj0608 commented on July 27, 2024

Hi, Hongzi,

That might be the problem, as I was not aware of the effect that streaming jobs in the system have on the makespan. I actually kept them at 200 (num stream jobs) for each episode. I will quickly sort it out and see the result.


Nannnnnn commented on July 27, 2024

Hi, Hongzi

I noticed your code supports a makespan-optimized policy by setting args.learn_obj to 'makespan'. However, when trained with the recommended small-scale setting (200 stream jobs on 8 agents) for 3000 episodes, the model doesn't seem to converge the way it normally does with the average-JCT objective. The figures below show the actor_loss and average_reward_per_second collected during training. The average_reward_per_second is always around -1, because the total reward equals the negative makespan, which is essentially the same total time it gets divided by. Could you suggest a setting I may have missed to guarantee convergence?
[figures: avg_reward_per_sec, actor_loss]

Hi Zhang!
It seems that you have built the environment successfully. May I know the SW versions (e.g. TF version, Python version) you used to set it up? I tried but found some libs are missing. Thanks in advance!


zhangsj0608 commented on July 27, 2024

Hi, Hongzi

Over the past days, I retrained the model with the suggested settings, i.e. num_init_dags > 0 and num_stream_dags = 0. The exact command was as follows.

nohup python3 train.py --exec_cap 25 --num_init_dags 100 --learn_obj 'makespan' --num_stream_dags 0 --reset_prob 5e-7 --reset_prob_min 5e-8 --reset_prob_decay 4e-10 --diff_reward_enabled 1 --num_agents 4 --model_save_interval 100 --num_ep 3005 --model_folder ./models/batch_100_job_diff_reward_reset_5e-7_5e-8_makespan_ep3000/ > out.log 2>&1 &

However, the average reward collected by the agent is still -1 during training. I feel the function (lines 33-34) used by the reward calculator may just give a static signal over time. Any suggestions?


hongzimao commented on July 27, 2024

We may have to print the reward values and examine them. Start from the bare minimum: try num_init_dags = 1 and num_stream_dags = 0, and log all the reward values for the actions needed to finish this single job. Could you check whether the reward you get corresponds to this job's completion time? After checking this simple scenario, we can move on to two jobs, and then multiple jobs. Based on what you showed, there might be a bug in the current code for the makespan reward. Thanks!
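A hedged sketch of the check being suggested for the single-job case (the numbers below are placeholders; substitute the rewards you logged and the completion time the simulator reports):

# Sanity check for --num_init_dags 1 --num_stream_dags 0: if the reward
# assignment is right, the summed per-action rewards should equal
# -(job completion time) / reward_scale. All values below are placeholders.
reward_scale = 100000.0
logged_rewards = [-0.012, -0.034, -0.050]   # replace with your printed rewards
job_completion_time = 9600.0                # replace with the simulator's value

episode_return = sum(logged_rewards)
expected = -job_completion_time / reward_scale
print("episode return:", episode_return, "expected:", expected)
if abs(episode_return - expected) > 1e-6:
    print("mismatch: the makespan reward assignment is likely off")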


Nannnnnn commented on July 27, 2024

Hi, there

Building the environment requires nothing more than cloning the whole repository. For reference, my TensorFlow version is 1.13 and my Python version is 3.6.

Thanks! May I know the HW setup you have? I am trying a CPU-only run (unfortunately I don't have a qualified GPU), but it halts at the point shown in the screenshot below.
[screenshot]


Nannnnnn commented on July 27, 2024

Hi there, I have a question regarding the number of agents. What is the reason for having multiple agents, e.g. args.num_agents = 16 by default?


hongzimao commented on July 27, 2024

When the program halts, is there an error message?

Multiple agents are just for speeding up the training. Parallel agents (threads on CPUs) generate experience concurrently. You can set args.num_agents based on the number of CPUs you have on your machine.
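If it helps, a trivial way to pick that number (standard library only, nothing Decima-specific):

# Print the number of CPU cores as a starting point for --num_agents.
import os
print("suggested --num_agents:", os.cpu_count())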


Nannnnnn commented on July 27, 2024

Hi Hongzi, thanks! There is no error message at all, only some warnings (related to some Python library functions) that don't seem critical. Since I am running the CPU version, I suppose training just takes so long that the program looks like it has stopped. Could you share a bit about the training time you saw before?


hongzimao commented on July 27, 2024

You might find this useful regarding the training time #21

Also, we provided a trained model if you find the training time too long #12


jahidhasanlinix commented on July 27, 2024

@zhangsj0608 @Nannnnnn hi, would you be willing to share the code you used to plot those figures? I need help with that part to produce the figures used in the Decima paper; I haven't been able to generate any figure so far. Could you please share the plotting code? Thank you.
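Not the exact code the commenters used, but a minimal matplotlib sketch for a learning curve; it assumes you log one average-reward value per episode to a plain text file (the file name and format here are assumptions, adapt them to your own logging, e.g. a print/append in train.py or exported TensorBoard scalars):

# Minimal learning-curve plot from a hypothetical per-episode reward log
# (one float per line). Adjust the file name/format to your own logging.
import matplotlib.pyplot as plt

with open("avg_reward_per_episode.txt") as f:   # hypothetical log file
    rewards = [float(line) for line in f if line.strip()]

plt.plot(range(1, len(rewards) + 1), rewards)
plt.xlabel("training episode")
plt.ylabel("average reward")
plt.savefig("learning_curve.png")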

