
soft-module's Introduction

I'm currently a PhD student at UC San Diego, advised by Prof. Xiaolong Wang. Before coming to UC San Diego, I received my B.E. in Software Engineering from Nankai University in 2019.

I'm interested in reinforcement learning, machine learning, robotics, and systems. Specifically, I'd like to build intelligent agents that make decisions using information from different sources.

Email  /  Google Scholar  /  Github  /  Linkedin

I'm open to discussion or collaboration. Feel free to drop me an email if you're interested in my research.

soft-module's People

Contributors

rchalyang

soft-module's Issues

Training set

Hello, when I tried to run the MT10 setup following the first command in the GitHub instructions, I noticed that each epoch takes about 200 seconds on average. At this rate, the full run would take around 20 days. Additionally, the GPU uses only 1235 MiB of memory. Could this be a configuration issue?

Training efficiency

Hi, I'm interested in your work and appreciate your sharing the source code. I have some questions.
First, when I run the MT10-Conditioned task, I find that each epoch takes about 200 s on average, meaning roughly 18 days to complete all 7500 epochs. I also ran the MT50-Fixed task, which averages about 2500 s per epoch. Even though you use multiprocessing, and the policy network and Q-function networks are deployed on the GPU, these networks consume only about 1.5 GB of GPU memory. Is this a normal training speed?
Second, you use multiprocessing to collect data and perform multi-task learning. What is the training process for multi-task learning? Each time you feed a single state vector and a one-hot task-ID vector into the policy network, the effective batch size is 1, yet the batch size is defined as 1280. What are the specific training details?
Looking forward to your reply, thanks!
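
(A note on the batch-size question, for other readers: in off-policy algorithms like SAC, the collector typically feeds one state and one task one-hot through the policy per environment step, while the batch size of 1280 applies to gradient updates sampled from a replay buffer. Below is a minimal sketch of that decoupling; the dimensions and names are illustrative, not taken from this repo.)

import torch
import torch.nn as nn

# Illustrative policy: observation + one-hot task ID -> action.
obs_dim, num_tasks, act_dim = 39, 10, 4
policy = nn.Sequential(nn.Linear(obs_dim + num_tasks, 64), nn.Tanh(),
                       nn.Linear(64, act_dim))

# Collection: one transition at a time (forward batch size 1 per env step).
obs, task_onehot = torch.randn(obs_dim), torch.eye(num_tasks)[3]
with torch.no_grad():
    action = policy(torch.cat([obs, task_onehot]).unsqueeze(0)).squeeze(0)
print(action)

# Training: gradients are computed on a large batch sampled from the
# replay buffer; a random tensor stands in for buffer.sample(1280) here.
batch = torch.randn(1280, obs_dim + num_tasks)
loss = policy(batch).pow(2).mean()   # placeholder loss, for shapes only
loss.backward()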

Heavy CPU Consumption

Hi @RchalYang! Thank you for providing the code!

I found that running the code consumes a lot of CPU resources. Running 2 MT10 experiments at the same time consumes all of my 80 CPU cores. Is this normal? Or is there any way to limit CPU use? Thank you!
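
(Not from the authors, but a common cause of this with many torch.multiprocessing workers is that each worker lets OpenMP/MKL and PyTorch spawn one thread per core. A minimal sketch of capping the thread pools, assuming you can edit the entry script; the environment variables must be set before torch is imported.)

import os

# Cap BLAS/OpenMP thread pools BEFORE importing torch, so every
# spawned worker inherits single-threaded math libraries.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import torch

# Cap PyTorch's own intra-op thread pool as well.
torch.set_num_threads(1)

Alternatively, taskset or cgroups can pin each experiment to a subset of cores at the OS level.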

The version of metaworld

Hi, I would like to ask what version of metaworld you are using. The current metaworld no longer has metaworld.envs.mujoco.multitask_env.MultiClassMultiTaskEnv or metaworld.core.serializable.Serializable, which prevents me from running your code.
I hope to get your reply, thank you!
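
(A quick way to check which metaworld API is installed; the repo imports the older multitask API, so this probe is just a diagnostic, not an official fix.)

# Probe for the old metaworld multitask API that this repo imports.
import importlib

mw = importlib.import_module("metaworld")
print(getattr(mw, "__version__", "no __version__ attribute"))

try:
    from metaworld.envs.mujoco.multitask_env import MultiClassMultiTaskEnv
    print("old multitask API present")
except ImportError:
    print("old multitask API missing - an older metaworld commit is needed")

If the import fails, pinning metaworld to an older commit should restore the missing modules; the exact commit the authors used is not stated in this thread.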

Synchronization problem of policy between collector and MTSAC trainer

@RchalYang Thank you so much for sharing your code!
I was running some experiments based on Soft-Module and found that the pf used for evaluation in the collector was sometimes not the current pf in MTSAC (the one just updated during the epoch).
This causes an inconsistency between the performance of saved models and their evaluation results. The code uses the network in the collector to generate evaluation results and to decide when to update "model_pf_best.pth", but it saves the state_dict from the networks in MTSAC; if the pf used in evaluation (the collector's) differs from the model in MTSAC, the code records MTSAC's current pf network rather than the one actually used in evaluation.

To reproduce

Using Python 3.7.10 and PyTorch 1.7.0.

  1. In torchrl/collector/para/async_mt.py, class AsyncMultiTaskParallelCollectorUniform(AsyncSingleTaskParallelCollector):

@staticmethod
def eval_worker_process(shared_pf, ...):
    ...
    shared_que.put({
        'eval_rewards': eval_rews,
        'success_rate': success / env_info.eval_episodes,
        'task_name': task_name,
        'pf_state_dict': pf.state_dict()   # add this line
    })

def eval_one_epoch(self):
    ...
    state_dict = []  # add this line
    for _ in range(self.eval_worker_nums):
        worker_rst = self.eval_shared_que.get()
        if worker_rst["eval_rewards"] is not None:
            active_task_counts += 1
            eval_rews += worker_rst["eval_rewards"]
            mean_success_rate += worker_rst["success_rate"]
            tasks_result.append((worker_rst["task_name"], worker_rst["success_rate"], np.mean(worker_rst["eval_rewards"])))
            state_dict.append(worker_rst['pf_state_dict'])  # add this line
    ...
    dic['mean_success_rate'] = mean_success_rate / active_task_counts
    dic['state_dict'] = state_dict  # add this line
    return dic

(This passes the pf model actually used during evaluation in each worker process back to the trainer.)

  2. In torchrl/algo/rl_algo.py, function train():

def train(self):
    ...
    for reward in eval_infos["eval_rewards"]:
        self.episode_rewards.append(reward)
    # del eval_infos["eval_rewards"]

    # add the following lines
    print("current pf_dict")
    for name, network in self.snapshot_networks:
        if name == 'pf':
            print(network.state_dict()['base.fc0.weight'])

    print("current collector pf_dict")
    for state_dict in eval_infos['state_dict']:
        print(state_dict['base.fc0.weight'])

Then train mt10_fixed_shallow; the output:

2022-01-16 06:35:23,897 MainThread INFO: Finished Pretrain
current pf_dict
tensor([[ 0.0050,  0.0437, -0.0044,  ...,  0.0339, -0.0154, -0.0314],
        [ 0.0525, -0.0376,  0.0404,  ...,  0.0587, -0.0008,  0.0007],
        [-0.0723, -0.0500,  0.0435,  ..., -0.0295,  0.0371,  0.0737],
        ...,
        [ 0.0258,  0.0019,  0.0116,  ...,  0.1017,  0.0180, -0.0306],
        [-0.0036, -0.0427,  0.0276,  ...,  0.0337,  0.0001,  0.0388],
        [ 0.0093, -0.0112,  0.0333,  ..., -0.0134,  0.0433,  0.0355]])
current collector pf_dict
tensor([[ 0.0050,  0.0437, -0.0044,  ...,  0.0339, -0.0154, -0.0314],
        [ 0.0525, -0.0376,  0.0404,  ...,  0.0587, -0.0008,  0.0007],
        [-0.0723, -0.0500,  0.0435,  ..., -0.0295,  0.0371,  0.0737],
        ...,
        [ 0.0258,  0.0019,  0.0116,  ...,  0.1017,  0.0180, -0.0306],
        [-0.0036, -0.0427,  0.0276,  ...,  0.0337,  0.0001,  0.0388],
        [ 0.0093, -0.0112,  0.0333,  ..., -0.0134,  0.0433,  0.0355]])
(the identical tensor is printed once per eval worker; the remaining copies are elided)
2022-01-16 06:39:08,773 MainThread INFO: EPOCH:0
2022-01-16 06:39:08,773 MainThread INFO: Time Consumed:224.8757426738739s
2022-01-16 06:39:08,773 MainThread INFO: Total Frames:42000s

...
current pf_dict
tensor([[-2.6430e-02,  4.6329e-02,  8.7816e-05,  ...,  9.8133e-03,
         -2.0852e-02, -3.7802e-02],
        [ 2.1514e-02, -3.1466e-02,  4.7950e-02,  ...,  3.4531e-02,
         -3.7469e-03, -4.1485e-03],
        [-4.2676e-02, -6.0403e-02,  3.2930e-02,  ..., -1.7281e-02,
          3.8448e-02,  8.5184e-02],
        ...,
        [-4.0366e-03,  9.0814e-03,  1.9489e-02,  ...,  8.5376e-02,
          1.4941e-02, -3.9423e-02],
        [ 2.8164e-02, -5.6839e-02,  1.3960e-02,  ...,  3.6394e-02,
         -6.9350e-04,  5.4603e-02],
        [ 3.9544e-02, -2.2280e-02,  2.2379e-02,  ..., -4.6273e-03,
          4.4896e-02,  4.9126e-02]])
current collector pf_dict
(8 of the 10 workers still print the stale epoch-0 tensor:)
tensor([[ 0.0050,  0.0437, -0.0044,  ...,  0.0339, -0.0154, -0.0314],
        [ 0.0525, -0.0376,  0.0404,  ...,  0.0587, -0.0008,  0.0007],
        [-0.0723, -0.0500,  0.0435,  ..., -0.0295,  0.0371,  0.0737],
        ...,
        [ 0.0258,  0.0019,  0.0116,  ...,  0.1017,  0.0180, -0.0306],
        [-0.0036, -0.0427,  0.0276,  ...,  0.0337,  0.0001,  0.0388],
        [ 0.0093, -0.0112,  0.0333,  ..., -0.0134,  0.0433,  0.0355]])
(only 2 of the 10 workers print the updated tensor:)
tensor([[-2.6430e-02,  4.6329e-02,  8.7816e-05,  ...,  9.8133e-03,
         -2.0852e-02, -3.7802e-02],
        ...,
        [ 3.9544e-02, -2.2280e-02,  2.2379e-02,  ..., -4.6273e-03,
          4.4896e-02,  4.9126e-02]])
(duplicate copies elided; the stale and updated tensors were interleaved in the original output)
2022-01-16 06:43:27,157 MainThread INFO: EPOCH:1
2022-01-16 06:43:27,157 MainThread INFO: Time Consumed:257.96906661987305s
2022-01-16 06:43:27,157 MainThread INFO: Total Frames:44000s

(On epoch 0 everything looks right, but on epoch 1 some of the processes use the newly updated pf model for evaluation while others still use the older version of pf.)
Could you please comment on this? I am also a bit confused about how you keep the pf model in the collector synchronized with the model in MTSAC after calling start_worker(), since the pf model is updated every epoch but you can only pass the current pf model when the processes are created. It would be very helpful if you could provide some hints on this.

Thanks in advance!
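
(A follow-up note on the synchronization question: one common pattern, and possibly what the code intends, is to call share_memory() on the policy before spawning workers, so the trainer's in-place parameter updates are visible to every worker without re-sending the model. A minimal self-contained sketch under that assumption; the names are illustrative, not from this repo.)

import torch
import torch.nn as nn
import torch.multiprocessing as mp

def eval_worker(shared_pf, updated_evt):
    updated_evt.wait()   # wait until the "trainer" has updated the weights
    with torch.no_grad():
        # Prints the updated value: because the parameters live in shared
        # memory, the trainer's in-place changes are visible here.
        print(shared_pf[0].weight[0, :4])

if __name__ == "__main__":
    pf = nn.Sequential(nn.Linear(8, 4))
    pf.share_memory()    # move parameters to shared memory BEFORE spawning

    updated_evt = mp.Event()
    worker = mp.Process(target=eval_worker, args=(pf, updated_evt))
    worker.start()

    with torch.no_grad():          # stand-in for one MTSAC update
        pf[0].weight.fill_(42.0)
    updated_evt.set()

    worker.join()

One caveat worth checking in the repo: sharing only works while the trainer keeps mutating the same tensors in place; if the parameters are replaced rather than mutated (for example, a module moved with .to('cuda') gets fresh storage), the workers' shared CPU copy goes stale, which would match the behavior observed above.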

Hardware Resource Requirements

Hello, I would like to know this program's CPU and memory requirements.
Thank you!

Bus error

Hi, I was trying to run your code. Upon issuing

python starter/mt_para_mtsac_modular_gated_cas.py --config meta_config/mt10/modular_2_2_2_256_reweight_rand.json --id MT10_Conditioned_Modular_Shallow --seed 1 --worker_nums 10 --eval_worker_nums 10

I am getting Bus error (core dumped).
I am also attaching a screenshot of the error here: [screenshot not preserved]

Thank you!
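
(Not an official answer, but Bus error (core dumped) with many PyTorch worker processes is frequently caused by exhausted shared memory, e.g. a small /dev/shm inside a Docker container. A quick diagnostic sketch:)

import shutil

# Bus errors in forked PyTorch workers often mean /dev/shm filled up;
# check how much shared memory is actually available.
total, used, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm total={total / 2**30:.1f} GiB, free={free / 2**30:.1f} GiB")

If the free space is tiny, enlarging it (for Docker, the --shm-size flag) is worth trying before digging further.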

MT50

May I ask how you trained MT50? Training on my 3080 runs out of GPU memory.
