Comments (20)

erikwijmans commented on September 12, 2024

You can change resnet50 to resnet18 in the config; that will improve training speed.
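
For reference, a minimal sketch of making the same change from Python rather than by editing the YAML by hand (the config path and API usage here are assumptions based on the standard habitat_baselines layout):

# Sketch: switch the DD-PPO visual backbone to resnet18 programmatically.
# Assumes habitat_baselines' get_config and the usual ddppo_pointnav.yaml path.
from habitat_baselines.config.default import get_config

config = get_config("habitat_baselines/config/pointnav/ddppo_pointnav.yaml")
config.defrost()
config.RL.DDPPO.backbone = "resnet18"  # was resnet50; a smaller encoder trains faster
config.freeze()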

AdventureO commented on September 12, 2024

@erikwijmans As you suggested, I trained a DD-PPO model with a resnet18 backbone.

When I tried to evaluate it, I got the following error:

Traceback (most recent call last):
  File "agent.py", line 165, in <module>
    main()
  File "agent.py", line 155, in main
    agent = DDPPOAgent(config)
  File "agent.py", line 88, in __init__
    for k, v in ckpt["state_dict"].items()
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 847, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointNavResNetPolicy:
	Missing key(s) in state_dict: "net.visual_encoder.backbone.layer1.0.convs.6.weight", "net.visual_encoder.backbone.layer1.0.convs.7.weight", "net.visual_encoder.backbone.layer1.0.convs.7.bias", "net.visual_encoder.backbone.layer1.0.downsample.0.weight", "net.visual_encoder.backbone.layer1.0.downsample.1.weight", "net.visual_encoder.backbone.layer1.0.downsample.1.bias", "net.visual_encoder.backbone.layer1.1.convs.6.weight", "net.visual_encoder.backbone.layer1.1.convs.7.weight", "net.visual_encoder.backbone.layer1.1.convs.7.bias", "net.visual_encoder.backbone.layer1.2.convs.0.weight", "net.visual_encoder.backbone.layer1.2.convs.1.weight", "net.visual_encoder.backbone.layer1.2.convs.1.bias", "net.visual_encoder.backbone.layer1.2.convs.3.weight", "net.visual_encoder.backbone.layer1.2.convs.4.weight", "net.visual_encoder.backbone.layer1.2.convs.4.bias", "net.visual_encoder.backbone.layer1.2.convs.6.weight", "net.visual_encoder.backbone.layer1.2.convs.7.weight", "net.visual_encoder.backbone.layer1.2.convs.7.bias", "net.visual_encoder.backbone.layer2.0.convs.6.weight", "net.visual_encoder.backbone.layer2.0.convs.7.weight", "net.visual_encoder.backbone.layer2.0.convs.7.bias", "net.visual_encoder.backbone.layer2.1.convs.6.weight", "net.visual_encoder.backbone.layer2.1.convs.7.weight", "net.visual_encoder.backbone.layer2.1.convs.7.bias", "net.visual_encoder.backbone.layer2.2.convs.0.weight", "net.visual_encoder.backbone.layer2.2.convs.1.weight", "net.visual_encoder.backbone.layer2.2.convs.1.bias", "net.visual_encoder.backbone.layer2.2.convs.3.weight", "net.visual_encoder.backbone.layer2.2.convs.4.weight", "net.visual_encoder.backbone.layer2.2.convs.4.bias", "net.visual_encoder.backbone.layer2.2.convs.6.weight", "net.visual_encoder.backbone.layer2.2.convs.7.weight", "net.visual_encoder.backbone.layer2.2.convs.7.bias", "net.visual_encoder.backbone.layer2.3.convs.0.weight", "net.visual_encoder.backbone.layer2.3.convs.1.weight", "net.visual_encoder.backbone.layer2.3.convs.1.bias", "net.visual_encoder.backbone.layer2.3.convs.3.weight", "net.visual_encoder.backbone.layer2.3.convs.4.weight", "net.visual_encoder.backbone.layer2.3.convs.4.bias", "net.visual_encoder.backbone.layer2.3.convs.6.weight", "net.visual_encoder.backbone.layer2.3.convs.7.weight", "net.visual_encoder.backbone.layer2.3.convs.7.bias", "net.visual_encoder.backbone.layer3.0.convs.6.weight", "net.visual_encoder.backbone.layer3.0.convs.7.weight", "net.visual_encoder.backbone.layer3.0.convs.7.bias", "net.visual_encoder.backbone.layer3.1.convs.6.weight", "net.visual_encoder.backbone.layer3.1.convs.7.weight", "net.visual_encoder.backbone.layer3.1.convs.7.bias", "net.visual_encoder.backbone.layer3.2.convs.0.weight", "net.visual_encoder.backbone.layer3.2.convs.1.weight", "net.visual_encoder.backbone.layer3.2.convs.1.bias", "net.visual_encoder.backbone.layer3.2.convs.3.weight", "net.visual_encoder.backbone.layer3.2.convs.4.weight", "net.visual_encoder.backbone.layer3.2.convs.4.bias", "net.visual_encoder.backbone.layer3.2.convs.6.weight", "net.visual_encoder.backbone.layer3.2.convs.7.weight", "net.visual_encoder.backbone.layer3.2.convs.7.bias", "net.visual_encoder.backbone.layer3.3.convs.0.weight", "net.visual_encoder.backbone.layer3.3.convs.1.weight", "net.visual_encoder.backbone.layer3.3.convs.1.bias", "net.visual_encoder.backbone.layer3.3.convs.3.weight", "net.visual_encoder.backbone.layer3.3.convs.4.weight", "net.visual_encoder.backbone.layer3.3.convs.4.bias", "net.visual_encoder.backbone.layer3.3.convs.6.weight", 
"net.visual_encoder.backbone.layer3.3.convs.7.weight", "net.visual_encoder.backbone.layer3.3.convs.7.bias", "net.visual_encoder.backbone.layer3.4.convs.0.weight", "net.visual_encoder.backbone.layer3.4.convs.1.weight", "net.visual_encoder.backbone.layer3.4.convs.1.bias", "net.visual_encoder.backbone.layer3.4.convs.3.weight", "net.visual_encoder.backbone.layer3.4.convs.4.weight", "net.visual_encoder.backbone.layer3.4.convs.4.bias", "net.visual_encoder.backbone.layer3.4.convs.6.weight", "net.visual_encoder.backbone.layer3.4.convs.7.weight", "net.visual_encoder.backbone.layer3.4.convs.7.bias", "net.visual_encoder.backbone.layer3.5.convs.0.weight", "net.visual_encoder.backbone.layer3.5.convs.1.weight", "net.visual_encoder.backbone.layer3.5.convs.1.bias", "net.visual_encoder.backbone.layer3.5.convs.3.weight", "net.visual_encoder.backbone.layer3.5.convs.4.weight", "net.visual_encoder.backbone.layer3.5.convs.4.bias", "net.visual_encoder.backbone.layer3.5.convs.6.weight", "net.visual_encoder.backbone.layer3.5.convs.7.weight", "net.visual_encoder.backbone.layer3.5.convs.7.bias", "net.visual_encoder.backbone.layer4.0.convs.6.weight", "net.visual_encoder.backbone.layer4.0.convs.7.weight", "net.visual_encoder.backbone.layer4.0.convs.7.bias", "net.visual_encoder.backbone.layer4.1.convs.6.weight", "net.visual_encoder.backbone.layer4.1.convs.7.weight", "net.visual_encoder.backbone.layer4.1.convs.7.bias", "net.visual_encoder.backbone.layer4.2.convs.0.weight", "net.visual_encoder.backbone.layer4.2.convs.1.weight", "net.visual_encoder.backbone.layer4.2.convs.1.bias", "net.visual_encoder.backbone.layer4.2.convs.3.weight", "net.visual_encoder.backbone.layer4.2.convs.4.weight", "net.visual_encoder.backbone.layer4.2.convs.4.bias", "net.visual_encoder.backbone.layer4.2.convs.6.weight", "net.visual_encoder.backbone.layer4.2.convs.7.weight", "net.visual_encoder.backbone.layer4.2.convs.7.bias". 
	size mismatch for net.visual_encoder.backbone.layer1.0.convs.0.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 32, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer1.1.convs.0.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([32, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.convs.0.weight: copying a param with shape torch.Size([64, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.0.weight: copying a param with shape torch.Size([64, 32, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 128, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for net.visual_encoder.backbone.layer2.0.downsample.1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for net.visual_encoder.backbone.layer2.1.convs.0.weight: copying a param with shape torch.Size([64, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([64, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.convs.0.weight: copying a param with shape torch.Size([128, 64, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([512, 256, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.1.weight: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for net.visual_encoder.backbone.layer3.0.downsample.1.bias: copying a param with shape torch.Size([128]) from checkpoint, the shape in current model is torch.Size([512]).
	size mismatch for net.visual_encoder.backbone.layer3.1.convs.0.weight: copying a param with shape torch.Size([128, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.convs.0.weight: copying a param with shape torch.Size([256, 128, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([1024, 512, 1, 1]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.1.weight: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for net.visual_encoder.backbone.layer4.0.downsample.1.bias: copying a param with shape torch.Size([256]) from checkpoint, the shape in current model is torch.Size([1024]).
	size mismatch for net.visual_encoder.backbone.layer4.1.convs.0.weight: copying a param with shape torch.Size([256, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([256, 1024, 1, 1]).
	size mismatch for net.visual_encoder.compression.0.weight: copying a param with shape torch.Size([128, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 1024, 3, 3]).

It looks like it loads the config for a model with a resnet50 backbone, but here is my config file:

TRAINER_NAME: "ddppo"
ENV_NAME: "NavRLEnv"
SIMULATOR_GPU_ID: 0
TORCH_GPU_ID: 0
VIDEO_OPTION: []
TENSORBOARD_DIR: "tb"
VIDEO_DIR: "video_dir"
TEST_EPISODE_COUNT: -1
EVAL_CKPT_PATH_DIR: "data/new_checkpoints"
NUM_PROCESSES: 8
SENSORS: ["RGB_SENSOR" , "DEPTH_SENSOR"]
CHECKPOINT_FOLDER: "data/new_checkpoints"
NUM_UPDATES: 1000000
LOG_INTERVAL: 10
CHECKPOINT_INTERVAL: 50

RL:
  SLACK_REWARD: -0.001
  SUCCESS_REWARD: 2.5
  PPO:
    # ppo params
    clip_param: 0.2
    ppo_epoch: 2
    num_mini_batch: 2
    value_loss_coef: 0.5
    entropy_coef: 0.01
    lr: 2.5e-4
    eps: 1e-5
    max_grad_norm: 0.2
    num_steps: 64
    use_gae: True
    gamma: 0.99
    tau: 0.95
    use_linear_clip_decay: False
    use_linear_lr_decay: False
    reward_window_size: 50
    use_normalized_advantage: False

    hidden_size: 512

  DDPPO:
    sync_frac: 0.6
    # The PyTorch distributed backend to use
    distrib_backend: GLOO
    # Visual encoder backbone
    pretrained_weights: data/ddppo-models/gibson-2plus-resnet50.pth
    # Initialize with pretrained weights
    pretrained: False
    # Initialize just the visual encoder backbone with pretrained weights
    pretrained_encoder: False
    # Whether or not the visual encoder backbone will be trained.
    train_encoder: True
    # Whether or not to reset the critic linear layer
    reset_critic: True

    # Model parameters
    backbone: resnet18
    rnn_type: LSTM
    num_recurrent_layers: 2

I am wondering where the problem could be.
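
One way to double-check which backbone a checkpoint was actually trained with is to look at its state_dict directly. A hedged sketch (the checkpoint path is hypothetical; the "state_dict" key matches the traceback above):

# Sketch: inspect a saved checkpoint to see which visual backbone produced it.
import torch

ckpt = torch.load("data/new_checkpoints/ckpt.0.pth", map_location="cpu")  # hypothetical path
state_dict = ckpt["state_dict"]

# resnet18-style basic blocks use two 3x3 convs per block, while resnet50-style
# bottleneck blocks add extra 1x1 convs (the "convs.6"/"convs.7" keys listed as
# missing above), so printing the first block's keys and shapes shows which
# architecture the checkpoint actually contains.
for k, v in state_dict.items():
    if k.startswith("net.visual_encoder.backbone.layer1.0"):
        print(k, tuple(v.shape))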

erikwijmans commented on September 12, 2024

DATA_PATH: data/datasets/pointnav/gibson/v1/{split}/{split}.json.gz

This needs to be v2 (from here: https://github.com/facebookresearch/habitat-api#task-datasets). We changed the agent size for the challenge this year, and that requires new episodes.

AdventureO commented on September 12, 2024

@erikwijmans thanks for the fast answer!
I downloaded the data and changed my config to version 2.

DATASET:
  TYPE: PointNav-v1
  SPLIT: train
  DATA_PATH: data/datasets/pointnav/gibson/v2/{split}/{split}.json.gz

I trained a PPO model on the new data and then wanted to evaluate it with Docker, but got the following error. It looks like something is wrong with the resolution, but I can't pin down the specific issue.

Traceback (most recent call last):
  File "ppo_agent.py", line 41, in <module>
    main()
  File "ppo_agent.py", line 30, in main
    agent = PPOAgent(agent_config)
  File "/habitat-api/habitat_baselines/agents/ppo_agents.py", line 92, in __init__
    for k, v in ckpt["state_dict"].items()
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 830, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for PointNavBaselinePolicy:
	size mismatch for net.visual_encoder.cnn.6.weight: copying a param with shape torch.Size([512, 99712]) from checkpoint, the shape in current model is torch.Size([512, 184832]).

Here is my ppo_agent.py file:

import argparse
import habitat
import random
import numpy
import os

from habitat.config import Config
from habitat.config.default import get_config
from habitat_baselines.agents.ppo_agents import PPOAgent


def get_default_config():
    c = Config()
    c.INPUT_TYPE = "rgbd"  #["blind", "rgb", "depth", "rgbd"]
    c.MODEL_PATH = "models/ckpt.3.pth"
    c.RESOLUTION = 640
    c.HIDDEN_SIZE = 512
    c.RANDOM_SEED = 7
    c.PTH_GPU_ID = 1
    c.GOAL_SENSOR_UUID = "pointgoal"
    return c


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--evaluation", type=str, required=True, choices=["local", "remote"])
    args = parser.parse_args()

    agent_config = get_default_config()
    agent = PPOAgent(agent_config)

    if args.evaluation == "local":
        challenge = habitat.Challenge(eval_remote=False)
    else:
        challenge = habitat.Challenge(eval_remote=True)

    challenge.submit(agent)


if __name__ == "__main__":
    main()

AdventureO commented on September 12, 2024

@erikwijmans @dhruvbatra maybe you have some suggestions for the above error (the mismatched shapes)?

Also, when I try to evaluate the same PPO model with the command python -u habitat_baselines/run.py --exp-config habitat_baselines/config/pointnav/ppo_pointnav.yaml --run-type eval, I get the following error:

---
 The active scene does not contain semantic annotations. 
---
I0402 11:34:42.003789 2415 simulator.py:143] Loaded navmesh data/scene_datasets/gibson/Ribera.navmesh
I0402 11:34:42.004161 2415 simulator.py:155] Recomputing navmesh for agent's height 0.88 and radius 0.18.
I0402 11:34:42.011056  2415 PathFinder.cpp:338] Building navmesh with 127x175 cells
I0402 11:34:42.062223  2415 PathFinder.cpp:606] Created navmesh with 96 vertices 46 polygons
I0402 11:34:42.062256  2415 Simulator.cpp:403] reconstruct navmesh successful
2020-04-02 11:34:42,083 Initializing task Nav-v0
  0%|                                                                | 1/994 [00:00<12:20,  1.34it/s]Traceback (most recent call last):
  File "habitat_baselines/run.py", line 70, in <module>
    main()
  File "habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 66, in run_exp
    trainer.eval()
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/common/base_trainer.py", line 108, in eval
    checkpoint_index=prev_ckpt_ind,
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 574, in _eval_checkpoint
    metric_name=self.metric_uuid,
AttributeError: 'PPOTrainer' object has no attribute 'metric_uuid'
  0%|                                                                | 1/994 [00:01<18:36,  1.12s/it]
Exception ignored in: <bound method VectorEnv.__del__ of <habitat.core.vector_env.VectorEnv object at 0x7f496cd2b160>>
Traceback (most recent call last):
  File "/home/pryhoda/HabitatProject/habitat-api/habitat/core/vector_env.py", line 468, in __del__
    self.close()
  File "/home/pryhoda/HabitatProject/habitat-api/habitat/core/vector_env.py", line 350, in close
    write_fn((CLOSE_COMMAND, None))
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header + buf)
  File "/home/pryhoda/anaconda3/envs/habitat/lib/python3.6/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

erikwijmans commented on September 12, 2024

Not sure about the size mismatch. The AttributeError: 'PPOTrainer' object has no attribute 'metric_uuid' is fixed in facebookresearch/habitat-lab#357

AdventureO commented on September 12, 2024

@erikwijmans I confirm that with the latest commit on habitat-api:fix-video-gen the AttributeError: 'PPOTrainer' object has no attribute 'metric_uuid' issue has been solved.

AdventureO commented on September 12, 2024

@erikwijmans coming back to the previous size-mismatch issue: I suspect the problem is in the get_default_config() function for the PPO agent, specifically in the c.RESOLUTION variable.

In the 2019 challenge config the resolution was 256x256, and in the 2020 challenge it is 640x480. How do I properly set the new size in the configuration if there is only one variable, c.RESOLUTION?

erikwijmans commented on September 12, 2024

CC @mathfac @Skylion007 @abhiskk for the resolution issue.

AdventureO commented on September 12, 2024

@abhiskk @mathfac @Skylion007 any suggestions?

mathfac commented on September 12, 2024

@AdventureO, we have a PR in progress that will introduce crop and resize functionality to Habitat baselines: https://github.com/facebookresearch/habitat-api/pull/365/files. For now, you can change the challenge config to a square resolution to unblock your training.
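
Until that PR is available, here is a rough sketch of the crop-and-resize idea (an illustration only, not the PR's API; the 256x256 target size is an assumption and should match the RESOLUTION the model was trained at):

# Sketch: center-crop an observation to a square and resize it, so a model trained
# at a square RESOLUTION can consume the 640x480 challenge frames. Uses OpenCV.
import cv2
import numpy as np

def center_crop_and_resize(obs: np.ndarray, size: int = 256) -> np.ndarray:
    h, w = obs.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = obs[top : top + side, left : left + side]
    resized = cv2.resize(cropped, (size, size), interpolation=cv2.INTER_LINEAR)
    if obs.ndim == 3 and resized.ndim == 2:  # cv2 drops a single channel (e.g. depth)
        resized = resized[:, :, None]
    return resized

This would be applied to the RGB and depth observations inside the agent's act() before they are fed to the policy.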

AdventureO commented on September 12, 2024

@mathfac now I get the following error:

  File "ppo_agent.py", line 41, in <module>
    main()
  File "ppo_agent.py", line 37, in main
    challenge.submit(agent)
  File "/habitat-api/habitat/core/challenge.py", line 19, in submit
    metrics = super().evaluate(agent)
  File "/habitat-api/habitat/core/benchmark.py", line 163, in evaluate
    return self.local_evaluate(agent, num_episodes)
  File "/habitat-api/habitat/core/benchmark.py", line 137, in local_evaluate
    action = agent.act(observations)
  File "/habitat-api/habitat_baselines/agents/ppo_agents.py", line 134, in act
    deterministic=False,
  File "/habitat-api/habitat_baselines/rl/ppo/policy.py", line 40, in act
    observations, rnn_hidden_states, prev_actions, masks
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/habitat-api/habitat_baselines/rl/ppo/policy.py", line 167, in forward
    perception_embed = self.visual_encoder(observations)
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/habitat-api/habitat_baselines/rl/models/simple_cnn.py", line 147, in forward
    return self.cnn(cnn_input)
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/opt/conda/envs/habitat/lib/python3.6/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/habitat-api/habitat_baselines/common/utils.py", line 22, in forward
    return x.view(x.size(0), -1)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
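
The traceback ends in the Flatten helper in habitat_baselines/common/utils.py, which calls x.view(...) on a tensor that is apparently no longer contiguous. A minimal sketch of the usual workaround (my assumption, not a fix confirmed in this thread) is to use reshape, exactly as the error message suggests:

# Sketch: reshape() handles non-contiguous tensors, view() does not.
import torch
import torch.nn as nn

class Flatten(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # was: return x.view(x.size(0), -1), which fails on non-contiguous input
        return x.reshape(x.size(0), -1)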

Does it still make sense to try to train and evaluate the PPO model, or is it better to switch to the DD-PPO baseline?

mathfac commented on September 12, 2024

@AdventureO, yes, the DD-PPO baseline is preferred and has been merged into master (facebookresearch/habitat-lab#370). Please let me know if you face any issues running it.

AdventureO commented on September 12, 2024

@mathfac to reduce the training time of DD-PPO, I commented out all the lines that include the key 'NOISE_MODEL' in the config file, as suggested, like this:

ENVIRONMENT:
  MAX_EPISODE_STEPS: 500
  ITERATOR_OPTIONS:
    SHUFFLE: False
SIMULATOR:
  TURN_ANGLE: 30
  AGENT_0:
    SENSORS: ['RGB_SENSOR', 'DEPTH_SENSOR']
    HEIGHT: 0.88
    RADIUS: 0.18
  HABITAT_SIM_V0:
    GPU_DEVICE_ID: 1
    ALLOW_SLIDING: False
  RGB_SENSOR:
    WIDTH: 640
    HEIGHT: 360
    HFOV: 70
    POSITION: [0, 0.88, 0]
#    NOISE_MODEL: "GaussianNoiseModel"
#    NOISE_MODEL_KWARGS:
#      intensity_constant: 0.1

  DEPTH_SENSOR:
    WIDTH: 640
    HEIGHT: 360
    HFOV: 70
    MIN_DEPTH: 0.1
    MAX_DEPTH: 10.0
    POSITION: [0, 0.88, 0]
#    NOISE_MODEL: "RedwoodDepthNoiseModel"

#  ACTION_SPACE_CONFIG: 'pyrobotnoisy'
#  NOISE_MODEL:
#    ROBOT: "LoCoBot"
#    CONTROLLER: 'Proportional'
#    NOISE_MULTIPLIER: 0.5

TASK:
  TYPE: Nav-v0
  SUCCESS_DISTANCE: 0.36 # 2 x Agent Radius
  SENSORS: ['POINTGOAL_SENSOR']
  POINTGOAL_SENSOR:
    GOAL_FORMAT: POLAR
    DIMENSIONALITY: 2
  GOAL_SENSOR_UUID: pointgoal
  MEASUREMENTS: ['DISTANCE_TO_GOAL', "SUCCESS", 'SPL']
  SUCCESS:
    SUCCESS_DISTANCE: 0.36 # 2 x Agent Radius

But I've got the following error:

  File "habitat_baselines/run.py", line 74, in <module>
    main()
  File "habitat_baselines/run.py", line 41, in main
    run_exp(**vars(args))
  File "habitat_baselines/run.py", line 68, in run_exp
    trainer.train()
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ddppo/algo/ddppo_trainer.py", line 355, in train
    ) = self._update_agent(ppo_cfg, rollouts)
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo_trainer.py", line 256, in _update_agent
    value_loss, action_loss, dist_entropy = self.agent.update(rollouts)
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/rl/ppo/ppo.py", line 74, in update
    for sample in data_generator:
  File "/home/pryhoda/HabitatProject/habitat-api/habitat_baselines/common/rollout_storage.py", line 157, in recurrent_generator
    ind = perm[start_ind + offset]
IndexError: index 7 is out of bounds for dimension 0 with size 7

Could you please help me fix it? Probably there should be some other value for the ACTION_SPACE_CONFIG key.

erikwijmans commented on September 12, 2024

You will certainly want to keep the noisy actions, as they don't slow down simulation speed by much (if at all), i.e. these:


#  ACTION_SPACE_CONFIG: 'pyrobotnoisy'
#  NOISE_MODEL:
#    ROBOT: "LoCoBot"
#    CONTROLLER: 'Proportional'
#    NOISE_MULTIPLIER: 0.5

The second error is likely due to the number of processes vs. the number of PPO mini batches. The number of processes should really be an integer multiple of the number of PPO mini batches.
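
In other words, the recurrent mini-batch generator splits the rollouts from NUM_PROCESSES environments into num_mini_batch groups, so a quick sanity check (a sketch using the values from the config posted above) is:

# Sketch: NUM_PROCESSES should be an integer multiple of RL.PPO.num_mini_batch,
# otherwise the recurrent generator indexes past the number of environments.
num_processes = 8   # NUM_PROCESSES from the trainer config above
num_mini_batch = 2  # RL.PPO.num_mini_batch from the trainer config above
assert num_processes % num_mini_batch == 0, (
    f"NUM_PROCESSES ({num_processes}) must be divisible by num_mini_batch ({num_mini_batch})"
)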

AdventureO commented on September 12, 2024

@erikwijmans thanks!
Are there any ways to improve the training speed of DD-PPO?

Training with 4 processes on a GeForce RTX 2080 Ti is very slow.

Skylion007 commented on September 12, 2024

erikwijmans commented on September 12, 2024

Looks like that code wasn't set up to read those parameters and needs to be modified to use resnet18 instead of resnet50.
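
For illustration, a hedged sketch of what that modification to the challenge agent.py could look like: build the policy with the backbone taken from the config instead of a hard-coded resnet50. The exact constructor arguments are my assumption based on habitat_baselines' resnet_policy at the time, and depending on the version additional arguments (e.g. a goal sensor uuid) may be required:

# Sketch (arguments are assumptions): construct PointNavResNetPolicy with the
# backbone read from the DD-PPO config instead of a hard-coded "resnet50".
from habitat_baselines.rl.ddppo.policy.resnet_policy import PointNavResNetPolicy

def build_actor_critic(config, observation_space, action_space):
    return PointNavResNetPolicy(
        observation_space=observation_space,
        action_space=action_space,
        hidden_size=config.RL.PPO.hidden_size,
        rnn_type=config.RL.DDPPO.rnn_type,
        num_recurrent_layers=config.RL.DDPPO.num_recurrent_layers,
        backbone=config.RL.DDPPO.backbone,  # "resnet18" in the config above
    )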

AdventureO commented on September 12, 2024

@erikwijmans besides the config file, where else should I modify the code?

mathfac commented on September 12, 2024

@AdventureO #47 should fix it for local Docker evaluation.
