dhruvramani / transformers-rl

An easy PyTorch implementation of "Stabilizing Transformers for Reinforcement Learning"

License: MIT License
A question about the function update_memory in layers.py, line 269
Is there any difference between concatenating with torch.cat as in the first snippet below and appending the hidden_states directly to new_memory, as in the second? Maybe it's a detail I didn't notice. Thanks.
with torch.no_grad():
    new_memory = []
    end_idx = mem_len + seq_len
    beg_idx = max(0, end_idx - mem_len)
    for m, h in zip(previous_memory, hidden_states):
        cat = torch.cat([m, h], dim=0)
        new_memory.append(cat[beg_idx:end_idx].detach())
with torch.no_grad():
    new_memory = []
    for h in hidden_states:
        new_memory.append(h.detach())
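There is a difference: the torch.cat version keeps a sliding window over the concatenated old memory and new hidden states, so context from earlier segments is retained, while the direct append discards the old memory and keeps only the newest seq_len steps. A minimal standalone sketch of the two behaviours, with made-up shapes:

    import torch

    mem_len, seq_len, dim = 4, 2, 8
    m = torch.randn(mem_len, 1, dim)   # previous memory: 4 timesteps
    h = torch.randn(seq_len, 1, dim)   # new hidden states: 2 timesteps

    # cat + slice: keep the last mem_len timesteps of [old memory; new states],
    # i.e. a sliding window that carries context across segments
    cat = torch.cat([m, h], dim=0)      # 6 timesteps
    windowed = cat[-mem_len:].detach()  # last 2 steps of m, plus all of h

    # direct append: the old memory is discarded entirely
    appended = h.detach()               # only the 2 new timesteps

    print(windowed.shape)  # torch.Size([4, 1, 8])
    print(appended.shape)  # torch.Size([2, 1, 8])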
If my observation is an image of shape (4, 84, 84) and the action dimension is 3, how should I modify the code below?
if __name__ == '__main__':
    states = torch.randn(1, 1, 4)  # seq_size, batch_size, dim - better if dim % 2 == 0
    print("=> Testing Policy")
    policy = TransformerGaussianPolicy(state_dim=states.shape[-1], act_dim=4)
    for i in range(10):
        act = policy(states)
        action = act[0].sample()
        print(torch.isnan(action).any(), action.shape)
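One common way to handle image observations (a sketch, not part of this repo; ConvEncoder is a hypothetical helper) is to encode each frame stack with a small CNN and pass the resulting flat feature vector to the policy as the state, with act_dim=3:

    import torch
    import torch.nn as nn

    class ConvEncoder(nn.Module):
        """Hypothetical CNN encoder mapping a (4, 84, 84) frame stack to a flat feature vector."""
        def __init__(self, out_dim=128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(64 * 7 * 7, out_dim),
            )

        def forward(self, obs):
            # obs: (seq_size, batch_size, 4, 84, 84) -> (seq_size, batch_size, out_dim)
            s, b = obs.shape[:2]
            feats = self.net(obs.view(s * b, 4, 84, 84))
            return feats.view(s, b, -1)

    encoder = ConvEncoder(out_dim=128)
    obs = torch.randn(1, 1, 4, 84, 84)  # seq_size, batch_size, C, H, W
    states = encoder(obs)               # (1, 1, 128)
    # policy = TransformerGaussianPolicy(state_dim=128, act_dim=3)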
First of all, thanks for the great implementation.
I found that this initialization is somewhat critical:
Line 256 in fc3d8af
It can happen that self.u and self.v are initialized containing NaNs; eventually, this makes everything become NaN.
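A plausible cause (an assumption about what line 256 does, since the snippet is not reproduced here) is allocating the parameters with torch.Tensor(...), which returns uninitialized storage and can therefore contain NaN or Inf. A defensive sketch:

    import torch
    import torch.nn as nn

    n_heads, d_head = 8, 64  # hypothetical sizes

    # Risky: torch.Tensor allocates uninitialized memory, which may hold NaN/Inf.
    u_bad = nn.Parameter(torch.Tensor(n_heads, d_head))

    # Safer: create well-defined values, then initialize explicitly.
    u = nn.Parameter(torch.zeros(n_heads, d_head))
    v = nn.Parameter(torch.zeros(n_heads, d_head))
    nn.init.xavier_uniform_(u)
    nn.init.xavier_uniform_(v)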
I found two bugs in the Transformer-XL code in layers.py.
Lines 261 to 268 in 337d84a
def init_memory(self, device=torch.device("cpu")):
    return [
        torch.empty(0, dtype=torch.float).to(device)
        for _ in range(self.n_layers + 1)
    ]
Lines 280 to 288 in 337d84a
new_memory = []
end_idx = mem_len + seq_len
# self.mem_len is the memory retention length; it is different from mem_len.
beg_idx = max(0, end_idx - self.mem_len)
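To see why the original beg_idx is a bug: with init_memory returning empty tensors, mem_len (the current memory length) starts at 0, so beg_idx == end_idx and the sliced window is always empty; the memory can never grow. A quick standalone check, reusing the snippet's variable names:

    import torch

    seq_len, dim = 3, 8
    memory = torch.empty(0, dim)  # as returned by init_memory
    h = torch.randn(seq_len, dim)

    # Buggy version: mem_len is the *current* memory length (0 here).
    mem_len = memory.size(0)
    end_idx = mem_len + seq_len             # 3
    beg_idx = max(0, end_idx - mem_len)     # 3 -> slice is empty
    print(torch.cat([memory, h])[beg_idx:end_idx].shape)  # torch.Size([0, 8])

    # Fixed version: self.mem_len is the retention length (say, 4).
    retention_len = 4
    beg_idx = max(0, end_idx - retention_len)  # 0 -> keep all 3 new steps
    print(torch.cat([memory, h])[beg_idx:end_idx].shape)  # torch.Size([3, 8])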
Even after fixing the bugs above, the memory mechanism still produced incorrect values: I compared the transformer's output with and without the memory mechanism, and they were totally different.
I tried another stable-transformer implementation from this repo. If anyone wants to fix this further, they can refer to that code.