
q-transformer's Introduction

Q-transformer

Implementation of Q-Transformer, Scalable Offline Reinforcement Learning via Autoregressive Q-Functions, out of Google DeepMind

I will keep the logic for Q-learning on a single action around for a final comparison with the proposed autoregressive Q-learning on multiple actions, and as an educational reference for myself and the public.

Install

$ pip install q-transformer

Usage

import torch

from q_transformer import (
    QRoboticTransformer,
    QLearner,
    Agent,
    ReplayMemoryDataset
)

# the attention model

model = QRoboticTransformer(
    vit = dict(
        num_classes = 1000,
        dim_conv_stem = 64,
        dim = 64,
        dim_head = 64,
        depth = (2, 2, 5, 2),
        window_size = 7,
        mbconv_expansion_rate = 4,
        mbconv_shrinkage_rate = 0.25,
        dropout = 0.1
    ),
    num_actions = 8,
    action_bins = 256,
    depth = 1,
    heads = 8,
    dim_head = 64,
    cond_drop_prob = 0.2,
    dueling = True
)

# you need to supply your own environment, by overriding BaseEnvironment

from q_transformer.mocks import MockEnvironment

env = MockEnvironment(
    state_shape = (3, 6, 224, 224),
    text_embed_shape = (768,)
)

# env.init()     should return instructions and initial state: Tuple[str, Tensor[*state_shape]]
# env(actions)   should return rewards, next state, and done flag: Tuple[Tensor[()], Tensor[*state_shape], Tensor[()]]
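
# for a real task, you would subclass BaseEnvironment instead of using MockEnvironment,
# implementing init() and forward() with the signatures described above. the sketch
# below is only an illustration - the BaseEnvironment import path and the constructor
# arguments are assumptions that mirror MockEnvironment, so check the source

from q_transformer.agent import BaseEnvironment   # import path is an assumption

class MyRobotEnvironment(BaseEnvironment):
    def init(self):
        # return the language instruction and the initial state, shaped like state_shape
        instruction = 'pick up the red block'
        state = torch.zeros(3, 6, 224, 224)
        return instruction, state

    def forward(self, actions):
        # step your simulator or robot with the discretized action bins here
        reward = torch.tensor(0.)
        next_state = torch.zeros(3, 6, 224, 224)
        done = torch.tensor(False)
        return reward, next_state, done

# env = MyRobotEnvironment(state_shape = (3, 6, 224, 224), text_embed_shape = (768,))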

# agent is a class that allows the q-model to interact with the environment to generate a replay memory dataset for learning

agent = Agent(
    model,
    environment = env,
    num_episodes = 1000,
    max_num_steps_per_episode = 100,
)

agent()

# Q-learning on the replay memory dataset, using the model above

q_learner = QLearner(
    model,
    dataset = ReplayMemoryDataset(),
    num_train_steps = 10000,
    learning_rate = 3e-4,
    batch_size = 4,
    grad_accum_every = 16,
)

q_learner()

# after much learning
# your robot should be better at selecting optimal actions

video = torch.randn(2, 3, 6, 224, 224)

instructions = [
    'bring me that apple sitting on the table',
    'please pass the butter'
]

actions = model.get_optimal_actions(video, instructions)

Appreciation

Todo

  • first work way towards single action support

  • offer batchnorm-less variant of maxvit, as done in SOTA weather model metnet3

  • add optional deep dueling architecture

  • add n-step Q learning

  • build the conservative regularization

  • build out main proposal in paper (autoregressive discrete actions until last action, reward given only on last)

  • improvise decoder head variant, instead of concatenating previous actions at the frames + learned tokens stage. in other words, use classic encoder - decoder

    • allow for cross attention to fine frame / learned tokens
  • redo maxvit with axial rotary embeddings + sigmoid gating for attending to nothing. enable flash attention for maxvit with this change

  • build out a simple dataset creator class, taking in the environment and model and returning a folder that can be accepted by a ReplayDataset

    • finish basic environment loop
    • store memories to memmapped files in designated folder
    • ReplayDataset that takes in folder
      • 1 time step option
      • n-time steps
  • handle multiple instructions correctly

  • show a simple end-to-end example, in the same style as all other repos

  • handle no instructions, leverage null conditioner in CFG library

  • cache kv for action decoding

  • for exploration, allow for finely randomizing a subset of actions, and not all actions at once

    • also allow for gumbel based sampling of actions, with annealing of gumbel noise
  • consult some RL experts and figure out if there are any new headways into resolving delusional bias

  • figure out if one can train with randomized orders of actions - order could be sent as a conditioning that is concatted or summed before attention layers

    • offer an improvised variant where the first action token suggests the action ordering. all actions aren't made equal, and some may need to attend to past actions more than others
  • simple beam search function for optimal actions

  • improvise cross attention to past actions and states of timestep, transformer-xl fashion (w/ structured memory dropout)

  • see if the main idea in this paper is applicable to language models here

Citations

@inproceedings{qtransformer,
    title   = {Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions},
    author  = {Yevgen Chebotar and Quan Vuong and Alex Irpan and Karol Hausman and Fei Xia and Yao Lu and Aviral Kumar and Tianhe Yu and Alexander Herzog and Karl Pertsch and Keerthana Gopalakrishnan and Julian Ibarz and Ofir Nachum and Sumedh Sontakke and Grecia Salazar and Huong T Tran and Jodilyn Peralta and Clayton Tan and Deeksha Manjunath and Jaspiar Singh and Brianna Zitkovich and Tomas Jackson and Kanishka Rao and Chelsea Finn and Sergey Levine},
    booktitle = {7th Annual Conference on Robot Learning},
    year   = {2023}
}
@inproceedings{dao2022flashattention,
    title   = {Flash{A}ttention: Fast and Memory-Efficient Exact Attention with {IO}-Awareness},
    author  = {Dao, Tri and Fu, Daniel Y. and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    booktitle = {Advances in Neural Information Processing Systems},
    year    = {2022}
}


q-transformer's Issues

memmap can only handle max 2GB on certain systems

When I run the usage example code with the latest release of the q-transformer pip package, I get the following error:

python example1.py 
using memory efficient attention
/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  return self.fget.__get__(instance, owner)()
Traceback (most recent call last):
  File "/home/ram/github/q-transformer/example1.py", line 47, in <module>
    agent = Agent(
  File "<@beartype(q_transformer.agent.Agent.__init__) at 0x7f21fa1563a0>", line 145, in __init__
  File "/home/ram/github/q-transformer/q_transformer/agent.py", line 208, in __init__
    self.states      = open_memmap(str(states_path), dtype = 'float32', mode = 'w+', shape = (*prec_shape, *state_shape))
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/numpy/lib/format.py", line 945, in open_memmap
    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/numpy/core/memmap.py", line 254, in __new__
    fid.seek(bytes - 1, 0)
OSError: [Errno 22] Invalid argument

pip show q-transformer
Name: q-transformer
Version: 0.1.8
Summary: Q-Transformer
Home-page: https://github.com/lucidrains/q-transformer
Author: Phil Wang
Author-email: [email protected]
License: MIT
Location: /home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages
Requires: accelerate, beartype, classifier-free-guidance-pytorch, einops, ema-pytorch, numpy, torch, torchtyping
Required-by: 
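
With the README's default settings the states memmap is enormous, which is why the allocation can fail at fid.seek on systems or filesystems that cannot create files past 2GB. Below is a rough size estimate, assuming prec_shape in agent.py is on the order of (num_episodes, max_num_steps_per_episode); the exact shape may differ by a step.

import numpy as np

num_episodes, max_steps = 1000, 100            # values from the usage example
state_shape = (3, 6, 224, 224)

n_bytes = num_episodes * max_steps * int(np.prod(state_shape)) * np.dtype('float32').itemsize
print(f'{n_bytes / 1e9:.1f} GB')               # ~361 GB for the states memmap alone

Reducing num_episodes, max_num_steps_per_episode, or the state resolution shrinks the file accordingly.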

A simple question about the code

Hi @lucidrains, I'm a beginner trying to use Q-Transformer and I ran into a question while reading the code. In the QHeadMultipleActions class, I noticed that each bin is encoded into an embedding using self.action_bin_embeddings. However, when obtaining the Q-values, the attention output is multiplied with self.action_bin_embeddings once again. Is there a specific reason for deriving the Q-values this way, rather than applying a new MLP layer to the attention output? I've shared the relevant code below. Thank you!

def maybe_append_actions(self, sos_tokens, actions: Optional[Tensor] = None):
    if not exists(actions):
        return sos_tokens

    batch, num_actions = actions.shape
    action_embeddings = self.action_bin_embeddings[:num_actions]

    action_embeddings = repeat(action_embeddings, 'n a d -> b n a d', b = batch)
    past_action_bins = repeat(actions, 'b n -> b n 1 d', d = action_embeddings.shape[-1])

    bin_embeddings = action_embeddings.gather(-2, past_action_bins)
    bin_embeddings = rearrange(bin_embeddings, 'b n 1 d -> b n d')

    tokens, _ = pack((sos_tokens, bin_embeddings), 'b * d')
    tokens = tokens[:, :self.num_actions] # last action bin not needed for the proposed q-learning
    return tokens

def get_q_values(self, embed):
    num_actions = embed.shape[-2]
    action_bin_embeddings = self.action_bin_embeddings[:num_actions]

    if self.dueling:
        advantages = einsum('b n d, n a d -> b n a', embed, action_bin_embeddings)

        values = einsum('b n d, n d -> b n', embed, self.to_values[:num_actions])
        values = rearrange(values, 'b n -> b n 1')

        q_values = values + (advantages - reduce(advantages, '... a -> ... 1', 'mean'))
    else:
        q_values = einsum('b n d, n a d -> b n a', embed, action_bin_embeddings)

    return q_values.sigmoid()
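
For reference, the two options in the question can be written side by side. The first projection mirrors the non-dueling branch of get_q_values above; to_q_values is a hypothetical fresh head, not code from the repo:

import torch
from torch import nn

batch, num_actions, action_bins, dim = 2, 8, 256, 64

embed = torch.randn(batch, num_actions, dim)                 # attention output, one embedding per action slot

# weight-tied head, as in the repo: reuse the bin embeddings as the output projection
action_bin_embeddings = nn.Parameter(torch.randn(num_actions, action_bins, dim))
q_tied = torch.einsum('bnd,nad->bna', embed, action_bin_embeddings).sigmoid()

# alternative raised in the question: an independent linear head over the bins
to_q_values = nn.Linear(dim, action_bins)
q_mlp = to_q_values(embed).sigmoid()                         # (batch, num_actions, action_bins)

Weight tying keeps the readout in the same space as the action-bin embeddings used for the autoregressive conditioning and adds no extra parameters; a separate head would decouple the two. Which works better is an empirical question.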

Running the latest main branch with given usage example

Running the latest main branch with the given usage example results in:

episode 0
99%|█████████████████████████████████████▎| 99/100 [01:13<00:00, 1.35it/s]
episode 1
0%| | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ram/github/q-transformer/example2.py", line 54, in <module>
    agent()
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ram/github/q-transformer/q_transformer/agent.py", line 255, in forward
    self.text_embeds[episode, step] = text_embed
  File "/home/ram/anaconda3/envs/q-transformer/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1695, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'Agent' object has no attribute 'text_embeds'

question about Q-head

Hi,
Thank you for the code, really nice work.

I am new to the transformer architecture, and I am using this code as guidance to implement a simple Q-transformer for a single task (i.e. not language conditioned), using states as observations rather than images.

So I think only the QHeadMultipleActions class is needed in my case. However, do I still need the cross attention layer? There is no language or images in my case.

Thank you
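
If there is no language conditioning and no image encoder, you likely do not need cross attention at all; a single state token can simply be prepended to the action tokens and attended to causally. Below is a minimal, self-contained sketch of a state-only autoregressive Q-head. It is not the repo's QHeadMultipleActions, and every name in it is hypothetical:

import torch
from torch import nn

class StateOnlyQHead(nn.Module):
    def __init__(self, state_dim, num_actions, action_bins, dim = 256, depth = 2, heads = 4):
        super().__init__()
        self.to_state_token = nn.Linear(state_dim, dim)

        # one embedding table per action dimension, one row per discrete bin
        self.action_bin_embeddings = nn.Parameter(torch.randn(num_actions, action_bins, dim) * 0.02)

        layer = nn.TransformerEncoderLayer(dim, heads, dim_feedforward = dim * 4, batch_first = True, norm_first = True)
        self.transformer = nn.TransformerEncoder(layer, depth)
        self.to_q_values = nn.Linear(dim, action_bins)

    def forward(self, state, actions = None):
        # state: (batch, state_dim) - actions: (batch, n) bin indices already chosen, n < num_actions
        tokens = self.to_state_token(state).unsqueeze(1)                      # (b, 1, d)

        if actions is not None:
            n = actions.shape[1]
            picked = self.action_bin_embeddings[torch.arange(n, device = actions.device), actions]   # (b, n, d)
            tokens = torch.cat((tokens, picked), dim = 1)

        # causal mask so each action slot only sees the state and earlier actions
        seq = tokens.shape[1]
        causal = torch.full((seq, seq), float('-inf'), device = tokens.device).triu(1)

        attended = self.transformer(tokens, mask = causal)
        return self.to_q_values(attended).sigmoid()                           # (b, seq, action_bins)

# greedy autoregressive action selection, one action dimension at a time
head = StateOnlyQHead(state_dim = 17, num_actions = 4, action_bins = 256)
state = torch.randn(2, 17)

chosen = None
for i in range(4):
    q = head(state, chosen)                                                   # Q-values for dimension i live at index i
    next_bin = q[:, i].argmax(dim = -1, keepdim = True)
    chosen = next_bin if chosen is None else torch.cat((chosen, next_bin), dim = 1)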

The rest of the code?

Hi, is this the official implementation of the paper 'Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions'? Could you please upload the rest of the code? I really appreciate the idea in the paper and hope to reproduce it soon.
