I believe the agent in the notebook doesn't have any description. Is it possible to in

Simple agent with TextWorld doesn't have any explanation (CLOSED)

microsoft commented on May 18, 2024

Simple agent with TextWorld doesn't have any explanation

Comments (4)

MarcCote commented on May 18, 2024 1

Hi @Acejoy, here is a blog post mentioning how LSTM-DQN can be adapted to text-based games.

Code can be found here: https://github.com/microsoft/tdqn

Regarding access to the possible actions, you can request admissible_commands when building the EnvInfos object (see the documentation). You can also check Building a simple agent.ipynb, where this is used in RandomAgent.
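
A rough sketch of that suggestion, not code taken from the notebook: request admissible_commands through EnvInfos, then choose among them the way a random agent would. The textworld.gym calls follow recent TextWorld releases and may differ in older ones; make_env, pick_random_command and the game path are illustrative names only.

```python
import random


def make_env(gamefile, max_episode_steps=50):
    """Register a TextWorld game that also reports admissible commands.

    Assumes TextWorld is installed; older releases may need
    gym.make(env_id) instead of textworld.gym.make(env_id).
    """
    import textworld
    import textworld.gym

    request_infos = textworld.EnvInfos(admissible_commands=True)
    env_id = textworld.gym.register_game(
        gamefile, request_infos, max_episode_steps=max_episode_steps
    )
    return textworld.gym.make(env_id)


def pick_random_command(infos, rng=random):
    """Choose one of the commands TextWorld lists as valid in the current state."""
    return rng.choice(infos["admissible_commands"])


# Usage (hypothetical game file):
# env = make_env("tw_games/my_game.ulx")
# obs, infos = env.reset()
# obs, score, done, infos = env.step(pick_random_command(infos))
```

Unlike a fixed Gym action_space, the admissible command list changes from step to step, so it has to be read from infos after every reset and step.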

MarcCote commented on May 18, 2024

Hi @mohanr, thank you for your feedback. We are going to better describe what's going on in the agent example code. Here's a high-level description in case you need it now.

The agent receives as input the concatenation of the description of the room (output of the look command), its inventory contents (output of the inventory command) and the game's narrative (previous command feedback). A GRU encoder (encoder_gru) is used to parse the input text and extract features at the current game step. Those features are then provided as input to another GRU (state_gru) that serves as a state history and spans the whole episode.
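
To make the input step concrete, here is a small sketch of how the three text sources could be joined and turned into word ids before reaching the embedding and encoder_gru. The whitespace tokenizer and the grow-as-you-go vocabulary are simplifying assumptions, and build_agent_input and to_word_ids are hypothetical helper names.

```python
def build_agent_input(description, inventory, feedback):
    """Join the three text sources into the single string the encoder reads."""
    return "\n".join([description, inventory, feedback])


def to_word_ids(text, word2id):
    """Map lowercase whitespace-separated tokens to ids, growing the vocabulary."""
    ids = []
    for word in text.lower().split():
        if word not in word2id:
            word2id[word] = len(word2id)
        ids.append(word2id[word])
    return ids


word2id = {"<PAD>": 0, "<UNK>": 1}
text = build_agent_input(
    "You are in a kitchen.",   # output of the look command
    "You carry a key.",        # output of the inventory command
    "You open the door.",      # feedback from the previous command
)
ids = to_word_ids(text, word2id)
```

The resulting id sequence is what an embedding layer followed by a GRU encoder would consume at each game step.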

To select which commands to send to the game, the agent gets the list of all admissible commands (provided by TextWorld) and scores each of them conditioned on the current hidden state of the state_gru. To do that, another GRU encoder (cmd_encoder_gru) is used to encode each text command separately. Then, a simple linear layer (att_cmd) is used to output a score given the concatenation of an encoded command and the current hidden state of the state_gru.
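
The scoring step can be illustrated without any deep learning library. In this sketch the state and command vectors are hand-written stand-ins for the outputs of state_gru and cmd_encoder_gru, the weights stand in for att_cmd, and sampling from a softmax over the scores is an assumption about how a command is finally picked.

```python
import math
import random


def score_commands(state, cmd_encodings, weights, bias):
    """Score each command with one linear layer over [command ; state]."""
    scores = []
    for cmd in cmd_encodings:
        features = list(cmd) + list(state)  # concatenation
        scores.append(sum(w * x for w, x in zip(weights, features)) + bias)
    return scores


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(probs, rng=random):
    """Sample a command index in proportion to its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


state = [0.5, -1.0]                                   # stand-in hidden state
cmd_encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one vector per command
weights = [0.2, -0.4, 0.1, 0.3]                       # stand-in for att_cmd
probs = softmax(score_commands(state, cmd_encodings, weights, bias=0.0))
index = sample_index(probs, random.Random(0))
```

Because every command is scored separately against the same state, the number of candidate commands can vary from one step to the next without changing the model.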

Note that the word embedding (embedding) is learned from scratch, but one could use a pre-trained one such as word2vec, GloVe, ELMo or BERT.

The agent is trained using A2C (batch size of 1). The critic shares the same bottom layers as the agent and uses a simple linear layer (critic) to output a single scalar value given the current hidden state of the state_gru.
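
For readers new to A2C, the update can be sketched in plain Python. This is a generic illustration rather than the notebook's code: the discount factor, the 0.5 factor on the value loss and the function names are all assumptions.

```python
def discounted_returns(rewards, last_value=0.0, gamma=0.9):
    """Compute R_t = r_t + gamma * R_{t+1}, bootstrapping from last_value."""
    returns = []
    running = last_value
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a2c_losses(returns, values, log_probs):
    """Return (policy_loss, value_loss) summed over the given steps."""
    policy_loss = 0.0
    value_loss = 0.0
    for ret, value, log_prob in zip(returns, values, log_probs):
        advantage = ret - value  # treated as a constant for the actor
        policy_loss += -log_prob * advantage
        value_loss += 0.5 * advantage ** 2
    return policy_loss, value_loss
```

Here values are the critic's scalar outputs and log_probs are the log-probabilities of the commands that were actually chosen; the advantage tells the actor whether a command turned out better or worse than the critic expected.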

The model also uses entropy regularization to promote exploration.
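
A minimal sketch of what entropy regularization adds to the loss, assuming a categorical distribution over commands at each step. The coefficient value and the function names are illustrative, not taken from the notebook.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def total_loss(policy_loss, value_loss, probs_per_step, entropy_coef=0.1):
    """Subtract an entropy bonus so that flatter policies lower the loss."""
    bonus = sum(entropy(probs) for probs in probs_per_step)
    return policy_loss + value_loss - entropy_coef * bonus
```

A policy that puts nearly all its probability on one command has entropy close to zero and earns almost no bonus, which discourages the agent from committing to a single command too early.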

Acejoy commented on May 18, 2024

Hey, I am new to RL and got interested in its applications in NLP.
I have a few queries:

I was wondering how one would implement LSTM-DQN (https://arxiv.org/pdf/1506.08941.pdf) using TextWorld. Here we need to know all possible actions beforehand, and the final layer outputs max_actions Q-values, one for each action. I tried to find all possible actions from the game file, but wasn't successful. In OpenAI's Gym, env.action_space gives the action list. Is there any way to get it in TextWorld?
Also, in the above implementation, how are actions stored? In normal DQN, we store actions in transitions. But in the above implementation, where are actions stored in self.transitions:

self.transitions.append([None, indexes, outputs, values]) # Reward will be set on the next call
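
One plausible reading of that line, offered as a sketch rather than the notebook's exact code: the chosen action is kept as indexes, a position in the admissible command list for that step, and the None placeholder is overwritten with the reward once the next score arrives. TransitionLog and record are hypothetical names.

```python
class TransitionLog:
    """Stores [reward, index, output, value]; the reward is filled one call later."""

    def __init__(self):
        self.transitions = []
        self.last_score = 0

    def record(self, score, index, output=None, value=None):
        # The score passed in reflects the previous action, so its reward
        # belongs to the transition stored on the previous call.
        if self.transitions:
            self.transitions[-1][0] = score - self.last_score
            self.last_score = score
        self.transitions.append([None, index, output, value])


log = TransitionLog()
log.record(score=0, index=2)  # first step: nothing to reward yet
log.record(score=1, index=0)  # score rose by 1 after choosing command 2
```

Storing an index instead of the command text is enough for a policy-gradient update, provided the probabilities or log-probabilities computed for that step are stored alongside it.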

Acejoy commented on May 18, 2024

Thank you for the reply.
