joshvarty / dotastuff

Jupyter Notebook 100.00%

dotastuff's Issues

Improve performance of column shuffle

Right now we use a crazyshuffle() method that uses numpy and Python for-loops. Can we make this faster using jnp? It's considerably slowing down our training pipeline.
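One loop-free way to do this is to sort random keys, sketched below in NumPy. It assumes crazyshuffle() permutes the entries of each row independently, which is a guess about its behaviour; shuffle_within_rows is a hypothetical name. The same argsort pattern should carry over to jnp with keys drawn from jax.random, but that port is not shown or tested here.

```python
import numpy as np

def shuffle_within_rows(arr, rng):
    """Independently permute the entries of each row of a 2-D array."""
    # Sorting i.i.d. random keys gives one uniform random permutation per row,
    # so no Python-level loop over rows is needed.
    keys = rng.random(arr.shape)
    perm = np.argsort(keys, axis=1)
    return np.take_along_axis(arr, perm, axis=1)

rng = np.random.default_rng(0)
heroes = np.arange(20).reshape(4, 5)   # toy data, 4 rows of 5 ids
shuffled = shuffle_within_rows(heroes, rng)
```

Each row of shuffled holds the same values as the matching row of heroes, only reordered.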

Filter out invalid games

Once we are ready to build a dataset, we should make sure to filter out invalid game modes. See: https://wiki.teamfortress.com/wiki/WebAPI/GetMatchDetails

We don't want to run our model against Turbo games.
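A minimal sketch of such a filter follows. The Turbo game_mode value (23) is an assumption that should be checked against the game-mode table before use, and is_valid_match and the toy match records are hypothetical.

```python
TURBO_GAME_MODE = 23  # assumption: verify against the game-mode table

def is_valid_match(match, excluded_modes=frozenset({TURBO_GAME_MODE})):
    """Return True if the match has a game_mode that is not excluded."""
    mode = match.get("game_mode")
    # Matches with no game_mode at all are treated as invalid.
    return mode is not None and mode not in excluded_modes

# Toy records for illustration only.
matches = [
    {"match_id": 1, "game_mode": 22},
    {"match_id": 2, "game_mode": 23},
    {"match_id": 3},
]
valid_matches = [m for m in matches if is_valid_match(m)]
```

Other unwanted modes can be added to excluded_modes without changing the function.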

Investigate using min_players when downloading match history

We should see if we can use this to only pull games with 10 human players.

See: https://wiki.teamfortress.com/wiki/WebAPI/GetMatchHistory
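Whether or not min_players behaves as hoped, the human_players field in the match details (it appears in the sample payload for match 6123672329) can serve as a local check. A small sketch, with has_ten_humans as a hypothetical helper name:

```python
def has_ten_humans(match):
    """Return True only if the match reports exactly 10 human players."""
    # A missing field yields None, which fails the comparison.
    return match.get("human_players") == 10

# Toy records for illustration only.
full_lobby = {"match_id": 1, "human_players": 10}
short_lobby = {"match_id": 2, "human_players": 9}
kept = [m["match_id"] for m in (full_lobby, short_lobby) if has_ten_humans(m)]
```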

When building dataset, do not use str()

Python's str() function does not produce valid JSON: it formats the data as a Python literal (e.g. single quotes instead of double quotes). We should probably use json.dumps() to get a string, or json.dump() to write straight to a file.
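The difference is easy to see on a small dict. The record below is a toy example, not real match data:

```python
import json

match = {"radiant_win": True, "players": [{"hero_id": 121}]}

as_repr = str(match)         # Python literal: single quotes, True
as_json = json.dumps(match)  # JSON: double quotes, true

# The str() output cannot be parsed back as JSON.
try:
    json.loads(as_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

# The json.dumps() output round-trips cleanly.
round_tripped = json.loads(as_json)
```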

Look at match seq num 5125140861, which has the Dire team name "I'm bored".

Create one embedding table for heroes and one for accounts

Currently I'm accidentally creating 10 embedding tables for heroes and 10 embedding tables for accounts.

This is incorrect and we probably waste a lot of time (10x!) training each separate table instead of learning jointly. We should figure out how to fix this.
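The idea of a shared table can be sketched in plain NumPy: one table, indexed by all ten hero columns at once. NUM_HEROES and EMBED_DIM are hypothetical sizes, and the random table stands in for a trainable parameter.

```python
import numpy as np

NUM_HEROES = 130  # hypothetical vocabulary size
EMBED_DIM = 8     # hypothetical embedding width

rng = np.random.default_rng(0)
# A single table shared by every hero slot.
hero_table = rng.normal(size=(NUM_HEROES, EMBED_DIM))

# One row per match, ten hero ids per row (illustrative ids).
hero_ids = np.array([[121, 14, 114, 8, 35, 50, 101, 11, 59, 120]])

# Integer-array indexing looks up all ten slots in the same table.
hero_vectors = hero_table[hero_ids]  # shape (batch, 10, EMBED_DIM)
```

Because every slot reads from the same rows, a hero's vector gets updated no matter which slot it appears in. The account table would follow the same pattern.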

Investigate missing account numbers

The default anonymous account number is 4294967295 (0xFFFFFFFF). For some reason, some players have no account number at all. Currently we're just assigning them the default and moving on but we should try to better understand why we're seeing this.

Example:
https://www.dotabuff.com/matches/6123672329

Error: 'account_id'
{"players": [{"account_id": 4294967295, "player_slot": 0, "hero_id": 121, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 1, "hero_id": 14, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 2, "hero_id": 114, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 3, "hero_id": 8, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 128, "hero_id": 50, "leaver_status": 0}, {"account_id": 301406649, "player_slot": 129, "hero_id": 101, "leaver_status": 0}, {"account_id": 200676327, "player_slot": 130, "hero_id": 11, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 131, "hero_id": 59, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 132, "hero_id": 120, "leaver_status": 0}, {"player_slot": 4, "hero_id": 35, "hero_damage": 0, "tower_damage": 0, "hero_healing": 0, "gold": 0, "gold_spent": 0, "scaled_hero_damage": 0, "scaled_tower_damage": 0, "scaled_hero_healing": 0}], "radiant_win": true, "duration": 1946, "pre_game_duration": 90, "start_time": 1628309333, "match_id": 6123672329, "match_seq_num": 5125154990, "cluster": 117, "first_blood_time": 2, "lobby_type": 0, "human_players": 10, "leagueid": 0, "game_mode": 22, "flags": 1, "engine": 1}
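In the payload above, the player in slot 4 has no account_id key at all. A small sketch of the current fallback that also records which slots were affected, so they can be investigated later (account_ids is a hypothetical helper, and the match below is trimmed from the payload):

```python
ANONYMOUS_ACCOUNT_ID = 4294967295  # 0xFFFFFFFF

def account_ids(match):
    """Return (ids, missing_slots) for the players in a match dict."""
    ids, missing_slots = [], []
    for player in match["players"]:
        if "account_id" not in player:
            # Remember where the key was absent instead of silently defaulting.
            missing_slots.append(player["player_slot"])
        ids.append(player.get("account_id", ANONYMOUS_ACCOUNT_ID))
    return ids, missing_slots

match = {"players": [
    {"account_id": 301406649, "player_slot": 129, "hero_id": 101},
    {"player_slot": 4, "hero_id": 35},
]}
ids, missing_slots = account_ids(match)
```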

Convert radiant_win (target) to float during pre-processing.

It's annoying to have to do this in each notebook.

# TODO(joshvarty): Move this to the data pre-processing stage.
df['radiant_win'] = df['radiant_win'].astype('float32')
df

Consider shuffling hero columns during training.

We could shuffle the heroes within each team (or even swap the two teams and invert radiant_win). This could be a useful form of regularization. It's unclear how much this will be needed if we just get more data.
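The team swap can be sketched as follows. It assumes columns hero0 to hero4 hold the Radiant heroes and hero5 to hero9 the Dire heroes, which the column names do not confirm; swap_teams is a hypothetical name.

```python
import numpy as np

def swap_teams(heroes, radiant_win):
    """Swap the two five-hero halves of each row and invert the label."""
    # Assumption: columns 0-4 are Radiant, columns 5-9 are Dire.
    swapped = np.concatenate([heroes[:, 5:], heroes[:, :5]], axis=1)
    return swapped, 1.0 - radiant_win

heroes = np.array([[121, 14, 114, 8, 35, 50, 101, 11, 59, 120]])  # illustrative ids
radiant_win = np.array([1.0], dtype=np.float32)
aug_heroes, aug_label = swap_teams(heroes, radiant_win)
```

Swapping sides treats Radiant and Dire as interchangeable, which is itself an assumption worth checking against the data.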

Investigate `float64` in account embedding columns

For some reason, one of the account embedding columns (account_emb_7) is float64 while the others are int64.

df.dtypes

hero0              int64
hero1              int64
hero2              int64
hero3              int64
hero4              int64
hero5              int64
hero6              int64
hero7              int64
hero8              int64
hero9              int64
account0           int64
account1           int64
account2           int64
account3           int64
account4           int64
account5           int64
account6           int64
account7           int64
account8           int64
account9           int64
start_time         int64
radiant_win      float64
account_emb_0      int64
account_emb_1      int64
account_emb_2      int64
account_emb_3      int64
account_emb_4      int64
account_emb_5      int64
account_emb_6      int64
account_emb_7    float64
account_emb_8      int64
account_emb_9      int64
dtype: object
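One common cause worth ruling out is a missing value: pandas stores NaN in an otherwise integer column by upcasting it to float64. The sketch below uses a toy frame, not the real dataset, and filling with 0 is a hypothetical choice of index for unknown accounts.

```python
import numpy as np
import pandas as pd

# Toy frame: a NaN forces account_emb_7 to float64.
df = pd.DataFrame({
    "account_emb_6": [3, 1, 2],
    "account_emb_7": [5.0, np.nan, 7.0],
})

# Find the float64 columns and count missing values in each.
float_cols = df.select_dtypes(include="float64").columns
missing_per_col = df[float_cols].isna().sum()

# If NaNs are the cause, fill them and cast back to int64.
fixed = df.copy()
fixed["account_emb_7"] = fixed["account_emb_7"].fillna(0).astype("int64")
```

If missing_per_col is zero for the real column, the cause lies elsewhere, for example in how the column was built or merged.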
