dotastuff's Issues
Improve performance of column shuffle
Right now we use a `crazyshuffle()` method that uses numpy and Python for-loops. Can we make this faster using `jnp`? It's considerably slowing down our training pipeline.
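One possible direction, sketched under the assumption that `crazyshuffle()` independently permutes the values within each row (the issue does not show what it actually does): draw one random key per element and argsort the keys, which removes the Python loop. The function name `shuffle_within_rows` is hypothetical. The sketch uses numpy, but the same three steps have `jax.random.uniform`, `jnp.argsort` and `jnp.take_along_axis` counterparts.

```python
import numpy as np

def shuffle_within_rows(x, rng):
    """Independently permute the entries of every row of a 2-D array."""
    # One random key per element; sorting the keys along axis 1 yields
    # an independent random permutation for each row.
    keys = rng.random(x.shape)
    order = np.argsort(keys, axis=1)
    # Reorder each row of x by its own permutation, with no Python loop.
    return np.take_along_axis(x, order, axis=1)

rng = np.random.default_rng(0)
heroes = np.arange(20).reshape(4, 5)  # toy stand-in for the hero columns
shuffled = shuffle_within_rows(heroes, rng)
```

Each row of `shuffled` holds the same values as the matching row of `heroes`, only in a different order.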
Filter out invalid games
Once we are ready to build a dataset, we should make sure to filter out invalid game modes. See: https://wiki.teamfortress.com/wiki/WebAPI/GetMatchDetails
We don't want to run our model against Turbo games.
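A minimal sketch of such a filter, assuming each match is a dict shaped like the `GetMatchDetails` response and that `game_mode` 23 denotes Turbo (this value should be confirmed against the linked page). `EXCLUDED_GAME_MODES` and `is_valid_match` are hypothetical names.

```python
# Assumption: game_mode 23 is Turbo; confirm against the GetMatchDetails docs.
EXCLUDED_GAME_MODES = {23}

def is_valid_match(match):
    """Return True unless the match uses an excluded game mode."""
    return match.get("game_mode") not in EXCLUDED_GAME_MODES

matches = [
    {"match_id": 1, "game_mode": 22},
    {"match_id": 2, "game_mode": 23},
]
valid = [m for m in matches if is_valid_match(m)]
```

A deny-list keeps matches whose `game_mode` is missing; an allow-list of wanted modes would be the stricter choice if we prefer to drop those.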
Investigate using min_players when downloading match history
We should see if we can use this to only pull games with 10 human players.
See: https://wiki.teamfortress.com/wiki/WebAPI/GetMatchHistory
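As a starting point for the experiment, the request URL could be built like this. The endpoint URL is an assumption based on the usual Steam Web API layout and `build_history_url` is a hypothetical helper; only `min_players` comes from the page linked above. The sketch builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumption: the usual Steam Web API endpoint for Dota 2 match history.
BASE_URL = "https://api.steampowered.com/IDOTA2Match_570/GetMatchHistory/V001/"

def build_history_url(api_key, min_players=10):
    """Build a GetMatchHistory URL that asks for matches with enough human players."""
    params = {"key": api_key, "min_players": min_players}
    return BASE_URL + "?" + urlencode(params)

url = build_history_url("YOUR_API_KEY")
```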
When building dataset, do not use str()
Python's `str()` function ruins the JSON and formats the string in a Python-friendly way (e.g. single quotes). I think we're supposed to use `json.dump()` or `json.dumps()`?
Look at match seq num `5125140861` with the Dire team name `I'm bored`.
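A small illustration of the difference, using a made-up match dict (the `dire_name` key is only illustrative). With an apostrophe in the team name, `str()` mixes single and double quotes and writes `True` instead of `true`, so the result no longer parses as JSON.

```python
import json

match = {"dire_name": "I'm bored", "radiant_win": True}

as_str = str(match)          # Python repr, not JSON
as_json = json.dumps(match)  # valid JSON

print(as_str)   # {'dire_name': "I'm bored", 'radiant_win': True}
print(as_json)  # {"dire_name": "I'm bored", "radiant_win": true}

# Only the json.dumps() output round-trips.
round_tripped = json.loads(as_json)
try:
    json.loads(as_str)
    str_is_valid_json = True
except json.JSONDecodeError:
    str_is_valid_json = False
```

`json.dump()` does the same serialization but writes straight to an open file object, which fits when the dataset is written line by line.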
Create one embedding table for heroes and one for accounts
Currently I'm accidentally creating 10 embedding tables for heroes and 10 embedding tables for accounts.
This is incorrect and we probably waste a lot of time (10x!) training each separate table instead of learning jointly. We should figure out how to fix this.
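A framework-agnostic sketch of the intended layout, written in numpy: a single table is indexed by all ten hero columns at once, so every slot reads (and, in a real model, updates) the same parameters. The sizes `NUM_HEROES` and `EMBED_DIM` are hypothetical, and the same idea would apply to the account table.

```python
import numpy as np

NUM_HEROES, EMBED_DIM = 130, 8  # hypothetical sizes
rng = np.random.default_rng(0)

# One table shared by all ten hero slots, instead of one table per slot.
hero_table = rng.normal(size=(NUM_HEROES, EMBED_DIM))

# A batch of 2 matches, each with 10 hero IDs (hero0..hero9).
hero_ids = np.array([
    [121, 14, 114, 8, 35, 50, 101, 11, 59, 120],
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
])

# Integer-array indexing looks up every slot in the same table.
hero_vectors = hero_table[hero_ids]  # shape: (batch, 10, EMBED_DIM)
```

Hero 8 appears in slot 3 of the first match and slot 7 of the second, and both positions get the same vector.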
Investigate missing account numbers
The default anonymous account number is `4294967295` (`0xFFFFFFFF`). For some reason, some players have no account number at all. Currently we're just assigning them the default and moving on, but we should try to better understand why we're seeing this.
Example:
https://www.dotabuff.com/matches/6123672329
Error: 'account_id'
{"players": [{"account_id": 4294967295, "player_slot": 0, "hero_id": 121, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 1, "hero_id": 14, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 2, "hero_id": 114, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 3, "hero_id": 8, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 128, "hero_id": 50, "leaver_status": 0}, {"account_id": 301406649, "player_slot": 129, "hero_id": 101, "leaver_status": 0}, {"account_id": 200676327, "player_slot": 130, "hero_id": 11, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 131, "hero_id": 59, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 132, "hero_id": 120, "leaver_status": 0}, {"player_slot": 4, "hero_id": 35, "hero_damage": 0, "tower_damage": 0, "hero_healing": 0, "gold": 0, "gold_spent": 0, "scaled_hero_damage": 0, "scaled_tower_damage": 0, "scaled_hero_healing": 0}], "radiant_win": true, "duration": 1946, "pre_game_duration": 90, "start_time": 1628309333, "match_id": 6123672329, "match_seq_num": 5125154990, "cluster": 117, "first_blood_time": 2, "lobby_type": 0, "human_players": 10, "leagueid": 0, "game_mode": 22, "flags": 1, "engine": 1}
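To help with the investigation, a sketch like the following could record which player slots lack `account_id` while still applying the default. `account_ids` is a hypothetical helper, and the two-player match below is a trimmed-down version of the example above.

```python
ANONYMOUS_ACCOUNT_ID = 0xFFFFFFFF  # 4294967295

def account_ids(match):
    """Return (ids, missing_slots): account IDs with the anonymous default
    filled in, plus the player_slot of every player lacking the key."""
    ids, missing_slots = [], []
    for player in match["players"]:
        if "account_id" not in player:
            missing_slots.append(player["player_slot"])
        ids.append(player.get("account_id", ANONYMOUS_ACCOUNT_ID))
    return ids, missing_slots

match = {"match_id": 6123672329, "players": [
    {"account_id": 301406649, "player_slot": 129, "hero_id": 101},
    {"player_slot": 4, "hero_id": 35},
]}
ids, missing = account_ids(match)
```

Logging `match_id` together with `missing` over the whole dataset would show how often this happens and whether it correlates with other fields, such as the zeroed damage and gold stats in the example.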
Convert radiant_win (target) to float during pre-processing.
It's annoying to have to do this in each notebook.
# TODO(joshvarty): Move this to the data pre-processing stage.
df['radiant_win'] = df['radiant_win'].astype('float32')
df
Consider shuffling hero columns during training.
We could shuffle the heroes on each team (or even shuffle the teams and invert `radiant_win`). This could be a useful form of regularization. It's unclear how much this will be needed if we just get more data.
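The team-swap variant could look roughly like this, assuming columns `hero0` to `hero4` are one team and `hero5` to `hero9` the other (an assumption about the column layout), and that `radiant_win` is already a float. `swap_teams` is a hypothetical name, and the account columns would need the same swap.

```python
import numpy as np

def swap_teams(heroes, radiant_win, rng, p=0.5):
    """With probability p per row, swap the two teams and flip the label."""
    heroes = np.asarray(heroes)
    radiant_win = np.asarray(radiant_win, dtype=np.float32)
    swap = rng.random(len(heroes)) < p  # one coin flip per match
    # Assumption: the first five columns are one team, the last five the other.
    swapped = np.concatenate([heroes[:, 5:], heroes[:, :5]], axis=1)
    new_heroes = np.where(swap[:, None], swapped, heroes)
    new_label = np.where(swap, 1.0 - radiant_win, radiant_win)
    return new_heroes, new_label

rng = np.random.default_rng(0)
heroes = np.arange(10).reshape(1, 10)
aug_heroes, aug_label = swap_teams(heroes, [1.0], rng, p=1.0)
```

With `p=1.0` every row is swapped, so the single example row becomes `5..9, 0..4` and its label flips from 1.0 to 0.0.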
Investigate `float64` in account embedding columns
For some reason, one of the account embedding columns (`account_emb_7`) is incorrectly `float64`.
df.dtypes
hero0 int64
hero1 int64
hero2 int64
hero3 int64
hero4 int64
hero5 int64
hero6 int64
hero7 int64
hero8 int64
hero9 int64
account0 int64
account1 int64
account2 int64
account3 int64
account4 int64
account5 int64
account6 int64
account7 int64
account8 int64
account9 int64
start_time int64
radiant_win float64
account_emb_0 int64
account_emb_1 int64
account_emb_2 int64
account_emb_3 int64
account_emb_4 int64
account_emb_5 int64
account_emb_6 int64
account_emb_7 float64
account_emb_8 int64
account_emb_9 int64
dtype: object
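One common cause worth ruling out (a guess, not a diagnosis): pandas upcasts an integer column to `float64` as soon as it contains a missing value, for example when an account ID has no entry in the lookup used to build the embedding index. The lookup `index_of` and the IDs below are made up for illustration.

```python
import pandas as pd

# Hypothetical lookup from account ID to embedding index; 333 is deliberately absent.
index_of = {111: 0, 222: 1}
accounts = pd.Series([111, 222, 333], name="account7")

mapped = accounts.map(index_of)      # the unmapped ID becomes NaN
print(mapped.dtype)                  # float64, since int64 cannot hold NaN
n_missing = int(mapped.isna().sum())

# Possible fix: send unknown IDs to a dedicated index, then cast back.
UNKNOWN_INDEX = len(index_of)
fixed = mapped.fillna(UNKNOWN_INDEX).astype("int64")
```

Running `df['account_emb_7'].isna().sum()` on the real frame would confirm or rule out this explanation.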