dotastuff's People

Contributors

joshvarty

dotastuff's Issues

Improve performance of column shuffle

Right now we use a crazyshuffle() method that combines NumPy with Python for-loops. Can we make this faster using jnp (jax.numpy)? It's considerably slowing down our training pipeline.
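A possible loop-free approach, sketched here in NumPy under the assumption that crazyshuffle() permutes the columns of each row independently (the issue does not show its implementation). The name `shuffle_columns` is hypothetical. The same argsort-of-random-keys pattern should carry over to JAX, since `jnp.argsort` and `jnp.take_along_axis` exist with matching semantics, though the random keys would then come from `jax.random`.

```python
import numpy as np

def shuffle_columns(x, rng):
    """Independently permute the entries of every row of a 2-D array."""
    # Draw one random key per element. Sorting the keys along each row
    # gives an independent, uniformly random column order for that row.
    keys = rng.random(x.shape)
    order = np.argsort(keys, axis=1)
    # Reorder each row by its own permutation, with no Python loop.
    return np.take_along_axis(x, order, axis=1)

rng = np.random.default_rng(0)
batch = np.arange(20).reshape(4, 5)  # 4 rows, 5 columns of dummy ids
shuffled = shuffle_columns(batch, rng)
```

Each row of `shuffled` holds the same values as the matching row of `batch`, only in a different order.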

When building dataset, do not use str()

Python's str() function does not produce valid JSON: it formats the data as a Python literal (e.g. single quotes, True/False/None). We should probably serialize with json.dump() (to a file) or json.dumps() (to a string) instead.
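A small sketch of the difference, using a made-up record. The keys `team_name` and `leagueid` and the helper `is_valid_json` are illustrative assumptions, not taken from the dataset code.

```python
import json

# Hypothetical record; the apostrophe and the True/None values are the
# cases where str() and JSON disagree.
record = {"radiant_win": True, "team_name": "I'm bored", "leagueid": None}

as_str = str(record)          # Python literal: single quotes, True, None
as_json = json.dumps(record)  # JSON text: double quotes, true, null

def is_valid_json(text):
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

str_ok = is_valid_json(as_str)
json_ok = is_valid_json(as_json)
```

Only the `json.dumps()` output can be read back with `json.loads()`. When writing straight to a file handle, `json.dump(record, f)` does the same serialization.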

Look at match seq num 5125140861, where the Dire team name is "I'm bored".

Create one embedding table for heroes and one for accounts

Currently I'm accidentally creating 10 embedding tables for heroes and 10 embedding tables for accounts.

This is incorrect and we probably waste a lot of time (10x!) training each separate table instead of learning jointly. We should figure out how to fix this.
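One way the shared-table idea could look, sketched in NumPy. The sizes `NUM_HEROES` and `EMBED_DIM` are hypothetical, and the actual model code is not shown in this issue. The point is that all ten player slots index into the same array, so every slot contributes to learning the same rows.

```python
import numpy as np

NUM_HEROES = 150  # hypothetical vocabulary size
EMBED_DIM = 8     # hypothetical embedding width

rng = np.random.default_rng(0)
# A single table shared by all 10 hero slots.
hero_table = rng.normal(size=(NUM_HEROES, EMBED_DIM))

# Shape (batch, 10): one hero id per player slot.
hero_ids = np.array([[121, 14, 114, 8, 35, 50, 101, 11, 59, 120]])

# Integer-array indexing looks up every slot in the same table,
# giving shape (batch, 10, EMBED_DIM).
hero_vectors = hero_table[hero_ids]
```

The same indexing works on a `jnp` array. Accounts would follow the same pattern with a second table, assuming account ids have first been mapped to a compact index range.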

Investigate missing account numbers

The default anonymous account number is 4294967295 (0xFFFFFFFF). For some reason, some players have no account number at all. Currently we're just assigning them the default and moving on but we should try to better understand why we're seeing this.

Example:
https://www.dotabuff.com/matches/6123672329

Error: 'account_id'
{"players": [{"account_id": 4294967295, "player_slot": 0, "hero_id": 121, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 1, "hero_id": 14, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 2, "hero_id": 114, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 3, "hero_id": 8, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 128, "hero_id": 50, "leaver_status": 0}, {"account_id": 301406649, "player_slot": 129, "hero_id": 101, "leaver_status": 0}, {"account_id": 200676327, "player_slot": 130, "hero_id": 11, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 131, "hero_id": 59, "leaver_status": 0}, {"account_id": 4294967295, "player_slot": 132, "hero_id": 120, "leaver_status": 0}, {"player_slot": 4, "hero_id": 35, "hero_damage": 0, "tower_damage": 0, "hero_healing": 0, "gold": 0, "gold_spent": 0, "scaled_hero_damage": 0, "scaled_tower_damage": 0, "scaled_hero_healing": 0}], "radiant_win": true, "duration": 1946, "pre_game_duration": 90, "start_time": 1628309333, "match_id": 6123672329, "match_seq_num": 5125154990, "cluster": 117, "first_blood_time": 2, "lobby_type": 0, "human_players": 10, "leagueid": 0, "game_mode": 22, "flags": 1, "engine": 1}
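The `'account_id'` error above looks like a `KeyError` raised by the last player entry, which has no `account_id` key. A minimal sketch of the current fallback behaviour that also records which slots were affected, so they can be investigated later. The function name `account_ids` is hypothetical.

```python
ANONYMOUS_ACCOUNT_ID = 4294967295  # 0xFFFFFFFF, the default anonymous id

def account_ids(match):
    """Return (ids, missing_slots) for one match dict."""
    ids, missing_slots = [], []
    for player in match["players"]:
        if "account_id" not in player:
            # Remember the slot so the missing ids can be audited later.
            missing_slots.append(player["player_slot"])
        # dict.get avoids the KeyError and falls back to the default.
        ids.append(player.get("account_id", ANONYMOUS_ACCOUNT_ID))
    return ids, missing_slots

# Trimmed-down version of the example match above.
match = {"players": [
    {"account_id": 301406649, "player_slot": 129, "hero_id": 101},
    {"player_slot": 4, "hero_id": 35},  # no account_id key
]}
ids, missing_slots = account_ids(match)
```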

Consider shuffling hero columns during training.

We could shuffle the heroes on each team (or even shuffle the teams and invert radiant_win). This could be a useful form of regularization. It's unclear how much this will be needed if we just get more data.
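A rough sketch of both augmentations on a single example, using only the standard library. The function name `augment` and the 50% swap probability are assumptions. A batched version would operate on arrays rather than lists.

```python
import random

def augment(radiant, dire, radiant_win, rng):
    """Shuffle heroes within each team and randomly swap the two teams."""
    # rng.sample returns a shuffled copy and leaves the inputs untouched.
    radiant = rng.sample(radiant, len(radiant))
    dire = rng.sample(dire, len(dire))
    if rng.random() < 0.5:
        # Swapping sides changes which team is "radiant",
        # so the label has to be inverted as well.
        radiant, dire = dire, radiant
        radiant_win = not radiant_win
    return radiant, dire, radiant_win

rng = random.Random(0)
r, d, win = augment([121, 14, 114, 8, 35], [50, 101, 11, 59, 120], True, rng)
```

Whatever the random draw, the team that contains hero 121 is still the winning team after augmentation, which is the invariant the label flip is meant to preserve.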

Investigate `float64` in account embedding columns

For some reason, one of the account embedding columns (account_emb_7) is float64 instead of int64.

df.dtypes

hero0              int64
hero1              int64
hero2              int64
hero3              int64
hero4              int64
hero5              int64
hero6              int64
hero7              int64
hero8              int64
hero9              int64
account0           int64
account1           int64
account2           int64
account3           int64
account4           int64
account5           int64
account6           int64
account7           int64
account8           int64
account9           int64
start_time         int64
radiant_win      float64
account_emb_0      int64
account_emb_1      int64
account_emb_2      int64
account_emb_3      int64
account_emb_4      int64
account_emb_5      int64
account_emb_6      int64
account_emb_7    float64
account_emb_8      int64
account_emb_9      int64
dtype: object
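One plausible cause, not confirmed by the output above, is a missing value: a default NumPy-backed pandas column cannot hold NaN as an integer, so the whole column is stored as float64. The toy frame below is made up to illustrate how that could be checked and, if it turns out to be the cause, worked around with the nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Toy data: one missing entry in account_emb_7.
df = pd.DataFrame({
    "account_emb_6": [3, 7, 1],
    "account_emb_7": [2, np.nan, 5],
})

# Find the float64 columns and count missing values in each.
float_cols = df.select_dtypes(include="float64").columns.tolist()
missing_counts = df[float_cols].isna().sum()

# Nullable Int64 keeps integer values while still allowing missing entries.
df["account_emb_7"] = df["account_emb_7"].astype("Int64")
```

If `missing_counts` is zero for the real column, the cause lies elsewhere, for example in how that column was merged or computed.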
