
openskill.py's Introduction

Multiplayer Rating System. No Friction.

A faster and open license asymmetric multi-team, multiplayer rating system comparable to TrueSkill.

Description

In the multifaceted world of online gaming, an accurate multiplayer rating system plays a crucial role. A multiplayer rating system measures and compares players' skill levels in competitive games to ensure balanced match-making, improving the overall gaming experience. TrueSkill by Microsoft Research is currently a notable rating system, but gaming communities are looking for faster, more adaptable alternatives.

Here are some, but not all, of the reasons you should drop TrueSkill and bury Elo once and for all:

  • Multiplayer.
  • Multifaction.
  • Asymmetric faction size.
  • Predict Win, Draw and Rank Outcomes.
  • 150% faster than TrueSkill.
  • 100% Pure Python.
  • 100% Test Coverage.
  • CPython and PyPy Support.
  • 5 Separate Models.
  • Fine-grained control of mathematical constants.
  • Open License.
  • Up to 7% more accurate than TrueSkill.

Installation

pip install openskill

Usage

The official documentation is hosted here. Please refer to it for details on how to use this library.

Limited Example

>>> from openskill.models import PlackettLuce
>>> model = PlackettLuce()
>>> model.rating()
PlackettLuceRating(mu=25.0, sigma=8.333333333333334)
>>> r = model.rating
>>> [[a, b], [x, y]] = [[r(), r()], [r(), r()]]
>>> [[a, b], [x, y]] = model.rate([[a, b], [x, y]])
>>> a
PlackettLuceRating(mu=26.964294621803063, sigma=8.177962604389991)
>>> x
PlackettLuceRating(mu=23.035705378196937, sigma=8.177962604389991)
>>> (a == b) and (x == y)
True
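
The models can also predict match outcomes before rating them. Continuing the example above with the predict_win and predict_draw methods (a sketch; the exact probabilities depend on the model constants):

>>> win_probabilities = model.predict_win([[a, b], [x, y]])
>>> win_probabilities[0] > win_probabilities[1]  # a and b now have higher mu
True
>>> draw_probability = model.predict_draw([[a, b], [x, y]])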

Support

If you're struggling with any of the concepts, please search the discussions section to see if your question has already been answered. If you can't find an answer, please open a new discussion and we'll try to help you out. You can also get help from the official Discord server. If you have a feature request or want to report a bug, please create a new issue if one doesn't already exist.

References

This project is originally based on the openskill.js package. All of the Weng-Lin models are based on the work in this wonderful paper or are derivatives of the algorithms found in it.

  • Julia Ibstedt, Elsa Rådahl, Erik Turesson, and Magdalena vande Voorde. Application and further development of TrueSkill™ ranking in sports. 2019.
  • Ruby C. Weng and Chih-Jen Lin. A Bayesian approximation method for online ranking. Journal of Machine Learning Research, 12(9):267–300, 2011. URL: http://jmlr.org/papers/v12/weng11a.html.

Implementations in other Languages

openskill.py's Issues

Possibility for parameter for how ratings and win chances adjust for uneven teams

I'm using OpenSkill for a game where we sometimes have uneven teams, for example 6 vs 7. When making teams we put the better players on the team with fewer players. OpenSkill's estimates are way off the actual results when dealing with uneven teams: it seems to value extra players much more than the specific game I'm using it for does.
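
For reference, a minimal sketch of the kind of comparison I'm doing (the numbers are illustrative, not my real data):

from openskill.models import PlackettLuce

model = PlackettLuce()

# Six stronger players against seven weaker players.
team_six = [model.rating(mu=28) for _ in range(6)]
team_seven = [model.rating(mu=24) for _ in range(7)]

# Compare these predictions against observed results for uneven teams.
print(model.predict_win([team_six, team_seven]))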

Does anyone have any insight on how to tune a parameter that makes team disparity less important?

Thanks!

Guidance on matchmaking

Is your feature request related to a problem? Please describe.
This is a request to add a section to the documentation on how matches should be arranged.

Do the models make any assumptions about how matches should be arranged? For example, should matches avoid playing the same players or teams back to back, or should they avoid players arranging their own opponents? Should matches always try to balance teams based on the latest ratings?

In my use case I plan on using the rating algorithm for in-person matches, where the player pool available to make matches at any given time would be fewer than 20 players, with random teams.

It would be great if the documentation offered guidance on how to arrange matches to make rating convergence faster (one possible heuristic is sketched below).
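
For example, one heuristic the documentation could discuss (an assumption on my part, not something the library prescribes) is to prefer the pairing with the highest predicted draw probability, i.e. the most evenly matched game:

from itertools import combinations

from openskill.models import PlackettLuce

model = PlackettLuce()
pool = [model.rating() for _ in range(6)]

# Score every possible 1v1 pairing by how evenly matched it is.
def evenness(i, j):
    return model.predict_draw([[pool[i]], [pool[j]]])

best_pair = max(combinations(range(len(pool)), 2), key=lambda pair: evenness(*pair))
print(best_pair)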

Clearer statement of need in documentation

Raising as part of JOSS review openjournals/joss-reviews#5901

The documentation and top-level README need a clearer statement of need: what problems the software is designed to solve and who the intended audience is.

The current summary at the beginning of the README

A faster and open license asymmetric multi-team, multiplayer rating system comparable to TrueSkill.

assumes knowledge of what a multiplayer rating system is and what TrueSkill is. There is also no specific mention of online gaming communities, which, reading between the lines, seem to be the primary target audience.

The summary in the documentation index page

This advanced rating system is faster, accurate, and flexible with an open license. It supports multiple teams, factions, and predicts outcomes. Elevate your gaming experience with OpenSkill - the superior alternative to TrueSkill.

similarly needs a bit more context and explanation.

Improve Documentation

  • Make the statistical theory accessible for absolute beginners.
  • LaTeX where possible.
  • Docstrings Everywhere.

"What do you mean, 'everywhere'?"

Everywhere

Are `predict_win` and `predict_draw` functions accidentally using Thurstone-Mosteller specific calculations?

If I understand correctly, those two functions seem to perform calculations using the equations numbered (65) in the paper. However, those equations seem to be specific to the Thurstone-Mosteller model, and as far as I can tell, the proper way to calculate probabilities for the Bradley-Terry model would be to use equations (48) and (51) (also seen as p_iq in equation (49)). Is this intended? Or am I misunderstanding either the paper or the code of these functions?
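
For reference, the two functional forms being compared, transcribed here following the paper's conventions (the equation numbers above should be double-checked against the original). The Bradley-Terry pairwise win probability is

$$\hat{p}_{iq} = \frac{e^{\mu_i/c_{iq}}}{e^{\mu_i/c_{iq}} + e^{\mu_q/c_{iq}}}, \qquad c_{iq} = \sqrt{\sigma_i^2 + \sigma_q^2 + 2\beta^2},$$

whereas the Thurstone-Mosteller form uses the standard normal CDF $\Phi$:

$$\hat{p}_{iq} = \Phi\!\left(\frac{\mu_i - \mu_q}{c_{iq}}\right).$$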

Documenting how to access data for benchmarking

Raising as part of JOSS review openjournals/joss-reviews/issues/5901

As the data files are stored on Git LFS and the free LFS quota for this account seems to be regularly exceeded (see openjournals/joss-reviews#5901 (comment)), it would be useful to document an alternative approach for accessing the data, ideally one which uses an open data repository that doesn't require an account to download. While the datasets have been made available on Kaggle (openjournals/joss-reviews#5901 (comment)), this is not currently documented in this repository, and a Kaggle account is required to download. An open research data repository / archive like Zenodo would be a better fit with the JOSS requirement that the software be stored in a repository that can be cloned without registration. While I don't think this strictly extends to data associated with the software, from a FAIR data and reproducibility perspective a service like Zenodo is much better than Kaggle.

A potentially even nicer approach would be to use a tool like pooch to automate getting the data from a remote repository as part of running the benchmarks.
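
For illustration, a minimal pooch-based sketch (the Zenodo URL and missing hash are placeholders, not a real record):

import pooch

# Placeholder record URL; substitute the real Zenodo DOI once the data
# is archived there.
data_path = pooch.retrieve(
    url="https://zenodo.org/record/0000000/files/benchmark_matches.csv",
    known_hash=None,  # pin a "sha256:..." hash once the file is published
)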

Unable to install on Google Colab

Describe the bug
I can't install openskill on Google Colab via pip.
What should I do?

Platform Information

  • Google Colab
  • Python Version: 3.7.13

predict function

Hello,

Congratulations on your work! I was wondering: is there a predict function for the rankings?

Greetings

Issue: mu and sigma can't be set to zero

Describe the bug
Player rating parameters mu and sigma can't be set to 0.0; they are overwritten by the default values 25 and 8.333.
The issue is in openskill/rate.py, lines 28-29:

self.mu = mu if mu else default_mu(**options)
self.sigma = sigma if sigma else default_sigma(**options)

The conditions `mu if mu` and `sigma if sigma` evaluate as falsy when mu or sigma is set to 0.
Also, one cosmetic thing: the type hints in openskill/constants.py for the functions z and mu are wrong. When the default z or mu value is used, they return 3 or 25 (int) instead of float.

To Reproduce

from openskill import Rating

Rating(mu=0.0, sigma=5)
Rating(mu=25, sigma=0.0)
Rating(mu=0.0, sigma=0.0)

Expected behavior
It should be possible to set them to a value of 0.0.

Screenshots
image

Possible solution

if isinstance(mu, (int, float)):
    self.mu = mu
else:
    self.mu = default_mu(**options)

Platform Information

  • OS: [macOS]
  • Python Version: [3.8.13]
  • openskill.py Version: [2.5.0]

predict_win and predict_rank do not work on 2x2 and more games

Describe the bug
predict_win and predict_rank do not work properly on 3x3x3 games

To Reproduce
Step 1:

from openskill.models import PlackettLuce

model = PlackettLuce()

p1 = model.rating(mu=34, sigma=0.25)
p2 = model.rating(mu=34, sigma=0.25)
p3 = model.rating(mu=34, sigma=0.25)

p4 = model.rating(mu=32, sigma=0.5)
p5 = model.rating(mu=32, sigma=0.5)
p6 = model.rating(mu=32, sigma=0.5)

p7 = model.rating(mu=30, sigma=1)
p8 = model.rating(mu=30, sigma=1)
p9 = model.rating(mu=30, sigma=1)

team1, team2, team3 = [p1, p2, p3], [p4, p5, p6], [p7, p8, p9]

r = model.predict_win([team1, team2, team3])
print(r)

Results in:
[0.439077174955099, 0.3330210112526078, 0.2279018137922932]

Step 2, change p9 mu to 40:

from openskill.models import PlackettLuce

model = PlackettLuce()

p1 = model.rating(mu=34, sigma=0.25)
p2 = model.rating(mu=34, sigma=0.25)
p3 = model.rating(mu=34, sigma=0.25)

p4 = model.rating(mu=32, sigma=0.5)
p5 = model.rating(mu=32, sigma=0.5)
p6 = model.rating(mu=32, sigma=0.5)

p7 = model.rating(mu=30, sigma=1)
p8 = model.rating(mu=30, sigma=1)
p9 = model.rating(mu=40, sigma=1)

team1, team2, team3 = [p1, p2, p3], [p4, p5, p6], [p7, p8, p9]

print([team1, team2, team3])
r = model.predict_win([team1, team2, team3])
print(r)

Results are the same:
[0.439077174955099, 0.3330210112526078, 0.2279018137922932]

Expected behavior
After increasing p9's mu, team3 is expected to have a bigger chance of victory.

Platform Information

  • openskill.py Version: 5.1.0

Additional context
https://github.com/OpenDebates/openskill.py/blob/f76df19c3e388f31050c988a0059367bd1dadc76/openskill/models/weng_lin/bradley_terry_full.py#L765

I have no idea what is going on here or why it selects only the first player's rating, but it just does not work as intended.

mu=0 results in mu=25

Describe the bug
mu=0 results in mu=25. The same goes for sigma and potentially other optional parameters that I did not investigate.

To Reproduce
To reproduce, simply do:

from openskill.models import PlackettLuce

model = PlackettLuce()
player = model.rating(mu=0, sigma=1)
print(player.mu)  # prints 25.0 but expected is 0.0

Expected behavior
When mu is not None, take whatever the user provides.

Additional context
This can lead to unexpected behaviour and wrong predictions. The issue is the incorrect initialization of Rating objects.

# Replace this:
return self.PlackettLuceRating(mu or self.mu, sigma or self.sigma, name)
# With something more like this, so that 0 is not treated as "unset":
mu = self.mu if mu is None else mu
sigma = self.sigma if sigma is None else sigma
return self.PlackettLuceRating(mu, sigma, name)

Determining Convergence Criteria for Bradley-Terry Model

Is your feature request related to a problem? Please describe.
Currently, it's not explicitly clear how to determine convergence in the Bradley-Terry model implemented in openskill.py. Users might struggle to ascertain whether the model has converged, leading to uncertainty in the validity of the results.

Describe the solution you'd like
I propose adding documentation or guidance on determining convergence criteria for the Bradley-Terry model in openskill.py. This could include recommended thresholds or methods for assessing convergence, such as examining parameter estimates or likelihood changes over iterations.
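
As a concrete illustration of the "parameter estimates over iterations" idea, here is one possible convergence check (my own sketch, not an official criterion): replay a fixed match history and stop once the largest change in mu between passes falls below a tolerance.

from openskill.models import BradleyTerryFull

model = BradleyTerryFull()
ratings = [model.rating() for _ in range(4)]

# Fixed match history as player indices, winning team listed first.
history = [((0, 1), (2, 3)), ((0, 2), (1, 3))]

tolerance = 1e-3
for _ in range(100):
    largest_change = 0.0
    for winners, losers in history:
        teams = [[ratings[i] for i in winners], [ratings[i] for i in losers]]
        new_teams = model.rate(teams)
        for indices, new_team in zip((winners, losers), new_teams):
            for i, new in zip(indices, new_team):
                largest_change = max(largest_change, abs(new.mu - ratings[i].mu))
                ratings[i] = new
    if largest_change < tolerance:
        break  # parameter estimates have stabilised between passes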

Describe alternatives you've considered
One alternative is leaving the determination of convergence criteria to individual users, which could lead to inconsistency and confusion. Another option is relying solely on default convergence settings, but this might not be suitable for all use cases and datasets.

Additional context
Convergence is a crucial aspect of model fitting, particularly in iterative algorithms like those used in the Bradley-Terry model. Providing clear guidelines on determining convergence will enhance the usability and reliability of openskill.py for researchers and practitioners utilizing the Bradley-Terry model for skill estimation.

Community guidelines for reporting issues and support queries

Raising as part of JOSS review openjournals/joss-reviews/issues/5901

Ideally you should have some clear and easily findable guidelines for how to report issues and seek support with the software.

This section in the user manual page in the documentation

https://github.com/OpenDebates/openskill.py/blob/bc96949febda5a285af34a901d7128c95501e9a4/docs/source/manual.rst?plain=1#L10-L13

already partially fits the bill, but

  1. It took a little while for me to find - I would put it somewhere more prominent, for example in a dedicated top-level documentation page or in the README.
  2. The reference to the 'discussions section' is not very clear - I assume this means GitHub Discussions, but from the documentation website it wouldn't necessarily be clear how to get there, so adding a link would be useful.
  3. A brief description of, and pointer to, the GitHub issue tracker as (presumably) the correct place to report issues with the code would also help, perhaps with some explanation of the different issue templates / categories.

Rate function: "score" and "rank" interchangeable ?

Apologies for being a noob - it seems that the score margin doesn't have any effect on how the ratings are updated, and it's effectively the same as the rank option except that a higher score is better. If that is true, is there a way to consider the score margin for games where it is important?

Improve win predictions for 1v1 teams

First of all, congrats and thanks for the great repo!

In a scenario where Player A has 2x the rating of Player B, the predicted win probability is 60% vs 40%. This seems strange.

from openskill import Rating, predict_win

players = [[Rating(50)], [Rating(25)]]
predict_win(teams=players)
# [0.6002914159316424, 0.39970858406835763]

If I use this function implementation, I get 97% vs 3% which sounds more reasonable to me.

Maybe the predict_win function has some flaw?

Model Agnostic API

Rating objects can currently be mixed and used between models. This may or may not make sense depending on the models under consideration, so it is definitely erring on the side of caution to disallow it. Doing so also allows us to have different values (instead of mu and sigma) for different models (perhaps Glicko? Standard Elo?).

Proposed API:

Note: Added a spoiler to not cause bias from my recommendation.

New API (Click to Reveal)
Similar to the TrueSkill API, we can have a new class called `OpenSkill` which is initialized with the default `PlackettLuce` model.

Example code:

from openskill.models import BradleyTerryFull


# Initialize Rating System
system = BradleyTerryFull(tau=0.3,  constrain_sigma=True)

# Team 1
a1 = system.Rating()
a2 = system.Rating(mu=32.444, sigma=5.123)

# Team 2
b1 = system.Rating(43.381, 2.421)
b2 = system.Rating(mu=25.188, sigma=6.211)

# Rate with BradleyTerryFull
[[x1, x2], [y1, y2]] = system.rate([[a1, a2], [b1, b2]])  # No need to pass tau and so on again.

All functions that can be converted to methods of the model class will be. All constants in the methods can be manually overridden as normal. A variable called `custom` will be set to True if models are mixed or constants are changed within a system after ratings have taken place.

Rating objects will contain a public attribute (`Rating.model`) that references the model with which they were created. So, if the user tries to use a rating in a function belonging to a different model, it will produce an error.


If there are no active objections from users or any other implementation developers by the time I get to this issue in the Project Release Board (which should be a while still), then it will be shipped in the next major release.

If someone has another API idea, you are also free to suggest it in this issue.

Mentions: @philihp

Relevant Issues: philihp/openskill.js#231

Tournament Interface

Is your feature request related to a problem? Please describe.
Creating models of tournaments is hard since you have to parse the data using another library (depending on the format) and then pass everything into rate and predict manually. It's a lot of effort to predict the entire outcome of, say, the 2022 FIFA World Cup.

Describe the solution you'd like
It would be nice if there was a tournament class of some kind that allowed us to pass in rounds which themselves contained matches. Then, using an exhaustive approach, predict winners and move them along each bracket/round (see the sketch below). Especially now that #74 has landed, it would be easier to predict whole matches and in turn tournaments.

The classes should be customizable to allow our own logic, for instance using the Munkres algorithm and other such methods.
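
To make the exhaustive idea concrete, a rough sketch (predict_bracket is a hypothetical helper, not an existing API; only predict_win is real):

from openskill.models import PlackettLuce

model = PlackettLuce()

def predict_bracket(teams):
    """Advance the predicted favourite of each pairing round by round."""
    rounds = [list(teams)]
    while len(teams) > 1:
        winners = []
        for i in range(0, len(teams), 2):
            pair = teams[i:i + 2]
            if len(pair) == 1:
                winners.append(pair[0])  # odd team out gets a bye
                continue
            probabilities = model.predict_win(pair)
            winners.append(pair[probabilities.index(max(probabilities))])
        teams = winners
        rounds.append(list(teams))
    return rounds

# Four single-player teams seeded into a two-round bracket.
bracket = predict_bracket([[model.rating(mu=m)] for m in (30, 26, 27, 25)])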

Describe alternatives you've considered
I don't know any other libraries that do this already.

Software Paper Review: Suggestions for Clarity and Completeness

Please consider the following in drafting the software manuscript:

  • Provide more contextual information about the problem you are solving in the paper and software, targeting software engineers and researchers who may not have specialized knowledge in the domain.
  • There is a typo in the summary: know --> known.
  • For claims about the software being "Faster" and "accurate", please offer supporting evidence like benchmarks, examples, or descriptions of the steps taken to achieve these qualities.
  • Some of the limitations of the software are mentioned in the FAQ section of the documentation. It would be beneficial to dedicate a discussion section in the paper that covers what the package can and cannot do, as well as future development plans.
  • The paper should give appropriate credit to the openskill.js package.
  • Please incorporate at least one example in the paper that demonstrates how to use the package, enabling readers to start using it quickly.

This issue is related to this submission: openjournals/joss-reviews#5901

let score difference be reflected in rating

When you enter scores into rate(), the difference between the scores has no effect on the rating - meaning rate([team1, team2], score(1, 0)) == rate([team1, team2], score(100, 0)) is true.
They have exactly the same rating effect on team1 and team2. (A short demonstration follows below.)
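
A sketch demonstrating the reported behaviour (assuming the scores keyword of rate() works as described above; ratings are fresh defaults):

from openskill.models import PlackettLuce

model = PlackettLuce()
r = model.rating

# A 1-0 result and a 100-0 result produce identical rating updates.
narrow = model.rate([[r()], [r()]], scores=[1, 0])
blowout = model.rate([[r()], [r()]], scores=[100, 0])
assert [p.mu for t in narrow for p in t] == [p.mu for t in blowout for p in t]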

I don't know if it is mathematically possible or what it would look like, but it would be great if the difference could somehow be factored into the calculation, as it is (if your game has a score) quite an important data point for skill evaluation.

Automatic Test Generation and Parameterization

Is your feature request related to a problem? Please describe.
When a model is rewritten or improved, the expected API outputs will change significantly due to internal changes. Re-entering the correct values into the test suite to verify determinism is wasted effort on the developer's part in the long term.

Describe the solution you'd like
Use Hypothesis to generate tests and pytest parameterization.

Tasks:

  • Decouple benchmarks into their own top-level package for loading different kinds of data.
  • Create a command line utility to run benchmarks and regenerate tests.
  • Import benchmarks package to load data for testing purposes.

Fully Vectorized

This is obviously a very difficult problem that relies on a few parts being successful.

| No. | Dependency Changes | Strict Typing | Implementation | OS | Performance Gains | Implementation Difficulty |
| --- | ------------------ | ------------- | -------------- | -- | ----------------- | ------------------------- |
| 1 | None | Possible | CPython, PyPy | Windows, Ubuntu, macOS | Insignificant | Easy |
| 2 | NumPy | Partial | CPython | Windows, Ubuntu, macOS | Significant | Difficult |
| 3 | SciPy | Not Possible | CPython | Windows, Ubuntu, macOS | Significant | Normal |
| 4 | Conditional NumPy | Partial | CPython, PyPy | Windows, Ubuntu, macOS | Significant | Very Difficult |

Option 4 is ideal for best compatibility and performance, but it is a huge undertaking at the end of which strict typing may still turn out not to be possible.

Regardless of which option is being pursued, these tasks need to be completed first:
