
statistics's People

Contributors

campos20, daniel-anker-hermansen, georgemunyoro, randomno, sauroux, spencerchubb, timreyn

statistics's Issues

Create "evolution of records" graph

Preliminary notes

Requirement

One of the things still needed in the transition to Statistics 2.0 is an updated version of the "Evolution of Records" graph.

Basically, we need to recreate that graph in more modern frameworks (Python/ReactJS) inside this repo.

Description of task

You will need to complete the following steps:

  • Write a Python script to extract, generate and organise the source data for the graph
  • Create a frontend page to host the graph in ReactJS
  • Integrate (i.e., link to) the page from the main statistics.worldcubeassociation.org website
  • Use research and judgement to decide how best to render the graph on the frontend; I don't know enough to advise on how to do this
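
As a starting point for the Python step, here is a minimal sketch of how the source data for one event could be reduced to the points where the record improved. It assumes results arrive as `(date, value)` pairs where lower is better and non-positive values (DNF, DNS) are not valid records; the function name and input shape are hypothetical, not taken from this repo.

```python
from datetime import date


def record_evolution(results):
    """Return the (date, value) points at which the record improved.

    `results` is an iterable of (date, value) pairs for one event.
    Assumption: lower values are better and values <= 0 are not valid results.
    """
    points = []
    best = None
    # Sort chronologically; ties on the same day are ordered by value.
    for day, value in sorted(results):
        if value > 0 and (best is None or value < best):
            best = value
            points.append((day, value))
    return points


sample = [
    (date(2015, 5, 1), 525),
    (date(2013, 3, 2), 555),
    (date(2014, 1, 1), 600),   # slower than the standing record, ignored
    (date(2016, 7, 9), -1),    # DNF, ignored
    (date(2018, 11, 24), 347),
]
evolution = record_evolution(sample)
```

The resulting list is the step-shaped series the frontend would plot, one series per event.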

General Comments

One person doesn't need to do all of this - if you only feel confident to do the Python or React sides of this, feel free to just do those.

Even if it's one person doing everything, it is strongly recommended to submit a pull request for each piece of this, so that code review is easier. Please feel free to ask for help if you aren't sure if you're going in the right direction.

Migrate existing statistics

  • Best "medal collection"
  • Sum of 3x3/4x4/5x5 ranks
  • Sum of all single ranks
  • Sum of all average ranks
  • Appearances in 3x3x3 Cube top 100 results
  • Most Sub-X solves in 3x3x3 Cube
  • 3x3x3 Blindfolded longest success streak
  • 3x3x3 Blindfolded recent success rate
  • World records in most events
  • Best Podiums in 3x3x3 Cube event
  • Oldest standing world records
  • Most Persons
  • Most Competitions
  • Most Countries
  • Most solves in one competition, year, or lifetime

Success streaks should order by RoundType.rank

The query for "longest success streak" doesn't sort by RoundType.rank (https://github.com/thewca/statistics/blob/6f98049999b6edf1977ad6d9523053c662328cff/misc/python/statistics/longest_success_streak.py). This means that, for competitions with multiple rounds, it's nondeterministic what order they are returned in.

I noticed this for my clock success streak. Last week it was 116 (correct), early this week it dropped to 111, and when AmherstOpen2022 results were added it went back up to 116 (should be 121 now). My last DNF was in round 1 of a 2-round competition, so the finals of that competition appear to be switching back and forth between being included and not.
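
To illustrate the problem, here is a minimal sketch of a streak count that sorts on the round rank as well as the competition date. The `Attempt` fields and the rank values 1 and 99 are hypothetical stand-ins for what the real query would read from the database; the only assumption about the data is the WCA convention that a value of -1 is a DNF.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Attempt:
    competition_start: date  # hypothetical field: start date of the competition
    round_rank: int          # stand-in for RoundType.rank (lower = earlier round)
    attempt_index: int       # position of the attempt within the round
    value: int               # -1 is a DNF, values > 0 are successes


def current_streak(attempts):
    """Count successes since the last DNF, in a deterministic order."""
    ordered = sorted(
        attempts,
        key=lambda a: (a.competition_start, a.round_rank, a.attempt_index),
    )
    streak = 0
    for attempt in ordered:
        if attempt.value == -1:
            streak = 0
        elif attempt.value > 0:
            streak += 1
    return streak


day = date(2022, 1, 1)
first_round = [Attempt(day, 1, i, 700) for i in range(4)] + [Attempt(day, 1, 4, -1)]
final = [Attempt(day, 99, i, 700) for i in range(5)]
```

With the round rank in the sort key, the five attempts of the final always come after the DNF in round 1, regardless of the order in which the rows are returned.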

Thank you!

Update website copy/text

I risk sounding like a bit of an asshole on this one - sorry if I do!

Some of the text on the website appears to be placeholder text that could be expanded upon (eg, more info could be added to the "About" page), or has phrasing that sounds a bit weird to a native English speaker ("We try always to add new statistics." could be "We're always looking to add new statistics").

I expect the focus here has been on putting together the website and its features, not on making sure everything reads perfectly. But I would suggest that the text is reviewed before the site is made live.

This is something I would be happy to work on, but would only be available to do so next week.

Better html configuration

Currently, the only way to support HTML formatting is to embed it directly in the statistics. We could add capabilities to support HTML configuration instead. At the moment, queries include HTML snippets, which makes them quite hard to reuse or study.

Features that would be nice to support

  • Links (with the href and text)
  • Different colors (eg for sum of ranks)
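
One possible shape for this, sketched under assumptions: each cell is either a plain value or a small dict with `text` and optional `href` and `color` keys, and a single renderer turns it into HTML. The key names and the `render_cell` function are hypothetical; nothing like this exists in the repo yet.

```python
from html import escape


def render_cell(cell):
    """Render a plain value or a {"text", "href", "color"} dict as HTML.

    All values are escaped, so queries never need to contain markup.
    """
    if not isinstance(cell, dict):
        return escape(str(cell))
    rendered = escape(str(cell["text"]))
    if "href" in cell:
        rendered = f'<a href="{escape(cell["href"])}">{rendered}</a>'
    if "color" in cell:
        rendered = f'<span style="color: {escape(cell["color"])}">{rendered}</span>'
    return rendered


link = render_cell({"text": "Jane <Doe>", "href": "/persons/2000DOEJ01"})
colored = render_cell({"text": 12, "color": "red"})
```

Keeping the formatting in configuration like this would let the queries return only data, which makes them easier to reuse.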

Sum of Ranks not loading on Safari

I've had two reports of Sum of Ranks not loading on Safari - one on iOS, one on Mac.

Don't have much information besides that refreshing doesn't fix it. If you'd like me to ask more detailed questions, let me know what you'd like to find out.

Error message using custom function for displaying results

This came in via the WST email - thought it would be best to put it here so you can work on it when you get a chance :)

Been loving the new WCA Statistics page, but it seems permissions have been denied for the custom function for displaying results. Here is the Error message:

< StatementCallback; bad SQL grammar [select count(*) from ( SELECT wca_statistics_time_format(347, '333', 'single') ) alias]; nested exception is java.sql.SQLSyntaxErrorException: execute command denied to user 'read_only'@'%' for routine 'wca_development.wca_statistics_time_format' >

It's a great function that I'd love to utilise, so this would be great to have access to! Amazing work on the rest of this site.

Better error handling when not logged in

When you open a database query link while not logged in, the error message says "NOT FOUND". It should either tell you "get access to this feature by logging in" or just redirect you to the login page instead.

Statistics oldest standing records may not be accurate

Submitted via website contact form:

Statistics oldest standing records may not be accurate. I noticed that in the national records section, the oldest standing record listed is a 3x3 average from 2007. In that very same average, the competitor got a single that is currently an NR. I suppose it was a continental record single at the time, but I think that could still be worth noting if unintentional. Likewise, I assume there are current continental records that aren't listed because they were world records at first.

Add statistics from various sources

Implement "PB Streaks" Statistic

From the "New statistic suggestion" email thread:

I’d like to propose the addition of PB streaks to the WCA Statistics page. There’s a wide interest in this statistic in the community and it’s easy to implement, as the code for generating the rankings is already on Jonatan’s GitHub page. (Furthermore, but unnecessarily, Antonie is only two comps away from tying Evan's WR)
People might also find it interesting to filter out the ones that finished, leaving only the ongoing streaks as a ranking.
It might be too much to also separate them in continental and national rankings, but that’s up to you.
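
For discussion, a minimal sketch of how a PB streak could be computed. It assumes a streak counts consecutive competitions in which the competitor set at least one personal best in any event, and that lower values are better; this definition and the input shape are assumptions, and the existing code mentioned in the email may define the streak differently.

```python
def pb_streaks(competitions):
    """Return (longest, current) PB streaks.

    `competitions` is a chronological list of dicts mapping an event id to
    the competitor's best valid result at that competition.
    """
    bests = {}
    longest = current = 0
    for results in competitions:
        got_pb = False
        for event, value in results.items():
            if value > 0 and (event not in bests or value < bests[event]):
                bests[event] = value
                got_pb = True
        current = current + 1 if got_pb else 0
        longest = max(longest, current)
    return longest, current


history = [
    {"333": 1500},
    {"333": 1400},
    {"333": 1450},              # no PB, streak resets
    {"333": 1300, "222": 500},  # PBs again
]
```

Filtering for ongoing streaks, as suggested above, would then mean keeping only competitors whose current streak is greater than zero.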

Add redirect parameters on login in statistics service

Redirect parameters should be present on every login. The absence of this makes the entire process a bit more cumbersome and potentially confusing for people who are not used to the platform.

For example, take this process:

  1. Click on a WCA statistics database query URL
  2. Login required
  3. User logs in
  4. Back to home screen. How do I get to the first screen again?

It is a bit counterintuitive. The same happens for any instance of login.
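
A minimal sketch of the idea, assuming a query parameter named `redirect` and a placeholder login URL (both hypothetical; the real login flow may use a different parameter or mechanism). The page that requires login builds the login link with the current location attached, and after login the target is validated before being followed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_login_url(login_base, return_to):
    """Attach the page the user came from to the login URL."""
    return f"{login_base}?{urlencode({'redirect': return_to})}"


def destination_after_login(login_url, allowed_host, default="/"):
    """Read the redirect target back, falling back to the home page."""
    query = parse_qs(urlsplit(login_url).query)
    target = query.get("redirect", [default])[0]
    parts = urlsplit(target)
    # Only follow relative paths or URLs on our own host (no open redirects).
    if parts.scheme not in ("", "http", "https"):
        return default
    if parts.netloc and parts.netloc != allowed_host:
        return default
    return target


page = "/database-query?sqlQuery=select%201"
login_url = build_login_url("https://example.org/login", page)
```

With this in place, step 4 of the process above would land the user back on the query page instead of the home screen.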

Originally posted by @GuidoDipietro in thewca/worldcubeassociation.org#6777 (comment)

Fix FMC Podium rankings

image
The 7th-place and 9th-place podiums are equivalent and should be rounded as such. The easiest fix is probably to round every x.66 to x.67, every (x-1).99 and x.01 to x.00, and every x.34 to x.33.
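
An alternative to patching individual endings is to undo the rounding before summing. The sketch below assumes that the podium value is a sum of three FMC means, each stored as an integer in hundredths of a mean of three attempts (for example 2467 for 24.67); that storage format and the function names are assumptions, not something confirmed in this issue.

```python
import math
from fractions import Fraction


def mean_to_exact(stored_mean):
    """Recover the exact mean in thirds from a value stored in hundredths."""
    total_moves = (stored_mean * 3 + 50) // 100  # 2467 -> 74 moves in total
    return Fraction(total_moves, 3)


def podium_sum(stored_means):
    """Sum the exact means and round once, half up, to two decimals."""
    exact = sum((mean_to_exact(m) for m in stored_means), Fraction(0))
    hundredths = math.floor(exact * 100 + Fraction(1, 2))
    return f"{hundredths // 100}.{hundredths % 100:02d}"


all_thirds = podium_sum([2433, 2433, 2433])  # naive sum of hundredths: 72.99
mixed = podium_sum([2467, 2467, 2500])       # naive sum of hundredths: 74.34
```

Because the rounding happens only once, two podiums with the same total number of moves always display the same value.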

Bugs in statistics page

The following is the list of bugs that WST received through email a few days ago:

Database export has not been replaced in >=19 days

The database export is supposed to be automatically replaced every 7 days:

one_week=$(date -d 'now - 7 days' +%s) # export older than 1 week is suggested to be replaced
date_of_export=$(date -r "$export_file_sql" +%s)
if (( date_of_export <= one_week )); then # Here we check how old the export is.

But it has not updated since 12/28/2022:

image

I don't see anything in the code that suggests an issue handling a change in year (I could be wrong). Is there a way to check if the cron is running?
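
To rule out the age check itself, the same comparison can be reproduced outside the cron. This is a minimal Python sketch of the shell condition quoted above (modification time at least 7 days old); the function name is hypothetical and the path of the export file on the server is not known here.

```python
import os
import time

SEVEN_DAYS = 7 * 24 * 60 * 60  # seconds


def export_is_stale(path, now=None, max_age_seconds=SEVEN_DAYS):
    """Return True when the file's modification time is at least 7 days old."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path) >= max_age_seconds
```

If this returns True for the export file on the server and the file still does not get replaced, the check is fine and the problem is more likely that the cron job is not running or the download is failing.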

Sum of Ranks presentation changes

Two changes suggested by a parent in the "Issue on Statistic Page" email thread.

  1. For better readability, the total sum of ranks could be in bold
  2. Add text explaining the red/green highlights on different numbers

Screenshot to illustrate the points is below:
image

Stat includes cancelled competitions

Description:
This statistic probably includes cancelled competitions, because the only competition listed for Czechia in 2020 was Kostelec Open 2020, which was cancelled due to the pandemic.

Expected behaviour:
It shouldn't count cancelled competitions, since those didn't actually happen.

Fixing this should be very easy, since the Competitions table has the columns cancelled_at and cancelled_by, which should be null for competitions that weren't cancelled.
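
A minimal sketch of the filter, using an in-memory SQLite table so it runs on its own. Only the `Competitions` table name and the `cancelled_at` column come from this issue; the other columns, the sample rows and the function name are made up for illustration, and the real statistic runs against the WCA MySQL export.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Competitions ("
    "id TEXT PRIMARY KEY, countryId TEXT, year INTEGER, cancelled_at TEXT)"
)
# Made-up sample rows: one cancelled and one held competition.
conn.executemany(
    "INSERT INTO Competitions VALUES (?, ?, ?, ?)",
    [
        ("CancelledComp2020", "CountryA", 2020, "2020-03-15"),
        ("HeldComp2020", "CountryB", 2020, None),
    ],
)


def competitions_per_country(connection, year):
    """Count competitions per country, skipping cancelled ones."""
    rows = connection.execute(
        "SELECT countryId, COUNT(*) FROM Competitions "
        "WHERE year = ? AND cancelled_at IS NULL "
        "GROUP BY countryId",
        (year,),
    )
    return dict(rows.fetchall())


counts = competitions_per_country(conn, 2020)
```

The only change needed in the actual query should be the extra `cancelled_at IS NULL` condition.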

TODO list before releasing

I have less time to dedicate to this project, so we are releasing it.

  • Assign a subdomain (blocker)
  • Restrict database queries to logged-in users and point login from staging to prod (blocker)
  • Hide the backend endpoints that create statistics in prod so people won't mess with us (blocker)
  • Assign a load balancer and use its address instead of the server's address so we can be more consistent (we are using an Elastic IP, which can help, but it's not good enough IMO)
  • Start using a dedicated database instead of one that lives on the server
  • Use stats from various sources
  • Stop using servers and start using ECS
  • Change the home page and about page
  • Add a cron process to calculate stats
  • HTTPS

Include export date

As of today, we are including the date when the statistics were generated, which is not very useful. We could include the export date, which is more meaningful.

  • Add a migration with a new table that includes export_date and execution_date. The statistics table should also include export_date
  • Save the export date in the control table
  • Statistics should look at the last export date in the control table, and the date should be passed as a parameter to the frontend
  • The frontend needs to display this date on every page
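
The control-table part of these steps could look roughly like the sketch below. It uses SQLite only so that it is self-contained; the table name `statistics_control` and the function names are hypothetical, while `export_date` and `execution_date` are the columns proposed above.

```python
import sqlite3
from datetime import datetime, timezone


def init_control_table(conn):
    """Create the control table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statistics_control ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "export_date TEXT NOT NULL, "
        "execution_date TEXT NOT NULL)"
    )


def record_run(conn, export_date):
    """Store the export date together with the time the stats were generated."""
    executed_at = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO statistics_control (export_date, execution_date) VALUES (?, ?)",
        (export_date, executed_at),
    )


def latest_export_date(conn):
    """Return the export date of the most recent run, or None if there is none."""
    row = conn.execute(
        "SELECT export_date FROM statistics_control ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row[0] if row else None


control = sqlite3.connect(":memory:")
init_control_table(control)
record_run(control, "2023-01-01")
record_run(control, "2023-01-08")
```

The value returned by `latest_export_date` is what would be sent to the frontend and shown on every page.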

Details

This file is the one that runs in the cron

https://github.com/thewca/statistics/blob/main/scripts/cron.sh

This line downloads the export dump

https://github.com/thewca/statistics/blob/main/scripts/generate_all_statistics.sh#L8

Around here, we should include a line to save the export date

https://github.com/thewca/statistics/blob/main/scripts/get_db_export.sh#L65

The export we use is the "not so well known developer database export" here

https://github.com/thewca/worldcubeassociation.org/wiki/Developer-database-export

As an export date, we can use the last date in the Results table
