xivstats's Introduction

XIVStats

What is XIVStats?

XIVStats is two things. One part is an application that trawls the Lodestone and gathers details on every player character in FFXIV; the other is this repository, which takes that Lodestone data and produces graphs and charts from it.

You can view a live demo of this web page here: ffxivcensus.com.

This project is inspired by the (now defunct) xivsoul.

I want to make my own copy of the Lodestone database, which repository should I use?

The data used by this PHP is gathered by the XIVStats Java Gatherer, linked below:

Configuration

Usage

If you have a relatively complete Lodestone database, you will find that page execution times are extremely high (>10 minutes). For this reason I recommend you compile the PHP to static HTML if you intend to use it somewhere. To do this, simply run:

php xiv_stats.php > xiv_stats.html

Notes

Complete scans of the Lodestone are run on a monthly basis and are available for download from Link

xivstats's People

Contributors

crakila, matthewhillier, pricetx, reidweb

xivstats's Issues

Complete rewrite of gatherer needed (again)

Background

To give some background: in November/December of 2015, Pricetx mentioned to me the 'hassle' of having to run multiple instances of his Ruby script to get a 'fast census'. I picked up on this, and since I was intending to learn threading in Java anyway, I took it as an opportunity to learn and implement something useful at the same time. Unfortunately, at the time I was not greatly experienced in testing techniques, database connectivity or handling errors in the best manner. It has also become evident to me that Java may not have been the best choice.

Flaws in current implementation

  • badly written, hard to maintain database code
  • no proper handling of database write errors
  • handling of status codes other than 200 or 404 is non-optimal - it just repeats the request until it gets a hit
  • the 'parser' is highly coupled to the 'interpreter' of the data
  • testing is flaky and awfully done

Targets for rewrite

  • scalable, maintainable database code
  • more maintainable code in general - in particular mapping mounts/minions --> achievements - perhaps separate this out into a JSON/XML file declaring these?
  • properly handle database errors
  • handle non-200, non-404 status codes in a better manner
  • decouple 'scraper' from 'interpreter'
  • better testing - incorporating mocking technologies to allow for removal of reliance on data that might change.
  • incorporate features requested in #8 beyond current feature set.
  • re-architect the database. The previous DB implementation was built around the idea of 'minimal time for pricetx to integrate', so the current database architecture is far out of line with DB design principles: I tried to mirror what the Ruby implementation did in SQLite.
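
The target of handling non-200, non-404 responses in a better manner could be sketched roughly as follows. This is an illustration in Python, not the gatherer's actual code: `classify`, `fetch_with_retry`, the attempt limit and the backoff delays are all assumptions made for the example.

```python
import time
from typing import Callable


def classify(status: int) -> str:
    """Map an HTTP status code to an action for the gatherer."""
    if status == 200:
        return "parse"
    if status == 404:
        return "missing"
    # Anything else (e.g. 429 or 5xx) is treated as transient here (assumption).
    return "retry"


def fetch_with_retry(fetch_status: Callable[[], int],
                     max_attempts: int = 5,
                     base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Retry a request a bounded number of times instead of looping forever."""
    for attempt in range(max_attempts):
        action = classify(fetch_status())
        if action != "retry":
            return action
        if attempt < max_attempts - 1:
            # Exponential backoff: 1s, 2s, 4s, ... (values are illustrative).
            sleep(base_delay * 2 ** attempt)
    return "failed"
```

Returning "failed" after a bounded number of attempts lets the caller record the ID for a later pass rather than repeating until it gets a hit.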

Technology choice for rewrite

At present I am considering the following:

  • JavaScript (Node.js) - this is the language I'm most comfortable in - and I've already PoCed this previously
  • C# (.NET Core) - I'm relatively comfortable and do have a decent level of experience in C#, and I'm interested in developing my skills in this area. Indications of performance of .NET Core and C# are very promising.
  • Java rewrite - complete rewrite of the existing implementation from the ground up using the above targets.
  • Python - no experience whatsoever in Python, but interested to evaluate.
  • Go - no experience whatsoever in Go, but interested to evaluate.

Technology requirements

  • allows for 'modular' code
  • performance must be better than current Java implementation
  • well maintained, singular HTTP library that supports GET and HEAD HTTP operations
  • MySQL database support
  • MongoDB database support - need to further evaluate MongoDB performance for our use case, and consider alternatives in NoSQL space
  • runs on Windows, MacOS, Linux, FreeBSD - the runtime must be included in the package repos on Debian & CentOS and likewise for FreeBSD. On MacOS inclusion in brew or an alternative such as nvm would be preferable.
  • well documented and maintained testing and mocking libraries to allow for improvement of testing
  • must run in AWS Lambda environment - all of the technologies outlined already meet this requirement.

Support requirements

  • Jenkins should support any technology stack chosen - this is unlikely to be an issue
  • JetBrains/Microsoft IDE for technology stack should be available
  • Free alternative IDE for technology stack must be available

Architecture

Ideally I/we would separate what is currently one component, the 'gatherer', into two separate modules:

  • the 'scraper' or 'gatherer' - that fetches the page and reads the data from it
  • the 'interpreter' that interprets the data from the former, and makes 'assumptions' such as translating 'having X mount means they've done Y thing'.

To put it simply, the 'gatherer' should not have an 'opinion' on what any data means - it should simply return the 'raw facts' - such as classes, gear, race, age, name, FC, GC etc. Its output would be via an API, which could be interacted with locally or remotely (REST). It would not write anything to any form of persistent storage.

The 'interpreter' would take this data and extrapolate it to make the assumptions such as those mentioned previously - this would be the piece that generates the database.
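
A minimal sketch of this split, combined with the earlier idea of declaring the mount/minion mappings in a JSON file, might look like the following. The mapping contents, the flag names and the `interpret` function are hypothetical placeholders, not the project's real data or API.

```python
import json

# Hypothetical mapping file content: item name -> what owning it implies.
MAPPING_JSON = """
{
  "mounts":  {"Example Mount": "example_story_complete"},
  "minions": {"Example Minion": "example_event_attended"}
}
"""


def interpret(raw: dict, mapping: dict) -> dict:
    """Derive 'assumption' flags from the raw facts returned by the gatherer."""
    flags = {}
    for kind in ("mounts", "minions"):
        owned = set(raw.get(kind, []))
        for item, flag in mapping.get(kind, {}).items():
            flags[flag] = item in owned
    return flags


mapping = json.loads(MAPPING_JSON)
# Raw facts only: the gatherer attaches no meaning to them.
raw_character = {"name": "Example Character",
                 "mounts": ["Example Mount"],
                 "minions": []}
derived = interpret(raw_character, mapping)
```

Keeping the mapping in data means that adding a new mount or minion rule would not require touching either the gatherer or the interpreter code.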

Timeframe

Between the start of June and mid-July or mid-August I will have time to pursue this.

Suggestion: Get player's currently active job, current HP, average item level, main weapon level and main weapon name (would help us determine player activity status).

It could be a good idea to grab the player's currently active job and their average item level (and, why not, the main weapon's item level too) from the player's Lodestone profile along with the rest of the currently scraped data. That would give us a "snapshot" of what jobs players are playing at the time of the census, and might let us infer their main job from this additional data (e.g. by comparing levels or previous censuses). It could also help us distinguish SCHs from SMNs.

Could be reflected in the DB with a column containing the class/job trigram of the currently active class in the lodestone, and another containing the average ilvl.

Dynamis data centre missing

Your census data is missing the 4 US servers that exist on the Dynamis Logical Datacenter:
  • Halicarnassus
  • Maduin
  • Marilith
  • Seraph

Download link broken

Due to the census starting in January and running into February, the SQL Dump download link for the current census is invalid.

Only a manual fix is needed; we don't anticipate a census running across several months again.

Optimise PHP Database Queries

Currently the page generation requires a significant amount of time to compile all of the data for the census.

There are optimisations to be made to the underlying database; however, the bulk of the gains will come from rewriting the queries that derive the statistics from the raw information.

This ticket is to track optimisation of the queries to reduce page build time.

Pull Request - #21

Add Eureka/Elemental Levels to Census

Copied from @fahy's original suggestion on the Gatherer:

So looking at LuckyBancho's stats that they recently released here: https://www.reddit.com/r/ffxiv/comments/apfx16/luckybancho_unofficial_census_10_february_2019/

It seems that they are tracking for Elemental Levels (it's under the Class/Job tab on a character's page)
As of Patch 4.55 (Released today 12th Feb 2019), there are 4 "Eureka" instances, with there being Level caps for each zone. Now it is possible to level past 20 in Anemos but I figured that the figures could be displayed in the next (March report) once players have got to max level in Hydatos.

Eureka Anemos - 1 - 20
Eureka Pagos - 21 - 35
Eureka Pyros - 36 - 50
Eureka Hydatos - 51 - 60
My idea would be that the result would be:

"Total Players who have played Eureka": (eg. Everyone that has an Elemental Level)
"Players who are levelling in Eureka: Hydatos" (eg. Anyone that is between 50 and 59)
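
The two proposed figures could be computed along these lines. This is only a sketch: it assumes each character's Elemental Level is available as an integer, with `None` standing for characters that have no Elemental Level, and `eureka_summary` is a made-up name.

```python
from typing import Iterable, Optional


def eureka_summary(elemental_levels: Iterable[Optional[int]]) -> dict:
    """Count players with any Elemental Level and those levelling in Hydatos."""
    played = 0
    levelling_hydatos = 0
    for level in elemental_levels:
        if level is None:
            continue  # no Elemental Level: has not played Eureka
        played += 1
        if 50 <= level <= 59:  # range taken from the suggestion above
            levelling_hydatos += 1
    return {"played_eureka": played, "levelling_hydatos": levelling_hydatos}
```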

Fix Qitari beast tribe reward

The Qitari beast tribe reward is showing as 0. Look into why and apply a fix.

Should just require a re-run of the PHP, so a census re-run is not required

PvP data on ffxivcensus shows 0 instances of WIN 200 'FEAST' OR 'CRYSTALLINE CONFLICT' MATCHES

I don't know the exact methodology you all use to parse out this data, but there's very little chance that not a single person has completed this achievement. Comparing against a similar site, ffxivcollect.com, it shows at least 15000 characters own the associated mount, the Gloria-class Airship:

image

When looking at ffxivcensus.com, the linked item on the header is to the Wind-Up Hildibrand, which is the quest reward for completing the ARR questline:

image

Mounts and Minions columns missing from ffxivcensus.tblplayers?

I might have missed something, but the mounts and minions columns seem to have completely vanished from the tblplayers table. Is this expected? I was using these fields to filter bots.
Note: I did not import anything into my local DBMS since July's census, so I can only say it was working in June and not anymore in October.

Report progress with php (into a file, or into the output HTML)

So as the current script can take quite long, and even with the new pull request... it would be great to see a progress report. Maybe percentage, or just simple steps.

(A simple $variable_sql that we increment after each query and a simple $variable_logic that we increment after prints and whatnot. And of course, we write out the current progress into a file, or into the HTML file somehow.)
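
As a rough illustration of the idea (written in Python rather than in the script's PHP), a counter that is bumped after each step and written to a file could look like this. The class name, the file name and the step labels are assumptions for the example.

```python
import os
import tempfile


class ProgressReporter:
    """Write 'done/total (percent) label' to a file after each completed step."""

    def __init__(self, path: str, total_steps: int):
        self.path = path
        self.total_steps = total_steps
        self.done = 0

    def step(self, label: str) -> None:
        self.done += 1
        percent = 100 * self.done // self.total_steps
        # Overwrite the file so it always holds only the latest progress line.
        with open(self.path, "w", encoding="utf-8") as handle:
            handle.write(f"{self.done}/{self.total_steps} ({percent}%) {label}\n")


progress_path = os.path.join(tempfile.gettempdir(), "xivstats_progress.txt")
reporter = ProgressReporter(progress_path, total_steps=4)
reporter.step("population query")
```

Writing to a separate file keeps the progress output out of the generated HTML, so it can be checked while the page is still being built.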

Errors when running XIVStats against the latest Java gatherer

The following errors occur when running XIVStats against a database generated by the latest Java Gatherer from the unattended_gathering branch:

PHP Notice: Undefined index: female in xiv_stats.php on line 1016
PHP Notice: Undefined index: male in xiv_stats.php on line 1025

The current results of a run can be seen here: http://dev.ffxivcensus.com

The run was a limited-size one, so ignore the overall numbers being low. Specifically, there appears to be an issue with the chart output.

Add merchandise

We are currently missing:

  • Encyclopedia Eorzea
  • The Art of Ishgard: Stone and Steel (HW Artbook 1)
  • The Art of Ishgard: The Scars of War (HW Artbook 2)
  • Potentially other items you're parsing older versions of.

Also:

  • Rename 'Artbook' to be 'Another Dawn' Artbook

No updates for Stormblood?

Hi there.

Just wondering why there hasn't been an update for Stormblood. It has been a couple of months since release.

Also, the page for the existing data seems to be missing (I'm just getting a white page when I go to the site https://ffxivcensus.com/ )

An update would be much appreciated! :)

📣 Statement on the disparity between the January 2022 census and censuses from Feb 2022 onwards

This issue is intended to be a short form, digestible summary of the issue identified, and the impact it will have on the numbers that make up our census output, if you wish to review technical details please check out XIVStats/XIVStats-Gatherer-Java#63.

Issue

In January 2022, a manual review of the census output on the part of @Pricetx, @Crakila and I identified a significant disparity in the numbers from what they were prior to September 2019.

Census            Num. Characters
August 2019       16,376,929
September 2019    5,044,224
January 2022      7,048,698

As you can see there is a significant drop between August 2019 and September 2019.

Cause

Following investigations as detailed in XIVStats/XIVStats-Gatherer-Java#63, we are confident we have identified the cause of the issue as below.

An issue occurred where characters that satisfied one of the 3 criteria outlined below were marked as DELETED:

Short explanation

For each character that we fetch, four pages are loaded: their main page, their class page, their minions page and their mounts page.

The introduction of these 3 separate pages for the base character info, mounts and minions is something that was introduced in August 2019 by the Lodestone and we implemented changes to support this thereafter via XIVStats/XIVStats-Gatherer-Java#55.

Due to the way in which we had coded our character parsing, if any of these requests failed to load with a 404 (page not found), the entire character was marked as 'DELETED'. This is not the intended behaviour.

Fix

Going forward, we have corrected our code to account for the case described above: a character with no mounts or minions should no longer be marked as deleted.

Automated tests have been added to verify that this scenario does not re-occur.
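
The corrected rule can be pictured with the sketch below. It is an illustration only and not the gatherer's Java code; the page names, the status labels and the choice to tolerate a 404 only on the mounts and minions pages are assumptions drawn from the description above.

```python
# Pages that may legitimately be absent (assumption based on the text above).
OPTIONAL_PAGES = {"mounts", "minions"}


def character_status(page_statuses: dict) -> str:
    """Decide a character's status from the HTTP status of each page fetched."""
    if page_statuses["main"] == 404:
        return "DELETED"  # the character itself no longer exists
    for page, status in page_statuses.items():
        if status == 200:
            continue
        if status == 404 and page in OPTIONAL_PAGES:
            continue  # no mounts or no minions, not a deleted character
        return "RETRY"  # anything else is treated as a transient failure
    return "FOUND"
```

The faulty behaviour corresponds to returning "DELETED" whenever any value in `page_statuses` is 404.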

Impact

I guess this is why most of you are probably reading this, so I'll cut to the chase.

For censuses starting September 2019 (2019-09) and ending in January 2022 (2022-01) inclusive, the below corrections should be applied:

All characters graphs and metrics

The values presented for the below values are incorrect:

  1. Global 'all characters' count
  2. Regional 'all characters' count
  3. 'All characters' class distributions
  4. 'All characters' race distribution
  5. 'All characters' realm distribution
  6. 'All characters' grand company distribution

Unfortunately we have no means to correct this at this time. We can, however, consider corrective action once we know which of the IDs marked as deleted are not in fact deleted, if there is demand for it.

Active character graphs and metrics, and other stats

All other values are coupled with the presence of a minion or a mount, and so due to the nature of this issue we still believe these values to be for the most part correct.

The active characters metric and all related graphs are dependent on the presence of a minion and mount, and as such can be confirmed to be correct, as no player who satisfies the criteria outlined above in the 'Cause' would meet the criteria for being active.

Some notes

  1. Per the above, there will not be any characters that would get flagged as 'active' that failed to parse due to this issue.
  2. While the number of characters that failed to parse due to this issue is significant, these are largely inactive characters. Think about how many characters you know that have no minions and no mounts? These are largely likely to be characters below level 20.

Estimate on corrected census

We estimate that the corrected census should be available in late January or early February 2022. We're running against a completely fresh database so results will be a bit slower than normal.

Update 2022-02-16: The updated census is now live at FFXIVCensus.com 🎉

An apology

@Pricetx , @Crakila , @matthewhillier and I are sorry we let this issue slip through for so long. Many of us (Jonathan and I especially) have been largely inactive in game for the time since this issue came about, and have only just properly returned to the game.

We should have been actively reviewing the censuses as they automatically came out, and comparing them against past results but we weren't. This is something we're planning to do going forward.

I personally have a lot of plans to inject new life into this project, and make it an even better experience for all of you. So looking forward to doing so over the coming months and years.

A thank you

@Pricetx , @Crakila , @matthewhillier and I are immensely proud of what we've managed to build here over the last 7+ years. Every time we check the subreddit (shoutout to the awesome mods over there ❤️), various discords, the community forums or other sites we're delighted to see many a reference to our work and genuine discussions from this amazing community that is the FFXIV player base.

Thanks to the many individuals who've taken it upon themselves to provide feedback on our projects, and especially to those who've provided changes for us.

I'd like to extend a special thank you to:

  • @matthewhillier for the work you did to rebuild the census gatherer into the incredibly efficient state it is in today, especially when I didn't have the time to do so myself due to other commitments.
  • @Crakila for the work put in to maintain the live site, and help @Pricetx and I with corrective action on the back of this issue and others.
  • @Pricetx for continuing to host the census, and responding so diligently to my access requests/issues while I debugged this issue.

Thank you all, you're all awesome❤️

This issue will be locked, please direct all discussion to the page linked below

Add new realms to world graphs

Louisoix and Omega have been added as worlds, need to be added to population graphs.

CC: @fahy - can you confirm that these are the only changes?

Missing realms from population graphs

As highlighted by a user on Reddit, there was a realm missing from the output.

I've checked through them all

American Realms (Aether & Primal Datacentres)

  • None missing

Japanese Realms (Mana, Gaia and Elemental Datacentres)

  • Atomos - typo in script - it is accidentally spelt 'Atomis'

EU Realms

  • None missing

Seems our graphing UI framework removes elements with 0 value. The SQL query would have returned 0 with this typo in place.

Feature Suggestion: milestone Minions/Mounts?

Hello, all and thank you for your hard work with this. And I apologize if this is the wrong place to do this but I found nowhere else more appropriate.

Just as you have the Dress-up Raubahn as the definition of active players, my suggestion would be to include a section showing the evolution of claimed mounts and minions along the story. Only milestones like the airship minion from the early ARR quest, the "Maggie" magitek mount for the end of ARR, whatever it is from HW [it must have something, not there yet] and Dress-up Raubahn for SB.
Something like the BEAST TRIBES (REDEEMED MINION) but for story related only minions and mounts

thank you again, and sorry if this is an inappropriate place
rd

Add better means of determining activity

A better way of determining active players is needed, as at present by only including players with the latest story minion/mount we are excluding the following:

  • Pure crafters
  • Pure gatherers
  • Players who do not play story
  • Players who do play story but have not reached 60 yet and are still 'actively playing the game'.

Potential solutions:

  • Manually compare each row
  • use MySQL MD5 function, and compare hashes
  • Compare only certain values, e.g. levels, minions and mounts

Only 'additions' should ever take place, players won't lose their minions, mounts or levels.

The window for determining activity would need to be over a 90 day period.

This metric would still not have a 100% accuracy rate unfortunately.
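
The hash-based comparison, restricted to certain values, could be sketched as below. The field names and the in-memory dictionaries keyed by character ID are assumptions for illustration; the hashing is done here with Python's `hashlib.md5` rather than MySQL's MD5 function.

```python
import hashlib

# Only compare values that can change through play (assumed field names).
COMPARED_FIELDS = ("levels", "minions", "mounts")


def fingerprint(row: dict) -> str:
    """Hash the compared fields of one character row."""
    parts = [repr(row.get(field)) for field in COMPARED_FIELDS]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def changed_ids(previous: dict, current: dict) -> set:
    """IDs present in both censuses whose compared fields differ."""
    return {
        character_id
        for character_id, row in current.items()
        if character_id in previous
        and fingerprint(previous[character_id]) != fingerprint(row)
    }
```

Under this sketch a character would count as active if its ID appears in `changed_ids` when the previous census is about 90 days older than the current one.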

Evaluate issues with census mis-reporting data in 2017-03-16 test-run

Initially discussed in #1

For the test run of v1.3.0 of the gatherer program that was completed 2017-03-15, with the XIVStats script output running through to 2017-03-16, the statistics are greatly diminished compared with the figures from the beginning of February.

Taking some samples from the page:

Metric                  February        Test Run
World Players           10.1 million    8.1 million
Eternal Bond Guest      319k            259k
Eternal Bond Married    121k            99k
ARR Soundtrack          76k             57k

From this small sample of metrics and comparing the two sites side by side, you can see that the results are obviously affected by some sort of issue.

Suggested Incident Response

  • @Pricetx - revert page for 2017-03 to the data from 2017-03-04, and restore the index page symlink.

  • @Pricetx - Investigate cause of issue with data being misreported, ascertain as to whether it was a network or system issue

  • @Pricetx / @ReidWeb - load a dump of the database from 2017-03-04 in once restored, and do a compare of the 'missing' data between the 2017-03-04 and 2017-03-16. Find missing IDs and manually check out and parse those pages - to ensure they still exist - it might be that they actually got deleted from the lodestone/characters deleted - but that's unlikely in such a large volume?

  • @Pricetx / @ReidWeb - verify that the highest player ID for 2017-03-16 was higher than that for 2017-03-04.
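
The last two checks amount to a set difference and a comparison of maximum IDs. A small sketch, assuming the character IDs from each dump have already been loaded into Python collections (the function names are made up):

```python
def missing_ids(earlier_ids, later_ids) -> list:
    """IDs present in the earlier dump but absent from the later one."""
    return sorted(set(earlier_ids) - set(later_ids))


def max_id_advanced(earlier_ids, later_ids) -> bool:
    """True if the later run reached a higher player ID than the earlier one."""
    return max(later_ids) > max(earlier_ids)
```

The IDs returned by `missing_ids` are the ones to check manually against the Lodestone.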

Possible Causes

Listed in my evaluation of most to least likely.

  1. Lodestone was rate limiting us - due to increased connections from 'image date parsing' - we were hitting this when we didn't hit it before
  2. Server firewall/network experienced issues that caused requests to not reach the lodestone.

Changes to Veteran Rewards in 4.1

(I know I have already mentioned this to @Pricetx and @ReidWeb but I felt I should create an issue here)

In 4.1 (Releasing October 10th), Square-Enix is going to be changing the Veteran Reward system.
Details here: http://na.finalfantasyxiv.com/lodestone/topics/detail/ca42d41cbd6e7849ba92978235f0d5dc926e72db

All (but a couple of items) are being removed from the Veteran Rewards and moved to Jonathas in Old Gridania for pickup using "achievement certificates"

The changes are as follows:

Advent Attire (Cloud Strife)
WAS: Rank 10 - 720 days
NOW: Rank 1 - 60 days

Tantalus Attire (Zidane)
WAS: Rank 13 - 1080 days
NOW: Rank 2 - 150 days

Wild Rose Attire (Kain ?)
WAS: Rank 11 - 840 Days
NOW: Rank 3 - 240 days.

Leonhart Attire (Squall Leonhart)
WAS: Rank 14 - 1440 days
NOW: Rank 4 - 330 days.

So it looks like some adjustments may be needed going forward.

Include player clan data

For most races this isn't hugely important, but especially in the case of Highlanders they're practically a race unto themselves, and it would still be interesting to see for the rest.
I think a good way to integrate this would be for each bar in the Race & Gender charts to be split into two stacked bars for the separate clans so the total is still visually obvious.

Give job distribution based on achievements

Hey !

Just discovered your site with some friends we think it's pretty cool :D

We were just wondering why there is only data about class distribution and no data at all about job distribution? Our guess is that you parse the data from the Lodestone and count all the players having a class at 60.

I have no idea how you guys are parsing and if there are limitations of some sort, but have you thought of counting job distribution based on the level 50 and 60 job quests achievements?

Keep up the good work! Cheers \o

Missing charts

The charts rendered after "race and gender distribution" are missing and the chart itself is empty.

Looks like this is caused by invalid data being passed to highcharts. The relevant code looks like this right now:

              series: [{
                  name: 'Female',
                  data: [
                      3672,3358,10229,8277,22122,,990,                  ],
              }, {
                  name: 'Male',
                  data: [
                      1736,6035,14764,11359,6404,,4363,                  ],
              }]

There's a double comma in there (I assume that's due to the value there being 0) which produces an exception deep in highcharts. Apparently Chromium interprets this as a null value and highcharts ends up trying to access a property on that which causes an exception to be thrown since you can't access properties on a null value.
The exception then halts JavaScript execution which is why the other graphs are affected as well.

From a quick glance, it looks like the code should be handling this problem correctly. The only suggestion I can make would be replacing the isset($value) check in getValue with !empty($value).
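
The underlying idea, emitting an explicit 0 wherever a value is missing or empty so the array never contains a gap, can be shown with this Python sketch. It is not the project's PHP `getValue`; the function name and the category names in the usage note are illustrative.

```python
import json


def series_data(counts: dict, categories: list) -> str:
    """Render one chart series' data array, using 0 for missing values."""
    values = [int(counts.get(category) or 0) for category in categories]
    # json.dumps always emits a well-formed array, so no double commas.
    return json.dumps(values)
```

For example, `series_data({"Hyur": 3672, "Roegadyn": None}, ["Hyur", "Lalafell", "Roegadyn"])` gives `[3672, 0, 0]`, with the absent and empty entries rendered as 0.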

Filter large amount of players by Grand Company.

Let me begin first by saying that it is always hard to find the active player count when you try to deduce it from other information. In my opinion, the current line for deciding if a player is active/not active is too high. There are tons and tons of players who are really active working through the MSQ and keep playing after they choose their Grand Company.

You can see this if you look at "GRAND COMPANY DISTRIBUTION - ALL PLAYERS". By far, most of the accounts do not have a Grand Company and are either bots or people who quit the game after a few days. Every player who plays even somewhat actively for more than a week would have joined a Grand Company.

I propose to remove all the "No Grand Company" players/data from the player statistics to get a better overview of the playerbase by disregarding bots and people that have barely tried the game out.

This would cause all the ALL PLAYERS/ACTIVE PLAYERS graphs to become more meaningful and more correct.

Pairs of months contain identical data since 2022-07

I discovered this while downloading the dumps for offline analysis, but the issue seems to apply to the website pages too. The following pairs of adjacent months seem to contain identical data:

  • 2022-07 and 2022-08
  • 2022-09 and 2022-10
  • 2022-12 and 2023-01
  • 2023-02 and 2023-03
  • 2023-04 and 2023-05

It's understandable if there are limitations and the project is only able to produce fresh data every 2 months.
But in that case I'd suggest that instead of duplicating data for the intervening months, they should be skipped instead.
Missing months e.g. 2022-11 are not unprecedented and they are easier to detect and ignore.
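
A check for this could be automated along the following lines. The sketch assumes the dumps are byte-identical when duplicated, which may not hold if they embed timestamps, and that each dump's contents are available as bytes keyed by a "YYYY-MM" string; `duplicate_adjacent_months` is a made-up name.

```python
import hashlib


def duplicate_adjacent_months(dumps: dict) -> list:
    """Return (earlier, later) pairs of consecutive dumps with identical bytes."""
    months = sorted(dumps)  # "YYYY-MM" strings sort chronologically
    digests = {month: hashlib.sha256(dumps[month]).hexdigest() for month in months}
    return [
        (earlier, later)
        for earlier, later in zip(months, months[1:])
        if digests[earlier] == digests[later]
    ]
```

Note that "adjacent" here means consecutive available dumps, so a pair can straddle a missing month.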

Implement means to evaluate player language

Referencing conversation here

@ReidWeb Yeah, noticed that when downloading the ZIP file — it was 50 MB lighter. Thanks for the heads up.
That shouldn't prevent me from working on adjusting the slicers and the fields, but I'll wait until we're sure of the data before publishing it to PowerBI.
Regarding my language query proof of concept, I haven't really had the time to work on it.
That said, it looks like Lucky Bancho is getting his data a different way, and I'm not really sure how he's doing it exactly. He does have the language data, though.
According to his site http://luckybancho.ldblog.jp/wsurvey.htm
He's getting his data from the lodestone character search page (so he could be doing searches over and over). He's only grabbing characters that are more than level 36, and looks whether there was a change in HP or minions/mounts since his last census, and he's excluding bots too by counting people with no mounts as bots.
Still, I'm left wondering at how he manages to make filters that avoid the 1000 returned result limitation...

We could implement a means to get the language of a character.

Suggested solution involves scraping search pages:

  • these operate on query strings - so it is possible

  • from an extensive look, there is no way to evaluate this from the player profile page

  • if we are aiming to be conservative with CPU resources, we will likely need to implement some sort of 'fewest queries' algorithm to determine which queries to run - potentially free company based? or name based? or grand company per realm?

CC: @Bwin4L

Change active check to look for Yol from 4.0 story instead of minion from 3.3

Patch 3.3 is now over a year old, so it doesn't make much sense to keep using it for the active check.

Until the day when a better system of determining active players is around, this should be updated to look for the "Yol" mount, which is obtained during the 4.0 story. This is not captured as an individual data point at the moment, but is part of the list of mounts that gets captured.

Fix copy for eternal bonds

The eternal bond stats relate to items only issued for certain tiers of eternal bond:

Quoting a community member:
"guest at an eternal bond is only given to the gold and platinum eternal bonds, need to update the copy on that later".

Can maybe just put 'a gold/platinum' eternal bond in there? Thoughts?
