
Comments (11)

jldbc commented on July 3, 2024

Good question. I didn't realize that wasn't included there. It's also missing in batting_stats_bref(season). The tables these scrape from don't include bWAR, but I'm open to moving these functions over to a better table.

They're currently using Baseball Reference's Daily Gamelog Finder, where batting_stats_bref(season) supplies a season-length date range to batting_stats_range(start_dt,end_dt). Know of a better table that includes WAR + all the other standard stats for each player?

from pybaseball.

trojanguard25 commented on July 3, 2024

I'm not aware of any way to query bWAR over a date range. Baseball-reference hosts files that include player bWAR (among other stats that go into their WAR calculations) for batters and pitchers. Every player has an entry broken up by year-team-stint.

http://www.baseball-reference.com/data/war_daily_bat.txt
http://www.baseball-reference.com/data/war_daily_pitch.txt

These files are updated daily during the season, as well as during the offseason whenever they make stat adjustments.

I don't think there are analogous files for traditional counting stats, so you will probably need separate interfaces.
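For illustration, here is a rough sketch of how one of these files could be parsed once its text has been downloaded. It assumes the files are plain comma-separated text with a header row, and the sample rows and the helper name `read_war_rows` are made up for the example, not taken from Baseball-Reference or pybaseball.

```python
import csv
import io


def read_war_rows(text):
    """Parse comma-separated WAR file text into a list of dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text)))


# Made-up sample in the assumed layout; not real Baseball-Reference data.
SAMPLE = (
    "name_common,player_ID,year_ID,team_ID,stint_ID,WAR\n"
    "Player A,playea01,2016,AAA,1,1.5\n"
    "Player A,playea01,2016,BBB,2,0.7\n"
    "Player B,playeb01,2016,AAA,1,2.0\n"
)

rows = read_war_rows(SAMPLE)

# Each row is one year-team-stint entry, so a player traded mid-season
# shows up more than once for the same year.
stints = [(r["player_ID"], r["year_ID"], r["team_ID"], r["stint_ID"]) for r in rows]
```

Note that `csv.DictReader` returns every value as a string, so numeric columns such as WAR would still need converting before any arithmetic.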


jfreynolds commented on July 3, 2024

My initial thought was to get a player's Baseball Reference ID and just scrape the table from their actual player page. You could sum up something like WAR for a given range, but that feels like more of a band-aid fix. It wouldn't work for aggregating other values over a range.

On top of that, it would be pretty slow all in all.


jldbc commented on July 3, 2024

I can't see fetching WAR one player at a time scaling well beyond a small number of players.

The data @trojanguard25 mentioned look promising. If there's no single source with WAR and traditional stats side by side, a separate scrape for pulling this data might be the best route forward. From there a user can join the tables together on player id if necessary.
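A minimal sketch of what that join could look like, assuming both tables carry the same identifier columns (here `player_ID` and `year_ID`; the traditional-stats table may name them differently). The helper `join_on_keys` and the sample values are hypothetical.

```python
def join_on_keys(left, right, keys):
    """Inner-join two lists of dicts on the given key columns (hypothetical helper)."""
    # Index the right-hand table by its key tuple for fast lookup.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            merged = dict(match)
            merged.update(row)  # on a column-name clash, the left table's value wins
            joined.append(merged)
    return joined


KEYS = ("player_ID", "year_ID")

# Made-up rows standing in for a traditional-stats table and a WAR table.
batting = [
    {"player_ID": "playea01", "year_ID": 2016, "HR": 20},
    {"player_ID": "playeb01", "year_ID": 2016, "HR": 5},
]
war = [{"player_ID": "playea01", "year_ID": 2016, "WAR": 2.2}]

combined = join_on_keys(batting, war, KEYS)
```

Because this is an inner join, players missing from either table are dropped. With rows split by team and stint, `team_ID` and `stint_ID` would probably need to be part of the key as well.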

Thoughts/objections?


jfreynolds commented on July 3, 2024

The daily batting/pitching files seem to be the best option available.

Should all the data from those files be provided in a table to a user by default? It seems like there is a lot in there that isn't regularly sought after. Maybe by default the user gets the more common statistics (WAR, salary, WAA, ERA+, etc.) from that file, and all the available data only if a boolean is set to true?


jldbc commented on July 3, 2024

Most of these could be left out by default since the main point of this is to get WAR. Returning all 49 columns by default might be overkill.

Bare minimum would be WAR, its essential components (WAA and WAR_rep for batters; WAA, WAR_rep, and WAA_adj for pitchers), and everything needed to identify the player and connect them with another table. I think this would mean WAR, WAR_off, WAR_def, WAR_rep, WAA, mlb_ID, player_ID, team_ID, year_ID, stint_ID for both, plus WAA_adj for pitching, unless I'm missing anything.

On top of these it might get a bit arbitrary to decide what to leave in by default. Is anything else important to keep in or should the rest be optional with something along the lines of a boolean return_all parameter? Maybe G for both and GS for pitchers since these are common things people might filter on?


jfreynolds commented on July 3, 2024

Definitely agree that WAR values should be the default. The only other things that jump out to me as likely to be frequently requested are ERA+, salary, or even BIP. Other than that, nothing really strikes me.

So, it seems like the best idea would be to provide WAR and its components by default, maybe allow an argument for some more commonly used columns such as ERA+, salary, RA, xRA, RAA, BIP, etc., and finally a return_all parameter like you said to return all of the columns.

I just think occasionally people may want a select few values outside of WAR, and forcing them to read all of the columns seems like unnecessary overhead. Should we just keep it simple though? WAR and its components, or if some boolean argument is true, then return all columns?


jldbc commented on July 3, 2024

Yeah we can keep some of the more commonly used ones in. For non-WAR, non-identification columns of interest I'm seeing:

Batting: salary, G, PA, runs_above_avg, runs_above_avg_off, runs_above_avg_def
Pitching: G, GS, RA, xRA, BIP, BIP_perc, salary, ERA_plus

Which all in all would have these as the defaults:

Batting: ['name_common', 'mlb_ID', 'player_ID', 'year_ID', 'team_ID', 'stint_ID', 'lg_ID', 'pitcher', 'G', 'PA', 'salary', 'runs_above_avg', 'runs_above_avg_off', 'runs_above_avg_def', 'WAR_rep', 'WAA', 'WAR']
Pitching: ['name_common', 'mlb_ID', 'player_ID', 'year_ID', 'team_ID', 'stint_ID', 'lg_ID', 'G', 'GS', 'RA', 'xRA', 'BIP', 'BIP_perc', 'salary', 'ERA_plus', 'WAR_rep', 'WAA', 'WAA_adj', 'WAR']

With everything else being retrievable with a return_all type of parameter. Anything important I missed? This leaves ~ 20 columns each which seems reasonable.

The function itself would basically be the top response to this Stack Overflow post with the above column filtering.
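As a rough sketch of the column filtering (not the code that was eventually merged), the default-versus-return_all behavior for batting could look something like this. The helper name `select_columns` and the extra column `extra_stat` are hypothetical; the default list is the batting list above.

```python
BATTING_DEFAULTS = [
    "name_common", "mlb_ID", "player_ID", "year_ID", "team_ID", "stint_ID",
    "lg_ID", "pitcher", "G", "PA", "salary", "runs_above_avg",
    "runs_above_avg_off", "runs_above_avg_def", "WAR_rep", "WAA", "WAR",
]


def select_columns(rows, defaults, return_all=False):
    """Keep only the default columns unless return_all is True (hypothetical helper)."""
    if return_all:
        return rows
    return [{col: row[col] for col in defaults} for row in rows]


# One placeholder row containing every default column plus a made-up extra one.
header = BATTING_DEFAULTS + ["extra_stat"]
rows = [dict.fromkeys(header, "0")]

trimmed = select_columns(rows, BATTING_DEFAULTS)
full = select_columns(rows, BATTING_DEFAULTS, return_all=True)
```

The pitching version would be the same call with the pitching list passed as `defaults`.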


trojanguard25 commented on July 3, 2024

I think there should be some default 'groupby' that is done to combine player rows for the same year. I committed a potential option in my fork: https://github.com/trojanguard25/pybaseball/commits/cache
This function returns all the columns for a single season. By default, it groups the rows so each player has a single entry for the year submitted. I also added an option to split each player by team. I think those are the two most common use-cases. This does cause a problem since some of the columns (like ERA+) cannot be summed or averaged; rather, they need to be weighted by playing time. Not exactly sure the best way to handle that correctly.


jldbc commented on July 3, 2024

Let's leave the groupby in the hands of the user for now since doing it for them without using proper weights might cause people to unknowingly use bad data (i.e. using a summed/averaged ERA+ without realizing it's not weighted).

I pushed the version I've been using to a new branch in 7b10b82. I'll merge later today if there aren't any objections.

It's probably worth opening a new issue for working on properly-weighted aggregations since it definitely would be useful to have.
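To make the weighting concern concrete, here is one possible sketch of a per-player, per-year aggregation that sums counting columns but takes a playing-time-weighted mean of rate columns. It assumes the weight column is something like `IPouts` (an assumption about the file's column names), and the stint values are made up. Since ERA+ is itself a ratio, even a weighted mean is only an approximation, not an exact recomputation.

```python
from collections import defaultdict


def aggregate_player_year(rows, sum_cols, weighted_cols, weight_col):
    """Combine stint rows per (player_ID, year_ID) (hypothetical helper).

    sum_cols are added up; weighted_cols get a mean weighted by weight_col.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["player_ID"], row["year_ID"])].append(row)

    out = []
    for (player, year), stints in groups.items():
        total_weight = sum(s[weight_col] for s in stints)
        agg = {"player_ID": player, "year_ID": year, weight_col: total_weight}
        for col in sum_cols:
            agg[col] = sum(s[col] for s in stints)
        for col in weighted_cols:
            # Leave the value undefined when there is no playing time to weight by.
            agg[col] = (
                sum(s[col] * s[weight_col] for s in stints) / total_weight
                if total_weight
                else None
            )
        out.append(agg)
    return out


# Made-up stints for one pitcher who changed teams mid-season.
stint_rows = [
    {"player_ID": "playea01", "year_ID": 2016, "IPouts": 300, "WAR": 2.0, "ERA_plus": 120},
    {"player_ID": "playea01", "year_ID": 2016, "IPouts": 100, "WAR": 0.5, "ERA_plus": 80},
]

season = aggregate_player_year(stint_rows, ["WAR"], ["ERA_plus"], "IPouts")
```

In this made-up case a plain average of ERA+ would be 100, while weighting by outs recorded gives 110, which is the kind of silent discrepancy an unweighted groupby would introduce.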


jldbc commented on July 3, 2024

Merged branch bwar to master. Commit 7b10b82 adds the functions bwar_bat() and bwar_pitch(), each with the optional argument return_all to retrieve all fields.

