panzarino / mlbgame
A Python API to retrieve and read MLB GameDay data
Home Page: http://panz.io/mlbgame/
License: MIT License
Hi zachpanz88,
This is a very nice API. However, I notice that in events.game_events you have an error which causes the bottom of the inning to save the values for the top of the inning.
# loop through the bottom half
bot = x.findall('bottom')[0]
for y in bot.findall('atbat'):
# top info
botinfo = []
# loop through the top half
bot = x.findall('top')[0]
for y in bot.findall('atbat'):
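To illustrate the report above, here is a minimal sketch of reading both halves of an inning, where each half is looked up by its own tag. The sample markup and the helper name `split_halves` are made up for illustration; this is not mlbgame's actual code, and the real feed carries many more attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified inning markup (an assumption, not real feed data).
SAMPLE_INNING = """
<inning num="1">
  <top>
    <atbat num="1" des="Groundout"/>
    <atbat num="2" des="Single"/>
  </top>
  <bottom>
    <atbat num="3" des="Strikeout"/>
  </bottom>
</inning>
"""


def split_halves(inning):
    """Return the at-bats of each half inning, keyed by the half's own tag."""
    halves = {}
    for tag in ("top", "bottom"):
        half = inning.find(tag)  # look up each half by its own tag
        # A half that has not been played yet may be missing entirely.
        if half is None:
            halves[tag] = []
        else:
            halves[tag] = [dict(ab.attrib) for ab in half.findall("atbat")]
    return halves


halves = split_halves(ET.fromstring(SAMPLE_INNING.strip()))
print(halves["top"])
print(halves["bottom"])
```

Driving both halves from one loop over the tag names avoids the copy-and-paste slip where the bottom half ends up querying 'top'.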
What's causing this? It just started happening.
code:
import mlbgame
stats = mlbgame.player_stats('2014_06_07_miamlb_chnmlb_1')
for player in stats.home_batting:
    print(player.h)
hbp_home = player_stats.home_batting.hbp
AttributeError: 'list' object has no attribute 'hbp'
^ Getting this error on the player stats object
for game in games:
    stats = mlbgame.team_stats(game.game_id)
    player_stats = mlbgame.player_stats(game.game_id)
    hbp_home = player_stats.home_batting.hbp
Also, is there a way to grab or calculate advanced sabermetrics like wRC, FIP, wOBA, etc. for teams (not players) on a game-level basis? I am working on some ML models and that would be really helpful. I would do any API work myself; I just want to know if it can be done.
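On the AttributeError above: the message says `home_batting` is a list, so a per-player value has to be read from each element rather than from the list itself. A minimal sketch follows, using stand-in objects instead of real mlbgame data. Whether mlbgame's batter objects actually carry an `hbp` attribute is an assumption, which is why the lookup falls back to a default.

```python
from types import SimpleNamespace

# Stand-in player objects (made up); real ones would come from player_stats().
home_batting = [
    SimpleNamespace(name="Batter A", hbp=1),
    SimpleNamespace(name="Batter B", hbp=0),
    SimpleNamespace(name="Batter C"),  # no hbp attribute at all
]


def team_total(players, attr):
    """Sum one numeric attribute over a list of player objects.

    Players that lack the attribute count as 0 instead of raising.
    """
    return sum(int(getattr(player, attr, 0)) for player in players)


hbp_home = team_total(home_batting, "hbp")
print(hbp_home)
```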
I hope this is the proper place to ask a question; if not, please forgive me. I have been playing around with mlbgame and am looking for a way to generate the current standings by league and division. Is this at all possible?
Example:
American League East
1 Yankees 21-10
2 Orioles 21-11
3 Red Sox 17-16
4 Rays 17-19
5 Blue Jays 13-21
......
I'm currently working on an MLB scoreboard on an LED board in preparation for the upcoming season and am planning on using this very interesting API.
The above image is similar to what I'm going for here.
It'd be wonderful to get some data for occupied bases, balls/strikes count, and current number of outs. I can file separate issues for the balls/strikes and outs if needed.
Would you accept a pull request for this feature? If so, any ideas where to start? I'm not very familiar with the XML files this API is pulling from.
Splitting out from #54, it would be wonderful to get the balls/strikes/outs for an inning's current state.
@trevor-viljoen found a possible solution that we'll have to live test once spring training starts up.
<status status="Final" ind="F" reason="" inning="9" top_inning="N" b="0" s="0" o="3" inning_state="" note="" is_perfect_game="N" is_no_hitter="N"/>
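A minimal sketch of reading the count from a status element like the one above, using only the attributes visible in that sample (`b`, `s`, `o`, `inning`, `top_inning`). The helper name `read_count` is made up, and treating `top_inning="Y"` as "top of the inning" is an assumption that has not been checked against a live game.

```python
import xml.etree.ElementTree as ET

STATUS_XML = (
    '<status status="Final" ind="F" reason="" inning="9" top_inning="N" '
    'b="0" s="0" o="3" inning_state="" note="" '
    'is_perfect_game="N" is_no_hitter="N"/>'
)


def read_count(status):
    """Extract balls, strikes, outs and inning from a status element."""

    def as_int(name):
        # Missing or empty attributes are treated as 0.
        return int(status.get(name) or 0)

    return {
        "balls": as_int("b"),
        "strikes": as_int("s"),
        "outs": as_int("o"),
        "inning": as_int("inning"),
        "top": status.get("top_inning") == "Y",  # assumed meaning of the flag
    }


count = read_count(ET.fromstring(STATUS_XML))
print(count)
```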
Sample linescore:
<linescore away_team_runs="0" home_team_runs="0" away_team_hits="0" home_team_hits="0" away_team_errors="0" home_team_errors="0" note="(rain) with 0 out in the top of the 1st and a 0-0 count on George Springer.">
<inning_line_score away="" inning="1"/>
</linescore>
Traceback:
Traceback (most recent call last):
File "<stdin>", line 5, in <module>
File "/usr/local/lib/python2.7/site-packages/mlbgame/__init__.py", line 190, in box_score
data = mlbgame.game.box_score(game_id)
File "/usr/local/lib/python2.7/site-packages/mlbgame/game.py", line 240, in box_score
home = x.attrib['home']
File "src/lxml/etree.pyx", line 2469, in lxml.etree._Attrib.__getitem__ (src/lxml/etree.c:72092)
KeyError: 'home'
I'm not sure how we want to handle this one. Perhaps add the note attribute and wrap some try/except blocks around the home and note variable evaluations?
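One way to sketch that idea is shown below, using `Element.get()` with a default in place of try/except blocks. It runs against the sample linescore from this report; the function name and the shape of the returned dict are made up for illustration and are not mlbgame's API.

```python
import xml.etree.ElementTree as ET

LINESCORE_XML = '''<linescore away_team_runs="0" home_team_runs="0" away_team_hits="0" home_team_hits="0" away_team_errors="0" home_team_errors="0" note="(rain) with 0 out in the top of the 1st and a 0-0 count on George Springer.">
<inning_line_score away="" inning="1"/>
</linescore>'''


def read_linescore(linescore):
    """Collect per-inning scores, tolerating a missing 'home' attribute."""
    innings = []
    for line in linescore.findall("inning_line_score"):
        innings.append({
            "inning": int(line.attrib["inning"]),
            "away": line.get("away", ""),
            "home": line.get("home", ""),  # absent when the half was not played
        })
    return {"note": linescore.get("note", ""), "innings": innings}


result = read_linescore(ET.fromstring(LINESCORE_XML))
print(result["innings"])
print(result["note"])
```

`get()` is available on both lxml and ElementTree elements, so the same guard should carry over to the lxml-based code in the traceback.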
Occasionally MLB teams play non-MLB teams in exhibition. Two of these games took place on April 4, 2016. combine_games handles this okay, simply ignoring the non-MLB match-ups. See the April 4 scoreboard:
Nationals (4) at Braves (3)
Giants (12) at Brewers (3)
Mariners (2) at Rangers (3)
White Sox (4) at Athletics (3)
Rockies (10) at D-backs (5)
Cubs (9) at Angels (0)
Blue Jays (5) at Rays (3)
Red Sox (0) at Indians (0)
Phillies (2) at Reds (6)
Twins (2) at Orioles (3)
Dodgers (15) at Padres (0)
Astros (0) at Yankees (0)
However, player_stats does not handle these well. Below is the error message I receive when I try to run mlbgame.player_stats(game.game_id) on one of those games, for instance the Miami Marlins vs. the Diablos Rojos:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/mlbgame/data.py", line 47, in get_box_score
data = urlopen("http://gd2.mlb.com/components/game/mlb/year_%s/month_%s/day_%s/gid_%s/boxscore.xml" % (year, month, day, game_id))
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 471, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 581, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 509, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 443, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 589, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/mlbgame/__init__.py", line 238, in player_stats
data = mlbgame.stats.player_stats(game_id)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/mlbgame/stats.py", line 11, in player_stats
data = mlbgame.data.get_box_score(game_id)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/mlbgame/data.py", line 49, in get_box_score
raise ValueError("Could not find a game with that id.")
ValueError: Could not find a game with that id.
/mlbgame/__init__.py l. 173 # do not even try to get data if date is in the future
Why? It works for me if I change the code. But you probably have a reason.
Hi. I just came across this fantastic piece of work today. Thanks so much. I'm fairly new to Python, but I can't figure out what I'm doing wrong. This piece of code:
stats = mlbgame.player_stats(game.game_id)
for player in stats.home_batting:
    ...  # stuff here on player
works fine.
But this code
stats = mlbgame.player_stats(game.game_id)
for player in stats.away_batting:
    ...  # stuff here on player
never executes because away_batting is always empty. I presume I'm doing something wrong, but if I'm not....
Any chance that future updates will parse the venue value from linescore.xml?
Of course, 99% of the time, the venue is just the home team's park, but having venue data would help sort out games like last Sunday's Marlins-Braves contest at Fort Bragg, and the occasional overseas/Puerto Rico contests.
In my case, I use (your excellent) mlbgame in a workflow that includes park adjustments for player stats and home field adjustments for win expectancies. Being able to capture the venue would help ensure that I'm not correcting for park or home field factors that don't apply.
I am wondering how to retrieve the current roster for each team? This information is not related to any stat or game, and I also can't find such information from mlbgame.teams().
I believe I can estimate the current roster by looking at the player_stats() of the most recent played game. But again, that will be an estimate.
Thank you for creating this wonderful API.
I am trying to create a pandas dataframe of injuries from 2000-2017, with corresponding player profiles. I have a list of injury objects for each game, for each year, but I hit a snag when I was looking for 2005 game data. One of the URLs raised an exception and crashed the entire 2005 grab.
The challenge I'm having is trying to understand the injury() object. Are there any examples out there or is there documentation that shows some injury data? For all the games in a year, I appended injury() to an injured_list. I can access the injury information by subsetting the injured list, but I can't seem to iterate that list. Any advice or examples would be really helpful.
I am currently looping through games in a season to gather data. I then use that game id to generate some more data.
Note the following game IDs
id is 2016_04_03_slnmlb_pitmlb_1
id is 2016_04_03_nynmlb_kcamlb_1
id is 2016_04_03_tormlb_tbamlb_1
id is 2016_04_03_chnmlb_anamlb_1
id is 2016_04_04_phimlb_cinmlb_1
id is 2016_04_04_lanmlb_sdnmlb_1
id is 2016_04_04_colmlb_arimlb_1
id is 2016_04_04_chnmlb_anamlb_1
id is 2016_04_04_sfnmlb_milmlb_1
id is 2016_04_04_tormlb_tbamlb_1
id is 2016_04_04_houmlb_nyamlb_1
When I run
stats = mlbgame.team_stats(game_id)
everything is fine up until the last id
2016_04_04_houmlb_nyamlb_1
I get the following error
Could not find a game with that id.
Any idea how that can happen? The id is generated from the game object itself. Seems a bit odd to me.
Edit: After doing more research, I believe these are game IDs for games that didn't actually happen or were postponed. They get a game ID, but when I try to look them up in team_stats() it fails.
I am not sure if this was intended or not.
Edit 2:
I ran my program with a try to find which game_ids for the whole season were not able to be found by
mlbgame.team_stats()
They were
CAN'T FIND a game with id of 2016_04_04_bosmlb_clemlb_1
CAN'T FIND a game with id of 2016_04_09_miamlb_wasmlb_1
CAN'T FIND a game with id of 2016_04_10_nyamlb_detmlb_1
CAN'T FIND a game with id of 2016_04_17_balmlb_texmlb_1
CAN'T FIND a game with id of 2016_04_27_milmlb_chnmlb_1
CAN'T FIND a game with id of 2016_04_28_pitmlb_colmlb_1
CAN'T FIND a game with id of 2016_04_30_atlmlb_chnmlb_1
CAN'T FIND a game with id of 2016_05_16_bosmlb_kcamlb_1
CAN'T FIND a game with id of 2016_05_26_chamlb_kcamlb_1
CAN'T FIND a game with id of 2016_09_25_atlmlb_miamlb_1
I assume there would be more than ten rainouts or cancelled games.
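The "run it with a try" approach described above can be sketched as follows. The lookup function is passed in, so the example runs without network access. `fake_team_stats` is a stand-in that imitates the reported behaviour, and the assumption is that the real lookup raises ValueError for unknown IDs, as the "Could not find a game with that id." message suggests.

```python
def split_by_availability(game_ids, fetch):
    """Call fetch(game_id) for every ID and separate hits from misses."""
    found, missing = {}, []
    for game_id in game_ids:
        try:
            found[game_id] = fetch(game_id)
        except ValueError:
            # Assumed failure mode for games without data.
            missing.append(game_id)
    return found, missing


def fake_team_stats(game_id):
    """Stand-in for the real lookup; fails for one ID from the report."""
    if game_id == "2016_04_04_houmlb_nyamlb_1":
        raise ValueError("Could not find a game with that id.")
    return {"game_id": game_id}


ids = ["2016_04_04_tormlb_tbamlb_1", "2016_04_04_houmlb_nyamlb_1"]
found, missing = split_by_availability(ids, fake_team_stats)
print(missing)
```

With the library installed, passing `mlbgame.team_stats` as `fetch` would be the obvious substitution, provided it really does raise ValueError in this case.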
In __inning_info(inning, part) we use half.findall('atbat'). This leaves us unable to see the 'actions': things like substitutions, steals, etc. Seeing those would be helpful for what I'm trying to do.
I suppose we could just add another loop similar to the 'atbat' loop to __inning_info()? Would that work?
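As a rough sketch of that idea, iterating over the children of a half inning directly keeps at-bats and actions in document order, which a second `findall` loop would not. The sample markup, the `action` tag name and its attributes are assumptions based on the description above, not verified feed data.

```python
import xml.etree.ElementTree as ET

# Hypothetical half-inning markup mixing at-bats and actions.
HALF_XML = """<top>
  <atbat num="1" des="Single"/>
  <action event="Stolen Base 2B"/>
  <atbat num="2" des="Strikeout"/>
</top>"""


def half_inning_events(half, tags=("atbat", "action")):
    """Return (tag, attributes) pairs for the wanted children, in order."""
    # Iterating an element yields its direct children in document order.
    return [(child.tag, dict(child.attrib)) for child in half if child.tag in tags]


events = half_inning_events(ET.fromstring(HALF_XML))
for tag, attrs in events:
    print(tag, attrs)
```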
Hello,
Thank you very much for this initiative.
Is it possible to include player's ids?
That way you can use the data returned with other data sources.
For example something like this would be useful:
games = mlbgame.games(2015, 4)
for day in games:
    for g in day:
        print(g.w_pitcher_id)
Thank you in advance for your help.
I think this happens when the winning pitcher is not in the stats. Repro:
games = mlbgame.games(2015, 4)
for day in games:
    for g in day:
        print(g.game_id)
I'm not a Python programmer, but I think line 45 of game.py has W_pitcher instead of w_pitcher.
The docs list 's_hr' as an attribute for both batter and pitcher objects, but it doesn't seem to exist for pitchers.
obp, ops and slg aren't always returned for player_stats() and team_stats() even though idmap['obp']['always'], idmap['ops']['always'] and idmap['slg']['always'] are set to True.
Example: mlbgame.stats.team_stats('2015_04_01_bosmlb_minmlb_1') returns
{'away_batting': {'ab': '32',
'avg': '.285',
'bb': '6',
'd': '1',
'da': '8',
'h': '7',
'hr': '0',
'lob': '16',
'po': '27',
'r': '4',
'rbi': '3',
'so': '6',
't': '0',
'team_flag': 'away'},
'away_pitching': {'bb': '5',
'bf': '45',
'er': '4',
'era': '4.40',
'h': '13',
'hr': '1',
'out': '27',
'r': '4',
'so': '8',
'team_flag': 'away'},
'home_batting': {'ab': '40',
'avg': '.258',
'bb': '5',
'd': '1',
'da': '7',
'h': '13',
'hr': '1',
'lob': '26',
'po': '27',
'r': '4',
'rbi': '4',
'so': '8',
't': '0',
'team_flag': 'home'},
'home_pitching': {'bb': '6',
'bf': '39',
'er': '4',
'era': '3.87',
'h': '7',
'hr': '0',
'out': '27',
'r': '4',
'so': '6',
'team_flag': 'home'}}
Also, any chance there is basic weather information for the games?
lxml needs to be installed. It is not installed by default for Windows at least.
So I have a simple loop looking for stats like so:
def test_stats(games):
    for game in games:
        stats = mlbgame.team_stats(game.game_id)
        hr_home = stats.home_batting.hr
        hr_away = stats.away_batting.hr
        box_score = mlbgame.box_score(game.game_id)
        error_home = box_score.print_scoreboard()
        pitch_home = stats.home_pitching.so
        pi_home = stats.home_pitching.s_h
        print hr_away, hr_home, pitch_home, pi_home
When I run this, I get: AttributeError: 'TeamStats' object has no attribute 's_h'
Is this another break?
How can I directly access the database instead of using the Python module?
The file's getting giant. If these two APIs are to be further enhanced, they would be easier to maintain if they were split out.
I noticed that all spring training games so far in 2018 are returning scores of 0 for all games. Not sure if there is something wrong with the module or with the MLB service? To repro try:
from __future__ import print_function
import mlbgame
month = mlbgame.games(2018, 3, home='Mets')
games = mlbgame.combine_games(month)
for game in games:
    print(game)
Output
Marlins (0) at Mets (0)
Nationals (0) at Mets (0)
Nationals (0) at Mets (0)
Tigers (0) at Mets (0)
Astros (0) at Mets (0)
Yankees (0) at Mets (0)
Astros (0) at Mets (0)
Astros (0) at Mets (0)
Marlins (0) at Mets (0)
Orioles (0) at Mets (0)
Nationals (0) at Mets (0)
Cardinals (0) at Mets (0)
Marlins (0) at Mets (0)
Cardinals (0) at Mets (0)
Cardinals (0) at Mets (0)
for game in games:
    stats = mlbgame.team_stats(game.game_id)
    overview = mlbgame.overview(game.game_id)
    player_stats = mlbgame.player_stats(game.game_id)
    winning_team_SP_era = overview.home_probable_pitcher_s_era
Hi, is there a way to get innings pitched using this library? It looks like the XML data that you get doesn't even contain it.
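One possible workaround, sketched under an assumption: the team_stats output quoted in another report on this page includes an `out` field under pitching (for example 'out': '27'). If that field is the number of outs recorded, innings pitched follow from dividing by three. The helper name is made up.

```python
def outs_to_innings(outs):
    """Convert outs recorded into baseball's innings-pitched notation.

    The digit after the point is a count of extra outs (0, 1 or 2),
    not a decimal fraction, so 16 outs is written "5.1".
    """
    whole, extra = divmod(int(outs), 3)
    return "{}.{}".format(whole, extra)


print(outs_to_innings("27"))
print(outs_to_innings(16))
```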
Running the same code as before:
def team_games(year, month):
    month = mlbgame.games(year, month)
    games = mlbgame.combine_games(month)
    for game in games:
        print game.game_id
    return games
games = team_games(2016, 5)
and it's breaking on random game IDs when iterating through.
This one in particular breaks:
stats = mlbgame.team_stats('2016_05_16_bosmlb_kcamlb_1')
era = stats.home_pitching.era
print(era)
Status is also FINAL
"error: could not create '/System/Library/Frameworks/Python.framework/Versions/2.7/docs': Permission denied
and after that message, it prints:
Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/private/var/folders/7t/2vyvr6056v553r15xpx0gjl00000gn/T/pip-build-Y2rvAh/mlbgame/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/7t/2vyvr6056v553r15xpx0gjl00000gn/T/pip-5ibLS1-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/7t/2vyvr6056v553r15xpx0gjl00000gn/T/pip-build-Y2rvAh/mlbgame/
I've tried all sorts of workarounds and can't figure out why that error won't go away.
Hey guys,
I'm running into a problem getting game information for the month of July. Whenever I run:
# when n is >= 2012
mlbgame.games(n,7)
I get this error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/site-packages/mlbgame/__init__.py", line 205, in games
game = day(i, y, x, home=home, away=away)
File "/usr/lib/python3.5/site-packages/mlbgame/__init__.py", line 175, in day
data = mlbgame.game.scoreboard(year, month, day, home=home, away=away)
File "/usr/lib/python3.5/site-packages/mlbgame/game.py", line 24, in scoreboard
home_name = teams[0].attrib['name']
IndexError: list index out of range
>>> mlbgame.games(2012,7)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/site-packages/mlbgame/__init__.py", line 205, in games
game = day(i, y, x, home=home, away=away)
File "/usr/lib/python3.5/site-packages/mlbgame/__init__.py", line 175, in day
data = mlbgame.game.scoreboard(year, month, day, home=home, away=away)
File "/usr/lib/python3.5/site-packages/mlbgame/game.py", line 24, in scoreboard
home_name = teams[0].attrib['name']
IndexError: list index out of range
Any idea what this could be?
Thanks for the help
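The traceback points at `teams[0]` being read from an empty list. Why the list is empty is not established here. The sketch below only shows a defensive guard that skips game nodes with fewer than two `team` children; the sample markup and helper name are made up, and the home-then-away order is assumed from the `home_name = teams[0]` line in the traceback.

```python
import xml.etree.ElementTree as ET

# Hypothetical scoreboard markup; the second game has no team elements.
SCOREBOARD_XML = """<scoreboard>
  <go_game><game id="g1"/><team name="Mets"/><team name="Braves"/></go_game>
  <go_game><game id="g2"/></go_game>
</scoreboard>"""


def game_teams(scoreboard):
    """Return (home, away) names, skipping games without two teams."""
    results = []
    for game in scoreboard:
        teams = game.findall("team")
        if len(teams) < 2:
            continue  # nothing to index, so skip instead of raising IndexError
        results.append((teams[0].attrib["name"], teams[1].attrib["name"]))
    return results


pairs = game_teams(ET.fromstring(SCOREBOARD_XML))
print(pairs)
```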
Before I start, a quick thanks for making this available.
I installed 2.4.2 today on my Windows 7 machine with a straightforward pip install and thought things were fine, but I was unable to duplicate the results shown on the web. I used this code:
game = mlbgame.day(2015, 11, 1, home='Mets')[0]
stats = mlbgame.player_stats(game.game_id)
for player in stats.home_batting:
    print(player)
and got these results (names and positions, but no actual stats). Did I miss a step somewhere?
Curtis Granderson (RF)
David Wright (3B)
Daniel Murphy (2B)
Yoenis Cespedes (CF)
Juan Lagares (CF)
Lucas Duda (1B)
Travis d'Arnaud (C)
Michael Conforto (LF)
Wilmer Flores (SS)
Matt Harvey (P)
Jeurys Familia (P)
Kelly Johnson (PH)
Jonathon Niese (P)
Addison Reed (P)
Bartolo Colon (P)
Any thoughts?
-Bill
Love the API, but found a small problem. When filtering mlbgame.day() by home or away team, using "Diamondbacks" or "Athletics" will result in an empty set. This is due to the MLB's XML having these listed as "D-backs" and "A's", respectively. Just wanted to let you know in case you wanted to add something to account for it, or let people know in the docs. I'm sure this causes issues in other parts of the API as well, but I've only tested with mlbgame.day().
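Until something like that exists in the library, a caller-side workaround could be a small alias table applied before filtering. This sketch contains only the two names mentioned above; the helper name is made up and the table is not part of mlbgame.

```python
# Full club name -> name used in MLB's XML, per the report above.
TEAM_ALIASES = {
    "Diamondbacks": "D-backs",
    "Athletics": "A's",
}


def feed_team_name(name):
    """Translate a full club name to the feed's name; others pass through."""
    return TEAM_ALIASES.get(name, name)


print(feed_team_name("Diamondbacks"))
print(feed_team_name("Mets"))
```

A call such as `mlbgame.day(year, month, day, home=feed_team_name("Athletics"))` would then use the spelling the feed expects.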
I wasn't sure how to mark this as a question; it's not an issue.
How would I write code to get a player's stats for the 2015 season, for example?
Hi zachpanz88,
Is there a way to get basic information on the players through mlbgame?
For example, I have the player's id number, can I do something like mlbgame.player_info(player_id) to get a dict(Name="Cal Ripken", Throws="R", Bats="R", Team="Orioles").
Aside from this, the module leaves nothing to be desired :)
Thanks,
Andrew
I am using mlbgame for Python and I am getting some data that I do not know how to manipulate. Can you point me in the right direction, please?
Here is my view in Python (Django).
from django.shortcuts import render
import mlbgame
def baseball(request):
    games1 = mlbgame.day(2017, 8, 10)
    angels_game = mlbgame.games(2017, home='Angels', away='Angels')
    return render(request, 'home/baseball.html', {'games1': games1, 'angels_game': angels_game})
Here is my template
{{ anaheim }}
{% endfor %}
I am getting many lines like this: mlbgame.game.GameScoreboard object at 0x7f04f2f0ac50.
Is there a way to do something like mlbgame.game.GameScoreboard object at 0x7f04f2f0ac50.GETGAMEDAY()?
I know I am printing the object here, but I do not know the attributes of that object.
Thanks,
Noel.
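A generic way to discover what an object carries, sketched with a stand-in class so it runs without the library: `vars()` returns an instance's attributes as a dict. The class below only borrows the name from the output above; its field names are invented, so the real attribute names should be checked the same way on an actual object.

```python
class GameScoreboard(object):
    """Stand-in for a scoreboard object; the field names are made up."""

    def __init__(self, **fields):
        for key, value in fields.items():
            setattr(self, key, value)


def public_fields(obj):
    """Return the instance attributes of obj as a plain dict."""
    return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


game = GameScoreboard(home_team="Angels", away_team="Orioles", home_team_runs=5)
for name, value in sorted(public_fields(game).items()):
    print(name, "=", value)
```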
I'm trying to run the sample script from the readme and keep getting the same error.
Traceback (most recent call last):
File "mlbgame.py", line 2, in <module>
import mlbgame
File "/home/pi/mlbgame.py", line 4, in <module>
game = mlbgame.day(2015, 11, 1, home='Mets')[0]
AttributeError: 'module' object has no attribute 'day'
I am using the newest version of mlbgame. I have tried this on Windows 7, Linux Mint, and a Raspberry Pi, with Python 2.7 and Python 3. I get the same error on every machine. Is there something I'm doing wrong?
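One thing visible in the traceback above is that the script being run is itself named mlbgame.py, so `import mlbgame` can resolve to the script rather than to the installed package. A small sketch of a check for that situation follows; the helper name is made up for illustration.

```python
from pathlib import Path


def shadows_module(script_path, module_name):
    """Return True if a script's file name matches a module it imports."""
    return Path(script_path).stem == module_name


print(shadows_module("/home/pi/mlbgame.py", "mlbgame"))
print(shadows_module("/home/pi/scores.py", "mlbgame"))
```

If the check returns True, renaming the script (and removing any stale compiled copy next to it) is the usual remedy.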
To start, let me say I am new to Python, so the issue is probably something I missed. When getting scores for games before 2018, everything works as expected, like so:
month = mlbgame.games(2015, 6, home='Mets')
games = mlbgame.combine_games(month)
for game in games:
    print(game)
Giants (5) at Mets (0)
Giants (8) at Mets (5)
...
But when I change the date to anything this year, like 2018, 3, 1, I get the game(s), but the game_status is pre_game and all the scores and stats are blank or zero.
month = mlbgame.games(2018, 2, home='Mets')
games = mlbgame.combine_games(month)
for game in games:
    print(game)
Braves (0) at Mets (0)
Cardinals (0) at Mets (0)
Marlins (0) at Mets (0)
Astros (0) at Mets (0)
I'm not sure if it's something on my end or if MLB just hasn't updated those games, but if anybody could help me understand why this is happening, I would be grateful. Thanks for the help!
The accented character appears to be in the name of batter Elias Diaz.
Could this be handled properly?
Traceback (most recent call last):
<snip>
File "/home/yyy/.virtualenvs/xxx/local/lib/python2.7/site-packages/mlbgame/__init__.py", line 234, in player_stats
obj = mlbgame.stats.BatterStats(x)
File "/home/yyy/.virtualenvs/xxx/local/lib/python2.7/site-packages/mlbgame/object.py", line 25, in __init__
setattr(self, x, str(data[x]))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 1: ordinal not in range(128)
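The failure mode in the traceback can be reproduced in miniature: a string containing u'\xed' cannot be encoded as ASCII, while UTF-8 handles it. The sketch below is written for Python 3, where `str` is already Unicode. The sample name and the helper are illustrative only and do not reflect how the library stores names.

```python
name = "D\u00edaz"  # sample string containing the character from the error


def to_ascii_or_none(text):
    """Return text encoded as ASCII, or None if it cannot be represented."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError:
        return None


utf8_bytes = name.encode("utf-8")  # UTF-8 can represent the accent
print(to_ascii_or_none(name))
print(utf8_bytes.decode("utf-8") == name)
```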
This project is awesome. Thank you so much!
Is there a way to find the probable starting pitcher for today's games? They are displayed on mlb gameday.
I wanted to ask whether attendance data is available through the API.
So far this is excellent but I don't see an easy way to do a few things
Is this something I am missing?
Hi There,
This is less an issue and more a question: is there anywhere in this library where you can retrieve whether a batter is right-handed or left-handed?
This library is so awesome. Thanks so much!
The update script will stop at whatever the end day of the month is, regardless of whether it is the last month or not. For example, if the end date is set to 11/2/2016, the script will only save the first two days of each month in 2016, when it really should save all of the days in those months (other than Nov).
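The intended behaviour can be sketched by stepping through actual calendar dates instead of looping over months and days separately, so the end day only limits the final month. The start date of 1/1/2016 is an assumption for the example, and the function is not taken from the update script.

```python
from datetime import date, timedelta


def days_between(start, end):
    """Yield every date from start to end, both inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


days = list(days_between(date(2016, 1, 1), date(2016, 11, 2)))
print(len(days))
print(days[-1])
```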
The examples of your program on GitHub raise an XMLSyntaxError.
What is the reason for not handling games with the tag 'ig_game' (I'm assuming this means "in-game")? I've hacked together a solution for it, but the data is not exactly the same as for a 'go_game' ("game-over"?). I think this needs to be implemented in order for the library to update in real time.
In game.scoreboard the names and win/loss records of the probable starting pitchers are added to the game data.
# games that were not played
elif game_tag == "sg_game":
    try:
        p_pitcher_data = game.findall('p_pitcher')
        p_pitcher_home_data = p_pitcher_data[0]
        p_pitcher_home = p_pitcher_home_data.find('pitcher').attrib['name']
        p_pitcher_home_wins = int(p_pitcher_home_data.attrib['wins'])
        p_pitcher_home_losses = int(p_pitcher_home_data.attrib['losses'])
How could I add the player ID for the starting pitchers? I see them in the linescore.xml, but just adding
p_pitcher_home_id = p_pitcher_home_data.find('pitcher').attrib['id']
doesn't work. Which data file is read here exactly?