
sportsref's Introduction

sportsref

Easily pull stats from Sports Reference websites.

sportsref is designed to be used in an interactive Python environment, such as IPython or a Jupyter notebook.

The API tries to mirror the web experience:

  • each subject area (e.g. player, season, league) is represented by a class.
  • each class has methods representing the pages available.
    • for example, Ozzie Albies's player page has a menu of pages, and each entry maps to a method
    • if the menu is a dropdown, such as the Splits menu on the player page, the method takes an additional parameter or two
  • the methods return a Page object, which knows about all the tables on that page
  • use Page.get_df("table_name") to get a pandas.DataFrame of the table you want.

The examples.ipynb Jupyter notebook has a few examples demonstrating a workflow.

Install

Clone the repo, then run:

pip install .

sportsref's People

Contributors

double-dose-larry


sportsref's Issues

add a delay to calls touching bref

This is to conform with the Terms & Conditions of the website.

Specifically:

Except as specifically provided in this paragraph, you agree not to use or launch any automated system, including without limitation, robots, spiders, offline readers, or like devices, that accesses the Site in a manner which sends more request messages to the Site server in any given period of time than a typical human would normally produce in the same period by using a conventional on-line Web browser to read, view, and submit materials.

I suppose waiting at least half a second between requests is good enough. The numberize_df step usually takes longer than that already; I just want to make sure.
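A minimal sketch of such a delay, assuming every request can be routed through one shared object. `Throttle`, `min_interval`, and the injectable `clock` and `sleep` arguments are hypothetical names for illustration and are not part of sportsref; the half-second default is the estimate from the issue text.

```python
import time


class Throttle:
    """Enforce a minimum gap between successive calls to wait()."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last_call = None  # no request has been made yet

    def wait(self):
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now
        return now


# One shared instance; call throttle.wait() right before each HTTP request.
throttle = Throttle(min_interval=0.5)
```

Because the throttle only sleeps for the remaining part of the interval, slow steps such as table post-processing already count toward the delay.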

move all of the url building logic into the convert_url function

There's no reason for all these URL strings to be built all over the place.

The logic is quite simple.

In goes:

  • the web page URL
  • the name of the div that contains the table you want
  • possibly a dictionary that will be translated to a URL query

Out comes:

  • a fully quoted embed URL.
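The inputs and output listed above could be sketched as follows. This is an illustration under stated assumptions, not the project's actual implementation: the `EMBED_BASE` endpoint, the `site` parameter with its `"br"` default, and the `css`/`site`/`url`/`div` parameter names are assumptions about the embed URL format, and the player path in the usage note is a placeholder.

```python
from urllib.parse import urlencode, urlsplit

# Assumed base location of the embed endpoint; verify before relying on it.
EMBED_BASE = "https://widgets.sports-reference.com/wg.fcgi"


def convert_url(page_url, div, query=None, site="br"):
    """Build a fully quoted embed URL for one table on a web page.

    page_url: the normal web page URL
    div:      name of the div that contains the wanted table
    query:    optional dict translated into a query string on the page path
    site:     short site code (assumed parameter, e.g. "br" or "pfr")
    """
    parts = urlsplit(page_url)
    path = parts.path
    if query:
        # An explicit dictionary wins over any query already on the page URL.
        path = f"{path}?{urlencode(query)}"
    elif parts.query:
        path = f"{path}?{parts.query}"
    params = {"css": 1, "site": site, "url": path, "div": div}
    # urlencode percent-quotes every value, including "/", "?" and "=".
    return f"{EMBED_BASE}?{urlencode(params)}"
```

For example, `convert_url("https://www.baseball-reference.com/players/x/example01.shtml", "div_batting", {"year": 2019})` quotes the page path and its query into a single `url` parameter, so all string building lives in one place.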

add league-wide stats

Maybe have a league object.
A league can be MLB, AL, or NL.

Then we can have a league_season object that holds things like:

  • Standings
  • For batting/pitching/fielding:
    • standard tables
    • value tables
    • there's a whole bunch of other stuff in here
  • misc
    • attendance

That's it for now; there is much more in there.
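One possible shape for these objects, as a rough sketch only. `League`, `LeagueSeason`, and `sections()` are hypothetical names, and the section and table labels simply restate the list above; none of this exists in the library yet.

```python
from dataclasses import dataclass
from enum import Enum


class League(Enum):
    """The three league scopes mentioned in this issue."""
    MLB = "MLB"
    AL = "AL"
    NL = "NL"


@dataclass(frozen=True)
class LeagueSeason:
    """A league in a given year (hypothetical container, not a real API)."""
    league: League
    year: int

    def sections(self):
        """Map each planned section to the table kinds listed in the issue."""
        stat_tables = ["standard", "value"]
        return {
            "standings": [],
            "batting": list(stat_tables),
            "pitching": list(stat_tables),
            "fielding": list(stat_tables),
            "misc": ["attendance"],
        }


al_2019 = LeagueSeason(League.AL, 2019)
```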

Proper Tests

write proper tests, maybe with pytest.

get rid of the jupyter notebooks

usage guide

write docs, or update README to show folks how to use this.

Maybe supply a couple of jupyter notebooks. I like those.

generalize the parsing and url construction to work with all sports reference websites

Basic logic is this:

First we start with the sport module, for example:

from py_sportsref.football import Player

I guess that means we'll have to rename the library to a more general name like py_sportsref.

Then we'll build up the parts of the URL in a dictionary:

my_dict = {
    'css': 1,
    'site': 'pfr',
    'url': '/players/F/FavrBr00.htm',
    'div': 'div_passing'
}

We still care about the web URL because we need to parse the valid divs on that page. We can construct it from known base locations, using the urllib.parse library to work with the URLs and html.parser to quickly stream through the HTML and pick out the ids of divs with a 'stats_table' class.

We pass this URL to an enumerate_table_divs() function, which returns all the stats_table div ids. These ids will be the valid table types.
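A minimal sketch of the div enumeration using only html.parser. One deviation from the description above, made so the example runs offline: this version of `enumerate_table_divs()` takes the page's HTML text rather than a URL, so fetching is left to the caller. The helper class name `_StatsTableDivFinder` is hypothetical.

```python
from html.parser import HTMLParser


class _StatsTableDivFinder(HTMLParser):
    """Collect the ids of <div> elements whose class list has 'stats_table'."""

    def __init__(self):
        super().__init__()
        self.div_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attr_map = dict(attrs)
        # A class attribute may hold several space-separated names, or be absent.
        classes = (attr_map.get("class") or "").split()
        div_id = attr_map.get("id")
        if div_id and "stats_table" in classes:
            self.div_ids.append(div_id)


def enumerate_table_divs(html_text):
    """Return the ids of all stats_table divs, in document order."""
    finder = _StatsTableDivFinder()
    finder.feed(html_text)
    finder.close()
    return finder.div_ids
```

The returned ids can then be used as the `div` value when building the embed URL dictionary.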

Cloudfront CSV

I didn't see a way of contacting you, so I thought I'd do it through here. I found your code while Google searching some of the domains I had come across, and found your util.py.
I was particularly looking at
csv_url = f"https://{cdn}.cloudfront.net/short/inc/{player_or_team}s_search_list.csv"

What do you mean by {player_or_team}? Would I replace this with its unique id?
I've found some links such as
https://d6rt22vwfyr3i.cloudfront.net/short/inc/players_search_list.csv
and
https://d6rt22vwfyr3i.cloudfront.net/short/inc/clubs_search_list.csv

But if I want info on a specific player, a URL like
https://d6rt22vwfyr3i.cloudfront.net/short/inc/19538871_search_list.csv
does not load.

I was wondering what the significance of that line is. Thanks.
