joenano / rpscrape

Scrape horse racing results data and racecards.

Python 100.00%

horse-racing python scraper

rpscrape's Issues

List Index Out of Range - Singapore

[rpscrape]> sin 2014-2019 flat
Scraping flat results from sin in 2014-2019...
No flat race data for Kranji in 2016.
No flat race data for Kranji in 2017.
No flat race data for Kranji in 2018.
No flat race data for Kranji in 2019.
Traceback (most recent call last):
  File "./rpscrape.py", line 722, in <module>
    main()
  File "./rpscrape.py", line 718, in main
    parse_args([arg.strip() for arg in args.split()])
  File "./rpscrape.py", line 686, in parse_args
    scrape_races(races, course_name(scrape_target), args[1], code)
  File "./rpscrape.py", line 508, in scrape_races
    sires, dams, damsires = pedigree_info(pedigrees)
  File "./rpscrape.py", line 209, in pedigree_info
    dam = ped_info[1].text.strip()
IndexError: list index out of range

Last recorded data:

2014-04-27	Kranji	10:35	Lion City Cup (Singapore Group 1)			6	1200	Good	4	1	1.75	El Padrino	94/10	10.4	5	9-1	127	tb	1:9.05	Oriel Chavez	Hai Wang Tan			101	11961.72	Mr Nancho (ARG)	Crownie (AUS)	Luskin Star

Not urgent: I was hoping for more up-to-date data for Singapore, and 2014 alone isn't much help, but I thought I'd report it anyway.

Failed to find number of runners

This seems to be happening a lot, on a number of different courses, when trying to scrape data between dates. It hasn't happened before and doesn't happen if I scrape just a day's worth of data.

For example:
[rpscrape]> 93 2010-2011 flat
https://www.racingpost.com/results/93/Windsor/2010-08-28/512396
Failed to find number of runners.

Error when scraping gb 2019 jumps

I have used this script for weeks; it is a super tool. I tried to update my 2019/20 data and got this error:

File "..../rpscrape/rpscrape-master/scripts/rpscrape.py", line 236, in pedigree_info
dam_nat = p.find('span').text

AttributeError: 'NoneType' object has no attribute 'text'

Code Update Issue

Hi,

I'm not sure if there is an issue or if this is something local, but when I try to update when prompted I'm getting the following error:

Update available. Do you want to update? Y/N y
From https://github.com/4A47/rpscrape
 * branch            master     -> FETCH_HEAD
error: Your local changes to the following files would be overwritten by merge:
README.md
scripts/utils/update.py
Please commit your changes or stash them before you merge.
Aborting
Updating 64b1d91..3dde2bc

I haven't made any changes to the README.md file or the update file?

Hopefully it's nothing I'm doing :-)

AttributeError: 'NoneType' object has no attribute 'text'

Not sure what might be happening here; I tried with Python 2.7 and Python 3.8:

[rpscrape]> 11 2009 jumps
Scraping jumps results from Cheltenham in 2009...
Traceback (most recent call last):
File "./rpscrape.py", line 722, in <module>
main()
File "./rpscrape.py", line 718, in main
parse_args([arg.strip() for arg in args.split()])
File "./rpscrape.py", line 686, in parse_args
scrape_races(races, course_name(scrape_target), args[1], code)
File "./rpscrape.py", line 508, in scrape_races
sires, dams, damsires = pedigree_info(pedigrees)
File "./rpscrape.py", line 210, in pedigree_info
dam_nat = p.find("span").text
AttributeError: 'NoneType' object has no attribute 'text'
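
The failing call in this report and in the gb 2019 jumps report above is `p.find('span').text`, which raises when no span is found. Below is a minimal sketch of a guard, assuming the pedigree nodes behave like ElementTree or lxml elements. `text_or_empty` is a hypothetical helper name, not something taken from rpscrape, and the sample markup is invented for illustration.

```python
import xml.etree.ElementTree as ET


def text_or_empty(node):
    """Return the stripped text of an element, or '' if the node or its text is missing."""
    if node is None or node.text is None:
        return ''
    return node.text.strip()


# Stand-in markup (hypothetical); the real page structure may differ.
with_span = ET.fromstring('<a>Crownie <span> (AUS) </span></a>')
without_span = ET.fromstring('<a>Luskin Star</a>')

print(repr(text_or_empty(with_span.find('span'))))     # span present
print(repr(text_or_empty(without_span.find('span'))))  # span missing, no exception
```

With a guard like this, a horse without a nationality span would produce an empty field rather than stopping the whole scrape.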

Horse Finishing Times

This is not an issue at all, just an added comment.

I noted the interesting idea you included of calculating
each horse's finishing time from the available winner's time
and a calculation that transforms lengths beaten into
seconds.

This could be the basis of something useful.
The trick for the punter may be when to ponder using it and when to ignore it.

Picture a long distance chase for example.
The last dregs of finishers won't be putting maximum effort into
finishing as fast as they can. Any time recorded
might be a dubious measure of the ability they may demonstrate in future.

Less dubious in the same race may be the times for the first three home.

Style of race may have impact as well.
A 5f flat sprint for example would have a lot less of
the "I will just plod along at the end with minimal effort" style of impact.

Interesting that you have bothered to include such stuff. :)
Future research into the data may reveal when it is a decent metric to use and when it is not.

AND

I can envision two possible routes for this scraper stuff.

#1 - rpscrape.py: you continue to add new stuff to it, such as the above.

#2 - rpscrape.py is focused more on pure scraping. There is only so much data on the page, and once the script grabs it all without fail it is deemed perfect and set in stone.

Datatransform.py is then a 2nd script that takes the raw scrape output and creates extra fields.
Decimal odds, weights in lbs, time calcs anything else custom calculation wise.

No right or wrong answers I guess.
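
A minimal sketch of what route #2's second script could contain, assuming the raw scrape keeps odds and weights as strings. The function names are illustrative only, and unparseable input returns None instead of raising, which is a design choice of this sketch rather than the project's behaviour.

```python
def fraction_to_decimal(sp):
    """Convert fractional odds such as '20/1' to decimal odds (21.0), or None."""
    try:
        numerator, denominator = sp.strip().split('/')
        return round(float(numerator) / float(denominator) + 1.0, 2)
    except (ValueError, ZeroDivisionError):
        return None  # empty or malformed odds string


def weight_to_lbs(weight):
    """Convert a 'stones-pounds' string such as '8-12' to pounds (124), or None."""
    try:
        stones, pounds = weight.strip().split('-')
        return int(stones) * 14 + int(pounds)
    except ValueError:
        return None  # empty or malformed weight string


print(fraction_to_decimal('20/1'), weight_to_lbs('11-10'))
```

Keeping these conversions in a separate pass means a single odd value costs one empty cell instead of an aborted scrape.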

ERROR: distance_to_furlongs()

Good day 4A47,
I came across a small issue and thought that I should tell you.
This is what it gave me.
python rpscrape.py -d 2020/12/01-2020/12/31
ERROR: distance_to_furlongs()
Race: https://xxx/results/513/wolverhampton-aw/2020-12-01/770728

Regards
Patrick

Can't Encode Character Error

Hi 4A47

May I begin by saying that I am mighty impressed by the work you have done on rpscrape so far.
It has the makings of a very useful tool.

It inspired me to join github and also to install python.
No python expert here after 1 day using it but I did manage to create two little functions
aimed at converting string based data to numbers.
i.e. fractional odds to decimal odds, and weight in 8-12 string style to 124 lbs, etc.
String forms are easy on the eye but number formats are easier to work with if attempting to analyse data. My functions work stand alone but I have to progress to figuring out how to incorporate them into the bigger script.

I think I have found a small bug however.
Well might be code or it might be something weird on my system.

Output I got is as follows.

[rpscrape]> ire 2013-2018 jumps

Scraping jumps results from ire in 2013-2018...
No jumps race data for Curragh in 2014.
No jumps race data for Curragh in 2015.
No jumps race data for Curragh in 2016.
No jumps race data for Curragh in 2017.
No jumps race data for Curragh in 2018.
No jumps race data for Dundalk in 2013.
No jumps race data for Dundalk in 2014.
No jumps race data for Dundalk in 2015.
No jumps race data for Dundalk in 2016.
No jumps race data for Dundalk in 2017.
No jumps race data for Dundalk in 2018.
No jumps race data for Dundalk-AW in 2013.
No jumps race data for Dundalk-AW in 2014.
No jumps race data for Dundalk-AW in 2015.
No jumps race data for Dundalk-AW in 2016.
No jumps race data for Dundalk-AW in 2017.
No jumps race data for Dundalk-AW in 2018.
No jumps race data for Laytown in 2013.
No jumps race data for Laytown in 2014.
No jumps race data for Laytown in 2015.
No jumps race data for Laytown in 2016.
No jumps race data for Laytown in 2017.
No jumps race data for Laytown in 2018.
No jumps race data for Limerick-Junction in 2013.
No jumps race data for Limerick-Junction in 2014.
No jumps race data for Limerick-Junction in 2015.
No jumps race data for Limerick-Junction in 2016.
No jumps race data for Limerick-Junction in 2017.
No jumps race data for Limerick-Junction in 2018.
No jumps race data for Mallow in 2013.
No jumps race data for Mallow in 2014.
No jumps race data for Mallow in 2015.
No jumps race data for Mallow in 2016.
No jumps race data for Mallow in 2017.
No jumps race data for Mallow in 2018.
No jumps race data for Phoenix-Park in 2013.
No jumps race data for Phoenix-Park in 2014.
No jumps race data for Phoenix-Park in 2015.
No jumps race data for Phoenix-Park in 2016.
No jumps race data for Phoenix-Park in 2017.
No jumps race data for Phoenix-Park in 2018.
No jumps race data for Tralee in 2013.
No jumps race data for Tralee in 2014.
No jumps race data for Tralee in 2015.
No jumps race data for Tralee in 2016.
No jumps race data for Tralee in 2017.
No jumps race data for Tralee in 2018.
No jumps race data for Wexford in 2013.
No jumps race data for Wexford-RH in 2015.
No jumps race data for Wexford-RH in 2016.
No jumps race data for Wexford-RH in 2017.
No jumps race data for Wexford-RH in 2018.
Traceback (most recent call last):
File "rpscrape.py", line 448, in <module>
main()
File "rpscrape.py", line 444, in main
parse_args([arg.strip() for arg in args.split()])
File "rpscrape.py", line 425, in parse_args
scrape_races(races, get_course_name(scrape_target), args[1], code)
File "rpscrape.py", line 362, in scrape_races
csv.write((f'{date},{course_name},{r_time},{race},{race_class},{band},{dist},{going},{p},{dr},{bt},{n},{sp},'
File "C:\Users\me\AppData\Local\Programs\Python\Python37-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\x80' in position 58: character maps to <undefined>

Nb I ran this twice and twice it produced the same result.
Each csv file stopped at exactly the same spot.
4845 rows each time with the last entry horse being
14-Jun-13 Clonmel 06:45 Louis Fitzgerald Hotel Handicap Hurdle Age Of Glory

My suspicion is that whatever race comes after that one has some sort of unusual weirdness to it that the parser has yet to be designed to accommodate. That is my stab 1 guess but it could be wide of the mark.

If so, running ire 2013-2018 jumps yourself should produce exactly the same output as mine.

If it sails through fine for you perhaps it is something local to my machine.

Appreciate if you could have a quick look when you have the time.

Cheers
Mick
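
The traceback shows the write going through cp1252, the default text encoding on this Windows setup. A minimal sketch of the usual remedy is to open the output file with an explicit encoding. The row content and file name here are made up for illustration, and whether rpscrape opens its CSV this way is an assumption.

```python
import os
import tempfile

# Hypothetical row containing the character from the error message.
row = '2013-06-14,Clonmel,6:45,Handicap Hurdle \x80'

path = os.path.join(tempfile.mkdtemp(), 'ire-2013-2018_jumps.csv')

# An explicit encoding avoids falling back to the platform default,
# which cannot encode every character.
with open(path, 'w', encoding='utf-8', newline='') as csv_file:
    csv_file.write(row + '\n')

with open(path, encoding='utf-8') as csv_file:
    round_trip = csv_file.read()

print(round_trip == row + '\n')
```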

Horse Speed Rating

Is it possible to scrape the horse's speed rating?

Running file including arguments all in 1 line

Whenever I try to run the file with arguments included, e.g. -d 2019/12/18 gb,
it returns the options menu as if I had entered an invalid command.

In cmd, instead of running "python C:\Users\james\rpscrape\scripts\rpscrape.py"
and then, once the "[rpscrape]>" prompt has come up, entering "-d 2019/12/18 gb"
(which works fine and creates the file),
I would like to send this all in one line, e.g. "python C:\Users\james\rpscrape\scripts\rpscrape.py -d 2019/12/18 gb".
That doesn't run and just returns the options menu.

Is this possible? I would like to automate the process of creating files rather than having to type the commands in manually.

Thanks in advance
The package is great
James
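
One way a script can support both styles is to check `sys.argv` before showing the prompt. The sketch below is only an illustration of that idea: `get_command` is a hypothetical function, not part of rpscrape.

```python
import sys


def get_command(argv):
    """Return the scraper arguments given on the command line, or None for interactive mode."""
    args = [arg.strip() for arg in argv[1:]]  # argv[0] is the script path
    return args or None


if __name__ == '__main__':
    command = get_command(sys.argv)
    if command is None:
        print('no arguments given: would show the [rpscrape]> prompt')
    else:
        print('would scrape with arguments:', command)
```

With that in place, a scheduled task could call the script with the arguments on one line and never reach the prompt.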

parsing errors

Out of curiosity why are you formatting some of the fields during the loading phase - I am seeing all of the same issues listed in the chain of issues (string to float etc). I have tried loads of combinations and all fail with various errors (mostly linked to the conversion of distance).

Would this have been easier to just leave as a string and then reformat the data after the load has completed?

As an example:

C:\Users\rorye\Documents\GitHub\rpscrape\scripts>rpscrape.py
[rpscrape]> 31 2013-2019 flat
Scraping flat results from Lingfield in 2013-2019...
Traceback (most recent call last):
File "C:\Users\rorye\Documents\GitHub\rpscrape\scripts\rpscrape.py", line 587, in <module>
main()
File "C:\Users\rorye\Documents\GitHub\rpscrape\scripts\rpscrape.py", line 583, in main
parse_args([arg.strip() for arg in args.split()])
File "C:\Users\rorye\Documents\GitHub\rpscrape\scripts\rpscrape.py", line 564, in parse_args
scrape_races(races, course_name(scrape_target), args[1], code)
File "C:\Users\rorye\Documents\GitHub\rpscrape\scripts\rpscrape.py", line 425, in scrape_races
metres = round(float(dist) * 200)
ValueError: could not convert string to float:

C:\Users\rorye\Documents\GitHub\rpscrape\scripts>
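
A sketch of the "leave it as a string, convert later" idea raised above, applied to the failing line `metres = round(float(dist) * 200)`. `to_float` and `furlongs_to_metres` are hypothetical helpers. Returning None for bad input is an assumption about what a caller would want.

```python
def to_float(value, default=None):
    """Convert a scraped string to float, returning default when it is not numeric."""
    try:
        return float(value.strip())
    except (AttributeError, ValueError):
        return default  # covers None, '', '121%', '&' and similar


def furlongs_to_metres(dist):
    """Same arithmetic as the failing line, but tolerant of an empty distance."""
    furlongs = to_float(dist)
    return None if furlongs is None else round(furlongs * 200)


print(furlongs_to_metres('26.5'), furlongs_to_metres(''))
```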

setup problem

Hello everybody! I am new to Python (but I have experience with other programming languages such as PHP). I installed Python for Windows and chose to add it to the path during installation. I installed almost all of the required modules, but I have a problem with lxml (I get an error every time), and when I tried to launch rpscrape from cmd, I got this message:

Python\Python310\rpscrape\scripts\rpscrape.py", line 8, in <module>
from lxml import html
ImportError: cannot import name 'html' from 'lxml' (unknown location)

How can I fix it? Is it OK to launch rpscrape.py from the command line? Please help me :)

Thanks a lot

No longer capturing value for 'trainer' in result records

Since I pulled the latest version, I don't seem to be getting a value returned for 'trainer'.

I ran
[rpscrape]> 11 2013 jumps
Scraping jumps results from Cheltenham in 2013...
Finished scraping. cheltenham-2013_jumps.csv saved in rpscrape/data

Here is a sample of the first few lines, I'm not getting a trainer. I tried a couple of different years (all Cheltenham) and had the same problem.

"date","course","time","race_name","class","band","dist(f)","dist(m)","going","pos","draw","btn","horse_name","sp","dec","age","weight","lbs","gear","fin_time","jockey","trainer","or","ts","rpr","prize(£)","sire","dam","damsire","comment"
2014-03-14,Cheltenham,3:20,Betfred Cheltenham Gold Cup Chase,Grade 1,5yo+,26.5,5300,Good,1,,0,Lord Windermere,20/1,21.00,8,11-10,164,,6:43.88,Davy Russell,,152,149,168,327325.82,Oscar (IRE),Satellite Dancer (IRE),Satco,In rear and detached 10th - driven 4 out - still plenty to do next - headway under pressure approaching 2 out - chased leaders last - led and hung badly right final 110yds - held on all out (trainer said - regarding apparent improvement in form - that the stable had been out of form and the gelding was unsuited by the slow pace at Leopardstown.)(op 25/1)
2014-03-14,Cheltenham,3:20,Betfred Cheltenham Gold Cup Chase,Grade 1,5yo+,26.5,5300,Good,2,,0.1,On His Own,16/1,17.00,10,11-10,164,,6:43.90,D J Casey,,161,148,168,122826.21,Presenting (GB),Shuil Na Mhuire (IRE),Roselier,Led 4th - narrowly headed 8th - stayed challenging - led 11th - joined 4 out - narrowly headed - ridden 3 out - went right and bumped 2 out - rallied last - strong challenge and carried right final 110yds - just failed(tchd 20/1)

Ovr_Btn Column error

Hi,

Great package, thanks very much for sharing. I've noticed a little bug in the recently updated script: the Ovr_Btn column is now returning the same value as the Btn column, rather than the total distance the horse was beaten, which it was previously returning. This also makes the times for beaten horses incorrect.
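
To illustrate the difference between the two columns, here is a small sketch that assumes Btn is the distance to the horse immediately in front and Ovr_Btn is the running total back to the winner. That reading is inferred from this report, not confirmed from the code, and the sample distances are invented.

```python
from itertools import accumulate


def overall_beaten(btn):
    """Turn per-horse beaten distances into cumulative distances behind the winner."""
    return [round(total, 2) for total in accumulate(btn)]


# Winner first, then each horse's distance behind the one in front (made-up values).
btn = [0, 0.1, 2.5, 1.25]
print(overall_beaten(btn))
```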

Duplicate Course Names

Hi 4A47,

I have run into a small snag. It seems like there are some duplicate course names from other countries so it might be best to allocate the course id or some other feature to make them unique?

for example:
' "saf": [
"988 - Arlington",
"994 - Ascot",
"515 - Bloemfontein",'
and
' "gb": [
"32 - Aintree",
"2 - Ascot",
"3 - Ayr",'

The csv files produce:

"2020-02-15,ascot,1:15,Thames Materials...."

Regards
Patrick
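
A minimal sketch of the suggestion above: combine the course id with the name so that the two Ascots stay distinct. `course_key` is a hypothetical helper, and the key format is just one possible choice.

```python
def course_key(course_id, course_name):
    """Build a key that stays unique when two regions share a course name."""
    slug = course_name.strip().lower().replace(' ', '-')
    return f'{course_id}-{slug}'


gb_ascot = course_key(2, 'Ascot')     # id from the "gb" list above
saf_ascot = course_key(994, 'Ascot')  # id from the "saf" list above
print(gb_ascot, saf_ascot)
```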

Today's racecards

Is it possible to get today's racecards rather than historical results?
Or would it be easy to adjust the code to allow this?

Thanks

No RPR in daily results

Hey, is the pesbot project dead?

Sorry for asking here, couldn't find the repo anymore

ValueError: could not convert string to float

Query: fr 2017-2019 flat

Error:

No flat race data for Zonza in 2019.
Traceback (most recent call last):
  File "./rpscrape.py", line 716, in <module>
    main()
  File "./rpscrape.py", line 712, in main
    parse_args([arg.strip() for arg in args.split()])
  File "./rpscrape.py", line 680, in parse_args
    scrape_races(races, course_name(scrape_target), args[1], code)
  File "./rpscrape.py", line 588, in scrape_races
    win_time = float(winning_time[0].strip("s"))
ValueError: could not convert string to float: '121%'

CSV data file in rpscrape/data, named fr-2017-2019_flat.csv is completely empty apart from headers.

No Data

Hi, thanks for sharing your code. I get "no data" messages when trying to scrape GB data for 2018 or 2019:
[rpscrape]> gb 2018 flat
Scraping flat results from gb in 2018...
No flat race data for Aintree in 2018.
No flat race data for Bangor-on-Dee in 2018.
No flat race data for Cartmel in 2018.
No flat race data for Chelmsford in 2018.

rail movements - lower() is missing during the assignment

From Windows 10:

rpscrape\scripts>python racecards.py today

Traceback (most recent call last):
File "racecards.py", line 416, in <module>
main()
File "racecards.py", line 406, in main
races = parse_races(session, race_docs)
File "racecards.py", line 311, in parse_races
going_info = get_going_info(session)
File "racecards.py", line 63, in get_going_info
going, rail_movements = parse_going(course['going'])
File "racecards.py", line 260, in parse_going
rail_movements = [x.strip() for x in info.split('rail movements:')[1].strip().strip(')').split(',')]
IndexError: list index out of range

The line before in file "racecards.py", line 259, in parse_going is:

if 'rail movements' in info.lower():

And therefore, I think line 260 should be changed to:

rail_movements = [x.strip() for x in info.lower().split('rail movements:')[1].strip().strip(')').split(',')]
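
An alternative to calling lower() on the whole string is a case-insensitive split, which keeps the original capitalisation of the movements and returns an empty list when the marker is absent. This is only a sketch: `parse_rail_movements` is a hypothetical name and the sample going text is invented.

```python
import re

RAIL_RE = re.compile(r'rail movements:', re.IGNORECASE)


def parse_rail_movements(info):
    """Return the rail movements listed in a going string, or [] if there are none."""
    parts = RAIL_RE.split(info, maxsplit=1)
    if len(parts) < 2:
        return []  # marker (with colon) not present, so nothing to index
    tail = parts[1].strip().strip(')')
    return [item.strip() for item in tail.split(',') if item.strip()]


sample = 'Good (Rail movements: bend dolled out 3yds, straight at full width)'
print(parse_rail_movements(sample))
print(parse_rail_movements('Good to Soft'))
```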

2021 not working

Hello
I'm sure this is simple, but data for 2021 is not scraping.

[rpscrape]> hk 2021 flat

INVALID YEAR: must be in range 1988-2020 for flat and 1987-2020 for jumps.

Just a range change in code I guess.

Thanks

Mark
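
If the upper bound is hard-coded, it needs editing every January. A sketch of deriving it from the current date instead is below. The lower bounds are the ones quoted in the error message; `valid_year` and `FIRST_YEAR` are hypothetical names.

```python
import datetime

# Lower bounds as quoted in the INVALID YEAR message.
FIRST_YEAR = {'flat': 1988, 'jumps': 1987}


def valid_year(year, code, today=None):
    """Check a year against the first available season and the current year."""
    today = today or datetime.date.today()
    return FIRST_YEAR[code] <= year <= today.year


print(valid_year(2021, 'flat', today=datetime.date(2021, 3, 1)))
```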

2019 Jump Season Data

Hi Guys,

Firstly your tool is fantastic, extremely useful and the level of data it pulls is phenomenal!

It may be something I'm doing but the jump season seems to be restricted to 2018 data (which is May-18 to Apr-19), is there an issue with the script pulling more recent data?

Also, on the flat data the RPR always seems to be higher by one (judging from a quick cross-check of the previous day's racing against the final RPR in the results).

Apologies if I'm doing something wrong on the jump element, any guidance would be greatly appreciated

Great work :-)

Cant get data

Hi,
I cannot get any data. I cannot find a URL in the code, so how does the script know where to go?

No matter what course or date I try to scrape, I get no data at all.
I also tried the example; the script runs through all GB courses but returns no data.

Thanks for your help.

No module named 'requests'

Installing, running with Python 3.7.1...

$ ./rpscrape.py
Traceback (most recent call last):
File "./rpscrape.py", line 8, in <module>
import requests
ModuleNotFoundError: No module named 'requests'

RuntimeError: Event loop is closed

Hi
I am getting the following error while running on a Windows machine. Could you please look into this?

RuntimeError: Event loop is closed
RuntimeWarning: coroutine 'TCPConnector._resolve_host' was never awaited

jake
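
A workaround often suggested for this symptom on Windows with Python 3.8 or later is to switch to the selector event loop before calling `asyncio.run`. Whether it applies to this report is an assumption, since the Python version is not stated. `fetch_all` is a stand-in coroutine, not the project's code.

```python
import asyncio
import sys


async def fetch_all():
    """Placeholder for the real download coroutine (hypothetical)."""
    await asyncio.sleep(0)
    return 'done'


def run_scrape():
    # Only relevant on Windows; the policy class does not exist elsewhere.
    policy = getattr(asyncio, 'WindowsSelectorEventLoopPolicy', None)
    if sys.platform.startswith('win') and policy is not None:
        asyncio.set_event_loop_policy(policy())
    return asyncio.run(fetch_all())


print(run_scrape())
```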

Crash during scraping

Thanks for the great library, just wanted to report a crash I came across:

Traceback (most recent call last):
  File "rpscrape.py", line 586, in <module>
    main()
  File "rpscrape.py", line 582, in main
    parse_args([arg.strip() for arg in args.split()])
  File "rpscrape.py", line 563, in parse_args
    scrape_races(races, course_name(scrape_target), args[1], code)
  File "rpscrape.py", line 488, in scrape_races
    times = calculate_times(win_time, btn, going, code, course_name)
  File "rpscrape.py", line 359, in calculate_times
    time = (win_time + (float(dist) / lps_scale))
UnboundLocalError: local variable 'lps_scale' referenced before assignment

Which I saw after running the script with these arguments: gb 1999-2019 flat
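
The UnboundLocalError means no branch assigned `lps_scale` for that combination of going, code and course. A sketch of one way to guarantee a value is a lookup with a default. The formula is the one from the traceback, but every number and key in the table is a placeholder invented for illustration.

```python
DEFAULT_LPS = 5.0  # placeholder lengths-per-second value, not taken from rpscrape

# Hypothetical table; the real scales and how they are keyed are not shown in the report.
LPS_SCALE = {('flat', 'good'): 6.0, ('jumps', 'soft'): 4.0}


def calculate_time(win_time, btn, going, code):
    """Winner's time plus lengths beaten converted to seconds."""
    lps_scale = LPS_SCALE.get((code, going.lower()), DEFAULT_LPS)  # always assigned
    return round(win_time + float(btn) / lps_scale, 2)


print(calculate_time(100.0, 6, 'Good', 'flat'))
print(calculate_time(100.0, 5, 'Standard', 'flat'))  # unknown going falls back to the default
```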

Works well but limited data

Where is the script currently pulling its data from? I have used it for Australian tracks and it works well, but for many events it pulls only 1 out of 8 races.

TypeError: getresponse() got an unexpected keyword argument 'buffering'

Attempted execution with:

[rpscrape]> gb 2020 flat

Script was left running overnight. It seemed to freeze and never complete. With a keyboard interruption, the following is revealed:

^CTraceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 377, in _make_request
    httplib_response = conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rpscrape.py", line 1110, in <module>
    main()
  File "rpscrape.py", line 1106, in main
    parse_args([arg.strip() for arg in args.split()])
  File "rpscrape.py", line 1016, in parse_args
    scrape_races(races, course_name(scrape_target), args[1], code)
  File "rpscrape.py", line 607, in scrape_races
    r = requests.get(race, headers={'User-Agent': 'Mozilla/5.0'})
  File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.7/http/client.py", line 1336, in getresponse
    response.begin()
  File "/usr/lib/python3.7/http/client.py", line 306, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.7/http/client.py", line 267, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.7/socket.py", line 589, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.7/ssl.py", line 1052, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.7/ssl.py", line 911, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt

Last line in generated CSV is:

2020-07-04 | Chelmsford-AW | 8:10 | chelmsfordcityracecourse.com Novice Stakes (Div I)

Versions:

Python 3.7.3 (default, Jul 25 2020, 13:03:44) 
[GCC 8.3.0] on linux

Fails on 'Void' races

Hi,
Thanks for this, just what I was looking for.
Only issue I've spotted is the file creation fails when a race is 'Void', with a message - Failed to find number of runners.
see '-d 2019/11/06-2019/11/06 gb' for example

I'd also be very interested if you have any suggestions on how one might automate this to produce a daily file, for GB and IRE for example.

Future race entry data

One neat feature for this script would be scraping future entries or race cards from RP.

racecard.py - Chelmsford (ARO)

Today's racecards for Chelmsford start their fixtures with an Arab race at 16:25.

I think the country abbreviation, ARO, is breaking racecards.py

Issue since update

Great work by the way, it is a godsend.
I've had an issue since updating today, and I have installed all the modules stated.

Traceback (most recent call last):
File "rpscrape.py", line 1166, in <module>
main()
File "rpscrape.py", line 1162, in main
parse_args([arg.strip() for arg in args.split()])
File "rpscrape.py", line 1009, in parse_args
races = get_race_urls_date(dates, region_code)
File "rpscrape.py", line 544, in get_race_urls_date
docs = asyncio.run(get_documents(days))
AttributeError: module 'asyncio' has no attribute 'run'
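
`asyncio.run` was added in Python 3.7, so this error points to an older interpreter. Upgrading is the simplest fix. For anyone who cannot, a fallback could be sketched as below, where `get_documents` is a stand-in for the coroutine named in the traceback and `run_coroutine` is a hypothetical helper.

```python
import asyncio


async def get_documents(days):
    """Stand-in for the real coroutine named in the traceback."""
    await asyncio.sleep(0)
    return list(days)


def run_coroutine(coro):
    """Use asyncio.run when available (Python 3.7+), else drive a loop by hand."""
    if hasattr(asyncio, 'run'):
        return asyncio.run(coro)
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


docs = run_coroutine(get_documents(['2020-01-19']))
print(docs)
```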

Racecards - IndexError: list index out of range

Hi,

I think there is an issue with the rail movements / going element of the racecards module, when trying to run the module today (07/09/2021) I get the following error:

Traceback (most recent call last):
File "./racecards.py", line 412, in <module>
main()
File "./racecards.py", line 402, in main
races = parse_races(session, race_docs, date)
File "./racecards.py", line 264, in parse_races
going_info = get_going_info(session, date)
File "./racecards.py", line 62, in get_going_info
going, rail_movements = parse_going(course['going'])
File "./racecards.py", line 255, in parse_going
rail_movements = [x.strip() for x in going_info.split('Rail movements:')[1].strip().strip(')').split(',')]
IndexError: list index out of range

The strange thing is that when I tried to run it for tomorrow (08/09/2021) it seemed to work fine.

Any assistance would be greatly appreciated.

Best wishes

Using futures requests instead of raw to speed data loading

Would there be any objections to switching to pulling data with the requests_futures library and then aggregating as it comes in? Testing locally, along with writing the CSV in one chunk at the end rather than iteratively, that seems to make things significantly faster (and can always check and retry individual requests as you load them).
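
The shape of that proposal can be sketched with the standard library's `concurrent.futures` in place of requests_futures (a deliberate substitution so the example has no third-party dependency). `scrape_all` and `write_csv` are hypothetical names, and `fetch` stands for whatever function downloads and parses one race.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed


def scrape_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently; return (rows, failed_urls)."""
    rows, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            try:
                rows.append(future.result())
            except Exception:
                failed.append(futures[future])  # candidates for a retry pass
    return rows, failed


def write_csv(rows, out):
    """Write all rows in one go, sorted so the output order is stable."""
    csv.writer(out).writerows(sorted(rows))


def fake_fetch(url):
    """Stub used only for this demonstration."""
    if url == 'bad':
        raise OSError('simulated failure')
    return [url, 'row']


rows, failed = scrape_all(['a', 'bad', 'b'], fake_fetch)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue(), failed)
```

Because results arrive in completion order, the sketch sorts before writing. A real implementation would more likely sort by date and race time.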

Problem trying to execute codes

Hi,

I am a newbie trying to learn Python and GitHub, and I'm probably doing something very stupid here. Can you kindly point out where I went wrong, please?

I ran the following commands in Git Bash and got the error message "bash: [rpscrape]: command not found".

$ git clone https://github.com/4A47/rpscrape.git
$ cd rpscrape/scripts
$ python rpscrape.py
[rpscrape]> -d 2019/12/18 gb

Many thanks in advance
Jason

error on racecards.py tomorrow call

Hello @4A47,

When running a call to racecards.py tomorrow in order to pull saturday 18th cards, I'm receiving the below error:

Traceback (most recent call last):
File "C:\horses\scripts\racecards.py", line 413, in <module>
main()
File "C:\horses\scripts\racecards.py", line 403, in main
races = parse_races(session, race_docs, date)
File "C:\horses\scripts\racecards.py", line 318, in parse_races
runners = get_runners(profile_urls, race['race_id'])
File "C:\horses\scripts\racecards.py", line 135, in get_runners
json_str = doc[1].xpath('//body/script')[0].text.split('window.PRELOADED_STATE =')[1].split('})()')[0].strip().strip(';')

IndexError: list index out of range

ValueError: could not convert string to float: '&'

Hello,

When running [rpscrape]> 1138 2018-2020 flat, I receive an error, seemingly because of an & symbol. Happy for you to advise a solution or to implement something yourself.

Invalid Region Code

Hi, When I try to get data for a course it doesn't work for me....

[rpscrape]> -d 2021/10/23 -c 36 -y 2021 -t flat
Invalid Region code. 

Examples:
		2020/01/19 gb
		2021/07/11 ire

Also, I am only getting a few races on the csv file when I type this command...

[rpscrape]> -d 2021/10/23 gb
Finished scraping.
2021_10_23.csv saved in rpscrape/data/dates/gb/all

Is there an issue, or is this user error?

Many thanks.

ln 927 in rpscrape.py throws error.

The line tries to execute git remote show origin but raises an error. To fix the issue, it should pass these words as separate strings in a list.
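
A sketch of the list form described above. Without `shell=True`, a single string is treated as the program name on POSIX systems, so the command has to be split into words first. `remote_status` is a hypothetical wrapper. The actual call on line 927 is not shown in this report.

```python
import shlex
import subprocess

# Each word becomes its own list item: ['git', 'remote', 'show', 'origin']
GIT_STATUS_CMD = shlex.split('git remote show origin')


def remote_status(cwd=None):
    """Run the command without a shell and return (exit code, combined output)."""
    result = subprocess.run(GIT_STATUS_CMD, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.returncode, result.stdout


print(GIT_STATUS_CMD)
```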

Executing rpscrape in google colab

Hi 4A47.

I am running rpscrape in Google Colab and I don't know how to access the CSV file.

(Btw thanks for the great work)

Can you add a requirements file?

Can you add a requirements.txt file?

messaging incorrect when scraping for 2019

Scraping jumps results from Ffos-Las in 2010-2019...
No jumps race data for Ffos-Las in 2019.

Finished scraping. ffos-las-2010-2019_jumps.csv saved in rpscrape/data
[rpscrape]>

Just pointing out that there is jumps racing data for 2019 at Ffos-Las, contrary to the warning. The last row in my file is:

2019-01-14,Ffos-Las,4:15,Specsavers Llanelli Standard Open National Hunt Flat Race,Class 5,4-6yo,16,3200,Soft,11,,103.5,Moonlight Camp,12/1,13.00,5,11-0,154,,4:33.27,Robert Dunne,Neil Mulholland,,,9,,Kamsin (GER),Moonlight Symphony (GER),Pentire,Led 1f - remained close up until ridden and weakened 4f out - tailed off(op 11/1)

This happens for all courses when I search for 2019.

Thanks

Racecard.py fetch failing

Hello there,

When calling racecards.py for either today or tomorrow, I am seeing the error below. This started occurring on 19th May.

Traceback (most recent call last):
  File "C:\scripts\racecards.py", line 426, in <module>
    main()
  File "C:\scripts\racecards.py", line 416, in main
    races = parse_races(session, race_docs)
  File "C:\scripts\racecards.py", line 321, in parse_races
    going_info = get_going_info(session)
  File "C:\scripts\racecards.py", line 62, in get_going_info
    going, rail_movements = parse_going(course['going'])
  File "C:\scripts\racecards.py", line 255, in parse_going
    info = going_info.split(going)[1]
ValueError: empty separator

Extremely slow

I am trying to pull historical data, it has been running for over an hour and has not yet completed for Ireland. In task manager, it shows no network activity but it is writing at 0.1mbps to disk. Any ideas?

pull result data for specific date?

Is it possible to pull the result data for a specific date across multiple racecourses?

Working really well, one question

Just wanted to check in and say that I found this yesterday and in about an hour I had every Cheltenham race since 2010 in a sqlite database and I'm having great fun messing around with the data.
Just one thing I wanted to ask about: it looks like you are using some algorithm to approximate the times of all horses other than the winner, since, bizarrely, this information still isn't made available in racing. Could you give me a brief rundown of how the algorithm works?

Thanks again
Steve

Distance to Furlong Error - Conversion to Float

Hi,

I spotted an error today when processing today's racecards: could not convert string to float.

Tomorrow's racecards work fine, so it must be something in the distance text that it doesn't like today for a specific race? :-)

Traceback (most recent call last):
File "./racecards.py", line 395, in <module>
main()
File "./racecards.py", line 385, in main
races = parse_races(session, race_docs, date)
File "./racecards.py", line 242, in parse_races
race['distance_f'] = distance_to_furlongs(race['distance_round'])
File "./racecards.py", line 29, in distance_to_furlongs
return float(dist)
ValueError: could not convert string to float: ''

Getting an error when scraping results for Exeter

Ran into this issue earlier.

[rpscrape]> 14 2010-2019 jumps
Scraping jumps results from Exeter in 2010-2019...
No jumps race data for Exeter in 2019.
Traceback (most recent call last):
File "./rpscrape.py", line 472, in <module>
main()
File "./rpscrape.py", line 468, in main
parse_args([arg.strip() for arg in args.split()])
File "./rpscrape.py", line 449, in parse_args
scrape_races(races, course_name(scrape_target), args[1], code)
File "./rpscrape.py", line 382, in scrape_races
dec = fraction_to_decimal([sp.strip('F').strip('J').strip('C').strip() for sp in sps])
File "./rpscrape.py", line 157, in fraction_to_decimal
decimal.append('{0:.2f}'.format(float(fraction.split('/')[0]) / float(fraction.split('/')[1]) + 1.00))
ValueError: could not convert string to float:

Thanks
