new-village / nkparser
nkparser is a simple scraping library for netkeiba.com
License: Apache License 2.0
result: 2020C8100404
Traceback (most recent call last):
File "/workspaces/nkcrawler/run.py", line 54, in <module>
result = nkparser.load("result", race['race_id'])
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 29, in load
return loader.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 87, in exec
self.table = parse_text("result", self.text, self.entity_id)
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 18, in parse_text
return parser.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 84, in exec
kv3 = [{col["col_name"]: self._apply_format(col, rec) for col in self.columns} for rec in kv2]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 84, in <listcomp>
kv3 = [{col["col_name"]: self._apply_format(col, rec) for col in self.columns} for rec in kv2]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 84, in <dictcomp>
kv3 = [{col["col_name"]: self._apply_format(col, rec) for col in self.columns} for rec in kv2]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 40, in _apply_format
val = formatter(col["reg"], val, col["var_type"])
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/help.py", line 34, in formatter
value = float(val) if val is not None else None
ValueError: could not convert string to float: '2:39'
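The traceback above ends with `float('2:39')`, so the value reaches the numeric formatter still in what looks like a minutes:seconds form. A minimal sketch of one way to handle this, assuming `'2:39'` is a race time; `time_to_seconds` is a hypothetical helper and not part of nkparser:

```python
from typing import Optional


def time_to_seconds(text: Optional[str]) -> Optional[float]:
    """Convert 'm:ss', 'm:ss.s', or plain seconds into float seconds.

    Hypothetical helper for illustration; blank input yields None.
    """
    if text is None or text.strip() == "":
        return None
    seconds = 0.0
    # Each colon-separated part is worth 60x the part that follows it.
    for part in text.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


print(time_to_seconds("2:39"))
```

Plain values such as `34.7` pass through unchanged, so the same helper could in principle sit in front of every time-like column.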
When a field declared as a numeric type (int/float) is blank on the Netkeiba site, the parse result records it as zero.
For data analysis, a blank should be parsed as None instead.
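The requested behavior can be sketched as follows. This is only an illustration: `to_number` and the `"int"`/`"float"` type labels are assumptions, not nkparser's actual formatter or configuration values.

```python
from typing import Optional, Union


def to_number(raw: Optional[str], var_type: str) -> Optional[Union[int, float]]:
    """Return None for blank input rather than coercing it to zero.

    Hypothetical helper; var_type is assumed to be "int" or "float".
    """
    if raw is None or raw.strip() == "":
        return None
    text = raw.strip()
    return int(text) if var_type == "int" else float(text)


print(to_number("", "int"), to_number("0", "int"), to_number("56.0", "float"))
```

Keeping a real `0` distinct from a blank matters for columns like `weight_diff`, where zero is a legitimate value.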
The following error occurs with race_id = 202201020801.
Traceback (most recent call last):
File "/workspaces/nkcrawler/run.py", line 51, in <module>
nkdata = nkparser.load(table_name, race_id)
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 29, in load
return loader.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 69, in exec
self.info = parse_text("race", self.text, self.entity_id)
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 18, in parse_text
return parser.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in exec
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in <listcomp>
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in <dictcomp>
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 63, in _apply_post_func
val = self._call_functions(col, rec, "post_func")
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 70, in _call_functions
return globals()[func_name](eval(args))
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/help.py", line 139, in fmt_date
return dt.strptime(dt_str, "%Y%m%d").strftime('%Y-%m-%d')
File "/opt/python/3.10.4/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/opt/python/3.10.4/lib/python3.10/_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '' does not match format '%Y%m%d'
The following error occurs with race_id = 202250030808.
Traceback (most recent call last):
File "/workspaces/nkcrawler/run.py", line 51, in <module>
result = nkparser.load("result", race['race_id'])
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 29, in load
return loader.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/load.py", line 86, in exec
self.info = parse_text("race_db", self.text, self.entity_id)
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 18, in parse_text
return parser.exec()
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in exec
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in <listcomp>
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 88, in <dictcomp>
kv5 = [{col["col_name"]: self._apply_post_func(col, rec) for col in self.columns} for rec in kv4]
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 63, in _apply_post_func
val = self._call_functions(col, rec, "post_func")
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/parse.py", line 70, in _call_functions
return globals()[func_name](eval(args))
File "/workspaces/nkcrawler/.venv/lib/python3.10/site-packages/nkparser/help.py", line 146, in fmt_date
return dt.strptime(dt_str, "%Y%m%d").strftime('%Y-%m-%d')
File "/opt/python/3.10.4/lib/python3.10/_strptime.py", line 568, in _strptime_datetime
tt, fraction, gmtoff_fraction = _strptime(data_string, format)
File "/opt/python/3.10.4/lib/python3.10/_strptime.py", line 349, in _strptime
raise ValueError("time data %r does not match format %r" %
ValueError: time data '20225003' does not match format '%Y%m%d'
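Both `fmt_date` tracebacks above fail inside `strptime`: once on an empty string and once on `'20225003'`, which is not a calendar date. A minimal sketch of a tolerant variant, assuming that returning None is acceptable when no date can be derived; `fmt_date_safe` is a hypothetical name:

```python
from datetime import datetime
from typing import Optional


def fmt_date_safe(dt_str: Optional[str]) -> Optional[str]:
    """Format 'YYYYMMDD' as 'YYYY-MM-DD', or return None if it is not a date."""
    if not dt_str:
        return None
    try:
        return datetime.strptime(dt_str, "%Y%m%d").strftime("%Y-%m-%d")
    except ValueError:
        # Covers inputs such as '20225003' that do not form a valid date.
        return None


print(fmt_date_safe("20220801"), fmt_date_safe(""), fmt_date_safe("20225003"))
```

This only stops the crash. Where the real race date should come from for IDs like 202250030808 is a separate question that the sketch does not answer.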
For race ID 202209020804, horse names, jockey names, and other text fields come out garbled (mojibake). The race name for the same race ID is garbled as well.
[{'id': '20220902080415', 'race_id': '202209020804', 'rank': 1, 'bracket': 7, 'horse_number': 15, 'horse_id': '2019101507', 'horse_name': 'ЅЧЅЃЅЪЁМЅЙЅП', 'gender': '', 'age': 3, 'burden': 56.0, 'jockey_id': '01140', 'jackey_name': 'ВЃЛГЯТРИ', 'rap_time': 133.1, 'diff_time': 0, 'passage_rank': '13-13-2-2', 'last_3f': 34.7, 'weight': 462, 'weight_diff': 0, 'trainer_id': '01183', 'trainer_name': 'ФдЬюТйЧЗ', 'prize': 520.0}, {'id': '20220902080405', 'race_id': '202209020804', 'rank': 2, 'bracket': 3, 'horse_number': 5, 'horse_id': '2019105568', 'horse_name': 'ЅВЁМЅЦЅэЁМЅК', 'gender': '', 'age': 3, 'burden': 54.0, 'jockey_id': '01163', 'jackey_name': 'КфАцЮмРБ', 'rap_time': 133.5, 'diff_time': 0.4, 'passage_rank': '5-7-8-5', 'last_3f': 34.7, 'weight': 438, 'weight_diff': -2, 'trainer_id': '01161', 'trainer_name': 'Щ№БбУв', 'prize': 210.0}, {'id': '20220902080416', 'race_id': '202209020804', 'rank': 3, 'bracket': 8, 'horse_number': 16, 'horse_id': '2019105793', 'horse_name': 'ЅЂЅЄЅ\xadЅуЅѓЅЩЅІЅЄЅУ', 'gender': '', 'age': 3, 'burden': 56.0, 'jockey_id': '01019', 'jackey_name': 'НЉЛГППАь', 'rap_time': 133.7, 'diff_time': 0.6, 'passage_rank': '3-4-5-3', 'last_3f': 35.2, 'weight': 454, 'weight_diff': 0, 'trainer_id': '01071', 'trainer_name': 'УгЙОТйМї', 'prize': 130.0} ... ]
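The garbled strings above are consistent with EUC-JP bytes having been decoded as ISO-8859-5 (Cyrillic). That is an inference from the output, not something the report states. Under that assumption, the damage can be reversed by re-encoding and decoding again; `repair_mojibake` is a hypothetical helper:

```python
def repair_mojibake(garbled: str) -> str:
    """Undo EUC-JP text that was mistakenly decoded as ISO-8859-5.

    Returns the input unchanged if it cannot be round-tripped.
    """
    try:
        return garbled.encode("iso8859_5").decode("euc_jp")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


# First horse_name from the output above.
print(repair_mojibake("ЅЧЅЃЅЪЁМЅЙЅП"))
```

Repairing after the fact is a workaround. The cleaner fix would be to decode the response bytes as EUC-JP in the first place, wherever the library reads the page.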
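Symptom:
When the parser is run while the odds are not yet finalized (`status=middle`), it prints the message `There is no odds data: 202201020704` and processing terminates.
Requested fix:
Instead of raising an error, continue processing with only the odds data that could be retrieved.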
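One way to read this request is "warn, keep what parses, and return". The sketch below illustrates that shape only. The row layout (`horse_number`, `win`) and the function `collect_odds` are assumptions made for the example and do not reflect nkparser's internals.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("odds_sketch")


def collect_odds(race_id: str, rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
    """Keep the rows whose odds parse; log a warning instead of aborting."""
    parsed = []
    for row in rows:
        try:
            parsed.append({"horse_number": int(row["horse_number"]),
                           "win": float(row["win"])})
        except (KeyError, ValueError):
            logger.warning("Skipping unparsable odds row for %s: %r", race_id, row)
    if not parsed:
        # Same message as in the report, but the caller still gets a result.
        logger.warning("There is no odds data: %s", race_id)
    return parsed


rows = [{"horse_number": "1", "win": "3.4"}, {"horse_number": "2", "win": "---"}]
print(collect_odds("202201020704", rows))
```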
After collecting data for August and September 2022 with nkcrawler, the races with more than 18 horses are as follows.
race_id | head_count |
---|---|
202206040611 | 45 |
202206040710 | 22 |
202207050710 | 20 |
202207050712 | 20 |
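A tally like the one in the table can be reproduced from collected result rows with a simple count per `race_id`. This is a minimal sketch, assuming each row is a dict carrying a `race_id` key as in the parser output shown earlier; `oversized_races` is a hypothetical helper:

```python
from collections import Counter
from typing import Dict, Iterable


def oversized_races(entries: Iterable[Dict[str, str]], limit: int = 18) -> Dict[str, int]:
    """Return {race_id: head_count} for races with more than `limit` rows."""
    counts = Counter(entry["race_id"] for entry in entries)
    return {race_id: n for race_id, n in counts.items() if n > limit}


sample = [{"race_id": "A"}] * 20 + [{"race_id": "B"}] * 18
print(oversized_races(sample))
```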