
e-dictionary-crawler's Introduction

ami_dict_crawler

Crawler for http://e-dictionary.apc.gov.tw/ami/Search.htm

jsonlines output: https://thewayiam.github.io/ami_dict_crawler/data/data.jsonlines
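
The output is in JSON Lines format: one JSON object per line. A minimal sketch for loading such a file in Python follows. The helper name `read_jsonlines` is hypothetical, and no assumption is made about which fields each record contains.

```python
import json
from typing import Iterator


def read_jsonlines(path: str) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of the file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

For example, `list(read_jsonlines("data/ami.jsonlines"))` would load every record produced by the Amis crawl command into memory.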

Installation

virtualenv --python=python3 venv; . venv/bin/activate; pip install --upgrade pip # set up the virtual environment
pip install scrapy

Crawling Amis

scrapy runspider crawler.py -a lang=ami -o data/ami.jsonlines

Other languages

Replace {code} with the code of the language you need. The default is Amis (ami).

scrapy runspider crawler.py -t jsonlines -a lang={code} -o data/{code}.jsonlines --logfile {code}.log

Language codes

Amis ami
Atayal tay
Paiwan pwn
Bunun bnn
Puyuma pyu
Rukai dru
Tsou tsu
Saisiyat xsy
Yami tao
Thao ssf
Kavalan ckv
Truku trv
Sakizaya ais
Seediq sdq
Hla'alua sxr
Kanakanavu xnb

Commands

# Amis ami
scrapy runspider crawler.py -a lang=ami -o data/ami.jsonlines --logfile ami.log

# Atayal tay
scrapy runspider crawler.py -a lang=tay -o data/tay.jsonlines --logfile tay.log

# Paiwan pwn
scrapy runspider crawler.py -a lang=pwn -o data/pwn.jsonlines --logfile pwn.log

# Bunun bnn
scrapy runspider crawler.py -a lang=bnn -o data/bnn.jsonlines --logfile bnn.log

# Puyuma pyu
scrapy runspider crawler.py -a lang=pyu -o data/pyu.jsonlines --logfile pyu.log

# Rukai dru
scrapy runspider crawler.py -a lang=dru -o data/dru.jsonlines --logfile dru.log

# Tsou tsu
scrapy runspider crawler.py -a lang=tsu -o data/tsu.jsonlines --logfile tsu.log

# Saisiyat xsy
scrapy runspider crawler.py -a lang=xsy -o data/xsy.jsonlines --logfile xsy.log

# Yami tao
scrapy runspider crawler.py -a lang=tao -o data/tao.jsonlines --logfile tao.log

# Thao ssf
scrapy runspider crawler.py -a lang=ssf -o data/ssf.jsonlines --logfile ssf.log

# Kavalan ckv
scrapy runspider crawler.py -a lang=ckv -o data/ckv.jsonlines --logfile ckv.log

# Truku trv
scrapy runspider crawler.py -a lang=trv -o data/trv.jsonlines --logfile trv.log

# Sakizaya ais
scrapy runspider crawler.py -a lang=ais -o data/ais.jsonlines --logfile ais.log

# Seediq sdq
scrapy runspider crawler.py -a lang=sdq -o data/sdq.jsonlines --logfile sdq.log

# Hla'alua sxr
scrapy runspider crawler.py -a lang=sxr -o data/sxr.jsonlines --logfile sxr.log

# Kanakanavu xnb
scrapy runspider crawler.py -a lang=xnb -o data/xnb.jsonlines --logfile xnb.log
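
Since the per-language commands differ only in the language code, they could also be generated by a small script. The following is a rough sketch, not part of the repository: `build_command` and `crawl_all` are hypothetical helper names, and the script assumes it is run from the directory containing crawler.py with scrapy available on the PATH.

```python
import subprocess
from typing import List, Sequence

# The 16 language codes listed in this README.
LANG_CODES = [
    "ami", "tay", "pwn", "bnn", "pyu", "dru", "tsu", "xsy",
    "tao", "ssf", "ckv", "trv", "ais", "sdq", "sxr", "xnb",
]


def build_command(code: str) -> List[str]:
    """Return the argument list matching the per-language command shown in this README."""
    return [
        "scrapy", "runspider", "crawler.py",
        "-a", f"lang={code}",
        "-o", f"data/{code}.jsonlines",
        "--logfile", f"{code}.log",
    ]


def crawl_all(codes: Sequence[str] = LANG_CODES) -> None:
    """Run the crawler once per language, stopping at the first failing run."""
    for code in codes:
        subprocess.run(build_command(code), check=True)
```

Calling `crawl_all()` would run the crawls one after another. Passing a shorter list, such as `crawl_all(["trv", "sdq"])`, limits the run to those languages.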

e-dictionary-crawler's People

Contributors

sih4sing5hong5, thewayiam

e-dictionary-crawler's Issues

Entries do not always have a part of speech

2021-02-09 16:54:22 [scrapy.core.scraper] ERROR: Spider error processing <GET https://e-dictionary.apc.gov.tw/trv/terms/m/115.htm> (referer: https://e-dictionary.apc.gov.tw/trv/terms.htm)
Traceback (most recent call last):
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/crawler.py", line 22, in parse
    yield from self.掠詞條(response)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/crawler.py", line 51, in 掠詞條
    if sului.startswith('詞類:'):
AttributeError: 'NoneType' object has no attribute 'startswith'
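
The traceback indicates that `sului` is `None` when `startswith` is called, so the extracted text needs a `None` check first. One possible guard is sketched below. `parse_word_class` is a hypothetical helper, not code from crawler.py. Accepting both the ASCII and the full-width colon after `詞類` is an assumption made for robustness.

```python
from typing import Optional

# Prefix taken from the traceback; the full-width colon variant is an assumption.
WORD_CLASS_PREFIXES = ("詞類:", "詞類：")


def parse_word_class(text: Optional[str]) -> Optional[str]:
    """Return the part of speech after the prefix, or None if it is absent."""
    if text is None:  # the entry has no such text node at all
        return None
    text = text.strip()
    for prefix in WORD_CLASS_PREFIXES:
        if text.startswith(prefix):
            return text[len(prefix):].strip()
    return None
```

In the spider, the result of `parse_word_class(sului)` could then be stored as an optional field, so that entries without a part of speech are still emitted.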

Entries do not always have an audio file

2021-02-09 16:54:31 [scrapy.core.scraper] ERROR: Spider error processing <GET https://e-dictionary.apc.gov.tw/trv/terms/m/48.htm> (referer: https://e-dictionary.apc.gov.tw/trv/terms.htm)
Traceback (most recent call last):
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/utils/python.py", line 353, in __next__
    return next(self.data)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/offsite.py", line 29, in process_spider_output
    for x in result:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/referer.py", line 340, in <genexpr>
    return (_set_referer(r) for r in result or ())
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/urllength.py", line 37, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/spidermiddlewares/depth.py", line 58, in <genexpr>
    return (r for r in result or () if _filter(r))
  File "/home/fafofafoy/git/E-Dictionary-Crawler/venv/lib/python3.7/site-packages/scrapy/core/spidermw.py", line 62, in _evaluate_iterable
    for r in iterable:
  File "/home/fafofafoy/git/E-Dictionary-Crawler/crawler.py", line 22, in parse
    yield from self.掠詞條(response)
  File "/home/fafofafoy/git/E-Dictionary-Crawler/crawler.py", line 42, in 掠詞條
    'div.main_entry_word > span.volume audio').attrib['src']
KeyError: 'src'
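
The `KeyError` arises because the selector's `.attrib` is a plain mapping with no `src` key for this entry, so indexing it fails. A lookup that tolerates the missing key is sketched below. `optional_attr` is a hypothetical helper, and plain dictionaries stand in for the selector's `.attrib` so that the example runs without Scrapy.

```python
from typing import Mapping, Optional


def optional_attr(attrib: Mapping[str, str], name: str) -> Optional[str]:
    """Return the attribute value, or None when it is missing or empty."""
    value = attrib.get(name)  # .get() does not raise on a missing key
    return value or None
```

In the spider, the indexing at crawler.py line 42 could become `optional_attr(selection.attrib, 'src')`, where `selection` stands for the result of the existing `css(...)` call. Entries without audio would then carry `None` instead of aborting the page.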
