amazon-scrapy's People

Contributors

dg1245, dynamohuang, rangerdong, zch513430014

amazon-scrapy's Issues

Help in running the code

Hi @dynamohuang, I am new to Python, Scrapy, and MySQL anyway. I have finished setting up the working environment for running your code (i.e. a MySQL backend, and I ran the scripts under amazon/db to create the tables accordingly).

However, how can I start scraping the best sellers' ASINs? What I do now is run main.py, but that results in 0 items fetched. Could you please walk me through the working process?
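For reference, the best-seller listings the ASIN spider fetches are paged AJAX URLs of the shape quoted later in this thread. A minimal sketch of building them (the function name is made up; the category slug comes from the URLs posted below):

```python
# Sketch of the URL shape the best-seller listing pages use, based on the
# URLs quoted in this thread. `slug` is the category path, e.g.
# "best-sellers-video-games/zgbs/videogames".
def bestseller_page_url(slug: str, page: int) -> str:
    """Build the AJAX-paged best-seller listing URL for one category page."""
    return "https://www.amazon.com/{}/?ajax=1&pg={}".format(slug, page)
```

Note that 0 items fetched often simply means the seed tables the spiders read from are still empty.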

Question about Amazon's anti-scraping mechanism

At the moment I rotate the User-Agent dynamically and add a delay, but a CAPTCHA still pops up. Does anyone have a better approach?
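One common mitigation beyond a fixed delay is to rotate the User-Agent per request in a downloader middleware and throttle concurrency. A minimal sketch (the class follows Scrapy's `process_request` middleware hook; the UA strings are placeholders, in practice use a longer, current list):

```python
import random

# Placeholder User-Agent strings; swap in a longer, up-to-date list.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) AppleWebKit/537.36",
]

class RandomUserAgentMiddleware:
    """Scrapy-style downloader middleware: pick a random UA for each request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
```

Enable it via `DOWNLOADER_MIDDLEWARES` in settings.py, and combine it with `DOWNLOAD_DELAY`, `RANDOMIZE_DOWNLOAD_DELAY`, and `AUTOTHROTTLE_ENABLED` to keep the request rate low.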

Search keywords

Hello there. There is missing information about how exactly the search keywords work: there are a lot of tables, but no data to put into them, and because of this the code doesn't run.

I'd appreciate it if you could help or give advice on how to run it, because the spiders expect a keyword and an ASIN in the database, but no database dump exists in the repo.
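Since the spiders read their seeds from the database, the seed table has to be populated by hand before a run. A sketch of what that seeding looks like (the table and column names `keyword`, `word`, `priority` are guesses, not the repo's actual schema; the real project uses MySQL, and SQLite is used here only to keep the example self-contained):

```python
import sqlite3

# Illustrative only: the real project targets MySQL; an in-memory SQLite DB
# keeps this sketch runnable. Schema names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT, priority INTEGER)")
seeds = [("wireless mouse", 1), ("yoga mat", 1)]
conn.executemany("INSERT INTO keyword (word, priority) VALUES (?, ?)", seeds)
conn.commit()
```

Check amazon/db in the repo for the actual table definitions and seed those instead.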

User timeout caused connection failure.

I don't know whether this is because Amazon has detected our bot and blocked the IP.

But
https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3
indeed doesn't exist; there is no page 3 there.

https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2
is correct; I can open it with the Chrome browser.

How can I set up a proxy?
Also, because of this error, will it lose all the data, even the data already fetched from previous pages?

twisted.internet.error.TimeoutError: User timeout caused connection failure.
2018-11-19 23:40:32 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy 46.38.52.36:8081 [{'status': 400, 'reason': b'Bad Request'}]
2018-11-19 23:40:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 320, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 took longer than 30.0 seconds..
2018-11-19 23:41:51 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-software/zgbs/software/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.TimeoutError: User timeout caused connection failure.
(1030, 'Got error 168 from storage engine')
total spent: 0:52:23.652052
done
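On the proxy question above: Scrapy picks up a per-request proxy from `request.meta['proxy']`, so a small downloader middleware can rotate proxies. A minimal sketch (the proxy addresses are placeholders, not working proxies):

```python
import random

# Placeholder addresses; replace with working proxies you control.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
]

class RandomProxyMiddleware:
    """Scrapy-style downloader middleware: attach a random proxy to each request."""

    def process_request(self, request, spider):
        request.meta["proxy"] = random.choice(PROXIES)
```

As for losing data: Scrapy retries failed requests up to `RETRY_TIMES`, and items that already went through the pipeline into the database are already persisted; only the pages that ultimately failed are missing, not the whole crawl.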

How do I run the keyword and review spiders?

I'm using Scrapy 1.5. There were some errors, mainly UTF-8 encoding and other issues; it now runs normally, but I still have a few questions:
1. The product spider has no item-saving code or SQL. I plan to write it myself; will this be updated later, or is it left for users to complete as needed?
2. The asin, cate, and detail spiders can each run independently. How are the keyword and review spiders executed? Running them on their own complains about insufficient arguments.
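On question 2: Scrapy forwards `-a name=value` command-line arguments to the spider's `__init__`, which is the usual reason a spider complains about missing arguments when run bare. A sketch of the pattern (the argument name `key` is a guess; check the actual spider's `__init__` for the real names):

```python
class KeywordSpider:  # the real spider subclasses scrapy.Spider
    """Sketch of a spider that requires a -a argument to run."""
    name = "keyword"

    def __init__(self, key=None, *args, **kwargs):
        # Invoked as: scrapy crawl keyword -a key="wireless mouse"
        if key is None:
            raise ValueError("missing argument: scrapy crawl keyword -a key=<search term>")
        self.key = key
```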

ModuleNotFoundError when trying to use amazon asin spider

Hi, I would like to use this repo to get some info on Amazon products. I'm not very familiar with Scrapy (yet), and here's what I did:
- git clone your project
- install the requirements
- cd amazon-scrapy/amazon
- scrapy crawl asin

I get the following error:

Traceback (most recent call last):
  File "/home/user/miniconda3/envs/scrap/bin/scrapy", line 10, in <module>
    sys.exit(execute())
  File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/cmdline.py", line 109, in execute
    settings = get_project_settings()
  File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/utils/project.py", line 68, in get_project_settings
    settings.setmodule(settings_module_path, priority='project')
  File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/settings/__init__.py", line 292, in setmodule
    module = import_module(module)
  File "/home/user/miniconda3/envs/scrap/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'amazon.settings'

Any idea how to fix that?
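`No module named 'amazon.settings'` usually means `scrapy crawl` was started from a directory where the `amazon` package can't be imported; it should be run from the project root containing scrapy.cfg (here `amazon-scrapy/`), not from the inner `amazon/` package directory. A small sanity check, as a sketch (the helper name is made up):

```python
import os

def can_import_settings(project_root):
    """True if amazon/settings.py sits under project_root, i.e. running
    scrapy from project_root could import the 'amazon.settings' module."""
    return os.path.isfile(os.path.join(project_root, "amazon", "settings.py"))
```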

Scraping an ASIN's rank for a specific keyword

Hello.
I'm an e-commerce beginner, and I've recently been trying to scrape the rank of an ASIN for a given keyword on Amazon.
I got stuck at the very first step: I can't find the cookie in Amazon's network requests, only a User-Agent and some Accept headers.
Do you scrape via cookies, or through some other means? I'd be grateful for any pointers when you have time.
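For what it's worth, cookies are usually not required for this flow: the rank is just the 1-based position of the target ASIN in the parsed search-result list, and a plain request with a User-Agent header is normally enough to fetch the pages. A sketch of the rank step once the ASINs have been scraped in order (the function name is made up):

```python
def asin_rank(result_asins, target_asin):
    """Return the 1-based position of target_asin in a scraped search-result
    ASIN list, or None if it did not appear on the pages scraped."""
    for position, asin in enumerate(result_asins, start=1):
        if asin == target_asin:
            return position
    return None
```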
