dynamohuang / amazon-scrapy
Scrape the details and lowest price of Amazon best-seller products with a Python spider.
Hi @dynamohuang, I am new to Python, Scrapy, and MySQL. I have finished setting up the working environment for running your code (i.e. a MySQL backend, and the scripts under amazon/db to create the tables accordingly).
However, how can I start scraping the best sellers' ASINs? What I do now is run main.py, but it results in 0 items fetched. Could you please walk me through the workflow?
At the moment I rotate the User-Agent dynamically and add a delay, but a captcha still pops up.
Does anyone have a better approach?
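The User-Agent rotation described above can be sketched as a Scrapy downloader middleware. This is a minimal illustration, not the repo's actual code; the UA strings are examples, and rotation alone usually needs to be combined with delays and/or proxies to avoid the captcha.

```python
# A minimal sketch of random User-Agent rotation as a Scrapy downloader
# middleware.  The UA strings below are examples.
import random

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
]

class RandomUserAgentMiddleware:
    """Pick a random User-Agent for every outgoing request."""

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(USER_AGENTS)
        return None  # returning None lets Scrapy keep processing the request
```

It would be enabled via DOWNLOADER_MIDDLEWARES in settings.py, e.g. {"amazon.middlewares.RandomUserAgentMiddleware": 400} (the module path here is an assumption).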
If the Amazon location address is **, more than half of the products will not appear on the product search page.
Hello, there is missing information about how exactly the search keywords work. There are a lot of tables but no data to put into them, and because of this the code doesn't run.
I'd appreciate any help or advice on how to run it: the code expects to read a keyword and an ASIN from the database, but no seed data exists in the repo.
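Since the spiders read their seeds from the database, one would need to insert keyword/ASIN rows before crawling. The sketch below is hypothetical: the table and column names are guesses (check the SQL files under amazon/db for the real schema), the seed values are placeholders, and sqlite3 stands in for MySQL only so the sketch is self-contained.

```python
# Hypothetical seeding script.  Table/column names are assumptions --
# check the SQL under amazon/db for the real schema.  sqlite3 is a
# stand-in here; the project itself uses MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE keyword (id INTEGER PRIMARY KEY, keyword TEXT)")
conn.execute("CREATE TABLE asin (id INTEGER PRIMARY KEY, asin TEXT)")

# Placeholder seed values, not real product identifiers.
conn.executemany("INSERT INTO keyword (keyword) VALUES (?)",
                 [("wireless mouse",), ("usb c cable",)])
conn.execute("INSERT INTO asin (asin) VALUES (?)", ("B000000000",))
conn.commit()
```

With rows like these in place, the keyword and asin spiders would have something to pick up on their first run.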
I don't know if it's because Amazon has detected our bot and blocked the IP?
But https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3 indeed doesn't exist; there is no page 3 there.
https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 is correct; I can open it in the Chrome browser.
How can I set up the proxy?
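On the proxy question: one common approach (a sketch, not this repo's actual middleware) is to set request.meta["proxy"] in a downloader middleware. The proxy URL below is a placeholder.

```python
# Sketch: route requests through a proxy by setting request.meta["proxy"]
# in a downloader middleware.  The endpoint below is a placeholder.
class ProxyMiddleware:
    PROXY = "http://127.0.0.1:8080"  # replace, e.g. http://user:pass@host:port

    def process_request(self, request, spider):
        request.meta["proxy"] = self.PROXY
```

It would be enabled via DOWNLOADER_MIDDLEWARES in settings.py. Note that for https:// targets Scrapy opens a CONNECT tunnel through the HTTP proxy, which is why an http://ip:port proxy URL still works for https pages.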
Because of this error, will it lose all the data, even though it already got some data from previous pages?
twisted.internet.error.TimeoutError: User timeout caused connection failure.
2018-11-19 23:40:32 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-video-games/zgbs/videogames/?ajax=1&pg=3>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
scrapy.core.downloader.handlers.http11.TunnelError: Could not open CONNECT tunnel with proxy 46.38.52.36:8081 [{'status': 400, 'reason': b'Bad Request'}]
2018-11-19 23:40:36 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/handlers/http11.py", line 320, in _cb_timeout
    raise TimeoutError("Getting %s took longer than %s seconds." % (url, timeout))
twisted.internet.error.TimeoutError: User timeout caused connection failure: Getting https://www.amazon.com/Best-Sellers-Sports-Outdoors/zgbs/sporting-goods/?ajax=1&pg=2 took longer than 30.0 seconds..
2018-11-19 23:41:51 [scrapy.core.scraper] ERROR: Error downloading <GET https://www.amazon.com/best-sellers-software/zgbs/software/?ajax=1&pg=2>
Traceback (most recent call last):
  File "/home/john/anaconda2/envs/amazon-scrapy/lib/python3.6/site-packages/scrapy/core/downloader/middleware.py", line 43, in process_request
    defer.returnValue((yield download_func(request=request,spider=spider)))
twisted.internet.error.TimeoutError: User timeout caused connection failure.
(1030, 'Got error 168 from storage engine')
total spent: 0:52:23.652052
done
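On losing data to timeouts: items already written by the pipeline from earlier pages are not rolled back; only the page whose request failed is lost unless it is retried. Scrapy's built-in retry middleware can re-queue such requests. A sketch of the relevant settings.py knobs (the values are illustrative, not the repo's):

```python
# Illustrative retry/timeout settings for settings.py.  A request that
# times out is retried up to RETRY_TIMES before being given up on,
# rather than silently dropped.
RETRY_ENABLED = True
RETRY_TIMES = 5                                    # Scrapy's default is 2
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]  # server errors + throttling
DOWNLOAD_TIMEOUT = 60                              # the log above shows a 30s limit
```

Raising DOWNLOAD_TIMEOUT helps with slow proxies; dropping dead proxies (like the one returning 400 on CONNECT above) helps more.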
How can using http + ip_port as the proxy take effect when visiting https sites?
I'm using Scrapy 1.5. There were some errors, mainly UTF-8 encoding and other issues; it now runs normally, but I still have a few questions:
1. product has no code or SQL for saving items. I plan to write it myself; will a later update add it, or is it left for users to complete as needed?
2. The asin, cate, and detail spiders can each run independently. How should keyword, review, etc. be executed? Running them independently reports insufficient arguments.
Hi, I would like to use this repo to get some info on Amazon products. I'm not very familiar with Scrapy (yet); here's what I did:
- git clone your project
- install the requirements
- cd amazon-scrapy/amazon
- scrapy crawl asin
I get the following error:
Traceback (most recent call last):
File "/home/user/miniconda3/envs/scrap/bin/scrapy", line 10, in <module>
sys.exit(execute())
File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/cmdline.py", line 109, in execute
settings = get_project_settings()
File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/utils/project.py", line 68, in get_project_settings
settings.setmodule(settings_module_path, priority='project')
File "/home/user/miniconda3/envs/scrap/lib/python3.7/site-packages/scrapy/settings/__init__.py", line 292, in setmodule
module = import_module(module)
File "/home/user/miniconda3/envs/scrap/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'amazon.settings'
Any idea how to fix that?
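One likely cause (a guess, not confirmed against this exact layout): `scrapy crawl` resolves `amazon.settings` through scrapy.cfg, so it has to be run from the directory that contains scrapy.cfg, not from a subdirectory. With the real repo that would be `cd amazon-scrapy && scrapy crawl asin`; the lines below only demonstrate the cfg-lookup idea with a throwaway skeleton.

```shell
# Scrapy reads scrapy.cfg to find the settings module, so the crawl must
# be launched from the directory where scrapy.cfg lives.  This skeleton
# mimics the repo layout as an assumption.
proj=$(mktemp -d)
printf '[settings]\ndefault = amazon.settings\n' > "$proj/scrapy.cfg"
cd "$proj"
grep -q 'amazon.settings' scrapy.cfg && echo "scrapy.cfg found: crawl from here"
```

If scrapy.cfg is actually inside amazon-scrapy/amazon, run the crawl from there instead; the key point is matching the working directory to scrapy.cfg.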
Hello.
I'm an e-commerce beginner; recently I've been working on scraping the keyword-to-ASIN rankings from Amazon pages.
I'm stuck at the very first step: I can't find the cookie in Amazon's network tab, only a User-Agent and some Accept headers.
Do you crawl via cookies, or through some other means? I'd be grateful for some pointers when you have time.
Hi, thanks for open-sourcing the project.
When will the Amazon review spider be available?
I would like to get some data about Amazon product reviews.
Thanks.
John