wanzixin / sinaweibo-locationsignin-spider
Crawls Sina Weibo mobile-site POIs, and the Weibo posts under each POI, city by city.
Traceback (most recent call last):
File "C:/Users/XXJ/PycharmProjects/pythonProject1/poicrawler/crawler.py", line 254, in
main()
File "C:/Users/XXJ/PycharmProjects/pythonProject1/poicrawler/crawler.py", line 234, in main
spider.get_poi(ippool)
File "C:/Users/XXJ/PycharmProjects/pythonProject1/poicrawler/crawler.py", line 64, in get_poi
pois_id.append(poi_id.group())
AttributeError: 'NoneType' object has no attribute 'group'
I ran the code and got the error above. Does anyone know the cause? The error points to this section:
res = requests.get(cityURL + '&page=' + str(page), proxies=proxy_ip, headers=headers)
if res.status_code == 200:
    info = json.loads(res.text)
    if info['ok'] == 1:
        card_group = info['data']['cards'][0]['card_group']
        print(card_group)
        print(len(card_group))
        for i in range(0, len(card_group)):
            poi_id = re.search(r'100101B2094[A-Z0-9]{15}', card_group[i]['scheme'])
            pois_id.append(poi_id.group())
            pois_name.append(card_group[i]['title_sub'])
    else:
        print('All POIs for this city have been crawled.')
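The AttributeError happens because re.search returns None whenever a card's scheme field contains no matching POI id (for example, recommendation or ad cards mixed into card_group). A minimal sketch of a guarded version, using made-up sample data in place of the real API response:

```python
import re

# Hypothetical card_group entries; real entries come from the Weibo mobile
# API, and some of them have no POI id in their scheme, which is what makes
# re.search return None and raises the AttributeError.
card_group = [
    {"scheme": "sinaweibo://...containerid=100101B2094ABCDE12345FGHIJ",
     "title_sub": "Some POI"},
    {"scheme": "sinaweibo://...no-poi-id-here",
     "title_sub": "Not a POI card"},
]

pois_id, pois_name = [], []
for card in card_group:
    match = re.search(r'100101B2094[A-Z0-9]{15}', card.get('scheme', ''))
    if match:  # skip cards whose scheme has no POI id instead of crashing
        pois_id.append(match.group())
        pois_name.append(card['title_sub'])
```

With the guard in place, non-POI cards are silently skipped and the loop continues.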
As it stands, the code fetches only 10 pages of POIs per city. Is there a way to extend that? Thanks!
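One way to lift the fixed 10-page limit is to keep requesting pages until the API reports `ok != 1`, which the code above already treats as "no more POIs for this city". A sketch under that assumption, with a fake fetcher standing in for the real `requests.get` call:

```python
import itertools

def crawl_all_pages(fetch_page, max_pages=1000):
    """Collect card_group entries page by page until the API signals the end.

    fetch_page(page) is assumed to return the parsed JSON dict with the same
    'ok' / 'data' / 'cards' layout the existing script handles.
    """
    cards = []
    for page in itertools.count(1):
        if page > max_pages:  # safety cap so a bad response cannot loop forever
            break
        info = fetch_page(page)
        if info.get('ok') != 1:
            break  # Weibo returns ok != 1 once the city has no more POI pages
        cards.extend(info['data']['cards'][0]['card_group'])
    return cards

# Fake fetcher: three pages of data, then an end-of-data response.
def fake_fetch(page):
    if page <= 3:
        return {'ok': 1, 'data': {'cards': [{'card_group': ['card-%d' % page]}]}}
    return {'ok': 0}
```

In the real script, `fetch_page` would wrap the existing `requests.get(cityURL + '&page=' + str(page), ...)` call.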
Hello, the Xici proxy source used in the script is no longer available, and even after swapping in a different proxy the crawler still fails to work correctly. Hope the author can update it when you have time.
Could you switch to a proxy source that currently works? Many thanks!
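Until the script is updated, one stopgap is to test each candidate proxy before handing it to the crawler. A stdlib-only sketch (the probe URL is an assumption; any stable HTTP page works):

```python
import urllib.request

def to_proxies(ip_port):
    """Turn a 'host:port' string into the proxies mapping requests expects."""
    return {'http': ip_port, 'https': ip_port}

def proxy_alive(ip_port, test_url='http://httpbin.org/ip', timeout=5):
    """Return True if the proxy answers the probe within the timeout.

    This performs a real network request through the proxy, so dead or
    banned proxies are filtered out before the crawler ever uses them.
    """
    handler = urllib.request.ProxyHandler(to_proxies(ip_port))
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(test_url, timeout=timeout)
        return True
    except Exception:
        return False
```

Filtering the pool with `proxy_alive` before building `ippool` would at least separate "dead proxy" failures from real bugs in the crawler.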
Hello, how should I resolve the following error when scraping proxies?
----------------IP used to scrape proxies: {'http': '223.241.119.42:47972'} --------------------
Traceback (most recent call last):
File "D:\Program Files\python36\lib\urllib\request.py", line 1318, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "D:\Program Files\python36\lib\http\client.py", line 1239, in request
self._send_request(method, url, body, headers, encode_chunked)
File "D:\Program Files\python36\lib\http\client.py", line 1285, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "D:\Program Files\python36\lib\http\client.py", line 1234, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "D:\Program Files\python36\lib\http\client.py", line 1026, in _send_output
self.send(msg)
File "D:\Program Files\python36\lib\http\client.py", line 964, in send
self.connect()
File "D:\Program Files\python36\lib\http\client.py", line 1392, in connect
super().connect()
File "D:\Program Files\python36\lib\http\client.py", line 936, in connect
(self.host,self.port), self.timeout, self.source_address)
File "D:\Program Files\python36\lib\socket.py", line 724, in create_connection
raise err
File "D:\Program Files\python36\lib\socket.py", line 713, in create_connection
sock.connect(sa)
socket.timeout: timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 67, in get
context=context,
File "D:\Program Files\python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "D:\Program Files\python36\lib\urllib\request.py", line 526, in open
response = self._open(req, data)
File "D:\Program Files\python36\lib\urllib\request.py", line 544, in _open
'_open', req)
File "D:\Program Files\python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "D:\Program Files\python36\lib\urllib\request.py", line 1361, in https_open
context=self._context, check_hostname=self._check_hostname)
File "D:\Program Files\python36\lib\urllib\request.py", line 1320, in do_open
raise URLError(err)
urllib.error.URLError:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/Pycharm/weiboqiandao/crawler.py", line 255, in
main()
File "E:/Pycharm/weiboqiandao/crawler.py", line 229, in main
ippool = build_ippool()
File "E:\Pycharm\weiboqiandao\buildip.py", line 82, in build_ippool
results = p.get_proxy(page)
File "E:\Pycharm\weiboqiandao\buildip.py", line 37, in get_proxy
res = requests.get(url, proxies=proxy_ip, headers={'User-Agent': UserAgent(use_cache_server=False).random})
File "D:\Program Files\python36\lib\site-packages\fake_useragent\fake.py", line 69, in __init__
self.load()
File "D:\Program Files\python36\lib\site-packages\fake_useragent\fake.py", line 78, in load
verify_ssl=self.verify_ssl,
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 250, in load_cached
update(path, use_cache_server=use_cache_server, verify_ssl=verify_ssl)
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 245, in update
write(path, load(use_cache_server=use_cache_server, verify_ssl=verify_ssl))
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 178, in load
raise exc
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 154, in load
for item in get_browsers(verify_ssl=verify_ssl):
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 97, in get_browsers
html = get(settings.BROWSERS_STATS_PAGE, verify_ssl=verify_ssl)
File "D:\Program Files\python36\lib\site-packages\fake_useragent\utils.py", line 84, in get
raise FakeUserAgentError('Maximum amount of retries reached')
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached
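The `FakeUserAgentError` means fake_useragent could not reach its online browser-stats source (the `socket.timeout` above). A common workaround is to fall back to a small local list of User-Agent strings when the library fails; the strings below are illustrative placeholders:

```python
import random

# Illustrative User-Agent strings; refresh them by hand from time to time.
FALLBACK_UAS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 '
    '(KHTML, like Gecko) Version/17.0 Safari/605.1.15',
]

def random_user_agent():
    """Try fake_useragent first; fall back to the local list when its
    remote data source is unreachable (the error in the traceback above)."""
    try:
        from fake_useragent import UserAgent
        return UserAgent().random
    except Exception:
        return random.choice(FALLBACK_UAS)
```

Replacing the `UserAgent(...).random` call in `buildip.py` with `random_user_agent()` lets the proxy scraper keep running even when fake_useragent's server times out.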
Could you maintain and update this code? Is it still usable for crawling? I tried it and could not crawl anything. Thanks in advance!
Dear developer, hello!
I've recently been following your project in order to analyze location data, but I ran into the following error while running it: