chenjiandongx / 51job-spider
🔎 Scraping and analysis of Python job postings on 51job (前程无忧)
License: MIT License
What visualization tool did you use for the charts? They look great.
How did you deal with the site's anti-scraping measures?
href, post = b.find("a")["href"], b.find("a")["title"]
This line now throws an error.
Could you tell me which encoding this file uses? It is currently not UTF-8.
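With this URL template, the first entry on the first page has a different page structure from all the others, which causes the parsing error. When parsing, either filter it out with a try block or add separate logic to handle it.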
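A minimal sketch of the try-based filtering suggested above, assuming each row is a BeautifulSoup tag like the b in b.find("a")["href"]; parse_rows is a hypothetical helper, not part of the project. When a row has no <a>, find returns None and subscripting it raises TypeError; when the <a> lacks the attribute, the lookup raises KeyError. Both cases are skipped.

```python
from typing import Iterable, List, Tuple


def parse_rows(rows: Iterable) -> List[Tuple[str, str]]:
    """Collect (href, title) pairs, skipping rows whose structure differs."""
    results = []
    for row in rows:
        try:
            link = row.find("a")
            results.append((link["href"], link["title"]))
        except (TypeError, KeyError):
            # No <a> in this row (find returned None), or the <a> lacks
            # href/title, e.g. a header row: skip it instead of crashing.
            continue
    return results
```

The explicit alternative is to check whether row.find("a") is None before indexing; the try version also covers rows whose link is missing an attribute.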
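x + (y - x) * 0.4 should be changed to (x + y) * 0.5 * 0.5. The first 0.5 takes the midpoint of the range; the second 0.5 discounts the froth from fake job postings (an optimistic estimate).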
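The two formulas can be compared side by side. In this sketch x and y are assumed to be the lower and upper bounds of a posted salary range, as the comment implies, and the function names are hypothetical. For a range of 10 to 20, the original formula lands 40% of the way up the range, while the proposed one takes the midpoint and halves it.

```python
def estimate_original(x: float, y: float) -> float:
    """Original: a point 40% of the way from the lower bound to the upper bound."""
    return x + (y - x) * 0.4


def estimate_proposed(x: float, y: float) -> float:
    """Proposed: midpoint of the range, then halved to discount inflated postings."""
    midpoint = (x + y) * 0.5
    return midpoint * 0.5


print(estimate_original(10, 20))  # 14.0
print(estimate_proposed(10, 20))  # 7.5
```

Note that the proposed estimate can fall below the lower bound x, which is the intended effect of the second 0.5.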
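counter[seg] = counter.get(seg, 1) + 1
Shouldn't the default for seg here be 0? There is a + 1 afterwards, so if seg isn't found yet and the default is 1, adding 1 makes it 2, which means a word's value in the dict is already 2 on its first occurrence.
counter[seg] = counter.get(seg, 0) + 1
Shouldn't it be like the line above?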
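A quick check of the fix: with a default of 0, a word's first occurrence is counted as 1. count_words is a hypothetical helper wrapping the corrected line; the standard library's collections.Counter does the same counting and gives the same result.

```python
from collections import Counter


def count_words(segments):
    """Count occurrences of each segmented word."""
    counter = {}
    for seg in segments:
        # Default 0: a new word becomes 0 + 1 = 1, not 1 + 1 = 2.
        counter[seg] = counter.get(seg, 0) + 1
    return counter


words = ["python", "爬虫", "python"]
print(count_words(words))  # {'python': 2, '爬虫': 1}
print(Counter(words) == count_words(words))  # True
```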
How did you do the word segmentation?
2019-09-23 01:04:32,404 - 爬取第 748 条岗位详情
After crawling, it just stalls here, and only post_require_new.txt gets created. I don't quite understand what went wrong.
Partway through, this appeared:
Traceback (most recent call last):
File "src\gevent\greenlet.py", line 766, in gevent._greenlet.Greenlet.run
File "C:/Users/Administrator/Job/job_spider.py", line 106, in post_require
html = resp.content.decode("gbk")
UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 1060: illegal multibyte sequence
2019-09-22T17:03:11Z <Greenlet at 0x178d629c158: <bound method JobSpider.post_require of <__main__.JobSpider object at 0x00000178D6288C88>>> failed with UnicodeDecodeError
Could you help me figure this out?
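One way to keep a single bad byte from killing the greenlet is to decode more defensively. This is a sketch under assumptions, not the project's code: it tries GBK, then GB18030 (a superset of GBK that Python also supports), and finally falls back to GBK with errors="replace" so undecodable bytes become U+FFFD instead of raising. decode_page is a hypothetical helper meant to stand in for the resp.content.decode("gbk") call.

```python
def decode_page(raw: bytes) -> str:
    """Decode a page body, tolerating stray invalid bytes."""
    for encoding in ("gbk", "gb18030"):
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next, broader encoding
    # Last resort: keep going with U+FFFD in place of the bad bytes.
    return raw.decode("gbk", errors="replace")
```

Usage would be html = decode_page(resp.content). The replacement characters may land in the extracted text, so the output file can contain a few garbled symbols instead of the crawl stopping.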
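I modified your code a bit to crawl more data, but after roughly 100 pages I got blocked. After that I couldn't fetch any data at all, and... using a proxy, changing my IP, and changing the browser user agent didn't help either. Is 51job's anti-scraping really that strong?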
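Slowing down is usually worth trying before rotating proxies or user agents. The sketch below retries a fetch with exponential backoff and random jitter; fetch_with_backoff and the fetch callback are hypothetical (for example, a wrapper around requests.get that returns None for a blocked or empty page). Whether this gets past 51job's blocking is untested; it only spaces requests out.

```python
import random
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[str], Optional[str]],
    url: str,
    retries: int = 4,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Try fetch(url) up to `retries` times, waiting longer after each failure."""
    for attempt in range(retries):
        page = fetch(url)
        if page is not None:
            return page
        if attempt < retries - 1:
            # Wait base_delay * 2**attempt seconds plus up to 1 s of jitter,
            # so retries are neither immediate nor evenly spaced.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 1))
    return None
```

The sleep parameter is injectable so the waiting logic can be tested without actually sleeping; in a real crawl the default time.sleep applies.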
Traceback (most recent call last):
File "job_spider.py", line 338, in <module>
spider.run()
File "job_spider.py", line 329, in run
self.job_spider()
File "job_spider.py", line 90, in job_spider
bs = BeautifulSoup(html, "lxml").find("div", class_="dw_table").find_all("div", class_="el")
AttributeError: 'NoneType' object has no attribute 'find_all'
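The AttributeError means find("div", class_="dw_table") returned None: the page that came back had no job-list container, which is what a block page or a changed layout would look like (an inference, not confirmed). A sketch of a guard, with find_job_rows as a hypothetical helper that takes the parsed BeautifulSoup object, lets the crawler notice this instead of crashing:

```python
def find_job_rows(soup):
    """Return the job rows on a listing page, or None if the list is missing."""
    table = soup.find("div", class_="dw_table")
    if table is None:
        # No dw_table container: possibly a block/verification page or a
        # layout change. Signal it instead of raising AttributeError.
        return None
    return table.find_all("div", class_="el")
```

In the crawl loop, a None result is a signal to stop or back off rather than move on to the next page.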