Boss_zhipin_spider

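🔎 Scraping and analysis of Python job postings on BOSS Zhipin 🔎

This project scrapes job postings across China that match the keyword Python on BOSS Zhipin.

Cities that returned no data are excluded from the statistics, so the dataset covers 101 cities and 3,112 records in total, structured as follows:

The fields are self-explanatory. The one worth explaining is pid, the unique id of each job posting, which is used when requesting the posting's detail page.

Do not crawl too fast, or the site will start answering with 403 😏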
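One way to keep the crawl slow is through Scrapy's built-in throttling settings. The fragment below is a minimal sketch for settings.py. The setting names are standard Scrapy options, but the values are illustrative assumptions, not the ones this project actually uses.

```python
# settings.py (sketch): values are illustrative assumptions,
# not this project's actual configuration.

# Wait this many seconds between consecutive requests to the same site.
DOWNLOAD_DELAY = 3

# Randomize the wait (0.5x to 1.5x DOWNLOAD_DELAY) so requests look less regular.
RANDOMIZE_DOWNLOAD_DELAY = True

# Send at most one request at a time to each domain.
CONCURRENT_REQUESTS_PER_DOMAIN = 1

# Let Scrapy adjust the delay based on how quickly the server responds.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 5
AUTOTHROTTLE_MAX_DELAY = 60
```

If 403 responses still appear, raising DOWNLOAD_DELAY is the simplest knob to try first.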

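Project structure

boss招聘.ipynb -> generates the analysis charts

mongo_connect.py -> cleans the data and stores it in MongoDB

pipelines.py -> pipelines that filter the scraped data

spider -> the spider

wordcloud -> generates the word cloud

settings.py -> Scrapy configuration file

middlewares.py -> Scrapy middleware components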
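To illustrate what a filtering pipeline in pipelines.py could look like, here is a minimal sketch that drops postings whose pid has already been seen. The class name `DuplicatePidPipeline` is hypothetical and this is not the project's actual pipeline; it only assumes that each item carries the pid field described above.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:
    # Fallback so the sketch can also be run without Scrapy installed.
    class DropItem(Exception):
        """Stand-in for scrapy.exceptions.DropItem."""


class DuplicatePidPipeline:
    """Hypothetical pipeline: keeps only the first item seen for each pid."""

    def __init__(self):
        # pids of all postings that have already passed through the pipeline
        self.seen_pids = set()

    def process_item(self, item, spider):
        pid = item.get("pid")
        if pid is None:
            # Without a pid the posting cannot be identified, so discard it.
            raise DropItem("missing pid")
        if pid in self.seen_pids:
            raise DropItem(f"duplicate pid: {pid}")
        self.seen_pids.add(pid)
        return item
```

A pipeline like this only takes effect once it is registered under ITEM_PIPELINES in settings.py.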

How to run

pip install -r requirements.txt
scrapy crawl zhipin -o jobs_python.json
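Once the crawl has finished, the exported file can be sanity-checked with a few lines of Python. The sketch below assumes the export is a single JSON array, which is what Scrapy writes for a .json output file. The helper name `count_by_field` is hypothetical, and which fields exist depends on the spider's items.

```python
import json
from collections import Counter


def count_by_field(path, field):
    """Count the records of a Scrapy JSON export by the value of one field."""
    with open(path, encoding="utf-8") as f:
        jobs = json.load(f)  # the export is expected to be one JSON array
    # Records that lack the field are counted under None.
    return Counter(job.get(field) for job in jobs)
```

For example, `len(json.load(open("jobs_python.json", encoding="utf-8")))` gives the number of scraped records, and `count_by_field("jobs_python.json", "city")` would give per-city counts if the items have a field named city (an assumed field name).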

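The charts are drawn with Jupyter Notebook together with echarts (the plotting part was written by my lovely girl, and it is really great). Some sample charts are shown below 🔍

If this project helps you, please consider leaving a small Star 👍

To be added later

  • Automatically recognize the CAPTCHA that a 302 redirect leads to, and enter it
  • Multi-threaded spider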

Contributors

leomalik
