
hydra's Introduction

Hydra (the nine-headed serpent)


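A simple but by no means crude Python 3 crawler project.
Created by following "Building the Perfect Python Project".

Hydra aims to aggregate HG (HelloGitHub) data from multiple platforms with the simplest possible code.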

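In this project you can see practical applications of: familiar basic Python syntax, how to write a crawler, working with a database, commonly used third-party libraries, analyzing web pages, parsing APIs, writing unit tests, mocking requests, exception monitoring and management, automation that safeguards code quality, GitHub Actions, and more.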
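As an illustration of the "mocking requests" item above, here is a minimal sketch of a unit test that replaces the network call with a fake response. It uses only the standard library; `fetch_clicks`, the URL, and the JSON shape are hypothetical and are not taken from this project's code.

```python
import json
import unittest
from unittest import mock
from urllib import request


def fetch_clicks(url: str) -> int:
    """Hypothetical helper: fetch a JSON document and return its click count."""
    with request.urlopen(url) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    return int(payload["clicks_count"])


class FetchClicksTest(unittest.TestCase):
    def test_fetch_clicks_uses_mocked_response(self):
        # Build a fake response object that also works as a context manager.
        fake_resp = mock.MagicMock()
        fake_resp.read.return_value = b'{"clicks_count": 77}'
        fake_resp.__enter__.return_value = fake_resp

        # Patch urlopen so that no real network request is made.
        with mock.patch("urllib.request.urlopen", return_value=fake_resp) as fake_open:
            clicks = fetch_clicks("https://example.invalid/article/1")

        self.assertEqual(clicks, 77)
        fake_open.assert_called_once_with("https://example.invalid/article/1")
```

Saved in a test module, this can be run with `python -m unittest`. Patching at the point of the network call keeps the test fast and independent of the remote platform.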

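This project gathers the account and content data of "HelloGitHub" on each platform, so that our authors can easily see how their work is performing (would you like to submit an article?). Supported platforms: cnblogs, Toutiao, Zhihu, Juejin, Jike, and others.

Would you like to join us?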

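I. Running

Built on Python 3.9.1; in theory 3.7.5+ is supported.

First, download the project: git clone it, or click to download the zip archive.

Then, create a configuration file named .local_env.yaml in the project root directory.

Finally, start playing with it!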

  1. Install poetry: pip install poetry

  2. Install dependencies: run poetry install --no-root in the project root directory

  3. Run a single crawler: poetry run python main.py wechat|cnblogs|toutiao|csdn|zhihu|juejin|jike
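The command in step 3 takes one platform name as its argument. The sketch below shows one way such an entry point could dispatch to a crawler. It is not the project's actual main.py: `make_crawler`, `CRAWLERS`, and the placeholder return value are hypothetical, and only the platform names come from the command above.

```python
import argparse
from typing import Callable, Dict, List, Optional

# Platform names taken from the command in step 3.
PLATFORMS = ["wechat", "cnblogs", "toutiao", "csdn", "zhihu", "juejin", "jike"]


def make_crawler(platform: str) -> Callable[[], str]:
    """Return a placeholder crawler; a real one would fetch and store data."""
    def crawl() -> str:
        return f"crawled {platform}"
    return crawl


# Hypothetical registry mapping each platform name to its crawler function.
CRAWLERS: Dict[str, Callable[[], str]] = {name: make_crawler(name) for name in PLATFORMS}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run a single crawler.")
    # argparse rejects any value that is not a known platform name.
    parser.add_argument("platform", choices=PLATFORMS, help="platform to crawl")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> str:
    args = parse_args(argv)
    return CRAWLERS[args.platform]()
```

Calling `main(["zhihu"])` runs the placeholder crawler for zhihu. An unknown platform name makes argparse print a usage message and exit.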

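If you run into problems or want more details, click here; to contribute code, see here.

II. Demo

For example: view the data for original articles published on a given day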

SELECT
	summary,
	clicks_count,
	platform,
	publish_date
FROM
	hydra_content
WHERE
	content_type = 'article'
AND publish_date = '2021-03-01'
AND (
	is_original = 1
	OR is_original IS NULL
);
+-----------------------------------------+----------------+------------+----------------+
| summary                                 |   clicks_count | platform   | publish_date   |
|-----------------------------------------+----------------+------------+----------------|
| 更新啦!第 59 期《HelloGitHub》开源月刊 |             77 | csdn       | 2021-03-01     |
| 更新啦!第 59 期《HelloGitHub》月刊     |           5133 | wechat     | 2021-03-01     |
| 更新啦!第 59 期《HelloGitHub》开源月刊 |           1022 | cnblogs    | 2021-03-01     |
| 更新啦!第 59 期《HelloGitHub》开源月刊 |           1053 | toutiao    | 2021-03-01     |
| 更新啦!第 59 期《HelloGitHub》开源月刊 |           1879 | zhihu      | 2021-03-01     |
| 更新啦!第 59 期《HelloGitHub》开源月刊 |            931 | juejin     | 2021-03-01     |
+-----------------------------------------+----------------+------------+----------------+
6 rows in set
Time: 0.050s
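The same filter can be run from Python. The following is a minimal sketch that uses an in-memory SQLite database in place of the project's real database. The column names come from the query above, while the column types, the demo rows, and the helper names `build_demo_db` and `original_articles` are assumptions made for illustration.

```python
import sqlite3

# Parameterized version of the query above; the placeholders avoid string formatting.
QUERY = """
SELECT summary, clicks_count, platform, publish_date
FROM hydra_content
WHERE content_type = ?
  AND publish_date = ?
  AND (is_original = 1 OR is_original IS NULL)
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with an assumed schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE hydra_content (
            summary TEXT, clicks_count INTEGER, platform TEXT,
            publish_date TEXT, content_type TEXT, is_original INTEGER
        )
        """
    )
    rows = [
        ("demo post A", 10, "csdn", "2021-03-01", "article", 1),
        ("demo post B", 20, "zhihu", "2021-03-01", "article", None),
        ("demo repost", 30, "juejin", "2021-03-01", "article", 0),
        ("demo post C", 40, "cnblogs", "2021-03-02", "article", 1),
    ]
    conn.executemany("INSERT INTO hydra_content VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def original_articles(conn: sqlite3.Connection, publish_date: str) -> list:
    """Return original (or unflagged) articles published on the given date."""
    return conn.execute(QUERY, ("article", publish_date)).fetchall()
```

With the demo rows, `original_articles(build_demo_db(), "2021-03-01")` keeps posts A and B. It drops the repost, whose `is_original` is 0, and post C, which has a different date.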

III. License

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

hydra's People

Contributors

521xueweihan, zhengxiaotian
