
bthello-app's Introduction

BTHello: Python 3 DHT magnet-link crawler

Introduction

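This is a magnet-link crawler. It poses as a DHT node, receives the request messages that other nodes send to it, and extracts the magnet links they contain. It then implements the BitTorrent BEP-9 protocol to fetch each torrent's metadata (its file information) and stores that information in Redis.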
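As a hedged illustration of the first step (not code from bthello), the sketch below decodes a bencoded DHT KRPC query as defined in BEP-5 and, for `get_peers` and `announce_peer` queries, turns the 20-byte `info_hash` argument into a magnet link. `bdecode` and `magnet_from_krpc` are hypothetical names, and fetching the metadata itself over BEP-9 is not shown.

```python
# Hypothetical helpers for illustration; not taken from bthello.
import binascii


def bdecode(data: bytes):
    """Decode a bencoded value (integer, byte string, list or dict)."""

    def parse(i: int):
        c = data[i:i + 1]
        if c == b"i":  # integer: i<digits>e
            end = data.index(b"e", i)
            return int(data[i + 1:end]), end + 1
        if c == b"l":  # list: l<items>e
            i += 1
            items = []
            while data[i:i + 1] != b"e":
                item, i = parse(i)
                items.append(item)
            return items, i + 1
        if c == b"d":  # dict: d<key><value>...e, keys are byte strings
            i += 1
            result = {}
            while data[i:i + 1] != b"e":
                key, i = parse(i)
                value, i = parse(i)
                result[key] = value
            return result, i + 1
        if c.isdigit():  # byte string: <length>:<bytes>
            colon = data.index(b":", i)
            start = colon + 1
            end = start + int(data[i:colon])
            if end > len(data):
                raise ValueError("truncated byte string")
            return data[start:end], end
        raise ValueError(f"invalid bencode at offset {i}")

    value, end = parse(0)
    if end != len(data):
        raise ValueError("trailing bytes after bencoded value")
    return value


def magnet_from_krpc(packet: bytes):
    """Return a magnet link for get_peers/announce_peer queries, else None."""
    msg = bdecode(packet)
    if not isinstance(msg, dict) or msg.get(b"y") != b"q":  # "q" marks a query
        return None
    if msg.get(b"q") not in (b"get_peers", b"announce_peer"):
        return None
    args = msg.get(b"a")
    if not isinstance(args, dict):
        return None
    info_hash = args.get(b"info_hash")
    if not isinstance(info_hash, bytes) or len(info_hash) != 20:
        return None
    return "magnet:?xt=urn:btih:" + binascii.hexlify(info_hash).decode("ascii")
```

Queries such as `ping` carry no infohash and return `None`; only the two lookup-related queries yield a link.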

Repositories

bthello: the crawler

bthello-app: the ingestion program & web search

Tech stack

Redis stores the basic torrent information, keyed by infohash, which avoids duplicates.
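A minimal sketch of the idea, assuming the redis-py client (`redis.Redis`) and JSON-encoded metadata; `normalize_infohash` and `store_torrent` are hypothetical helpers, not bthello code. Because the infohash is the key, storing the same torrent twice overwrites one entry instead of creating a duplicate.

```python
import json


def normalize_infohash(infohash: str) -> str:
    """Hypothetical helper: lowercase hex so 'ABCD...' and 'abcd...' share a key."""
    infohash = infohash.strip().lower()
    if len(infohash) != 40 or any(c not in "0123456789abcdef" for c in infohash):
        raise ValueError("expected a 40-character hex infohash")
    return infohash


def store_torrent(client, infohash: str, metadata: dict) -> str:
    """Hypothetical helper: store metadata under its infohash.

    `client` is assumed to behave like redis.Redis; only `set` is used,
    and a repeat write for the same infohash replaces the old value.
    """
    key = normalize_infohash(infohash)
    client.set(key, json.dumps(metadata, ensure_ascii=False))
    return key
```

In practice `client` would be something like `redis.Redis(host=REDIS_HOST, port=REDIS_PORT, db=0)`.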

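Elasticsearch provides the search feature for users.

About the ingestion program

The ingestion program periodically pulls data from Redis and syncs it to Elasticsearch, using the infohash as the ES document id to avoid duplicates. When a document with the same id is indexed again, ES increments its version, so a higher version means the torrent has been seen more often, i.e. it is more popular.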
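The de-duplication hinges on using the infohash as the document `_id`. The sketch below builds actions in the format accepted by `elasticsearch.helpers.bulk` from elasticsearch-py; whether bthello-app uses the bulk helper is an assumption, and `to_bulk_actions` is a hypothetical name. The index name and type default to the values in the sample `config.py`.

```python
def to_bulk_actions(records, index_name="bt_metadata", doc_type="doc"):
    """Hypothetical helper: turn {infohash: metadata} into bulk index actions.

    With the infohash as _id, a torrent seen again overwrites its existing
    document (and Elasticsearch bumps its _version) instead of adding a new one.
    """
    for infohash, metadata in records.items():
        yield {
            "_op_type": "index",
            "_index": index_name,
            "_type": doc_type,  # mapping types still exist in Elasticsearch 6.x
            "_id": infohash,
            "_source": metadata,
        }
```

Usage would look like `helpers.bulk(Elasticsearch([{"host": ELASTICS_HOST, "port": ELASTICS_PORT}]), to_bulk_actions(records))`.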

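BTHello installation

Notes

  • The crawler and the ingestion & web search program are two separate projects and can be deployed separately, depending on your needs.
  • For example, with three servers a, b and c, you could run the crawler, the ingestion program and the web search on one server each.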

Requirements

  • Python 3.x
  • Redis 4.x
  • Elasticsearch 6.x, with the ik analysis plugin installed
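The ik plugin supplies the `ik_max_word` and `ik_smart` analyzers. As a sketch only (bthello-app's real mapping lives in its `metadata_storage.py` and may differ), an Elasticsearch 6.x index body for type `doc` using them could look like this. `file_list` is the field named in the error reported under Issues below; `name`, `length` and `build_index_mappings` are hypothetical.

```python
import json


def build_index_mappings(doc_type="doc"):
    """Hypothetical helper: an index body whose text fields use ik analyzers.

    Creating an index with this body fails with
    "analyzer [ik_max_word] not found" if the ik plugin is missing.
    """
    text_field = {
        "type": "text",
        "analyzer": "ik_max_word",      # fine-grained segmentation when indexing
        "search_analyzer": "ik_smart",  # coarser segmentation for queries
    }
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    "name": text_field,          # hypothetical field
                    "file_list": text_field,     # field named in the reported error
                    "length": {"type": "long"},  # hypothetical field
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_mappings(), indent=2))
```

Such a body would be passed as `es.indices.create(index="bt_metadata", body=build_index_mappings())`.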

Deploying the crawler

git clone https://github.com/xieh1995/bthello.git
cd bthello

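# edit the Redis settings
vi config.py

# Redis host
REDIS_HOST = "your Redis IP"
# Redis port
REDIS_PORT = 6379  # replace with your Redis port if it differs from the default

# install dependencies
pip3 install -r requirements.txt
# run
python3 run.py

# run in the background
nohup python3 run.py &
# follow the log
tail -f nohup.out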

After a successful start, wait a few minutes until output like the following appears:

[Screenshot: crawler output]

Once it does, the crawler is running. You can also check whether Redis database 0 (redis[0]) contains data.
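To check from Python instead of redis-cli, a sketch assuming redis-py: `dbsize()` returns the number of keys in the selected database, and `scan_iter` walks keys incrementally without blocking the server. `summarize_db` is a hypothetical helper.

```python
def summarize_db(client, sample_size=5):
    """Hypothetical helper: return (key_count, sample_keys) for client's db.

    `client` is assumed to behave like redis.Redis(host=..., port=..., db=0).
    """
    count = client.dbsize()
    sample = []
    for key in client.scan_iter(count=100):  # SCAN in batches of ~100 keys
        sample.append(key)
        if len(sample) >= sample_size:
            break
    return count, sample
```

A non-zero count after a few minutes suggests the crawler is writing infohashes.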

Deploying the ingestion program & web search

git clone https://github.com/xieh1995/bthello-app.git
cd bthello-app

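# edit the Redis and Elasticsearch settings
vi config.py

# Redis host
REDIS_HOST = "your Redis IP"
# Redis port
REDIS_PORT = 6379  # replace with your Redis port if it differs from the default

# Elasticsearch index name
ELASTICS_INDEX_NAME = 'bt_metadata'
# Elasticsearch index type
ELASTICS_INDEX_TYPE = 'doc'
# Elasticsearch host
ELASTICS_HOST = "your ES IP"
# Elasticsearch port
ELASTICS_PORT = 9200  # replace with your ES port if it differs from the default

# install dependencies
pip3 install -r requirements.txt

# command-line options
-w		# start web search
-m		# start the ingestion program
-a		# start both web search and the ingestion program
-port	# web search port, default 8000

# start whichever parts you need
python3 main.py -m
python3 main.py -w -port=80
python3 main.py -a -port=80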
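bthello-app's own option handling is not shown in this README. As a hypothetical re-creation, the options above map onto the standard `argparse` module as follows; argparse accepts single-dash long options such as `-port`, including the `-port=80` form. `parse_args` and the `dest` names are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser for the -w, -m, -a and -port options."""
    parser = argparse.ArgumentParser(description="bthello-app launcher (sketch)")
    parser.add_argument("-w", dest="web", action="store_true",
                        help="start web search")
    parser.add_argument("-m", dest="ingest", action="store_true",
                        help="start the ingestion program")
    parser.add_argument("-a", dest="all", action="store_true",
                        help="start both web search and the ingestion program")
    parser.add_argument("-port", dest="port", type=int, default=8000,
                        help="web search port (default 8000)")
    args = parser.parse_args(argv)
    if args.all:  # -a is shorthand for -w plus -m
        args.web = args.ingest = True
    if not (args.web or args.ingest):
        parser.error("choose at least one of -w, -m or -a")
    return args
```

For example, `parse_args(["-a", "-port=80"])` enables both components on port 80.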

[Screenshot: log of a successful ingestion run]

[Screenshot: web search page after a successful start]

The web search is then reachable at ip:port.

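BTHello FAQ

How long until data appears?

Usually within 1 to 20 minutes. The server must be reachable from the public internet. Crawled data is stored in Redis database 0 (redis[0]).

How long until Elasticsearch has data?

Normally, data appears almost immediately once the ingestion program is running, since it runs its task every 2 seconds, provided redis[0] has data. Ingested data is also copied into redis[1].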
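A sketch of one pass of that loop, under stated assumptions: redis-py clients for databases 0 and 1, an `index_fn` callback that writes a batch to Elasticsearch (for example via `elasticsearch.helpers.bulk`), and deletion of keys from db 0 once they are copied to db 1. That deletion step is an assumption about bthello-app, not something the README states; `sync_once` and `run_forever` are hypothetical names.

```python
import time


def sync_once(src, dst, index_fn, batch_size=500):
    """Hypothetical: move up to batch_size keys from src (db 0) to dst (db 1)."""
    keys = []
    for key in src.scan_iter(count=batch_size):
        keys.append(key)
        if len(keys) >= batch_size:
            break
    if not keys:
        return 0
    values = src.mget(keys)
    batch = {k: v for k, v in zip(keys, values) if v is not None}
    if batch:
        index_fn(batch)  # write to Elasticsearch first
        dst.mset(batch)  # keep a copy in db 1
    src.delete(*keys)    # ASSUMPTION: processed keys are removed from db 0
    return len(batch)


def run_forever(src, dst, index_fn, interval=2.0):
    """Hypothetical: repeat sync_once every `interval` seconds (README: 2 s)."""
    while True:
        sync_once(src, dst, index_fn)
        time.sleep(interval)
```

Indexing before copying means a failed Elasticsearch write leaves the keys in db 0 for the next pass.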

If you have any questions, ask them via Issues.

TODO

  • Complete the web pages (done)
  • Optimize multithreading (done)
  • Optimize the ingestion program to avoid duplicate ingestion (done)

bthello-app's People

Contributors

rehe0x


bthello-app's Issues

bthello-app throws an error when run

root@ubuntu-dev:~/bthello-app# python3 main.py -a -port=80
PUT http://192.168.1.211:9200/bt_metadata [status:400 request:0.013s]
Traceback (most recent call last):
File "main.py", line 55, in
command_line_runner()
File "main.py", line 46, in command_line_runner
me.init_index()
File "/root/bthello-app/metadata_storage.py", line 14, in init_index
ElasticsClients.create_index(_index_mappings)
File "/root/bthello-app/common/database.py", line 82, in create_index
res = self.es.indices.create(index=self.index_name, body=_index_mappings)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/utils.py", line 73, in _wrapped
return func(*args, params=params, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/client/indices.py", line 107, in create
params=params, body=body)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/transport.py", line 312, in perform_request
status, headers, data = connection.perform_request(method, url, params, body, ignore=ignore, timeout=timeout)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/http_urllib3.py", line 129, in perform_request
self._raise_error(response.status, raw_data)
File "/usr/local/lib/python3.6/dist-packages/elasticsearch/connection/base.py", line 125, in _raise_error
raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)
elasticsearch.exceptions.RequestError: TransportError(400, 'mapper_parsing_exception', 'analyzer [ik_max_word] not found for field [file_list]')
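This error indicates that the Elasticsearch node lacks the ik analysis plugin listed under Requirements. A quick check, sketched with the standard library: `GET /_cat/plugins` returns one line per installed plugin per node (node name, plugin name, version), and the ik plugin is expected to register as `analysis-ik`. `has_plugin` and `fetch_cat_plugins` are hypothetical helpers; host and port are placeholders.

```python
import urllib.request


def has_plugin(cat_plugins_text: str, plugin: str = "analysis-ik") -> bool:
    """Hypothetical: True if `plugin` appears in the component column."""
    for line in cat_plugins_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1] == plugin:
            return True
    return False


def fetch_cat_plugins(host: str = "127.0.0.1", port: int = 9200) -> str:
    """Hypothetical: fetch the plain-text plugin list from an ES node."""
    url = f"http://{host}:{port}/_cat/plugins"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For the node in this traceback that would be `has_plugin(fetch_cat_plugins("192.168.1.211", 9200))`. On a multi-node cluster every node needs the plugin, while this check only confirms that at least one node has it.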
