search_engine's Introduction

Search_Engine

Project Name

A small search engine for the Boost library

Project Goal

The server receives requests from clients and returns the search results that match each request to the user.

Technical Features

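1. Data processing module. Processes the HTML files of the Boost library. First, directories, images, and other non-HTML files are filtered out. Next, the tags are stripped from each HTML file, and the title, URL, and body text are parsed out.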
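The data processing step above can be illustrated with a minimal sketch. It is not the project's actual parser: the names `DocInfo`, `IsHtmlFile`, `ExtractTitle`, `StripTags`, and `ParseDoc` are hypothetical, and the tag stripping is a naive character scan that assumes `<` and `>` only appear as tag delimiters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record for one parsed page; field names are illustrative.
struct DocInfo {
    std::string title;
    std::string url;
    std::string content;
};

// Returns true if the path ends in ".html" (used to skip images and other files).
bool IsHtmlFile(const std::string& path) {
    const std::string ext = ".html";
    return path.size() >= ext.size() &&
           path.compare(path.size() - ext.size(), ext.size(), ext) == 0;
}

// Returns the text between <title> and </title>, or "" if the pair is missing.
std::string ExtractTitle(const std::string& html) {
    const std::string open = "<title>";
    const std::string close = "</title>";
    std::size_t begin = html.find(open);
    if (begin == std::string::npos) return "";
    begin += open.size();
    std::size_t end = html.find(close, begin);
    if (end == std::string::npos) return "";
    return html.substr(begin, end - begin);
}

// Drops everything between '<' and '>' and keeps the remaining text.
// Each removed tag is replaced by one space so adjacent words do not merge.
std::string StripTags(const std::string& html) {
    std::string text;
    bool in_tag = false;
    for (char c : html) {
        if (c == '<') {
            in_tag = true;
        } else if (c == '>' && in_tag) {
            in_tag = false;
            text.push_back(' ');
        } else if (!in_tag) {
            text.push_back(c);
        }
    }
    return text;
}

// Combines the helpers above into one record; the URL is supplied by the caller.
DocInfo ParseDoc(const std::string& html, const std::string& url) {
    return DocInfo{ExtractTitle(html), url, StripTags(html)};
}
```

A real HTML file also contains scripts, comments, and character entities, so a production parser would need more than this scan.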

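2. Index module. Builds a forward index and an inverted index from the processed files. The forward index maps a document ID to the document's content; the inverted index maps a search keyword to the IDs of the documents that contain it.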
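The two indexes described above might be laid out as in the following sketch. The types `Doc`, `Posting`, and `Index` are hypothetical, and the content is split on whitespace only to keep the example free of dependencies; it is an assumption that the forward index can be a plain vector whose position serves as the document ID.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical parsed document.
struct Doc {
    std::string title;
    std::string url;
    std::string content;
};

// One inverted-index entry: which document contains the word, and how often.
struct Posting {
    std::size_t doc_id;
    int count;
};

class Index {
public:
    // Stores the document in the forward index and registers each of its
    // words in the inverted index. Returns the new document ID.
    std::size_t AddDoc(const Doc& doc) {
        std::size_t id = forward_.size();
        forward_.push_back(doc);

        // Count the occurrences of every whitespace-separated word.
        std::unordered_map<std::string, int> counts;
        std::istringstream words(doc.content);
        std::string word;
        while (words >> word) ++counts[word];

        for (const auto& kv : counts) {
            inverted_[kv.first].push_back({id, kv.second});
        }
        return id;
    }

    // Forward lookup: document ID -> document, or nullptr if the ID is unknown.
    // The pointer is only valid until the next call to AddDoc.
    const Doc* GetDoc(std::size_t id) const {
        return id < forward_.size() ? &forward_[id] : nullptr;
    }

    // Inverted lookup: word -> postings, or nullptr if the word never occurs.
    const std::vector<Posting>* GetPostings(const std::string& word) const {
        auto it = inverted_.find(word);
        return it == inverted_.end() ? nullptr : &it->second;
    }

private:
    std::vector<Doc> forward_;
    std::unordered_map<std::string, std::vector<Posting>> inverted_;
};
```

Storing the per-document count in each posting is a design choice made here so that a later ranking step can use it without rescanning the text.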

3. Search module. Looks up the index using the terms in the user's query and returns the relevant content.

1). Word segmentation. The query is split into words using cppjieba, the C++ version of the Jieba segmenter: https://github.com/yanyiwu/cppjieba

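2). Trigger. Each segmented word is looked up in the inverted index to find the files in which it appears, yielding the corresponding document IDs.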

3). Sorting. Results are sorted in descending order of term frequency, so that the most relevant documents appear first.

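4). Building the response. The forward index is looked up with the list of IDs obtained in the trigger step to produce the search results.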
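The four search steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `Segment` splits on whitespace as a stand-in for cppjieba so that the example has no external dependency, the relevance score is assumed to be the summed occurrence count of the query words in each document, and all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Doc {
    std::string title;
    std::string url;
};

struct Posting {
    std::size_t doc_id;
    int count;  // occurrences of the word in this document
};

using ForwardIndex = std::vector<Doc>;  // position in the vector is the document ID
using InvertedIndex = std::unordered_map<std::string, std::vector<Posting>>;

// Step 1 (stand-in): whitespace splitting; the project uses cppjieba here.
std::vector<std::string> Segment(const std::string& query) {
    std::istringstream in(query);
    std::vector<std::string> words;
    std::string word;
    while (in >> word) words.push_back(word);
    return words;
}

std::vector<Doc> Search(const std::string& query,
                        const ForwardIndex& forward,
                        const InvertedIndex& inverted) {
    // Step 2 (trigger): collect every document that contains a query word
    // and add up the occurrence counts per document.
    std::unordered_map<std::size_t, int> score;
    for (const std::string& word : Segment(query)) {
        auto it = inverted.find(word);
        if (it == inverted.end()) continue;
        for (const Posting& p : it->second) score[p.doc_id] += p.count;
    }

    // Step 3 (sort): highest score first; ties are broken by lower ID
    // so that the order is deterministic.
    using Hit = std::pair<std::size_t, int>;
    std::vector<Hit> ranked(score.begin(), score.end());
    std::sort(ranked.begin(), ranked.end(), [](const Hit& a, const Hit& b) {
        if (a.second != b.second) return a.second > b.second;
        return a.first < b.first;
    });

    // Step 4 (build response): resolve each ID through the forward index.
    std::vector<Doc> results;
    for (const Hit& hit : ranked) {
        if (hit.first < forward.size()) results.push_back(forward[hit.first]);
    }
    return results;
}
```

Summing raw counts favors long documents; weighting schemes such as TF-IDF are a common alternative, but the source only mentions frequency.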

Note: I have not studied front-end development, so no attention was paid to the appearance of the results page.

Screenshot


lvxinup / search_engine