Giter Club home page Giter Club logo

mysearchengine's Introduction

1 General Description: This is an personal search engine developed with C++. And used some feature of boost, like asio, tokenizer, serialization. This search engin support some simple but important features like pages crawling, posting index establishing and key word querying. And I also specially developed a simple and reliable thread pool which is class template supporting different types of jobs. More features will developed in the future, as I learn more on search engine and related natural language processing.

2 Files Description:

PART I. Source files of the thread pool, crawler and Indexer. --ThreadPool.h --ThreadPool.cpp --Crawler.h --Crawler.cpp --Indexer.h --Indexer.cpp

PART II Test example files of the thread pool, crawler and Indexer. --ThreadPoolTest.cpp --CrawlerTest.cpp --IndexerTest.cpp

PART III Executable binary files. If you have not installed the needed boost library, it may not run as expected. --CrawlerTest --IndexerTest

PART IV Output files of the crawler and indexer during run. Including a depository of pages crawled from www.boost.org, serialized posting index(.idx) and doc set(.dst). --Sat_Aug_24_15_48_12_2013_crawled --Sat_Aug_31_21_02_09_2013_indexed.idx --Sat_Aug_31_21_02_09_2013_indexed.dst

3 Build Description: Because this program used some feature of boost, like asio, tokenizer, serialization. So if you want to build the source code you may need install a boost library first. The boost version used in the developing is 1.54.0. Here just give the simple g++ command line when build the three examples for reference.

//ThreadPoolTest.cpp g++ ThreadPool.h ThreadPoolTest.cpp -o ThreadPoolTest -lpthread

//Note that threadPool.h is not a ordinary header file. It's a combination source file of class template.

//CrawlerTest.cpp g++ Crawler.cpp CrawlerTest.cpp -o CrawlerTest -lpthread -lboost_system -lboost_regex -lboost_exception -lboost_thread

//IndexerTest.cpp g++ Indexer.cpp IndexerTest.cpp -o IndexerTest -lboost_serialization

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.