Welcome to Arachnid, an experimental general-purpose web crawler, written in Rust.
Arachnid works by driving a real web browser (Firefox) over the standard WebDriver protocol and gathering information from the pages it loads. Not much is done yet: it doesn't follow links, keep a queue of pages to visit, or store the data anywhere.
To run, first install Rust, Firefox, and geckodriver. Then:
geckodriver &
cargo run
The current output is the visible text of recurse.com (where I am working on the project!).
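For a sense of what that flow looks like, here is a minimal sketch using the fantoccini WebDriver client. The crate, port, and target URL here are assumptions for illustration, not necessarily what Arachnid itself uses.

```rust
// Hypothetical sketch of the connect -> navigate -> read-text flow.
// Assumes geckodriver is running on its default port (4444) and that
// the fantoccini and tokio crates are in Cargo.toml.
use fantoccini::{ClientBuilder, Locator};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Attach to the running geckodriver instance, which controls Firefox.
    let client = ClientBuilder::native()
        .connect("http://localhost:4444")
        .await?;

    // Navigate to a page and pull the visible text of its body.
    client.goto("https://www.recurse.com").await?;
    let body = client.find(Locator::Css("body")).await?;
    println!("{}", body.text().await?);

    // End the WebDriver session.
    client.close().await?;
    Ok(())
}
```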
The project includes files for debugging in VS Code. I am a beginner in Rust, so feedback and contributions are very welcome!
Next steps include building a queue of pages to crawl, respecting robots.txt, and extracting links, metadata, and perhaps other useful information from each page to help build a search index.
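As a rough sketch of what that page queue might look like, here is one possible "frontier" structure: a FIFO queue of URLs plus a set of already-seen URLs so each page is only visited once. The names and shape are illustrative assumptions, not the project's actual design.

```rust
// Hypothetical crawl frontier: FIFO queue + de-duplication set.
use std::collections::{HashSet, VecDeque};

struct Frontier {
    queue: VecDeque<String>,
    seen: HashSet<String>,
}

impl Frontier {
    fn new() -> Self {
        Frontier { queue: VecDeque::new(), seen: HashSet::new() }
    }

    /// Enqueue a URL only if it has not been seen before.
    fn push(&mut self, url: &str) {
        if self.seen.insert(url.to_string()) {
            self.queue.push_back(url.to_string());
        }
    }

    /// Take the next URL to crawl, if any.
    fn pop(&mut self) -> Option<String> {
        self.queue.pop_front()
    }
}

fn main() {
    let mut frontier = Frontier::new();
    frontier.push("https://www.recurse.com");
    frontier.push("https://www.recurse.com"); // duplicate, ignored
    while let Some(url) = frontier.pop() {
        println!("would crawl: {}", url);
    }
}
```

Links discovered on each page would be pushed onto the frontier (after a robots.txt check), and the main loop would pop from it until it is empty or some page limit is reached.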