Scrapio

Scrapio is a lightweight and user-friendly web crawling and scraping library. The main goal of the project is to make scraping large amounts of similar data from the web easy. It may be useful for a wide range of applications, such as data mining, data processing, and archiving. Eventually, I plan to turn it into a standalone service that works as an API.

Features

At the moment, Scrapio works as a library for crawling and scraping data from the web. What it can do:

  • Crawl all pages on a host and return all the links.
  • Scrape text, image URLs, and links from the crawled pages.
  • Leave the choice of output format (CSV, JSON, etc.) up to you.
  • It's free and quite powerful.
  • It is written in Go and runs concurrently; depending on network speed, it can crawl and scrape up to 2k pages per minute.
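
To illustrate what "concurrent" can mean for a crawler, here is a minimal sketch of a bounded worker pool in Go. It is not Scrapio's actual implementation: `fetchAll`, `fetchFunc`, and the fake fetcher in `main` are hypothetical names used only for illustration, and a real crawler would put an HTTP request where the fake is.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc is a hypothetical stand-in for "download this page".
type fetchFunc func(url string) (string, error)

// fetchAll calls fetch for every URL using at most `workers` goroutines
// and returns the fetched bodies keyed by URL. Failed fetches are skipped.
func fetchAll(urls []string, workers int, fetch fetchFunc) map[string]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]string)
	var mu sync.Mutex // guards results
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				body, err := fetch(u)
				if err != nil {
					continue // this sketch simply ignores failed pages
				}
				mu.Lock()
				results[u] = body
				mu.Unlock()
			}
		}()
	}

	// Feed the workers, then signal that no more jobs are coming.
	for _, u := range urls {
		jobs <- u
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	// Fake fetcher so the sketch runs without network access.
	fake := func(u string) (string, error) { return "body of " + u, nil }
	pages := fetchAll([]string{"https://example.com/a", "https://example.com/b"}, 2, fake)
	fmt.Println("fetched pages:", len(pages))
}
```

Capping the number of workers keeps the load on the target host predictable, which is why the sketch takes `workers` as a parameter rather than starting one goroutine per URL.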

Installation

go get github.com/koshqua/scrapio 

Usage

The crawler is easy to use. You only need to specify a starting URL, and it will crawl all the URLs on that host.

    // Initialize a new crawler with a start URL; it does not have to be the site's base URL.
    cr := &crawler.Crawler{StartURL: "https://gulfnews.com/"}
    // Start crawling.
    // More configuration options for this function, such as max results, are planned.
    cr.Crawl()
    // Do something with the result; it's up to you.
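
The idea of staying on one host can be sketched with the standard `net/url` package. This is an assumption about how such filtering might work, not Scrapio's own code: `sameHostLinks` is a hypothetical helper that resolves the links found on a page against the page URL and keeps only the HTTP(S) links on the same host.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// sameHostLinks resolves each href against base, drops fragments and
// duplicates, and keeps only http/https links whose host matches base.
// Hypothetical helper for illustration; not part of Scrapio.
func sameHostLinks(base string, hrefs []string) ([]string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return nil, err
	}
	seen := make(map[string]bool)
	var out []string
	for _, h := range hrefs {
		ref, err := url.Parse(h)
		if err != nil {
			continue // skip malformed links
		}
		abs := b.ResolveReference(ref)
		abs.Fragment = "" // "#top" and friends point at the same page
		if abs.Scheme != "http" && abs.Scheme != "https" {
			continue // skip mailto:, javascript:, etc.
		}
		if abs.Host != b.Host {
			continue // external link
		}
		s := abs.String()
		if !seen[s] {
			seen[s] = true
			out = append(out, s)
		}
	}
	return out, nil
}

func main() {
	links, err := sameHostLinks("https://example.com/news/", []string{
		"/about", "story.html#top", "https://other.org/x", "mailto:a@example.com",
	})
	if err != nil {
		log.Fatalln(err)
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Resolving relative links before comparing hosts matters because most links on a page are relative and have no host of their own.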

The scraper uses the data structure produced by the crawler. Before initializing a scraper, create a few selectors and assign them to it. Selectors are simple CSS-like selectors.

    // Create the selectors you want to scrape.
    h2 := scraper.NewSelector("h2", true, true, true)
    img := scraper.NewSelector("img", true, true, true)
    p := scraper.NewSelector("p:first-of-type", true, true, true)
    // Initialize a new scraper with the given selectors.
    // The scraper depends on the crawler from the previous snippet.
    // It takes the crawled pages and builds a new structure with selectors and scrape results.
    sc := scraper.InitScraper(*cr, []scraper.Selector{h2, img, p})
    // Start scraping.
    err := sc.Scrap()
    if err != nil {
    	log.Fatalln(err)
    }
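
Since the output format is left to you, one option is to flatten the scrape results into rows and serialize them with the standard library. The sketch below assumes a hypothetical `pageResult` record; Scrapio's real result types may look different, so the mapping from the scraper's output to these rows is deliberately left out.

```go
package main

import (
	"encoding/csv"
	"encoding/json"
	"fmt"
	"log"
	"strings"
)

// pageResult is a hypothetical flattened record: one scraped value per row.
// It is NOT a Scrapio type.
type pageResult struct {
	URL      string `json:"url"`
	Selector string `json:"selector"`
	Text     string `json:"text"`
}

// toJSON renders the rows as indented JSON.
func toJSON(rows []pageResult) ([]byte, error) {
	return json.MarshalIndent(rows, "", "  ")
}

// toCSV renders the rows as CSV with a header line.
// encoding/csv takes care of quoting fields that contain commas or quotes.
func toCSV(rows []pageResult) (string, error) {
	var sb strings.Builder
	w := csv.NewWriter(&sb)
	if err := w.Write([]string{"url", "selector", "text"}); err != nil {
		return "", err
	}
	for _, r := range rows {
		if err := w.Write([]string{r.URL, r.Selector, r.Text}); err != nil {
			return "", err
		}
	}
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	rows := []pageResult{
		{URL: "https://example.com/", Selector: "h2", Text: "Hello, world"},
	}
	j, err := toJSON(rows)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(string(j))

	c, err := toCSV(rows)
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Print(c)
}
```

Keeping the serializers separate from the scraping step means the same rows can be written to a file, a database, or an HTTP response without touching the crawl code.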
