Giter Club home page Giter Club logo

scrapysharp's Introduction

Getting started

ScrapySharp has a Web Client able to simulate a real Web browser (handle referrer, cookies …)

Html parsing has to be as natural as possible. So I like to use CSS Selectors and Linq.

This framework wraps HtmlAgilityPack.

Basic examples of CssSelect usages

using System.Linq;
using HtmlAgilityPack;
using ScrapySharp.Extensions;

class Example
{
    public void Main()
    {
        var divs = html.CssSelect("div");  //all div elements
        var nodes = html.CssSelect("div.content"); //all div elements with css class ‘content’
        var nodes = html.CssSelect("div.widget.monthlist"); //all div elements with the both css class
        var nodes = html.CssSelect("#postPaging"); //all HTML elements with the id postPaging
        var nodes = html.CssSelect("div#postPaging.testClass"); // all HTML elements with the id postPaging and css class testClass

        var nodes = html.CssSelect("div.content > p.para"); //p elements who are direct children of div elements with css class ‘content’

        var nodes = html.CssSelect("input[type=text].login"); // textbox with css class login
    }
}

Scrapysharp can also simulate a web browser

ScrapingBrowser browser = new ScrapingBrowser();

//set UseDefaultCookiesParser as false if a website returns invalid cookies format
//browser.UseDefaultCookiesParser = false;

WebPage homePage = browser.NavigateToPage(new Uri("http://www.bing.com/"));

PageWebForm form = homePage.FindFormById("sb_form");
form["q"] = "scrapysharp";
form.Method = HttpVerb.Get;
WebPage resultsPage = form.Submit();

HtmlNode[] resultsLinks = resultsPage.Html.CssSelect("div.sb_tlst h3 a").ToArray();

WebPage blogPage = resultsPage.FindLinks(By.Text("romcyber blog | Just another WordPress site")).Single().Click();

Install Scrapysharp in your project

It's easy to use Scrapysharp in your project.

A Nuget package exists on nuget.org and on myget

News

Scrapysharp V3 is a reborn.

Old version under GPL license is still on bitbucket

Version 3 is a conversion to .net standard 2.0 and a relicensing.

scrapysharp's People

Contributors

rflechner avatar

Watchers

James Cloos avatar Vlad Vizitu avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.