Crystagiri

An HTML parser library for Crystal like the amazing Nokogiri Ruby gem.

I won't pretend that Crystagiri does as much as Nokogiri. All help is welcome! :)

Installation

Add this to your application's shard.yml:

dependencies:
  crystagiri:
    github: madeindjs/crystagiri

and then run

$ shards install

Usage

require "crystagiri"

Then you can instantiate a Crystagiri::HTML object from an HTML String like this:

doc = Crystagiri::HTML.new "<h1>Crystagiri is awesome!!</h1>"

... or directly load it from a Web URL or a pathname:

doc = Crystagiri::HTML.from_file "README.md"
doc = Crystagiri::HTML.from_url "http://example.com/"

You can also specify the follow: true flag when loading from a URL if you want to follow redirects.
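
As a rough illustration of the redirect flag, the call might look like the sketch below. It assumes that follow is a named argument of from_url, which the text implies but does not spell out; the URL is the same placeholder used above.

```crystal
require "crystagiri"

# Assumption: `follow` is a named argument of from_url (not confirmed by the text above).
doc = Crystagiri::HTML.from_url "http://example.com/", follow: true
```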

You can then search the document from the Crystagiri::HTML instance. Matching tags are returned as Crystagiri::Tag objects, which expose the underlying XML::Node through the .node property:

  • CSS query
doc.css("li > strong.title") { |tag| puts tag.node }
# => <strong class="title"> .. </strong>
# => <strong class="title"> .. </strong>

Known limitation: CSS queries with complex selectors such as :nth-child are not currently supported.

  • HTML tag
doc.where_tag("h2") { |tag| puts tag.content }
# => Development
# => Contributing
  • HTML id
puts doc.at_id("main-content").tagname
# => div
  • HTML class attribute
doc.where_class("summary") { |tag| puts tag.node }
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>
# => <div class="summary"> .. </div>

Benchmark

I know you love benchmarks between Ruby & Crystal, so here's one:

require "nokogiri"
t1 = Time.now
doc = Nokogiri::HTML File.read("spec/fixture/HTML.html")
# Run both lookups 100,000 times
100_000.times do
  doc.at_css("h1")
  doc.css(".step-title") { |tag| tag }
end
# Time.now - t1 is a Float number of seconds in Ruby
puts "executed in #{Time.now - t1} seconds"

executed in 00:00:11.10 seconds with Ruby 2.6.0 with RVM on old Mac

require "crystagiri"
t = Time.now
doc = Crystagiri::HTML.from_file "./spec/fixture/HTML.html"
# Run both lookups 100,000 times
100_000.times do
  doc.at_css("h1")
  doc.css(".step-title") { |tag| tag }
end
# Time.now - t is a Time::Span, printed as hh:mm:ss
puts "executed in #{Time.now - t}"

executed in 00:00:03.09 seconds on Crystal 0.27.2 on LLVM 6.0.1 with release flag

In this benchmark Crystagiri is more than three times faster than Nokogiri (3.09 s versus 11.10 s)!!
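
The Crystal benchmark above targets Crystal 0.27.2; Time.now was removed from later Crystal releases. On a newer compiler, the timing could be sketched with Time.measure from the standard library. The workload below is only a stand-in (a standard-library XPath lookup on an inline document), not the Crystagiri calls or the fixture file from the benchmark.

```crystal
require "xml"

# Stand-in workload: a small inline document queried with the standard library.
doc = XML.parse_html("<h1>Crystagiri</h1><p class=\"step-title\">Step</p>")

# Time.measure runs the block and returns the elapsed Time::Span.
elapsed = Time.measure do
  1_000.times do
    doc.xpath_node("//h1")
  end
end

puts "executed in #{elapsed.total_seconds} seconds"
```

Time.measure uses a monotonic clock, so the result is not affected by wall-clock adjustments during the run.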

Development

Clone this repository and navigate to it:

$ git clone https://github.com/madeindjs/crystagiri.git
$ cd crystagiri

You can generate all documentation with

$ crystal doc

And run spec tests to ensure everything works correctly

$ crystal spec

Contributing

Do you like this project? Here you can find some issues to get started.

Contributing is simple:

  1. Fork it (https://github.com/madeindjs/crystagiri/fork)
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am "Add some feature"
  4. Push to the branch: git push origin my-new-feature
  5. Create a new Pull Request

Contributors

See the list on GitHub

crystagiri's People

Contributors

daviscodesbugs, giuseongit, madeindjs, opes

crystagiri's Issues

Retag latest release with v0.3.4

The latest version is tagged 0.3.4, without the leading v. This causes shards to fail when fetching this project, as described in #9.
Shards no longer accepts tags that do not start with v, so until the tag is fixed the only way to use crystagiri as a dependency is to target either v0.3.2 or a branch such as master.
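
As an illustration of the workaround described above, a shard.yml dependency entry could pin the older release. This is a sketch rather than a tested configuration; the version and branch keys are standard shard.yml dependency fields, and the commented line shows the branch alternative.

```yaml
dependencies:
  crystagiri:
    github: madeindjs/crystagiri
    version: 0.3.2
    # Alternative: track a branch instead of a tagged release
    # branch: master
```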

Implement css selector parsing

Crystagiri currently translates CSS selectors into XPath queries to retrieve elements. Since commit e70753b the parser has an internal set of pointers to quickly access some particular nodes, so a parser that handles CSS selectors would be great for rewriting HTML#css and HTML#at_css to speed up tag retrieval.

Xpath example?

Hello, thanks for your hard work. Is there an example of finding elements by XPath? I can't find one.

Also, how can I find sub-tags under a tag? For example, there are six td tags under a tr tag; I want to select them and read their content.

Thanks!
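
One possible way to approach the sub-tag question is sketched below with Crystal's standard XML module instead of a confirmed Crystagiri API. The README says each Crystagiri::Tag exposes an XML::Node through .node, so the same xpath_nodes call may well work on tag.node, but that is an assumption. The table markup is made up for the example.

```crystal
require "xml"

html = "<table><tr><td>a</td><td>b</td><td>c</td></tr></table>"
doc = XML.parse_html(html)

# Visit each <tr>, then query its <td> children relative to that row.
doc.xpath_nodes("//tr").each do |row|
  cells = row.xpath_nodes("./td").map(&.content)
  puts cells.join(", ")
end
```

The leading ./ in the inner query keeps the search relative to the current row, so cells from other rows are not picked up.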

Make blocks return

There are probably quick ways of doing this already, but wouldn't it make more sense if the block methods returned their results so we can apply map(), filter() and similar methods afterwards?
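
Until such a change exists, one workaround is to collect the yielded values into an array and chain from there. The sketch below uses a hypothetical each_heading method as a stand-in for a block-yielding search method such as where_tag; it yields plain strings so that it runs without Crystagiri installed.

```crystal
# Hypothetical stand-in for a block-yielding search method such as where_tag.
def each_heading(&block)
  ["Development", "Contributing"].each { |heading| yield heading }
end

# Collect what the block receives, then use ordinary Enumerable methods.
headings = [] of String
each_heading { |heading| headings << heading }

headings.map(&.upcase).select(&.starts_with?("D")).each { |h| puts h }
```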

Can't require it with crystal v0.27.0

I am getting the following:

$ shards install -v
git ls-remote --get-url origin
Fetching https://github.com/madeindjs/crystagiri.git
git fetch --all --quiet
git tag --list --column=never
versions: 0.3.2, 0.3.1, 0.3.0, 0.2.1, 0.2.0, 0.1.0, 0.3.4
git ls-tree -r --full-tree --name-only v0.3.4 -- shard.yml
Failed git ls-tree -r --full-tree --name-only v0.3.4 -- shard.yml (). Maybe a commit, branch or file doesn't exist?

The contents of my shard.yml file:

name: lc-parseventos
version: 0.1.0

authors:
  - Rene Bon Ciric <[email protected]>

description: |
  Un parser a YAML; para eventos de la página de LinuxCabal

targets:
  lc-parseventos:
    main: src/lc-parseventos.cr

crystal: 0.27.0

license: GPLv3

dependencies:
  crystagiri:
    github: madeindjs/crystagiri

Shard name does not match dependency name

Here is my terminal output

➜ shards install
Resolving dependencies
Fetching https://github.com/madeindjs/crystagiri.git
Error shard name (cristagiri) doesn't match dependency name (crystagiri)

My shard.yml

...
dependencies:
  crystagiri:
    github: madeindjs/crystagiri
...

Please fix this.

Parsing with different IPs?

Great library!

I am new to Crystal and I am wondering how one would use your library with multiple IPs? With curl I can simply add a --interface parameter, is something like this possible with Crystagiri?

Thanks

Parsing non HTML, XML documents

Just wondering, because there are no examples and I can't find anything in the source: how well does this work for parsing and exploring other XML documents such as RSS/Atom feeds? Should a new Document format be created specifically for navigating XML?
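
For comparison, plain XML such as an RSS feed can already be explored with Crystal's standard XML module, independent of Crystagiri. The feed below is invented for the example, and the sketch says nothing about how Crystagiri itself would or should handle XML.

```crystal
require "xml"

# A tiny made-up RSS document; real feeds would be read from a file or URL.
rss = <<-RSS
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <item><title>First post</title><link>http://example.com/1</link></item>
      <item><title>Second post</title><link>http://example.com/2</link></item>
    </channel>
  </rss>
  RSS

# XML.parse uses the XML parser rather than the HTML one.
feed = XML.parse(rss)
titles = feed.xpath_nodes("//item/title").map(&.content)
titles.each { |title| puts title }
```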
