html_editor's People

Contributors

arduano, lomirus

html_editor's Issues

On XML and Node spans

I'm looking to replace the current HTML parser that's being used in a project of mine and I came across this one. It looks like it would be a nice replacement, and by my estimates, would bring the final binary size down by about 16%.

There are two things stopping me from trying it out:

  1. The ability to parse simple XML (from an EPUB), which would require the ability to parse the XML doc type and possibly namespace attributes.
  2. The tracking of Node spans, that is, the offsets at which a Node starts and stops in the source.

Would these be something you'd consider adding or accepting a pull request for?
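
To make the second request concrete, here is a minimal sketch of what a node span could look like. The `Span` type and the `next_tag_span` helper are hypothetical names used for illustration; they are not part of html_editor. The scan is deliberately naive and assumes no `>` appears inside an attribute value.

```rust
/// Byte range of a node in the source text.
/// Hypothetical type for illustration, not part of html_editor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    /// Inclusive byte offset where the node starts.
    pub start: usize,
    /// Exclusive byte offset where the node ends.
    pub end: usize,
}

impl Span {
    /// Returns the slice of `source` covered by this span.
    pub fn slice<'a>(&self, source: &'a str) -> &'a str {
        &source[self.start..self.end]
    }
}

/// Finds the span of the first `<...>` tag at or after byte offset `from`.
/// Naive assumption: no '>' occurs inside an attribute value.
pub fn next_tag_span(source: &str, from: usize) -> Option<Span> {
    let start = from + source.get(from..)?.find('<')?;
    let end = start + source[start..].find('>')? + 1;
    Some(Span { start, end })
}
```

With spans like these, a caller could map any parsed node back to the exact bytes it came from by calling `span.slice(source)`.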

Multiple classes on the same html element are parsed as a single class

An html snippet like

<div class="a b">
</div>

gets parsed into an Element like

Element { name: "div", attrs: [("class", "a b")], children: [] }

There is only one class attribute, which contains both classes separated by whitespace.
This causes queries for only one of the two classes to fail.

This test fails.

    #[test]
    fn html_editor_multiple_class_parsing() {
        let test_snippet = r#"<div class="a b"></div>"#;
        let result = parser::parse(test_snippet).unwrap();
        print!("{:?}", result);
        // This selector fails
        let selector = Selector::from(".a");
        // This selector works
        // let selector = Selector::from(".a b");

        // The query should find the div through either of its classes.
        assert!(result.query(&selector).is_some());
    }

My expected behaviour would be a separate class attribute for each class inside the HTML.
In this case:

Element { name: "div", attrs: [("class", "a"),("class", "b")], children: [] }
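
An alternative to duplicating the attribute would be to keep the single class attribute and split it at query time, since HTML defines the class value as a set of space-separated tokens. The sketch below shows that matching step in isolation; `has_class` is a hypothetical helper, not an html_editor function.

```rust
/// Returns true if the whitespace-separated `class` attribute value
/// contains the class `wanted` as one of its tokens.
/// Hypothetical helper for illustration, not part of html_editor.
pub fn has_class(class_attr: &str, wanted: &str) -> bool {
    class_attr.split_whitespace().any(|token| token == wanted)
}
```

With this approach `has_class("a b", "a")` and `has_class("a b", "b")` both hold, while the attribute keeps its original form when the element is serialized again.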

query with numerous selectors at once

Hello, I would like to use query_all to get all the headings (regardless of level) in document order, since the order matters. If I search for each heading individually (h1, h2, h3, etc.), the results come back grouped by level, h1 first, h2 second, and so on, rather than in the order they appear. I've looked through the docs and cannot find a way to do this, so if there isn't one, I suggest the following syntax:

fn query_all(&self, selector: Vec<&Selector>) -> Vec<Element>

so usage would be something like:

dom.query_all(vec![&Selector::from("h1"), &Selector::from("h2"), ...]);
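
The ordering requirement can be illustrated with a single depth-first pass that tests each element against every requested name, so matches come out in document order. This is a sketch on a simplified stand-in tree: the `Node` enum and `query_all_names` below are assumptions made for the example and do not reflect html_editor's actual types or API.

```rust
/// Simplified stand-in for a parsed tree (assumption, not html_editor's Node).
#[derive(Debug)]
pub enum Node {
    Text(String),
    Element { name: String, children: Vec<Node> },
}

/// Collects, in document order, every element whose name is in `names`.
pub fn query_all_names<'a>(nodes: &'a [Node], names: &[&str]) -> Vec<&'a Node> {
    let mut found = Vec::new();
    collect(nodes, names, &mut found);
    found
}

/// Depth-first, pre-order walk: a parent is visited before its children,
/// and siblings are visited left to right.
fn collect<'a>(nodes: &'a [Node], names: &[&str], found: &mut Vec<&'a Node>) {
    for node in nodes {
        if let Node::Element { name, children } = node {
            if names.contains(&name.as_str()) {
                found.push(node);
            }
            collect(children, names, found);
        }
    }
}
```

Because the tree is walked once, an h2 that precedes an h1 in the source also precedes it in the result, which separate per-selector queries cannot guarantee.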

try_parse stuck in infinite loop

The following HTML causes try_parse to get stuck in an infinite loop:

https://gist.github.com/lucasavila00/ae14f1b3284879add91f712663bdb4c7

parse does fail, with:

running 1 test
thread 'test_inf_loop' panicked at packages/rust-html2/src/lib.rs:494:37:
failed to parse: "<notranslate> is not closed"
stack backtrace:
   0: rust_begin_unwind
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
   1: core::panicking::panic_fmt
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1652:5
   3: core::result::Result<T,E>::expect
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/result.rs:1034:23
   4: rust_webpack_template::transform_html_unbound
             at ./src/lib.rs:494:19
   5: rust_webpack_template::test_inf_loop
             at ./src/lib.rs:859:35
   6: rust_webpack_template::test_inf_loop::{{closure}}
             at ./src/lib.rs:852:20
   7: core::ops::function::FnOnce::call_once
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:250:5
   8: core::ops::function::FnOnce::call_once
             at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
test test_inf_loop ... FAILED

Add parsing of HTML with some errors

It would be great to add support for parsing HTML that contains some errors.
For example:

 <!doctype html>
        <html lang="en">
            <head>
                <meta charset="utf-8">
                <title>Html parser</title>
            </head>
            <body>
                <h1 id="a" class="b c">Hello world</h1>
                <!-- comments & dangling elements are ignored -->
                <a class="trait" href="next/next2/">Queryable 

            </body>
        </html>

Browsers handle markup like this.
I'm trying to write a crawler using your parser, but I get an error on parse. =-(
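
One common way to tolerate an unclosed element such as the <a> above is to keep a stack of open tags and emit implied closing tags when a parent closes. The sketch below works on a hypothetical `Token` stream invented for this example; it is a simplification, far short of the full recovery rules browsers apply, and it does not handle void elements such as <meta>.

```rust
/// Hypothetical token type for illustration, not part of html_editor.
#[derive(Debug, Clone, PartialEq)]
pub enum Token {
    Open(String),
    Close(String),
    Text(String),
}

/// Returns a balanced token stream: unclosed elements receive implied
/// closing tags, and stray closing tags are dropped.
pub fn balance(tokens: &[Token]) -> Vec<Token> {
    let mut open: Vec<String> = Vec::new(); // stack of currently open tag names
    let mut out = Vec::new();
    for token in tokens {
        match token {
            Token::Open(name) => {
                open.push(name.clone());
                out.push(token.clone());
            }
            Token::Close(name) => {
                // Find the innermost open element with this name, if any.
                if let Some(pos) = open.iter().rposition(|n| n == name) {
                    // Close everything opened after it, then the element itself.
                    while open.len() > pos {
                        let implied = open.pop().unwrap();
                        out.push(Token::Close(implied));
                    }
                }
                // A close tag with no matching open element is ignored.
            }
            Token::Text(_) => out.push(token.clone()),
        }
    }
    // Close whatever is still open at the end of input.
    while let Some(name) = open.pop() {
        out.push(Token::Close(name));
    }
    out
}
```

For the snippet above, the `</body>` tag would first close the dangling <a> and then the body itself.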

Fails to parse if closing tag is separated by a newline

I came across this issue while working with version 0.3.0.

On a website, there was an <a> that was closed by </a\n>. The parser failed with the error message "<a> does not match the </a\n>".

To reproduce, attempt to parse the following segment

<a> example </a
>

Firefox can parse the website regardless of that formatting error.
Could a quick workaround be to just delete all newlines in the string?
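
Deleting every newline would also alter text content, so a narrower fix is to trim whitespace between the tag name and the closing `>`, which HTML permits in end tags. A minimal sketch follows; `closing_tag_name` is a hypothetical helper and not how html_editor is known to be implemented.

```rust
/// Extracts the tag name from a raw closing tag such as "</a\n>".
/// Whitespace between the name and '>' is ignored.
/// Hypothetical helper for illustration, not part of html_editor.
pub fn closing_tag_name(raw: &str) -> Option<&str> {
    let inner = raw.strip_prefix("</")?.strip_suffix('>')?;
    let name = inner.trim_end();
    // Reject an empty name, or anything with whitespace inside the name.
    if name.is_empty() || name.contains(char::is_whitespace) {
        None
    } else {
        Some(name)
    }
}
```

Comparing the extracted name against the open tag's name, rather than comparing raw tag text, would let `<a>` match a closing tag that is split across lines.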

Querying by #id containing periods does not work

Querying by an id that contains periods finds no match:

let html = r#"<div id="foo.bar">baz</div>"#;
let nodes = parse(&html)?;
let selector = Selector::from("#foo.bar");
let element = nodes.query(&selector);

println!("{:?}", element);
// None
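
Under CSS syntax, `#foo.bar` reads as "id foo with class bar", which is presumably why nothing matches. CSS itself addresses this with a backslash escape (`#foo\.bar`). The sketch below shows a selector tokenizer that honours such escapes; `SimpleSelector` and `parse_simple_selector` are hypothetical names, and whether html_editor's Selector accepts escapes is not confirmed by this thread.

```rust
/// Hypothetical selector representation, not html_editor's Selector.
#[derive(Debug, Default, PartialEq)]
pub struct SimpleSelector {
    pub id: Option<String>,
    pub classes: Vec<String>,
}

/// Parses selectors made of `#id` and `.class` parts.
/// A backslash escapes the next character, so `\.` is a literal period.
pub fn parse_simple_selector(input: &str) -> SimpleSelector {
    let mut selector = SimpleSelector::default();
    let mut current = String::new();
    let mut kind: Option<char> = None; // '#' or '.' of the part being read
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        match c {
            '\\' => {
                // Take the escaped character literally.
                if let Some(escaped) = chars.next() {
                    current.push(escaped);
                }
            }
            '#' | '.' => {
                flush(&mut selector, kind, &mut current);
                kind = Some(c);
            }
            _ => current.push(c),
        }
    }
    flush(&mut selector, kind, &mut current);
    selector
}

/// Stores the finished part in the selector and clears the buffer.
fn flush(selector: &mut SimpleSelector, kind: Option<char>, current: &mut String) {
    let part = std::mem::take(current);
    if part.is_empty() {
        return;
    }
    match kind {
        Some('#') => selector.id = Some(part),
        Some('.') => selector.classes.push(part),
        _ => {} // tag names are ignored in this sketch
    }
}
```

In Rust source the escaped form would be written `"#foo\\.bar"` or `r"#foo\.bar"`.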

Encode and decode inside Node::Text?

Hi!

In the Text variant of a Node, the text is stored as-is from the source code of the HTML file. This means that a source such as a &gt; b would be represented as Node::Text("a &gt; b"), rather than Node::Text("a > b"). While this does make sense for performance reasons, I feel like this might be unintuitive for users. The Node data type is for manipulating HTML after it has been parsed into an abstract syntax tree, but here the Text variant stores the text unprocessed from the file, rather than storing what the text represents.

Additionally, this means that one could easily construct a Node::Text instance by mistake that contains HTML fragments which, when serialized, either give invalid HTML or something that would parse to a different tree structure (for example, Node::Text("a > b") or Node::Text("a <img> b")).

From what I can see, a solution to this problem would simply be to add a dependency such as html-escape and make a call to decode_html_entities in the parse function, as well as a call to encode_html_entities in the Htmlifiable::html implementation.

(All of this applies to attribute values as well.)
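
The round trip described above can be sketched without any dependency for the four most common entities. The functions `encode_text` and `decode_text` are hypothetical and cover only `&amp;`, `&lt;`, `&gt;`, and `&quot;`; a crate such as html-escape handles the full entity set.

```rust
/// Escapes the characters that would otherwise be read as markup.
/// Hypothetical helper covering only four entities.
pub fn encode_text(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len());
    for c in raw.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(c),
        }
    }
    out
}

/// Reverses `encode_text`. `&amp;` is decoded last so that an input like
/// "&amp;lt;" becomes the literal text "&lt;" instead of "<".
pub fn decode_text(encoded: &str) -> String {
    encoded
        .replace("&lt;", "<")
        .replace("&gt;", ">")
        .replace("&quot;", "\"")
        .replace("&amp;", "&")
}
```

Decoding on parse and encoding on serialization would let Node::Text hold "a > b" in memory while still writing a &gt; b back out.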

Wrong parse of <div\n

Hi,

I would like to report a bug in html_editor.
Running this program:

use html_editor::parse;

fn main() {
    let s = r#"
<div
> </div>
"#;
    let dom = parse(&s);
    println!("wrong: {:?}", dom);

    let s = r#"
<div> </div>
"#;
    let dom = parse(&s);
    println!("good: {:?}", dom);
}

I get:

wrong: Err("<div\n> does not match the </div>")
good: Ok([Text("\n"), Element { name: "div", attrs: [], children: [Text(" ")] }, Text("\n")])

So it seems that html_editor does not treat a newline like a space or tab where it should.
The example uses <div>, but the bug also affects other tags such as <a>.

Regards, and thanks for the package,

Willem
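
The error message suggests the tag name is being read as "div\n". A sketch of reading the name up to the first whitespace character of any kind follows; `opening_tag_name` is a hypothetical helper, and the actual cause inside html_editor is an assumption here.

```rust
/// Extracts the tag name from a raw opening tag such as "<div\n>".
/// The name ends at the first ASCII whitespace (space, tab, or newline).
/// Hypothetical helper for illustration, not part of html_editor.
pub fn opening_tag_name(raw: &str) -> Option<&str> {
    let inner = raw.strip_prefix('<')?.strip_suffix('>')?;
    // Closing tags are not handled by this helper.
    if inner.starts_with('/') {
        return None;
    }
    // Allow the self-closing form, e.g. "<br/>".
    let inner = inner.strip_suffix('/').unwrap_or(inner);
    inner.split_ascii_whitespace().next()
}
```

With the name extracted this way, `<div\n>` and `<div>` both yield "div" and would match the same closing tag.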
