html2md-rs's Introduction

Hi, I'm Yumi Izumi, a self-taught programmer studying cybersecurity at Penn State and advocating to make privacy an actual human right.

I'm always looking for interesting ideas to solve issues with technology :)

My website: yumiizumi.com

Some programming stuff I'm confident with: TypeScript (and JS), Astro, React, Rust, Swift, Ionic, HTML/CSS, SQL, Git, and Java.

Find my contact details here: yumi.to/card

html2md-rs's People

Contributors

github-actions[bot], izyuumi, yutatokoi

Forkers

yutatokoi

html2md-rs's Issues

Relax `html2md-rs`?

I like your parser: it has very few dependencies and a solid design.
I am currently testing it for tp-note. The use case: when the clipboard contains HTML, a filter is needed to convert the input into CommonMark-compliant Markdown. The input may be a complete HTML document or just a snippet, hence my previous feature request.

Going through your Rustdoc, I discovered panics if .... . Fortunately, you offer safe variants of the panicking functions. To my knowledge, it is common practice that libraries should never panic. What is your use case?
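The split between panicking and safe functions mentioned above usually follows a simple pattern: the fallible function returns a `Result`, and the panicking variant is a thin wrapper around it. A minimal sketch, assuming hypothetical names (`try_parse`, `parse`, `ParseError`) that are not taken from html2md-rs:

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not the error type of html2md-rs.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnclosedTag(usize),
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::UnclosedTag(i) => write!(f, "unclosed tag starting at index {i}"),
        }
    }
}

impl std::error::Error for ParseError {}

/// Fallible variant: reports bad input through `Result` and never panics.
/// The "parsing" here only counts tags and checks that every `<` is closed by `>`.
pub fn try_parse(input: &str) -> Result<usize, ParseError> {
    let mut tags = 0;
    let mut open_at: Option<usize> = None;
    for (i, c) in input.char_indices() {
        match c {
            '<' if open_at.is_none() => open_at = Some(i),
            '>' if open_at.is_some() => {
                open_at = None;
                tags += 1;
            }
            _ => {}
        }
    }
    match open_at {
        Some(i) => Err(ParseError::UnclosedTag(i)),
        None => Ok(tags),
    }
}

/// Panicking convenience wrapper. Callers that handle untrusted input
/// (such as clipboard content) would call `try_parse` instead.
pub fn parse(input: &str) -> usize {
    try_parse(input).expect("invalid input")
}
```

With this layout, an application like a clipboard filter can depend on `try_parse` alone and decide for itself how to recover from bad input.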

Feature request:
Besides, my first tests indicate that your (safe) parser might be too strict for my use case. I would probably need a parser that also processes imperfect or incorrect input, even when it is not perfectly valid HTML5.

Once you publish 0.7, I will test again how bad the copied HTML in the clipboard can be.

Incorrect Error: Malformed attribute

Input

Create with:

curl https://askubuntu.com/questions/189640/how-to-find-architecture-of-my-pc-and-ubuntu -o test.txt

The file:

test.txt

Incorrect Error

Malformed attribute: id="search" role="search" action=/search class="s-topbar--searchbar js-searchbar " autocomplete="off" - Missing quotation mark at around index 13951

Malformed attribute

Malformed attribute: property="og:type" content= "website" - Missing attribute name at around index 938
test.txt
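
Both messages above come from attribute strings that browsers accept: an unquoted value (`action=/search`) and whitespace after the `=` sign (`content= "website"`). A minimal sketch of a more lenient attribute scanner follows. `parse_attributes` is a hypothetical helper written for illustration; it is not part of html2md-rs and does not decode character references.

```rust
/// Leniently parses the attribute part of a start tag.
/// Returns (name, value) pairs; attributes without a value get an empty string.
pub fn parse_attributes(input: &str) -> Vec<(String, String)> {
    let chars: Vec<char> = input.chars().collect();
    let mut attrs = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        // Skip whitespace and stray slashes between attributes.
        while i < chars.len() && (chars[i].is_whitespace() || chars[i] == '/') {
            i += 1;
        }
        if i >= chars.len() {
            break;
        }

        // Attribute name: everything up to whitespace or `=`.
        let name_start = i;
        while i < chars.len() && !chars[i].is_whitespace() && chars[i] != '=' {
            i += 1;
        }
        let name: String = chars[name_start..i].iter().collect();

        // Allow whitespace on both sides of `=`.
        while i < chars.len() && chars[i].is_whitespace() {
            i += 1;
        }
        let mut value = String::new();
        if i < chars.len() && chars[i] == '=' {
            i += 1;
            while i < chars.len() && chars[i].is_whitespace() {
                i += 1;
            }
            if i < chars.len() && (chars[i] == '"' || chars[i] == '\'') {
                // Quoted value: read up to the matching quote (or the end of input).
                let quote = chars[i];
                i += 1;
                let value_start = i;
                while i < chars.len() && chars[i] != quote {
                    i += 1;
                }
                value = chars[value_start..i].iter().collect();
                i += 1; // step past the closing quote
            } else {
                // Unquoted value: read up to the next whitespace.
                let value_start = i;
                while i < chars.len() && !chars[i].is_whitespace() {
                    i += 1;
                }
                value = chars[value_start..i].iter().collect();
            }
        }

        // A dangling `=value` without a name is dropped instead of being an error.
        if !name.is_empty() {
            attrs.push((name.to_lowercase(), value));
        }
    }
    attrs
}
```

The design choice here is to never fail: anything that cannot be interpreted is skipped, which suits a converter that discards most attributes anyway.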

feat: support for br tag

There is currently no support for <br />, which causes the UnknownNodeType error to be returned from the parser.
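
One possible way to handle this, sketched independently of the html2md-rs internals: treat `br` as a void element (no closing tag, no children) and emit a CommonMark hard line break, which is a backslash followed by a newline. The names `VOID_ELEMENTS`, `is_void_element`, and `br_to_hard_break` are assumptions made for illustration.

```rust
/// Void elements of the current HTML standard: no closing tag, no children.
const VOID_ELEMENTS: [&str; 13] = [
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
];

/// Returns true if `name` is a void element (case-insensitive).
pub fn is_void_element(name: &str) -> bool {
    VOID_ELEMENTS.contains(&name.to_ascii_lowercase().as_str())
}

/// Extracts the tag name from the inside of a tag, e.g. `br /` -> `br`.
/// Closing tags such as `/p` yield an empty name.
fn tag_name(inner: &str) -> &str {
    inner
        .trim_start()
        .split(|c: char| c.is_whitespace() || c == '/')
        .next()
        .unwrap_or("")
}

/// Replaces every `<br>`, `<br/>` and `<br />` with a CommonMark hard line
/// break; all other text, including other tags, is copied unchanged.
pub fn br_to_hard_break(html: &str) -> String {
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(open) = rest.find('<') {
        // Index of the matching '>' relative to `rest`; stop if there is none.
        let close = match rest[open..].find('>') {
            Some(offset) => open + offset,
            None => break,
        };
        out.push_str(&rest[..open]);
        let inner = &rest[open + 1..close];
        if tag_name(inner).eq_ignore_ascii_case("br") {
            out.push_str("\\\n");
        } else {
            out.push_str(&rest[open..=close]);
        }
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    out
}
```

In a tree-based parser, the same `is_void_element` check would tell the parser not to look for a closing tag after `<br>`.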

Incorrect parsing: "Missing quotation mark at around index 0"

Valid HTML

<meta http-equiv="content-type" content="text/html; charset=utf-8"><span><a href="https://search.nixos.org/packages?channel=unstable&amp;from=0&amp;size=50&amp;sort=relevance&amp;type=packages&amp;query=tpnote">tpnote</a></span><div>Markup enhanced granular note-taking</div><ul><li>Name: <code class="package-name">tpnote</code></li><li>Version: <strong>1.23.10</strong></li><li><a href="https://blog.getreu.net/projects/tp-note/" target="_blank">🌐 Homepage</a></li><li><a href="https://github.com/NixOS/nixpkgs/blob/nixos-unstable/pkgs/by-name/tp/tpnote/package.nix#L53" target="_blank">📦 Source</a></li><li>License: <a href="https://spdx.org/licenses/MIT.html" target="_blank">MIT License</a></li></ul>

Incorrect parsing

html2md-rs: Malformed attribute: http-equiv="content-type" content="text/html; charset=utf-8" - Missing quotation mark at around index 0

Do not print `<header>`, `</header>`, `<span...>` and `</span>` tags

You already ignore attributes in <span...> tags, which is nice. Could you please also refrain from printing the <span...> and </span> tags themselves? They do not add any semantics to the resulting Markdown, but they add a lot of noise when converting automatically generated HTML.

Secondly, please also ignore the <header> and </header> tags in the output.
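
The requested behavior can be sketched as a renderer that drops the markup of certain tags while still rendering their children. The `Node` type, `TRANSPARENT_TAGS`, and `render` below are hypothetical and much simpler than whatever node model html2md-rs uses internally.

```rust
/// Hypothetical, simplified node type for this sketch.
pub enum Node {
    Text(String),
    Element { tag: String, children: Vec<Node> },
}

/// Tags whose own markup is dropped while their children are still rendered.
const TRANSPARENT_TAGS: [&str; 2] = ["span", "header"];

/// Renders a node tree to Markdown-like text.
pub fn render(node: &Node) -> String {
    match node {
        Node::Text(text) => text.clone(),
        Node::Element { tag, children } => {
            // Render the children first; the tag decides how to wrap the result.
            let inner: String = children.iter().map(render).collect();
            match tag.as_str() {
                t if TRANSPARENT_TAGS.contains(&t) => inner,
                "strong" => format!("**{inner}**"),
                "code" => format!("`{inner}`"),
                // Unknown tags are kept as raw HTML in this sketch.
                other => format!("<{other}>{inner}</{other}>"),
            }
        }
    }
}
```

Keeping the transparent tags in one list makes it easy to extend the set later, for example with `div` or `section`, without touching the rendering logic.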
