Comments (4)

KillyMXI commented on August 17, 2024

This sounds like a specific use case that might not be useful for everyone, and I still don't have a clear understanding of it. People might have different ideas about how to split a document into blocks.

Currently there is the baseElements set of options.
You can specify a baseElements.selectors array for all the things you consider blocks (by your definition) and want to process.
All results will be merged into a single output string, though. I might think about a way to output results for each base element separately, but I need a real-world example where that would be useful.
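
A minimal sketch of the baseElements approach, for illustration only. The helper names (buildBlockOptions, extractBlocks) and the example selectors are assumptions and not part of the library; the convert function from html-to-text is passed in by the caller, so the snippet itself has no hard dependency on the package.

```javascript
// Hypothetical helper: builds html-to-text options that restrict
// processing to the elements matching the given CSS selectors.
function buildBlockOptions(selectors) {
  if (!Array.isArray(selectors) || selectors.length === 0) {
    throw new TypeError('selectors must be a non-empty array of CSS selectors');
  }
  return {
    baseElements: {
      selectors: [...selectors],
      // 'occurrence' keeps the matches in document order;
      // the library default is 'selectors' (order of the array).
      orderBy: 'occurrence',
      // Do not fall back to converting the whole document
      // when none of the selectors match.
      returnDomByDefault: false,
    },
  };
}

// Hypothetical wrapper: `convert` is expected to be the function
// exported by html-to-text.
function extractBlocks(convert, html, selectors) {
  return convert(html, buildBlockOptions(selectors));
}

// Usage, assuming html-to-text is installed:
//   const { convert } = require('html-to-text');
//   const text = extractBlocks(convert, html, ['article', 'table.data']);
```

Note that all matches still end up in one merged string; this only narrows what gets converted.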

There are multiple hacky solutions that could be used currently to separate the outputs for base elements. But running a separate parser to get your block first would be a cleaner and more robust solution for now.

from node-html-to-text.

cosbgn commented on August 17, 2024

Yes, an example use case would be extracting a table from a website. As I don't know the site beforehand, it would be useful for me to know which part of the text is the actual table. But I do agree that this might actually be better done with an HTML parser, passing its output to node-html-to-text. Thanks for confirming it!

KillyMXI commented on August 17, 2024

If you can define a CSS selector for a table, you can extract it with the already available options.

If you need to extract multiple tables separately from one page, that might be tricky now.

If you need to extract a table that is used for data, not layout, but you can't provide a selector and want to use some heuristic, such a heuristic can be implemented with a custom formatter, but it is not great for extraction purposes.
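
A rough sketch of what such a heuristic formatter could look like. Everything here is an assumption made for illustration: the heuristic itself (a table counts as data if it contains a th cell of its own), the [[TABLE]] markers, and the names looksLikeDataTable, markedTableFormatter and buildHeuristicTableOptions. It relies on the custom formatter signature (elem, walk, builder, formatOptions) and on the builder methods openBlock, addInline and closeBlock.

```javascript
// Assumed heuristic: a table is "data" if it has at least one <th>
// that belongs to it (nested tables are not descended into).
function looksLikeDataTable(elem) {
  const stack = [...(elem.children || [])];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.type !== 'tag') { continue; }
    if (node.name === 'th') { return true; }
    if (node.name === 'table') { continue; }
    if (node.children) { stack.push(...node.children); }
  }
  return false;
}

// Custom formatter: wraps the text of data-like tables in markers so
// that it can be located in the merged output afterwards.
function markedTableFormatter(elem, walk, builder, formatOptions) {
  const isData = looksLikeDataTable(elem);
  builder.openBlock({ leadingLineBreaks: 2 });
  if (isData) { builder.addInline('[[TABLE]]'); }
  walk(elem.children, builder);
  if (isData) { builder.addInline('[[/TABLE]]'); }
  builder.closeBlock({ trailingLineBreaks: 2 });
}

// Options that route every <table> through the formatter above.
function buildHeuristicTableOptions() {
  return {
    formatters: { markedTable: markedTableFormatter },
    selectors: [{ selector: 'table', format: 'markedTable' }],
  };
}

// Usage, assuming html-to-text is installed:
//   const { convert } = require('html-to-text');
//   const text = convert(html, buildHeuristicTableOptions());
```

The markers still have to be parsed back out of the output string, which is part of why this is awkward for extraction.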

cosbgn commented on August 17, 2024

Yes, to be honest, this library is so great at what it does that I wouldn't want it to actually solve this problem. It's perfect as it is: you give it HTML as input and it gives you text as output.

From the HTML you should be able to tell whether it's a table or not anyway, so there's no need to pollute this great library with a function very few people will need.
