
Comments (3)

schierlm commented on May 23, 2024

The malformed HTML is caused by incorrectly parsing embedded HTML like

<!--<img src="../Images/map_01_01.jpg" alt="The Near East at the Time of Genesis"/>-->

Here the end of the HTML is detected at the first > instead of the second one. The extra > is then inserted as part of the text (not as Raw HTML), and the LogosHTML exporter then replaces it with &gt;.
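To illustrate the difference, here is a minimal Python sketch. It is not BibleMultiConverter's actual parser (which is written in Java); the function names `raw_html_end_naive` and `raw_html_end` are hypothetical and only show why stopping at the first > leaves stray characters behind when the embedded HTML is a comment.

```python
def raw_html_end_naive(text: str, start: int) -> int:
    """Return the index just past the first '>' after start (the problematic behavior)."""
    return text.index(">", start) + 1


def raw_html_end(text: str, start: int) -> int:
    """Return the index just past the tag or comment that begins at start."""
    if text.startswith("<!--", start):
        # A comment ends at '-->', not at the first '>' inside it.
        return text.index("-->", start + 4) + 3
    return text.index(">", start) + 1


snippet = '<!--<img src="../Images/map_01_01.jpg" alt="The Near East at the Time of Genesis"/>-->'

# Whatever follows the detected end would be treated as plain text.
leftover_naive = snippet[raw_html_end_naive(snippet, 0):]
leftover_fixed = snippet[raw_html_end(snippet, 0):]
```

With the naive scan, the leftover still contains the closing part of the comment, including a >; with the comment-aware scan, nothing is left over.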

from biblemulticonverter.

ickc commented on May 23, 2024

I also found that an unsupported HTML entity gets its & changed to a literal & (that is, escaped as &amp;). But since the output file is HTML (at least in the case of LogosHTML), such entities should be left untouched. A temporary fix is to undo this literal & replacement: sed -i 's/&amp/\&/g'
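A somewhat narrower variant of that workaround, sketched in Python. It assumes the exporter writes double-escaped entities in the form `&amp;nbsp;` (an assumption, not verified against actual LogosHTML output), and the name `restore_entities` is hypothetical. Unlike a blanket replacement, it only touches an escaped ampersand that is followed by something shaped like an entity reference, so a genuine `&amp;` in running text stays escaped.

```python
import re

# An "&amp;" that is immediately followed by an entity-like name or a numeric
# character reference, e.g. "&amp;nbsp;", "&amp;#160;" or "&amp;#xA0;".
DOUBLE_ESCAPED = re.compile(
    r"&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"
)


def restore_entities(html_text: str) -> str:
    """Turn '&amp;name;' back into '&name;' and leave other '&amp;' alone."""
    return DOUBLE_ESCAPED.sub(r"&\1;", html_text)
```

This is still a heuristic: text that was meant to display the literal characters `&nbsp;` would be changed as well, and the pattern does not check that the name is a real HTML entity.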


schierlm commented on May 23, 2024

I guess you are seeing now what problems you face if you have embedded HTML in modules (like MyBibleZone ones). Either you allow for Raw HTML (and then you may get malformed HTML in the output) or you don't (and then when there is HTML that cannot be parsed, you lose information in case the destination format also allows for Raw HTML).

The current decision I took for MyBibleZone modules is: Inside of footnotes and introduction texts, raw HTML is allowed, while inside of verses all raw HTML gets stripped/replaced.

But I agree that the handling of entities can be improved (and unsupported entities should probably become Raw HTML even if they are in verses).

I will also have a look if I can sanitize the Raw HTML better so that no unbalanced tags can sneak through. And probably convert more raw HTML to formatting tags (e.g. <strong> to <b>) to reduce the need for Raw HTML.
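One possible shape for such a check, as a rough Python sketch using the standard library's `html.parser`. This is not how BibleMultiConverter does or will do it; `BalanceChecker` and `is_balanced` are hypothetical names, and the list of void elements is an assumption about which tags may legitimately appear without a closing tag.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (assumed list, per the HTML standard).
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and flags any mismatched closing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # The closing tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    """Return True if every opened tag in fragment is closed in order."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack
```

A fragment that fails the check could then be stripped or escaped instead of being passed through as Raw HTML; which of the two is preferable depends on the trade-off described above.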

For the record, the StrippedDiffable export format has an option to strip Raw HTML. That way, you will be guaranteed to not get any malformed HTML tags in your export, while losing some formatting in your footnotes/introductions.

