Comments (3)
The malformed HTML is caused by incorrect parsing of embedded HTML such as
<!--<img src="../Images/map_01_01.jpg" alt="The Near East at the Time of Genesis"/>-->
Here the parser takes the first > as the end of the HTML instead of the second one. The extra > is then inserted as part of the text (not as Raw HTML), and the LogosHTML exporter then replaces it with &gt;.
from biblemulticonverter.
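To illustrate the comment-parsing problem described above, here is a minimal Python sketch of how a scanner could locate the end of a raw HTML span. It is not biblemulticonverter's actual code (the project itself is written in Java); the function name `raw_html_end` is hypothetical, and the sketch does not handle a > inside an attribute value.

```python
def raw_html_end(text: str, start: int) -> int:
    """Return the index just past the tag or comment beginning at `start`.

    Returns -1 if the construct is never closed.
    """
    if text.startswith("<!--", start):
        # A comment ends at "-->", not at the first ">" found inside it.
        end = text.find("-->", start + 4)
        return -1 if end == -1 else end + 3
    # An ordinary tag ends at the next ">" (simplification: attribute
    # values containing ">" are not handled here).
    end = text.find(">", start)
    return -1 if end == -1 else end + 1


sample = ('<!--<img src="../Images/map_01_01.jpg" '
          'alt="The Near East at the Time of Genesis"/>-->')

# Searching for the first ">" stops inside the comment and leaves "-->" behind.
naive_end = sample.find(">") + 1
leftover = sample[naive_end:]

# The comment-aware scan consumes the whole span.
assert raw_html_end(sample, 0) == len(sample)
```

With the naive search, `leftover` is the trailing `-->`, which is the kind of stray text that later ends up escaped in the output.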
I also found that for an unsupported HTML entity, the & is changed to a literal &amp;. But since the output file is HTML (at least in the case of LogosHTML), such entities should be left untouched. A temporary fix is to undo this literal &amp; replacement: sed -i 's/&amp;/\&/g'
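The sed command above reverts every &amp;, including ones that stand for a genuine ampersand in the text. A narrower variant, sketched here in Python as a post-processing step on the exported file, only reverts &amp; when it is followed by something that looks like an entity name or numeric reference. The names `ESCAPED_ENTITY` and `restore_entities` are hypothetical, and the pattern can still misfire on literal text that happens to look like an entity (for example a deliberately escaped "&amp;amp;").

```python
import re

# Matches an escaped named or numeric entity, e.g. "&amp;thinsp;" or "&amp;#8212;".
ESCAPED_ENTITY = re.compile(
    r"&amp;(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);"
)


def restore_entities(html: str) -> str:
    """Turn '&amp;name;' back into '&name;' and leave a bare '&amp;' alone."""
    return ESCAPED_ENTITY.sub(r"&\1;", html)


restored = restore_entities("a&amp;thinsp;b, Tom &amp; Jerry, &amp;#x2014;")
```

Here a standalone "&amp;" (as in "Tom &amp; Jerry") stays escaped, so the output remains valid HTML.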
I guess you now see the problems you face when modules contain embedded HTML (as MyBibleZone ones do). Either you allow Raw HTML (and may then get malformed HTML in the output), or you don't (and then, whenever HTML cannot be parsed, you lose information even if the destination format would also allow Raw HTML).
The decision I have currently made for MyBibleZone modules is: inside footnotes and introduction texts, raw HTML is allowed, while inside verses all raw HTML gets stripped or replaced.
But I agree that the handling of entities can be improved (and unsupported entities should probably become Raw HTML even if they are in verses).
I will also look into whether I can sanitize the Raw HTML better so that no unbalanced tags can sneak through, and probably convert more raw HTML to formatting tags (e.g. <strong> to <b>) to reduce the need for Raw HTML.
For the record, the StrippedDiffable export format has an option to strip Raw HTML. That way, you are guaranteed not to get any malformed HTML tags in your export, at the cost of losing some formatting in your footnotes/introductions.
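As a rough idea of what a check for unbalanced tags in Raw HTML could look like, here is a minimal Python sketch built on the standard library's `html.parser`. It only detects unbalanced fragments and does not repair them; the names `BalanceChecker`, `is_balanced` and `VOID_TAGS` are hypothetical and not part of biblemulticonverter.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and flags mismatched closing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # A closing tag must match the most recently opened tag.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    # Balanced only if nothing mismatched and nothing was left open.
    return checker.balanced and not checker.stack
```

A fragment such as `<b>bold <i>x</i></b>` passes, while `<b>bold` or `text</i>` is rejected and could then be stripped or escaped instead of being passed through as Raw HTML.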