docx2html's Introduction

docx2html

docx2html is a javascript converter from docx to html for nodejs and the browser. Here's a demo.

installation

npm install docx2html

example

const docx2html = require("docx2html")

// input: an <input type="file"> element with a .docx file selected
docx2html(input.files[0])
	.then(html => {
		// further utilities on the converted html:
		// html.toString()
		// html.asZip() / html.download() / html.save()
	})

api

  • docx2html(docx, options): returns a promise; options supports

    • container: an HTMLElement to append the converted html to; default value is document.body
    • asImageURL(data): converts image data to a url; only required for nodejs
  • the promise resolves with an object that has the following members

    • content: the converted dom
    • toString(/options:{template(style,body,props), extendScript:}/)
    • asZip(options)
    • download(options)
    • save(options)
    • release(): to release image resources
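
For nodejs, the asImageURL option listed above has to be supplied by the caller. The following is a minimal sketch of one possible implementation, not the library's own code. It assumes the image data arrives as a Buffer or Uint8Array and that a data: URL is an acceptable result; the mimeType parameter and its default are hypothetical additions for illustration.

```javascript
// Hypothetical helper for the asImageURL option (nodejs only).
// Assumption: `data` is a Buffer or Uint8Array holding the raw image bytes.
// Assumption: the MIME type is not passed in by the caller, so it defaults to PNG.
function asImageURL(data, mimeType = "image/png") {
	const base64 = Buffer.from(data).toString("base64")
	return `data:${mimeType};base64,${base64}`
}
```

It could then be passed as an option, for example docx2html(docx, { asImageURL }).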

License

MIT. I also provide commercial support for tickets and enhancements to pay my rent.

Feature

It uses docx4js 1.x to parse docx files, and the docx4js api to traverse the docx models and convert them to html elements.

Ideally, each docx model should have a specific converter that creates the corresponding html elements, so the design is simply a map from docx model type to html element constructor.
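
The type-to-converter mapping described above can be sketched roughly as follows. This is an illustration only: the type names, the model shape ({type, text, level, children}), and the helper names are all assumptions, and it builds html strings rather than dom elements so that it stays self-contained.

```javascript
// Hypothetical sketch: converters keyed by docx model type.
// The type names and the model shape are assumptions, not the docx4js api.
const converters = {
	paragraph: (model, children) => `<div class="paragraph">${children}</div>`,
	heading: (model, children) => `<h${model.level}>${children}</h${model.level}>`,
	text: model => escapeHtml(model.text),
}

// Escape the characters that would otherwise be parsed as markup.
function escapeHtml(text) {
	return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;")
}

// Convert a model depth-first by looking up the converter for its type.
function convert(model) {
	const converter = converters[model.type]
	if (!converter) return "" // unknown model types are skipped in this sketch
	const children = (model.children || []).map(convert).join("")
	return converter(model, children)
}
```

Supporting a new model then means adding one entry to the map, without touching the traversal.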

The difficulty is that some docx models are hard to express in html. Luckily, CSS3 makes some rich styles possible in html, such as numbering and all (12) kinds of table styles.
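
As an illustration of how CSS can carry numbering, the sketch below generates CSS counter rules for a single-level numbered list. It is not taken from the library: the function name, the class naming scheme, and the assumption that list items are div children of a container are all hypothetical.

```javascript
// Hypothetical sketch: emit CSS counter rules for one single-level numbered list.
// Assumption: items are <div> children of an element with class `listClass`.
function numberingCss(listClass, listStyleType = "decimal") {
	const item = `.${listClass} > div`
	return [
		// start a fresh counter on the list container
		`.${listClass} { counter-reset: ${listClass}; }`,
		// each item advances the counter by one
		`${item} { counter-increment: ${listClass}; }`,
		// the label is rendered before the item content
		`${item}::before { content: counter(${listClass}, ${listStyleType}) ". "; }`,
	].join("\n")
}
```

Calling numberingCss("list1", "lower-roman") yields rules that label the items i., ii., iii. and so on, without storing any numbers in the html itself.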

Word shapes are drawn with SVG (lines, rects, etc.), but so far only a limited set of shapes is supported; the rest is a matter of time.

According to the HTML specification, a p element must not contain any block container, such as div. So there is no p tag in the output: every paragraph is a div with paragraph styles, and a small piece of javascript does some arrangement when the dom is ready.

It keeps the header and footer of every section, but conditional variants, such as odd and even header/footer, are not considered.

Word fields are kept, but so far only links are supported.

environment

  • nodejs
  • browser
    • IE9+
    • firefox
    • chrome

model

  • section
  • header
  • footer
  • paragraph
  • link
  • numbering
    • many
  • shape
    • rect
    • circle
    • round rect
  • table
  • textbox
  • inline content
  • heading
    • h1 ~ h6
  • Field
    • hyperlink
  • img

style

  • document default
  • named style
  • section style
    • page layout
    • columns
    • column style
  • paragraph style
  • inline style
  • style inheritance
  • table style
    • all (12) Word built-in styles
    • styles on first/last/even/odd row/column
    • styles on 4 corner cells
  • numbering style
  • bullet style
  • shape
    • rotate
    • text direction
    • positioning
      • vertical
        • page/margin - top/bottom/absolute
      • horizontal
        • page
          • left/right/center/inside/outside/absolute
        • margin/leftMargin/rightMargin/inMargin/outMargin/column
          • left/right/center/absolute

ToDo

  • more shapes
  • word art
  • chart

docx2html's Issues

basic usage example

Hi,

Could you provide any basic example of how this could be used?
Or point me to the code line, where I can see that.

Regards,

load file in browser by passing url

Is it possible to pass a url to the constructor in the browser, the same way it's done in node? Or, alternatively, is it possible to pass the raw text of the file to the constructor if, for example, it's loaded asynchronously?

How to edit the source to trap unhandled errors?

I have a very complex document which raises many errors that docx2html does not handle; it just stops without any warning, for example:

..\docx4js\src\openxml\docx\document.js:9 Uncaught (in promise) TypeError: Cannot read property 'documentElement' of null
at document.parse (..\docx4js\src\openxml\document.js:28)
at document.parse (..\docx4js\src\openxml\docx\document.js:9)
at ..\src\index.js:6

I am a javascript developer, but not a Node.js developer, and the usual methods I use to debug javascript code do not work for docx2html: opening the console, looking at error messages, clicking on the referenced .js files, examining them, setting breakpoints, and using local overrides to make changes and test.
I would like to add error messages wherever docx2html crashes, to avoid the crash and to warn me next time. How can I do it?

The error above refers to "..\docx4js\src\openxml\docx\document.js", which actually maps to "file:///C:/MYPATH/docx2html/dist/..\docx4js\src\openxml\docx\document.js"... but there is no "docx4js" folder in my installation of docx2html, so where does the browser look for docx4js, and why aren't its sources shown in the "Sources" tab of Chrome?

Link anchor tags and content after '#' is removed.

Links/anchors in a document which contain a # have it and everything after the # stripped off. Is there a way to bypass this functionality to keep the original link?

Original:
<a href="https://example.com/file.mp3#t=00:02:00">
<a href="https://example.com/file.pdf#page=2">

After parse:
<a href="https://example.com/file.mp3">
<a href="https://example.com/file.pdf">

Error: Cannot find module 'docx2html'

When I tried to import this library in my application, it is throwing the below error.
"Uncaught Error: Cannot find module 'docx2html'"

Please check and let me know, whether any other dependency is required or is it an issue in the base package itself?

If there is an issue in the base package, please help in the resolution.
Thanks in advance.

Header & Footer

The images in the header and footer are not displayed or extracted.
Can you check, please, or show me where I can edit it?

Error with some documents

I get an error with some documents:

Uncaught (in promise) TypeError: this.constructor.Level is not a constructor list.js:12

Is it possible to fix this error?
Thanks!

how to set img options?

Hi,

When a docx is converted to html, the image tag indicates notsupport.
How can I set the img option in nodejs?

I declared obj.asImageURL and passed it as an option, but it didn't work.

TypeError: this.field is null

This lib was working fine until a docx threw an error that breaks the JS:

TypeError: this.field is null - docx2html.min.js:14:19890

I think that a "!" is missing in the following condition:
if (this.field) this.field = new _field2.default(instruct, this.wDoc, this, type);

To make it work again I had to change it to:

if (!this.field) this.field = new _field2.default(instruct, this.wDoc, this, type);

Nice lib 👍

Cannot find module

It seems a very nice piece of work, but after I install it with
'npm install --save docx2html'
it throws this error:

module.js:327
throw err;
^

Error: Cannot find module 'docx2html'
at Function.Module._resolveFilename (module.js:325:15)
at Function.Module._load (module.js:276:25)
at Module.require (module.js:353:17)
at require (internal/module.js:12:17)

Cannot Download the html file generated

Hi, I tried running your demo. It worked fine and displayed the output, but when I tried to download using the method mentioned, I could not: it gives me the error "Uncaught (in promise) TypeError: props is undefined". It would be really helpful if you could provide some sample code! Thank you! Here's the code for your reference:

function test(input) {
	require("docx2html")(input.files[0], { container: document.querySelector("#container") })
		.then(html => {
			html.download();
		})
}
