aljohri / docx2pdf

License: MIT License

Python 98.33% Makefile 1.67%

docx2pdf's Introduction

docx2pdf

Convert docx to pdf on Windows or macOS directly using Microsoft Word (must be installed).

On Windows, this is implemented via win32com; on macOS, it is implemented via JXA (JavaScript for Automation, a.k.a. AppleScript in JS).

Install

On macOS:

brew install aljohri/-/docx2pdf

Via pipx:

pipx install docx2pdf

Via pip:

pip install docx2pdf

CLI

usage: docx2pdf [-h] [--keep-active] [--version] input [output]

Example Usage:

Convert single docx file in-place from myfile.docx to myfile.pdf:
    docx2pdf myfile.docx

Batch convert docx folder in-place. Output PDFs will go in the same folder:
    docx2pdf myfolder/

Convert single docx file with explicit output filepath:
    docx2pdf input.docx output.pdf

Convert single docx file and output to a different explicit folder:
    docx2pdf input.docx output_dir/

Batch convert docx folder. Output PDFs will go to a different explicit folder:
    docx2pdf input_dir/ output_dir/

positional arguments:
  input          input file or folder. batch converts entire folder or convert
                 single file
  output         output file or folder

optional arguments:
  -h, --help     show this help message and exit
  --keep-active  prevent closing word after conversion
  --version      display version and exit

Library

from docx2pdf import convert

convert("input.docx")
convert("input.docx", "output.pdf")
convert("my_docx_folder/")

See the CLI docs above (or docx2pdf --help) for all the different invocations. They are the same for the CLI and the Python library.

Jupyter Notebook

If you are using docx2pdf in a Jupyter notebook, you will need ipywidgets for the tqdm progress bar to render properly.

pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension


docx2pdf's Issues

AttributeError: Open.SaveAs when converting a Word doc to PDF

I have installed docx2pdf as:

pip install docx2pdf

I am trying to convert a word document to PDF with the below code:
from docx2pdf import convert
convert("secrecy.docx", "My_Files\\output.pdf")

I am facing the error below:

I am unable to figure out the underlying issue, i.e., why this error occurs.

Is this a problem with my Python version? I am using Python 3.8.3. Could you please let me know the solution as soon as possible?

convert fails silently if Microsoft Word is not available

Environment: MacOS machine where Microsoft Word is not available.

Steps to reproduce: Run on the command line: docx2pdf file.docx or use convert('file.docx') in a script.

Desired behaviour: An exception to indicate that the conversion has been unsuccessful.

Actual behaviour: Silent failure; the only way to check is to see whether file.pdf has been created (and what if that file already exists?).

Related to, but not quite the same as, #55.
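
Until convert reports failures itself, a caller can detect the silent failure by checking for the output file. The sketch below is a hypothetical wrapper (convert_checked is not part of docx2pdf). It deletes any existing file at the output path first, so a stale PDF cannot be mistaken for success, and then raises if no non-empty file appears. It assumes that deleting a previous PDF at that path is acceptable.

```python
from pathlib import Path


def convert_checked(convert, input_path, output_path):
    """Call convert(input_path, output_path) and raise if no PDF appears.

    Hypothetical helper. Any existing file at output_path is deleted first,
    so a stale PDF from an earlier run cannot be mistaken for success.
    """
    output = Path(output_path)
    if output.exists():
        output.unlink()  # remove stale output so its presence proves success
    convert(str(input_path), str(output))
    if not output.is_file() or output.stat().st_size == 0:
        raise RuntimeError("conversion failed: no PDF written to " + str(output))
    return output
```

Usage would look like convert_checked(convert, "file.docx", "file.pdf") after from docx2pdf import convert.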

look into `ExportAsFixedFormat` function

doc.ExportAsFixedFormat(OutputFileName=pdf_file,
    ExportFormat=17,  # 17 = PDF output, 18 = XPS output
    OpenAfterExport=False,
    OptimizeFor=0,  # 0 = Print (higher res), 1 = Screen (lower res)
    CreateBookmarks=1,  # 0 = no bookmarks, 1 = heading bookmarks only, 2 = match Word bookmarks
    DocStructureTags=True,
)

As an alternative to the SaveAs function, you could also use ExportAsFixedFormat, which gives you access to the options from the PDF export dialog you would normally see in Word. With it you can specify bookmarks and other document properties.

The full list of function arguments is: 'OutputFileName', 'ExportFormat', 'OpenAfterExport', 'OptimizeFor', 'Range', 'From', 'To', 'Item', 'IncludeDocProps', 'KeepIRM', 'CreateBookmarks', 'DocStructureTags', 'BitmapMissingFonts', 'UseISO19005_1', 'FixedFormatExtClassPtr'

Source: https://stackoverflow.com/a/50516507/1667241
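
A minimal sketch of the ExportAsFixedFormat approach, assuming doc is a Word Document COM object obtained through win32com (for example from word.Documents.Open(...)). The helper name export_pdf and its defaults are hypothetical; the numeric values are the ones given in the snippet above. Extra keyword arguments from the list above, such as IncludeDocProps, are passed through unchanged.

```python
# Word enumeration values, as given in the snippet above
WD_EXPORT_FORMAT_PDF = 17               # 18 would be XPS
WD_EXPORT_OPTIMIZE_FOR_PRINT = 0        # 1 = optimize for screen
WD_EXPORT_CREATE_HEADING_BOOKMARKS = 1  # 0 = none, 2 = match Word bookmarks


def export_pdf(doc, pdf_path, bookmarks=WD_EXPORT_CREATE_HEADING_BOOKMARKS, **extra):
    """Export an already-open Word Document COM object to PDF (hypothetical helper).

    `extra` is forwarded unchanged to ExportAsFixedFormat, e.g. IncludeDocProps=True.
    Returns the keyword arguments that were passed, for inspection.
    """
    options = dict(
        OutputFileName=str(pdf_path),
        ExportFormat=WD_EXPORT_FORMAT_PDF,
        OpenAfterExport=False,
        OptimizeFor=WD_EXPORT_OPTIMIZE_FOR_PRINT,
        CreateBookmarks=bookmarks,
        DocStructureTags=True,
    )
    options.update(extra)
    doc.ExportAsFixedFormat(**options)
    return options
```

Because doc is passed in, the helper itself does not depend on Windows; obtaining doc does, e.g. via win32com.client.Dispatch("Word.Application").Documents.Open(path), followed by doc.Close() and word.Quit() when done.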

Exception occurred 'This file is in use by another application or user'

Hi, I have been stuck on this error for 2 days. I'm running a Python script from a C# shell. I am using https://pypi.org/project/docopt/ to fill a Word form and then convert it to PDF. I'm using Office 365 and Windows 10 64-bit.

Traceback (most recent call last):
  File "runpy.py", line 192, in _run_module_as_main
  File "runpy.py", line 85, in _run_code
  File "C:\Users\Dell\Desktop\masprojapp\python\python-3.8.0-embed-amd64\Scripts\docx2pdf.exe\__main__.py", line 7, in <module>
  File "c:\users\dell\desktop\masprojapp\python\python-3.8.0-embed-amd64\lib\site-packages\docx2pdf\__init__.py", line 170, in cli
    convert(args.input, args.output, args.keep_active)
  File "c:\users\dell\desktop\masprojapp\python\python-3.8.0-embed-amd64\lib\site-packages\docx2pdf\__init__.py", line 106, in convert
    return windows(paths, keep_active)
  File "c:\users\dell\desktop\masprojapp\python\python-3.8.0-embed-amd64\lib\site-packages\docx2pdf\__init__.py", line 32, in windows
    doc = word.Documents.Open(str(docx_filepath))
  File "<COMObject <unknown>>", line 5, in Open
pywintypes.com_error: (-2147352567, 'Exception occurred.', (0, 'Microsoft Word', 'This file is in use by another application or user.\r (C:\\Users\\Dell\\...\\Temp\\tmpcj1b9swc.docx)', 'wdmain11.chm', 24937, -2146822831), None)
  0%|          | 0/1 [00:00<?, ?it/s]

Code:

import os
import sys
import pandas as pd
import configparser
from docxtpl import DocxTemplate
import jinja2
from num2words import num2words
import json
from docx2pdf import convert
from sys import platform
import tempfile

BASE_DIR = os.path.abspath(os.path.dirname(__file__))

# --------------load configurations----------------
def get_configurations(config):
    config.optionxform = str
    config.read(os.path.join(BASE_DIR, "vars.cfg"))
    return config
# --------------load configurations----------------


config = get_configurations(configparser.RawConfigParser())
FILES_FOLDER = os.path.join(BASE_DIR, config["FILES"]["FILES_FOLDER"])


def convert_to_pdf(doc):
    
    from comtypes import client

    word = client.CreateObject('Word.Application')
    new_name = os.path.join(FILES_FOLDER, config["DOWNLOAD"]["DOWNLOAD_FILE_AS"])
    wdFormatPDF = 17
    doc = word.Documents.Open(doc)
    doc.SaveAs(new_name, FileFormat=wdFormatPDF)
    doc.Close()
    word.Quit()

def get_jinja_env():
    def to_words(value):
        value = value.strip()
        if value:
            return num2words(int(value), lang="en_IN").upper()
        return value

    jinja_env = jinja2.Environment()
    jinja_env.filters["to_words"] = to_words
    return jinja_env


def save_docx_object(opt, row):

    row = row.to_dict()

    file_path = r'C:\Users\Dell\Desktop\masprojapp\python\DEMATTENDER FORM for LOF.docx'#

    doc = DocxTemplate(file_path)
    
    context = {
        # value is tag name which is placed in word file and key is always column name of db
        value: row[key]
        for key, value in config._sections["TAGS"].items()
    }
    
    jinja_env = get_jinja_env()
    
    doc.render(context, jinja_env)
    temp = tempfile.NamedTemporaryFile(suffix=".docx")
    doc.save(temp)
    return temp
    
    

def search_row_in_database(opt, value):
    df = pd.read_csv(
        os.path.join(FILES_FOLDER, config["FILES"]["EXCEL_FILE_NAME"]),
        dtype=str,
        keep_default_na=False,
    )
    return df[df[opt] == value]


def get_pdf(opt, row):
    temp = save_docx_object(opt, row)
    import subprocess

    if platform == "linux" or platform == "linux2":
        subprocess.run(
            [
                "abiword",
                "--to=pdf",
                os.path.join(FILES_FOLDER, config["DOWNLOAD"]["DOWNLOAD_FILE_AS"]),
            ]
        )
    elif platform == "win32":
        subprocess.run(
            [
                os.path.join(BASE_DIR, "python-3.8.0-embed-amd64", "Scripts", "docx2pdf"),
                os.path.join(FILES_FOLDER, temp.name),
                os.path.join(FILES_FOLDER, config["DOWNLOAD"]["DOWNLOAD_FILE_AS"]),
            ],
            shell=True )


def main(opt, value):

    row = search_row_in_database(opt, value)

    # check if a single row with that ID exists
    if len(row) == 1:
        row = row.squeeze()
        get_pdf(opt, row)
        print(
            json.dumps(
                {
                    "status": "Success",
                    "path": os.path.join(
                        FILES_FOLDER, config["DOWNLOAD"]["DOWNLOAD_FILE_AS"]
                    ),
                }
            )
        )
    # no rows with that ID found
    elif len(row) == 0:
        print(json.dumps({"status": "ErrorDNE"}))
    # in case of multiple rows with that ID exists
    else:
        print(json.dumps({"status": "ErrorMRE"}))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
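
A likely cause, offered as a hypothesis rather than a confirmed diagnosis: save_docx_object returns a tempfile.NamedTemporaryFile that is still open. The Python documentation notes that, on Windows, the name of such a file cannot be used to open it a second time while it is still open, so Word would see it as in use. One workaround is to create the temporary file with tempfile.mkstemp, close the handle before anything else opens it, and delete the file yourself. The function below is a hypothetical sketch; save would be something like DocxTemplate.save and convert something like docx2pdf.convert.

```python
import os
import tempfile


def convert_via_closed_tempfile(save, convert, output_path, suffix=".docx"):
    """Write a document to a closed temporary file, convert it, then delete it.

    Hypothetical helper.
    save(path): writes the document to path, e.g. DocxTemplate.save
    convert(src, dst): performs the conversion, e.g. docx2pdf.convert
    """
    fd, tmp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # close our handle before Word (or anything else) opens the file
    try:
        save(tmp_path)
        convert(tmp_path, str(output_path))
    finally:
        os.remove(tmp_path)  # clean up even if the conversion fails
    return output_path
```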

output.pdf should be mentioned instead of output.docx in cli usage example

docx2pdf/README.md

Line 43 in 4b77f6c

docx2pdf input.docx output.docx

In this line, the command should be docx2pdf input.docx output.pdf, but it says output.docx.

I think it is a mistake.

Using with Cygwin?

Currently, when using the tool from Cygwin, I get this error:

Traceback (most recent call last):
  File "/usr/bin/docx2pdf", line 8, in <module>
    sys.exit(cli())
  File "/usr/lib/python3.7/site-packages/docx2pdf/__init__.py", line 170, in cli
    convert(args.input, args.output, args.keep_active)
  File "/usr/lib/python3.7/site-packages/docx2pdf/__init__.py", line 109, in convert
    "docx2pdf is not implemented for linux as it requires Microsoft Word to be installed"
NotImplementedError: docx2pdf is not implemented for linux as it requires Microsoft Word to be installed

Word is installed on Windows. Is there any way to make the tool see that version instead of looking for a Linux one?

new Window/Document instead of activeDocument

Thanks for this very useful script!
Just noticed that running the script while we are working on a document in Microsoft Word closes our document window.
Could you please open the document being converted in a new window/document so that we don't lose our work?
Thank you again

Converting a docx I edited with Python3.8 to pdf gives me an error

First of all, I think this module is awesome as there are nearly no other easy options to convert a docx to a pdf using Python. So if this problem could be solved, I'd be very happy.

I created a script that opens a docx, edits some text variables according to a Tkinter Entry input and then saves it as a new docx. I then want to convert it to a PDF so that I don't need to do that manually, but that's the part I can't get to work. Here is my code and all the output it gives me:

My code:

from docx import Document
from docx.shared import Pt
from tkinter import *
from tkinter import messagebox
from tkinter import font as tkfont
from docx2pdf import convert
import docx2pdf
import os

print(docx2pdf.__version__)


root = Tk()
root.config(background='#009688')
root.wm_withdraw()
root.update()
root.title('Contractmaker')

naamhuurder = StringVar(root)
geboortedatum = StringVar(root)
adreshuurder = StringVar(root)
pchuurder = StringVar(root)
woonplaatshuurder = StringVar(root)
adresapp = StringVar(root)
typekamer = StringVar(root)
einddatumcontract = StringVar(root)
begindatumcontract = StringVar(root)
beginmaand1 = StringVar(root)
eindmaand1 = StringVar(root)

# Make all entries empty
def clear():
    entryNames = [naamhuurderr, geboortedatumm, adreshuurderr, pchuurderr, woonplaatshuurderr,
                  einddatumcontractt, begindatumcontractt,
                  beginmaand11, eindmaand11]

    for i in entryNames:
        i.delete(0, END)


# The function that changes the variables and saves it as a pdf in the correct folder
def contractupdater():
    global huur
    global appwp
    global pcapp

    # open the document and set the fontsize and style
    doc = Document('./Contract.docx')
    styles = doc.styles['Normal']
    font = styles.font
    font.size = Pt(9)
    font.name = 'Arial'

    # Remove the spaces from the adress line
    oldadres = adresapp.get()
    nospaceadres = oldadres.replace(' ', '').lower()

    # Handle the dropdown menus
    if adresapp.get() == 'Slotlaan 73' or adresapp.get() == 'Slotlaan 77':
        pcapp = '2902AK'
        appwp = 'Capelle aan den IJssel'
    elif adresapp.get() == 'Albert Cuypstraat 22':
        pcapp = '2902GC'
        appwp = 'Capelle aan den IJssel'

    if typekamer.get() == 'Grote kamer':
        huur = '510'
    elif typekamer.get() == 'Kleine kamer':
        huur = '435'
    elif typekamer.get() == 'Grote kamer gedeeld':
        huur = '800'

    # Check whether the date has been filled in correctly
    try:
        einddatum = einddatumcontract.get()
        laatstecijferpluseen = str(int(einddatum[-1]) + 1)
        verlengentot = str(einddatum[:-1])
        verlengentot += laatstecijferpluseen
    except:
        verlengentot = 'error'

    # Replace the variables with the input
    Dictionary = {"naam.vv": naamhuurder.get(), "gb.vv": geboortedatum.get(), 'adres.vv': adreshuurder.get(),
                  'postcode.vv': pchuurder.get(),
                  'woonplaats.vv': woonplaatshuurder.get(), 'appdres.vv': adresapp.get(), 'apppc.vv': pcapp,
                  'appwp.vv': appwp, 'typekamer.vv': typekamer.get(), 'enddate.vv': einddatumcontract.get(),
                  'begindate.vv': begindatumcontract.get(), 'verlengdatum.vv': verlengentot, 'huur.vv': huur,
                  'begineerstemaand.vv': beginmaand1.get(), 'eindeerstemaand.vv': eindmaand1.get()}

    # Check whether all Entry received input
    entriesWithInput = []
    for key, value in Dictionary.items():
        if len(value) > 0:
            entriesWithInput.append(value)

    # Replace the variables of in the docx to the input if it fullfills the if statements
    if len(entriesWithInput) == len(Dictionary):
        if verlengentot != 'error':
            for i in Dictionary:
                for p in doc.paragraphs:
                    if p.text.find(i) >= 0:
                        p.text = p.text.replace(i, Dictionary[i])

            # Save changed document at the correct place
            doc.save('/Users/Jem/Documents/Huurovereenkomsten/Specifiek/{}/contract{}.docx'.format(nospaceadres,
                                                                                             naamhuurder.get()))

            path = '/Users/Jem/Documents/Huurovereenkomsten/Specifiek/{}/contract{}.docx'.format(nospaceadres, naamhuurder.get())
            os.chmod(path, 0o777)

            convert('/Users/Jem/Documents/Huurovereenkomsten/Specifiek/{}/contract{}.docx'.format(nospaceadres,
                                                                                                       naamhuurder.get()))
            # Show user that the file has succesfully been made
            messagebox.showinfo('Succesvol opgeslagen', 'Het contract is gemaakt en opgeslagen in de folder.')

        else:
            messagebox.showerror('Vul alleen cijfers in bij de datum', 'Vul een datum als volgt in: dd-mm-jjjj')

    else:
        messagebox.showerror('Er ging iets mis',
                             'Controleer of alle data correct is ingevuld en probeer opnieuw.')



# GUI stuff that takes care of the scrollbar
def on_configure(event):
    canvas.configure(scrollregion=canvas.bbox('all'))

def on_mousewheel(event):
    canvas.yview_scroll(int(event.delta), 'units')

# Create some fonts
bold_font = tkfont.Font(weight='bold')

# Create the actual GUI
canvas = Canvas(root, width=450, height=550)
canvas.config(background='#009688')
canvas.pack(side=RIGHT)

scrollbar = Scrollbar(root, command=canvas.yview)
# scrollbar.pack(side=RIGHT, fill='y')

canvas.configure(yscrollcommand=scrollbar.set)
canvas.bind('<Configure>', on_configure)
canvas.bind_all('<MouseWheel>', on_mousewheel)

frame = Frame(canvas)
frame.config(background='#009688')
canvas.create_window((0,0), window=frame)

labelNaamhuurder = Label(frame, text='Naam huurder', bg='#009688', font=bold_font).grid(row=0, column=0, sticky=W, padx=(30, 0), pady=(15, 0))
naamhuurderr = Entry(frame, textvariable=naamhuurder, relief=FLAT, highlightcolor='#9DCCFD')
naamhuurderr.focus_set()
naamhuurderr.grid(row=0, column=2, pady=(15, 0))

labelGeboortedatum = Label(frame, text='Geboortedatum', bg='#009688', font=bold_font).grid(row=1, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
geboortedatumm = Entry(frame, textvariable=geboortedatum, relief=FLAT, highlightcolor='#9DCCFD')
geboortedatumm.grid(row=1, column=2, pady=(15, 0))

labelAdreshuurder = Label(frame, text='Adres huurder', bg='#009688', font=bold_font).grid(row=2, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
adreshuurderr = Entry(frame, textvariable=adreshuurder, relief=FLAT, highlightcolor='#9DCCFD')
adreshuurderr.grid(row=2, column=2, pady=(15, 0))

labelPchuurder = Label(frame, text='Postcode huurder', bg='#009688', font=bold_font).grid(row=3, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
pchuurderr= Entry(frame, textvariable=pchuurder, relief=FLAT, highlightcolor='#9DCCFD')
pchuurderr.grid(row=3, column=2, pady=(15, 0))

labelWoonplaatshuurder = Label(frame, text='Woonplaats huurder', bg='#009688', font=bold_font).grid(row=4, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
woonplaatshuurderr = Entry(frame, textvariable=woonplaatshuurder, relief=FLAT, highlightcolor='#9DCCFD')
woonplaatshuurderr.grid(row=4, column=2, pady=(15, 0))

labelAdresapp = Label(frame, text='Adres appartement', bg='#009688', font=bold_font).grid(row=5, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
appartementen = {'Slotlaan 73', 'Slotlaan 77', 'Albert Cuypstraat 22'}
adresapp.set('Slotlaan 73') # Default option
dropdownMenuhuur = OptionMenu(frame, adresapp, *appartementen)
dropdownMenuhuur.config(width=18)
dropdownMenuhuur.grid(row=5, column=2, pady=(15, 0))

labelTypekamer = Label(frame, text='Type kamer', bg='#009688', font=bold_font).grid(row=6, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
typeKamers = {'Grote kamer', 'Kleine kamer', 'Grote kamer gedeeld'}
typekamer.set('Grote kamer') # Default option
dropdownMenutypekamer = OptionMenu(frame, typekamer, *typeKamers)
dropdownMenutypekamer.config(width=18)
dropdownMenutypekamer.grid(row=6, column=2, pady=(15, 0))

labelEinddatumcontract = Label(frame, text='Eind datum contract', bg='#009688', font=bold_font).grid(row=7, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
einddatumcontractt = Entry(frame, textvariable=einddatumcontract, relief=FLAT, highlightcolor='#9DCCFD')
einddatumcontractt.grid(row=7, column=2, pady=(15, 0))

labelBegindatumcontract = Label(frame, text='Begin datum contract', bg='#009688', font=bold_font).grid(row=8, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
begindatumcontractt = Entry(frame, textvariable=begindatumcontract, relief=FLAT, highlightcolor='#9DCCFD')
begindatumcontractt.grid(row=8, column=2, pady=(15, 0))

labelBeginmaand1 = Label(frame, text='Begin van de eerste maand', bg='#009688', font=bold_font).grid(row=9, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
beginmaand11 = Entry(frame, textvariable=beginmaand1, relief=FLAT, highlightcolor='#9DCCFD')
beginmaand11.grid(row=9, column=2, pady=(15, 0))

labelEindmaand1 = Label(frame, text='Eind van de eerste maand', bg='#009688', font=bold_font).grid(row=10, column=0, pady=(15, 0), sticky=W, padx=(30, 0))
eindmaand11 = Entry(frame, textvariable=eindmaand1, relief=FLAT, highlightcolor='#9DCCFD')
eindmaand11.grid(row=10, column=2, pady=(15, 0))

empty = Button(frame, text='Opnieuw', command=clear, font=bold_font)
empty.config(width=10, fg='#009688', borderwidth=0, relief=RAISED)
empty.configure(highlightbackground='#009688')
empty.grid(row=11, column=0, pady=(25, 0), padx=(80, 0))

converter = Button(frame, text='OK', command=contractupdater, font=bold_font)
converter.config(width=10, fg='#009688', borderwidth=2, relief=RAISED)
converter.configure(highlightbackground='#009688')
converter.grid(row=11, column=2, pady=(25, 0), padx=(0, 80))

root.after(1, root.deiconify)
root.mainloop()

Before I added os.chmod(path, 0o777), it would ask me for permission. After I added it, it no longer asked for permission, but it still didn't work. I've also asked this question on Stack Overflow. After running the above code, it gives me
0%| | 0/1 [00:02<?, ?it/s]
It then opens the file in MS Word but doesn't do anything in Word and then it gives me:
{'input': '/Users/Jem/Documents/Huurovereenkomsten/Specifiek/slotlaan73/contractabc.docx', 'output': '/Users/Jem/Documents/Huurovereenkomsten/Specifiek/slotlaan73/contractabc.pdf', 'result': 'error', 'error': 'Error: Er heeft zich een fout voorgedaan.'}
'Er heeft zich een fout voorgedaan.' is Dutch for: an error has occurred.

I'm using python 3.8 and docx2pdf 0.1.7

Really hoping you can find a solution for this because, as said, I think this module can be very useful.

I have fixed an error.

I have just fixed an error affecting Python 3.7. Can I push the fix?

Converting a whole folder to PDF raises an error

Hi, I'm only using these two lines in the whole script:

from docx2pdf import convert

convert("encontrados/")

The library converts a random number of .docx files to .pdf and then raises this error:

com_error: (-2147418111, 'La llamada fue rechazada por el destinatario.', None, None)

('La llamada fue rechazada por el destinatario.' is Spanish for 'The call was rejected by the recipient.')

I've tried different versions of the library and it still does the same. Do you think it's a library issue?
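
The code -2147418111 is 0x80010001, the COM error RPC_E_CALL_REJECTED, which Word can return when it is busy and cannot accept a call. Assuming the failures are transient, retrying with a short back-off is one possible mitigation. call_with_retry below is a generic, hypothetical helper and not part of docx2pdf.

```python
import time


def call_with_retry(func, *args, attempts=3, delay=1.0, retry_on=(Exception,), **kwargs):
    """Call func(*args, **kwargs), retrying when one of `retry_on` is raised.

    Hypothetical helper. Sleeps delay, 2*delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)  # linear back-off before the next try
```

Converting one file at a time means a retry repeats only the file that failed instead of the whole folder, for example:

    for docx in sorted(Path("encontrados").glob("*.docx")):
        call_with_retry(convert, str(docx), str(docx.with_suffix(".pdf")), retry_on=(pywintypes.com_error,))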
Thanks

DocxToPdf not supported in Python 3.9.4

I was using this library to convert my docx files to PDF, but the requirements of the virtual machine where the script now runs are different: I need to use Python 3.9.4, and the library doesn't support it. Is an update to support it coming soon?
Thanks

Conversion error on Windows Server while scheduling the python script on Task Scheduler

The Python script couldn't open the Word file when run from a scheduled task. It returns a null value, but the same script works when executed from the Command Prompt.

Calling the method convert(source, destination)

The error comes from this method:
    def windows(paths, keep_active):
        doc = word.Documents.Open(str(docx_filepath))

python 3.9.5 is not supported

I am trying to convert a docx file using Python 3.9.5 and I'm getting this error:

0%| | 0/1 [00:00<?, ?it/s]
{'input': '/Users/username/Filename.docx', 'output': '/Users/username/output.pdf', 'result': 'error', 'error': 'Error: An error occurred.'}

Any chance you could help? I'm also happy to go back to a supported Python version for now, until we get a different templating engine running.

The bar does not move

Hi,
unfortunately, the program does not work.

macOS Big Sur Version 11.2.1

useraccount@hostname pdf % pip3 install --user docx2pdf
WARNING: The directory '/Users/useraccount/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
WARNING: The directory '/Users/useraccountLibrary/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: docx2pdf in /Users/useraccount/Library/Python/3.8/lib/python/site-packages (0.1.7)
Requirement already satisfied: tqdm<5.0.0,>=4.41.0 in /Users/useraccount/Library/Python/3.8/lib/python/site-packages (from docx2pdf) (4.57.0)
Requirement already satisfied: appscript<2.0.0,>=1.1.0; sys_platform == "darwin" in /Users/useraccount/Library/Python/3.8/lib/python/site-packages/aeosa (from docx2pdf) (1.1.2)
WARNING: You are using pip version 19.2.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

useraccount@hostname ~ % cd test
useraccount@hostname test %

useraccount@hostname test % docx2pdf 1.docx
  0%|          | 0/1 [00:00<?, ?it/s]
useraccount@hostname test %

Error: FloatProgress not found. Please update jupyter and ipywidgets

Hi Al,

I saw your package and wanted to give it a try, but I ran into an issue that I need your help with.

I used pip to install docx2pdf and finished the installation. However, the traceback says 'ImportError: cannot import name 'convert' from 'docx2pdf' (unknown location)'.

So I tried a Jupyter notebook, where the import works. But I got the following traceback and don't know what the problem is. The .py file is in the same folder as my .docx files.
Can you help?

from docx2pdf import convert
convert("Speech for Carey celebration in SH copy.docx","Speech for Carey celebration in SH copy.pdf")

NameError Traceback (most recent call last)
~/opt/miniconda3/lib/python3.7/site-packages/tqdm/notebook.py in status_printer(_, total, desc, ncols)
95 if total:
---> 96 pbar = IProgress(min=0, max=total)
97 else: # No total? Show info style bar with no progress tqdm status

NameError: name 'IProgress' is not defined

During handling of the above exception, another exception occurred:

ImportError Traceback (most recent call last)
in
----> 1 convert("Speech for Carey celebration in SH copy.docx","Speech for Carey celebration in SH copy.pdf")

~/opt/miniconda3/lib/python3.7/site-packages/docx2pdf/__init__.py in convert(input_path, output_path, keep_active)
102 paths = resolve_paths(input_path, output_path)
103 if sys.platform == "darwin":
--> 104 return macos(paths, keep_active)
105 elif sys.platform == "win32":
106 return windows(paths, keep_active)

~/opt/miniconda3/lib/python3.7/site-packages/docx2pdf/__init__.py in macos(paths, keep_active)
60
61 total = len(list(Path(paths["input"]).glob("*.docx"))) if paths["batch"] else 1
---> 62 pbar = tqdm(total=total)
63 for line in run(cmd):
64 try:

~/opt/miniconda3/lib/python3.7/site-packages/tqdm/notebook.py in __init__(self, *args, **kwargs)
206 total = self.total * unit_scale if self.total else self.total
207 self.container = self.status_printer(
--> 208 self.fp, total, self.desc, self.ncols)
209 self.sp = self.display
210

~/opt/miniconda3/lib/python3.7/site-packages/tqdm/notebook.py in status_printer(_, total, desc, ncols)
102 # #187 #451 #558
103 raise ImportError(
--> 104 "FloatProgress not found. Please update jupyter and ipywidgets."
105 " See https://ipywidgets.readthedocs.io/en/stable"
106 "/user_install.html")

ImportError: FloatProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html

docx2pdf doesn't preserve any links in the final pdf on Mac

I'm using docx2pdf to do a bulk conversion from docx to PDF, and while the final PDF preserves the link font styling, the hyperlink itself is gone. I only see this on Mac: when I run the same script on a Windows machine, everything works as expected.

In case it is useful: I'm using docx2pdf v0.1.7, Python v3.8.2, Word v16.51, and macOS v11.5.1.

docx2pdf doesn't preserve hyperlinks in table-of-contents

Word File with Comments Freezes Runtime

Whenever you try to convert a docx with comments, the script freezes. Upon investigation, it seems there is some sort of pop-up about "Unsaved Comments" that is not handled, and the function call appears to run forever.

Missing license?

Hi @AlJohri, thanks for this awesome package! It's really useful.

The package is listed with MIT on pypi, but I don't see it here. Would you mind adding a license to the repo?

https://help.github.com/en/github/creating-cloning-and-archiving-repositories/licensing-a-repository

Error with Windowhandler in Task Scheduler

I have a script that converts my files to PDFs. I use pipenv for the environment, which is set up automatically via a batch file. Run manually, everything works. If I call the batch file via the Windows Task Scheduler, an error from the WindowHandler of System.Windows.Forms occurs, already after the first document.

Recursively convert all files in a directory tree

Great tool! Is there a way to convert all files in all subfolders of the given input folder?
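
A sketch of recursive conversion built on the single-file form of convert shown in the README. convert_tree is a hypothetical helper: it walks the input folder with Path.rglob, mirrors the subfolder layout under the output folder, and skips the "~$" owner files Word leaves next to open documents. The conversion function is passed in, so docx2pdf.convert can be used on Windows or macOS.

```python
from pathlib import Path


def convert_tree(input_dir, output_dir, convert):
    """Convert every .docx under input_dir, mirroring subfolders in output_dir.

    Hypothetical helper. convert(src, dst) is called once per file,
    e.g. docx2pdf.convert. Returns the list of output paths.
    """
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    converted = []
    for src in sorted(input_dir.rglob("*.docx")):
        if src.name.startswith("~$"):
            continue  # owner/lock file Word creates while a document is open
        dst = output_dir / src.relative_to(input_dir).with_suffix(".pdf")
        dst.parent.mkdir(parents=True, exist_ok=True)  # recreate the subfolder
        convert(str(src), str(dst))
        converted.append(dst)
    return converted
```

For example, convert_tree("input_dir", "output_dir", convert) after from docx2pdf import convert.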

Explore Office JavaScript API

https://docs.microsoft.com/en-us/office/dev/add-ins/reference/overview/word-add-ins-reference-overview

TypeError: Can't convert 'PosixPath' object to str implicitly

Issue submitted via email by Gary Benson:

command: docx2pdf test.docx
  0%|                                                                                                     | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "docx2pdf", line 8, in <module>
    sys.exit(cli())
  File "docx2pdf/__init__.py", line 170, in cli
    convert(args.input, args.output, args.keep_active)
  File "docx2pdf/__init__.py", line 104, in convert
    return macos(paths, keep_active)
  File "docx2pdf/__init__.py", line 63, in macos
    for line in run(cmd):
  File "docx2pdf/__init__.py", line 54, in run
    process = subprocess.Popen(cmd, stderr=subprocess.PIPE)
  File "python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "python3.5/subprocess.py", line 1490, in _execute_child
    restore_signals, start_new_session, preexec_fn)
TypeError: Can't convert 'PosixPath' object to str implicitly
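
The traceback comes from Python 3.5: support for path-like objects in subprocess arguments was only added in Python 3.6, so on 3.5 every argument must already be a str. A minimal sketch of a fix, assuming the command list contains pathlib.Path objects, is to stringify it before calling Popen. Both function names below are hypothetical and only mirror the shape of the run generator in the traceback; f-strings are avoided so the code also parses on 3.5.

```python
import subprocess


def to_argv(cmd):
    """Convert every argument to str so Popen also works on Python 3.5,
    where subprocess does not accept pathlib.Path objects."""
    return [str(part) for part in cmd]


def run_stderr_lines(cmd):
    """Run cmd and yield its stderr output line by line (hypothetical helper)."""
    process = subprocess.Popen(to_argv(cmd), stderr=subprocess.PIPE)
    for line in process.stderr:
        yield line.decode("utf-8", errors="replace").rstrip("\r\n")
    process.wait()  # reap the child once its stderr is exhausted
```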

Support .doc and .docx files

In my testing it supports ".docx" files, but it cannot convert ".doc" files to ".pdf". Could you add this later? Please.

generate password protected pdf file

This is a request for a new feature that would allow generating password-protected PDF files.

DocxToPdf closing opened Word documents.

I have an automation application on a Windows desktop that works with docx files. Conversion works well, except that all Word windows in the GUI are closed when conversion starts.
Has anyone found a way to solve this?
Thanks.

linux support via libreoffice?

Consider adding support for Linux via LibreOffice: https://michalzalecki.com/converting-docx-to-pdf-using-python/
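
A sketch of what a LibreOffice fallback could look like, assuming the soffice binary is on PATH (some distributions install it as libreoffice instead, hence the binary parameter). It uses LibreOffice's headless --convert-to pdf mode with --outdir, which names the output after the input file's stem. The helper names are hypothetical, and the existence check is a precaution in case LibreOffice exits without producing a file.

```python
import shutil
import subprocess
from pathlib import Path


def libreoffice_cmd(input_path, outdir, binary="soffice"):
    """Build a headless LibreOffice command that writes <stem>.pdf into outdir."""
    return [binary, "--headless", "--convert-to", "pdf", "--outdir", str(outdir), str(input_path)]


def convert_with_libreoffice(input_path, outdir, binary="soffice"):
    """Convert one document with LibreOffice and return the PDF path (hypothetical helper)."""
    if shutil.which(binary) is None:
        raise FileNotFoundError(binary + " not found on PATH")
    subprocess.run(libreoffice_cmd(input_path, outdir, binary), check=True)
    pdf = Path(outdir) / (Path(input_path).stem + ".pdf")
    if not pdf.exists():
        raise RuntimeError("LibreOffice did not produce " + str(pdf))
    return pdf
```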

PDF Orientation Error on Mac

Hello,

Your package is quite amazing and very helpful.

[Edit] It appears the initial error I included was due to a mistake on my part. I have retained the remaining issue.

Using the package on my Mac running macOS Mojave, I have encountered the following problem when converting a .docx to .pdf.

My input is a .docx document in Landscape Orientation
The output does appear as PDF, but in Portrait Orientation.

I am not sure how to fix this error. Hope someone can help improve this great package.

AttributeError: Word.Application.Documents

Hello there,
I am pretty new to Python and I was working on a simple script to convert all Word files in a folder. The script mostly works, but sometimes I get the error mentioned in the title. I do not know if it is random or if something is wrong with my code (attached).
Stampa_PDF.txt
Also, every time it runs successfully, it shows one empty progress bar and one progress bar with the actual percentage of the process.
Could someone help me?
Thanks

AssertionError: Line 89 in __init__.py

Hi,

I have been using your library for multiple file conversions, but I ran into an AssertionError when I tried to convert a .doc file.

By reading your code, it's pretty clear that only .docx files are acceptable input arguments. Is it possible to add .doc support as well?

Edit: I saw that there is another issue thread opened a year ago. Is .doc support still planned?

Thank you
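Until .doc input is supported, a batch script can sort files up front so that a stray .doc is reported instead of stopping the run with an AssertionError. Below is a minimal sketch. partition_word_files is a hypothetical helper. It accepts only the lowercase .docx suffix, to be conservative, because the exact check docx2pdf performs is an assumption here.

```python
from pathlib import Path


def partition_word_files(paths):
    """Split paths into (.docx files to convert, everything else to skip)."""
    convertible, skipped = [], []
    for p in map(Path, paths):
        if p.suffix == ".docx":  # exact, case-sensitive match (conservative)
            convertible.append(p)
        else:
            skipped.append(p)
    return convertible, skipped
```

Each entry in the first list can then go to convert(str(path)), and the skipped .doc files can be re-saved as .docx in Word first.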

docx2pdf on macOS error

Hi,

I can't convert docx to pdf on macOS. Every time I receive the error: {'result': 'error', 'error': 'Error: Message not understood.'}
An exception has occurred, use %tb to see the full traceback. anaconda3/lib/python3.8/site-packages/docx2pdf/__init__.py", line 72, in macos
sys.exit(1)
The library works perfectly on Windows 11 in the Anaconda IDE. Could you help me solve this issue?

convert function sometimes fails in a multithreaded context

Configuration:

docx2pdf: 0.1.7
python: 3.7.7
os: windows 10 version 2004 (build 19041.508)

The convert function sometimes raises (-2147221008, 'CoInitialize has not been called.', None, None). I'm currently trying to figure out the exact steps to reproduce it, but I might not be able to. The overall application is multithreaded (via concurrent.futures.ThreadPoolExecutor, where one of the tasks calls convert), and the error seems to happen at random.

As far as I know, using win32com in Python requires calling pythoncom.CoInitialize() - see this post for an example.
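COM must be initialized on every thread that uses it, and ThreadPoolExecutor worker threads are not initialized for you. Below is a minimal sketch of the approach described above, assuming pywin32 is installed on Windows. A hypothetical decorator, with_com, calls pythoncom.CoInitialize() before the wrapped function and pythoncom.CoUninitialize() afterwards. On other platforms it simply calls the function.

```python
import functools
import sys


def with_com(func):
    """Wrap func so each call initializes COM on the calling thread (Windows only)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if sys.platform != "win32":
            return func(*args, **kwargs)
        import pythoncom  # provided by pywin32
        pythoncom.CoInitialize()
        try:
            return func(*args, **kwargs)
        finally:
            pythoncom.CoUninitialize()
    return wrapper


if __name__ == "__main__":
    from concurrent.futures import ThreadPoolExecutor

    def fake_convert(src, dst):  # stand-in for docx2pdf.convert
        return f"{src} -> {dst}"

    with ThreadPoolExecutor(max_workers=2) as pool:
        print(pool.submit(with_com(fake_convert), "input.docx", "output.pdf").result())
```

Submitting pool.submit(with_com(convert), "input.docx", "output.pdf") with the real docx2pdf.convert should, under these assumptions, give each worker thread an initialized COM apartment before Word is dispatched.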

Installation issue

Hi!

I have some problems installing the package. It seems to try to install docx2pdf and appscript at the same time, while appscript is already installed.

Terminal log:

pip install docx2pdf
Collecting docx2pdf
Using cached docx2pdf-0.1.7-py3-none-any.whl (6.6 kB)
Collecting appscript<2.0.0,>=1.1.0; sys_platform == "darwin"
Using cached appscript-1.1.1-cp37-cp37m-macosx_10_6_x86_64.whl (83 kB)
Requirement already satisfied: importlib_metadata<2.0.0,>=1.3.0; python_version < "3.8" in /opt/anaconda3/lib/python3.7/site-packages (from docx2pdf) (1.5.0)
Requirement already satisfied: tqdm<5.0.0,>=4.41.0 in /opt/anaconda3/lib/python3.7/site-packages (from docx2pdf) (4.42.1)
Requirement already satisfied: zipp>=0.5 in /opt/anaconda3/lib/python3.7/site-packages (from importlib_metadata<2.0.0,>=1.3.0; python_version < "3.8"->docx2pdf) (2.2.0)
Installing collected packages: appscript, docx2pdf
Attempting uninstall: appscript
Found existing installation: appscript 1.0.1
ERROR: Cannot uninstall 'appscript'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.

docx2pdf not found when building using pyinstaller

I'm trying to build my project using pyinstaller and am running into a "library not found" error for docx2pdf. I'm not sure what is causing this problem. Have you used pyinstaller with docx2pdf? And if so, have you had the same problem?

  File "c:\python38\lib\site-packages\PyInstaller\loader\pyimod03_importers.py", line 623, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages\docx2pdf\__init__.py", line 13, in <module>
  File "importlib\metadata.py", line 472, in version
  File "importlib\metadata.py", line 445, in distribution
  File "importlib\metadata.py", line 169, in from_name
importlib.metadata.PackageNotFoundError: docx2pdf
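The traceback shows docx2pdf looking up its own version with importlib.metadata at import time (line 13 of __init__.py). A PyInstaller bundle typically does not include a package's .dist-info metadata unless asked to. PyInstaller's --copy-metadata docx2pdf option exists for this case. On the library side, a defensive lookup with a fallback would avoid the crash. Below is a minimal sketch. package_version is a hypothetical helper, not docx2pdf's code.

```python
try:
    from importlib import metadata  # Python 3.8+
except ImportError:  # older Pythons: the importlib_metadata backport
    import importlib_metadata as metadata


def package_version(name, default="unknown"):
    """Return the installed version of `name`, or `default` if metadata is missing.

    Frozen apps may omit *.dist-info metadata, in which case
    importlib.metadata raises PackageNotFoundError.
    """
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return default
```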

exit() call in library code not working

From Jupyter, the code breaks at line 73:

exit(1)

with the following traceback:

NameError                                 Traceback (most recent call last)
in
----> 1 convert('./BAAM_raw/docx/', './BAAM_raw/docx/pdf/')

/usr/local/lib/python3.7/site-packages/docx2pdf/__init__.py in convert(input_path, output_path, keep_active)
    102     paths = resolve_paths(input_path, output_path)
    103     if sys.platform == "darwin":
--> 104         return macos(paths, keep_active)
    105     elif sys.platform == "win32":
    106         return windows(paths, keep_active)

/usr/local/lib/python3.7/site-packages/docx2pdf/__init__.py in macos(paths, keep_active)
     70             elif msg["result"] == "error":
     71                 print(msg)
---> 72                 exit(1)
     73
     74

NameError: name 'exit' is not defined

I tried editing the import with:

from sys import exit

but this does not change anything.
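The built-in exit() is added by the site module for the interactive interpreter and is not guaranteed to exist in every environment. This is consistent with the NameError above. Adding "from sys import exit" to your own script does not help, because it binds exit only in your module's namespace, not in docx2pdf's. Library code usually raises an exception instead, which a notebook can catch without the kernel shutting down. Below is a minimal sketch. ConversionError and check_message are hypothetical names, and the message shape is taken from the dicts printed in these reports.

```python
class ConversionError(RuntimeError):
    """Raised when the Word automation layer reports an error (hypothetical)."""


def check_message(msg):
    """Raise ConversionError for an error message instead of exiting the process.

    `msg` is assumed to look like the dicts printed in these reports, e.g.
    {'result': 'error', 'error': 'Error: Message not understood.'}.
    """
    if msg.get("result") == "error":
        raise ConversionError(msg.get("error", "unknown error"))
    return msg
```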

convert fails silently if the docx file is malformed or nonexistent

Run convert() on a corrupted docx file, convert('/tmp/empty_file.docx'), or on a docx file that doesn't exist, convert('/tmp/this_file_doesnt_exist.docx'). You get nothing back: no indication that the conversion failed. It would be more sensible to raise an exception in this case. As a workaround I have to test whether the expected file exists, but that is not a reliable test if the target file might already exist and hasn't been updated.
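Until convert() raises on failure, the existence-check workaround can be made reliable by deleting any stale output first. That way an old PDF cannot pass the check. Below is a minimal sketch. convert_checked is hypothetical. The converter is passed in (for example docx2pdf.convert), so the check does not depend on library internals.

```python
from pathlib import Path


def convert_checked(convert, input_path, output_path):
    """Call convert(input, output) and verify that a fresh, non-empty PDF exists.

    Raises FileNotFoundError for a missing input and RuntimeError if no
    output file was produced.
    """
    src, dst = Path(input_path), Path(output_path)
    if not src.is_file():
        raise FileNotFoundError(f"input not found: {src}")
    # Remove any stale PDF so its mere presence can't be mistaken for success.
    if dst.exists():
        dst.unlink()
    convert(str(src), str(dst))
    if not dst.is_file() or dst.stat().st_size == 0:
        raise RuntimeError(f"conversion produced no PDF: {dst}")
    return dst
```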

Support for PowerPoint?

I know this is probably out of scope for docx2pdf, but I still wanted to ask whether it would be a good idea to extend docx2pdf (or reuse some of its parts) to create a PPTX to PDF converter.

Looking at the PDF export UI of both Word and PowerPoint on macOS, they look very similar, so I assume much of what you have done could be reused.

docx2pdf will not install with Python 3.10

I have just installed Python 3.10 and went to install docx2pdf. I keep older Python releases around until the new one supports all the libraries I use, so I use the pip3.10 command to make sure it installs into the proper version. The issue seems to be with the version of pywin32. I've tried pip3.10 install pywin32==227, and it failed. The current version of pywin32 is 302 (for Python 3.x). Please see the error log from the docx2pdf install below.

Thanks for the help!

C:\WINDOWS\system32>pip3.10 install docx2pdf
Collecting docx2pdf
Using cached docx2pdf-0.1.7-py3-none-any.whl (6.6 kB)
Using cached docx2pdf-0.1.6-py3-none-any.whl (6.6 kB)
Using cached docx2pdf-0.1.5-py3-none-any.whl (5.7 kB)
Using cached docx2pdf-0.1.4-py3-none-any.whl (5.6 kB)
Using cached docx2pdf-0.1.3-py3-none-any.whl (5.6 kB)
Using cached docx2pdf-0.1.1-py3-none-any.whl (5.5 kB)
Using cached docx2pdf-0.1.0-py3-none-any.whl (4.7 kB)
ERROR: Cannot install docx2pdf==0.1.0, docx2pdf==0.1.1, docx2pdf==0.1.3, docx2pdf==0.1.4, docx2pdf==0.1.5, docx2pdf==0.1.6 and docx2pdf==0.1.7 because these package versions have conflicting dependencies.

The conflict is caused by:
docx2pdf 0.1.7 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.6 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.5 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.4 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.3 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.1 depends on pywin32<228 and >=227; sys_platform == "win32"
docx2pdf 0.1.0 depends on pywin32<228 and >=227; sys_platform == "win32"

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

Handles spaces and special characters in filenames

I get error messages whenever the folder name contains special characters such as ( or ü, or at times even a space. When I choose a simple folder name (e.g. just one word), it works fine. To make the tool more flexible, more characters should be allowed.
Best wishes!
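The report does not say which platform or step fails, so the cause is not known here. One general point may help when a tool calls external processes. Passing each path as its own list element, rather than building a single shell command string, lets spaces, parentheses and non-ASCII characters through without any quoting. Below is a small sketch that demonstrates this with a child Python process. echo_args is a hypothetical helper, not docx2pdf code.

```python
import json
import subprocess
import sys


def echo_args(*args):
    """Run a child Python process and return the argv it actually received.

    Because args are passed as a list (no shell), each one arrives intact.
    The child prints JSON (ASCII-escaped), so no console encoding is involved.
    """
    child = "import json, sys; print(json.dumps(sys.argv[1:]))"
    result = subprocess.run([sys.executable, "-c", child, *args],
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return json.loads(result.stdout)
```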

Windows support for WPS Office

I didn't have MS Word installed, so I ran it with WPS Office on Windows.
In one installation, everything ran fine.

However, in another installation it converted the files into PDF but threw an error.

https://i.stack.imgur.com/I7ngM.png

Is there any way to fix this?

Tries to close other open word files when converting doc to pdf

I have various Word files open on my computer (Windows 10) while I run the following function:

import os

from docx2pdf import convert

def convert_doc_to_pdf(path, pdf_name):

    inputFile = os.path.join(path, 'XXX.docx')
    outputFile = os.path.join(path, pdf_name + '_XXX.pdf')

    print(f"generating pdf {outputFile}")

    # Delete any existing output file before converting
    if os.path.isfile(outputFile):
        os.remove(outputFile)

    convert(inputFile, outputFile)

The code works well: it opens inputFile, converts it and closes it. I can see all of this because Microsoft Word visibly opens inputFile and then closes it.

However, when closing inputFile, the close command also tries to close my other open Word files. I think this is a bug and should be fixed, as it can result in unwanted behaviour.
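docx2pdf already has a keep_active option: the --keep-active CLI flag, described as "prevent closing word after conversion". Calling convert(inputFile, outputFile, keep_active=True) should therefore at least stop Word from being quit at the end. A fix inside the conversion code could close only the converted document and quit Word only when no other documents remain open. Below is a minimal sketch of that idea. It assumes win32com access to Word's Documents.Count, Document.Close and Application.Quit. close_and_maybe_quit is hypothetical and takes the COM objects as arguments, so it can be tried with stand-ins.

```python
WD_DO_NOT_SAVE_CHANGES = 0  # Word's wdDoNotSaveChanges constant


def close_and_maybe_quit(word, doc):
    """Close the converted document, then quit Word only if nothing else is open.

    `word` is a Word.Application COM object and `doc` a document it opened.
    Documents the user already had open keep Documents.Count above zero,
    so Word is left running for them. Returns True if Quit() was called.
    """
    doc.Close(WD_DO_NOT_SAVE_CHANGES)
    if word.Documents.Count == 0:
        word.Quit()
        return True
    return False
```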

AttributeError

Until recently, it worked fine. Now, when using docx2pdf, I get this error.

C:\Git\JonathanJacob\Poi>docx2pdf test.docx test.pdf
0%| | 0/1 [00:00<?, ?it/s]Traceback (most recent call last):
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Jonathan Jacob\AppData\Local\Programs\Python\Python38-32\Scripts\docx2pdf.exe\__main__.py", line 7, in <module>
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\site-packages\docx2pdf\__init__.py", line 170, in cli
    convert(args.input, args.output, args.keep_active)
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\site-packages\docx2pdf\__init__.py", line 106, in convert
    return windows(paths, keep_active)
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\site-packages\docx2pdf\__init__.py", line 33, in windows
    doc.SaveAs(str(pdf_filepath), FileFormat=wdFormatPDF)
  File "c:\users\jonathan jacob\appdata\local\programs\python\python38-32\lib\site-packages\win32com\client\dynamic.py", line 527, in __getattr__
    raise AttributeError("%s.%s" % (self._username_, attr))
AttributeError: Open.SaveAs
0%| | 0/1 [00:00<?, ?it/s]

Any roadmap or library enhancement for Linux machines?

Can we use this library in Linux environment?

Batch runs can try and convert word temp files

I have noticed that converting a folder sometimes attempts to convert Word's temporary files. I changed line 23 of __init__.py from
for docx_filepath in tqdm(sorted(Path(paths["input"]).glob("*.docx"))):
to
for docx_filepath in tqdm(sorted(Path(paths["input"]).glob("[!~]*.docx"))):
and that seems to fix the problem.
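Word creates owner (lock) files named like ~$report.docx next to open documents. The [!~]*.docx pattern proposed above skips any file name that starts with ~. Below is a self-contained sketch of that filter. docx_files is a hypothetical helper name.

```python
from pathlib import Path


def docx_files(folder):
    """Return the .docx files in folder, skipping Word's ~$ lock files.

    The glob character class [!~] excludes names whose first character is '~'.
    """
    return sorted(Path(folder).glob("[!~]*.docx"))
```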

Use docx2pdf as an automator service

I'm not good at coding myself, but is there a way to make this work as an Automator service? I came across your tool while desperately looking for a simple Automator service that converts .docx files to PDF. None of the solutions I've found seem to work anymore. Some are not compatible with Catalina, others require CUPS (not available under Catalina), and others rely on Word Automator library actions that are no longer available.
So could I make your solution work in Automator? Manually copying folder paths into Terminal is very cumbersome...
Best wishes!

Issues with running on 64 bit machine?

I'm trying to batch convert some .docx files and am running the program in Jupyter Notebook (5.7.4). I have a 64-bit laptop and the errors reference 32-bit, so I'm afraid this program might not work on it. Has anyone on a 64-bit machine had luck running it?
Code:

!pip install ipywidgets
!pip install docx2pdf
from docx2pdf import convert
convert(r"C:\Users\Juliette\Documents\Other\Other")

Which returns the following error:

No pdf created after running Python script (Mac OS)

Basically, when I tried converting the file from the command line or with my Python script, no PDF was produced. I'm quite new to programming, so I wonder if you can find out what the problem is.
I was wondering if it's because I did not grant permission when this prompt popped up: "bash would like to control System Events".

not converting a docx file into pdf without creating a docx file

Hi, I am working on a project where I need to read a docx file, edit it and then convert it into a PDF. For reading and editing the docx file I am using the python-docx package, but I run into issues when converting the edited document to PDF:

import docx
from docx2pdf import convert

document = docx.Document('data/PO.docx')
# ... editing the document ...
convert(document, 'output.pdf')

I am facing following error.
TypeError: expected str, bytes or os.PathLike object, not Document
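The TypeError shows that convert() expects a file path, not a python-docx Document object. One way around this is to save the edited document to a temporary .docx file first (python-docx's Document.save() accepts a path) and then convert that file. Below is a minimal sketch. document_to_pdf is hypothetical, and the converter is passed in so that docx2pdf.convert or a stand-in can be used.

```python
import tempfile
from pathlib import Path


def document_to_pdf(document, pdf_path, convert):
    """Save an in-memory document to a temporary .docx, then convert it to PDF.

    `document` is anything with a .save(path) method, such as a python-docx
    Document; `convert` is a callable like docx2pdf.convert(input, output).
    The temporary .docx is deleted when the function returns.
    """
    with tempfile.TemporaryDirectory() as tmp:
        docx_path = Path(tmp) / "edited.docx"
        document.save(str(docx_path))
        convert(str(docx_path), str(pdf_path))
    return Path(pdf_path)
```

With the code above, the call would become document_to_pdf(document, 'output.pdf', convert).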

Specifying output PDF file name doesn't work

Running the code:

from docx2pdf import convert

docx_file = 'input.docx'
pdf_file = "myfile.pdf"

convert(docx_file, pdf_file)

The code runs, but 'myfile.pdf' isn't created.

However, running the code:

from docx2pdf import convert

docx_file = 'input.docx'

convert(docx_file)

The code runs and creates an input.pdf file.

Issue running docx2pdf in CLI and python

Hey,
I recently had to reset my laptop, and since then I have been having issues running docx2pdf from both the CLI and Python.

Package docx2pdf is already installed.

CLI: 'docx2pdf' is not recognized as an internal or external command, operable program or batch file, even after adding the exe to the PATH variable.
When I gave the path to the exe directly, I got the error below:
Traceback (most recent call last):
  File "C:\tools\miniconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\tools\miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\tools\miniconda3\Scripts\docx2pdf.exe\__main__.py", line 7, in <module>
  File "C:\tools\miniconda3\lib\site-packages\docx2pdf\__init__.py", line 170, in cli
    convert(args.input, args.output, args.keep_active)
  File "C:\tools\miniconda3\lib\site-packages\docx2pdf\__init__.py", line 106, in convert
    return windows(paths, keep_active)
  File "C:\tools\miniconda3\lib\site-packages\docx2pdf\__init__.py", line 17, in windows
    import win32com.client
  File "C:\tools\miniconda3\lib\site-packages\win32com\__init__.py", line 5, in <module>
    import win32api, sys, os
ImportError: DLL load failed while importing win32api: The specified module could not be found.

Python: when I tried to use Python to generate PDFs, I got:
  File ~\.spyder-py3\temp.py:8 in <module>
    from docx2df import convert
ModuleNotFoundError: No module named 'docx2df'

Can anyone help me how to get around this?

Other information which may or may not be helpful.

Not a great coder.
Earlier it was a standalone Python installation; now Python is part of Miniconda, but the package exists at C:\tools\miniconda3\Lib\site-packages\docx2pdf