convertapi / convertapi-python

A Python library for the ConvertAPI

Home Page: https://www.convertapi.com

License: Other

Python 100.00%

convertapi-python's Issues

Increase default TimeOut to 1000

ConvertAPI/convertapi-dotnet#23

Incorrect Links in the readme and documentation

I am trying to understand how to specify the page size of the output PDF when converting an Excel file to PDF.
I went looking for the docs and found a section called "Additional conversion parameters" in both the official documentation and the README.
The first paragraph of the section promises that "All conversion parameters and explanations can be found here."

If we go to that page, we expect to find a list of all conversion parameters and explanations. Instead, we land on the ConvertAPI home page, where these details are not available.

How can I find that list? Once I find it, I will send a PR with the corrected link.

Emoji is converted into bullets after converting into PDF

I have tried converting a .docx file to PDF using ConvertAPI. The .docx file contains some emojis, and they were converted into bullets. How can I solve this? The API otherwise works fine; the only issue is with emojis.

What if my server is behind a proxy?

How can I use this library if my server is behind a proxy in an internal network?
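The library appears to send its HTTP calls through requests (see the tracebacks further down this page), and requests honors the standard proxy environment variables by default. A minimal sketch, assuming that is the case; configure_proxy and the proxy address are hypothetical:

```python
import os
import urllib.request


def configure_proxy(proxy_url: str) -> dict:
    """Point requests-based clients at proxy_url via environment variables.

    requests reads these variables when Session.trust_env is True (the
    default), so no change to the calling code should be needed.
    """
    # Set both spellings; some tools only look at one of them.
    for name in ("http_proxy", "https_proxy", "HTTP_PROXY", "HTTPS_PROXY"):
        os.environ[name] = proxy_url
    # Return what the stdlib proxy lookup (used by requests) now reports.
    return urllib.request.getproxies_environment()
```

Call configure_proxy("http://proxy.internal:3128") once at startup, before the first convertapi.convert(...) call. Exporting HTTPS_PROXY in the server's environment has the same effect without code.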

Files array syntax error in Python snippet

@laurynas-baltsoft Could you confirm whether the Files array in the Python snippet at https://www.convertapi.com/pdf-to-merge has a syntax error?

Current snippet

convertapi.convert('merge', {
    'Files[0]': '/path/to/complex.pdf',
    'Files[1]': '/path/to/vector.pdf',
    'Files[2]': '/path/to/my_file.pdf'
}, from_format = 'pdf').save_files('/path/to/dir')

Should be

convertapi.convert('merge', {
    'Files': [
        '/path/to/complex.pdf',
        '/path/to/vector.pdf',
        '/path/to/my_file.pdf'
    ]
}, from_format = 'pdf').save_files('/path/to/dir')

Which one is correct?
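The list form appears to be what the library expects: __normalize_params in convertapi/task.py (quoted in a later issue on this page) uploads each element of a 'Files' list and re-keys it as Files[0], Files[1], and so on, while an explicit 'Files[0]' key falls into the else branch and is apparently passed through as a plain string without being uploaded. A minimal sketch of just the re-keying step; expand_files_param is a hypothetical name and the upload itself is left out:

```python
from typing import Any, Dict


def expand_files_param(params: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic how a 'Files' list is flattened into indexed keys."""
    out: Dict[str, Any] = {}
    for key, value in params.items():
        if key == "Files":
            # Same key format as the library: 'Files[0]', 'Files[1]', ...
            for idx, item in enumerate(value):
                out["%s[%i]" % (key, idx)] = item
        else:
            out[key] = value
    return out
```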

Conversions take significantly more time than converting in the browser

Tested PDF to Squeeze with a 90 MB PDF file:
https://www.convertapi.com/a/api/pdf-to-squeeze#snippet=python
The conversion took 4.5-5 s, plus a couple of seconds for upload and download, so under 10 s overall.

With the Python library it took more than 60 s.
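To see where the extra time goes, it can help to time the convertapi.convert(...) call and the later save_files(...) call separately, since the first includes the upload and conversion and the second the download. A small helper for that, a generic sketch rather than anything the library provides:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the wrapped block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so timeouts are measured too.
        timings[label] = time.perf_counter() - start
```

For example, wrap the conversion in `with timed("convert", timings):` and the save in `with timed("download", timings):`, then print `timings`.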

Option to store the result in a StringIO/cStringIO object

Would it be possible to update the ResultFile.save() method, or create a new method, that stores the downloaded content of the converted file in a StringIO/cStringIO object?
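Until such a method exists, one workaround is to download the result into an in-memory buffer yourself. For binary output such as PDF, io.BytesIO is the Python 3 counterpart of cStringIO. This sketch only handles the buffering; where the chunks come from (for example requests.get(url, stream=True).iter_content(8192) on the result file's download URL, assuming ResultFile exposes one) is an assumption:

```python
import io
from typing import Iterable


def collect_into_buffer(chunks: Iterable[bytes]) -> io.BytesIO:
    """Write downloaded byte chunks into an in-memory buffer and rewind it."""
    buffer = io.BytesIO()
    for chunk in chunks:
        if chunk:  # skip empty chunks
            buffer.write(chunk)
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer
```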

Token authentication method

Implement token authentication method.
https://www.convertapi.com/doc/auth#token

Unable to convert PDF to CSV using ConvertAPI

The code is below. The pages have different orientations, so the file does not convert properly. However, I can convert the same file to CSV successfully with Adobe's converter and others. Please let me know if there is a setting that would fix this. Thanks.

import convertapi

convertapi.api_secret = 'xx'
convertapi.convert('csv', {
    'File': r'D:\Delme\List-of-Criminally-Charged-Providers.pdf'  # raw string keeps the backslashes literal
}, from_format='pdf').save_files(r'D:\Delme')

The file is
List-of-Criminally-Charged-Providers.pdf

JSONDecodeError when getting/polling a 202 (Conversion in progress) response

I'm using the asynchronous conversion mode and the convertapi.client.get method to poll the conversion result.

I poll for results only after receiving the webhook confirmation, on the understanding that the conversion has succeeded by then.

For some reason, when polling the API after the webhook confirmation, I sometimes get a response other than 200 OK. These are the possible status codes:

200 Conversion is successful. Response is a conversion result.
202 Conversion in progress.
404 JobId is invalid or response is expired.
503 No concurrent poll requests are allowed.
5XX Conversion error. Response is an error message.

I mostly get 200 OK responses when polling after the confirmation, but in some cases I get a 202 Accepted with blank content. I suppose I'm hitting the API too fast while some internal propagation is still ongoing, but that is beside the point; I could simply retry and it would work.

The problem is that the response content is a blank string, and the get method passes every response to handle_response, which either converts it to JSON or raises an exception. As a result, I hit an unexpected json.decoder.JSONDecodeError.

The other error responses trigger raise_for_status and fail legitimately. The only problem is the 202 response: it is a successful status (so nothing is raised), but it can't be parsed as JSON. I wrote a small wrapper to work around this on my side:

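import convertapi
import requests
from django.conf import settings  # assuming a Django project (CONVERT_API_SECRET setting)


class ConversionInProgressError(Exception):
    """Raised when the job is still running (HTTP 202 with an empty body)."""


def poll_retrieve(job_id):
    convertapi.api_secret = settings.CONVERT_API_SECRET
    response = requests.get(
        convertapi.client.url(f"job/{job_id}"),
        headers=convertapi.client.headers(),
        timeout=convertapi.timeout or None,
    )
    response.raise_for_status()  # 404, 503 and 5XX responses raise here
    if response.status_code == 202:  # success status, but the body is empty
        raise ConversionInProgressError
    return response.json()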

I could write a pull request, but I'd like to discuss the best approach with you first: is a new exception (like this ConversionInProgressError) a good idea, or would returning a blank list work better?
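Whichever API shape is chosen, the caller still needs a retry loop for the 202 case. A minimal sketch, assuming a hypothetical fetch callable that performs one poll (for instance via the requests.get call in the wrapper above) and returns the status code and raw body; the attempt count and delay are placeholders, and 503 is treated as an error here although it could reasonably be retried as well:

```python
import time
from typing import Callable, Tuple

# Hypothetical signature: one poll, returning (status_code, body_text).
Fetch = Callable[[], Tuple[int, str]]


class ConversionInProgressError(Exception):
    """Raised when the job is still running after all retries."""


def poll_until_ready(fetch: Fetch, attempts: int = 5, delay: float = 1.0) -> str:
    """Retry while the job answers 202; return the body of the first 200."""
    for attempt in range(attempts):
        status, body = fetch()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        if attempt < attempts - 1:
            time.sleep(delay)  # give the job time to finish before polling again
    raise ConversionInProgressError(f"still in progress after {attempts} polls")
```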

Multiprocessing module not allowing children processes

I am seeing this error when running the convertapi module inside a Celery task (even with -P threads). The multiprocessing module used in utils.py does not work inside Celery and fails with:

AssertionError: daemonic processes are not allowed to have children

Is there a workaround for this?
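The parallel work in map_in_parallel is file uploading, which is network-bound, so threads are a plausible substitute for a process pool. The sketch below is a possible thread-based replacement for map_in_parallel in convertapi/utils.py, not the library's current code. Because task.py calls it as utils.map_in_parallel(...), assigning it to convertapi.utils.map_in_parallel before converting should take effect, though that monkeypatch is an untested assumption:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_in_parallel(f: Callable[[T], R], values: Iterable[T], pool_size: int) -> List[R]:
    """Thread-based stand-in for a multiprocessing.Pool map.

    Threads run inside the current process, so this also works from daemonic
    processes (such as Celery prefork workers) that may not spawn children.
    Results keep the order of `values`, like Pool.map.
    """
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        return list(executor.map(f, values))
```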

Introduce alternative converter parameter in conversion method

The same as in Ruby lib ConvertAPI/convertapi-ruby#10

File conversion

Incompatibility with tkinter?

I'm using the convertapi module to merge PDF files in a tkinter application in Python 3.8. When my code has a tkinter window and convertapi.convert('merge', {'Files': input_files}) is called, multiple instances of the tkinter window open. My script:

from tkinter import *
import convertapi

input_files = ["file1.pdf", "file2.pdf", "file3.pdf"]
output_file = "mergedFile.pdf"

def mergePDFs(input_files, output_file):
    convertapi.api_secret = 'my-api-secret'
    result = convertapi.convert('merge', {'Files': input_files})
    result.file.save(output_file)


root = Tk()
Button(root, text="Merge", command=lambda: mergePDFs(input_files, output_file)).pack()
root.mainloop()

A picture of the phenomenon
It's very weird behavior, since multiple windows still open even when I call the function from the console after closing the tkinter window. I'm guessing there is some incompatibility between the two modules, but I can't be sure. If it helps, 10 more instances of the tkinter window open each time the function is called.

After a bit of digging, it seems this is because multiple processes can't share the same tkinter root window. And indeed, the library runs a multiprocessing operation when multiple files are converted. In convertapi/task.py:

def __normalize_params(self):
    params = {}

    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)

            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v

    params.update(self.default_params)

    return params

And convertapi/utils.py:

import multiprocessing

def map_in_parallel(f, values, pool_size):
    pool = multiprocessing.Pool(pool_size)
    results = pool.map_async(f, values)
    pool.close()
    pool.join()

    return results.get()

Convert PDF to DOCX: list with bullet points has inconsistent spacing

Hello,

Thanks for your API and your great work.

I have been trying to use ConvertAPI to convert a PDF to DOCX. It works quite well, but I have some problems with lists: the spacing between the bullet and the entry differs, as in this image:

Are there additional parameters I can use, or do you have any other suggestions for solving this problem?

Error while converting files

Getting requests.exceptions.HTTPError: 500 Server Error for PPTX to PNG conversion. The same files convert successfully using the web interface at https://www.convertapi.com/pptx-to-png

To reproduce: https://repl.it/@ConvertAPI/error

"Unable to access the file." or "Unable to download the file" errors in Django

I keep getting "Unable to access the file. Code: 5008." or "Unable to download the file" errors and I'm not sure why. I've browsed the website but found nothing about them.

I'm using the API in Django and here's my code:

convertapi.api_secret = 'my-secret'
result = convertapi.convert('pdfa', {'File': currFile[0].file.url})

where currFile[0].file.url gives the URL of the file, i.e. /file/tmp/name-of-file.png.
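A likely cause is that /file/tmp/name-of-file.png is a site-relative URL: ConvertAPI's servers presumably cannot download it, and it is not a path on the local disk either. A small check, sketched with a hypothetical helper name, makes the distinction explicit:

```python
import os
from urllib.parse import urlparse


def resolve_file_source(value: str) -> str:
    """Classify a File value as a full remote URL, a local file, or neither."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "remote-url"  # must also be publicly reachable to be downloaded
    if os.path.isfile(value):
        return "local-file"
    return "unresolvable"  # e.g. a site-relative URL such as /file/tmp/x.png
```

In Django, two options worth trying are passing currFile[0].file.path (the on-disk path when using local file storage) or an absolute URL built with request.build_absolute_uri(currFile[0].file.url), the latter only if the site is reachable from the internet.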

HTTPSConnectionPool error

I am getting the error below when calling ConvertAPI to convert a PDF file to a text file. I'm running the command on a Google Cloud Compute Engine instance.

PS: I get a similar error on my local desktop as well. Could you please look into it?

raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "t1.py", line 51, in
}, from_format = 'pdf').save_files(DIRECTORY)
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/api.py", line 7, in convert
return task.run()
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/task.py", line 29, in run
response = convertapi.client.post(path, params, timeout = timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/client.py", line 15, in post
r = requests.post(self.url(path), data = payload, headers = self.headers(), timeout = timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)
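The read timeout=60 in the message appears to be the client-side timeout passed from task.py to requests.post, and the poll wrapper in an earlier issue reads it from convertapi.timeout, so raising that value is the first thing to try. If timeouts still occur intermittently, retrying with progressively longer timeouts is one option. A minimal, generic sketch; the timeout values are placeholders, and each retry is a new conversion request that may be billed:

```python
from typing import Callable, Iterable, Optional, Tuple, Type, TypeVar

R = TypeVar("R")


def call_with_growing_timeout(
    call: Callable[[float], R],
    timeouts: Iterable[float],
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> R:
    """Invoke call(timeout) with each timeout in turn until one succeeds."""
    last_error: Optional[BaseException] = None
    for timeout in timeouts:
        try:
            return call(timeout)
        except retry_on as exc:
            last_error = exc  # remember the failure and try a longer timeout
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error
```

For example, call could set convertapi.timeout = t and then run the conversion, with retry_on=(requests.exceptions.ReadTimeout,).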

Converting files from data, and not from path

Hi, is there a way to send a file's data instead of a path to the file? I'm generating files in code and want to convert them to PDFs without saving them to disk.
Thanks!
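One option, assuming the library accepts in-memory uploads: the StoreFile issue further down passes an `<UploadIO object>` as the File parameter, which suggests it does. The exact UploadIO constructor is not shown on this page, so the call in the trailing comment is an assumption. A sketch of wrapping generated bytes in a named file-like object:

```python
import io


def make_in_memory_file(data: bytes, filename: str) -> io.BytesIO:
    """Wrap generated bytes in a file-like object that carries a filename."""
    buffer = io.BytesIO(data)
    # Upload helpers often infer the format from .name; BytesIO accepts the attribute.
    buffer.name = filename
    return buffer

# Hypothetical usage (UploadIO signature assumed, not confirmed here):
# upload = convertapi.UploadIO(make_in_memory_file(html_bytes, "report.html"), "report.html")
# convertapi.convert('pdf', {'File': upload})
```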

I am facing a 500 Internal Server Error in version 1.4.0

Whenever I make this call, from localhost or from the production server:

convertapi.convert('pdf', {
    'File': 'static/output/' + userId + '/excel.xlsx'
}, from_format='xlsx').save_files('static/output/' + userId)

it throws the same error. Worse, the statistics page shows that the API was hit and deducts conversion time for the failed attempts too, which is clearly wrong from a business perspective.

Do not pass any TimeOut property to ConvertAPI and set HTTP Client timeout to 1800 seconds

Do not pass any default TimeOut property to ConvertAPI, and set the HTTP client timeout to 1800 seconds. The idea is to rely on ConvertAPI's default converter timeout when no timeout is set, and to prevent HTTP client request deadlocks by setting the default HTTP client timeout to 1800 seconds.

If, however, a timeout is set, we handle it as before: pass the timeout to ConvertAPI and set the HTTP client timeout to ConvertAPITimeOut + conversion_timeout_delta, where

conversion_timeout_delta = 10
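The proposed rule can be sketched as a small function. The constants come from the text above; the function name and the "Timeout" parameter key are assumptions for illustration:

```python
from typing import Any, Dict, Optional, Tuple

DEFAULT_HTTP_TIMEOUT = 1800    # seconds, used when the caller sets no timeout
CONVERSION_TIMEOUT_DELTA = 10  # seconds of slack on top of the conversion timeout


def build_timeouts(params: Dict[str, Any], timeout: Optional[int]) -> Tuple[Dict[str, Any], int]:
    """Return (request params, HTTP client timeout) following the proposal."""
    request_params = dict(params)
    if timeout is None:
        # Send no Timeout so ConvertAPI applies its own default converter timeout.
        return request_params, DEFAULT_HTTP_TIMEOUT
    request_params["Timeout"] = timeout  # parameter key spelling assumed
    return request_params, timeout + CONVERSION_TIMEOUT_DELTA
```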

Conversion parameters are overridden

When I try to set the StoreFile parameter in the conversion method

res = convertapi.convert('pdf', params={'File': <UploadIO object>, 'StoreFile': False})

the parameter normalization below automatically overrides StoreFile from False to True, which blocks the in-memory conversion:

def __normalize_params(self):
    params = {}

    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)

            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v

    params.update(self.default_params)  # <---- overwrites the caller's StoreFile value

    return params
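A possible fix is to apply the defaults first and let the caller's values overwrite them, for example params = {**self.default_params, **params} in place of the final update. A standalone sketch of that ordering; merge_params is a hypothetical name, and the default StoreFile: True is inferred from the report above:

```python
from typing import Any, Dict


def merge_params(user_params: Dict[str, Any], default_params: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults first, then user values, so explicit user settings win."""
    merged = dict(default_params)
    merged.update(user_params)
    return merged
```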

Alternative converter parameter is case-sensitive

A customer reports that the "converter" parameter is case-sensitive and doesn't work as described in our auto-generated code snippet:

This Python code doesn't switch the converter type; it stays at the default.

convertapi.api_secret = 'your-api-secret'
convertapi.convert('pdf', {
    'File': '/path/to/my_file.csv',
    'Converter': 'Printer'
}, from_format='csv').save_files('/path/to/dir')

But if I change the "Converter" key to "converter", it starts working. On GitHub, the function uses param["converter"]:
https://github.com/ConvertAPI/convertapi-python/blob/master/convertapi/task.py
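One way to accept both spellings is a case-insensitive lookup in place of indexing with the lowercase key. A minimal sketch; get_param_ci is a hypothetical helper, not a function in the library:

```python
from typing import Any, Dict, Optional


def get_param_ci(params: Dict[str, Any], name: str) -> Optional[Any]:
    """Look up a parameter by name, ignoring case ('Converter' == 'converter')."""
    wanted = name.lower()
    for key, value in params.items():
        if key.lower() == wanted:
            return value
    return None
```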

Suddenly started facing SSL Error

I was integrating ConvertAPI to convert my PPT files to PDF.
It worked fine for a while, then I suddenly started receiving the SSL error below:

HTTPSConnectionPool(host='v2.convertapi.com', port=443): Max retries exceeded with url: /convert/ppt/to/pdf?Secret=uikSzf1fd7jBfddn (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))

I was using the following code, written in Python 2.7:

from __future__ import print_function

import requests

input_filename = "C:/Users/xyz/Desktop/Test_PPT2.ppt"
url = "https://v2.convertapi.com/convert/ppt/to/pdf?Secret=my_secrete_key"

try:
    with open(input_filename, "rb") as f:  # closes the file after the upload
        response = requests.post(url, files={'file': f})
    print("response = ", response.status_code)
except Exception as e:
    print("Broken here", e)
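A CERTIFICATE_VERIFY_FAILED error that appears suddenly often points to an outdated local trust store, for example an old certifi bundle used by requests or an old OpenSSL. A small diagnostic sketch, written without type annotations so it can also be tried on the Python 2.7 setup above; it only reports the configuration and changes nothing:

```python
import os
import ssl


def describe_tls_setup():
    """Report the OpenSSL build and the CA bundles that TLS checks may use."""
    info = {
        "openssl_version": ssl.OPENSSL_VERSION,
        "stdlib_default_cafile": ssl.get_default_verify_paths().cafile,
        "requests_ca_bundle_env": os.environ.get("REQUESTS_CA_BUNDLE"),
    }
    try:
        import certifi  # CA bundle that requests uses by default
        info["certifi_bundle"] = certifi.where()
    except ImportError:
        info["certifi_bundle"] = None
    return info
```

If the certifi bundle is old, upgrading certifi and requests is usually the first thing to try.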

Set conversion location option

Allow developers to set the endpoint base URL, and add a description to the README explaining how to do that.

This is how it is done in C#:

https://github.com/ConvertAPI/convertapi-dotnet#set-conversion-location-optional
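For comparison, a sketch of how a configurable base URL could be joined with an endpoint path. The default host v2.convertapi.com comes from the error messages on this page; DEFAULT_BASE_URI and endpoint_url are hypothetical names, not the library's current API:

```python
from urllib.parse import urljoin

# Hypothetical default; the host matches the URLs seen in errors on this page.
DEFAULT_BASE_URI = "https://v2.convertapi.com/"


def endpoint_url(path: str, base_uri: str = DEFAULT_BASE_URI) -> str:
    """Join a (possibly region-specific) base URI with an API path."""
    # Normalize slashes so 'host' + '/path' and 'host/' + 'path' both work.
    return urljoin(base_uri.rstrip("/") + "/", path.lstrip("/"))
```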

Deactivate SSL certificate verification?

Hi, when I run the following code I get an SSLCertVerificationError:

import convertapi
convertapi.api_secret = 'your-api-secret'
result = convertapi.convert('pdf', { 'File': '/path/to/my_file.docx' })
result.file.save('/path/to/save/file.pdf')

Is there a way to deactivate the verification, e.g. as with verify=False when using the requests package directly?
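A safer alternative to disabling verification, assuming the library uses requests with default settings, is to make requests trust the CA that signs the intercepting certificate (common behind corporate proxies). requests reads the REQUESTS_CA_BUNDLE environment variable when verify is left at True. A minimal sketch; trust_ca_bundle and the bundle path are hypothetical:

```python
import os


def trust_ca_bundle(bundle_path: str) -> None:
    """Make requests-based clients verify TLS against a custom CA bundle.

    requests honors REQUESTS_CA_BUNDLE when a call uses the default
    verify=True and Session.trust_env is enabled (the default).
    """
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(bundle_path)
    os.environ["REQUESTS_CA_BUNDLE"] = bundle_path
```

Call it once, for example trust_ca_bundle("/etc/ssl/certs/corporate-ca.pem"), before the first convertapi.convert(...) call.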