
convertapi-python's People

Contributors

jonasjasas, kostas-jonauskas, laurynas-convertapi, mayaizart


convertapi-python's Issues

Incorrect Links in the readme and documentation

I am trying to find out how to specify the page size of the output PDF when converting an Excel file to PDF.
I looked in the docs and found a section called "Additional conversion parameters" in both the official documentation and the README.
The first paragraph of the section promises that "All conversion parameters and explanations can be found here."

Following that link, I expected a list of all conversion parameters with explanations. Instead, it leads to the ConvertAPI home page, where these details are not available.

Where can I find that list? Once I have it, I will send a PR that fixes the link.

Files array syntax error in Python snippet

@laurynas-baltsoft Could you confirm whether the files array in the Python snippet at https://www.convertapi.com/pdf-to-merge has a syntax error?

Current snippet

convertapi.convert('merge', {
    'Files[0]': '/path/to/complex.pdf',
    'Files[1]': '/path/to/vector.pdf',
    'Files[2]': '/path/to/my_file.pdf'
}, from_format = 'pdf').save_files('/path/to/dir')

Should be

convertapi.convert('merge', {
    'Files': [
        '/path/to/complex.pdf',
        '/path/to/vector.pdf',
        '/path/to/my_file.pdf'
    ]
}, from_format='pdf').save_files('/path/to/dir')

Which one is correct?
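A note on the two forms, hedged on the client code quoted in later issues on this page (`Task.__normalize_params`): only the `'File'` and `'Files'` keys pass through `file_param.build`, which presumably turns local paths into uploads. A `'Files'` list is expanded into indexed `'Files[0]'`, `'Files[1]'`, ... keys, while a key already named `'Files[0]'` falls through to the `else` branch and is sent unchanged. If that code is current, the list form is the safer choice for local paths. The hypothetical `expand_files` helper below reproduces only the key expansion; the upload step is left out.

```python
def expand_files(params):
    """Mimic the 'Files' handling of __normalize_params, minus uploading.

    A 'Files' list becomes indexed 'Files[0]', 'Files[1]', ... keys;
    every other key (including an explicit 'Files[0]') is copied as-is.
    """
    expanded = {}
    for key, value in params.items():
        if key == 'Files':
            for idx, item in enumerate(value):
                expanded['%s[%i]' % (key, idx)] = item
        else:
            expanded[key] = value
    return expanded


print(expand_files({'Files': ['/path/to/complex.pdf', '/path/to/vector.pdf']}))
# {'Files[0]': '/path/to/complex.pdf', 'Files[1]': '/path/to/vector.pdf'}
```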

Unable to parse pdf to csv using convertapi

The code is below. Each page of the PDF has a different orientation, so the file does not convert properly. However, I can convert the same file to CSV successfully with Adobe's converter and others. Please let me know if there is a setting that would fix this. Thanks.

import convertapi

convertapi.api_secret = 'xx'
# Raw strings keep the Windows backslashes from being read as escape sequences
convertapi.convert('csv', {
    'File': r'D:\Delme\List-of-Criminally-Charged-Providers.pdf'
}, from_format='pdf').save_files(r'D:\Delme')

The file is List-of-Criminally-Charged-Providers.pdf.

JSONDecodeError when getting/polling a 202/Conversion in progress response

I'm using the asynchronous conversion mode and polling for the conversion result with the convertapi.client.get method.

I poll for the result only after receiving the webhook confirmation, on the understanding that by then the conversion has succeeded.

However, polling after the webhook confirmation sometimes returns a status other than 200 OK. These are the possible status codes:

  • 200 Conversion is successful. Response is a conversion result.
  • 202 Conversion in progress.
  • 404 JobId is invalid or response is expired.
  • 503 No concurrent poll requests are allowed.
  • 5XX Conversion error. Response is an error message.

Most polls after the confirmation return 200 OK, but occasionally I get 202 Accepted with an empty body. Presumably I'm polling too soon and some internal propagation is still in progress, but that is not really the point: a simple retry works.

The problem is that the response body is an empty string, and the get method passes the response to handle_response, which either parses it as JSON or raises an exception. As a result, I get an unexpected json.decoder.JSONDecodeError.

The other error responses trigger raise_for_status and fail legitimately. The only problem is the 202 response: it is a success status (so nothing is raised), but its body cannot be parsed as JSON. I wrote a small wrapper to work around this on my side:

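import requests
import convertapi


class ConversionInProgressError(Exception):
    """Raised when the job is still running (202 Accepted, empty body)."""


def poll_retrieve(job_id):
    convertapi.api_secret = settings.CONVERT_API_SECRET  # from the project's settings
    response = requests.get(
        convertapi.client.url(f"job/{job_id}"),
        headers=convertapi.client.headers(),
        timeout=convertapi.timeout or None,
    )
    response.raise_for_status()  # 404 and 5xx (including 503) raise here
    if response.status_code == 202:
        # 202 is a success status, so it is not raised above, but the body is empty
        raise ConversionInProgressError(job_id)
    return response.json()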

I could write a pull request, but I'd like to discuss the best approach with you first: is a new exception (like this ConversionInProgressError) a good idea, or would returning an empty list work better?
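With the exception approach, retrying becomes the caller's job. A minimal sketch of that caller side, assuming a fetch function with the same contract as `poll_retrieve` (returns the parsed JSON, raises `ConversionInProgressError` on 202). `poll_with_retry` and its backoff defaults are hypothetical, not part of convertapi-python, and the exception class is redefined here so the block runs on its own.

```python
import time


class ConversionInProgressError(Exception):
    """Raised by a poll function when the job answers 202 with an empty body."""


def poll_with_retry(fetch, job_id, attempts=5, delay=1.0, backoff=2.0):
    """Call fetch(job_id) until it stops raising ConversionInProgressError.

    Sleeps delay, delay * backoff, delay * backoff**2, ... seconds between
    attempts and re-raises the last ConversionInProgressError once all
    `attempts` calls have reported "in progress".
    """
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch(job_id)
        except ConversionInProgressError:
            if attempt == attempts:
                raise
            time.sleep(wait)
            wait *= backoff


# Usage with a stand-in fetch that reports "in progress" twice, then succeeds.
calls = []


def fake_fetch(job_id):
    calls.append(job_id)
    if len(calls) < 3:
        raise ConversionInProgressError(job_id)
    return {'Files': []}


print(poll_with_retry(fake_fetch, 'job-123', delay=0))  # {'Files': []}
```

One argument for the exception over an empty list: a successful poll keeps a single return type, and "still running" cannot be mistaken for a finished job that produced no files.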

Multiprocessing module not allowing children processes

I see this error when running the convertapi module inside a Celery task (even with -P threads). The multiprocessing pool used in utils.py does not work inside Celery and fails with:

AssertionError: daemonic processes are not allowed to have children

Is there a workaround to this?

Incompatibility with tkinter?

I'm using the convertapi module to merge PDF files in a tkinter application on Python 3.8. When my code has a tkinter window and convertapi.convert('merge', {'Files': input_files}) is called, multiple instances of the window open. My script:

from tkinter import *
import convertapi

input_files = ["file1.pdf", "file2.pdf", "file3.pdf"]
output_file = "mergedFile.pdf"

def mergePDFs(input_files, output_file):
    convertapi.api_secret = 'my-api-secret'
    result = convertapi.convert('merge', {'Files': input_files})
    result.file.save(output_file)


root = Tk()
Button(root, text="Merge", command=lambda: mergePDFs(input_files, output_file)).pack()
root.mainloop()

The behavior is very strange: even when I close the tkinter window first and then call the function from the console, multiple windows still open. I suspect some incompatibility between the two modules, but I can't be sure. If it helps, 10 additional tkinter windows open each time the function is called.

After some digging, it seems this happens because multiple processes cannot share the same tkinter root window, and the code does use multiprocessing when several files are converted. In convertapi/task.py:

def __normalize_params(self):
    params = {}

    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)

            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v

    params.update(self.default_params)

    return params

And convertapi/utils.py:

import multiprocessing

def map_in_parallel(f, values, pool_size):
    pool = multiprocessing.Pool(pool_size)
    results = pool.map_async(f, values)
    pool.close()
    pool.join()

    return results.get()
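A possible workaround, both for this issue and for the Celery "daemonic processes are not allowed to have children" error above, is to run the uploads in threads instead of processes. Uploads are I/O-bound, so threads should be adequate, and threads neither create child processes nor re-import the main script. That re-import is a likely (unconfirmed) explanation for the extra windows: with the "spawn" start method, the default on Windows and on macOS since Python 3.8, each pool worker imports the main module again, and this script creates and runs its Tk window at top level, so guarding that code with `if __name__ == "__main__":` should also help. The sketch assumes task.py looks up `utils.map_in_parallel` at call time, as in the excerpt above. `map_in_threads` and `patch_convertapi` are hypothetical names, not part of convertapi-python, and patching library internals may break with other versions.

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_threads(f, values, pool_size):
    """Thread-based stand-in for convertapi.utils.map_in_parallel.

    Same arguments and result order as the multiprocessing version,
    but no child processes are created.
    """
    with ThreadPoolExecutor(max_workers=max(1, pool_size)) as executor:
        return list(executor.map(f, values))


def patch_convertapi():
    """Swap the process pool for threads (hypothetical; call once at startup)."""
    import convertapi.utils
    convertapi.utils.map_in_parallel = map_in_threads


print(map_in_threads(len, ['a.pdf', 'bb.pdf'], 10))  # [5, 6]
```

Call `patch_convertapi()` once, before the first conversion that passes a `'Files'` list.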

Convert PDF to DOCX: List with bulletpoints has different spaces

Hello,

Thanks for your API and your great work.

I have been using ConvertAPI to convert PDFs to DOCX. It works quite well, but I have problems with lists: the spacing between the bullet and the entry differs from item to item.

Are there additional parameters I can use, or do you have any other suggestions for solving this problem?

"Unable to access the file." or "Unable to download the file" errors in Django

I keep getting "Unable to access the file. Code: 5008." or "Unable to download the file" errors and I'm not sure why. I've searched the website but found nothing about these errors.

I'm using the API in Django and here's my code:

import convertapi

convertapi.api_secret = 'my-secret'
result = convertapi.convert('pdfa', {'File': currFile[0].file.url})

where currFile[0].file.url gives the URL of the file, e.g. /file/tmp/name-of-file.png.
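One likely cause, though the thread does not confirm it: `/file/tmp/name-of-file.png` is a site-relative URL. It is not a path on the local disk, so the client presumably does not upload it, and it is not an absolute URL the ConvertAPI servers could download (a site running on localhost would not be reachable from them in any case). The hypothetical helper below returns either an existing local path or an absolute URL. With Django's default file-system storage, `currFile[0].file.path` should give the local path; `public_base_url` stands for the site's public address and is an assumption.

```python
import os
from urllib.parse import urljoin, urlparse


def reachable_file_arg(url_or_path, public_base_url):
    """Return a value ConvertAPI can reach: a local file path or an absolute URL."""
    if os.path.isfile(url_or_path):
        return url_or_path  # existing local file: let the client upload it
    if urlparse(url_or_path).scheme in ('http', 'https'):
        return url_or_path  # already an absolute URL
    # Site-relative URL such as /file/tmp/name-of-file.png
    return urljoin(public_base_url, url_or_path)


print(reachable_file_arg('/file/tmp/name-of-file.png', 'https://example.com/'))
# https://example.com/file/tmp/name-of-file.png
```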

HTTPSConnectionPool error

I get the error below when calling ConvertAPI to convert a PDF file to a text file. The command runs on a Google Cloud Compute Engine instance.

P.S. I get a similar error on my local desktop as well. Could you please look into it and let me know?

    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
    raise value
  File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
    raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "t1.py", line 51, in
    }, from_format = 'pdf').save_files(DIRECTORY)
  File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/api.py", line 7, in convert
    return task.run()
  File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/task.py", line 29, in run
    response = convertapi.client.post(path, params, timeout = timeout)
  File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/client.py", line 15, in post
    r = requests.post(self.url(path), data = payload, headers = self.headers(), timeout = timeout)
  File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/anaconda/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)

Converting files from data, and not from path

Hi, is there a way to send a file's data instead of a path to the file? I'm generating files in code and want to convert them to PDFs without saving them to disk.
Thanks!

I am facing 500:Internal Server Error for URL in version 1.4.0

Whenever I make the following call, from localhost or from the production server:

import convertapi

# userId identifies the current user's output folder
convertapi.convert('pdf', {
    'File': 'static/output/' + userId + '/excel.xlsx'
}, from_format='xlsx').save_files('static/output/' + userId)

It throws the same error every time. On top of that, the statistics page shows that the API was hit and deducts conversion time for the failed attempts too, which is simply wrong from a business perspective.

Do not pass any TimeOut property to ConvertAPI and set HTTP Client timeout to 1800 seconds

If no timeout is set, do not pass a default TimeOut property to ConvertAPI, so that ConvertAPI's own default converter timeout applies, and set the HTTP client timeout to 1800 seconds so that the request cannot hang indefinitely.

If a timeout is set, handle it as before: pass it to ConvertAPI and set the HTTP client timeout to ConvertAPITimeOut + conversion_timeout_delta, where

conversion_timeout_delta = 10
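The proposed rule fits in a small pure function. The names `build_request` and `DEFAULT_HTTP_TIMEOUT` and the `'Timeout'` parameter key are assumptions for illustration, not the library's actual code; the values 1800 and `conversion_timeout_delta = 10` come from the proposal above.

```python
DEFAULT_HTTP_TIMEOUT = 1800     # seconds, used when no conversion timeout is set
CONVERSION_TIMEOUT_DELTA = 10   # conversion_timeout_delta from the proposal


def build_request(params, conversion_timeout=None):
    """Return (params to send, HTTP client timeout in seconds).

    Without a conversion timeout, no Timeout parameter is sent, so ConvertAPI's
    own default applies, and the HTTP client waits up to DEFAULT_HTTP_TIMEOUT.
    With one, it is forwarded and the HTTP client waits slightly longer.
    """
    params = dict(params)  # do not mutate the caller's dict
    if conversion_timeout is None:
        return params, DEFAULT_HTTP_TIMEOUT
    params['Timeout'] = conversion_timeout
    return params, conversion_timeout + CONVERSION_TIMEOUT_DELTA


print(build_request({'File': 'report.docx'}))
# ({'File': 'report.docx'}, 1800)
print(build_request({'File': 'report.docx'}, conversion_timeout=120))
# ({'File': 'report.docx', 'Timeout': 120}, 130)
```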

convert parameters override

When I set the StoreFile parameter in the conversion call

res = convertapi.convert('pdf', params={'File': upload_io, 'StoreFile': False})  # upload_io is an UploadIO object

the parameter normalization below overwrites StoreFile, changing it from False back to True, which blocks the in-memory conversion:

def __normalize_params(self):
    params = {}

    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)

            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v

    params.update(self.default_params)  # <---- overwrites the StoreFile value passed by the caller

    return params
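A minimal sketch of one possible fix: apply the defaults first and let caller-supplied values overwrite them, so an explicit `'StoreFile': False` survives. `merge_with_defaults` is a hypothetical helper; in the method above, the equivalent change would be to start from `params = dict(self.default_params)` instead of updating with the defaults at the end.

```python
def merge_with_defaults(user_params, default_params):
    """Defaults fill in missing keys; values the caller set explicitly win."""
    merged = dict(default_params)
    merged.update(user_params)
    return merged


defaults = {'StoreFile': True}
print(merge_with_defaults({'File': 'in-memory', 'StoreFile': False}, defaults))
# {'StoreFile': False, 'File': 'in-memory'}
```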

Alternative converter parameter is case-sensitive

A customer reports that the "converter" parameter is case-sensitive and does not work as shown in our auto-generated code snippet.

This Python code does not switch the converter type; it stays on the default:

import convertapi

convertapi.api_secret = 'your-api-secret'
convertapi.convert('pdf', {
    'File': '/path/to/my_file.csv',
    'Converter': 'Printer'
}, from_format='csv').save_files('/path/to/dir')

But if I change the "Converter" key to "converter", it works. On GitHub, the function uses param["converter"]:
https://github.com/ConvertAPI/convertapi-python/blob/master/convertapi/task.py
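One way to make the snippet and the client agree is a case-insensitive lookup, so that `'Converter'` and `'converter'` are treated the same. A minimal sketch; `find_param` is a hypothetical helper, and how the client actually reads the key is known here only from the report above.

```python
def find_param(params, name, default=None):
    """Look up a conversion parameter ignoring key case ('Converter' == 'converter')."""
    wanted = name.lower()
    for key, value in params.items():
        if key.lower() == wanted:
            return value
    return default


params = {'File': '/path/to/my_file.csv', 'Converter': 'Printer'}
print(find_param(params, 'converter'))  # Printer
```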

Suddenly started facing SSL Error

I was integrating ConvertAPI to convert my PPT files to PDF.
It worked fine for a while, then suddenly I started receiving the SSL error below:

HTTPSConnectionPool(host='v2.convertapi.com', port=443): Max retries exceeded with url: /convert/ppt/to/pdf?Secret=uikSzf1fd7jBfddn (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))

I was using the following code, written in Python 2.7:
from __future__ import print_function  # same print output on Python 2.7 and 3

import requests

input_filename = "C:/Users/xyz/Desktop/Test_PPT2.ppt"
url = "https://v2.convertapi.com/convert/ppt/to/pdf?Secret=my_secrete_key"

try:
    with open(input_filename, "rb") as f:  # closes the file after the upload
        response = requests.post(url, files={'file': f})
    print("response = ", response.status_code)
except Exception as e:
    print("Broken here", e)

Deactivate SSL Certificate verification?

Hi, when I run the following code I get an SSLCertVerificationError:

import convertapi
convertapi.api_secret = 'your-api-secret'
result = convertapi.convert('pdf', { 'File': '/path/to/my_file.docx' })
result.file.save('/path/to/save/file.pdf')

Is there a way to disable the verification, as with verify=False when using the requests package directly?
