convertapi / convertapi-python
A Python library for the ConvertAPI
Home Page: https://www.convertapi.com
License: Other
I am trying to understand how to specify the page size of the output PDF when converting a source Excel file to PDF.
I came looking for the docs, and found a section called "Additional conversion parameters" in the official documentation as well as the README.
The first paragraph of the section promises that "All conversion parameters and explanations can be found here."
If we go to that page, we expect to find a list of all conversion parameters and explanations. Instead, we land on the ConvertAPI home page, where these details are not available.
How can we find that list? Once I find it, I will send a PR with the corrected link.
I have tried converting a .docx file to PDF using ConvertAPI. The .docx file contains some emojis, which were converted into bullets. How can I solve this? The API otherwise works fine; the only issue is with emojis.
How can I use this library if my server is behind a proxy on an internal network?
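The traceback further down this page shows the library sending requests through `requests.post`, and `requests` honours the standard `HTTP_PROXY` / `HTTPS_PROXY` / `NO_PROXY` environment variables by default. A minimal sketch, assuming the library does not override that behaviour; the helper name and the proxy address are hypothetical:

```python
import os
import urllib.request


def configure_proxy(proxy_url, no_proxy=()):
    """Export proxy settings for libraries that read them from the environment.

    requests resolves environment proxies through urllib, so the mapping
    returned by urllib.request.getproxies() shows what it will pick up.
    """
    os.environ["HTTP_PROXY"] = proxy_url
    os.environ["HTTPS_PROXY"] = proxy_url
    if no_proxy:
        # Hosts that should be reached directly, bypassing the proxy
        os.environ["NO_PROXY"] = ",".join(no_proxy)
    return urllib.request.getproxies()


if __name__ == "__main__":
    # Hypothetical internal proxy; call this before convertapi.convert(...)
    print(configure_proxy("http://proxy.internal:3128", no_proxy=["localhost"]))
```

Setting the variables in the shell or service configuration instead of in code has the same effect and keeps the proxy address out of the source.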
@laurynas-baltsoft Could you confirm whether the Files array in the Python snippet at https://www.convertapi.com/pdf-to-merge has a syntax error?
Current snippet
convertapi.convert('merge', {
    'Files[0]': '/path/to/complex.pdf',
    'Files[1]': '/path/to/vector.pdf',
    'Files[2]': '/path/to/my_file.pdf'
}, from_format = 'pdf').save_files('/path/to/dir')
Should be
convertapi.convert('merge', {
    'Files': [
        '/path/to/complex.pdf',
        '/path/to/vector.pdf',
        '/path/to/my_file.pdf'
    ]
}, from_format = 'pdf').save_files('/path/to/dir')
Which one is correct?
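A hint from the library itself: the `__normalize_params` method quoted further down this page takes a `Files` list and expands it into indexed `Files[0]`, `Files[1]`, ... keys before the request is sent. A stand-alone sketch of that expansion; the function name is hypothetical, only the `'%s[%i]'` key format comes from the quoted code:

```python
def expand_file_list(key, values):
    """Turn a list parameter into indexed keys, e.g. Files -> Files[0], Files[1]."""
    return {"%s[%i]" % (key, idx): value for idx, value in enumerate(values)}


if __name__ == "__main__":
    print(expand_file_list("Files", ["/path/to/complex.pdf", "/path/to/vector.pdf"]))
```

Note that in the quoted code only the `File` and `Files` keys go through `file_param.build`; a key such as `'Files[0]'` falls into the `else` branch and is passed through unchanged. So, at least for that version, the list form with the key quoted as a string looks like the one that actually uploads local paths.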
Tested PDF to Squeeze with a 90 MB PDF file:
https://www.convertapi.com/a/api/pdf-to-squeeze#snippet=python
The conversion took 4.5-5 s, and the file upload and download took a couple of seconds, so under 10 s overall.
Would it be possible to update ResultFile.save(), or to add a new method, so that the downloaded content of the converted file can be stored in a StringIO/cStringIO object?
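Converted files such as PDFs are binary, so on Python 3 an `io.BytesIO` buffer (rather than StringIO/cStringIO) is the natural in-memory target. A minimal sketch of the idea, independent of the library; the helper name is hypothetical, and feeding it `response.iter_content()` from a `requests` download of the result is an assumption about how the result would be fetched:

```python
import io


def buffer_from_chunks(chunks):
    """Collect downloaded byte chunks into an in-memory, rewound BytesIO."""
    buffer = io.BytesIO()
    for chunk in chunks:
        if chunk:  # skip empty keep-alive chunks
            buffer.write(chunk)
    buffer.seek(0)  # rewind so callers can read from the start
    return buffer


if __name__ == "__main__":
    # With requests this could be: buffer_from_chunks(response.iter_content(8192))
    buf = buffer_from_chunks([b"%PDF-1.7\n", b"", b"...rest of file..."])
    print(buf.read(8))
```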
Implement token authentication method.
https://www.convertapi.com/doc/auth#token
The code is below. Each page has a different orientation, so the file does not convert properly. However, I can convert the same file to CSV successfully using Adobe's converter and others. Please let me know if there is any setting that would help fix this. Thanks.
import convertapi
convertapi.api_secret = 'xx'
# Raw strings keep the Windows backslashes from being read as escape sequences
convertapi.convert('csv', {
    'File': r'D:\Delme\List-of-Criminally-Charged-Providers.pdf'
}, from_format = 'pdf').save_files(r'D:\Delme')
The file is List-of-Criminally-Charged-Providers.pdf.
I'm using the asynchronous conversion mode and the convertapi.client.get method to poll for the conversion result.
I only poll for results after receiving the webhook confirmation, on the understanding that the conversion has succeeded by then.
For some reason, when polling the API after the webhook confirmation, I sometimes get a response other than 200 OK. These are the possible status codes:
I mostly get 200 OK responses when polling after the confirmation, but in some cases I get a 202 Accepted with an empty body. I suppose I'm hitting the API too fast and some internal propagation is still ongoing, but that is not the point; I could simply retry and it would work.
The problem is that the response content is an empty string, and since the get method uses handle_response to either convert the response to JSON or raise an exception, I hit an unexpected json.decoder.JSONDecodeError.
The other error responses are raised by raise_for_status and fail legitimately. The only problem is the 202 response: it is a success status (so it is not raised), but its body can't be parsed as JSON. I wrote a small wrapper to work around this on my side:
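import requests
import convertapi
# `settings` is the application's configuration object holding the API secret.

class ConversionInProgressError(Exception):
    """Raised when the job endpoint answers 202 Accepted with an empty body."""

def poll_retrieve(job_id):
    convertapi.api_secret = settings.CONVERT_API_SECRET
    response = requests.get(
        convertapi.client.url(f"job/{job_id}"),
        headers=convertapi.client.headers(),
        timeout=convertapi.timeout or None,
    )
    response.raise_for_status()  # 4xx/5xx raise requests.HTTPError
    if response.status_code == 202:  # accepted, but the result is not ready yet
        raise ConversionInProgressError(job_id)
    return response.json()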
I could open a pull request, but I'd like to discuss the best approach with you first: is a new exception (like this ConversionInProgressError) a good idea, or would returning an empty list work better?
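Whichever option is chosen, the caller still needs a retry loop around the 202 case. A minimal sketch, assuming a fetch function that raises an exception on 202 like the wrapper above; the exception class is redefined here so the block runs on its own, and the attempt count and delays are illustrative rather than anything the library provides:

```python
import time


class ConversionInProgressError(Exception):
    """Hypothetical: raised when the job endpoint answers 202 Accepted."""


def poll_with_retry(fetch, job_id, attempts=5, delay=1.0, backoff=2.0):
    """Call fetch(job_id) until it returns, retrying while the job is still running."""
    for attempt in range(attempts):
        try:
            return fetch(job_id)
        except ConversionInProgressError:
            if attempt == attempts - 1:
                raise  # give up after the last attempt
            time.sleep(delay)
            delay *= backoff  # wait longer before each new attempt
```

Raising a dedicated exception keeps "not ready yet" distinguishable from "finished with no files", which an empty list would blur.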
I see this error when running the convertapi module inside a Celery task (even with -P threads). The multiprocessing module used in utils.py does not work when run inside Celery and fails with:
AssertionError: daemonic processes are not allowed to have children
Is there a workaround for this?
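The assertion means the current process is daemonic, and `multiprocessing` refuses to start child processes from a daemonic process; `map_in_parallel` (quoted in full further down this page) creates a `multiprocessing.Pool`. One possible workaround is a thread-based function with the same signature. Uploads are I/O-bound, so threads should give similar parallelism. This is a sketch, not the library's code:

```python
from concurrent.futures import ThreadPoolExecutor


def map_in_parallel(f, values, pool_size):
    """Thread-based stand-in for the multiprocessing version in convertapi/utils.py.

    Threads may be started from daemonic processes, and executor.map
    returns results in input order, like Pool.map_async(...).get().
    """
    with ThreadPoolExecutor(max_workers=pool_size) as executor:
        return list(executor.map(f, values))
```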
Same as in the Ruby library: ConvertAPI/convertapi-ruby#10
I'm using the convertapi module to merge PDF files in a tkinter application in Python 3.8. When my code has a tkinter window and convertapi.convert('merge', {'Files': input_files})
is called, multiple instances of the tkinter window open. My script:
from tkinter import *
import convertapi
input_files = ["file1.pdf", "file2.pdf", "file3.pdf"]
output_file = "mergedFile.pdf"
def mergePDFs(input_files, output_file):
    convertapi.api_secret = 'my-api-secret'
    result = convertapi.convert('merge', {'Files': input_files})
    result.file.save(output_file)
root = Tk()
Button(root, text="Merge", command=lambda: mergePDFs(input_files, output_file)).pack()
root.mainloop()
A picture of the phenomenon
It's very strange behavior: even when I call the function from the console after closing the tkinter window, multiple windows still open. I'm guessing there is some incompatibility between the two modules, but I can't be sure. If it helps, 10 more instances of the tkinter window open when the function is called.
After a bit of digging, it seems this is because multiple processes can't share the same tkinter root window. And indeed, the code runs a multiprocessing operation when multiple files are converted. In convertapi/task.py:
def __normalize_params(self):
    params = {}
    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)
            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v
    params.update(self.default_params)
    return params
And convertapi/utils.py:
import multiprocessing

def map_in_parallel(f, values, pool_size):
    pool = multiprocessing.Pool(pool_size)
    results = pool.map_async(f, values)
    pool.close()
    pool.join()
    return results.get()
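One likely explanation: on Windows, and on macOS from Python 3.8, `multiprocessing` starts workers with the spawn method, which re-imports the main script in every worker. The module-level `Tk()` call therefore runs once per pool process (the pool size here is `convertapi.max_parallel_uploads`). Moving the GUI setup under an `if __name__ == "__main__":` guard should stop the extra windows. Another option is to avoid worker processes altogether by replacing `utils.map_in_parallel`. The sketch below does that with a plain sequential map. It assumes, as in the quoted task.py, that the function is looked up on the `utils` module at call time; the helper name is hypothetical and the patch is untested against the library:

```python
import types


def use_sequential_map(utils_module):
    """Replace utils_module.map_in_parallel with an in-process, sequential map.

    No worker processes are started, so nothing re-imports the main script;
    the trade-off is that files are uploaded one after another.
    """
    def map_sequentially(f, values, pool_size):
        # pool_size is accepted for signature compatibility and ignored
        return [f(value) for value in values]

    utils_module.map_in_parallel = map_sequentially
    return utils_module


if __name__ == "__main__":
    # Stand-in module for demonstration; with the library this would be
    # `import convertapi.utils` followed by use_sequential_map(convertapi.utils)
    fake_utils = types.ModuleType("fake_utils")
    use_sequential_map(fake_utils)
    print(fake_utils.map_in_parallel(str.upper, ["a", "b"], 10))
```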
Hello,
Thanks for your API and your great work.
I have been using ConvertAPI to convert PDF to DOCX. It works quite well, but I have some problems with lists: the space between the bullet and the entry varies, as in this image:
Can I use some additional parameters, or do you have any suggestions for solving this problem?
Getting requests.exceptions.HTTPError: 500 Server Error for PPTX to PNG conversion, although the same files convert successfully using the web interface at https://www.convertapi.com/pptx-to-png
To reproduce: https://repl.it/@ConvertAPI/error
I keep getting "Unable to access the file. Code: 5008." or "Unable to download the file" errors, and I'm not sure why. I've browsed through the website, but there's nothing about it there.
I'm using the API in Django and here's my code:
convertapi.api_secret = 'my-secret'
result = convertapi.convert('pdfa', {'File': currFile[0].file.url})
where currFile[0].file.url gives the URL of the file, i.e. /file/tmp/name-of-file.png.
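A value like `/file/tmp/name-of-file.png` is a site-relative URL. It is neither a public URL that the ConvertAPI servers could download nor, most likely, a path on the local disk. A small pre-flight check follows, assuming the library treats `http(s)` strings as remote URLs and other strings as local file paths; the helper name is hypothetical. For files in Django's local file storage, the field file's `.path` attribute gives the filesystem path, which may be what should be passed instead of `.url`.

```python
import os
from urllib.parse import urlparse


def check_file_param(value):
    """Classify a 'File' value as a remote URL or an existing local path.

    Raises ValueError for anything else, such as a site-relative URL
    that neither ConvertAPI nor the local filesystem can resolve.
    """
    scheme = urlparse(value).scheme
    if scheme in ("http", "https"):
        return "url"
    if os.path.isfile(value):
        return "path"
    raise ValueError("not a public URL or an existing local file: %r" % value)
```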
I get the error below when calling ConvertAPI to convert a PDF file to a text file. I'm running the command on a Google Cloud Compute Engine instance.
P.S. I get a similar error on my local desktop as well. Could you please look into it and let me know?
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/packages/six.py", line 686, in reraise
raise value
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 386, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/urllib3/connectionpool.py", line 306, in _raise_timeout
raise ReadTimeoutError(self, url, "Read timed out. (read timeout=%s)" % timeout_value)
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "t1.py", line 51, in <module>
}, from_format = 'pdf').save_files(DIRECTORY)
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/api.py", line 7, in convert
return task.run()
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/task.py", line 29, in run
response = convertapi.client.post(path, params, timeout = timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/convertapi/client.py", line 15, in post
r = requests.post(self.url(path), data = payload, headers = self.headers(), timeout = timeout)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 119, in post
return request('post', url, data=data, json=json, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/anaconda/lib/python3.7/site-packages/requests/adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='v2.convertapi.com', port=443): Read timed out. (read timeout=60)
Hi, is there a way to send the data of a file, instead of the a path to the file? I'm asking because I'm generating files with code and then want to convert them to pdfs, without saving them on the computer.
Thanks!
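A StoreFile report later on this page passes an `UploadIO` object as the `File` parameter, which suggests the library can upload from a stream rather than a path. Below is a sketch that builds a CSV entirely in memory. The rows and helper name are hypothetical, and the `convertapi.UploadIO(stream, filename)` call in the usage note is an assumption to check against the installed version.

```python
import csv
import io


def build_csv_stream(rows):
    """Write rows to an in-memory CSV and return it as a binary stream at position 0."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    # Uploads need bytes, so encode the CSV text into a BytesIO buffer
    return io.BytesIO(text.getvalue().encode("utf-8"))
```

Usage would then look something like `convertapi.convert('pdf', {'File': convertapi.UploadIO(build_csv_stream(rows), 'report.csv')})`. This assumes `UploadIO` takes the stream plus a file name, so that the source format can be inferred from the extension.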
Whenever I make this call, from localhost or from the production server:
convertapi.convert('pdf', {
    'File': 'static/output/'+userId+'/excel.xlsx'
}, from_format = 'xlsx').save_files('static/output/'+userId)
it throws the same error. Oddly, the statistics page shows that the API was hit and deducts conversion time for the failed attempts as well, which is simply wrong from a business perspective.
Do not pass any default Timeout property to ConvertAPI, and set the HTTP client timeout to 1800 seconds. The idea is to rely on ConvertAPI's default converter timeout when no timeout is set, while preventing HTTP client request deadlocks by defaulting the HTTP client timeout to 1800 seconds.
If a timeout is set, however, handle it as before: pass the timeout to ConvertAPI and set the HTTP client timeout to ConvertAPITimeout + conversion_timeout_delta, where
conversion_timeout_delta = 10
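The proposal above can be sketched as one function that returns both the parameters to send and the HTTP client timeout. The function and constant names are hypothetical, and `Timeout` as the parameter name sent to ConvertAPI is an assumption:

```python
DEFAULT_HTTP_TIMEOUT = 1800  # seconds; used when no conversion timeout is set
CONVERSION_TIMEOUT_DELTA = 10  # seconds added on top of the conversion timeout


def request_settings(params, conversion_timeout=None):
    """Return (params to send, HTTP client timeout) following the proposal."""
    params = dict(params)
    if conversion_timeout is None:
        # Leave Timeout out so the service applies its default converter timeout
        return params, DEFAULT_HTTP_TIMEOUT
    params["Timeout"] = conversion_timeout
    return params, conversion_timeout + CONVERSION_TIMEOUT_DELTA
```

Adding the delta leaves the service time to report its own timeout error before the HTTP client gives up on the connection.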
When I try to set the StoreFile parameter in the conversion method:
res = convertapi.convert('pdf', params={'File': <UploadIO object>, 'StoreFile': False})
the parameter normalization below automatically overrides StoreFile from False to True, which blocks the in-memory conversion:
def __normalize_params(self):
    params = {}
    for k, v in self.params.items():
        if k == 'File':
            params[k] = file_param.build(v)
        elif k == 'Files':
            results = utils.map_in_parallel(file_param.build, v, convertapi.max_parallel_uploads)
            for idx, val in enumerate(results):
                key = '%s[%i]' % (k, idx)
                params[key] = val
        else:
            params[k] = v
    params.update(self.default_params)  # <-- overrides the caller's StoreFile value
    return params
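One way to keep user-supplied values is to start from the defaults and let the caller's parameters overwrite them, instead of applying `default_params` last. A simplified sketch that omits the `File`/`Files` handling; the function name and the default values in the example are hypothetical:

```python
def merge_params(user_params, default_params):
    """Combine parameters so explicit user values win over library defaults."""
    params = dict(default_params)  # start from the defaults...
    params.update(user_params)  # ...then let the caller override any of them
    return params


if __name__ == "__main__":
    print(merge_params({"StoreFile": False}, {"StoreFile": True}))
```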
A customer complains that the "Converter" parameter is case-sensitive and doesn't work as shown in our auto-generated code snippet.
This Python code doesn't switch the converter type; it stays at the default:
convertapi.api_secret = 'your-api-secret'
convertapi.convert('pdf', {
    'File': '/path/to/my_file.csv',
    'Converter': 'Printer'
}, from_format = 'csv').save_files('/path/to/dir')
But if I change the "Converter" key to "converter", it starts working. On GitHub, the function uses param["converter"]:
https://github.com/ConvertAPI/convertapi-python/blob/master/convertapi/task.py
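The other parameters in the snippets on this page are written in PascalCase (`File`, `StoreFile`, `Converter`), so one option is to look the key up case-insensitively. A minimal sketch; the helper name is hypothetical and this is not the library's code:

```python
def get_param(params, name, default=None):
    """Return params[name], matching the key case-insensitively."""
    wanted = name.lower()
    for key, value in params.items():
        if key.lower() == wanted:
            return value
    return default
```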
I was trying to integrate ConvertAPI to convert my PPT files to PDF.
It worked fine for a while, and then I suddenly started receiving the SSL error below:
HTTPSConnectionPool(host='v2.convertapi.com', port=443): Max retries exceeded with url: /convert/ppt/to/pdf?Secret=uikSzf1fd7jBfddn (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))
I was using the following code, written in Python 2.7:
# print_function keeps print() behaving the same on Python 2.7 and 3
from __future__ import print_function

import requests

input_filename = "C:/Users/xyz/Desktop/Test_PPT2.ppt"
url = "https://v2.convertapi.com/convert/ppt/to/pdf?Secret=my_secret_key"
try:
    with open(input_filename, "rb") as source:
        response = requests.post(url, files={'file': source})
    print("response =", response.status_code)
except Exception as e:
    print("Broken here", e)
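CERTIFICATE_VERIFY_FAILED on an older Python 2.7 install can point to an outdated OpenSSL build or CA bundle. A quick diagnostic sketch using only the standard `ssl` module (Python 2.7.9+ and 3). It only reports settings and does not fix anything; note that `requests` itself normally verifies against the `certifi` bundle, so upgrading `requests` and `certifi` is a reasonable next step to try.

```python
import ssl


def tls_report():
    """Collect the OpenSSL build and default CA locations used by this Python."""
    paths = ssl.get_default_verify_paths()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        "cafile": paths.cafile,
        "capath": paths.capath,
    }


if __name__ == "__main__":
    for name, value in tls_report().items():
        print("%s = %s" % (name, value))
```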
Allow developers to set the endpoint base URL, and add a description of how to do that to the README.
This is an example of how it is done in C#:
https://github.com/ConvertAPI/convertapi-dotnet#set-conversion-location-optional
Hi, when running the following code I get an SSLCertVerificationError:
import convertapi
convertapi.api_secret = 'your-api-secret'
result = convertapi.convert('pdf', { 'File': '/path/to/my_file.docx' })
result.file.save('/path/to/save/file.pdf')
Is it possible to disable the verification, e.g. as with verify=False when using the requests package directly?
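Rather than disabling verification, a safer option is usually to point `requests` at the CA bundle that should be trusted, for example a corporate one. The traceback higher up shows the library calling `requests.post` without a `verify` argument, and in that case `requests` reads the `REQUESTS_CA_BUNDLE` environment variable. A minimal sketch; the helper name and bundle path are hypothetical:

```python
import os


def trust_ca_bundle(bundle_path):
    """Point requests at a PEM CA bundle via the REQUESTS_CA_BUNDLE variable."""
    if not os.path.isfile(bundle_path):
        raise FileNotFoundError(bundle_path)
    os.environ["REQUESTS_CA_BUNDLE"] = bundle_path
    return bundle_path
```

Calling something like `trust_ca_bundle("/etc/ssl/certs/company-ca.pem")` before `convertapi.convert(...)` keeps verification on while trusting the extra certificate authority.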
When I run
pip install --upgrade convertapi
I get the error below. Please help me.
ERROR: Could not find a version that satisfies the requirement convertapi (from versions: none)
ERROR: No matching distribution found for convertapi
Regards
Asish