esbenp / pdf-bot
🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs
License: MIT License
Unable to generate a PDF of pages like https://material.angular.io/components/table/examples
Sorry, I'm new here. Can anyone tell me if this works on pressreader.com for saving newspapers/magazines to PDF? Thanks.
I think it would be reasonable to make it possible to generate PDFs directly from HTML instead of from visiting a URL.
I'm not sure how html-pdf-chrome handles this, but Puppeteer supports it out of the box, so it should be possible.
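A minimal sketch of how a job payload could carry either a URL or raw HTML. The `html` field and the `validateJobPayload` helper are hypothetical; pdf-bot's API as described in these issues only takes a `url`. html-pdf-chrome's `create()` accepts an HTML string as its first argument, so either value could in principle be passed straight through.

```javascript
// Hypothetical validation for a job payload that carries either a URL to
// visit or raw HTML to render. The `html` field is an assumption, not part
// of pdf-bot's current API.
function validateJobPayload(body) {
  const hasUrl = typeof body.url === 'string' && body.url.length > 0;
  const hasHtml = typeof body.html === 'string' && body.html.length > 0;

  if (hasUrl === hasHtml) {
    // Either both or neither were supplied.
    return { ok: false, error: 'Provide exactly one of "url" or "html".' };
  }
  // The chosen value would become the first argument to htmlPdf.create().
  return { ok: true, source: hasUrl ? body.url : body.html };
}
```

Requiring exactly one of the two fields keeps the existing URL-based jobs unchanged while making the HTML path explicit.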
When I try to generate a PDF using the CLI, I get the following log:
$ DEBUG=pdf:* pdf-bot generate 359621c9-87cc-4070-b648-09aa51d117ba
pdf:cli Creating CLI using config file /Users/venkadesh/Downloads/pdf-bot-master/pdf-bot.config.js +0ms
pdf:generator Creating PDF for url https://esbenp.github.io with options {"completionTrigger":{"timeout":5000,"timeoutMessage":"CompletionTrigger timed out."}} +68ms
pdf:db Logging try for job ID 359621c9-87cc-4070-b648-09aa51d117ba +26s
html-pdf-chrome error: Error: connect ECONNREFUSED ::1:50296 (job ID: 359621c9-87cc-4070-b648-09aa51d117ba. Generation ID: 09864256-00a8-410f-9a7a-68a8e8ba2f05)
Can anyone please help me fix this?
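One possible cause, offered as an assumption rather than a diagnosis: the error shows a connection attempt to `::1` (IPv6 localhost). Since Node.js 17, `localhost` may resolve to `::1` first, while Chrome's remote-debugging endpoint may only be listening on IPv4 `127.0.0.1`. The sketch below asks Node to prefer IPv4 results; `dns.setDefaultResultOrder()` is a real Node API (added in v14.18.0 / v16.4.0), but placing it at the top of `pdf-bot.config.js` and whether it resolves this particular setup are untested assumptions.

```javascript
const dns = require('dns');

// Prefer IPv4 addresses when resolving hostnames such as "localhost",
// so connections go to 127.0.0.1 rather than ::1. Assumption: Chrome's
// debugging port is bound to IPv4 only. Returns true if the setting was applied.
function preferIPv4Localhost() {
  if (typeof dns.setDefaultResultOrder !== 'function') {
    // Node versions without this API (before v14.18 / v16.4) already
    // return IPv4 results first by default.
    return false;
  }
  dns.setDefaultResultOrder('ipv4first');
  return true;
}

preferIPv4Localhost();
```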
Since cron runs at most once per minute, a PDF job sent to pdf-bot may take one minute to process even if pdf-bot is not busy at the time.
Is there a way to have a shift:all run automatically when an API push is triggered? Or is there a trick to getting pdf-bot to process more quickly?
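A sketch of one way to avoid waiting for the next cron tick: whatever code pushes a job could also call a trigger that runs `pdf-bot shift:all` immediately. Triggers that arrive while a run is in progress are coalesced into a single follow-up run. This assumes a Unix-like host with the `pdf-bot` binary on `PATH` and the default config file location; `createShiftAllTrigger` is a hypothetical helper, and `exec` is injectable only so the logic can be tested without pdf-bot installed.

```javascript
const { execFile } = require('child_process');

// Run `pdf-bot shift:all` on demand. At most one run is active at a time;
// any triggers received during a run cause exactly one extra run afterwards.
function createShiftAllTrigger(exec = execFile) {
  let running = false;
  let pending = false;

  function run() {
    running = true;
    pending = false;
    exec('pdf-bot', ['shift:all'], (err) => {
      if (err) console.error('shift:all failed:', err.message);
      running = false;
      // A push arrived mid-run: process the queue once more.
      if (pending) run();
    });
  }

  return function trigger() {
    if (running) {
      pending = true;
    } else {
      run();
    }
  };
}
```

Keeping the cron entry as a fallback would still catch jobs whose trigger was lost, for example if the process restarted mid-run.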
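I see that html-pdf-chrome has this option:
const htmlPdf = require('html-pdf-chrome');

const html = '<p>Hello, world!</p>'; // HTML to render
const port = 9222; // Chrome remote-debugging port

(async () => {
  const pdf = await htmlPdf.create(html, {
    port,
    printOptions: {
      displayHeaderFooter: true,
      headerTemplate: `
        <div class="text center">
          Page <span class="pageNumber"></span> of <span class="totalPages"></span>
        </div>
      `,
      footerTemplate: '<div class="text center">Custom footer!</div>',
    },
  });
  await pdf.toFile('output.pdf'); // write the generated PDF to disk
})().catch(console.error);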
Can I do this when generating PDFs with pdf-bot?
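A sketch of where these options might go in `pdf-bot.config.js`, assuming pdf-bot forwards the `generator.printOptions` object to html-pdf-chrome's `create()`, as the `printOptions: { printBackground: true }` entry in another config on this page suggests. The templates are the ones from the html-pdf-chrome example; Chrome fills in the `pageNumber` and `totalPages` spans when printing.

```javascript
// Generator section of a pdf-bot config (other sections such as api,
// storage and webhook omitted). Assumption: pdf-bot passes printOptions
// through to html-pdf-chrome unchanged.
const config = {
  generator: {
    printOptions: {
      displayHeaderFooter: true,
      headerTemplate:
        '<div class="text center">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
      footerTemplate: '<div class="text center">Custom footer!</div>',
    },
  },
};

module.exports = config;
```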
I think it would be useful to have a real-time PDF generation feature, where the user hits the API and gets the PDF in return instead of the job being inserted into the queue.
Could we have an example of a Lambda function for this?
It would be interesting (and probably practical) to add FTP support, for example using the excellent jsftp library. Judging from the src/storage/s3.js implementation, it shouldn't be too hard.
Hi,
I've tried to get this working on a Dockerised node image and natively on a Windows 10 machine. The error is consistent.
TypeError: queue.addToQueue(...).then is not a function
at /usr/local/lib/node_modules/pdf-bot/src/api.js:30:10
at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)
at next (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/route.js:137:13)
at Route.dispatch (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/route.js:112:3)
at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)
at /usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:281:22
at Function.process_params (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:335:12)
at next (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:275:10)
at jsonParser (/usr/local/lib/node_modules/pdf-bot/node_modules/body-parser/lib/types/json.js:119:7)
at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)
The config is:
var htmlPdf = require('html-pdf-chrome')
var lowDB = require('/usr/local/lib/node_modules/pdf-bot/src/db/lowdb')
module.exports = {
api: {
port: process.env.PDF_BOT_PORT,
token: process.env.PDF_BOT_TOKEN
},
generator: {
// Triggers that specify when the PDF should be generated
completionTrigger: new htmlPdf.CompletionTrigger.Timer(process.env.PDF_BOT_COMPLETION_TIMER),
// The port to listen for Chrome (default: 9222)
port: process.env.CHROME_DEBUG_PORT
},
storagePath: 'storage',
db:lowDB({
lowdDbOptions: {},
// path:'./storage'
})
}
Docker image - node version: v10.5.0
Windows node: v8.9.4
pdf-bot: 0.5.4
I've tried with and without the db option in the config, but with no luck. What am I doing wrong?
Thanks
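The stack trace says `queue.addToQueue(...)` returned something without a `.then` method, i.e. not a promise. A defensive sketch, assuming the configured db adapter returns a plain value where the API route expects a promise: `Promise.resolve().then(...)` wraps plain values, adopts returned promises, and turns synchronous throws into rejections. The `queue` object here is a hypothetical stand-in, not pdf-bot's actual queue module, and this does not explain why the adapter returns a non-promise.

```javascript
// Call addToQueue so that the result is always a promise, whether the
// underlying adapter is synchronous or asynchronous.
function addToQueueSafely(queue, data) {
  return Promise.resolve().then(() => queue.addToQueue(data));
}
```

In an Express handler this would be used as `addToQueueSafely(queue, req.body).then((job) => res.status(201).json(job))`, with a `.catch` that returns an error response.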
Configuration is cleaner when the request can include the callback URL. This also supports the case where different classes/types of documents originate from different places and therefore need multiple callback URLs...
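A minimal sketch of the proposed behaviour: each job may carry its own callback URL, and the global `webhook.url` from the config is used when it doesn't. The `callback_url` field name and the `resolveCallbackUrl` helper are hypothetical.

```javascript
// Pick the webhook target for a job: a per-job callback URL if present,
// otherwise the global webhook URL from the config, otherwise null.
function resolveCallbackUrl(job, config) {
  if (job && typeof job.callback_url === 'string' && job.callback_url !== '') {
    return job.callback_url;
  }
  return config.webhook && config.webhook.url ? config.webhook.url : null;
}
```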
For our use case, the form is a formal document representation of other functionality, so there is no need for the overhead of creating a static form page for each instance. It would be preferable to have a template URL that can consume a JSON payload via POST and render the form with the mapped-in data, and then pass this data via the same request to pdf-bot. I'm basically thinking of a third optional parameter called formdata, an arbitrary object blob; having a value in this field would change the request to a POST.
This is so compelling for us, compared to having to instantiate every form before making the pdf-bot request, that I may end up forking and implementing it, if you are at all interested in a PR...
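A sketch of the request shape the proposed `formdata` parameter implies: without it the page is loaded with GET as today; with it, the template URL receives the data as a JSON POST body. The function name is hypothetical, and this only models the request, since headless Chrome navigation is normally a GET and sending a POST would need extra work (for example, request interception).

```javascript
// Build the page request for a job: GET when no formdata is given,
// JSON POST when it is.
function buildRenderRequest(url, formdata) {
  if (formdata === undefined || formdata === null) {
    return { method: 'GET', url };
  }
  return {
    method: 'POST',
    url,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(formdata),
  };
}
```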
This is really an issue with me getting pdf-bot to work at all. Right now I can send POSTs and they get added to the job queue as completed, but no PDF is generated. The response JSON seems to confirm that nothing is being done:
{
"meta": {
"type": "invoice",
"id": 1
},
"url": "https://localhost:3001/test",
"id": "ae6f6151-cb6a-4d21-914b-a10a8154ff94",
"created_at": "Wed, 27 Sep 2017 16:55:30 GMT",
"completed_at": null,
"generations": [],
"pings": [],
"storage": {}
}
Neither my /test endpoint nor the /hook endpoint is ever hit. I have google-chrome running via pm2 and am running pdf-bot with a basic local storage config file. I can't get any errors to display anywhere.
201 Created should not be the response if nothing is actually created. Config, if it helps:
var htmlPdf = require('html-pdf-chrome');
module.exports = {
api: {
port: 3000,
token: 'api-token'
},
generator: {
completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000) // 1 sec timeout
},
storagePath: "/Users/matthewmolnar/junk/pdf-storage",
webhook: {
secret: '12345',
url: 'http://localhost:3001/hook'
}
}
Hi,
Thanks for the wonderful library and all the effort you have invested. I have a situation where I'd prefer to send raw HTML to the bot and get a PDF back: no URLs, just HTML in the POST request. Is that possible? Thanks.
Hi, cool lib!
I was wondering how to go about doing something like "shift all". I send a group of PDFs in quick succession and would like to get them generated and saved to S3 consecutively.
I'm not familiar with the commands inside the bin folder, but I'm guessing it means looping through the queue and calling shift each time?
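The `shift:all` command mentioned in other issues on this page appears to cover this. The loop the question describes can be sketched as follows; `queue.shift()` (resolving to the next job, or `null` when the queue is empty) and `generate(job)` are hypothetical stand-ins, not pdf-bot's internal API.

```javascript
// Keep taking the next job and generating it until the queue is empty.
// Jobs are processed one at a time, in queue order.
async function drainQueue(queue, generate) {
  let processed = 0;
  for (;;) {
    const job = await queue.shift();
    if (!job) break; // queue is empty
    await generate(job);
    processed += 1;
  }
  return processed;
}
```

Processing sequentially keeps only one headless Chrome page busy at a time, which matches the "consecutively" requirement at the cost of throughput.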
I'm looking for feedback from @esbenp before I dig into a PR for this.
I'd like to adapt pdf-bot to be a scalable PDF rendering microservice that can have resources added or removed on demand to handle workload fluctuations.
Because of pdf-bot's database-wide queue locking in PostgreSQL, only one machine can render PDFs for a given API endpoint at a time.
Because PG is a shared database, it's possible to scale the workload horizontally across many machines in parallel. To accomplish this, we would need to change the queue locking mechanism to work on a per-job basis, and adapt the generation commands (shift:all comes to mind) to support this.
There are a few concerns here:
- Add a processing_started_at date column to the jobs table
- Change getAllUnfinished to select jobs that aren't completed and whose processing_started_at is either empty or older than a configurable amount of time (30 sec default, maybe)
- Make isBusy calls always return false (maybe?)
- Update the generation commands (shift, shift:all, etc.) to handle the possibility of getting an empty array of jobs instead of relying on an isBusy call (maybe?)
- Remove setIsBusy calls (maybe?)
- Apply the same changes to LowDb as well (maybe?)

Sample worker table:

| id | processing_started_at | completed_at |
|---|---|---|
| 1 | 2018-01-08 17:31:17.825153 | 2018-01-08 17:31:48.925153 |
| 2 | 2018-01-08 17:31:17.825153 | null |
| 3 | 2018-01-08 17:31:47.925153 | null |
| 4 | 2018-01-08 17:31:48.925153 | null |
| 5 | null | null |
| 6 | null | null |
Given this sample data, jobs 2, 5, and 6 would be eligible for the next generation worker to start processing, while jobs 3 and 4 are assumed to be currently processing.
If this all sounds like too big of an overhaul, I'd be open to other suggestions. I'd also be willing to add the support to a new Redis database adapter instead as well.
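The proposed selection rule can be sketched against the sample table above. Assumptions: timestamps are UTC, microseconds are truncated to milliseconds, the timeout is 30 seconds, and the "current" time (not given in the post) is taken to be 17:31:49. With those values the sketch reproduces the stated result: jobs 2, 5 and 6 are eligible.

```javascript
// A job is eligible when it is not completed and either has never been
// started or was started longer ago than staleAfterMs (its worker is
// assumed to have died).
function selectEligibleJobs(jobs, now, staleAfterMs = 30000) {
  return jobs.filter((job) => {
    if (job.completed_at !== null) return false;
    if (job.processing_started_at === null) return true;
    return now.getTime() - job.processing_started_at.getTime() > staleAfterMs;
  });
}

// Sample data from the table (UTC assumed).
const t = (s) => new Date(`2018-01-08T${s}Z`);
const jobs = [
  { id: 1, processing_started_at: t('17:31:17.825'), completed_at: t('17:31:48.925') },
  { id: 2, processing_started_at: t('17:31:17.825'), completed_at: null },
  { id: 3, processing_started_at: t('17:31:47.925'), completed_at: null },
  { id: 4, processing_started_at: t('17:31:48.925'), completed_at: null },
  { id: 5, processing_started_at: null, completed_at: null },
  { id: 6, processing_started_at: null, completed_at: null },
];

const now = t('17:31:49.000'); // assumed current time
console.log(selectEligibleJobs(jobs, now).map((j) => j.id)); // [ 2, 5, 6 ]
```

In PostgreSQL the same filter would be a WHERE clause on completed_at and processing_started_at; workers would still need to claim rows atomically (for example with `UPDATE ... RETURNING` or `SELECT ... FOR UPDATE SKIP LOCKED`) so that two machines never pick the same job.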
I am creating a function app with Node.js that pushes IoT Hub messages to an application, but sometimes messages get grouped together. Can you please look at this issue?
I installed pdf-bot using npm, but whenever I run the command
pdf-bot install
I get the following error:
/usr/lib/node_modules/pdf-bot/bin/pdf-bot.js:328
function openConfig(delayQueueCreation = false) {
^
SyntaxError: Unexpected token =
at exports.runInThisContext (vm.js:53:16)
at Module._compile (module.js:373:25)
at Object.Module._extensions..js (module.js:416:10)
at Module.load (module.js:343:32)
at Function.Module._load (module.js:300:12)
at Function.Module.runMain (module.js:441:10)
at startup (node.js:139:18)
at node.js:968:3
I don't know why it is throwing this error. Is it because of the node version?
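The SyntaxError points at a default parameter value (`delayQueueCreation = false`), an ES2015 feature that Node.js supports from version 6 onwards; the `node.js:968` frames in the trace suggest an older runtime. The sketch shows the pre-ES2015 equivalent and a simple version check. `openConfigLegacy`, `nodeMajorVersion` and `supportsDefaultParams` are illustrative names, not pdf-bot code; upgrading Node.js is likely the practical fix.

```javascript
// ES2015 form that older Node versions reject:
//   function openConfig(delayQueueCreation = false) { ... }
// Equivalent pre-ES2015 form (illustrative only):
function openConfigLegacy(delayQueueCreation) {
  if (delayQueueCreation === undefined) {
    delayQueueCreation = false;
  }
  return delayQueueCreation;
}

// Major version from a Node version string such as "v4.2.6".
function nodeMajorVersion(version) {
  return parseInt(String(version).replace(/^v/, '').split('.')[0], 10);
}

// Default parameters are available from Node.js 6.
function supportsDefaultParams(version) {
  return nodeMajorVersion(version) >= 6;
}
```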
Any chance of a Docker build to use? It would be great.
Webhook URLs are lowercased when resolving DNS if they are prefixed with http.
var LowDB = require('./src/db/lowdb.js')
var htmlPdf = require('html-pdf-chrome')
module.exports = {
api: {
port: 3000,
token: 'api-token'
},
db: LowDB({
lowDbOptions: {},
path: 'pdf-storage/db/db'
}),
storagePath: "pdf-storage",
webhook: {
secret: '1234',
url: 'http://MY_URL:5000/api/internals/webhooks/pdf'
},
generator: {
// Triggers that specify when the PDF should be generated
completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000), // waits for 1 sec
// The port to listen for Chrome (default: 9222)
port: 9222,
printOptions: {
printBackground: true
}
}
}
This results in the following error:
Ping failed: {"id":"8de73c3c-21ef-47e7-91b2-cbb64c1a4d80","method":"POST","payload":{"id":"75db4451-ad57-49af-9265-a547855c7ff1","url":"https://test","meta":{"scheduledreporttaskid":"5fdbf776-c35d-45ad-9e44-0be5688d3a9d"},"storage":{"local":"pdf-storage/pdf/38cfee13-4d43-418e-ac34-bc70880995d1.pdf"}},"response":{"name":"FetchError","message":"request to http://MY_URL:5000/api/internals/webhooks/pdf failed, reason: getaddrinfo ENOTFOUND my_url my_url:5000","type":"system","errno":"ENOTFOUND","code":"ENOTFOUND"},"url":"http://MY_URL:5000/api/internals/webhooks/pdf","sent_at":"Tue, 15 Oct 2019 07:38:03 GMT","error":true}
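The lowercasing itself comes from standard URL parsing: hostnames are case-insensitive, so the WHATWG `URL` parser (global in Node.js 10+) normalizes the host of http(s) URLs to lowercase while leaving the path untouched. Since DNS lookups are normally case-insensitive too, the `ENOTFOUND` more likely means the name did not resolve at all; this is an inference, not a confirmed diagnosis.

```javascript
// Parse the webhook URL from the config above. The host is lowercased;
// the port and path keep their original form.
const webhook = new URL('http://MY_URL:5000/api/internals/webhooks/pdf');

console.log(webhook.hostname); // my_url
console.log(webhook.pathname); // /api/internals/webhooks/pdf
```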
npm WARN deprecated crypto@…: This package is no longer supported. It's now a built-in Node module. If you've depended on crypto, you should switch to the one that's built-in.
Similar to the existing S3 support, I would like to see support for https://www.backblaze.com/.
Installing pdf-bot via npm outputs the following warning:
npm WARN deprecated crypto@…: This package is no longer supported. It's now a built-in Node module. If you've depended on crypto, you should switch to the one that's built-in.
I have tried many variations of pushing a job and then generating a PDF file, but haven't succeeded. It seems like db.json is not used at all; it stays empty. I don't know about the Postgres side; perhaps it has the same issue. Could you please check the reason?
I tried running pdf-bot shift with an AWS S3 URL, but it doesn't work: the command prompt cursor just blinks and nothing happens. When I run pdf-bot shift with another website's URL, it works. Is there a problem with my config or setup? Please help me. Thanks.
UPDATE: I've solved this issue. I just added the "ContentType": "text/html" property to my s3.upload options.
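A sketch of upload parameters matching the fix above, for the AWS SDK for JavaScript v2 `s3.upload()` call. Without an explicit `ContentType`, S3 typically serves the object as `binary/octet-stream`, so Chrome is likely to treat the page as a download instead of rendering it, which would explain the hang. The helper name, bucket and key are placeholders.

```javascript
// Parameters for s3.upload() that store an HTML page with an explicit
// Content-Type so browsers (and headless Chrome) render it.
function buildHtmlUploadParams(bucket, key, html) {
  return {
    Bucket: bucket,
    Key: key,
    Body: html,
    ContentType: 'text/html',
  };
}
```

With the v2 SDK this would be used as `s3.upload(buildHtmlUploadParams('my-bucket', 'pages/report.html', html)).promise()`.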
D:\Xampp\htdocs\pdf-bot>createdbjs pdfbot --user=pdfbot --password=pdfbot
Error: connect ECONNREFUSED 127.0.0.1:5432
var pgsql = require('pdf-bot/src/db/pgsql')
module.exports = {
api: {
token: 'api-token'
},
db: pgsql({
database: 'pdfbot',
username: 'pdfbot',
password: 'pdfbot',
port: 5432
}),
webhook: {
secret: '1234',
url: 'http://localhost:3000/webhooks/pdf'
}
}
Could you please change "username" to "user"? That's the correct option there.
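node-postgres (`pg`), which the pgsql adapter presumably hands these options to, names the field `user`; a `username` key would be ignored and the connection would fall back to defaults (for example the `PGUSER` environment variable). Until the option is renamed, a mapping like the hypothetical `toPgConfig` below shows the translation.

```javascript
// Translate a config that uses `username` into the shape node-postgres
// expects. Illustrative helper, not part of pdf-bot.
function toPgConfig(options) {
  const { username, ...rest } = options;
  // Keep an explicit `user` if one was given; otherwise map `username` to it.
  if (rest.user === undefined && username !== undefined) {
    rest.user = username;
  }
  return rest;
}
```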