esbenp / pdf-bot

🤖 A Node queue API for generating PDFs using headless Chrome. Comes with a CLI, S3 storage and webhooks for notifying subscribers about generated PDFs

License: MIT License

JavaScript 100.00%

html pdf headless headless-chrome pdf-generator nodejs node-js pdf-generation chromium headless-chromium

pdf-bot's Issues

Full page contents are scattering if page has full width

unable to take pdf of pages like https://material.angular.io/components/table/examples

"Question"

Sorry, I'm new here. Can anyone tell me whether this works on pressreader.com to save newspapers and magazines as PDFs? Thanks.

Support for posting the html instead of adding a url

I think it would be reasonable to make it possible to generate PDFs directly from HTML instead of by visiting a URL.

Not sure how html-pdf-chrome handles this, but Puppeteer has support out of the box, so it should be possible.
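A minimal sketch of the Puppeteer route mentioned above, assuming a Puppeteer `page` object is already available. `page.setContent` and `page.pdf` are Puppeteer calls; `renderHtmlToPdf` is a hypothetical helper name and is not part of pdf-bot.

```javascript
// Hypothetical helper: renders an HTML string to a PDF buffer.
// `page` is assumed to be a Puppeteer Page (e.g. from browser.newPage()).
async function renderHtmlToPdf(page, html, pdfOptions = {}) {
  // Load the markup directly instead of navigating to a URL.
  await page.setContent(html, { waitUntil: 'networkidle0' });
  // page.pdf() resolves with the PDF bytes when no `path` option is given.
  return page.pdf({ format: 'A4', printBackground: true, ...pdfOptions });
}
```

Taking `page` as a parameter keeps the helper independent of how the browser is launched.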

Run faster than once per minute?

Since cron runs at most once per minute, a PDF job sent to pdf-bot may take one minute to process even if pdf-bot is not busy at the time.

Is there a way to have a shift:all run automatically when an API push is triggered? Or is there a trick to getting pdf-bot to process more quickly?
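One possible workaround, sketched under assumptions: replace the cron entry with a small long-running Node script that triggers the queue every few seconds. The `pdf-bot -c ./pdf-bot.config.js shift:all` invocation, the config path, and the 5-second interval are assumptions; `runEvery` and `shiftAll` are hypothetical helpers.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: calls `task` every `intervalMs`, skipping a tick
// while the previous run is still in progress so runs never overlap.
function runEvery(intervalMs, task) {
  let running = false;
  const timer = setInterval(async () => {
    if (running) return;
    running = true;
    try {
      await task();
    } catch (err) {
      console.error('queue run failed:', err.message);
    } finally {
      running = false;
    }
  }, intervalMs);
  return () => clearInterval(timer); // call the returned function to stop
}

// Assumed CLI invocation; adjust the config path to your setup.
function shiftAll() {
  return new Promise((resolve, reject) => {
    execFile('pdf-bot', ['-c', './pdf-bot.config.js', 'shift:all'], (err, stdout) =>
      (err ? reject(err) : resolve(stdout))
    );
  });
}

// Usage: const stop = runEvery(5000, shiftAll);
```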

Add custom header and footer to generated pdf

I see that html-pdf-chrome has this option:

const pdf = await htmlPdf.create(html, {
  port,
  printOptions: {
    displayHeaderFooter: true,
    headerTemplate: `
      <div class="text center">
        Page <span class="pageNumber"></span> of <span class="totalPages"></span>
      </div>
    `,
    footerTemplate: '<div class="text center">Custom footer!</div>',
  },
});

Can I do the same when generating PDFs with pdf-bot?
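A possible configuration sketch, assuming pdf-bot forwards `generator.printOptions` to html-pdf-chrome unchanged (another config on this page sets `printBackground` in the same place). Whether header and footer templates are honored this way is not confirmed here.

```javascript
// pdf-bot.config.js (sketch). Assumption: everything under
// `generator.printOptions` is handed to html-pdf-chrome as-is.
const config = {
  api: { token: 'api-token' },
  generator: {
    printOptions: {
      displayHeaderFooter: true,
      headerTemplate:
        '<div class="text center">Page <span class="pageNumber"></span> of <span class="totalPages"></span></div>',
      footerTemplate: '<div class="text center">Custom footer!</div>',
    },
  },
};

module.exports = config;
```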

Realtime Support

I think it would be useful to have a real-time PDF generation feature, where a user hits the API and gets the PDF back in the response instead of having the job inserted into the queue.
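A rough sketch of what such an endpoint could look like. Everything here is hypothetical: `createPdfHandler` is not a pdf-bot function, `renderPdf` stands in for whatever produces the PDF bytes, and `req.body.url` assumes a JSON body parser has already run.

```javascript
// Hypothetical synchronous endpoint: renders the PDF and returns it in the
// same HTTP response. `renderPdf(url)` must resolve with a Buffer.
function createPdfHandler(renderPdf) {
  return async function handle(req, res) {
    try {
      const pdf = await renderPdf(req.body.url);
      res.statusCode = 200;
      res.setHeader('Content-Type', 'application/pdf');
      res.end(pdf);
    } catch (err) {
      res.statusCode = 500;
      res.setHeader('Content-Type', 'application/json');
      res.end(JSON.stringify({ error: err.message }));
    }
  };
}
```

The handler only uses `statusCode`, `setHeader` and `end`, which are available on Node's `http.ServerResponse` and therefore also on Express responses.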

Lambda function

Could we have an example of a lambda function for this?

Add FTP storage support

It would be interesting (and probably practical) to add FTP support using the excellent jsftp library for example. Judging from the src/storage/s3.js implementation, it shouldn't be too hard.
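A sketch of what an FTP storage adapter might look like. The shape pdf-bot expects from a storage function is an assumption here (a function taking the local file path and returning a promise), and `createFtpStorage` is a hypothetical name. `client` is assumed to expose `put(source, remotePath, callback)`, as jsftp's `put` does.

```javascript
const path = require('path');

// Hypothetical storage factory. `client` is injected so that any object
// with a jsftp-style put(source, remotePath, callback) can be used.
function createFtpStorage(client, remoteDir = '/pdfs') {
  return function uploadToFtp(localPath) {
    const remotePath = `${remoteDir}/${path.basename(localPath)}`;
    return new Promise((resolve, reject) => {
      client.put(localPath, remotePath, (err) =>
        (err ? reject(err) : resolve({ path: remotePath }))
      );
    });
  };
}
```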

lowdb -> queue.addToQueue(...).then is not a function

Hi,

I've tried to get this working on a Dockerised node image and natively on a Windows 10 machine. The error is consistent.

TypeError: queue.addToQueue(...).then is not a function
    at /usr/local/lib/node_modules/pdf-bot/src/api.js:30:10
    at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)
    at next (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/route.js:137:13)
    at Route.dispatch (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/route.js:112:3)
    at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)
    at /usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:281:22
    at Function.process_params (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:335:12)
    at next (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/index.js:275:10)
    at jsonParser (/usr/local/lib/node_modules/pdf-bot/node_modules/body-parser/lib/types/json.js:119:7)
    at Layer.handle [as handle_request] (/usr/local/lib/node_modules/pdf-bot/node_modules/express/lib/router/layer.js:95:5)

The config is:

var htmlPdf = require('html-pdf-chrome')
var lowDB = require('/usr/local/lib/node_modules/pdf-bot/src/db/lowdb')

module.exports = {
    api: {
        port: process.env.PDF_BOT_PORT,
        token: process.env.PDF_BOT_TOKEN
    },
    generator: {
        // Triggers that specify when the PDF should be generated
        completionTrigger: new htmlPdf.CompletionTrigger.Timer(process.env.PDF_BOT_COMPLETION_TIMER),
        // The port to listen for Chrome (default: 9222)
        port: process.env.CHROME_DEBUG_PORT
    },
    storagePath: 'storage',
    db:lowDB({
        lowdDbOptions: {},
//        path:'./storage'
    })
}

Docker image - node version: v10.5.0
Windows node: v8.9.4
pdf-bot: 0.5.4

I have tried with and without the db options in the config, with no luck. What am I doing wrong?

Thanks
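The stack trace shows that `queue.addToQueue(...)` returned something without a `.then` method. A general defensive pattern for a call site that may receive either a plain value or a promise is `Promise.resolve`. This is a sketch of that pattern only, not a confirmed fix for pdf-bot; `addJob` and the `queue` objects are hypothetical stand-ins.

```javascript
// Hypothetical wrapper: normalizes a sync or async addToQueue result
// into a promise so that `.then` is always available.
function addJob(queue, job) {
  return Promise.resolve(queue.addToQueue(job));
}
```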

Make webhook url an optional parameter in the request

Configuration is cleaner when the request can include the callback URL. It also supports the case where different classes or types of documents originate from different places and therefore need multiple callback URLs.
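A sketch of how a per-request callback could be resolved. The `webhook_url` request field and `resolveWebhookUrl` are hypothetical; only `webhook.url` in the config appears elsewhere on this page.

```javascript
// Hypothetical: prefer a `webhook_url` sent with the request, otherwise
// fall back to the globally configured webhook, if any.
function resolveWebhookUrl(requestBody, config) {
  const fromRequest = requestBody && requestBody.webhook_url;
  if (fromRequest) {
    const parsed = new URL(fromRequest); // throws on malformed input
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      throw new Error('webhook_url must use http or https');
    }
    return fromRequest;
  }
  return config.webhook ? config.webhook.url : null;
}
```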

Add a parameter to the request to support postback to the url

For our use case, the form is a formal document representation of other functionality, so there is no need for the overhead of creating a static form page for each instance. It would be preferable to have a template URL that consumes a JSON payload via POST and renders the form with the data mapped in, with the payload passed to pdf-bot in the same request. I am thinking of a third optional parameter called formdata, an arbitrary object blob; when this field has a value, the request to the URL becomes a POST.

This is so compelling for us compared with having to instantiate every form before making the pdf-bot request that I may end up forking and implementing it, if you are at all interested in a PR.
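The proposed behavior can be sketched as a small function that decides how the template URL is requested. `formdata` is the field name suggested above; `buildRequestOptions` is a hypothetical helper, and how pdf-bot would actually issue the request is left open.

```javascript
// Hypothetical: a job without `formdata` is fetched with GET (current
// behavior); a job with `formdata` is sent as a JSON POST.
function buildRequestOptions(job) {
  if (job.formdata === undefined || job.formdata === null) {
    return { method: 'GET' };
  }
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(job.formdata),
  };
}
```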

Returns 201 even if nothing created

This is really an issue with me getting pdf-bot to work at all. Right now I can send POSTs and they get added to the job queue as completed, but no PDF is generated. The response JSON seems to confirm that nothing is being done:

{
    "meta": {
        "type": "invoice",
        "id": 1
    },
    "url": "https://localhost:3001/test",
    "id": "ae6f6151-cb6a-4d21-914b-a10a8154ff94",
    "created_at": "Wed, 27 Sep 2017 16:55:30 GMT",
    "completed_at": null,
    "generations": [],
    "pings": [],
    "storage": {}
}

Neither my /test endpoint nor the /hook endpoint is ever hit. I have google-chrome running via pm2 and am running pdf-bot with a basic local storage config file. I can't get any errors to display anywhere.

At the very least, I think that 201 Created should not be the response if nothing is actually created.
Is there anything obvious I'm missing in getting pdf-bot working? Thanks.

Config if it helps:

var htmlPdf = require('html-pdf-chrome');
module.exports = {
  api: {
    port: 3000,
    token: 'api-token'
  },
  generator: {
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000) // 1 sec timeout
  },
  storagePath: "/Users/matthewmolnar/junk/pdf-storage",
  webhook: {
    secret: '12345',
    url: 'http://localhost:3001/hook'
  }
}
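For reading a job response like the one shown earlier in this issue, a small helper can make the state explicit. The interpretation of the fields is an assumption based only on their names (`completed_at`, `generations`); `jobStatus` is hypothetical.

```javascript
// Hypothetical: classifies a job by the fields in the API response.
function jobStatus(job) {
  if (job.completed_at) return 'completed';
  if (Array.isArray(job.generations) && job.generations.length > 0) {
    return 'attempted'; // at least one generation was tried
  }
  return 'queued'; // accepted by the API, no generation attempted yet
}
```

Under this reading, the response above describes a job that was accepted into the queue but has not been processed.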

Possibility to send html instead of url?

Hi,
Thanks for the wonderful library and all the effort you have invested. I have a situation where I'd prefer to send raw HTML to the bot and get a PDF back, so no URLs, just HTML in the POST request. Is that possible? Thanks.

Shift all?

Hi, cool lib!

I was wondering how to do something like "shift all". I have a group of PDFs that I send in quick succession, and I would like them generated and saved to S3 one after another.

I'm not familiar with the commands inside the bin folder, but I'm guessing it means looping through the queue and calling shift each time?
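The loop described here can be sketched as follows (another issue on this page refers to a `shift:all` command, which may already cover this). `drainQueue`, `getNextJob` and `processJob` are hypothetical names, not pdf-bot functions.

```javascript
// Hypothetical: processes jobs one at a time until the queue is empty.
// `getNextJob` resolves with the next job, or null/undefined when done.
async function drainQueue(getNextJob, processJob) {
  let processed = 0;
  for (let job = await getNextJob(); job; job = await getNextJob()) {
    await processJob(job); // wait for each PDF before starting the next
    processed++;
  }
  return processed;
}
```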

pdf-bot limited to one machine rendering PDFs

I'm looking for feedback from @esbenp before I dig into a PR for this.

Goal:

I'd like to adapt pdf-bot to be a scalable PDF rendering microservice that can have resources added or removed on demand to handle workload fluctuations.

Problem:

Because pdf-bot locks the queue for the whole PostgreSQL database, only one machine can render PDFs for a given API endpoint at a time.

Because PG is a shared database, it should be possible to scale the workload horizontally across many machines in parallel. To accomplish this, we would need to change the queue locking mechanism to work on a per-job basis and adapt the generation commands (shift:all comes to mind) to support this.

There are a few concerns here:

This would require a database migration of some sort to support
Process crashes, unhandled errors, etc could result in jobs never being processed if implemented poorly
?

Proposed implementation:

Add a processing_started_at date column to the jobs table
Adapt getAllUnfinished to select jobs that aren't completed and whose processing_started_at is either null or older than a configurable timeout (30 sec default maybe)
Make isBusy calls return false always (maybe?)
Adapt cli scripts (shift, shift:all, etc) to handle the possibility of getting an empty array of jobs instead of relying on an isBusy call (maybe?)
Remove setIsBusy calls (maybe?)
Add changes to LowDb as well (maybe?)
Remove worker table

id	processing_started_at	completed_at
1	2018-01-08 17:31:17.825153	2018-01-08 17:31:48.925153
2	2018-01-08 17:31:17.825153	null
3	2018-01-08 17:31:47.925153	null
4	2018-01-08 17:31:48.925153	null
5	null	null
6	null	null

Given this sample data, jobs 2, 5, and 6 would be eligible for the next generation worker to start processing, while jobs 3 and 4 are assumed to be currently processing.
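The selection rule implied by this sample can be sketched in plain JavaScript: a job is eligible when it is not completed and either has never been started or was started longer ago than the timeout. `selectEligibleJobs` is a hypothetical name, the timestamps are truncated to milliseconds, and the "current time" of 17:31:50 is an assumption chosen to match the description. A real implementation would also need to claim jobs atomically in the database so that two workers cannot pick the same job.

```javascript
// Hypothetical in-memory version of the proposed getAllUnfinished rule.
function selectEligibleJobs(jobs, now, timeoutMs = 30000) {
  return jobs.filter((job) => {
    if (job.completed_at) return false; // already done
    if (!job.processing_started_at) return true; // never picked up
    const startedAt = new Date(job.processing_started_at).getTime();
    return now.getTime() - startedAt > timeoutMs; // presumed stalled
  });
}

const sampleJobs = [
  { id: 1, processing_started_at: '2018-01-08T17:31:17.825Z', completed_at: '2018-01-08T17:31:48.925Z' },
  { id: 2, processing_started_at: '2018-01-08T17:31:17.825Z', completed_at: null },
  { id: 3, processing_started_at: '2018-01-08T17:31:47.925Z', completed_at: null },
  { id: 4, processing_started_at: '2018-01-08T17:31:48.925Z', completed_at: null },
  { id: 5, processing_started_at: null, completed_at: null },
  { id: 6, processing_started_at: null, completed_at: null },
];

const eligibleIds = selectEligibleJobs(sampleJobs, new Date('2018-01-08T17:31:50.000Z'))
  .map((job) => job.id);
console.log(eligibleIds); // [ 2, 5, 6 ]
```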

If this all sounds like too big of an overhaul, I'd be open to other suggestions. I'd also be willing to add the support to a new Redis database adapter instead as well.

FunctionApp

I am creating a function app that pushes IoT Hub messages to an application, using Node.js. Sometimes the messages are grouped together. Can you please look into this issue?

Not able to install pdf-bot

I installed pdf-bot using npm, but whenever I run the command
pdf-bot install I get the following error:

/usr/lib/node_modules/pdf-bot/bin/pdf-bot.js:328
function openConfig(delayQueueCreation = false) {
                                       ^

SyntaxError: Unexpected token =
    at exports.runInThisContext (vm.js:53:16)
    at Module._compile (module.js:373:25)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
    at Function.Module.runMain (module.js:441:10)
    at startup (node.js:139:18)
    at node.js:968:3

I don't know why it is throwing this error. Is it because of the node version?
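The `Unexpected token =` points at a default parameter value (`delayQueueCreation = false`). Default parameters are ES2015 syntax that, to my knowledge, Node.js supports without flags from version 6 onward, so an older runtime is a plausible cause. A small check, written without default parameters so it also runs on old versions; `supportsDefaultParameters` is a hypothetical helper.

```javascript
// Hypothetical check: does this Node.js version parse default parameters?
// Assumes support starts with major version 6.
function supportsDefaultParameters(version) {
  var v = version || process.version; // e.g. "v4.2.6"
  var major = parseInt(v.replace(/^v/, '').split('.')[0], 10);
  return major >= 6;
}

// Usage: console.log(process.version, supportsDefaultParameters());
```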

Docker build?

Any chance of a Docker build to use? It would be great.

Webhook Url is interpreted as lowercase if prefixed by http

The hostname of the webhook URL is lowercased during DNS resolution when the URL is prefixed with http.

var LowDB = require('./src/db/lowdb.js')
var htmlPdf = require('html-pdf-chrome')

module.exports = {
  api: {
    port: 3000,
    token: 'api-token'
  },
  db: LowDB({
    lowDbOptions: {},
    path: 'pdf-storage/db/db'
  }),
  storagePath: "pdf-storage",
  webhook: {
    secret: '1234',
    url: 'http://MY_URL:5000/api/internals/webhooks/pdf'
  },
  generator: {
    // Triggers that specify when the PDF should be generated
    completionTrigger: new htmlPdf.CompletionTrigger.Timer(1000), // waits for 1 sec
    // The port to listen for Chrome (default: 9222)
    port: 9222,
    printOptions: {
        printBackground: true
    }
  }
}

This results in the following error:

Ping failed: {"id":"8de73c3c-21ef-47e7-91b2-cbb64c1a4d80","method":"POST","payload":{"id":"75db4451-ad57-49af-9265-a547855c7ff1","url":"https://test","meta":{"scheduledreporttaskid":"5fdbf776-c35d-45ad-9e44-0be5688d3a9d"},"storage":{"local":"pdf-storage/pdf/38cfee13-4d43-418e-ac34-bc70880995d1.pdf"}},"response":{"name":"FetchError","message":"request to http://MY_URL:5000/api/internals/webhooks/pdf failed, reason: getaddrinfo ENOTFOUND my_url my_url:5000","type":"system","errno":"ENOTFOUND","code":"ENOTFOUND"},"url":"http://MY_URL:5000/api/internals/webhooks/pdf","sent_at":"Tue, 15 Oct 2019 07:38:03 GMT","error":true}
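Hostnames are case-insensitive in DNS, and the WHATWG `URL` parser that Node.js exposes lowercases the hostname while leaving the path untouched, which would explain `my_url` in the message. The `ENOTFOUND` itself says the name did not resolve, so the lowercase form may not be the actual problem. A short sketch; `describeUrl` is a hypothetical helper.

```javascript
// Hypothetical helper: shows which URL parts are normalized on parsing.
function describeUrl(input) {
  const parsed = new URL(input);
  return {
    hostname: parsed.hostname, // lowercased by the parser
    port: parsed.port,
    pathname: parsed.pathname, // case is preserved
  };
}

// Usage: describeUrl('http://MY_URL:5000/api/internals/webhooks/pdf')
```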

Installing package (npm install azure-iothub --save)

npm WARN deprecated crypto@1.0.1: This package is no longer supported. It's now a built-in Node module. If you've depended on crypto, you should switch to the one that's built-in.

Add storage support for Backblaze B2 Cloud Storage

In addition to S3 support, I would like to see support for https://www.backblaze.com/.

Disable crypto dependency

Installing pdf-bot via npm outputs the following warning:

npm WARN deprecated crypto@1.0.1: This package is no longer supported. It's now a built-in Node module. If you've depended on crypto, you should switch to the one that's built-in.

"Job not found" error using Lowdb as a storage of queue

I have tried many ways of pushing a job and then generating the PDF file, but none has succeeded. It seems that db.json is not used at all; it stays empty. I don't know how this behaves with Postgres; perhaps it has the same issue. Could you please look into the cause?

Shift command with AWS S3 URL not working

I tried running pdf-bot shift with an AWS S3 URL, but it does not work. It hangs with the command-line cursor blinking and nothing happens. When I run the pdf-bot shift command with another website's URL, it works. Is there a problem with my config or setup? Please help. Thanks.

UPDATE: Already solved this issue. Just added "ContentType": "text/html" property on my s3.upload options.
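The fix from the update can be sketched as follows. S3 stores objects as `binary/octet-stream` when no content type is given, which is a likely reason the page did not render. `buildHtmlUploadParams` is a hypothetical helper; the parameter names are those of the AWS SDK for JavaScript `upload` call, and the bucket and key are placeholders.

```javascript
// Hypothetical helper: upload parameters for an HTML page that should be
// served as text/html rather than as a generic binary object.
function buildHtmlUploadParams(bucket, key, html) {
  return {
    Bucket: bucket,
    Key: key,
    Body: html,
    ContentType: 'text/html',
  };
}

// Usage (assumes an existing AWS.S3 client named `s3`):
// s3.upload(buildHtmlUploadParams('my-bucket', 'report.html', html)).promise()
```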

Database connection error when creating the database

D:\Xampp\htdocs\pdf-bot>createdbjs pdfbot --user=pdfbot --password=pdfbot
Error: connect ECONNREFUSED 127.0.0.1:5432

Typo in docs

var pgsql = require('pdf-bot/src/db/pgsql')

module.exports = {
  api: {
    token: 'api-token'
  },
  db: pgsql({
    database: 'pdfbot',
    username: 'pdfbot',
    password: 'pdfbot',
    port: 5432
  }),
  webhook: {
    secret: '1234',
    url: 'http://localhost:3000/webhooks/pdf'
  }
}

Could you please change "username" to "user"? That is the correct option name there.
