Awesome PDF

A curated list of resources around PDF files

The File Format

Viewers

  • KOReader: a document viewer primarily aimed at e-ink readers
  • react-native-pdf: a React Native PDF view component
  • PdfViewPager: an Android widget to display PDF documents in your Activities or Fragments
  • vue-pdf: a Vue.js PDF viewer

Data Extraction

  • pdftotext: an application that converts Portable Document Format (PDF) files to plain text. Part of poppler-utils.
  • pdfminer.six: a Python library for extracting information from PDF documents
    • pdfplumber: Plumb a PDF for detailed information about each text character, rectangle, and line. Plus: Table extraction and visual debugging.
  • Tabula: an application for extracting tables
  • camelot: PDF Table Extraction
  • awesome-document-understanding: a curated list of resources on Document Understanding (DU)

Generators

Anything that can produce PDF files from scratch:

  • pdflatex (e.g. in TeX Live): a LaTeX-to-PDF converter
  • reportlab: The ReportLab Toolkit. An Open Source Python library for generating PDFs and graphics.
  • prawn: a pure Ruby PDF generation library
  • react-pdf: Create PDF files using React
  • markdown-pdf: Markdown to PDF converter
  • mpdf: PHP library generating PDF files from UTF-8 encoded HTML

Manipulators

Anything that's used to edit an existing PDF file:

  • pdfarranger: a small Python-GTK application that helps the user merge or split PDF documents and rotate, crop, and rearrange their pages using a graphical interface
  • OCRmyPDF: adds an OCR text layer to scanned PDF files, allowing them to be searched

File Analysis / Security

  • Pdfalyzer: a PDF analysis tool that visualizes the internal data structure of a PDF in large, colorful diagrams and scans the binary streams embedded in the PDF against a collection of YARA rules specific to malicious PDFs
  • Malicious PDF Generator: generates a set of malicious PDF files with phone-home functionality

Multi-Purpose Libraries

  • pdftk: command-line tool for working with PDFs. It is commonly used for client-side scripting or server-side processing of PDFs.
  • PyPDF2: a free and open-source pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
  • pikepdf: a Python library for reading and writing PDF, powered by qpdf
  • PyMuPDF: Python bindings to MuPDF
  • pypdfium2: Python bindings to PDFium
  • borb: reading, creating, and manipulating PDF files in Python
  • pdfcpu: a PDF processor written in Go, with batch processing and scripting via a rich command line
  • pdf-lib: create and modify PDF documents in any JavaScript environment

Contributors

martinthoma, adehad, michelcrypt4d4mus, mara004
