pdf-title-rename

A batch-renaming script for PDF files that builds new filenames from the Title and Author information in the PDF metadata and XMP. The XMP metadata, if available, supersedes the standard PDF metadata. The output format is currently:

[FirstAuthorLastName [LastAuthorLastName ]- ][SanitizedTitleText].pdf

Only the article title is used if no author information is found. First and last author surnames are used if the creator field in the XMP is a list of more than one author.
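The naming rules above can be sketched in a few lines of Python. This is only an illustration, not the script's actual code: the function names (pick_metadata, surname, sanitize, build_filename) are hypothetical, and the surname and sanitizing rules are assumptions, since the exact rules are not spelled out here.

```python
import re


def pick_metadata(info, xmp):
    """Merge two metadata dicts; non-empty XMP values override the standard ones."""
    merged = dict(info)
    merged.update({key: value for key, value in xmp.items() if value})
    return merged


def surname(name):
    """Assumed rule: 'Last, First' gives 'Last'; otherwise take the last word."""
    name = name.strip()
    if "," in name:
        return name.split(",")[0].strip()
    return name.split()[-1]


def sanitize(title):
    """Assumed rule: keep word characters, spaces and hyphens; collapse whitespace."""
    cleaned = re.sub(r"[^\w\s-]", "", title)
    return re.sub(r"\s+", " ", cleaned).strip()


def build_filename(title, authors):
    """Apply the pattern [First [Last ]- ][Title].pdf to a title and author list."""
    names = [surname(a) for a in authors if a.strip()]
    prefix = ""
    if names:
        prefix = names[0] + " "
        if len(names) > 1:
            # More than one author: add the last author's surname as well.
            prefix += names[-1] + " "
        prefix += "- "
    return prefix + sanitize(title) + ".pdf"


if __name__ == "__main__":
    authors = ["Yann LeCun", "Yoshua Bengio", "Geoffrey Hinton"]
    print(build_filename("Deep Learning: A Review", authors))
    # prints: LeCun Hinton - Deep Learning A Review.pdf
```

With an empty author list, build_filename returns only the sanitized title plus the .pdf extension, matching the title-only fallback described above.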

To use the script, pass one or more paths to the PDF files that should be processed. If you have downloaded a large number of files into one directory, you can use the shell wildcard * to list them; for example,

pdf-title-rename path/to/folder/*.pdf

will process all .pdf files in path/to/folder. If you want to see what it would do without changing anything, do a dry run with -n. There is also an interactive mode with -i that will let you open the files and manually enter titles and author strings if you have problematic PDFs without proper metadata.
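For readers curious how a dry run differs from a real run, here is a minimal sketch. The function apply_renames is hypothetical, and several behaviors are assumptions rather than facts about the script: that files are moved (not copied) into the destination folder, and that an existing target is skipped instead of overwritten.

```python
import os
import shutil


def apply_renames(renames, dest=None, dry_run=False):
    """Rename files, or only report what would happen when dry_run is True.

    renames is a list of (old_path, new_filename) pairs. The return value is
    the list of (old_path, new_path) pairs that were planned.
    """
    planned = []
    for old_path, new_name in renames:
        # Without a destination, the file stays in its current folder.
        folder = dest if dest else os.path.dirname(old_path)
        new_path = os.path.join(folder, new_name)
        planned.append((old_path, new_path))
        if dry_run:
            print(f"{old_path} -> {new_path}")
        elif os.path.exists(new_path):
            # Assumed policy: never overwrite an existing file.
            print(f"skipping {old_path}: {new_path} already exists")
        else:
            shutil.move(old_path, new_path)
    return planned
```

Because the planned list is built before anything touches the disk, the dry run and the real run share the same logic and cannot drift apart.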

usage: pdf-title-rename [-h] [-n] [-i] [-d DESTINATION] files [files ...]

PDF batch rename

positional arguments:
  files                 list of pdf files to rename

optional arguments:
  -h, --help            show this help message and exit
  -n                    dry-run listing of filename changes
  -i                    interactive mode
  -d DESTINATION, --dest DESTINATION
                        destination folder for renamed files
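A command-line interface like the one in the usage message can be declared with Python's standard argparse module. The sketch below is a reconstruction from the help text, not the script's own source; the attribute names dry_run, interactive and destination are assumptions chosen for readability.

```python
import argparse


def build_parser():
    """Build a parser that accepts the options listed in the usage message."""
    parser = argparse.ArgumentParser(
        prog="pdf-title-rename", description="PDF batch rename"
    )
    parser.add_argument("files", nargs="+", help="list of pdf files to rename")
    parser.add_argument(
        "-n", dest="dry_run", action="store_true",
        help="dry-run listing of filename changes",
    )
    parser.add_argument(
        "-i", dest="interactive", action="store_true", help="interactive mode"
    )
    parser.add_argument(
        "-d", "--dest", dest="destination",
        help="destination folder for renamed files",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Using nargs="+" for files is what makes at least one path mandatory, which is why the usage line shows files [files ...].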

This script is intended as a first pass in an academic PDF workflow to get browsable filenames for a pile of articles that have been downloaded but not yet filed away.

Requirements

The PDF parsing uses PDFMiner. The XMP parsing uses this xmp module. Important: For XMP parsing to work, you will need to copy the code from that link to an xmp.py file and put it on your PYTHONPATH. (Note: xmp.py is now included in the repo in case that link goes away; you still have to put it on your PYTHONPATH.)
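A quick way to confirm that both requirements are visible to Python is to ask the import system whether it can locate them. This is a small sketch using only the standard library; the helper names are hypothetical, and it assumes the modules are importable as pdfminer and xmp.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module on its import path."""
    return importlib.util.find_spec(module_name) is not None


def missing_requirements(modules=("pdfminer", "xmp")):
    """List the required modules that cannot currently be imported."""
    return [name for name in modules if not is_importable(name)]


if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Not found on the Python path: " + ", ".join(missing))
    else:
        print("All requirements are importable.")
```

If xmp is reported as missing, the folder containing xmp.py is probably not on PYTHONPATH yet.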

Contributors

jdmonaco
