jdmonaco / pdf-title-rename

A script to batch rename PDF files based on metadata/XMP title and author

Python 100.00%

pdf-title-rename's Issues

Some files won't be processed

For some PDFs I receive the following error:

Traceback (most recent call last):
  File "pdf-title-rename.py", line 242, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "pdf-title-rename.py", line 123, in _get_info
    info = self._get_metadata(pdf)
  File "pdf-title-rename.py", line 193, in _get_metadata
    return doc.info[0]
IndexError: list index out of range

I can provide the PDFs if you are willing to look into it.

Best,

Patrick
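
The traceback suggests that doc.info is an empty list for these PDFs, so indexing it with [0] fails. A minimal sketch of a guard, assuming doc.info is a plain Python list of metadata dictionaries; the helper name first_info_dict is hypothetical and not part of the script:

```python
def first_info_dict(info_list):
    """Return the first metadata dictionary, or None if there is none.

    Assumption: info_list behaves like the doc.info list seen in the
    traceback, i.e. a (possibly empty) list of dictionaries.
    """
    if not info_list:
        # No Info dictionary in the PDF: report "no metadata" to the
        # caller instead of raising IndexError.
        return None
    return info_list[0]
```

A caller could then treat a None result the same way as any other file with missing metadata.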

UnicodeDecodeError: 'ascii' codec can't decode byte

Hi! Thanks for fix #1, it works fine now! Alas, PDFs with umlauts in their metadata, like ö or ä, now produce an error (previously, the umlaut character was simply skipped):

Traceback (most recent call last):
  File "/usr/bin/pdf-title-rename.py", line 236, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "/usr/bin/pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "/usr/bin/pdf-title-rename.py", line 128, in _get_info
    title = ti.decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 5: ordinal not in range(128)

I’ve changed the metadata of 314.pdf accordingly, if you would like to look into it (https://transfer.sh/RfnsE/314.pdf). Adding # -*- coding: utf-8 -*- at the beginning does not help…
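
The byte 0xf6 is "ö" in Latin-1, so the title bytes are probably not ASCII or UTF-8, and a bare decode() with the ASCII default cannot handle them. The coding line only affects how the source file itself is read, not how data is decoded at run time. A rough sketch of a more tolerant decoder, written for Python 3; the function name decode_pdf_text and the order of fallbacks are assumptions, not the script's actual behaviour:

```python
def decode_pdf_text(raw):
    """Decode a PDF metadata string without raising UnicodeDecodeError.

    Assumed fallback order (illustrative only):
      1. UTF-16BE when the value starts with the byte order mark FE FF
      2. UTF-8
      3. Latin-1, which maps every byte to a character and never fails
    """
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be", errors="replace")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

Latin-1 is only an approximation of the encoding PDFs use for non-Unicode strings, but it covers common Western European characters such as ö and ä.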

TypeError in get_xmp_metadata() function

For some reason when running the script on some of my pdf files, I get the following error:

Traceback (most recent call last):
  File "./pdf-title-rename.py", line 247, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "./pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "./pdf-title-rename.py", line 144, in _get_info
    xmpt, xmpa = self._get_xmp_metadata()
  File "./pdf-title-rename.py", line 207, in _get_xmp_metadata
    t = md['dc']['title']['x-default']
TypeError: string indices must be integers

To fix the issue, I added this bit of code to the get_xmp_metadata() function:

except TypeError:
    t = md['dc']['title']

The end result would be:
Code begins on Line 198

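    def _get_xmp_metadata(self):
        t = a = None
        metadata = resolve1(self.doc.catalog['Metadata']).get_data()
        try:
            md = xmp_to_dict(metadata)
        except Exception:
            # XMP packet could not be parsed: no title or author available
            return t, a
        try:
            t = md['dc']['title']['x-default']
        except TypeError:
            # 'title' is a plain string rather than a language mapping
            t = md['dc']['title']
        except KeyError:
            pass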

It probably has something to do with the 'x-default' element not existing for whatever reason but I was happy enough with the quick and easy fix.
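
The same idea can be written without relying on the TypeError, by checking what kind of value the title is. This is only a sketch, assuming md is the nested dictionary returned by xmp_to_dict; the helper name xmp_title is hypothetical:

```python
def xmp_title(md):
    """Extract the Dublin Core title from a parsed XMP dictionary.

    Handles both shapes seen in this issue:
      {'dc': {'title': {'x-default': 'Some title'}}}
      {'dc': {'title': 'Some title'}}
    Returns None when no title can be found.
    """
    try:
        title = md["dc"]["title"]
    except (KeyError, TypeError):
        return None
    if isinstance(title, dict):
        # Language-alternative form: prefer the 'x-default' entry.
        return title.get("x-default")
    return title
```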

Suggestion for alternative output style

Thank you for developing and sharing this tool.

I have a suggestion to make it more useful to academics.
I do not need the last author's first name.
Instead, I need the year of publication for citations in academic papers.

I have now manually renamed my PDFs in the style of

firstAuthorsLastNameYearTitleInCamelCase.pdf

with the whitespace, dashes, and hyphens removed.
I only really need the first 5 to 7 words of the title.

I find that the file stem in this format makes a useful
BibTeX cite key. This greatly eases finding the right cite key from the
pulldown of cite keys that pops up in Overleaf or vim or Sublime Text 3 when
you start to type \cite{.

Matching the BibTeX cite key to the PDF file stem also makes it easier to find the PDF on my hard drive.
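
The requested naming scheme could look roughly like the following. This is a sketch under assumptions: the function name citekey_stem, the default of six title words, and capitalising the first letter of every title word are illustrative choices, and the year would have to come from metadata that the script may not currently read.

```python
import re


def citekey_stem(first_author_last, year, title, max_words=6):
    """Build a file stem like 'Doe2020ASimpleTestTitleForRenaming'.

    Whitespace, dashes, hyphens and other punctuation are dropped, and
    only the first max_words words of the title are kept.
    """
    # Runs of letters/digits count as words; punctuation separates them.
    words = re.findall(r"[^\W_]+", title)[:max_words]
    camel = "".join(w[:1].upper() + w[1:] for w in words)
    author = re.sub(r"[\W_]+", "", first_author_last)
    return f"{author}{year}{camel}"
```

For example, citekey_stem("Doe", 2020, "A simple test-title for renaming PDF files") gives "Doe2020ASimpleTestTitleForRenaming", to which ".pdf" would be appended.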

AttributeError: 'PDFObjRef' object has no attribute 'strip'

Hi! Thanks for making this script, it works nicely most of the time! Sometimes, though, I get an AttributeError: 'PDFObjRef' object has no attribute 'strip' error. It mostly happens when PDFs have no title metadata, but it also occurs even though my PDF has metadata (and no special characters like : which also cause this error to happen). Can anything be done to prevent those errors? Thanks!
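
The error message indicates that a metadata value is an indirect object reference rather than a string, so calling strip() on it fails. One possible approach is to resolve the value before treating it as text. The sketch below takes the resolver as a parameter so that it stays self-contained; in the script it could presumably be the resolve1 function from pdfminer that is already used for the XMP metadata. The helper name clean_field is hypothetical.

```python
def clean_field(value, resolve=lambda obj: obj):
    """Return a stripped text value, or None if the field is unusable.

    resolve is any callable that turns an indirect reference into the
    object it points to and returns other objects unchanged
    (assumption: pdfminer's resolve1 could be passed here).
    """
    value = resolve(value)
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    if not isinstance(value, str):
        # Still not text after resolving: treat it as missing metadata.
        return None
    return value.strip() or None
```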

Could not find metadata in the file

Processing "s/@arXiv1904.09146 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1904.09709 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1904.11272 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1905.03333 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1905.11736 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1906.00335 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1906.06919 .pdf":
-- Could not find metadata in the file
Processed 29 files:

Renamed: 0
Missing metadata: 29
Errors: 0

Allow script to take folderpath as argument

It would be nice if one could just point to a folder path that contains multiple .pdf files which are supposed to be renamed. This is especially useful when you download a .zip file with lots of .pdf files (e.g. most journals allow you to download the specific paper you were looking for plus a bunch of other papers related to it). With the current script I have to type in all the filenames, which can be quite annoying when you have a folder that contains something like: foo123_456.pdf, bar_789.pdf, etc.
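
One way this could work is to expand any directory argument into the PDF files it contains before processing. A minimal sketch, assuming the script receives its arguments as a list of path strings; the function name expand_pdf_args is hypothetical and subfolders are not searched:

```python
from pathlib import Path


def expand_pdf_args(paths):
    """Replace each directory in paths with the PDF files inside it.

    Non-directory arguments are passed through unchanged, so existing
    usage with explicit filenames keeps working.
    """
    files = []
    for path in map(Path, paths):
        if path.is_dir():
            files.extend(
                sorted(
                    entry
                    for entry in path.iterdir()
                    if entry.is_file() and entry.suffix.lower() == ".pdf"
                )
            )
        else:
            files.append(path)
    return files
```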

Update README.md to make 'simply pass in a list or glob of PDF filenames' statement more clear

When I first wanted to use the script, I struggled to understand the sentence 'simply pass in a list or glob of PDF filenames' in the README.md file. I thought I had to pass in something like [foo.pdf, bar.pdf] or ['foo.pdf', 'bar.pdf']. I am not sure whether I could or should have known better (maybe I am the only one who got that wrong; I had never used a parser script before). If not, it would be nice to make this clearer in the README.md file.
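
For readers with the same question: the "list" presumably means filenames separated by spaces on the command line (for example foo.pdf bar.pdf), and a "glob" is a pattern such as *.pdf that the shell expands into such a list before the script starts. The sketch below illustrates how a command-line parser typically receives these arguments; it is an illustrative parser, not the one actually used in pdf-title-rename.py.

```python
import argparse


def build_parser():
    """Illustrative parser that accepts one or more filenames."""
    parser = argparse.ArgumentParser(
        description="Illustrative only, not the script's real parser"
    )
    # nargs="+" collects every remaining argument into a Python list.
    parser.add_argument("files", nargs="+", help="one or more PDF filenames")
    return parser


def parse_files(argv):
    """Return the list of filenames given on the command line."""
    return build_parser().parse_args(argv).files
```

Here parse_files(["foo.pdf", "bar.pdf"]) returns ['foo.pdf', 'bar.pdf'], which is what the program sees when the two filenames are typed after the script name without brackets, quotes, or commas.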
