pdf-title-rename's People

Contributors

jdmonaco

pdf-title-rename's Issues

Some files won't be processed

For some PDFs I receive the following error:

Traceback (most recent call last):
  File "pdf-title-rename.py", line 242, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "pdf-title-rename.py", line 123, in _get_info
    info = self._get_metadata(pdf)
  File "pdf-title-rename.py", line 193, in _get_metadata
    return doc.info[0]
IndexError: list index out of range

I can provide the PDFs if you are willing to look into it.

Best,

Patrick
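
The traceback suggests that doc.info was an empty list for these files, so indexing element 0 fails. A minimal sketch of a guard, assuming info_list is the list that the script reads as doc.info; the helper name first_info_dict is hypothetical and not part of the script:

```python
def first_info_dict(info_list):
    """Return the first Info dictionary, or an empty dict if there is none."""
    # An empty (or missing) list means the PDF carries no Info dictionary.
    if not info_list:
        return {}
    return info_list[0]
```

With a guard like this, _get_metadata could return first_info_dict(doc.info) and the caller could treat an empty result as missing metadata instead of crashing.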

UnicodeDecodeError: 'ascii' codec can't decode byte

Hi! Thanks for the fix for #1, it works fine now! Alas, PDFs with umlauts such as ö or ä in their metadata now produce an error (previously, the umlaut character was simply skipped):

Traceback (most recent call last):
  File "/usr/bin/pdf-title-rename.py", line 236, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "/usr/bin/pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "/usr/bin/pdf-title-rename.py", line 128, in _get_info
    title = ti.decode()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 5: ordinal not in range(128)

I’ve changed the metadata of 314.pdf accordingly, in case you would like to look into it (https://transfer.sh/RfnsE/314.pdf). Adding # -*- coding: utf-8 -*- at the beginning of the script does not help…
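
The coding comment only declares the encoding of the source file itself; it does not change how ti.decode() treats bytes read at runtime. One possible workaround is to decode with explicit encodings and a fallback. This is a sketch for Python 3, not the script's actual fix: the helper name decode_pdf_string is hypothetical, and Latin-1 is used here only as an approximation of PDFDocEncoding.

```python
import codecs


def decode_pdf_string(raw):
    """Decode a metadata byte string without raising on non-ASCII bytes."""
    if isinstance(raw, str):
        return raw
    # PDF text strings may be UTF-16BE, marked by a leading byte order mark.
    if raw.startswith(codecs.BOM_UTF16_BE):
        body = raw[len(codecs.BOM_UTF16_BE):]
        return body.decode("utf-16-be", errors="replace")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so this cannot fail;
        # the byte 0xf6 from the traceback becomes "ö".
        return raw.decode("latin-1")
```

Replacing the bare ti.decode() call with something like decode_pdf_string(ti) would keep the umlaut instead of raising UnicodeDecodeError.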

TypeError in get_xmp_metadata() function

For some reason when running the script on some of my pdf files, I get the following error:

Traceback (most recent call last):
  File "./pdf-title-rename.py", line 247, in <module>
    sys.exit(RenamePDFsByTitle(args).main())
  File "./pdf-title-rename.py", line 63, in main
    title, author = self._get_info(f)
  File "./pdf-title-rename.py", line 144, in _get_info
    xmpt, xmpa = self._get_xmp_metadata()
  File "./pdf-title-rename.py", line 207, in _get_xmp_metadata
    t = md['dc']['title']['x-default']
TypeError: string indices must be integers

To fix the issue, I added this bit of code to the _get_xmp_metadata() function:

except TypeError:
    t = md['dc']['title']

The end result would be (the code begins on line 198):

    def _get_xmp_metadata(self):
        t = a = None
        metadata = resolve1(self.doc.catalog['Metadata']).get_data()
        try:
            md = xmp_to_dict(metadata)
        except:
            return t, a
        try:
            t = md['dc']['title']['x-default']
        except TypeError:
            t = md['dc']['title']
        except KeyError:
            pass

It probably has something to do with the 'x-default' element not existing for whatever reason, but I was happy enough with the quick and easy fix.
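
The TypeError indicates that md['dc']['title'] was a plain string in these files rather than a dict keyed by language. A small sketch that handles both shapes, assuming md is the dict returned by xmp_to_dict; the helper name xmp_title is hypothetical:

```python
def xmp_title(md):
    """Return the XMP title from a parsed metadata dict, or None."""
    try:
        title = md["dc"]["title"]
    except (KeyError, TypeError):
        # No 'dc' section, no 'title' entry, or md is not a dict at all.
        return None
    if isinstance(title, dict):
        # Language-alternative form: prefer the default language entry.
        return title.get("x-default")
    if isinstance(title, str):
        return title
    return None
```

Checking the type explicitly avoids relying on which exception happens to be raised for each shape of the metadata.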

Suggestion for alternative output style

Thank you for developing and sharing this tool.

I have a suggestion to make it more useful to academics.
I do not need the last author's first name.
Instead, I need the year of publication for citations in academic papers.

I have now manually renamed my PDFs in the style of

firstAuthorsLastNameYearTitleInCamelCase.pdf

with the whitespace, dashes, and hyphens removed.
I only really need the first 5 to 7 words of the title.

I find that the file stem in this format makes a useful
BibTeX cite key. This greatly eases finding the right cite key in the
pulldown of cite keys that pops up in Overleaf, vim, or Sublime Text 3 when
you start to type \cite{.

Matching the BibTeX cite key to the PDF file stem also makes it easier to find the PDF on my hard drive.
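
The requested naming scheme could be sketched roughly as follows. This is only an illustration of the suggestion: the function name citekey_stem and its parameters are hypothetical, and it assumes the author's last name, the year, and the title have already been extracted.

```python
import re


def citekey_stem(last_name, year, title, max_words=6):
    """Build a file stem like lastnameYearTitleInCamelCase."""
    # Keep runs of letters/digits only, which drops whitespace,
    # dashes, hyphens, and other punctuation.
    words = re.findall(r"[^\W_]+", title)[:max_words]
    # Capitalize the first letter of each word; leave the rest untouched
    # so acronyms such as "PDF" survive.
    camel = "".join(w[:1].upper() + w[1:] for w in words)
    name = "".join(re.findall(r"[^\W_]+", last_name)).lower()
    return f"{name}{year}{camel}"


print(citekey_stem("Doe", 2020, "An example title: with punctuation"))
# prints doe2020AnExampleTitleWithPunctuation
```

The max_words default of 6 reflects the "first 5 to 7 words" mentioned above and is easy to adjust.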

AttributeError: 'PDFObjRef' object has no attribute 'strip'

Hi! Thanks for making this script; it works nicely most of the time! Sometimes, though, I get an AttributeError: 'PDFObjRef' object has no attribute 'strip' error. It mostly happens when a PDF has no title metadata, but I also see it for PDFs that do have metadata (and no special characters such as ":", which also cause this error). Can anything be done to prevent these errors? Thanks!
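
The error message suggests that a metadata value was still an indirect object reference when .strip() was called on it. A hedged sketch of a defensive cleanup step: resolve1 is the pdfminer helper the script already uses elsewhere, while the helper name clean_field and the no-op fallback are assumptions made so the example runs without pdfminer installed.

```python
try:
    from pdfminer.pdftypes import resolve1
except ImportError:
    # Fallback for illustration only: without pdfminer nothing is resolved.
    def resolve1(obj):
        return obj


def clean_field(value):
    """Resolve, decode, and strip a metadata value; return None if unusable."""
    # Follow indirect references (PDFObjRef) to the actual object.
    value = resolve1(value)
    if isinstance(value, bytes):
        value = value.decode("utf-8", errors="replace")
    if not isinstance(value, str):
        return None
    # Treat whitespace-only values as missing.
    return value.strip() or None
```

Returning None for anything that is not text lets the caller fall back to its existing "missing metadata" path.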

Could not find metadata in the file

Processing "s/@arXiv1904.09146 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1904.09709 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1904.11272 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1905.03333 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1905.11736 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1906.00335 .pdf":
-- Could not find metadata in the file
Processing "s/@arXiv1906.06919 .pdf":
-- Could not find metadata in the file
Processed 29 files:

  • Renamed: 0
  • Missing metadata: 29
  • Errors: 0

Allow script to take folderpath as argument

It would be nice if one could just point the script at a folder that contains multiple .pdf files to be renamed. This is especially useful when you download a .zip file with lots of .pdf files (e.g. many journals let you download the specific paper you were looking for plus a bunch of related papers). With the current script I have to type in all the filenames, which can be quite annoying when the folder contains files like foo123_456.pdf, bar_789.pdf, etc.
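
One way such a feature might work is to expand any directory argument into the PDF files it contains before processing. A minimal sketch, assuming the script receives its inputs as a list of path strings; the function name expand_pdf_paths is hypothetical:

```python
from pathlib import Path


def expand_pdf_paths(paths):
    """Replace each directory in paths with the PDF files directly inside it."""
    found = []
    for raw in paths:
        p = Path(raw)
        if p.is_dir():
            # Non-recursive: only files directly in the folder are collected.
            pdfs = [q for q in p.iterdir()
                    if q.is_file() and q.suffix.lower() == ".pdf"]
            found.extend(sorted(pdfs))
        else:
            # Plain file arguments are passed through unchanged.
            found.append(p)
    return found
```

Until something like this exists, a shell glob such as some_folder/*.pdf achieves a similar effect on most shells.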

Update README.md to make 'simply pass in a list or glob of PDF filenames' statement more clear

When I first wanted to use the script, I struggled to understand the sentence 'simply pass in a list or glob of PDF filenames' in the README.md file. I thought I had to pass in something like [foo.pdf, bar.pdf] or ['foo.pdf', 'bar.pdf']. I am not sure whether I could or should have known better (maybe I am the only one who got that wrong; I had never used a parser script before). If not, it would be nice to make this clearer in the README.md file.
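
For readers with the same question: in command-line usage, "a list" normally means separate arguments divided by spaces (foo.pdf bar.pdf), and a glob such as *.pdf is expanded into such a list by the shell before the script starts. The sketch below illustrates this with argparse; it is not the script's actual parser, and the argument name files is an assumption.

```python
import argparse

# Illustrative parser only; the real script's options may differ.
parser = argparse.ArgumentParser(description="Rename PDFs by title (sketch)")
parser.add_argument("files", nargs="+", help="one or more PDF filenames")

# Equivalent to running: pdf-title-rename.py foo.pdf bar.pdf
args = parser.parse_args(["foo.pdf", "bar.pdf"])
print(args.files)
# prints ['foo.pdf', 'bar.pdf']
```

No brackets, quotes, or commas are needed on the command line unless a filename itself contains spaces.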
