pdfrename - say goodbye to ed1d47.pdf!

A simple Python script to rename research PDF files based on their content.

Leverages pdfminer to extract text and GPT to generate the filename.

Before

[Screenshot: Before]

After running pdfrename

[Screenshot: After]

FAQ

Why not use the Title from the PDF metadata? Because it's often missing: in my personal collection of research papers, only 44% of the PDFs have a Title metadata field.

Isn't this expensive? In my personal collection of research papers, renaming each PDF uses ~2.1K tokens on average. At the current gpt-3.5-turbo-0125 price of $0.0005 / 1K tokens, renaming each PDF costs ~$0.001 (one tenth of a cent). I think it's worth it!
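To estimate the bill for a whole collection, the arithmetic above can be wrapped in a small helper. This is a minimal sketch and not part of pdfrename.py: the function name `estimate_cost` and the constants are hypothetical, and both figures (average tokens per PDF and price per 1K tokens) are just the values quoted in this FAQ, which will drift over time.

```python
# Figures quoted in the FAQ; adjust them for your own collection and current pricing.
TOKENS_PER_PDF = 2_100          # average tokens used per renamed PDF (assumption from the FAQ)
PRICE_PER_1K_TOKENS = 0.0005    # USD per 1K tokens, the gpt-3.5-turbo-0125 rate quoted in the FAQ


def estimate_cost(num_pdfs, tokens_per_pdf=TOKENS_PER_PDF, price_per_1k=PRICE_PER_1K_TOKENS):
    """Return the estimated cost in USD of renaming num_pdfs files."""
    return num_pdfs * tokens_per_pdf / 1000 * price_per_1k


if __name__ == "__main__":
    # Print a rough estimate for a few collection sizes.
    for n in (1, 100, 1000):
        print(f"{n:>5} PDFs: ~${estimate_cost(n):.3f}")
```

With these numbers, a collection of 1,000 PDFs comes to roughly one dollar.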

Why does it add -PR.pdf to the end of filenames? Since renaming a file has a non-zero cost, pdfrename needs to keep track of files that have already been renamed to avoid renaming them again. I wanted something simpler than maintaining a database, using filesystem attributes, or storing additional metadata files. I settled on this suffix (-PR for (P)DF (R)ename) as a marker for renamed files.
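The suffix check is simple enough to sketch in a few lines. The code below only illustrates the idea and is not taken from pdfrename.py: the helper names `is_already_renamed` and `marked_name` are hypothetical, and treating the `.pdf` extension as case-insensitive is an assumption.

```python
from pathlib import Path

MARKER = "-PR"  # suffix described in the FAQ: (P)DF (R)ename


def is_already_renamed(path):
    """Return True if the file name ends in -PR.pdf (extension compared case-insensitively)."""
    p = Path(path)
    return p.suffix.lower() == ".pdf" and p.stem.endswith(MARKER)


def marked_name(new_stem):
    """Build the final file name for a freshly generated title."""
    return f"{new_stem}{MARKER}.pdf"
```

A file such as `ed1d47.pdf` is not marked and would be processed, while anything produced by `marked_name` would be skipped on later runs.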

Usage

Set your OpenAI API key in pdfrename.py:

openai.api_key = "YOUROPENAIKEY"

Then run:

pip install -r requirements.txt
python pdfrename.py filetorename.pdf

To run recursively on a directory:

find whichdirectory -name "*.pdf" | parallel -j 10 python pdfrename.py 
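If `find` or GNU `parallel` is not available, the file discovery step can be approximated in Python. This is a rough sketch under assumptions: `pending_pdfs` is a hypothetical helper, not part of pdfrename.py, and it pre-filters files that already carry the -PR marker so they are never handed to the script.

```python
from pathlib import Path


def pending_pdfs(root):
    """Yield PDF files under root (recursively) that do not yet end in -PR.pdf."""
    for path in sorted(Path(root).rglob("*.pdf")):
        # Skip directories that happen to match the pattern and files already marked.
        if path.is_file() and not path.stem.endswith("-PR"):
            yield path
```

Each yielded path can then be passed to `python pdfrename.py` one at a time, for example from a loop that calls `subprocess.run`.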

Monitoring a folder on macOS

The folderaction.workflow directory contains a macOS Automator workflow that monitors a folder for new PDFs and renames them automatically.

Change PATHTOCODE in folderaction.workflow/document.wflow to point to the location of the pdfrename.py script:

<key>ActionParameters</key>
				<dict>
					<key>COMMAND_STRING</key>
					<string>for f in "$@"
do
	if [[ -d "$f" ]]; then
		continue
	fi
	python PATHTOCODE/pdfrename.py "$f"
done</string>

To install, copy folderaction.workflow to ~/Library/Workflows/Applications/Folder Actions. Then right-click a folder in Finder, select Services -> Folder Actions Setup..., and choose folderaction.workflow from the list of available workflows.

License

Copyright (c) 2024 Salle, Alexandre. All work in this package is distributed under the MIT License.
