Giter Club home page Giter Club logo

citeswitcher's Introduction

Citeswitcher

For simple collaboration in academic writing. Cite papers in a markdown/plain text file as PMID, DOI, bib reference, or whole citation, and citeswitcher will find them, convert them all to the format of your choice, and make a .bib file for you.

TL;DR

python fixcitations.py test/test.md

You can cite papers in a plain text document like this [PMID: 24465370] or like this [https://doi.org/10.1371/journal.pone.0081229] or [DOI: 10.1371/journal.pone.0081229] or (if you share a .bib file) like this [@HallNetworkAnalysisReveals2014] or (if you really need to) like this [Hall, David P., Ian J. C. MacCormick, Alex T. Phythian-Adams, Nina M. Rzechorzek, David Hope-Jones, Sorrel Cosens, Stewart Jackson, et al. “Network Analysis Reveals Distinct Clinical Syndromes Underlying Acute Mountain Sickness.” PLoS ONE 9, no. 1 (January 22, 2014): e81229].

Citeswitcher will find these and format them properly as citations in the chosen output format.

Requirements

OS

  • works on all linux and mac systems tested so far
  • not tested on windows

Essential

For basic function of fixcitations.py:

  • python 3
  • pyparsing 2.4.7
  • requests-2.24.0
  • biopython
  • oyaml
  • bibtexparser is also required but it is included in this git repository (because I was concerned that it may not be well maintained)

To install these: pip install pyparsing pip install requests pip install biopython pip install oyaml

Optional additional requirements

Once you have the above installed, you're good to go. Additional dependencies for extra functions are listed at the end.

Installation

Download or clone the git repository into a folder on your hard disk. Then run the scripts using python from the command line. To make it easier to use, I use the unix alias command in my ~/.bash_profile to automatically map common commands, like this:

alias fix="/miniconda3/envs/thisenv/bin/python <full-path-to-git-repository>/citeswitcher/fixcitations.py"

Config

Edit the config.json file to include your own paths before running. Key elements:

  • email: required for using the Pubmed API - enter your own email address here.
  • filepaths: these need to be on you computer somewhere.

Use

fixcitations.py

Switches citation formats, finds missing citations online. Optionally, calls pandoc to create a fully referenced output file.

The default or specified .bib file will then be searched for the relevant citations. Citations will be replaced with a citation in either .md or .tex format.

Any citations not present in the master .bib file will be downloaded from pubmed and added to a .bib file

Two new files will be written:

  1. a new .md or .tex file with correct citations
  2. a local .bib file containing all cited reference details.

Optionally, additional files can be written:

  1. pandoc outputs in .docx, .pdf or .html formats

Input files

Required:

  • a .md or .tex document with references in square [] or curly {} or round brackets which will be searched for PMIDs, DOIs, bibtex refs, and whole citations

Optional:

  • bib file. A bibtex file to use. If not specified, a new cs.bib file will be created in the same directory as your input file. You can also specify a global bib file for keeping a shared bank of references.
  • yaml file. A yaml file called .yaml. If it already exists, it will be augmented to add csl and bib files if needed. YAML is taken according to the following hierarchy: in-file YAML, .yaml, other yaml files. All will be copied into the .yaml file.
  • csl file. You can specify a csl file using yaml.

Extra functions

  • replace.json file. A file called replace.json will be read as a dictionary of text strings to find:replace. If you just create this file and leave it in the same directory as the input file, it will be read. An alternative is to specify text to be replaced in a YAML dict format.

  • files to include.

The default (in config.json) or specified .bib file will be searched for the relevant citations. Citations will be replaced with a citaion in either .md or .tex or PMID format.

Any citations not found in the .bib files will be downloaded from pubmed and added to the supplementary.bib file

Output files

Two outputs are specified.

-o determines the CITATION FORMAT and is used for switching between different citation formats e.g. PMID==>DOI or PMID==>BIB

-p determines the PANDOC OUTPUT FORMAT and provides a quick workflow for generating fully formatted documents

If -p is specified, -o must be md. If both -p and -o are specified and -o is not md, this indicates that the user's expectation is different from what's going to happen so the script exits. Fix this by only specifying one of these options.

For pdf, docx and html outputs using pandoc

(Quarto does this for you, and is better, so I'd suggest using that instead.) You'll want to have pandoc properly installed with crossref so that you can automatically generate printable documents.

pandoc pandoc-crossref

This can be done with conda, or pip. pip install pandoc-crossref

Options

arg long arg action
-b --bibfile bibfile
-y --yaml specify a yaml file to use as a source for yaml; use "normal" or "fancy" to copy from sup-files/yaml
-o --outputstyle choices=md,markdown,tex,latex,pubmed,pmid,inline; output references format
-p --pandoc_outputs append as many pandoc formats as you want: pdf docx html txt md tex
-l --localbibonly use only local bib file
-d --outputsubdir outputdir - always a subdir of the working directory
-i --include include files
-r --replace follow find/replace instructions from yaml or json file
-m --messy disable clean up of intermediate files
-mf --move_figures move all figures to the end and create captions section for submission to journal
-pm --pandoc_mermaid use pandoc-mermaid-filter
-lt --latex_template a latex template to be passed to pandoc
-wt --word_template a word template to be passed to pandoc
-ptp --pathtopandoc specify a particular path to pandoc if desired') # e.g. a letter forma
-redact --redact redact between tags
-s --stripcomments stripcomments in html format
-v --verbose verbose
-x --xelatex use xelatex in pandoc build
-w --wholereference try to match whole references.
-ch --chaptermode use pandoc --top-level-division=chapter
-svg --convert_svg convert svg images to pdf - replaces any pdf files with the same name
-ui --uncomment_images include images commented out using html syntax
-flc --force_lowercase_citations force all citation references into lowercase

hackmd.py

Uses wget to download your published files from hackmd.io and runs fixcitations.py. This makes working collaboratively with hackmd very easy, but takes a bit of manual setup:

  1. Specify the filepath to a directory (hackmddir) in config.json. This should be an empty directory.
  2. Within your new hackmddir, create a folder with the exact name of your hackmd.io username or group name.
  3. Create a hackmd file and publish it. It's easier if you choose the url name for it, such as "mytestfile"
  4. Create yet another folder with the same name as your hackmd file. So your directory structure would be like this:
- my_hackmd_dir
|	+-- my_username
|		+-- mytestfile

  1. Run python hackmd.py

If this works, your file should download into the mytestfile directory, and you should see the following files:

- my_hackmd_dir
|	+-- my_username
|		+-- mytestfile
|			+-- mytestfile.md
|			+-- mytestfie.citemd.md
|			+-- mytestfie.citemd.pdf
|			+-- mytestfie.citemd.docx
|			+-- mytestfie.citemd.html

hackmd.py - download hackmd files either by combining elements from a directory tree or a single file, then run fixcitations.py

Additional utilities

The following additional scripts are included to provide some useful functions with varying levels of documentation:

  • include.py - include files recursively according to a variety of syntaxes
  • put_pmid_in_bibtex.py - search for PMIDs for every element in a .bib file. Designed to run as a cron job to keep your bib file up to date.
  • abbreviation_finder.py - print abbreviations in a document (so that you can explain them)
  • wordcount.py - function unknown at this time
  • svg2pdf.py - batch convert a directory (-d) or file (-f) using cairosvg, which seems to perform better than imagemagick.
  • format_author_list.py - this is a work in progress but it is intended to create author lists in a variety of formats from a spreadsheet in xlsx or csv format

Additional dependencies for optional extra functions

For including excel files in markdown

  • pandas 1.1
  • tabulate-0.8.7

pip install pandas pip install tabulate

For converting from svg to pdf

  • inkscape

To manage your own bibtex database

There are lots of bibtex database software tools available. The following work very well:

  • zotero
  • better bibtex (with automatic export feature)

Intended use:

  • Feed in a .md or .tex document
  • references in square [] or curly {} brackets will be searched for: -- PMID -- doi -- bibtext -- whole ciation

For making Gantt, flowcharts etc

  • mermaids.cli (https://github.com/mermaid-js/mermaid-cli) Mermaids is a bit of a hassle to install and needs to be in you PATH variable for this to work. But if it is in there, you can use the -pm option to call mermaids and automatically generate nice figures.

This installation procedure worked for me on a mac:

Install node:

https://www.npmjs.com/get-npm

Install pandiff:

https://github.com/davidar/pandiff

Install mermaid:

npm install mermaid.cli Test it: ./node_modules/.bin/mmdc -h Add mermaid to ~/.bash_profile: export MERMAID_BIN=/Users//node_modules/.bin/mmdc export PATH=$PATH:/Users//node_modules/.bin ~

citeswitcher's People

Contributors

kennethbaillie avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.