texworld / betterbib

Command-line tools for bibliographies.

betterbib's Introduction

Bibliography files are notoriously hard to work with. Betterbib contains a number of easy-to-use command-line tools to help.

betterbib convert converts between different bibliography formats, e.g.
- BibTeX
- BibLaTeX
- RIS
- CSL-JSON
betterbib sync syncs bibliography data with a number of online sources, e.g.,
- Crossref
- DBLP
- PubMed
- arXiv
- Zenodo
betterbib format formats your bibliography files to your liking. Can also (un)abbreviate author and journal names.
betterbib doi-to converts a DOI or DOI URL to a bibliography entry.

Installation

Install betterbib from PyPI with

pip install betterbib

See here for licensing information.

Convert

Sync

Simply run

betterbib sync in.bib

to sync your bibliography file with online sources. For example, the input

@article{wiles,
title={Fermat Last Theorem},
doi={10.2307/2118559},
}

is converted to

@article{wiles,
  number = {3},
  doi = {10.2307/2118559},
  pages = {443},
  source = {Crossref},
  volume = {141},
  author = {Wiles, Andrew},
  year = {1995},
  month = may,
  url = {https://doi.org/10.2307/2118559},
  issn = {0003-486X},
  journal = {The Annals of Mathematics},
  publisher = {JSTOR},
  title = {Modular Elliptic Curves and Fermat's Last Theorem},
}

See -h/--help for all options.

betterbib sync -h

Usage: betterbib sync [-h] [-i] [-c N] [-m MINIMUM_SCORE] [-q] [--debug] infiles [infiles ...]

Positional Arguments:
  infiles               input bibliography files

Options:
  -h, --help            show this help message and exit
  -i, --in-place        modify infile in place
  -c, --num-concurrent-requests N
                        number of concurrent HTTPS requests (default: 5)
  -m, --minimum-score MINIMUM_SCORE
                        minimum score to count as a match (default: 0.0)
  -q, --quiet           don't show progress info (default: show)
  --debug               some debug output (default: false)

Format

After that, you can for example run

betterbib format in.bib --sort-fields --align-values --journal-names short --abbrev-first-names

to get

@article{wiles,
  author    = {Wiles, A.},
  doi       = {10.2307/2118559},
  issn      = {0003-486X},
  journal   = {Ann. Math.},
  month     = may,
  number    = {3},
  pages     = {443},
  publisher = {JSTOR},
  source    = {Crossref},
  title     = {Modular Elliptic Curves and Fermat's Last Theorem},
  url       = {https://doi.org/10.2307/2118559},
  volume    = {141},
  year      = {1995},
}

betterbib format -h

Usage: betterbib format [-h] [-i] [--drop DROP] [--journal-names {long,short,unchanged}] [--abbrev-first-names]
                        [--sort-entries] [--sort-fields] [--doi-url-type {unchanged,old,new,short}]
                        [--page-range-separator PAGE_RANGE_SEPARATOR] [--protect-title-capitalization]
                        [--indent [INDENT]] [--align-values]
                        infiles [infiles ...]

Positional Arguments:
  infiles               input BibTeX files

Options:
  -h, --help            show this help message and exit
  -i, --in-place        modify infile in place
  --drop DROP           drop fields from entries (can be passed multiple times)
  --journal-names {long,short,unchanged}
                        force full or abbreviated journal names (default: unchanged)
  --abbrev-first-names  abbreviate first names in author lists etc. (default: false)
  --sort-entries        sort entries alphabetically by BibTeX key (default: false)
  --sort-fields         sort fields alphabetically (default: false)
  --doi-url-type {unchanged,old,new,short}
                        DOI URL (new: https://doi.org/<DOI>, short: https://doi.org/abcde) (default: new)
  --page-range-separator PAGE_RANGE_SEPARATOR
                        page range separator (int or string, default: unchanged)
  --protect-title-capitalization
                        brace-protect names in titles (e.g., {Newton}; default: false)
  --indent [INDENT]     indentation (int or string; default: 1)
  --align-values        align field values (default: false)

Dereference DOIs

Given a DOI or a DOI URL, it's often useful to generate a bibliography entry for it. betterbib doi-to does just that.

betterbib doi-to ris 10.1002/andp.19053221004

TY  - JOUR
IS  - 10
DO  - 10.1002/andp.19053221004
SP  - 891
EP  - 921
DS  - Crossref
VL  - 322
AU  - Einstein, A.
DA  - 1905/01
UR  - https://doi.org/10.1002/andp.19053221004
SN  - 0003-3804
SN  - 1521-3889
JF  - Annalen der Physik
JO  - Ann. Phys.
PB  - Wiley
TI  - Zur Elektrodynamik bewegter Körper
ER  -

betterbib's Issues

escape for special characters gets removed

The escape for & in the publisher field gets removed by betterbib.

Example:

betterbib bib.in bib.bib

bib.in

@book{scholl2008handbook,
 Author = {Sch{\"o}ll, Eckehard and Schuster, Heinz Georg},
 Doi = {10.1002/9783527622313},
 Month = {10},
 Publisher = {Wiley-VCH Verlag GmbH \& Co. KGaA},
 Source = {Crossref},
 Title = {Handbook of Chaos Control},
 Url = {https://doi.org/10.1002/9783527622313},
 Year = {2007},
}

bib.bib

%comment{This file was created with betterbib v3.1.3.}

@book{scholl2008handbook,
 Author = {Sch{\"o}ll, Eckehard and Schuster, Heinz Georg},
 Doi = {10.1002/9783527622313},
 Month = {10},
 Publisher = {Wiley-VCH Verlag GmbH & Co. KGaA},
 Source = {Crossref},
 Title = {Handbook of Chaos Control},
 Url = {https://doi.org/10.1002/9783527622313},
 Year = {2007},
 isbn = {9783527622313, 9783527406050},
}

assert r.ok AssertionError

The following entry

@InBook{BIMEnergie,
  chapter = {18},
  pages = {293--303},
  title = {BIM f\"{u}r die Energiebedarfsermittlung und Geb\"{a}udesimulation},
  publisher = {VDI Buch, Springer Verlag},
  doi = {10.1007/978-3-658-05606-3_18},
  year = {2015},
  author = {van Treeck, Christoph and Wimmer, Reinhard and Maile, Tobias}
}

gives me the following error:

Reading from: refin.bib
Saving to: refout.bib
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\Scripts\betterbib", line 100, in <module>
    _main()
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\Scripts\betterbib", line 52, in _main
    result = source.find_unique(entry)
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\lib\site-packages\betterbib\crossref.py", line 160, in find_unique
    assert r.ok
AssertionError

Note: I was already getting the error before the most recent fixes, so it seems to be unrelated to those.

Betterbib has a lot of problems with non-ASCII characters. In science it is really common to find such characters, for example ° or Å, in titles or abstracts. When betterbib encounters them, it simply gives a generic error, with no information about the position or the nature of the problem.

Traceback (most recent call last):
  File ".local/bin/betterbib-sync", line 5, in <module>
    pkg_resources.run_script('betterbib==3.2.0', 'betterbib-sync')
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 540, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1455, in run_script
    execfile(script_filename, namespace, namespace)
  File ".local/lib/python2.7/site-packages/betterbib-3.2.0-py2.7.egg/EGG-INFO/scripts/betterbib-sync", line 130, in <module>
    _main()
  File ".local/lib/python2.7/site-packages/betterbib-3.2.0-py2.7.egg/EGG-INFO/scripts/betterbib-sync", line 44, in _main
    betterbib.write(od, args.outfile, 'braces', tab_indent=False)
  File ".local/lib/python2.7/site-packages/betterbib-3.2.0-py2.7.egg/betterbib/tools.py", line 324, in write
    dictionary=dictionary,
  File ".local/lib/python2.7/site-packages/betterbib-3.2.0-py2.7.egg/betterbib/tools.py", line 193, in pybtex_to_bibtex_string
    value = codecs.encode(value, 'ulatex')
  File ".local/lib/python2.7/site-packages/latexcodec/codec.py", line 800, in encode
    encoder.encode(unicode_, final=True),
  File ".local/lib/python2.7/site-packages/latexcodec/lexer.py", line 483, in encode
    raise ValueError(e)
ValueError: 'latex' codec can't encode character u'\u2009' in position 0: don't know how to translate u'\u2009' into latex

It is also common to find names or surnames with special characters, like ñ. I solved it by replacing the import sys line with:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

in betterbib-sync

Manual Installation?

I noticed that you've removed the manual installation instructions from README.md. Trying them anyway, I got the following error (previous output suppressed):

Searching for pipdate
Reading https://pypi.python.org/simple/pipdate/
No local packages or working download links found for pipdate
error: Could not find suitable distribution for Requirement.parse('pipdate')

To me, this error says that the installation process is attempting to do something with pip, which I don't use (and thus don't have). That is, after all, why I'm trying to do a manual installation in the first place. Are you dropping manual installation support altogether, or is there something else here that I should be paying attention to?

add more sources

ADS http://adswww.harvard.edu/ads_abstracts.html
DBLP http://dblp.uni-trier.de/
IEEExplore http://ieeexplore.ieee.org/
CrossRef https://search.crossref.org/

add DBLP search

API described here:
http://dblp.uni-trier.de/faq/How+to+use+the+dblp+search+API.html

Add tool for journal name abbreviation

BibTeX databases are often inconsistent in that some entries contain full journal names and others abbreviated ones. This could be made consistent with a command-line tool.

Once Crossref's journal data reports the short name,

curl https://api.crossref.org/journals/0895-4798

this could be implemented.

Proposal: expose some argument option to "betterbib" command

Currently, the "betterbib -h" command returns little useful information, only the following:

Usage: betterbib in.bib out.bib
-h     show this help list

So, to customize the behavior, we need to combine commands, as you described:

betterbib-sync in.bib | betterbib-journal-abbrev | betterbib-format -b - out.bib

IMHO, it would be better to expose some options directly in "betterbib", such as

-v, --version         display version information
-s {crossref,dblp}, --source {crossref,dblp}
                        data source (default: crossref)
-l, --long-journal-name
                        prefer long journal names (default: false)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 121: ordinal not in range(128)

Hi I am using your application for journal abbreviation. I've come across this error that I can't seem to fix for this particular citation.

100%|####################################################################################| 1/1 [00:05<00:00,  5.09s/it]

Total number of entries: 1
Found: 1
Traceback (most recent call last):
  File "c:\users\user\anaconda3\envs\py27_32bit\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\users\user\anaconda3\envs\py27_32bit\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Users\user\Anaconda3\envs\py27_32bit\Scripts\betterbib.exe\__main__.py", line 9, in <module>
  File "c:\users\user\anaconda3\envs\py27_32bit\lib\site-packages\betterbib\cli\full.py", line 35, in main
    tools.write(d, args.outfile, args.delimeter_type, tab_indent=args.tab_indent)
  File "c:\users\user\anaconda3\envs\py27_32bit\lib\site-packages\betterbib\tools.py", line 374, in write
    file_handle.write("\n\n".join(segments) + "\n")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 121: ordinal not in range(128)

The BibTeX citation in question:

@article{meyfroidt2009machine,
	title={Machine learning techniques to examine large patient databases},
	author={Meyfroidt, Geert and Guiza, Fabian and Ramon, Jan and Bruynooghe, Maurice},
	journal={Best Practice \& Research Clinical Anaesthesiology},
	volume={23},
	number={1},
	pages={127--143},
	year={2009},
	publisher={Elsevier}
}

Would you be so kind as to help me with this? I find this error odd because, for example, this citation works perfectly:

@article{jiang2017artificial,
  title={Artificial intelligence in healthcare: past, present and future},
  author={Jiang, Fei and Jiang, Yong and Zhi, Hui and Dong, Yi and Li, Hao and Ma, Sufeng and Wang, Yilong and Dong, Qiang and Shen, Haipeng and Wang, Yongjun},
  journal={Stroke and vascular neurology},
  volume={2},
  number={4},
  pages={230--243},
  year={2017},
  publisher={BMJ Specialist Journals}
}

I am using python 2.7.15 virtual env with anaconda3.
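The traceback above fails in the final file write, which suggests the output handle falls back to the ASCII codec. A minimal sketch of writing with an explicit encoding follows; `write_bibtex` is a hypothetical helper for illustration, not betterbib's actual `tools.write`, and it assumes the segments are already Unicode strings.

```python
import io


def write_bibtex(path, segments):
    """Write entries separated by blank lines, always encoded as UTF-8."""
    # io.open accepts an encoding argument on both Python 2 and Python 3,
    # so the result no longer depends on the interpreter's default codec.
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(u"\n\n".join(segments) + u"\n")
```

With this approach a character such as ü (u'\xfc') is written as UTF-8 bytes instead of raising UnicodeEncodeError.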

KeyError: u'04' with month="04" in bibtex file

Hi!

Thanks for writing this tool, it is very appealing. I just got the following error, using the command

betterbib test.bib out-test.bib --sources {mref,crossref

with a fresh install with pip install betterbib.

The bibtex file has a single entry

@article{lockhart2014significance,
author = "Lockhart, Richard and Taylor, Jonathan and Tibshirani, Ryan J. and Tibshirani, Robert",
doi = "10.1214/13-AOS1175",
fjournal = "The Annals of Statistics",
journal = "Ann. Statist.",
number = "2",
month = "04",
pages = "413--468",
publisher = "The Institute of Mathematical Statistics",
title = "A significance test for the lasso",
url = "http://dx.doi.org/10.1214/13-AOS1175",
volume = "42",
year = "2014"
}

The traceback is the following:

Traceback (most recent call last):
  File "/usr/local/bin/betterbib", line 128, in <module>
    _main()
  File "/usr/local/bin/betterbib", line 79, in _main
    a = pybtex_to_bibtex_string(entry, bib_id)
  File "/usr/local/lib/python2.7/dist-packages/betterbib/bibtex.py", line 65, in pybtex_to_bibtex_string
    content.append('%s = %s' % (field, _index_to_month[value]))
KeyError: u'04'

If I remove the month line

month = "04",

from the bibtex file, then it works. I hope this is enough to reproduce the bug, let me know if you need more information.
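One way to avoid this kind of lookup failure would be to normalize the month value before indexing. The sketch below is not betterbib's code; `normalize_month` and its tables are hypothetical and only illustrate accepting zero-padded numbers, plain numbers, and English month names.

```python
# Hypothetical helper, not part of betterbib: map assorted month spellings
# to the standard three-letter BibTeX month macros.
MONTH_MACROS = ["jan", "feb", "mar", "apr", "may", "jun",
                "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_NAMES = ["january", "february", "march", "april", "may", "june",
               "july", "august", "september", "october", "november",
               "december"]


def normalize_month(value):
    """Return a BibTeX month macro for value, or None if unrecognized."""
    text = str(value).strip().strip('{}"').strip().lower().rstrip(".")
    if text.isdecimal():
        index = int(text)  # int("04") == 4, so zero padding is harmless
        return MONTH_MACROS[index - 1] if 1 <= index <= 12 else None
    if text in MONTH_MACROS:
        return text
    if text in MONTH_NAMES:
        return MONTH_MACROS[MONTH_NAMES.index(text)]
    return None
```

Returning None for unknown values leaves the decision (keep the raw value, warn, or drop the field) to the caller instead of raising a KeyError.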

ValueError: 'latex' codec can't encode character '\u2009' in position 0: don't know how to translate '\u2009' into latex

Use BibTeX data from Crossref for DOI:10.1364/OPTICA.2.000832

$ echo "@article{Takesue_2015,
	doi = {10.1364/optica.2.000832},
	url = {https://doi.org/10.1364%2Foptica.2.000832},
	year = 2015,
	month = {sep},
	publisher = {The Optical Society},
	volume = {2},
	number = {10},
	pages = {832},
	author = {Hiroki Takesue and Shellee D. Dyer and Martin J. Stevens and Varun Verma and Richard P. Mirin and Sae Woo Nam},
	title = {Quantum teleportation over 100{\hspace{0.167em}}{\hspace{0.167em}}km of fiber using highly efficient superconducting nanowire single-photon detectors},
	journal = {Optica}
}" | betterbib - out.bib

100%|████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 74.35it/s]

Total number of entries: 1
Found: 1
Traceback (most recent call last):
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/codec.py", line 715, in _get_latex_bytes_tokens_from_char
    return self.table.latex_map[c]
KeyError: '\u2009'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/lexer.py", line 477, in encode
    self.get_latex_bytes(unicode_, final=final))
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/codec.py", line 751, in get_latex_bytes
    bytes_, tokens = self._get_latex_bytes_tokens_from_char(c)
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/codec.py", line 724, in _get_latex_bytes_tokens_from_char
    .format(repr(c)))
UnicodeEncodeError: 'latex' codec can't encode character '\u2009' in position 0: don't know how to translate '\u2009' into latex

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/leofang/miniconda3/envs/qutip-env/bin/betterbib", line 10, in <module>
    sys.exit(main())
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/betterbib/cli/full.py", line 35, in main
    tools.write(d, args.outfile, args.delimeter_type, tab_indent=args.tab_indent)
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/betterbib/tools.py", line 372, in write
    for bib_id, d in od.items()
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/betterbib/tools.py", line 372, in <listcomp>
    for bib_id, d in od.items()
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/betterbib/tools.py", line 222, in pybtex_to_bibtex_string
    value = codecs.encode(value, "ulatex")
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/codec.py", line 827, in encode
    encoder.encode(unicode_, final=True),
  File "/Users/leofang/miniconda3/envs/qutip-env/lib/python3.6/site-packages/latexcodec/lexer.py", line 481, in encode
    raise ValueError(e)
ValueError: 'latex' codec can't encode character '\u2009' in position 0: don't know how to translate '\u2009' into latex

It seems the author field caused the problem (one can try removing other fields), but I'm not sure what's wrong. It doesn't seem to contain any unicode character.
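A possible workaround, sketched under assumptions: replace characters the encoder does not know with LaTeX equivalents before the encoding step. The `FALLBACKS` table and `replace_unsupported` are hypothetical names, and the chosen replacements are illustrative, not something latexcodec or betterbib defines.

```python
# Hypothetical fallback table for characters reported in the errors above.
FALLBACKS = {
    "\u2009": r"\,",  # thin space -> LaTeX thin space
    "\u2010": "-",    # Unicode hyphen -> ASCII hyphen
}


def replace_unsupported(text, table=FALLBACKS):
    """Replace characters listed in table before a LaTeX encoding step."""
    # str.maketrans accepts a dict from single characters to strings.
    return text.translate(str.maketrans(table))
```

The cleaned string could then be passed on to the existing encoding call; characters not in the table are left untouched.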

betterbib fails when trying to match paper (without DOI?)

Hi,

I am trying to use betterbib to polish the bibliography of my PhD thesis, in particular to have nicely abbreviated journal names.
However, currently I'm running into issues that it fails on the following bibtex entry:

@article{vossoughi_compressibility_1980,
	title = {Compressibility of the myocardial tissue},
	volume = {1980},
	journal = {Adv Bioeng},
	author = {Vossoughi, Jafar and Vaishnav, Ramesh N. and Patel, Dali J.},
	year = {1980},
	keywords = {\_tablet},
	pages = {45--48}
}

It's a relatively old paper lacking a DOI.
Is it possible that is the issue?

Thanks a lot in advance for your help,
Fabian

PS: Here we go with the error messages:

Reading from: xyz.bib
Saving to: xyz_fixed.bib

5%|███▊ | 21/446 [00:48<16:12, 2.29s/it]
Traceback (most recent call last):
  File "C:\bin\Anaconda3\lib\site-packages\pybtex\database\__init__.py", line 356, in __getitem__
    return super(FieldDict, self).__getitem__(key)
  File "C:\bin\Anaconda3\lib\site-packages\pybtex\utils.py", line 154, in __getitem__
    return self._dict[key.lower()]
KeyError: 'url'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\bin\Anaconda3\Scripts\betterbib", line 125, in <module>
    _main()
  File "C:\bin\Anaconda3\Scripts\betterbib", line 63, in _main
    d = betterbib.bibtex.sanitize_doi_url(result if result else entry)
  File "C:\bin\Anaconda3\lib\site-packages\betterbib\bibtex.py", line 190, in sanitize_doi_url
    m = re.match('https?://(?:dx\.)?doi\.org/(.*)', entry.fields['url'])
  File "C:\bin\Anaconda3\lib\site-packages\pybtex\database\__init__.py", line 364, in __getitem__
    raise KeyError(key)
KeyError: 'url'
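The second traceback shows the `url` field being read unconditionally, so an entry without a URL raises KeyError. A minimal sketch of a guarded version follows. It reuses the regular expression from the traceback but works on a plain dict rather than a pybtex entry; the function body and the choice to undo percent-encoding are assumptions for illustration, not the library's implementation.

```python
import re
from urllib.parse import unquote

DOI_URL = re.compile(r"https?://(?:dx\.)?doi\.org/(.*)")


def sanitize_doi_url(fields):
    """Return a copy of fields whose DOI URL uses the https://doi.org/ form."""
    fields = dict(fields)
    url = fields.get("url")  # None instead of KeyError when "url" is absent
    if url is None:
        return fields
    match = DOI_URL.match(url)
    if match:
        doi = unquote(match.group(1))  # e.g. "%2F" -> "/"
        fields["url"] = "https://doi.org/" + doi
    return fields
```

Entries without a DOI, like the 1980 paper above, pass through unchanged, and non-DOI URLs are left alone.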

remove entries with betterbib-format

Some programs add useless metadata to automatically exported BibTeX files, e.g., the local file path of the PDF.

betterbib-format --drop file in.bib out.bib

should drop the "file" fields from all entries.
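The requested behavior can be sketched on plain dictionaries. `drop_fields` is a hypothetical helper that assumes entries are stored as a mapping from BibTeX key to a field dict; it is not how betterbib represents entries internally.

```python
def drop_fields(entries, names):
    """Return a copy of entries without the given fields (case-insensitive)."""
    unwanted = {name.lower() for name in names}
    return {
        key: {f: v for f, v in fields.items() if f.lower() not in unwanted}
        for key, fields in entries.items()
    }
```

Matching case-insensitively means that `file`, `File`, and `FILE` are all removed by a single `--drop file`.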

betterbib crashes on some items

Hi, I'm quite impressed by how betterbib works, but I have trouble with a dozen entries, e.g.

@article{Arendt:2003,
    author  = {Detlev Arendt},
    journal = {Int. J. Dev. Biol},
    year    = {2003},
    volume  = {47},
    pages   = {563--571},
    langid  = {english},
    title   = {Evolution of eyes and photoreceptor cell types},
}

betterbib produces the following output:

▶ betterbib arendt.bib arendt-b.bib                                                                                                        
Reading from: arendt.bib
Saving to: arendt-b.bib
  0%|                                                                                                                 | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/elk/.local/bin/betterbib", line 97, in <module>
    _main()
  File "/home/elk/.local/bin/betterbib", line 49, in _main
    result = source.find_unique(entry)
  File "/home/elk/.local/lib/python3.5/site-packages/betterbib/crossref.py", line 188, in find_unique
    if result['title'][0].lower() in d['title'].lower():
IndexError: list index out of range

I noticed that it passes with the journal string removed.
What is wrong with this bibitem?
Thanks in advance

Skipping entries

Dear all,

is it possible to skip entries that already contain a doi field?

Thanks in advance!

An option to change characters into UTF8

It seems that betterbib automatically changes something like {\'e} into é. It's safer for me to use {\'e}. Is there an option to disable converting sequences like {\'e} into the corresponding UTF-8 character é?

No-change entries are changed

Take the following entry (which is typical of an entry created and maintained by BibDesk) as a sample:

@book{Johnson:1969,
	Address = {Cambridge},
	Author = {Johnson, Marshall D.},
	Date-Added = {2017-10-25 01:01:41 +0000},
	Date-Modified = {2017-10-31 00:56:37 +0000},
	Publisher = {Cambridge University Press},
	Read = {1},
	Title = {The Purpose of Biblical Genealogies: With Special Reference to the Setting of the Genealogies of Jesus},
	Year = {1969}}

This resource has no entry on CrossRef (so far as I can find) and thus there is nothing for betterbib to correct when it encounters this entry. However, when run through betterbib, it comes out as follows:

@book{Johnson:1969,
 Author = {Johnson, Marshall D.},
 Address = {Cambridge},
 Date-Added = {2017-10-25 01:01:41 +0000},
 Date-Modified = {2017-10-31 00:56:37 +0000},
 Publisher = {Cambridge University Press},
 Read = {1},
 Title = {The Purpose of Biblical Genealogies: With Special Reference to the Setting of the Genealogies of Jesus},
 Year = {1969},
}

Several things have been changed in this entry, though none are related to the content of the fields: Author field is now first, leading tabs are replaced by spaces, a comma and new line have been added before the closing brace. As a result of these changes, this entry reads as changed when the input and output file are compared using diff (or similar program). Since every entry in my database is consistently formatted in the first style, this makes it extremely difficult to check the work that betterbib has done and decide whether it's made good corrections.

I'd like, therefore, for one of two options to be implemented:

1. Entries which are not found in the source (CrossRef in the above case) are reproduced in the outfile exactly as they were in the infile.
2. Add the option of a null source (or a "format only" run). I.e. nothing is actually looked up, but betterbib reformats each entry so that the resulting outfile can be used as the infile for a second run where an entry which is not found in the chosen source won't have any formatting changes that would cause diff to mark it as changed.

Option 1 would be nicer for my workflow, but I suspect that option 2 may be easier to implement under the hood (as it won't require a new method for writing entries to the outfile).

short DOIs

Right now, betterbib optionally sanitizes DOI URLs to use the new form https://doi.org/<DOI>. Another option would be to generate a short DOI and use that, see http://shortdoi.org/. One distinct advantage of it is that it doesn't contain symbols that are problematic for LaTeX to process.

Catch errors with broken author field

I have a large BibTeX file in which several author fields are not valid. Some examples are:

author = {{Test, Author}}

author = {unknown}

author = {testfile.pdf}

betterbib does have some issues with these author fields (which is probably wanted!). However, for other broken entries (e.g., a missing BibTeX key) I get a very precise error message that also indicates which line of my BibTeX file is broken.

I think it would be of great value if such a precise error message were raised for these kinds of errors.

Traceback of the error is:

Traceback (most recent call last):
  File "/home/.../bin/betterbib", line 10, in <module>
    sys.exit(main())
  File "/home/.../lib/python3.7/site-packages/betterbib/cli/full.py", line 32, in main
    tools.write(d, args.outfile, args.delimeter_type, tab_indent=args.tab_indent)
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 375, in write
    for bib_id, d in od.items()
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 375, in <listcomp>
    for bib_id, d in od.items()
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 194, in pybtex_to_bibtex_string
    persons_str = " and ".join([_get_person_str(p) for p in persons])
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 194, in <listcomp>
    persons_str = " and ".join([_get_person_str(p) for p in persons])
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 289, in _get_person_str
    _join_abbreviated_names(p.first_names + p.middle_names),
  File "/home/.../lib/python3.7/site-packages/betterbib/tools.py", line 269, in _join_abbreviated_names
    out = lst[0]
IndexError: list index out of range

pip show betterbib
Name: betterbib
Version: 3.5.1
Summary: Better BibTeX data
Home-page: https://github.com/nschloe/betterbib
Author: Nico Schlömer
Author-email: [email protected]
License: License :: OSI Approved :: MIT License
Location: /home/.../lib/python3.7/site-packages
Requires: requests, tqdm, pybtex, appdirs, requests-cache, pyenchant, latexcodec
Required-by:
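The traceback above ends in `lst[0]` on an empty list, which is what happens when an author has no first or middle names. A minimal sketch of an abbreviation step that tolerates this case follows; both function names are hypothetical and the logic is simplified (hyphenated given names, for instance, are not handled).

```python
def abbreviate_first_names(names):
    """Abbreviate given names, e.g. ["Heinz", "Georg"] -> "H. G."."""
    initials = [name[0] + "." for name in names if name]
    return " ".join(initials)  # empty input yields "" instead of IndexError


def person_str(last, first_names):
    """Format "Last, F. M."; fall back to the last name alone."""
    abbreviated = abbreviate_first_names(first_names)
    if not abbreviated:
        return last
    return last + ", " + abbreviated
```

An author such as `{unknown}` would then be written back as-is; a more helpful variant could instead raise an error that names the offending BibTeX key.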

skipping malformed references?

Is it possible to skip malformed references inside a bibtex file and continue with the next one instead of aborting the crossref query?

Thanks in advance!

Stephan

error formatting scipy citation

Betterbib will format the first entry below as the second one:

@online{scipy,
 author = {Jones, Eric and Oliphant, Travis and Peterson, Pearu and others},
 title = {{{SciPy}:} {Open} source scientific tools for {Python}},
 url = {http://www.scipy.org/},
 year = {2001},
 urldate = {2018-04-29},
}

@inbook{scipy,
 author = {Hill, Christian},
 title = {{{SciPy}:} {Open} source scientific tools for {Python}},
 url = {https://doi.org/10.1017/cbo9781139871754.008},
 year = {2001},
 urldate = {2018-04-29},
 doi = {10.1017/cbo9781139871754.008},
 pages = {333-401},
 source = {Crossref},
 booktitle = {Learning Scientific Programming with Python},
 publisher = {Cambridge University Press},
 chapter = {SciPy},
 isbn = {9781139871754},
}

Seems to be because of the title. But the first one is the correct citation (https://www.scipy.org/citing.html).

Remove BibTeX string inconsistencies

When using quote delimiters:

title = "Binary prefix codes ending in a "1"",

should be

title = "Binary prefix codes ending in a {"1"}",
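The transformation can be sketched as a small scanner that wraps each pair of top-level double quotes in braces. `protect_inner_quotes` is a hypothetical helper operating on the field content without its outer delimiters; it is an illustration, not betterbib's implementation.

```python
def protect_inner_quotes(content):
    """Brace-protect double quotes that are not already inside braces."""
    out = []
    depth = 0
    open_index = None  # position in out of an unmatched top-level quote
    for char in content:
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        elif char == '"' and depth == 0:
            if open_index is None:
                open_index = len(out)
            else:
                # Close the pair: put "{" before the opening quote.
                out.insert(open_index, "{")
                out.extend(['"', "}"])
                open_index = None
                continue
        out.append(char)
    if open_index is not None:
        out[open_index] = '{"}'  # lone quote: protect it on its own
    return "".join(out)
```

The protected content can then be wrapped in the outer quote delimiters, giving the form shown above.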

Dropped Editor field

Take the following entry as an example:

@incollection{Edwin_2010,
 Editor = {Bergmann, Michael and Murray, Michael J and Rea, Michael C},
 author = {Edwin, Curley},
 doi = {10.1093/acprof:oso/9780199576739.003.0006},
 url = {https://doi.org/10.1093%2Facprof%3Aoso%2F9780199576739.003.0006},
 year = {2010},
 month = nov,
 publisher = {Oxford University Press},
 pages = {58--78},
 title = {The God of Abraham, Isaac, and {Jacob}},
 booktitle = {Divine Evil?},
}

The entry is taken from CrossRef, except that I've manually added the Editor field because that information is missing from the CrossRef entry. When run through betterbib I get back the following:

@incollection{Edwin_2010,
 author = {Edwin, Curley},
 doi = {10.1093/acprof:oso/9780199576739.003.0006},
 url = {http://dx.doi.org/10.1093/acprof:oso/9780199576739.003.0006},
 year = {2010},
 month = nov,
 publisher = {Oxford University Press},
 pages = {58-78},
 title = {The God of Abraham, Isaac, and {Jacob}},
 booktitle = {Divine Evil?},
 source = {Crossref},
 isbn = {9780199576739},
}

The Editor field has been dropped. That's consistent with the CrossRef entry, but also inappropriate for this kind of resource (an edited book with chapters by multiple authors) in the citation styles that I use. That editor field needs to be retained even if CrossRef is missing it for some reason.
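The desired behavior amounts to a merge in which remote data may overwrite fields but never removes local ones. A minimal sketch on plain dicts follows; `merge_fields` is a hypothetical name, and letting the remote value win on conflicts is an assumption about the intended policy.

```python
def merge_fields(local, remote):
    """Merge remote fields into local ones without dropping local-only keys."""
    # Field names are compared case-insensitively, so "Editor" == "editor".
    merged = {key.lower(): value for key, value in local.items()}
    for key, value in remote.items():
        merged[key.lower()] = value  # remote data wins on conflicts
    return merged
```

With this policy a manually added `Editor` survives the sync, while fields such as `pages` are still updated from the online source.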

Truncation of URLs with ASCII encoding

Hi, and thanks for a great project!

When including URLs with URL ASCII encoding (percent-encoding), betterbib truncates everything following the percent sign. I assume this is a consequence of removing LaTeX comments in a bib file. A small example:

@misc{wolframalphai1,
 author = {Wolfram|Alpha},
 year = {2019},
 url = {https://www.wolframalpha.com/input/?i=integrate+from+0+to+2pi+(cos(x)+e%5E(i+*+(m+-+n)+*+x))},
 note = {Online; accessed 19-February-2019}
}

Running betterbib on this bib-file yields the truncated output:

@misc{wolframalphai1,
 author = {Wolfram|Alpha},
 year = {2019},
 url = {https://www.wolframalpha.com/input/?i=integrate+from+0+to+2pi+(cos(x)+e},
 note = {Online; accessed 19-February-2019},
}

Escaping the percent sign, i.e., changing % to \%, gives the same output from betterbib, but this yields the wrong URL in the LaTeX document, as the backslash gets translated to the URL code %5C when building the PDF.
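If the truncation is indeed caused by comment removal, one way around it would be to treat `%` as a comment marker only outside of braces. The sketch below assumes exactly that; `strip_comments` is a hypothetical function, and it does not handle entries delimited by parentheses or escaped braces.

```python
def strip_comments(text):
    """Drop "%" comments that occur outside of any braces."""
    out = []
    depth = 0
    skipping = False
    for char in text:
        if skipping:
            if char != "\n":
                continue
            skipping = False  # the newline ends the comment and is kept
        if char == "{":
            depth += 1
        elif char == "}":
            depth = max(depth - 1, 0)
        elif char == "%" and depth == 0:
            skipping = True
            continue
        out.append(char)
    return "".join(out)
```

Because field values always sit inside the braces of their entry, a percent-encoded URL is left intact while top-level comment lines are still removed.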

check out https://tex.stackexchange.com/questions/6848/automatically-dereference-doi-to-bib

ValueError: 'latex' codec can't encode character '\u2010' in position 0: don't know how to translate '‐' into latex

missing dependency: pybtex

Running pip install betterbib in WinPython 3.4 64bit failed with error pybtex not found.
After running pip install pybtex I was able to install betterbib using pip install betterbib, so for me pybtex was the only missing dependency.

Failure on a reference

It seems that this downloads some non-ASCII characters and crashes:

@article{benzi-olshanskii-lagrangian,
author = {Michele  Benzi and Maxim A.  Olshanskii},
title = {An Augmented {L}agrangian-Based Approach to the {O}seen Problem},
journal = {SIAM J. Sci. Comput.},
volume = {28},
number = {6},
pages = {2095-2113},
year = {2006},
doi = {10.1137/050646421},
URL = {http://dx.doi.org/10.1137/050646421},
eprint = {http://dx.doi.org/10.1137/050646421}
}

feature request: quiet mode for version controlled bib files

My .bib file is under version control. It would be nice to have a quiet mode that does not print the error messages (or at least not to the file, maybe to stdout or some logfile instead). Then I could use betterbib to enhance just those entries that are indexed by Crossref. Also, in quiet mode do not add the source = {CrossRef}, field.
Just a suggestion, does not have high priority. And thanks again for your script!

Index error on a particular bibentry

With

@Article{     acharya.a:on,
  year      = {1999},
  issn      = {0374-3535},
  journal   = {Journal of Elasticity},
  volume    = {56},
  number    = {2},
  doi       = {10.1023/A:1007653400249},
  title     = {On compatibility conditions for the left {C}auchy--{G}reen
          deformation field in three dimensions},
  publisher = {Kluwer Academic Publishers},
  keywords  = {compatibility; left Cauchy-Green deformation; three
          dimensions or (3-D)},
  author    = {Acharya, A.},
  pages     = {95--105},
  language  = {English}
}

and master I get

Traceback (most recent call last):
  File "/home/jan/.local/bin/betterbib", line 99, in <module>
    _main()
  File "/home/jan/.local/bin/betterbib", line 48, in _main
    result = source.find_unique(entry)
  File "/home/jan/.local/lib/python2.7/site-packages/betterbib/crossref.py", line 182, in find_unique
    return self._crossref_to_pybtex(result)
  File "/home/jan/.local/lib/python2.7/site-packages/betterbib/crossref.py", line 277, in _crossref_to_pybtex
    title = data['title'][0]
IndexError: list index out of range
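
The failing line indexes into data['title'], which suggests that Crossref returned an empty title list for this DOI. A minimal sketch of a defensive lookup follows; first_or_none is a hypothetical helper, not part of betterbib, and the sample dictionaries only mimic the shape visible in the traceback.

```python
def first_or_none(data, key):
    """Return the first element of data[key], or None if missing or empty."""
    values = data.get(key) or []
    return values[0] if values else None


# Hypothetical records shaped like the `data` dict in the traceback.
with_title = {"title": ["On compatibility conditions"]}
without_title = {"title": []}

print(first_or_none(with_title, "title"))
print(first_or_none(without_title, "title"))
```

The caller would then have to decide what to do with None, for example keep the title from the input entry.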

@Strings are not preserved

An input file like

@String{j-MONTHLY-WEATHER-REVIEW = "Monthly Weather Review"}
@Article{foo,
  journal = j-MONTHLY-WEATHER-REVIEW,
}

produces the output

@Article{foo,
  journal = "Monthly Weather Review"
}

i.e., the @String variable is replaced by a literal. The replacement happens when parsing the input file via PybTeX.

It's unclear how to consistently treat this in betterbib. One can imagine the situation where betterbib suggests different solutions for two different entries which had the same @String reference at input.

One way could perhaps be to offer that all journal titles which appear more than once are replaced by a string variable after the Crossref pass.
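
A rough sketch of that idea, assuming entries are available as plain dictionaries after the Crossref pass. The function name, the data layout and the "j-" naming scheme are all made up for illustration; none of this is existing betterbib code.

```python
import re
from collections import Counter


def make_string_macros(entries):
    """Replace journal titles used more than once by @String-style names.

    `entries` maps citation keys to field dicts. Returns (macros, updated),
    where `macros` maps macro name -> journal title.
    """
    counts = Counter(
        fields["journal"] for fields in entries.values() if "journal" in fields
    )
    macros = {}
    for title, n in counts.items():
        if n > 1:
            # Hypothetical naming scheme: "j-" plus the upper-cased title.
            # Two different titles could collide; a real version must check.
            name = "j-" + re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-").upper()
            macros[name] = title
    by_title = {title: name for name, title in macros.items()}

    updated = {}
    for key, fields in entries.items():
        new_fields = dict(fields)
        journal = fields.get("journal")
        if journal in by_title:
            # The writer would have to emit this value unquoted.
            new_fields["journal"] = by_title[journal]
        updated[key] = new_fields
    return macros, updated


entries = {
    "a": {"journal": "Monthly Weather Review"},
    "b": {"journal": "Monthly Weather Review"},
    "c": {"journal": "Journal of Elasticity"},
}
macros, updated = make_string_macros(entries)
print(macros)
```

Titles that occur only once stay literal, so two entries that shared a @String at input but received different suggestions would simply end up with two literals.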

ValueError: Unknown type 'misc'

The following entry:

@misc{laine2010energy,
  title = {Energy and Thermal Performance Management Through Utilisation of Building Information Models},
  author = {Laine, Tuomas and Oy, Olof Granlund and Karola, Antti},
  year = {2010}
}

gives me the error:

Reading from: refin.bib
Saving to: refout.bib
  0%|                                                                                            | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\Scripts\betterbib", line 100, in <module>
    _main()
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\Scripts\betterbib", line 52, in _main
    result = source.find_unique(entry)
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\lib\site-packages\betterbib\crossref.py", line 176, in find_unique
    return self._crossref_to_pybtex(results[0])
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-3.5.1.2\python-3.5.1\lib\site-packages\betterbib\crossref.py", line 347, in _crossref_to_pybtex
    raise ValueError('Unknown type \'%s\'' % bibtex_type)
ValueError: Unknown type 'misc'

The message "Unknown type 'misc'" is misleading, as other misc entries work just fine (apart from not being found on Crossref).

Do-not-change option for list of fields

Take the following entry as a sample:

@article{Byskov:1972,
	Author = {Byskov, Martha},
	Date-Added = {2017-10-24 00:21:40 +0000},
	Date-Modified = {2017-11-02 02:27:24 +0000},
	Journal = {Studio Theologica},
	Pages = {25--32},
	Read = {1},
	Title = {Verus Deus --- verus homo: Luc 3.23--38},
	Volume = {26},
	Year = {1972},
	Bdsk-File-1 = {YnBsaXN0MDDUAQIDBAUGJCVYJHZlcnNpb25YJG9iamVjdHNZJGFyY2hpdmVyVCR0b3ASAAGGoKgHCBMUFRYaIVUkbnVsbNMJCgsMDxJXTlMua2V5c1pOUy5vYmplY3RzViRjbGFzc6INDoACgAOiEBGABIAFgAdccmVsYXRpdmVQYXRoWWFsaWFzRGF0YV8QTS4uLy4uLy4uLy4uL0RvY3VtZW50cy9SZWZlcmVuY2VzL0J5c2tvdi9CeXNrb3YxOTcyX1ZlcnVzIERldXMgLS0tIHZlcnVzLWEucGRm0hcLGBlXTlMuZGF0YU8RAfoAAAAAAfoAAgAADE1hY2ludG9zaCBIRAAAAAAAAAAAAAAAAAAAAMzdHpxIKwAAAm6BJh9CeXNrb3YxOTcyX1ZlcnVzIERlIzI2RTgxMTEucGRmAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACboER1hP/yQAAAAAAAAAAAAQABAAACSAAAAAAAAAAAAAAAAAAAAAGQnlza292ABAACAAAzN1k7AAAABEACAAA1hQ4CQAAAAEAFAJugSYABn0oAAZsuQAGPcgAAhAqAAIAV01hY2ludG9zaCBIRDpVc2VyczoAUlBTOgBEb2N1bWVudHM6AFJlZmVyZW5jZXM6AEJ5c2tvdjoAQnlza292MTk3Ml9WZXJ1cyBEZSMyNkU4MTExLnBkZgAADgBMACUAQgB5AHMAawBvAHYAMQA5ADcAMgBfAFYAZQByAHUAcwAgAEQAZQB1AHMAIAAtAC0ALQAgAHYAZQByAHUAcwAtAGEALgBwAGQAZgAPABoADABNAGEAYwBpAG4AdABvAHMAaAAgAEgARAASAEtVc2Vycy9SUFMvRG9jdW1lbnRzL1JlZmVyZW5jZXMvQnlza292L0J5c2tvdjE5NzJfVmVydXMgRGV1cyAtLS0gdmVydXMtYS5wZGYAABMAAS8AABUAAgAK//8AAIAG0hscHR5aJGNsYXNzbmFtZVgkY2xhc3Nlc11OU011dGFibGVEYXRhox0fIFZOU0RhdGFYTlNPYmplY3TSGxwiI1xOU0RpY3Rpb25hcnmiIiBfEA9OU0tleWVkQXJjaGl2ZXLRJidUcm9vdIABAAgAEQAaACMALQAyADcAQABGAE0AVQBgAGcAagBsAG4AcQBzAHUAdwCEAI4A3gDjAOsC6QLrAvAC+wMEAxIDFgMdAyYDKwM4AzsDTQNQA1UAAAAAAAACAQAAAAAAAAAoAAAAAAAAAAAAAAAAAAADVw==}}

It contains 4 special fields which are related to BibDesk (the program I use to manage my bib file):

Date-Added: the date the entry was added to my database
Date-Modified: the date I last edited the entry
Read: this shows up as a check mark in BibDesk indicating that I have read the article
Bdsk-File: a file link that BibDesk has created to a pdf of the article on my computer

When I run this entry through betterbib I get the following out:

@article{Byskov:1972,
 author = {Byskov, Martha},
 doi = {10.1080/00393387208599926},
 issn = {0039-338X, 1502-7791},
 journal = {Studia Theologica - Nordic Journal of Theology},
 month = jan,
 number = {1},
 pages = {25-32},
 publisher = {Informa UK Limited},
 source = {Crossref},
 title = {Verus Deus ‐ veras homo Luc 3.23–381},
 url = {https://doi.org/10.1080/00393387208599926},
 volume = {26},
 year = {1972},
}

I'm quite happy with the added and corrected information, but the BibDesk specific fields have been lost (making the revised entry less useful, particularly because of the loss of the file link). I would like an option which allows me to list certain fields which should be transferred from the original entry (if they occur) to the revised one without change. In my case these would be fields which will not occur in the look-up source, but the list of fields should probably generalize to a "do-not-change" list in the case of fields which are found in the information retrieved from the source.

Further, #33 could also be handled by a sort of inverse of this mechanism (i.e. an option in which betterbib is told which fields it is allowed to change, rather than which it is not allowed to change).
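
To make the request concrete, here is a minimal sketch of such a merge on plain dictionaries. merge_entry and its keep parameter are hypothetical names for illustration only; betterbib's internal entry representation may look quite different.

```python
def merge_entry(original, fetched, keep=()):
    """Merge `fetched` over `original`, leaving fields named in `keep` untouched.

    Field names are compared case-insensitively, as in BibTeX.
    """
    keep_lower = {name.lower() for name in keep}
    merged = dict(fetched)
    for name, value in original.items():
        if name.lower() in keep_lower:
            # Drop any fetched field of the same name, then restore the original.
            for other in [k for k in merged if k.lower() == name.lower()]:
                del merged[other]
            merged[name] = value
    return merged


original = {
    "Author": "Byskov, Martha",
    "Journal": "Studio Theologica",
    "Read": "1",
    "Date-Added": "2017-10-24 00:21:40 +0000",
}
fetched = {
    "author": "Byskov, Martha",
    "journal": "Studia Theologica - Nordic Journal of Theology",
    "doi": "10.1080/00393387208599926",
}
merged = merge_entry(original, fetched, keep=("read", "date-added", "bdsk-file-1"))
print(sorted(merged))
```

Fields in keep that do not occur in the original entry are simply ignored, and the same list would also protect a field such as title if the source's version is not wanted.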

more Python 3 Unicode

The following entry

@Article{Cigler2012,
  author  = {Cigler, Ji{\v{r}}{\'{i}} and Pr{\'{i}}vara, Samuel and V{\'{a}\v{n}}a, Zden{\v{e}}k and {\v{Z}}{\'{a}}{\v{c}}ekov{\'{a}}, Eva and Ferkl, Luk{\'{a}}{\v{s}}},
  title   = {Optimization of predicted mean vote index within model predictive control framework: Computationally tractable solution},
  journal = {Energy and Buildings},
  year    = {2012},
  volume  = {52},
  pages   = {39--49},
  doi     = {10.1016/j.enbuild.2012.05.022},
  issn    = {03787788}
}

gives the following error:

Reading from: refin2.bib
Saving to: refout.bib
|----------| 0/1   0% [elapsed: 00:00 left: ?, ? iters/sec]Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\threading.py", line 921, in _bootstrap_inner
    self.run()
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\threading.py", line 869, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\subprocess.py", line 1170, in _readerthread
    buffer.append(fh.read())
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 176: character maps to <undefined>

Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\Scripts\betterbib", line 98, in <module>
    _main()
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\Scripts\betterbib", line 48, in _main
    result = source.find_unique(entry)
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\site-packages\betterbib\crossref.py", line 148, in find_unique
    payload = latex_to_unicode(' '.join(l)).replace(' ', '+')
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\site-packages\betterbib\bibtex.py", line 15, in latex_to_unicode
    universal_newlines=True
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\subprocess.py", line 609, in check_output
    output, unused_err = process.communicate(inputdata, timeout=timeout)
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\subprocess.py", line 959, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\lib\subprocess.py", line 1234, in _communicate
    stdout = stdout[0]
IndexError: list index out of range

Works in WinPython 2.7.9 on Windows, fails in WinPython 3.4 on Windows.
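
The first traceback shows the child process output being decoded with cp1252, which is what universal_newlines=True picks up from the Windows locale on Python 3. Byte 0x8d is undefined in cp1252 but would be consistent with UTF-8 output (č encodes to 0xC4 0x8D). A possible workaround, sketched here with a stand-in child process rather than the real converter, is to read raw bytes and decode them explicitly:

```python
import subprocess
import sys

# Stand-in for the external converter: a child Python process that writes
# UTF-8 encoded bytes to stdout, whatever the console code page is.
child_code = (
    r"import sys; "
    r"sys.stdout.buffer.write('\u017d\u00e1\u010dekov\u00e1'.encode('utf-8'))"
)

# No universal_newlines here, so check_output returns bytes.
raw = subprocess.check_output([sys.executable, "-c", child_code])

# Decode explicitly instead of relying on the locale's default encoding.
text = raw.decode("utf-8")
print(repr(text))
```

This assumes the converter really emits UTF-8; if it follows the console code page instead, the encoding would have to be chosen accordingly.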

Abbreviate journal titles using general rules

Rather than maintain a lookup table of journal names and their standard abbreviations, which obviously must be ever changing, why not use the formal List of Title Word Abbreviations (LTWA) instead? We would then look up each word of the journal name and abbreviate it according to the standard for that word.

This seems to me like the smarter, more generalised approach to implementing journal name shortening.
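
A minimal sketch of the word-by-word approach. The table below is a tiny hand-picked excerpt for illustration; the real LTWA is far larger and also contains prefix patterns and language information, which this sketch ignores. The function and variable names are made up.

```python
# Tiny illustrative excerpt of word abbreviations, keyed by lower-case word.
LTWA_EXCERPT = {
    "annals": "Ann.",
    "journal": "J.",
    "mathematics": "Math.",
    "physical": "Phys.",
    "review": "Rev.",
}

# Articles, prepositions and conjunctions are mostly dropped in abbreviated titles.
STOPWORDS = {"of", "the", "and", "in", "for", "on"}


def abbreviate(title):
    """Abbreviate a journal title word by word using LTWA_EXCERPT."""
    words = title.split()
    if len(words) == 1:
        # Single-word titles are conventionally left unabbreviated.
        return title
    out = []
    for word in words:
        lower = word.lower()
        if lower in STOPWORDS:
            continue
        # Words without a table entry are kept as they are.
        out.append(LTWA_EXCERPT.get(lower, word))
    return " ".join(out)


print(abbreviate("The Annals of Mathematics"))
print(abbreviate("Physical Review"))
```

Words missing from the table are passed through unchanged, so the quality of the result depends entirely on how complete the word list is.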

Sorting order reversed

I noticed that in the current version, the sorting order is reversed (last entry becomes first entry after betterbib processing).

Stephan

Missing dependency: tqdm

Hi,
betterbib seems to depend on the package tqdm, but tqdm is not listed as a dependency in setup.py. (This dependency is not mentioned on the website either.)

Thus, on Ubuntu:
sudo pip install betterbib
installs betterbib only, and betterbib fails to run

ImportError: No module named tqdm

Of course, an easy fix is to first run:
sudo pip install tqdm

Cheers
NC

KeyError: 'page'

The following refin.bib file with just one single entry fails:

@InProceedings{KraeuchiEtAl2015:1,
  author = {Kr\"auchi, Philipp and Schluck, Thomas and Sulzer, Matthias},
  title = {Modelling of low temperature heating networks with IDA-ICE},
  booktitle = {Proceedings of International Conference CISBAT},
  doi = {10.5075/epfl-cisbat2015-827-832},
  pages = {827--832},
  address = {Lausanne, Switzerland},
  month = sep,
  year = {2015}
}

with the following error:

Reading from: refin.bib
Saving to: refout.bib

|----------| 0/1   0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-2.7.9.2\python-2.7.9\Scripts\betterbib", line 99, in <module>
    _main()
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-2.7.9.2\python-2.7.9\Scripts\betterbib", line 48, in _main
    result = source.find_unique(entry)
  File "C:\Users\mthorade\Documents\Python\WinPython-32bit-2.7.9.2\python-2.7.9\lib\site-packages\betterbib\crossref.py", line 189, in find_unique
    if result['page'] == d['pages']:
KeyError: 'page'

If I delete the line pages = {827--832}, the error disappears (but there is still no result, as this conference paper is not indexed by Crossref).
Tested with WinPython 2.7.9 32bit on Windows 10 64bit.
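
The comparison in the traceback assumes that every Crossref result has a 'page' key. A sketch of a more forgiving comparison is below; pages_match is a hypothetical helper, and the dash normalization is an extra assumption (that Crossref uses a single hyphen where BibTeX uses "--"), not something taken from the betterbib source.

```python
def pages_match(result, entry_fields):
    """Return True only if both records carry page info and it agrees."""
    page = result.get("page")
    pages = entry_fields.get("pages")
    if page is None or pages is None:
        # Missing page data on either side is treated as "no match".
        return False
    # Normalize BibTeX's "--" to a single hyphen before comparing.
    return page.replace("--", "-") == pages.replace("--", "-")


entry_fields = {"pages": "827--832"}
print(pages_match({}, entry_fields))
print(pages_match({"page": "827-832"}, entry_fields))
```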

Assertion error on particular bibentry

With

@incollection{Bonito2011305,
  title =        "Viscoelastic Flows with Complex Free Surfaces: Numerical Analysis and Simulation",
  editor =       "R. Glowinski and J. Xu",
  booktitle =    "Numerical Methods for Non-Newtonian Fluids",
  publisher =    "Elsevier",
  year =         2011,
  volume =       16,
  pages =        "305 - 369",
  series =       "Handbook of Numerical Analysis ",
  issn =         "1570-8659",
  doi =          "10.1016/B978-0-444-53047-9.00003-4",
  url =          "http://www.sciencedirect.com/science/article/pii/B9780444530479000034",
  author =       "Andrea Bonito and Philippe Clément and Marco Picasso"
}

and master I get

Traceback (most recent call last):
  File "/home/jan/.local/bin/betterbib", line 99, in <module>
    _main()
  File "/home/jan/.local/bin/betterbib", line 48, in _main
    result = source.find_unique(entry)
  File "/home/jan/.local/lib/python2.7/site-packages/betterbib/crossref.py", line 176, in find_unique
    return self._crossref_to_pybtex(results[0])
  File "/home/jan/.local/lib/python2.7/site-packages/betterbib/crossref.py", line 237, in _crossref_to_pybtex
    bibtex_type = self._crossref_to_bibtex_type(data)
  File "/home/jan/.local/lib/python2.7/site-packages/betterbib/crossref.py", line 42, in _crossref_to_bibtex_type
    assert r.ok
AssertionError

biblatex flavor of betterbib

It would be nice to have a biblatex flavor of betterbib.

Dropped Author field

Occasionally the author field is dropped when an entry is updated from the source. Take the following entry as a sample:

@article{Lincoln_2013,
 year = 2013,
 publisher = {Johns Hopkins University Press},
 volume = {132},
 number = {3},
 pages = {639--658},
 author = {Andrew T. Lincoln},
 title = {Luke and Jesus' Conception: A Case of Double Paternity?},
 journal = {Journal of Biblical Literature}
}

This entry is taken from CrossRef (first result when searching on the title), but with the doi and url fields removed. When run through betterbib I get:

@article{Lincoln_2013,
 year = {2013},
 publisher = {JSTOR},
 volume = {132},
 number = {3},
 pages = {639},
 title = {{Luke} and Jesus' Conception: {A} Case of Double Paternity?},
 journal = {Journal of Biblical Literature},
 doi = {10.2307/23487891},
 source = {Crossref},
 url = {http://dx.doi.org/10.2307/23487891},
 issn = {0021-9231},
}

As you can see, the author field has been lost. It also appears that betterbib is grabbing the JSTOR entry rather than the original publisher's version. Maybe that has something to do with it?

Check out http://www.ams.org/mref

Check out http://www.ams.org/mref as alternative to MathSciNet.

The search interface is quite poor, but if fed with a search result from Zentralblatt http://zbmath.org/, it does return the same result as MathSciNet. Both Zentralblatt and mref are free services.

line 59, in _main: can't concat bytes to str

When I try to run the script, I get the following error:

Reading from: references.bib
Saving to: refout.bib
|----------| 0/368   0% [elapsed: 00:00 left: ?, ? iters/sec]Traceback (most recent call last):
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\Scripts\betterbib", line 99, in <module>
    _main()
  File "C:\Users\mthorade\Documents\Python\WinPython-64bit-3.4.2.3\python-3.4.2.amd64\Scripts\betterbib", line 59, in _main
    '\n\n'
TypeError: can't concat bytes to str
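
On Python 3, bytes and str are separate types and cannot be concatenated directly, which is most likely what happens at line 59. A small illustration with made-up variable names (the actual names in the script are not visible in the traceback):

```python
# Bytes, as an encoder or a binary read would produce them.
entry_text = b"@article{foo,\n  year = {2015}\n}"
separator = "\n\n"  # str

# `entry_text + separator` raises TypeError on Python 3.
# Decode the bytes first, then concatenate two str objects.
combined = entry_text.decode("utf-8") + separator
print(repr(combined))
```

On Python 2 the same concatenation works silently, which would explain why the problem only shows up under Python 3.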

betterbib-journal-abbrev : problem with equation in title

When using betterbib-journal-abbrev, if I convert this:

bib_test.bib:

@article{B,
  title = {Aaa ${\text{Pt/Co/AlO}}_{x}$ aaa bbb},
  author = {Etal, etal},
  journal = {Phys. Rev. B},
  volume = {1},
  issue = {1},
  pages = {1},
  numpages = {1},
  year = {2018},
  month = {May},
  publisher = {American Physical Society},
  doi = {},
  url = {}
}

using betterbib-journal-abbrev bib_test.bib, I get

@article{B,
 author = {Etal, etal},
 title = {Aaa ${\text {{Pt/Co/AlO}}\_{x}$} aaa bbb},
 journal = {Phys. Rev. B},
 volume = {1},
 issue = {1},
 pages = {1},
 numpages = {1},
 year = {2018},
 month = may,
 publisher = {American Physical Society},
 doi = {},
 url = {},
}

with the wrong equation formatting: ${\text {{Pt/Co/AlO}}\_{x}$}, which has an extra } that LaTeX complains about.
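
One way to avoid touching the equation would be to leave everything between dollar signs alone and only process the text around it. The sketch below is not how betterbib works internally; transform_outside_math is a hypothetical helper, and the transform passed in merely stands in for whatever escaping or brace protection the formatter applies.

```python
import re

# Inline math delimited by unescaped single dollar signs.
MATH = re.compile(r"(?<!\\)\$.*?(?<!\\)\$")


def transform_outside_math(title, transform):
    """Apply `transform` to the text outside $...$ and keep math verbatim."""
    parts = []
    pos = 0
    for match in MATH.finditer(title):
        parts.append(transform(title[pos:match.start()]))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(transform(title[pos:]))
    return "".join(parts)


def escape_underscores(text):
    return text.replace("_", r"\_")


title = r"Aaa ${\text{Pt/Co/AlO}}_{x}$ aaa bbb"
print(transform_outside_math(title, escape_underscores))
```

With this approach the title from the report comes back unchanged, because its only underscore sits inside the math segment.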

KeyError - What does it mean?

Here's the traceback:

$ betterbib Padraic.bib test.bib

  2%|▉                                         | 21/933 [00:05<04:13,  3.60it/s]
Traceback (most recent call last):
  File "/opt/local/bin/betterbib", line 214, in <module>
    _main()
  File "/opt/local/bin/betterbib", line 41, in _main
    _update_from_source(od, source, args.num_concurrent_requests)
  File "/opt/local/bin/betterbib", line 96, in _update_from_source
    data = future.result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/betterbib/crossref.py", line 165, in find_unique
    for tp in _bibtex_to_crossref_type(entry.type)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/betterbib/crossref.py", line 36, in _bibtex_to_crossref_type
    return _bibtex_to_crossref_map[bibtex_type]
KeyError: 'webpage'
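
The KeyError means the entry type 'webpage' has no counterpart in the BibTeX-to-Crossref type table. A minimal sketch of a lookup that degrades gracefully is shown below. The table is a hypothetical excerpt and the real mapping in betterbib may differ; only the idea of returning an empty list for unknown types matters here.

```python
# Hypothetical excerpt of a BibTeX-to-Crossref type table.
BIBTEX_TO_CROSSREF = {
    "article": ["journal-article"],
    "book": ["book", "monograph"],
    "inproceedings": ["proceedings-article"],
}


def crossref_types(bibtex_type):
    """Return candidate Crossref types, or an empty list for unknown types."""
    return BIBTEX_TO_CROSSREF.get(bibtex_type.lower(), [])


print(crossref_types("article"))
print(crossref_types("webpage"))
```

An empty list lets the caller skip the type filter, or skip the entry, instead of aborting the whole run at entry 21 of 933.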

betterbib Python API

I was thinking that it may be worth considering a unified betterbib API, such that betterbib functionality could be added programmatically to other projects (rather than as a separate CLI tool). For example, something along the lines of:

import betterbib

# Programmatic equivalent of the betterbib tool.
bibtex = betterbib.sync(infile, **kwargs)
bibtex = betterbib.journal_abbrev(bibtex, **kwargs)
bibtex = betterbib.format(bibtex, **kwargs)

make the comments in bibtex format

'@comment' instead of '#'
http://www.bibtex.org/Format/de/

arXiv as data source

If a BibTeX entry has an arxiv.org URL, betterbib should fill in author, title, year, archivePrefix, eprint and primaryClass.
If it does have a valid journal field, pubstatus should be set to submitted; else the type should be changed from article to misc.
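
As a starting point, the eprint and archivePrefix fields can be derived from the URL alone. The sketch below covers only that part and only new-style identifiers; author, title, year and primaryClass would need a query to arXiv, which is left out. The function name is hypothetical and the identifier in the example is a placeholder.

```python
import re

# New-style arXiv identifiers (YYMM.NNNN or YYMM.NNNNN, optional version).
# Old-style identifiers such as math/0309136 are not handled here.
ARXIV_URL = re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})(v\d+)?", re.I)


def arxiv_fields(fields):
    """Derive eprint fields from an arxiv.org URL, or return {} if none."""
    match = ARXIV_URL.search(fields.get("url", ""))
    if match is None:
        return {}
    # The version suffix, if any, is dropped from the eprint id.
    return {"archiveprefix": "arXiv", "eprint": match.group(1)}


print(arxiv_fields({"url": "https://arxiv.org/abs/2101.00001v2"}))
print(arxiv_fields({"url": "http://dx.doi.org/10.1137/050646421"}))
```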

Version 3.0.3 crashes with TypeError

I get this traceback:

Traceback (most recent call last):
  File "./bin/betterbib", line 214, in <module>
    _main()
  File "./bin/betterbib", line 45, in _main
    _write(od, args.outfile, args.delimeter_type)
  File "./bin/betterbib", line 128, in _write
    dictionary=dictionary
TypeError: pybtex_to_bibtex_string() got an unexpected keyword argument 'bracket_delimeters'

I am using

$ betterbib -v
betterbib 3.0.3, Python 3.6.2 | packaged by conda-forge | (default, Jul 23
2017, 22:59:30) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]

Version 2.5 works for my bibfile.

betterbib doesn't work on Windows

I'm using python3 from https://www.python.org on Windows 7. I've installed betterbib with pip3 install -U betterbib. Then, when I do betterbib -v I get:

Traceback (most recent call last):
  File "C:\Python36-32\Scripts\betterbib.py", line 14, in <module>
    import betterbib
  File "C:\Python36-32\Scripts\betterbib.py", line 15, in <module>
    from betterbib import pybtex_to_bibtex_string
ImportError: cannot import name 'pybtex_to_bibtex_string'

I cannot use betterbib with python3 on Windows. It also doesn't work with python2.

--
Cesar