jrsmith3 / refmanage

Manage a BibTeX database

Home Page: https://github.com/jrsmith3/refmanage

License: MIT License

Python 94.42% TeX 5.58%

refmanage's Issues

Refactor "--test" flag to `ref test` argparse subcommand

This issue is a specific directive to refactor the "test" functionality, based on issue #29. Instead of calling

$ ref --test *.bib

$ ref -t *.bib

The command should look like

$ ref test *.bib

Indent verbose output for "test" functionality

The verbose output should be indented by two spaces to visually offset it from the terse output.

Example

$ ref -tv bookshelf.bib
bookshelf.bib
  syntax error: entry key expected
  228 @Book{,
       ^^^
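
A minimal sketch of the indentation step, assuming Python 3 (where `textwrap.indent` is available) and assuming the verbose message is already a multi-line string. The helper name `indent_verbose` is hypothetical and is not part of refmanage.

```python
import textwrap


def indent_verbose(message, prefix="  "):
    """Indent every non-blank line of a verbose message by `prefix`."""
    return textwrap.indent(message, prefix)


# The terse line stays flush left; the verbose detail is offset below it.
terse = "bookshelf.bib"
verbose = "syntax error: entry key expected\n228 @Book{,\n     ^^^"
print(terse)
print(indent_verbose(verbose))
```

Keeping the prefix as a parameter leaves room to change the offset later without touching the message-building code.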

Pretty-print list of .bib files that couldn't be merged.

Right now the list of .bib files that raised exceptions during the default combine operation is just dumped to the screen via Python's print command. Each file should be printed on its own line.

Add "--parseables" flag to "test" cli functionality to print only filenames of parseable files

The default, as defined in issue #19, is to print only the unparseable filenames. If a -v or --verbose flag is passed on the command-line for the "test" functionality, the program prints lists of both the parseable and unparseable files. There should be a third option, indicated by a -p or --parseables flag that prints only the list of files that were parseable.

Test `refmanage.fs_utils.import_bib_files`

Print relative paths from "test" functionality

Currently, paths printed from the "test" functionality are always fully-qualified. When refmanage -t is invoked, the printed paths should be relative to pwd unless the file lies outside pwd, in which case the fully-qualified path is printed.

Example

Assume the following directory structure in ~/Desktop:

Desktop
├── bib
│   ├── one.bib
│   ├── subbib
│   │   └── three.bib
│   └── two.bib
└── otherbib
    ├── five.bib
    └── four.bib

All (sub)directories contain .bib files with valid BibTeX. The user is currently in ~/Desktop/bib.

$ pwd
/home/username/Desktop/bib

$ ref -t *.bib
one.bib
two.bib

$ ref -t subbib/*.bib
subbib/three.bib

$ ref -t ../otherbib/*.bib
/home/username/Desktop/otherbib/five.bib
/home/username/Desktop/otherbib/four.bib

This issue is split from #35.

The above example assumes that issue #33 has not been closed.
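
The rule in this issue can be sketched with `pathlib`: try to express the path relative to the working directory and fall back to the full path when the file is not located below it. The function name `display_path` is an assumption for illustration, and `PurePosixPath` is used only so the sketch behaves the same on every platform.

```python
from pathlib import PurePosixPath


def display_path(path, cwd):
    """Return `path` relative to `cwd` if it lies below `cwd`, else unchanged."""
    try:
        return str(path.relative_to(cwd))
    except ValueError:
        # `path` is not located under `cwd`; keep the fully-qualified form.
        return str(path)


cwd = PurePosixPath("/home/username/Desktop/bib")
print(display_path(cwd / "subbib" / "three.bib", cwd))
print(display_path(PurePosixPath("/home/username/Desktop/otherbib/five.bib"), cwd))
```

This sketch expects paths that have already been resolved to absolute form; it does not track whether the user typed an absolute path on the command line.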

Publish minimal documentation

This project needs minimal documentation that is hosted someplace on the web. It can be hosted at jrsmith3.github.io/refmanage but a better choice is probably readthedocs.org or pythonhosted.org.

The documentation should include the following:

Scope of project
Installation (including prereqs)
Examples
License
Contributing
API reference

Rename `cat_db` to `merge`

After further consideration of Issue #16, it makes the most sense to name this method merge.

Calling signature for CLI merge

refmanage merge [option] [...]

Consider switching backend to bibtexparser

bibtexparser

Rename `fs_utils` to just `utils`

The fs_utils name appears in several places (modules, filenames, etc.). I think the "fs" is superfluous.

Merge: flags for sources: Delete duplicates.

Rule for sources: No specified source indicates all .bib files.

Merge: flags for target: append

Switch to setuptools instead of distutils

I should probably switch to setuptools instead of distutils for setup.py.

Split `refmanage.cli_args_dispatcher` functionality from `refmanage.main`

Currently, refmanage.main constructs the argparse structure, parses the command-line arguments, and dispatches the appropriate functionality based on the arguments. The dispatching functionality should be split off from refmanage.main in order to better separate concerns and make for easier testing.

With a separate refmanage.cli_args_dispatcher method, I can create argparse objects in my tests which would be created from command line arguments in normal operation. In this way testing will be much easier.

I might also need to write refmanage.cli_args_dispatcher to accept an unparsed argparse object. In that way I can test that the command-line arguments were called properly.
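
A rough sketch of the proposed split, assuming Python 3. The flag set is simplified, and the string return values are placeholders standing in for whatever the real dispatcher would call; only the names `main` and `cli_args_dispatcher` come from this issue.

```python
import argparse


def build_parser():
    """Construct the argparse structure (kept separate from dispatching)."""
    parser = argparse.ArgumentParser(prog="ref")
    parser.add_argument("-t", "--test", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("paths_args", nargs="*")
    return parser


def cli_args_dispatcher(args):
    """Choose the functionality to run from an already-parsed Namespace."""
    if args.test:
        return "test"  # placeholder for calling the real "test" code
    return "default"  # placeholder for the default behavior


def main(argv=None):
    """Parse the command line, then hand the result to the dispatcher."""
    args = build_parser().parse_args(argv)
    return cli_args_dispatcher(args)
```

With this shape a test can build an `argparse.Namespace(test=True, verbose=False, paths_args=[])` by hand and pass it straight to `cli_args_dispatcher`, without going through the command line at all.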

Add functionality to test BibTeX file(s)

At the command line, the user can pass a flag and the names of one or more files containing BibTeX entries. refmanage will test the file(s) and return a status indicating whether the BibTeX is parseable or unparseable.

Call signature

refmanage [-t] [-v] [files]

Behavior

Testing is the default behavior of refmanage. If no file arguments are passed, refmanage defaults to *.bib and prints a short message about getting help via the -h flag. By default, refmanage will only return a list of files that were unparseable; if the -v or --verbose flags are passed, refmanage will return two lists: the list of parseable files and the list of unparseable files, in that order.

Fix pathname output from "test" functionality

Once #34 has been closed, the output returned from the "test" functionality should be fixed.

There's no need to output a message like "The following files are unparseable" since the user is explicitly asking for a list of parseable or unparseable files and both options are mutually exclusive.
The fully qualified pathnames are too much. Output should be relative to cwd except for the following conditions:
- The user passed a fully qualified pathname as an argument on the cli.
- The user passed a pathname whose origin is above cwd in the filesystem, e.g. ../bib/bibtex.

Examples

$ ref -t *.bib
unparseable_one.bib
unparseable_two.bib
unparseable_three.bib

$ ref -t bib/*.bib
bib/unparseable_one.bib
bib/unparseable_two.bib
bib/unparseable_three.bib

$ ref -t /home/username/bib/bookshelf.bib /home/username/bib/unparseable.bib
/home/username/bib/unparseable.bib

$ pwd
/home/username/Desktop
$ ref -t ../bibtex/unparseable.bib
/home/username/bibtex/unparseable.bib

Note

This issue is split from #32.

The above examples assume issue #33 has not been closed.

CLI "test" functionality does not handle default argument properly

When manually executing refmanage within the repo, I get some unexpected output.

Example code triggering bug

Commit 38fae89 was used with the following code to trigger the bug. Note that ~ is used to generalize the filesystem location.

$ pwd
~/refmanage/refmanage

$ ls
__init__.py  fs_utils.py  refmanage.py

$ python refmanage.py -t
The following files are unparseable:
    ~/refmanage/refmanage
    ~/refmanage/refmanage/fs_utils.pyc

Since ~/refmanage/refmanage contains no BibTeX files, I would expect no output when calling refmanage.py from the cli in the above case.
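
One possible way to get the expected behavior, sketched under the assumption that the default argument should expand to the `*.bib` files in the working directory rather than to every entry in it. The helper name `default_bib_paths` is hypothetical.

```python
import pathlib
import tempfile


def default_bib_paths(directory):
    """Return only the .bib files directly inside `directory`, sorted by name."""
    return sorted(p for p in pathlib.Path(directory).glob("*.bib") if p.is_file())


# A directory holding only Python files should yield no candidates at all.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("__init__.py", "fs_utils.pyc", "refmanage.py"):
        (pathlib.Path(tmp) / name).touch()
    print(default_bib_paths(tmp))
```

Filtering with `is_file()` also keeps a directory that happens to be named like `something.bib` out of the list.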

Rules for sources: file can't be merged

If a file can't be merged (because it is a duplicate or because it does not parse), add it to a list to be printed later.

Test functionality of `refmanage.utils.handle_files_args`

I need all combinations of the following:

user path (e.g. ~/path/to/files)
wildcards (e.g. *.bib)
relative path
path that doesn't exist
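
As a starting point for enumerating the cases, the four properties above can be treated as on/off switches and expanded with `itertools.product`. The feature names are made up for this sketch, and some of the generated combinations may turn out to be contradictory (for example a user path that is also relative), so they would need pruning before being turned into real test cases.

```python
import itertools

# Hypothetical labels for the four properties listed in this issue.
FEATURES = ("user_path", "wildcard", "relative", "nonexistent")


def feature_combinations():
    """Yield one dict per on/off combination of the four path properties."""
    for flags in itertools.product((False, True), repeat=len(FEATURES)):
        yield dict(zip(FEATURES, flags))


for combo in feature_combinations():
    print(combo)
```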

Set files metavar to "file(s)"

Additionally, rename the variable to paths_args.

Verbose test output not useful for certain .bib files

Description

There are some BibTeX files for which ref -tv does not return a useful output message. In fact, it returns nothing.

Example file

Consider the file 10.1371__journal.pone.0115069.bib:

@article{10.1371/journal.pone.0115069,
    author = {Knauff, , Markus AND Nejasmic, , Jelica},
    journal = {PLoS ONE},
    publisher = {Public Library of Science},
    title = {An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development},
    year = {2014},
    month = {12},
    volume = {9},
    url = {http://dx.doi.org/10.1371%2Fjournal.pone.0115069},
    pages = {e115069},
    abstract = {<p>The choice of an efficient document preparation system is an important decision for any academic researcher. To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors.</p>},
    number = {12},
    doi = {10.1371/journal.pone.0115069}
}

Erroneous output

$ ref -tv 10.1371__journal.pone.0115069.bib
/home/username/library/bib/10.1371__journal.pone.0115069.bib

Solution

The refmanage.fs_utils.gen_verbose_msg method needs to be refactored to handle this kind of problem with the file format. Fortunately, pybtex raises different types of exceptions and so the solution is to inspect the exception type and create a verbose string based on this information.

Example

Consider the example file above and invalid.bib from 9a78a37.

>>> import pathlib2 as pathlib
>>> import refmanage

>>> p = pathlib.Path("10.1371__journal.pone.0115069.bib")
>>> bib_p = refmanage.fs_utils.parse_bib_file(p)

>>> u = pathlib.Path("invalid.bib")
>>> bib_u = refmanage.fs_utils.parse_bib_file(u)

>>> bib_p.__class__
pybtex.exceptions.PybtexError

>>> bib_u.__class__
pybtex.scanner.TokenRequired

So for pybtex.exceptions.PybtexError, gen_verbose_msg should just return the message attribute of the exception. But for pybtex.scanner.TokenRequired, gen_verbose_msg should construct the string it is already constructing.
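
The dispatch-on-exception-type idea can be sketched as follows. To keep the example self-contained it does not import pybtex: `ParseError` and `TokenError` are stand-ins invented here for the two pybtex exception types, and the `lineno`, `line`, and `colno` attributes are assumptions about what the real scanner exception would need to expose.

```python
class ParseError(Exception):
    """Stand-in for a general parser error that only carries a message."""


class TokenError(ParseError):
    """Stand-in for a scanner error that also records where parsing stopped."""

    def __init__(self, message, lineno, line, colno):
        super().__init__(message)
        self.lineno, self.line, self.colno = lineno, line, colno


def gen_verbose_msg(exc):
    """Build the verbose message according to the type of `exc`."""
    # Check the more specific type first, since it subclasses the general one.
    if isinstance(exc, TokenError):
        prefix = "{} ".format(exc.lineno)
        caret = " " * (len(prefix) + exc.colno) + "^^^"
        return "\n".join([str(exc), prefix + exc.line, caret])
    # General errors: the message alone is all there is to report.
    return str(exc)
```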

Package should be installable

The command-line application should be called ref.

Consider renaming `cat_db` to simply `cat`

Currently, the cat_db method is located in the namespace as follows: refmanage.db_utils.cat_db. Since it is below db_utils, the additional "db" may be redundant and could be dropped. Obviously a method named "cat" located in a "database utilities" submodule is going to concatenate databases.

Refactor class hierarchy modeling BibTeX file functionality

I noted (#41 (comment)) that instead of making a single BibFile class, refmanage should actually have two classes, one for files containing valid BibTeX, and the other for files containing invalid BibTeX. Both classes should be children of a common RefFile class which is not intended to be instantiated.

At the end of this refactor, three new classes (with tests) will exist:

RefFile (parent)
BibFile
NonbibFile

In addition, I will add a method in fs_utils.py which takes a pathlib.Path and outputs the appropriate RefFile child object, depending on the parseability of the file located at pathlib.Path.
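
A minimal sketch of the hierarchy described above, assuming Python 3. The class names come from this issue; everything else is hypothetical, including the `terse_msg` method, the constructor arguments, the `reffile_factory` name, and the idea of passing the parser in as a callable.

```python
import abc
import pathlib


class RefFile(abc.ABC):
    """Common parent; abstract, so it cannot be instantiated directly."""

    def __init__(self, path):
        self.path = pathlib.Path(path)

    @abc.abstractmethod
    def terse_msg(self):
        """Return the one-line message for this file."""


class BibFile(RefFile):
    """A file whose contents parsed as valid BibTeX."""

    def __init__(self, path, bib):
        super().__init__(path)
        self.bib = bib

    def terse_msg(self):
        return str(self.path)


class NonbibFile(RefFile):
    """A file that could not be parsed; keeps the exception for later use."""

    def __init__(self, path, exc):
        super().__init__(path)
        self.exc = exc

    def terse_msg(self):
        return str(self.path)


def reffile_factory(path, parse):
    """Return a BibFile if `parse(path)` succeeds, otherwise a NonbibFile."""
    try:
        return BibFile(path, parse(path))
    except Exception as exc:  # broad on purpose in this sketch
        return NonbibFile(path, exc)
```

Storing the exception on `NonbibFile` keeps the information a verbose message would need without parsing the file a second time.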

Test command-line functionality

The command-line logic of this program should be tested. I don't know how to write those tests right now, but I imagine I will end up testing the API as well as the functionality. Testing the command-line functionality probably looks a lot like typical unit tests.

Dustin Collins has a writeup that may be useful.

Create `BibFile` class to combine several functionalities

There are a number of functionalities that should be combined into a class:

fs_utils.construct_bib_dict
fs_utils.parse_bib_file
fs_utils.gen_terse_msg
fs_utils.gen_verbose_msg
fs_utils.gen_bib_dict_test_msg

Rules for sources: how to deal with target

If the target is explicitly specified, throw a warning and ignore it as a source. If it isn't explicitly specified (i.e., it comes in via *.bib), quietly ignore it.
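
The rule could look roughly like this. The function name and its three arguments are assumptions made for the sketch: `sources` is the fully expanded list, `target` is the merge destination, and `explicit` is the set of names the user typed literally rather than via a wildcard.

```python
import warnings


def drop_target_from_sources(sources, target, explicit):
    """Remove `target` from `sources`, warning only if it was named explicitly."""
    kept = []
    for source in sources:
        if source == target:
            if source in explicit:
                warnings.warn("target {} ignored as a source".format(target))
            continue  # the target is never merged into itself
        kept.append(source)
    return kept
```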

Test "test" command-line functionality

This issue pertains specifically to testing the "test" command-line functionality and is split from issue #21.

To test the "test" functionality, I should test:

Defaults.
Every combination of cli flags.

Use python's `pathlib` to deal with paths

Using the pathlib module seems much more sane than attempting to deal with paths via python's os.path methods.

For versions of python <3, pathlib2 is a backport of the functionality.
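
A common pattern for supporting both cases is to try the standard-library module first and fall back to the backport. This is only a sketch of that import pattern; the path used below is illustrative.

```python
try:
    import pathlib  # standard library on Python 3.4 and later
except ImportError:
    import pathlib2 as pathlib  # backport for older interpreters

# The rest of the code can use `pathlib` without caring which one loaded.
bib_dir = pathlib.Path("bib")
print(bib_dir / "one.bib")
```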

Merge: flags for target: add duplicates

"test" command-line arg should be optional

Required to close #19

Switch testing to py.test

Refactor "test" functionality with flags to show either parseable or unparseable files

The refmanage "test" functionality should have flags to show either a list of unparseable files or a list of parseable files. These flags should be mutually exclusive. For parseable files, the flag should be "-p" or "--parseable". For unparseable files the flag should be "-u" or "--unparseable". If neither flag is used, the default behavior should be to display a list of unparseable files.

This issue is split from issue #32.
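
The mutually exclusive flags map directly onto `argparse.ArgumentParser.add_mutually_exclusive_group`. The sketch below assumes Python 3; the flag names come from this issue, while the helper names and the `paths_args` destination are illustrative.

```python
import argparse


def build_test_parser():
    """Parser with -u/--unparseable and -p/--parseable as exclusive flags."""
    parser = argparse.ArgumentParser(prog="ref")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-u", "--unparseable", action="store_true")
    group.add_argument("-p", "--parseable", action="store_true")
    parser.add_argument("paths_args", nargs="*")
    return parser


def select_files(args, parseable, unparseable):
    """Pick the list to display; unparseable is the default when no flag is set."""
    return parseable if args.parseable else unparseable
```

Passing both flags at once makes argparse print a usage error and exit, so the exclusivity does not need to be checked by hand.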

Use argparse sub-commands for functionalities

The refmanage application will have many functionalities; I've listed the "test" functionality in issue #19, but there will be others. For example: replace all entries' BibTeX keys with the entry's DOI in a specified .bib file.

These different functionalities are very different and will require different sets of arguments. Therefore, it makes sense to use argparse sub-commands to implement the different functionalities.
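
A sketch of what the sub-command layout might look like with `argparse`, assuming Python 3. The `test` and `merge` sub-command names appear elsewhere in these issues; the specific arguments attached to each one here are illustrative only.

```python
import argparse


def build_parser():
    """Top-level parser with one sub-parser per functionality."""
    parser = argparse.ArgumentParser(prog="ref")
    subparsers = parser.add_subparsers(dest="command")

    # Each sub-command gets its own, independent set of arguments.
    test = subparsers.add_parser("test", help="test BibTeX files for parseability")
    test.add_argument("-v", "--verbose", action="store_true")
    test.add_argument("paths_args", nargs="*")

    merge = subparsers.add_parser("merge", help="merge BibTeX files")
    merge.add_argument("paths_args", nargs="*")

    return parser


args = build_parser().parse_args(["test", "-v", "bookshelf.bib"])
print(args.command, args.verbose, args.paths_args)
```

The `dest="command"` argument records which sub-command was chosen, which gives a dispatcher something to branch on.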

Add "--version" flag that returns version information

This flag returns the version of the application. I should be able to pass the data from refmanage.__version__.

This option is mutually exclusive of all other options.
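
argparse has a built-in `version` action that prints a string and exits immediately, which may be enough to cover the "mutually exclusive of all other options" requirement, since nothing after it gets processed. In the sketch below `__version__` is a local placeholder value standing in for `refmanage.__version__`.

```python
import argparse

__version__ = "0.0.0"  # placeholder; the real value would come from the package


def build_parser():
    """Parser whose --version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="ref")
    parser.add_argument(
        "--version",
        action="version",
        version="%(prog)s " + __version__,
    )
    return parser
```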

Rules for sources: If a path (either relative or absolute) is specified, all .bib files will be added to the list of sources.

Improve the "test" functionality

I'm not happy with the way the output is formatted and presented when refmanage is called with the "--test" flag. Here is a list of suggestions:

Fully-qualified pathname output

I don't like the fully-qualified pathname when relative paths are passed as arguments. The following is jarring to see:

$ ref -t *.bib
The following files are unparseable:
    /home/username/path/to/bibtex/files/one.bib
    /home/username/path/to/bibtex/files/two.bib
    /home/username/path/to/bibtex/files/three.bib

The result should look like

$ ref -t *.bib
one.bib
two.bib
three.bib

That said, I'm not sure how to deal with absolute pathnames passed as a cli argument.

Mutually exclusive (unparseable) flags

There should be two mutually exclusive flags that display either the parseable or unparseable files. The flag to display unparseable files should be the default.

$ ref -t -u *.bib # -u displays unparseable
unparseable1.bib
unparseable2.bib
unparseable3.bib

$ ref -t -p *.bib # -p displays parseable
parseable1.bib
parseable2.bib
parseable3.bib

More useful messages with unparseable files

There should be more useful output describing the problems with unparseable files. Probably the "--verbose" flag can handle this case.

Tag at commit fa60327

Create a new bugfix tag at commit fa60327: 0.1.1.

Merge: flags for target: overwrite

Categories of functionality

Merge

Combine several bibTeX files into a single file.
Option to delete source files after merge.

UID for bibTeX key

Change bibTeX key to a UID (e.g. DOI, ISBN, etc.).
Report on which entries do not have a UID as the bibTeX key.
Report on which entries are missing a UID field altogether.
Find missing PDFs.

PDF

Extract DOI from PDF.
Change PDF filename to DOI.
Watermark PDF with DOI on all pages.
Print PDF with watermark of date printed.
Get bibTeX file from DOI found in PDF via crossref, merge into existing database.

Merge: flags for sources: Delete successfully merged files.

Abstract version information out of `setup.py`

This information should be accessible like so:

import refmanage
print(refmanage.__version__)

Merge: flags for target: rename source keys to UID before merging

Fix documentation in README.md

The current documentation is way too much.

merge: flag for sources: ignore

The user should have the ability to specify source files to ignore. For example, the user wants to merge all .bib files except no_merge.bib.

Include command-line help documentation in package documentation

Currently the published documentation covers only the docstrings found in the library modules. The command line docs should also be included.

Cannot install version 0.1.1-0.0.1; dependencies don't seem to work

Attempting to install version 0.1.1-0.0.1 in a conda virtual environment throws a bunch of errors. I think there's a problem in the setup.py in terms of the dependencies required for this package.

Example

Create a conda virtual environment and attempt to install version 0.1.1-0.0.1 from github.

$ cd test
$ conda create -p ./env anaconda
$ source activate ./env/
$ pip install git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1#egg=refmanage
Downloading/unpacking refmanage from git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1
  Cloning git://github.com/jrsmith3/refmanage.git (to 0.1.1-0.0.1) to /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
  Running setup.py (path:/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py) egg_info for package refmanage
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>
        import refmanage
      File "refmanage/__init__.py", line 11, in <module>
        from fs_utils import *
      File "refmanage/fs_utils.py", line 4, in <module>
        import pathlib2 as pathlib
    ImportError: No module named pathlib2
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 17, in <module>

  File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>

    import refmanage

  File "refmanage/__init__.py", line 11, in <module>

    from fs_utils import *

  File "refmanage/fs_utils.py", line 4, in <module>

    import pathlib2 as pathlib

ImportError: No module named pathlib2

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
Storing debug log for failure in /Users/jrsmith3/.pip/pip.log
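
The traceback shows setup.py importing refmanage, which in turn imports pathlib2 before pip has had a chance to install it. One commonly used workaround, sketched here as an assumption rather than as the fix this project adopted, is to read the version string out of the package source as text instead of importing the package. The sketch assumes `__version__` is assigned as a plain string literal; `read_version` is a hypothetical helper.

```python
import re


def read_version(init_text):
    """Extract the __version__ string from module source without importing it."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
        init_text,
        re.MULTILINE,
    )
    if match is None:
        raise RuntimeError("no __version__ assignment found")
    return match.group(1)


# In setup.py the text would be read from the package's source file instead.
print(read_version('"""Manage a BibTeX database."""\n__version__ = "0.1.1"\n'))
```

Because nothing is imported, the package's runtime dependencies no longer need to be present when setup.py runs.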

Change "--verbose" functionality of "test" functionality/Display more useful messages for unparseable files

Once issue #34 has been closed, the functionality of the "--verbose" flag should be refactored to output more useful information about the nature of a file's unparseability. Output information should include:

The error type.
The error message.
The literal offending line encountered by the BibTeX parser.
The line number of the unparseable line.

This information can be determined by storing and later inspecting the Exception object created when a parsing error occurs.

Note

This issue is split from #32.

The above example assumes that issue #33 has not been closed.
