refmanage's People

Contributors

jrsmith3

refmanage's Issues

Indent verbose output for "test" functionality

The verbose output should be indented by two spaces to visually offset it from the terse output.

Example

$ ref -tv bookshelf.bib
bookshelf.bib
  syntax error: entry key expected
  228 @Book{,
       ^^^
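
A minimal sketch of the indentation described above, assuming the verbose message is already available as a multi-line string. The helper name indent_verbose is hypothetical and not part of refmanage; the sketch targets Python 3, where textwrap.indent is available.

```python
import textwrap


def indent_verbose(message, prefix="  "):
    """Indent every non-blank line of `message` by `prefix` (two spaces by default)."""
    return textwrap.indent(message, prefix)


# The terse output is the filename; the verbose detail is offset beneath it.
terse = "bookshelf.bib"
verbose = "syntax error: entry key expected\n228 @Book{,\n     ^^^"
print(terse)
print(indent_verbose(verbose))
```

Because the prefix is applied per line, multi-line context such as the caret marker keeps its alignment relative to the offending line.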

Print relative paths from "test" functionality

Currently, paths printed from the "test" functionality are always fully-qualified. When refmanage -t is invoked, the paths printed should be relative to pwd, except for files located outside of pwd, which should still be printed fully-qualified.

Example

Assume the following directory structure in ~/Desktop:

Desktop
├── bib
│   ├── one.bib
│   ├── subbib
│   │   └── three.bib
│   └── two.bib
└── otherbib
    ├── five.bib
    └── four.bib

All (sub)directories contain .bib files with valid BibTeX. The user is currently in ~/Desktop/bib.

$ pwd
/home/username/Desktop/bib

$ ref -t *.bib
one.bib
two.bib

$ ref -t subbib/*.bib
subbib/three.bib

$ ref -t ../otherbib/*.bib
/home/username/Desktop/otherbib/five.bib
/home/username/Desktop/otherbib/four.bib

This issue is split from #35.

The above example assumes that issue #33 has not been closed.
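
One way the path rule could be sketched, assuming POSIX-style paths and Python 3. The function name display_path is hypothetical. Arguments that are absolute, or that resolve to a location outside the working directory, are printed fully-qualified; everything else is printed relative to the working directory.

```python
import posixpath
from pathlib import PurePosixPath


def display_path(arg, cwd):
    """Return the string to print for the cli path argument `arg`."""
    arg_path = PurePosixPath(arg)
    if arg_path.is_absolute():
        # The user asked for a fully-qualified path, so echo it back.
        return str(arg_path)
    # Collapse ".." components without touching the filesystem.
    full = PurePosixPath(posixpath.normpath(str(PurePosixPath(cwd) / arg_path)))
    try:
        return str(full.relative_to(cwd))
    except ValueError:
        # The file lives outside cwd: fall back to the fully-qualified path.
        return str(full)


cwd = "/home/username/Desktop/bib"
for arg in ("one.bib", "subbib/three.bib", "../otherbib/five.bib"):
    print(display_path(arg, cwd))
```

The sketch works on pure paths so it can be unit-tested without creating files; real code would probably take the working directory from the environment instead of a parameter.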

Publish minimal documentation

This project needs minimal documentation that is hosted someplace on the web. It can be hosted at jrsmith3.github.io/refmanage but a better choice is probably readthedocs.org or pythonhosted.org.

The documentation should include the following:

  • Scope of project
  • Installation (including prereqs)
  • Examples
  • License
  • Contributing
  • API reference

Split `refmanage.cli_args_dispatcher` functionality from `refmanage.main`

Currently, refmanage.main constructs the argparse structure, parses the command-line arguments, and dispatches the appropriate functionality based on the arguments. The dispatching functionality should be split off from refmanage.main in order to better separate concerns and make for easier testing.

With a separate refmanage.cli_args_dispatcher method, I can create argparse objects in my tests which would be created from command line arguments in normal operation. In this way testing will be much easier.

I might also need to write refmanage.cli_args_dispatcher to accept an unparsed argparse object. In that way I can test that the command-line arguments were called properly.
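
A rough sketch of the split, under several assumptions: only cli_args_dispatcher and main are names from this issue, while build_parser, run_test, and the paths_args destination are hypothetical placeholders. The point is only that the dispatcher takes an already-parsed argparse.Namespace, so a test can build one by hand.

```python
import argparse


def build_parser():
    # Hypothetical helper: constructs the argparse structure and nothing else.
    parser = argparse.ArgumentParser(prog="refmanage")
    parser.add_argument("-t", "--test", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("paths_args", nargs="*", default=["*.bib"])
    return parser


def run_test(args):
    # Placeholder standing in for the real "test" functionality.
    return "test:{}".format(",".join(args.paths_args))


def cli_args_dispatcher(args):
    """Pick the functionality for an already-parsed Namespace."""
    if args.test:
        return run_test
    return None


def main(argv=None):
    # Parsing and dispatching are now separate, individually testable steps.
    args = build_parser().parse_args(argv)
    handler = cli_args_dispatcher(args)
    return handler(args) if handler else None
```

In a test, the Namespace can be created directly, for example argparse.Namespace(test=True, verbose=False, paths_args=["a.bib"]), and passed to cli_args_dispatcher without going through the command line.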

Add functionality to test BibTeX file(s)

At the command line, the user can pass a flag and the name(s) of one or more files containing BibTeX entries. refmanage will test the file(s) and return a status indicating whether the BibTeX is parseable or unparseable.

Call signature

refmanage [-t] [-v] [files]

Behavior

Testing is the default behavior of refmanage. If no file arguments are passed, refmanage defaults to *.bib and prints a short message about getting help via the -h flag. By default, refmanage will only return a list of files that were unparseable; if the -v or --verbose flag is passed, refmanage will return two lists: the list of parseable files and the list of unparseable files, in that order.
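
The behavior above could be sketched roughly as follows. Both function names are hypothetical, and the parser is passed in as a callable (assumed to raise an exception on unparseable input) so the sketch does not depend on any particular BibTeX library.

```python
def partition_by_parseability(paths, parse):
    """Split `paths` into (parseable, unparseable) using the callable `parse`."""
    parseable, unparseable = [], []
    for path in paths:
        try:
            parse(path)
        except Exception:
            unparseable.append(path)
        else:
            parseable.append(path)
    return parseable, unparseable


def lists_to_report(parseable, unparseable, verbose=False):
    """Default: only the unparseable list. Verbose: parseable, then unparseable."""
    if verbose:
        return [parseable, unparseable]
    return [unparseable]


def fake_parse(path):
    # Stand-in parser for illustration: names containing "bad" fail to parse.
    if "bad" in path:
        raise ValueError("unparseable")


good, bad = partition_by_parseability(["a.bib", "bad.bib"], fake_parse)
print(lists_to_report(good, bad))
print(lists_to_report(good, bad, verbose=True))
```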

Fix pathname output from "test" functionality

Once #34 has been closed, the output returned from the "test" functionality should be fixed.

  • There's no need to output a message like "The following files are unparseable" since the user is explicitly asking for a list of parseable or unparseable files and both options are mutually exclusive.
  • The fully qualified pathnames are too much. Output should be relative to cwd except for the following conditions:
    • The user passed a fully qualified pathname as an argument on the cli.
    • The user passed a pathname whose origin is above cwd in the filesystem, e.g. ../bib/bibtex.

Examples

$ ref -t *.bib
unparseable_one.bib
unparseable_two.bib
unparseable_three.bib
$ ref -t bib/*.bib
bib/unparseable_one.bib
bib/unparseable_two.bib
bib/unparseable_three.bib
$ ref -t /home/username/bib/bookshelf.bib /home/username/bib/unparseable.bib
/home/username/bib/unparseable.bib
$ pwd
/home/username/Desktop
$ ref -t ../bibtex/unparseable.bib
/home/username/bibtex/unparseable.bib

Note

This issue is split from #32.

The above examples assume issue #33 has not been closed.

CLI "test" functionality does not handle default argument properly

When manually executing refmanage within the repo, I get some unexpected output.

Example code triggering bug

Commit 38fae89 was used with the following code to trigger the bug. Note that ~ is used to generalize the filesystem location.

$ pwd
~/refmanage/refmanage

$ ls
__init__.py  fs_utils.py  refmanage.py

$ python refmanage.py -t
The following files are unparseable:
    ~/refmanage/refmanage
    ~/refmanage/refmanage/fs_utils.pyc

Since ~/refmanage/refmanage contains no BibTeX files, I would expect no output when calling refmanage.py from the cli in the above case.
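
A sketch of what the default argument handling might look like, assuming the intended default is "every *.bib file in the current directory" as described in the issue that introduced the "test" functionality. The name default_targets is hypothetical; the sketch uses Python 3's standard pathlib.

```python
from pathlib import Path


def default_targets(directory="."):
    """Return the .bib files directly inside `directory`, sorted by name."""
    # Globbing for *.bib skips the directory itself and files such as .pyc.
    return sorted(p for p in Path(directory).glob("*.bib") if p.is_file())


print([p.name for p in default_targets()])
```

With this default, a directory holding only __init__.py, fs_utils.py, and refmanage.py yields an empty list, so nothing would be tested or reported.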

Verbose test output not useful for certain .bib files

Description

There are some BibTeX files for which ref -tv does not return a useful output message. In fact, it returns nothing.

Example file

Consider the file 10.1371__journal.pone.0115069.bib:

@article{10.1371/journal.pone.0115069,
    author = {Knauff, , Markus AND Nejasmic, , Jelica},
    journal = {PLoS ONE},
    publisher = {Public Library of Science},
    title = {An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development},
    year = {2014},
    month = {12},
    volume = {9},
    url = {http://dx.doi.org/10.1371%2Fjournal.pone.0115069},
    pages = {e115069},
    abstract = {<p>The choice of an efficient document preparation system is an important decision for any academic researcher. To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors.</p>},
    number = {12},
    doi = {10.1371/journal.pone.0115069}
}  

Erroneous output

$ ref -tv 10.1371__journal.pone.0115069.bib
/home/username/library/bib/10.1371__journal.pone.0115069.bib

Solution

The refmanage.fs_utils.gen_verbose_msg method needs to be refactored to handle this kind of problem with the file format. Fortunately, pybtex raises different types of exceptions and so the solution is to inspect the exception type and create a verbose string based on this information.

Example

Consider the example file above and invalid.bib from 9a78a37.

>>> import pathlib2 as pathlib
>>> import refmanage

>>> p = pathlib.Path("10.1371__journal.pone.0115069.bib")
>>> bib_p = refmanage.fs_utils.parse_bib_file(p)

>>> u = pathlib.Path("invalid.bib")
>>> bib_u = refmanage.fs_utils.parse_bib_file(u)

>>> bib_p.__class__
pybtex.exceptions.PybtexError

>>> bib_u.__class__
pybtex.scanner.TokenRequired

So for pybtex.exceptions.PybtexError, gen_verbose_msg should just return the message attribute of the exception. But for pybtex.scanner.TokenRequired, gen_verbose_msg should construct the string it is already constructing.
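
The dispatch on exception type might be sketched as below. To keep the example self-contained, the two exception classes are local stand-ins, not the real pybtex classes; they only mimic the attributes used elsewhere in these issues (message, error_type, lineno, get_context). The sketch also assumes TokenRequired derives from PybtexError, which is why the more specific check comes first.

```python
class PybtexError(Exception):
    """Stand-in for pybtex.exceptions.PybtexError (not the real class)."""

    def __init__(self, message):
        super().__init__(message)
        self.message = message


class TokenRequired(PybtexError):
    """Stand-in for pybtex.scanner.TokenRequired (not the real class)."""

    error_type = "syntax error"

    def __init__(self, message, lineno, context):
        super().__init__(message)
        self.lineno = lineno
        self._context = context

    def get_context(self):
        return self._context


def gen_verbose_msg(exc):
    """Build the verbose string according to the exception type."""
    if isinstance(exc, TokenRequired):
        # Syntax errors carry a line number and the offending context.
        return "{}: {}\n{} {}".format(
            exc.error_type, exc.message, exc.lineno, exc.get_context()
        )
    if isinstance(exc, PybtexError):
        # Generic errors only have a message worth showing.
        return exc.message
    return ""


print(gen_verbose_msg(TokenRequired("entry key expected", 228, "@Book{,\n     ^^^")))
```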

Consider renaming `cat_db` to simply `cat`

Currently, the cat_db method is located in the namespace as follows: refmanage.db_utils.cat_db. Since it is below db_utils, the additional "db" in the method name may be redundant and could be dropped. Obviously a method named "cat" located in a "database utilities" submodule is going to concatenate databases.

Refactor class hierarchy modeling BibTeX file functionality

I noted (#41 (comment)) that instead of making a single BibFile class, refmanage should actually have two classes, one for files containing valid BibTeX, and the other for files containing invalid BibTeX. Both classes should be children of a common RefFile class which is not intended to be instantiated.

At the end of this refactor, three new classes (with tests) will exist:

  • RefFile (parent)
  • BibFile
  • NonbibFile

In addition, I will add a method in fs_utils.py which takes a pathlib.Path and outputs the appropriate RefFile child object, depending on the parseability of the file located at that path.
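
A sketch of the proposed hierarchy, with some assumptions: the class names come from this issue, but the constructor signatures, the attribute names, and the factory name reffile_factory are hypothetical. The parser is injected as a callable that raises on invalid BibTeX, so the sketch runs without pybtex.

```python
from pathlib import Path


class RefFile:
    """Common parent; not intended to be instantiated directly."""

    def __init__(self, path):
        if type(self) is RefFile:
            raise TypeError("RefFile is not intended to be instantiated directly")
        self.path = Path(path)


class BibFile(RefFile):
    """A file containing valid BibTeX."""

    def __init__(self, path, bib):
        super().__init__(path)
        self.bib = bib


class NonbibFile(RefFile):
    """A file containing invalid BibTeX; keeps the parsing error for reporting."""

    def __init__(self, path, error):
        super().__init__(path)
        self.error = error


def reffile_factory(path, parse):
    """Return a BibFile or NonbibFile depending on whether `parse(path)` succeeds."""
    try:
        bib = parse(path)
    except Exception as exc:
        return NonbibFile(path, exc)
    return BibFile(path, bib)
```

Storing the exception on NonbibFile would let the verbose "test" output inspect it later without re-parsing the file.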

Test command-line functionality

The command-line logic of this program should be tested. I don't know how to write those tests right now, but I imagine I will end up testing the API as well as the functionality. Testing the command-line functionality probably looks a lot like typical unit tests.

Dustin Collins has a writeup that may be useful.

Test "test" command-line functionality

This issue pertains specifically to testing the "test" command-line functionality and is split from issue #21.

To test the "test" functionality, I should test:

  • Defaults.
  • Every combination of cli flags.
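
Enumerating every combination of flags can be done mechanically. The sketch below is one possible approach and the helper name flag_combinations is hypothetical; each yielded list could be fed to the argument parser in a parametrized test.

```python
import itertools


def flag_combinations(flags):
    """Yield every subset of `flags` as an argv-style list, including the empty one."""
    for size in range(len(flags) + 1):
        for combo in itertools.combinations(flags, size):
            yield list(combo)


for argv in flag_combinations(["-t", "-v"]):
    print(argv)
```

The empty list covers the "defaults" case, so both bullet points can share one test loop.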

Refactor "test" functionality with flags to show either parseable or unparseable files

The refmanage "test" functionality should have flags to show either a list of unparseable files or a list of parseable files. These flags should be mutually exclusive. For parseable files, the flag should be "-p" or "--parseable". For unparseable files the flag should be "-u" or "--unparseable". If neither flag is used, the default behavior should be to display a list of unparseable files.

This issue is split from issue #32.
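
The mutually exclusive flags map naturally onto argparse's add_mutually_exclusive_group. A minimal sketch follows; the function name build_test_parser, the shared parseable destination, and the paths_args positional are assumptions made for illustration.

```python
import argparse


def build_test_parser():
    parser = argparse.ArgumentParser(prog="ref")
    parser.add_argument("-t", "--test", action="store_true")
    # argparse rejects the command line if both flags in the group are given.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-u", "--unparseable", dest="parseable", action="store_false")
    group.add_argument("-p", "--parseable", dest="parseable", action="store_true")
    # Neither flag given: default to listing unparseable files.
    parser.set_defaults(parseable=False)
    parser.add_argument("paths_args", nargs="*")
    return parser


args = build_test_parser().parse_args(["-t", "-p", "one.bib"])
print(args.parseable, args.paths_args)
```

Passing both -u and -p makes argparse print a usage error and exit, which is the desired behavior for mutually exclusive options.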

Use argparse sub-commands for functionalities

The refmanage application will have many functionalities; I've listed the "test" functionality in issue #19, but there will be others. For example: replace all entries' BibTeX keys with the entry's DOI in a specified .bib file.

These different functionalities are very different and will require different sets of arguments. Therefore, it makes sense to use argparse sub-commands to implement the different functionalities.
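
A sketch of how sub-commands might be laid out with argparse's add_subparsers. The sub-command names test and rekey, and their arguments, are hypothetical; they only illustrate that each functionality gets its own argument set.

```python
import argparse


def build_subcommand_parser():
    parser = argparse.ArgumentParser(prog="refmanage")
    subparsers = parser.add_subparsers(dest="command")

    # "test" sub-command: check whether files contain parseable BibTeX.
    test = subparsers.add_parser("test", help="test BibTeX files for parseability")
    test.add_argument("-v", "--verbose", action="store_true")
    test.add_argument("paths_args", nargs="*", default=["*.bib"])

    # "rekey" sub-command (hypothetical): replace entry keys with a UID field.
    rekey = subparsers.add_parser("rekey", help="replace BibTeX keys with a UID")
    rekey.add_argument("bibfile")
    rekey.add_argument("--field", default="doi")

    return parser


args = build_subcommand_parser().parse_args(["test", "-v", "bookshelf.bib"])
print(args.command, args.verbose, args.paths_args)
```

The dest="command" setting records which sub-command was chosen, which a dispatcher can then use to select the matching functionality.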

Improve the "test" functionality

I'm not happy with the way the output is formatted and presented when refmanage is called with the "--test" flag. Here is a list of suggestions:

Fully-qualified pathname output

I don't like the fully-qualified pathname when relative paths are passed as arguments. The following is jarring to see:

$ ref -t *.bib
The following files are unparseable:
    /home/username/path/to/bibtex/files/one.bib
    /home/username/path/to/bibtex/files/two.bib
    /home/username/path/to/bibtex/files/three.bib

The result should look like

$ ref -t *.bib
one.bib
two.bib
three.bib

That said, I'm not sure how to deal with absolute pathnames passed as a cli argument.

Mutually exclusive (un)parseable flags

There should be two mutually exclusive flags that display either the parseable or unparseable files. The flag to display unparseable files should be the default.

$ ref -t -u *.bib # -u displays unparseable
unparseable1.bib
unparseable2.bib
unparseable3.bib

$ ref -t -p *.bib # -p displays parseable
parseable1.bib
parseable2.bib
parseable3.bib

More useful messages with unparseable files

There should be more useful output describing the problems with unparseable files. Probably the "--verbose" flag can handle this case.

Categories of functionality

Merge

  • Combine several BibTeX files into a single file.
  • Option to delete source files after merge.

UID for BibTeX key

  • Change BibTeX key to a UID (e.g. DOI, ISBN, etc.).
  • Report on which entries do not have a UID as the BibTeX key.
  • Report on which entries are missing a UID field altogether.
  • Find missing PDFs.

PDF

  • Extract DOI from PDF.
  • Change PDF filename to DOI.
  • Watermark PDF with DOI on all pages.
  • Print PDF with watermark of date printed.
  • Get BibTeX file from DOI found in PDF via crossref, merge into existing database.

merge: flag for sources: ignore

The user should have the ability to specify source files to ignore. For example, the user wants to merge all .bib files except no_merge.bib.
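
One possible shape for the ignore logic, assuming sources are compared by filename only. The function name select_sources and its parameters are hypothetical, and how the ignore list would be exposed on the command line is left open.

```python
from pathlib import PurePath


def select_sources(sources, ignore=()):
    """Return `sources` minus any file whose name appears in `ignore`."""
    ignored = {PurePath(p).name for p in ignore}
    return [s for s in sources if PurePath(s).name not in ignored]


print(select_sources(["one.bib", "no_merge.bib", "two.bib"], ignore=["no_merge.bib"]))
```

Matching on the filename alone is a simplification; if two directories contain a file with the same name, comparing full paths would be safer.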

Cannot install version 0.1.1-0.0.1; dependencies don't seem to work

Attempting to install version 0.1.1-0.0.1 in a conda virtual environment throws a bunch of errors. I think there's a problem in the setup.py in terms of the dependencies required for this package.

Example

Create a conda virtual environment and attempt to install version 0.1.1-0.0.1 from github.

$ cd test
$ conda create -p ./env anaconda
$ source activate ./env/
$ pip install git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1#egg=refmanage
Downloading/unpacking refmanage from git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1
  Cloning git://github.com/jrsmith3/refmanage.git (to 0.1.1-0.0.1) to /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
  Running setup.py (path:/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py) egg_info for package refmanage
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>
        import refmanage
      File "refmanage/__init__.py", line 11, in <module>
        from fs_utils import *
      File "refmanage/fs_utils.py", line 4, in <module>
        import pathlib2 as pathlib
    ImportError: No module named pathlib2
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>
        import refmanage
      File "refmanage/__init__.py", line 11, in <module>
        from fs_utils import *
      File "refmanage/fs_utils.py", line 4, in <module>
        import pathlib2 as pathlib
    ImportError: No module named pathlib2

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
Storing debug log for failure in /Users/jrsmith3/.pip/pip.log
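
The traceback shows setup.py importing refmanage, which in turn imports pathlib2 before pip has had a chance to install it. Assuming the import exists only to read metadata such as a version string (the issue does not say why it is there), one common workaround is to read that value from the source text instead of importing the package. The helper name read_version and the __version__ variable are assumptions.

```python
import re
from pathlib import Path


def read_version(init_path):
    """Extract __version__ from a file's text without importing the package."""
    text = Path(init_path).read_text(encoding="utf-8")
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", text, re.MULTILINE)
    if match is None:
        raise RuntimeError("no __version__ string found in {}".format(init_path))
    return match.group(1)
```

With the import gone, the runtime dependencies can be declared through the install_requires argument of setuptools.setup so that pip installs them alongside the package.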

Change "--verbose" functionality of "test" functionality/Display more useful messages for unparseable files

Once issue #34 has been closed, the functionality of the "--verbose" flag should be refactored to output more useful information about the nature of a file's unparseability. Output information should include:

  • The error type.
  • The error message.
  • The literal offending line encountered by the BibTeX parser.
  • The line number of the unparseable line.

This information can be determined by storing and later inspecting the Exception object created when a parsing error occurs.

Suggested solution

First, refmanage.fs_utils.import_bib_files should be refactored to store the Exception object created when a parsing error occurs instead of storing None in the dictionary it returns.

These Exception objects have a number of parameters which can be queried to determine exactly what happened to cause the parser to fail. For example:

>>> from pybtex.database.input import bibtex
>>> from pybtex.exceptions import PybtexError
>>> parser = bibtex.Parser()
>>> try:
...     parser.parse_file("bookshelf.bib")
... except PybtexError as e:
...     pass
>>> e.error_type
'syntax error'
>>> e.message
u'entry key expected'
>>> e.lineno
228
>>> e.get_context()
u'@Book{,\n     ^^^'

For each unparseable file, the refmanage cli should output the filename (like it does with the default behavior), then each of the items in the list above, indented with a tab.

$ ref --test --verbose bookshelf.bib
bookshelf.bib
    type: syntax error
    message: entry key expected
    line number: 228
    context: @Book{,\n     ^^^
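
The report format above could be produced by a small formatter like the following sketch. The function name format_verbose_report and its parameters are hypothetical; the values would come from the stored exception's error_type, message, lineno, and get_context(). Newlines in the context are escaped so each field stays on one line, as in the example output.

```python
def format_verbose_report(filename, error_type, message, lineno, context, indent="    "):
    """Return the filename followed by one indented line per error detail."""
    fields = [
        ("type", error_type),
        ("message", message),
        ("line number", lineno),
        ("context", context),
    ]
    lines = [filename]
    for label, value in fields:
        # Escape embedded newlines so multi-line context prints on a single line.
        text = str(value).replace("\n", "\\n")
        lines.append("{}{}: {}".format(indent, label, text))
    return "\n".join(lines)


print(format_verbose_report(
    "bookshelf.bib", "syntax error", "entry key expected", 228, "@Book{,\n     ^^^"
))
```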

Note

This issue is split from #32.

The above example assumes that issue #33 has not been closed.
