jrsmith3 / refmanage
Manage a BibTeX database
Home Page: https://github.com/jrsmith3/refmanage
License: MIT License
This issue is a specific directive to refactor the "test" functionality, based on issue #29. Instead of calling
$ ref -test *.bib
or
$ ref -t *.bib
The command should look like
$ ref test *.bib
The verbose output should be indented by two spaces to visually offset it from the terse output.
$ ref -tv bookshelf.bib
bookshelf.bib
  syntax error: entry key expected
  228 @Book{,
  ^^^
Right now, the list of .bib files that raised exceptions during the default combine operation is just dumped to the screen via Python's print command. Each file should be printed on its own line.
The default, as defined in issue #19, is to print only the unparseable filenames. If a -v or --verbose flag is passed on the command line for the "test" functionality, the program prints lists of both the parseable and unparseable files. There should be a third option, indicated by a -p or --parseables flag, that prints only the list of parseable files.
Currently, paths printed by the "test" functionality are always fully qualified. When refmanage -t is invoked, the printed paths should be relative to pwd when the file lies within pwd or one of its subdirectories, and fully qualified otherwise.
Assume the following directory structure in ~/Desktop:
Desktop
├── bib
│   ├── one.bib
│   ├── subbib
│   │   └── three.bib
│   └── two.bib
└── otherbib
    ├── five.bib
    └── four.bib
All (sub)directories contain .bib files with valid BibTeX. The user is currently in ~/Desktop/bib.
$ pwd
/home/username/Desktop/bib
$ ref -t *.bib
one.bib
two.bib
$ ref -t subbib/*.bib
subbib/three.bib
$ ref -t ../otherbib/*.bib
/home/username/Desktop/otherbib/five.bib
/home/username/Desktop/otherbib/four.bib
This issue is split from #35.
The above example assumes that issue #33 has not been closed.
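The path rule illustrated above can be sketched as a small pure function. This is only an illustration, not refmanage's implementation: the name `display_path` is hypothetical, it uses the standard library's `PurePosixPath` instead of pathlib2, and it shows an absolute argument that lies inside pwd as relative, a case the issue does not settle.

```python
import posixpath
from pathlib import PurePosixPath


def display_path(arg, cwd):
    """Return `arg` relative to `cwd` if it lies inside `cwd`, else absolute.

    Hypothetical helper; both arguments are POSIX-style path strings.
    """
    cwd = PurePosixPath(cwd)
    # Joining keeps `arg` unchanged when it is already absolute;
    # normpath then collapses any ".." components without touching the disk.
    full = PurePosixPath(posixpath.normpath(str(cwd / arg)))
    try:
        return str(full.relative_to(cwd))
    except ValueError:
        # The file is outside cwd, so fall back to the fully-qualified path.
        return str(full)


cwd = "/home/username/Desktop/bib"
for arg in ["one.bib", "subbib/three.bib", "../otherbib/five.bib"]:
    print(display_path(arg, cwd))
```

Keeping the function free of filesystem access makes it easy to test against the directory layout described in the issue.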
This project needs minimal documentation that is hosted someplace on the web. It can be hosted at jrsmith3.github.io/refmanage but a better choice is probably readthedocs.org or pythonhosted.org.
The documentation should include the following:
After further consideration of issue #16, it makes the most sense to name this method merge.
refmanage merge [option] [
The fs_utils name appears in several places (modules, filenames, etc.). I think the "fs" is superfluous.
I should probably switch to setuptools instead of distutils for setup.py.
Currently, refmanage.main constructs the argparse structure, parses the command-line arguments, and dispatches the appropriate functionality based on the arguments. The dispatching functionality should be split off from refmanage.main in order to better separate concerns and make for easier testing.
With a separate refmanage.cli_args_dispatcher method, my tests can construct the argparse objects that would normally be created from command-line arguments, which makes testing much easier.
I might also need to write refmanage.cli_args_dispatcher to accept an unparsed argparse object. In that way I can test that the command-line arguments were called properly.
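A minimal sketch of that split, assuming a flag-based interface like the one described in these issues. Apart from `cli_args_dispatcher` and `main`, every name here (`build_parser`, `run_test`, the `paths_args` destination) is a placeholder, and `run_test` merely echoes its inputs instead of parsing files.

```python
import argparse


def build_parser():
    """Construct the argparse structure only; no parsing, no dispatching."""
    parser = argparse.ArgumentParser(prog="ref")
    parser.add_argument("-t", "--test", action="store_true")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("paths_args", nargs="*", default=["*.bib"])
    return parser


def run_test(args):
    """Placeholder for the real "test" functionality."""
    return ("test", args.verbose, list(args.paths_args))


def cli_args_dispatcher(args):
    """Dispatch on an already-parsed namespace."""
    if args.test:
        return run_test(args)
    return None


def main(argv=None):
    """Thin wrapper: parse the arguments, then hand off to the dispatcher."""
    return cli_args_dispatcher(build_parser().parse_args(argv))


# A test can bypass the command line entirely by building the namespace by hand.
fake_args = argparse.Namespace(test=True, verbose=False, paths_args=["a.bib"])
print(cli_args_dispatcher(fake_args))
```

Because `main` accepts an optional `argv` list, a test can also exercise the parsing step by calling `main(["-t", "-v", "x.bib"])`.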
At the command line, the user can pass one or more filenames of files containing BibTeX entries, plus a flag. refmanage will test the file(s) and return a status indicating whether the BibTeX is parseable or unparseable.
refmanage [-t] [-v] [files]
Testing is the default behavior of refmanage. If no file arguments are passed, refmanage defaults to *.bib and prints a short message about getting help via the -h flag. By default, refmanage will only return a list of files that were unparseable; if the -v or --verbose flag is passed, refmanage will return two lists: the list of parseable files and the list of unparseable files, in that order.
Once #34 has been closed, the output returned from the "test" functionality should be fixed.
../bib/bibtex.
$ ref -t *.bib
unparseable_one.bib
unparseable_two.bib
unparseable_three.bib
$ ref -t bib/*.bib
bib/unparseable_one.bib
bib/unparseable_two.bib
bib/unparseable_three.bib
$ ref -t /home/username/bib/bookshelf.bib /home/username/bib/unparseable.bib
/home/username/bib/unparseable.bib
$ pwd
/home/username/Desktop
$ ref -t ../bibtex/unparseable.bib
/home/username/bibtex/unparseable.bib
This issue is split from #32.
The above examples assume issue #33 has not been closed.
When manually executing refmanage within the repo, I get some unexpected output.
Commit 38fae89 was used with the following code to trigger the bug. Note that ~ is used to generalize the filesystem location.
$ pwd
~/refmanage/refmanage
$ ls
__init__.py fs_utils.py refmanage.py
$ python refmanage.py -t
The following files are unparseable:
~/refmanage/refmanage
~/refmanage/refmanage/fs_utils.pyc
Since ~/refmanage/refmanage contains no BibTeX files, I would expect no output when calling refmanage.py from the CLI in the above case.
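The report does not identify the cause, but the output suggests that a directory and a .pyc file were handed to the parser. One possible guard, sketched here under that assumption, is to keep only paths ending in .bib before testing anything. The helper name `bib_candidates` is hypothetical, and the sketch uses the standard library's pathlib instead of pathlib2.

```python
from pathlib import Path


def bib_candidates(paths):
    """Return only the .bib files named by `paths` (hypothetical helper).

    A directory contributes the .bib files directly inside it; any other
    path is kept only if its suffix is .bib.
    """
    candidates = []
    for path in map(Path, paths):
        if path.is_dir():
            candidates.extend(sorted(path.glob("*.bib")))
        elif path.suffix == ".bib":
            candidates.append(path)
    return candidates


# In a directory holding only Python files this prints an empty list.
print(bib_candidates(["."]))
```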
If a file can't be merged (because it is a duplicate or because it fails to parse), add it to a list to be printed later.
I need all combinations of the following:
~/path/to/files)
*.bib)
Additionally, rename the variable to paths_args.
There are some BibTeX files for which ref -tv does not return a useful output message. In fact, it returns nothing.
Consider the file 10.1371__journal.pone.0115069.bib:
@article{10.1371/journal.pone.0115069,
author = {Knauff, , Markus AND Nejasmic, , Jelica},
journal = {PLoS ONE},
publisher = {Public Library of Science},
title = {An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development},
year = {2014},
month = {12},
volume = {9},
url = {http://dx.doi.org/10.1371%2Fjournal.pone.0115069},
pages = {e115069},
abstract = {<p>The choice of an efficient document preparation system is an important decision for any academic researcher. To assist the research community, we report a software usability study in which 40 researchers across different disciplines prepared scholarly texts with either Microsoft Word or LaTeX. The probe texts included simple continuous text, text with tables and subheadings, and complex text with several mathematical equations. We show that LaTeX users were slower than Word users, wrote less text in the same amount of time, and produced more typesetting, orthographical, grammatical, and formatting errors. On most measures, expert LaTeX users performed even worse than novice Word users. LaTeX users, however, more often report enjoying using their respective software. We conclude that even experienced LaTeX users may suffer a loss in productivity when LaTeX is used, relative to other document preparation systems. Individuals, institutions, and journals should carefully consider the ramifications of this finding when choosing document preparation strategies, or requiring them of authors.</p>},
number = {12},
doi = {10.1371/journal.pone.0115069}
}
$ ref -tv 10.1371__journal.pone.0115069.bib
/home/username/library/bib/10.1371__journal.pone.0115069.bib
The refmanage.fs_utils.gen_verbose_msg method needs to be refactored to handle this kind of problem with the file format. Fortunately, pybtex raises different types of exceptions, so the solution is to inspect the exception type and create a verbose string based on this information.
Consider the example file above and invalid.bib from 9a78a37.
>>> import pathlib2 as pathlib
>>> import refmanage
>>> p = pathlib.Path("10.1371__journal.pone.0115069.bib")
>>> bib_p = refmanage.fs_utils.parse_bib_file(p)
>>> u = pathlib.Path("invalid.bib")
>>> bib_u = refmanage.fs_utils.parse_bib_file(u)
>>> bib_p.__class__
pybtex.exceptions.PybtexError
>>> bib_u.__class__
pybtex.scanner.TokenRequired
So for pybtex.exceptions.PybtexError, gen_verbose_msg should just return the message attribute of the exception. But for pybtex.scanner.TokenRequired, gen_verbose_msg should construct the string it is already constructing.
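The type-based dispatch can be sketched as follows. To keep the example runnable without pybtex installed, it defines two stand-in exception classes; their attributes and the subclass relationship are assumptions modelled on the session above, and the string built for `TokenRequired` only approximates the existing verbose output.

```python
class PybtexError(Exception):
    """Stand-in for pybtex.exceptions.PybtexError (assumed to carry `message`)."""

    def __init__(self, message):
        super(PybtexError, self).__init__(message)
        self.message = message


class TokenRequired(PybtexError):
    """Stand-in for pybtex.scanner.TokenRequired (attributes are assumed)."""

    error_type = "syntax error"

    def __init__(self, message, lineno, context):
        super(TokenRequired, self).__init__(message)
        self.lineno = lineno
        self._context = context

    def get_context(self):
        return self._context


def gen_verbose_msg(exc):
    """Build a verbose message based on the type of the stored exception."""
    # Check the more specific class first, since it subclasses PybtexError here.
    if isinstance(exc, TokenRequired):
        return "{}: {}\n{} {}".format(
            exc.error_type, exc.message, exc.lineno, exc.get_context()
        )
    if isinstance(exc, PybtexError):
        return exc.message
    # Anything else falls back to the generic string form.
    return str(exc)


print(gen_verbose_msg(PybtexError("unparseable author field")))
print(gen_verbose_msg(TokenRequired("entry key expected", 228, "@Book{,")))
```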
The command-line application should be called ref.
Currently, the cat_db method is located in the namespace as follows: refmanage.db_utils.cat_db. Since it sits below db_utils, the "db" in the method name may be redundant and could be dropped. Obviously a method named "cat" located in a "database utilities" submodule is going to concatenate databases.
I noted (#41 (comment)) that instead of making a single BibFile class, refmanage should actually have two classes, one for files containing valid BibTeX, and the other for files containing invalid BibTeX. Both classes should be children of a common RefFile class which is not intended to be instantiated.
At the end of this refactor, three new classes (with tests) will exist:
RefFile (parent)
BibFile
NonbibFile
In addition, I will add a method in fs_utils.py which takes a pathlib.Path and outputs the appropriate RefFile child object, depending on the parseability of the file located at that pathlib.Path.
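One way the hierarchy and the factory could fit together is sketched below. The three class names come from the issue; everything else is assumed for illustration, including the factory name `reffile_factory`, the attribute names, and the injected `parse` callable that stands in for the pybtex parser.

```python
from pathlib import Path


class RefFile(object):
    """Common parent; refuses direct instantiation."""

    def __init__(self, path):
        if type(self) is RefFile:
            raise TypeError("RefFile is not intended to be instantiated")
        self.path = Path(path)


class BibFile(RefFile):
    """A file containing valid BibTeX."""

    def __init__(self, path, bib):
        super(BibFile, self).__init__(path)
        self.bib = bib


class NonbibFile(RefFile):
    """A file that could not be parsed; keeps the exception for later messages."""

    def __init__(self, path, error):
        super(NonbibFile, self).__init__(path)
        self.error = error


def reffile_factory(path, parse):
    """Return a BibFile or NonbibFile depending on whether `parse(path)` succeeds.

    `parse` is a hypothetical callable that returns parsed data or raises.
    """
    try:
        bib = parse(path)
    except Exception as exc:
        return NonbibFile(path, exc)
    return BibFile(path, bib)


def always_fails(path):
    raise ValueError("entry key expected")


print(type(reffile_factory("good.bib", lambda p: {"entries": 1})).__name__)
print(type(reffile_factory("bad.bib", always_fails)).__name__)
```

Injecting the parser keeps the factory testable without any real .bib files on disk.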
The command-line logic of this program should be tested. I don't know how to write those tests right now, but I imagine I will end up testing the API as well as the functionality. Testing the command-line functionality probably looks a lot like typical unit tests.
Dustin Collins has a writeup that may be useful.
There are a number of functionalities that should be combined into a class:
fs_utils.construct_bib_dict
fs_utils.parse_bib_file
fs_utils.gen_terse_msg
fs_utils.gen_verbose_msg
fs_utils.gen_bib_dict_test_msg
If the target is explicitly specified, throw a warning and ignore it as a source. If it isn't explicitly specified (i.e. it comes in via *.bib), quietly ignore it.
This issue pertains specifically to testing the "test" command-line functionality and is split from issue #21.
To test the "test" functionality, I should test:
Required to close #19
The refmanage "test" functionality should have flags to show either a list of unparseable files or a list of parseable files. These flags should be mutually exclusive. For parseable files, the flag should be "-p" or "--parseable". For unparseable files the flag should be "-u" or "--unparseable". If neither flag is used, the default behavior should be to display a list of unparseable files.
This issue is split from issue #32.
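argparse supports this directly through mutually exclusive groups. The sketch below is a minimal illustration in which the function name `build_flag_parser` and the `show` destination are assumptions; the flag names are the ones proposed above.

```python
import argparse


def build_flag_parser():
    """Parser with mutually exclusive -u/-p flags; -u is the default."""
    parser = argparse.ArgumentParser(prog="ref")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-u", "--unparseable", dest="show",
                       action="store_const", const="unparseable")
    group.add_argument("-p", "--parseable", dest="show",
                       action="store_const", const="parseable")
    # With neither flag present, fall back to listing unparseable files.
    parser.set_defaults(show="unparseable")
    parser.add_argument("paths_args", nargs="*")
    return parser


parser = build_flag_parser()
print(parser.parse_args(["one.bib"]).show)
print(parser.parse_args(["-p", "one.bib"]).show)
```

Passing both flags makes argparse print a usage error and exit, so the program itself never has to check for the conflict.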
The refmanage application will have many functionalities; I've listed the "test" functionality in issue #19, but there will be others. For example: replace all entries' BibTeX keys with the entry's DOI in a specified .bib file.
These different functionalities are very different and will require different sets of arguments. Therefore, it makes sense to use argparse sub-commands to implement the different functionalities.
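A minimal sketch of that layout with `argparse.ArgumentParser.add_subparsers`. The sub-command names `test` and `merge` appear elsewhere in these issues; the function name `build_subcommand_parser`, the `command` destination, and the specific arguments attached to each sub-command are assumptions for illustration.

```python
import argparse


def build_subcommand_parser():
    """One sub-parser per functionality, each with its own arguments."""
    parser = argparse.ArgumentParser(prog="ref")
    subparsers = parser.add_subparsers(dest="command")

    test_parser = subparsers.add_parser("test", help="report unparseable files")
    test_parser.add_argument("-v", "--verbose", action="store_true")
    test_parser.add_argument("paths_args", nargs="*")

    merge_parser = subparsers.add_parser("merge", help="merge BibTeX files")
    merge_parser.add_argument("paths_args", nargs="*")

    return parser


args = build_subcommand_parser().parse_args(["test", "-v", "one.bib", "two.bib"])
print(args.command, args.verbose, args.paths_args)
```

Because each sub-parser owns its arguments, a flag such as -v can mean different things, or not exist at all, under a different sub-command.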
This flag returns the version of the application. I should be able to pass the data from refmanage.__version__.
This option is mutually exclusive of all other options.
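argparse's built-in `version` action is one way to get this behavior: it prints the version string and exits as soon as the flag is seen, so no other option takes effect. In the sketch below the flag name `--version` is an assumption (the text above does not spell it out), and the version string is a placeholder for `refmanage.__version__`.

```python
import argparse

# Placeholder; in the real package this would be refmanage.__version__.
__version__ = "0.1.1"


def build_version_parser(version):
    """Parser whose --version flag prints the version and exits."""
    parser = argparse.ArgumentParser(prog="ref")
    parser.add_argument("--version", action="version",
                        version="%(prog)s {}".format(version))
    parser.add_argument("-t", "--test", action="store_true")
    return parser


parser = build_version_parser(__version__)
print(parser.parse_args(["-t"]).test)
```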
I'm not happy with the way the output is formatted and presented when refmanage is called with the "--test" flag. Here is a list of suggestions:
I don't like the fully-qualified pathname when relative paths are passed as arguments. The following is jarring to see:
$ ref -t *.bib
The following files are unparseable:
/home/username/path/to/bibtex/files/one.bib
/home/username/path/to/bibtex/files/two.bib
/home/username/path/to/bibtex/files/three.bib
The result should look like
$ ref -t *.bib
one.bib
two.bib
three.bib
That said, I'm not sure how to deal with absolute pathnames passed as a cli argument.
There should be two mutually exclusive flags that display either the parseable or unparseable files. The flag to display unparseable files should be the default.
$ ref -t -u *.bib # -u displays unparseable
unparseable1.bib
unparseable2.bib
unparseable3.bib
$ ref -t -p *.bib # -p displays parseable
parseable1.bib
parseable2.bib
parseable3.bib
There should be more useful output describing the problems with unparseable files. Probably the "--verbose" flag can handle this case.
Create a new bugfix tag at commit fa60327: 0.1.1.
This information should be accessible like so:
import refmanage
print(refmanage.__version__)
The current documentation is way too much.
The user should have the ability to specify source files to ignore. For example, the user wants to merge all .bib files except no_merge.bib.
Currently the published documentation covers only the docstrings found in the library modules. The command line docs should also be included.
Attempting to install version 0.1.1-0.0.1 in a conda virtual environment throws a bunch of errors. I think there's a problem in the setup.py in terms of the dependencies required for this package.
Create a conda virtual environment and attempt to install version 0.1.1-0.0.1 from github.
$ cd test
$ conda create -p ./env anaconda
$ source activate ./env/
$ pip install git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1#egg=refmanage
Downloading/unpacking refmanage from git+git://github.com/jrsmith3/refmanage.git@0.1.1-0.0.1
Cloning git://github.com/jrsmith3/refmanage.git (to 0.1.1-0.0.1) to /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
Running setup.py (path:/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py) egg_info for package refmanage
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>
import refmanage
File "refmanage/__init__.py", line 11, in <module>
from fs_utils import *
File "refmanage/fs_utils.py", line 4, in <module>
import pathlib2 as pathlib
ImportError: No module named pathlib2
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 17, in <module>
File "/private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage/setup.py", line 3, in <module>
import refmanage
File "refmanage/__init__.py", line 11, in <module>
from fs_utils import *
File "refmanage/fs_utils.py", line 4, in <module>
import pathlib2 as pathlib
ImportError: No module named pathlib2
----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/var/folders/qs/jhzvfv7x41v8p_6y91p044ww0000gn/T/pip_build_jrsmith3/refmanage
Storing debug log for failure in /Users/jrsmith3/.pip/pip.log
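The traceback shows setup.py importing refmanage, which in turn imports pathlib2 before pip has had a chance to install it. One common workaround, sketched here as an assumption about how setup.py could obtain the version, is to read `__version__` from the source text instead of importing the package. The helper name `read_version` is hypothetical.

```python
import re


def read_version(init_text):
    """Extract the __version__ string from the text of an __init__.py file."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", init_text, re.MULTILINE
    )
    if match is None:
        raise RuntimeError("no __version__ assignment found")
    return match.group(1)


# Example source text; the real setup.py would read refmanage/__init__.py.
sample = '"""Manage a BibTeX database."""\n__version__ = "0.1.1"\nfrom fs_utils import *\n'
print(read_version(sample))
```

With the version obtained this way, setup.py no longer needs the package's runtime dependencies to be importable at install time.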
Once issue #34 has been closed, the functionality of the "--verbose" flag should be refactored to output more useful information about the nature of a file's unparseability. Output information should include:
This information can be determined by storing and later inspecting the Exception object created when a parsing error occurs.
First, refmanage.fs_utils.import_bib_files should be refactored to store the Exception object created when a parsing error occurs instead of storing None in the dictionary it returns.
These Exception objects have a number of parameters which can be queried to determine exactly what happened to cause the parser to fail. For example:
>>> from pybtex.database.input import bibtex
>>> from pybtex.exceptions import PybtexError
>>> parser = bibtex.Parser()
>>> try:
...     parser.parse_file("bookshelf.bib")
... except PybtexError as e:
...     pass
...
>>> e.error_type
'syntax error'
>>> e.message
u'entry key expected'
>>> e.lineno
228
>>> e.get_context()
u'@Book{,\n ^^^'
For each unparseable file, the refmanage CLI should output the filename (like it does with the default behavior), then each of the items in the list above, indented with a tab.
$ ref --test --verbose bookshelf.bib
bookshelf.bib
type: syntax error
message: entry key expected
line number: 228
context: @Book{,\n ^^^
This issue is split from #32.
The above example assumes that issue #33 has not been closed.
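The tab-indented layout above could be produced by a small formatter like the one below. It is only a sketch: `format_verbose` is a hypothetical name, and the example feeds it a `SimpleNamespace` that mimics the four exception attributes queried in the session above, so pybtex is not needed to run it.

```python
from types import SimpleNamespace


def format_verbose(filename, exc):
    """Return the filename followed by tab-indented details of the error."""
    fields = [
        ("type", exc.error_type),
        ("message", exc.message),
        ("line number", exc.lineno),
        ("context", exc.get_context()),
    ]
    lines = [filename]
    for label, value in fields:
        lines.append("\t{}: {}".format(label, value))
    return "\n".join(lines)


# Stand-in object carrying the same attributes as the exception shown above.
fake_exc = SimpleNamespace(
    error_type="syntax error",
    message="entry key expected",
    lineno=228,
    get_context=lambda: "@Book{,",
)
print(format_verbose("bookshelf.bib", fake_exc))
```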