
duecredit


duecredit is being developed to address the problem of inadequate citation of scientific software and methods, and the limited visibility of donation requests for open-source software.

It provides a simple framework (at the moment for Python only) for embedding publication and other references in the original code, so that they are automatically collected and reported to the user at the appropriate level of detail: if a package provides multiple citeable implementations, only references for the functionality actually used are presented.

Installation

duecredit is easy to install via pip; simply type:

pip install duecredit

Examples

To cite the modules and methods you are using

You can already start "registering" citations using duecredit in your Python modules, and even register citations for modules that do not (yet) use duecredit themselves (we call this approach "injections"). duecredit remains an optional dependency, i.e. your software will work correctly even without duecredit installed.

For example, list the citations of the modules and methods yourproject uses with a few simple commands:

cd /path/to/yourproject # change into the directory containing the main code base
python -m duecredit yourproject.py

Or display them in BibTeX format:

duecredit summary --format=bibtex

See this gif animation for a better illustration.

To let others cite your software

For using duecredit in your software

  1. Copy duecredit/stub.py to your codebase, e.g.

     wget -q -O /path/to/yourmodule/due.py \
       https://raw.githubusercontent.com/duecredit/duecredit/master/duecredit/stub.py
    

    Note that it might be better not to name it duecredit.py, so it does not shadow an installed duecredit.

  2. Then import due and the necessary entry types in your code:

     from .due import due, Doi, BibTeX
    

    To provide a generic reference for the entire module just use e.g.

      due.cite(Doi("1.2.3/x.y.z"), description="Solves all your problems", path="magicpy")
    

    By default, the added reference does not show up in the summary report (but see the User-view section below). If your reference is for a core package and you find that it should be listed in the summary, then set cite_module=True (see here for a complete description of the arguments):

      due.cite(Doi("1.2.3/x.y.z"), description="The Answer to Everything", path="magicpy", cite_module=True)
    

    Similarly, to provide a direct reference for a function or a method, use the dcite decorator (which sets cite_module=True by default):

      @due.dcite(Doi("1.2.3/x.y.z"), description="Resolves constipation issue")
      def pushit():
          ...
    

    You can easily obtain a DOI for your software using Zenodo.org and a few other DOI providers.

References can also be entered as BibTeX entries:

    due.cite(BibTeX("""
            @article{mynicearticle,
            title={A very cool paper},
            author={Happy, Author and Lucky, Author},
            journal={The Journal of Serendipitous Discoveries}
            }
            """),
            description="Solves all your problems", path="magicpy")
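The stub pattern from step 1 can be sketched as follows. This is a simplified, hypothetical rendition of the idea behind duecredit/stub.py (the real due.py is more elaborate): use duecredit when available, otherwise degrade to no-op stand-ins so client code runs unchanged.

```python
# Simplified sketch of the stub idea (the real duecredit/stub.py is more
# complete): if duecredit is installed, use it; otherwise fall back to
# do-nothing stand-ins so the decorated code works either way.
try:
    from duecredit import due, Doi, BibTeX
except ImportError:
    class _InactiveDue(object):
        """No-op stand-in: citing does nothing, decorating changes nothing."""
        def cite(self, *args, **kwargs):
            pass  # silently ignore module-level citations

        def dcite(self, *args, **kwargs):
            def decorator(func):
                return func  # return the function unmodified
            return decorator

    due = _InactiveDue()
    Doi = BibTeX = str  # entry types degrade to plain strings


@due.dcite(Doi("1.2.3/x.y.z"), description="hypothetical example")
def pushit():
    return "pushed"
```

Either way, calling pushit() behaves identically; citations are only collected when duecredit is installed and enabled.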

Now what

Do the due

Once you have obtained the references in the duecredit output, include them in the references section of your paper or software.

Add injections for other existing modules

We hope that eventually this somewhat cruel approach will not be necessary. But until other packages support duecredit "natively", we provide a way to "inject" citations for modules, functions, and methods: the citations are attached to the corresponding functionality when those modules are imported.

All injections are collected under duecredit/injections. See any file there with the mod_ prefix for a complete example. Overall it is just a regular Python module defining a function inject(injector), which adds new entries to the injector; the injector in turn registers those entries with duecredit whenever the corresponding module gets imported.
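Following that description, a minimal injection module might look like the sketch below. Note that the injector.add() signature used here is an assumption for illustration; consult the real mod_ files under duecredit/injections for the actual interface.

```python
# Hypothetical mod_somepkg.py-style injection module. The injector.add()
# signature below is assumed for illustration -- see the real mod_* files
# under duecredit/injections for the actual interface.

def inject(injector):
    # Register a citation to be attached to `somepkg` when it is imported.
    injector.add(
        "somepkg",                        # module to watch for
        None,                             # object within the module (None = whole module)
        "Doe, J., 2020. Somepkg paper.",  # reference entry (stand-in for Doi/BibTeX)
        description="A package that does something",
        tags=["implementation"],
    )


# Demo with a dummy injector that simply records what was registered:
class _RecordingInjector:
    def __init__(self):
        self.entries = []

    def add(self, *args, **kwargs):
        self.entries.append((args, kwargs))


injector = _RecordingInjector()
inject(injector)
```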

User-view

By default duecredit does exactly nothing: the decorators do not decorate and the cite functions simply return, so there is no fear that it would break anything. Whenever anyone runs an analysis that uses your code with the DUECREDIT_ENABLE=yes environment variable set (or via python -m duecredit) and invokes any of the cited functions/methods, the collected bibliography is presented on screen at the end of the run and pickled into a .duecredit.p file in the current directory (or into the file named by your DUECREDIT_FILE environment setting):

$> python -m duecredit examples/example_scipy.py
I: Simulating 4 blobs
I: Done clustering 4 blobs

DueCredit Report:
- Scientific tools library / numpy (v 1.10.4) [1]
- Scientific tools library / scipy (v 0.14) [2]
  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.14) [3]

2 packages cited
0 modules cited
1 function cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
[2] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[3] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30–34.
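As a concrete (hypothetical) invocation of the environment variables described above, enabling collection for a single run and redirecting the pickled output via DUECREDIT_FILE might look like this; `python -c ...` stands in for a real analysis script:

```shell
# Enable duecredit for one run and redirect the collected citations into a
# custom pickle via DUECREDIT_FILE; the inline `python -c` command is a
# stand-in for a real analysis script.
DUECREDIT_ENABLE=yes DUECREDIT_FILE=/tmp/myrefs.duecredit.p \
    python -c "print('analysis done')"
```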

Incremental runs of various software keep enriching that file. You can then use the duecredit summary command to display that information again (as stored in the .duecredit.p file) or to export it as a BibTeX file ready for reuse, e.g.:

$> duecredit summary --format=bibtex
@article{van2011numpy,
        title={The NumPy array: a structure for efficient numerical computation},
        author={Van Der Walt, Stefan and Colbert, S Chris and Varoquaux, Gael},
        journal={Computing in Science \& Engineering},
        volume={13},
        number={2},
        pages={22--30},
        year={2011},
        publisher={AIP Publishing}
        }
@Misc{JOP+01,
      author =    {Eric Jones and Travis Oliphant and Pearu Peterson and others},
      title =     {{SciPy}: Open source scientific tools for {Python}},
      year =      {2001--},
      url = "http://www.scipy.org/",
      note = {[Online; accessed 2015-07-13]}
    }
@article{sibson1973slink,
        title={SLINK: an optimally efficient algorithm for the single-link cluster method},
        author={Sibson, Robin},
        journal={The Computer Journal},
        volume={16},
        number={1},
        pages={30--34},
        year={1973},
        publisher={Br Computer Soc}
    }

By default only references for "implementation" are listed; listing references for other tags can be enabled as well (e.g. "edu", which marks instructional materials such as textbooks on the topic):

$> DUECREDIT_REPORT_TAGS=* duecredit summary

DueCredit Report:
- Scientific tools library / numpy (v 1.10.4) [1]
- Scientific tools library / scipy (v 0.14) [2]
  - Hierarchical clustering / scipy.cluster.hierarchy (v 0.14) [3, 4, 5, 6, 7, 8, 9]
  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.14) [10, 11]

2 packages cited
1 module cited
1 function cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
[2] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[3] Sneath, P.H. & Sokal, R.R., 1962. Numerical taxonomy. Nature, 193(4818), pp.855–860.
[4] Batagelj, V. & Bren, M., 1995. Comparing resemblance measures. Journal of classification, 12(1), pp.73–90.
[5] Sokal, R.R., Michener, C.D. & University of Kansas, 1958. A Statistical Method for Evaluating Systematic Relationships, University of Kansas.
[6] Jain, A.K. & Dubes, R.C., 1988. Algorithms for clustering data, Prentice-Hall, Inc..
[7] Johnson, S.C., 1967. Hierarchical clustering schemes. Psychometrika, 32(3), pp.241–254.
[8] Edelbrock, C., 1979. Mixture model tests of hierarchical clustering algorithms: the problem of classifying everybody. Multivariate Behavioral Research, 14(3), pp.367–384.
[9] Fisher, R.A., 1936. The use of multiple measurements in taxonomic problems. Annals of eugenics, 7(2), pp.179–188.
[10] Gower, J.C. & Ross, G., 1969. Minimum spanning trees and single linkage cluster analysis. Applied statistics, pp.54–64.
[11] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30–34.

The DUECREDIT_REPORT_ALL flag makes duecredit output references even for modules that lack cited objects or functions. Compared to the previous example, the following output additionally shows a reference for scikit-learn, since example_scipy.py uses an (uncited) function from that package.

$> DUECREDIT_REPORT_TAGS=* DUECREDIT_REPORT_ALL=1 duecredit summary

DueCredit Report:
- Scientific tools library / numpy (v 1.10.4) [1]
- Scientific tools library / scipy (v 0.14) [2]
  - Hierarchical clustering / scipy.cluster.hierarchy (v 0.14) [3, 4, 5, 6, 7, 8, 9]
  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.14) [10, 11]
- Machine Learning library / sklearn (v 0.15.2) [12]

3 packages cited
1 module cited
1 function cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
[2] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[3] Sneath, P.H. & Sokal, R.R., 1962. Numerical taxonomy. Nature, 193(4818), pp.855–860.
...

Tags

You are welcome to introduce new tags specific to your citations, but for consistency across projects we hope you will use the following tags:

  • implementation (default) — an implementation of the cited method
  • reference-implementation — the original implementation of the cited method (ideally by the authors of the paper)
  • another-implementation — some other implementation of the method, e.g. a citation for an alternative implementation of a method you have implemented in your code and for which you have already provided an implementation or reference-implementation tag
  • use — publications demonstrating a noteworthy use of the method
  • edu — tutorials, textbooks and other materials useful for learning more about the cited functionality
  • donate — commonly used with URL entries pointing to websites that describe how to contribute funds to the referenced project
  • funding — points to the sources of funding which supported the implementation of the given functionality and/or the development of the method
  • dataset — for datasets
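Attaching one of these tags to a citation could look like the sketch below. The tags keyword is suggested by the DUECREDIT_REPORT_TAGS examples above but is hedged here as an assumption; the no-op fallback keeps the snippet runnable even without duecredit installed.

```python
# Hedged sketch: tagging a citation as educational material. The `tags`
# keyword is assumed from the DUECREDIT_REPORT_TAGS behaviour shown above.
try:
    from duecredit import due, Doi
except ImportError:  # no duecredit installed: fall back to no-op stand-ins
    class _Due:
        def dcite(self, *args, **kwargs):
            return lambda func: func  # pass the function through untouched

    due, Doi = _Due(), str


@due.dcite(Doi("1.2.3/x.y.z"),
           description="Textbook treatment of single-link clustering",
           tags=["edu"])
def single_link(distances):
    # Toy stand-in for a real method: the single-link distance between two
    # clusters is the minimum pairwise distance.
    return min(distances)
```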

Ultimate goals

Reduce demand for prima ballerina projects

Problem: Scientific software is often developed to gain citations of the original publication through use of the software implementing it. Unfortunately, this established procedure discourages contributions to existing projects and fosters new projects being developed from scratch.

Solution: With easy ways to provide all-and-only relevant references for used functionality within a large(r) framework, scientific developers will prefer to contribute to already existing projects.

Benefits: As a result, scientific developers will immediately benefit from adhering to proper development procedures (codebase structuring, testing, etc.) and from the established delivery and deployment channels that existing projects already have. This will increase the efficiency and standardization of scientific software development, thus addressing many (if not all) of the core problems with scientific software that everyone likes to bash about (reproducibility, longevity, etc.).

Adequately reference core libraries

Problem: Scientific software often, if not always, uses third-party libraries (e.g. NumPy, SciPy, ATLAS) which might not even be visible at the user level. They are therefore rarely referenced in publications, despite providing the fundamental core for solving the scientific problem at hand.

Solution: With automated bibliography compilation for all used libraries, such projects and their authors would get a chance to be adequately cited.

Benefits: Adequate appreciation of scientific software development. Coupled with a solution to the "prima ballerina" problem, more contributions will flow into core/foundational projects, making new methodological developments readily available to even wider audiences without a proliferation of low-quality scientific software.

Similar/related projects

sempervirens -- an experimental prototype for gathering anonymous, opt-in usage data for open scientific software. Eventually we aim for duecredit either to provide similar functionality (since we collect such information as well) or simply to interface/report to sempervirens.

citepy -- Easily cite software libraries using information automatically gathered from their package repository.

Currently used by

This is a running list of projects that use DueCredit natively. If you are using DueCredit, or plan to use it, please consider sending a pull request to add your project to this list. Thanks to @fedorov for the idea.

Last updated 2024-02-23.

duecredit's People

Contributors

a-detiste, adityasavara, bdrung, clbarnes, dependabot[bot], dvolgyes, effigies, emirvine, fepegar, jwilk, jwodder, katrinleinweber, lesteve, marcelzwiers, mslw, mvdoc, ofgulban, orbeckst, raamana, sanjaymsh, yarikoptic


duecredit's Issues

zenodo and "unofficial" bibtex entry types

As shown in #76, Zenodo's BibTeX entries are saved with the @data entry type, which is an unofficial BibTeX entry type and thus unsupported by citeproc-py (which we use to parse BibTeX into formatted text).

A possible workaround is to change @data to @article when parsing the text, but remember the original version when outputting to BibTeX. What do you think @yarikoptic?

test_no_double_activation on injector fails on travis and locally with Python 2.7.{6,9}

Locally (in a virtualenv) with Python 2.7.6 I get:

======================================================================
FAIL: duecredit.tests.test_injections.test_no_double_activation
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/contematto/github/duecredit/duecredit/tests/test_injections.py", line 143, in test_no_double_activation
    assert_true(__builtin__.__import__ is duecredited__import__) # we didn't decorate again
AssertionError: False is not true
-------------------- >> begin captured logging << --------------------
duecredit: WARNING: Seems that we are calling duecredit_importer twice. No harm is done but shouldn't happen
duecredit: WARNING: _orig_import is not yet known, so we haven't decorated default importer yet. Nothing TODO
--------------------- >> end captured logging << ---------------------

Outside the virtualenv, with Python 2.7.9, it works fine. However, it fails on travis: https://travis-ci.org/duecredit/duecredit/jobs/75193538

Support RRIDs

Maybe we could even get a canonical reference from the RRID entry, thus not requiring both to be listed.

Provide support for arbitrary cmdline commands

via concise wrapper scripts (in bash or Python) provided on an overloaded PATH, so that e.g. calling feat from FSL with specific parameters would trigger the proper citation in duecredit.
Establishing this at this level, instead of e.g. decorating interfaces within nipype, would make it reusable across all "pipeline" engines or toolkits that call out to those tools.

Problems to keep in mind: name conflicts. Some frameworks (rarely, but it happens) provide cmdline tools with conflicting names (e.g. "cluster"). To mitigate this we could collaborate with developers (as was done e.g. for cmtk) to provide a gateway runner (as is done for git, cmtk, etc.), which we would decorate instead, providing citations based on the specific invoked command.

RFC: rename "kind" and "level" to something more descriptive?

Hi @mvdoc, we should decide whether it stays as is:

  • level is just a string with some semantics attached to its prefix; we should then document/check the given values, or perhaps it should be defined differently,
  • kind I recently added to state what kind of reference it is: a canonical reference to the method/implementation, an example use, or an educational tutorial

Should we stick to those names, or could better ones be chosen?
Maybe "kind" could simply be refactored into a "tags" list, so multiple tags could be provided; we would attach some semantics to "canonical", "use", and "edu", and the rest might come in useful later...

What do you think?

Anyone else reading -- your input would be valuable!

Support system-specific references

I.e. on NeuroDebian we would then provide something like /etc/duecredit/entries/neurodebian.due, which would add entries to be used on this system.
I think I had a similar issue somewhere but can't find it: to reference e.g. the funding which supported establishing a given computing environment.

Outputting to bibtex doesn't filter by used citations

Probably linked to #19

Example output text

$> duecredit summary --format text

DueCredit Report:
- Scientific tools library / numpy (v 1.10.4) [1]
- Scientific tools library / scipy (v 0.17) [2]
  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.17) [3]
- Machine Learning library / sklearn (v 0.17) [4]

3 packages cited
0 modules cited
1 functions cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
[2] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[3] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30–34.
[4] Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, pp.2825–2830.

Example output BibTeX

$> duecredit summary --format bibtex
@article{sneath1962numerical,
        title={Numerical taxonomy},
        author={Sneath, Peter HA and Sokal, Robert R},
        journal={Nature},
        volume={193},
        number={4818},
        pages={855--860},
        year={1962},
        publisher={Nature Publishing Group}
    }
@article{batagelj1995comparing,
        title={Comparing resemblance measures},
        author={Batagelj, Vladimir and Bren, Matevz},
        journal={Journal of classification},
        volume={12},
        number={1},
        pages={73--90},
        year={1995},
        publisher={Springer}
    }
@Misc{JOP+01,
      author =    {Eric Jones and Travis Oliphant and Pearu Peterson and others},
      title =     {{SciPy}: Open source scientific tools for {Python}},
      year =      {2001--},
      url = "http://www.scipy.org/",
      note = {[Online; accessed 2015-07-13]}
    }

...

In PyMVPA, failures are not handled if lxml, types are not available

Might be due to weird setup on discovery?

In [1]: from mvpa2.suite import *
2015-12-02 15:44:10,822 [WARNING] DueCredit internal failure while running <function _get_inactive_due at 0x2b59d722d8c0>: ImportError('No module named lxml',). Please report to developers at https://github.com/duecredit/duecredit/issues (utils.py:73)
2015-12-02 15:44:10,824 [WARNING] DueCredit internal failure while running <function _get_active_due at 0x2b59d722d9b0>: ImportError('cannot import name types',). Please report to developers at https://github.com/duecredit/duecredit/issues (utils.py:73)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-1-972bf264f2d0> in <module>()
----> 1 from mvpa2.suite import *

/ihome/castello/venv/pymvpa-upstream-2.4.0.1-42-g8ebdeed/lib/python2.7/site-packages/mvpa2/__init__.pyc in <module>()
    178 # Setup duecredit entry for the entire PyMVPA
    179 from .support.due import due, Doi
--> 180 due.cite(
    181     Doi("10.1007/s12021-008-9041-y"),
    182     description="Multivariate pattern analysis of neural data",

AttributeError: 'DueSwitch' object has no attribute 'cite'

duecredit on nipype

Hi @yarikoptic,

I am adding references on nipype (nipy/nipype#1466).

When I try to add the Zenodo DOI I don't see any difference in the report. After some attempts I decided to remove the mod_nipype.py injections file from duecredit, and I got this warning:

2016-05-03 16:23:21,072 [WARNING] Failed to obtain bibtex from doi.org, retrying... (io.py:54)
160503-16:23:21,72 duecredit WARNING:
     Failed to obtain bibtex from doi.org, retrying...
2016-05-03 16:23:21,573 [WARNING] DueCredit internal failure while running <function DueSwitch._dump_collector_summary at 0x7f114a0dbd90>: ValueError('Query http://dx.doi.org/10.5281/zenodo.50186 for BibTeX for a DOI 10.5281/zenodo.50186 (wrong doi?) has failed. Response code 406. ',). Please report to developers at https://github.com/duecredit/duecredit/issues (utils.py:76)
160503-16:23:21,573 duecredit.utils WARNING:
     DueCredit internal failure while running <function DueSwitch._dump_collector_summary at 0x7f114a0dbd90>: ValueError('Query http://dx.doi.org/10.5281/zenodo.50186 for BibTeX for a DOI 10.5281/zenodo.50186 (wrong doi?) has failed. Response code 406. ',). Please report to developers at https://github.com/duecredit/duecredit/issues

Is there any other alternative? How can I fix it?

Many thanks.

Best,
Alex

doi importer doesn't work with zenodo dois

see #72

We should look into how http://www.doi2bib.org/ fetches BibTeX entries, because it works for them.

Our way of getting BibTeX entries returns Error 406 (Not Acceptable):

$ curl -LH "Accept: text/bibliography; style=bibtex" http://dx.doi.org/10.5281/zenodo.48147
<html><head><title>Apache Tomcat/7.0.26 - Error report</title><style><!--H1 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:22px;} H2 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:16px;} H3 {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;font-size:14px;} BODY {font-family:Tahoma,Arial,sans-serif;color:black;background-color:white;} B {font-family:Tahoma,Arial,sans-serif;color:white;background-color:#525D76;} P {font-family:Tahoma,Arial,sans-serif;background:white;color:black;font-size:12px;}A {color : black;}A.name {color : black;}HR {color : #525D76;}--></style> </head><body><h1>HTTP Status 406 - Not Acceptable</h1><HR size="1" noshade="noshade"><p><b>type</b> Status report</p><p><b>message</b> <u>Not Acceptable</u></p><p><b>description</b> <u>The resource identified by this request is only capable of generating responses with characteristics not acceptable according to the request "accept" headers (Not Acceptable).</u></p><HR size="1" noshade="noshade"><h3>Apache Tomcat/7.0.26</h3></body></html>%

Provide additional information for a citation, e.g. chapter/pages

Not sure yet whether it should become part of the reference entry (e.g. we could easily add a chapter or pages to the generated bib if doi/bibtex), or more of a comment (which might be more flexible) added to the citation (e.g. an arg "comment" or a more explicit "location" argument for Citation/dcite/cite?). An example would be referencing 'edu' materials from a textbook and wanting to point to a specific location within that reference. If the original reference is a Url, it might be an "#id" to be added to the url (too specific for Url though, but smth to keep in mind).

Web interface to add references to code for crowdsourcing?

I was thinking about something along the lines of GitHub code review: you click on a function or method and can insert a citation that goes into an injection file (maybe using the Google Scholar API to automatically search for the DOI). This would allow anyone to contribute. Of course it would be reviewed periodically.

provide convenience to collate "users" of the toolbox/method

It is a bit cumbersome ATM in sphinx/rst to collate those; see PyMVPA/PyMVPA#282.

It would be neat to have a simple helper which used duecredit functionality for fetching entries and formatting output:

users_due = due.collate([
 Doi("1.0.0"),
 Doi("1.2.3"),
 BibTeX("whatever"), ...])
users_due.output(filename="../doc/publications_users.rst", groupby="year")
users_due.output(filename="../doc/publications_users.bib")

or maybe just an mvpa2/due_users.py with

due.cite(Doi("1.0.0"), kind="use")

and later call due.filter(kind="use").output(...)?

or something like that, so those could become part of the website

Nested `@dcite`s should result in a single one collating all citations

So we do not nest too deeply without necessity. I guess @dcite should just inspect the function it is wrapping, and if it is already wrapped with duecredit, internally store that additional reference in the "list of references" associated with that function, and then go through them upon call.

references a package even if no cited functions/methods were used

@mvdoc Why is sklearn listed? (Not that it shouldn't be, since we use its functionality here, but none of the decorated methods was called AFAIK.)

$> python -m duecredit examples/example_scipy.py
I: Simulating 4 blobs           
I: Done clustering 4 blobs

DueCredit Report:
- scipy (v 0.14.1) [1]
  - scipy.cluster.hierarchy:linkage (v 0.14.1) [2]
- sklearn (v 0.16.1) [3]

2 packages cited
0 modules cited
1 functions cited

References
----------

[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[2] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30–34.
[3] Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, pp.2825–2830.

test_noincorrect_import_if_no_lxml fails on my laptop (and on travis)

I get the same error on my laptop and on travis (see https://travis-ci.org/duecredit/duecredit/jobs/128580818)

======================================================================
FAIL: duecredit.tests.test_api.test_noincorrect_import_if_no_lxml
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/contematto/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/contematto/github/duecredit/duecredit/tests/test_api.py", line 82, in test_noincorrect_import_if_no_lxml
    assert_equal(ret, 1)
AssertionError: 0 != 1

======================================================================
FAIL: duecredit.tests.test_api.test_noincorrect_import_if_no_lxml_numpy({'cmd': 'import duecredit; import numpy as np; print("done123")'}, {'DUECREDIT_ENABLE': 'yes'})
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/contematto/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/contematto/github/duecredit/duecredit/tests/test_api.py", line 107, in check_noincorrect_import_if_no_lxml_numpy
    assert_in('For formatted output we need citeproc', out)
AssertionError: 'For formatted output we need citeproc' not found in u"done123\n\nDueCredit Report:\n- Scientific tools library / scipy (v 0.14) [1]\n  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.14) [2]\n\n1 packages cited\n0 modules cited\n1 functions cited\n\nReferences\n----------\n\n[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.\n[2] 2016-05-07 19:23:58,362 [WARNING] DueCredit internal failure while running <function _dump_collector_summary at 0x102491a28>: UnicodeEncodeError('ascii', u'Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30\\u201334.', 128, 129, 'ordinal not in range(128)'). Please report to developers at https://github.com/duecredit/duecredit/issues (utils.py:76)\n"

======================================================================
FAIL: duecredit.tests.test_api.test_noincorrect_import_if_no_lxml_numpy({'script': '/Users/contematto/github/duecredit/duecredit/tests/envs/stubbed/script.py'}, {'DUECREDIT_ENABLE': 'yes'})
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/contematto/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/contematto/github/duecredit/duecredit/tests/test_api.py", line 107, in check_noincorrect_import_if_no_lxml_numpy
    assert_in('For formatted output we need citeproc', out)
AssertionError: 'For formatted output we need citeproc' not found in u"done123\n\nDueCredit Report:\n- Scientific tools library / scipy (v 0.14) [1]\n  - Single linkage hierarchical clustering / scipy.cluster.hierarchy:linkage (v 0.14) [2]\n- Machine Learning library / sklearn (v 0.15.2) [3]\n\n2 packages cited\n0 modules cited\n1 functions cited\n\nReferences\n----------\n\n[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.\n[2] 2016-05-07 19:23:59,233 [WARNING] DueCredit internal failure while running <function _dump_collector_summary at 0x101497c08>: UnicodeEncodeError('ascii', u'Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30\\u201334.', 128, 129, 'ordinal not in range(128)'). Please report to developers at https://github.com/duecredit/duecredit/issues (utils.py:76)\n"

would not output citations for objects with e.g. 'edu' tag if there are no 'edu' citations for that module

The reason is that dump loops through the modules after filtering by tags, so no package/module references are left, and thus it has no chance to match (there is a filter later on) those 'edu' references. Here is a sample .duecredit.p: http://www.onerussian.com/tmp/.duecredit.p_func_cite_no_module_edu
Just run:

$> DUECREDIT_REPORT_TAGS=edu duecredit summary

DueCredit Report:

0 packages cited
0 modules cited
1 functions cited%      

(note: the output seems to be lacking a trailing newline)

Syntax error in test_utils.py with python 2.7.6

We didn't catch this, I believe, because travis skips it.

duecredit.tests.test_injections.test_injector_del ... ok
duecredit.tests.test_io.test_import_doi ... ok
duecredit.tests.test_io.test_pickleoutput ... ok
duecredit.tests.test_io.test_text_output ... ok
duecredit.tests.test_io.test_text_output_dump_formatting ... ok
Failure: SyntaxError (unqualified exec is not allowed in function '_test_external' it is a nested function (test_utils.py, line 57)) ... ERROR

======================================================================
ERROR: Failure: SyntaxError (unqualified exec is not allowed in function '_test_external' it is a nested function (test_utils.py, line 57))
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/nose/loader.py", line 418, in loadTestsFromName
    addr.filename, addr.module)
  File "/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/nose/importer.py", line 47, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/Users/contematto/virtualenv/duecredit/lib/python2.7/site-packages/nose/importer.py", line 94, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
SyntaxError: unqualified exec is not allowed in function '_test_external' it is a nested function (test_utils.py, line 57)

----------------------------------------------------------------------
Ran 37 tests in 7.152s

FAILED (errors=1)

FYI: related package -- usagestats

see https://pypi.python.org/pypi/usagestats

This package is meant to easily get usage statistics from the users of your program.
Statistics will be collected but won't be uploaded until the user opts in. A message will be printed on stderr asking the user to explicitly opt in or opt out.

could be of use one way or another when implementing centralized service (#82)

support referencing by pubmed?

It seems that there is a web interface, e.g.:

$> wget -q -O- http://www.bioinformatics.org/texmed/cgi-bin/list.cgi?PMID=17255514
<!DOCTYPE html
    PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<head>
<title>TeXMed Articel List</title>
<link rel="stylesheet" type="text/css" href="../texmed.css" />
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
</head>
<body bgcolor="white" text="black">
<h1>Exported References will appear here ...
<p></h1>trying to export 1 references ...
<PRE>
% 17255514 
@Article{pmid17255514,
   Author="Kicheva, A.  and Pantazis, P.  and Bollenbach, T.  and Kalaidzidis, Y.  and Bittig, T.  and Julicher, F.  and Gonzalez-Gaitan, M. ",
   Title="{{K}inetics of morphogen gradient formation}",
   Journal="Science",
   Year="2007",
   Volume="315",
   Number="5811",
   Pages="521--525",
   Month="Jan"
}

</PRE>

</body>
</html>%  

So we could parse that output and extract the BibTeX entries.
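A minimal sketch of such parsing, assuming the response format shown above (BibTeX records inside a `<PRE>` block, each starting with `@`, preceded by `% PMID` comment lines):

```python
import re

def extract_bibtex(html):
    """Pull BibTeX entries out of a TeXMed-style HTML response.

    Assumes entries live inside a <PRE> block, as in the sample above.
    """
    m = re.search(r"<PRE>(.*?)</PRE>", html, flags=re.S | re.I)
    if not m:
        return []
    body = m.group(1)
    # keep only '@Type{...}' records, dropping the '% PMID' comment lines
    return re.findall(r"@\w+\{.*?\n\}", body, flags=re.S)

sample = """<PRE>
% 17255514
@Article{pmid17255514,
   Author="Kicheva, A.",
   Year="2007"
}
</PRE>"""
entries = extract_bibtex(sample)
```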

When injecting multiple citations at the same point, only one referenced

See

(duecredit)contematto@talete ~/github/scikit-learn/examples/cluster (master*) $ python -m duecredit plot_cluster_comparison.py

< ...> 

DueCredit Report:
- scipy (v 0.14) [1]
- sklearn (v 0.17.dev0) [2]
  - sklearn.cluster.affinity_propagation_ (v 0.17.dev0) [3]
  - sklearn.cluster.dbscan_:dbscan (v 0.17.dev0) [4]
  - sklearn.cluster.spectral:spectral_clustering (v 0.17.dev0) [5]

2 packages cited
1 modules cited
2 functions cited

References
----------

[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[2] Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, pp.2825–2830.
[3] Frey, B.J. & Dueck, D., 2007. Clustering by Passing Messages Between Data Points. Science, 315(5814), pp.972–976.
[4] Ester, M. et al., 1996. A density-based algorithm for discovering clusters in large spatial databases with noise.. In Kdd. pp. 226–231.
[5] von Luxburg, U., 2007. A tutorial on spectral clustering. Stat Comput, 17(4), pp.395–416.

Yet in mod_sklearn.py there are two injections registered for spectral_clustering:

# sklearn.cluster.spectral
injector.add('sklearn.cluster.spectral', 'discretize', Doi('10.1109/ICCV.2003.1238361'),
             description="Multiclass spectral clustering", tags=['reference'])
injector.add('sklearn.cluster.spectral', 'spectral_clustering', Doi('10.1109/34.868688'),
             description="Spectral clustering", tags=['implementation'])
injector.add('sklearn.cluster.spectral', 'spectral_clustering', Doi('10.1007/s11222-007-9033-z'),
             description="Spectral clustering", tags=['implementation'])

possible bug (race condition) in injector's __import__ handling

See e.g. the failure at https://travis-ci.org/duecredit/duecredit/jobs/74501423#L261. It was "resolved" by the next push, identical in functionality:

  File "/home/travis/virtualenv/python2.7.9/lib/python2.7/site-packages/nose/result.py", line 182, in _exc_info_to_string
    from nose.plugins.skip import SkipTest
  File "/home/travis/build/duecredit/duecredit/duecredit/injections/injector.py", line 228, in __import
    return self._orig_import(name, *args, **kwargs)

TypeError: 'NoneType' object is not callable

For some reason duecredit reports difficulty processing .bib files in python -O mode

$> rm .duecredit.p; DUECREDIT_ALLOW_FAIL=1 DUECREDIT_ENABLE=True venv/bin/python -O examples/example_scipy.py 
I: Simulating 4 blobs
I: Done clustering 4 blobs
DueCredit Report:
- scipy (v 0.14.1) [1]
  - scipy.cluster.hierarchy:linkage (Single linkage hierarchical clustering) [2]
- numpy (v 1.8.2) [3]

2 modules cited
1 functions cited

References
----------

[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[2] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link
        cluster method. The Computer Journal, 16(1), pp.30–34.2015-07-28 09:56:06,625 [ERROR  ] Failed to process BibTeX file /home/yoh/.tmp/tmpo5X208.bib (io.py:196)

[3] ERRORED: u'\\'

support reference / citation for the system

E.g. when using a system which was supported by a grant. So we should load entries from some /etc/duecredit/citations/ directory and include them in every report.
We would probably need a new type of entry: Grant(institution, award), or just a generic Miscellaneous or Text entry which would simply state the acknowledgement in free-text form.
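A minimal sketch of such loading (the directory path, one-entry-per-`.txt`-file layout, and helper name are all assumptions, not anything duecredit implements):

```python
import glob
import os

def load_system_citations(confdir="/etc/duecredit/citations"):
    """Collect free-text acknowledgement entries (one per .txt file)
    dropped into a system-wide directory, for inclusion in every report."""
    entries = []
    for path in sorted(glob.glob(os.path.join(confdir, "*.txt"))):
        with open(path) as f:
            text = f.read().strip()
        if text:
            entries.append(text)
    return entries
```

An empty or missing directory simply yields no extra entries, so reports stay unchanged on systems without such configuration.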

Use pandoc to generate output?

I don't know if we want to add pandoc as a dependency, but we could store all the citations as BibTeX, then produce a Markdown output with stats and citations, and then use pandoc to generate any output format.

Refactoring of the Injector code

Now that we have established the plausibility of the approach and collected a number of unit tests, with some peculiar use cases covered, it might be a good time to refactor the injector code to make it less ad hoc and thus more robust.

Travis skips tests

Is this what we want?

1.92s$ nosetests --with-doctest --with-cov --cover-package duecredit --logging-level=INFO
...................SS..................S.SSSSSS

Provide "Matlab support"

This issue is a stub for discussing support of duecredit in Matlab. It should be quite doable (maybe without fancy decorators and injections; just collect and store to e.g. JSON or any other format for exchange with the Python "core"), and some folks (e.g. Arnaud Delorme of EEGLAB, @nno for CoSMoMVPA?) seem to have liked the idea.
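To make the exchange idea concrete, here is a hypothetical sketch: the record shape a Matlab toolbox might dump as JSON, and a trivial Python-side loader. Every field name here is an assumption for illustration, not an agreed format:

```python
import json

# a possible record a Matlab toolbox could write to disk for the
# Python "core" to pick up; field names are hypothetical
matlab_dump = json.dumps([
    {"path": "eeglab.pop_runica",
     "doi": "10.1016/j.jneumeth.2003.10.009",
     "description": "ICA decomposition",
     "tags": ["implementation"]},
])

def load_matlab_citations(dump):
    """Parse the JSON exchange payload into plain citation dicts."""
    return json.loads(dump)

citations = load_matlab_citations(matlab_dump)
```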

Needs to delay injections for not yet imported submodules until their import time

Hi,

I am getting a warning message when using duecredit with nipype. Please check below.
Might it be the way you are checking for the existence of submodules? Could you point to where in the duecredit code this happens?

2016-05-04 16:14:32,911 [WARNING] Could not find fsl in module <module 'nipype.interfaces' from '/home/alexandre/Software/nipype/nipype/interfaces/__init__.py'>: module 'nipype.interfaces' has no attribute 'fsl' (injector.py:193)
160504-16:14:32,911 duecredit WARNING:
     Could not find fsl in module <module 'nipype.interfaces' from '/home/alexandre/Software/nipype/nipype/interfaces/__init__.py'>: module 'nipype.interfaces' has no attribute 'fsl'
2016-05-04 16:14:32,911 [WARNING] Could not find spm in module <module 'nipype.interfaces' from '/home/alexandre/Software/nipype/nipype/interfaces/__init__.py'>: module 'nipype.interfaces' has no attribute 'spm' (injector.py:193)
160504-16:14:32,911 duecredit WARNING:
     Could not find spm in module <module 'nipype.interfaces' from '/home/alexandre/Software/nipype/nipype/interfaces/__init__.py'>: module 'nipype.interfaces' has no attribute 'spm'

Thanks!

edited by @yarikoptic:
Original title: Warning when looking for nipype submodules
The "bug": injections currently happen when the top-level module is imported, but that may be too early for submodules which are not yet imported at that point. So we need to keep those injections pending until the actual submodule gets imported.

Make "conditions" even more powerful

E.g. to check some attribute of the Xth argument, such as args[0].params.algorithm, where args[0] would correspond to self. Probably just needs an 'eval' or a chain of getattr calls.
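A chain of getattr calls is enough and avoids eval; a minimal sketch (the helper name and the toy classes are ours, for illustration only):

```python
from functools import reduce

def resolve_attr(obj, dotted_path):
    """Follow a dotted attribute path, e.g. 'params.algorithm',
    starting from obj (which would be args[0], i.e. self)."""
    return reduce(getattr, dotted_path.split("."), obj)

# toy stand-ins for an estimator whose condition we want to check
class Params:
    algorithm = "slink"

class Estimator:
    params = Params()

value = resolve_attr(Estimator(), "params.algorithm")
```

A condition spec could then map `(0, 'params.algorithm')` to the set of values for which the citation fires.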

while "dump"ing -- references shouldn't be duplicated even if used in multiple modules

Currently, when running the mvpa2 unit tests, it seems to be even "worse": the reported counts disagree with the number of functions actually cited, some entry indices are skipped, and the same reference is listed multiple times, etc. The pickled collector is http://www.onerussian.com/tmp/.duecredit-mvpa2tests.p

DueCredit Report:
- numpy (v 1.8.2) [1]
- mvpa2 (v 2.4) [2]
  - mvpa2.featsel.rfe:_train (Recursive feature elimination procedure) [3]
  - mvpa2.algorithms.group_clusterthr:_train (Statistical assessment of (searchlight) MVPA results) [5]
  - mvpa2.clfs.transerror:_call (Bayesian hypothesis testing) [7]

2 modules cited
2 functions cited

References
----------

[1] Van Der Walt, S., Colbert, S.C. & Varoquaux, G., 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering, 13(2), pp.22–30.
[2] Hanke, M. et al., 2009. PyMVPA: a Python Toolbox for Multivariate Pattern Analysis of fMRI Data. Neuroinform, 7(1), pp.37–53.
[3] Guyon, I. et al., 2002. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46, pp.389–422.
[4] Stelzer, J., Chen, Y. & Turner, R., 2013. Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA): Random permutations and cluster size control. NeuroImage, 65, pp.69–82.
[5] Guyon, I. et al., 2002. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46, pp.389–422.
[6] Stelzer, J., Chen, Y. & Turner, R., 2013. Statistical inference and multiple testing correction in classification-based multi-voxel pattern analysis (MVPA): Random permutations and cluster size control. NeuroImage, 65, pp.69–82.
[7] Olivetti, E., Veeramachaneni, S. & Nowakowska, E., 2012. Bayesian hypothesis testing for pattern discrimination in brain decoding. Pattern Recognition, 45(6), pp.2075–2084.
DUECREDIT_ENABLE=1 /home/yoh/proj/duecredit/venv-mvpa2-duecredited/bin/python  1985.15s user 45.35s system 122% cpu 27:42.04 total
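A fix would assign indices in a single deduplicating pass over all collected references, so each unique reference gets one stable number no matter how many modules/functions cite it; a minimal sketch (data shapes assumed):

```python
def index_references(refs):
    """Assign each unique reference one stable 1-based index,
    in first-seen order, collapsing duplicates."""
    index = {}
    for ref in refs:
        if ref not in index:
            index[ref] = len(index) + 1
    return index

# e.g. Guyon 2002 is cited from two different functions above,
# but should appear in the report exactly once
refs = ["Guyon 2002", "Stelzer 2013", "Guyon 2002", "Olivetti 2012"]
numbering = index_references(refs)
```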

DUECREDIT_ENABLE doesn't work anymore

Is this just replaced by python -m duecredit?

(duecredit)contematto@talete ~/github/duecredit/examples (master*) $ DUECREDIT_ENABLE=True python example_scipy.py
I: Simulating 4 blobs
I: Done clustering 4 blobs

(duecredit)contematto@talete ~/github/duecredit/examples (master*) $ python -m duecredit example_scipy.py
I: Simulating 4 blobs
I: Done clustering 4 blobs
DueCredit Report:
- scipy (v 0.14) [1]
  - scipy.cluster.hierarchy:linkage (v 0.14) [2]
- sklearn (v 0.17.dev0) [3]

2 packages cited
0 modules cited
1 functions cited

References
----------

[1] Jones, E. et al., 2001. SciPy: Open source scientific tools for Python.
[2] Sibson, R., 1973. SLINK: an optimally efficient algorithm for the single-link cluster method. The Computer Journal, 16(1), pp.30–34.
[3] Pedregosa, F. et al., 2011. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12, pp.2825–2830.

Centralized service (on datalad.org) to collect and collate usage stats

will be useful for any methods/software developer.

  • Unlike referencing only by tracking publications, which sometimes take years to appear, it will be more rapid and up-to-date
  • Would provide stats on the environment (e.g. OS used) which some developers really would like to know
  • Would allow for "self-enrichment", i.e. to tag/provide prototypical/example/demo references for where any given methodology was used; and also track publicly available "implementations" of methodologies (also not possible by merely looking at citations in papers)
  • Should be completely voluntary, like debian popcon submissions.
