
Comments (9)

nno commented on July 22, 2024

I like the idea. CoSMoMVPA does something like this already, but in a toolbox-specific manner. In particular, the function cosmo_check_external internally keeps a list of references. When cosmo_check_external('afni') is called, for example, it records that the AFNI Matlab library has been used. The external toolboxes used can then be listed with cosmo_check_external('-cite'). For an example, see the very end of http://cosmomvpa.org/_static/publish/demo_surface_tfce.html

However, in the CoSMoMVPA case, the code and the citations are in the same file, which does not generalise well. It would probably be better to separate the code that keeps track of citations from the bibliography information. That doesn't seem difficult, though.

Matlab does not have decorators, but code can simply call a function (say duecredit_cite) that uses a persistent variable to keep track of which items have been cited. Alternatively, we could use an object-oriented style, preferably 'old'-style classes to stay Octave-compatible.
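In Matlab the cited items would live in a `persistent` variable inside duecredit_cite.m. Because the duecredit core is written in Python, here is a minimal Python sketch of the same idea, assuming a closure stands in for the persistent variable. `make_cite` is a hypothetical name, not part of duecredit.

```python
def make_cite():
    """Return a cite() function whose state persists between calls,
    mimicking a Matlab function with a persistent variable."""
    cited = []  # lives as long as cite() does, like a persistent variable

    def cite(item=None):
        # Called without arguments: report everything cited so far.
        if item is None:
            return list(cited)
        # Record each item once, however often it is cited.
        if item not in cited:
            cited.append(item)
        return list(cited)

    return cite


duecredit_cite = make_cite()  # hypothetical name mirroring the Matlab function
duecredit_cite("GNU Octave")
duecredit_cite("AFNI Matlab library")
duecredit_cite("GNU Octave")  # a repeated citation is not duplicated
print(duecredit_cite())  # ['GNU Octave', 'AFNI Matlab library']
```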

Another challenge may be supporting BibTeX or other bibliography formats, as I am not aware of a BibTeX parser for Matlab/Octave.

yarikoptic commented on July 22, 2024

sweet, well done Nick ;)
I think a function (duecredit_cite) should suffice. I am scared of thinking OOP in Matlab/Octave but if you know some compatible way which will withstand the test of time -- sure ;)
Also, I wonder how easy it would be to enable/disable this mode (with duecredit_cite doing nothing at all when disabled) by observing a "DUECREDIT_ENABLE" environment variable?
Supporting formats could be left to the "duecredit core" which is what we are writing here, and which will support all that madness. So Matlab side would just need to collect those references (plain strings for bibtex or doi or url), save into some file (e.g. .duecredit.mat) which we would load in duecredit, process and add to the global list of citations.

If you are going to cook up 'duecredit_cite.m', have a look at the "API" we have so far on the Python side: https://github.com/duecredit/duecredit/blob/master/duecredit/collector.py#L30
i.e. it would be nice to support the following arguments: (gy gy -- spotted a typo in inoked -- will fix)

        entry: str or DueCreditEntry
          The entry to use, either identified by its id or a new one (to be added)
        description: str, optional
          Description of what this functionality provides
        path: str, optional
          Path to the object which this citation associated with.  Format is
          "module[.submodules][:[class.]method]", i.e. ":" is used to separate module
          path from the path within the module.
        version: str or tuple, version
          Version of the beast (e.g. of the module) where applicable
        tags: list of str, optional
          Add tags for the reference for this method.  Some tags have associated
          semantics in duecredit, e.g.
          - "implementation" [default] tag describes as an implementation of the cited
             method
          - "reference" tag describes as the original implementation of
            the cited method
          - "use" tag points to publications demonstrating a worthwhile noting use
             the method
          - "edu" references to tutorials, textbooks and other materials useful to learn
            more
          - "cite-on-import" for a module citation would make that module citeable even
            without internal duecredited functionality inoked.  Should be used only for
            core packages whenever it is reasonable to assume that its import constitute
            its use (e.g. numpy)
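The `path` argument uses the format "module[.submodules][:[class.]method]". Here is a small Python sketch of how such a string could be split into its parts. `split_path` is a hypothetical helper, not duecredit's own parsing code.

```python
def split_path(path):
    """Split 'module[.submodules][:[class.]method]' into
    (module, class or None, method or None)."""
    module, sep, obj = path.partition(":")  # ':' separates module from object
    if not sep or not obj:
        return module, None, None
    cls, _, method = obj.rpartition(".")    # last '.' separates class from method
    return module, (cls or None), method


print(split_path("numpy"))                            # ('numpy', None, None)
print(split_path("scipy.cluster.hierarchy:linkage"))  # ('scipy.cluster.hierarchy', None, 'linkage')
print(split_path("mypkg.models:Model.fit"))           # ('mypkg.models', 'Model', 'fit')
```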

yarikoptic commented on July 22, 2024

@nno btw, comments/recommendations/etc. on the API are also very welcome -- maybe we haven't foreseen some use-cases which couldn't be covered by such a setup.

nno commented on July 22, 2024

> I think a function (duecredit_cite) should suffice

Note that Matlab / Octave do not have direct import functionality. Thus, functions using duecredit would have to call that function directly. But since it may not be present on other machines, every time it is to be used it should be surrounded by try / catch, e.g.:

try
    duecredit_cite('GNU Octave')
catch
    % do nothing
end

which is not very elegant...

> I am scared of thinking OOP in Matlab/Octave but if you know some compatible way which will withstand the test of time -- sure ;)

OOP will be difficult, particularly if this is to be Octave compatible. Octave only supports 'old-style' OOP, which means syntax like this:

duecredit = cite(duecredit, 'GNU Octave')

It seems that a function using a persistent variable is easiest.

> Also, I wonder how easy it would be to enable/disable this mode (with duecredit_cite doing nothing at all when disabled) by observing a "DUECREDIT_ENABLE" environment variable?

That should be straightforward, as both Matlab and Octave have a getenv function.

> Supporting formats could be left to the "duecredit core" which is what we are writing here, and which will support all that madness. So Matlab side would just need to collect those references (plain strings for bibtex or doi or url), save into some file (e.g. .duecredit.mat) which we would load in duecredit, process and add to the global list of citations.

Does that mean that duecredit for Matlab/Octave would always require Python and the Python duecredit code to be available? Many Matlab users run MS Windows and are unlikely to have that available, or may not be willing to invest the time to set it up... Alternatively, a pure Matlab/Octave implementation may be an option, but that involves a lot of duplicated code and effort.

Also, I looked into the injector functionality. That's nice, but more difficult to achieve in Matlab/Octave. The only possible way I could see to make this work is to have subdirectories for each package containing functions with the same names as functions called by the respective package (such as ft_defaults for FieldTrip). Upon duecredit initialization, these subdirectories are added to the top of the search path, overriding the original functions. Upon the first call of such a function, duecredit_cite is called and the directory is removed from the search path. However, such a solution does not scale well (certain toolboxes have many functions that may be called), and it is not elegant either, as it involves run-time modifications to the search path. It also does not work if a toolbox function is called from the toolbox directory itself, as the current directory has higher precedence than anything in the search path.
Thus, any ideas on how injector functionality can be achieved would be appreciated.
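For comparison, the path trick above tries to emulate something that Python gets from decorators: a wrapper that records a citation the first time a function runs and then just passes calls through. Here is a minimal Python sketch. `cite_on_first_call`, `cited` and the stand-in `ft_defaults` are all hypothetical illustrations, not duecredit's injector code.

```python
import functools

cited = []  # citations recorded so far


def cite_on_first_call(reference):
    """Decorator: record `reference` the first time the wrapped function runs."""
    def decorate(func):
        done = False

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal done
            if not done:  # cite once, then behave exactly like func
                cited.append(reference)
                done = True
            return func(*args, **kwargs)
        return wrapper
    return decorate


@cite_on_first_call("FieldTrip")
def ft_defaults():  # hypothetical stand-in for a toolbox entry point
    return "defaults set"


ft_defaults()
ft_defaults()
print(cited)  # ['FieldTrip']
```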

yarikoptic commented on July 22, 2024

  1. The idea is probably to have something similar to our stub.py: since it would be trickier to overload name-spaces, I would say that the "stock" duecredit matlab/octave module would provide filenames with _ in them, e.g. duecredit_cite_.m, and then we will provide the ultimate "stub" module duecredit_cite.m which people would copy into their code-base to carry around and which will contain:
try
    duecredit_cite_('GNU Octave')
catch
    % do nothing
end

not sure exactly what to do with different types of citation (safely catching all the types we provide within the same stub.py: BibTeX, Doi, etc.), but I guess we could easily make duecredit_cite accept as its first argument a string stating what kind of reference the next argument is (bibtex, doi, url, etc.). So we end up with only 1 file.
2. "Does that mean that duecredit for Matlab/ Octave would always require that Python" At the beginning -- I think so. But we will make duecredit available from everywhere possible (we have it on pypi already) -- standalone bundle, conda, etc.; maybe we could even provide some ugly duecredit_install.m to be shipped along to install a standalone bundle on a given system. Later we might cook up a duecredit.org website, to which folks could upload their citations manually, or maybe even that datalad_cite.m could get a basic implementation to upload those collected citations. I don't think it is worth reimplementing everything in matlab/octave.
3. Injectors -- the primary motivation for them is to demonstrate the benefit of duecredit this early in its life-time. I hope that eventually projects just adopt duecredit stub/citations within their code base so no injections would be necessary. Indeed, messing with the path in matlab would be a cruelty better avoided. And in Matlab land, if e.g. cosmomvpa, eeglab, and a few others adopt it -- that hopefully would provide sufficient demo/motivation for others.

nno commented on July 22, 2024

Picking up this thread...

  1. We could help people set up the duecredit_cite_ command. We could support different types of arguments, e.g.
duecredit_cite('text',['John W. Eaton, David Bateman, Soren Hauberg, Rik Wehbring (2014). '...
                       'GNU Octave version 3.8.1 manual <snip>'],...
               'BiBTeX',[' @book{\n,author    = {John W. Eaton <snip>'])

or as a starting point just use the 'text' version.

Alternatively, for already widely-used packages we could include the citation information directly with duecredit, so that something like

duecredit_cite_('GNU Octave')

would automatically use the correct information that is part of duecredit.

Then, when run from Matlab / Octave, if the user calls

duecredit_cite()

a list of citations is shown, or

duecredit_cite('BiBTeX')

could show the citations in BibTeX format where available, and as text for packages not provided in BibTeX.

  2. I think requiring Python would make adoption much more difficult in Matlab/Octave land.
  3. Injectors are close to impossible in Octave / Matlab. So to demonstrate the benefit, we would have to convince project leaders to include duecredit in their projects. For example: the AFNI Matlab library, NeuroElf, the NIfTI library, the GIFTI library, FieldTrip, EEGLAB, and the surfing toolbox.
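The proposed duecredit_cite('BiBTeX') behaviour amounts to "use the requested format when a package supplied it, otherwise fall back to text". Here is a Python sketch of that selection rule. The dictionary layout and the `toolbox_a`/`toolbox_b` entries are hypothetical.

```python
def render(citations, preferred="text"):
    """Return one string per citation: the preferred format when it was
    provided, otherwise the plain-text version (or the name as a last resort)."""
    preferred = preferred.lower()  # accept 'BiBTeX', 'bibtex', ...
    rendered = []
    for name, formats in citations.items():
        rendered.append(formats.get(preferred) or formats.get("text", name))
    return rendered


# Hypothetical entries: one package supplied BibTeX, the other only text.
citations = {
    "toolbox_a": {"text": "Author A (2015). Toolbox A.",
                  "bibtex": "@misc{toolbox_a, title = {Toolbox A}}"},
    "toolbox_b": {"text": "Author B (2016). Toolbox B."},
}
for line in render(citations, "BiBTeX"):
    print(line)
```

Calling `render(citations)` without a format corresponds to the plain duecredit_cite() listing.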

As a side note: how about providing an update mechanism for duecredit, so that recent citations can be added to a user's current duecredit installation? Or is that too invasive? It could be something that users have to allow explicitly.

yarikoptic commented on July 22, 2024

  1. Probably the best would be to mimic Python's API, i.e. having separate
    • duecredit_cite to add a (single) citation, with a tags argument also describing its nature (implementation, reference-implementation, edu, etc.), and indeed a first argument depicting the 'type' of the provided reference ("text", "doi", "BiBTeX")
    • duecredit_summary for the output
  2. "for already widely-used packages we could include the citation information directly with due_credit," Yeap -- we would need something like that ;) and that is something what we already do on Python side (numpy, sklearn, ...) but I guess for octave we would need more automation since, as you have mentioned, injection is not possible. I.e. for some of them to define some 'checkers' which would automagically cite them? (e.g. if duecredit_cite is invoked from within octave -- add octave citation). I think it would be worth introducing the same notion of a "path" as we have in Python implementation to define what that reference belongs to
  3. "requirement of Python". I don't mind if we also get a generic, nearly full-featured Matlab/Octave implementation. But what we should really assure, probably, is that the "database" is stored in a format which both can read and write -- json? (atm lazy us just dump Python's pickle)
  4. "convincing". indeed... for that we would need a good starting point I guess, e.g. Cosmo... if only there also was some Python module which would have called out to cosmo like nipype does into e.g. glm -- then we could really work out the case across environments. alternative is just e.g. having separate invocations of cosmo analysis script and then pymvpa script... but then collating it all into a single report
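Items 1 and 3 fit together: if each citation is a record made only of strings, lists and nulls, it can be stored as JSON, which both Python and Matlab/Octave can read and write. Here is a Python sketch of such a record. The field names and `make_citation` are hypothetical. Only the reference type is validated, because the docstring above allows tags beyond the ones with special semantics.

```python
import json

# Reference types mentioned in the thread; anything else is rejected.
KINDS = {"text", "doi", "bibtex", "url"}


def make_citation(kind, reference, tags=("implementation",),
                  description=None, path=None):
    """Build one citation record as a plain, JSON-friendly dict."""
    kind = kind.lower()  # accept 'BiBTeX' as well as 'bibtex'
    if kind not in KINDS:
        raise ValueError(f"unknown reference type: {kind!r}")
    return {"kind": kind, "reference": reference, "tags": list(tags),
            "description": description, "path": path}


record = make_citation("BiBTeX", "@book{<snip>}",
                       description="interpreter used to run the analysis")
# Only JSON types are used, so the record survives a round trip unchanged.
assert json.loads(json.dumps(record)) == record
print(record["kind"], record["tags"])  # bibtex ['implementation']
```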

from duecredit.

nno avatar nno commented on July 22, 2024
  1. Separate duecredit_cite and duecredit_summary functions are fine. They will both have to call some other duecredit helper function to store internal state.

  2. It's difficult to include 'checkers' to see which packages have been used, due to lack of injection support. For Octave itself it is possible, but I don't see how to do it for other packages if they don't call some duecredit_cite themselves.

  3. JSON would be good, I found a free Matlab / Octave toolbox here:

    https://github.com/fangq/jsonlab

  4. I think if we add something that is easy to use by other package developers, then it may be adopted widely. Actually CoSMoMVPA may be a good use case for this to try it out, and also already provides a basic implementation of most functionality required by
    Re storing / keeping track of citations: not sure if that should be

    I would be tempted to have only c (with maybe a as fallback). Any thoughts?

  5. In terms of use case: I assume that in the Matlab / Octave environment, the user only has to install duecredit (add the appropriate directory to the search path), then can run their analysis scripts, and just call duecredit_summary() to get a summary? Or would it be more complicated?

yarikoptic commented on July 22, 2024

  1. done ;)

  2. yeah -- I saw checkers only as an addition for that limited set of cases, such as 'environment', which would include e.g. information about the infrastructure itself (see #55). So they would be invoked once at runtime, whenever any duecredit_cite command is invoked

  3. Cool. So that then should be the next one on our table to tackle -- decide on the format. I think we should have smth like .duecredit/citations/ subdirectory where we would collect {octave,python}.json citations, which then would all be picked up by summary command. We also might end up with .duecredit/config which then could be used to override defaults (e.g. format to output stuff in etc)
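Here is a Python sketch of the proposed layout: each environment writes its own `.duecredit/citations/<source>.json`, and a summary command gathers every file in that directory. The function names and the `source` field are hypothetical, and the sketch works in a temporary directory so that it leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path


def save_citations(root, source, citations):
    """Write the citations collected by one environment (e.g. 'octave')."""
    folder = Path(root) / ".duecredit" / "citations"
    folder.mkdir(parents=True, exist_ok=True)
    (folder / f"{source}.json").write_text(json.dumps(citations, indent=2))


def load_all_citations(root):
    """Collect citations from every *.json file, as a summary command might."""
    folder = Path(root) / ".duecredit" / "citations"
    merged = []
    for path in sorted(folder.glob("*.json")):
        for citation in json.loads(path.read_text()):
            citation["source"] = path.stem  # remember which environment cited it
            merged.append(citation)
    return merged


with tempfile.TemporaryDirectory() as root:
    save_citations(root, "octave", [{"kind": "text", "reference": "GNU Octave"}])
    save_citations(root, "python", [{"kind": "text", "reference": "NumPy"}])
    all_citations = load_all_citations(root)

print([(c["source"], c["reference"]) for c in all_citations])
# [('octave', 'GNU Octave'), ('python', 'NumPy')]
```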

4.1. ah btw -- to ease adoption... That is why we came up with https://github.com/duecredit/duecredit/blob/master/duecredit/stub.py, which would be a minimal thing to include in projects to provide the necessary API, so that if duecredit is not installed their stuff still works as usual. I guess in the case of the matlab/octave implementation, if overloading names would be tricky, the stub file could define the proxy duecredit_cite, while the actual, bigger duecredit module defines duecredit_cite_ to be called by the stubbed ones

4.2. In this Python implementation we have separated the Entry (Doi, BibTeX, etc.) from the actual Citation. Entries could be loaded from anywhere -- a .bib file or specified directly in the code. Then a Citation could either consume a new Entry or just a key identifying a preloaded one, plus accept usage tags and a description for it. This way a single bibliographic reference could be used for multiple citations in different places, possibly with different description/path/etc. So we kinda supported all of a, b, c I guess ;) But in the majority of use cases we seem to use only c atm.
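Here is a simplified Python sketch of the Entry/Citation split described in 4.2: entries are registered once under a key, and each citation either names a preloaded key or brings a new Entry along, so one reference can back several citations. These classes are hypothetical illustrations, not duecredit's actual classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """A bibliographic reference in one format (text, doi, bibtex, ...)."""
    key: str
    kind: str
    content: str


@dataclass
class Citation:
    """One use of an Entry at a particular place in the code."""
    entry_key: str
    description: str = ""
    path: str = ""
    tags: tuple = ("implementation",)


class Collector:
    def __init__(self):
        self.entries = {}    # key -> Entry, loaded once (e.g. from a .bib file)
        self.citations = []  # many citations may share one entry

    def add_entry(self, entry):
        self.entries[entry.key] = entry

    def cite(self, entry, **kwargs):
        # Accept either a new Entry (registered on the fly) or a preloaded key.
        if isinstance(entry, Entry):
            self.add_entry(entry)
            key = entry.key
        else:
            key = entry
        if key not in self.entries:
            raise KeyError(f"unknown entry: {key!r}")
        self.citations.append(Citation(key, **kwargs))


collector = Collector()
collector.add_entry(Entry("octave", "text", "GNU Octave manual <snip>"))
collector.cite("octave", description="runs the analysis", path="analysis:run")
collector.cite("octave", description="makes the figures", path="report:plot")
collector.cite(Entry("toolbox_b", "text", "Author B (2016). Toolbox B."),
               description="statistics")
print(len(collector.entries), len(collector.citations))  # 2 3
```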

5. Our thought was: by default duecredit should have no impact on anything, and only when run with the 'DUECREDIT_ENABLE' environment variable does it start tracking. The 'summary' is independent of that -- it just loads the stored citations, filters them by the desired tag, and presents them. But maybe we could/should enable it by default -- not sure, since with all the injections it does have some measurable impact on import time ATM.
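Here is a Python sketch of the summary step from point 5. It never consults the enable flag. It only takes already-stored citation records, keeps those carrying one of the requested tags, and formats them. The record layout and `summary` are hypothetical, and listing everything when no tags are given is an assumption.

```python
def summary(citations, tags=None):
    """Format stored citations, keeping only those carrying one of `tags`
    (all citations when tags is None). No tracking state is consulted."""
    wanted = None if tags is None else set(tags)
    lines = []
    for c in citations:
        if wanted is not None and not wanted & set(c["tags"]):
            continue  # no overlap with the requested tags
        lines.append(f"{c['reference']} ({', '.join(c['tags'])})")
    return lines


stored = [
    {"reference": "GNU Octave", "tags": ["implementation"]},
    {"reference": "Toolbox tutorial", "tags": ["edu"]},
]
print(summary(stored, tags=["implementation"]))  # ['GNU Octave (implementation)']
print(summary(stored))  # ['GNU Octave (implementation)', 'Toolbox tutorial (edu)']
```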
