broadinstitute / barclay

Command line argument parser and online documentation generation utilities for java command line programs.

License: BSD 3-Clause "New" or "Revised" License

Java 69.19% HTML 7.67% FreeMarker 7.30% Shell 4.74% WDL 11.10%

argument-parser argument-parsing documentation-tool javadoc

barclay's Introduction

Barclay

Barclay is a set of classes for annotating, parsing, validating, and generating documentation for command line options.

Requirements

Java 17
Gradle 7.4.2 or greater. We recommend using the ./gradlew script, which downloads and uses an appropriate Gradle version automatically.

barclay's People

lbergelson bioinformagik mikeyhuang markjschreiber sreekanth370 rhowe

barclay's Issues

Is it possible to create a custom index for each super-category?

I'm in a situation where I need a different index for each component/super-category (e.g., one index for the tools and one index for the utilities). It looks like Barclay only has support for the following (correct me if I'm wrong):

An index template for all the documentation. This is only used once by FreeMarker for a single index.
A generic template for each component. This is parsed for each @DocumentedFeature, and thus could use some macro to switch the format for each component.

It would be useful to be able to parse the general index template several times (or to have several index templates) so that there is an index for each super-category. If there is already a way to do this in the current implementation, I'm sorry for raising this issue, and I would appreciate it if it could be documented in the Wiki.

Thank you very much in advance.

Barclay snapshots not being published to Artifactory

Looking at Artifactory, it seems like snapshots of Barclay aren't being published to it anymore.

@Hidden, @Advanced annotations are not handled by command line parser

I don't see any handling of annotations for arguments:

@Hidden: I expect it to hide the option from the help output, but not from argument parsing.
@Advanced: I expect these options to appear under a special category pointing out that they should be used with caution. Another interesting way to handle advanced options could be an argument in SpecialArgumentsCollection that shows the "advanced" help.

Should max/min values be Comparable<> instead of double?

I noticed that the min/max value in the @Argument are all specified as double. Would it make more sense to take Comparable<?> since that would allow any sort of ordered input to be bounded?
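To make the idea concrete, here is a minimal sketch of a range check written against Comparable instead of double. It is independent of Barclay; the class and method names are hypothetical. One caveat worth keeping in mind: Java annotation members can only be primitives, String, Class, enums, annotations, or arrays of those, so @Argument could not declare a Comparable<?> attribute directly. The sketch assumes the bounds have already been converted to the argument's type by some other means.

```java
import java.time.LocalDate;

/** Hypothetical helper, not part of Barclay: a bounds check over any Comparable type. */
final class ComparableBounds {
    private ComparableBounds() {}

    /** Returns true when min <= value <= max; a null bound means "unbounded" on that side. */
    static <T extends Comparable<? super T>> boolean inRange(T value, T min, T max) {
        if (value == null) {
            return false; // assumption for this sketch: a missing value is never in range
        }
        boolean aboveMin = (min == null) || value.compareTo(min) >= 0;
        boolean belowMax = (max == null) || value.compareTo(max) <= 0;
        return aboveMin && belowMax;
    }

    public static void main(String[] args) {
        System.out.println(inRange(5, 1, 10));
        System.out.println(inRange("m", "a", "f"));
        System.out.println(inRange(LocalDate.of(2020, 1, 1), LocalDate.of(2019, 1, 1), null));
    }
}
```

The same method covers integers, strings, and dates, which is the flexibility the question is after.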

Request: accept mutex boolean arguments to be provided if they are complementary

For instance, in a command line tool with the following arguments:

...
@Argument(fullName="arg1", mutex = {"arg2"}, optional = true)
public Boolean arg1 = false;
@Argument(fullName="arg2", mutex = {"arg1"}, optional = true)
public Boolean arg2 = false;
...

It would be useful to be able to provide any of the following on the command line without it blowing up:

--arg1 false --arg2 false
--arg1 true --arg2 false
--arg1 false --arg2 true

And only throw an exception if --arg1 true --arg2 true.
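The requested behavior could look roughly like the sketch below. It is not Barclay code: the validator class is hypothetical and takes the already-parsed flag values as a map. The only point it illustrates is that the mutex check looks at the values of the boolean arguments rather than at whether they were supplied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical validator, not Barclay code: mutex only fires for flags whose value is true. */
final class BooleanMutexCheck {
    private BooleanMutexCheck() {}

    /** Returns the names of the supplied boolean arguments whose value is true. */
    static List<String> enabled(Map<String, Boolean> supplied) {
        List<String> on = new ArrayList<>();
        for (Map.Entry<String, Boolean> e : supplied.entrySet()) {
            if (Boolean.TRUE.equals(e.getValue())) {
                on.add(e.getKey());
            }
        }
        return on;
    }

    /** Throws only when more than one mutually exclusive flag is true. */
    static void validate(Map<String, Boolean> supplied) {
        List<String> on = enabled(supplied);
        if (on.size() > 1) {
            throw new IllegalArgumentException(
                    "Mutually exclusive arguments enabled together: " + on);
        }
    }
}
```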

Allow a different output extension for the index file

I would like to have a different format for the index than for the rest of the components, for use with Jekyll: 1) the index as a YAML data file for iterating over the different components, and 2) Markdown to render each component.

It would be nice to have a new option for the index extension, which by default should be the same as the component extension. Does anyone have any objection to this?

Discussion: how to deal with javadoc-link tags

A javadoc comment can contain several @see and @link tags pointing to specific classes/methods. Because the javadoc is used both for developer consumption and, in the case of Barclay, for help-page generation, writing javadoc with both use cases in mind is problematic. This is an example class for evaluating the possible problems:

/**
 * Description in javadoc. 
 * This class may be related with {@link SecondDocumentedFeature}, so check its documentation.
 *
 * @see ThirdDocumentedFeature
 * @see FourthDocumentedFeature
 * @see UndocumentedFeature
 */
@DocumentedFeature(extraDocs = FourthDocumentedFeature.class)
public class FirstDocumentedFeature {
    ...
}

The problems that may arise from this class in the help pages are the following:

There is no way to access SecondDocumentedFeature or ThirdDocumentedFeature while parsing the FreeMarker templates, even if they are documented and have a page in the help output. Ideally these classes would be populated into the extraDocs.
If the @see classes are populated, there will be a clash for FourthDocumentedFeature; in addition, UndocumentedFeature does not have any entry in the help pages.
The description populated from the javadoc is "Description in javadoc. This class may be related with {@link SecondDocumentedFeature}, so check its documentation.", which is confusing for a non-programmer user. Ideally Barclay would parse the javadoc and substitute the URL for this tag.
If the @link tag is parsed: should the URL be formatted as HTML or Markdown? What happens with @link tags for non-documented features?

My suggestions, in order of preference, are the following:

Populate @see and @link tags into the extraDocs, not allowing classes with the same name (this is already constrained for tool names). Then, the parsing of in-line tags will be done by FreeMarker using a custom macro (I would like to have a macro file in Barclay containing this and other "common" functionality).
Remove all in-line tags by Barclay on output and do not take into account the @see tag. This will require that the FreeMarker template look for matching strings with the extraDocs and apply the link. If someone wants the @see tag, they could use a custom binding. This will keep the developer/user help completely separate, but it will complicate things in the template.
Parse the @link tags with Barclay and set the URL for documented features either as HTML or Markdown, by setting an option. This introduces a constraint in template outputs, because they will be expected to be encoded as HTML or Markdown.
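As a rough illustration of what the link substitution itself involves, whichever side ends up doing it, here is a self-contained sketch using only java.util.regex. The class name, the URL map, and the markdown switch are all hypothetical; Barclay does not provide them. Tags that point to undocumented classes fall back to plain text, which is one possible answer to the open question above, not a decision.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not Barclay code: rewrites {@link X} tags using a name-to-URL map. */
final class LinkTagRewriter {
    // Matches {@link Name} and {@link Name label}; nested braces are not handled.
    private static final Pattern LINK =
            Pattern.compile("\\{@link\\s+([\\w.#]+)(?:\\s+([^}]+))?\\}");

    private LinkTagRewriter() {}

    static String rewrite(String javadoc, Map<String, String> urls, boolean markdown) {
        Matcher m = LINK.matcher(javadoc);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String target = m.group(1);
            String label = (m.group(2) != null) ? m.group(2).trim() : target;
            String url = urls.get(target);
            String replacement;
            if (url == null) {
                replacement = label; // undocumented feature: keep the text, drop the tag
            } else if (markdown) {
                replacement = "[" + label + "](" + url + ")";
            } else {
                replacement = "<a href=\"" + url + "\">" + label + "</a>";
            }
            // quoteReplacement guards against '$' and '\' in labels or URLs
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The boolean switch makes the constraint from the third suggestion visible: the caller has to know up front whether the template output is HTML or Markdown.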

CommandLineArgumentParser ArgumentDefinition class should be factored out

It's currently an inner class, but it should be properly exposed for access by the doc gen code.

Support multiple argument options (--input a b c)

This would enable the use of shell expansion. There is a discussion here. It's thought that jopt-simple didn't support this, at least for now, but @lbergelson seems to think it does.

Allow usage of other tags and not only inline in DefaultDocWorkUnitHandler

As an example:

/**
* This is the javadoc for the description.
* 
* {@MyTag.test1 this is test1 tag}
*
* @MyTag.test2 this is test2 tag
*/
@DocumentedFeature
public class TestClass {}

If the custom tag prefix is MyTag, the final JSON will include the test1 tag but not the test2 tag, because DefaultDocWorkUnitHandler.addCustomBindings only uses ClassDoc.inlineTags(). Using ClassDoc.tags() instead would allow test2 to be output as well, with the added advantage that it shows up in a normal javadoc task.

Default plugins that have command line args should be represented by a link in the doc

If a tool includes a default plugin (e.g., a default read filter such as ReadLengthReadFilter) that itself has command line args that can be set by the user, those arguments should be included in the arguments for the tool, either inline or maybe via a link. We'll need to figure out what that should look like in the doc output.

Port the change to use replace rather than append for collection arguments

Change should be made in the CommandLineParser, but not the legacy parser. The change was originally made here: https://github.com/broadinstitute/gatk/pull/2275/files.
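For readers unfamiliar with the distinction, the sketch below contrasts the two behaviors for a collection argument that has default values. It is an illustration only: the class, the method, and the append flag are hypothetical and do not reflect how CommandLineParser is actually structured.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not Barclay code: replace versus append for collection arguments. */
final class CollectionArgumentMerge {
    private CollectionArgumentMerge() {}

    /**
     * Combines the field's default values with the values given on the command line.
     * With append=false (replace), user values discard the defaults.
     * With append=true, user values are added after the defaults.
     */
    static List<String> merge(List<String> defaults, List<String> userValues, boolean append) {
        if (userValues.isEmpty()) {
            return new ArrayList<>(defaults); // nothing supplied: defaults stay in effect
        }
        List<String> result = append ? new ArrayList<>(defaults) : new ArrayList<>();
        result.addAll(userValues);
        return result;
    }

    public static void main(String[] args) {
        List<String> defaults = List.of("a", "b");
        System.out.println(merge(defaults, List.of("c"), false));
        System.out.println(merge(defaults, List.of("c"), true));
    }
}
```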

Issue a helpful message when CommandLineArgumentParser sees old-style command line syntax

Embrace use of the Optional type for optional arguments

There are some good arguments/suggestions here and here.

@ArgumentCollection should accept a disambiguation prefix

i.e.,

@ArgumentCollection(prefix="input")
public IntervalArgumentCollection inputIntervals = ...;
...
@ArgumentCollection(prefix="target")
public IntervalArgumentCollection targetIntervals = ...;

From broadinstitute/gatk#2582. We'd also need to consider any implications for docgen and for the Picard parser.

Port Picard PR: Show HTSJDK version in stderr when running

broadinstitute/picard#704

Fully Integrate @Deprecated in the CLI/help handling

The current DefaultDocWorkUnitHandler handles arguments annotated with @Deprecated differently from other kinds of arguments. I think that is a good idea, but it would also be nice to integrate this behaviour into other parts of the code:

Separate deprecated arguments from required/optional/etc. in the CLI help.
Add a short note about the deprecation in the CLI (similar to @BetaFeature), so that annotating is enough for it to show up automatically in the CLI help.
The current JSON does not contain a marker for the deprecation status (the same goes for beta). Perhaps it would be useful to also add a "type" entry, empty for normal features and "deprecated" / "beta" for other cases. In addition, it could include a description of the deprecation taken from the @deprecated javadoc tag. This would be useful for online pages.
Handle the deprecated tag for @DocumentedFeature classes in the doclet code as well, not only for arguments.

Does this make sense for you? I think that it is a good addition...

Request: output doc for ArgumentCollection

Argument collection could set documentation for all the options under the collection. I expect this to be printed out in the command line.

Document the structure of the FreeMarker property map generated by Barclay

We should document the structure (property names, types, and values) of the FreeMarker property map that we generate for both the index and workunit templates. This is partially and informally done here, but needs to be completed and maintained as it evolves.

Add support for argument tagging to barclay

Currently in GATK4 we have some support for tagging arguments, but the tagging is done on the value side, and manually by the engine. Eg.,

-V myTag:my.vcf

This causes many issues: parsing ambiguity with URIs, interfering with shell auto-expansion, etc.

Let's add native tagging support to barclay, so that the tag can be on the argument side rather than the value side. Eg.,

-I:tumor my.bam

We also need to support arbitrary key/value pairs after the tag. Eg.,

-I:tumor,key=value,key2=value2

I think that the way to do this is to introduce a new TaggedArgument interface in barclay with methods to get/set both the tag and the optional key/value pairs. Eg.,

void setTag(String)
String getTag()
void setTagAttributes(Map<String, String> attributes)
Map<String, String> getTagAttributes()

The upcoming URI class in GATK will implement this interface. When barclay encounters an @Argument-annotated field of a type that implements the TaggedArgument interface, it should parse the tags and inject them into the object instance for that field via the setter methods.
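A minimal sketch of how this could fit together is shown below. The TaggedArgument interface uses the four method signatures listed above; everything else (the TaggedPath value type, the TagParser class, and the exact splitting rules) is a hypothetical illustration, not the eventual Barclay implementation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** The interface proposed above; method names are taken from the issue text. */
interface TaggedArgument {
    void setTag(String tag);
    String getTag();
    void setTagAttributes(Map<String, String> attributes);
    Map<String, String> getTagAttributes();
}

/** Hypothetical value type used only for this illustration. */
final class TaggedPath implements TaggedArgument {
    final String path;
    private String tag;
    private Map<String, String> attributes = Collections.emptyMap();

    TaggedPath(String path) { this.path = path; }

    @Override public void setTag(String tag) { this.tag = tag; }
    @Override public String getTag() { return tag; }
    @Override public void setTagAttributes(Map<String, String> attributes) { this.attributes = attributes; }
    @Override public Map<String, String> getTagAttributes() { return attributes; }
}

/** Hypothetical parser: splits "-I:tumor,key=value" on the argument side and injects the result. */
final class TagParser {
    private TagParser() {}

    static TaggedPath parse(String argumentToken, String value) {
        TaggedPath result = new TaggedPath(value);
        int colon = argumentToken.indexOf(':');
        if (colon < 0) {
            return result; // no tag present, e.g. plain "-I"
        }
        String[] parts = argumentToken.substring(colon + 1).split(",");
        result.setTag(parts[0]);
        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but found: " + parts[i]);
            }
            attrs.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
        }
        result.setTagAttributes(attrs);
        return result;
    }
}
```

Because the tag lives on the argument token, the value (my.bam, or a URI containing colons) is never inspected, which avoids the ambiguity described earlier.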

Propose to remove CommandLineProgramProperties.omitFromCommandLine

This doesn't appear to be used anywhere other than in Picard documentation/test code that is no longer necessary, and one reference in PicardCommandLineProgram.extractCommandLineProgram.

usageExample field in CommandLineProgramProperties isn't reflected in help/doc output

This field was present in GATK4, but it was never integrated with the CLP. Since it's now also in Barclay, it should be reflected in the help/doc gen output.

Docgen assumes DocumentedFeatures have no-arg constructors

Most documented features are command line programs, which have no-arg constructors, but some are not (e.g., TableReader), and so may not have a no-arg constructor. We need to relax the assumption and only instantiate classes that are command line programs.

JSON output file names should not include the output file extension

For example, if the output extension is "html", the JSON files are currently called "workunit.html.json". This behavior was carried over from GATK3, but is unnecessary and a little misleading. The work unit output format and extension are independent of JSON, so the JSON file should just be called "workunit.json".

Design index template(s)

I'm starting to plan how we'll make the tooldocs available on the GATK website, and specifically what organization layouts we want to offer to users. There are three main patterns that people tend to follow when looking for docs, which would be best served by offering separate index pages:

full alphabetical list of all tools
subset by top-level package (in the GATK world, core vs protected vs Picard), then alphabetical
functional breakdown (QC vs bam processing vs variant discovery etc)

Note that this would only apply to tools, proper -- read filters, annotation modules, metrics collections (in Picard) etc still make sense to categorize separately in any case.

That being said, I need to think a bit more about the UX side of things before implementing anything. TBC. Comments welcome.

Plugin descriptor arguments should include a link to their group in the index

For example, for the read filter plugin descriptor, instead of explicitly listing every plugin instance/read filter under allowed values, we should include a link to the ReadFilters group in the index. There are too many allowed values to keep these inline in the doc.

Update Barclay for other file output types (i.e. bash tab-completion, WDL, etc.)

In order to support generating tab-completion files for command-line usage and WDL files, some minor changes will need to be made to Barclay - specifically how it prepares data for ingestion by FreeMarker.

A good test will include some new templates and tests to prove that the output is what is expected.

Add a switch for selecting append or replace behavior for collection arguments

Argument files, .list files, nulls, tagged args are not discoverable or reflected in usage/doc

-list files/comments in list files
-collection argument files/comments (#28)
-using "null" to reset collection arguments
-argument tagging rules (#33)

It's probably too much to include in usage output every time. We may want to add a special command line option to display this kind of help, or just output a link to an online resource describing these things.

Proposal: limit the allowed values of an enum argument

The same as with min/max values, it could be interesting to limit the allowed values of an enum. An example usage case is the one described here: samtools/htsjdk#792
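One way to picture the proposal is the sketch below, where a parsed enum value is checked against an allowed subset. Nothing here exists in Barclay: the Mode enum, the class, and the way the allowed set is supplied are all hypothetical.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical illustration, not Barclay code: restricting an enum argument to a subset. */
final class RestrictedEnumArgument {
    /** Made-up enum for the example. */
    enum Mode { FAST, BALANCED, EXHAUSTIVE }

    private RestrictedEnumArgument() {}

    static <E extends Enum<E>> E parse(Class<E> type, String token, Set<E> allowed) {
        E value = Enum.valueOf(type, token); // throws IllegalArgumentException for unknown names
        if (!allowed.contains(value)) {
            throw new IllegalArgumentException(
                    token + " is not allowed here; allowed values: " + allowed);
        }
        return value;
    }

    public static void main(String[] args) {
        Set<Mode> allowed = EnumSet.of(Mode.FAST, Mode.BALANCED);
        System.out.println(parse(Mode.class, "FAST", allowed));
    }
}
```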

Implement the generic ability for Collection arguments to be provided via a file

For any argument of a Collection type, we want the ability to provide a .list file containing the literal values for the argument, instead of having to provide all literal values on the command line.

So, if an argument is of a Collection type, and the token from the command line ends in .list, treat the token as a file name, load it, and unpack all lines in the file into the Collection.
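The rule in the previous paragraph can be sketched as follows. This is a standalone illustration with a hypothetical class name, not the Barclay implementation. Skipping blank lines is an assumption of the sketch; how comments inside list files should be treated is left open here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not Barclay code: expanding .list tokens for a Collection argument. */
final class ListFileExpander {
    private ListFileExpander() {}

    /** Returns the literal values, with every token ending in ".list" replaced by the file's lines. */
    static List<String> expand(List<String> tokens) {
        List<String> values = new ArrayList<>();
        for (String token : tokens) {
            if (token.endsWith(".list")) {
                try {
                    for (String line : Files.readAllLines(Paths.get(token))) {
                        String trimmed = line.trim();
                        if (!trimmed.isEmpty()) { // assumption: blank lines are ignored
                            values.add(trimmed);
                        }
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException("Could not read list file " + token, e);
                }
            } else {
                values.add(token);
            }
        }
        return values;
    }
}
```

Literal values and list files can be mixed in one invocation, and the order of the tokens is preserved.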

Request: recover ArgumentCollectionDefinition and add functionality

I know that ArgumentCollectionDefinition was removed in the initial port of the command line parsing, but I think it could provide some functionality for parsing the argument collection. Sometimes argument collections have complex dependencies between parameters that are not easy to express in the Argument annotation. For example, two numeric values may each need to be within a range, which the max/min fields in Argument can express; but if one range depends on the other value, there is no way to reflect that.

I suggest that the ArgumentCollectionDefinition class could be used to improve this by adding a method that validates the arguments in more complex ways. In addition, it would allow the collections to be serializable. These definitions could then be used with default validation, instead of implementing a validation method that has to be called from within the tools.

Nevertheless, I think that ArgumentCollectionDefinition should not be mandatory for argument collections. For example, one nice thing about handling ArgumentCollection without this limitation in GATK is that a ReadFilter with arguments can be used as an ArgumentCollection to set those arguments even if the tool does not have the plugin.

Consider writing a gradle plugin that adds documentation tasks

It would be convenient to have a gradle plugin that defines barclay documentation tasks.

Support grouping arguments by class in the argument container hierarchy

Based on a discussion with @droazen and @vdauwera, we want to replace the "common" attribute with a mechanism that allows us to group arguments by where they're located in the argument container's hierarchy. So if we have class hierarchy like this:

CommandLineProgram
GATKTool
ReadWalker
PrintReads

we can group the arguments by where they came from in the hierarchy. We'll need some way to add a doc string to each class to display as the group heading in the output.
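A reflection-based sketch of this grouping is shown below. It is only an illustration: the @Arg annotation is a stand-in for Barclay's @Argument, the four classes mirror the hierarchy above but their fields are invented, and the per-class doc string for the group heading is not addressed.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not Barclay code: grouping arguments by declaring class. */
final class HierarchyGrouping {
    /** Stand-in for Barclay's @Argument annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Arg {}

    // Invented fields; only the class names come from the example hierarchy.
    static class CommandLineProgram { @Arg String tmpDir; }
    static class GATKTool extends CommandLineProgram { @Arg String reference; }
    static class ReadWalker extends GATKTool { @Arg String input; }
    static class PrintReads extends ReadWalker { @Arg String output; int notAnArgument; }

    private HierarchyGrouping() {}

    /** Maps each class in the hierarchy (root first) to the argument fields it declares. */
    static Map<String, List<String>> group(Class<?> leaf) {
        List<Class<?>> chain = new ArrayList<>();
        for (Class<?> c = leaf; c != null && c != Object.class; c = c.getSuperclass()) {
            chain.add(0, c); // prepend so the root class comes first
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (Class<?> c : chain) {
            List<String> names = new ArrayList<>();
            for (Field f : c.getDeclaredFields()) {
                if (f.isAnnotationPresent(Arg.class)) {
                    names.add(f.getName());
                }
            }
            if (!names.isEmpty()) {
                groups.put(c.getSimpleName(), names);
            }
        }
        return groups;
    }
}
```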

Bug: null numeric unbounded arguments throw IllegalArgumentException

When an unbounded argument has a null value, the command line parser tries to throw a CommandLineException.OutOfRangeArgumentValue, but because there are no boundaries it throws an IllegalArgumentException instead.
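A null-tolerant version of the check might look like the sketch below. It is not the parser's actual code: the class is hypothetical, and it assumes that "unbounded" is represented by infinite minimum and maximum values.

```java
/** Hypothetical illustration, not Barclay code: a range check that tolerates null and unbounded arguments. */
final class NullSafeRangeCheck {
    private NullSafeRangeCheck() {}

    /** Returns true only when a non-null value lies outside finite, user-specified bounds. */
    static boolean isOutOfRange(Number value, double min, double max) {
        // Assumption: infinite bounds mean that no range was specified.
        boolean unbounded = min == Double.NEGATIVE_INFINITY && max == Double.POSITIVE_INFINITY;
        if (value == null || unbounded) {
            return false; // nothing to compare, so never report an out-of-range error
        }
        double v = value.doubleValue();
        return v < min || v > max;
    }
}
```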

HelpDoclet cannot be used directly

Although the javadoc states that this class can be used to generate documentation directly, that's not the case because if used in a gradle task (e.g. onlineDoc) it complains with the following error:

:compileJava UP-TO-DATE
:processResources UP-TO-DATE
:classes UP-TO-DATE
:cleanOnlineDoc
:onlineDoc
javadoc: error - Doclet class org.broadinstitute.barclay.help.HelpDoclet does not contain a start method
1 error
:onlineDoc FAILED

It seems that this is because it requires a public static method called start, as in the GATK doclet or the Picard one. I'm planning to have my own doclet, but while testing how this works it would be nice to be able to use the minimal implementation...

Use markdown in place of embedded html for doc

There is a fair amount of discussion in this thread

Test docgen code at a granular level

From @cmnbroad (#68 (comment)):

One high-level request is that we find a way to incrementally unit-test new functionality like this at a granular level. We currently only have coarse-grained, file-based integration tests, and I'd like to find a way to avoid proliferating a new set of test files with each new feature, in addition to changing many/all of the existing ones. @magicDGS any thoughts on how we can address that ?

Handle CommandLineProgramProperties.usageExample in help paths

Currently the usageExample method is not used anywhere. I suggest the following support:

Add to the tool CLI help to show an example
Populate the String to the JSON and/or FreeMarker properties to use in the doclet

In addition, it will be nice if this method returns an array of String to allow more examples for the usage. For the CLI help only the first one could be printed, and the rest will be useful for the online documentation.

Populate javadoc see tags into the extraDocs

If the @see tag in the javadoc contains a @DocumentedFeature class, it will be nice to populate it into the extraDocs to obtain the urls easily.

Parse javadoc link tag for improving user-readability in doclet

Currently, if the javadoc contains a tag linking to a different class, it is output as is. This has two disadvantages: 1) developer javadoc cannot include links if they are part of the documented features; 2) the help page does not have access to the linked class URL if that class is documented.

I have some suggestions to solve this problem:

Populate these classes into the extraDocs. The user could use a macro in FreeMarker to substitute the extraDocs URL for the {@link ClassName} pattern, or just remove the tag if the class is not present.
Parse the javadoc internally to add the URL as an HTML or Markdown formatted String. This is not the optimal solution, because it adds complexity to Barclay and it is less flexible.

I prefer the first option, and providing a file with macros for parsing this kind of information to be included by the user.

Could docgen generate a json for the index?

The current implementation of Barclay docgen code is to generate an index.html and feature-specific files (class_name.html and class_name.html.json).

Would it be possible to generate a common JSON file for the index within the current framework? It would be useful for generating other pages that share information such as the version...

barclay should ALWAYS call setAccessible(true) when retrieving a Field object

Whenever barclay gets a Field object, it should call setAccessible(true) on it as a protective measure so that we don't have to worry about downstream permissions issues (many fields that are annotated as arguments are marked as private)
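For illustration, the pattern being asked for looks like the sketch below. The helper class and the example tool are hypothetical; only the reflection calls (getDeclaredField, setAccessible, get, set) are standard JDK API.

```java
import java.lang.reflect.Field;

/** Hypothetical tool with a private argument field, as is common for annotated arguments. */
final class ExampleTool {
    private int threads = 4;
}

/** Hypothetical helper, not Barclay code: always makes the Field accessible before use. */
final class PrivateFieldAccess {
    private PrivateFieldAccess() {}

    static Field accessibleField(Class<?> type, String name) {
        try {
            Field f = type.getDeclaredField(name);
            f.setAccessible(true); // without this, reading a private field from another class fails
            return f;
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("No field " + name + " on " + type.getName(), e);
        }
    }

    static Object read(Object target, String name) {
        try {
            return accessibleField(target.getClass(), name).get(target);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    static void write(Object target, String name, Object value) {
        try {
            accessibleField(target.getClass(), name).set(target, value);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Centralizing the lookup in one helper is one way to guarantee the call is never forgotten.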

change usage methods from void usage(printStream) to String usage()

Having the usage methods be void methods that take a PrintStream seems kind of backwards to me. Maybe now would be a good time to update them to be functions that return a String instead of having side effects?
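The suggested shape could look something like the sketch below, where the String-returning method does the work and a thin PrintStream overload is kept for existing callers. The class name, parameters, and the usage text format are invented for the example.

```java
import java.io.PrintStream;
import java.util.List;

/** Hypothetical illustration, not Barclay code: usage as a pure function plus a printing wrapper. */
final class UsageText {
    private UsageText() {}

    /** Builds the usage message without any side effects, which also makes it easy to test. */
    static String usage(String toolName, List<String> argumentLines) {
        StringBuilder sb = new StringBuilder("USAGE: ").append(toolName).append(" [arguments]\n");
        for (String line : argumentLines) {
            sb.append("  ").append(line).append('\n');
        }
        return sb.toString();
    }

    /** Compatibility wrapper for callers that still want to print to a stream. */
    static void usage(PrintStream out, String toolName, List<String> argumentLines) {
        out.print(usage(toolName, argumentLines));
    }

    public static void main(String[] args) {
        usage(System.err, "PrintReads", List.of("--input <file>", "--output <file>"));
    }
}
```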

Failure accessing private fields with javadoc.

When traversing class hierarchies to resolve arguments, docgen checks each field it encounters to see if it's annotated as an argument collection, and if so, recursively reflects on the field's type. If the type has a private field that has javadoc, getDeclaredFields doesn't return the field, but javadoc includes it in the list of FieldDocs. The code needs to be tolerant of that case during field traversal. See broadinstitute/gatk-protected#1048.

Does barclay support tagging arguments as incompatible?

Let's say we have a tool that accepts two arguments, each is fine separately but they can't be provided together because the corresponding functionality is incompatible. Is there a way I can tag these arguments? How and what exactly happens if I do invoke them both?

CLP doesn't distinguish between programmatically set value and user-specified value

Because they cannot be null, they are always treated as optional by the CLP.

Write a full README with barclay usage + features

Should include features like list file support, etc. as well as a general tutorial on how to annotate arguments, etc.

mutex behaves inconsistently with collection arguments and optional=false

Mutex arguments that aren't collections allow both arguments to be marked as optional=false. This is treated as "exactly one of these arguments is required". However, collection arguments don't behave the same way: a scalar and a collection argument that are mutually exclusive with each other and both optional=false will fail if the collection argument is not specified.
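The consistent behavior described above could be sketched as follows, treating an empty collection the same way as an unset scalar. This is a hypothetical validator for illustration, not the parser's actual logic.

```java
import java.util.Collection;

/** Hypothetical illustration, not Barclay code: "exactly one of" for scalar and collection arguments. */
final class ExactlyOneOf {
    private ExactlyOneOf() {}

    /** A scalar counts as set when non-null; a collection counts as set when non-empty. */
    static boolean isSet(Object value) {
        if (value == null) {
            return false;
        }
        if (value instanceof Collection) {
            return !((Collection<?>) value).isEmpty();
        }
        return true;
    }

    /** Throws unless exactly one of the two mutually exclusive, required arguments is set. */
    static void validate(String nameA, Object a, String nameB, Object b) {
        boolean aSet = isSet(a);
        boolean bSet = isSet(b);
        if (aSet && bSet) {
            throw new IllegalArgumentException(nameA + " cannot be used with " + nameB);
        }
        if (!aSet && !bSet) {
            throw new IllegalArgumentException("One of " + nameA + " or " + nameB + " is required");
        }
    }
}
```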

Duplicate argument definition throws exception, but that exception is not given to the user.

lichtens@OncobuntuMk3:~/IdeaProjects/hellbender-protected$ java -jar build/libs/gatk-protected.jar GetHetCoverageLocusWalker
lichtens@OncobuntuMk3:~/IdeaProjects/hellbender-protected$