
Comments (15)

thorhj commented on August 15, 2024 1

Good, I agree with your observations. I will probably take a stab at this when I get the time.

from clifx.

thorhj commented on August 15, 2024 1

Can you give me an example CLI that uses this -- placeholder syntax? I have only seen it being used in place of stdin.

One such CLI is git, e.g.

git checkout [-f|--ours|--theirs|-m|--conflict=<style>] [<tree-ish>] [--] <paths>

although in this example it additionally disambiguates the required <paths> from the optional positional argument <tree-ish>.

Generally, I would be open to using escapes, e.g. enclosing the value in quotes (command "--flag") to treat it as a literal, since option names can't contain spaces anyway.

Yes, this crossed my mind as well, but .NET parses the arguments into a string[], and here it strips the quotes.

By invoking mycli value1 "value2" "value 3" value 4, the parsed arguments would be

value1
value2
value 3
value
4

with each line denoting a string in the string[]

Notice that the strings value2 and value 3 are not enclosed in quotes. A different, more exotic escape could be used, or the user could be required to escape the quotes in the command-line, but those requirements are far from user-friendly.

thorhj commented on August 15, 2024 1

Here are some POSIX/GNU guidelines I found:

  • Arguments are options if they begin with a hyphen delimiter (‘-’).
  • Multiple options may follow a hyphen delimiter in a single token if the options do not take arguments. Thus, ‘-abc’ is equivalent to ‘-a -b -c’.
  • Option names are single alphanumeric characters (as for isalnum; see Classification of Characters).
  • Certain options require an argument. For example, the ‘-o’ command of the ld command requires an argument—an output file name.
  • An option and its argument may or may not appear as separate tokens. (In other words, the whitespace separating them is optional.) Thus, ‘-o foo’ and ‘-ofoo’ are equivalent.
  • Options typically precede other non-option arguments.

The implementations of getopt and argp_parse in the GNU C Library normally make it appear as if all the option arguments were specified before all the non-option arguments for the purposes of parsing, even if the user of your program intermixed option and non-option arguments. They do this by reordering the elements of the argv array. This behavior is nonstandard; if you want to suppress it, define the _POSIX_OPTION_ORDER environment variable. See Standard Environment.

  • The argument ‘--’ terminates all options; any following arguments are treated as non-option arguments, even if they begin with a hyphen.
  • A token consisting of a single hyphen character is interpreted as an ordinary non-option argument. By convention, it is used to specify input from or output to the standard input and output streams.
  • Options may be supplied in any order, or appear multiple times. The interpretation is left up to the particular application program.

(https://www.gnu.org/software/libc/manual/html_node/Argument-Syntax.html)

It is a good idea to follow the POSIX guidelines for the command-line options of a program. The easiest way to do this is to use getopt to parse them. Note that the GNU version of getopt will normally permit options anywhere among the arguments unless the special argument ‘--’ is used. This is not what POSIX specifies; it is a GNU extension.

(https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html)

Tyrrrz commented on August 15, 2024 1

For future reference: I've decided to rename the concept of arguments to "parameters" to avoid ambiguity with command line arguments (which include everything).

thorhj commented on August 15, 2024

The command name and options are currently parsed in the CommandInputParser. It currently builds the command name from the arguments until it hits the first option. Given the arguments foo bar --baz 123, the command name would be foo bar.

This obviously needs to change to support positional arguments. The biggest question to answer is: when does the command name stop and the positional argument(s) start? This matters especially for commands that have both subcommands and positional arguments.

My suggestion would be to split the parsing in two parts:

  1. Read the input to find the most specific command schema
  2. Parse the remaining input as options for that command

The most specific command schema would be the one whose name matches the longest run of leading arguments. For instance, given two commands foo <pos> and foo bar <pos>, where <pos> is a positional argument, the command schema resolution would be as follows:

  • foo bar baz matches foo bar command with positional argument baz
  • foo baz bar matches foo command with positional arguments baz bar
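
The resolution rule above could be sketched roughly as follows. This is only an illustration under assumptions: CommandResolver and Resolve are hypothetical names (not part of the library), and command names are treated as plain space-separated strings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not an existing library type.
public static class CommandResolver
{
    // Returns the longest registered command name that is a prefix of the
    // arguments, plus the remaining arguments as positional values.
    public static (string CommandName, List<string> Positional) Resolve(
        ISet<string> commandNames, IReadOnlyList<string> arguments)
    {
        // Try the longest candidate first so that "foo bar" wins over "foo".
        for (var count = arguments.Count; count > 0; count--)
        {
            var candidate = string.Join(" ", arguments.Take(count));
            if (commandNames.Contains(candidate))
                return (candidate, arguments.Skip(count).ToList());
        }

        // No named command matched: treat everything as positional values
        // of the default (unnamed) command.
        return ("", arguments.ToList());
    }

    public static void Main()
    {
        var commands = new HashSet<string> { "foo", "foo bar" };

        var first = Resolve(commands, new[] { "foo", "bar", "baz" });
        // prints "foo bar: baz"
        Console.WriteLine(first.CommandName + ": " + string.Join(", ", first.Positional));

        var second = Resolve(commands, new[] { "foo", "baz", "bar" });
        // prints "foo: baz, bar"
        Console.WriteLine(second.CommandName + ": " + string.Join(", ", second.Positional));
    }
}
```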

@Tyrrrz what are your thoughts on this approach?

thorhj commented on August 15, 2024

We also need to figure out how to mark a property as a positional argument. I was thinking a separate attribute CommandArgumentAttribute could be used for this purpose:

[CommandArgument("ref", IsRequired = true)]
public string Reference { get; set; }

The "ref" in this case could be used in the help text:

Usage
  git checkout <ref> [options]

This approach might be tricky when considering whether or not to support multiple positional arguments, and/or collections as positional arguments, which could make sense, for instance:

movefile <source> <destination>

To map the correct argument to the Source and Destination properties in this example, the user would need to indicate the order on the attribute ([CommandArgument("source", Order = 0)]). This would obviously require additional validation and open up a new source of errors for library users.
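
To make the Order idea concrete, here is a minimal sketch of how such an attribute might be declared and consumed. Everything in it is an assumption for illustration: CommandArgumentAttribute does not exist yet, ArgumentBinder is a made-up helper, and only string properties are handled.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute following the proposal; not a released API.
[AttributeUsage(AttributeTargets.Property)]
public sealed class CommandArgumentAttribute : Attribute
{
    public CommandArgumentAttribute(string name) => Name = name;

    public string Name { get; }
    public int Order { get; set; }
    public bool IsRequired { get; set; }
}

public class MoveFileCommand
{
    [CommandArgument("source", Order = 0, IsRequired = true)]
    public string Source { get; set; }

    [CommandArgument("destination", Order = 1, IsRequired = true)]
    public string Destination { get; set; }
}

// Hypothetical binder; only supports string properties in this sketch.
public static class ArgumentBinder
{
    // Assigns positional values to attributed properties in ascending Order.
    public static void Bind(object command, string[] values)
    {
        var targets = command.GetType()
            .GetProperties()
            .Select(p => (Property: p, Attribute: p.GetCustomAttribute<CommandArgumentAttribute>()))
            .Where(t => t.Attribute != null)
            .OrderBy(t => t.Attribute.Order)
            .ToList();

        for (var i = 0; i < targets.Count; i++)
        {
            if (i < values.Length)
                targets[i].Property.SetValue(command, values[i]);
            else if (targets[i].Attribute.IsRequired)
                throw new ArgumentException("Missing required argument <" + targets[i].Attribute.Name + ">.");
        }
    }

    public static void Main()
    {
        var command = new MoveFileCommand();
        Bind(command, new[] { "a.txt", "b.txt" });
        // prints "a.txt -> b.txt"
        Console.WriteLine(command.Source + " -> " + command.Destination);
    }
}
```

The extra validation mentioned above would sit around this: for example, checking for duplicate Order values before binding.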

Alternatively, we could allow only a single property to be marked as a positional argument, and let the user utilize collections to support such scenarios. This would however require the library users to parse the input themselves if the arguments contain different data types, e.g.:

printmanytimes "Hello, world!" 10

In summary

  • How should the library user mark properties as positional arguments?
  • Are we going to support multiple arguments? If so:
    • Will we support mapping into multiple properties, and if so, how do we map the positional arguments correctly?
    • Are we going to support collection type properties?

Depending on how we answer these questions, it may also be possible to just extend the CommandOptionAttribute.

Tyrrrz commented on August 15, 2024

@thorhj

My suggestion would be to split the parsing in two parts:

  • Read the input to find the most specific command schema
  • Parse the remaining input as options for that command

This is the approach I had in mind, except I saw this happening during the resolving stage. The parser is currently context-unaware, which makes it simpler to test and work with for users, and if possible I'd like to keep it this way.

So the way I envisioned it is that CommandInput is extended with a new property IReadOnlyList<string> UnboundArguments { get; } that contains all arguments that were not matched as options. Later, during the resolving stage, CommandInitializer would try to find the most specific command name that can be extracted from input.UnboundArguments and use the rest of the items as arguments.
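
A rough sketch of what such a context-unaware parse step might look like. Only the UnboundArguments property name comes from the description above; the constructor, the Options shape, and the InputParser helper are assumptions made for this example. Because the parser has no schema, every value after an option name is attached to that option, and a lone -- is not treated specially here.

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the extended input type.
public sealed class CommandInput
{
    public CommandInput(
        IReadOnlyList<string> unboundArguments,
        IReadOnlyDictionary<string, List<string>> options)
    {
        UnboundArguments = unboundArguments;
        Options = options;
    }

    public IReadOnlyList<string> UnboundArguments { get; }
    public IReadOnlyDictionary<string, List<string>> Options { get; }
}

// Hypothetical parser; it knows nothing about the registered commands.
public static class InputParser
{
    public static CommandInput Parse(IEnumerable<string> args)
    {
        var unbound = new List<string>();
        var options = new Dictionary<string, List<string>>();
        List<string> currentValues = null; // values of the most recent option

        foreach (var arg in args)
        {
            var name = arg.TrimStart('-');
            var isOption = arg.StartsWith("-", StringComparison.Ordinal) && name.Length > 0;

            if (isOption)
            {
                // Start (or resume) collecting values for this option.
                if (!options.TryGetValue(name, out currentValues))
                    options[name] = currentValues = new List<string>();
            }
            else if (currentValues != null)
            {
                currentValues.Add(arg);
            }
            else
            {
                // Nothing before this token was an option, so it stays unbound.
                unbound.Add(arg);
            }
        }

        return new CommandInput(unbound, options);
    }

    public static void Main()
    {
        var input = Parse(new[] { "aaa", "bbb", "ccc", "-o", "value" });
        Console.WriteLine(string.Join(", ", input.UnboundArguments)); // prints "aaa, bbb, ccc"
        Console.WriteLine(string.Join(", ", input.Options["o"]));     // prints "value"
    }
}
```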

Everything else you said, I agree with your way of thinking. :)

We also need to figure out how to mark a property as a positional argument. I was thinking a separate attribute CommandArgumentAttribute could be used for this purpose:

Yes, I don't see a way to reuse CommandOptionAttribute, so a new attribute will be needed. The relevant properties I see are: IsRequired, Order (an order discriminator; for two properties with the same position, their document order should win), Name (for help text), and Description.

This would obviously require additional validation and open up a new source of errors for library users.

Unfortunately, yes. Hopefully CommandSchemaResolver can be extended with required additional validation. Also, #5 should probably help.

Alternatively, we could allow only a single property to be marked as a positional argument, and let the user utilize collections to support such scenarios. This would however require the library users to parse the input themselves if the arguments contain different data types.

I think that would be a step in the wrong direction because we want the users to effectively forget they're dealing with command line.

How should the library user mark properties as positional arguments?

CommandArgumentAttribute

Are we going to support multiple arguments?

Yes.

Will we support mapping into multiple properties, and if so, how do we map the positional arguments correctly?

We will use the Order for that. If order is not defined, the schemas will be ordered according to the properties, which is document order by default when getting them from reflection. Maybe make order required to eliminate "default" behavior?

Are we going to support collection type properties?

Yes. Only one collection type argument is allowed and it must be last in order to remove ambiguity. We should ideally validate this so the user doesn't make a mistake.
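
The validation mentioned here could look something like the sketch below. ArgumentSchema and ArgumentSchemaValidator are hypothetical names invented for this example; the real check would presumably live wherever command schemas are resolved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of one positional argument.
public sealed class ArgumentSchema
{
    public ArgumentSchema(string name, int order, bool isCollection)
    {
        Name = name;
        Order = order;
        IsCollection = isCollection;
    }

    public string Name { get; }
    public int Order { get; }
    public bool IsCollection { get; }
}

public static class ArgumentSchemaValidator
{
    // Returns a list of human-readable errors; an empty list means valid.
    public static List<string> Validate(IEnumerable<ArgumentSchema> schemas)
    {
        // OrderBy is a stable sort, so arguments with equal Order keep
        // the order in which they were supplied.
        var ordered = schemas.OrderBy(s => s.Order).ToList();
        var errors = new List<string>();

        var collections = ordered.Where(s => s.IsCollection).ToList();
        if (collections.Count > 1)
            errors.Add("Only one collection-type argument is allowed.");

        foreach (var schema in collections)
        {
            if (!ReferenceEquals(schema, ordered[ordered.Count - 1]))
                errors.Add("Collection argument <" + schema.Name + "> must be the last argument.");
        }

        return errors;
    }

    public static void Main()
    {
        var invalid = new[]
        {
            new ArgumentSchema("files", 0, true),
            new ArgumentSchema("destination", 1, false)
        };

        foreach (var error in Validate(invalid))
            Console.WriteLine(error); // prints "Collection argument <files> must be the last argument."
    }
}
```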

Tyrrrz commented on August 15, 2024

Awesome :)

thorhj commented on August 15, 2024

@Tyrrrz I had some time to look into this, and I think that the choice to have a context-free parser makes implementing this feature hard/impossible, or at least makes us unable to sensibly disambiguate in certain scenarios. Let me explain.

First of all, I think having the positional arguments before the options was a bad suggestion. Doing so will mean that we cannot disambiguate positional arguments from command names and option keys at the same time, at least not in a sensible manner.

Having the commands

command subcommand <pos> [--flag1] [--flag2]
command <posarg> [--flag1] [--flag2]

Supporting invocation of command with posarg set to either subcommand or --flag1 proves troublesome.


What GNU CLIs typically do is place the positional arguments after the options, possibly with a -- to disambiguate:

command subcommand
command -- subcommand

The first calls the command command subcommand, whereas the second calls the command command with positional argument subcommand. Of course, this is only necessary if the positional argument clashes with the name of a subcommand.

Secondly, to disambiguate from option keys:

command --flag1 --flag2
command --flag1 -- --option

The first calls command with the flag options --flag1 and --flag2, whereas the second calls command with the flag option --flag1 and the positional argument --option.
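
As a small illustration of the -- convention described above, the sketch below splits the raw arguments at the first -- so that everything after it is kept as literal positional values. TerminatorSplitter is a hypothetical helper, not something the library provides.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the "--" terminator convention.
public static class TerminatorSplitter
{
    // Head: tokens before the first "--", still subject to normal parsing.
    // Literals: tokens after it, never interpreted as options or commands.
    public static (List<string> Head, List<string> Literals) Split(IReadOnlyList<string> args)
    {
        var index = -1;
        for (var i = 0; i < args.Count; i++)
        {
            if (args[i] == "--")
            {
                index = i;
                break;
            }
        }

        if (index < 0)
            return (args.ToList(), new List<string>());

        return (args.Take(index).ToList(), args.Skip(index + 1).ToList());
    }

    public static void Main()
    {
        var result = Split(new[] { "--flag1", "--", "--option" });
        Console.WriteLine(string.Join(" ", result.Head));     // prints "--flag1"
        Console.WriteLine(string.Join(" ", result.Literals)); // prints "--option"
    }
}
```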


Since this library uses a context-free parser, we lose some usability goodies. If --verbose is a flag option (no value needed), the following is not really ambiguous:

command --verbose value

However, without the context that --verbose has no input value, we do not know at parse time whether to bind value to the option, or as a positional argument. We would need to use -- in this scenario as well with a context-free parser, or try to unscramble this at a later point. For a context-aware parser, the only similar scenario I can think of would be if the last option before the positional argument is a list, e.g.

command --inputs a b c -- d.txt

The examples are of course contrived, but I think they demonstrate that the choice of having a context-free parser will definitely make it harder to disambiguate arguments in certain scenarios.

What do you think about these considerations? Any insightful suggestions?

Tyrrrz commented on August 15, 2024

Thank you for such a thorough analysis @thorhj

Can you give me an example CLI that uses this -- placeholder syntax? I have only seen it being used in place of stdin.

Anyway, I've given it a lot of thought after reading your message.

I really hate ambiguity in command line arguments, as I believe they should deterministically convey the user's intent. Ideally, you should be able to look at the command line arguments and tell what is set to what without having to know the structure of the program. This is also why I think it's nice that the arguments go first.

I realize that the existence of command arguments and subcommands violates this perfect world where everything is deterministic. This is why I leaned toward a compromise where the parser separates the input into arguments and options and later, during the command initialization stage, arguments are matched one by one against potential subcommands to see whether they are normal arguments or subcommands.

In essence, it would look like this:

Input:
aaa bbb ccc -o value

Parsed:
  Args: aaa, bbb, ccc
  Options: {o, value}

if command exists (aaa) then
   if command exists (aaa bbb) then
      if command exists (aaa bbb ccc) then
          route to (aaa bbb ccc) with args [] and options [{o, value}]
      else
          route to (aaa bbb) with args [ccc] and options [{o, value}]
   else
       route to (aaa) with args [bbb, ccc] and options [{o, value}]
else
   route to () with args [aaa bbb ccc] and options [{o, value}]

The above approach, if extended to match required arguments first, should even work with scenarios such as:

command arg1 arg2 subcommand arg3

In case one of the arguments matches the name of a subcommand, the subcommand should win. After all, the designer of the CLI committed to this when they made it this way. To avoid ambiguity, they can declare the argument as an option instead. This problem would also occur with context-aware parsers.

This separates the parser logic from the routing logic which, in my opinion, makes it easier to reason about and test.

As for taking --flag as an argument value, this technically goes against GNU CLI syntax, which states that an argument beginning with a dash is an option. That said, nobody really follows GNU syntax that closely anyway. Generally, I would be open to using escapes, e.g. enclosing the value in quotes (command "--flag") to treat it as a literal, since option names can't contain spaces anyway.

Inputs such as command --inputs a b c -- d.txt are really confusing, and if it's possible to enforce a clearer format (i.e. command d.txt --inputs a b c), I think it's very much worth doing.

Moophic commented on August 15, 2024

I prefer a cleaner syntax:

app command1 arg1 arg2 --option

I think we could limit the ability to register ambiguous commands when bootstrapping, to protect developers first and users second.
So, we might throw an ambiguity exception when two or more commands have the same number of suffix elements (counting both subcommands and positional arguments).

These are valid commands:

app command1 arg1 arg2
app command1 subcommand1

But these are not:

app command1 arg1
app command1 subcommand1
app command1 arg1 arg2
app command1 subcommand arg1
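
One possible reading of this rule, sketched under assumptions: two commands clash when one name is a prefix of the other and both consume the same total number of tokens (name segments plus positional arguments). CommandShape and AmbiguityCheck are hypothetical names used only for this example, and the leading app is left out because it is the executable name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a registered command.
public sealed class CommandShape
{
    public CommandShape(string name, int positionalCount)
    {
        Name = name;
        NameSegments = name.Split(' ');
        PositionalCount = positionalCount;
    }

    public string Name { get; }
    public string[] NameSegments { get; }
    public int PositionalCount { get; }
    public int TokenCount => NameSegments.Length + PositionalCount;
}

public static class AmbiguityCheck
{
    public static bool IsAmbiguous(CommandShape a, CommandShape b)
    {
        var shorter = a.NameSegments.Length <= b.NameSegments.Length ? a : b;
        var longer = ReferenceEquals(shorter, a) ? b : a;

        // The shorter name must be a prefix of the longer one for the two
        // commands to compete for the same input.
        var sharesPrefix = longer.NameSegments
            .Take(shorter.NameSegments.Length)
            .SequenceEqual(shorter.NameSegments);

        return sharesPrefix && a.TokenCount == b.TokenCount;
    }

    // Intended to run once at bootstrap, before any input is parsed.
    public static void EnsureUnambiguous(IReadOnlyList<CommandShape> commands)
    {
        for (var i = 0; i < commands.Count; i++)
            for (var j = i + 1; j < commands.Count; j++)
                if (IsAmbiguous(commands[i], commands[j]))
                    throw new InvalidOperationException(
                        "Commands '" + commands[i].Name + "' and '" + commands[j].Name + "' are ambiguous.");
    }

    public static void Main()
    {
        // "command1 <arg1> <arg2>" next to "command1 subcommand1": accepted.
        EnsureUnambiguous(new[]
        {
            new CommandShape("command1", 2),
            new CommandShape("command1 subcommand1", 0)
        });

        try
        {
            // "command1 <arg1>" next to "command1 subcommand1": rejected.
            EnsureUnambiguous(new[]
            {
                new CommandShape("command1", 1),
                new CommandShape("command1 subcommand1", 0)
            });
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```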

Tyrrrz commented on August 15, 2024

Yes, this crossed my mind as well, but .NET parses the arguments into a string[], and here it strips the quotes.

There is Environment.CommandLine, although then it's not going to be consistent with Main(string[] args), so I'm not very eager to use that.

Generally, if we can discourage using ambiguous configurations we will be fine so long as they have a viable alternative. The only one without a workaround is setting --flag as a value. This seems like a rare enough edge case to warrant introducing escapes, but maybe address that problem when it arises.

I think if there are multiple ways of doing the same thing, we should try to enforce only the one that is the least ambiguous. In the case of git checkout, the arguments could very well be in the front. It also seems that git checkout doesn't have a subcommand that could cause ambiguity.

Tyrrrz commented on August 15, 2024

Also, to be a bit pedantic, introducing -- is also possible with a context-free parser, so this discussion isn't really about that at all. 😄

thorhj commented on August 15, 2024

@Tyrrrz

Input:
aaa "bbb ccc" -o value

If both aaa and aaa bbb ccc are commands, which one is invoked with this input? I would think aaa, since the user actively added quotes around bbb ccc.

If so, should the parser produce this output?

Args: [ "aaa", "bbb ccc" ]
Options: { "o": "value" }

Then we can check the individual arguments for whitespace, and only expand them into command names if they contain no whitespace characters (in the example, use aaa as the command name but consider bbb ccc a positional argument because it contains whitespace).
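
A minimal sketch of that whitespace rule, assuming command names are matched by joining leading arguments with spaces. WhitespaceAwareResolver is a hypothetical name; the point is only that an argument containing whitespace ends the run of command-name candidates, so the quoted bbb ccc cannot be glued onto aaa.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical resolver illustrating the whitespace rule.
public static class WhitespaceAwareResolver
{
    public static (string CommandName, List<string> Positional) Resolve(
        ISet<string> commandNames, IReadOnlyList<string> args)
    {
        // Only the leading run of whitespace-free arguments may form a name.
        var candidates = 0;
        while (candidates < args.Count && !args[candidates].Any(char.IsWhiteSpace))
            candidates++;

        // Longest match first, limited to the candidate run.
        for (var count = candidates; count > 0; count--)
        {
            var name = string.Join(" ", args.Take(count));
            if (commandNames.Contains(name))
                return (name, args.Skip(count).ToList());
        }

        return ("", args.ToList());
    }

    public static void Main()
    {
        var commands = new HashSet<string> { "aaa", "aaa bbb ccc" };

        // Equivalent to invoking: aaa "bbb ccc"
        var quoted = Resolve(commands, new[] { "aaa", "bbb ccc" });
        Console.WriteLine(quoted.CommandName); // prints "aaa"

        // Equivalent to invoking: aaa bbb ccc
        var plain = Resolve(commands, new[] { "aaa", "bbb", "ccc" });
        Console.WriteLine(plain.CommandName); // prints "aaa bbb ccc"
    }
}
```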

Do you agree with this approach?

Tyrrrz commented on August 15, 2024

@thorhj
Yes, when the user encloses an argument within double quotes it should become a literal value (i.e. not a command, not an option name, not a directive).
I agree this approach sounds good.
