Giter Club home page Giter Club logo

Comments (32)

dspinellis avatar dspinellis commented on June 20, 2024 1

Don't worry, I think I have minimal examples for everything. Please stand by.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

Regarding the one-liner, this looks like an error in the dgsh grammar, because
cat test1 | tee | {{ cat & {{ cat test2 & cat & }} | grep -f - & }} | cat and your multiline equivalent parse OK.

Regarding the erratic behavior, maybe a state variable isn't initialized properly. Please try isolating it in a small script with a couple of multipipe invocations.

from dgsh.

trantor avatar trantor commented on June 20, 2024

Regarding the one-liner, this looks like an error in the dgsh grammar, because
cat test1 | tee | {{ cat & {{ cat test2 & cat & }} | grep -f - & }} | cat and your multiline equivalent parse OK.

So it wasn't a misunderstanding on my part concerning the correct syntax? Just checking if I understood correctly.

I just tried to replicate the aforementioned behaviour, so I did the following:

$ dgsh
// Forgot to chdir where the test files were //
$ {{ cat test1 & cat test2 & }} | grep -f -
dgsh-tee: Error opening test1: No such file or directory
dgsh-tee: Error opening test2: dgsh-tee: exiting before dgsh negotiation is complete
No such file or directory
dgsh-tee: exiting before dgsh negotiation is complete
22011 dgsh: timeout for negotiation. Exit.
22010 dgsh: timeout for negotiation. Exit.
22013 dgsh: timeout for negotiation. Exit.
22014 dgsh: timeout for negotiation. Exit.

Of course with the non-existant files I also expected errors, but are all the "timeout for negotiation" normal here?
I'll try to reproduce the erratic behaviour I mentioned before, but could you give me an idea of what you meant by small script? The one I used looked pretty small to me already.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

The timeout for negotiation is expected behavior. The commands fail finding the input files and never get a chance to negotiate I/O handles with the rest. ( @mfragkoulis: didn't we have an onexit handler to prevent this?)

Regarding the small script, I meant one to replicate the erratic behavior through a script, rather than interactively.

from dgsh.

trantor avatar trantor commented on June 20, 2024

@dspinellis Ok. Let me rephrase: I was talking about the script I demonstrated in the gist, the one with the nested multipipe block. Would like me to try and reproduce the bug using a script without a nested multipipe block, just with a single multipipe block? Would that be in line with your request for a small(er) script?

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

Grammar problem

# Fails
{{ {{ echo hello ; }} | cat ; echo world ; }} | cat

# Equivalent multi-line works
{{
  {{ echo hello ; }} |
  cat
  echo world
}} |
cat

# Equivalent shell construct works
{ { echo hello ; } | cat ; echo world ; } | cat

Interactive use problem

Pasting the following into the shell in one go gives a negotiation problem error. Pasting it in two chunks, works fine.

{{ echo hello ; }} | cat
echo world

Here is a session:

$ {{ echo hello ; }} | cat
dgsh-conc: dgsh-conc.c:330: pass_message_blocks: Assertion `pi[next].to_write == ((void *)0)' failed.
4559 dgsh: timeout for negotiation. Exit.
4560 dgsh: timeout for negotiation. Exit.
4557 dgsh: timeout for negotiation. Exit.
$ echo world
world

from dgsh.

trantor avatar trantor commented on June 20, 2024

BTW @dspinellis @mfragkoulis I am deeply thankful both for the tool and the work you are putting into it.
As a sysadmin whose life is basically pipelines and data structure manipulation, I longed for a tool which could allow for a graph-like flow of commands to be feasible. Process substitutions in bash help a lot but in the scatter-gather model they are severely lacking in the gathering department.
If dgsh proves to be stable enough it'll become an integral part of my everyday arsenal, simplifying immensely my work.

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

Thank you @trantor for helping us make dgsh a reliable tool.
I have started working on this issue.
I hope to have a solution within this weekend.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

Thank you @trantor for your kind words! Please note that we have not up to now used dgsh interactively, which may explain the teething problems of instability you're experiencing. On the other hand, we've used dgsh extensively with scripts where it has proven quite reliable, even to the point of meta-programming (see the dgsh-parallel script).

We'd be very interested to see how you use dgsh, and perhaps also add some part of your work in the examples section.

from dgsh.

trantor avatar trantor commented on June 20, 2024

@mfragkoulis 👍
@dspinellis I hope to be able to convert/simplify several scripts/pipelines where I need to process a source multiple times given the inability with bash to get a graph-like flow of information.
I had tried out dgsh at the beginning of the year with problems similar to the ones I just mentioned, so I put it aside for a while. I hoped to ask you a few questions about it at FOSDEM but at the last minute I had to forgo attending the conference.
A feedback I can give now is about the documentation: it's pretty interesting (the images of the graphs are very helpful) but I think that the examples would benefit tremendously by adding sample input and overall output and maybe also the output at strategic intermediate steps. It would help to better grasp the flow reading through the examples, especially with the more convoluted ones.
I would also like to see examples focussing more on the dgsh-readval and dgsh-writeval programs. I get the feeling that they could be useful to me but I don't quite grasp what they can do in a practical situation. They also have quite a few options which I am not quite sure how to use despite looking at the man page.
Another thing unclear to me is the function of incoming channel in wrapped/modified programs such as grep. I inferred from the examples that the first channel and the second channel were for the pattern file and the input data, but I didn't found where that was documented.
Again, going more than slightly off-topic here ...

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

@trantor does the fix resolve the interactive use problem?

from dgsh.

trantor avatar trantor commented on June 20, 2024

@mfragkoulis Afraid not. Just built dgsh from scratch and the behaviour is exactly the same with the commands and script in the gist. :(

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

Hm, I'll take a look at the gist.
How about the following simplified example?

{{ echo hello ; }} | cat
echo world

from dgsh.

trantor avatar trantor commented on June 20, 2024

@mfragkoulis That works fine

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

So probably the "Interactive use problem" I identified is different from the one @trantor witnessed. Have we solved the grammar problem? @trantor can you please try to create a script/interaction sequence that reliably reproduces the other problem?

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

@trantor thanks for the heads up!
Please note that the previous fix targeted the interactive use problem solely.
The gist includes the grammar problem also.
Does the latest fix solve the grammar problem?
If so, the script in your gist should run as expected.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

@mfragkoulis This is the grammar problem.

from dgsh.

trantor avatar trantor commented on June 20, 2024

@mfragkoulis Sorry I hadn't seen your last commit. Checking it out now.

from dgsh.

trantor avatar trantor commented on June 20, 2024

$ cat test1 | tee | {{ {{ cat test2 & cat & }} | grep -f - & cat & }} | cat
a
b
c
e
grep: write error

Strange thing below.

$ ./dgsh-sample
a
b
c
e
grep: write error
$ ./dgsh-sample
a
b
c
e
$ ./dgsh-sample
a
b
c
e
grep: write error

Rebuilding it from scratch again to check for leftovers of the previous build.

from dgsh.

trantor avatar trantor commented on June 20, 2024

Apparently now the command line and the script give the same output.
So the grammar problem appears to be fixed.
Running it gives me

a
b
c
e
grep: write error

as output most of the time but not all the time, just as seen in the previous comment.
Tomorrow I'll be available for further checks. Good night for now.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

Running strace on grep shows me that grep's standard output isn't properly connected.

write(1, "a\n", 2)                      = -1 ECONNREFUSED (Connection refused)
write(1, "b\n", 2)                      = -1 ENOTCONN (Transport endpoint is not connected)
write(1, "c\n", 2)                      = -1 ENOTCONN (Transport endpoint is not connected)
write(1, "e\n", 2)                      = -1 ENOTCONN (Transport endpoint is not connected)

Furthermore, the trailing cat's debug output shows that it got assigned only one input channel nin=1.

5778: dgsh_negotiate(): Tool cat with pid 5778 negotiating: nin=-1 nout=-1.
5778: read_message_block(): cat (0)
5778: add_node(): Added node cat in position 4 on dgsh graph, initiator: 5760
5778: write_message_block(): cat (4)
5778: read_message_block(): cat (4)
5778: write_message_block(): cat (4)
5778: dgsh_negotiate(): cat (4) leaves after write with state COMPLETE.
5778: establish_io_connections(): successful for node cat at index 4
5778: nin=1 nout=0
5778: New ifp assigned fd 0
5778: Input files
5778: 0x81c1058: chain_last=1 (standard input)
5778: Output files
5778: 0x81c1140: chain_last=1 ifp=0x81c1058 (standard output)
5778: Retiring file standard output pos_written=pos_to_write=8 source_pos_read=8

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

@dspinellis that's true.
The command grep -f - does not produce an output channel to pipe to the cat command that follows.
It is because there are specific arguments of grep that activate output channels.
The default is --matching-lines.
So the command cat test1 | tee | {{ {{ cat test2 & cat & }} | grep -f - --matching-lines & cat & }} | cat seems to work as expected.

@trantor do you get the expected behavior now?

Note that there are limitations to the functionality we can achieve by modifying the grep tool.
To overcome this situation, we are developing dgsh-grep, a script that will produce a multipipe block consisting of grep commands (similarly to dgsh-parallel) to be able to freely combine output channels with no modification to grep's source code.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

@mfragkoulis grep should be silent (not attempt to write) or issue an error in the above case.

I have mixed feelings regarding the re-implementation of grep. Wrapping the invocation of multiple grep commands doesn't offer much in comparison to what you can already write by hand. I proposed writing in C a bare-bones version of grep that would loop reading lines, invoking regex_match and outputting in different channels (depending on its arguments)

  • matching lines
  • non-matching lines
  • matching files
  • non-matching files

However, although this is elegant, I see the following issues.

  • I'm not sure we have realistic use cases for the above
  • The bare bones version would not support coloring
  • The bare bones version would not fast fixed string search of specified input

Note that the last two are capabilities we already use.

from dgsh.

trantor avatar trantor commented on June 20, 2024

@mfragkoulis I don't get the "grep: write error" anymore adding --matching-lines.
I am going to modify the sample script to do something meaningful and get back to you.
Btw I get a whole lot of "timeout for negotiation" whenever I try to use bash completion (TAB-driven completion on filename arguments) in interactive mode.
Could you give a concrete example of the "translation" dgsh-grep would do?
Which grep arguments activate output channels?

@dspinellis About the possible replacement, no "fast" fixed string search on no fixed string search at all? What about the other three modes? Practically speaking fixed string search is something I use quite a lot.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

@trantor We're discussing to find the best approach. Nothing is yet cast in stone. We were thinking that dgsh-grep -e a -v b -l c would create three channels:

  • lines containing a
  • lines not containing b
  • names of files containing c

What do you mean by "What about the other three modes?"

from dgsh.

trantor avatar trantor commented on June 20, 2024

What do you mean by "What about the other three modes?"

@dspinellis Basic Regular Expressions match , Extended Regular Expressions match and PCRE match. For GNU grep obviously.

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

I was thinking of a simple loop calling regexec(3). This would cover regular and extended REs.

from dgsh.

trantor avatar trantor commented on June 20, 2024

@dspinellis I for instance use rather heavily the PCRE match function (grep --perl-regexp|-P). Look-aheads, Look-behinds etc. are vital features afaiac.
Of course there's always pcregrep but I don't know how they compare to each other feature-wise.
Just my experience obviously.

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

@dspinellis
I see non-trivial benefit with writing a grep command via dgsh-grep than by hand.
Here is a simple example:

dgsh-grep -c -v pattern file

where the produced script or hand-written form is:

{{
    grep -c pattern file
    grep -v pattern file
}}
# or
{{ grep -c pattern file ; grep -v pattern file ; }}

It seems to me that the benefit of dgsh-parallel in a similarly simple command is comparable:

dgsh-parallel -n 2 'echo C{}'

where the produced script or hand-written form is:

{{
  echo C1
  echo C2
}}
# or
{{ echo C1 ; echo C2 ; }}

Here is a more complicated example of dgsh-grep:

dgsh-grep --recursive --ignore-case --color --context=2 -c -v -l -L pattern dir

where the produced script or hand-written form is:

{{
    grep --recursive --ignore-case --color --context=2 -c pattern dir
    grep --recursive --ignore-case --color --context=2 -v pattern dir
    grep --recursive --ignore-case --color --context=2 -l pattern dir
    grep --recursive --ignore-case --color --context=2 -L pattern dir
}}

Here is a more complicated example of dgsh-parallel:

dgsh-parallel -n 5 "tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c"

where the produced script or hand-written form is:

{{
  tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
  tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
  tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
  tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
  tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
}}

An additional advantage of dgsh-grep to improved readability and reduced write effort is that the dgsh-grep form is similar to the way users already write a grep command.

I am hesitant regarding the bare-bones version of grep.
It is indeed elegant and, as you've argued many times, tools have been rewritten in the past when there was good reason for it.
However, I would see value in a new version of grep only if it is developed with the mindset to gradually grow to include all the functionality provided by grep (and more that the users inspire and contribute).
I'm not sure a new version with limited characteristics would be very useful.

@trantor

  • can you provide a workflow (in a new issue perhaps) where auto-completion fails? I tried to reproduce it, but in vain.
  • regarding examples of dgsh-grep, you can see a couple in the previous paragraphs
  • you can find grep's arguments that activate output channels currently here

from dgsh.

trantor avatar trantor commented on June 20, 2024

Thanks @mfragkoulis for the link and the example. I'll try to debug what the completion actually does (programmable completion should be active) and open a new issue on that.
I am a bit busy with work so it might take me a while.

About dgsh-grep, If I understand correctly the selection of how many output channels appear and what's in them would follow the order of a select number of grep options.

But what if I wanted to translate a match with multiple patterns and non-simple combinations?
For example
grep -c -v -e "PATTERN1" -e "PATTERN2" *
and
grep -v -e "PATTERN1" -e "PATTERN2" *
are both valid and have different meanings, but that wouldn't work with your examples, since you split -c to a different invocation.
I think -v should be propagated to all instances of grep in the multipipe block and the multiple patterns as well. Same for -c. Then there are other matching control options such as -w and -x.
While interesting there is also the problem that the strict order of options in your examples in order to map precisely to output channels does not match how freely you can order them in your usual grep call. That would go against the principle of least surprise for the average grep users, but clearly specified it might work.
Just my two cents. ;)

from dgsh.

dspinellis avatar dspinellis commented on June 20, 2024

@mfragkoulis I think it would be difficult to get the semantics of the grep script correct and understandable. In contrast, dgsh-parallel just repeats N times the same command. I agree that there is currently no demonstrated need for a C implementation. Therefore, I suggest, following POLA, to just fix the existing grep implementation so that it will output lines by default, mirroring the operation of the unmodified version.

$ echo hi | grep -f <(echo .)
hi

from dgsh.

mfragkoulis avatar mfragkoulis commented on June 20, 2024

Is it OK to close this issue?

from dgsh.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.