Comments (32)
Don't worry, I think I have minimal examples for everything. Please stand by.
from dgsh.
Regarding the one-liner, this looks like an error in the dgsh grammar, because
cat test1 | tee | {{ cat & {{ cat test2 & cat & }} | grep -f - & }} | cat
and your multiline equivalent parse OK.
Regarding the erratic behavior, maybe a state variable isn't initialized properly. Please try isolating it in a small script with a couple of multipipe invocations.
from dgsh.
Regarding the one-liner, this looks like an error in the dgsh grammar, because
cat test1 | tee | {{ cat & {{ cat test2 & cat & }} | grep -f - & }} | cat and your multiline equivalent parse OK.
So it wasn't a misunderstanding on my part concerning the correct syntax? Just checking if I understood correctly.
I just tried to replicate the aforementioned behaviour, so I did the following:
$ dgsh
// Forgot to chdir where the test files were //
$ {{ cat test1 & cat test2 & }} | grep -f -
dgsh-tee: Error opening test1: No such file or directory
dgsh-tee: Error opening test2: dgsh-tee: exiting before dgsh negotiation is complete
No such file or directory
dgsh-tee: exiting before dgsh negotiation is complete
22011 dgsh: timeout for negotiation. Exit.
22010 dgsh: timeout for negotiation. Exit.
22013 dgsh: timeout for negotiation. Exit.
22014 dgsh: timeout for negotiation. Exit.
Of course with the non-existant files I also expected errors, but are all the "timeout for negotiation" normal here?
I'll try to reproduce the erratic behaviour I mentioned before, but could you give me an idea of what you meant by small script? The one I used looked pretty small to me already.
from dgsh.
The timeout for negotiation
is expected behavior. The commands fail finding the input files and never get a chance to negotiate I/O handles with the rest. ( @mfragkoulis: didn't we have an onexit handler to prevent this?)
Regarding the small script, I meant one to replicate the erratic behavior through a script, rather than interactively.
from dgsh.
@dspinellis Ok. Let me rephrase: I was talking about the script I demonstrated in the gist, the one with the nested multipipe block. Would like me to try and reproduce the bug using a script without a nested multipipe block, just with a single multipipe block? Would that be in line with your request for a small(er) script?
from dgsh.
Grammar problem
# Fails
{{ {{ echo hello ; }} | cat ; echo world ; }} | cat
# Equivalent multi-line works
{{
{{ echo hello ; }} |
cat
echo world
}} |
cat
# Equivalent shell construct works
{ { echo hello ; } | cat ; echo world ; } | cat
Interactive use problem
Pasting the following into the shell in one go gives a negotiation problem error. Pasting it in two chunks, works fine.
{{ echo hello ; }} | cat
echo world
Here is a session:
$ {{ echo hello ; }} | cat
dgsh-conc: dgsh-conc.c:330: pass_message_blocks: Assertion `pi[next].to_write == ((void *)0)' failed.
4559 dgsh: timeout for negotiation. Exit.
4560 dgsh: timeout for negotiation. Exit.
4557 dgsh: timeout for negotiation. Exit.
$ echo world
world
from dgsh.
BTW @dspinellis @mfragkoulis I am deeply thankful both for the tool and the work you are putting into it.
As a sysadmin whose life is basically pipelines and data structure manipulation, I longed for a tool which could allow for a graph-like flow of commands to be feasible. Process substitutions in bash help a lot but in the scatter-gather model they are severely lacking in the gathering department.
If dgsh proves to be stable enough it'll become an integral part of my everyday arsenal, simplifying immensely my work.
from dgsh.
Thank you @trantor for helping us make dgsh a reliable tool.
I have started working on this issue.
I hope to have a solution within this weekend.
from dgsh.
Thank you @trantor for your kind words! Please note that we have not up to now used dgsh interactively, which may explain the teething problems of instability you're experiencing. On the other hand, we've used dgsh extensively with scripts where it has proven quite reliable, even to the point of meta-programming (see the dgsh-parallel script).
We'd be very interested to see how you use dgsh, and perhaps also add some part of your work in the examples section.
from dgsh.
@mfragkoulis 👍
@dspinellis I hope to be able to convert/simplify several scripts/pipelines where I need to process a source multiple times given the inability with bash to get a graph-like flow of information.
I had tried out dgsh at the beginning of the year with problems similar to the ones I just mentioned, so I put it aside for a while. I hoped to ask you a few questions about it at FOSDEM but at the last minute I had to forgo attending the conference.
A feedback I can give now is about the documentation: it's pretty interesting (the images of the graphs are very helpful) but I think that the examples would benefit tremendously by adding sample input and overall output and maybe also the output at strategic intermediate steps. It would help to better grasp the flow reading through the examples, especially with the more convoluted ones.
I would also like to see examples focussing more on the dgsh-readval and dgsh-writeval programs. I get the feeling that they could be useful to me but I don't quite grasp what they can do in a practical situation. They also have quite a few options which I am not quite sure how to use despite looking at the man page.
Another thing unclear to me is the function of incoming channel in wrapped/modified programs such as grep. I inferred from the examples that the first channel and the second channel were for the pattern file and the input data, but I didn't found where that was documented.
Again, going more than slightly off-topic here ...
from dgsh.
@trantor does the fix resolve the interactive use problem?
from dgsh.
@mfragkoulis Afraid not. Just built dgsh from scratch and the behaviour is exactly the same with the commands and script in the gist. :(
from dgsh.
Hm, I'll take a look at the gist.
How about the following simplified example?
{{ echo hello ; }} | cat
echo world
from dgsh.
@mfragkoulis That works fine
from dgsh.
So probably the "Interactive use problem" I identified is different from the one @trantor witnessed. Have we solved the grammar problem? @trantor can you please try to create a script/interaction sequence that reliably reproduces the other problem?
from dgsh.
@trantor thanks for the heads up!
Please note that the previous fix targeted the interactive use problem solely.
The gist includes the grammar problem also.
Does the latest fix solve the grammar problem?
If so, the script in your gist should run as expected.
from dgsh.
@mfragkoulis This is the grammar problem.
from dgsh.
@mfragkoulis Sorry I hadn't seen your last commit. Checking it out now.
from dgsh.
$ cat test1 | tee | {{ {{ cat test2 & cat & }} | grep -f - & cat & }} | cat
a
b
c
e
grep: write error
Strange thing below.
$ ./dgsh-sample
a
b
c
e
grep: write error
$ ./dgsh-sample
a
b
c
e
$ ./dgsh-sample
a
b
c
e
grep: write error
Rebuilding it from scratch again to check for leftovers of the previous build.
from dgsh.
Apparently now the command line and the script give the same output.
So the grammar problem appears to be fixed.
Running it gives me
a
b
c
e
grep: write error
as output most of the time but not all the time, just as seen in the previous comment.
Tomorrow I'll be available for further checks. Good night for now.
from dgsh.
Running strace on grep shows me that grep's standard output isn't properly connected.
write(1, "a\n", 2) = -1 ECONNREFUSED (Connection refused)
write(1, "b\n", 2) = -1 ENOTCONN (Transport endpoint is not connected)
write(1, "c\n", 2) = -1 ENOTCONN (Transport endpoint is not connected)
write(1, "e\n", 2) = -1 ENOTCONN (Transport endpoint is not connected)
Furthermore, the trailing cat's debug output shows that it got assigned only one input channel nin=1
.
5778: dgsh_negotiate(): Tool cat with pid 5778 negotiating: nin=-1 nout=-1.
5778: read_message_block(): cat (0)
5778: add_node(): Added node cat in position 4 on dgsh graph, initiator: 5760
5778: write_message_block(): cat (4)
5778: read_message_block(): cat (4)
5778: write_message_block(): cat (4)
5778: dgsh_negotiate(): cat (4) leaves after write with state COMPLETE.
5778: establish_io_connections(): successful for node cat at index 4
5778: nin=1 nout=0
5778: New ifp assigned fd 0
5778: Input files
5778: 0x81c1058: chain_last=1 (standard input)
5778: Output files
5778: 0x81c1140: chain_last=1 ifp=0x81c1058 (standard output)
5778: Retiring file standard output pos_written=pos_to_write=8 source_pos_read=8
from dgsh.
@dspinellis that's true.
The command grep -f -
does not produce an output channel to pipe to the cat
command that follows.
It is because there are specific arguments of grep that activate output channels.
The default is --matching-lines
.
So the command cat test1 | tee | {{ {{ cat test2 & cat & }} | grep -f - --matching-lines & cat & }} | cat
seems to work as expected.
@trantor do you get the expected behavior now?
Note that there are limitations to the functionality we can achieve by modifying the grep
tool.
To overcome this situation, we are developing dgsh-grep, a script that will produce a multipipe block consisting of grep
commands (similarly to dgsh-parallel
) to be able to freely combine output channels with no modification to grep's source code.
from dgsh.
@mfragkoulis grep should be silent (not attempt to write) or issue an error in the above case.
I have mixed feelings regarding the re-implementation of grep. Wrapping the invocation of multiple grep commands doesn't offer much in comparison to what you can already write by hand. I proposed writing in C a bare-bones version of grep that would loop reading lines, invoking regex_match and outputting in different channels (depending on its arguments)
- matching lines
- non-matching lines
- matching files
- non-matching files
However, although this is elegant, I see the following issues.
- I'm not sure we have realistic use cases for the above
- The bare bones version would not support coloring
- The bare bones version would not fast fixed string search of specified input
Note that the last two are capabilities we already use.
from dgsh.
@mfragkoulis I don't get the "grep: write error" anymore adding --matching-lines
.
I am going to modify the sample script to do something meaningful and get back to you.
Btw I get a whole lot of "timeout for negotiation" whenever I try to use bash completion (TAB-driven completion on filename arguments) in interactive mode.
Could you give a concrete example of the "translation" dgsh-grep would do?
Which grep arguments activate output channels?
@dspinellis About the possible replacement, no "fast" fixed string search on no fixed string search at all? What about the other three modes? Practically speaking fixed string search is something I use quite a lot.
from dgsh.
@trantor We're discussing to find the best approach. Nothing is yet cast in stone. We were thinking that dgsh-grep -e a -v b -l c
would create three channels:
- lines containing
a
- lines not containing
b
- names of files containing
c
What do you mean by "What about the other three modes?"
from dgsh.
What do you mean by "What about the other three modes?"
@dspinellis Basic Regular Expressions match , Extended Regular Expressions match and PCRE match. For GNU grep obviously.
from dgsh.
I was thinking of a simple loop calling regexec
(3). This would cover regular and extended REs.
from dgsh.
@dspinellis I for instance use rather heavily the PCRE match function (grep --perl-regexp|-P). Look-aheads, Look-behinds etc. are vital features afaiac.
Of course there's always pcregrep but I don't know how they compare to each other feature-wise.
Just my experience obviously.
from dgsh.
@dspinellis
I see non-trivial benefit with writing a grep command via dgsh-grep
than by hand.
Here is a simple example:
dgsh-grep -c -v pattern file
where the produced script or hand-written form is:
{{
grep -c pattern file
grep -v pattern file
}}
# or
{{ grep -c pattern file ; grep -v pattern file ; }}
It seems to me that the benefit of dgsh-parallel
in a similarly simple command is comparable:
dgsh-parallel -n 2 'echo C{}'
where the produced script or hand-written form is:
{{
echo C1
echo C2
}}
# or
{{ echo C1 ; echo C2 ; }}
Here is a more complicated example of dgsh-grep
:
dgsh-grep --recursive --ignore-case --color --context=2 -c -v -l -L pattern dir
where the produced script or hand-written form is:
{{
grep --recursive --ignore-case --color --context=2 -c pattern dir
grep --recursive --ignore-case --color --context=2 -v pattern dir
grep --recursive --ignore-case --color --context=2 -l pattern dir
grep --recursive --ignore-case --color --context=2 -L pattern dir
}}
Here is a more complicated example of dgsh-parallel
:
dgsh-parallel -n 5 "tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c"
where the produced script or hand-written form is:
{{
tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
tr -s ' \t\n\r\f' '\n' | sort -S 512M | uniq -c
}}
An additional advantage of dgsh-grep
to improved readability and reduced write effort is that the dgsh-grep
form is similar to the way users already write a grep command.
I am hesitant regarding the bare-bones version of grep
.
It is indeed elegant and, as you've argued many times, tools have been rewritten in the past when there was good reason for it.
However, I would see value in a new version of grep
only if it is developed with the mindset to gradually grow to include all the functionality provided by grep
(and more that the users inspire and contribute).
I'm not sure a new version with limited characteristics would be very useful.
- can you provide a workflow (in a new issue perhaps) where auto-completion fails? I tried to reproduce it, but in vain.
- regarding examples of
dgsh-grep
, you can see a couple in the previous paragraphs - you can find
grep
's arguments that activate output channels currently here
from dgsh.
Thanks @mfragkoulis for the link and the example. I'll try to debug what the completion actually does (programmable completion should be active) and open a new issue on that.
I am a bit busy with work so it might take me a while.
About dgsh-grep, If I understand correctly the selection of how many output channels appear and what's in them would follow the order of a select number of grep options.
But what if I wanted to translate a match with multiple patterns and non-simple combinations?
For example
grep -c -v -e "PATTERN1" -e "PATTERN2" *
and
grep -v -e "PATTERN1" -e "PATTERN2" *
are both valid and have different meanings, but that wouldn't work with your examples, since you split -c
to a different invocation.
I think -v
should be propagated to all instances of grep in the multipipe block and the multiple patterns as well. Same for -c
. Then there are other matching control options such as -w
and -x
.
While interesting there is also the problem that the strict order of options in your examples in order to map precisely to output channels does not match how freely you can order them in your usual grep call. That would go against the principle of least surprise for the average grep users, but clearly specified it might work.
Just my two cents. ;)
from dgsh.
@mfragkoulis I think it would be difficult to get the semantics of the grep script correct and understandable. In contrast, dgsh-parallel just repeats N times the same command. I agree that there is currently no demonstrated need for a C implementation. Therefore, I suggest, following POLA, to just fix the existing grep implementation so that it will output lines by default, mirroring the operation of the unmodified version.
$ echo hi | grep -f <(echo .)
hi
from dgsh.
Is it OK to close this issue?
from dgsh.
Related Issues (20)
- [RFE] Parallelize processes receiving different input streams HOT 8
- Timeout for negotiation when run interactively from dgsh HOT 10
- Bash raising 'no version information available' HOT 4
- error: 'dir_fd' may be used uninitialized in this function HOT 6
- Executable stack HOT 2
- Unable to Build on MacOs Mojave HOT 4
- How to background a process with dgsh HOT 1
- different example HOT 5
- rsync: getaddrinfo: translationproject.org 873: Temporary failure in name resolution HOT 2
- Named pipelines HOT 1
- Porting for 1) Natively on termux (Follows aarch64) 2)Ubuntu arm64 /(any other linux distro) running inside proot container on termux HOT 21
- can not compile - gnulib problem HOT 11
- Deadlock in dgsh-tee
- Avoid nout=0 on last pipe component HOT 6
- Optimize dgsh program detection
- Provide way to detect whether a script is running under dsgh? HOT 7
- Wrap rather than modify the source code of single-output commands HOT 1
- Document POSIX tools with dgsh capabilities
- Problem with grep (maybe) HOT 18
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dgsh.