genivia / ugrep

ugrep 5.1: A more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more

Home Page: https://ugrep.com

License: BSD 3-Clause "New" or "Revised" License

Makefile 6.88% Shell 6.01% M4 3.62% C++ 71.30% C 12.17% Java 0.01% Dockerfile 0.02%
grep grep-search unicode grepping file-search recursively-search text-search regex search shell-utilities

ugrep's People

Contributors

arrufat, barsnick, camuffo, carlwgeorge, chrismoutsos, dependabot[bot], emaste, ericonr, felixonmars, genivia-inc, idigdoug, illiliti, juhopp, landfillbaby, lgtm-migrator, mmuman, onuralpszr, pete-woods, peterdavehello, polluks, rbnor, ribalda, robert-van-engelen, sitiom, stdedos, trantor, vlkrs, vpicavet, wahjava, zmajeed

ugrep's Issues

Add build artifacts at the release assets

As mentioned in

ugrep/README.md

Line 305 in 2c9d5c5

Or visit https://github.com/Genivia/ugrep/releases to download a release.

Would you please add all the releases here?

Including the cp2bin's target (bin/linux/ugrep does not run on my system, whereas bin/ugrep does)

It actually took me a while to figure out that everything is instead under ugrep/bin

Something like:

illegal instruction, fails to execute

GNU gdb (Ubuntu 7.11.1-0ubuntu116.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./ugrep/bin/ugrep...done.
[New LWP 21110]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ugrep --format=%f:%O%
-rzcFf test /aaa/bbb/cccc/ssss/2222/2222-22-22/'.
Program terminated with signal SIGILL, Illegal instruction.
#0 0x000000000043df0c in reflex::Pattern::parse(std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> >&, std::map<reflex::Pattern::Position, std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> >, std::less<reflex::Pattern::Position>, std::allocator<std::pair<reflex::Pattern::Position const, std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> > > > >&, std::map<int, reflex::ORanges, std::less<int>, std::allocator<std::pair<int const, reflex::ORanges > > >&, std::map<int, reflex::ORanges, std::less<int>, std::allocator<std::pair<int const, reflex::ORanges > > >&) ()

Backtrace:
(gdb) backtrace
#0 0x000000000043df0c in reflex::Pattern::parse(std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> >&, std::map<reflex::Pattern::Position, std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> >, std::less<reflex::Pattern::Position>, std::allocator<std::pair<reflex::Pattern::Position const, std::set<reflex::Pattern::Position, std::less<reflex::Pattern::Position>, std::allocator<reflex::Pattern::Position> > > > >&, std::map<int, reflex::ORanges, std::less<int>, std::allocator<std::pair<int const, reflex::ORanges > > >&, std::map<int, reflex::ORanges, std::less<int>, std::allocator<std::pair<int const, reflex::ORanges > > >&) ()
#1 0x0000000000442efd in reflex::Pattern::init(char const*, unsigned char const*) ()
#2 0x000000000040755b in main ()

Also the test fails. I built from master, same system as before, should not be any issues.

Homebrew ugrep formula for macOS and Linux

EDIT: The repo includes a Homebrew formula to install the latest version of ugrep, fully-featured and AVX/SSE2 optimized:

$ brew install https://raw.githubusercontent.com/Genivia/ugrep/master/Formula/ugrep.rb

Still, it would be nice to have an official homebrew formula for ugrep. The problem is that I am stuck with brew audit reporting "GitHub repository not notable enough (<30 forks, <30 watchers and <75 stars)".

  1. If you're happy with ugrep, please ⭐️ the project.

  2. If you want easy installation with homebrew and other installers, please ⭐️ the project.

  3. If you want more features in ugrep, please let us know what you want and don't forget to ⭐️ the project when you are happy with the ugrep updates.

But if you are somehow unhappy and have problems with ugrep, please submit your issue with technical details so we can improve ugrep.

macOS and symbolic link

Hi
for macOS this error is still present
Thanks
P.S
please fix the syntax example
ugrep -Rl '.' -N '\p{Unicode})'

[FR] quickly navigate to next file and back one file in the query UI

Map two keys (or key combos) to quickly navigate to the next file on screen and back one file. Perhaps use CTRL-S to scroll forward to the next file and CTRL-W to scroll back one file. The 'W' and 'S' keys are sometimes used to move up and down, so should be a good fit for this purpose. This should also work in selection mode to move the cursor.

Per-directory configuration?

Some other similar utilities such as sift have the very useful feature of having a per-directory configuration file, which is searched for in the same way as .git, i.e. in the current directories, then in parent directories. This provides a convenient way to set up search parameters on a per-project basis.

An alternative (or complement) would be a UGREP_OPTIONS variable, like GREP_OPTIONS for GNU grep, which could be combined with something like direnv to achieve the same result.
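
The upward search described above can be sketched in a few lines. This is only an illustration of the lookup order (current directory first, then each parent); the file name `.ugrep` and the function `find_config` are hypothetical placeholders, not something the issue specifies.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, name: str = ".ugrep") -> Optional[Path]:
    """Return the nearest file called `name` in `start` or any parent directory.

    `name` is a hypothetical config file name chosen for this sketch.
    """
    start = start.resolve()
    # Check the starting directory first, then walk up towards the root.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_config(Path.cwd())` from inside a project would return the closest configuration file, in the same way git locates the `.git` directory.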

Forgive me if I've overlooked equivalent functionality in ugrep, which I continue to enjoy getting to greps with (err, getting to grips with!).

Segmentation fault with over 9 files on Cygwin

I'd like to be able to use ugrep at a Cygwin prompt on Windows 10. Unfortunately I get a segmentation fault if there are more than nine files on the command line, e.g.:

ugrep thing f1 f2 f3 f4 f5 f6 f6 f7 f8 f9 f10

For this reason I also get segmentation faults when using shell filename expansion, e.g.:

ugrep thing f*

Any thoughts? Many thanks!

idea: mark matches with tags

Right now ugrep has a --colors=COLORS, --colours=COLORS parameter that lets the user mark different parts of the matches with different colors. However, this is only useful in a console environment. When the output of ugrep is redirected to a file, all colors disappear. An option like --tags that lets the user mark different parts of the matches with different text strings would be more useful. For colors, we only need a single parameter per matching part: for selected lines, for example, we provide one color. But if we mark a selected line with text, there will be a begin marker and an end marker for the selected line.
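
To make the begin/end idea concrete, here is a minimal sketch that wraps each match in a pair of marker strings. The function name `tag_matches` and the default markers `<m>` and `</m>` are made up for this example; the issue does not propose a specific syntax.

```python
import re


def tag_matches(line: str, pattern: str, begin: str = "<m>", end: str = "</m>") -> str:
    """Wrap every match of `pattern` in `line` with begin/end marker strings."""
    # A color needs one value per part; a text tag needs a begin and an end string.
    return re.sub(pattern, lambda m: f"{begin}{m.group(0)}{end}", line)


print(tag_matches("error: disk full, error code 28", r"error"))
```

This prints `<m>error</m>: disk full, <m>error</m> code 28`, and unlike color escape sequences the markers survive redirection to a file.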

Please add tests for directories with compressed files

Request:
A very common use case for people like me who seek maximum performance is searching within large directories that contain many compressed log files.

I have explored most options, but none have beaten the following syntax:
time find 2019* -name '*' -print0 | xargs -P8 -0 zgrep -Fl 10.0.0.1
(8 because that was optimal on the last computer I tested on; as you know, the best value will likely differ.) This method is in some cases just as fast on twice the data, while options like rg take almost twice as long when the data doubles, which suggests some unnecessary serialization or something else that makes it wait rather than use the available resources to get the work done. Even:
time find 2019* -name '*' -print0 | xargs -P8 -0 -I {} sh -c 'zcat {} |rg -Fl 10.0.0.1'
was faster than using rg on its own.

It could also be that the performance gain only shows up on a larger amount of data, as is the case with one of the servers I usually work with.

On this data:
https://github.com/RuneBergh/example_data/blob/master/testdir.tar.gz

Note that this data is really silly test-data, but it made it possible to showcase the differences.

I have not yet tested ugrep; I will later. Hopefully it copes better with larger numbers of directories and files, and gains more from parallelism than rg does on larger worksets.

Would love to see this, or another kind of test implemented. Thanks!
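
For readers who want to reproduce the shape of the xargs -P8 approach without the shell pipeline, the following sketch fans out one task per compressed file across a pool of workers and reports the files that contain a fixed string. It is not how ugrep or rg work internally; the function names and the default of 8 workers are assumptions taken from the example above, and a process pool may scale better than threads for CPU-bound decompression.

```python
import gzip
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def contains(path: Path, needle: bytes) -> bool:
    """Return True if the gzip-compressed file has `needle` on any line."""
    with gzip.open(path, "rb") as fh:
        return any(needle in line for line in fh)


def files_with_match(paths: Iterable[Path], needle: bytes, workers: int = 8) -> List[Path]:
    """Search the files concurrently and return those that contain `needle`."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per file, similar in spirit to `xargs -P8 zgrep -Fl`.
        hits = list(pool.map(contains, paths, [needle] * len(paths)))
    return [p for p, hit in zip(paths, hits) if hit]
```

A call such as `files_with_match(Path(".").glob("2019*/*.gz"), b"10.0.0.1")` mirrors the `zgrep -Fl` command, with the glob adjusted to the actual directory layout.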

ugrep -z not being the default

For the ugrep (i.e. not called as grep):

  • Why is -z not the default? Nr. 1 (and paramount) target of this program should be in-place zero-conf (as shown in e.g. #48), and Nr. 2 should be to search archived files. It is even in your wiki (https://github.com/Genivia/ugrep/wiki#why-did-you-build-ugrep):

    Why did you build ugrep?

    We were looking for a grep tool to quickly dig trough hundreds of zip- and tar-archived project repos with thousands of source code files, documentation files, images, and binary files. We wanted to do this without having to expand the archives, to save time and storage resources.

    So, I am wondering why -z would not be a default.

  • Even when ugrep is explicitly given a supported archive (ugrep -FTrin PrefHashCalculator::Validate src.git-refs_tags_83.0.4103.116.tar.gz) it refuses to find its match. Given that I didn't want to decompress 3G source code on my disk, I was able to at least do a ramdisk, and spill it there.

macOS Illegal instruction: 4

Hi
for the compiled binary from you:

  1. The macOS version of ugrep depends on libpcre2-8.0.dylib.
    Please build a static pcre2 lib and make ugrep without the dependency on the pcre2 dylib.
  2. If I modify ugrep with install_name_tool -change, or add the pcre2 dylib to /opt/local/lib, I now get the message:
    Illegal instruction: 4
    macOS High Sierra.
    Regards.

Feature Request - larger hex view

If I am grepping for strings in a binary with the pager, the hex view can break up strings, which gets in the way of a refined in-pager search (for string associations).

I would like to have the option of a 4-column hex view, or an N-column hex view, so I can use the in-pager search to match a larger line.

Segmentation fault

Segmentation fault when searching using wildcards that match multiple files, for example with the path 2019-01-01/conn*. I ran into this issue trying to search all logs for an entire month.

aclocal reference hardcoded?

I had to use autoreconf -vfi on both systems I have been testing on. I have not checked the code, so feel free to shoot me down. Both servers were Ubuntu 16.04 LTS.

loading -f from workingdir is maybe bad practice or even potential security risk

Scripting languages have long had the convention of implicitly including the current working directory before traversing a lookup $PATH, but have in recent times (despite the big headache that involves) shifted to not using $CURDIR except when explicitly requested (e.g. by checking whether the path begins with ./).

I propose to tighten the logic of the -f option to be similarly cautious. I cannot point at a concrete problem (but Python and Perl developers had the anti-pattern for 10s of years before acknowledging it was a problem).
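
A cautious lookup of this kind could look like the sketch below: the working directory is consulted only when the name is an absolute path or starts with ./ or ../, and otherwise only the configured directories are searched. The function name and the `search_dirs` parameter are hypothetical; this is not a description of what ugrep does today.

```python
import os
from typing import List, Optional


def resolve_pattern_file(name: str, search_dirs: List[str]) -> Optional[str]:
    """Resolve a pattern file name without implicitly trusting the working directory."""
    explicit = os.path.isabs(name) or name.startswith(("./", "../"))
    if explicit:
        # The user asked for this exact location, so honor it.
        return name if os.path.isfile(name) else None
    # A bare name is looked up only in the configured directories.
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With this rule, a file that merely happens to sit in the current directory is never picked up for a bare name, which is the behavior the proposal asks for.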

man page

Please fix it:
Archives (.cpio,
.pax, .tar, .zip) and compressed archives (e.g. .taz, .tgz, .tpz,
.tbz, .tbz2, .tb2, .tz2, .tlz, and .txz)

Change these to \&.cpio etc., or do so only for .pax and .tbz, which start a line.
Without this, the page does not render completely.

Replacing `grep` with `ugrep` makes `./configure` fail

As shown in rocky/zshdb#27:

$ git clone https://github.com/rocky/zshdb && cd zshdb
$ zshdb → λ git master → bat ./autogen.sh 
───────┬─────────────────────────────────────────────────────────────
       │ File: ./autogen.sh
───────┼─────────────────────────────────────────────────────────────
   1   │ #!/bin/zsh
   2   │ cp README.md README
   3   │ autoreconf -i && \
   4   │ autoconf && {
   5   │   echo "Running configure with --enable-maintainer-mode $@"
   6   │   ./configure --enable-maintainer-mode $@
   7   │ }
───────┴─────────────────────────────────────────────────────────────
$ ./autogen.sh
configure.ac:6: installing './install-sh'
configure.ac:6: installing './missing'
Running configure with --enable-maintainer-mode 
checking whether to enable maintainer-specific portions of Makefiles... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for grep that handles long lines and -e... /bin/grep
checking for a sed that does not truncate output... /bin/sed
checking for zsh... /usr/bin/zsh
checking Checking whether /usr/bin/zsh is compatible with zshdb... zsh 5.4.2 (x86_64-ubuntu-linux-gnu) is recent enough.
yes!
checking for diff... /usr/bin/diff
adding -w to diff in regression tests
adding --unified to diff in regression tests
checking whether ln -s works... yes
checking for rm... /bin/rm
checking that generated files are newer than configure... done
configure: creating ./config.status
configure: error: could not make ./config.status
$ echo $?
1

I know that running chmod -x on the ugrep binary I extracted fixes the above issue.

Unfortunately, I am not familiar with ./configure at all.

ugrep doesn't work for perl-regexp

I'm using ugrep 2.3.1 on Windows 7.

If I have a file, say foo.txt

un kilo de arroz
arroz con pollo
medio kilo de pato
arroz con pato
una cucharada de sal
arroz con huevo

and I do:
ugrep "(?=.*pollo)(?=.*arroz)" foo.txt
or
ugrep -P "(?=.*pollo)(?=.*arroz)" foo.txt

I don't get any result, unlike with grep or pcregrep.
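
One property of this pattern is worth noting when comparing tools: it consists only of lookaheads, so any match it produces is zero-length. The sketch below uses Python's re module as a stand-in for a Perl-style engine; it shows which line matches and the span of the match, and makes no claim about how ugrep treats empty matches.

```python
import re

lines = [
    "un kilo de arroz",
    "arroz con pollo",
    "medio kilo de pato",
    "arroz con pato",
    "una cucharada de sal",
    "arroz con huevo",
]
pattern = re.compile(r"(?=.*pollo)(?=.*arroz)")

for line in lines:
    m = pattern.search(line)
    if m:
        # Lookaheads assert but consume nothing, so the span is empty.
        print(repr(line), "span:", m.span())
```

Only `'arroz con pollo'` is printed, with span `(0, 0)`. Appending `.*` to the pattern makes the match cover the whole line instead of being empty.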

Pre-defined search pattern contributions

You can help improve ugrep by contributing your search patterns. By sharing your search patterns you are helping the community of "greppers" to improve their source code search skills and results.

I've defined some common search patterns for C, C++, Java, JS, JSON, Markdown, PHP, Python, Ruby, and XML (see the READMEs in the patterns directory). This is a good starting point.

What is a pattern file?

Pattern files are used with option -f. Pattern files contain regex patterns. Any pattern that matches is part of the search results, except for "negative patterns" (see below). Comments start at the first column with a hash #. To start a pattern with a hash, use \#. Patterns that require multi-line matching should have a first line with ###-o to implicitly enable option -o. Empty lines in pattern files are ignored and can be used together with comments to organize the file contents. For POSIX matching (the default) the order of the patterns in the file does not matter. For Perl matching with option -P, the order matters.
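
The file format rules above can be summarized with a small reader. This is a sketch of the described rules only, not ugrep's own parser; the function name and return shape are invented for the example.

```python
from typing import List, Tuple


def parse_pattern_file(text: str) -> Tuple[List[str], bool]:
    """Return (patterns, only_matching) following the rules described above."""
    lines = text.splitlines()
    # A first line starting with ###-o implicitly enables option -o.
    only_matching = bool(lines) and lines[0].startswith("###-o")
    patterns: List[str] = []
    for line in lines:
        if not line:
            continue  # empty lines are ignored
        if line.startswith("#"):
            continue  # a hash in the first column starts a comment
        # A pattern written as \#... is kept as written; the backslash
        # is what lets it begin with a literal hash.
        patterns.append(line)
    return patterns, only_matching
```

For a file containing `###-o`, a comment line, an empty line, `foo.*bar`, and `\#include`, the function returns the two patterns together with `True` for the implied -o.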

What are "negative patterns"?

Note that patterns may include "negative patterns" such as zap_comments and zap_strings to skip comments and strings, which improves the accuracy of search results. For example, ugrep -r -tc++ some_function -n -o -f c++/zap_comments searches for lines matching some_function in the C++ source code files of the working directory. A negative pattern has the form (?^X) where X should not be matched to generate search results. Negative patterns cannot be used with option -P (Perl matching).

Questions?

For questions please comment.

regex length limit exceeded

This works fine with grep:

> wc uniprot-ids tt
 181256818  181256818 1875485054 uniprot-ids
    623606     623606    4736122 tt
 181880424  181880424 1880221176 total
> ugrep -vf uniprot-ids tt>ttt
ugrep: error: error in regex at position 16777215
|A0A2A2FGH5|A0A2A2FIR1|A0A2A2FKU0|A0A2A2FKV7|A0A2H9LNT8|B6YTZ0|A2SSR6|A2STZ5|B6
               \___exceeds length limit

Or maybe there is a better way to find all lines of one file that are not in the other?
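
If the goal is an exact whole-line comparison, a set difference avoids building one giant regex. The sketch below keeps only the smaller file in memory and streams the larger one past it. Note the assumptions: lines are compared verbatim (unlike grep -vf, which treats each line of the pattern file as a regex that may match anywhere in a line), and duplicate lines in the smaller file are collapsed.

```python
from typing import Iterable, List


def lines_not_in(small: Iterable[str], large: Iterable[str]) -> List[str]:
    """Return the lines of `small` that never occur as a whole line in `large`."""
    # A dict keeps insertion order, so it serves as an ordered set here.
    remaining = {line.rstrip("\n"): None for line in small}
    for line in large:
        # Drop every candidate that shows up in the large file.
        remaining.pop(line.rstrip("\n"), None)
    return list(remaining)
```

With the files from the report, this would be called as `lines_not_in(open("tt"), open("uniprot-ids"))`, so only the 623606 lines of tt are held in memory.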

provide reasonable defaults

Any software should default to the options that make most sense to the most users in most situations.

For a grep tool, I would expect it to:

  • ignore hidden/ignored files (ie. no build artifacts, no vcs files, etc)
  • pipe its output through a pager (when running in a tty)
  • show color output (when running in a tty)

ugrep already does the third, but not the other two.

This is especially frustrating, as ugrep apparently already supports these modes of operation, but requires you (read, every first time user of the software) to read through a massive README file in order to find out how to enable these options, then write some shell aliases for them (as apparently, there is no configuration file?).

Context: I was so used to git grep, and missed it a lot when we switched to Mercurial. Ordinary grep gives tons of unwanted results from inside .hg and temporary build products, which git grep didn't. Luckily I stumbled upon ag, which did exactly what I was looking for, so I never looked back to grep.

Unfortunately it currently has a bug where it ignores all VCS ignore files, so I looked for alternatives (there are quite a few now, and interestingly, every single one is faster than all the others!). None of these did what ag did.

ugrep was one of the alternatives I tried, and I spent the better part of an hour trying to make any sense of it (the README is, sorry, quite intimidating) and to get it to do what seemed reasonable to me (ignore .hg, and ignore everything from .hgignore). In the end I gave up, did a simple ln -s .hgignore .ignore (workaround for the ag bug), and now happily continue to use ag.

I also read the stanza about people who have never heard of shell aliases. That's what prompted me to spend another 20 min writing this entry, as I very strongly disagree. Yes, I am very well aware of shell aliases, and, No, I do not think that every first-time user should be required to write these in order to use the tool in a meaningful way. The default behaviour of the tool is what every new user sees for the first time, and if it doesn't make sense, some of them will just walk away and never look back.

-g does not work

Hi
for 1.6.8
$ ugrep Hji -g file.txt - nothing
for 1.6.7 - everything is good
Thanks.
P.S.
tested on macOS, case sensitive.

support collections of filters (and related options)

Sometimes you want to apply a set of filters. It would be nice if it were possible to store "presets" of common collections (and related command-line options).

To explain what I mean, take this example from README.md:

ugrep -r -nkw 'main' -f c/zap_strings -f c/zap_comments myproject

It would be useful if ugrep supported "filters" that were instead shell-parsed lines of command-line options, e.g. using the shebang #!ugrep to distinguish them from "regular" filter files, like this (as file $GREP_PATH/code):

#!ugrep

-f c/zap_strings
-f c/zap_comments
# not implemented yet 
# -f c/zap_bugs

to apply them as a set:

ugrep -r -nkw 'main' -f c/code myproject
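
Expanding such a preset file into arguments is mostly a matter of shell-style tokenizing. The sketch below assumes the proposed format exactly as shown in the example (a first line of #!ugrep, then options, with # starting a comment); `load_preset` is a hypothetical helper, not an existing ugrep feature.

```python
import shlex
from typing import List


def load_preset(text: str) -> List[str]:
    """Expand a hypothetical '#!ugrep' preset file into command-line arguments."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "#!ugrep":
        raise ValueError("not a preset file: missing '#!ugrep' first line")
    # comments=True makes shlex drop everything after an unquoted '#'.
    return shlex.split("\n".join(lines[1:]), comments=True)
```

Applied to the example file, this yields `['-f', 'c/zap_strings', '-f', 'c/zap_comments']`, which could then be spliced into the argument list in place of `-f c/code`.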

A nifty extension would be that when the filter is a folder, ugrep looks for a file named default inside the folder, which allows composing a default set of filters from a collection:

ugrep -r -nkw 'main' -f c myproject

Now, that in itself is little different from writing it all out explicitly, but then imagine adding code and non-code files for each language collection and introducing a meta collection called CODE with files code and non-code which a) includes all of the language-specific files, and b) adds CODE/default as a symlink to CODE/code. Then you can grep for "main" across any supported code language:

ugrep -r -nkw 'main' -f CODE myproject

Enhancing it further, appropriate --filter options could be added for languages like SVG and Postscript that use only semi-plaintext content, and for things like Smalltalk where source code is stored in binary images.

Issue with install in wsl ubuntu

ugrep.cpp: In member function ‘bool Grep::open_file(const char*)’:
ugrep.cpp:2156:10: error: ‘filter’ was not declared in this scope
if (!filter(file, pathname))
^~~~~~
ugrep.cpp:2156:10: note: suggested alternative: ‘file’
if (!filter(file, pathname))
^~~~~~
file
ugrep.cpp: In member function ‘bool Grep::close_file(const char*)’:
ugrep.cpp:3044:31: warning: unused parameter ‘pathname’ [-Wunused-parameter]
bool close_file(const char *pathname)
^~~~~~~~
ugrep.cpp: In function ‘void sigint_reset_tty(int)’:
ugrep.cpp:360:5: warning: ignoring return value of ‘ssize_t write(int, const void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
(void)write(1, "\033[0m", 4);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~
Makefile:436: recipe for target 'ugrep-ugrep.o' failed
make[1]: *** [ugrep-ugrep.o] Error 1
make[1]: Leaving directory '/home/rbergh/ugrep/src'
Makefile:456: recipe for target 'install-recursive' failed
make: *** [install-recursive] Error 1

I did not have an issue with WSL and ugrep before.

Unable to build on Cygwin

I tried building ugrep 1.5.4 on Windows 10 using Cygwin's gcc. That would be handy, e.g. for symlinks. The build hits problems with fseeko and ftello:

input.cpp:642:13: error: 'ftello' was not declared in this scope
input.cpp:642:13: note: suggested alternative: 'ftell'

input.cpp:737:2: error: 'fseeko' was not declared in this scope
input.cpp:737:2: note: suggested alternative: 'fseek'

If I work around this, I then hit other problems with _vsnprintf, fopen_s and strerror_s:

In file included from ugrep.cpp:104:0:
/usr/include/w32api/strsafe.h:1555:12: error: '_vsnprintf' was not declared in this scope
/usr/include/w32api/strsafe.h:1555:12: note: suggested alternative: 'vsnprintf'

ugrep.cpp:1609:9: error: 'fopen_s' was not declared in this scope
ugrep.cpp:1609:9: note: suggested alternative: '_fopen_r'

ugrep.cpp:5370:5: error: 'strerror_s' was not declared in this scope
ugrep.cpp:5370:5: note: suggested alternative: '_strerror_r'

Any chance of making this compile under Cygwin? Many thanks!

[FR] Accept Ctrl^\ Keystroke in the ugrep -Q mode

I just noticed that it's not convenient to quit quickly from the ugrep -Q mode. ugrep -Q does not accept the Ctrl^C keystroke (SIGINT), and that's fine. But I think it should support the Ctrl^\ keystroke (SIGQUIT). Being able to quit from the interactive mode is helpful for faster interaction.

Thanks for the awesome tool.

does not accept `without-match`

❯ ugrep --binary-files=without-match -Q                                                                                      
ugrep: invalid argument --binary-files=TYPE, valid arguments are 'binary', 'without-match', 'text', 'hex', and 'with-hex'
Usage: ugrep [OPTIONS] [PATTERN] [-f FILE] [-e PATTERN] [FILE ...]
Try 'ugrep --help' for more information.

❯ ugrep --binary-files='without-match' -Q                                                                                    
ugrep: invalid argument --binary-files=TYPE, valid arguments are 'binary', 'without-match', 'text', 'hex', and 'with-hex'
Usage: ugrep [OPTIONS] [PATTERN] [-f FILE] [-e PATTERN] [FILE ...]
Try 'ugrep --help' for more information.

❯ ugrep --version                                                                                                            
ugrep 2.1.1 x86_64-apple-darwin19.0.0 +avx +pcre2_jit +zlib +bzip2 +lzma
License BSD-3-Clause: <https://opensource.org/licenses/BSD-3-Clause>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>

support named match

It would be really nice if ugrep supported named matching.

My use-case is scanning for copyright and licensing information. I have a large collection of patterns and would like to scan for them all at once, and be told which one matched.

More fine-grained, I would also like to be able to detect if a certain subpattern was found within a larger pattern. A concrete example is that the GPL-2 license has 6 variants and the first had a typo. It would be nice if I were able to do a single scan for all licenses, and additionally be told if a file detected as GPL-2 was a "flawed" one (there are other types of flaws, e.g. an outdated postal address for the Free Software Foundation, or a typo in the newest postal address, "Franklin Steet" with a missing "r").
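
As an illustration of what "being told which one matched" could mean, the sketch below joins several patterns into one alternation of named groups and reports every named group that took part in the match, including a nested group that flags the typo. It uses Python's re syntax (?P<name>...); the group names and regexes are invented for the example and say nothing about how ugrep would expose this.

```python
import re
from typing import Dict, Set


def build_scanner(patterns: Dict[str, str]) -> "re.Pattern[str]":
    """Join named patterns into a single alternation of named groups."""
    return re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in patterns.items()))


def matched_names(scanner: "re.Pattern[str]", text: str) -> Set[str]:
    """Return the names of all groups that participated in the first match."""
    m = scanner.search(text)
    if m is None:
        return set()
    return {name for name, value in m.groupdict().items() if value is not None}


scanner = build_scanner({
    # The nested group `typo` only participates when the misspelling is present.
    "fsf_address": r"Franklin (?:Street|(?P<typo>Steet))",
    "mit": r"Permission is hereby granted",
})
```

Here `matched_names(scanner, "51 Franklin Steet, Fifth Floor")` returns both `fsf_address` and `typo`, while the correctly spelled address returns only `fsf_address`.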

[FR] Add fuzzy mode

I think this shouldn't be too hard? See this for turning the text into a fuzzy search and this for sorting them intelligently. I think ugrep beats fzf in searching file's content interactively.

Some ideas to (de)select files to search in interactive query mode

The query UI mode with option -Q allows interactive pattern searching of the files specified on the command line and the files found in (sub)directories when performing recursive searches. The breadth of the search can be controlled with the usual file and directory inclusion/exclusion command-line options, such as -g to match files against globs, -t to match files by file type, and --ignore-files to ignore files specified in .gitignore.

Once the UI starts there is currently no feature to (further) restrict the search to a collection of files searched.

It would be nice to be able to further restrict the search, for example:

  1. by selecting individual files and directories to exclude;
  2. by specifying additional globs to exclude files and directories.

I am not sure yet what the most intuitive way would be to offer either one or both of these features in the query UI.

Here are my thoughts on this.

For the first feature, perhaps individual files can be selected from the list of matching files (like option -l)? That could work, but directories are not explicitly listed and cannot be individually excluded.

For the second feature some form of a "glob editor" will be needed. This editor should make it easy to add, change, and delete globs that are in the "exclusion lists". Initially, those globs are collected from -g "do not match" globs, --exclude, and --exclude-dir specified on the command line. This glob editor could place each glob at its own line to edit or simply offer a one line-based edit of all globs on a single line, e.g. separated by commas. I prefer a glob per line as the more user-friendly option. A simpler line-based implementation is easier to implement, but perhaps less user friendly, although it could be quicker to edit globs as they are all on one line. On the other hand, the line can get very long and parts may drop off from view.

I think this extension of the query UI would be useful. Not sure yet what this feature is going to look like.

optionally filter same content multiple times

Sometimes it makes sense to grep multiple filters of a file.

Example: Looking for copyright info in a PDF file. Some hints may exist in the document content and are therefore found by grepping with --filter=pdf:'pdftotext %', whereas other hints may be properly stored in XMP metadata and are found by grepping with --filter=pdf:'exiftool %'.

Would be helpful if possible to tell ugrep to not replace a filter (as I blindly assume is the current behaviour) but optionally add another layer of input for same file, e.g. by using suffix "+" to extension:

ugrep -r copyright --filter=pdf:'pdftotext %' --filter=pdf+:'exiftool %' Documents
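
The proposed "+" suffix amounts to keeping a list of commands per extension instead of a single one. A minimal sketch of that bookkeeping follows. The replace-on-plain-form behavior mirrors the assumption stated above and is not confirmed ugrep behavior; `collect_filters` is a hypothetical helper.

```python
from typing import Dict, List


def collect_filters(specs: List[str]) -> Dict[str, List[str]]:
    """Group 'ext:command' specs by extension; 'ext+:command' adds a layer."""
    filters: Dict[str, List[str]] = {}
    for spec in specs:
        ext, _, command = spec.partition(":")
        if ext.endswith("+"):
            # Proposed form: keep earlier filters and add another one.
            filters.setdefault(ext[:-1], []).append(command)
        else:
            # Plain form: assumed to replace any earlier filter.
            filters[ext] = [command]
    return filters
```

For the command line above, `collect_filters(["pdf:pdftotext %", "pdf+:exiftool %"])` gives `{"pdf": ["pdftotext %", "exiftool %"]}`, so each PDF would be searched once per command.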

When running ugrep.exe under Windows 10, an error message occurs: "not compatible with 64-bit version of Windows"

Steps:

  1. Downloaded ugrep.exe (Release 1.6.3)
  2. Placed in a location which is in the PATH
  3. Run the exe:
C:\Users\foo>ugrep
Die Version von C:\Users\foo\bin\ugrep.exe ist mit der ausgeführten Windows-Version nicht kompatibel. Überprüfen Sie die Systeminformationen des Computers, und wenden Sie sich anschließend an den Herausgeber der Software.

Translation of the error Message

The version of C:\Users\foo\bin\ugrep.exe is not compatible with the running Windows version. Check the computer's system information, and then contact the publisher of the software.

Edition: Windows 10 Enterprise
Version: 1809
OS-Build: 17763.864

How can I get ugrep to run on windows 10?

ugrep and symbolic link

Hi
the folder contains a symbolic link to the file file.txt
file.txt contains the word "text", without quotes
ugrep -Rl '' and ugrep -rl ''
no file.txt in output
ugrep -ttext -R text, ugrep -r -ttext text
nothing
maybe I'm doing something wrong?
Thanks.

"Too many open files" error on recursive search

I'm using ugrep compiled for Cygwin. If I do a recursive search of a directory with a few hundred files in it (and its subdirectories), then it works fine. However, if I do a recursive search further up the file tree (with tens of thousands of files) I quickly get many "cannot read ... Too many open files" errors.

Is ugrep somehow doing a parallel search so that many files are open at once? Or is this an intrinsic limit of Cygwin? "ulimit -n" for Cygwin Bash reports 256.

-J 1 gives the same result as -J 16

Ubuntu 16.04
Installed as per your latest guide.

time rg -j 1 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000

real 0m0.523s
user 0m0.536s
sys 0m0.428s

time rg -j 16 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000

real 0m0.364s
user 0m0.480s
sys 0m0.520s

time ugrep -J 16 -r -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000

real 0m2.267s
user 0m2.112s
sys 0m0.492s

time ugrep -J 1 -z -F 10.0.0.1 ./2019-01-1/2019-01-1.conn.[1,2].log.gz | wc -l
1000000

real 0m2.308s
user 0m2.128s
sys 0m0.560s

[FR] directory and file navigation in query UI mode

It would be nice to be able to change into subdirectories while using the query UI.

For example, if the current search returns a list like this (e.g. with -l, but could be any other option):

foo/file.txt
...
foo/other.txt
...
bar/file.txt
...

then I'd like to be able to navigate to directory foo to inspect the files there.

One way to do this quickly and smoothly for the user is to re-assign the TAB key for this purpose: pressing the TAB key navigates one level deeper into the directory tree of the file shown at the top of the screen.

Using the TAB key is intuitive, because the result is similar to file path auto-completion. With one exception though: pressing TAB again moves deeper into the tree. To select among alternatives we use the up and down keys anyway.

Shift-TAB moves up in the directory tree.

Also, when there is no directory to navigate into, TAB could select a single file to search. This completes the file path navigation to a single file in the tree. This single file would be locked to search with any patterns that are entered, until Shift-TAB is pressed to deselect the file and show the contents in the directory again.

That's nice, but this assumes that we change the key mapping for TAB and Shift-TAB, which currently pan the screen right/left by 8 columns. The screen can also be panned right/left with Shift-Right/Left or Ctrl-Right/Left, depending on the system.

I'm not sure how many folks would have a problem with the loss of the current TAB and Shift-TAB key mapping. It can be a problem in Selection Mode when pressing TAB with the new meaning, because it deletes the current selections to navigate and produce new search results.

Request: RE: ugrep predefined patterns

Hi,

A useful addition to the -f method to run multiple pattern files would offer users a more convenient way to 'bulk' run regex lists.

So, for example, instead of running ug -f html/comments -f html/img -f html/href ..., an alternative way to run everything in a folder and its subfolders might be: ug -f html/* ...

The flexibility to group these regexes in a folder structure is greatly appreciated for organisational purposes. I do however often need to search large datasets for many different items of interest so calling a top level folder would be most useful.
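Until something like this exists, the same effect can be approximated outside ugrep by expanding a folder into repeated -f options. A minimal sketch, assuming each regular file under the folder is a pattern file; the helper name pattern_file_args is made up for illustration and nothing here is part of ugrep:

```python
import os

def pattern_file_args(top_dir):
    """Return ["-f", file, "-f", file, ...] for every file under top_dir."""
    args = []
    for dirpath, dirnames, filenames in os.walk(top_dir):
        dirnames.sort()  # visit subfolders in a stable order
        for name in sorted(filenames):
            args += ["-f", os.path.join(dirpath, name)]
    return args

# Example: build the argument list for the html folder from the request above.
command = ["ug"] + pattern_file_args("html")
print(command)
```

The resulting list can be passed to subprocess.run together with the files or directories to search.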

support filter based on magic bytes

It would be nice to be able to select a filter based on magic bytes rather than on the file extension. Maybe the elegant approach is to express which extension a certain set of magic bytes should be treated as internally (similar to how stdin can be assigned a virtual extension).

My use-case is scanning for copyright and license information in source code, some of which is binary data. Concretely, some fonts - so-called Postscript Type1 fonts - are binary and without a standardized extension. I would like to be able to scan all of Ghostscript source, extracting metadata from PNG images and fonts using exiftool.
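To illustrate the idea of mapping magic bytes to a virtual extension, here is a rough sketch. The table and the function name virtual_extension are hypothetical, the PNG signature is the standard one, and the Type 1 font entry is only an assumption about how ASCII-encoded Type 1 fonts commonly start.

```python
MAGIC_TO_EXTENSION = [
    (b"\x89PNG\r\n\x1a\n", "png"),  # standard 8-byte PNG signature
    (b"%!PS-AdobeFont", "pfa"),     # assumed header of ASCII Type 1 fonts
]

def virtual_extension(path, table=MAGIC_TO_EXTENSION):
    """Return the extension whose magic bytes start the file, or None."""
    longest = max((len(magic) for magic, _ in table), default=0)
    with open(path, "rb") as f:
        head = f.read(longest)
    for magic, ext in table:
        if head.startswith(magic):
            return ext
    return None
```

A filter could then be chosen by the returned virtual extension instead of by the file name, so that extensionless fonts are handed to exiftool in the same way as files with a known extension.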

[DISCUSSION] Refactoring the interactive UI to work as a frontend

Refactoring the interactive UI to work as a frontend that can use any search tool (ripgrep, ucg, ag, fzf in noninteractive mode, ...) seems like a good idea to me, as I don't know of any such interactive tool currently available on GitHub. (It seems to me that most people use vim/emacs frontends for this.)

It can potentially bring more exposure to this project.

Missing option in the help

Thanks for creating such a performant search utility!

I found that ugrep seems to support the -i option for case-insensitive search. But why is it not listed in the help text? Are there any other options that are not listed either? Below is my testing.

PS C:\my_projects\scintilla> ugrep pixmapselpattern
PS C:\my_projects\scintilla> ugrep -i pixmapselpattern
src\MarginView.cxx:             pixmapSelPattern.reset();
src\MarginView.cxx:             pixmapSelPatternOffset1.reset();
src\MarginView.cxx:                     pixmapSelPattern->Release();
src\MarginView.cxx:             if (pixmapSelPatternOffset1)
src\MarginView.cxx:                     pixmapSelPatternOffset1->Release();
src\MarginView.cxx:     if (!pixmapSelPattern)
src\MarginView.cxx:             pixmapSelPattern.reset(Surface::Allocate(vsDraw.technology));
src\MarginView.cxx:     if (!pixmapSelPatternOffset1)
src\MarginView.cxx:             pixmapSelPatternOffset1.reset(Surface::Allocate(vsDraw.technology));
src\MarginView.cxx:     if (!pixmapSelPattern->Initialised()) {
src\MarginView.cxx:             pixmapSelPattern->InitPixMap(patternSize, patternSize, surfaceWindow, wid);
src\MarginView.cxx:             pixmapSelPatternOffset1->InitPixMap(patternSize, patternSize, surfaceWindow, wid);
src\MarginView.cxx:             pixmapSelPattern->FillRectangle(rcPattern, colourFMFill);
src\MarginView.cxx:             pixmapSelPatternOffset1->FillRectangle(rcPattern, colourFMStripes);
src\MarginView.cxx:                             pixmapSelPattern->FillRectangle(rcPixel, colourFMStripes);
src\MarginView.cxx:                             pixmapSelPatternOffset1->FillRectangle(rcPixel, colourFMFill);
src\MarginView.cxx:                                             invertPhase ? *pixmapSelPattern : *pixmapSelPatternOffset1);
src\Editor.cxx: if (!marginView.pixmapSelPattern->Initialised()) {
src\MarginView.h:       std::unique_ptr<Surface> pixmapSelPattern;
src\MarginView.h:       std::unique_ptr<Surface> pixmapSelPatternOffset1;
PS C:\my_projects\scintilla> ugrep --help | ugrep -e '-i'
            Note that --exclude patterns take priority over --include patterns.
            --include-dir patterns.  GLOB should be quoted to prevent shell
            priority over --include-fs mounts.  This option may be repeated.
            Search only files whose name matches GLOB, same as --include=GLOB.
            comma-separated list of EXTENSIONS, same as --include='*.ext' for
    -T, --initial-tab
    -v, --invert-match
PS C:\my_projects\scintilla>

New mode for interactive grep search with simple UI (ugrep 2.0)

I am thinking of adding an interactive mode for interactive querying, filtering, and displaying of grep results. Perhaps we should repurpose option -Q for this, which starts an interactive mode by opening a simple UI to enter patterns to search files and display results to browse through.

Requirements: Option -Q opens a window in the current terminal to enter search queries interactively as regex patterns. Results are immediately displayed, similar to percol-like tools. Because ugrep is fast, this will produce results fast for interactive searches, even when recursing into directories. There is no need to search the entire recursive tree each time a pattern is entered or modified. Only the matches shown in the window need to be produced, and the search engine can wait to produce more results until the user scrolls down. The running search should be cancelled when the search pattern is updated. This creates a responsive system that does not overload the CPU, e.g. when the search pattern is frequently updated.
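The lazy production of results and the cancellation on pattern updates could look roughly like this. It is a sketch of the idea only, using an in-memory list of lines and Python generators instead of the real search engine; the names search_lines and Query are hypothetical.

```python
import itertools
import re

def search_lines(pattern, lines):
    """Lazily yield (line_number, line) for each line matching the regex."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line

class Query:
    def __init__(self, lines, window=3):
        self.lines = lines
        self.window = window      # number of results that fit on the screen
        self.results = iter(())   # no search is running yet

    def update_pattern(self, pattern):
        # Replacing the generator abandons (cancels) the previous search.
        self.results = search_lines(pattern, self.lines)
        return self.next_page()

    def next_page(self):
        # Search only far enough to fill one more window, e.g. on scroll down.
        return list(itertools.islice(self.results, self.window))

query = Query(["a1", "b", "a2", "a3", "a4"], window=2)
print(query.update_pattern("a"))  # first window of matches
print(query.next_page())          # more matches only when the user scrolls
```

Because the generator is consumed one window at a time, no work is done for matches that are never scrolled into view.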

Formatting the outputted selection in query mode

I have been trying to get ugrep to output the path, the line number, and the column number of the selection(s) made in the interactive mode. I tried --format='%u%f%s%n%s%k' and --format-end='%u%f%s%n%s%k'. With the first, ugrep never finds anything (I don't understand why), and the second doesn't do anything I can notice. My use case here is to be able to jump from selections in interactive ugrep searches into my text editor (emacs). --line-number, --column-number and -H aren't useful for this task, because --line-number and --column-number pollute what is shown in the interactive search (who wants to see line numbers and column numbers strewn over the search results?), and -H becomes a heading that is not included in the output (to stdout).

What I want is, e.g., when I exit from this selection:

image

to get this output on stdout (the separator is a comma, the last field is the column number, which is -1 for non-matching lines, and LINE_NUMBER is the selection's line number):

dev/notes.md,LINE_NUMBER,-1
todo/dev/aria2.md,LINE_NUMBER,3

Poll: what would you like to add to improve option --pretty?

Option --pretty produces filename headings (-+, --heading) and tabbing (-T, --initial-tab) when the output is sent to a terminal. That is basically it.

In fact, -T+ is even shorter to type than --pretty!

So this option isn't doing much and seems wasteful to include. But it has a lot of potential and we can improve it.

What would you like --pretty to do in addition to -T+ when the output is sent to a terminal?

Here are some suggestions, but please add your own if you don't find this useful:
[ ] enable option -n to show line numbers
[ ] enable --sort to sort by filename, if no --sort option was specified
[ ] use different colors (or additional colors?) than the default

Number of hits not consistent

Searching a folder of compressed logs gives me a variation of up to 40% in the number of results, both with and without -o. Without -o, the count is only about 1/7 of the results zgrep gives, and about 1/2 with -o. The counts for both options differ from run to run, and I never get the same number twice.

[FR] a feature to edit a file in the query UI

When pressing v in less, the (visual) editor specified by the environment variable EDITOR is used to edit the file currently being viewed. We can do the same in the query UI for the file that is currently shown at the top of the screen. The variables GREP_EDIT and EDITOR are checked and, when set, the specified editor is forked.

This offers a nice workflow for users to search files centrally and selectively edit them based on the search results.
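The editor lookup could be sketched as follows. The order in which the two variables are checked (GREP_EDIT first, then EDITOR) is an assumption based on the order they are mentioned above, and the helper name editor_command is hypothetical.

```python
import os
import shlex

def editor_command(path, environ=os.environ):
    """Return the command list to edit path, or None if no editor is set."""
    for name in ("GREP_EDIT", "EDITOR"):  # assumed order of precedence
        value = environ.get(name)
        if value:
            # The variable may contain arguments, e.g. "emacs -nw".
            return shlex.split(value) + [path]
    return None

print(editor_command("src/main.cpp", {"EDITOR": "vi"}))
```

The returned list can be passed to subprocess.call to start the editor and wait for it to exit before the query UI redraws the screen.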
