rgpipe
rgpipe is a single bash/sh script and an alias to use with ripgrep to search through a myriad of file types that are otherwise not grep friendly. It is called rgpipe because the idea is similar to lesspipe.
Use it with ripgrep's --pre option, which lets ripgrep selectively preprocess files before searching them.
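To make the preprocessor idea concrete, here is a minimal sketch. It is not the real rgpipe script: `mini_pre` is a hypothetical name, and the two branches are only an illustration. ripgrep calls the preprocessor once per file with the file path as its first argument, and searches whatever the command prints on stdout.

```shell
# Hypothetical minimal preprocessor, NOT the real rgpipe script.
# ripgrep passes the file path as "$1" and searches this function's stdout.
mini_pre() {
  case "$1" in
    # pdftotext comes from poppler; the trailing "-" sends the text to stdout
    *.pdf) pdftotext "$1" - ;;
    # Assumption: everything else is passed through unchanged
    *)     cat -- "$1" ;;
  esac
}
```

To try something like this with ripgrep, the `case` statement would have to live in an executable script, since --pre expects a command rather than a shell function.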
I wrote up an extended gist about how to use it here.
That gist is only useful because of the kind note by BurntSushi in this Hacker News comment explaining how rg --pre-glob works.
- New MS Office files (docx, pptx, xlsx, and variants thereof)
  - Uses unzip and sed
- Old MS Office files (doc, ppt, xls, and variants thereof) and the new Excel binary format
  - Uses strings
- LibreOffice files (ods, odt, odp)
  - Uses unzip and sed
- PDF
  - Uses pdftotext from poppler
- Web/structured formats (html, xhtml ...)
  - Uses w3m; lynx and friends also work. Not 100% necessary.
- Web formats disguised as books (chm, epub)
  - Uses unzip and w3m for epub
  - Uses 7zip and w3m for chm
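As a rough illustration of the unzip and sed approach listed above, the sketch below pulls the main XML part out of a .docx and strips the tags. The function names `strip_tags` and `docx_text` are made up for this example, and the sed expression is my assumption of a simple tag stripper, not necessarily what rgpipe itself runs.

```shell
# Hypothetical helper: replace every XML tag with a space so only text remains.
strip_tags() {
  sed -e 's/<[^>]*>/ /g'
}

# Hypothetical usage for a .docx file, which is a zip archive whose main body
# lives in word/document.xml. "unzip -p" writes that member to stdout.
docx_text() {
  unzip -p "$1" word/document.xml | strip_tags
}
```

The output is not pretty, but it only needs to be good enough for ripgrep to find a search term in it.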
Ubuntu wants: sudo apt install poppler-utils p7zip w3m unzip
termux wants: pkg install poppler p7zip w3m
Assuming rgpipe is in your PATH (use /path/to/rgpipe if it's not):
rg --pre rgpipe YourSearchTermHere
The above runs rgpipe on every file, even when it's not needed, which is slow. ripgrep can apply it selectively with --pre-glob:
rg --pre-glob '*.{xlsx,pptx,docx,pdf}' --pre rgpipe YourSearchTermHere
A more thorough --pre-glob:
rg --pre-glob '*.{pdf,xl[tas][bxm],xl[wsrta],do[ct],do[ct][xm],p[po]t[xm],p[op]t,html,htm,xhtm,xhtml,epub,chm,od[stp]}' --pre rgpipe YourSearchTermHere
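The bracket classes in that glob are dense, so here is a small sketch for checking which file names it covers. `in_pre_glob` is a hypothetical helper that spells out the same alternatives as shell `case` patterns. It assumes bracket classes behave the same in `case` as in ripgrep's globs, which holds for these simple patterns.

```shell
# Hypothetical helper: succeed if the file name matches one of the
# alternatives from the thorough --pre-glob, written out as case patterns.
in_pre_glob() {
  case "$1" in
    *.pdf|*.xl[tas][bxm]|*.xl[wsrta]|*.do[ct]|*.do[ct][xm]) return 0 ;;
    *.p[po]t[xm]|*.p[op]t|*.html|*.htm|*.xhtm|*.xhtml)      return 0 ;;
    *.epub|*.chm|*.od[stp])                                 return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `in_pre_glob budget.xlsx` and `in_pre_glob memo.docm` succeed, while `in_pre_glob notes.txt` fails, so a plain text file would be searched without going through rgpipe.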
An alias, because that is a lot of typing:
alias rgg="rg -i -z --max-columns-preview --max-columns 500 --hidden --no-ignore --pre-glob \
'*.{pdf,xl[tas][bxm],xl[wsrta],do[ct],do[ct][xm],p[po]t[xm],p[op]t,html,htm,xhtm,xhtml,epub,chm,od[stp]}' --pre rgpipe"
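Step 1: Use rgpipe to make text sidecar files
# Usage: findrgpipetype pdf
# Writes a .txt file next to every matching file under the current directory.
findrgpipetype() {
find "$PWD" -type f -iname "*.$1" -exec sh -c 'for f; do rgpipe "$f" > "${f%.*}.txt"; done' _ {} +
}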
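The sidecar name comes from the `${f%.*}` expansion, which removes the shortest trailing `.something` from the path before `.txt` is appended. A tiny sketch of just that step, with `sidecar_name` as a hypothetical helper name:

```shell
# Hypothetical helper: print the sidecar path for a given file.
# "${1%.*}" drops the last extension only, so dots in directory names survive.
sidecar_name() {
  printf '%s.txt\n' "${1%.*}"
}
```

So `sidecar_name /docs/v1.2/report.pdf` prints `/docs/v1.2/report.txt`. One thing to keep in mind: two files that differ only in extension, such as report.pdf and report.docx, would map to the same sidecar file.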
Step 2: Use ripgrep to search those files
rg YourSearchTermHere
2 - The preprocessing script that is the template into which I added some more file types
3 - Midnight Commander has great scripts on this subject
5 - rga is a Rust-based tool doing a similar thing