Comments (28)
Well, I've been experimenting, and I've found that the biggest speed problem is fontification.
(defun helm-org-rifle--get-source-for-literal-results (results)
  "Return Helm source for RESULTS."
  (let ((source (helm-build-sync-source (car results)
                  :after-init-hook helm-org-rifle-after-init-hook
                  :candidates (cdr results)
                  :candidate-transformer helm-org-rifle-transformer
                  :match 'identity
                  :multiline helm-org-rifle-multiline
                  :volatile t
                  :action (helm-make-actions
                           "Show entry" 'helm-org-rifle--show-candidates
                           "Show entry in indirect buffer" 'helm-org-rifle-show-entry-in-indirect-buffer
                           "Show entry in real buffer" 'helm-org-rifle-show-entry-in-real-buffer)
                  :keymap helm-org-rifle-map)))
    source))
(let ((helm-candidate-separator " ")
      ;; Swap in #'identity here for the fast, unfontified version:
      ;; (fontify-fn #'identity)
      (fontify-fn #'helm-org-rifle-fontify-like-in-org-mode))
  (helm :sources
        (cl-loop for r in (let ((case-fold-search t)
                                (input "emacs")
                                (outline-regexp "\\*+ "))
                            (with-current-buffer (get-buffer "*test*")
                              (cl-loop for file in org-agenda-files
                                       do (progn
                                            (insert-file-contents-literally file nil nil nil t)
                                            (goto-char (point-min)))
                                       collect (cons file
                                                     (cl-loop while (re-search-forward input nil t)
                                                              collect (progn
                                                                        (outline-back-to-heading)
                                                                        (cons (funcall fontify-fn
                                                                                       (buffer-substring-no-properties
                                                                                        (point)
                                                                                        (progn
                                                                                          (outline-next-heading)
                                                                                          (point))))
                                                                              (point))))))))
                 collect (helm-org-rifle--get-source-for-literal-results r))))
That will show results for "emacs" in all of `org-agenda-files`, but by inserting them literally into a temp buffer one by one, instead of opening every file in a new buffer.

Now, if you set `fontify-fn` to `#'identity`, it's fast. But when you set it to the fontification function, it's much slower. So if you don't care about the appearance of the results, you can have the faster version, but it looks like plain text, not like an Org buffer. If you want the results fontified, it's slow.
And I don't see any way to fix that. Emacs has to do the fontification itself, so no matter how we feed it entries, whether from an external tool or from within Emacs, the fontification is going to be the bottleneck.
So, if you are interested in a non-fontified version, I can add some code to do that. The only advantage it would have over a plain grep command is that it would show the whole entry instead of just matching lines, but that's some benefit.
Let me know what you think. :)
from org-rifle.
Well, sift is almost good enough, but not quite. For example:
sift -n -i -m -e '^\*+ +[^*]+emacs[^*]+' --only-matching main.org
That produces what looks like good output: the heading and entry contents for every entry that contains "emacs". But the problem is that the negated character classes `[^*]+` will cause entries to be truncated if the entry contains a `*` anywhere in it (e.g. for bold text, or plain lists). A PCRE negative lookahead would probably fix this (e.g. `(?!:^*)+` to match anything except a `*` at the beginning of a line, which should match entry contents but not the next heading)...but, of course:
Error: cannot parse pattern: error parsing regexp: invalid or unsupported Perl syntax: `(?!`
I think using only negated character classes would be a bad idea, because it would result in truncated matches, and that might cause false negatives as well--I'm not sure.
Anyway, it's another case of a tool being 95% of what we need, but that last 5% is really important. :(
So, two possibilities that I can think of for going on from here:

- Use `insert-file-contents-literally` to read every file into a single temporary buffer, one at a time, and search it for matches, then present the results. That would avoid opening a buffer for every file. It would be almost like making Emacs work like grep. It might be fast enough to be useful.
- Use grep or whatever for line-matching instead of entry-matching, then open a buffer for every file in the results and get each entry from that file. The benefit over the existing behavior would be that it would only open files that have matches in them (potential matches, at least; negations aside). But if a common term were searched for, it would still open a lot of buffers, which would defeat the purpose.
- I wonder if `awk` could be used to get what we need. I don't relish writing Awk scripts, but I guess using it would be faster than doing it in Emacs, and it might be possible to do exactly what we need with it.
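For what it's worth, the Awk approach can be sketched roughly like this: accumulate lines into a buffer, and whenever a new heading starts, flush the previous entry, printing it only if it matched. The filename `main.org` and the fixed, case-sensitive search term are assumptions for illustration only; a real implementation would need case folding, multiple terms, and so on.

```shell
# Print whole Org entries (heading + body) that contain "emacs".
# A line matching /^\*+ / starts a new entry; the buffered previous
# entry is printed only if it contained the search term.
awk '
  /^\*+ / { if (buf ~ /emacs/) printf "%s", buf; buf = "" }
  { buf = buf $0 "\n" }
  END { if (buf ~ /emacs/) printf "%s", buf }
' main.org
```

Unlike the sift pattern above, this never truncates an entry at a stray `*`, because it splits on heading lines rather than matching entry bodies with a regexp.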
Ok, I'll try to push a branch with that soon. Thanks.
@Johnstone-Tech:

> I have been testing the find-files-raw branch with great success; the time between the keybinding and the Helm buffer appearing is significantly quicker. It shaved off about 5 seconds on my setup; I have around 200 org-mode files across 60 directories.

Thanks for the feedback. How long was the total time? Were any of the files already open in Emacs?

> Wasn't able to get the benchmarking macro to work; should it just be a case of C-c C-c on the src_block?

I'm not sure which one you mean. What happened when you tried?

> Finally, thanks for all the work you have put into this and your other Emacs packages; they are extremely helpful and much appreciated.

Thanks for the kind words.
You sound like someone who might be interested in some early code I have for indexing Org files in a SQLite database. It's not very user-friendly yet, but you can look at the `org-rifle` branch if you are interested. Most of the db-related code is in that branch, in the `sandbox` directory. So far I think there are a few issues with the idea:
- Indexing happens in a child Emacs process, and we need some way to launch the indexing process and ensure that the same file isn't indexed simultaneously by multiple indexing processes. So we need some kind of locking, probably a queue that files can be added to, etc.
- Indexing is rather slow, but probably fast enough to be useful. And it can probably be improved to some extent.
- However, re-indexing files is very slow because of the way old rows are deleted from the db before re-indexing. There might be some SQLite tricks we could use to improve that. Or maybe we could forego SQLite altogether and use something like MySQL or Postgres. (Or even other indexers, like Recoll, but I know little about them.) But I would like it to be as user-friendly as possible and avoid requiring manual configuration of databases, etc, even though I'm sure there are some users who wouldn't mind doing that.
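One of those SQLite tricks, for illustration: batch the delete-and-reinsert for a file into a single transaction, and make sure the `DELETE` is driven by an index on the file column, so it doesn't scan the whole table. The schema and values below are hypothetical, not the actual schema used in the branch:

```shell
sqlite3 rifle.db <<'SQL'
CREATE TABLE IF NOT EXISTS entries (file TEXT, heading TEXT, body TEXT);
-- Without this index, DELETE ... WHERE file = ? scans the whole table.
CREATE INDEX IF NOT EXISTS entries_file ON entries (file);
BEGIN;
DELETE FROM entries WHERE file = '/home/user/org/notes.org';
INSERT INTO entries VALUES ('/home/user/org/notes.org', '* Heading', 'body text');
COMMIT;
SQL
```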
Thanks for your feedback.
I will do some additional tests prior to opening the org files, and after, to verify I was getting a performance increase. It certainly feels a lot faster.

Brilliant, I will check out the `org-rifle` branch. SQLite seems like an appropriate choice, considering how popular it is and the single-file nature of the format.
> I noticed that on the find-files-raw branch, all of the files that are searched are still open buffers after the search is complete. I was thinking that this branch implemented the "draw all content into a single file and fontify/search that". Is this not the case?

No, because that would present the results as all being from a single buffer, rather than as being from their individual source files. One could try using text properties on each buffer's text to keep track of that; it would require changes in a few places, and it would need benchmarking.

> If that is never going to be a thing, what do you think about a parameter that would track which files were opened as a result of a search, and then subsequently close them after the search is complete?

That would make sense.

> This also produces the prompt:

The branch is experimental, and probably needs rebasing by now, being a few years old.
Hey Z!
Funny you should mention that; just yesterday I was working on this, because I saw that update in `ivy`! (Great minds...?)
I'd like to try `ripgrep`, but it still doesn't support multi-line matching, which makes it a bit less suitable. `git grep -W` with the proper `xfuncname` in `.git/config` makes this a lot easier, although it's not perfect either, and of course it only works on Org files stored in a git repo, and requires the user to manually configure that setting.
The slowness comes from having to open each file and activate `org-mode` in it, which can be slow depending on the size of the buffer and your personal Org config. So using an external tool to find out which files have matches could speed things up by skipping files that don't have matches, but each matching file would still require opening and activating Org mode in it.
Another angle is to get the actual matching nodes from the external tool (which `git grep` allows), then insert them into a temporary buffer, run Helm on that, and finally only open the source files when the user selects a result. This is more complicated, but I think it's doable. But now that I think of it, maybe I should try the other idea first.
Can you give me an idea of how many files you end up searching when you call one of these commands? If it's a really large number, the first method would probably help your situation a lot. If it's a few large files, the second method would probably be the one to try.
Thanks for the feedback!
Hey, I just added a branch which may help a lot: https://github.com/alphapapa/helm-org-rifle/tree/find-files-raw This reads unopened files "literally," which avoids activating Org mode unless the user actually chooses a result. This should avoid the slowness caused by activating Org mode in every file before searching it. Could you please test it and let me know how it goes? I'd still like to make use of some external searching tools, but that's proving a bit difficult since rifle searches by nodes rather than lines, so this may be a good solution in the meantime.
Hi @alphapapa
Happy to test it out! Yet I'm very nontechnical, as you remember :) How do I update to the new branch?
best
Z
Oh, sorry. :) Probably the easiest way to try it is to go here in your browser: https://raw.githubusercontent.com/alphapapa/helm-org-rifle/find-files-raw/helm-org-rifle.el Copy and paste the contents of that file into a buffer in Emacs, then run `eval-buffer`. Then try it out! :)
Whoa boy, that's insanely fast now :) :)

I tried it various times over the last ~1 hour or so, and it seems to work great.

One related question: can the results be presented in Ivy instead of Helm (as an option)?

Thanks again; I will be happy to test anything needed!
Z
Hey, great! Can you give me an idea of how many files you're searching with it? Like, are we talking tens, hundreds, or thousands, or...?
Using Ivy is not a bad idea for an alternative UI. It wouldn't be too difficult to add. However, Ivy doesn't provide as much functionality behind the scenes, so some of the more advanced features (like choosing between multiple actions, sorting in different ways) would either have to be reimplemented from scratch (not appealing) or left out altogether. But a basic version of the command could be done easily enough. Of course, if I were to do that, I might need to rename the package since it wouldn't be just for Helm anymore (and that's something I've been considering anyway). Another thing I don't know about is whether Ivy supports multi-line entries. If it doesn't, that would be a big drawback.
I'll put it on the todo list as a maybe item. ;)
I'm going to hold off on releasing this find-files-raw branch for a while because I wouldn't be surprised if it causes some little bugs here and there. I'll probably tag the 1.4 release without it in the next few days, and then push find-files-raw to master after I polish and test it more, aiming to release it in 1.5.
But I would appreciate it if you could continue testing it and let me know about any issues you may find. If you want to use it automatically, without having to evaluate the buffer, you can replace the `helm-org-rifle.el` file in your `elpa/helm-org-rifle...` directory; be sure to delete the `helm-org-rifle.elc` file if you do.
Thanks.
Hi, I think it's around 100 files, more or less :)

Cool, I'll keep testing it and report bugs if I find any! So far it's been working flawlessly :)
best
Z
Great, thanks.
Hi, I have been testing helm-org-rifle on a project with about 8000 files.
find . -type f | wc -l
8334
After applying the branch as described above, running `helm-org-rifle-directories` takes a few minutes before the pattern query appears in the minibuffer. After that, typing a string known to exist fails silently (no results, no error message).

On a different note, have you considered testing sift? It seems to have multiline support.
Hi Priyadarshan,
Thank you very much, that's definitely the kind of testing I've been hoping for. That is a lot of files indeed. I am curious to see how Emacs would handle opening that many files in, say, text-mode. I'll see if I can test this myself. I'm guessing that that's simply too many for Emacs to handle quickly, and so the way rifle currently works, opening each one in an Emacs buffer first, is just not suitable for that many files.
As a matter of fact, I stumbled upon sift again last night, and it's on my list of tools to test. I've tried a few others, but each one seems to have some small issue that makes it unsuitable or difficult to use for this project. I'm hoping that sift will be the one!
By the way, can you give me a rough idea of the size of these files, like the average size? I doubt it matters much here, but I'm curious. For that many files, you might want to consider some kind of indexing solution.
If I could impose on you, would you mind running one of your typical queries using helm-do-grep
or one of the similar commands on the set of 8000 files, and let me know how it performs? I wonder what I'm up against here. :)
Thanks for your help.
Since testing on about 8000 files was too lengthy, I made a selection of 1768 files from the collection:
$ cd archive
$ find . -type f -name '*.org' | wc -l
1768
Files are more or less the same length, total size is 76M,
$ du -ch -- **/*.org | tail -n 1
76M total
So, each file is about 42K.
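(As an aside, the per-file average can be computed directly rather than estimated; this sketch assumes GNU `stat`, so it would need adjusting on BSD/macOS:)

```shell
# Count .org files and compute their mean size in bytes (GNU stat assumed).
find . -type f -name '*.org' -print0 \
  | xargs -0 stat -c %s \
  | awk '{ n++; total += $0 } END { if (n) printf "%d files, avg %d bytes\n", n, total/n }'
```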
I do use an indexing tool, recoll, but being able to access the files from Emacs would be ideal.
I tried searching for the pattern "please" with `helm-do-ag`. Emacs immediately displayed some results, but then it stopped responding. CPU was at 100% for a few minutes; I could not even stop it with C-g.
I tested it on an Intel i7 2.6 GHz, with 16GB of RAM.
I wonder if it would make sense to just "slurp" all the files in fundamental-mode, and then use `occur` or `helm-swoop`?
Reading a file seems an ideal candidate for async operation, so Emacs could use all CPU cores.
I would not mind dedicating even 1GB of RAM in order to have the whole archive available through `helm-org-rifle`.
Hm, well, that's a lot fewer files, but I'm guessing Emacs is going to take a while to open 1,768 files, no matter what.
> I do use an indexing tool, recoll, but being able to access the files from Emacs would be ideal.
Have you seen helm-recoll? I remember reading about it a while back. Here are a couple of links you might want to check:
https://oremacs.com/2015/07/27/counsel-recoll/
https://github.com/emacs-helm/helm-recoll
> I wonder if it would make sense to just "slurp" all files in fundamental-mode, and then use occur or helm-swoop?

The find-files-raw branch does load them in fundamental mode... only in the Helm commands, not the `occur` commands, though (I'll fix that sometime). I'm guessing that `helm-swoop` would actually be extremely slow for this use case. IIRC, it copies every buffer's content into a new buffer and adds line numbers before it even starts searching them, so doing that across 1700 files and 76 MB would probably take a while...
> Reading a file seems an ideal candidate for async operation, so Emacs could use all CPU cores.
It would be, indeed, but as far as I know, there's no way for Emacs to load files asynchronously. Tools that use async stuff, like Paradox, Magit, etc, run external processes. So, yeah, you could run a second Emacs process in the background and load all the files into it, but then you'd have to pass the results back into the first process, and if you're going to do that, you probably should just use a dedicated searching tool like sift, etc.
> I would not mind dedicating even 1GB of RAM in order to have the whole archive available through helm-org-rifle.
Well, that sounds good to me! haha :) I guess you could try loading all of the files you might want to search, then taking a coffee break while they load, and then keeping that Emacs process loaded and all those buffers open while you work. I guess the only problem might be displaying the buffer list when you need to switch buffers, but that could probably be worked around with some kind of custom function that only displays certain ones, or something like that.
> I tried searching for the pattern "please" with helm-do-ag. Emacs immediately displayed some results, but then it stopped responding. CPU was at 100% for a few minutes; I could not even stop it with C-g.
That's a little bit surprising to me, but I don't actually have `ag`, so I haven't tried that command. Maybe the results were coming in so fast that Emacs couldn't process them fast enough to respond? I'm not sure. If you are interested, you might try some of the other commands, like `helm-do-grep`, which just uses plain grep; I think you can also use other, similar searching tools. It's possible the bottleneck is not in the tool but in the way the Emacs command is implemented.
And it's also possible that Emacs is just not able to handle that much data coming in from a process very well. For example, if I use Magit on a git repo containing Firefox, it is...very slow indeed, just to display the status buffer. I guess that's because there are so many lines to read from the external tool, but I'm not completely sure.
Well anyway, thanks for your help. I hope to be able to make rifle more useful for you in the future, but I'm not sure how much of the issue is Emacs itself. If sift turns out to work, then I think that will help a lot.
Thank you for the detailed reply.
Thank you also for the recoll links.
I cut down my test files to a subset, since 8000 files were taking way too much time.
Please let me know if I can be of any help testing. I like helm-org-rifle, and I would like to use it as much as possible.
If needed, let me know what kind of elisp functions to use for timing and benchmarking, or to do replicable testing.
I am submitting a small report on my own "toying" around.
Just for fun, I tried to open those 1768 files with `find-file-literally`, via an elisp snippet. It took several minutes at 100% CPU.
RAM usage went from about 300M to about 900M. Browsing buffers via C-x C-b was fine, though.
Then I tried consolidating all files into one:
find . -type f -name '*.org' -exec cat {} > ~/results.org \;
That took about 4 secs (on SSD).
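Incidentally, the same consolidation can be written with `-exec ... {} +`, which passes many files to each `cat` invocation instead of spawning one process per file; it should be at least as fast:

```shell
# Concatenate all .org files under the current directory into one file;
# {} + batches many filenames per cat call.
find . -type f -name '*.org' -exec cat {} + > ~/results.org
```

(Writing the output outside the searched tree, as above, also avoids the output file matching `*.org` itself.)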
The nice thing was that opening that file with `find-file-literally` was basically instantaneous. Navigation was also instantaneous; much faster than, say, SublimeText, which was surprising to me.
Having the 76M file open in a fundamental-mode buffer, I then tried to use `helm-swoop`, but it was very slow, basically unusable. Using `occur` was fine.
I then tried to install `ivy`+`swiper`. Although I like `helm` much better than `ivy`, I must admit `ivy`+`swiper` was much faster in this case, making it usable.
Also, using `counsel-rg` (a ripgrep interface) was fine, although not as fast as having the big file open and searching with swiper or occur.
> Please let me know if I can be of any help testing. I like helm-org-rifle, and I would like to use it as much as possible.
Great, I am very thankful for testers like you!
> If needed, let me know what kind of elisp functions to use for timing and benchmarking, or to do replicable testing.
As a matter of fact, there is a macro in the `notes.org` file in this repo, called `profile-rifle`, that makes it pretty easy to test the functions that underlie the interactive commands. You can use it to test the interactive ones too, but you have to manually end the command with C-g, and that makes it less accurate, of course.
So, for example, evaluate that macro and then you can run:
(profile-rifle 1 (helm-org-rifle-directories "~/org" t))
That will instrument most of the relevant commands, then run `helm-org-rifle-directories` with those args 1 time. You can type a query as soon as the prompt appears, then hit C-g as soon as the results appear, and you'll get a report showing which functions run the most and take the most time.
If you run the macro from an Org source block with C-c C-c, the report will be output into an Org results block.
You can also profile the internal functions, like:
(profile-rifle 10 (helm-org-rifle--get-candidates-in-buffer (get-buffer "~/org/something.org") "please"))
And that will search that file for that input 10 times, then display the profiling results.
> Just for fun, I tried to open those 1768 files with find-file-literally, via an elisp snippet. It took several minutes at 100% CPU.
Thanks, that confirms my suspicion that Emacs simply can't open that many buffers quickly.
> RAM usage went from about 300M to about 900M. Browsing buffers via C-x C-b was fine, though.
That's a lot of memory, too, but not terribly surprising. Glad to hear that it's usable once it's loaded, though.
> The nice thing was that opening that file with find-file-literally was basically instantaneous. Navigation was also instantaneous; much faster than, say, SublimeText, which was surprising to me.
Yeah, I guess Emacs handles one large file better than many smaller ones. I doubt many people even try to open that many files in Emacs. :)
> Having the 76M file open in a fundamental-mode buffer, I then tried to use helm-swoop, but it was very slow, basically unusable. Using occur was fine.
Yeah, helm-swoop is slow by nature. It's okay for smaller files, but...
> I then tried to install ivy+swiper. Although I like helm much better than ivy, I must admit ivy+swiper was much faster in this case, making it usable.
That's interesting! Sometime I'll have to take a look at how it works. Maybe helm-swoop can be made faster.
Did you happen to try `helm-org-rifle-current-buffer` on the 76 MB file? I'm curious to see how well that would work. I imagine it would still be pretty slow, but maybe faster than helm-swoop.
Thanks for all your help. I'm going to be busy here for a while, but maybe in a few weeks we can work on improving this.
Regarding sift, sift.el may be of some use.
Thanks, I'll check it out.
Thank you, very useful comments.
I do not mind at all trading fontification for more speed; I would be very interested in testing and using it.
I see Emacs more as a platform for many applications, and I think it is fine to rely on lower-level tools like find, awk, ag, rg, etc., to leverage their speed.
For example, two packages that can deal with hundreds of thousands of text files are mu4e and notmuch.
They both use xapian to index the messages. Perhaps in the future that could be leveraged as well.
I was intrigued by your hint about combining a "search engine" like recoll. In the meantime, I have found beagrep, which could perhaps offer some additional ideas.
Thanks, that looks very interesting. The author says that it only supports whole-word matches, so we'd need to test it to see how it matches Org syntax, non-alphabetic characters, etc. It might be useful.
This also produces the following prompt, when you go to open the file normally:

> The file gulp.org is already visited literally,
> meaning no coding system decoding, format conversion, or local variables.
> You have asked to visit it normally,
> but Emacs can visit a file in only one way at a time.
> Do you want to revisit the file normally now? (y or n) y
Related Issues (20)
- Changing value of helm-org-rifle-fontify-headings to nil causes helm-org-rifle-org-directory to stop working HOT 3
- org-narrow-to-subtree: prevents showing other parts of file, still shows from other files
- command-line client HOT 1
- Search gives up on small strings HOT 6
- Prioritising heading matches HOT 2
- Wrong number of arguments error with helm-org-rifle--refile HOT 1
- Req: Indication of status (progress, no results, which files have been searched?) HOT 1
- Req: Results in current file displayed first HOT 1
- helm-collect-matches error HOT 1
- Is there a way to do a exact match? HOT 1
- Disabling helm-org-rifle-fontify-headings breaks helm-org-rifle-agenda-files HOT 6
- helm-org-rifle-directories: how to add many directories? HOT 1
- Limit candidates to those with certain properties HOT 1
- Fix warning "Helm source <file>: after-init-hook Should be defined as a symbol HOT 1
- Docstring length warnings HOT 1
- Returned search results not always respecting sort order HOT 1
- Warning (emacs): ... after-init-hook Should be defined as a symbol HOT 5
- Opting out of v1.5.0 TODO keyword special handling HOT 4
- Not possible to move as always between the helm-org-rifle-agenda-files results HOT 1
- Hidden parent headings when using helm-org-rifle-current-buffer on collapsed hierarchies HOT 8