hoelzro / tw-full-text-search
Full text search plugin for TiddlyWiki powered by lunr.js
Home Page: https://hoelz.ro/files/fts.html
License: Other
Lunr.js allows you to hook into the pipeline it uses to massage documents in order to index them; currently I have a single custom pipeline function that expands synonyms in queries.
It might be helpful to expose the pipeline to external clients, such as other plugins - this way we can prototype new ideas outside of the plugin itself, and it makes the plugin more flexible.
I'm thinking the plugin could load modules with module-type: lunrpipelinefunction, and a configuration list variable could specify the ordering of the pipeline functions. A few questions about the configuration, though:
A preview pane of how the pipeline would massage data would be very cool, but probably a bit beyond the scope of this issue.
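A rough sketch of the ordering idea, in plain JavaScript without lunr itself. The registry object, the function names in it, and the order list are all hypothetical; they stand in for modules loaded with module-type lunrpipelinefunction and for the configuration list variable.

```javascript
// Hypothetical registry: name -> pipeline function (token in, token out).
// Returning null drops the token.
const registry = {
  lowercase: (token) => token.toLowerCase(),
  dropShort: (token) => (token.length < 2 ? null : token),
  stripPunctuation: (token) => token.replace(/[^\w]/g, ""),
};

// Build a pipeline from an ordered list of names (the "configuration list").
function buildPipeline(order, available) {
  const steps = order.map((name) => {
    if (!Object.prototype.hasOwnProperty.call(available, name)) {
      throw new Error("Unknown pipeline function: " + name);
    }
    return available[name];
  });
  return function run(tokens) {
    let current = tokens;
    for (const step of steps) {
      // Apply the step to every token, then discard dropped or empty tokens.
      current = current.map(step).filter((t) => t !== null && t !== "");
    }
    return current;
  };
}

const runPipeline = buildPipeline(
  ["stripPunctuation", "lowercase", "dropShort"],
  registry
);
const pipelineResult = runPipeline(["Hello,", "a", "World!"]);
console.log(pipelineResult);
```

Failing loudly on an unknown name seems preferable to silently skipping it, since a typo in the configuration list would otherwise change search behavior without any hint.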
If I search for two terms, I think that tiddlers where the two terms are close to each other in the document should rank higher than a document where they're farther apart. At the very least it would be nice if I could throw in a NEAR keyword to make this happen.
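To make the proximity idea concrete, here is a small sketch that measures how close two terms are in a tokenized document. It is not how lunr scores anything; minTermDistance and proximityBoost are hypothetical helpers, and the boost formula is just one possible choice.

```javascript
// Smallest distance (in tokens) between any occurrence of termA and termB.
// Returns Infinity if either term does not occur.
function minTermDistance(tokens, termA, termB) {
  let lastA = -1;
  let lastB = -1;
  let best = Infinity;
  tokens.forEach((token, i) => {
    if (token === termA) lastA = i;
    if (token === termB) lastB = i;
    if (lastA >= 0 && lastB >= 0) {
      best = Math.min(best, Math.abs(lastA - lastB));
    }
  });
  return best;
}

// Hypothetical boost: closer terms give a value nearer 1, missing terms give 0.
function proximityBoost(distance) {
  return 1 / (1 + distance);
}

const docTokens = "the quick brown fox jumps over the lazy dog".split(" ");
const nearDistance = minTermDistance(docTokens, "quick", "fox");
const farDistance = minTermDistance(docTokens, "quick", "dog");
console.log(nearDistance, farDistance);
```

A result's relevance score could then be multiplied by something like 1 + proximityBoost(distance), so that documents lacking one of the terms are left unchanged.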
Line 40 of control-panel.tid has a typo in the file path:
<$set name="autoIndexTiddler" value="$:/plugins/hoelzro/full-text-seach/auto-index">
The subdirectory full-text-seach is missing the r in search.
Meaning that the index should be built automatically when the wiki is loaded. The index should still be able to be explicitly rebuilt using the button on the configuration page.
I have a node installation of tw5 on my machine, and I was using your excellent plugin when I discovered the following possible bug. This is replicated on a fresh node version of tw5 with your plugin:
This produces the following error popup:
And in the javascript console, I see:
(index):28821 Uncaught RangeError: Maximum call stack size exceeded
$tw.utils.error @ (index):28821
window.onerror @ (index):28848
$:/plugins/hoelzro/full-text-search/lunr.min.js:7 Uncaught RangeError: Maximum call stack size exceeded
at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14130)
at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)
at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)
...***this line is repeated about 7,000 times***....
at t.Index.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:7852)
at t.Index.update ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:8266)
at eval ($:/plugins/hoelzro/full-text-search/hooks.js:13:20)
at Object.$tw.hooks.invokeHook (http://localhost:8080/:30853:43)
at NavigatorWidget.handleSaveTiddlerEvent ($:/core/modules/widgets/navigator.js:363:28)
at eventListeners.(anonymous function) ($:/core/modules/widgets/widget.js:370:25)
at NavigatorWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:387:7)
at DropZoneWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at RevealWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at ButtonWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SendMessageWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
at SendMessageWidget.invokeAction ($:/core/modules/widgets/action-sendmessage.js:77:7)
at ButtonWidget.Widget.invokeActions ($:/core/modules/widgets/widget.js:501:13)
at HTMLButtonElement.eval ($:/core/modules/widgets/button.js:71:11)
After disabling your plugin, I can then rename the tiddler as usual.
Requested twice on the Google Group
See:
lunr.tokenizer might give me some trouble - I think its usage is not pluggable.
See also #21
I have a Node.js wiki used as a headless CMS, and I want to do search with this plugin.
But it seems the index has to be built on the webpage side? Can it automatically build the index on the server side?
I added query expansion a while ago, and the jury's still out on whether it's worth it. If it is, I should whip up a configuration UI for defining synonyms.
The latest version is really shaping up into a useful tool for TW. Tx!
For my use case the Sort-Order of results matters. Let me give an example.
You are writing a novel and use tiddler titles to order the chapters (e.g. Chap 7.04, Chap 7.05 etc).
The way the tool delivers results by "hit relevance" is very good in some situations. But it's not ideal for coping with sequential texts where their original (alphanumeric) order matters. In those cases you want to view/edit them in the original written sequence.
My request: Would it be possible to add a directive to be able to define a sort order for results?
Hope this is clear! Best wishes, Josiah
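One way the requested sort directive could behave, sketched in plain JavaScript. The result shape ({ title, score }) and the "title" / "relevance" option names are assumptions for illustration, not something the plugin exposes today.

```javascript
// Sort search results either by relevance (default) or by title.
// Assumed result shape: { title: string, score: number }.
function sortResults(results, order) {
  const copy = results.slice(); // do not mutate the caller's array
  if (order === "title") {
    // numeric: true compares runs of digits as numbers rather than as text.
    copy.sort((a, b) =>
      a.title.localeCompare(b.title, "en", { numeric: true })
    );
  } else {
    copy.sort((a, b) => b.score - a.score);
  }
  return copy;
}

const hits = [
  { title: "Chap 7.05", score: 3.1 },
  { title: "Chap 7.10", score: 9.4 },
  { title: "Chap 7.04", score: 5.2 },
];
const byTitle = sortResults(hits, "title").map((r) => r.title);
const byRelevance = sortResults(hits, "relevance").map((r) => r.title);
console.log(byTitle, byRelevance);
```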
Hey @hoelzro, I'm not sure what happened but I've noticed a degradation in performance that I was only able to mitigate by removing this plugin. I have around 980 tiddlers system tiddlers.
2.3.3 is the most current, and includes some bug fixes - I actually think I just discovered a bug as well, so there might be a 2.3.4 I would upgrade to!
The plugin uses localForage to cache previously built indexes in browser storage; we shouldn't do this for the demo page.
This event handler, I believe, fires upon every tiddler change:
Line 18 in 3a442d5
I think this includes changes to things like the text of the tiddler currently being edited - we should make sure we're not bogging the wiki down through this event handler.
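A sketch of one way to keep such a handler cheap: decide up front which changed titles are worth re-indexing. This assumes the change event hands over an object keyed by tiddler title, and that drafts can be recognised by TiddlyWiki's draft.of field; titlesToReindex and getFields are hypothetical names, not the plugin's actual code.

```javascript
// changes: { [title]: { modified?: true, deleted?: true } } (assumed shape)
// getFields: hypothetical lookup returning a tiddler's fields, or undefined
// if the tiddler no longer exists.
function titlesToReindex(changes, getFields) {
  return Object.keys(changes).filter((title) => {
    const fields = getFields(title);
    // Skip drafts: they change on every keystroke while a tiddler is edited.
    if (fields && fields["draft.of"]) return false;
    return true;
  });
}

const fakeStore = {
  "Draft of 'Foo'": { "draft.of": "Foo", text: "work in progress" },
  Foo: { text: "saved text" },
};
const changed = {
  "Draft of 'Foo'": { modified: true },
  Foo: { modified: true },
  Gone: { deleted: true },
};
const toReindex = titlesToReindex(changed, (title) => fakeStore[title]);
console.log(toReindex);
```

Deleted tiddlers are deliberately kept in the list so the caller can remove them from the index.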
We're currently on 2.1.4 - updating to at least 2.2.0 to get features like term presence would be great (see #5)
https://github.com/nextapps-de/flexsearch
This is very fast, and supports CJK languages like Chinese better.
Since a lot of tests are running async, there are a lot of promises in flight - but unhandled rejected promises don't seem to cause test failures
Hello again!
I came across this section in the documentation:
https://lunrjs.com/guides/searching.html#fuzzy-matches
and when I tried to change my search string by typing the ~ character I got a series of uncaught errors. If I just press OK and continue until I get the number in, like searchString~2, and press OK a couple more times through some errors, I am actually able to perform the fuzzy matching.
If I just paste in the query with the ~2 attached, I don't get any errors.
Perhaps some error catching could be implemented to ignore this problem as you type character by character?
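A minimal sketch of that kind of error catching, assuming the underlying search call throws when handed a half-typed query. makeSafeSearch and the fake search function below are hypothetical stand-ins; the real plugin would wrap its own lunr call.

```javascript
// Wrap a search function so that a query which fails to parse keeps
// showing the last successful results instead of raising an error.
function makeSafeSearch(search) {
  let lastGood = [];
  return function safeSearch(query) {
    try {
      lastGood = search(query);
    } catch (err) {
      // Incomplete query while typing (e.g. a trailing "~"): ignore it.
    }
    return lastGood;
  };
}

// Stand-in for the real search: rejects a "~" with no edit distance after it.
function fakeSearch(query) {
  if (/~$/.test(query)) {
    throw new Error("expected edit distance after ~");
  }
  return ["result for " + query];
}

const safeSearch = makeSafeSearch(fakeSearch);
const typed1 = safeSearch("searchString");
const typed2 = safeSearch("searchString~");
const typed3 = safeSearch("searchString~2");
console.log(typed1, typed2, typed3);
```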
Also, just wanted to let you know of this discussion on Google Groups where Jeremy mentions trying to incorporate lunr.js! And also this feature request on the TiddlyWiki main repository: Jermolene/TiddlyWiki5#3233, where I am also trying to get something like this in the core!
@diego898 brought up wildcards in issue #5, but the way they work is a little unintuitive for users because of how lunr works with regards to stemming. For example, looking for title:format*ing yields no results, because formatting is stemmed down to format.
@olivernn If you have a minute, could you chime in on how I could use lunr to make wildcards behave more intuitively for users? I suppose I could disable stemming, but that would interfere with other features I'm trying to offer.
I might want to tweak how the plugin uses lunr to tokenize things, to handle hyphenated words or URLs.
Examples:
tw-full-text-search/tests/test-simple.js
Lines 269 to 281 in 9d383ac
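On the tokenizing idea above (hyphenated words and URLs), a rough sketch of what a tweaked tokenizer might do. This is a standalone function, not lunr's tokenizer, and the choices it makes (keep URLs whole, index a hyphenated word both whole and in parts) are assumptions about what would be useful.

```javascript
// Split text into lowercase tokens, with special handling for URLs and
// hyphenated words.
function tokenize(text) {
  const tokens = [];
  for (const chunk of text.split(/\s+/).filter(Boolean)) {
    if (/^https?:\/\//i.test(chunk)) {
      tokens.push(chunk.toLowerCase()); // keep URLs as a single token
      continue;
    }
    // Trim leading and trailing punctuation from ordinary words.
    const word = chunk
      .toLowerCase()
      .replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, "");
    if (!word) continue;
    tokens.push(word);
    if (word.includes("-")) {
      // Also index the parts, so "full-text" matches "full" and "text".
      tokens.push(...word.split("-").filter(Boolean));
    }
  }
  return tokens;
}

const sampleTokens = tokenize(
  "See full-text search at https://hoelz.ro/files/fts.html now."
);
console.log(sampleTokens);
```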
The ftsearch filter, which the plugin uses internally to generate the search results page, has potential use in advanced search or other areas of the wiki. It should be documented!
Look at other users' plugins that provide new filters for inspiration. http://tobibeer.github.io/tw5-plugins/#Filters is probably a good start!
ftsearch currently assumes a built index - but maybe this isn't a problem once #12 is done?
With the introduction on my site, it's full of redundant (at best) or misleading (at worst) information. It needs to get cleaned up!
How about an easy way to install Lunr Languages in TW?
It's fast enough for a small number of results, but it took like 10 seconds to render the results when searching for "tags" on the tw5.com wiki. I'm wondering if I could leverage SVGs and/or caching to reduce the amount of time things take to render
It might be nice if the indexing process were aware of markup; here are some examples of why that might be useful:
Obviously this adds some overhead; I would need to measure the impact of this first, and then I would probably want to put it behind a configuration flag.
Hey, I'm not sure what caused this, but now when I try to generate the index, I get the following error in my console:
TypeError: Cannot read property 'Selection Constructors' of undefined
at e.MutableBuilder.e.Builder.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7)
at updateTiddler ($:/plugins/hoelzro/full-text-search/shared-index.js:202)
at eval ($:/plugins/hoelzro/full-text-search/shared-index.js:99)
at step ($:/plugins/hoelzro/full-text-search/shared-index.js:39)
at Object.eval [as next] ($:/plugins/hoelzro/full-text-search/shared-index.js:20)
at fulfilled ($:/plugins/hoelzro/full-text-search/shared-index.js:11)
at <anonymous>
I have small patches I make to lunr.js, and having the ability to build the plugin with unminified JS would be nice for debugging
The search results are ordered by a score; it might be nice to show the user a visual representation of this score.
As far as the appearance goes, I'm thinking a bar alongside the title in the results - the wider the bar, the more relevant the result was to the query.
As far as implementation goes, I'm thinking we would need to stash the scores in a data tiddler indexed by the tiddler names. I think that ftsearch itself would need to do this, which raises the question of how this would work if there are multiple ftsearch filters being displayed...
Currently, https://hoelz.ro/files/fts.html doesn't have much - just the plugin itself. It would be great if it contained a small handful of tiddlers to demonstrate what the FTS plugin does and a little bit about how it operates. It should have example searches and results for the following things:
How can I search for multiple terms? I didn't realize it was default OR. Should that be made clearer?
To reproduce: drag and drop the plugin into tiddlywiki.com, save to downloads folder, open the wiki, run the indexer, watch the page freeze
I've only tried this in Firefox so far
I'm guessing this is some issue with writing the cache to localstorage/indexeddb
Also note that synonyms don't seem to work properly (for example, TW and TiddlyWiki)
Custom stop words are useful for several reasons:
Whatever the case, I think that a) the list should be provided via a tiddler, and b) said tiddler should be able to be built from a transclusion of a separate tiddler. I have several plugins that use stop words, and I would rather have a single data tiddler that's used to derive custom stop word lists for each plugin rather than needing to update several separate lists.
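A sketch of the "one shared list, several derived lists" idea. The tiddler texts are represented here as plain strings, and parseStopWords and makeStopWordFilter are hypothetical helpers; in the wiki the combined text would come from transcluding the shared data tiddler.

```javascript
// Parse whitespace-separated stop words from a tiddler's text.
function parseStopWords(text) {
  return new Set(
    text
      .split(/\s+/)
      .map((word) => word.toLowerCase())
      .filter(Boolean)
  );
}

// Build a pipeline-style function: returns null for stop words so the
// caller can drop them, and the token itself otherwise.
function makeStopWordFilter(stopWords) {
  return (token) => (stopWords.has(token.toLowerCase()) ? null : token);
}

// Shared list used by several plugins, plus a list specific to this one.
const sharedList = "the a an";
const pluginList = "tiddler wiki";
const stopWords = parseStopWords(sharedList + "\n" + pluginList);

const isKept = makeStopWordFilter(stopWords);
const keptTokens = ["The", "wiki", "search", "plugin"]
  .map(isKept)
  .filter((token) => token !== null);
console.log(keptTokens);
```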
Hey @hoelzro,
I love this plugin, and want it to become part of the core of TW eventually! Jeremy indicated he is also looking at lunr-based solutions.
In line with that, it would be awesome if this plugin could fully support the following features lunr also supports:
Scoring
Wildcards
If I type title:format*, I get all tiddlers with Formatting and format in their title. But if I type title:format*ing I get no results.
Fields
I can search tags:tableofcontents and it works. I can also do tags:tableofcontents title:getting to find all tiddlers tagged tableofcontents OR with getting in their title.
Boosts
Fuzzy
Term Presence
I recognize that this is a lot! I just figured I would report back the results of my testing in this "wish list"!
Side-note: What do you think of the relationship between this library and filtering?
See #3
TiddlyWiki's own search uses "AND" style logic by default - we should probably do the same!
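lunr's query syntax marks a required term with a leading +, so one possible way to get "AND" behavior is to rewrite the query before searching. This is a sketch of that rewrite only; requireAllTerms is a hypothetical helper, and terms the user already marked with + or - are left alone.

```javascript
// Prefix every unmarked term with "+" so that all terms are required.
function requireAllTerms(query) {
  return query
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => (/^[+-]/.test(term) ? term : "+" + term))
    .join(" ");
}

const andQuery = requireAllTerms("full text search");
const mixedQuery = requireAllTerms("plugin -icebox +lunr");
console.log(andQuery, "|", mixedQuery);
```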
All this means is that I want to be able to configure a filter to determine which tiddlers are indexed - I have an Icebox tag in my own wiki that I use for projects I probably won't do, and I don't want FTS to search them.
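In the wiki this would presumably be a configurable TiddlyWiki filter expression; the sketch below only illustrates the effect in plain JavaScript. The tiddler shape ({ title, tags }) and selectIndexable are assumptions for illustration.

```javascript
// Keep only tiddlers that carry none of the excluded tags.
function selectIndexable(tiddlers, excludedTags) {
  return tiddlers.filter(
    (tiddler) =>
      !(tiddler.tags || []).some((tag) => excludedTags.includes(tag))
  );
}

const allTiddlers = [
  { title: "Write the docs", tags: ["Project"] },
  { title: "Build a boat", tags: ["Project", "Icebox"] },
  { title: "Untagged note" },
];
const indexable = selectIndexable(allTiddlers, ["Icebox"]).map(
  (tiddler) => tiddler.title
);
console.log(indexable);
```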
I'm using a single file TW5 in chrome (loading it from disk). The console has the below error. I'm using some images that are loaded locally (adjacent to the tw file)
blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94 Not allowed to load local resource: blob:null/8a3b03af-c581-475b-a012-1179e95f414d
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94
Promise.then (async)
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:83
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
requireFromPage @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:77
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:166
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:63
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:260
In my TW I have some tiddlers whose title begins with "Robot Framework" and other tiddlers that contain "Robot Framework" in the title. Since this plugin claims to sort search results by relevance I was expecting that if my query is "Robot F" then the list of results begins with all the tiddlers whose title begins with "Robot Framework".
However, the first result is a tiddler whose title contains "Robot Framework". It is followed by some tiddlers whose title begins with "Robot Framework" and then some tiddlers that do not contain "Robot Framework" in the title and then more tiddlers whose title begins with "Robot Framework".
I guess that lunr is assigning the scores based on the full content of the tiddlers, not only their title. However, I only care about the tiddler's title. It would be great to be able to configure the plugin to only search in the title.
If there is an easy way to achieve what I want using only built-in TW functions, please tell me.
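For what it's worth, the ordering described above (titles starting with the query first, then titles merely containing it, then everything else) can be sketched as a re-ranking step over the result titles. rankByTitle is a hypothetical helper and sits outside lunr's own scoring entirely.

```javascript
// Re-rank titles: prefix matches first, then substring matches, then the
// rest. Titles within the same group keep their original order.
function rankByTitle(titles, query) {
  const q = query.toLowerCase();
  const rank = (title) => {
    const t = title.toLowerCase();
    if (t.startsWith(q)) return 0;
    if (t.includes(q)) return 1;
    return 2;
  };
  return titles
    .map((title, index) => ({ title, index, rank: rank(title) }))
    .sort((a, b) => a.rank - b.rank || a.index - b.index)
    .map((entry) => entry.title);
}

const reranked = rankByTitle(
  [
    "Using Robot Framework",
    "Robot Framework Basics",
    "Other notes",
    "Robot Framework Tips",
  ],
  "Robot F"
);
console.log(reranked);
```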