tw-full-text-search's Introduction

Purpose

Provides an alternative search result list that orders results by search relevance and ignores differences in word forms (e.g. tag vs tags).

On my personal wiki, there are terms I use across a lot of tiddlers, and sometimes I'll use different forms (such as the aforementioned tag vs tags). I wanted a plugin that would let me find the tiddler I'm looking for quickly without worrying about how I declined a noun or inflected a verb, so I wrote this plugin, which provides an alternative search list powered by lunr.js.

This plugin should be considered as BETA quality - I use it pretty much every day, but there's definitely room for improvement. Please let me know if there are any bugs!

Demo

https://hoelz.ro/files/fts.html

Installation

https://hoelz.ro/files/fts.html

Usage

Each time you start a new TiddlyWiki session, you'll need to build the FTS index. You can do this from a tab in the $:/ControlPanel. Older versions of the index are retained in web storage, so it should be pretty quick after the first time! After you build the index, you can just search as you would normally.

Ideas for Future Enhancement

  • Display score for search results
  • Specify a filter for tiddlers to be included in the index.
  • Custom stemmers for non-English/mixed language wikis

Source Code

If you want to help out, you can check out the source for this plugin (or its dependency, the progress bar plugin) on GitHub:

https://github.com/hoelzro/tw-full-text-search/

https://github.com/hoelzro/tw-progress-bar

Requires $:/plugins/hoelzro/progress-bar to display progress when generating the index.

tw-full-text-search's People

Contributors

diego898, hoelzro

tw-full-text-search's Issues

UI for configuring query expansion?

I added query expansion a while ago, and the jury's still out on whether it's worth it. If it is, I should whip up a configuration UI for defining synonyms.
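To make the idea concrete, here is a minimal sketch of what query expansion from a synonym table could look like. The table contents and the function name are hypothetical, and this is not the plugin's actual implementation; a configuration UI would simply be a way to edit such a table.

```javascript
// Hypothetical synonym table; a configuration UI would populate this.
const SYNONYMS = new Map([
  ['js', ['javascript']],
  ['tw', ['tiddlywiki']],
]);

// Return the original terms plus any configured synonyms, without duplicates.
function expandQueryTerms(terms, synonyms) {
  const expanded = new Set();
  for (const term of terms) {
    const key = term.toLowerCase();
    expanded.add(key);
    for (const alternative of synonyms.get(key) || []) {
      expanded.add(alternative.toLowerCase());
    }
  }
  return [...expanded];
}
```

A Map is used instead of a plain object so that terms such as "constructor" are not confused with inherited object properties.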

Search on server?

I have a Node.js wiki used as a headless CMS, and I want to use this plugin for search.

But it seems the index has to be built in the browser. Can the index be built automatically on the server side?

Figure out how to make wildcards do the right thing

@diego898 brought up wildcards in issue #5, but the way they work is a little unintuitive for users because of how lunr works with regards to stemming. For example, looking for title:format*ing yields no results, because formatting is stemmed down to format.

@olivernn If you have a minute, could you chime in on how I could use lunr to make wildcards behave more intuitively for users? I suppose I could disable stemming, but that would interfere with other features I'm trying to offer.

Speed up or remove query relevance rendering

It's fast enough for a small number of results, but it took about 10 seconds to render the results when searching for "tags" on the tw5.com wiki. I'm wondering if I could leverage SVGs and/or caching to reduce the time rendering takes.

Consider tweaking tokenization

I might want to tweak how the plugin uses lunr to tokenize things, to handle hyphenated words or URLs.

Examples:

#5 (comment)

xit('should pick up "twitter" in a URL', async function() {
    await prepare();
    var text = 'https://twitter.com/hoelzro/status/877901644125663232';
    $tw.wiki.addTiddler(new $tw.Tiddler(
        $tw.wiki.getCreationFields(),
        { title: 'ContainsTweetLink', type: 'text/vnd.tiddlywiki', text: text },
        $tw.wiki.getModificationFields()
    ));
    await waitForNextTick();
    var results = wiki.compileFilter('[ftsearch[twitter]]')();
    expect(results).toContain('ContainsTweetLink');
});
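As a rough illustration of the kind of tokenization that would make the pending test above pass, here is a standalone sketch that splits on anything that is not a letter or digit. The function name is hypothetical and this is not how the plugin or lunr tokenizes today; it only shows the splitting rule.

```javascript
// Split on every run of characters that is not a Unicode letter or digit,
// so "twitter.com" and "full-text" both yield their component words.
function tokenizeLoosely(text) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((token) => token.length > 0);
}
```

The trade-off is that URLs and hyphenated words are no longer searchable as a single unit, so this would probably need to be configurable. In lunr 2.x the splitting rule can reportedly be changed through lunr.tokenizer.separator, but I have not verified that against the version bundled with the plugin.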

Custom stop word lists

Custom stop words are useful for several reasons:

  • Non-English speakers might want to provide their own stop words
  • If a wiki centers on a particular domain, there might be stop words specific to that domain.

Whatever the case, I think that a) the list should be provided via a tiddler, and b) said tiddler should be able to be built from a transclusion of a separate tiddler. I have several plugins that use stop words, and I would rather have a single data tiddler that's used to derive custom stop word lists for each plugin rather than needing to update several separate lists.
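A minimal sketch of how a stop word list stored as tiddler text could be turned into a filter. It assumes the tiddler text is a whitespace-separated word list (after any transclusion has been rendered), and the function names are hypothetical. It works on plain strings, whereas lunr's own pipeline passes token objects, so a real integration would need a small adapter.

```javascript
// Parse a whitespace-separated stop word list, folding case so that
// lookups are case-insensitive.
function parseStopWords(text) {
  return new Set(
    text
      .split(/\s+/)
      .map((word) => word.toLowerCase())
      .filter((word) => word.length > 0)
  );
}

// Build a filter shaped like a pipeline step: return the token to keep it,
// or undefined to drop it.
function makeStopWordFilter(stopWords) {
  return (token) => (stopWords.has(token.toLowerCase()) ? undefined : token);
}
```

Because the list is derived from one piece of text, several plugins could share the same source tiddler and each build their own filter from it.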

Document ftsearch

The ftsearch filter, which the plugin uses internally to generate the search results page, has potential use in advanced search or other areas of the wiki. It should be documented!

Look at other users' plugins that provide new filters for inspiration. http://tobibeer.github.io/tw5-plugins/#Filters is probably a good start!

ftsearch currently assumes a built index - but maybe this isn't a problem once #12 is done?

Full support of lunr's additional features

Hey @hoelzro,

I love this plugin, and want it to become part of the core of TW eventually! Jeremy indicated he is also looking at lunr-based solutions.

In line with that, it would be awesome if this plugin could fully support the following features lunr also supports:

  • Scoring

    • Results are already ordered by this - it might be useful to display this as well (you already have this in your future plans)
  • Wildcards

    • This kind of works, depending on where I place it. For example, if I type title:format*, I get all tiddlers titled with Formatting and format in their title. But if I type title:format*ing I get no results.
  • Fields

    • This already works out of the box! I can say tags:tableofcontents and it works. I can also do tags:tableofcontents title:getting to find all tiddlers tagged tableofcontents OR with getting in their title.
  • Boosts

    • This would be great, but I can't really figure out a good way to test it right now.
  • Fuzzy

    • This was recently implemented - wonderful!
  • Term Presence

    • I don't think this works, especially in combination with the other terms.

I recognize that this is a lot! I just figured I would report back the results of my testing in this "wish list"!

Side-note: What do you think of the relationship between this library and filtering?

Performance Issues

Hey @hoelzro, I'm not sure what happened, but I've noticed a degradation in performance that I was only able to mitigate by removing this plugin. I have around 980 tiddlers system tiddlers.

file:/// based wikis sometimes freeze up at end of indexing

To reproduce: drag and drop the plugin into tiddlywiki.com, save to downloads folder, open the wiki, run the indexer, watch the page freeze

I've only tried this in Firefox so far.

I'm guessing this is some issue with writing the cache to localStorage/IndexedDB.

Console error

I'm using a single-file TW5 in Chrome (loading it from disk). The console shows the error below. I'm using some images that are loaded locally (adjacent to the TW file).

blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94 Not allowed to load local resource: blob:null/8a3b03af-c581-475b-a012-1179e95f414d
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94
Promise.then (async)
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:83
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
requireFromPage @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:77
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:166
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:63
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:260

Make index building automatic

Meaning that the index should be built automatically when the wiki is loaded. The index should still be able to be explicitly rebuilt using the button on the configuration page

AND vs OR search

How can I search for multiple terms? I didn't realize the default was OR. Should that be made clearer?
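One possible workaround, sketched under the assumption that the bundled lunr supports term presence (the "+term" syntax introduced in lunr 2.2.0): rewrite the query so that every bare term is required. The function name is hypothetical and this is not something the plugin currently does.

```javascript
// Prefix every bare term with "+" so that all terms must be present.
// Terms that already carry a "+" or "-" presence marker are left alone.
function requireAllTerms(query) {
  return query
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => (/^[+-]/.test(term) ? term : '+' + term))
    .join(' ');
}
```

With this, typing two words would produce a query in which both are required, instead of the default OR behaviour.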

Create actual demo page

Currently, https://hoelz.ro/files/fts.html doesn't have much - just the plugin itself. It would be great if it contained a small handful of tiddlers to demonstrate what the FTS plugin does and a little bit about how it operates. It should have example searches and results for the following things:

  • Stemming
  • Scoring/relevance
  • (maybe) synonyms (since this is kind of a hidden feature)

Misspelled auto-index file path

Line 40 of control-panel.tid has a typo in the file path:

<$set name="autoIndexTiddler" value="$:/plugins/hoelzro/full-text-seach/auto-index">

The subdirectory name full-text-seach is missing the r in search.

Display scoring

Brought up by @diego898 in #5

The search results are ordered by a score; it might be nice to show the user a visual representation of this score.

As far as the appearance goes, I'm thinking a bar alongside the title in the results - the wider the bar, the more relevant the result was to the query.

As far as implementation goes, I'm thinking we would need to stash the scores in a data tiddler indexed by the tiddler names. I think that ftsearch itself would need to do this, which raises the question of how this would work if there are multiple ftsearch filters being displayed...
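A minimal sketch of the scoring-to-bar-width step, assuming results have the { ref, score } shape that lunr searches return. The function name is hypothetical, and storing the output in a data tiddler is left out.

```javascript
// Convert { ref, score } results into percentage widths relative to the
// best score, so the top hit gets a full-width bar.
function scoreBarWidths(results) {
  const best = results.reduce((max, result) => Math.max(max, result.score), 0);
  const widths = new Map();
  for (const { ref, score } of results) {
    widths.set(ref, best > 0 ? Math.round((score / best) * 100) : 0);
  }
  return widths;
}
```

Keying the widths by both tiddler title and the query that produced them would be one way around the problem of several ftsearch filters being displayed at once.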

Renaming image error?

I have a node installation of tw5 on my machine, and I was using your excellent plugin when I discovered the following possible bug. This is replicated on a fresh node version of tw5 with your plugin:

  • upload a png (like the error popup png shown below)
  • open the new image tiddler
  • try to rename the tiddler and save

This produces the following error popup:

error

And in the javascript console, I see:

(index):28821 Uncaught RangeError: Maximum call stack size exceeded
$tw.utils.error @ (index):28821
window.onerror @ (index):28848

$:/plugins/hoelzro/full-text-search/lunr.min.js:7 Uncaught RangeError: Maximum call stack size exceeded
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14130)
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)

...***this line is repeated about 7,000 times***....

    at t.Index.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:7852)
    at t.Index.update ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:8266)
    at eval ($:/plugins/hoelzro/full-text-search/hooks.js:13:20)
    at Object.$tw.hooks.invokeHook (http://localhost:8080/:30853:43)
    at NavigatorWidget.handleSaveTiddlerEvent ($:/core/modules/widgets/navigator.js:363:28)
    at eventListeners.(anonymous function) ($:/core/modules/widgets/widget.js:370:25)
    at NavigatorWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:387:7)
    at DropZoneWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at RevealWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ButtonWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SendMessageWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SendMessageWidget.invokeAction ($:/core/modules/widgets/action-sendmessage.js:77:7)
    at ButtonWidget.Widget.invokeActions ($:/core/modules/widgets/widget.js:501:13)
    at HTMLButtonElement.eval ($:/core/modules/widgets/button.js:71:11)

After disabling your plugin, I can then rename the tiddler as usual.

Index creation filters

All this means is that I want to be able to configure a filter to determine which tiddlers are indexed - I have an Icebox tag in my own wiki that I use for projects I probably won't do, and I don't want FTS to search them.
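In TiddlyWiki terms this would probably be a filter expression such as [!is[system]!tag[Icebox]] stored in a configuration tiddler. The sketch below shows the same selection on plain objects; the { title, tags } shape and the function name are assumptions for illustration, not the plugin's data model.

```javascript
// Keep only tiddlers that carry none of the excluded tags.
function selectIndexable(tiddlers, excludedTags) {
  const excluded = new Set(excludedTags);
  return tiddlers.filter(
    (tiddler) => !(tiddler.tags || []).some((tag) => excluded.has(tag))
  );
}
```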

Request: Search results ordering option

The latest version is really shaping up into a useful tool for TW. Tx!

For my use case the Sort-Order of results matters. Let me give an example.

Say you are writing a novel and use tiddler titles to order the chapters (e.g. Chap 7.04, Chap 7.05 etc).

The way the tool delivers results by "hit relevance" is very good in some situations. But it's not ideal for coping with sequential texts where the original (alphanumeric) order matters. In those cases you want to view/edit them in the sequence in which they were written.

My request: Would it be possible to add a directive to be able to define a sort order for results?

Hope this is clear! Best wishes, Josiah
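A minimal sketch of the requested ordering, applied as a post-processing step on the titles a search returns. The function name is hypothetical; the numeric collation option makes "Chap 7.10" sort after "Chap 7.05" and "Chap 10.01" after "Chap 7.10".

```javascript
// Collator that compares embedded digit runs as numbers, not as text.
const titleCollator = new Intl.Collator('en', { numeric: true });

// Return a copy of the result titles in natural alphanumeric order.
function sortResultsByTitle(titles) {
  return [...titles].sort(titleCollator.compare);
}
```

Since this works on the output of the search, it could be offered as an optional directive without touching the relevance ranking itself.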

Fuzzy Matching

Hello again!

I came across this section in the documentation:

https://lunrjs.com/guides/searching.html#fuzzy-matches

When I tried to change my search string by typing the ~ character, I got a series of uncaught errors. If I press OK and keep typing until the number is in place (as in searchString~2), then press OK a couple more times to dismiss the remaining errors, I am actually able to perform the fuzzy matching.

If I just paste in the query with the ~2 attached, I don't get any errors.

Perhaps some error catching could be implemented to ignore this problem as you type character by character?
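A minimal sketch of that kind of error catching: wrap whatever function runs the query and treat a thrown error as "no results yet". Both names are hypothetical, and searchFn stands in for the plugin's real search call.

```javascript
// Run a search function and treat any thrown error (for example a parse
// error on a half-typed query such as "term~") as "no results yet".
function searchWhileTyping(searchFn, query) {
  try {
    return { results: searchFn(query), error: null };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { results: [], error: message };
  }
}
```

The error message is returned rather than discarded so the UI could show a quiet hint instead of a popup.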


Also, just wanted to let you know of this discussion on Google Groups where Jeremy mentions trying to incorporate lunr.js! There is also this feature request on the TiddlyWiki main repository, Jermolene/TiddlyWiki5#3233, where I am also trying to get something like this into the core!

Clean up readme

With the introduction on my site, it's full of redundant (at best) or misleading (at worst) information. It needs to get cleaned up!

Update to latest lunr.js

We're currently on 2.1.4 - updating to at least 2.2.0 to get features like term presence would be great (see #5)

Bump lunr.js version

2.3.3 is the most current, and includes some bug fixes - I actually think I just discovered a bug as well, so there might be a 2.3.4 I would upgrade to!

Phrase/near/proximity search

If I search for two terms, I think that tiddlers where the two terms are close to each other in the document should rank higher than a document where they're farther apart. At the very least it would be nice if I could throw in a NEAR keyword to make this happen.
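A rough sketch of the measurement such a ranking tweak would need: the smallest distance, in token positions, between two terms in a document. The function name is hypothetical, and how the distance would feed into lunr's score is left open.

```javascript
// Smallest distance (in token positions) between any occurrence of termA
// and any occurrence of termB; Infinity if either term is absent.
function minTermDistance(tokens, termA, termB) {
  let lastA = -1;
  let lastB = -1;
  let best = Infinity;
  tokens.forEach((token, index) => {
    if (token === termA) lastA = index;
    if (token === termB) lastB = index;
    if (lastA >= 0 && lastB >= 0) {
      best = Math.min(best, Math.abs(lastA - lastB));
    }
  });
  return best;
}
```

A NEAR keyword could then be treated as "keep only documents where this distance is below some threshold".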

Request: Configurable search by relevance

In my TW I have some tiddlers whose title begins with "Robot Framework" and other tiddlers that contain "Robot Framework" in the title. Since this plugin claims to sort search results by relevance I was expecting that if my query is "Robot F" then the list of results begins with all the tiddlers whose title begins with "Robot Framework".

However, the first result is a tiddler whose title contains "Robot Framework". It is followed by some tiddlers whose title begins with "Robot Framework" and then some tiddlers that do not contain "Robot Framework" in the title and then more tiddlers whose title begins with "Robot Framework".

I guess that lunr is assigning the scores based on the full content of the tiddlers, not only their title. However, I only care about the tiddler's title. It would be great to be able to configure the plugin to only search in the title.

If there is an easy way to achieve what I want using only built-in TW functions, please tell me.
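Since field-scoped terms such as title:getting already work in this plugin, one possible approach is to rewrite the query so that every term is scoped to the title field. This is only a sketch with a hypothetical function name; it does not by itself make a partial word like "F" match, which would still need a trailing wildcard.

```javascript
// Scope every unscoped term to the given field. A leading "+" or "-"
// presence marker stays in front of the field name.
function restrictToField(query, field) {
  return query
    .trim()
    .split(/\s+/)
    .filter((term) => term.length > 0)
    .map((term) => {
      if (term.includes(':')) return term; // already scoped to a field
      const [, presence, rest] = /^([+-]?)(.*)$/.exec(term);
      return presence + field + ':' + rest;
    })
    .join(' ');
}
```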

TypeError

Hey, I'm not sure what caused this, but now when I try to generate the index, I get the following error in my console:

TypeError: Cannot read property 'Selection Constructors' of undefined
    at e.MutableBuilder.e.Builder.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7)
    at updateTiddler ($:/plugins/hoelzro/full-text-search/shared-index.js:202)
    at eval ($:/plugins/hoelzro/full-text-search/shared-index.js:99)
    at step ($:/plugins/hoelzro/full-text-search/shared-index.js:39)
    at Object.eval [as next] ($:/plugins/hoelzro/full-text-search/shared-index.js:20)
    at fulfilled ($:/plugins/hoelzro/full-text-search/shared-index.js:11)
    at <anonymous>

"Did you mean"

Alluded to by @diego898 in #5

I have a project I stored a link to in my own wiki called "nuzzel". I remembered the name, but not the colorful spelling, so when I searched for "nuzzle", the plugin - much to my confusion - found nothing.
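A minimal sketch of a "did you mean" suggestion based on Levenshtein edit distance against a list of known words. The function names and the idea of taking the vocabulary from indexed terms are assumptions; lunr's own fuzzy matching (the ~2 suffix) is a different route to a similar result.

```javascript
// Classic Levenshtein distance using two rolling rows.
function editDistance(a, b) {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current[j] = Math.min(
        previous[j] + 1, // deletion
        current[j - 1] + 1, // insertion
        previous[j - 1] + cost // substitution
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Suggest the closest vocabulary word within maxDistance edits, or null.
function didYouMean(query, vocabulary, maxDistance = 2) {
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const word of vocabulary) {
    const distance = editDistance(query, word);
    if (distance > 0 && distance < bestDistance) {
      best = word;
      bestDistance = distance;
    }
  }
  return best;
}
```

For the example above, "nuzzle" and "nuzzel" are two edits apart, so a threshold of 2 would be enough to suggest the stored spelling.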

Pipeline tiddlers

Lunr.js allows you to hook into the pipeline it uses to massage documents in order to index them; currently I have a single custom pipeline function that expands synonyms in queries.

It might be helpful to expose the pipeline to external clients, such as other plugins - this way we can prototype new ideas outside of the plugin itself, and it makes the plugin more flexible.

I'm thinking the plugin could load modules with module-type: lunrpipelinefunction, and a configuration list variable could specify the ordering of the pipeline functions. A few questions about the configuration, though:

  • How do we make sure the pipeline includes the default functions if it's a freeform list? Do we need to make such a guarantee?
    • You might not want to - think of replacing the default stopwords/stemmer with non-English variants!
  • How do we keep well-meaning but non-advanced users from hosing their wiki by throwing in too many pipeline plugins? Even for knowledgeable users, arranging the functions in the pipeline is somewhat of an art!

A preview pane of how the pipeline would massage data would be very cool, but probably a bit beyond the scope of this issue.
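To make the configuration question more concrete, here is a simplified sketch of assembling a pipeline from a registry of named functions and an ordered list of names. Everything here is hypothetical, including the registry and the function name. It works on plain strings and one token at a time, whereas lunr's real pipeline passes token objects and lets a step return several tokens.

```javascript
// Build a token-processing function from named steps in the given order.
// Unknown names fail loudly so a typo in the configuration is noticed.
function buildPipeline(registry, order) {
  const steps = order.map((name) => {
    const step = registry.get(name);
    if (typeof step !== 'function') {
      throw new Error('Unknown pipeline function: ' + name);
    }
    return step;
  });
  // Run each token through the steps; a step that returns undefined or
  // null drops the token and skips the remaining steps.
  return (tokens) =>
    tokens
      .map((token) => {
        let current = token;
        for (const step of steps) {
          if (current === undefined || current === null) break;
          current = step(current);
        }
        return current;
      })
      .filter((token) => token !== undefined && token !== null);
}
```

Running such a function over sample text is also roughly what a preview pane would need to display.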

Markup awareness

It might be nice if the indexing process were aware of markup; here are some examples of why that might be useful:

  • Ascribe more weight to terms present in headings in a tiddler
  • (Optionally) omit URLs from searches (for example, I have a bunch of tiddlers in my wiki that point to projects on GitHub, so when I search for "github" I get inundated with results. Sometimes I just want to look for an idea I had about GitHub itself!)
  • (Optionally) omit code sections - both triple and single backticks. I often throw snippets of code into these, and I don't always want them showing up
  • Don't index terms in TiddlyWiki filters or widget invocations!

Obviously this adds some overhead; I would need to measure the impact of this first, and then I would probably want to put it behind a configuration flag.
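As a starting point for measuring that overhead, here is a small sketch of a pre-indexing step that optionally strips code sections and URLs. The function and option names are hypothetical, and it uses simple regular expressions instead of TiddlyWiki's parser, so it will miss edge cases such as nested or unbalanced backticks.

```javascript
// Remove code sections and/or URLs from tiddler text before indexing.
function stripForIndexing(text, options = {}) {
  let result = text;
  if (options.omitCode) {
    // Remove triple-backtick blocks first so their backticks are not
    // mistaken for single-backtick spans.
    result = result.replace(/`{3}[\s\S]*?`{3}/g, ' ');
    result = result.replace(/`[^`\n]*`/g, ' ');
  }
  if (options.omitUrls) {
    result = result.replace(/https?:\/\/\S+/g, ' ');
  }
  // Collapse the gaps left behind by the removals.
  return result.replace(/\s+/g, ' ').trim();
}
```

Weighting headings and skipping filters or widget invocations would need the parse tree and is not covered by this sketch.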
