tw-full-text-search's Issues

Pipeline tiddlers

Lunr.js lets you hook into the pipeline it uses to massage documents in order to index them; currently I have a single custom pipeline function that expands synonyms in queries.

It might be helpful to expose the pipeline to external clients, such as other plugins - this way we can prototype new ideas outside of the plugin itself, and it makes the plugin more flexible.

I'm thinking the plugin could load modules with module-type: lunrpipelinefunction, and a configuration list variable could specify the ordering of the pipeline functions. A few questions about the configuration, though:

  • How do we make sure the pipeline includes the default functions if it's a freeform list? Do we need to make such a guarantee?
    • You might not want to - think of replacing the default stopwords/stemmer with non-English variants!
  • How do we keep well-meaning but non-advanced users from hosing their wiki by throwing in too many pipeline plugins? Even for knowledgeable users, arranging the functions in the pipeline is something of an art!

A preview pane of how the pipeline would massage data would be very cool, but probably a bit beyond the scope of this issue.
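A minimal sketch of the ordering idea, assuming a registry of named pipeline functions (standing in for modules loaded with module-type: lunrpipelinefunction) and a configuration list that names them in order. The `registry`, `buildPipeline`, and `runPipeline` names are hypothetical, and the functions work on plain strings rather than lunr token objects. Each one roughly follows lunr's convention of returning a token, an array of tokens, or nothing to drop the token. Running a sample string through the result doubles as a crude version of the preview idea.

```javascript
// Hypothetical registry standing in for modules with
// module-type: lunrpipelinefunction.
const registry = {
  lowercase: (token) => token.toLowerCase(),
  // Drop a few English stop words (illustrative list only).
  stopWords: (token) =>
    ["the", "a", "an", "of"].includes(token) ? undefined : token,
  // Expand a term into itself plus a synonym.
  synonyms: (token) => (token === "car" ? [token, "automobile"] : token),
};

// Build a pipeline from an ordered list of names, failing loudly on unknown
// names instead of silently skipping them.
function buildPipeline(order) {
  return order.map((name) => {
    const fn = registry[name];
    if (typeof fn !== "function") {
      throw new Error(`Unknown pipeline function: ${name}`);
    }
    return fn;
  });
}

// Run tokens through each function in turn. A function may return a single
// token, an array of tokens, or undefined/null/"" to drop the token.
function runPipeline(pipeline, tokens) {
  return pipeline.reduce((current, fn) => {
    const next = [];
    for (const token of current) {
      const result = fn(token);
      if (result === undefined || result === null || result === "") continue;
      if (Array.isArray(result)) next.push(...result);
      else next.push(result);
    }
    return next;
  }, tokens);
}

const pipeline = buildPipeline(["lowercase", "stopWords", "synonyms"]);
console.log(runPipeline(pipeline, "The Car of Tomorrow".split(" ")));
// → ["car", "automobile", "tomorrow"]
```

Order matters here: with `["synonyms", "lowercase"]`, "Car" is not yet lowercase when the synonym step sees it, so no expansion happens. That is a small instance of the "arranging the functions is an art" concern.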

Phrase/near/proximity search

If I search for two terms, I think that tiddlers where the two terms are close to each other in the document should rank higher than a document where they're farther apart. At the very least it would be nice if I could throw in a NEAR keyword to make this happen.
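One way to measure how close two terms are, sketched independently of lunr: record the position of each token in the document, then find the smallest gap between an occurrence of one term and an occurrence of the other. The `minDistance` and `proximityBoost` helpers and the `1 / distance` boost are illustrative assumptions, not something the plugin or lunr currently provides.

```javascript
// Map each term to the (ascending) list of positions where it occurs.
function termPositions(tokens) {
  const positions = new Map();
  tokens.forEach((token, i) => {
    if (!positions.has(token)) positions.set(token, []);
    positions.get(token).push(i);
  });
  return positions;
}

// Smallest distance between any occurrence of termA and any of termB,
// or Infinity if either term is missing. Two-pointer walk over the
// sorted position lists.
function minDistance(tokens, termA, termB) {
  const positions = termPositions(tokens);
  const a = positions.get(termA) || [];
  const b = positions.get(termB) || [];
  let i = 0;
  let j = 0;
  let best = Infinity;
  while (i < a.length && j < b.length) {
    best = Math.min(best, Math.abs(a[i] - b[j]));
    if (a[i] < b[j]) i++;
    else j++;
  }
  return best;
}

// Illustrative boost (an assumption, not lunr behaviour): adjacent terms
// score 1, terms ten tokens apart score 0.1, missing terms score 0.
function proximityBoost(tokens, termA, termB) {
  const d = minDistance(tokens, termA, termB);
  return d === Infinity ? 0 : 1 / Math.max(d, 1);
}

const doc = "full text search is nice for text".split(" ");
console.log(minDistance(doc, "full", "search"));    // → 2
console.log(proximityBoost(doc, "search", "text")); // → 1
```

How such a boost would be blended with lunr's own relevance score is left open; a NEAR keyword could instead use `minDistance` as a hard cutoff.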

"Did you mean"

Alluded to by @diego898 in #5

My wiki contains a link to a project called "nuzzel". I remembered the name, but not its colorful spelling, so when I searched for "nuzzle", the plugin - much to my confusion - found nothing.
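A "did you mean" suggestion could be sketched by comparing the query against the index's vocabulary using Levenshtein edit distance, and offering the closest term within a small threshold. The vocabulary array and the threshold of 2 are assumptions for illustration, and `didYouMean` is a hypothetical helper. Note that plain Levenshtein counts the swapped letters in "nuzzle" → "nuzzel" as two edits.

```javascript
// Classic dynamic-programming Levenshtein distance
// (insertions, deletions, substitutions each cost 1).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the closest vocabulary term within maxDistance, or null.
function didYouMean(query, vocabulary, maxDistance = 2) {
  let best = null;
  let bestDistance = maxDistance + 1;
  for (const term of vocabulary) {
    const d = levenshtein(query, term);
    if (d < bestDistance) {
      best = term;
      bestDistance = d;
    }
  }
  return best;
}

console.log(didYouMean("nuzzle", ["nuzzel", "tiddler", "search"])); // → "nuzzel"
```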

Misspelled auto-index file path

Line 40 of control-panel.tid has a typo in the file path:

<$set name="autoIndexTiddler" value="$:/plugins/hoelzro/full-text-seach/auto-index">

The path segment full-text-seach is missing the "r"; it should be full-text-search.

Make index building automatic

That is, the index should be built automatically when the wiki is loaded. It should still be possible to rebuild the index explicitly using the button on the configuration page.

Renaming image error?

I have a Node.js installation of TW5 on my machine, and I was using your excellent plugin when I discovered the following possible bug. I can reproduce it on a fresh Node.js installation of TW5 with your plugin:

  • upload a png (like the error popup png shown below)
  • open the new image tiddler
  • try to rename the tiddler and save

This produces the following error popup:

[screenshot of the error popup]

And in the javascript console, I see:

(index):28821 Uncaught RangeError: Maximum call stack size exceeded
$tw.utils.error @ (index):28821
window.onerror @ (index):28848

$:/plugins/hoelzro/full-text-search/lunr.min.js:7 Uncaught RangeError: Maximum call stack size exceeded
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14130)
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)
    at t.TokenStore.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:14204)

    ... (this line is repeated about 7,000 times) ...

    at t.Index.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:7852)
    at t.Index.update ($:/plugins/hoelzro/full-text-search/lunr.min.js:7:8266)
    at eval ($:/plugins/hoelzro/full-text-search/hooks.js:13:20)
    at Object.$tw.hooks.invokeHook (http://localhost:8080/:30853:43)
    at NavigatorWidget.handleSaveTiddlerEvent ($:/core/modules/widgets/navigator.js:363:28)
    at eventListeners.(anonymous function) ($:/core/modules/widgets/widget.js:370:25)
    at NavigatorWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:387:7)
    at DropZoneWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at KeyboardWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SetWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ElementWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ListItemWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at RevealWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at TranscludeWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at FieldManglerWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at ButtonWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SendMessageWidget.Widget.dispatchEvent ($:/core/modules/widgets/widget.js:393:28)
    at SendMessageWidget.invokeAction ($:/core/modules/widgets/action-sendmessage.js:77:7)
    at ButtonWidget.Widget.invokeActions ($:/core/modules/widgets/widget.js:501:13)
    at HTMLButtonElement.eval ($:/core/modules/widgets/button.js:71:11)

After disabling your plugin, I can then rename the tiddler as usual.
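One possible reading of the trace, offered as a hypothesis rather than a confirmed diagnosis: the repeated `t.TokenStore.add` frames suggest the bundled lunr recurses once per character of a token, and an image tiddler's base64 text contains no whitespace, so it becomes a single token thousands of characters long. If that is the cause, a guard before indexing would avoid the overflow. The sketch below skips tiddlers whose `type` starts with `image/` and drops over-long tokens; `shouldIndex`, `capTokens`, and the 100-character limit are all hypothetical.

```javascript
// Hypothetical guard: skip tiddlers whose content is not meant to be
// searched as text, such as images stored as base64.
function shouldIndex(fields) {
  const type = fields.type || "text/vnd.tiddlywiki";
  return !type.startsWith("image/");
}

// Hypothetical cap on token length, as a second line of defence against
// any remaining very long, whitespace-free strings.
const MAX_TOKEN_LENGTH = 100;
function capTokens(tokens) {
  return tokens.filter((token) => token.length <= MAX_TOKEN_LENGTH);
}

const image = { title: "error.png", type: "image/png", text: "iVBORw0KGgo".repeat(700) };
console.log(shouldIndex(image));                       // → false
console.log(capTokens(["rename", image.text]).length); // → 1
```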

Search on server?

I have a Node.js wiki that I use as a headless CMS, and I want to search it with this plugin.

However, it seems the index has to be built in the browser. Can the index be built automatically on the server side?

UI for configuring query expansion?

I added query expansion a while ago, and the jury's still out on whether it's worth it. If it is, I should whip up a configuration UI for defining synonyms.
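If a configuration UI were added, the synonyms could live in an ordinary tiddler that the plugin parses. A sketch, assuming a hypothetical one-group-per-line format such as `car: automobile, auto`; `parseSynonyms` and `expandQueryTerm` are illustrative names, not the plugin's existing query-expansion code.

```javascript
// Parse lines such as "car: automobile, auto" into a symmetric map, so that
// every word in a group expands to all the others.
function parseSynonyms(text) {
  const groups = new Map();
  for (const line of text.split("\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip blank or malformed lines
    const head = line.slice(0, colon);
    const rest = line.slice(colon + 1);
    const words = [head, ...rest.split(",")]
      .map((w) => w.trim().toLowerCase())
      .filter(Boolean);
    for (const word of words) {
      const set = groups.get(word) || new Set();
      words.forEach((w) => set.add(w));
      groups.set(word, set);
    }
  }
  return groups;
}

// Expand one query term into itself plus any synonyms.
function expandQueryTerm(term, groups) {
  const key = term.toLowerCase();
  return groups.has(key) ? [...groups.get(key)] : [key];
}

const groups = parseSynonyms("car: automobile, auto\nbug: defect");
console.log(expandQueryTerm("Auto", groups)); // → ["car", "automobile", "auto"]
```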

Request: Search results ordering option

The latest version is really shaping up into a useful tool for TW. Tx!

For my use case, the sort order of the results matters. Let me give an example.

Say you are writing a novel and use tiddler titles to order the chapters (e.g. Chap 7.04, Chap 7.05, etc.).

Delivering results by "hit relevance", as the tool does, works very well in some situations. But it's not ideal for sequential texts where the original (alphanumeric) order matters. In those cases you want to view/edit them in their original written sequence.

My request: Would it be possible to add a directive to be able to define a sort order for results?

Hope this is clear! Best wishes, Josiah
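Sorting results by title in natural (alphanumeric) order rather than by score could look like the sketch below. It uses `Intl.Collator` with `numeric: true`, so runs of digits compare as numbers and "Chap 10.01" sorts after "Chap 9.02" rather than before it. `sortByTitle` is a hypothetical helper, not an existing plugin option.

```javascript
// Natural-order collator: digit sequences are compared numerically.
const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: "base" });

// Return a new array of result titles in natural title order.
function sortByTitle(titles) {
  return [...titles].sort(collator.compare);
}

const results = ["Chap 10.01", "Chap 7.05", "Chap 9.02", "Chap 7.04"];
console.log(sortByTitle(results));
// → ["Chap 7.04", "Chap 7.05", "Chap 9.02", "Chap 10.01"]
console.log([...results].sort());
// A plain string sort puts "Chap 10.01" first, because "1" < "7".
```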

Performance Issues

Hey @hoelzro, I'm not sure what happened, but I've noticed a degradation in performance that I was only able to mitigate by removing this plugin. I have around 980 tiddlers system tiddlers.

Bump lunr.js version

2.3.3 is the most current release and includes some bug fixes. I actually think I just discovered a bug as well, so there might be a 2.3.4 to upgrade to!

Update to latest lunr.js

We're currently on 2.1.4; updating to at least 2.2.0 to get features like term presence would be great (see #5).

Fuzzy Matching

Hello again!

I came across this section in the documentation:

https://lunrjs.com/guides/searching.html#fuzzy-matches

When I tried to change my search string by typing the ~ character, I got a series of uncaught errors. If I just press OK and keep going until the number is in (as in searchString~2), then press OK a couple more times to dismiss further errors, I am actually able to perform the fuzzy matching.

If I paste in the query with the ~2 already attached, I don't get any errors.

Perhaps some error catching could be implemented to ignore this problem as you type character by character?
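A sketch of that error handling, assuming a `search` function that throws when the query typed so far cannot be parsed: catch the error and keep showing the last good results until the query becomes valid again. `makeSafeSearch` and the stub `stubSearch` are hypothetical; the stub only imitates the reported behaviour for a trailing "~".

```javascript
// Wrap a search function so that a query which fails while the user is still
// typing keeps the previous results instead of surfacing an error.
function makeSafeSearch(search) {
  let lastGood = [];
  return function safeSearch(query) {
    try {
      lastGood = search(query);
    } catch (err) {
      // Ignore the error for this keystroke; a real implementation might
      // show a subtle "incomplete query" hint instead.
    }
    return lastGood;
  };
}

// Stub standing in for the real search: rejects a trailing "~" with no distance.
function stubSearch(query) {
  if (/~$/.test(query)) throw new Error("incomplete edit distance");
  return [`results for ${query}`];
}

const safeSearch = makeSafeSearch(stubSearch);
safeSearch("searchString");   // → ["results for searchString"]
safeSearch("searchString~");  // → ["results for searchString"] (kept)
safeSearch("searchString~2"); // → ["results for searchString~2"]
```

Swallowing every error can hide genuine bugs, so a narrower version would only ignore query-parse errors (for example by checking the error's type or name).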


Also, I just wanted to let you know about this discussion on Google Groups where Jeremy mentions trying to incorporate lunr.js, and about this feature request on the main TiddlyWiki repository: Jermolene/TiddlyWiki5#3233, where I am also trying to get something like this into the core!

Figure out how to make wildcards do the right thing

@diego898 brought up wildcards in issue #5, but the way they work is a little unintuitive for users because of how lunr works with regard to stemming. For example, looking for title:format*ing yields no results, because formatting is stemmed down to format.

@olivernn If you have a minute, could you chime in on how I could use lunr to make wildcards behave more intuitively for users? I suppose I could disable stemming, but that would interfere with other features I'm trying to offer.
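The mismatch comes from the wildcard pattern being compared with stemmed terms: format*ing would need to match the literal word "formatting", but the index only holds "format". One option, sketched here without lunr, is to evaluate wildcard queries against a separate, unstemmed list of the document's original words. `wildcardToRegExp`, `wildcardMatches`, and the raw-term list are assumptions for illustration, not lunr features.

```javascript
// Escape regex metacharacters in each literal part, then join the parts
// with ".*" so that every "*" matches any run of characters.
function wildcardToRegExp(pattern) {
  const escaped = pattern
    .split("*")
    .map((part) => part.replace(/[.+?^${}()|[\]\\]/g, "\\$&"))
    .join(".*");
  return new RegExp(`^${escaped}$`, "i");
}

// Match a wildcard pattern against the original, unstemmed words.
function wildcardMatches(pattern, rawTerms) {
  const re = wildcardToRegExp(pattern);
  return rawTerms.filter((term) => re.test(term));
}

const rawTitleTerms = ["formatting", "format", "formats", "information"];
console.log(wildcardMatches("format*ing", rawTitleTerms)); // → ["formatting"]
console.log(wildcardMatches("format*", rawTitleTerms));    // → ["formatting", "format", "formats"]
```

The cost is a second, unstemmed term list (or field) that has to be kept in sync with the index, while stemming stays in place for ordinary queries.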

Consider tweaking tokenization

I might want to tweak how the plugin uses lunr to tokenize things, to handle hyphenated words or URLs.

Examples:

#5 (comment)

// xit marks this Jasmine spec as pending (disabled) until tokenization handles URLs.
xit('should pick up "twitter" in a URL', async function() {
    await prepare();
    var text = 'https://twitter.com/hoelzro/status/877901644125663232';
    $tw.wiki.addTiddler(new $tw.Tiddler(
        $tw.wiki.getCreationFields(),
        { title: 'ContainsTweetLink', type: 'text/vnd.tiddlywiki', text: text },
        $tw.wiki.getModificationFields()
    ));
    // Give the indexer a chance to process the new tiddler.
    await waitForNextTick();
    var results = $tw.wiki.compileFilter('[ftsearch[twitter]]')();
    expect(results).toContain('ContainsTweetLink');
});
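For the URL case in the pending test above, a tokenizer that treats URL punctuation as a separator would let "twitter" come out as its own token. A sketch, assuming a custom separator pattern; recent lunr versions let the separator be customised through `lunr.tokenizer.separator`, but the extended pattern and the `tokenize` helper here are illustrative only.

```javascript
// Hypothetical separator: whitespace, hyphens, and common URL punctuation.
const SEPARATOR = /[\s\-\/.:?=&#]+/;

// Split text into lowercase tokens, dropping empty strings.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(SEPARATOR)
    .filter((token) => token.length > 0);
}

console.log(tokenize("https://twitter.com/hoelzro/status/877901644125663232"));
// → ["https", "twitter", "com", "hoelzro", "status", "877901644125663232"]
```

The trade-off is that dots inside version numbers or chapter-style titles such as "7.04" get split as well, so the pattern would need tuning against real wiki content.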

Document ftsearch

The ftsearch filter, which the plugin uses internally to generate the search results page, has potential use in advanced search or other areas of the wiki. It should be documented!

Look at other users' plugins that provide new filters for inspiration. http://tobibeer.github.io/tw5-plugins/#Filters is probably a good start!

ftsearch currently assumes a built index - but maybe this isn't a problem once #12 is done?

Clean up readme

Now that the introduction is on my site, the readme is full of redundant (at best) or misleading (at worst) information. It needs to get cleaned up!

Speed up or remove query relevance rendering

It's fast enough for a small number of results, but it took about 10 seconds to render the results when searching for "tags" on the tw5.com wiki. I'm wondering if I could leverage SVGs and/or caching to reduce the rendering time.

Markup awareness

It might be nice if the indexing process were aware of markup; here are some examples of why that might be useful:

  • Ascribe more weight to terms present in headings in a tiddler
  • (Optionally) omit URLs from searches (for example, I have a bunch of tiddlers in my wiki that point to projects on GitHub, so when I search for "github" I get inundated with results. Sometimes I just want to look for an idea I had about GitHub itself!)
  • (Optionally) omit code sections - both triple and single backticks. I often throw snippets of code into these, and I don't always want them showing up
  • Don't index terms in TiddlyWiki filters or widget invocations!

Obviously this adds some overhead; I would need to measure the impact of this first, and then I would probably want to put it behind a configuration flag.
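A rough sketch of the preprocessing step, assuming it runs on raw wikitext before tokenization: strip fenced and inline code, bare URLs, filtered transclusions, macro calls, and widget tags, and collect heading lines (wikitext headings start with "!") so they could be indexed into a separately boosted field. `preprocessWikitext` is a hypothetical name, and the regular expressions are deliberately simple approximations rather than a wikitext parser. In practice each stripping rule would presumably sit behind its own configuration flag.

```javascript
// Rough wikitext preprocessing (hypothetical helper). The regular expressions
// approximate the syntax; they are not a full wikitext parser.
function preprocessWikitext(text) {
  const body = text
    .replace(/```[\s\S]*?```/g, " ")       // fenced code blocks
    .replace(/`[^`\n]*`/g, " ")             // inline code
    .replace(/\bhttps?:\/\/\S+/g, " ")      // bare URLs
    .replace(/\{\{\{[\s\S]*?\}\}\}/g, " ")  // filtered transclusions {{{ ... }}}
    .replace(/<<[\s\S]*?>>/g, " ")          // macro calls << ... >>
    .replace(/<\/?\$[^>]*>/g, " ");         // widget tags <$...> and </$...>
  // Collect heading text (lines starting with one or more "!") so it could
  // be indexed into a separately boosted field; headings stay in the body too.
  const headings = body
    .split("\n")
    .map((line) => /^!+\s*(.*)$/.exec(line))
    .filter(Boolean)
    .map((match) => match[1].trim());
  return { headings, body };
}
```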

TypeError

Hey, I'm not sure what caused this, but now when I try to generate the index, I get the following error in my console:

TypeError: Cannot read property 'Selection Constructors' of undefined
    at e.MutableBuilder.e.Builder.add ($:/plugins/hoelzro/full-text-search/lunr.min.js:7)
    at updateTiddler ($:/plugins/hoelzro/full-text-search/shared-index.js:202)
    at eval ($:/plugins/hoelzro/full-text-search/shared-index.js:99)
    at step ($:/plugins/hoelzro/full-text-search/shared-index.js:39)
    at Object.eval [as next] ($:/plugins/hoelzro/full-text-search/shared-index.js:20)
    at fulfilled ($:/plugins/hoelzro/full-text-search/shared-index.js:11)
    at <anonymous>

Display scoring

Brought up by @diego898 in #5

The search results are ordered by a score; it might be nice to show the user a visual representation of this score.

As far as the appearance goes, I'm thinking a bar alongside the title in the results - the wider the bar, the more relevant the result was to the query.

As far as implementation goes, I'm thinking we would need to stash the scores in a data tiddler indexed by the tiddler names. I think that ftsearch itself would need to do this, which raises the question of how this would work if there are multiple ftsearch filters being displayed...
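One way to turn raw scores into bar widths, assuming results arrive as lunr-style objects with `ref` and `score` properties (the shape lunr 2.x's `index.search` returns): normalise against the top score and store the percentages in an object keyed by tiddler title, which could then be written to a data tiddler. `scoreBarWidths` is a hypothetical helper.

```javascript
// Convert lunr-style results into bar widths (0-100), keyed by title.
// Scores are relative to the best result, so the top hit always gets 100.
function scoreBarWidths(results) {
  const widths = {};
  const top = Math.max(0, ...results.map((r) => r.score));
  for (const { ref, score } of results) {
    widths[ref] = top > 0 ? Math.round((score / top) * 100) : 0;
  }
  return widths;
}

console.log(scoreBarWidths([
  { ref: "HelloThere", score: 2.4 },
  { ref: "Filters", score: 1.2 },
  { ref: "Tags", score: 0.6 },
]));
// → { HelloThere: 100, Filters: 50, Tags: 25 }
```

Normalising against the top score keeps bars comparable within one result list, but not across different queries, which is relevant to the multiple-ftsearch question.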

Create actual demo page

Currently, https://hoelz.ro/files/fts.html doesn't have much - just the plugin itself. It would be great if it contained a small handful of tiddlers to demonstrate what the FTS plugin does and a little bit about how it operates. It should have example searches and results for the following things:

  • Stemming
  • Scoring/relevance
  • (maybe) synonyms (since this is kind of a hidden feature)

AND vs. OR search

How can I search for multiple terms? I didn't realize the default was OR. Should that be made clearer?
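Lunr 2.2 and later support term presence: a leading "+" marks a term as required and "-" excludes it, so "all terms must match" can be expressed by rewriting the query. A sketch of that rewrite, assuming a plain whitespace-separated query; `requireAllTerms` is hypothetical and does not try to handle every corner of lunr's query syntax.

```javascript
// Prefix every bare term with "+" so lunr (2.2+) treats it as required.
// Terms that already carry a "+" or "-" presence marker are left alone.
function requireAllTerms(query) {
  return query
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((term) => (term.startsWith("+") || term.startsWith("-") ? term : `+${term}`))
    .join(" ");
}

console.log(requireAllTerms("robot framework")); // → "+robot +framework"
console.log(requireAllTerms("robot -java"));     // → "+robot -java"
```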

file:/// based wikis sometimes freeze up at end of indexing

To reproduce: drag and drop the plugin into tiddlywiki.com, save to downloads folder, open the wiki, run the indexer, watch the page freeze

I've only tried this in Firefox so far

I'm guessing this is some issue with writing the cache to localStorage/IndexedDB.

Custom stop word lists

Custom stop words are useful for several reasons:

  • Non-English speakers might want to provide their own stop words
  • If a wiki centers on a particular domain, there might be stop words specific to that domain.

Whatever the case, I think that a) the list should be provided via a tiddler, and b) said tiddler should be able to be built from a transclusion of a separate tiddler. I have several plugins that use stop words, and I would rather have a single data tiddler that's used to derive custom stop word lists for each plugin rather than needing to update several separate lists.
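A sketch of deriving a stop-word filter from tiddler text, assuming each list is stored as whitespace-separated words (one per line works too). Lunr 2.x has its own `lunr.generateStopWordFilter(stopWords)`, which could be fed the same array; the self-contained `parseStopWords` and `makeStopWordFilter` below are hypothetical and only mirror that idea, including combining a shared language list with a domain-specific one.

```javascript
// Parse whitespace-separated stop words (e.g. the text of a config tiddler).
function parseStopWords(text) {
  return text
    .split(/\s+/)
    .map((w) => w.trim().toLowerCase())
    .filter(Boolean);
}

// Return a pipeline-style function: keeps a token unless it is a stop word.
function makeStopWordFilter(words) {
  const stop = new Set(words);
  return (token) => (stop.has(token.toLowerCase()) ? undefined : token);
}

// A German list and a domain-specific list combined into one filter.
const filter = makeStopWordFilter([
  ...parseStopWords("der\ndie\ndas\nund"),
  ...parseStopWords("tiddler wiki"),
]);
console.log(["Der", "Suchindex", "und", "tiddler"].map(filter).filter(Boolean));
// → ["Suchindex"]
```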

Full support of lunr's additional features

Hey @hoelzro,

I love this plugin, and want it to become part of the core of TW eventually! Jeremy indicated he is also looking at lunr-based solutions.

In line with that, it would be awesome if this plugin could fully support the following features lunr also supports:

  • Scoring

    • Results are already ordered by this - it might be useful to display it as well (you already have this in your future plans).
  • Wildcards

    • This kind of works, depending on where I place it. For example, if I type title:format*, I get all tiddlers titled with Formatting and format in their title. But if I type title:format*ing I get no results.
  • Fields

    • This already works out of the box! I can say tags:tableofcontents and it works. I can also do tags:tableofcontents title:getting to find all tiddlers tagged tableofcontents OR with getting in their title.
  • Boosts

    • This would be great; I can't really figure out a good way to test it right now.
  • Fuzzy

    • This was recently implemented - wonderful!
  • Term Presence

    • I don't think this works, especially in combination with the other terms.

I recognize that this is a lot! I just figured I would report back the results of my testing in this "wish list"!

Side-note: What do you think of the relationship between this library and filtering?

Index creation filters

All this means is that I want to be able to configure a filter to determine which tiddlers are indexed - I have an Icebox tag in my own wiki that I use for projects I probably won't do, and I don't want FTS to search them.
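In TiddlyWiki terms the configuration would most naturally be a filter expression such as [!is[system]!tag[Icebox]] stored in a config tiddler, evaluated by the wiki's own filter machinery. The sketch below shows only the resulting per-tiddler decision in plain JavaScript, using hypothetical field objects and a hypothetical `makeIndexPredicate` helper; system tiddlers are recognised by the "$:/" title prefix.

```javascript
// Hypothetical predicate builder: index a tiddler unless it is a system
// tiddler (title starts with "$:/") or carries one of the excluded tags.
function makeIndexPredicate(excludedTags) {
  const excluded = new Set(excludedTags);
  return (fields) =>
    !fields.title.startsWith("$:/") &&
    !(fields.tags || []).some((tag) => excluded.has(tag));
}

const isIndexable = makeIndexPredicate(["Icebox"]);
const tiddlers = [
  { title: "Write a novel", tags: ["Icebox"] },
  { title: "Fix search bug", tags: ["Project"] },
  { title: "$:/config/Something" },
];
console.log(tiddlers.filter(isIndexable).map((t) => t.title)); // → ["Fix search bug"]
```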

Console error

I'm using a single-file TW5 in Chrome (loading it from disk). The console shows the error below. I'm using some images that are loaded locally (adjacent to the TW file).

blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94 Not allowed to load local resource: blob:null/8a3b03af-c581-475b-a012-1179e95f414d
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:94
Promise.then (async)
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:83
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
requireFromPage @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:77
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:166
step @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:38
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:19
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:13
__awaiter @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:9
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:63
(anonymous) @ blob:null/7bfc6874-4de5-4676-ac89-de70ad1e7e63:260

Request: Configurable search by relevance

In my TW I have some tiddlers whose title begins with "Robot Framework" and other tiddlers that contain "Robot Framework" in the title. Since this plugin claims to sort search results by relevance I was expecting that if my query is "Robot F" then the list of results begins with all the tiddlers whose title begins with "Robot Framework".

However, the first result is a tiddler whose title contains "Robot Framework". It is followed by some tiddlers whose title begins with "Robot Framework" and then some tiddlers that do not contain "Robot Framework" in the title and then more tiddlers whose title begins with "Robot Framework".

I guess that lunr is assigning the scores based on the full content of the tiddlers, not only their title. However, I only care about the tiddler's title. It would be great to be able to configure the plugin to only search in the title.

If there is an easy way to achieve what I want using only built-in TW functions, please tell me.
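The ordering the reporter expected can be expressed directly on titles: titles that start with the query first, then titles that merely contain it, each group in alphabetical order. A sketch, assuming case-insensitive substring matching with no stemming or tokenization; `rankTitles` is a hypothetical helper, not a plugin feature. Within lunr itself, a field-restricted query such as title:robot (the field syntax mentioned in the wish list above) limits matching to the title field.

```javascript
// Rank titles: prefix matches first, then substring matches; others dropped.
// Matching is case-insensitive and ignores stemming.
function rankTitles(titles, query) {
  const q = query.trim().toLowerCase();
  const rank = (title) => {
    const t = title.toLowerCase();
    if (t.startsWith(q)) return 0;
    if (t.includes(q)) return 1;
    return 2;
  };
  return titles
    .filter((title) => rank(title) < 2)
    .sort((a, b) => rank(a) - rank(b) || a.localeCompare(b));
}

console.log(rankTitles(
  ["Notes on Robot Framework", "Robot Framework keywords", "Robot Framework setup", "Unrelated"],
  "Robot F"
));
// → ["Robot Framework keywords", "Robot Framework setup", "Notes on Robot Framework"]
```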
