
Stork

Project update: I'm winding down my work with Stork.

Thanks to everyone who enjoyed using Stork over the past few years!
-James


Impossibly fast web search, made for static sites.


Stork is a library for creating beautiful, fast, and accurate full-text search interfaces on the web.

It comes in two parts. First, it's a command-line tool that indexes content and creates a search index file that you can upload to a web server. Second, it's a Javascript library that uses that index file to build an interactive search interface that displays optimal search results immediately to your user, as they type.

Stork is built with Rust, and the Javascript library uses WebAssembly behind the scenes. It's easy to get started and is even easier to customize so it fits your needs. It's perfect for Jamstack sites and personal blogs, but can be used wherever you need an interactive search bar.

Currently in development by James Little

Gif of Stork in Action

Getting Started

Let's put a search box online that searches within the text of the Federalist Papers.

See this demo live at https://stork-search.net.

<!DOCTYPE html>
<html lang="en">
  <head>
    <title>Federalist Search</title>
  </head>
  <body>
    <div class="stork-wrapper">
      <input data-stork="federalist" class="stork-input" />
      <div data-stork="federalist-output" class="stork-output"></div>
    </div>
    <script src="https://files.stork-search.net/stork.js"></script>
    <script>
      stork.register(
        "federalist",
        "https://files.stork-search.net/federalist.st"
      );
    </script>
  </body>
</html>

Step 1: Include the HTML

Stork hooks into existing HTML that you include on your page. Each Stork instance needs an input hook and a results list; those two elements are usually placed in a wrapper, though the wrapper is optional.

The input hook should have the data-stork="federalist" attribute, where federalist is the name with which you register that search instance. (This way, you can have multiple, independent search boxes on a page, all pointing to different instances.) It doesn't have to be federalist -- you can change it to whatever you want.
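For example, two independent search instances could share a page like this (a sketch: the names docs and blog and the index URLs are hypothetical):

```html
<!-- Two independent Stork instances on one page. Each input/output pair is
     tied to its own registered index by the data-stork name. -->
<div class="stork-wrapper">
  <input data-stork="docs" class="stork-input" />
  <div data-stork="docs-output" class="stork-output"></div>
</div>
<div class="stork-wrapper">
  <input data-stork="blog" class="stork-input" />
  <div data-stork="blog-output" class="stork-output"></div>
</div>
<script>
  stork.register("docs", "/indexes/docs.st");
  stork.register("blog", "/indexes/blog.st");
</script>
```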

The results list should be an empty <div> tag with the attribute data-stork="federalist-output". Again, you can change federalist to whatever you want.

The classes in the example above (stork-input, stork-output) are for the theme. Most Stork themes assume the format above; the theme documentation will tell you if it requires something different. You can also design your own theme, at which point the styling and class names are up to you.

Step 2: Include the Javascript

You need to include stork.js, which you can either load from the Stork CDN or host yourself. This will load the Stork WebAssembly blob and create the Stork object, which will allow for registering and configuring indices.

Then, you should register at least one index:

stork.register("federalist", "https://files.stork-search.net/federalist.st");

The search index you build needs to be stored somewhere with a public URL. This call registers the index stored at https://files.stork-search.net/federalist.st under the name federalist; the data-stork attributes in the HTML will hook into this name.

Finally, you can set some configuration options for how your search bar will interact with the index and with the page.
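As a sketch of what that configuration might look like: the option names below are illustrative assumptions, not guaranteed API, so check the documentation for the options your Stork version actually supports.

```javascript
// Hypothetical options object passed as a third argument to stork.register().
// The option names here are illustrative -- verify them against
// https://stork-search.net before relying on them.
const federalistOptions = {
  showProgress: true,      // e.g., show a progress bar while the index downloads
  minimumQueryLength: 3,   // e.g., don't search until the query is this long
  onQueryUpdate: (query, results) => {
    // e.g., a callback fired on each keystroke with the current results
    console.log(`"${query}" matched ${results.length} results`);
  },
};

// stork.register("federalist", "https://files.stork-search.net/federalist.st", federalistOptions);
```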

Building your own index

You probably don't want to add an interface to your own website that lets you search through the Federalist Papers. Here's how to make your search bar yours.

To build an index, you need the Stork executable on your computer, which you can install from the latest GitHub release or by running cargo install stork-search --locked if you have a Rust toolchain installed.

The search index is based on a document structure: you give Stork a list of documents on disk and include some metadata about those documents, and Stork will build its search index based on the contents of those documents.

First, you need a configuration file that describes, among other things, that list of files:

[input]
base_directory = "test/federalist"
files = [
    {path = "federalist-1.txt", url = "/federalist-1/", title = "Introduction"},
    {path = "federalist-2.txt", url = "/federalist-2/", title = "Concerning Dangers from Foreign Force and Influence"},
    {path = "federalist-3.txt", url = "/federalist-3/", title = "Concerning Dangers from Foreign Force and Influence 2"},
    {path = "federalist-4.txt", url = "/federalist-4/", title = "Concerning Dangers from Foreign Force and Influence 3"},
    {path = "federalist-5.txt", url = "/federalist-5/", title = "Concerning Dangers from Foreign Force and Influence 4"},
    {path = "federalist-6.txt", url = "/federalist-6/", title = "Concerning Dangers from Dissensions Between the States"},
    {path = "federalist-7.txt", url = "/federalist-7/", title = "Concerning Dangers from Dissensions Between the States 2"},
    {path = "federalist-8.txt", url = "/federalist-8/", title = "The Consequences of Hostilities Between the States"},
    {path = "federalist-9.txt", url = "/federalist-9/", title = "The Union as a Safeguard Against Domestic Faction and Insurrection"},
    {path = "federalist-10.txt", url = "/federalist-10/", title = "The Union as a Safeguard Against Domestic Faction and Insurrection 2"}
]

This TOML file describes the base directory of all your documents, then lists out each document along with the web URL at which that document will be found, along with that document's title.
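The output filename can also be recorded in the configuration file itself, in an [output] section, rather than passed on the command line each time. A minimal sketch, abbreviated to a single file:

```toml
[input]
base_directory = "test/federalist"
files = [
    {path = "federalist-1.txt", url = "/federalist-1/", title = "Introduction"},
]

[output]
filename = "federalist.st"
```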

From there, you can build your search index by running:

$ stork build --input federalist.toml --output federalist.st

This will create a new file at federalist.st. You can search through it with the same command line tool:

$ stork search --index federalist.st --query "liberty"

To embed a Stork search interface on your website, first upload the index file to your web server, then pass its URL to the stork.register() function in your web page's Javascript.

Going further

You can read more documentation and learn more about customization at the project's website: https://stork-search.net.

Development

To build Stork, you'll need:

  • Rust, using the stable toolchain
  • wasm-pack
  • yarn
  • Just, if you want to use the same build scripts I do (otherwise, you can read the Justfile and run the scripts manually)

The repository is structured like a typical Cargo workspace, with some modifications.

  • The stork-* directories hold Rust packages. stork-cli and stork-wasm are the top-level packages; everything else is a dependency.
  • js holds the Javascript source code.
  • test-assets holds binary assets required by Stork's functional tests.
  • local-dev holds configuration files, corpora, and index files required to build and run the test webpage used for local development.

You can build the project using either the Rust entrypoint or the Javascript entrypoint (build instructions are listed below). After you've built the project, you'll see three more directories:

  • target holds the output binary build artifacts
  • pkg holds intermediate WASM build artifacts
  • dist holds the final build artifacts for the web.

If you're interested in extracting the final Stork build artifacts, you can extract the following files after building the project with yarn build:

  • /target/release/stork
  • /dist/stork.js
  • /dist/stork.wasm

Building the project for production

  • just build-indexer will build the indexer binary to target/release/stork
  • just build-js will build the WASM binary and the Javascript bridging code to the dist directory
  • just build-federalist-index will build the federalist.st index file that's referenced throughout the project. It will output to local-dev/test-indexes/federalist.st.

Building the project for development

  • just build-indexer-dev will build the indexer binary
  • cargo run -- <CLI OPTIONS> will run the indexer binary
  • just build-dev-site will build the WASM and Javascript bridge code, build the federalist.st index, and package the development site
  • ./scripts/serve.sh will serve the development site

Take a look at the project's Justfile for more available scripts.

stork's People

Contributors

abalabahaha, arsenarsen, atul9, bronzehedwick, denialadams, dependabot[bot], healeycodes, jameslittle230, jmooring, kkwteh, reese, supersandro2000


stork's Issues

Add fuzzy search

Hey @jameslittle230, awesome project! I tried the demo and I really like the quality of search results. I'd love to know what your thoughts are on adding fuzzy search to the project to make the results even better πŸ˜ƒ

binary operation `==` cannot be applied to type `std::string::FromUtf8Error` upon install

Hi there, cool project you got going! Wanted to kick its tires but got this:

$ cargo install stork-search
    Updating crates.io index
  Downloaded stork-search v0.7.1
  Downloaded 1 crate (150.7 KB) in 1.54s
  Installing stork-search v0.7.1
  Downloaded wasm-bindgen v0.2.63
  Downloaded console_error_panic_hook v0.1.6
  Downloaded htmlescape v0.3.1
  Downloaded num-format v0.4.0
  Downloaded rust-stemmers v1.2.0
  Downloaded arrayvec v0.4.12
  Downloaded wasm-bindgen-macro v0.2.63
  Downloaded srtparse v0.2.0
  Downloaded nodrop v0.1.14
  Downloaded wasm-bindgen-macro-support v0.2.63
  Downloaded rmp-serde v0.14.3
  Downloaded wasm-bindgen-backend v0.2.63
  Downloaded wasm-bindgen-shared v0.2.63
  Downloaded bumpalo v3.4.0
  Downloaded rmp v0.8.9
   Compiling proc-macro2 v1.0.18
   Compiling unicode-xid v0.2.0
   Compiling syn v1.0.31
   Compiling serde v1.0.112
   Compiling wasm-bindgen-shared v0.2.63
   Compiling log v0.4.8
   Compiling cfg-if v0.1.10
   Compiling byteorder v1.3.4
   Compiling ryu v1.0.5
   Compiling bumpalo v3.4.0
   Compiling autocfg v1.0.0
   Compiling lazy_static v1.4.0
   Compiling itoa v0.4.5
   Compiling serde_json v1.0.55
   Compiling wasm-bindgen v0.2.63
   Compiling arrayvec v0.4.12
   Compiling nodrop v0.1.14
   Compiling srtparse v0.2.0
   Compiling htmlescape v0.3.1
   Compiling quote v1.0.7
   Compiling num-traits v0.2.12
   Compiling num-format v0.4.0
   Compiling rmp v0.8.9
   Compiling wasm-bindgen-backend v0.2.63
   Compiling wasm-bindgen-macro-support v0.2.63
   Compiling serde_derive v1.0.112
   Compiling wasm-bindgen-macro v0.2.63
   Compiling bincode v1.2.1
   Compiling toml v0.5.6
   Compiling rust-stemmers v1.2.0
   Compiling rmp-serde v0.14.3
   Compiling console_error_panic_hook v0.1.6
   Compiling stork-search v0.7.1
error[E0369]: binary operation `==` cannot be applied to type `std::string::FromUtf8Error`
  --> /home/kvz/.cargo/registry/src/github.com-1ecc6299db9ec823/stork-search-0.7.1/src/searcher/index_analyzer.rs:64:28
   |
64 |     VersionStringUtf8Error(std::string::FromUtf8Error),
   |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: an implementation of `std::cmp::PartialEq` might be missing for `std::string::FromUtf8Error`

error[E0369]: binary operation `!=` cannot be applied to type `std::string::FromUtf8Error`
  --> /home/kvz/.cargo/registry/src/github.com-1ecc6299db9ec823/stork-search-0.7.1/src/searcher/index_analyzer.rs:64:28
   |
64 |     VersionStringUtf8Error(std::string::FromUtf8Error),
   |                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: an implementation of `std::cmp::PartialEq` might be missing for `std::string::FromUtf8Error`

error: aborting due to 2 previous errors

For more information about this error, try `rustc --explain E0369`.
error: failed to compile `stork-search v0.7.1`, intermediate artifacts can be found at `/tmp/cargo-installzmiYls`

Caused by:
  could not compile `stork-search`.

To learn more, run the command again with --verbose.

Does that ring a bell?

Requiring 3 characters to perform a search works poorly for logographic corpora

Code

Relevant TypeScript:

if (query.length >= 3) {

But I don't know if this affects indexing as well, or if it's strictly a search interface issue.

Details

I have a corpus of documents which are mixed Chinese/English. Searching for the English "cat" works well. However trying to search for the Chinese character 猫 (cat) is not fruitful, because the search will not trigger unless I input at least three characters.

Given that the ratio of semantics to character count varies across languages, I think that this can lead to a frustrating user experience.
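One possible fix, sketched below (this is an illustration, not Stork's actual implementation): make the minimum query length script-aware, so a single CJK character is enough to trigger a search while alphabetic scripts keep the 3-character threshold.

```javascript
// Sketch of a script-aware minimum query length (NOT Stork's actual behavior):
// require 3 characters for alphabetic scripts, but let one CJK character
// trigger a search, since a single Han character can carry a whole word.
function shouldSearch(query) {
  const codePoints = Array.from(query); // counts code points, not UTF-16 units
  const hasCJK =
    /\p{Script=Han}|\p{Script=Hiragana}|\p{Script=Katakana}|\p{Script=Hangul}/u.test(query);
  return hasCJK ? codePoints.length >= 1 : codePoints.length >= 3;
}
```

With this rule, searching for 猫 would fire immediately, while "ca" would still wait for a third character.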


As an aside, thanks a lot for making this, I'm super excited to use this!

Search result grouping by a key (e.g. version number)

Hi. I would love to have some influence of groupings of the search result, for example by page version or sub-tree.

The best example I can show is from the docs.antora.com search result panel:

search-result-sample

The search result in the picture above is nicely arranged under a key, in this case the Antora version.
For stork, this key could be specified in the document object in the configuration file.

Intermittent CORS errors

Opening this issue to track a few reports of intermittent CORS issues. I haven't been able to reproduce them, but I suspect that the CORS headers are being sent correctly when fetching files from Cloudfront, but not when fetching from S3 directly.

Helpfully handle hyphens

James, excellent piece of work, thank you. I have incorporated it into the search function for a gallery of user-submitted coats of arms (https://drawshield.net/gallery/).

My suggestion might be covered by your goal of "fuzzy" search, but a simple "quick hit" might be to treat hyphenated words as separate tokens (unless the hyphen is at the end of a line, perhaps). At the moment a search for "Hull" does NOT find "Kingston-upon-Hull" which I think it should...?

I am fortunate that I can "condition" my input as I programmatically create the .toml file, which includes the content and can just do a search and replace before running stork, but I wouldn't have this ability with file based input.
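The suggested preprocessing can be sketched like this (illustrative only, not a Stork feature): index each hyphenated compound both whole and as its parts, so "Hull" would match "Kingston-upon-Hull".

```javascript
// Sketch of the suggested tokenization (not Stork's implementation): emit a
// hyphenated compound both as-is and split into its parts, so searching for
// any part finds the compound.
function expandHyphens(token) {
  const parts = token.split("-").filter(Boolean);
  return parts.length > 1 ? [token, ...parts] : [token];
}
```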

Hope this is useful,

Kind regards,

Karl

"HTML selector is not present in the document"

While writing jekyll-stork #96 I found out that indexing HTML requires the <main> tag to be present, but it seems that if a single page doesn't have it, the whole indexing fails. The error is not very helpful; I had to read the code to find out about the <main> tag.

The issue is that it doesn't tell me which page is missing it. Could it tell the path and maybe even skip the page?

I'm wondering if it would be better to index the source files... from the documentation I understand that Stork can index the frontmatter, but at least on my sites there's metadata there that sometimes doesn't end up on the public site so I prefer to index the generated pages (mimicking what a web crawler would do).

Stork JS adds a <ul> to the dom while the index is loading

Something in the DOM logic is making it so the output div has a <ul class="stork-results"> child until Stork fully loads, at which point the child is removed from the DOM. This is a little rough, because the default theme has some styling that makes that <ul> visible.

You can see this on https://jameslittle.me, and it looks pretty rough.

  • Fix this bug, but then...
  • Figure out why this wasn't affecting the test setup

Question/Feature : Build an index in the browser

Is your feature request related to a problem? Please describe.
I'd like to let my users upload a JSON/csv of documents and build an index in the client.

Describe the solution you'd like
A webassembly engine for building indexes and the JS APIs for it.

Describe alternatives you've considered
I currently use NDX and build the index in a webworker.
I'm looking for a few missing features

Additional context
I'm not sure if this is possible in Stork already?

Uncaught assertion errors in `selectResult`

Congratulations on the 1.0 πŸŽ‰ Just something I noticed while looking at this for my personal site:

Steps to reproduce

  1. Visit https://stork-search.net
  2. While the search bar is still loading, type anything and hit Enter.
  3. Check console for uncaught error.

Browser versions: Firefox Nightly 86.0a1 (2020-12-27), Firefox Stable 82.0.2, Chrome Version 87.0.4280.88

The error happens here due to a bad null check; this.highlightedResult is undefined before this loads, so typeof undefined is "undefined", which is not null.

This should probably just be this.highlightedResult != null or just if (this.highlightedResult) depending on the expected behavior.
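The failure mode can be reproduced in isolation. A sketch of the fix the reporter suggests (the helper name is hypothetical):

```javascript
// Why the original check never fires: typeof always returns a string
// ("undefined" here), and a string is never equal to null, so comparing
// typeof's result against null is always "not null".
let highlightedResult; // undefined while the index is still loading
console.assert(typeof highlightedResult !== null); // always passes, even now

// The suggested fix: loose inequality against null is false for BOTH null
// and undefined, correctly guarding the not-yet-loaded case.
function hasHighlightedResult(value) {
  return value != null;
}
```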


As a side note, I'd be happy to make a PR for this, but there's not really much documentation on getting the project set up, and a clean yarn run > build currently fails. I'd be happy to make a fix for this, but would love some guidance on getting this up and running for testing.

Also, stripping assertions in production with something like webpack-strip-assert or webpack-unassert-loader (although there are probably better options out there) might prevent some issues like this down the line, since it looks like there's a fair number of assertions in the TS files.

Searching a 73,000-document index in the browser causes errors (Chrome 89)

Your project is awesome!!!

I built a reasonably large index, ~73000 documents. (took about 48 minutes to build the index)

I filed this issue as instructed in the console message:

If I find more details, I will amend or close and re-open a new issue.

This error was in the developer tools console (Chrome 89):

panicked at 'range end index 6 out of range for slice of length 5', src/index_versions/v3/search/entry_and_intermediate_excerpts.rs:151:33

panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: cannot recursively acquire mutex', /rustc/e1884a8e3c3e813aada8254edfa120e85bf5ffca/library/std/src/sys/wasm/../unsupported/mutex.rs:23:9

Publish Stork UI package to NPM

Step 2 from the web instructions shows an embedded script:

<script src="https://files.stork-search.net/stork.js"></script>
  1. Is this JS published to npm? What if we want to add it as a dependency to our project so it's versioned?
  2. Do the .wasm file and .js file need to be shipped together? Ideally the wasm file could be published to npm as well.

Integration with Antora

As part of this element in the roadmap:
"Write and publish some integrations with other projects, e.g. various static site generators, Wordpress, etc."

Antora is a popular tool for generating static web sites with documentation written in Asciidoc. I have already integrated Stork in a proof-of-concept with our setup and would love to see more support specifically for Antora.

One specific wanted feature is e.g. an Antora plugin to let Antora as part of site generation also generate the configuration file for Stork so all generated HTML files are included for indexing.

config toml file samples in demo pages are missing [input] key

The sample TOML files on the demo site are missing the [input] key:

[input] # This is missing
base_directory = "test/federalist"
files = [
    {path = "federalist-1.txt", url = "/federalist-1/", title = "Introduction"},
    {path = "federalist-2.txt", url = "/federalist-2/", title = "Concerning Dangers from Foreign Force and Influence"},
    {path = "federalist-3.txt", url = "/federalist-3/", title = "Concerning Dangers from Foreign Force and Influence 2"},
    {path = "federalist-4.txt", url = "/federalist-4/", title = "Concerning Dangers from Foreign Force and Influence 3"},
    {path = "federalist-5.txt", url = "/federalist-5/", title = "Concerning Dangers from Foreign Force and Influence 4"},
    {path = "federalist-6.txt", url = "/federalist-6/", title = "Concerning Dangers from Dissensions Between the States"},
    {path = "federalist-7.txt", url = "/federalist-7/", title = "Concerning Dangers from Dissensions Between the States 2"},
    {path = "federalist-8.txt", url = "/federalist-8/", title = "The Consequences of Hostilities Between the States"},
    {path = "federalist-9.txt", url = "/federalist-9/", title = "The Union as a Safeguard Against Domestic Faction and Insurrection"},
    {path = "federalist-10.txt", url = "/federalist-10/", title = "The Union as a Safeguard Against Domestic Faction and Insurrection 2"}
]

[output]
filename = "federalist.st"

Some words not searchable

Hi!

I'm running into an issue where I'm unable to search for certain words in my index; in particular, I seem to run into issues with "intro" and "polymorphism". I have a series of blog posts. When I search them by number, their titles show up in the search list:

Screenshot_2020-12-27 Daniel's Blog(6)

However, when I try to look them up by keywords, I don't get the relevant results:

Screenshot_2020-12-27 Daniel's Blog(5)

The word "introduction" shows up, but not "intro"! A similar thing happens with "polymorphism". It's there when I search by part number, but any prefix of the string ("pol", "poly", and so on) doesn't give me any relevant results. I suspect that this may be the case because "polymorphism" is a rather uncommon word, and "intro" may technically not be a word at all, but a slang contraction of "introduction".

I'm not sure what kind of information would be helpful in debugging this. I can say, though, that this occurs even when the relevant file / article is the only one being indexed. Attached are two index files, one with all articles, and one with just the polymorphism article.

Thank you for making Stork!

Highlights 'off' in some searches

Hi! Congrats on the recent 1.0 release. Stork is looking pretty good!

I'm running into a strange highlight issue while integrating stork into my site. I'm able to get it to load the index and search, but when I get results, the stork-highlight elements seem misplaced. Here's what it looks like:

Screenshot_2020-12-27 Daniel's Blog(4)

Double-checking, the issue is in the HTML. Here's one of the entries:

<p class="stork-excerpt">
        ...the user. Also, when there’s no error, our
        co<span class="stork-highlight">mpiler d</span>oesn’t really tell us anything at all...
</p>

I'm not quite sure what's going on. From what I know, Stork should only be using the index, both for searching and displaying results, so the issue doesn't seem to be on my end. My index file is attached, in case it helps: index.st.zip

Thanks for making this! It's a breeze to set up and use.

Add a prettierignore file

From the Prettier docs:

It’s recommended to always make sure that prettier --write . only formats what you want in your project. Use a .prettierignore file to ignore things that should not be formatted.

prettier --write . does not do the right thing currently. There should be a prettierignore file so that Prettier only operates on the /js directory.

Include HTML meta "keywords" tag in search index

Hello James! Your work is impressive! - Stork is really great already, I love it and hope you will expand it and possibly get more people like yourself developing and maintaining it. That last point (of being a one-man side project) is the main reason my employer is reluctant to make the company use Stork. Otherwise we love the architecture and how it all works.

Just adding this feature request (and a couple of more on other PRs). Hope to see the feature in the roadmap.

HTML files have some meta data that is worth including in the index, specifically the html->head->meta name="keywords" tag.

Is there a query limit?

When I run stork --search with a query that is above 7 characters, I get thread 'main' panicked at 'index 8 out of range for slice of length 7'. Is there a query limit of 7? If so, is the query limit removable?

Only download index + WASM when someone actually searches

Hi,

Making some progress getting going, and it looks like WASM is 400KB, index for my site is 600KB (gzipped it's 200KB, so hopefully CDN will shrink it down to the latter). Even so, this is still a bunch of downloads for someone who isn't actively using search, and it seems like both get downloaded automatically even for users who aren't searching.

There's some work that could be done to shrink the WASM, I imagine, based on the tinysearch writeup (https://endler.dev/2019/tinysearch), but also it would be nice if download of WASM + .st file only happened if user clicks in the search box. At that point the cost of using Stork to most users is just the cost of the stork.js.

There are many places where a 1MB download is quite expensive (https://a4ai.org/new-mobile-broadband-pricing-data-2018), so it would be nice to minimize unnecessary downloading.
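A minimal sketch of that deferral, assuming the stork.register call from the README (the once helper is hypothetical glue, and the stork.js script tag itself would need the same lazy treatment so the WASM isn't fetched up front):

```javascript
// Defer the index + WASM download until the user focuses the search input.
// "once" guarantees the loader runs at most a single time no matter how
// many focus events fire.
function once(fn) {
  let called = false;
  return (...args) => {
    if (!called) {
      called = true;
      fn(...args);
    }
  };
}

const loadStork = once(() => {
  // In a real page, the registration from the README would run here:
  // stork.register("federalist", "https://files.stork-search.net/federalist.st");
});

// Wire it to the search box (browser-only; shown as a comment so the sketch
// stays self-contained):
// document.querySelector(".stork-input")
//   .addEventListener("focus", loadStork, { once: true });
```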

at some point, content stops being indexed

Hi,

Thank for this great tool. I'm starting to integrate it into my blog.

However, it seems that only a very small portion of each file is indexed. After further investigation, it seems to stop at the first character that is not a letter, a digit, or a space.

To confirm that, I've created a text of plain English. Just putting a double quote (") alone anywhere in the middle stops the indexing, meaning I can find words before it but not any word after it. I'm sure other non-alphanumeric characters can trigger this too, as not all my input files have double quotes.

I have started looking at WordListGenerators but unfortunately I'm not skilled enough in Rust to see what's happening.
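The expected behavior can be sketched with a Unicode-aware tokenizer (illustrative only, not Stork's actual word-list generator): punctuation should merely end one token, not truncate the rest of the document.

```javascript
// Sketch of punctuation-tolerant tokenization (NOT Stork's indexer): grab
// runs of letters, digits, and apostrophes; a stray double quote just
// separates tokens instead of stopping the scan.
function tokenize(text) {
  return text.match(/[\p{L}\p{N}']+/gu) || [];
}
```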

Regards

Limit the indexer's RAM usage

Hi. I have a VM with 0.5 GB of RAM that I will be using to build the Stork index. The TOML has 7k records. Is there a way to limit how much RAM Stork can use?

Supporting large indexes - split the index into parts

Maybe it is possible to have the index split into a configurable number of chunks. That along with a good caching policy in the browser could allow for much bigger indexes. I'll summarize below in an order of decreasing complexity and impact.

  • Index chunking / sharding
  • Preserving unmodified chunks when updating the index as much as possible (will help with caching)
  • Abstract away the number of chunks and allow configuring to prioritize for speed or bandwidth saving (or somewhere in between)
  • "Save data" mode where the search can be triggered manually after the query is typed

Demo: compress index to save network bandwidth

The demo site https://stork-search.net/ downloads a 1.7M index of the Federalist Papers uncompressed over the network.

To better evaluate the impact of adding stork search to a static site, it would help to precompress said example index.

Brotli compression with default settings compresses the same index down to 322K.

Likewise for the 180K .wasm file.

Cut down on WASM file size by removing Debug implementations.

I used twiggy to take a look at the WASM file size for Stork, and it looks like a pretty big chunk of the file size is from Debug implementations.

stork % twiggy top target/wasm32-unknown-unknown/release/stork_search.wasm -n 10
 Shallow Bytes β”‚ Shallow % β”‚ Item
───────────────┼───────────┼────────────────────────────────────────────────────────────────
        539131 β”Š    23.94% β”Š custom section '.debug_str'
        398637 β”Š    17.70% β”Š custom section '.debug_info'
        300580 β”Š    13.35% β”Š custom section '.debug_line'
        183576 β”Š     8.15% β”Š custom section '.debug_ranges'
        147033 β”Š     6.53% β”Š custom section '.debug_pubnames'
         55564 β”Š     2.47% β”Š "function names" subsection
         51906 β”Š     2.31% β”Š data[0]
         23344 β”Š     1.04% β”Š custom section '.debug_aranges'
         13290 β”Š     0.59% β”Š rmp_serde::decode::Deserializer<R>::read_map::hb0358e684d16b82f
         13282 β”Š     0.59% β”Š rmp_serde::decode::Deserializer<R>::read_map::h7780ae98a8fdbf18
        525461 β”Š    23.34% β”Š ... and 707 more.
       2251804 β”Š   100.00% β”Š Ξ£ [717 Total Rows]

More importantly though, the Debug implementations are almost entirely unused in the WASM binary:

stork % twiggy garbage target/wasm32-unknown-unknown/release/stork_search.wasm  
 Bytes   β”‚ Size % β”‚ Garbage Item
─────────┼────────┼─────────────────────────────────────────
  539131 β”Š 23.94% β”Š custom section '.debug_str'
  398637 β”Š 17.70% β”Š custom section '.debug_info'
  300580 β”Š 13.35% β”Š custom section '.debug_line'
  183576 β”Š  8.15% β”Š custom section '.debug_ranges'
  147033 β”Š  6.53% β”Š custom section '.debug_pubnames'
   23344 β”Š  1.04% β”Š custom section '.debug_aranges'
    4883 β”Š  0.22% β”Š custom section '.debug_abbrev'
     723 β”Š  0.03% β”Š custom section '__wasm_bindgen_unstable'
     450 β”Š  0.02% β”Š custom section '.debug_pubtypes'
      75 β”Š  0.00% β”Š custom section 'producers'
 1598432 β”Š 70.98% β”Š Ξ£ [10 Total Rows]
   51906 β”Š  2.31% β”Š 1 potential false-positive data segments

For the most part, I think implementing Display on the types needed for printing error messages would be a better alternative than deriving Debug, since Debug is meant for, well, debugging, so they shouldn't be left around in the release binary.

I'd be happy to dig into this a bit more when I get the time, this is just what came up after a quick look at it.

Documentation: Index Format + Theory missing

I would like to study the in-memory index format without reading through the Rust source first.

I also would like to get a basic grasp as to what theoretical approach to index-based search has been chosen (inverted index? what variant? which optimizations? etc.).

Please document these things if possible - that would be great!

Allow result noun displayed in DOM to be configurable (currently "file")

In the list of search results, the noun is currently hardcoded as "file" and "files":

stork/js/entity.ts

Lines 46 to 53 in db1b958

if (this.totalResultCount === 0) {
  console.log(this.wasmQueue);
  return "No files found.";
} else if (this.totalResultCount === 1) {
  return "1 file found.";
} else {
  return `${this.totalResultCount} files found.`;
}

But I would like to be able to set it to "entry" and "entries" (or maybe "result" and "results"), as I'm embedding Stork as more of a dictionary search than a file search.
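The requested change can be sketched as a small refactor of the quoted logic (the option names singular and plural are hypothetical, not an existing Stork API):

```javascript
// Sketch: make the result noun configurable instead of hardcoding
// "file"/"files". Defaults preserve the current behavior.
function resultCountMessage(totalResultCount, { singular = "file", plural = "files" } = {}) {
  if (totalResultCount === 0) return `No ${plural} found.`;
  if (totalResultCount === 1) return `1 ${singular} found.`;
  return `${totalResultCount} ${plural} found.`;
}
```

For a dictionary-style embed: resultCountMessage(2, { singular: "entry", plural: "entries" }) returns "2 entries found.".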

Self-hosted installation

Hi,

I'd rather host the JavaScript for Stork on my domain; given I'm already on Netlify with its own CDN, opening more HTTP connections seems unnecessary. In addition, having a specific version means I can upgrade on my own schedule (versioned CDN URLs would help with the latter, but I'd still prefer self-hosted).

It sounds from the docs like this is not possible now, but it would be cool if it was.

Thanks!

Fatal error when parsing curly quotes in config.toml

U+201C and U+201D are left and right double quotation marks. Stork 1.0.0 doesn't seem to like them in the config.toml file.

Could not generate index: unexpected character found: `\u{201c}` at line 4 column 17
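Until the parser handles this more gracefully, a workaround is to normalize curly quotes to straight quotes before the file is parsed as TOML. A sketch (the helper name is hypothetical):

```javascript
// Normalize typographic quotation marks to the ASCII quotes TOML expects:
// U+201C/U+201D (double) -> ", and U+2018/U+2019 (single) -> '.
function normalizeQuotes(tomlSource) {
  return tomlSource
    .replace(/[\u201C\u201D]/g, '"')
    .replace(/[\u2018\u2019]/g, "'");
}
```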
