fastindex's Introduction

Motivation

This is an experiment to improve the indexing speed of HTML pages retrieved via a Chrome plugin. It will be used to test assumptions about performance enhancements in the Worldbrain plugin.

Intended audience

People working on the Worldbrain plugin.

Approach

We currently use the innerText property to get the concatenated content of the retrieved DOM as a string for further processing.

The string is tokenized with the lunr library, which offers a way to create a streaming pipeline with an embedded, customizable tokenizer (as the first part of a natural language processing library).

The dexie library is then used as a wrapper around IndexedDB. We use a multi-entry index to get a fast insert and retrieval path for text tokens (words).
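To make the pipeline idea concrete, here is a minimal sketch in plain JavaScript. It does not use lunr and does not reproduce lunr's API; every name in it (`tokenize`, `trimPunctuation`, `dropStopWords`, `runPipeline`, `extractTokens`) and the stop-word list are hypothetical, chosen only to show how a tokenizer can feed a chain of per-token stages.

```javascript
// Hypothetical stand-in for a lunr-style pipeline; none of these names come from lunr.

// Split a string into lowercase word tokens on whitespace and hyphens.
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[\s-]+/)
    .filter((token) => token.length > 0);
}

// Strip leading and trailing non-word characters, e.g. "wolf," becomes "wolf".
function trimPunctuation(token) {
  return token.replace(/^\W+/, "").replace(/\W+$/, "");
}

// Drop very common words; this list is illustrative only.
const STOP_WORDS = new Set(["a", "an", "and", "of", "the", "to"]);
function dropStopWords(token) {
  return STOP_WORDS.has(token) ? "" : token;
}

// Run all tokens through each stage in order.
// A stage removes a token by returning the empty string.
function runPipeline(tokens, stages) {
  return stages.reduce(
    (current, stage) => current.map(stage).filter((token) => token !== ""),
    tokens
  );
}

// Tokenize a text and pass the tokens through the two stages above.
function extractTokens(text) {
  return runPipeline(tokenize(text), [trimPunctuation, dropStopWords]);
}
```

Calling `extractTokens(document.body.innerText)` in a page context would yield the kind of token array that is then stored in the index. Note that `\W` is ASCII-based, so this sketch would also strip accented letters at the edges of a token.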
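In a dexie schema, a multi-entry index is declared by prefixing the field with an asterisk (for example `*tokens`), which makes IndexedDB create one index entry per element of the array stored in that field. The sketch below models that behaviour in memory so the lookup path is easy to see. It is an illustration only, not how dexie or IndexedDB are implemented; the class `MultiEntryIndex`, its methods, and the `url` field are hypothetical.

```javascript
// In-memory model of a multi-entry index: each token of a record gets its own
// index entry pointing back at the record's primary key.
// Illustration only; this is not how IndexedDB or dexie store data.
class MultiEntryIndex {
  constructor() {
    this.records = new Map(); // primary key -> record
    this.entries = new Map(); // token -> Set of primary keys
    this.nextId = 1;
  }

  // Store a record and add one index entry per distinct token.
  add(record) {
    const id = this.nextId++;
    this.records.set(id, record);
    for (const token of new Set(record.tokens)) {
      if (!this.entries.has(token)) this.entries.set(token, new Set());
      this.entries.get(token).add(id);
    }
    return id;
  }

  // Return every record whose token array contains the given token.
  whereTokenEquals(token) {
    const ids = this.entries.get(token) || new Set();
    return [...ids].map((id) => this.records.get(id));
  }
}

// Example data; the URLs and tokens are made up.
const index = new MultiEntryIndex();
index.add({ url: "https://example.org/a", tokens: ["wolf", "forest"] });
index.add({ url: "https://example.org/b", tokens: ["forest", "river"] });
```

Looking up a token is a single map access followed by fetching the matching records, which is why this layout gives a short retrieval path for single words.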

To run the experiment, build and install the plugin as explained in the documentation of the Worldbrain plugin, open the console, then:

  1. run setup

await window.wasabi.fns.setup()

  2. run await window.wasabi.fns.indexNow() once to fill the cache (not yet verified that the cache is actually filled)

  3. open the background page from the extension page

    • run setup again

    await window.wasabi.fns.setup()

    • open the development tools
    • open the performance tab
    • start recording
    • start indexing again

    await window.wasabi.fns.indexNow()

    • stop the recording
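Besides the performance tab recording, a rough wall-clock number can be taken directly in the console. This is a minimal sketch, assuming that indexNow returns a promise (the awaits above suggest it does); the helper `timeAsync` is hypothetical and not part of the plugin.

```javascript
// Hypothetical helper: time one run of an async function with performance.now().
async function timeAsync(label, fn) {
  const start = performance.now();
  const result = await fn();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ${elapsedMs.toFixed(1)} ms`);
  return { result, elapsedMs };
}

// In the background page console (assumes indexNow returns a promise):
// await timeAsync("indexNow", () => window.wasabi.fns.indexNow());
```

A single run is noisy, so this complements the recorded profile rather than replacing it.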

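To test whether a term (word) is in the IndexedDB, query the tokens index. For example, to search for the term "wolf":

await window.wasabi.db.notes.where('tokens').equals('wolf').toArray(val => val)

The call resolves to an array of matching records; an empty array means the term was not found.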

Code

The only really relevant file is src/background.js. The rest are helper files to create the Chrome plugin.

Comments

  • As this is a hack experiment, I just copied the list of dependencies from the Worldbrain plugin. It has a LOT of unnecessary dependencies.
  • It currently has a hardcoded list of URLs.
  • It does not yet cover the current needs of the Worldbrain project.
  • The start of this work was inspired by a gist. Thanks to Nolan Lawson (@nolanlawson) for the inspiration.

To do

  • Write tests

Known bugs

  1. Some words, such as "president" (from the Wikipedia USA page), are currently not found.

fastindex's People

Contributors

poltak, bluesun, shishkabab
