FullProof

A JavaScript-based fulltext search engine library.

Fullproof provides a full stack of components for managing a search engine in JavaScript.

Its main features are:

  • Boolean and Scoring search engines, to be chosen according to the kind of search your application needs
  • Automatic HTML5 storage detection and graceful degradation, with a configurable constraint-based capabilities system. Currently manages WebSQL, IndexedDB and Memory data storage.
  • Full Unicode support and normalization, diacritical mark removal, stemming and phonetic algorithms (currently available for English and French)
  • Configurable and very easily extensible parsing and token normalization system
  • Easy to integrate, zero external dependencies, ~100k minified

Note that fullproof is NOT a document management system. It does only one thing: provide fulltext search to your application. It does not aim to store documents or data.

##Licence

Fullproof is released under the terms of the Apache License, version 2.0, January 2004.

##Useful Links

##Building

The tools directory contains build-all.sh, which creates a convenient fullproof-all.js file containing everything you might need to get going on a Fullproof project. Note that in a production system you may want to include only specific JavaScript files rather than everything (see the examples).

To build fullproof-all.js:

$ cd tools
$ ./build-all.sh

If you have the Google closure compiler (see https://developers.google.com/closure/compiler/) you might prefer to run

$ cd tools
$ CLOSURE_COMPILER_JAR=/path/to/your/compiler.jar ./build-all.sh

All output from the build process will appear in the top-level build directory. In particular, see build/js/fullproof-all.js.

##Contribute!

You can help improve fullproof and fulltext search by creating new algorithms:

  • Tokenizers for specific formats and/or languages (html, pdf, epub, etc., or any language where tokenization has special rules)
  • New normalizers: normalizers drastically improve the quality of the search. The current token normalizers for English (Porter stemmer, Metaphone, etc.) are rather naive and can surely be enhanced. If you are a native speaker of a non-English language, you can also help by providing normalizers adapted to your language.
  • More stores. Think you can optimize the current store implementations? Or create a new store? Go ahead!

You can fork fullproof at https://github.com/reyesr/fullproof

fullproof's People

Contributors

reyesr, terrycojones

fullproof's Issues

search by categories and add a custom score to each result

Hi reyesr,
Thank you for creating such a powerful search engine!
I'm writing a script for a program that uses JavaScript as its scripting language. But I'm new to JavaScript, and I'm having difficulty figuring out a couple of things.
First, I'm using the "score engine", and I know that there is a score property for each search result. Can we add a second score to each item?
For example, if I search for "tom" and select "Tom Cruise" from the results, then "Tom Cruise" gains one point. Each time I select an item from the results, that item gains one point. This score is stored in the database. The next time I search for a string, if one item's score (the score generated by the "score engine") is equal to another's, the item with the higher stored score (read from the database) is displayed ahead of the item with the lower one.
Can I do this using fullproof?
Second, I want to add categories to my data so that I can search by category. For example, if I search for "people:tom", I can use a regex to split it into "people" and "tom": "people" is the category and "tom" is the string I want to search for. The search engine should then only search for results in the "people" category. How can I do this?

Sorry for my poor English. ;-)

Thanks
Zhiqiang li
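
Both parts of this request can be prototyped outside the engine. The following is a minimal sketch, not fullproof's API: `parseCategoryQuery` and `rankWithSelections` are hypothetical helpers, and the shape of the result objects (`value` and `score` properties) and of the stored selection counts is an assumption made for illustration. How a category is mapped onto an index or an engine is left open.

```javascript
// Hypothetical helper: split "people:tom" into a category and a search term.
// A query without a "category:" prefix gets a null category.
function parseCategoryQuery(query) {
    var match = /^([^:\s]+):(.*)$/.exec(query);
    if (!match) {
        return { category: null, term: query.trim() };
    }
    return { category: match[1], term: match[2].trim() };
}

// Hypothetical helper: order results by engine score (highest first) and
// break ties with a stored selection count (highest first).
// ASSUMPTION: `results` is an array of { value, score } objects and
// `selectionCounts` is a plain object mapping value -> times selected.
function rankWithSelections(results, selectionCounts) {
    return results.slice().sort(function (a, b) {
        if (b.score !== a.score) {
            return b.score - a.score;
        }
        return (selectionCounts[b.value] || 0) - (selectionCounts[a.value] || 0);
    });
}
```

The selection counts would be incremented by the application each time the user picks a result and persisted wherever the application keeps its own data.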

Troubles with text splitting

If I try to split text that ends with the separator, the index file is not created.
For example: I have some text and split it on ".". If the text ends with ".", the last element of the resulting array is empty (even if spaces follow the ".", since the Fullproof engine trims them). As a result, the callback passed to engine.open() is never executed.
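
One possible workaround on the caller's side is to drop empty fragments before handing them to the engine. This is a minimal sketch; `splitIntoFragments` is a hypothetical helper, not part of fullproof, and it does not address why the engine's callback is skipped for an empty entry.

```javascript
// Hypothetical helper: split text on a separator and discard fragments that
// are empty once surrounding whitespace is trimmed.
function splitIntoFragments(text, separator) {
    return text.split(separator)
        .map(function (part) { return part.trim(); })
        .filter(function (part) { return part.length > 0; });
}
```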

What are word boundaries symbols?

Is the apostrophe not considered a proper letter? If so, how do I add it to the set of proper letters, so that searching for "it's" no longer returns results for "it"?
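
To illustrate the behaviour being asked for, here is a standalone sketch of a tokenizer that treats the apostrophe as part of a word. It is not fullproof's parser and says nothing about how fullproof's own parser is configured; `tokenizeKeepingApostrophes` is a hypothetical name. The regular expression uses Unicode property escapes, which need a reasonably recent JavaScript engine.

```javascript
// Hypothetical standalone tokenizer: a word is a run of Unicode letters,
// digits, and apostrophes (straight ' or typographic \u2019).
function tokenizeKeepingApostrophes(text) {
    return text.match(/[\p{L}\p{N}'\u2019]+/gu) || [];
}
```

With such a tokenizer, "it's" and "it" become distinct tokens, so a lookup for one would not match the other.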

indexeddb JS runtime error, Visual Studio - Windows Universal App (JS) - indexeddb_store.js

I have been receiving a runtime error in VS when trying to open an indexeddb store.

Seems like this line in indexeddb_store.js was causing it:
this.dbVersion = version || "1.0";

I corrected it to:

this.dbVersion = version || 1.0;

Now everything works fine.

P.S.
Important: The version number is an unsigned long long number, which means that it can be a very big integer. It also means that you can't use a float, otherwise it will be converted to the closest lower integer and the transaction may not start, nor the upgradeneeded event trigger. So for example, don't use 2.4 as a version number:
var request = indexedDB.open("MyTestDatabase", 2.4); // don't do this, as the version will be rounded to 2
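
Following the note above, a store could coerce whatever version it is given into a positive integer before calling indexedDB.open. This is a minimal sketch; `normalizeDbVersion` is a hypothetical helper and is not in indexeddb_store.js.

```javascript
// Hypothetical helper: turn a version such as "1.0", 2.4, or undefined into
// a positive integer suitable for indexedDB.open(name, version).
function normalizeDbVersion(version) {
    var n = Math.floor(Number(version));
    // Fall back to 1 for anything that is not a finite number >= 1.
    return (isFinite(n) && n >= 1) ? n : 1;
}
```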

The function returned by fullproof.make_synchro_point should never call the callback more than once

The function returned by fullproof.make_synchro_point does not check to see if it has already called the callback it is passed. This can result in the callback being called more than once (for example from fullproof.AbstractEngine.prototype.injectDocument). I don't know all the details about exactly when this happens, but I just spent several hours tracking this down (mainly scratching my head over my own code) and when I added a simple guard to make_synchro_point my problem immediately went away.

Pull request coming up.
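
The kind of guard described above can be sketched as a generic wrapper. This is not the code from the pull request, and `callOnce` is a hypothetical name; it only shows the idea of remembering that the callback has already fired.

```javascript
// Hypothetical helper: wrap a callback so that only the first invocation
// reaches it; later invocations are ignored.
function callOnce(callback) {
    var called = false;
    return function () {
        if (called) {
            return undefined;
        }
        called = true;
        return callback.apply(this, arguments);
    };
}
```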

Make IndexedDB work with web workers on Chrome

Web workers do not have access to the window object, so these lines do not work for finding the IndexedDB API

        fullproof.store.indexedDB =  window.indexedDB || window.webkitIndexedDB || window.mozIndexedDB || window.msIndexedDB;
        fullproof.store.IDBTransaction = window.IDBTransaction || window.webkitIDBTransaction || window.mozIDBTransaction || window.msIDBTransaction || {};

Fortunately, on Chrome at least, indexedDB and IDBTransaction are both present in the global scope, so there is a trivial fix (coming up in a pull request).
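
A minimal sketch of such a lookup, assuming the only change needed is to stop hard-coding `window`: in a page, `self` refers to the window, and in a worker it refers to the worker's global scope. `resolveIndexedDB` is a hypothetical helper, and this is not necessarily what the pull request does.

```javascript
// Hypothetical helper: find the IndexedDB entry points on a given global
// scope object, trying the same vendor prefixes as the original lines.
function resolveIndexedDB(scope) {
    return {
        indexedDB: scope.indexedDB || scope.webkitIndexedDB ||
            scope.mozIndexedDB || scope.msIndexedDB,
        IDBTransaction: scope.IDBTransaction || scope.webkitIDBTransaction ||
            scope.mozIDBTransaction || scope.msIDBTransaction || {}
    };
}

// Pick whichever global object exists in the current environment.
var globalScope = (typeof self !== 'undefined') ? self
    : (typeof window !== 'undefined') ? window
    : (typeof globalThis !== 'undefined') ? globalThis
    : {};

var idbApis = resolveIndexedDB(globalScope);
```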

store undefined in scoring-engine.js

Line 94 of scoring-engine.js has this:

callback(new fullproof.ResultSet(store.caps.getComparatorObject()));

But store does not seem to be defined.

how to know when injected documents are searchable?

i am a little confused, so my apologies if this is covered in the docs somewhere. i am inserting text into a fullproof search engine and the callback for injectDocument() is getting called. however when i try to do a lookup() on a word in the document it can take several minutes before it actually comes back with a match. (until then, the result set is false.) my questions are:

  1. is this long wait while it is actually building the index?
  2. if so, is there a way to speed it up (the text is only a few hundred words long)
  3. is there a callback i can set somewhere for when the entire index is ready to be searched?

here is what my code looks like:

searchEngine.injectDocument(text, node.id, function() { _doneIndexing(node); });

and the results, from console (each attempt was done about 15 sec apart):

booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
fullproof.ResultSet {comparatorObject: Object, data: Array[1], last_insert: "7", insert: function, merge: function…}
 popup.js:309

(the function _doneIndexing() is called almost immediately after the inject happens, for what it is worth.) i wonder if it has to do with me doing multiple iterations of injecting documents? i step through dozens of nodes in a loop and inject the text content from each via the above method. i am trying to discern why i am getting these long delays and some strange lookup results.

thanks!
-jon
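
When many documents are injected in a loop, one way to get a single "everything has been handed to the index" notification is to count the injectDocument callbacks. This is a minimal sketch under stated assumptions: `injectAll` is a hypothetical helper, `engine` is anything exposing the `injectDocument(text, value, callback)` signature used above, and each callback is assumed to fire exactly once after its document has been written, which the behaviour reported in this issue puts in doubt.

```javascript
// Hypothetical helper: inject every document and call `done` once all
// injectDocument callbacks have fired.
// ASSUMPTION: `documents` is an array of { id, text } objects.
function injectAll(engine, documents, done) {
    var remaining = documents.length;
    if (remaining === 0) {
        done();
        return;
    }
    documents.forEach(function (doc) {
        engine.injectDocument(doc.text, doc.id, function () {
            remaining -= 1;
            if (remaining === 0) {
                done();
            }
        });
    });
}
```

If lookups still come back empty after `done` has run, the delay lies in the engine or the store rather than in the calling loop.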

Can I use fullproof as a database for my application?

I'm writing an offline HTML5 app. People go to my website, and it loads an interface to manage their own personal data. No other server requests are made other than the initial loading of the page.

This data can be quite large, as the data might grow without my control. However I'd like my users to be able to safely backup their data, say to one or several JSON files to their local computer. So they can decide to maybe safely store their data somewhere else. Is this possible through fullproof? Also I'd want to re-import this data, but this is possible thanks to the examples shown on the homepage.

Any insight if this might work for my scenario? Also, last question, can I search for specific field-names? For example in Lucene you can do name:luca and it will only search for stuff in the field name.

Thanks and great job!

how to remove an item from an index?

how does one remove an item from an index? basically the opposite of injecting, i guess. when i delete a document that the index references, i wish to remove it from the index as well.

thanks,
-jon

Add browser support to readme.

It would be nice if the readme had a list of supported browsers, or even required technologies, so that users don't have to do a whole load of testing / research to know whether they can use this very nice library. 👍

Index updating

Is it possible to update the indexes (i.e. inject more items) after initialization? How would one do that?

Cannot call method 'inject' of undefined

It looks like in this code (common-engine.js line 227), index can be undefined in the loop:

fullproof.AbstractEngine.prototype.injectDocument = function(text, value, callback) {
    var synchro = fullproof.make_synchro_point(function(data) {
        callback();
    });

    this.forEach(function(name, index, parser) {
        if (name) {
            parser.parse(text, function(word) {
                if (word) {
                    index.inject(word, value, synchro); // the line number is the value stored
                } else {
                    synchro(false);
                }
            })
        }
    }, false);
    return this;
};

I'm calling injectDocument on the result of engine.open and seeing a stack trace that looks like this:

Error in event handler for 'undefined': Cannot call method 'inject' of undefined TypeError: Cannot call method 'inject' of undefined
at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:444
    at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2414:306
    at f (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2413:158)
    at parse (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2413:308)
    at parse (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2414:79)
    at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:420
    at fullproof.AbstractEngine.forEach (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2426:189)
    at fullproof.AbstractEngine.injectDocument (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:391)
    at Object.Liber.Fullproof.add (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/background.js:822:18)
    at Object.Liber.Index.add (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/background.js:874:34)

Use with Node.js

First, let me say Fullproof looks impressive. It is terrific to see that the code is heavily commented and that you've written a tutorial, etc. If only all devs did this.

I'm wondering whether you've given any thought to using Fullproof on the server with Node.js and say MongoDB. MongoDB doesn't have full text search.

I couldn't agree more with your comment on Browser based databases, it is truly a mess. I'm using WebSQL in one app and am looking at IndexedDB/WebSQL shims and a newish library that lets you write database access code that works on IndexedDB, WebSQL and LocalStorage. Let me know if you want links.

Finally, any plans to provide builds of the minified and full code in a single JS file?

-Neville

Tutorial example hits an uncaught TypeError

This code in boolean-engine.js:

for (var i = 0; i < array_of_words.length; ++i) {
    unit.index.lookup(array_of_words[i], lookup_synchro);
}

causes a "Cannot call lookup method of undefined" error when browsing examples/tutorial.html.

Add trim() function for old IE

Most of the examples fall over in old IE because the .trim() function doesn't exist.

It may be worth including the following fix within the library.

if (typeof String.prototype.trim !== 'function') {
    // Fallback for browsers without a native String.prototype.trim
    String.prototype.trim = function() {
        return this.replace(/^\s+|\s+$/g, '');
    };
}

Couple questions about building indexes

Hi,

fullproof looks like something that I can use, but one disadvantage is the initial building of indexes. So, a couple of questions/ideas about that:

Would it be possible to build the indexes on the server and send them (as JSON, maybe?) to the client together with the actual data?

Can indexes be built by web workers in parallel?

callback not called in indexeddb_store.js

From line 389 in indexeddb_store.js:

        openRequest.onupgradeneeded = function(ev) {
            createStores(ev.target.result, reqIndexArray, self.metaStoreName);
            updated = true;
        };

If the above case happens, neither the callback nor the errback passed to open will be called.

I've not experienced this, but from just looking at the code it looks like it might be a bug. (If it's not, maybe a comment could explain why the callback/errback is not called.) What does update affect?
