reyesr / fullproof



javascript fulltext search engine library

Home Page: http://reyesr.github.com/fullproof/

License: Apache License 2.0

JavaScript 100.00%

fullproof's Introduction

FullProof

A javascript-based fulltext search engine library.

Fullproof provides a full stack of components for managing a search engine in JavaScript.

Its main features are:

Boolean and scoring search engines available, depending on the kind of search your application needs
Automatic HTML5 storage detection and graceful degradation, with a configurable constraint-based capabilities system. Currently supports WebSQL, IndexedDB and in-memory data storage.
Full Unicode support and normalization, diacritical mark removal, stemming and phonetic algorithms (currently available for English and French)
Configurable and easily extensible parsing and token normalization system
Easy to integrate, zero external dependencies, ~100k minified

Note that fullproof is NOT a document management system. It does only one thing: provide fulltext search to your application. It does not aim to store documents or data.

## License

Fullproof is released under the terms of the Apache License, version 2.0, January 2004.

## Useful Links

The main web site is located at http://reyesr.github.com/fullproof/
The source code is hosted on GitHub: https://github.com/reyesr/fullproof
Information can be found in the wiki: https://github.com/reyesr/fullproof/wiki
Bug reports and feature requests can be filed at: https://github.com/reyesr/fullproof/issues

## Building

The tools directory contains build-all.sh, which can be used to create a convenient fullproof-all.js file containing everything you might need to get going on a Fullproof project. Note that in a production system you may want to include only specific JavaScript files, not everything (see the examples).

To build fullproof-all.js:

$ cd tools
$ ./build-all.sh

If you have the Google Closure Compiler (see https://developers.google.com/closure/compiler/), you might prefer to run:

$ cd tools
$ CLOSURE_COMPILER_JAR=/path/to/your/compiler.jar ./build-all.sh

All output from the build process will appear in the top-level build directory. In particular, see build/js/fullproof-all.js.

## Contribute!

You can help improve fullproof and fulltext search by creating new algorithms:

Tokenizers for specific formats and/or languages (HTML, PDF, EPUB, etc., or any language where tokenization has special rules)
New normalizers: normalizers can drastically improve the quality of the search. The current token normalizers for English (Porter stemmer, Metaphone, etc.) are rather naive and can surely be improved. If you are a native speaker of a non-English language, you can also help by providing normalizers adapted to your language.
More stores: think you can optimize the current store implementations, or create a new store? Go ahead!

You can fork fullproof at https://github.com/reyesr/fullproof


fullproof's Issues

search by categories and add a custom score to each result

Hi reyesr,
Thank you for creating such a powerful search engine!
I'm writing a script for a program that uses JavaScript as its scripting language. But I'm new to JavaScript, and I'm having some difficulty figuring out a couple of things.
First, I'm using the "score engine", I know that we have a score property for each search result. Can we add a second score to each item?
For example, if I search for "tom" and select "Tom Cruise" from the results, then "Tom Cruise" gains one point. Each time I select an item from the results, that item gains one point, and the score is stored in the database. The next time I search, if two items have the same score from the "score engine", the item with the higher stored score (read from the database) should be displayed before the item with the lower one.
Can I do this using fullproof?
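A sketch of the tie-breaking idea, assuming the application keeps its own selection counts outside fullproof (the README states fullproof does not store application data) and has copied the engine's results into plain `{ value, score }` objects. How to extract those from a fullproof `ResultSet` is not shown here; `recordSelection` and `rankWithPopularity` are hypothetical names.

```javascript
// Hypothetical helpers, not part of fullproof. `results` is assumed to be an
// array of { value, score } objects taken from the scoring engine, and
// `clicks` an application-owned object mapping a value to its selection count.

function recordSelection(clicks, value) {
  // Each selection adds one point to the item's popularity counter.
  clicks[value] = (clicks[value] || 0) + 1;
  return clicks;
}

function rankWithPopularity(results, clicks) {
  // Copy before sorting so the caller's array is left untouched.
  return results.slice().sort(function (a, b) {
    if (b.score !== a.score) {
      return b.score - a.score; // primary key: engine score, highest first
    }
    // Tie-breaker: the item selected more often comes first.
    return (clicks[b.value] || 0) - (clicks[a.value] || 0);
  });
}

const clicks = {};
recordSelection(clicks, "Tom Cruise");
const ranked = rankWithPopularity(
  [
    { value: "Tom Hanks", score: 0.8 },
    { value: "Tom Cruise", score: 0.8 },
    { value: "Tomato", score: 0.5 },
  ],
  clicks
);
console.log(ranked.map(function (r) { return r.value; }));
// → [ 'Tom Cruise', 'Tom Hanks', 'Tomato' ]
```

Persisting `clicks` (for example as JSON) is left to the application; the engine score stays the primary key, so popularity only reorders exact ties.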
Second, I want to add some categories to my data, so that I can search by category. For example, if I search for "people:tom", I can use a regex to split it into "people" and "tom": "people" is the category, and "tom" is the string I want to search for. The search engine would then only search for results in the "people" category. How can I do this?
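One possible approach, sketched under the assumption that the application keeps a table mapping each indexed value to its category: parse the `category:text` prefix with a regex, run the normal search on the text part, then filter the matched values by category. `parseQuery`, `filterByCategory` and `categoryOf` are hypothetical names, not fullproof APIs.

```javascript
// Hypothetical helpers, not part of fullproof. The search itself is left to the
// engine; this only parses the "category:text" syntax and filters the matched
// values afterwards using a category lookup table kept by the application.

function parseQuery(query) {
  // "people:tom" -> { category: "people", text: "tom" }; no prefix -> category null
  const match = /^\s*([A-Za-z_]\w*):(.*)$/.exec(query);
  if (!match) {
    return { category: null, text: query.trim() };
  }
  return { category: match[1].toLowerCase(), text: match[2].trim() };
}

function filterByCategory(values, categoryOf, category) {
  // Keep every value when no category was given.
  if (category === null) {
    return values.slice();
  }
  return values.filter(function (v) { return categoryOf[v] === category; });
}

const q = parseQuery("people:tom");
const categoryOf = { "Tom Cruise": "people", "Tom and Jerry": "movies" };
// q.text ("tom") is what would be passed to the engine's lookup.
console.log(q, filterByCategory(["Tom Cruise", "Tom and Jerry"], categoryOf, q.category));
```

Filtering after the search leaves the index untouched; with many large categories, keeping a separate index per category may be cheaper.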

Sorry for my poor English. ;-)

Thanks
Zhiqiang li

Does not seem to work in Internet Explorer 10

The search field and button do not display. This was a test using the animal example.

closure compiler

Any plans to compile with the Closure Compiler?

I am interested in using it together with my IndexedDB library https://bitbucket.org/ytkyaw/ydn-db

Trouble with text splitting

If I split text that ends with the delimiter, the index is not created.
For example, I have some text and I split it on ".", so if the text ends with ".", the last element of the array is empty (even if there are spaces after it, they are trimmed by the Fullproof engine). As a result, the callback in engine.open() never executes.
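The empty trailing element is standard `String.prototype.split` behavior rather than something specific to fullproof. A minimal workaround sketch, assuming the pieces are injected one by one: trim each piece and drop the empty ones before they reach the engine. `splitSentences` is a hypothetical helper name.

```javascript
// Hypothetical helper: split on a delimiter, trim each piece, and discard
// empty pieces so nothing empty is ever handed to the engine.
function splitSentences(text, delimiter) {
  return text
    .split(delimiter)
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

console.log("One. Two.".split("."));            // [ 'One', ' Two', '' ]
console.log(splitSentences("One. Two.", "."));  // [ 'One', 'Two' ]
```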

What are the word boundary symbols?

Is the apostrophe not considered a proper letter? If so, how do I add it to the set of proper letters, so that searching for "it's" stops returning results for "it"?
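A standalone sketch of a tokenizer that keeps apostrophes inside words; it is not fullproof's own tokenizer, and how to plug such a function into fullproof's parser is not shown here. It assumes Unicode property escapes (`\p{L}`, `\p{N}`), which require the `u` regex flag. Note that later normalizers could still strip the apostrophe.

```javascript
// Hypothetical tokenizer: a word is a run of letters/digits, optionally joined
// by single apostrophes (ASCII ' or typographic ’), so "it's" stays one token.
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+(?:['’][\p{L}\p{N}]+)*/gu) || [];
}

console.log(tokenize("It's what it is.")); // returns ["it's", "what", "it", "is"]
```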

Invalid chars

Hello.

There are invalid chars in the source code.
Check the pich char at line 17
https://github.com/reyesr/fullproof/blob/master/src/misc/dataloader.js#L17
and line 30
https://github.com/reyesr/fullproof/blob/master/src/misc/dataloader.js#L30

Thanks.

indexeddb JS runtime error, Visual Studio - Windows Universal App (JS) - indexeddb_store.js

I have been receiving a runtime error in VS when trying to open an indexeddb store.

Seems like this line in indexeddb_store.js was causing it:
Original: this.dbVersion = version || "1.0";
Corrected: this.dbVersion = version || 1.0;

and now everything works fine.

P.S.
Important: The version number is an unsigned long long number, which means that it can be a very big integer. It also means that you can't use a float, otherwise it will be converted to the closest lower integer and the transaction may not start, nor the upgradeneeded event trigger. So for example, don't use 2.4 as a version number:
var request = indexedDB.open("MyTestDatabase", 2.4); // don't do this, as the version will be rounded to 2
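A small guard that makes the point above explicit: convert the configured version to a positive integer up front, and fail loudly on strings that are not integers or on floats such as 2.4, instead of letting them be truncated. `toIndexedDbVersion` is a hypothetical helper, not part of fullproof.

```javascript
// Hypothetical helper (not in fullproof): turn the configured version into a
// valid IndexedDB version, i.e. a positive integer.
function toIndexedDbVersion(version) {
  if (version === undefined || version === null) {
    return 1; // default when no version is configured
  }
  const n = Number(version); // "1.0" -> 1, "abc" -> NaN
  if (!Number.isSafeInteger(n) || n < 1) {
    throw new RangeError("IndexedDB version must be a positive integer, got " + version);
  }
  return n;
}

// In a browser: var request = indexedDB.open("MyTestDatabase", toIndexedDbVersion("1.0"));
```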

The function returned by fullproof.make_synchro_point should never call the callback more than once

The function returned by fullproof.make_synchro_point does not check whether it has already called the callback it was given. This can result in the callback being called more than once (for example from fullproof.AbstractEngine.prototype.injectDocument). I don't know all the details of exactly when this happens, but I just spent several hours tracking this down (mainly scratching my head over my own code), and when I added a simple guard to make_synchro_point, my problem immediately went away.

Pull request coming up.
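A sketch of the kind of guard described, written as a standalone function rather than a copy of fullproof's `make_synchro_point` (whose exact signature is not shown here). `makeOnceSynchroPoint` and its `expected` parameter are hypothetical: it collects that many results, fires the callback exactly once, and ignores late calls.

```javascript
// Hypothetical synchronization point: collect `expected` results, then invoke
// `callback` exactly once with all of them. Later calls are ignored.
function makeOnceSynchroPoint(callback, expected) {
  const results = [];
  let fired = false;
  return function (result) {
    if (fired) {
      return; // guard: the callback must never run twice
    }
    results.push(result);
    if (results.length >= expected) {
      fired = true;
      callback(results);
    }
  };
}

let calls = 0;
let received = null;
const point = makeOnceSynchroPoint(function (all) { calls += 1; received = all; }, 2);
point("a");
point("b"); // callback fires here with ["a", "b"]
point("c"); // ignored
console.log(calls); // 1
```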

tools/build-all.sh doesn't run on Mac OS X

There are several minor issues that stop the build-all.sh script from running cleanly.

Make IndexedDB work with web workers on Chrome

Web workers do not have access to the window object, so these lines do not work for finding the IndexedDB API:

        fullproof.store.indexedDB =  window.indexedDB || window.webkitIndexedDB || window.mozIndexedDB || window.msIndexedDB;
        fullproof.store.IDBTransaction = window.IDBTransaction || window.webkitIDBTransaction || window.mozIDBTransaction || window.msIDBTransaction || {};

Fortunately, on Chrome at least, indexedDB and IDBTransaction are both present in the global scope, so there is a trivial fix (coming up in a pull request).
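A sketch of resolving the API from whichever global object is available, assuming only that the code runs either in a page (`window`) or in a worker (`self`); `globalThis` refers to either in current browsers. The vendor prefixes are the ones checked by the original lines. `findIndexedDB` is a hypothetical name, and the optional `scope` argument exists only to make it testable.

```javascript
// Hypothetical helper: look up IndexedDB on the current global scope instead
// of `window`, so the same code works in pages and in web workers.
function findIndexedDB(scope) {
  const g = scope || (typeof globalThis !== "undefined" ? globalThis : self);
  return {
    indexedDB: g.indexedDB || g.webkitIndexedDB || g.mozIndexedDB || g.msIndexedDB,
    IDBTransaction: g.IDBTransaction || g.webkitIDBTransaction ||
      g.mozIDBTransaction || g.msIDBTransaction || {}
  };
}

// Usage (browser or worker):
// var api = findIndexedDB();
// fullproof.store.indexedDB = api.indexedDB;
// fullproof.store.IDBTransaction = api.IDBTransaction;
```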

Injecting the word "constructor" into the index

If the injected string contains the word "constructor", the engine fails during injection.
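A likely cause, assuming the index keeps words as keys of a plain object (the fullproof code involved is not shown here): plain objects inherit properties from `Object.prototype`, so a lookup for "constructor" finds a function before anything was stored. The sketch shows the pitfall and two common fixes, a prototype-less object and a `Map`. `injectWord` is a hypothetical name.

```javascript
// Pitfall: "constructor" is inherited from Object.prototype.
const plain = {};
console.log(typeof plain["constructor"]); // → function (not a stored entry)

// Fix 1: an object with no prototype has no inherited keys.
const bare = Object.create(null);
console.log(bare["constructor"]); // → undefined

// Fix 2: a Map only contains what was explicitly set.
const index = new Map();
function injectWord(word, value) {
  if (!index.has(word)) {
    index.set(word, []);
  }
  index.get(word).push(value);
}
injectWord("constructor", 7);
console.log(index.get("constructor")); // → [ 7 ]
```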

store undefined in scoring-engine.js

Line 94 of scoring-engine.js has this:

callback(new fullproof.ResultSet(store.caps.getComparatorObject()));

But store does not seem to be defined.

how to use regex, wildcards or logical operators?

I'd like to search for "cat OR dog". What is the best way to do that?
If that's not possible, how can I search for more than one search string in a single request?
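One way to emulate OR, sketched under assumptions: run one lookup per term and take the union of the matched values, calling `done` once after every term has answered. `lookup(term, callback)` is a hypothetical stand-in for whatever asynchronous per-term search you call; it is assumed to pass an array of matching values, or `false` when nothing matches.

```javascript
// Hypothetical helper: "cat OR dog" = union of the results for each term.
function searchAny(terms, lookup, done) {
  const matches = new Set(); // Set removes duplicates, keeps first-seen order
  let pending = terms.length;
  if (pending === 0) {
    done([]);
    return;
  }
  terms.forEach(function (term) {
    lookup(term, function (values) {
      (values || []).forEach(function (v) { matches.add(v); });
      pending -= 1;
      if (pending === 0) {
        done(Array.from(matches)); // called once, after every term has answered
      }
    });
  });
}

const fakeIndex = { cat: [1, 2], dog: [2, 3] };
searchAny("cat OR dog".split(/\s+OR\s+/),
  function (term, cb) { cb(fakeIndex[term]); },
  function (result) { console.log(result); }); // → [ 1, 2, 3 ]
```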

Step 1 of the tutorial can't be completed: no fullproof-all.js exists.

The first step of the tutorial makes users fall at the first hurdle: the file referenced by
<script type="text/javascript" src="fullproof-all.js"></script> does not exist.

how to know when injected documents are searchable?

I am a little confused, so my apologies if this is covered in the docs somewhere. I am inserting text into a fullproof search engine, and the callback for injectDocument() is getting called. However, when I try to do a lookup() on a word in the document, it can take several minutes before it actually comes back with a match (until then, the result set is false). My questions are:

Is this long wait happening because it is actually building the index?
If so, is there a way to speed it up? (The text is only a few hundred words long.)
Is there a callback I can set somewhere for when the entire index is ready to be searched?

here is what my code looks like:

searchEngine.injectDocument(text, node.id, function() { _doneIndexing(node); });

and the results, from console (each attempt was done about 15 sec apart):

booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
nope! popup.js:312
booker.search('bluff');
undefined
fullproof.ResultSet {comparatorObject: Object, data: Array[1], last_insert: "7", insert: function, merge: function…}
 popup.js:309

(The function _doneIndexing() is called almost immediately after the inject happens, for what it is worth.) I wonder if it has to do with me injecting documents over multiple iterations? I step through dozens of nodes in a loop and inject the text content from each via the above method. I am trying to work out why I am getting these long delays and some strange lookup results.

thanks!
-jon
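A sketch of injecting documents one after another with a single "all done" callback, assuming `engine.injectDocument(text, value, callback)` behaves as in the snippet above (callback invoked once the document is injected). Waiting for each callback before starting the next injection gives one place to know every document has been handed to the engine; it will not help if the callback itself fires too early. `injectAll` and the `{ text, id }` document shape are hypothetical.

```javascript
// Hypothetical helper: inject documents sequentially, then call `done` once.
function injectAll(engine, docs, done) {
  let i = 0;
  function next() {
    if (i >= docs.length) {
      done(); // every injectDocument callback has fired
      return;
    }
    const doc = docs[i++];
    engine.injectDocument(doc.text, doc.id, next);
  }
  next();
}

// Usage (names from the snippet above, shapes assumed):
// injectAll(searchEngine, nodes.map(function (n) { return { text: n.text, id: n.id }; }),
//   function () { console.log("all documents injected"); });
```

Because the next injection starts inside the previous callback, at most one injection is in flight at a time, which also makes lookups after `done` easier to reason about.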

Can I use fullproof as a database for my application?

I'm writing an offline HTML5 app. People go to my website, and it loads an interface to manage their own personal data. No other server requests are made other than the initial loading of the page.

This data can be quite large, since it may grow beyond my control. However, I'd like my users to be able to safely back up their data, say to one or several JSON files on their local computer, so they can decide to store their data safely somewhere else. Is this possible through fullproof? I'd also want to re-import this data, but the examples shown on the homepage suggest this is possible.

Any insight into whether this might work for my scenario? One last question: can I search on specific field names? For example, in Lucene you can write name:luca and it will only search the name field.

Thanks and great job!

how to remove an item from an index?

How does one remove an item from an index? Basically the opposite of injecting, I guess. When I delete a document that the index references, I want to remove it from the index as well.

thanks,
-jon
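To illustrate what removal means for an inverted index, here is a sketch over a hypothetical in-memory structure (each word maps to the set of document values containing it). This is not fullproof's store format, and no fullproof removal API is assumed; `createIndex`, `inject` and `removeValue` are illustrative names.

```javascript
// Hypothetical in-memory inverted index: word -> Set of document values.
function createIndex() {
  return new Map();
}

function inject(index, word, value) {
  if (!index.has(word)) index.set(word, new Set());
  index.get(word).add(value);
}

function removeValue(index, value) {
  // Removing a document = deleting its value from every posting set, then
  // dropping words that no longer point to any document.
  // (Deleting Map entries while iterating with for...of is safe in JS.)
  for (const [word, values] of index) {
    values.delete(value);
    if (values.size === 0) index.delete(word);
  }
}

const idx = createIndex();
inject(idx, "bluff", "7");
inject(idx, "bluff", "8");
inject(idx, "river", "7");
removeValue(idx, "7");
console.log(Array.from(idx.keys()), Array.from(idx.get("bluff"))); // → [ 'bluff' ] [ '8' ]
```

Scanning every word is simple but linear in the index size; keeping a reverse map (value to words) would make removal proportional to the document's own word count.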

Add browser support to readme.

It would be nice if the readme had a list of supported browsers, or even of the required technologies, so that users don't have to do a whole load of testing and research to know whether they can use this very nice library. 👍

Index updating

Is it possible to update the indexes (i.e., inject more items) after initialization? How would one do that?

Cannot call method 'inject' of undefined

It looks like in this code (common-engine.js line 227), index can be undefined in the loop:

fullproof.AbstractEngine.prototype.injectDocument = function(text, value, callback) {
    var synchro = fullproof.make_synchro_point(function(data) {
        callback();
    });

    this.forEach(function(name, index, parser) {
        if (name) {
            parser.parse(text, function(word) {
                if (word) {
                    index.inject(word, value, synchro); // the line number is the value stored
                } else {
                    synchro(false);
                }
            })
        }
    }, false);
    return this;
};

I'm calling injectDocument on the result of engine.open and seeing a stack trace that looks like this:

Error in event handler for 'undefined': Cannot call method 'inject' of undefined TypeError: Cannot call method 'inject' of undefined
    at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:444
    at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2414:306
    at f (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2413:158)
    at parse (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2413:308)
    at parse (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2414:79)
    at chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:420
    at fullproof.AbstractEngine.forEach (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2426:189)
    at fullproof.AbstractEngine.injectDocument (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/vendor.js:2424:391)
    at Object.Liber.Fullproof.add (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/background.js:822:18)
    at Object.Liber.Index.add (chrome-extension://gbplakaibfhhognfeabdecdmbnildeil/background.js:874:34)

Works in nodejs?

Use with Node.js

First, let me say Fullproof looks impressive. It is terrific to see that the code is heavily commented and that you've written a tutorial etc. If only all devs did this.

I'm wondering whether you've given any thought to using Fullproof on the server with Node.js and say MongoDB. MongoDB doesn't have full text search.

I couldn't agree more with your comment on browser-based databases; it is truly a mess. I'm using WebSQL in one app and am looking at IndexedDB/WebSQL shims and a newish library that lets you write database access code that works on IndexedDB, WebSQL and LocalStorage. Let me know if you want links.

Finally, any plans to provide builds of the minified and full code as a single JS file?

-Neville

Tutorial example hits an uncaught TypeError

This code in boolean-engine.js:

for (var i = 0; i < array_of_words.length; ++i) {
    unit.index.lookup(array_of_words[i], lookup_synchro);
}

causes a "Cannot call lookup method of undefined" error when browsing examples/tutorial.html.

Results vary from browser to browser

IE results differ from Chrome's. Why is this?

Add trim() function for old IE

Most of the examples fall over in old IE because the .trim() function doesn't exist.

It may be worth including the following fix within the library.

if (typeof String.prototype.trim !== 'function') {
    // Polyfill: strip leading and trailing whitespace
    String.prototype.trim = function() {
        return this.replace(/^\s+|\s+$/g, '');
    };
}

IE10 throws InvalidAccessError

IE10 throws InvalidAccessError in indexeddb_store.js line 368

Couple questions about building indexes

Hi,

fullproof looks like something I can use, but one disadvantage is the initial building of the indexes. So, a couple of questions/ideas about that:

Would it be possible to build indexes on server and send it (as json maybe?) to client together with actual data?

Can indexes be built by web workers in parallel?
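To illustrate the server-side idea, a sketch that builds a simple inverted index and round-trips it through JSON. The index shape (word to set of document ids) and the naive lowercase/non-word-character tokenization are assumptions for illustration only; the README does not state whether fullproof can load a prebuilt index into its stores. `buildIndex`, `serializeIndex` and `deserializeIndex` are hypothetical names.

```javascript
// Hypothetical server-side build: word -> Set of document ids.
function buildIndex(docs) {
  const index = new Map();
  docs.forEach(function (doc) {
    doc.text.toLowerCase().split(/\W+/).filter(Boolean).forEach(function (word) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(doc.id);
    });
  });
  return index;
}

// Serialize as [word, [ids...]] pairs rather than an object keyed by word,
// which avoids clashes with inherited names such as "constructor".
function serializeIndex(index) {
  return JSON.stringify(Array.from(index, function (entry) {
    return [entry[0], Array.from(entry[1])];
  }));
}

// Client side: rebuild the Map of Sets from the JSON payload.
function deserializeIndex(json) {
  return new Map(JSON.parse(json).map(function (pair) {
    return [pair[0], new Set(pair[1])];
  }));
}

const payload = serializeIndex(buildIndex([{ id: 1, text: "Red fox" }, { id: 2, text: "red hen" }]));
console.log(payload); // → [["red",[1,2]],["fox",[1]],["hen",[2]]]
const restored = deserializeIndex(payload);
```

The same `buildIndex` could also run inside a web worker, with only the JSON payload posted back to the page.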

callback not called in indexeddb_store.js

From line 389 in indexeddb_store.js:

        openRequest.onupgradeneeded = function(ev) {
            createStores(ev.target.result, reqIndexArray, self.metaStoreName);
            updated = true;
        };

If the above case happens, neither the callback nor the errback passed to open will be called.

I've not experienced this, but from just looking at the code it looks like it might be a bug. (If it's not, maybe a comment could explain why the callback/errback is not called.) What does `updated` affect?
