Comments (3)

GoogleCodeExporter commented on August 19, 2024
The approach you took for case-insensitive queries on an indexed string is 
actually the way to get the best performance.

There are actually two ways to do it. The alternative uses less memory but 
isn't quite as fast in all cases.

The first is the one you described: add an extra field to the POJO to store 
a lowercase version of the string, and then define an attribute on that 
field:

    public static final Attribute<Car, String> NAME = new SimpleAttribute<Car, String>("name") {
        public String getValue(Car car) { return car.nameInLowercase; }
    };

Alternatively, you could define the attribute as a function on the mixed-case 
string:

    public static final Attribute<Car, String> NAME = new SimpleAttribute<Car, String>("name") {
        public String getValue(Car car) { return car.name.toLowerCase(); }
    };

There are slight differences in performance between the two. The first one will 
be fastest in all cases, but will use more memory (it stores two versions of 
the string). The second one will be fast if you build an index on the 
attribute AND the index gets used to answer your queries. If the index doesn't 
get used for some query (e.g. it isn't suitable for that query, or CQEngine 
thinks another index will be faster), then once CQEngine has built a candidate 
set from other indexes, it will use this attribute to filter the results and 
will end up converting each candidate's name to lowercase at query time. If 
memory isn't an issue then the first option is fastest.
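
To make the fallback cost concrete, here is a minimal plain-Java sketch of the 
two options. It does not use CQEngine: the `filter` helper and the 
`Function`-based attributes are hypothetical stand-ins for what happens when a 
candidate set is filtered without an index on the name, and the use of 
`Locale.ROOT` is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

// Illustrative only: plain Java, not CQEngine's API.
class LowercaseTradeoff {

    /** Hypothetical POJO; field names follow the snippets above. */
    static final class Car {
        final String name;
        final String nameInLowercase; // option 1: second copy, computed once

        Car(String name) {
            this.name = name;
            this.nameInLowercase = name.toLowerCase(Locale.ROOT);
        }
    }

    // Option 1: returns the stored field, so there is no per-call conversion.
    static final Function<Car, String> STORED = car -> car.nameInLowercase;

    // Option 2: no extra field, but the conversion runs on every call.
    static final Function<Car, String> COMPUTED = car -> car.name.toLowerCase(Locale.ROOT);

    /** Stand-in for filtering a candidate set when no index on the name is used. */
    static List<Car> filter(List<Car> candidates, Function<Car, String> attribute, String lowercaseQuery) {
        List<Car> matches = new ArrayList<>();
        for (Car car : candidates) {
            if (attribute.apply(car).equals(lowercaseQuery)) { // one attribute call per candidate
                matches.add(car);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Car> candidates = Arrays.asList(new Car("Ford Focus"), new Car("Honda Civic"));
        // Both options return the same matches; they differ only in where the work happens.
        System.out.println(filter(candidates, STORED, "ford focus").size());   // prints 1
        System.out.println(filter(candidates, COMPUTED, "ford focus").size()); // prints 1
    }
}
```

With `STORED`, each candidate costs a field read; with `COMPUTED`, each 
candidate costs a lowercase conversion, which is the runtime overhead 
described above.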

It's not realistic for indexes themselves to support case-insensitive queries. 
The letters 'A' and 'a' are represented by different bytes, so navigating 
indexes in a case-insensitive manner would degrade performance. The easiest 
solution is usually to just build the index on either lowercase or uppercase 
versions of strings, and then convert the string in the query to lowercase or 
uppercase accordingly. As you have done :)
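
As a rough illustration of that idea outside CQEngine, the sketch below keys a 
plain `HashMap` on lowercase names and lowercases the query string before the 
lookup. The class `LowercaseNameIndex` and its methods are hypothetical names 
chosen for this example, not part of any library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for an index built on lowercase keys; not CQEngine's API.
class LowercaseNameIndex {
    private final Map<String, List<String>> byLowercaseName = new HashMap<>();

    /** Normalizes a key the same way at index time and at query time. */
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** Stores the original mixed-case name under its lowercase form. */
    void add(String name) {
        byLowercaseName.computeIfAbsent(normalize(name), k -> new ArrayList<>()).add(name);
    }

    /** Looks up names, ignoring the case of the query string. */
    List<String> retrieve(String queryName) {
        return byLowercaseName.getOrDefault(normalize(queryName), Collections.emptyList());
    }

    public static void main(String[] args) {
        LowercaseNameIndex index = new LowercaseNameIndex();
        index.add("Ford Focus");
        index.add("FORD FOCUS");
        index.add("Honda Civic");
        System.out.println(index.retrieve("fOrD fOcUs")); // prints [Ford Focus, FORD FOCUS]
    }
}
```

The index itself still does exact, case-sensitive key matching; the 
case-insensitivity comes entirely from normalizing both sides to the same case.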

However, it might be possible to enhance attributes to flag them as being 
case-insensitive. That way, if CQEngine encountered a query on a 
case-insensitive attribute, it could automatically convert the query string to 
lowercase. Then you wouldn't need to think about it when writing queries. I'll 
think about this and probably bundle it in with the changes per the 
null-handling discussion. Thanks!
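
One way such a flag might look, purely as a design sketch: the 
`StringAttribute` type and `matches` helper below are hypothetical and do not 
exist in CQEngine. The point is only that the attribute, rather than the code 
writing the query, decides how the query string is normalized.

```java
import java.util.Locale;
import java.util.function.Function;

// Hypothetical design sketch; none of these types exist in CQEngine.
class CaseInsensitiveAttributeSketch {

    /** A string attribute that carries a case-insensitivity flag. */
    static final class StringAttribute<O> {
        private final Function<O, String> getter;
        private final boolean caseInsensitive;

        StringAttribute(Function<O, String> getter, boolean caseInsensitive) {
            this.getter = getter;
            this.caseInsensitive = caseInsensitive;
        }

        /** Value as it would be indexed: lowercased when the flag is set. */
        String getValue(O object) {
            return normalize(getter.apply(object));
        }

        /** Applies the same normalization to a query string. */
        String normalize(String s) {
            return caseInsensitive ? s.toLowerCase(Locale.ROOT) : s;
        }
    }

    /** Equality check in which the engine, not the caller, normalizes the query string. */
    static <O> boolean matches(StringAttribute<O> attribute, O object, String queryValue) {
        return attribute.getValue(object).equals(attribute.normalize(queryValue));
    }

    public static void main(String[] args) {
        StringAttribute<String> insensitive = new StringAttribute<String>(s -> s, true);
        StringAttribute<String> sensitive = new StringAttribute<String>(s -> s, false);
        System.out.println(matches(insensitive, "Ford Focus", "FORD FOCUS")); // prints true
        System.out.println(matches(sensitive, "Ford Focus", "FORD FOCUS"));   // prints false
    }
}
```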

Original comment by [email protected] on 25 Oct 2012 at 11:36

  • Added labels: Type-Enhancement
  • Removed labels: Type-Defect

from cqengine.

GoogleCodeExporter commented on August 19, 2024

Original comment by [email protected] on 29 Oct 2012 at 9:55

  • Changed state: Started

GoogleCodeExporter commented on August 19, 2024
I'm shelving this idea for the time being.

This would be a useful feature, but I'm not sure about the cost-benefit of 
implementing it. I'm only about 40% in favour and 60% against this idea right 
now though, so if others would like it added, please add an "I want this" 
comment to this issue to vote for it.

It's currently fairly easy to have case-insensitive retrieval, using the 
approach above, so this really is a nice-to-have feature.

Implementing this feature would probably require adding two new types of 
attribute: SimpleCaseInsensitiveAttribute and 
MultiValueCaseInsensitiveAttribute. I'd consider any patches to add the feature 
and I'm happy to answer questions from anyone who really wants to implement it.

"If in doubt, leave it out" is the motto for the time being. 

Original comment by [email protected] on 18 Nov 2012 at 10:35

  • Changed state: Shelved

