Comments (5)
Love this!
ilike
is case-insensitive. is the query that uses to_tsvector @@ to_tsquery
also case-insensitive?
Edit: It IS case-insensitive! I tested it on my own DB on a table with 700k+ rows and got major improvements:
Before - ILIKE
: Execution Time: 3617.160 ms
After - to_tsquery
: Execution Time: 25.707 ms
from farcaster-indexer.
Here's a query that takes a very long time and this is due to me adding the ORDER subquery
select * from casts WHERE text ILIKE '%of%' AND deleted=false order by reactions_count;
from farcaster-indexer.
- To optimize this call: https://github.com/gskril/searchcaster/blob/d1baa79242e7a3e1db3183a3c8b27c9b30bcf832/src/pages/api/search.js#L158-L179
which I'm simulating as follows
SELECT *
FROM casts
WHERE to_tsvector('english', text) @@ to_tsquery('english', 'searchcaster') AND deleted=false
ORDER BY reactions_count;
where the "reactions_count" slows down the query significantly. Without indexes:
postgres=# SELECT * FROM casts WHERE to_tsvector('english', text) @@ to_tsquery('english', 'searchcaster') AND deleted=false ORDER BY reactions_count;
Time: 2110.693 ms (00:02.111)
So I'm creating an index over all fields that could be queried as follows
CREATE INDEX casts_text_gin_index ON casts USING gin(to_tsvector('english', text));
CREATE INDEX casts_published_at_index ON casts USING btree(published_at, deleted, recasts_count, replies_count, watches_count, author_username);
New performance with indexes
postgres=# \timing
Timing is on.
postgres=# SELECT * FROM casts WHERE to_tsvector('english', text) @@ to_tsquery('english', 'searchcaster') AND deleted=false ORDER BY reactions_count;
Time: 29.832 ms
from farcaster-indexer.
Haven‘t found anything definitive in the docs, we can do tests though. But I think it goes even beyond understanding capitalization as it also understands verb conjugations etc
Converting tokens into lexemes. A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case, and often involves removal of suffixes (such as s or es in English).
https://www.postgresql.org/docs/9.4/textsearch-intro.html
from farcaster-indexer.
Yes! That looks to be the case. Funny that the docs say "almost always includes".
Anecdotally, it was case-insensitive when I tried
from farcaster-indexer.
Related Issues (8)
- Return all grandchildren and ancestors per cast fetch HOT 5
- when running `yarn run seed`: new row violates row-level security policy for table "profile" HOT 1
- Hardware requirements HOT 1
- Bug: function public.gen_random_bytes(integer) does not exist HOT 1
- Backfill error: `invalid byte sequence for encoding "UTF8": 0x00` HOT 1
- Insert links error: bind message has 38010 parameter formats but 0 parameters. HOT 2
- Workers are dropping messages during backfill
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from farcaster-indexer.