Giter Club home page Giter Club logo

django-tsvector-field's Introduction

django-tsvector-field Package Build Test Coverage

django-tsvector-field is a drop-in replacement for Django's django.contrib.postgres.search.SearchVectorField field that manages the database triggers to keep your search field updated automatically in the background.

Installation

Python 3+, Django 1.11+ and psycopg2 are the only requirements.

Install django-tsvector-field with your favorite python tool, e.g. pip install django-tsvector-field.

You have two options to integrate it into your project:

  1. Simply add tsvector_field to your INSTALLED_APPS and start using it. This method uses Django's pre_migrate signal to inject the database operations into your migrations. This will work fine for many use cases.

    However, you'll run into issues with this method if you have unmigrated apps or you have disabled migrations for your unit tests. The problem is related to the fact that Django does not send pre_migrate signal for apps that do not have explicit migrations.

  2. Less simple but more reliable method is to create your own database engine module referencing tsvector_field.DatabaseSchemaEditor. This will ensure that the database triggers are reliably created and dropped for all methods of migration.

    Create a 'db' directory in your Django project with an __init__.py and a base.py with the following contents:

    from django.db.backends.postgresql import base
    import tsvector_field
    
    class DatabaseWrapper(base.DatabaseWrapper):
        SchemaEditorClass = tsvector_field.DatabaseSchemaEditor

    Then update the 'ENGINE' configuration in your DATABASES setting. For example, if your project is called my_project and it has the db module as described above, then change your DATABASE setting to have the following 'ENGINE' configuration:

    DATABASES = {
        'default': {
            'ENGINE': 'my_project.db',
        }
    }

Usage

tsvector_field.SearchVectorField works like any other Django field: add it to your model, run makemigrations, run migrate and tsvector_field will take care to create the postgres trigger and stored procedure.

To illustrate how this works we'll create a TextDocument model with a tsvector_field.SearchVectorField field and two textual fields to be used as inputs for the full text search.

from django.db import models
import tsvector_field

class TextDocument(models.Model):
    title = models.CharField(max_length=128)
    body = models.TextField()
    search = tsvector_field.SearchVectorField([
        tsvector_field.WeightedColumn('title', 'A'),
        tsvector_field.WeightedColumn('body', 'D'),
    ], 'english')

After you've migrated you can create some TextDocument records and see that postgres keeps it synchronized in the background. Specifically, because the search field is updated at the database level, you'll need to call refresh_from_db() to see the new value after a .save() or .create().

>>> doc = TextDocument.objects.create(
...     title="My hovercraft is full of spam.",
...     body="It's what eels love!"
... )
>>> doc.search
>>> doc.refresh_from_db()
>>> doc.search
"'eel':10 'full':4A 'hovercraft':2A 'love':11 'spam':6A"

Note that spam is recorded with 6A, this will be important later. Let's continue with the previous session and create another document.

>>> doc = TextDocument.objects.create(
...     title="What do eels eat?",
...     body="Spam, spam, spam, they love spam!"
... )
>>> doc.refresh_from_db()
>>> doc.search
"'eat':4A 'eel':3A 'love':9 'spam':5,6,7,10"

Now we have two documents: first document has just one spam with weight A and the second document has 4 spam with lower weight. If we search for spam and apply a search rank then the A weight on the first document will cause that document to appear higher in the results.

>>> from django.contrib.postgres.search import SearchQuery, SearchRank
>>> from django.db.models.expressions import F
>>> matches = TextDocument.objects\
...     .annotate(rank=SearchRank(F('search'), SearchQuery('spam')))\
...     .order_by('-rank')\
...     .values_list('rank', 'title', 'body')
>>> for match in matches:
...   print(match)
...
(0.607927, 'My hovercraft is full of spam.', "It's what eels love!")
(0.0865452, 'What do eels eat?', 'Spam, spam, spam, they love spam!')

If you are only interested in getting a list of possible matches without ranking you can filter directly on the search column like so:

>>> TextDocument.objects.filter(search='spam')
<QuerySet [<TextDocument: TextDocument object>, <TextDocument: TextDocument object>]>

Final note about the tsvector_field.SearchVectorField field is that it takes a language_column argument instead of or in addition to the language argument. When both arguments are provided then the database trigger will first look up the value in the language_column and if that is null it will use the language in language.

Migrating

When adding a tsvector_field.SearchVectorField field to an existing model you likely want to update the search vector for all existing records. django-tsvector-field includes the tsvector_field.IndexSearchVector operation that takes the model name and search vector column as arguments. If we had previously created the TextDocument without a search column then to add search capability we would use the following migration:

from django.db import migrations, models
import tsvector_field

class Migration(migrations.Migration):

    dependencies = []

    operations = [
        migrations.AddField(
            model_name='textdocument',
            name='search',
            field=tsvector_field.SearchVectorField(columns=[
                tsvector_field.WeightedColumn('title', 'A'),
                tsvector_field.WeightedColumn('body', 'D')
            ], language='english'),
        ),
        tsvector_field.IndexSearchVector('textdocument', 'search'),
    ]

For more information on querying, see the Django documentation on Full Text Search:

https://docs.djangoproject.com/en/dev/ref/contrib/postgres/search/

For more information on configuring how the searches work, see PostgreSQL docs:

https://www.postgresql.org/docs/devel/static/textsearch.html

django-tsvector-field's People

Contributors

eukreign avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

django-tsvector-field's Issues

Support for indexing fields of related models?

Hey, awesome library! I was disappointed to see that Django does not offer any support out of the box for FTS for real applications - using triggers and store procedures instead of indexing on the fly - which obviously does not scale on even a modest level of traffic.

For our use case, we'd like to index fields on related models through joins. One simple example to allude to is the ability to search for orders by line item descriptions. Briefly digging into the codebase, it does not look like this is currently supported.

I'd be happy to contribute something to make this work. Has this feature been thought about before? Any recommendations for how this should be tackled?

tsvector not recomputed when a NULL column is set to a value.

I believe there is a bug in the generated trigger logic. If a NULL column is given a non-NULL value, this won't trigger an update to the tsvector field to include the new text.

The issue is with logic like:

     ELSIF (NEW."some_indexed_field" <> OLD."some_indexed_field") THEN do_update = true;

Since NULL <> 'something' is not true in SQL. The simplest solution I could come up with is to replace the <> tests with IS DISTINCT FROM.

Let me know if you'd like a pull request.

add to README tutorial something on Headline

Update the README with some instructions on how to use the Headline class and present matched excerpts from the result set....

https://github.com/damoti/django-tsvector-field/blob/master/tsvector_field/query.py#L5

from django.db.models.expressions import F
from django.contrib.postgres.search import SearchRank, SearchQuery
import tsvector_field

query = SearchQuery(self.form['text'].value())
qs = (
  qs.filter(text__search=query)
    .annotate(
      match_title=tsvector_field.Headline(F('text__title'), query),
      match_text=tsvector_field.Headline(F('text__text'), query)
    )
    .annotate(rank=SearchRank(F('text__search'), query))
    .order_by('-rank')
)

tsvector Field Already Exists

Following along with the tutorial I've added a 'search' field to an existing model and generated the migration file. But when I run 'migrate' the migrations fails with a ProgrammingError 'name-of-the-field_tsvector' already exists.

Running django-tsvector-field 0.9.3, django 1.11, and PostgreSQL 9.5.6

field should be deferred by default

The contents of the tsvector is an implementation detail in postgres, it's kind of pointless to keep shuttling this back and forth between Postgres and Django.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.