Giter Club home page Giter Club logo

django-sphinx-db's Introduction

django-sphinx-db

Introduction

This is a simple Django database backend that allows interaction with Sphinx via SphinxQL. It is basically the default Django MySQL backend with some changes for Sphinx.

SphinxQL is a MySQL clone mode that Sphinx searchd supports. It allows you to query indexes via regular old SQL syntax. If you are using rt (real-time) indexes, you can also add and update documents in the index.

This backend is meant to be configued as a database in the Django settings.py.

This package provides a Manager class, SQLCompiler suite and supporting code to make this possible.

Usage

First of all, you must define a database connection in the Django configuration. You must also install the Sphinx database router and add django_sphinx_db to your INSTALLED_APPS list.

# Install django_sphinx_db:
INSTALLED_APPS += ('django_sphinx_db', )

# This is the name of the sphinx server in DATABASES:
SPHINX_DATABASE_NAME = 'sphinx'

# Define the connection to Sphinx
DATABASES = {
    'default': {
        # Your default database connection goes here...
    },
    SPHINX_DATABASE_NAME:  {
        'ENGINE': 'django_sphinx_db.backend.sphinx',
        # The database name does not matter.
        'NAME': '',
        # There is no user name or password.
        'USER': '',
        'PASSWORD': '',
        # Don't use localhost, this will result in using a UDS instead of TCP...
        'HOST': '127.0.0.1',
        'PORT': '9306',
    },
}

# ... and route accordingly ...
DATABASE_ROUTERS = (
    'django_sphinx_db.routers.SphinxRouter',
)    ```

Then define a model that derives from the SphinxModel. As usual, the model will be placed in models.py.

from django_sphinx_db.backend.models import SphinxModel, SphinxField

class MyIndex(SphinxModel):
    class Meta:
        # This next bit is important, you don't want Django to manage
        # the table for this model.
        managed = False

    name = SphinxField()
    content = SphinxField()
    date = models.DateTimeField()
    size = models.IntegerField()

Configuring Sphinx

Now you need to generate a configuration file for your index. A management command is provided to convert the model definition to a suitable configuration.

$ python manage.py syncsphinx >> /etc/sphinx.conf
$ vi /etc/sphinx.conf

The generated config file should be a good start however, you are urged to review the configuration against the [Sphinx configuration reference](http://sphinxsearch.com/docs/2.0.2/confgroup-index.html).

Using the Django ORM with Sphinx

You can now query and manage your real-time index using the Django ORM. You can insert and update documents in the index using the following methods. The example below uses the [fulltext library](https://github.com/btimby/fulltext) for reading file contents as plain text.

import os, time, fulltext

# Add a document to the index.
path = 'resume.doc'
st = os.stat(path)
MyIndex.objects.create(
    name = path,
    content = fulltext.get(path, ''),
    size = st.st_size,
    date = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(st.st_mtime)),
)

# Update a document in the index
doc = MyIndex.objects.get(pk=1)
doc.content = fulltext.get(path, '')
doc.size = st.st_size
doc.date = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(st.st_mtime))
doc.save()

You can perform full-text queries using the Django search operator. Read the Django documentation for more information.

MyIndex.objects.filter(content__search='Foobar')

The query is passed through directly to Sphinx, so the Sphinx extended query syntax is respected.

Unit Testing

The Sphinx backend for Django will ignore create_test_db and destroy_test_db calls. These calls will fail when the Sphinx database is configured, preventing you from running tests. However, this means that any configured Sphinx database will be used during testing. As long as you write your tests with this in mind, there should be no problem. Remember that you can use the TEST_NAME database connection parameter to redirect queries to a different database connection during test runs.

django-sphinx-db's People

Contributors

btimby avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

django-sphinx-db's Issues

Django 1.6.5 support?

Project hasn't been updated lately not sure if it's me or it just isn't supported.

sphinx_model.objects.filter(name__search='foo')

fails with:
sql_for_columns() takes exactly 4 arguments (5 given)

Timestamp support broken.

Because the MySQL backend is used as a base, for DateTime and Time columns the data is formatted in the MySQL way: "'2012-07-20 14:21:28'". However, the timestamp column type in Sphinx expects a unix timestamp: 1342814107.5690041. I know database backends in Django can do conversions to/from python objects, so we need to figure out how to make the Sphinx backend treat dates/times as Sphinx expects them.

Bad quoting for FK lookup

I'm using django-shpinx-db to index companies. I have a Company model which queries a PostgreSQL database as well as a CompanyIndex model.

To use the Company model as the id field of the Sphinx index, I used the following definition:

Company = models.ForeignKey(Company, db_column='id', primary_key=True)

This seems to work. I can create a CompanyIndex instance using a Company object and I can query the Company_id attribute.

>>> CompanyIndex.objects.create(Company=mycompany, name='my company', description='hello thar')
<CompanyIndex: CompanyIndex object>
>>> ci = CompanyIndex.objects.all()[0]
>>> ci.Company_id
1L

But querying the Company object directly does not work.

>>> ci.Company
...
DatabaseError: (1064, "sphinxql: syntax error, unexpected CONST_FLOAT, expecting BETWEEN (or 8 other tokens) near '.id = 1 '")

The problem seems to be that the wrong column name quoting is used. This is the executed query (as shown by using manage.py shell_plus --print-sql):

SELECT `id`, `customer_id`, `company_name`, ... FROM front_company WHERE front_company.id = 1 

In this query, backtick quoting is used. Postgres uses double quotes though.

Is this a bug in the router or in the database backend? Or do you know where else I can find the solution?

ORDER BY support

sphinx again has issues with the sql statement.

In [19]: EntryIndex.objects.all().order_by('-date')

errors:


DatabaseError: (1064, "sphinxql: syntax error, unexpected CONST_FLOAT, expecting $end near '.`date` A
SC LIMIT 21'")

i debugged the query sent to sphinx a bit:

In [27]: d.queries[0]['sql']

Out[27]: 'SELECT `id`, `date` FROM nfeeder_entryindex ORDER BY nfeeder_entryindex.`date` ASC LIMIT 21'

This doesn't work:

mysql> select `id`, `date` from app_modelindex order by app_MODELindex.date ASC LIMIT 2000;

These do however:

mysql> select `id`, `date` from app_modelindex order by date ASC LIMIT 2000;

mysql> select `id`, `date` from app_modelindex order `by date` ASC LIMIT 2000;

Filter by column

Problem

When filtering by a specific column, the column is ignored.

>>> CompanyIndex.objects.filter(company_name__search='my company')
SELECT `id`, `customer_id` FROM companies_rt WHERE MATCH ('@* my company') LIMIT 21

In this case, the correct SphinxQL query would be:

SELECT `id`, `customer_id` FROM companies_rt WHERE MATCH ('@company_name my company') LIMIT 21

This default @* at the beginning of the query causes problems when using the @-Syntax in the filtering query itself.

>>> CompanyIndex.objects.filter(company_name__search='@company_name my company')
SELECT `id`, `customer_id` FROM companies_rt WHERE MATCH ('@* @company_name my company') LIMIT 21

This fails of course.

Suggestion

I'd suggest that the standard query filtering accepts only basic syntax, no field names. It should also limit the filtering to the specified column name.

>>> CompanyIndex.objects.filter(company_name__search='my company')
SELECT `id`, `customer_id` FROM companies_rt WHERE MATCH ('@company_name my company') LIMIT 21

Additionally, you could add an unfiltered matching function, e.g.

>>> CompanyIndex.objects.match('@* software @city new york')

I guess this could be done by adding the function to the custom manager, but I'm not 100% sure about that.

SphinxQL syntax error in __search

q = EntryIndex.objects.filter(content__search='title')

In [42]:      q.model
Out[42]:    nfeeder.models.EntryIndex

In [43]: print q.query
SELECT `nfeeder_entryindex`.`id`, `nfeeder_entryindex`.`content`, `nfeeder_entryindex`.`date` 
FROM `nfeeder_entryindex` WHERE MATCH (`nfeeder_entryindex`.`content`) AGAINST
 (title IN BOOLEAN MODE);

I am trying to search the index with the orm. It works fine from the mysql command line with:

select * from nfeeder_entryindex where MATCH('title');

I am not sure if I am doing something wrong.

Support truncate

Currently, it's not possible to delete all items in the index.

In [8]: CompanyIndex.objects.all()[:].delete()
SELECT `id`, `customer_id` FROM sphinx_company_rt

DELETE FROM sphinx_company_rt WHERE id IN (22554, 22553, 22552, 22551, 22550, 22549, 22548, 22547, 22546, 22545, 22544, 22543, 22542, 22541, 22540, 22539, 22538, 22537, 22536, 22535)

The first SELECT without a limit returns only 20 entries, so only those 20 entries are removed.

Current trunk version added support for the TRUNCATE INDEX indexname command. That could help.

Maybe it's possible to alter the function of the .delete() statement? Otherwise, could we add a .truncate() option to execute a TRUNCATE INDEX statement?

LIMIT statement functionality

The default sphinx configuration returns 20 results, it would be nice to be able to configure that from django methods on the fly.

something like:

MyIndex.objects.filter(content__serach, limit=40)

I was thinking something similar for the offset functionality, but I don't know if the RT indexes support that yet.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.