
haskell-sphinx-client's Introduction

A Haskell implementation of a Sphinx full-text search client. Sphinx is a very fast and featureful full-text search daemon. Version 0.4 is compatible with Sphinx version 1.1-beta. Version 0.5+ is compatible with Sphinx version 2.0; to target the older protocol instead, pass the version-one-one build flag. Available on Hackage.

Usage

Constructing Queries

The data type Query represents a query to the server. It specifies a search string and the indexes to run the query on, as well as a comment, which may be the empty string. To run a query on all indexes, use "*" in the index field.

The convenience function query executes a single query and constructs the Query by itself, so you don't have to.

To execute more than one Query, use runQueries; details are in the Batch Queries section below. To construct simple queries, you can also use simpleQuery :: Text -> Query, which constructs a Query over all indexes. Don't forget that you can use record updates on a Query.

In extended mode you may want to escape special query characters with escapeString.

All interaction with the server, including sending queries and receiving results, is based on the Data.Text string type. You might therefore want to enable the OverloadedStrings pragma.
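Putting the pieces above together, here is a minimal sketch of constructing a Query. The field name queryIndexes matches the Query record used by this package; which module exports simpleQuery is an assumption, so check the haddocks for your version.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Text.Search.Sphinx as Sphinx
import qualified Text.Search.Sphinx.Types as SphinxT

-- simpleQuery builds a Query over all indexes ("*"); a record update
-- then narrows it to a single index. The module each name lives in is
-- an assumption -- adjust the imports to your version of the library.
myQuery :: SphinxT.Query
myQuery = (Sphinx.simpleQuery "brighton") { SphinxT.queryIndexes = "renters-idx" }
```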

Excerpts and XML Indexes

buildExcerpts creates highlighted excerpts.
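A hedged sketch of generating excerpts follows. The argument order (configuration, documents, index name, words to highlight) and the configuration value are assumptions here, not confirmed API; check the haddocks for your version.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Text.Search.Sphinx as Sphinx
-- Module name below is an assumption:
import qualified Text.Search.Sphinx.ExcerptConfiguration as ExConf

main :: IO ()
main = do
  -- `ExConf.altConfig` is a stand-in for whatever ExcerptConfiguration
  -- your version of the library provides; the argument order is assumed.
  res <- Sphinx.buildExcerpts ExConf.altConfig
           ["Sphinx is a full-text search daemon."]
           "myindex"
           "search daemon"
  print res
```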

You will probably need to import the types as well:

import qualified Text.Search.Sphinx as Sphinx
import qualified Text.Search.Sphinx.Types as SphinxT

There is also an Indexable module for generating an XML file of data to be indexed.

Batch Queries

You can send more than one query per request to the server, which may enable server-side query optimization in certain cases (refer to the Sphinx manual for details). The function runQueries pipelines multiple queries together. If you want to combine the results, there are helpers such as maybeQueries and resultsToMatches.

      mr <- Sphinx.maybeQueries sphinxLogger sphinxConfig [
                 SphinxT.Query query1 "db1" ""
               , SphinxT.Query query1 "db2" ""
               , SphinxT.Query query2 "db1" ""
               , SphinxT.Query query2 "db2" ""
               ]
      case mr of
        Nothing -> return Nothing
        Just rs -> do
          let combined = Sphinx.resultsToMatches 20 rs
          if null combined
             then return Nothing
             else return $ Just combined

A note for those transitioning from 0.5.* to 0.6: the function addQueries has been removed. You can now directly send a list of Query values to the server using runQueries, which handles the serialization behind the scenes.

Encoding

The sphinx server itself does not know about encodings except for the difference between single-byte encodings and multi-byte encodings. It assumes that all incoming queries are already properly encoded and matches the raw bytes it receives; the same holds for the results returned by the server. Hence the responsibility for using the proper encoding (and decoding) routines lies with the caller.

Version 0.6.0 of haskell-sphinx-client introduces the encoding field in both the Configuration data type and the ExcerptConfiguration data type. The library handles proper encoding and decoding in the background; just make sure you set the right encoding setting in the configuration!
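For example, a configuration for a UTF-8 index might look like the sketch below. Whether the encoding field takes a converter name as a String or a dedicated type is an assumption here; check your version's haddocks.

```haskell
import qualified Text.Search.Sphinx as Sphinx

-- `encoding` was introduced in 0.6.0. The "UTF-8" value assumes the
-- field takes a converter name as a String; adjust to whatever type
-- your version of the library actually uses.
searchConfig = Sphinx.defaultConfig
  { Sphinx.port     = 9312
  , Sphinx.encoding = "UTF-8"
  }
```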

Details

Implementation

The API is implemented as detailed in the official documentation. Most search and buildExcerpts features are supported.

History

Originally written by Tupil and maintained by Chris Eidhof for an earlier version of sphinx. Greg Weber improved the library and updated it for the latest version of sphinx, and is now maintaining it. Aleksandar Dimitrov updated the library to use Text.

Usage of this haskell client

Tupil originally wrote this for use on a commercial project. The sphinx package is now finding some use in the Yesod community. There is a well-described example usage, but keep in mind that there is no requirement to tie the generation of Sphinx documents to your web application, only to your database. It is used in the Yesod applications yesdoweb.com and eatnutrients.com.

haskell-sphinx-client's People

Contributors

adimit, gregwebs, jbransen, luite, paul-rouse, snoyberg


haskell-sphinx-client's Issues

Server-side UTF-8 encoding isn't detected and properly treated

Hello,

I've set both the mysql and the sphinx index encoding to utf8 (as opposed to sbcs.) When using searchd with the haskell-sphinx-client, encoding errors prevent proper searching and retrieval. Example as follows:

Take the letter ü (u with umlaut). In latin1, ü is encoded as fc, or 252 in decimal. Haskell uses the latter internally:

λ> "ü"
"\252"

In UTF-8, it's going to be a multi-byte character, and when sphinx is set to utf-8, searchd is going to treat it as such. It will be encoded as c3 bc, or 195 188 in decimal. In Haskell, I found it easiest to just use the Data.Text library to achieve this:

λ> import qualified Data.Text as T
λ> import Data.Text.Encoding (encodeUtf8)
λ> encodeUtf8 . T.pack $ "ü"
"\195\188"

I've inspected the TCP traffic between my Haskell client and searchd: while haskell-sphinx-client sends the former format (which searchd doesn't interpret correctly, namely as a Unicode character), searchd sends back the latter format, which Haskell then misinterprets and represents as single bytes:

> putStrLn "\195\188"
ü

This is obviously not what I'd like to have.

Currently, I'm making use of the fact that I use Data.Text internally in my program and just use encodeUtf8 on everything I send to searchd via query or runQueries and then use decodeUtf8 on everything I get back from there, but, obviously, that's really not the best way to handle this.

It'd be nice to have some sort of mechanism to do this internally in haskell-sphinx-client, using either Data.Text or something similar.

Add back into stackage?

sphinx has dropped out of stackage nightly because of commercialhaskell/stackage@dce0852 (Michael Snoyman had it listed in his group of packages). I'd quite like to see it retained in stackage, so is there any reason not to put it back? Do you want to do it, or shall I list it under me?

Running too many queries with runQueries at once will raise "exception: too few bytes."

(I'm reporting this more as a reminder to myself and so that people can find it who might get bitten by this bug. I'll have to try and fix this later, but it might not be trivial.)

(There's also #4, which might be similar to this, but it's definitely a different bug.)

The title already says it all: if you run more than 32 queries at once with runQueries, you will get this nasty exception.

> let q = Query { queryString = "foo", queryIndexes = "*", queryComment = "" }
> runQueries searchConf (replicate 32 q)
-- … good stuff
> runQueries searchConf (replicate 33 q)
Error 1 "*** Exception: too few bytes. Failed reading at byte position 1650549796

This appears to happen before queries get sent to sphinx, so it seems a problem on our end. I won't have time to fix this immediately, but it's on my TODO-list. My own application frequently sends a bunch of queries like this, so I'll have to work around it (it's easy, just use Data.List.Split's chunksOf function.)
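The workaround mentioned above can be sketched as follows, assuming the chunksOf function from the split package; the wrapper name is hypothetical.

```haskell
import Data.List.Split (chunksOf)       -- from the `split` package
import qualified Text.Search.Sphinx as Sphinx

-- Work around the >32-query failure by sending the queries in chunks
-- of at most 32. Each chunk yields its own answer from runQueries;
-- combining the per-chunk answers is left to the caller. The wrapper
-- name is hypothetical; `conf` and `queries` are your own values.
runQueriesChunked conf queries =
  mapM (Sphinx.runQueries conf) (chunksOf 32 queries)
```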

I'll probably have time to look at it, say, next weekend or so, hopefully earlier.

Publishing private utility code soon

We have some utility functions and helpers for integration tests built on top of this library as well as sphinx cli commands, that we will try to publish soon in a new repo.

I am noting here in case others have private utility functions they might consider publishing soon as well.

Consider wrapped exception for going beyond max_matches

If one uses a limit and offset that goes beyond the allowed max_matches, you get a low level Data.Binary error. Would be nice if you could get the underlying exception instead. Not important though.

Current error message: ... : Data.Binary.Get.runGet at position 4: not enough bytes

Underlying error message in mysql:

mysql> select * from note_core where match('note') limit 1000,1 option max_matches=1000;
ERROR 1064 (42000): query 0 error: offset out of bounds (offset=1000, max_matches=1000)

Feel free to close this issue if not a priority.

I haven't tried extensively to reproduce this separate from the larger program where this exception is occurring. I am just assuming that it is coming from this package.

Exception: too few bytes. Failed reading at byte position 1885696561

So I started the server:

chris@midnight:~/Projects/me/ircbrowse$ searchd
Sphinx 2.0.5-release (r3308)
Copyright (c) 2001-2012, Andrew Aksyonoff
Copyright (c) 2008-2012, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file './sphinx.conf'...
WARNING: compat_sphinxql_magics=1 is deprecated; please update your application and config
listening on all interfaces, port=9312
precaching index 'event_texts'
precached 1 indexes in 0.050 sec                            

And I ran this in GHCi:

λ>  query defaultConfig { host="localhost", port=9312 } "*" "potato"
Error 1 "*** Exception: too few bytes. Failed reading at byte position 1885696561

I already tried sphinx 2.0.6 first, then I tried this 2.0.5 release, same problem. What version of sphinx is the good version for this library?

Returning @count for group by queries

Quick question: Does this library return the per-group @count attribute for grouping results?

Here's the documentation from Sphinx:

5.7. Grouping (clustering) search results
The final search result set then contains one best match per group. Grouping function value and per-group match count are returned along as "virtual" attributes named @group and @count respectively.

http://sphinxsearch.com/docs/current.html

Exception: too few bytes

Arch has two sphinx packages in the repos: sphinx-release (0.9.9) and sphinx-svn, which is a development snapshot. I've tried with either package and a fresh cabal install of 0.5.2.1.

Sample code:

import Prelude
import Text.Search.Sphinx
import Text.Search.Sphinx.Types

main :: IO ()
main = do
    -- index exists, and "brighton" returns results when using
    -- the 'search' CLI tool
    res <- query config "renters-idx" "brighton"
    putStrLn $ show res

config = defaultConfig
    { port = 9312
    , mode = Any
    }

Error message from ghci:

*** Exception: too few bytes. Failed reading at byte position 2

What's seen in searchd logs:

WARNING: failed to receive client request body (client=127.0.0.1:37954, exp=159)

If you need other details, please let me know.

How to trigger fullscan

I have this in executeSearch:

res <- liftIO $ case qstring of
        "" -> S.query configFull index qstring
        _ -> S.query configDefault index qstring

and

configFull = S.defaultConfig
        { S.port = sport
        , S.mode = ST.Fullscan
        }

This results in

Error 1 "index gifts-idx: fullscan requires extern docinfo"

My config does have docinfo = extern. So something else is wrong, but what?
