Comments (3)

bittercoder commented on August 29, 2024

I meant changing how the full-text index statement is generated. Here's a rough C# example that should hopefully help:

    public static void ConfigureMarten(DocumentMapping<MealAggregate> mapping)
    {
        Expression<Func<MealAggregate, object>>[] expressions =
            [x => x.Id!, x => x.Name!, x => x.Protein!, x => x.Source!, x => x.Created!, x => x.Updated!, x => x.Deleted!];

        var members = expressions
            .Select(FindMembers.Determine).ToArray();

        // Replace every run of non-word characters (and underscores) with a space so that
        // values such as URLs are split into individual words before indexing.
        // The 'g' flag is required: without it REGEXP_REPLACE only replaces the first match.
        var regexCleaningDocumentConfig = StringExtensions.Join(members.Select(x => $"REGEXP_REPLACE(COALESCE((data ->> '{x[0].Name}'),''), '[\\W_]+', ' ', 'g')"), " || ' ' || ");

        var index = mapping.FullTextIndex("english", expressions);

        // Wrap the concatenated expression in parentheses before assigning it to the index.
        index.DocumentConfig = $"({regexCleaningDocumentConfig})";
    }

I haven't tested the above code, so there may well be problems - but hopefully this is enough to help you find a potential solution.
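
Since the code above is untested, it may help to look at the SQL fragment it would hand to the index before wiring it into Marten. The following is a minimal sketch that only does the string building, with no Marten dependency; the `DocumentConfigPreview` class and the choice of `Name` and `Source` as properties are assumptions for illustration. It also assumes the `'g'` flag is wanted, since PostgreSQL's `REGEXP_REPLACE` otherwise replaces only the first match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DocumentConfigPreview
{
    // Builds the SQL expression for the given top-level JSON property names.
    // Each property is coalesced to '' and has runs of non-word characters
    // (and underscores) replaced by a space; 'g' replaces every match.
    public static string Build(IEnumerable<string> propertyNames) =>
        "(" + string.Join(" || ' ' || ", propertyNames.Select(name =>
            $"REGEXP_REPLACE(COALESCE((data ->> '{name}'),''), '[\\W_]+', ' ', 'g')")) + ")";

    public static void Main()
    {
        // Hypothetical subset of the MealAggregate properties from this thread.
        Console.WriteLine(Build(new[] { "Name", "Source" }));
    }
}
```

Pasting the printed expression into `select to_tsvector('english', ...)` against the document table should show whether the tokens look the way you expect.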

from marten.

bittercoder commented on August 29, 2024

If you're using web-style search, I would suggest having a look at websearch_to_tsquery and running your search text through it to see what the resulting query looks like. It is also worth running your source values (the url, for example) through to_tsvector to see how they are tokenised.

Under the hood the full text index in postgres tokenises the indexed value. It does break apart urls, but I don't believe it does so in a way that would support your search use case e.g.

select to_tsvector('english', 'https://example-recipes.com/greek-chicken');

Will tokenise as:

'/greek-chicken':3 'example-recipes.com':2 'example-recipes.com/greek-chicken':1

None of those will match a search term of recipe. Even a 'starts with' lexeme with prefix matching won't work, because none of the tokens start with the word recipe.
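
To make the prefix point concrete, here is a rough C# sketch that checks the three tokens shown above against a prefix. It is only a stand-in for what a prefix query does, not how PostgreSQL implements it; the `PrefixCheck` class is hypothetical, and `recip` is assumed to be the stemmed form of recipe.

```csharp
using System;
using System.Linq;

public static class PrefixCheck
{
    // Tokens PostgreSQL produced for the URL in the example above.
    public static readonly string[] UrlTokens =
    {
        "/greek-chicken",
        "example-recipes.com",
        "example-recipes.com/greek-chicken"
    };

    // Rough stand-in for a prefix query: true if any token starts with the prefix.
    public static bool AnyStartsWith(string[] tokens, string prefix) =>
        tokens.Any(t => t.StartsWith(prefix, StringComparison.Ordinal));

    public static void Main()
    {
        Console.WriteLine(AnyStartsWith(UrlTokens, "recip"));   // False
        Console.WriteLine(AnyStartsWith(UrlTokens, "example")); // True
    }
}
```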

In this instance you might want to change your approach and combine a free-text search with a like %term% type query over just the data->>'Source' property (though depending on how much data you have that could be quite slow, as it's likely to require a full table scan). Alternatively you could tweak the raw DocumentConfig of the index's configuration in Marten to manipulate the data->>'Source' property being indexed, breaking up the url in a way that better fits your search use case (e.g. replacing all non-letter characters with spaces via a regex, so that the individual words in the url, like recipe, become searchable and are stemmed correctly).
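
As a quick illustration of what that regex clean-up would do to the example url, here is a small C# sketch. It runs the replacement in .NET purely to show the effect; the index itself would need the equivalent SQL expression (in PostgreSQL, REGEXP_REPLACE with the 'g' flag so every match is replaced). The `SourceCleaning` class and the `[\W_]+` pattern are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SourceCleaning
{
    // Replaces each run of non-word characters or underscores with a single space.
    public static string Clean(string source) =>
        Regex.Replace(source, @"[\W_]+", " ").Trim();

    public static void Main()
    {
        var cleaned = Clean("https://example-recipes.com/greek-chicken");
        Console.WriteLine(cleaned); // https example recipes com greek chicken
    }
}
```

With the url split like this, the full text parser sees recipes as a word in its own right rather than as part of a host name.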

alexmorask commented on August 29, 2024

@bittercoder Thanks a ton for the response.

With regards to your last suggestion of tweaking the index for the data->>'Source' property, are you referring to something along the lines of using a Calculated Index or adding this transformation elsewhere?

For what it's worth, I tried adding a calculated index to that field just to test it out:

let getSourceIndex = FunctionAs.LinqExpression<MealAggregate, obj> (fun (aggregate: MealAggregate) ->
    match aggregate.Source with
    | Some source ->
        Regex.Replace(source, @"[\W_]+", " ") |> box
    | None -> "" |> box)

options.Schema.For<MealAggregate>().Index(getSourceIndex, (fun x -> x.Name <- "test_index")) |> ignore

(The compiler wasn't automatically converting the F# function to a LINQ expression, so I had to do the conversion explicitly.)

The index creation succeeded, but resulted in the following:

CREATE INDEX IF NOT EXISTS test_index
    ON public.mt_doc_mealaggregate USING btree
    ("((data -> 'Source'::text) ->> 'Value'::text)" COLLATE pg_catalog."default" ASC NULLS LAST)
    TABLESPACE pg_default;

which doesn't look like it's completing the Regex transformation I need. I don't see any examples for creating indexes in the docs outside of relatively simple property selection (_.Schema.For<User>().Index(x => x.UserName);), so I assume I'm just trying to do something unsupported.

Either way, greatly appreciate your help.
