Shazwazza commented on July 29, 2024

@nikcio I started having a look through all this yesterday and so far all I can say is what an amazing job you've done. I'll keep reviewing over this week and we can determine if any tweaks are necessary. @nzdev also thanks a ton for your recent PRs and help. Hopefully we can get this all merged in for Xmas/New Year and get what will most likely be a new major release out.

from examine.

Shazwazza commented on July 29, 2024

Yeah for sure, sorry it's been a hectic month 😕 will get it out tomorrow. Thanks so much for pushing this along and all your support.

nikcio commented on July 29, 2024

@bergmania in the Apache Lucene Faceted Search User's Guide, the following is written under section 2.2, Facet Associations:

So far we've discussed categories as binary features, where a document either belongs to a category, or not.

While counts are useful in most situations, they are sometimes not sufficiently informative for the user, with respect to deciding which subcategory is more important to display.

For this, the facets package allows to associate a value with a category. The search time interpretation of the associated value is application dependent. For example, a possible interpretation is as a match level (e.g., confidence level). This value can then be used so that a document that is very weakly associated with a certain category will only contribute little to this category's aggregated weight.

So it's possible to use this feature to get floating-point values. I'm not quite sure myself exactly how this is configured and used, so I mostly just based the value type on the FacetResult type from the Lucene.Net.Facet package, because it's the output of the different facet readers in Lucene.NET.
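To illustrate why a facet value can be fractional once associations are involved: a category's aggregate becomes a sum of per-document weights rather than a count of matching documents. The following is a minimal, self-contained C# sketch using plain collections only. `WeightedFacetSketch` and `Aggregate` are hypothetical names, and this is not the Lucene.NET facets API.

```csharp
using System.Collections.Generic;

// Hypothetical illustration only: this is NOT the Lucene.NET facets API.
// Each document associates a float weight (e.g. a confidence level) with a category.
public static class WeightedFacetSketch
{
    // Sums the associated weights per category.
    // A plain count facet would instead add 1 for every matching document.
    public static Dictionary<string, float> Aggregate(
        IEnumerable<IReadOnlyDictionary<string, float>> documents)
    {
        var totals = new Dictionary<string, float>();
        foreach (var doc in documents)
        {
            foreach (var (category, weight) in doc)
            {
                // TryGetValue leaves 'current' at 0 for a category seen for the first time
                totals.TryGetValue(category, out var current);
                totals[category] = current + weight;
            }
        }
        return totals;
    }
}
```

With one document weighted `sport = 0.9, news = 0.1` and another weighted `sport = 0.2`, `sport` aggregates to about 1.1 and `news` to 0.1, values an integer count could not express.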

Shazwazza commented on July 29, 2024

@nikcio Sorry to keep you waiting on this one, I have all of these starred in my inbox and will get to them soon, just a bit swamped this week.

nzdev commented on July 29, 2024

Added support for efficient deep paging (SearchAfter) for faceted and non faceted search #321.
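The idea behind search-after paging, as opposed to skip/take, is that the next page is requested relative to the last hit of the previous page, so earlier hits never need to be collected again. Lucene.NET exposes this as `IndexSearcher.SearchAfter`, which takes the last `ScoreDoc` of the previous page. How #321 surfaces it in Examine is not shown here; the sketch below only illustrates the concept over an in-memory list, with hypothetical names (`Hit`, `SearchAfterSketch`, `NextPage`).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical illustration of the search-after idea; NOT Examine's or Lucene.NET's API.
// Hits are ordered by descending score, with ties broken by ascending document id
// so that the ordering is total and a cursor position is unambiguous.
public readonly record struct Hit(int DocId, float Score);

public static class SearchAfterSketch
{
    // True if 'hit' sorts strictly after 'after' in (score desc, docId asc) order.
    private static bool SortsAfter(Hit hit, Hit after) =>
        hit.Score < after.Score || (hit.Score == after.Score && hit.DocId > after.DocId);

    // Returns the 'take' best hits that sort after the cursor, or from the start if 'after' is null.
    // A real collector would keep only 'take' entries in a priority queue;
    // this sketch simply filters and sorts for clarity.
    public static List<Hit> NextPage(IEnumerable<Hit> allHits, Hit? after, int take)
    {
        return allHits
            .Where(h => after is null || SortsAfter(h, after.Value))
            .OrderByDescending(h => h.Score)
            .ThenBy(h => h.DocId)
            .Take(take)
            .ToList();
    }
}
```

Calling `NextPage` again with the last hit of the previous page as `after` continues exactly where that page stopped.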

nikcio commented on July 29, 2024

@dealloc I believe #311 is very close to being done. Just waiting for the stars to align 😅🤞

(I'm a little unsure what we are waiting on to be honest 😬😅)

nikcio commented on July 29, 2024

Status:

Here is how I see the status of the Examine repo. What do you think @Shazwazza and @nzdev?

Merged PRs (Release/4.0)

Needs to be merged from the Release/3.0 branch - This is done in #345

Still needs to be merged into Release/4.0

  • #345
  • #339 - This gives better backwards compatibility for Umbraco

Still needs to be done

[This list is based on #345 being merged into release/4.0]

The massive amount of work that has been going on has created some warnings in the project, some more important than others. Here is a rundown of the ones I think we should look at before a stable 4.0 release:

  • Null reference warnings
    • LuceneIndex.cs - line 1212
    • LuceneIndex.cs - line 1288
  • XML warnings
    • There are a few places that are still missing some XML docs.

Other PRs that could probably come in a future release:

Stale PRs?

Preview/Alpha/Beta release

I think the only real way to get a feel for what still needs to be done is to make an early release and ask around the community for people to test it out. This can be done once the PRs under "Still needs to be merged into Release/4.0" and #345 are merged in. @Shazwazza

nzdev commented on July 29, 2024

I think for now it would make sense to merge the PR that helps with V3 compatibility and then release v4, as it's already a big change. After that it's possible to introduce another v4.x release with the other APIs for filtering, function queries, facet drill-down and spatial.

bjarnef commented on July 29, 2024

Would it be possible to have a Beta or RC build of a release with Facets feature of v3/v4?

bjarnef commented on July 29, 2024

@Shazwazza any update on this? 😊 we would love to test this further and we have potential projects where facets would be useful, both in terms of commerce or regular Umbraco content.

Shazwazza commented on July 29, 2024

Just getting betas out now, just pushed 3.2.0-beta https://github.com/Shazwazza/Examine/releases/tag/v3.2.0-beta.9

Shazwazza commented on July 29, 2024

And this one out now too https://github.com/Shazwazza/Examine/releases/tag/v4.0.0-beta.1

bergmania commented on July 29, 2024

Wow @nikcio 🤩

Good work on this proposal :)

From a user point of view, I also like approach 2 the most, but it requires more from the implementations.

Sorry for my lack of knowledge, but in what case will IFacetValue.Value not be an integer/long?

nikcio commented on July 29, 2024

@Shazwazza I've created some PRs that could implement this proposal and add some great functionality and documentation to Examine. Please let me know if I can do anything to help the PRs along.

PRs:
#311 (Facet implementation)
#312 (XML docs for facets and elsewhere in the project where missing)
#313 (Nullable feature for project and facets feature)

nzdev commented on July 29, 2024

WIP https://github.com/nzdev/Examine/tree/v3/feature/facet-taxonomy - Facet Taxonomy Index support. This is needed for hierarchical facets, and according to the Lucene.Net docs it is also something like 20-25% faster.
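To show what "hierarchical" means for counting: a document tagged with a path such as `Electronics/Phones` should count toward both `Electronics` and `Electronics/Phones`. The sketch below only illustrates those counting semantics with plain collections; how Lucene.NET's taxonomy index actually stores paths and rolls up counts is not shown, and `HierarchicalFacetSketch`/`CountPaths` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of hierarchical facet counting; NOT the Lucene.NET taxonomy API.
public static class HierarchicalFacetSketch
{
    // Each path contributes one count to every one of its prefixes.
    public static Dictionary<string, int> CountPaths(IEnumerable<string> documentPaths)
    {
        var counts = new Dictionary<string, int>();
        foreach (var path in documentPaths)
        {
            var parts = path.Split('/', StringSplitOptions.RemoveEmptyEntries);
            for (int depth = 1; depth <= parts.Length; depth++)
            {
                // Prefixes: "Electronics", then "Electronics/Phones", ...
                var prefix = string.Join("/", parts, 0, depth);
                counts[prefix] = counts.TryGetValue(prefix, out var c) ? c + 1 : 1;
            }
        }
        return counts;
    }
}
```

For the paths `Electronics/Phones`, `Electronics/Laptops` and `Books`, this yields `Electronics = 2`, with each leaf and `Books` at 1.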

dealloc commented on July 29, 2024

Has there been an update on this?

nzdev commented on July 29, 2024

I'm wondering if the FacetsConfig class could be abstracted away by setting hierarchical/multi-valued facets on the index fields instead.

nikcio commented on July 29, 2024

To keep this thread continuous, also see #345 (comment) (from @nzdev):

Here's what I'm thinking. Have the next release of Examine be 4.0, but avoid breaking API changes for 3.x. This means Umbraco 10 and 12 can choose to relax the allowed Examine version to be v3 or v4.

Steps:

  1. Merge PR Record v3 shipped API using Microsoft.CodeAnalysis.PublicApiAnalyzers #346 which tracks the shipped API for V3.
  2. Merge PR Fix compatibility with V3 API #347 which merges V3 into V4 and fixes any API compatibility issues.
  3. Rebase Merges the changes from the release/3.0 branch to release/4.0 branch #345 to fix any nullability / xml docs issues.
  4. Add the new APIs as unshipped to the txt files and release a beta
  5. Allow time for feedback, resolve feedback, add the new API to the shipped.txt files. (Regenerate the files so they also track API nullability annotations. This is because v3 does not make nullability claims.)
  6. Release 4.0

nzdev commented on July 29, 2024

#347 supersedes #339

bjarnef commented on July 29, 2024

If possible I think it would be great if the support for Spatial API #328 is included in v4 as well.

Something we could have used in a recent project is faceted search where one of the facets filters items within a distance, e.g. 10, 20, ... or 100 km. In that case I guess facets would be combined with spatial search.
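One way to picture such a distance facet is a set of cumulative "within N km" buckets computed from each hit's coordinates. The sketch below does this in application code with the haversine formula and an approximate mean Earth radius of 6371 km. It is a hypothetical illustration (`DistanceFacetSketch`, `HaversineKm`, `WithinCounts` are made-up names), not an Examine or Lucene.NET API. Inside the index, Lucene's range facets (e.g. `DoubleRangeFacetCounts` over a distance `ValueSource`) may be a better fit, but that has not been checked against Examine's facet API here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch, NOT an Examine or Lucene.NET API: "within N km" facet counts
// computed in application code from each hit's latitude/longitude.
public static class DistanceFacetSketch
{
    private const double EarthRadiusKm = 6371.0; // approximate mean Earth radius

    // Great-circle distance between two lat/lng points (degrees) via the haversine formula.
    public static double HaversineKm(double lat1, double lng1, double lat2, double lng2)
    {
        static double Rad(double deg) => deg * Math.PI / 180.0;
        double dLat = Rad(lat2 - lat1);
        double dLng = Rad(lng2 - lng1);
        double a = Math.Sin(dLat / 2) * Math.Sin(dLat / 2)
                 + Math.Cos(Rad(lat1)) * Math.Cos(Rad(lat2)) * Math.Sin(dLng / 2) * Math.Sin(dLng / 2);
        return 2 * EarthRadiusKm * Math.Asin(Math.Sqrt(a));
    }

    // Cumulative buckets: a hit 15 km away counts toward "within 20 km", "within 30 km", and so on.
    public static SortedDictionary<int, int> WithinCounts(
        double originLat, double originLng,
        IEnumerable<(double Lat, double Lng)> hits,
        IEnumerable<int> radiiKm)
    {
        var distances = hits.Select(h => HaversineKm(originLat, originLng, h.Lat, h.Lng)).ToList();
        var result = new SortedDictionary<int, int>();
        foreach (var radius in radiiKm)
        {
            result[radius] = distances.Count(d => d <= radius);
        }
        return result;
    }
}
```

For hits roughly 5.6 km, 11.1 km and 111.2 km from the origin, radii of 10, 20 and 100 km give counts of 1, 2 and 2.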

In this specific project we used code along the following lines to combine it with the existing (filtered) query:

public LuceneSearchResults SearchByDistance(Query query, Coordinate coordinate, int distanceInKm, QueryOptions? options = null)
{
    if (Index is not LuceneIndex luceneIndex)
        throw new InvalidOperationException($"Index {Index.Name} is not a LuceneIndex");

    int maxLevels = 11;

    // Create a SpatialStrategy
    var ctx = SpatialContext.Geo;
    var strategy = new RecursivePrefixTreeStrategy(
                    new GeohashPrefixTree(ctx, maxLevels),
                    fieldName: Constants.Examine.CourseInstance.FieldNames.GeoLocation);

    var lat = coordinate.Latitude;
    var lng = coordinate.Longitude;

    var results = DoSpatialSearch(ctx, strategy, luceneIndex, query, distanceInKm, lat, lng, options ?? QueryOptions.Default);

    return results;
}

private static LuceneSearchResults DoSpatialSearch(
    SpatialContext ctx, SpatialStrategy strategy,
    LuceneIndex index, Query q, double distanceInKm, double lat, double lng,
    QueryOptions options)
{
    var searcher = (LuceneSearcher)index.Searcher;
    var searchContext = searcher.GetSearchContext();

    using ISearcherReference searchRef = searchContext.GetSearcher();

    var indexSearcher = searchRef.IndexSearcher;

    GetXYFromCoords(lat, lng, out var x, out var y);

    var distance = DistanceUtils.Dist2Degrees(distanceInKm, DistanceUtils.EarthMeanRadiusKilometers);

    // Make a circle around the search point
    var shape = ctx.MakeCircle(x, y, distance);
    var args = new SpatialArgs(SpatialOperation.Intersects, shape);

    // Create the Lucene Filter
    var filter = strategy.MakeFilter(args);

    // Create the Lucene Query
    var query = strategy.MakeQuery(args);

    var startingPoint = ctx.MakePoint(x, y);
    var valueSource = strategy.MakeDistanceValueSource(startingPoint);

    var sortByDistance = new Sort(valueSource.GetSortField(false)).Rewrite(indexSearcher);

    ValueSourceFilter vsf = new ValueSourceFilter(new QueryWrapperFilter(query), valueSource, 0, distance);
    var filteredSpatial = new FilteredQuery(new MatchAllDocsQuery(), vsf);
    var spatialRankingQuery = new FunctionQuery(valueSource);

    IList<BooleanClause> existingClauses = ((BooleanQuery)q).GetClauses();

    BooleanQuery bq = new()
    {
        { filteredSpatial, Occur.MUST },
        { spatialRankingQuery, Occur.MUST }
    };

    // 'clause' rather than 'x' avoids clashing with the x coordinate declared above
    var includesStartDate = existingClauses.Any(clause => clause.Query.ToString().Contains("startDate"));
    foreach (var c in existingClauses)
    {
        var queryString = c.Query.ToString();
        if (queryString.Contains("latestValidDate"))
        {
            // Only add the latestValidDate range when the query has no startDate clause
            if (!includesStartDate)
            {
                bq.Add(GetRangeQuery(queryString), Occur.MUST);
            }
            continue;
        }
        if (queryString.Contains("startDate"))
        {
            bq.Add(GetRangeQuery(queryString), Occur.MUST);
            continue;
        }

        bq.Add(c);
    }

    int maxDoc = indexSearcher.IndexReader.MaxDoc;

    var maxResults = Math.Min((options.Skip + 1) * options.Take, maxDoc);
    maxResults = maxResults >= 1 ? maxResults : QueryOptions.DefaultMaxResults;

    ICollector topDocsCollector = TopFieldCollector.Create(sortByDistance, maxResults, false, false, false, false);

    indexSearcher.Search(bq, filter, topDocsCollector);

    TopDocs topDocs = ((TopFieldCollector)topDocsCollector).GetTopDocs(options.Skip, options.Take);

    var totalItemCount = topDocs.TotalHits;

    var results = new List<ISearchResult>();
    for (int i = 0; i < topDocs.ScoreDocs.Length; i++)
    {
        var result = GetSearchResult(i, topDocs, indexSearcher);
        results.Add(result);
    }

    return new LuceneSearchResults(results, totalItemCount);
}
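For reference, the `DistanceUtils.Dist2Degrees(distanceInKm, DistanceUtils.EarthMeanRadiusKilometers)` call converts a surface distance d into the angle it subtends on a sphere of radius R, in degrees, which is the unit `ctx.MakeCircle` uses for `SpatialContext.Geo`. `MakeDistanceValueSource(startingPoint)` should also report distances in degrees by default, so the `ValueSourceFilter` bounds of `0` to `distance` are in the same unit. Assuming a mean Earth radius of about 6371.0088 km, a 10 km radius works out as:

```latex
\theta_{\deg} = \frac{d}{R} \cdot \frac{180}{\pi}
\qquad\text{e.g.}\qquad
\theta_{\deg}(10\,\mathrm{km}) = \frac{10}{6371.0088} \cdot \frac{180}{\pi} \approx 0.0899^{\circ}
```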

nzdev commented on July 29, 2024

Remaining tasks

  1. Merge #349
  2. Release a beta
  3. Allow time for feedback, resolve feedback, add the new API to the shipped.txt files (cut from unshipped and add to shipped)
  4. Release 4.0

Shazwazza commented on July 29, 2024

@nzdev + @nikcio the build for a potential beta is here https://github.com/Shazwazza/Examine/actions/runs/6165490399

If anyone has time, the artifacts contain the generated NuGet package. It would be awesome if someone could test consuming it locally before I publish it to nuget.org.

nzdev commented on July 29, 2024

Works for me

nikcio commented on July 29, 2024

@Shazwazza Let's get the beast out there. I don't have time myself right now to test it but if it works for @nzdev that should be good enough to release the beta 🚀

nzdev commented on July 29, 2024

Hi @Shazwazza. Can you please publish the beta to NuGet?
Thanks

Shazwazza commented on July 29, 2024

I'm just trying to get the docfx build running against the release/v4.0 branch, but it is failing, which I think is due to having attributes on things that cannot be inherited. We have a lot of them, so it's a bit hard to go through them all. I've found a few that cannot inherit, so I will keep at it. I didn't want to tweet about the releases until the docs were up.

Shazwazza commented on July 29, 2024

Keeps failing with

[23-10-27 10:21:17.275]Error:Error extracting metadata for /github/workspace/src/Examine.Lucene/Examine.Lucene.csproj,/github/workspace/src/Examine.Core/Examine.Core.csproj,/github/workspace/src/Examine.Host/Examine.csproj: System.NullReferenceException: Object reference not set to an instance of an object
  at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited.InheritDoc (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem dest, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x0007f] in <a8c398537c454be982e8eaec7eb97dd4>:0
  at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited+<>c__DisplayClass0_0.<Run>b__1 (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem current, Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem parent) [0x00008] in <a8c398537c454be982e8eaec7eb97dd4>:0 
  at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x0000c] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0 
  at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0 
  at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0 
  at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0 
  at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited.Run (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataModel yaml, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x00013] in <a8c398537c454be982e8eaec7eb97dd4>:0 
  at Microsoft.DocAsCode.Metadata.ManagedReference.YamlMetadataResolver.ExecutePipeline (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataModel yaml, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x00015] in <a8c398537c454be982e8eaec7eb97dd4>:0 
  at Microsoft.DocAsCode.Metadata.ManagedReference.YamlMetadataResolver.ResolveMetadata (System.Collections.Generic.Dictionary`2[TKey,TValue] allMembers, System.Collections.Generic.Dictionary`2[TKey,TValue] allReferences, System.Boolean preserveRawInlineComments) [0x00092] in <a8c398537c454be982e8eaec7eb97dd4>:0 
  at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker+<ResolveAndExportYamlMetadata>d__19.MoveNext () [0x0003b] in <a8c398537c454be982e8eaec7eb97dd4>:0 
  at System.Collections.Generic.List`1[T].AddEnumerable (System.Collections.Generic.IEnumerable`1[T] enumerable) [0x00059] in <533173d24dae460899d2b10975534bb0>:0
  at System.Collections.Generic.List`1[T]..ctor (System.Collections.Generic.IEnumerable`1[T] collection) [0x00062] in <533173d24dae460899d2b10975534bb0>:0 
  at System.Linq.Enumerable.ToList[TSource] (System.Collections.Generic.IEnumerable`1[T] source) [0x00018] in <5b415632df1f4365ae2242b1a257bb5b>:0 
  at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker.SaveAllMembersFromCacheAsync () [0x00be7] in <a8c398537c454be982e8eaec7eb97dd4>:0
  at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker.ExtractMetadataAsync () [0x000c0] in <a8c398537c454be982e8eaec7eb97dd4>:0

see https://github.com/Shazwazza/Examine/actions/runs/6672776075/job/18137309467

nzdev commented on July 29, 2024

Fixed on #356 @Shazwazza

nzdev commented on July 29, 2024

I've raised a few PRs that provide abstractions for the rest of the faceting feature set.

dealloc commented on July 29, 2024

What is blocking this feature currently from being released?
We're doing a rewrite of some pretty complex software that would be greatly simplified if Examine had facets out of the box (and geospatial, but that's not in scope here).

Shazwazza commented on July 29, 2024

Nothing is blocking this, it is already released. I will close this proposal task. There's even docs for it https://shazwazza.github.io/Examine/articles/configuration.html#facets-configuration. Use the latest version of Examine for this functionality.
