Comments (31)
@nikcio I started having a look through all this yesterday and so far all I can say is what an amazing job you've done so far. I'll keep reviewing over this week and we can determine if there are any tweaks necessary. @nzdev also thanks a ton for your recent PRs and help. Hopefully can get this all merged in for xmas/NY and get what will most likely be a new major release out.
from examine.
Yeah for sure, sorry it's been a hectic month 😕 will get it out tomorrow. Thanks so much for pushing this along and all your support.
from examine.
@bergmania the Apache Lucene Faceted Search User's Guide says the following under 2.2 Facet Associations:
So far we've discussed categories as binary features, where a document either belongs to a category, or not.
While counts are useful in most situations, they are sometimes not sufficiently informative for the user, with respect to deciding which subcategory is more important to display.
For this, the facets package allows to associate a value with a category. The search time interpretation of the associated value is application dependent. For example, a possible interpretation is as a match level (e.g., confidence level). This value can then be used so that a document that is very weakly associated with a certain category will only contribute little to this category's aggregated weight.
So it's possible to use this feature to get floating-point values. I'm not quite sure myself exactly how this is configured and used, so I mostly based the value type on the FacetResult type from the Lucene.Net.Facet package, because it's the output of the different facet readers in Lucene.NET.
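To make the quoted passage a bit more concrete, here is a minimal sketch of why an aggregated facet value can end up as a float rather than an integer count. It does not use the Lucene.NET or Examine APIs at all: `CategoryAssociation`, `FacetAssociationSketch` and `AggregateWeights` are hypothetical names made up for illustration, and the weights are invented sample data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for illustration only; not part of Examine or Lucene.NET.
// Weight plays the role of the "associated value" (e.g. a confidence level).
public record CategoryAssociation(string DocId, string Category, float Weight);

public static class FacetAssociationSketch
{
    // Sums the association weights per category instead of counting documents.
    public static Dictionary<string, float> AggregateWeights(
        IEnumerable<CategoryAssociation> associations) =>
        associations
            .GroupBy(a => a.Category)
            .ToDictionary(g => g.Key, g => g.Sum(a => a.Weight));

    public static void Main()
    {
        // Invented sample data: doc2 is only weakly associated with "cameras".
        var associations = new[]
        {
            new CategoryAssociation("doc1", "cameras", 1.0f),
            new CategoryAssociation("doc2", "cameras", 0.25f),
            new CategoryAssociation("doc2", "lenses", 0.5f),
        };

        foreach (var (category, weight) in AggregateWeights(associations))
        {
            Console.WriteLine($"{category}: {weight}");
        }
    }
}
```

With plain counting, "cameras" would get a value of 2. With the weights above it aggregates to 1.25, which is the kind of value an integer/long type could not represent.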
from examine.
@nikcio Sorry to keep you waiting on this one, I have all of these starred in my inbox and will get to them soon, just a bit swamped this week.
from examine.
Added support for efficient deep paging (SearchAfter) for faceted and non-faceted search in #321.
from examine.
@dealloc I believe #311 is very close to being done. Just waiting for the stars to align😅🤞
(I'm a little unsure what we are waiting on to be honest 😬😅)
from examine.
Status:
Here is how I see the status of the Examine repo. What do you think @Shazwazza and @nzdev?
Merged PRs (Release/4.0)
Needs to be merged from the Release/3.0 branch - This is done in #345
Still needs to be merged into Release/4.0
Still needs to be done
[This list is based on #345 being merged into release/4.0]
The massive amount of work that has been going on has created some warnings in the project, some more important than others. Here is a rundown of the ones I think we should look at before a stable 4.0 release.
- Null reference warnings
LuceneIndex.cs - line 1212
LuceneIndex.cs - line 1288
- XML warnings
- There are a few places where XML docs are still missing.
Other PRs that could probably come in a future release:
Stale PRs?
Preview/Alpha/Beta release
I think the best way to get a feel for what still needs to be done is to make an early release and ask the community to test it out. This can be done when the current PRs in the "Still needs to be merged into Release/4.0" list and #345 are merged in. @Shazwazza
from examine.
I think for now it would make sense to merge the pr that helps with V3 compatibility and then release a v4 as it's already a big change. After that it's possible to introduce another v4.x release with the other APIs for filtering, function queries, facet drill down and spatial.
from examine.
Would it be possible to have a Beta or RC build of a release with Facets feature of v3/v4?
from examine.
@Shazwazza any update on this? 😊 we would love to test this further and we have potential projects where facets would be useful, both in terms of commerce or regular Umbraco content.
from examine.
Just getting betas out now, just pushed 3.2.0-beta https://github.com/Shazwazza/Examine/releases/tag/v3.2.0-beta.9
from examine.
And this one out now too https://github.com/Shazwazza/Examine/releases/tag/v4.0.0-beta.1
from examine.
Wow @nikcio 🤩
Good work on this proposal :)
From a user's point of view, I also like approach 2 the most, but it requires more of the implementations.
Sorry for my lack of knowledge, but in what case will IFacetValue.Value not be an integer/long?
from examine.
@Shazwazza I've created some PRs that could implement this proposal and add some great functionality and documentation to Examine. Please let me know if I can do anything to help the PRs along.
PRs:
#311 (Facet implementation)
#312 (XML docs on facets and in the project where missing)
#313 (Nullable feature for project and facets feature)
from examine.
WIP https://github.com/nzdev/Examine/tree/v3/feature/facet-taxonomy Facet Taxonomy Index support. Needed for Hierarchical facets. Also is something like 20%-25% faster according to Lucene.Net docs.
from examine.
Has there been an update on this?
from examine.
I'm wondering if the facetconfig class could be abstracted away by setting hierarchy/ multi facets on the index fields instead
from examine.
Just to keep this thread continuous also see #345 (comment) (From @nzdev )
Here's what I'm thinking. Have the next release of Examine be 4.0, but avoid breaking API changes for 3.x. This means Umbraco 10 and 12 can choose to relax the allowed Examine version to be v3 or v4.
Steps:
- Merge PR "Record v3 shipped API using Microsoft.CodeAnalysis.PublicApiAnalyzers" #346, which tracks the shipped API for V3.
- Merge PR "Fix compatibility with V3 API" #347, which merges V3 into V4 and fixes any API compatibility issues.
- Rebase "Merges the changes from the release/3.0 branch to release/4.0 branch" #345 to fix any nullability / XML docs issues.
- Add the new APIs as unshipped to the txt files and release a beta.
- Allow time for feedback, resolve feedback, add the new API to the shipped.txt files. (Regenerate the files to include tracking of API nullability annotations. This is needed because v3 does not make nullability claims.)
- Release 4.0
from examine.
If possible I think it would be great if the support for Spatial API #328 is included in v4 as well.
Something we could have used in a recent project is faceted search where one of the facets is a search for items within a distance, e.g. 10, 20, .. or 100 km. In that case I guess facets would be combined with spatial search.
In this specific project we used something like this to combine it with the existing (filtered) query.
public LuceneSearchResults SearchByDistance(Query query, Coordinate coordinate, int distanceInKm, QueryOptions? options = null)
{
    if (Index is not LuceneIndex luceneIndex)
    {
        throw new InvalidOperationException($"Index {Index.Name} is not a LuceneIndex");
    }

    int maxLevels = 11;

    // Create a SpatialStrategy
    var ctx = SpatialContext.Geo;
    var strategy = new RecursivePrefixTreeStrategy(
        new GeohashPrefixTree(ctx, maxLevels),
        fieldName: Constants.Examine.CourseInstance.FieldNames.GeoLocation);

    var lat = coordinate.Latitude;
    var lng = coordinate.Longitude;

    var results = DoSpatialSearch(ctx, strategy, luceneIndex, query, distanceInKm, lat, lng, options ?? QueryOptions.Default);
    return results;
}

private static LuceneSearchResults DoSpatialSearch(
    SpatialContext ctx, SpatialStrategy strategy,
    LuceneIndex index, Query q, double distanceInKm, double lat, double lng,
    QueryOptions options)
{
    var searcher = (LuceneSearcher)index.Searcher;
    var searchContext = searcher.GetSearchContext();

    using ISearcherReference searchRef = searchContext.GetSearcher();
    var indexSearcher = searchRef.IndexSearcher;

    GetXYFromCoords(lat, lng, out var x, out var y);

    // Convert the search radius from kilometers to degrees
    var distance = DistanceUtils.Dist2Degrees(distanceInKm, DistanceUtils.EarthMeanRadiusKilometers);

    // Make a circle around the search point
    var shape = ctx.MakeCircle(x, y, distance);
    var args = new SpatialArgs(SpatialOperation.Intersects, shape);

    // Create the Lucene Filter
    var filter = strategy.MakeFilter(args);

    // Create the Lucene Query
    var query = strategy.MakeQuery(args);

    // Sort by distance from the search point (ascending)
    var startingPoint = ctx.MakePoint(x, y);
    var valueSource = strategy.MakeDistanceValueSource(startingPoint);
    var sortByDistance = new Sort(valueSource.GetSortField(false)).Rewrite(indexSearcher);

    ValueSourceFilter vsf = new ValueSourceFilter(new QueryWrapperFilter(query), valueSource, 0, distance);
    var filteredSpatial = new FilteredQuery(new MatchAllDocsQuery(), vsf);
    var spatialRankingQuery = new FunctionQuery(valueSource);

    // Note: this cast assumes the incoming query is a BooleanQuery
    IList<BooleanClause> existingClauses = ((BooleanQuery)q).GetClauses();

    BooleanQuery bq = new()
    {
        { filteredSpatial, Occur.MUST },
        { spatialRankingQuery, Occur.MUST }
    };

    var includesStartDate = existingClauses.Any(x => x.Query.ToString().Contains("startDate"));

    // Copy the existing clauses, rebuilding the date clauses as range queries
    foreach (var c in existingClauses)
    {
        var queryString = c.Query.ToString();

        if (queryString.Contains("latestValidDate"))
        {
            // Only keep latestValidDate when there is no startDate clause
            if (!includesStartDate)
            {
                bq.Add(GetRangeQuery(queryString), Occur.MUST);
            }

            continue;
        }

        if (queryString.Contains("startDate"))
        {
            bq.Add(GetRangeQuery(queryString), Occur.MUST);
            continue;
        }

        bq.Add(c);
    }

    int maxDoc = indexSearcher.IndexReader.MaxDoc;
    var maxResults = Math.Min((options.Skip + 1) * options.Take, maxDoc);
    maxResults = maxResults >= 1 ? maxResults : QueryOptions.DefaultMaxResults;

    ICollector topDocsCollector = TopFieldCollector.Create(sortByDistance, maxResults, false, false, false, false);
    indexSearcher.Search(bq, filter, topDocsCollector);

    TopDocs topDocs = ((TopFieldCollector)topDocsCollector).GetTopDocs(options.Skip, options.Take);
    var totalItemCount = topDocs.TotalHits;

    var results = new List<ISearchResult>();
    for (int i = 0; i < topDocs.ScoreDocs.Length; i++)
    {
        var result = GetSearchResult(i, topDocs, indexSearcher);
        results.Add(result);
    }

    return new LuceneSearchResults(results, totalItemCount);
}
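
For anyone wondering what the kilometers-to-degrees step in the snippet above amounts to, here is a small standalone sketch of the arithmetic as I understand it: arc length divided by radius gives the angle in radians, which is then converted to degrees. It does not call Spatial4n; `DistanceSketch` and `KmToDegrees` are hypothetical names, and the radius constant is my assumption of what `DistanceUtils.EarthMeanRadiusKilometers` holds.

```csharp
using System;

public static class DistanceSketch
{
    // Mean Earth radius in km. Assumption: this matches the value behind
    // DistanceUtils.EarthMeanRadiusKilometers.
    public const double EarthMeanRadiusKm = 6371.0087714;

    // Arc length / radius gives the angle in radians; convert that to degrees.
    public static double KmToDegrees(double distanceInKm) =>
        distanceInKm / EarthMeanRadiusKm * (180.0 / Math.PI);

    public static void Main()
    {
        // The distance buckets mentioned above: 10, 20 and 100 km.
        foreach (var km in new[] { 10.0, 20.0, 100.0 })
        {
            Console.WriteLine($"{km} km = {KmToDegrees(km):F4} degrees");
        }
    }
}
```

So a 10 km radius comes out at roughly 0.09 degrees, which is the value that ends up as the circle radius and the upper bound of the distance filter.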
from examine.
Remaining tasks
- Merge #349
- Release a beta
- Allow time for feedback, resolve feedback, add new api to the shipped.txt files (Cut from unshipped and add to shipped)
- Release 4.0
from examine.
@nzdev + @nikcio the build for a potential beta is here https://github.com/Shazwazza/Examine/actions/runs/6165490399
If anyone has time, the artifacts have the created Nuget package, would be awesome if someone could test consuming that locally before I publish it to nuget.org?
from examine.
Works for me
from examine.
@Shazwazza Let's get the beast out there. I don't have time myself right now to test it but if it works for @nzdev that should be good enough to release the beta 🚀
from examine.
Hi @Shazwazza . Can you please publish the beta to Nuget.
Thanks
from examine.
I'm just trying to get the docfx build running against the release/v4.0 branch, but it is failing. I think this is due to having attributes on things that cannot be inherited, but we have a lot of them, so it's a bit hard to go through them all. I've found a few that cannot inherit, so I will keep at it. I didn't want to tweet the releases until the docs were up.
from examine.
Keeps failing with
[23-10-27 10:21:17.275]Error:Error extracting metadata for /github/workspace/src/Examine.Lucene/Examine.Lucene.csproj,/github/workspace/src/Examine.Core/Examine.Core.csproj,/github/workspace/src/Examine.Host/Examine.csproj: System.NullReferenceException: Object reference not set to an instance of an object
at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited.InheritDoc (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem dest, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x0007f] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited+<>c__DisplayClass0_0.<Run>b__1 (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem current, Microsoft.DocAsCode.Metadata.ManagedReference.MetadataItem parent) [0x00008] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x0000c] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0
at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0
at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0
at Microsoft.DocAsCode.Common.TreeIterator.Preorder[T] (T current, T parent, System.Func`2[T,TResult] childrenGetter, System.Func`3[T1,T2,TResult] action) [0x00036] in <f27dcd834d6d4f32ac0a576c1732f2f1>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.CopyInherited.Run (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataModel yaml, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x00013] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.YamlMetadataResolver.ExecutePipeline (Microsoft.DocAsCode.Metadata.ManagedReference.MetadataModel yaml, Microsoft.DocAsCode.Metadata.ManagedReference.ResolverContext context) [0x00015] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.YamlMetadataResolver.ResolveMetadata (System.Collections.Generic.Dictionary`2[TKey,TValue] allMembers, System.Collections.Generic.Dictionary`2[TKey,TValue] allReferences, System.Boolean preserveRawInlineComments) [0x00092] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker+<ResolveAndExportYamlMetadata>d__19.MoveNext () [0x0003b] in <a8c398537c454be982e8eaec7eb97dd4>:0
at System.Collections.Generic.List`1[T].AddEnumerable (System.Collections.Generic.IEnumerable`1[T] enumerable) [0x00059] in <533173d24dae460899d2b10975534bb0>:0
at System.Collections.Generic.List`1[T]..ctor (System.Collections.Generic.IEnumerable`1[T] collection) [0x00062] in <533173d24dae460899d2b10975534bb0>:0
at System.Linq.Enumerable.ToList[TSource] (System.Collections.Generic.IEnumerable`1[T] source) [0x00018] in <5b415632df1f4365ae2242b1a257bb5b>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker.SaveAllMembersFromCacheAsync () [0x00be7] in <a8c398537c454be982e8eaec7eb97dd4>:0
at Microsoft.DocAsCode.Metadata.ManagedReference.ExtractMetadataWorker.ExtractMetadataAsync () [0x000c0] in <a8c398537c454be982e8eaec7eb97dd4>:0
see https://github.com/Shazwazza/Examine/actions/runs/6672776075/job/18137309467
from examine.
Fixed on #356 @Shazwazza
from examine.
I've raised a few PRs that provide abstractions for the rest of the faceting feature set.
from examine.
What is blocking this feature currently from being released?
We're doing a rewrite of some pretty complex software that would be greatly simplified if Examine had facets out of the box (and geospatial, but that's out of scope here).
from examine.
Nothing is blocking this, it is already released. I will close this proposal task. There's even docs for it https://shazwazza.github.io/Examine/articles/configuration.html#facets-configuration. Use the latest version of Examine for this functionality.
from examine.