Hi,
LuceneRDD will not help you in this scenario. However, if you still have the data that was used to build the prebuilt Lucene index, LuceneRDD might help: you can re-index that data into a "sharded" Lucene index (one Lucene index per Spark partition) and then process your large corpus against it.
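To make the "sharded index" idea concrete, here is a minimal plain-Python sketch of the scatter/gather pattern. It involves no Spark or Lucene and is not LuceneRDD's API. Assumptions: documents are `(id, text)` pairs, shards are assigned round-robin, and relevance is a simple term-frequency count rather than Lucene's BM25. The function names `build_shards`, `search_shard`, `search_all` and `link` are hypothetical. Each shard is searched independently, and the per-shard top-k lists are merged into a global top-k.

```python
from collections import defaultdict
import heapq


def build_shards(docs, num_shards):
    """Hypothetical helper: split (doc_id, text) pairs round-robin into shards.

    Each shard gets its own inverted index: term -> {doc_id: term_frequency}.
    """
    shards = [defaultdict(dict) for _ in range(num_shards)]
    for i, (doc_id, text) in enumerate(docs):
        index = shards[i % num_shards]
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return shards


def _rank_key(hit):
    # Highest score first; ties broken by the smaller doc_id so results are deterministic.
    doc_id, score = hit
    return (-score, doc_id)


def search_shard(index, query, k):
    """Score documents in ONE shard by summed term frequency (not BM25) and keep the top k."""
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, tf in index.get(term, {}).items():
            scores[doc_id] += tf
    return heapq.nsmallest(k, scores.items(), key=_rank_key)


def search_all(shards, query, k):
    """Scatter the query to every shard, then gather and merge the per-shard top-k lists."""
    partial = [hit for shard in shards for hit in search_shard(shard, query, k)]
    return heapq.nsmallest(k, partial, key=_rank_key)


def link(corpus, shards, k):
    """Hypothetical linkage step: query the sharded index once per record of the large corpus."""
    return {rec_id: search_all(shards, text, k) for rec_id, text in corpus}
```

Because each document lives in exactly one shard and the toy score depends only on that document, merging the per-shard top-k lists yields the exact global top-k. With BM25, shard-local term statistics can make scores differ slightly from those of a single unsharded index.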
from spark-lucenerdd.
Related Issues
- Improve test coverage on Lucene Analyzers per field
- Remove dependency on sbt-spark-package
- Support Scala 2.12
- Update to SBT 1.x
- [Implicits] Support MapType for Spark DataFrames
- [question]want to knnSearch on for every record from other data frame
- Label Entity Linkage tasks using `sc.setJobGroup`
- Improve logging
- Why is indexing entering a loop?
- Weird results when running it distributed vs local
- Help debugging a OOM issue when the search population increases
- Question about blockdedup and call to count()
- Typesafe config is generating the error UTFDataFormatException: encoded string too long
- Serialization Issue with org.apache.lucene.facet.FacetsConfig
- RDD is removing null columns on fuzzy linking
- How to search with lucenerdd in another queries' rdd?
- Greatly improved linking performance
- Elasticsearch Snapshots?
- Compiler warnings
- Spark 3.4.1 no longer ships slf4j