Information retrieval for Romanian documents.
- Diacritics: search with or without diacritics and get the same results
- Stemming: if you search for the word mama, you can also find mamele, mamelor, mamei
- Stopwords: ignore words such as şi, în, a, cu, etc.
- Indexing and searching text files in several formats: .txt, .word, .rtf, .pdf, .html.
- Highlighter: find text snippets from a hit document, and highlight tokens matching the query.
- Limit search by last-modified date and file format.
- Java 1.8.0_73
- Apache Lucene 6.4.2
- Apache Tika 1.14
- Run main.IndexFiles to index the files under the Docs/ folder.
- Run main.SearchFiles to look for the chosen words.
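
The diacritics and stopword features listed above can be approximated in plain Java as shown here. This is only a minimal sketch and not the project's Lucene analyzer: the class name RomanianTextSketch, the methods fold and tokens, and the four-word stopword list (taken from the examples above) are assumptions made for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RomanianTextSketch {

    // Hypothetical stopword list: only the examples from the feature list,
    // stored without diacritics so they match folded tokens.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("si", "in", "a", "cu"));

    /** Strips diacritics: decompose to NFD, then drop the combining marks. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    /** Lowercases, folds diacritics, splits on non-alphanumerics, drops stopwords. */
    static List<String> tokens(String text) {
        String folded = fold(text.toLowerCase(Locale.ROOT));
        List<String> result = new ArrayList<>();
        for (String token : folded.split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints: mamica si tatic
        System.out.println(fold("mămică și tătic"));
        // prints: [mama, tata, gradina]
        System.out.println(tokens("Mama şi tata în grădină"));
    }
}
```

Applying the same folding to both the indexed text and the query is what lets a search typed without diacritics match documents written with them. Both the comma-below (ș, ț) and the older cedilla (ş, ţ) forms fold to plain s and t.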