This repo is my attempt at implementing a search engine. I'm interested in NLP, so I'd like to incorporate that aspect into the project. I don't know yet whether an NLP approach to the search engine problem is even valid or highly impractical.
- Importing a large number of websites into a database [:heavy_check_mark:]
- Downloading a large number of wiki pages [:heavy_check_mark:]
- Importing them into a MySQL database [:heavy_check_mark:]
- Deciding which indexing method sounds practical and interesting (document-term matrix maybe) [:heavy_check_mark:] (went for a reverse index, also known as an inverted index)
- Indexing the websites in the decided manner [:heavy_check_mark:]
- Trying to match search queries to documents by their similarity. Or isn't that how this works? I don't know yet. [:heavy_check_mark:] (results are listed sorted by their tf-idf score)
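
The last three steps (reverse index, indexing, tf-idf ranking) can be sketched roughly as below. This is only a minimal illustration, not the code in this repo: it keeps everything in an in-memory dict instead of MySQL, the names `tokenize`, `build_index`, and `search` are made up for the example, and the scoring (raw term count times `log(N / df)`, summed over query terms) is just one common tf-idf variant that I'm assuming here.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase + whitespace tokenizer (assumption; real text needs more care).
    return text.lower().split()


def build_index(docs):
    """Build a reverse (inverted) index: term -> {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(query, index, n_docs):
    """Return doc ids sorted by summed tf-idf over the query terms (hypothetical scoring)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no document
        # idf: terms found in fewer documents weigh more.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=scores.get, reverse=True)


# Toy documents standing in for the imported wiki pages.
docs = {
    1: "python is a programming language",
    2: "a python is a large snake",
    3: "search engines index documents",
}
index = build_index(docs)
print(search("python snake", index, len(docs)))  # [2, 1]
```

Document 2 ranks first because it contains both query terms, and "snake" is rarer than "python" across the toy collection, so it contributes a higher idf.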