Implements a Scrapy spider that crawls TripAdvisor for restaurant reviews.
Project is created with:
- Python version: 3..
- Scrapy
To run this project, install the dependencies locally and start the spider:
pip install -r requirements.txt
scrapy crawl ReviewRestoTA --overwrite-output=TA_reviews/scrapped_data/scrapped_data.jl
The scraped data is written to ../trip_advisor_scrap/TA_reviews/TA_reviews/scrapped_data.
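The `.jl` extension corresponds to Scrapy's JSON Lines feed format, which stores one JSON object per line. A minimal sketch of parsing such output with the standard library follows. The `parse_json_lines` helper and the `restaurant`, `rating`, and `review` field names are hypothetical, since the spider's actual item fields are not listed here.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of JSON Lines strings into a list of dicts."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines
            records.append(json.loads(line))
    return records


# Hypothetical sample rows; the real field names depend on the spider's items.
sample = [
    '{"restaurant": "Chez Example", "rating": 4, "review": "Great food."}',
    '{"restaurant": "Chez Example", "rating": 2, "review": "Slow service."}',
]
reviews = parse_json_lines(sample)
print(len(reviews), reviews[0]["rating"])
```

To load the real output, pass an open file handle instead of the sample list, for example `with open("TA_reviews/scrapped_data/scrapped_data.jl", encoding="utf-8") as f: reviews = parse_json_lines(f)`.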
Preprocessed the data obtained in the first part of the project. The main delivery is in Deliverable.ipynb. Steps performed:
- Data cleaning
- Data exploration
- Tokenization, stemming, and lemmatization
- TF-IDF
Project is created with:
- Python version: 3..
- nltk
- pycld2
- Wordcloud
Install the dependencies before running the notebook, and make sure that all steps from Deliverable 1 have been completed.
pip install -r requirements.txt
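To make the tokenization and TF-IDF steps concrete, here is a minimal plain-Python sketch. It is an illustration only, not the notebook's implementation: the `tokenize` and `tf_idf` helpers, the ASCII-only regex tokenizer, and the unsmoothed `tf * log(N / df)` weighting are all assumptions, and library implementations typically apply different smoothing and normalization.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into ASCII alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up example reviews, for illustration only.
docs = ["The food was great", "The service was slow", "Great food, great view"]
scores = tf_idf(docs)
print(scores[2])
```

In the third example review, "view" receives the highest weight because it appears in no other document, even though "great" occurs twice.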
Applied data augmentation and embedding methods to the preprocessed data from Deliverable 2. The main delivery is in Deliverable3.ipynb. Steps performed:
- Data augmentation
- Word2Vec
- LSI
- FastText
- SVD
Project is created with:
- Python version: 3..
- gensim
- sklearn
- nltk
Install the dependencies before running the notebook, and make sure that all steps from Deliverable 2 have been completed.
pip install -r requirements.txt
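The augmentation technique used in the notebook is not described here. As one illustration of the general idea, the sketch below uses random word swapping, a simple text augmentation method chosen purely as an assumption. The `random_swap` helper and its parameters are hypothetical and do not come from the project.

```python
import random


def random_swap(text, n_swaps=1, seed=None):
    """Return a copy of `text` with `n_swaps` random pairs of words exchanged."""
    rng = random.Random(seed)  # a seed makes the result reproducible
    words = text.split()
    if len(words) < 2:
        return text  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)  # two distinct positions
        words[i], words[j] = words[j], words[i]
    return " ".join(words)


# Made-up review, for illustration only.
original = "the food was great and the staff were friendly"
augmented = random_swap(original, n_swaps=2, seed=0)
print(augmented)
```

Swapping keeps the same set of words, so bag-of-words features such as unigram TF-IDF are unchanged, while order-sensitive models see a new variant of the sentence.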