This repository contains code for text classification of IMDB movie reviews. The goal is to predict whether a movie review is positive or negative. The dataset consists of movie reviews from the Internet Movie Database (IMDB).
The IMDB movie reviews dataset is used for training and evaluation. The dataset contains movie reviews as text, and each review is labeled as either positive or negative sentiment.
- The dataset is loaded using the
tensorflow_datasets
library, which provides the IMDB_reviews/subwords8k dataset with subword tokenization. - The training and test data are shuffled and padded to a fixed length using
padded_batch
. - The tokenizer is obtained from the dataset's information and used to convert text data to sequences and pad them.
- The LSTM-based model architecture consists of an Embedding layer followed by two Bidirectional LSTM layers to capture sequential information from the text.
- The model has a Dense layer for feature extraction and a final Dense layer with a sigmoid activation function for binary classification (positive or negative).
- The model is trained using binary cross-entropy loss and the Adam optimizer for 10 epochs.
- The training and validation accuracy and loss are plotted to assess the model's performance.
- The dataset is loaded using
tensorflow_datasets
, and the movie reviews and their corresponding labels are extracted. - The sentences are converted to sequences using
Tokenizer
, and the sequences are padded to a fixed length.
- The embedding-based model architecture consists of an Embedding layer followed by a Flatten layer to flatten the sequence data.
- The model then has a Dense layer for feature extraction and a final Dense layer with a sigmoid activation function for binary classification.
- The model is trained using binary cross-entropy loss and the Adam optimizer for 100 epochs.
- The training and validation accuracy and loss are plotted to assess the model's performance.
- The
save_for_embedding
function is included to save the word embeddings for visualization using the Embedding Projector.
Feel free to experiment with different hyperparameters, model architectures, or other methods to further improve the classification accuracy.
For any questions or suggestions, please contact Francesco Alotto. Happy movie review classification with AI! ๐ฅ๐ค