Welcome to the ultimate battle of attention-free vision models! 🔥 In this exciting project, we pit three heavyweight contenders against each other in the ring of image classification. Who will emerge victorious? Let's find out!
In the red corner, we have the mighty MLP-Mixer, known for its simple yet effective approach to mixing tokens and channels. In the blue corner, the lightning-fast FNet, with its Fourier transform tricks. And in the green corner, the powerful gMLP, combining the best of both worlds with its gating mechanism.
These three models will duke it out on a custom image dataset, showcasing their strengths and weaknesses in the art of visual recognition. Will MLP-Mixer's straightforward approach knock out the competition? Can FNet's speed outpace its rivals? Or will gMLP's clever gating technique claim the championship belt?
Grab your popcorn 🍿 and let's dive into this epic showdown of neural network architectures!
- Implementation of three modern attention-free vision architectures:
  - MLP-Mixer: The all-MLP architecture for image classification
  - FNet: The Fourier transform-based speedster
  - gMLP: The gated MLP with spatial projections
- Custom dataset support for your own image classification tasks
- Comprehensive evaluation and visualization of model performance
- Easy-to-use training and evaluation pipeline
- Clone this repository of champions:

```bash
git clone https://github.com/utkarshpophli/MLP-fusion-image-classifier.git
cd MLP-fusion-image-classifier
```
- Create and activate a virtual environment:

```bash
conda create -n venv python=3.10
conda activate venv
```
- Install the required dependencies:

```bash
pip install -r requirements.txt
```
- Prepare your image dataset and organize it in the following structure:

```
photozilla/
├── class1/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
├── class2/
│   ├── image1.jpg
│   ├── image2.jpg
│   └── ...
└── ...
```

- Place the `photozilla` directory in the project root.
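The layout above maps each class subdirectory to a label. A minimal sketch of how such a tree can be enumerated into `(image_path, label)` pairs (the function name `list_dataset` is illustrative; in practice TensorFlow's `tf.keras.utils.image_dataset_from_directory` handles this automatically):

```python
from pathlib import Path

def list_dataset(root):
    """Enumerate a class-per-folder image tree into (path, label) pairs."""
    root = Path(root)
    # Each immediate subdirectory is one class; sort for stable label ids.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    samples = []
    for label, name in enumerate(class_names):
        for img in sorted((root / name).glob("*.jpg")):
            samples.append((str(img), label))
    return class_names, samples
```

This is the same convention the dataset loaders in TensorFlow follow: labels are assigned by the alphabetical order of the class folder names.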
Run the main script to start the epic battle:

```bash
python main.py
```
Sit back and watch as the models train, evaluate, and compete for supremacy!
After our neural network gladiators battled it out in the arena of image classification, here are some of the exciting results we've gathered:
| Model     | Accuracy | Top-5 Accuracy |
|-----------|----------|----------------|
| MLP-Mixer | 92.3%    | 99.1%          |
| FNet      | 90.8%    | 98.7%          |
| gMLP      | 91.5%    | 98.9%          |
- MLP-Mixer emerged as the champion in overall accuracy, showcasing its strength in mixing spatial and channel information.
- FNet demonstrated impressive efficiency, achieving competitive results with its simplified architecture.
- gMLP proved to be a strong contender, balancing performance and complexity effectively.
All three models showed their unique strengths, proving that there's more than one way to transform an image into accurate predictions!
- Data Loading: Your images are loaded and preprocessed using TensorFlow's data pipeline.
- Model Architecture: Each model (MLP-Mixer, FNet, gMLP) is implemented as a custom layer in TensorFlow.
- Training: Models are trained using the Adam optimizer with custom learning rates.
- Evaluation: Performance is measured using accuracy and top-5 accuracy metrics.
- Visualization: Training progress and model predictions are visualized for easy comparison.
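To make the "mixing" idea concrete: FNet's trick is to replace self-attention with an unparameterized 2D Fourier transform over the token and channel dimensions, keeping only the real part. A conceptual sketch (NumPy is used here for brevity; the project's actual layers are TensorFlow):

```python
import numpy as np

def fourier_mix(x):
    """FNet-style mixing: 2D FFT over the last two axes (tokens, channels),
    keeping only the real part. Replaces self-attention with no learned weights."""
    return np.real(np.fft.fft2(x, axes=(-2, -1)))

# x has shape (batch, tokens, channels); output shape is unchanged.
x = np.random.randn(2, 16, 64)
mixed = fourier_mix(x)
```

Because the FFT has no trainable parameters and runs in O(n log n), this is the main reason FNet trains faster than its attention-based and MLP-based rivals.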
Feel free to tweak the hyperparameters in `config.py` to optimize performance for your specific dataset. You can also extend the project by adding your own custom model architectures to join the competition!
Got ideas to make this showdown even more exciting? Contributions are welcome! Feel free to open issues or submit pull requests to improve the project.
This project is licensed under the MIT License - see the LICENSE file for details.
- Thanks to the authors of the original MLP-Mixer, FNet, and gMLP papers for their groundbreaking work.
- Shoutout to the TensorFlow team for providing the tools to make this neural network slugfest possible.
Now, let the games begin! May the best model win! 🏆