Handwritten character recognition using PyTorch and EMNIST
Create a conda environment with the packages listed in environment.yml using the following command:
conda env create -f environment.yml
A new conda environment called "emnist" will be created.
Activate the conda environment:
conda activate emnist
The EMNIST handwritten character recognition dataset is downloaded automatically by the code, and the data is organized into the appropriate folders for PyTorch to access.
Organizing the data may take around 30 minutes.
As an alternative, you can download the organized data as a zip file from https://drive.google.com/drive/folders/1JLE0kz9ctZ4HI2vA6gZbec1MY1so5QLK?usp=sharing (download size: ~420 MB; ~3.2 GB after extraction) and extract it into the folder ./data/emnist.
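If you prefer to extract the downloaded archive from Python rather than by hand, a minimal sketch could look like the following. The function name `extract_dataset` is illustrative and not part of this repository, and the sketch assumes the archive contains the train and test folders at its top level; if it wraps them in another folder, adjust the target directory accordingly.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir="./data/emnist"):
    """Extract the downloaded archive into target_dir.

    Returns the sorted names of the top-level entries found in target_dir
    after extraction, so the result can be checked by eye.
    """
    target = Path(target_dir)
    # Create ./data/emnist (and parents) if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return sorted(entry.name for entry in target.iterdir())
```

For example, `extract_dataset("emnist.zip")` would extract a file named emnist.zip (a placeholder name; use the name of the file you actually downloaded) and return the top-level entries now present in ./data/emnist.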
The data directory should have the following structure:
./data/emnist
- train
  - 0/<images>
  - 1/<images>
  ...
- test
  - 0/<images>
  - 1/<images>
  ...
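To confirm that the data ended up in this layout, a small check such as the one below may help. It uses only the Python standard library; the function names `summarize_split` and `check_layout` are illustrative and not part of this repository. One folder per class is also the layout that torchvision.datasets.ImageFolder reads, although whether the training code uses that class is an assumption here.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Return {class_folder_name: number_of_files} for one split directory."""
    counts = {}
    for class_dir in sorted(Path(split_dir).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1 for f in class_dir.iterdir() if f.is_file()
            )
    return counts


def check_layout(root="./data/emnist"):
    """Check that root contains train/ and test/ and count files per class."""
    root = Path(root)
    summary = {}
    for split in ("train", "test"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        summary[split] = summarize_split(split_dir)
    return summary
```

Calling `check_layout()` from the repository root returns a dictionary with one entry per split, each mapping class folder names to file counts. An empty or missing class folder usually means the organizing step did not finish.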
To train and test, run the following:
cd src/
python doall.py
The checkpoints are saved under the folder ./scratch/.
A demo notebook with a trained checkpoint is available here: https://github.com/InnovArul/emnist-pytorch/blob/master/demos/character_recognition_demo.ipynb
The Imbalanced Dataset Sampler is copied from https://github.com/ufoym/imbalanced-dataset-sampler
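For readers unfamiliar with the idea behind such a sampler, the sketch below illustrates it in plain Python: each sample is weighted by the inverse of its class frequency and indices are then drawn with replacement, so rare classes appear about as often as common ones. This is an illustration of the general technique, not the code copied into this repository, and the function names are hypothetical.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each sample by 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


def sample_balanced_indices(labels, num_samples, seed=None):
    """Draw sample indices with replacement, proportional to the weights.

    Every class has the same total weight, so each class is drawn with
    roughly equal probability regardless of its size.
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)
```

With labels made of 90 samples of one class and 10 of another, the drawn indices should split roughly evenly between the two classes, whereas uniform sampling would pick the larger class about 90% of the time.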