deepessay's Introduction

🚀 Grade your IELTS essay with BERT

Welcome to the IELTS Essay Grading Web Application! This web app is designed to provide users with a convenient and efficient way to have their IELTS essays assessed and receive a predicted score using a Machine Learning model.

⭐ Features

  • Submit Essays: Users can submit their IELTS essays directly through the web application. The process is user-friendly and straightforward.

  • Machine Learning Essay Grading: The heart of this application is a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model. This model analyzes and assesses the submitted essays, considering a variety of linguistic and structural aspects.

  • Predicted Score: After processing the essay, the application provides users with a predicted IELTS score. This score is an estimate of how the essay might be rated in the actual IELTS exam, helping users gauge their writing proficiency.

  • Warning functionality: The application includes a warning feature that checks the submitted text. It will display a warning if the essay is too short or if the text does not meet the minimum requirements. This ensures that users are provided with guidance on submitting valid essays.
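
The warning check described above could look roughly like the sketch below. This is a minimal illustration, not the project's actual text_validation.py: the function name `validate_essay`, the `MIN_WORDS` threshold, and the warning messages are all assumptions, since the README does not state the real rules.

```python
import re

# Hypothetical threshold: the app's real minimum length is not stated in this README.
MIN_WORDS = 150


def validate_essay(text, min_words=MIN_WORDS):
    """Return a list of warning messages; an empty list means the text passed."""
    warnings = []
    # Count runs of letters (and apostrophes) as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not words:
        warnings.append("The text contains no words.")
    elif len(words) < min_words:
        warnings.append(
            f"The essay is too short: {len(words)} words, "
            f"expected at least {min_words}."
        )
    return warnings


if __name__ == "__main__":
    for message in validate_essay("Too short."):
        print(message)
```

Returning a list of messages rather than a single boolean lets the web layer decide whether to render the warning page or pass the text on to the model.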

📊 Model choice

A detailed training overview, including EDA and feature engineering, can be found in the notebook (IELTS_Grading_with_BERT.ipynb).
Dataset: IELTS Writing Scored Essays Dataset

After analysing different approaches, I decided to continue with three models:

  1. BERT fine-tuned for a regression task
  2. BERT output concatenated with numerical features
  3. BERT output concatenated with numerical and binary features

The model structures and corresponding Mean Absolute Error (MAE) metrics are shown in the figures below:

[Figure: Models structure]
[Figure: Models MAE]

Although the more complex models (2 and 3) achieve a lower MAE, after testing I chose the text-only model (1) for deployment because it has lower latency.
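
For reference, MAE is the average absolute difference between the predicted and the true band scores. The sketch below shows the calculation in plain Python; the band values are made up for illustration and are not results from this project.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted| over all essays."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only (not taken from the project's evaluation).
true_bands = [6.0, 7.5, 5.5, 8.0]
predicted_bands = [6.5, 7.0, 5.5, 7.0]

mae = mean_absolute_error(true_bands, predicted_bands)

if __name__ == "__main__":
    print(mae)  # 0.5
```

An MAE of 0.5 here means the predictions are off by half a band on average.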

🧰 Tech Stack

  • Framework: Flask
  • NLP: TensorFlow, BERT, Hugging Face Transformers, Sklearn
  • Deployment: Docker, Microsoft Azure
  • Frontend: HTML, CSS, JavaScript
  • Version Control: Git, GitHub
  • Testing: REST client

📁 Project structure

+---app
|   |   main.py
|   |   text_validation.py
|   |   __init__.py
|   |
|   +---ML
|   |   |   pipeline.py
|   |   |   __init__.py
|   |   |
|   |   \---models
|   |       +---training_bert_num
|   |       |
|   |       +---training_bert_num_bin
|   |       |
|   |       \---training_bert_text
|   |   
|   +---static
|   |
|   \---templates
|         index.html
|         warning.html
|   
+---assets
|
|   .gitignore
|   Dockerfile
|   IELTS_Grading_with_BERT.ipynb
|   LICENSE
|   README.md
\   requirements.txt

💻 Run Locally

  1. Clone the project
  git clone https://github.com/Logisx/IELTS-Grading.git
  2. Go to the project directory
  cd IELTS-Grading
  3. Install dependencies
  pip install -r requirements.txt
  4. Train a model in the notebook and save the weights to:
  ./app/ML/models/training_bert_text
  5. Start the server
  python app/main.py

🗺️ Roadmap

  1. Testing features: Develop unit tests and integration tests.
  2. Data collection: Aggregate more data to improve accuracy.
  3. Educational insights feature: Along with the score, the application will offer insights and suggestions for improvement, making it a valuable educational tool for those looking to enhance their writing skills.

⚖️ License

MIT License

deepessay's People

Contributors: logisx

deepessay's Issues

Could not open ./app/ML/models/training_bert_text/cp.ckpt

Hi, I'm getting the following error after running deepessay:

If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
2024-06-19 16:09:36.180746: W tensorflow/core/util/tensor_slice_reader.cc:98] Could not open ./app/ML/models/training_bert_text/cp.ckpt: FAILED_PRECONDITION: app/ML/models/training_bert_text/cp.ckpt; Is a directory: perhaps your file is in a different file format and you need to use a different restore operator?

directory index

app/ML/models/
├── training_bert_num
│   └── cp.ckpt
│       ├── assets
│       ├── fingerprint.pb
│       ├── keras_metadata.pb
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
├── training_bert_num_bin
│   └── cp.ckpt
│       ├── assets
│       ├── fingerprint.pb
│       ├── keras_metadata.pb
│       ├── saved_model.pb
│       └── variables
│           ├── variables.data-00000-of-00001
│           └── variables.index
└── training_bert_text
    └── cp.ckpt
        ├── assets
        ├── fingerprint.pb
        ├── keras_metadata.pb
        ├── saved_model.pb
        └── variables
            ├── variables.data-00000-of-00001
            └── variables.index

While I can run deepessay despite this error, I'm not sure whether it might cause problems.
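
The listing above shows cp.ckpt as a directory containing saved_model.pb and a variables folder, which is the TensorFlow SavedModel layout. A weights checkpoint prefix would instead appear as sibling files such as cp.ckpt.index and cp.ckpt.data-00000-of-00001. That mismatch is a plausible source of the "Is a directory" warning, though this is an assumption and is not confirmed by the project. The sketch below only inspects the files on disk to tell the two layouts apart; the function name `describe_checkpoint_path` is hypothetical and it does not call TensorFlow.

```python
import glob
import os


def describe_checkpoint_path(path):
    """Classify what is stored at `path` based only on the files present."""
    # SavedModel layout: a directory that contains saved_model.pb.
    if os.path.isdir(path) and os.path.isfile(os.path.join(path, "saved_model.pb")):
        return "saved_model"
    # Weights checkpoint layout: `path` is a prefix, with <path>.index
    # and one or more <path>.data-* shard files next to it.
    if os.path.isfile(path + ".index") and glob.glob(glob.escape(path) + ".data-*"):
        return "tf_checkpoint"
    return "unknown"


if __name__ == "__main__":
    print(describe_checkpoint_path("./app/ML/models/training_bert_text/cp.ckpt"))
```

If this reports "saved_model", the path was written as a full model rather than as a weights-only checkpoint, which may be worth checking against how the notebook saves its models.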

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 135: invalid start byte

UnicodeDecodeError Traceback (most recent call last)
File ~\AppData\Local\Temp\ipykernel_16216\777528428.py:1
----> 1 bert_text_model.load_weights('training_bert_text/cp.ckpt')
2 bert_num_model.load_weights('training_bert_num/cp.ckpt')
3 bert_num_binary_model.load_weights('training_bert_num_bin/cp.ckpt')

File D:\PYH\Lib\site-packages\keras\src\utils\traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb

File D:\PYH\Lib\site-packages\tensorflow\python\training\py_checkpoint_reader.py:92, in NewCheckpointReader(filepattern)
83 """A function that returns a CheckPointReader.
84
85 Args:
(...)
89 A CheckpointReader object.
90 """
91 try:
---> 92 return CheckpointReader(compat.as_bytes(filepattern))
93 # TODO(b/143319754): Remove the RuntimeError casting logic once we resolve the
94 # issue with throwing python exceptions from C++.
95 except RuntimeError as e:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbe in position 135: invalid start byte
