
microsoft / inmt-lite


Interactive Neural Machine Translation-lite (INMT-lite) is a framework to train and develop lite versions (.tflite) of models for neural machine translation (NMT) that can run on embedded devices, such as mobile phones and tablets, with low computation power and storage. The generated .tflite models can be used to build the offline version of INMT mobile, a mobile version of INMT web.

License: MIT License

Python 3.24% Shell 0.06% Jupyter Notebook 6.00% CMake 0.43% C 0.02% C++ 89.53% Kotlin 0.38% Java 0.10% PowerShell 0.10% Roff 0.16%

inmt-lite's Issues

Constructing a script for automatic Android build on providing model config

The generated model currently has to be manually pasted into the app's asset folder, and some code has to be changed by hand to build the app.

A script that builds the app automatically from a model configuration (specifying all model parameters, including the vocabulary and model path) would streamline the process.
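A minimal sketch of such a script, assuming hypothetical config keys (`model_path`, `src_vocab_path`, `tgt_vocab_path`) and a standard Gradle-wrapper project layout — the actual asset paths and build targets would depend on the app's structure:

```python
import json
import shutil
import subprocess
from pathlib import Path

def stage_model_assets(config: dict, assets_dir: str) -> Path:
    """Copy the TFLite model and vocabulary files named in `config`
    into the Android app's assets folder and write a config manifest."""
    assets = Path(assets_dir)
    assets.mkdir(parents=True, exist_ok=True)
    for key in ("model_path", "src_vocab_path", "tgt_vocab_path"):
        shutil.copy(config[key], assets / Path(config[key]).name)
    manifest = assets / "model_config.json"
    manifest.write_text(json.dumps(config, indent=2))
    return manifest

def build_app(project_dir: str) -> None:
    """Invoke the project's Gradle wrapper to assemble a debug APK.
    (Use gradlew.bat on Windows.)"""
    subprocess.run(["./gradlew", "assembleDebug"], cwd=project_dir, check=True)
```

With this in place, a single command could stage the assets and kick off the build, instead of manual copy-and-edit steps.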

Training Convergence

Hi,
Thank you for your great work.
I followed your instructions and tried to train a TFLite model, but it looks like the model does not converge well.

The loss is as follows:
Epoch 1 Batch 0 Loss 10.1416
Epoch 1 Batch 100 Loss 6.4002
Epoch 1 Batch 200 Loss 5.1489
Epoch 1 Training Loss 6.0853
Time taken for 1 epoch 76.00192999839783 sec

Epoch 1 Validation Loss 11.4574
Time taken for validation 2.3923656940460205 sec

......

Epoch 99 Batch 0 Loss 1.8126
Epoch 99 Batch 100 Loss 2.3800
Epoch 99 Batch 200 Loss 2.4746
Epoch 99 Training Loss 2.4855
Time taken for 1 epoch 52.326969385147095 sec

Epoch 99 Validation Loss 25.6011
Time taken for validation 2.445949077606201 sec

Epoch 100 Batch 0 Loss 2.5111
Epoch 100 Batch 100 Loss 2.7126
Epoch 100 Batch 200 Loss 2.5386
Epoch 100 Training Loss 2.5118
Time taken for 1 epoch 56.2775776386261 sec

Epoch 100 Validation Loss 25.9406
Time taken for validation 6.213912010192871 sec

I also tested the accuracy (ACC) of the model; it is only around 8.3.

What should I do to improve the performance of the model?
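The log above shows training loss falling while validation loss climbs from ~11.5 to ~25.9, which is the classic signature of overfitting. One common remedy (not specific to this repository's code) is to stop training once validation loss stops improving. A minimal early-stopping tracker, as a sketch:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record this epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Checked against the posted run, this would halt training within a few epochs of the epoch-1 validation minimum rather than continuing to epoch 100. Regularization (dropout, smaller hidden size) or more training data are complementary fixes.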

Build Data Pipelines for training on larger datasets

Presently, the model can train on 320,000 sentence pairs consisting of 14 tokens each on a Tesla P100 GPU. The dataset is loaded into memory all at once.

Constructing data pipelines would allow loading only one batch of data into memory at a time, enabling training on larger datasets.
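The idea can be sketched framework-free with lazy generators (in TensorFlow itself, `tf.data` would be the idiomatic equivalent). This is an illustrative sketch, not the repository's implementation; the file format (parallel source/target line files) is assumed:

```python
from itertools import islice
from typing import Iterator, List, Tuple

def read_pairs(src_path: str, tgt_path: str) -> Iterator[Tuple[str, str]]:
    """Stream sentence pairs line by line instead of loading both files into memory."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for s, t in zip(src, tgt):
            yield s.strip(), t.strip()

def batched(pairs: Iterator[Tuple[str, str]],
            batch_size: int) -> Iterator[List[Tuple[str, str]]]:
    """Yield fixed-size batches; only one batch is held in memory at a time."""
    it = iter(pairs)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch
```

Because nothing is materialized beyond the current batch, the corpus size is bounded by disk rather than RAM.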

Model performance

Hello experts,

Thank you for your contribution.

I tried to train a model with 90,000 English-to-Spanish sentence pairs, but the performance of my model is not good.

I tried changing recurrent_hidden to 1024, and I also changed --src_word_vec_size and --tgt_word_vec_size as you recommended, but the problem remains.

Below I show the results for 20 epochs. As you can see, the validation loss is bad.

Could you help me, please?

  Epoch 10 Batch 0 Loss 5.6960
  Epoch 10 Batch 100 Loss 5.3570
  Epoch 10 Training Loss 5.4667
  Time taken for 1 epoch 529.2065329551697 sec
  
  Epoch 10 Validation Loss 10.6312
  Time taken for validation 60.65908360481262 sec
  
  Epoch 11 Batch 0 Loss 4.7367
  Epoch 11 Batch 100 Loss 5.1832
  Epoch 11 Training Loss 5.1685
  Time taken for 1 epoch 529.9154849052429 sec
  
  Epoch 11 Validation Loss 10.8019
  Time taken for validation 59.725284814834595 sec
  
  Epoch 12 Batch 0 Loss 4.7126
  Epoch 12 Batch 100 Loss 5.1652
  Epoch 12 Training Loss 5.1701
  Time taken for 1 epoch 532.0332324504852 sec
  
  Epoch 12 Validation Loss 11.5809
  Time taken for validation 60.39210081100464 sec
  
  Epoch 13 Batch 0 Loss 4.7674
  Epoch 13 Batch 100 Loss 5.1166
  Epoch 13 Training Loss 5.0376
  Time taken for 1 epoch 527.922877073288 sec
  
  Epoch 13 Validation Loss 10.9454
  Time taken for validation 60.049590826034546 sec
  
  Epoch 14 Batch 0 Loss 4.6097
  Epoch 14 Batch 100 Loss 5.0026
  Epoch 14 Training Loss 4.9551
  Time taken for 1 epoch 533.555011510849 sec
  
  Epoch 14 Validation Loss 11.2711
  Time taken for validation 62.934301137924194 sec
  
  Epoch 15 Batch 0 Loss 4.5907
  Epoch 15 Batch 100 Loss 4.6141
  Epoch 15 Training Loss 4.8234
  Time taken for 1 epoch 530.599461555481 sec
  
  Epoch 15 Validation Loss 11.9423
  Time taken for validation 59.88873314857483 sec
  
  Epoch 16 Batch 0 Loss 4.5649
  Epoch 16 Batch 100 Loss 4.8182
  Epoch 16 Training Loss 4.7378
  Time taken for 1 epoch 533.0982737541199 sec
  
  Epoch 16 Validation Loss 11.9251
  Time taken for validation 62.00506854057312 sec
  
  Epoch 17 Batch 0 Loss 4.2148
  Epoch 17 Batch 100 Loss 4.1748
  Epoch 17 Training Loss 4.5810
  Time taken for 1 epoch 530.5870060920715 sec
  
  Epoch 17 Validation Loss 11.7330
  Time taken for validation 60.24014854431152 sec
  
  Epoch 18 Batch 0 Loss 4.2759
  Epoch 18 Batch 100 Loss 4.6324
  Epoch 18 Training Loss 4.5561
  Time taken for 1 epoch 534.9815211296082 sec
  
  Epoch 18 Validation Loss 12.2905
  Time taken for validation 61.87094283103943 sec
  
  Epoch 19 Batch 0 Loss 4.3838
  Epoch 19 Batch 100 Loss 4.7557
  Epoch 19 Training Loss 4.4805
  Time taken for 1 epoch 533.8275711536407 sec
  
  Epoch 19 Validation Loss 12.4593
  Time taken for validation 62.41018629074097 sec
  
  Epoch 20 Batch 0 Loss 4.0156
  Epoch 20 Batch 100 Loss 4.3589
  Epoch 20 Training Loss 4.3792
  Time taken for 1 epoch 533.2257282733917 sec
  
  Epoch 20 Validation Loss 12.6180
  Time taken for validation 63.08395957946777 sec

Missing files

You mentioned that for making the model Android-compatible, "We use an entirely different tokenization procedure."
Could you let us know where these files are?

  1. Run final_tokenizer_train.py
  2. Run spm_extractor.py

I couldn't find them in GitHub.

Plotting training graph

We need to provide a mechanism to plot training and validation graphs by storing the metrics.
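A minimal sketch of such a mechanism: a logger that records per-epoch losses, persists them to CSV, and plots them on demand. The class name and methods are hypothetical; matplotlib is imported lazily so the training loop does not require it:

```python
import csv

class MetricsLogger:
    """Record per-epoch training/validation losses, export them, and plot them."""

    def __init__(self):
        self.history = {"epoch": [], "train_loss": [], "val_loss": []}

    def log(self, epoch: int, train_loss: float, val_loss: float) -> None:
        self.history["epoch"].append(epoch)
        self.history["train_loss"].append(train_loss)
        self.history["val_loss"].append(val_loss)

    def to_csv(self, path: str) -> None:
        """Persist the metrics so graphs can be regenerated after training."""
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(self.history.keys())
            writer.writerows(zip(*self.history.values()))

    def plot(self, path: str) -> None:
        """Save a training-vs-validation loss curve (requires matplotlib)."""
        import matplotlib.pyplot as plt  # optional dependency, imported lazily
        plt.plot(self.history["epoch"], self.history["train_loss"], label="train")
        plt.plot(self.history["epoch"], self.history["val_loss"], label="validation")
        plt.xlabel("epoch")
        plt.ylabel("loss")
        plt.legend()
        plt.savefig(path)
```

Calling `log(...)` once per epoch from the training loop, then `to_csv` and `plot` at the end, would cover both the storage and the graphing halves of this issue.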
