multilingual_similarity_compare's Introduction

  • 👋 Hi, I’m Mastafa.
  • 👀 I’m interested in Machine Learning. As a Data Scientist at Microsoft, I help shape solutions that increase satisfaction across Microsoft products, currently focusing on Microsoft Copilot. As a teacher, I help people understand NLP models and how to use them in practical solutions.
  • 🌱 I’m currently learning all sorts of things in Deep Learning and Statistics.
  • 💞️ I’m looking to collaborate on anything cutting edge!
  • 📫 How to reach me: through my Medium or my LinkedIn.

multilingual_similarity_compare's People

Contributors

mastafaf

multilingual_similarity_compare's Issues

Why is mean pooling giving such bad performance?

Input:

! sh similarity_distilBERT_batch.sh 50 mean True

Output:

Confusion matrix:
langs     cs       de       en       es       fr      avg
cs     0.00%   99.97%   99.97%   99.97%   99.97%   99.97%
de   100.00%    0.00%  100.00%  100.00%  100.00%  100.00%
en    99.97%   99.97%    0.00%   99.97%   99.97%   99.97%
es    99.97%   99.97%   99.93%    0.00%  100.00%   99.97%
fr    99.97%   99.97%   99.97%   99.97%    0.00%   99.97%
avg   99.98%   99.97%   99.97%   99.98%   99.98%   99.97%
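The issue does not state what went wrong with mean pooling. One common cause of degenerate mean-pooled sentence embeddings is averaging over padding positions along with the real tokens, so the sketch below shows masked mean pooling as a point of comparison. It is a minimal pure-Python illustration under that assumption, not the repository's code; `masked_mean_pool` and `naive_mean_pool` are hypothetical names, and the toy vectors are made up.

```python
def masked_mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens only (mask == 1) for one sentence."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("sentence has no unmasked tokens")
    hidden = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(hidden)]


def naive_mean_pool(token_vectors):
    """Average over every position, padding included (the suspected pitfall)."""
    hidden = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / len(token_vectors)
            for i in range(hidden)]


# Toy sentence: 2 real tokens followed by 2 padding positions, hidden size 3.
# The padding vectors are arbitrary values standing in for model output.
tokens = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 5.0],
          [6.0, 7.0, 8.0],     # padding position
          [9.0, 10.0, 11.0]]   # padding position
mask = [1, 1, 0, 0]

pooled = masked_mean_pool(tokens, mask)  # uses the first two rows only
naive = naive_mean_pool(tokens)          # polluted by the padding rows
print(pooled, naive)
```

With tensors, the same idea is usually written as multiplying the token embeddings by the expanded attention mask, summing over the sequence axis, and dividing by the mask sum.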

Test XLM-R with mean pooling after solving issue on mean pooling

Input:

sh similarity_XLM-R_batch.sh 40 mean True 

Output:

Confusion matrix:
langs     cs       de       en       es       fr      avg
cs     0.00%   91.31%   97.64%   98.37%   94.34%   95.41%
de    93.84%    0.00%   88.05%   92.87%   95.40%   92.54%
en    91.21%   77.79%    0.00%   72.56%   94.77%   84.08%
es    95.47%   93.14%   61.84%    0.00%   90.28%   85.18%
fr    91.97%   81.25%   77.12%   71.96%    0.00%   80.58%
avg   93.12%   85.87%   81.16%   83.94%   93.70%   87.56%

Input:

sh similarity_XLM-R_batch.sh 100 mean True 

Output:

Confusion matrix:
langs     cs       de       en       es       fr      avg
cs     0.00%   91.24%   97.77%   97.80%   92.11%   94.73%
de    94.31%    0.00%   86.55%   92.44%   94.67%   91.99%
en    91.08%   74.93%    0.00%   71.56%   77.69%   78.81%
es    95.37%   88.38%   58.54%    0.00%   90.38%   83.17%
fr    92.71%   91.58%   87.41%   89.14%    0.00%   90.21%
avg   93.36%   86.53%   82.57%   87.74%   88.71%   87.78%
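The output does not say how the percentages are computed. One plausible reading, stated here as an assumption, is a cross-lingual retrieval error rate: for each sentence in the row language, find the nearest sentence in the column language by cosine similarity and count a miss when it is not the aligned translation. Under that reading, and if the test set is the 3003 sentences mentioned in the timing notes, 99.97% would correspond to 3002 misses out of 3003. The sketch below uses hypothetical names (`cosine`, `retrieval_error_rate`) and made-up toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieval_error_rate(src, tgt):
    """Fraction of source sentences whose nearest target is not the aligned one.

    Assumes src[i] and tgt[i] are embeddings of the same sentence in two languages.
    """
    errors = 0
    for i, u in enumerate(src):
        best = max(range(len(tgt)), key=lambda j: cosine(u, tgt[j]))
        if best != i:
            errors += 1
    return errors / len(src)


# Toy embeddings: the first three pairs align well, the fourth does not.
src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
tgt = [[1.0, 0.1], [0.1, 1.0], [1.0, 0.9], [1.0, 0.2]]

rate = retrieval_error_rate(src, tgt)
print(f"error rate: {rate:.2%}")
```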

Time comparison between GPU and CPU

  • distilBERT with CPU, MAX_LEN = 100:
    100% 3003/3003 [06:54<00:00, 7.30it/s]
    100% 3003/3003 [07:15<00:00, 7.25it/s]
    100% 3003/3003 [07:06<00:00, 7.75it/s]
    100% 3003/3003 [06:35<00:00, 7.91it/s]
    100% 3003/3003 [06:31<00:00, 8.07it/s]

  • distilBERT with GPU, MAX_LEN = 100:
    LENGTH OF TOTAL DOCUMENT 3003
    -- Encoder: 3003 sentences in 35s
    LENGTH OF TOTAL DOCUMENT 3003
    -- Encoder: 3003 sentences in 35s
    LENGTH OF TOTAL DOCUMENT 3003
    -- Encoder: 3003 sentences in 35s
    LENGTH OF TOTAL DOCUMENT 3003
    -- Encoder: 3003 sentences in 35s
    LENGTH OF TOTAL DOCUMENT 3003
    -- Encoder: 3003 sentences in 34s
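To put the two sets of timings side by side, the short calculation below averages the five runs of each and takes the ratio. It assumes the elapsed times in the CPU progress bars and the GPU "Encoder" timings cover the same work, which the logs do not confirm, so the result is a rough estimate only.

```python
def to_seconds(mmss):
    """Convert an 'MM:SS' elapsed-time string into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Elapsed times copied from the logs above (3003 sentences per run).
cpu_runs = ["06:54", "07:15", "07:06", "06:35", "06:31"]
gpu_runs = [35, 35, 35, 35, 34]  # seconds

cpu_mean = sum(to_seconds(t) for t in cpu_runs) / len(cpu_runs)
gpu_mean = sum(gpu_runs) / len(gpu_runs)
speedup = cpu_mean / gpu_mean

print(f"CPU mean: {cpu_mean:.1f}s, GPU mean: {gpu_mean:.1f}s, speedup: {speedup:.1f}x")
# prints "CPU mean: 412.2s, GPU mean: 34.8s, speedup: 11.8x"
```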
