
segmentation_strategies_speech_translation's Introduction

Segmentation Strategies

This project compared three popular voice activity detection (VAD) toolkits (voxseg, webrtcvad, and inaspeech) and examined how automatic segmentation affects a state-of-the-art multilingual end-to-end translation model compared to a cascaded one.


Manual segmentations and reference translations can be found in the corresponding sets. The automatic segmentations were produced with the three toolkits on a local machine. MFCC features were then extracted on the same machine with the Kaldi toolkit. The resulting features (covering all the different segmentations) were uploaded to Google Drive. Finally, in a GPU-accelerated Google Colab session, the end-to-end and cascaded translation models were applied to each segmentation.
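As a rough illustration of the segmentation step, frame-level speech/non-speech decisions of the kind a VAD produces can be merged into timed segments. The sketch below is not the API of any of the three toolkits. The function name `frames_to_segments`, the 30 ms frame length, and the `max_gap_frames` parameter are assumptions made for illustration only.

```python
from typing import List, Tuple


def frames_to_segments(
    is_speech: List[bool],
    frame_ms: int = 30,
    max_gap_frames: int = 0,
) -> List[Tuple[float, float]]:
    """Merge per-frame speech flags into (start_s, end_s) segments.

    Gaps of up to `max_gap_frames` non-speech frames inside a speech run are
    bridged, so a higher value yields fewer, longer segments.
    """
    segments: List[Tuple[float, float]] = []
    start = None      # index of the first frame of the currently open segment
    last_speech = -1  # index of the most recent speech frame
    for i, flag in enumerate(is_speech):
        if flag:
            if start is None:
                start = i
            last_speech = i
        elif start is not None and i - last_speech > max_gap_frames:
            # The silence exceeds the allowed gap: close the open segment.
            segments.append((start * frame_ms / 1000, (last_speech + 1) * frame_ms / 1000))
            start = None
    if start is not None:
        # The audio ended while a segment was still open.
        segments.append((start * frame_ms / 1000, (last_speech + 1) * frame_ms / 1000))
    return segments


# Hypothetical frame decisions, not taken from the experiments.
flags = [False, True, True, False, True, False, False, True]
print(frames_to_segments(flags))                    # strict: every pause splits
print(frames_to_segments(flags, max_gap_frames=1))  # one-frame pauses are bridged
```

Stricter or looser settings change the number of segments in this way, which is why the segment count is reported alongside the BLEU score in the results.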


Results

Dominating segmentation strategy employing the end-to-end translation model



| Language pair | Best segmentation | Segment count | Segment count difference* | BLEU score difference* |
| --- | --- | --- | --- | --- |
| pt-es_test | voxseg -s 0.90 | 1294 | 23.5% | 17.7% |
| pt-es_valid | voxseg -s 0.95 | 1139 | 11.7% | 18.1% |
| it-en_test | voxseg -s 0.95 | 1223 | 22.2% | 14.9% |
| it-en_valid | webrtcvad -p 2 | 1075 | 14.4% | 9.8% |
| it-es_test | voxseg -s 0.95 / inaspeech -r 0.15 / inaspeech -r 0.20 | 1223 / 1228 / 1335 | 22.2% / 22.6% / 30.8% | 15.9% |
| it-es_valid | webrtcvad -p 2 | 1075 | 14.4% | 11.7% |
| es-en_test | webrtcvad -p 0 | 1116 | 11.4% | 6.5% |
| es-en_valid | webrtcvad -p 0 / webrtcvad -p 1 | 1082 / 1117 | 11.4% / 14.6% | 9.5% |
| pt-en_test | voxseg -s 0.90 | 1294 | 23.5% | 15.1% |
| pt-en_valid | voxseg -s 0.95 / inaspeech -r 0.05 | 1139 / 1199 | 11.7% / 16.8% | 16.5% |

Table displaying the best segmentation toolkit found for each language pair, the corresponding parameter, and the number of segments created. *The table also shows the percentage difference in segment count and BLEU score relative to the results of the end-to-end translation model with manual segmentation.
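The README does not state how the percentage differences are computed. As a sketch only, two common definitions are shown here: the difference relative to the manual-segmentation value, and the symmetric difference relative to the mean of the two values. The function names and the example numbers are hypothetical.

```python
def relative_difference(value: float, reference: float) -> float:
    """Percentage difference of `value` with respect to `reference`."""
    return abs(value - reference) / abs(reference) * 100.0


def symmetric_difference(value: float, reference: float) -> float:
    """Percentage difference with respect to the mean of both values."""
    mean = (abs(value) + abs(reference)) / 2.0
    return abs(value - reference) / mean * 100.0


# Hypothetical counts, not taken from the experiments.
automatic_segments = 1200
manual_segments = 1000
print(round(relative_difference(automatic_segments, manual_segments), 1))
print(round(symmetric_difference(automatic_segments, manual_segments), 1))
```

The two definitions diverge as the gap between the automatic and manual values grows, so it matters which one is used when comparing against the figures in the table.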

Dominating segmentation strategy when employing the cascaded translation model




| Language pair | Best segmentation | Segment count | Segment count difference* | BLEU score difference* |
| --- | --- | --- | --- | --- |
| pt-es_test | voxseg -s 0.90 | 1294 | 23.5% | 19.3% |
| pt-es_valid | inaspeech -r 0.05 | 1199 | 16.8% | 19.0% |
| it-en_test | inaspeech -r 0.15 / inaspeech -r 0.20 | 1228 / 1335 | 22.6% / 30.8% | 13.1% |
| it-en_valid | webrtcvad -p 2 | 1075 | 14.4% | 11.0% |
| it-es_test | inaspeech -r 0.15 | 1228 | 22.6% | 16.6% |
| it-es_valid | voxseg -s 0.90 / webrtcvad -p 2 | 991 / 1075 | 6.2% / 14.4% | 12.8% |
| es-en_test | webrtcvad -p 0 | 1116 | 11.4% | 6.0% |
| es-en_valid | webrtcvad -p 1 | 1117 | 14.6% | 7.7% |
| pt-en_test | voxseg -s 0.90 | 1294 | 23.5% | 16.6% |
| pt-en_valid | inaspeech -r 0.05 | 1199 | 16.8% | 17.4% |

Table displaying the best segmentation toolkit found for each language pair, the corresponding parameter, and the number of segments created. *The table also shows the percentage difference in segment count and BLEU score relative to the results of the cascaded translation model with manual segmentation.
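Some rows list several segmentation configurations against a single BLEU score difference, which presumably means they tied for the best score. That reading is an assumption, since the README does not say how ties were handled. Under that assumption, selecting the best configurations for one language pair could be sketched as follows. The function name `best_configurations` and the scores are hypothetical.

```python
from typing import Dict, List


def best_configurations(bleu_by_config: Dict[str, float], tol: float = 1e-9) -> List[str]:
    """Return every configuration whose BLEU score ties for the maximum."""
    best = max(bleu_by_config.values())
    return [name for name, score in bleu_by_config.items() if best - score <= tol]


# Hypothetical BLEU scores for one language pair, for illustration only.
scores = {
    "voxseg -s 0.90": 20.1,
    "webrtcvad -p 0": 21.4,
    "webrtcvad -p 1": 21.4,
    "inaspeech -r 0.05": 19.8,
}
print(best_configurations(scores))  # ['webrtcvad -p 0', 'webrtcvad -p 1']
```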

Contributors

mihaitudor9
