
PARP

Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

The PARP paper presents a simple and efficient pruning method for discovering sparse subnetworks in self-supervised pre-trained speech models (wav2vec 2.0/XLSR-53) that can be finetuned to match the downstream low-resource ASR results of the full model. The authors conduct extensive experiments across a variety of speech tasks to demonstrate the algorithm's effectiveness at pruning and finetuning pretrained models.

PARP Steps

  1. Language-Agnostic Initial Subnetwork - Directly prune a pre-trained SSL model such as wav2vec2/XLSR at the target sparsity to obtain an initial subnetwork and an initial pruning mask. Alternatively, prune a wav2vec2/XLSR model finetuned on a non-target language.
  2. Language-Aware Subnetwork Adjustment - Finetune the initial subnetwork on the target downstream task/language. During finetuning, zero out the pruned weights specified by the pruning mask, but allow all weights to be updated by gradient descent during backpropagation. After a small number of model updates, re-prune the updated subnetwork at the target sparsity (a minimal sketch of this loop follows the list).
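
A minimal Python sketch of the prune-adjust-re-prune loop, using a toy linear layer and a placeholder loss in place of the actual wav2vec2/whisper finetuning; the toy model, the loss, and all hyperparameters here are illustrative, not the repo's:

    import torch
    import torch.nn as nn

    def magnitude_mask(weight, sparsity):
        # Keep the largest-magnitude weights; zero the lowest `sparsity` fraction.
        k = max(1, int(weight.numel() * sparsity))
        threshold = weight.abs().flatten().kthvalue(k).values
        return (weight.abs() > threshold).float()

    model = nn.Linear(64, 32)          # stand-in for a pre-trained SSL model
    sparsity = 0.5
    mask = magnitude_mask(model.weight.data, sparsity)  # step 1: initial mask
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    for step in range(100):
        model.weight.data.mul_(mask)   # zero out the currently pruned weights
        x = torch.randn(8, 64)
        loss = model(x).pow(2).mean()  # placeholder for the downstream ASR loss
        opt.zero_grad()
        loss.backward()
        opt.step()                     # pruned weights still receive gradient updates
        if (step + 1) % 20 == 0:       # step 2: re-prune after a few updates
            mask = magnitude_mask(model.weight.data, sparsity)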

One experimental limitation mentioned in the paper is that its experiments are on relatively large pre-trained models (315M parameters for wav2vec2-large and XLSR-53). It would therefore be interesting to investigate whether small pre-trained models can also be pruned and whether the paper's observations hold for them.

In this hacker role, we explore this idea and perform pruning + finetuning on two models (a quick parameter-count check follows the list) -

  1. whisper-tiny - 37.8 M parameters
  2. wav2vec2-base - 94.4 M parameters
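
One quick way to verify those sizes, assuming the standard Hugging Face checkpoints openai/whisper-tiny and facebook/wav2vec2-base (the exact count depends on which head is attached):

    from transformers import Wav2Vec2ForCTC, WhisperForConditionalGeneration

    whisper = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
    # Loading the base checkpoint with a CTC head; the head itself is newly initialized.
    wav2vec = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base")

    for name, model in [("whisper-tiny", whisper), ("wav2vec2-base", wav2vec)]:
        n_params = sum(p.numel() for p in model.parameters())
        print(f"{name}: {n_params / 1e6:.1f}M parameters")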

Instructions

  • Download the TIMIT dataset from here, untar it, and extract it into the data directory
  • Set up your Hugging Face login token (with write permissions) and use it when prompted while running the code
  • The notebooks parp.ipynb and whisper_tiny.ipynb contain the full pipeline, from pruning the models to training them, for wav2vec2-base and whisper-tiny respectively
  • Training in the notebooks takes a long time, so you can instead run the Python training scripts:
    huggingface-cli login --token $HUGGINGFACE_TOKEN --add-to-git-credential
    
    python3 train.py
    
    train.py trains wav2vec2-base and w_train.py trains whisper-tiny
  • Note that porting the pipeline to Whisper took significant effort, since the authors originally used wav2vec models; the tokenizer's batch_decode still fails during the last evaluation step of re-training
  • Detailed results for wav2vec2-base can be seen at https://huggingface.co/atishayj25/parp-wave2vec, where the final version reached a WER of 0.34 after 2500 steps (a sketch of such a WER computation follows this list)
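
For reference, a WER like the 0.34 above can be computed with the Hugging Face evaluate library; the transcripts below are dummies, not outputs from our models:

    import evaluate

    wer_metric = evaluate.load("wer")
    predictions = ["the cat sat on the mat"]
    references = ["the cat sat on a mat"]
    # WER = (substitutions + insertions + deletions) / reference word count
    print(wer_metric.compute(predictions=predictions, references=references))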

Pruning Method

  • In the paper, the authors discovered that subnetworks identified through task-specific and task-agnostic pruning methods share significant similarities.
  • Specifically, they used different initialization techniques for different scenarios, such as MPI on wav2vec2 and XLSR or OMP on a different spoken language for H2L and CSR, and MPI on wav2vec2 for LSR.
  • We used pruning_method = prune.L1Unstructured, which prunes (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm, that is, plain unstructured magnitude pruning (see the sketch after this list).
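
A minimal sketch of that call via torch.nn.utils.prune, applied globally over a toy model's linear layers; the model and the 90% sparsity are placeholders, not the repo's settings:

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 4))

    # Collect (module, parameter-name) pairs to prune jointly.
    parameters_to_prune = [
        (m, "weight") for m in model.modules() if isinstance(m, nn.Linear)
    ]

    # Zero out the 90% of weights with the smallest magnitude (L1-norm), globally.
    prune.global_unstructured(
        parameters_to_prune,
        pruning_method=prune.L1Unstructured,
        amount=0.9,
    )

    # Each pruned module now carries a weight_mask buffer; make pruning permanent.
    for m, _ in parameters_to_prune:
        prune.remove(m, "weight")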

Dataset

The dataset we used for training is TIMIT, a corpus of read speech designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of ASR systems. The TIMIT dataset can be downloaded from here; a loading sketch follows.
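
One way to load the extracted data with the Hugging Face datasets library, assuming TIMIT was extracted into ./data as in the instructions above and a datasets version that still ships the timit_asr loading script:

    from datasets import load_dataset

    # data_dir should point at the directory where TIMIT was extracted
    timit = load_dataset("timit_asr", data_dir="data")
    print(timit["train"][0]["text"])  # transcript of the first training utterance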

References

  • Lai, C.-I. J., Zhang, Y., Liu, A. H., Chang, S., Liao, Y.-L., Chuang, Y.-S., Qian, K., Khurana, S., Cox, D., and Glass, J. PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition. NeurIPS 2021.
