
conformer's Introduction

Conformer

Implementation of the convolutional module from the Conformer paper, for improving the local inductive bias in Transformers.

Install

$ pip install conformer

Usage

The Conformer convolutional module, the main novelty of the paper

import torch
from conformer import ConformerConvModule

layer = ConformerConvModule(
    dim = 512,
    causal = False,             # auto-regressive or not - 1d conv will be made causal with padding if so
    expansion_factor = 2,       # what multiple of the dimension to expand for the depthwise convolution
    kernel_size = 31,           # kernel size, 17 - 31 was said to be optimal
    dropout = 0.                # dropout at the very end
)

x = torch.randn(1, 1024, 512)
x = layer(x) + x

A single Conformer block

import torch
from conformer import ConformerBlock

block = ConformerBlock(
    dim = 512,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    conv_expansion_factor = 2,
    conv_kernel_size = 31,
    attn_dropout = 0.,
    ff_dropout = 0.,
    conv_dropout = 0.
)

x = torch.randn(1, 1024, 512)

block(x) # (1, 1024, 512)
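A rough sketch of how one block might be wired, assuming the "macaron" layout from the paper: a half-step feed-forward, self-attention, the convolution module, a second half-step feed-forward, and a final LayerNorm. `ConformerBlockSketch` and its helpers are hypothetical. To keep it short, it swaps the library's relative positional attention for `torch.nn.MultiheadAttention` (no positional encoding at all; `batch_first` needs PyTorch 1.9 or later) and uses one `dropout` value for every submodule.

```python
import torch
from torch import nn


class Transpose(nn.Module):
    """Swap the sequence and channel axes so Conv1d can run on (batch, seq, dim) inputs."""

    def forward(self, x):
        return x.transpose(1, 2)


def feed_forward(dim, mult=4, dropout=0.0):
    # pre-norm FFN with Swish (SiLU)
    return nn.Sequential(
        nn.LayerNorm(dim),
        nn.Linear(dim, dim * mult),
        nn.SiLU(),
        nn.Dropout(dropout),
        nn.Linear(dim * mult, dim),
        nn.Dropout(dropout),
    )


def conv_module(dim, expansion_factor=2, kernel_size=31, dropout=0.0):
    inner_dim = dim * expansion_factor
    return nn.Sequential(
        nn.LayerNorm(dim),
        Transpose(),
        nn.Conv1d(dim, inner_dim * 2, 1),
        nn.GLU(dim=1),
        # odd kernel_size assumed, so padding = kernel_size // 2 preserves the length
        nn.Conv1d(inner_dim, inner_dim, kernel_size, padding=kernel_size // 2, groups=inner_dim),
        nn.BatchNorm1d(inner_dim),
        nn.SiLU(),
        nn.Conv1d(inner_dim, dim, 1),
        Transpose(),
        nn.Dropout(dropout),
    )


class ConformerBlockSketch(nn.Module):
    def __init__(self, dim, heads=8, ff_mult=4, conv_expansion_factor=2,
                 conv_kernel_size=31, dropout=0.0):
        super().__init__()
        self.ff1 = feed_forward(dim, ff_mult, dropout)
        self.attn_norm = nn.LayerNorm(dim)
        # assumption: plain multi-head attention stands in for relative positional attention
        self.attn = nn.MultiheadAttention(dim, heads, dropout=dropout, batch_first=True)
        self.conv = conv_module(dim, conv_expansion_factor, conv_kernel_size, dropout)
        self.ff2 = feed_forward(dim, ff_mult, dropout)
        self.post_norm = nn.LayerNorm(dim)

    def forward(self, x):
        x = x + 0.5 * self.ff1(x)                              # first half-step FFN
        h = self.attn_norm(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]      # self-attention
        x = x + self.conv(x)                                   # convolution module
        x = x + 0.5 * self.ff2(x)                              # second half-step FFN
        return self.post_norm(x)


block = ConformerBlockSketch(dim=512, heads=8)
x = torch.randn(1, 1024, 512)
print(block(x).shape)  # torch.Size([1, 1024, 512])
```

With `dim = 512` and `heads = 8`, `nn.MultiheadAttention` uses 64-dimensional heads, which matches `dim_head = 64` in the usage above.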

Conformer: simply a stack of the ConformerBlock modules shown above

import torch
from conformer import Conformer

conformer = Conformer(
    dim = 512,
    depth = 12,          # 12 blocks
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    conv_expansion_factor = 2,
    conv_kernel_size = 31,
    attn_dropout = 0.,
    ff_dropout = 0.,
    conv_dropout = 0.
)

x = torch.randn(1, 1024, 512)

conformer(x) # (1, 1024, 512)

Todo

  • switch to a better relative positional encoding; Shaw's is dated
  • flash attention with a better RPE

Citations

@misc{gulati2020conformer,
    title   = {Conformer: Convolution-augmented Transformer for Speech Recognition},
    author  = {Anmol Gulati and James Qin and Chung-Cheng Chiu and Niki Parmar and Yu Zhang and Jiahui Yu and Wei Han and Shibo Wang and Zhengdong Zhang and Yonghui Wu and Ruoming Pang},
    year    = {2020},
    eprint  = {2005.08100},
    archivePrefix = {arXiv},
    primaryClass = {eess.AS}
}

conformer's People

Contributors

lucidrains

conformer's Issues

Can you add the training process?

I am new to speech processing and cannot work out how to train this model. If you could add it, it would be a great starting point.

Conformer encoder model architecture

Hi @lucidrains, here's the Conformer encoder model architecture; it has a few other modules:

I have not started working on it yet. Should I open a PR to include the Multi-Headed Self-Attention module and the Feed Forward module?

Default expansion factor of 2 for inner dimension in conv module not present in other implementations

Hey lucidrains, firstly thanks for providing this implementation!

I've noticed that in the ConformerConvModule the inner dimension is increased by an expansion_factor, which defaults to 2:
inner_dim = dim * expansion_factor
In the implementations by espnet (https://espnet.github.io/espnet/_modules/espnet/nets/pytorch_backend/conformer/convolution.html) and nvidia (https://github.com/NVIDIA/NeMo/blob/94a464fc4eb2927140940cc835a0ab69ee0347b5/nemo/collections/asr/parts/submodules/conformer_modules.py), the dimensions of the convolutional module are the same as the encoder dimensions, i.e. there is no expansion.

Obviously this is only a default and is configurable, but I'm wondering whether it is unintentionally the default, or whether you think (or have found) it to be a good use of parameters?

The Conformer paper is a bit sparse on details, but I think the expansion factor it refers to corresponds to the out_channels of the pointwise convolution, which is also applied here, i.e.
nn.Conv1d(dim, inner_dim * 2, 1),

Thanks! (:
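To put numbers on the question, here is a sketch that counts trainable parameters for the layer layout assumed here: LayerNorm, a pointwise conv to `inner_dim * 2` channels (as in the line quoted above), GLU, depthwise conv, BatchNorm, Swish, and a pointwise conv back to `dim`. `conv_module_param_count` is a hypothetical helper, and the count would differ if the real module's layout differs.

```python
from torch import nn


def conv_module_param_count(dim, expansion_factor, kernel_size=31):
    """Count trainable parameters of the assumed conv-module layout (GLU and Swish have none)."""
    inner_dim = dim * expansion_factor
    layers = nn.ModuleList([
        nn.LayerNorm(dim),
        nn.Conv1d(dim, inner_dim * 2, 1),                                 # pointwise conv feeding the GLU
        nn.Conv1d(inner_dim, inner_dim, kernel_size, groups=inner_dim),   # depthwise conv
        nn.BatchNorm1d(inner_dim),                                        # running stats are buffers, not parameters
        nn.Conv1d(inner_dim, dim, 1),                                     # pointwise conv back to dim
    ])
    return sum(p.numel() for p in layers.parameters())


for factor in (1, 2):
    print(factor, conv_module_param_count(512, factor))
```

With `dim = 512` and `kernel_size = 31` this prints 806400 for `expansion_factor = 1` and 1611264 for `expansion_factor = 2`. The two pointwise convolutions dominate (the depthwise convolution contributes only 16,384 or 32,768 parameters), so under this layout the default roughly doubles the module's size.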

'context' in Attention Module

Hi, thanks for releasing this code. I'm enjoying playing with this.

Could you please explain what 'context' is in your Attention module? (link)

I guess it plays a similar role to the reused hidden states of the previous segment in the Transformer-XL paper (link; explained in Section 3.2), but the way you implemented it is a little different.
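One way to read `context`, sketched under assumptions: queries are always computed from `x`, while keys and values are computed from `context`, which falls back to `x` for ordinary self-attention. This matches the `self.to_q(x)` / `self.to_kv(context)` line quoted in a later issue, but `ContextAttentionSketch` is hypothetical and omits masking and relative positions. Passing a cached previous segment concatenated with `x` as `context` gives something Transformer-XL-like, although Transformer-XL's relative positional scheme is not reproduced here.

```python
import torch
from torch import nn


class ContextAttentionSketch(nn.Module):
    """Hypothetical attention: queries from x, keys/values from `context`, which defaults to x."""

    def __init__(self, dim, heads=8, dim_head=64):
        super().__init__()
        inner_dim = heads * dim_head
        self.heads = heads
        self.scale = dim_head ** -0.5
        self.to_q = nn.Linear(dim, inner_dim, bias=False)
        self.to_kv = nn.Linear(dim, inner_dim * 2, bias=False)
        self.to_out = nn.Linear(inner_dim, dim)

    def forward(self, x, context=None):
        context = x if context is None else context  # no context -> ordinary self-attention
        b, n, _ = x.shape
        q = self.to_q(x)
        k, v = self.to_kv(context).chunk(2, dim=-1)
        # (b, len, heads * dim_head) -> (b, heads, len, dim_head)
        q, k, v = [t.reshape(b, t.shape[1], self.heads, -1).transpose(1, 2) for t in (q, k, v)]
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.to_out(out)


attn = ContextAttentionSketch(dim=512)
x = torch.randn(1, 128, 512)        # current segment
memory = torch.randn(1, 256, 512)   # stand-in for cached states of a previous segment
out = attn(x, context=torch.cat((memory.detach(), x), dim=1))
print(out.shape)  # torch.Size([1, 128, 512])
```

The output always has the query length of `x`; only the set of positions being attended to grows with `context`.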

Shorter and more einops-y?

@lucidrains I enjoy your super-readable implementations, and I think we can make them even simpler.
Here are a couple of suggestions:

Einops provides layers (einops.layers.torch) which could make the code more self-documenting:

# was 
Transpose((1, 2))
<convolutional part>
Transpose((1, 2))

# now
Rearrange('b token c -> b c token'),
<convolutional part>
Rearrange('b c token -> b token c'),

This part of code looks a bit over-engineered:

q, k, v = (self.to_q(x), *self.to_kv(context).chunk(2, dim = -1))
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = h), (q, k, v))

How about using a new layer from einops:

self.to_qkv = WeightedEinsum('b n c -> qkv b h n d', qkv=3, h=heads, d=inner_dim//heads)
# then in forward
q, k, v = self.to_qkv(x)

This change would also remove the need to store/pass the heads info and would remove 2-4 more lines.
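`WeightedEinsum` is the commenter's proposed layer rather than an existing one, so the following sketch expresses the same `b n c -> qkv b h n d` mapping with a single weight tensor and plain `torch.einsum`. `FusedQKV` is a hypothetical name.

```python
import torch
from torch import nn


class FusedQKV(nn.Module):
    """Hypothetical stand-in for the proposed layer: maps 'b n c -> qkv b h n d' with one weight tensor."""

    def __init__(self, dim, heads, dim_head):
        super().__init__()
        # one weight tensor indexed by (input channel, q/k/v, head, head dim)
        self.weight = nn.Parameter(torch.randn(dim, 3, heads, dim_head) * dim ** -0.5)

    def forward(self, x):
        # x: (b, n, c) -> (3, b, h, n, d); unbind splits it into q, k, v
        return torch.einsum('bnc,cqhd->qbhnd', x, self.weight).unbind(dim=0)


qkv = FusedQKV(dim=512, heads=8, dim_head=64)
q, k, v = qkv(torch.randn(1, 1024, 512))
print(q.shape)  # torch.Size([1, 8, 1024, 64])
```

One design trade-off: a fused projection only works when keys and values come from the same tensor as the queries, so it cannot replace a separate `to_q` / `to_kv` pair when a distinct `context` is passed in.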

This looks like a great place to use einsum. You would still need to add one None axis afterwards, but I'd give it a go to make the intent obvious:

mask = mask[:, None, :, None] * context_mask[:, None, None, :]

PS. I really enjoyed reading the code. It was like reading a paper in a few minutes :)
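A small check of the einsum suggestion, assuming `mask` and `context_mask` are boolean padding masks of shape `(batch, n)` and `(batch, m)`: the outer product is taken per batch element, and one `None` axis is added afterwards so the result broadcasts over heads. The variable names follow the issue; the sizes are arbitrary.

```python
import torch

torch.manual_seed(0)
b, n, m = 2, 5, 7
mask = torch.rand(b, n) > 0.3          # which query positions are real (not padding)
context_mask = torch.rand(b, m) > 0.3  # which key/value positions are real

# broadcasting form from the issue, written with logical AND on bool tensors
pairwise = mask[:, None, :, None] & context_mask[:, None, None, :]

# einsum form: per-batch outer product, then one None axis for the heads dimension
pairwise_einsum = torch.einsum('bi,bj->bij', mask.float(), context_mask.float()).bool()[:, None]

print(pairwise.shape)  # torch.Size([2, 1, 5, 7])
```

`torch.einsum` does not operate on bool tensors here, hence the round trip through float; the broadcasting form avoids that conversion, which is arguably why it was written that way.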

Attribute error in conformer.py

Hello Lucidrains, thank you for your wonderful work!

When I followed the usage in the README:

import torch
from conformer import ConformerBlock

block = ConformerBlock(
    dim = 512,
    dim_head = 64,
    heads = 8,
    ff_mult = 4,
    conv_expansion_factor = 2,
    conv_kernel_size = 31,
    attn_dropout = 0.,
    ff_dropout = 0.,
    conv_dropout = 0.
)

x = torch.randn(1, 1024, 512)
block(x) # (1, 1024, 512)

An AttributeError was raised:

Traceback (most recent call last):
  File "/home/xxx/ASR/lucidrains-conformer/try.py", line 17, in <module>
    block(x) # (1, 1024, 512)
  File "/home/xxx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/ASR/lucidrains-conformer/conformer/conformer.py", line 197, in forward
    x = self.attn(x, mask = mask) + x
  File "/home/xxx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/ASR/lucidrains-conformer/conformer/conformer.py", line 64, in forward
    return self.fn(x, **kwargs)
  File "/home/xxx/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xxx/ASR/lucidrains-conformer/conformer/conformer.py", line 100, in forward
    dist = dist.clip(-max_pos_emb, max_pos_emb) + max_pos_emb
AttributeError: 'Tensor' object has no attribute 'clip'

May I ask what the problem is?

Thank you very much for your help.
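The failing call is `Tensor.clip`, an alias of `Tensor.clamp` introduced in PyTorch 1.7, so older installations raise exactly this `AttributeError`. Upgrading PyTorch or calling `clamp` instead should resolve it. The sketch below shows the clipped relative-distance indices that line computes, used to look up Shaw-style relative position embeddings; `relative_position_indices` and the small sizes are hypothetical.

```python
import torch
from torch import nn


def relative_position_indices(n, max_pos_emb):
    """Clipped relative distances i - j, shifted into the index range [0, 2 * max_pos_emb]."""
    seq = torch.arange(n)
    dist = seq[:, None] - seq[None, :]
    # clamp is available on older PyTorch releases too; Tensor.clip was added later as an alias
    return dist.clamp(-max_pos_emb, max_pos_emb) + max_pos_emb


max_pos_emb, dim_head = 2, 4  # small illustrative values
idx = relative_position_indices(6, max_pos_emb)
rel_pos_emb = nn.Embedding(2 * max_pos_emb + 1, dim_head)  # one learned vector per clipped distance
pos = rel_pos_emb(idx)  # (6, 6, dim_head)
print(idx[0].tolist())  # [2, 1, 0, 0, 0, 0]
```

All distances beyond `max_pos_emb` in either direction share a single embedding, which is what keeps the embedding table at `2 * max_pos_emb + 1` rows regardless of sequence length.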

Architecture not fully consistent with article

Hi,
I think there are two small mistakes in your implementation.

1- The feed-forward module starts with a normalization layer, which is omitted in your code. Furthermore, the module as described in the paper is what your code computes plus the identity (a residual connection). That is done in the Conformer block, but maybe it would be more consistent to do it directly in the FFN?
2- In the Conformer block, the paper mentions that the MHSA is sandwiched between two half-step feed-forward modules, not full ones.
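For reference, a sketch of a feed-forward module that addresses both points, assuming the paper's description: the LayerNorm sits inside the module, and the module's output is added to the residual with a factor of 1/2 (the "half-step"). `HalfStepFeedForward` is a hypothetical name, not the library's class.

```python
import torch
from torch import nn


class HalfStepFeedForward(nn.Module):
    """FFN with its own pre-LayerNorm and a residual scaled by 1/2 (hypothetical sketch)."""

    def __init__(self, dim, mult=4, dropout=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),            # the normalization layer point 1 refers to
            nn.Linear(dim, dim * mult),
            nn.SiLU(),                    # Swish
            nn.Dropout(dropout),
            nn.Linear(dim * mult, dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # half-step residual: x + 1/2 * FFN(x)
        return x + 0.5 * self.net(x)


ff = HalfStepFeedForward(dim=512)
x = torch.randn(1, 1024, 512)
print(ff(x).shape)  # torch.Size([1, 1024, 512])
```

In a block, two of these would sit before and after the self-attention and convolution modules, each contributing half of its output to the residual stream.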
