
transformerx's People

Contributors

danielemurgolo · sigma1326 · soran-ghaderi · valanm22


transformerx's Issues

AddNorm advanced features

  • Normalization options: batch normalization, instance normalization, and layer normalization.
  • A configurable epsilon value for numerical stability in the normalization layer.
  • Residual connection with dropout.
  • Different activation functions.
  • Regularization of the kernel and bias weights.
  • Normalization techniques that help stabilize training and improve the performance of deep neural networks.
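
A minimal sketch of what a configurable AddNorm could look like, assuming a TensorFlow/Keras implementation (the constructor arguments and supported options shown here are illustrative, not the current API):

    import tensorflow as tf

    class AddNorm(tf.keras.layers.Layer):
        """Residual connection with dropout, followed by normalization (sketch)."""

        def __init__(self, norm_type="layer", epsilon=1e-6, dropout_rate=0.1, **kwargs):
            super().__init__(**kwargs)
            self.dropout = tf.keras.layers.Dropout(dropout_rate)
            if norm_type == "layer":
                self.norm = tf.keras.layers.LayerNormalization(epsilon=epsilon)
            elif norm_type == "batch":
                self.norm = tf.keras.layers.BatchNormalization(epsilon=epsilon)
            else:
                # Instance normalization could be added here (e.g. via tensorflow_addons).
                raise ValueError(f"Unsupported norm_type: {norm_type}")

        def call(self, x, sublayer_output, training=None):
            # Apply dropout to the sublayer output, add the residual input, then normalize.
            return self.norm(x + self.dropout(sublayer_output, training=training))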

New residual and residual gate layers

A list of new residual and residual gate layers to be added

To contribute, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number). Alternatively, hover over a subtask and click “Convert to issue”.
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link
  • Residual
  • GRU Gating
  • Inverted Residual Block
  • Bottleneck Residual Block
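
As a starting point, a minimal sketch of the plain Residual wrapper from this list, assuming a TensorFlow/Keras implementation (the class name and signature are illustrative):

    import tensorflow as tf

    class Residual(tf.keras.layers.Layer):
        """Wraps a sublayer and adds its input to its output (sketch)."""

        def __init__(self, sublayer, **kwargs):
            super().__init__(**kwargs)
            self.sublayer = sublayer

        def call(self, x, *args, **kwargs):
            # y = x + f(x): the skip connection lets gradients bypass the sublayer.
            return x + self.sublayer(x, *args, **kwargs)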

Fix bug in `test_main` module

When running the test_main module, I encountered this error:

    test_main.py:42: in <module>
        encoder_blk = TransformerEncoderBlock(24, 24, 24, 24, norm_shape, 48, 8, 0.5)
    E   TypeError: TransformerEncoderBlock.__init__() takes from 6 to 7 positional arguments but 9 were given

It seems invalid and needs to be fixed.

Readme typo

In the roadmap, JAX is misspelled as JAZ!

Refactor softmax_attention

Separate the masked_softmax and sequence_mask functions. Also, add support for calibration using temperature scaling.
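
A minimal sketch of what temperature scaling could look like here, assuming a TensorFlow implementation (the function name and signature are illustrative):

    import tensorflow as tf

    def temperature_scaled_softmax(logits, temperature=1.0, axis=-1):
        # A temperature > 1 flattens the distribution (useful for calibration);
        # a temperature < 1 sharpens it; temperature == 1 recovers plain softmax.
        return tf.nn.softmax(logits / temperature, axis=axis)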

New attention masking layers

Description: We need to implement several new attention masking layers in our model to improve its performance on specific tasks. The following masking layers need to be implemented:

  • Global
  • Block local
  • Band
  • #87
  • Random
  • Compound
  • Axial

It is important to carefully consider the design and implementation of these masking layers to ensure they are effective and efficient.

Deadline for each layer: 2 weeks after opening the issue. After the deadline, the opened issue will be closed to make the subtask available for other contributors.
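
As an example of the kind of layer this covers, here is a minimal sketch of a band (local) attention mask, assuming a TensorFlow implementation (names are illustrative):

    import tensorflow as tf

    def band_mask(seq_len, num_lower, num_upper):
        # Keep only positions within `num_lower` steps behind and `num_upper` steps
        # ahead of each query position; everything else is masked out.
        ones = tf.ones((seq_len, seq_len))
        band = tf.linalg.band_part(ones, num_lower, num_upper)
        return tf.cast(band, tf.bool)

    # Example: a causal mask restricted to a window of the 3 previous tokens.
    mask = band_mask(seq_len=6, num_lower=3, num_upper=0)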

Fix `__call__` method overriding in layer package

Most of the base classes in the layers package define a method like def call(self, X, state, **kwargs):. These classes are designed to be callable; however, the current syntax is incorrect and needs to be fixed.
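
The right fix depends on the base class: if these layers subclass tf.keras.layers.Layer, implementing call() is already the supported pattern (the base Layer.__call__ dispatches to it); for a plain Python base class, an instance becomes callable only by defining __call__. A minimal sketch of the plain-Python variant (class and argument names are illustrative):

    class BaseLayer:
        """Plain Python base class made callable by delegating __call__ to call()."""

        def __call__(self, X, state, **kwargs):
            # Make instances callable; subclasses only override call().
            return self.call(X, state, **kwargs)

        def call(self, X, state, **kwargs):
            raise NotImplementedError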

Documentation

Document the layers API

For contributing, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number).
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link
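
Since the project asks for NumPy-style docstrings (see the contribution notes under “New attention layers” below), a documented layer could look roughly like this; the class, parameter, and example are illustrative rather than the current API:

    class DotProductAttention:
        """Scaled dot-product attention layer.

        Parameters
        ----------
        dropout_rate : float, optional
            Fraction of the attention weights to drop during training.

        Examples
        --------
        >>> attention = DotProductAttention(dropout_rate=0.1)
        """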

pyproject.toml license file

The license field in pyproject.toml should be derived from the project's license file rather than being specified manually.

PositionWiseFFN advanced features

The current PositionWiseFFN class is a simple implementation of a feed-forward neural network that operates on the feature dimension of the input tensor. Here are more advanced options and features to improve or extend the implementation:

  • activation functions
  • initialization
  • non-linear projection
  • contextualized embeddings
  • dropout
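
A minimal sketch of a PositionWiseFFN with a configurable activation, kernel initializer, and dropout, assuming TensorFlow/Keras (the constructor arguments are illustrative, not the current class's API):

    import tensorflow as tf

    class PositionWiseFFN(tf.keras.layers.Layer):
        """Two dense layers applied independently at every position (sketch)."""

        def __init__(self, hidden_dim, output_dim, activation="relu",
                     kernel_initializer="glorot_uniform", dropout_rate=0.0, **kwargs):
            super().__init__(**kwargs)
            self.dense1 = tf.keras.layers.Dense(
                hidden_dim, activation=activation,
                kernel_initializer=kernel_initializer)
            self.dropout = tf.keras.layers.Dropout(dropout_rate)
            self.dense2 = tf.keras.layers.Dense(
                output_dim, kernel_initializer=kernel_initializer)

        def call(self, x, training=None):
            # Operates on the last (feature) dimension of x.
            return self.dense2(self.dropout(self.dense1(x), training=training))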

Test cases for the layers

Test cases for the following layers:


For contributing to the project, please follow these steps:

  1. First, check if the subtask you want to work on is available by looking at the open issues in the repository. If it is already assigned to someone else or has been closed, please choose a different subtask.
  2. Once you have selected a subtask, create a new issue with the number of the issue as its title (e.g. "#1234").
  3. Copy the link to your newly created issue and paste it into the comments section below.
  4. Next, fork the repository and make the necessary changes to complete the subtask.
  5. Once you are satisfied with your changes, create a pull request and mention the issue link in the description so that your changes can be reviewed and merged.
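
A minimal sketch of what a Pytest case for one of these layers could look like, assuming a TensorFlow/Keras layer (the import path, constructor arguments, and expected shape are illustrative):

    import numpy as np
    import tensorflow as tf

    from transformerx.layers import PositionWiseFFN  # illustrative import path

    def test_positionwise_ffn_output_shape():
        # The layer should preserve the batch and sequence dimensions and
        # project the feature dimension to the configured output size.
        layer = PositionWiseFFN(hidden_dim=32, output_dim=16)  # illustrative args
        x = tf.random.uniform((4, 10, 8))
        y = layer(x)
        assert y.shape == (4, 10, 16)
        assert not np.any(np.isnan(y.numpy()))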

Refactor implementation - AbsolutePositionalEncoding class

  • The sin and cos functions are being called from both NumPy and TensorFlow. This can cause errors if the types are not properly converted between the two libraries; it is better to use only the tf.sin and tf.cos functions. Fewer conversions also reduce time and memory overhead.

  • The P matrix is created using NumPy and then converted to a TensorFlow tensor using tf.convert_to_tensor. This is unnecessary, as the P matrix can be created directly as a TensorFlow tensor using tf.sin and tf.cos.

  • The tf.keras.layers.Dropout layer is used to apply dropout to the entire X tensor, including the positional encoding. This is not ideal, as the positional encoding is meant to be fixed and not subject to dropout. Instead, the dropout layer should be applied only to the input features, as in the sketch below.
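
A minimal sketch of this refactor, building P directly with TensorFlow ops and applying dropout only to the input features (the class and argument names are illustrative, not the current implementation):

    import tensorflow as tf

    class AbsolutePositionalEncoding(tf.keras.layers.Layer):
        """Sinusoidal positional encoding built entirely with TensorFlow ops (sketch)."""

        def __init__(self, depth, dropout_rate=0.0, max_len=1000, **kwargs):
            super().__init__(**kwargs)
            self.dropout = tf.keras.layers.Dropout(dropout_rate)
            positions = tf.range(max_len, dtype=tf.float32)[:, tf.newaxis]
            dims = tf.range(0, depth, 2, dtype=tf.float32)[tf.newaxis, :]
            angles = positions / tf.pow(10000.0, dims / depth)
            # Concatenate the sin and cos halves (an interleaved layout is also common).
            P = tf.concat([tf.sin(angles), tf.cos(angles)], axis=-1)
            self.P = P[tf.newaxis, ...]  # shape: (1, max_len, depth)

        def call(self, X, training=None):
            # Dropout is applied to the input features only, not to the fixed encoding.
            X = self.dropout(X, training=training)
            return X + self.P[:, :tf.shape(X)[1], :]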

AddNorm

Test cases for this layer.

Handle input arguments and raise exceptions

Handle the inputs and raise exceptions in case inappropriate data is provided (see the sketch after the layer list below).

For contributing, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number).
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link
  • #34
  • #37
  • DotProductAttention
  • TransformerEncoder
  • TransformerEncoderBlock
  • TransformerDecoder
  • TransformerDecoderBlock
  • PositionalEncoding
  • PositionwiseFFN
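
A minimal sketch of the kind of validation this asks for, using an illustrative layer and parameter (not the current API):

    import tensorflow as tf

    class DotProductAttention(tf.keras.layers.Layer):
        """Illustrative argument checking for a layer constructor and call."""

        def __init__(self, dropout_rate=0.0, **kwargs):
            super().__init__(**kwargs)
            if not 0.0 <= dropout_rate < 1.0:
                raise ValueError(
                    f"dropout_rate must be in [0, 1), got {dropout_rate}")
            self.dropout_rate = dropout_rate

        def call(self, queries, keys, values):
            if queries.shape.rank != 3:
                raise ValueError(
                    f"queries must be a rank-3 tensor, got rank {queries.shape.rank}")
            # ... the attention computation would follow here ...
            return values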

New feedforward layers

A list of new feedforward layers to be added

To contribute, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number). Alternatively, hover over a subtask and click “Convert to issue”.
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link
  • GLU
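
For reference, a minimal sketch of a GLU (gated linear unit) layer, assuming TensorFlow/Keras (the class name and signature are illustrative):

    import tensorflow as tf

    class GLU(tf.keras.layers.Layer):
        """Gated linear unit: GLU(x) = (x W + b) * sigmoid(x V + c) (sketch)."""

        def __init__(self, units, **kwargs):
            super().__init__(**kwargs)
            self.linear = tf.keras.layers.Dense(units)
            self.gate = tf.keras.layers.Dense(units, activation="sigmoid")

        def call(self, x):
            # The sigmoid gate controls how much of the linear projection passes through.
            return self.linear(x) * self.gate(x)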

New embedding layers

A list of new embedding layers to be added

To contribute, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number). Alternatively, hover over a subtask and click “Convert to issue”.
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link

Embedding:

  • TokenEmbedding

Positional embedding:

  • #47
  • Fixed Positional Embedding
  • Relative Positional Embedding
  • Dynamic Position Bias
  • ALiBi Positional Bias https://arxiv.org/pdf/2108.12409v2.pdf (a sketch follows this list)
  • Learned Alibi PositionalBias
  • Rotary Embedding
  • Conditional Positional Encoding
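
A minimal sketch of an ALiBi-style positional bias, per the paper linked above, assuming TensorFlow (the slope schedule and function name are illustrative simplifications):

    import tensorflow as tf

    def alibi_bias(seq_len, num_heads):
        # ALiBi adds a head-specific linear penalty proportional to the query-key
        # distance; the per-head slopes follow a geometric sequence.
        slopes = tf.constant(
            [2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)],
            dtype=tf.float32)                                     # (num_heads,)
        positions = tf.range(seq_len, dtype=tf.float32)
        distance = positions[tf.newaxis, :] - positions[:, tf.newaxis]  # (L, L)
        # Bias of shape (num_heads, L, L), added to attention scores before softmax.
        return slopes[:, tf.newaxis, tf.newaxis] * -tf.abs(distance)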

Please refer to the following useful links:

New attention layers

A list of new attention layers to be added

You can pick any of the following frameworks and implement your layers:
  • TensorFlow
  • PyTorch
  • JAX
  • NumPy

To contribute, please:

  1. Create a new issue: copy an available subtask name (one that is not closed and not already opened by someone else) and paste it into the title, followed by a reference to this task (e.g. subtask_name #source_issue_number). Alternatively, hover over a subtask and click “Convert to issue”.
  2. Copy and paste your new issue link here in the comments.
  3. Fork the repository
  4. Add your changes
  5. Create a pull request and mention your issue link

Please note that we use NumPy style for the docstrings and provide examples as much as possible!
Also, please try to provide unit tests using Pytest.

Whenever you need help with the implementation or any other related issues, please reach out via the Discussions page or the Discord community server: @soran-ghaderi or @sigma1326.

  • Strided Attention
  • Fixed Factorized Attention
  • Additive Attention
  • RAN
  • RAM
  • STN
  • Temporal Attention
  • Channel Attention
  • Axial Attention
  • Sliding Window Attention
  • Global And Sliding Window Attention
  • Dilated Sliding Window Attention
  • Dynamic Convolution
  • Content-Based Attention
  • Global-Local Attention
  • Attention Gate
  • Class Attention
  • Location-Based Attention
  • Channel-Wise Soft Attention
  • FAVOR+
  • Disentangled Attention Mechanism
  • Location Sensitive Attention
  • LSH Attention
  • TAM
  • SRM
  • BAM
  • Set Transformer
  • Coordinate Attention
  • BigBird
  • Rendezvous
  • Adaptive Masking
  • DANet
  • Bi-Attention
  • RGA
  • SEAM
  • SPNet
  • DMA
  • GALA
  • Neighborhood Attention
  • Channel Squeeze And Spatial Excitation
  • GCT
  • Routing Attention
  • Cross-Covariance Attention
  • 3D SA
  • Sparse Sinkhorn Attention
  • Concurrent Spatial And Channel Squeeze And Excitation
  • Deformable ConvNets
  • SCA-CNN
  • Channel And Spatial Attention
  • Locally-Grouped Self-Attention
  • Class Activation Guided Attention Mechanism
  • Factorized Dense Synthesized Attention
  • HyperHyperNetwork
  • ProCAN
  • scSE
  • MHMA
  • Branch Attention
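
As an example of one entry from this list, a minimal sketch of Additive (Bahdanau-style) Attention, assuming TensorFlow/Keras (the class name and signature are illustrative):

    import tensorflow as tf

    class AdditiveAttention(tf.keras.layers.Layer):
        """Additive attention: score(q, k) = v^T tanh(W_q q + W_k k) (sketch)."""

        def __init__(self, hidden_units, **kwargs):
            super().__init__(**kwargs)
            self.W_q = tf.keras.layers.Dense(hidden_units, use_bias=False)
            self.W_k = tf.keras.layers.Dense(hidden_units, use_bias=False)
            self.v = tf.keras.layers.Dense(1, use_bias=False)

        def call(self, queries, keys, values):
            # queries: (batch, num_q, d_q); keys/values: (batch, num_kv, d_k / d_v)
            features = (tf.expand_dims(self.W_q(queries), 2)
                        + tf.expand_dims(self.W_k(keys), 1))
            scores = tf.squeeze(self.v(tf.tanh(features)), axis=-1)  # (batch, num_q, num_kv)
            weights = tf.nn.softmax(scores, axis=-1)
            return tf.matmul(weights, values)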
