moon23k,github

✋ Hello World

Hi there! I'm moon, and I focus on solving problems with artificial intelligence. Of AI's many subfields, I find Natural Language Processing the most interesting, and by profession I'm an NLP Machine Learning Engineer. My goal as an engineer is to develop a model that can communicate naturally with people, and the code in my repositories records my progress toward that goal. Beyond the code in my GitHub repositories, I keep paper reviews and personal research notes on AI techniques on my Notion page. If you'd like to get in touch, please use the email address in the profile information on the left.

🤖 Model Architecture

Model architecture is a crucial element of machine learning engineering, since the choice of architecture can significantly affect performance. The projects below focus on model architecture, with the aim of establishing which model structures are best suited to three NLG tasks: Translation, Dialogue Generation, and Summarization.

• RNN Seq2Seq • RNN Seq2Seq with Attention • Transformer

• Transformer Variants • Encoder Decoder Balance • PLM Fusion
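As a rough illustration of the attention mechanism shared by the attention-based Seq2Seq and Transformer entries above, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. It is written without any deep learning framework purely for readability; the function names are illustrative and are not taken from the repositories.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Return softmax(Q K^T / sqrt(d_k)) V for plain nested lists."""
    d_k = len(k[0])
    output = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q_row, k_row)) / math.sqrt(d_k)
            for k_row in k
        ]
        weights = softmax(scores)
        # The output row is the attention-weighted sum of the value vectors.
        output.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return output


q = [[1.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(q, k, v))
```

Because the query is more similar to the first key, the output leans toward the first value vector. Real implementations batch this as matrix multiplications on the GPU, which is what makes the Transformer more parallelizable than an RNN decoder.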

🏃‍♂️ Training Strategy

In the typical training process of a Seq2Seq model for Natural Language Generation, the decoder is fed the ground-truth previous tokens (teacher forcing), so 'Exposure Bias' inevitably arises: a discrepancy between training and inference, because at inference time the model must instead consume its own, possibly erroneous, predictions. The ideal solution is to train the model on a large and diverse dataset, but in practice this is difficult. To work around these constraints and make training more effective, several training strategies are proposed below. Among them, Auxiliary Training and Scheduled Sampling aim to make the most of GPU parallel processing while facilitating complementary learning. Generative Training and SeqGAN Training, on the other hand, may be less efficient to train but serve as strategies for extracting maximum performance in extremely data-restricted environments.

• Auxiliary Training • Scheduled Sampling • Pre Training • Generative Training • SeqGAN
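Scheduled Sampling (Bengio et al., 2015) addresses exposure bias by gradually replacing ground-truth decoder inputs with the model's own predictions as training progresses. The sketch below is a minimal pure-Python illustration, assuming the inverse sigmoid decay schedule from that paper and a two-pass, parallel-friendly variant in which `predicted` holds token ids from a first teacher-forced decoding pass, aligned position by position with `gold`. The function names and the default `k` are illustrative assumptions, not the repository's implementation.

```python
import math
import random
from typing import List


def inverse_sigmoid_teacher_ratio(step: int, k: float = 1000.0) -> float:
    # Probability of feeding the ground-truth token at this training step.
    # Starts near 1 and decays toward 0; larger k means a slower decay.
    return k / (k + math.exp(step / k))


def choose_decoder_inputs(
    gold: List[int], predicted: List[int], teacher_ratio: float, rng: random.Random
) -> List[int]:
    """Mix gold tokens with the model's own predictions, position by position."""
    inputs = []
    for g, p in zip(gold, predicted):
        # With probability teacher_ratio keep the gold token, else use the prediction.
        inputs.append(g if rng.random() < teacher_ratio else p)
    return inputs


rng = random.Random(0)
gold = [5, 8, 13, 21]
predicted = [5, 9, 13, 20]  # hypothetical first-pass predictions
for step in (0, 5000, 20000):
    ratio = inverse_sigmoid_teacher_ratio(step)
    print(step, round(ratio, 4), choose_decoder_inputs(gold, predicted, ratio, rng))
```

Early in training almost every input is a gold token; late in training the decoder mostly sees its own predictions, which is closer to the conditions it faces at inference time.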

⏰ Toward Efficiency

Large-scale models with numerous parameters tend to deliver better performance, and much recent research focuses on training ever larger models on extensive datasets to achieve superior results. However, deploying such models in typical computing environments can be difficult. To address this issue, the following projects introduce efficient approaches that maintain a reasonable level of performance while reducing computational demands.

• Efficient Training • Efficient PreTrained Language Models • Param Efficient Fine-Tuning
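LoRA (low-rank adaptation) is one widely used parameter-efficient fine-tuning method; it is used here only as an example, since the list above does not say which methods the project covers. LoRA freezes a pretrained weight W and trains a low-rank update, computing y = W x + (alpha / r) · B (A x), where A is r × d_in and B is d_out × r. The pure-Python sketch below shows that forward pass and the resulting reduction in trainable parameters; the function names are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    # Plain matrix-vector product.
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float, r: int) -> Vector:
    """y = W x + (alpha / r) * B (A x); W is frozen, only A and B are trained."""
    base = matvec(w, x)                # output of the frozen pretrained weight
    update = matvec(b, matvec(a, x))   # low-rank update through the rank-r bottleneck
    scale = alpha / r
    return [y + scale * u for y, u in zip(base, update)]


def trainable_params(d_in: int, d_out: int, r: int) -> Tuple[int, int]:
    """Return (full fine-tuning params, LoRA params) for one d_out x d_in weight."""
    full = d_in * d_out
    lora = r * (d_in + d_out)
    return full, lora


full, lora = trainable_params(768, 768, r=8)
print(full, lora, f"{100 * lora / full:.2f}%")  # 589824 12288 2.08%
```

For a 768 × 768 projection with rank 8, LoRA trains 12,288 parameters instead of 589,824 (about 2%). Since B is typically initialized to zero, the adapted layer starts out identical to the pretrained one.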

🔄 Neural Machine Translation

Machine translation is the task of automatically converting text from a source language into a target language. The dominant approach was originally Rule-Based machine translation, followed by Statistical Machine Translation (SMT), and Neural Machine Translation (NMT) is now the established paradigm. NMT aims to produce more accurate and natural translations using neural networks. Below are NMT experiments pursuing this goal.

• Back Translation • Multi-Lingual Translation • Code Translation • Machine Translation Blend
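Back Translation augments scarce parallel data with monolingual target-language text: a reverse (target → source) model translates each target sentence, and the resulting synthetic source is paired with the original, human-written target. The sketch below shows only the data-augmentation step, assuming a hypothetical `reverse_translate` callable that stands in for a trained reverse model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(
    parallel: List[Pair],
    target_monolingual: List[str],
    reverse_translate: Callable[[str], str],
) -> List[Pair]:
    """Augment parallel data with synthetic (source, target) pairs.

    reverse_translate is a hypothetical target->source model. Its output becomes
    the (possibly noisy) synthetic source, while the real target sentence is kept,
    so the forward model is always trained to produce human-written targets.
    """
    synthetic = [(reverse_translate(tgt), tgt) for tgt in target_monolingual]
    return parallel + synthetic


# Toy stand-in for a German->English reverse model.
toy_reverse = {"guten morgen": "good morning"}.get
data = back_translate([("hello", "hallo")], ["guten morgen"], lambda s: toy_reverse(s, "<unk>"))
print(data)
```

Keeping the genuine text on the target side is the key design choice: noise in the synthetic source only affects the encoder input, not the sentences the decoder learns to generate.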

🗣️ Dialogue Generation

Dialogue Generation is the task of generating a response to a previous utterance, as humans do in conversation. However, it is very difficult for a model to follow the flow of a conversation and return appropriate responses. Below is a set of experiments aimed at generating more natural, human-like responses.

• Characteristic Dialogue • Utilize SimEnt • Multi-Turn Dialogue • Dialogue Generation Blend
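To follow the flow of a multi-turn conversation, a model needs earlier turns as context, but its input length is limited. One simple, common approach (an assumption here, not necessarily what the Multi-Turn Dialogue project does) is to keep the most recent turns that fit a token budget and join them with a separator. The sketch below counts whitespace tokens and uses a BERT-style `[SEP]` string; both are illustrative stand-ins for a real tokenizer and its special tokens.

```python
from typing import List


def build_context(history: List[str], max_tokens: int, sep: str = "[SEP]") -> str:
    """Keep the most recent turns whose combined whitespace-token count fits max_tokens.

    Only the turns' own tokens are counted; separator tokens are not.
    """
    kept: List[str] = []
    used = 0
    # Walk backwards so the latest turns, most relevant to the response, survive.
    for turn in reversed(history):
        n = len(turn.split())
        if used + n > max_tokens:
            break
        kept.append(turn)
        used += n
    kept.reverse()  # restore chronological order
    return f" {sep} ".join(kept)


history = ["hi there", "hello how are you", "fine thanks"]
print(build_context(history, max_tokens=6))
```

With a budget of 6 tokens, the oldest turn is dropped and the context becomes `hello how are you [SEP] fine thanks`.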

📝 Abstractive Text Summarization

The Summarization task condenses a long text into a few short sentences using neural networks, and approaches can be divided into Extractive and Abstractive methods. Extractive Summarization builds a summary by selecting key sentences from the original text, whereas Abstractive Summarization generates new summary sentences with the model's decoder. The experiments below mainly deal with Abstractive Summarization.

• Hierarchical Encoder • Sparse Attention • Summarization Blend
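To make the extractive/abstractive distinction concrete, here is a deliberately simple extractive baseline: it scores each sentence by the average corpus frequency of its words and returns the top sentences in their original order. This frequency heuristic is an illustrative assumption, not a method from these projects; an abstractive model would instead generate new sentences with its decoder rather than copying existing ones.

```python
import re
from collections import Counter
from typing import List


def extractive_summary(text: str, num_sentences: int = 2) -> List[str]:
    """Pick the highest-scoring sentences by average word frequency, in original order."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # keep the original sentence order
    return [sentences[i] for i in chosen]


text = "Cats purr. Cats sleep a lot. Dogs bark."
print(extractive_summary(text, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is exactly what separates extractive from abstractive summarization.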
