Hello World
Hi there! I'm moon, and I focus on solving problems through artificial intelligence. Of AI's many subfields, I find Natural Language Processing the most interesting, and I work professionally as an NLP Machine Learning Engineer. My goal as an engineer is to develop a model that can communicate naturally with people, so the code in my repos tracks my progress toward that goal. Beyond the code in my git repos, I record paper reviews and personal research on artificial intelligence techniques on my Notion page. If you would like to get in touch, please use the email address in the information field on the left.
Model Architecture
Model architecture is a crucial element of machine learning engineering, and the choice of architecture can significantly affect performance. The projects below concentrate on model architecture and aim to establish standards for suitable model structures in three NLG tasks: Translation, Dialogue Generation, and Summarization.
• RNN Seq2Seq
• RNN Seq2Seq with Attention
• Transformer
• Transformer Variants
• Encoder Decoder Balance
• PLM Fusion
Training Strategy
In the typical training process of a Seq2Seq model for Natural Language Generation, the issue of 'Exposure Bias' inevitably arises: during training the decoder is conditioned on ground-truth tokens, whereas at inference it is conditioned on its own predictions. The ideal solution is to train the model on a large and diverse dataset, but in practice this is difficult. To overcome these constraints and make training more effective, several training strategies are proposed below. Among these, Auxiliary Training and Scheduled Sampling aim to make the most of GPU parallel processing while facilitating complementary learning. Generative Training and SeqGAN Training, on the other hand, may train less efficiently but serve as strategies for extracting maximum performance in extremely data-restricted environments.
• Auxiliary Training
• Scheduled Sampling
• Pre Training
• Generative Training
• SeqGAN
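To make the idea behind Scheduled Sampling concrete, here is a minimal sketch in plain Python. It is not taken from the linked repos. It assumes a two-pass variant, in which the model's predictions from a first, teacher-forced pass are mixed into the decoder inputs of a second pass so that all positions can still be processed in parallel. The function names, the linear decay schedule, and the `bos` token id are hypothetical choices made for illustration.

```python
import random


def teacher_forcing_ratio(step: int, total_steps: int, floor: float = 0.0) -> float:
    """Probability of feeding the gold token, decayed linearly over training.

    The linear schedule is an assumption; other decay shapes are possible.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    ratio = 1.0 - step / total_steps
    return max(floor, min(1.0, ratio))


def build_decoder_inputs(gold, predicted, ratio, rng, bos=0):
    """Mix gold tokens and model predictions into one decoder input sequence.

    gold      -- target token ids y_1..y_T
    predicted -- model predictions for the same positions (from a first pass)
    ratio     -- probability of keeping the gold token at each position
    rng       -- a random.Random instance, passed in for reproducibility
    bos       -- hypothetical beginning-of-sequence token id
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    inputs = [bos]
    # The input at position t is the (gold or predicted) token from position t-1.
    for gold_tok, pred_tok in zip(gold[:-1], predicted[:-1]):
        inputs.append(gold_tok if rng.random() < ratio else pred_tok)
    return inputs


if __name__ == "__main__":
    rng = random.Random(0)
    gold = [5, 6, 7, 8]
    predicted = [5, 9, 7, 1]
    for step in (0, 5, 10):
        ratio = teacher_forcing_ratio(step, total_steps=10)
        print(step, ratio, build_decoder_inputs(gold, predicted, ratio, rng))
```

With a ratio of 1.0 this reduces to ordinary teacher forcing, and with a ratio of 0.0 the decoder sees only its own earlier predictions, which is closer to the inference-time condition.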
Toward Efficiency
Large-scale models with numerous parameters tend to deliver better performance, and much recent research focuses on training even larger models on extensive datasets to achieve superior results. However, deploying such large-scale models in typical computing environments can be restrictive. To address this issue, the following projects introduce efficient approaches that maintain a certain level of performance while reducing computational demands.
• Efficient Training
• Efficient PreTrained Language Models
• Param Efficient Fine-Tuning
Neural Machine Translation
Machine translation is the task of converting text from a source language into a target language by computer. The dominant approach was rule-based at first, followed by Statistical Machine Translation (SMT), and Neural Machine Translation (NMT) is now the established approach. NMT aims to produce more accurate and natural translations using neural networks. Below are experiments with various neural network architectures for this purpose.
• Back Translation
• Multi-Lingual Translation
• Code Translation
• Machine Translation Blend
Dialogue Generation
Dialogue Generation is the task of generating a response to a previous utterance, as humans do in conversation. However, it is very difficult for a model to follow the flow of a conversation and return appropriate answers. Below is a set of experiments aimed at generating more natural, human-like responses.
• Characteristic Dialogue
• Utilize SimEnt
• Multi-Turn Dialogue
• Dialogue Generation Blend
Abstractive Text Summarization
Summarization condenses long text into a few short sentences using neural networks, and the task can be divided into Extractive and Abstractive methods. Extractive Summarization selects key sentences from the original text to form the summary, whereas Abstractive Summarization generates new summary sentences through the model's decoder. The experiments below mainly deal with Abstractive Summarization.
• Hierarchical Encoder
• Sparse Attention
• Summarization Blend
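For contrast with the abstractive experiments above, the extractive idea of selecting key sentences from the original text can be sketched in a few lines. This is a simple word-frequency heuristic chosen for illustration, not a neural method and not the approach used in the linked repos. The function name and the scoring rule are assumptions.

```python
import re
from collections import Counter


def extractive_summary(text: str, num_sentences: int = 1) -> list[str]:
    """Return the highest-scoring sentences, kept in their original order.

    A sentence's score is the average document-level frequency of its words,
    a hypothetical scoring rule used only to illustrate sentence selection.
    """
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score, then restore document order for the output.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    doc = "Cats sleep. Cats chase mice. Dogs bark."
    print(extractive_summary(doc, num_sentences=2))
```

Every sentence in the output is copied verbatim from the input, which is the defining property of extractive methods. An abstractive model instead generates new wording token by token through its decoder.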