
statnlp_fundamental_reading's Introduction

statnlp fundamental reading

Intro

This is a reading group for both fundamental and cutting-edge NLP research.

Course Materials

Reading Materials

Links

statnlp_fundamental_reading's Issues

Structured RNNs

Similar to structured CNNs, there are structured RNNs that encode structural inductive biases, which matter especially in NLP (trees, DAGs, etc.).

Here are some models (a sketch of the Tree-LSTM update follows the list):

  • Recursive Neural Networks: Parsing Natural Scenes and Natural Language with Recursive Neural Networks
  • Tree-LSTM: Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
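
As a concrete reference, here is a minimal PyTorch sketch of the Child-Sum Tree-LSTM node update from the Tai et al. paper; the unbatched form and the variable names are my own simplifications:

```python
import torch

def child_sum_treelstm_node(x, child_h, child_c, W, U, W_f, U_f, b, b_f):
    """Child-Sum Tree-LSTM node update (Tai et al., 2015), unbatched sketch.

    x: (d,) input at this node; child_h, child_c: (k, d) states of the
    k children. W, U: (d, 3d) weights for the i, o, u gates; W_f, U_f:
    (d, d) weights for the per-child forget gates.
    """
    d = x.size(0)
    h_tilde = child_h.sum(dim=0)                       # sum children hiddens
    i, o, u = (x @ W + h_tilde @ U + b).split(d)
    i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
    # One forget gate per child, conditioned on that child's own hidden state.
    f = torch.sigmoid(x @ W_f + child_h @ U_f + b_f)   # (k, d)
    c = i * u + (f * child_c).sum(dim=0)
    h = o * torch.tanh(c)
    return h, c

d = 8
W, U = torch.randn(d, 3 * d), torch.randn(d, 3 * d)
W_f, U_f = torch.randn(d, d), torch.randn(d, d)
b, b_f = torch.zeros(3 * d), torch.zeros(d)
h, c = child_sum_treelstm_node(torch.randn(d), torch.randn(2, d),
                               torch.randn(2, d), W, U, W_f, U_f, b, b_f)
print(h.shape, c.shape)  # torch.Size([8]) torch.Size([8])
```

The per-child forget gates are the part that distinguishes this from a sequential LSTM: each child's memory can be kept or dropped independently.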

Dilated Convolution related papers

I think dilated convolution is a very important variant of vanilla convolution, and we should investigate it further. Here are some papers:

  1. Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
  2. Neural Machine Translation in Linear Time (ByteNet)
  3. WaveNet: A Generative Model for Raw Audio.

Also, the difference between deconvolution (transposed convolution) and dilated convolution is worth discussing; see the sketch below.
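
To make that contrast concrete, here is a small PyTorch sketch (layer sizes are arbitrary choices of mine): stacking dilated convolutions grows the receptive field exponentially while the resolution stays fixed, whereas a transposed convolution changes the resolution itself:

```python
import torch

# Stacking convolutions with exponentially growing dilation (the WaveNet /
# ID-CNN idea) grows the receptive field exponentially with depth while the
# sequence length stays fixed.
layers = [torch.nn.Conv1d(1, 1, kernel_size=3, dilation=d, padding=d)
          for d in (1, 2, 4, 8)]

x = torch.randn(1, 1, 32)
for layer in layers:
    x = torch.relu(layer(x))
print(x.shape)  # torch.Size([1, 1, 32]); receptive field is now 31 steps

# A "deconvolution" (transposed convolution) instead changes the resolution:
up = torch.nn.ConvTranspose1d(1, 1, kernel_size=4, stride=2, padding=1)
print(up(x).shape)  # torch.Size([1, 1, 64]), i.e. upsampled by 2
```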

1×1 convolutional layer

Boyuan raised an open question: can a 1×1 convolutional layer be considered a fully-connected layer?

The answer is yes, based on the description here: https://d2l.ai/chapter_convolutional-neural-networks/channels.html (see Section 6.4.3).

To sum up:

  1. The 1×1 convolutional layer is equivalent to the fully-connected layer when applied on a per pixel basis.
  2. The 1×1 convolutional layer is typically used to adjust the number of channels between network layers and to control model complexity.
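
Point 1 can be checked numerically. A minimal sketch, assuming we copy the same weights into both layers:

```python
import torch

# A 1x1 convolution equals a per-pixel fully-connected layer once the two
# layers share the same weights.
conv = torch.nn.Conv2d(3, 5, kernel_size=1, bias=False)
fc = torch.nn.Linear(3, 5, bias=False)
fc.weight.data = conv.weight.data.reshape(5, 3)  # share parameters

x = torch.randn(1, 3, 4, 4)                      # (batch, channels, H, W)
y_conv = conv(x)
# Apply the linear layer per pixel: channels last, then restore the layout.
y_fc = fc(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
print(torch.allclose(y_conv, y_fc, atol=1e-6))   # True
```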

CNN ARCH

Some questions about the CNN architecture note:

  • The definition and underlying meaning of spatial resolution

  • Stride is not well explained in the note

  • What is meant by “In some cases, we want to reduce the resolution drastically, if say we find our original input resolution to be unwieldy”?

  • What does “odd height and width” mean?

  • The "channel" definition, in filter and input image

RNN Variants

There are several variants of RNNs that should be studied:

Speed: how to decouple the recurrence by introducing different strategies (some borrowed from CNNs); a sketch of the idea follows this list.

  • SRU: Simple Recurrent Units for Highly Parallelizable Recurrence (EMNLP18). The authors claim SRU achieves a 5–9x speed-up over cuDNN-optimized LSTM.
  • Quasi-RNN: Quasi-Recurrent Neural Networks (ICLR17). The authors claim QRNN is up to 16 times faster than LSTMs at train and test time.
  • SRNN: Sliced Recurrent Neural Networks (COLING2018). The authors claim SRNNs are 136 times as fast as standard RNNs, and even faster when training on longer sequences.
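
To illustrate the decoupling idea, here is a minimal SRU-style sketch; it omits the paper's highway scaling term, and the parameter shapes are my own simplification. The point is that every matrix multiplication is hoisted out of the time loop, leaving only a cheap elementwise recurrence:

```python
import torch

def sru_layer(x, W, W_f, W_r, b_f, b_r):
    """Simplified SRU layer (after Lei et al., 2018). x: (T, B, d)."""
    u = x @ W                             # (T, B, d): batched over all steps
    f = torch.sigmoid(x @ W_f + b_f)      # forget gate, no h_{t-1} inside
    r = torch.sigmoid(x @ W_r + b_r)      # output (reset) gate
    c, hs = torch.zeros_like(x[0]), []
    for t in range(x.size(0)):            # elementwise recurrence only
        c = f[t] * c + (1 - f[t]) * u[t]
        hs.append(r[t] * torch.tanh(c) + (1 - r[t]) * x[t])
    return torch.stack(hs)

T, B, d = 5, 2, 8
W, W_f, W_r = (0.1 * torch.randn(d, d) for _ in range(3))
out = sru_layer(torch.randn(T, B, d), W, W_f, W_r,
                torch.zeros(d), torch.zeros(d))
print(out.shape)  # torch.Size([5, 2, 8])
```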

Arch: learn the recurrent architecture or gating itself.

  • NASCell: Neural Architecture Search with Reinforcement Learning (ICLR17). Uses NAS to learn the cell architecture.
  • RCRN: Recurrently Controlled Recurrent Networks (NIPS18). Learns the recurrent gating functions using another recurrent network.

Residual Connections

Is there any LSTM variant that adds residual connections instead of highway connections?
If not, why?
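
For reference, the two skip-connection styles the question contrasts, as a minimal sketch (function names are hypothetical):

```python
import torch

def residual_step(x, f):
    # Residual connection: the identity path is always fully open.
    return x + f(x)

def highway_step(x, f, W_t, b_t):
    # Highway connection: a learned transform gate t interpolates
    # between the transformed and the untransformed input.
    t = torch.sigmoid(x @ W_t + b_t)
    return t * f(x) + (1 - t) * x

d = 8
x, f = torch.randn(d), torch.nn.Linear(d, d)
print(residual_step(x, f).shape)                                    # (8,)
print(highway_step(x, f, torch.randn(d, d), torch.zeros(d)).shape)  # (8,)
```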

Questions after reading this note

Key Concepts in GNN

In the sections Variants of GNN and Propagation Steps, some concepts just pop up from nowhere. What is Relational-GCN? Could you give some explanation?

This problem also applies to the image attached here. It is a nice picture, but some key concepts need to be clarified. For example: what is an aggregator? What is an updater? What is a spectral method, and what is a spatial one? For the models mentioned in the picture, you should at least mention where they come from (conference + year), what components are inside each model, and which tasks they try to deal with (roughly).
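
For what it's worth, "aggregator" and "updater" usually name the two halves of a message-passing step. A generic sketch (the mean aggregator and linear updater here are illustrative choices of mine, not any specific model):

```python
import torch

def message_passing_step(h, adj, W_agg, W_upd):
    """One generic GNN step: aggregate neighbor states, then update.

    h: (N, d) node states; adj: (N, N) 0/1 adjacency matrix.
    GCN, GAT, GGNN, etc. differ mainly in how these two parts are defined.
    """
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
    msg = (adj @ h) / deg                        # aggregate: neighbor mean
    return torch.relu(h @ W_upd + msg @ W_agg)   # update: linear + ReLU

N, d = 4, 8
h = torch.randn(N, d)
adj = (torch.rand(N, N) > 0.5).float()
W_agg, W_upd = torch.randn(d, d), torch.randn(d, d)
print(message_passing_step(h, adj, W_agg, W_upd).shape)  # torch.Size([4, 8])
```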

Structured Convolution

Another topic I would like to investigate (as always) is convolution over structured data, since language has structure.

Related works are Tree-structured Convolution and GCNs. Since we will compare CNNs with RNNs, Tree-LSTM, Recursive NNs and GGNNs (GRNs) can be discussed at the same time.
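
As a starting point for that comparison, a minimal sketch of the GCN propagation rule (Kipf & Welling, 2017), applied here to a toy chain-shaped graph standing in for a parse structure:

```python
import torch

def gcn_layer(H, A, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).
    For syntax, A can be the adjacency matrix of a dependency tree."""
    A_hat = A + torch.eye(A.size(0))                  # add self-loops
    D_inv_sqrt = torch.diag(A_hat.sum(dim=1).pow(-0.5))
    return torch.relu(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

N, d = 5, 8                                           # e.g. 5 words
H, W = torch.randn(N, d), torch.randn(d, d)
A = torch.zeros(N, N)
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4)]:         # chain-shaped "tree"
    A[i, j] = A[j, i] = 1.0
print(gcn_layer(H, A, W).shape)  # torch.Size([5, 8])
```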

Variants of GNN

In this section, the note says GNNs include GCN, GCNN, GAT, Gated GNN, and Graph LSTM. None of these models is explained or compared. What is the point of just listing their names?

Also, I highly doubt that Gated GNN and Graph LSTM belong to the GCN family. I think these models should be classified under the broader GNN family instead.

LRD (long-range dependency) evaluation of CNNs, RNNs and Transformers

There are two EMNLP18 papers that empirically compare these networks:

  1. Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures.
    They use subject-verb agreement to test long-range dependencies and word sense disambiguation to test semantic feature extraction.
    Results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.

  2. The Importance of Being Recurrent for Modeling Hierarchical Structure.
    They use subject-verb agreement to test the ability to capture syntactic dependencies, and logical inference to compare tree-based NNs against sequence-based NNs on their ability to exploit hierarchical structure; both tasks require exploiting hierarchical structural features.
    Conclusion: recurrence is indeed important for modeling hierarchical structure.

Language

We should use English rather than Chinese.
