Similar to structured CNNs, we also have structured RNNs that encode structural inductive biases, especially in NLP (trees, DAGs, etc.).
Here are some models:
Recursive Neural Networks: Parsing Natural Scenes and Natural Language with Recursive Neural Networks (ICML11)
Tree-LSTM: Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks (ACL15). A minimal cell sketch follows.
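As a concrete illustration, here is a minimal sketch of a Child-Sum Tree-LSTM cell in PyTorch. The class and variable names are my own choices, and a real implementation would batch nodes and process the tree bottom-up; this is only meant to show the gating structure.

```python
# A minimal sketch of a Child-Sum Tree-LSTM cell (Tai et al., ACL15),
# written for illustration; names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # input, output, and candidate gates use the sum of child hidden states
        self.W_iou = nn.Linear(input_dim, 3 * hidden_dim)
        self.U_iou = nn.Linear(hidden_dim, 3 * hidden_dim, bias=False)
        # a separate forget gate is computed for each child
        self.W_f = nn.Linear(input_dim, hidden_dim)
        self.U_f = nn.Linear(hidden_dim, hidden_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # x: (input_dim,) embedding of the current node
        # child_h, child_c: (num_children, hidden_dim); empty tensors for leaves
        h_sum = child_h.sum(dim=0)                           # sum of child hidden states
        i, o, u = torch.chunk(self.W_iou(x) + self.U_iou(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.W_f(x) + self.U_f(child_h))  # one forget gate per child
        c = i * u + (f * child_c).sum(dim=0)                 # combine children's cell states
        h = o * torch.tanh(c)
        return h, c
```

To compute a sentence representation, the cell is applied to the parse tree bottom-up, feeding each node its children's (h, c) pairs.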
There are a couple of variants of RNNs that should be studied:
Speed: how to decouple the recurrent dependencies by introducing different strategies (some borrowed from CNNs); see the sketch after this list.
SRU: Simple Recurrent Units for Highly Parallelizable Recurrence (EMNLP18). They claimed that SRU achieves a 5–9x speed-up over the cuDNN-optimized LSTM.
Quasi-RNN: Quasi-Recurrent Neural Networks (ICLR17). They claimed that QRNN is 16 times faster than LSTMs at train and test time.
SRNN: Sliced Recurrent Neural Networks (COLING2018). They claimed that SRNNs are 136 times as fast as standard RNNs and could be even faster when training on longer sequences.
Architecture:
NASCell: Neural Architecture Search with Reinforcement Learning (ICLR17). Uses NAS (an RNN controller trained with RL) to learn the recurrent cell architecture.
RCRN: Recurrently Controlled Recurrent Networks (NIPS18). Learns the recurrent gating functions using another recurrent network (a controller cell gates a listener cell).
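To make the speed point above more concrete, here is a minimal PyTorch sketch of the "light recurrence" idea shared by SRU and QRNN: every matrix multiplication depends only on x_t, so it can be computed for the whole sequence in parallel, and the sequential loop contains only element-wise operations. This is a simplified illustration, not the exact formulation of either paper; the dimensions, the tanh, and the highway term are my assumptions.

```python
# A simplified sketch of SRU/QRNN-style light recurrence: matmuls are batched over
# time, and only element-wise updates remain in the sequential loop.
import torch
import torch.nn as nn

class LightRecurrentUnit(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W = nn.Linear(dim, 3 * dim)   # candidate, forget gate, reset gate

    def forward(self, x):
        # x: (seq_len, batch, dim)
        z, f, r = torch.chunk(self.W(x), 3, dim=-1)       # computed for all steps at once
        f, r = torch.sigmoid(f), torch.sigmoid(r)
        c = torch.zeros_like(x[0])
        outputs = []
        for t in range(x.size(0)):                        # only element-wise ops here
            c = f[t] * c + (1 - f[t]) * z[t]              # light recurrence on the cell
            h = r[t] * torch.tanh(c) + (1 - r[t]) * x[t]  # highway-style output
            outputs.append(h)
        return torch.stack(outputs), c
```

SRNN takes a different route (slicing the sequence into subsequences that are processed in parallel), which is not shown here.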
In the section Variants of GNN and Propagation Steps, some concepts just pop up from nowhere. What is Relational-GCN? Could you give some explanation?
This problem also applies to the image attached here. It is a nice picture, but some key concepts need to be clarified. For example, what is an aggregator? What is an updater? What is a spectral method? What is a spatial method? For the models mentioned in the picture, you should at least mention where they come from (conference + year), what components are inside each model, and which tasks they try to deal with (roughly).
Another topic I would like to investigate (as always) is convolution over structured data, since language has structure.
Related works are tree-structured convolution and GCNs (a minimal GCN layer sketch is given below). Since we will compare CNNs with RNNs, Tree-LSTM, Recursive NNs, and GGNNs (GRNs) can be discussed at the same time.
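For reference, here is a minimal sketch of one GCN layer following the normalized-adjacency propagation rule of Kipf & Welling (ICLR17). The dense adjacency matrix and the names are my own choices; real implementations use sparse operations.

```python
# A minimal sketch of one GCN layer: add self-loops, symmetrically normalize the
# adjacency matrix, aggregate neighbor features, then apply a linear transform.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, H, A):
        # H: (num_nodes, in_dim) node features; A: (num_nodes, num_nodes) adjacency
        A_hat = A + torch.eye(A.size(0))            # add self-loops
        deg = A_hat.sum(dim=1)                      # node degrees
        D_inv_sqrt = torch.diag(deg.pow(-0.5))
        A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt    # symmetric normalization
        return torch.relu(A_norm @ self.linear(H))  # aggregate, then transform
```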
In this section, the note mentions that GNNs include GCN, GCNN, GAT, Gated GNN, and Graph LSTM. None of these models has been explained or compared. What is the point of just listing their names?
Also, I highly doubt that Gated GNN and Graph LSTM belong to the GCN family. I think these models should be classified under the broader GNN family.
There are two EMNLP18 papers that empirically compare these networks:
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures.
They use subject-verb agreement to test long-range dependency (LRD) modeling, and word sense disambiguation to test semantic feature extraction.
Results show that: 1) self-attentional networks and CNNs do not outperform RNNs in modeling subject-verb agreement over long distances; 2) self-attentional networks perform distinctly better than RNNs and CNNs on word sense disambiguation.
The Importance of Being Recurrent for Modeling Hierarchical Structure.
They choose two tasks: subject-verb agreement, to test the ability to capture syntactic dependencies, and logical inference, to compare tree-based NNs against sequence-based NNs with respect to their ability to exploit hierarchical structure. Both tasks require the models to exploit hierarchical structural features.
Conclusion: recurrence is indeed important for modeling hierarchical structure.
They introduced the classic CNN architectures, from LeNet, AlexNet, VGG, GoogLeNet, NiN, and ResNet to DenseNet, together with their corresponding implementations. From my perspective, understanding these important CNN works is beneficial if we want to design our own architectures. A minimal residual-block sketch follows as one example of these key ideas.
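As one example of the ideas worth internalizing, here is a minimal sketch of a ResNet basic block in PyTorch: the block learns a residual F(x) and adds it to an identity shortcut. The channel sizes and layer ordering follow common practice rather than any particular reference implementation.

```python
# A minimal sketch of a ResNet basic block: output = relu(F(x) + x),
# where F is two 3x3 convolutions with batch norm.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + x)    # identity shortcut keeps gradients flowing
```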