dlrl-playground's Introduction

DL&RL_PlayGround

😄 DL&RL Playground

  • This repository contains reproductions of classic, popular, and SOTA deep learning (DL) and reinforcement learning (RL) algorithms.
  • In my spare time I reproduce the DL/RL algorithms that interest me and keep this repo updated and maintained.

Key Dependencies

  • transformers
    • pip install transformers
  • torch 1.10.1
  • torchvision 0.11.2
  • torchtext 0.11.1

I. Seq2Seq (TO-DO)

II. Transformer Basic

① Step By Step: Basic Transformer Implementation

III. VIT

① ViT Demo from the Transformers Package

  • This interface directly calls a pretrained ViT model to classify a given image.
  • VIT.py script

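② Step By Step: Basic Transformer Implementation

  • Project link 🔗: Step By Step: Basic Transformer Implementation
  • ViT applies the Transformer to image classification in computer vision. Its experiments conclude that, provided the pretraining dataset is large enough, ViT outperforms ResNet on all evaluated datasets.
  • At its core, ViT is a Transformer encoder network.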
  • 🚀️ Colab

③ Pre-trained VIT

  • Project link 🔗: Pre-trained VIT
  • Built on the ViT-B_16 pretrained model + VIT Model
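The "16" in ViT-B_16 is the patch size: the image is cut into 16 x 16 patches, and each flattened patch becomes one token for the encoder. The sketch below illustrates only that tokenization step in plain Python. It assumes a 224 x 224 single-channel input and one extra class token; `patchify` and `vit_sequence_length` are hypothetical helper names, not functions from this repo.

```python
def patchify(image, patch):
    """Split an H x W grid (list of rows) into flattened patch x patch tiles, row-major."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Flatten one tile row by row into a single vector.
            tile = [image[r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)]
            tiles.append(tile)
    return tiles


def vit_sequence_length(image_size, patch, class_token=True):
    """Number of tokens the encoder sees for a square image."""
    per_side = image_size // patch
    return per_side * per_side + (1 if class_token else 0)


# Assumed input: 224 x 224, one channel, pixel value = flat index.
image = [[r * 224 + c for c in range(224)] for r in range(224)]
patches = patchify(image, 16)
print(len(patches), len(patches[0]))  # 196 256
print(vit_sequence_length(224, 16))   # 197
```

With an RGB image each flattened patch would hold 16 * 16 * 3 = 768 values instead of 256; a learned linear projection then maps it to the model dimension.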

IV. Swin Transformer

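① Step By Step: Swin Transformer Implementation

  • Project link 🔗: Swin Transformer
  • Swin Transformer is regarded as an ideal replacement for CNNs, and its design borrows many ideas from CNNs.
  • Drawing on those CNN ideas, Swin Transformer builds the Transformer hierarchically, so that SwinT can extract features at multiple levels (convenient for downstream multi-scale detection and segmentation tasks). This shows that Swin Transformer can serve as a general-purpose backbone for vision tasks.
  • Details: Zhihu: DLPlayGround之Swin-Transformer(v1)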
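To make the hierarchical feature extraction concrete, the sketch below computes the feature-map shape at each stage. It assumes the Swin-T configuration from the Swin Transformer paper (224 x 224 input, 4 x 4 patches, embedding dimension 96, four stages) and a patch-merging step before each later stage that halves the resolution and doubles the channels. `swin_stage_shapes` is a hypothetical helper, not code from this repo.

```python
def swin_stage_shapes(image_size=224, patch=4, embed_dim=96, num_stages=4):
    """Return (height, width, channels) of the feature map produced by each stage."""
    shapes = []
    side, channels = image_size // patch, embed_dim
    for stage in range(num_stages):
        if stage > 0:
            # Patch merging: 2 x 2 neighbours are merged, so H and W halve
            # while the channel count doubles.
            side //= 2
            channels *= 2
        shapes.append((side, side, channels))
    return shapes


for index, shape in enumerate(swin_stage_shapes(), start=1):
    print(f"stage {index}: {shape}")
# stage 1: (56, 56, 96)
# stage 2: (28, 28, 192)
# stage 3: (14, 14, 384)
# stage 4: (7, 7, 768)
```

The resulting pyramid (strides 4, 8, 16, 32) has the same shape as a typical CNN backbone, which is why detection and segmentation heads can reuse it.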

V. Meta Learning Part (TO-DO)

VI. Offline RL

① Offline RL Introduction

② Decision Transformer

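Decision Transformer (DT) treats RL as a sequence modeling problem: instead of using traditional RL methods, it has the network output actions directly to make decisions.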
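A minimal sketch of how a trajectory might be turned into such a sequence, assuming the layout described in the Decision Transformer paper: each timestep contributes a return-to-go, a state, and an action token, and the model is trained to predict the action. The helper names and the toy trajectory are illustrative, not taken from this repo.

```python
def returns_to_go(rewards):
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg, running = [], 0.0
    for reward in reversed(rewards):
        running += reward
        rtg.append(running)
    rtg.reverse()
    return rtg


def interleave(rtg, states, actions):
    """Build the token order (R_1, s_1, a_1, R_2, s_2, a_2, ...)."""
    tokens = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("rtg", r), ("state", s), ("action", a)])
    return tokens


# Toy 3-step trajectory (hypothetical values).
rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]

rtg = returns_to_go(rewards)
print(rtg)                                   # [3.0, 2.0, 2.0]
print(interleave(rtg, states, actions)[:3])  # [('rtg', 3.0), ('state', 's0'), ('action', 'a0')]
```

At inference time the first return-to-go is set to the desired return, and it is decreased by each observed reward as the episode unfolds.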

③ BCQ

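Batch-Constrained deep Q-Learning (BCQ)

  • Adds a measure of future uncertainty when optimizing the value function;
  • Adds a distance constraint, implemented with a state-conditioned generative model;
  • The Q-network selects the highest-value action;
  • During the value update, takes a soft minimum over the double-Q estimates: $r+\gamma \max_{a_i}\left[\lambda \min_{j=1,2}Q_{\theta'_j}(s',a_i)+(1-\lambda)\max_{j=1,2}Q_{\theta'_j}(s',a_i)\right]$ is a convex combination rather than a hard minimum ...
  • Project link 🔗: BCQ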
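The soft-minimum target above can be sketched in a few lines of plain Python. The function takes the two target critics' values for each candidate action $a_i$ at $s'$. The default `lam=0.75` is an assumption for illustration, and `bcq_target` is a hypothetical name, not a function from this repo.

```python
def bcq_target(reward, gamma, q1_next, q2_next, lam=0.75):
    """r + gamma * max_i [ lam * min(Q1_i, Q2_i) + (1 - lam) * max(Q1_i, Q2_i) ].

    q1_next and q2_next hold the two target critics' values for the same
    candidate actions a_i at the next state s'.
    """
    soft_min = [lam * min(q1, q2) + (1.0 - lam) * max(q1, q2)
                for q1, q2 in zip(q1_next, q2_next)]
    return reward + gamma * max(soft_min)


# Two candidate actions, toy values.
q1_next = [1.0, 3.0]
q2_next = [2.0, 2.0]
print(bcq_target(1.0, 0.9, q1_next, q2_next))           # about 3.025
print(bcq_target(1.0, 0.9, q1_next, q2_next, lam=1.0))  # hard minimum: 2.8
```

Setting `lam=1.0` recovers the hard minimum of clipped double Q-learning, while smaller values penalize disagreement between the critics less strongly.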

④ AWAC

Key points:

  • Trains well offline
  • Fine-tunes quickly online
  • Does not need to estimate a behavior model.
  • Project link 🔗: AWAC
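One way to see why no behavior model is needed: in the AWAC paper the actor update is a weighted maximum-likelihood step on actions already in the dataset, with weights $\exp(A(s,a)/\lambda)$. The sketch below assumes the advantage is estimated as $Q(s,a) - V(s)$ and takes precomputed log-probabilities as input. The function names and `temperature=1.0` are illustrative assumptions, not this repo's API.

```python
import math


def awac_weights(q_values, state_values, temperature=1.0):
    """exp(A / lambda) with A = Q(s, a) - V(s), one weight per dataset sample."""
    return [math.exp((q - v) / temperature)
            for q, v in zip(q_values, state_values)]


def awac_actor_loss(log_probs, weights):
    """Negative advantage-weighted log-likelihood of the dataset actions."""
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


# Toy batch of two (s, a) pairs: the first action has positive advantage.
weights = awac_weights(q_values=[1.0, 0.0], state_values=[0.0, 0.0])
loss = awac_actor_loss(log_probs=[-1.0, -2.0], weights=weights)
print(weights)
print(loss)
```

Only the log-probabilities of the learned policy appear in the loss, so the data-collecting policy never has to be modeled. In practice the weights are treated as constants when differentiating.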

VII. Distributional RL

① C51

  • Project link 🔗: C51
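For context, C51 models the return as a categorical distribution over a fixed set of atoms (51 in the original paper). After a Bellman backup the shifted atoms no longer line up with that fixed support, so the probability mass is projected back onto it. The sketch below shows that projection in plain Python for a single non-terminal transition. It is an illustrative version with a hypothetical function name and a tiny 3-atom support, not the repo's implementation.

```python
import math


def project_distribution(probs, reward, gamma, v_min, v_max):
    """Project the distribution of reward + gamma * Z back onto the fixed atoms.

    probs[j] is the probability of atom z_j = v_min + j * delta.
    """
    n = len(probs)
    delta = (v_max - v_min) / (n - 1)
    projected = [0.0] * n
    for j, p in enumerate(probs):
        z = v_min + j * delta
        # Apply the Bellman backup to the atom and clip it to the support.
        tz = min(v_max, max(v_min, reward + gamma * z))
        # Fractional index of the shifted atom, clamped against rounding error.
        b = min(max((tz - v_min) / delta, 0.0), float(n - 1))
        lower, upper = int(math.floor(b)), int(math.ceil(b))
        if lower == upper:
            projected[lower] += p
        else:
            # Split the mass between the two neighbouring atoms.
            projected[lower] += p * (upper - b)
            projected[upper] += p * (b - lower)
    return projected


# Three atoms at -1, 0, 1 with all mass on 0; reward 0.5 shifts it to 0.5.
print(project_distribution([0.0, 1.0, 0.0], reward=0.5, gamma=0.9,
                           v_min=-1.0, v_max=1.0))  # [0.0, 0.5, 0.5]
```

The projected distribution then serves as the target in a cross-entropy loss against the predicted distribution.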

② D4PG

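Distributed Distributional Deterministic Policy Gradient (D4PG)

D4PG separates the Actors, which collect experience, from the Learner, which learns the policy:

  • Multiple parallel Actors collect data, i.e. sampling is distributed;
  • The Actors share one large experience buffer that feeds the Learner; the Learner samples from the buffer and then syncs the updated weights back to each Actor (Ape-X);
  • Uses N-step TD returns to reduce bias;
  • Can use PER (prioritized experience replay);
  • Critic network: a C51-based method.
  • Project link 🔗: D4PG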
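The N-step TD target mentioned above can be sketched as follows. This is a scalar simplification: in D4PG the critic is distributional, so the same N-step shift is applied to the atoms of the return distribution rather than to a single value. `n_step_target` is a hypothetical helper, not code from this repo.

```python
def n_step_target(rewards, gamma, bootstrap_value):
    """sum_{k<N} gamma^k * r_{t+k} + gamma^N * Q(s_{t+N}, a_{t+N})."""
    target = bootstrap_value
    # Fold the rewards in from the last step back to the first.
    for reward in reversed(rewards):
        target = reward + gamma * target
    return target


# N = 3 with toy numbers: 1 + 0.5 + 0.25 + 0.125 * 8 = 2.75
print(n_step_target([1.0, 1.0, 1.0], gamma=0.5, bootstrap_value=8.0))  # 2.75
```

With `rewards` of length 1 this reduces to the usual one-step TD target.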

dlrl-playground's People

Contributors

hzcirving
