motionmamba's Introduction

Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM

Zeyu Zhang*, Akide Liu*, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang✉

*Equal Contribution ✉ Corresponding author: [email protected]

Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging. Recent advancements in state space models (SSMs), notably Mamba, have showcased considerable promise in long sequence modeling with an efficient hardware-aware design, which appears to be a promising direction upon which to build a motion generation model. Nevertheless, adapting SSMs to motion generation faces hurdles due to the lack of a specialized architecture for modeling motion sequences. To address these challenges, we propose Motion Mamba, a simple and efficient approach that presents the pioneering motion generation model utilizing SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block to process temporal data by ensembling varying numbers of isolated SSM modules across a symmetric U-Net architecture, aimed at preserving motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block to bidirectionally process latent poses, enhancing the accuracy of motion generation within a temporal frame. Our proposed method achieves up to 50% FID improvement and is up to 4 times faster on the HumanML3D and KIT-ML datasets compared to the previous best diffusion-based method, which demonstrates strong capabilities in high-quality long-sequence motion modeling and real-time human motion generation.

News

(3/15/2024) 🎉 Our paper has been promoted by MarkTechPost!

(3/13/2024) 🎉 Our paper has been featured in Daily Papers!

(3/13/2024) 🎉 Our paper has been promoted by CVer!

Citation

@article{zhang2024motion,
  title={Motion Mamba: Efficient and Long Sequence Motion Generation with Hierarchical and Bidirectional Selective SSM},
  author={Zhang, Zeyu and Liu, Akide and Reid, Ian and Hartley, Richard and Zhuang, Bohan and Tang, Hao},
  journal={arXiv preprint arXiv:2403.07487},
  year={2024}
}

motionmamba's Issues

Clarification regarding "memory matrices"

Dear Authors,

Thank you for your excellent efforts. I was finding it difficult to understand the role of Memory Matrices in the HTM module.

This step utilizes a hierarchically structured set of scans, $K =$ { $S^{2N_n-1}, S^{2N_n-1-1}, \ldots, S^1$ }, in conjunction with a corresponding series of memory matrices { $A_1, \ldots, A_k$ }. Each sub-SSM scan first applies a 1-D convolution to x, resulting in $x^\prime_o$. $x^\prime_o$ is then linearly projected to derive $B_o$, $C_o$, and $\Delta_o$. These projections $B_o$, $C_o$ use $\Delta_o$ to effect transformations in $A_o$ and $B_o$, respectively. After executing a sequence of SSM scans { $SSM_{A_1,x}, SSM_{A_2,x}, \ldots, SSM_{A_k,x}$ }, a set of outputs { $O_1, \ldots, O_k$ } is compiled.

I would appreciate it if you could provide some info regarding what the memory matrices { $A_1, \ldots, A_k$ } do and how they are used by $SSM_{A_i,x}$.
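
For readers following this question, here is a rough NumPy sketch of how a memory matrix $A_i$ could enter one sub-SSM scan. It is not the authors' code, which was not available in this thread. It assumes the standard Mamba-style selective scan: $A_i$ is a learned, input-independent state matrix (diagonal per channel, with negative entries), while $B$, $C$ and $\Delta$ are projected from the convolved input, and $\Delta$ discretizes $A_i$ and $B$ at every time step. The names `selective_ssm_scan`, `hierarchical_scans`, `W_B`, `W_C`, `W_delta` and `conv_kernel` are hypothetical, and how the outputs { $O_1, \ldots, O_k$ } are merged is left open because the quoted passage does not specify it.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_delta, conv_kernel):
    """One sub-SSM scan (assumed Mamba-style, not taken from the paper's code).

    x: (L, D) input sequence; A: (D, N) memory matrix, diagonal per channel.
    W_B, W_C: (D, N) and W_delta: (D, D) are hypothetical projection weights.
    conv_kernel: (K,) causal 1-D kernel, shared across channels for brevity.
    """
    L, D = x.shape
    K = conv_kernel.shape[0]

    # Causal 1-D convolution along time: x -> x'.
    x_pad = np.vstack([np.zeros((K - 1, D)), x])
    x_conv = np.stack([
        (x_pad[t:t + K] * conv_kernel[:, None]).sum(axis=0) for t in range(L)
    ])

    # Input-dependent projections of x' to B, C and the step size delta.
    B = x_conv @ W_B                              # (L, N)
    C = x_conv @ W_C                              # (L, N)
    delta = np.log1p(np.exp(x_conv @ W_delta))    # softplus keeps delta > 0

    h = np.zeros_like(A)                          # hidden state, (D, N)
    y = np.zeros((L, D))
    for t in range(L):
        # delta discretizes the memory matrix A and the input matrix B.
        A_bar = np.exp(delta[t][:, None] * A)     # per-step decay of the state
        B_bar = delta[t][:, None] * B[t][None, :]
        h = A_bar * h + B_bar * x_conv[t][:, None]
        y[t] = h @ C[t]                           # read the state out through C
    return y

def hierarchical_scans(x, A_list, **params):
    """Run one scan per memory matrix A_i and return the outputs O_1..O_k."""
    return [selective_ssm_scan(x, A_i, **params) for A_i in A_list]
```

Under these assumptions, each $A_i$ sets how quickly its scan's hidden state forgets earlier frames, so scans with different memory matrices summarize the same input over different time horizons.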

Request for Motion Mamba code

Dear authors,

thank you for your valuable work. Together with my team, we are trying to better understand some of the mechanics used in the paper. Could you please indicate if and when your code will be made available?

Best regards,
Simone Facchiano (La Sapienza University of Rome)
