Comments (4)
I suspect this is an XLA compile time bug. Since omnistaging stages out bigger programs to XLA, it has a tendency to hit more compile time issues.
I think the best next step is to open a bug with the XLA team so that they can investigate. In some cases they can fix the issue causing the long compile time, but if not then we might need to brainstorm a workaround of some kind (no idea what that would look like until we understand the issue though!).
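For anyone putting together that report, here is a minimal sketch of how compile time might be measured separately from run time. It assumes a recent JAX release with the ahead-of-time `lower`/`compile` API, which postdates this thread. `toy_model`, `time_compile`, and the shapes are hypothetical stand-ins, not the model from the gist.

```python
import time

import jax
import jax.numpy as jnp


def toy_model(x, num_steps):
    # Hypothetical stand-in for the real model. The Python loop is unrolled
    # while tracing, so a larger num_steps stages out a larger program to XLA.
    for _ in range(num_steps):
        x = jnp.tanh(x @ x.T) @ x
    return x.sum()


def time_compile(num_steps, size=16):
    x = jnp.ones((size, size))
    fn = jax.jit(lambda a: toy_model(a, num_steps))
    lowered = fn.lower(x)  # trace and lower only; XLA has not compiled yet
    start = time.perf_counter()
    compiled = lowered.compile()  # XLA compilation happens here
    elapsed = time.perf_counter() - start
    return compiled, elapsed


compiled, seconds = time_compile(num_steps=8)
result = compiled(jnp.ones((16, 16)))
print(f"compile time: {seconds:.3f}s")
```

Running `time_compile` over a few values of `num_steps` (or, for the real model, a few vocab sizes and sequence lengths) should show whether compile time or memory grows faster than expected as the staged-out program gets bigger.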
I'll reopen google/jax#4448, which may have been closed prematurely :P
Thanks for bringing this to our attention @n2cholas (and for pinging me @petebu)!
from dm-haiku.
Hi @n2cholas, I also get an OOM when running with OmniStaging on the public GPU colab kernel (12GB RAM). But by reducing the model size (e.g. vocab_size = 10, seq_length = 8) or using a bigger GPU (32GB RAM) I can compile it ok, so I don't think this is a bug in Haiku.
I'm going to close this for now but please reopen if you'd like to follow up.
Thanks for the response @petebu. In my view, this is still a regression bug, since this model worked fine before the new 0.2.0 JAX release (which automatically enabled omnistaging). Also, the model I have in the gist is pretty small for a language model (vocab of only 1000 and sequence length of 80), so it should work no problem. Before the 0.2.0 release, a similar but larger model with a vocab size of 10,000 and sequence length of 256 worked fine, so this smaller one shouldn't OOM.
I would love to help fix this issue, but I'm not familiar enough with the internals. If you could provide some suggestions on what to look into, I would be happy to help.
I think the issue here is with core JAX (rather than Haiku), since omnistaging can change what gets compiled. @mattjj, any ideas?