Comments (8)
KFAC is now integrated (and the default optimiser) in the JAX branch.
from ferminet.
Yes, we hope to release a research-level preview of KFAC soon!
from ferminet.
Great to hear, thanks! Is there any ETA for this?
from ferminet.
This? https://github.com/deepmind/deepmind-research/tree/master/kfac_ferminet_alpha
from ferminet.
Hey @jsspencer, thanks a lot for open-sourcing the KFAC implementation. Great work!
However, when I run training for Mg on 8 V100 GPUs (batch size 512), I get the following error:
terminate called after throwing an instance of 'std::runtime_error'
what(): cuSolver execution failed
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
terminate called recursively
Fatal Python error: Aborted
Thread 0x00007f3da9bf2b80 (most recent call first):
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/pxla.py", line 1204 in execute_replicated
File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/pxla.py", line 648 in xla_pmap_impl
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 631 in process_call
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1305 in process
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1266 in call_bind
File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1302 in bind
File "/usr/local/lib/python3.7/dist-packages/jax/api.py", line 1574 in f_pmapped
File "/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py", line 139 in reraise_with_filtered_traceback
File "/home/tiger/.local/lib/python3.7/site-packages/kfac_ferminet_alpha/optimizer.py", line 567 in step
File "/opt/tiger/ferminet_jax/ferminet/train.py", line 497 in train
File "./bin/ferminet", line 35 in main
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251 in _run_main
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303 in run
File "./bin/ferminet", line 39 in <module>
Aborted (core dumped)
Any clue?
BTW, I was using jax 0.2.9 and jaxlib 0.1.59, not sure if related.
from ferminet.
BTW, I can successfully train the net using KFAC on smaller atoms like O and F, but not on Na or Mg.
from ferminet.
Hard to know. My suspicion is that the batch size is so small that the curvature estimates KFAC relies on are noisy. KFAC requires solving the linear system Ax=b, which is done via a Cholesky decomposition and assumes A is symmetric and positive-definite. The latter requirement might not be met for noisy estimates.
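To illustrate the failure mode described above, here is a minimal NumPy sketch. It is not taken from kfac_ferminet_alpha: the helper names `try_cholesky` and `damped_cholesky_solve`, the 2x2 matrix and the damping value are all made up for illustration. It shows a symmetric matrix that is not positive-definite (standing in for a noisy curvature estimate), the Cholesky factorisation rejecting it, and the generic remedy of adding a multiple of the identity before factorising.

```python
import numpy as np


def try_cholesky(a):
    """Return the Cholesky factor of `a`, or None if `a` is not positive-definite."""
    try:
        return np.linalg.cholesky(a)
    except np.linalg.LinAlgError:
        return None


def damped_cholesky_solve(a, b, damping):
    """Solve (a + damping * I) x = b using a Cholesky factorisation."""
    a_damped = a + damping * np.eye(a.shape[0])
    chol = np.linalg.cholesky(a_damped)  # raises LinAlgError if still not PD
    # Forward solve L y = b, then back solve L^T x = y.
    y = np.linalg.solve(chol, b)
    return np.linalg.solve(chol.T, y)


# Hypothetical "noisy" estimate: symmetric, but with eigenvalues 3 and -1.
a_noisy = np.array([[1.0, 2.0], [2.0, 1.0]])
b = np.array([1.0, 0.0])

factor = try_cholesky(a_noisy)  # None: the matrix is indefinite
x = damped_cholesky_solve(a_noisy, b, damping=1.5)  # shifted eigenvalues: 4.5, 0.5
```

Whether the cuSolver failure here has this cause is only a suspicion. If it does, a larger batch size or stronger damping are the natural things to try.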
from ferminet.
@jsspencer Got it, will give it a try, thanks!
from ferminet.