
Comments (8)

jsspencer commented on July 19, 2024

KFAC is now integrated (and the default optimiser) in the JAX branch.

from ferminet.

jsspencer commented on July 19, 2024

Yes, we hope to release a research-level preview of KFAC soon!


n-gao commented on July 19, 2024

Great to hear, thanks! Is there any ETA for this?


kngwyu commented on July 19, 2024

This? https://github.com/deepmind/deepmind-research/tree/master/kfac_ferminet_alpha


connection-on-fiber-bundles commented on July 19, 2024

Hey @jsspencer, thanks a lot for open-sourcing the KFAC implementation. Great work!

However, when I run training for Mg on 8 V100 GPUs (batch size 512), I get the following error:

terminate called after throwing an instance of 'std::runtime_error'
terminate called recursively
terminate called recursively
terminate called recursively
Fatal Python error: Aborted

Thread 0xterminate called recursively
00007f3da9bf2b80 (most recent call first):
  File "terminate called recursively
/usr/lo  what():  ccuSolver execution failedal/
lib/pyterminate called recursively
thon3.7/dist-packages/jax/interpreters/pxla.py", line 1204 in execute_replicated
  File "/usr/local/lib/python3.7/dist-packages/jax/interpreters/pxla.py", line 648 in xla_pmap_impl
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 631 in process_call
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1305 in process
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1266 in call_bind
  File "/usr/local/lib/python3.7/dist-packages/jax/core.py", line 1302 in bind
  File "/usr/local/lib/python3.7/dist-packages/jax/api.py", line 1574 in f_pmapped
  File "/usr/local/lib/python3.7/dist-packages/jax/_src/traceback_util.py", line 139 in reraise_with_filtered_traceback
  File "/home/tiger/.local/lib/python3.7/site-packages/kfac_ferminet_alpha/optimizer.py", line 567 in step
  File "/opt/tiger/ferminet_jax/ferminet/train.py", line 497 in train
  File "./bin/ferminet", line 35 in main
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251 in _run_main
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303 in run
  File "./bin/ferminet", line 39 in <module>
Aborted (core dumped)

Any clue?

BTW, I was using jax 0.2.9 and jaxlib 0.1.59, not sure if related.


connection-on-fiber-bundles commented on July 19, 2024

BTW, I can successfully train the net using KFAC on smaller atoms like O and F, but not on Na or Mg.


jsspencer commented on July 19, 2024

Hard to know. My suspicion is that the batch size is small enough that the curvature estimates KFAC needs are noisy. KFAC requires solving the linear system Ax = b, which is done via a Cholesky decomposition and assumes that A is symmetric and positive-definite. The positive-definiteness requirement might not be met for noisy estimates.
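To illustrate the failure mode described above, here is a minimal pure-Python sketch. It is not taken from the KFAC or FermiNet code: the helper names `cholesky` and `solve_spd`, the 2x2 matrix, and the `damping` parameter are all hypothetical. It shows a Cholesky-based solve failing on a symmetric matrix that is not positive-definite, and succeeding once a multiple of the identity is added to the diagonal. The damping step is an assumption about one common remedy, not a description of what the optimiser does internally.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a.

    Raises ValueError if a pivot is non-positive, i.e. the symmetric
    matrix `a` is not positive-definite.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive-definite")
                l[i][j] = math.sqrt(pivot)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_spd(a, b, damping=0.0):
    """Solve (a + damping * I) x = b via a Cholesky factorisation.

    `damping` is a hypothetical illustration parameter.
    """
    n = len(a)
    damped = [
        [a[i][j] + (damping if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    l = cholesky(damped)
    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


# A made-up symmetric "noisy estimate" with eigenvalues 3 and -1,
# so it is not positive-definite.
noisy = [[1.0, 2.0], [2.0, 1.0]]
b = [1.0, 1.0]

try:
    solve_spd(noisy, b)
except ValueError as err:
    print("undamped solve failed:", err)

# With damping 2.0 the eigenvalues become 5 and 1, and the solve succeeds.
print("damped solution:", solve_spd(noisy, b, damping=2.0))
```

In this toy case the factorisation fails at the second pivot, which is the analogue of the solver error in the traceback if the suspicion above is right. A larger batch size would reduce the noise in the estimate itself rather than masking it.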


connection-on-fiber-bundles commented on July 19, 2024

@jsspencer Got it, will give it a try, thanks!

