Possible issues
- Autoencoders are covered later in the course.
Training time considerations
- pg 6: training the P-VAE takes
Using a mini-batch size of 128, the training takes 80 epochs within 2 minutes on an NVIDIA GeForce GTX 1080 GPU and an Intel i7-8700k CPU
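For scale, here is a minimal sketch of what a pose-VAE training loop at those settings might look like in PyTorch. The layer sizes, latent dimension, learning rate, and KL weight are all my assumptions (the paper's actual P-VAE architecture isn't quoted above), and the random tensor stands in for the real pose dataset.

```python
# Hypothetical pose-VAE training sketch at the quoted settings
# (mini-batch size 128, 80 epochs). Architecture and loss weights
# are assumptions, not the paper's actual values.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

POSE_DIM, LATENT_DIM = 60, 16  # assumed dimensions

class PoseVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(POSE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, POSE_DIM))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

poses = torch.randn(10_000, POSE_DIM)  # stand-in for the mocap pose data
loader = DataLoader(TensorDataset(poses), batch_size=128, shuffle=True)
model = PoseVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(80):  # "80 epochs within 2 minutes" on a GTX 1080
    for (x,) in loader:
        recon, mu, logvar = model(x)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        loss = nn.functional.mse_loss(recon, x) + 1e-3 * kl  # KL weight assumed
        opt.zero_grad()
        loss.backward()
        opt.step()
```

At this scale, two minutes for 80 epochs on a GTX 1080 seems plausible, which suggests the P-VAE itself is a small network.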
- pg 10: One strategy evaluation for a single initial state [...] typically takes about six hours
a Dell Precision 7920 Tower workstation, with dual Intel Xeon Gold 6248R CPUs (3.0 GHz, 48 cores) and an Nvidia Quadro RTX 6000 GPU
and
Since the evaluation of one BDS sample takes about six hours, the Stage 1 exploration takes about 60 hours in total [...] are further optimized [...] which takes another 20 hours.
So Stage 1 takes 80 hours? And it appears Stage 2 takes 60?
The total time required for Stage 2 is roughly 60 hours.
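My attempt to reconcile the numbers: if the quoted 60 hours is ten six-hour BDS evaluations, then Stage 1 is 10 × 6 h = 60 h of exploration plus 20 h of further optimization, i.e. 80 h total, and with Stage 2 at roughly 60 h the whole search is on the order of 140 hours.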
First pass thoughts
They outline the workflow in Figure 2 on page 4. They say
the run-up controller and its training are not shown in Figure 2
... so we might need to look at that?
I think, if I understand correctly, the idea is to do two passes: the first pass gives some "decent" jumping strategies, but the second pass is where they discover the visually distinct strategies. The point of the BDS and the decoupling is that, instead of optimizing the jump by exploring the entire possibility space of all jumps, they fix the jumper at the start, do a quick exploration of 'reasonable' body orientations, and only then attempt to optimize jumping strategies from each initial orientation (see the sketch below).
And then the argument against being biased by the initial state is that they can explore unseen states/actions through the DRL? I think that is reinforced by this quote from page 11:
we perform novel policy search for five DRL iterations from each good initial state of Stage 1
Although at the end of the day they say that Stage 2 only discovers two additional strategies and most of them are discovered in Stage 1, which seems to say that Stage 1 is the most important?
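To make my reading concrete, here is a runnable toy of the decoupled two-stage search. Everything in it is a stand-in of mine: the paper uses Bayesian diversity search (BDS) over take-off states plus DRL policy optimization, whereas this sketch substitutes a 1-D take-off angle, an invented height function with two local optima, and random search.

```python
# Toy model of the two-stage exploration as I understand it.
# Stand-ins: 1-D take-off "angle" for the BDS take-off state,
# 1-D "style" for the jump strategy, random search for DRL.
import random

def jump_height(angle, style):
    """Invented reward with two local optima, mimicking two visually
    distinct strategies (think Fosbury flop vs. straddle)."""
    return max(0.0, 2.0 - (angle - 0.8) ** 2 - (style - 1.0) ** 2) + \
           max(0.0, 1.8 - (angle + 0.8) ** 2 - (style + 1.0) ** 2)

def optimize_style(angle, iters=200):
    """Cheap stand-in for the ~6-hour per-state strategy optimization."""
    best = max((random.uniform(-2, 2) for _ in range(iters)),
               key=lambda s: jump_height(angle, s))
    return best, jump_height(angle, best)

# Stage 1: explore take-off states; keep the ones whose optimized jump
# clears a feasibility bar. Only this handful seeds Stage 2.
good_states = []
for _ in range(10):                    # ~10 samples, per the timing quote
    angle = random.uniform(-1.5, 1.5)  # stand-in for a BDS sample
    style, height = optimize_style(angle)
    if height > 1.0:                   # invented feasibility threshold
        good_states.append((angle, style, height))

# Stage 2: a few refinement rounds from each good initial state, which
# can reach strategies the Stage 1 optimization never visited.
for angle, style, height in good_states:
    for _ in range(5):                 # "five DRL iterations" per state
        cand = style + random.gauss(0.0, 0.2)
        if jump_height(angle, cand) > height:
            style, height = cand, jump_height(angle, cand)
    print(f"take-off angle {angle:+.2f} -> strategy {style:+.2f}, "
          f"height {height:.2f}")
```

The structural point survives the toy: the expensive inner optimization only ever runs from a small, diverse set of take-off states, instead of searching the full space of jumps at once.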
I wonder how true this statement from page 4 is
The take-off state is key to our exploration strategy, as it is a strong determinant of the resulting jump strategy
Is this something that is just clear based on other papers? Is this a guess? Or was this something they discovered in their research? Feels like the kind of thing that needs references.
In section 7.1.1 on page 10 I'm curious about the 'failures'... do they have a formal definition for a failure?
The other four samples generated either repetitions or failures
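The quoted text never pins "failure" down. Purely to make concrete what I'd want them to specify, here is one hypothetical failure predicate; every field and threshold below is invented, not from the paper.

```python
# Hypothetical failure predicate -- the paper may define "failure"
# differently, or only informally. All fields/thresholds are invented.
from dataclasses import dataclass

@dataclass
class JumpRollout:
    cleared_bar: bool         # did the body pass over the bar height?
    touched_bar: bool         # any contact with the crossbar
    landed_on_mat: bool       # rollout ended on the landing surface
    head_impact_speed: float  # m/s at head contact, 0.0 if none

def is_failure(rollout: JumpRollout, max_head_impact: float = 2.0) -> bool:
    """One plausible operationalization: the jump fails if it dislodges
    the bar, misses the mat, never clears, or lands dangerously."""
    return (not rollout.cleared_bar
            or rollout.touched_bar
            or not rollout.landed_on_mat
            or rollout.head_impact_speed > max_head_impact)
```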
Now this is an interesting statement:
The performance and realism of our simulated jumps are bounded by many simplifications in our modeling and simulation. We simplify the athlete’s feet and specialized high jump shoes as rigid rectangular boxes, which reduces the maximum heights the virtual athlete can clear. We model the high jump crossbar as a wall at training time and as a rigid bar at run time, while real bars are made from more elastic materials such as fiberglass. We use a rigid box as the landing surface, while real-world landing cushions protect the athlete from breaking his neck and back, and also help him roll and get up in a fluid fashion.
I wonder how viable it would be to change the feet to an oval shape (say, a capsule) instead of a rectangular box?
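In most rigid-body simulators that would be a small change to the collision geometry. Here is a sketch of the swap in PyBullet; the paper doesn't say which simulator it uses, and the dimensions are invented.

```python
# Sketch: replacing a box foot with a capsule ("oval") foot in PyBullet.
# Simulator choice and all dimensions are my assumptions.
import pybullet as p

p.connect(p.DIRECT)

# As quoted: foot + shoe approximated as a rigid rectangular box.
box_foot = p.createCollisionShape(
    p.GEOM_BOX, halfExtents=[0.13, 0.05, 0.04])  # ~26 cm long box

# Alternative: a capsule gives a rounded sole, allowing a smooth roll-off
# at take-off instead of contacts on sharp box edges.
capsule_foot = p.createCollisionShape(
    p.GEOM_CAPSULE, radius=0.04, height=0.18)

# Attach either shape to a body to compare contact behaviour.
p.createMultiBody(baseMass=1.0,
                  baseCollisionShapeIndex=capsule_foot,
                  basePosition=[0, 0, 0.1])
```

Whether the rounder geometry actually recovers the lost clearance height would only show up after retraining the controller on the new contact dynamics, of course.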