Possible issues
- Autoencoders are covered later in the course.
Training time considerations
- pg 6: training the P-VAE takes
Using a mini-batch size of 128, the training takes 80 epochs within 2 minutes on an NVIDIA GeForce GTX 1080 GPU and an Intel i7-8700k CPU
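For scale, here is a minimal sketch of what a pose-VAE training loop at those settings might look like in PyTorch. The layer sizes, latent dimension, learning rate, and KL weight are all my assumptions (the paper's actual P-VAE architecture isn't quoted above), and the random tensor stands in for the real pose dataset.

```python
# Hypothetical pose-VAE training sketch at the quoted settings
# (mini-batch size 128, 80 epochs). Architecture and loss weights
# are assumptions, not the paper's actual values.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

POSE_DIM, LATENT_DIM = 60, 16  # assumed dimensions

class PoseVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(POSE_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(LATENT_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, POSE_DIM))

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(z), mu, logvar

poses = torch.randn(10_000, POSE_DIM)  # stand-in for the mocap pose data
loader = DataLoader(TensorDataset(poses), batch_size=128, shuffle=True)
model = PoseVAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(80):  # "80 epochs within 2 minutes" on a GTX 1080
    for (x,) in loader:
        recon, mu, logvar = model(x)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        loss = nn.functional.mse_loss(recon, x) + 1e-3 * kl  # KL weight assumed
        opt.zero_grad()
        loss.backward()
        opt.step()
```

At this scale, two minutes for 80 epochs on a GTX 1080 seems plausible, which suggests the P-VAE itself is a small network.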
- pg 10: One strategy evaluation for a single initial state [...] typically takes about six hours
a Dell Precision 7920 Tower workstation, with dual Intel Xeon Gold 6248R CPUs (3.0 GHz, 48 cores) and an Nvidia Quadro RTX 6000 GPU
and
Since the evaluation of one BDS sample takes about six hours, the Stage 1 exploration takes about 60 hours in total [...] are further optimized [...] which takes another 20 hours.
So Stage 1 takes 80 hours? And it appears Stage 2 takes 60?
The total time required for Stage 2 is roughly 60 hours.
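My attempt to reconcile the numbers: if the quoted 60 hours is ten six-hour BDS evaluations, then Stage 1 is 10 × 6 h = 60 h of exploration plus 20 h of further optimization, i.e. 80 h total, and with Stage 2 at roughly 60 h the whole search is on the order of 140 hours.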
First pass thoughts
They outline the workflow in Figure 2 on page 4. They say
the run-up controller and its training are not shown in Figure 2
... so we might need to look at that?
I think, if I understand correctly, the idea is to do two passes: the first pass gives some "decent" jumping strategies, but the second pass is where they discover the visually distinct strategies. The point of the BDS and the decoupling is that, instead of optimizing the jump by exploring the entire possibility space of all jumps, they fix the jumper at the start, do a quick exploration of 'reasonable' body orientations, and only then attempt to optimize jumping strategies from each initial orientation (see the sketch below).
And then the argument against being biased by the initial state is that they can explore unseen states/actions through the DRL? I think that is reinforced by this quote from page 11:
we perform novel policy search for five DRL iterations from each good initial state of Stage 1
Although at the end of the day they say that Stage 2 only discovers two additional strategies and most of them are discovered in Stage 1, which seems to say that Stage 1 is the most important?
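To make my reading concrete, here is a runnable toy of the decoupled two-stage search. Everything in it is a stand-in of mine: the paper uses Bayesian diversity search (BDS) over take-off states plus DRL policy optimization, whereas this sketch substitutes a 1-D take-off angle, an invented height function with two local optima, and random search.

```python
# Toy model of the two-stage exploration as I understand it.
# Stand-ins: 1-D take-off "angle" for the BDS take-off state,
# 1-D "style" for the jump strategy, random search for DRL.
import random

def jump_height(angle, style):
    """Invented reward with two local optima, mimicking two visually
    distinct strategies (think Fosbury flop vs. straddle)."""
    return max(0.0, 2.0 - (angle - 0.8) ** 2 - (style - 1.0) ** 2) + \
           max(0.0, 1.8 - (angle + 0.8) ** 2 - (style + 1.0) ** 2)

def optimize_style(angle, iters=200):
    """Cheap stand-in for the ~6-hour per-state strategy optimization."""
    best = max((random.uniform(-2, 2) for _ in range(iters)),
               key=lambda s: jump_height(angle, s))
    return best, jump_height(angle, best)

# Stage 1: explore take-off states; keep the ones whose optimized jump
# clears a feasibility bar. Only this handful seeds Stage 2.
good_states = []
for _ in range(10):                    # ~10 samples, per the timing quote
    angle = random.uniform(-1.5, 1.5)  # stand-in for a BDS sample
    style, height = optimize_style(angle)
    if height > 1.0:                   # invented feasibility threshold
        good_states.append((angle, style, height))

# Stage 2: a few refinement rounds from each good initial state, which
# can reach strategies the Stage 1 optimization never visited.
for angle, style, height in good_states:
    for _ in range(5):                 # "five DRL iterations" per state
        cand = style + random.gauss(0.0, 0.2)
        if jump_height(angle, cand) > height:
            style, height = cand, jump_height(angle, cand)
    print(f"take-off angle {angle:+.2f} -> strategy {style:+.2f}, "
          f"height {height:.2f}")
```

The structural point survives the toy: the expensive inner optimization only ever runs from a small, diverse set of take-off states, instead of searching the full space of jumps at once.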
I wonder how true this statement from page 4 is
The take-off state is key to our exploration strategy, as it is a strong determinant of the resulting jump strategy
Is this something that is just clear based on other papers? Is this a guess? Or was this something they discovered in their research? Feels like the kind of thing that needs references.
In section 7.1.1 on page 10 I'm curious about the 'failures'... do they have a formal definition for a failure?
The other four samples generated either repetitions or failures
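The quoted text never pins "failure" down. Purely to make concrete what I'd want them to specify, here is one hypothetical failure predicate; every field and threshold below is invented, not from the paper.

```python
# Hypothetical failure predicate -- the paper may define "failure"
# differently, or only informally. All fields/thresholds are invented.
from dataclasses import dataclass

@dataclass
class JumpRollout:
    cleared_bar: bool         # did the body pass over the bar height?
    touched_bar: bool         # any contact with the crossbar
    landed_on_mat: bool       # rollout ended on the landing surface
    head_impact_speed: float  # m/s at head contact, 0.0 if none

def is_failure(rollout: JumpRollout, max_head_impact: float = 2.0) -> bool:
    """One plausible operationalization: the jump fails if it dislodges
    the bar, misses the mat, never clears, or lands dangerously."""
    return (not rollout.cleared_bar
            or rollout.touched_bar
            or not rollout.landed_on_mat
            or rollout.head_impact_speed > max_head_impact)
```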
Now this is an interesting statement:
The performance and realism of our simulated jumps are bounded by many simplifications in our modeling and simulation. We simplify the athlete’s feet and specialized high jump shoes as rigid rectangular boxes, which reduces the maximum heights the virtual athlete can clear. We model the high jump crossbar as a wall at training time and as a rigid bar at run time, while real bars are made from more elastic materials such as fiberglass. We use a rigid box as the landing surface, while real-world landing cushions protect the athlete from breaking his neck and back, and also help him roll and get up in a fluid fashion.
I wonder how viable it would be to change the feet to an oval shape (say, a capsule) instead of a rectangular box?
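In most rigid-body simulators that would be a small change to the collision geometry. Here is a sketch of the swap in PyBullet; the paper doesn't say which simulator it uses, and the dimensions are invented.

```python
# Sketch: replacing a box foot with a capsule ("oval") foot in PyBullet.
# Simulator choice and all dimensions are my assumptions.
import pybullet as p

p.connect(p.DIRECT)

# As quoted: foot + shoe approximated as a rigid rectangular box.
box_foot = p.createCollisionShape(
    p.GEOM_BOX, halfExtents=[0.13, 0.05, 0.04])  # ~26 cm long box

# Alternative: a capsule gives a rounded sole, allowing a smooth roll-off
# at take-off instead of contacts on sharp box edges.
capsule_foot = p.createCollisionShape(
    p.GEOM_CAPSULE, radius=0.04, height=0.18)

# Attach either shape to a body to compare contact behaviour.
p.createMultiBody(baseMass=1.0,
                  baseCollisionShapeIndex=capsule_foot,
                  basePosition=[0, 0, 0.1])
```

Whether the rounder geometry actually recovers the lost clearance height would only show up after retraining the controller on the new contact dynamics, of course.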