There seems to be a bug in HagerZhangLineSearch (for instance, running the steepestdes

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

Bug in HagerZhangLineSearch about argmin HOT 5 CLOSED

argmin-rs commented on May 30, 2024

Bug in HagerZhangLineSearch

from argmin.

Comments (5)

whereistejas commented on May 30, 2024

Hi @stefan-k,
I'm interested in this particular issue and would like to help fix it (if it is still relevant). Could you give me some pointers on where to get started? I'm also interested in contributing in general, so please feel to direct to me more urgent or important issues. :)

from argmin.

stefan-k commented on May 30, 2024

Hi @whereistejas ,

thanks for your interest to contribute! It's been a while since I opened this issue, so I can't remember the details anymore. The implementation follows this paper [1]. The code can be found in [2]. I think a good way to move forward is to reproduce the errors by running the steepestdescent and nonlinear_cg examples with the HagerZhangLineSearch. I suspect a wrong <, >, <= or >=.

While going through the implementation, I noticed a comment that not all stopping criteria are implemented, but I don't think that this is causing the problem.

If you have any questions feel free to get in touch via the issues or via gitter. I'm happy to help, but I may unfortunately be a bit slow to respond currently!

[1] William W. Hager and Hongchao Zhang. "A new conjugate gradient method with guaranteed descent and an efficient line search." SIAM J. Optim. 16(1), 2006, 170-192.
[2] https://github.com/argmin-rs/argmin/blob/master/src/solver/linesearch/hagerzhang.rs

from argmin.

whereistejas commented on May 30, 2024

Hi @stefan-k ,
I compared the output of the two line search methods in steepestdescent.

The output for Hager-Zhang method is:

[
  1.1999999999997424,
  1.2000000000001065
]

Where as the output for More-Thuente method is:

[
  1.0724442427550391,
  1.151531565672518
]

So, I'm guessing, this is the discrepancy you are talking about?

I also cross checked the algorithm:

argmin/src/solver/linesearch/hagerzhang.rs

Lines 252 to 304 in dc2a06c

 fn update<O: ArgminOp<Param = P, Output = F>>( 

 &mut self, 

 op: &mut OpWrapper<O>, 

 (a_x, a_f, a_g): Triplet<F>, 

 (b_x, b_f, b_g): Triplet<F>, 

 (c_x, c_f, c_g): Triplet<F>, 

 ) -> Result<(Triplet<F>, Triplet<F>), Error> { 

 // U0 

 if c_x <= a_x || c_x >= b_x { 

 // nothing changes. 

 return Ok(((a_x, a_f, a_g), (b_x, b_f, b_g))); 

 } 

 // U1 

 if c_g >= F::from_f64(0.0).unwrap() { 

 return Ok(((a_x, a_f, a_g), (c_x, c_f, c_g))); 

 } 

 // U2 

 if c_g < F::from_f64(0.0).unwrap() && c_f <= self.finit + self.epsilon_k { 

 return Ok(((c_x, c_f, c_g), (b_x, b_f, b_g))); 

 } 

 // U3 

 if c_g < F::from_f64(0.0).unwrap() && c_f > self.finit + self.epsilon_k { 

 let mut ah_x = a_x; 

 let mut ah_f = a_f; 

 let mut ah_g = a_g; 

 let mut bh_x = c_x; 

 loop { 

 let d_x = (F::from_f64(1.0).unwrap() - self.theta) * ah_x + self.theta * bh_x; 

 let d_f = self.calc(op, d_x)?; 

 let d_g = self.calc_grad(op, d_x)?; 

 if d_g >= F::from_f64(0.0).unwrap() { 

 return Ok(((ah_x, ah_f, ah_g), (d_x, d_f, d_g))); 

 } 

 if d_g < F::from_f64(0.0).unwrap() && d_f <= self.finit + self.epsilon_k { 

 ah_x = d_x; 

 ah_f = d_f; 

 ah_g = d_g; 

 } 

 if d_g < F::from_f64(0.0).unwrap() && d_f > self.finit + self.epsilon_k { 

 bh_x = d_x; 

 } 

 } 

 } 

 // return Ok(((a_x, a_f, a_g), (b_x, b_f, b_g))); 

 Err(ArgminError::InvalidParameter { 

 text: "HagerZhangLineSearch: Reached unreachable point in `update` method.".to_string(), 

 } 

 .into()) 

 }

With the algorithm in the paper, I couldn't find any mistakes.

from argmin.

stefan-k commented on May 30, 2024

Thanks for comparing the code with the paper! Yes, this is the discrepancy I was talking about. As you can see, it barely moves away from the initial position.
I have to admit, I'm a bit lost as to how to proceed. When looking at the implementation in Julia you can see from the comments at the top of the file that the original code (by the author of the paper) "has undergone numerous revisions since publication of the paper". Therefore, this may be a weird edge case which was fixed after publication of the paper (just a guess).
Do you want to have a look at this? It might be pretty tedious to go through the original code and figure out what was changed and what might be going wrong in our implementation. This problem definitely needs fixing, but I can totally understand if you want to tackle a problem which is more interesting and satisfying as your first contribution. If you want to set this issue aside for now and work on something else then I suggest you get in touch in the Gitter channel and we can find something interesting for you.

from argmin.

whereistejas commented on May 30, 2024

Hi @stefan-k,
I'm fine with working on this issue.

On further comparison, I did find some differences:

argmin/src/solver/linesearch/hagerzhang.rs

Lines 327 to 338 in dc2a06c

 // S2 

 if (c_x - bb_x).abs() < F::epsilon() { 

 c_bar_x = self.secant(b_x, b_g, bb_x, bb_g); 

 } 

 // S3 

 if (c_x - aa_x).abs() < F::epsilon() { 

 c_bar_x = self.secant(a_x, a_g, aa_x, aa_g); 

 } 

 // S4 

 if (c_x - aa_x).abs() < F::epsilon() || (c_x - bb_x).abs() < F::epsilon() {

The if conditions here are different from what is mentioned in the paper:

This is my current lead.

from argmin.

Bug in HagerZhangLineSearch about argmin HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent

	fn update<O: ArgminOp<Param = P, Output = F>>(
	&mut self,
	op: &mut OpWrapper<O>,
	(a_x, a_f, a_g): Triplet<F>,
	(b_x, b_f, b_g): Triplet<F>,
	(c_x, c_f, c_g): Triplet<F>,
	) -> Result<(Triplet<F>, Triplet<F>), Error> {
	// U0
	if c_x <= a_x \|\| c_x >= b_x {
	// nothing changes.
	return Ok(((a_x, a_f, a_g), (b_x, b_f, b_g)));
	}

	// U1
	if c_g >= F::from_f64(0.0).unwrap() {
	return Ok(((a_x, a_f, a_g), (c_x, c_f, c_g)));
	}

	// U2
	if c_g < F::from_f64(0.0).unwrap() && c_f <= self.finit + self.epsilon_k {
	return Ok(((c_x, c_f, c_g), (b_x, b_f, b_g)));
	}

	// U3
	if c_g < F::from_f64(0.0).unwrap() && c_f > self.finit + self.epsilon_k {
	let mut ah_x = a_x;
	let mut ah_f = a_f;
	let mut ah_g = a_g;
	let mut bh_x = c_x;
	loop {
	let d_x = (F::from_f64(1.0).unwrap() - self.theta) * ah_x + self.theta * bh_x;
	let d_f = self.calc(op, d_x)?;
	let d_g = self.calc_grad(op, d_x)?;
	if d_g >= F::from_f64(0.0).unwrap() {
	return Ok(((ah_x, ah_f, ah_g), (d_x, d_f, d_g)));
	}
	if d_g < F::from_f64(0.0).unwrap() && d_f <= self.finit + self.epsilon_k {
	ah_x = d_x;
	ah_f = d_f;
	ah_g = d_g;
	}
	if d_g < F::from_f64(0.0).unwrap() && d_f > self.finit + self.epsilon_k {
	bh_x = d_x;
	}
	}
	}

	// return Ok(((a_x, a_f, a_g), (b_x, b_f, b_g)));
	Err(ArgminError::InvalidParameter {
	text: "HagerZhangLineSearch: Reached unreachable point in `update` method.".to_string(),
	}
	.into())
	}

	// S2
	if (c_x - bb_x).abs() < F::epsilon() {
	c_bar_x = self.secant(b_x, b_g, bb_x, bb_g);
	}

	// S3
	if (c_x - aa_x).abs() < F::epsilon() {
	c_bar_x = self.secant(a_x, a_g, aa_x, aa_g);
	}

	// S4
	if (c_x - aa_x).abs() < F::epsilon() \|\| (c_x - bb_x).abs() < F::epsilon() {