Add an upper bound function for the operator norm. #275

Open
MaxenceGollier wants to merge 9 commits into JuliaSmoothOptimizers:master from MaxenceGollier:opnorm

Conversation

@MaxenceGollier
Collaborator

#270 (comment)

@MohamedLaghdafHABIBOULLAH @d1Lab,

Can you fetch my branch and run some benchmarks? I think we should check whether it works, because it is very cheap.

@codecov

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.61%. Comparing base (e0f214d) to head (1c31211).
⚠️ Report is 271 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master     #275       +/-   ##
===========================================
+ Coverage   61.53%   84.61%   +23.08%     
===========================================
  Files          11       13        +2     
  Lines        1292     1671      +379     
===========================================
+ Hits          795     1414      +619     
+ Misses        497      257      -240     


@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 8, 2026

@MaxenceGollier @dpo @d1Lab, I ran some benchmarks with the JOSS example, calling solve! with more constraining settings:

  solve!(solver, reg_nlp, stats;
         atol = 1e-6,
         rtol = 1e-6,
         verbose = 1,
         sub_kwargs = (max_iter = 1000,),
         opnorm_maxiter = ...,  # see below
  )

Below are my remarks concerning different operator-norm estimation
strategies when L-SR1 is used, which becomes problematic because $B_k$ may
be indefinite with a very small norm. Tests were performed on Julia
1.11.7 and Julia 1.12.2.

  1. Behavior with opnorm_maxiter in {5, 100}
     The solver fails on both Julia versions (with $$\xi < 0$$).

  2. Behavior with opnorm_maxiter = 10
     It works on both versions, showing that the instability is not simply due to using too few or too many power-method iterations.

  3. Behavior with opnorm_maxiter = -1 (upper bound)
     It fails on both versions (with $$\xi < 0$$).

  4. Behavior with ARPACK
     Using ARPACK to compute the dominant eigenvalue leads to instability:
     • repeated runs do not produce identical results,
     • sometimes the method converges,
     • sometimes it fails with $$\xi < 0$$,
     • both Julia versions exhibit this behavior.

ARPACK introduces non-determinism and rounding differences that R2N does
not tolerate well.

Even for a small value such as $\theta = 0.1$, all the failures above persist. This suggests that choosing an appropriate $\theta$ is delicate: the solver succeeds with $\theta$ close to 1 when opnorm_maxiter = 10, but fails with $\theta = 0.1$.

L-BFGS showed no errors with any operator-norm option, so the instabilities above might be caused by the potentially indefinite L-SR1 matrix rather than by the operator-norm estimation.
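As an aside, the trade-off these benchmarks probe can be sketched with a generic power method. The sketch below is illustrative Python, not the package's implementation; it only mirrors the PR's idea of an iteration cap and a (value, found) return from power_method!:

```python
import numpy as np

def power_method(B, maxiter=10, tol=1e-6, seed=0):
    # Minimal power iteration estimating ||B||_2 = max |eigenvalue| of a
    # symmetric B.  Returns (value, found): found is False when the iteration
    # cap is reached before the estimate stabilizes.  (Illustrative sketch,
    # not the package's power_method! code.)
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(B.shape[0])
    v /= np.linalg.norm(v)
    value = 0.0
    for _ in range(maxiter):
        w = B @ v
        new_value = np.linalg.norm(w)
        if new_value == 0.0:   # v fell in the null space of B
            return 0.0, False
        v = w / new_value
        if abs(new_value - value) <= tol * max(1.0, new_value):
            return new_value, True
        value = new_value
    return value, False        # cap hit: the estimate may be poor

# An indefinite toy matrix with eigenvalues of nearly equal magnitude and a
# very small norm, the kind of spectrum an L-SR1 approximation can have:
B = np.diag([1e-3, -1e-3, 5e-4])
print(power_method(B, maxiter=5))
print(power_method(B, maxiter=100))
```

With too few iterations the returned value can underestimate the norm without any warning, which is why surfacing the found flag matters.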

@MaxenceGollier
Collaborator Author

MaxenceGollier commented Jan 8, 2026

Wouldn't it be more numerically stable to use $$\sigma‖s_{cp}‖$$ as the stationarity measure? It is also a generalization of the norm of the gradient from smooth optimization, and it does not involve the potential catastrophic cancellation which might be the cause of the $$\xi < 0$$.
Moreover, the inequality $$\sigma‖s_{cp}‖ \leq \sigma^{-1/2}\xi^{1/2}$$ keeps the convergence proof valid with that stationarity measure.
Just a suggestion... @dpo @d1Lab.
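For intuition on the cancellation concern, here is a toy Python illustration with made-up magnitudes (not solver data): a decrease computed as the difference of two nearly equal model values can vanish entirely in floating point, whereas σ‖s‖ is formed without any subtraction:

```python
# Toy illustration (made-up numbers, not solver output) of why a quantity
# like xi = model(0) - model(s), a difference of nearly equal values, can
# lose all significant digits, while sigma * ||s|| is evaluated directly.
f_model_at_0 = 1.0e8           # hypothetical model value at s = 0
decrease     = 1.0e-9          # hypothetical true decrease
f_model_at_s = f_model_at_0 - decrease   # rounds back to 1.0e8 in float64

xi = f_model_at_0 - f_model_at_s         # cancellation: xi == 0.0, not 1e-9
print(xi)

sigma, s_norm = 1.0e-3, 1.0e-6
direct = sigma * s_norm                  # ~1e-9, computed without subtraction
print(direct)
```

The subtraction loses the entire decrease because it is far below the spacing of floating-point numbers near 1e8, while the direct product keeps full relative precision.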

@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 9, 2026

Hey @MaxenceGollier — I apologize: most of the errors mentioned earlier
(specifically when opnorm_maxiter = -1) are actually due to R2, not R2N. The
core issue is:

    ERROR: LoadError: R2: prox-gradient step should produce a decrease but ξ = -7.653729494313895e288

In this situation, the proximal operator of RootNormLhalf (whose closed form
involves cos and arccos) is not computed accurately, which leads to an
incorrect (and extremely large negative) value of ξ, in contrast to the L0 or L1 norms.
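One cheap way to sanity-check a closed-form prox implementation is to compare it against a brute-force minimization of the prox objective. The helper below is a hypothetical Python reference for the 1-D case h(t) = |t|^{1/2}, not the package's code:

```python
import numpy as np

def prox_brute(x, lam, radius=None, n=2_000_001):
    # Brute-force 1-D reference for prox_{lam*h}(x) with h(t) = |t|^(1/2):
    # minimize 0.5*(t - x)^2 + lam*sqrt(|t|) over a fine symmetric grid.
    # (Hypothetical helper for sanity checks, not the package's prox code.)
    if radius is None:
        radius = abs(x) + 1.0
    t = np.linspace(-radius, radius, n)
    obj = 0.5 * (t - x) ** 2 + lam * np.sqrt(np.abs(t))
    return float(t[np.argmin(obj)])

# Small inputs are thresholded to (near) zero; large inputs are only shrunk:
print(prox_brute(0.1, lam=1.0))    # near 0
print(prox_brute(10.0, lam=1.0))   # a bit below 10
```

Comparing an analytic prox against such a reference on a few points would localize whether the large negative ξ really comes from the prox evaluation.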

I ran additional experiments using:

    sub_kwargs = (max_iter = 0,)
    atol = rtol = 1e-4

to force the selection of $$s_{k,cp}$$ exactly.
Under these conditions, if the norm of Bk is not well approximated, we may still encounter an error.

Specifically:

  • When opnorm_maxiter = -1, we do not observe errors of the form
    ERROR: LoadError: R2N: failed to compute a step: ξ
    which confirms that R2N is robust in this case.

  • These errors do appear when opnorm_maxiter ∈ {10, 50, 100}, which suggests
    that inaccurate spectral-norm estimates can destabilize the step
    computation.

So overall, this is good news:

  • R2N is not the source of the problematic behavior when opnorm_maxiter = -1.
  • The main issue is R2 combined with an inaccurate norm estimate for Bk and an
    inaccurate proximal operator for RootNormLhalf; we should address this.

@MohamedLaghdafHABIBOULLAH
Collaborator

MohamedLaghdafHABIBOULLAH commented Jan 9, 2026

Wouldn't it be more numerically stable to have $$\sigma‖s_{cp}‖$$ as stationarity measure? It is also a generalization of the norm of the gradient for smooth optimization and does not involve potential catastrophic cancellation which might be the cause for the $$\xi < 0$$. Moreover, the inequality $$\sigma‖s_{cp}‖ \leq \sigma^{-1/2}\xi^{1/2}$$ keeps the convergence proof valid with that stationarity measure. Just a suggestion... @dpo @d1Lab.

@MaxenceGollier

I am not entirely sure whether this will make the problem disappear, because it is not
$$\xi$$ that is used as the stationarity measure, but rather $$\xi_1$$, which is related to
the step $$s_{\mathrm{cp}}$$ as you mentioned above.

The check on the condition $$\xi < 0$$ is mainly for debugging purposes and to make the
implementation consistent with the theoretical framework (sufficient decrease).

In fact, as far as I can see in the benchmarks, $$\xi_1$$ is always positive as long as the prox is computed exactly.

@MaxenceGollier
Collaborator Author

OK. Still, can we compare the opnorm_maxiter settings in terms of time or number of iterations?
Actually, it would be most interesting to compare the upper-bound approach against the ARPACK approach.

@MaxenceGollier
Collaborator Author

Could you take a look sometime, @dpo?

@MaxenceGollier MaxenceGollier requested a review from dpo January 26, 2026 18:27
Copilot AI review requested due to automatic review settings February 23, 2026 17:41
Contributor

Copilot AI left a comment


Pull request overview

This pull request addresses issue #270 by adding an upper bound function for operator norms to improve robustness when the quasi-Newton approximation Bₖ (particularly SR1) is ill-conditioned. The PR introduces a new opnorm_upper_bound function with specialized implementations for different operator types, and updates the power_method! function to return a success indicator.

Changes:

  • Added opnorm_upper_bound function with specialized implementations for LBFGS, LSR1, diagonal, matrix, and general linear operators
  • Modified power_method! to return a tuple (value, found) indicating success
  • Updated R2N algorithm to use opnorm_upper_bound when opnorm_maxiter ≤ 0
  • Added comprehensive tests for the new functionality
  • Bumped LinearOperators dependency from 2.10.0 to 2.13.0 to access the opnorm_upper_bound field in operator data structures
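The PR's opnorm_upper_bound specializations are not reproduced here, but one way such a cheap bound can be obtained for limited-memory operators built from rank-one updates is the triangle inequality. In the Python sketch below, the function name and the rank-one decomposition are assumptions for illustration, not the package's API:

```python
import numpy as np

def opnorm_upper_bound_rank1(b0_norm, alphas, us):
    # Cheap upper bound on ||B||_2 for B = B0 + sum_i alpha_i * u_i u_i^T,
    # via the triangle inequality and ||alpha * u u^T||_2 = |alpha| * ||u||^2.
    # Costs only a pass over the stored pairs; no iteration needed.
    return b0_norm + sum(abs(a) * float(u @ u) for a, u in zip(alphas, us))

rng = np.random.default_rng(1)
us = [rng.standard_normal(4) for _ in range(3)]
alphas = [0.7, -1.3, 0.2]

B = 2.0 * np.eye(4)                      # B0 = 2 I, so ||B0||_2 = 2
for a, u in zip(alphas, us):
    B += a * np.outer(u, u)

bound = opnorm_upper_bound_rank1(2.0, alphas, us)
exact = np.linalg.norm(B, 2)             # true spectral norm, for comparison
assert exact <= bound                    # the bound always dominates
print(exact, bound)
```

Such a bound is always valid but can be loose, which is consistent with the benchmarks above: it never fails, but a sharper power-method estimate may give better steps when it converges.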

Reviewed changes

Copilot reviewed 5 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file

File | Description
---- | -----------
src/utils.jl | Implements opnorm_upper_bound functions for various operator types and updates the power_method! return signature
src/R2N.jl | Updates the R2N algorithm to use opnorm_upper_bound and handles the new power_method! signature; updates documentation
src/TR_alg.jl | Updates to handle the new power_method! return signature
test/test-utils.jl | Adds comprehensive tests for opnorm_upper_bound across different operator types
test/runtests.jl | Includes the new test file and adds the LinearOperators import
Project.toml | Bumps the LinearOperators version requirement to 2.13.0


MaxenceGollier and others added 2 commits February 23, 2026 13:32
Co-authored-by: Mohamed Laghdaf <81633807+MohamedLaghdafHABIBOULLAH@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
3 participants