Advanced Usage
This section describes different optional user parameters available in DFO-LS.
In the previous section (Using DFO-LS), we introduced dfols.solve(), which has the optional input user_params. This is a Python dictionary of user parameters. We will now go through the settings which can be changed in this way. More details are available in the papers [CFMR2018], [HR2022] and [LLR2024].
The default values, used if no override is given, in some cases depend on whether objfun has stochastic noise; that is, whether evaluating objfun(x) several times at the same x gives the same result or not. Whether this is the case is determined by the objfun_has_noise input to dfols.solve() (and not by inspecting objfun, for instance).
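As a minimal sketch of how such overrides might be passed, the snippet below builds a user_params dictionary and hands it to dfols.solve(). The residual function, starting point and chosen values are illustrative assumptions; only the parameter names and the user_params and objfun_has_noise inputs come from this documentation. The dfols import is kept inside the helper so that the dictionary can be inspected without the package installed.

```python
# Illustrative overrides: keys are parameter names from this section,
# values are example choices (assumptions, not recommendations).
user_params = {
    "logging.save_diagnostic_info": True,
    "logging.save_xk": True,
    "tr_radius.gamma_dec": 0.8,
    "model.abs_tol": 1e-10,
}


def residuals(x):
    """Hypothetical test problem: the two Rosenbrock residuals."""
    return [10.0 * (x[1] - x[0] ** 2), 1.0 - x[0]]


def run_dfols(x0, noisy=False):
    """Call DFO-LS with the overrides above (needs numpy and dfols)."""
    import numpy as np
    import dfols

    def objfun(x):
        return np.array(residuals(x))

    return dfols.solve(
        objfun,
        np.array(x0, dtype=float),
        user_params=user_params,
        objfun_has_noise=noisy,  # selects the noise-dependent defaults
    )

# Example call (not executed here):
#     soln = run_dfols([-1.2, 1.0])
#     print(soln)
```

Any parameter not listed in the dictionary keeps its default value.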
General Algorithm Parameters
general.rounding_error_constant
- Internally, all interpolation points are stored with respect to a base point \(x_b\); that is, we store \(\{y_t-x_b\}\), which reduces the risk of roundoff errors. We shift \(x_b\) to \(x_k\) when \(\|s_k\| \leq \text{const}\|x_k-x_b\|\), where ‘const’ is this parameter. Default is 0.1.
general.safety_step_thresh
- Threshold for when to call the safety step, \(\|s_k\| \leq \gamma_S \rho_k\). Default is \(\gamma_S =0.5\).
general.check_objfun_for_overflow
- Whether to cap the value of \(r_i(x)\) when they are large enough that an OverflowError will be encountered when trying to evaluate \(f(x)\). Default is True.
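The two inequalities above can be written out directly. The following is a small sketch of the tests as stated, not DFO-LS's internal code; the function names are hypothetical.

```python
import math


def norm(v):
    """Euclidean norm of a sequence of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def should_shift_base(s_k, x_k, x_b, const=0.1):
    """True when ||s_k|| <= const * ||x_k - x_b|| (base point shift test)."""
    diff = [xi - bi for xi, bi in zip(x_k, x_b)]
    return norm(s_k) <= const * norm(diff)


def is_safety_step(s_k, rho_k, gamma_s=0.5):
    """True when ||s_k|| <= gamma_S * rho_k (safety step test)."""
    return norm(s_k) <= gamma_s * rho_k
```

For example, a step that is tiny relative to the distance \(\|x_k-x_b\|\) triggers a shift, since at that scale roundoff in \(y_t-x_b\) becomes a concern.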
Logging and Output
logging.n_to_print_whole_x_vector
- If printing all function evaluations to screen/log file, the maximum len(x) for which the full vector x should be printed also. Default is 6.
logging.save_diagnostic_info
- Flag to save diagnostic information at each iteration. Default is False.
logging.save_poisedness
- If saving diagnostic information, whether to include the \(\Lambda\)-poisedness of \(Y_k\) in the diagnostic information. This is the most computationally expensive piece of diagnostic information. Default is True.
logging.save_xk
- If saving diagnostic information, whether to include the full vector \(x_k\). Default is False.
logging.save_rk
- If saving diagnostic information, whether to include the full vector \([r_1(x_k)\:\cdots\:r_m(x_k)]\). The value \(f(x_k)\) is always included. Default is False.
Initialization of Points
init.random_initial_directions
- Build the initial interpolation set using random directions (as opposed to coordinate directions). Default as of version 1.2 is False.
init.random_directions_make_orthogonal
- If building the initial interpolation set with random directions, whether or not these should be orthogonalized. Default is True.
init.run_in_parallel
- If using random directions, or non-random directions with input npt at most len(x0)+1, whether or not to ask for objfun to be evaluated at all initial points without any intermediate processing. Default is False.
Trust Region Management
tr_radius.eta1
- Threshold for unsuccessful trust region iteration, \(\eta_1\). Default is 0.1.
tr_radius.eta2
- Threshold for very successful trust region iteration, \(\eta_2\). Default is 0.7.
tr_radius.gamma_dec
- Ratio to decrease \(\Delta_k\) in unsuccessful iteration, \(\gamma_{dec}\). Default is 0.5 for smooth problems or 0.98 for noisy problems (i.e. objfun_has_noise = True).
tr_radius.gamma_inc
- Ratio to increase \(\Delta_k\) in very successful iterations, \(\gamma_{inc}\). Default is 2.
tr_radius.gamma_inc_overline
- Ratio of \(\|s_k\|\) to increase \(\Delta_k\) by in very successful iterations, \(\overline{\gamma}_{inc}\). Default is 4.
tr_radius.alpha1
- Ratio to decrease \(\rho_k\) by when it is reduced, \(\alpha_1\). Default is 0.1 for smooth problems or 0.9 for noisy problems (i.e. objfun_has_noise = True).
tr_radius.alpha2
- Ratio of \(\rho_k\) to decrease \(\Delta_k\) by when \(\rho_k\) is reduced, \(\alpha_2\). Default is 0.5 for smooth problems or 0.95 for noisy problems (i.e. objfun_has_noise = True).
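To show how these constants could interact, here is a simplified sketch of a trust-region radius update. The exact update rule is not given in this section: the min/max combinations below are an assumption modelled on standard trust-region practice, and DFO-LS applies further safeguards that are omitted. The function names are hypothetical.

```python
def update_delta(delta, norm_sk, ratio, noisy=False,
                 eta1=0.1, eta2=0.7, gamma_inc=2.0, gamma_inc_overline=4.0):
    """Return a new radius Delta after a step of length norm_sk.

    ratio is (actual reduction) / (predicted reduction) for the step.
    """
    gamma_dec = 0.98 if noisy else 0.5  # defaults quoted in this section
    if ratio < eta1:
        # Unsuccessful iteration: shrink the radius.
        return min(gamma_dec * delta, norm_sk)
    if ratio <= eta2:
        # Successful but not very successful: keep at least the step length.
        return max(gamma_dec * delta, norm_sk)
    # Very successful iteration: expand the radius.
    return max(gamma_inc * delta, gamma_inc_overline * norm_sk)


def reduce_rho(rho, noisy=False):
    """Return (new rho, new Delta) when rho is reduced (simplified reading)."""
    alpha1 = 0.9 if noisy else 0.1
    alpha2 = 0.95 if noisy else 0.5
    return alpha1 * rho, alpha2 * rho
```

The noisy defaults shrink both radii far more slowly, which keeps interpolation points spread out enough for the model not to fit noise.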
Termination on Small Objective Value
model.abs_tol
- Tolerance on \(f(x_k)\); quit if \(f(x_k)\) is below this value. Default is \(10^{-12}\).
model.rel_tol
- Relative tolerance on \(f(x_k)\); quit if \(f(x_k)/f(x_0)\) is below this value. Default is \(10^{-20}\).
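The two tests can be combined into one check. This is a sketch of the criteria as described, with a hypothetical function name; treating a value exactly equal to the tolerance as small is an assumption.

```python
def small_objective(f_k, f_0, abs_tol=1e-12, rel_tol=1e-20):
    """True if f(x_k) is small in the absolute or the relative sense."""
    if f_k <= abs_tol:
        return True
    # Guard against division by zero when the initial objective is zero.
    return f_0 > 0.0 and f_k / f_0 <= rel_tol
```

With the defaults, the relative test only matters when \(f(x_0)\) is very large, for example \(f(x_0)=10^{15}\) and \(f(x_k)=10^{-6}\).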
Termination on Slow Progress
slow.history_for_slow
- History used to determine whether the current iteration is ‘slow’. Default is 5.
slow.thresh_for_slow
- Threshold for objective decrease used to determine whether the current iteration is ‘slow’. Default is \(10^{-4}\).
slow.max_slow_iters
- Number of consecutive slow successful iterations before termination (or restart). Default is 20*len(x0).
Stochastic Noise Information
noise.quit_on_noise_level
- Flag to quit (or restart) if all \(f(y_t)\) are within noise level of \(f(x_k)\). Default is False for smooth problems or True for noisy problems.
noise.scale_factor_for_quit
- Factor of noise level to use in termination criterion. Default is 1.
noise.multiplicative_noise_level
- Multiplicative noise level in \(f\). Can only specify one of multiplicative or additive noise levels. Default is None.
noise.additive_noise_level
- Additive noise level in \(f\). Can only specify one of multiplicative or additive noise levels. Default is None.
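Because only one of the two noise levels may be given, it can help to build these settings through a small helper that enforces the restriction before calling the solver. The helper below is a hypothetical convenience, not part of DFO-LS; it only produces a dictionary suitable for user_params.

```python
def noise_params(additive=None, multiplicative=None, scale_factor=1.0):
    """Build noise-related user_params, allowing at most one noise level."""
    if additive is not None and multiplicative is not None:
        raise ValueError(
            "specify only one of the additive or multiplicative noise levels"
        )
    params = {
        "noise.quit_on_noise_level": True,
        "noise.scale_factor_for_quit": scale_factor,
    }
    if additive is not None:
        params["noise.additive_noise_level"] = additive
    if multiplicative is not None:
        params["noise.multiplicative_noise_level"] = multiplicative
    return params
```

For instance, noise_params(additive=1e-3) gives a dictionary that can be merged into user_params for a problem whose evaluations carry additive noise of about that size.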
Interpolation Management
interpolation.precondition
- Whether or not to scale the interpolation linear system to improve conditioning. Default is True.
interpolation.throw_error_on_nans
- Whether or not to throw numpy.linalg.LinAlgError if trying to interpolate to NaN objective values. If False, DFO-LS should terminate gracefully with an error flag. Default is False.
Regression Model Management
regression.num_extra_steps
- In successful iterations, the number of extra points (other than accepting the trust region step) to move, useful when \(|Y_k|>n+1\) (\(n\) is len(x0)). Default is 0.
regression.increase_num_extra_steps_with_restart
- The amount to increase regression.num_extra_steps by with each restart, for instance if increasing the number of points with each restart. Default is 0.
regression.momentum_extra_steps
- If moving extra points in successful iterations, whether to use the ‘momentum’ method. If not, uses geometry-improving steps. Default is False.
Multiple Restarts
restarts.use_restarts
- Whether to do restarts when \(\rho_k\) reaches \(\rho_{end}\), or (optionally) when all points are within noise level of \(f(x_k)\). Default is False for smooth problems or True for noisy problems.
restarts.max_unsuccessful_restarts
- Maximum number of consecutive unsuccessful restarts allowed (i.e. restarts which did not reduce the objective further). Default is 10.
restarts.rhoend_scale
- Factor to reduce \(\rho_{end}\) by with each restart. Default is 1.
restarts.use_soft_restarts
- Whether to use soft restarts (as opposed to hard restarts). Default is True.
restarts.soft.num_geom_steps
- For soft restarts, the number of points to move. Default is 3.
restarts.soft.move_xk
- For soft restarts, whether to preserve \(x_k\), or move it to the best new point evaluated. Default is True.
restarts.increase_npt
- Whether to increase \(|Y_k|\) with each restart. Default is False.
restarts.increase_npt_amt
- Amount to increase \(|Y_k|\) by with each restart. Default is 1.
restarts.hard.increase_ndirs_initial_amt
- Amount to increase growing.ndirs_initial by with each hard restart. To avoid a growing phase, it is best to set it to the same value as restarts.increase_npt_amt. Default is 1.
restarts.hard.use_old_rk
- If using hard restarts, whether or not to recycle the objective value at the best iterate found when performing a restart. This saves one objective evaluation. Default is True.
restarts.max_npt
- Maximum allowed value of \(|Y_k|\), useful if increasing with each restart. Default is npt, the input parameter to dfols.solve().
restarts.soft.max_fake_successful_steps
- The maximum number of successful steps in a given run where the new (smaller) objective value is larger than the best value found in a previous run. Default is maxfun, the input to dfols.solve().
restarts.auto_detect
- Whether or not to automatically determine when to restart. This is an extra condition, and restarts can still be triggered by small trust region radius, etc. Default is True.
restarts.auto_detect.history
- How many iterations of data on model changes and trust region radii to store. There are two criteria used: trust region radius decreases (no increases over the history, more decreases than no changes), and change in model Jacobian (consistently increasing trend as measured by slope and correlation coefficient of line of best fit). Default is 30.
restarts.auto_detect.min_chgJ_slope
- Minimum rate of increase of \(\log(\|J_k-J_{k-1}\|_F)\) over the past iterations to cause a restart. Default is 0.015.
restarts.auto_detect.min_correl
- Minimum correlation of the data set \((k, \log(\|J_k-J_{k-1}\|_F))\) required to cause a restart. Default is 0.1.
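The Jacobian-change criterion for automatic restarts can be illustrated with a line of best fit through \((k, \log(\|J_k-J_{k-1}\|_F))\). The sketch below covers only that criterion, with hypothetical function names; how DFO-LS combines it with the trust region radius criterion is not reproduced, and using non-strict inequalities is an assumption.

```python
import math


def fit_line(ys):
    """Least-squares slope and correlation coefficient of the points (k, ys[k])."""
    n = len(ys)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A constant data set has no trend; report zero correlation.
    correl = sxy / math.sqrt(sxx * syy) if syy > 0.0 else 0.0
    return slope, correl


def jacobian_trend_triggers_restart(chg_j_norms, min_slope=0.015, min_correl=0.1):
    """chg_j_norms holds positive values ||J_k - J_{k-1}||_F over the history."""
    if len(chg_j_norms) < 2:
        return False
    slope, correl = fit_line([math.log(c) for c in chg_j_norms])
    return slope >= min_slope and correl >= min_correl
```

A history in which the Jacobian change grows steadily, such as \(e^{0.05k}\) over 30 iterations, satisfies both thresholds, while a flat or decaying history does not.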
Dynamically Growing Initial Set
growing.ndirs_initial
- Number of initial points to add (excluding \(x_k\)). This should only be changed to a value less than \(n\), and only if the default setup cost of \(n+1\) evaluations of objfun is impractical. If this is set to be less than the default, the input value npt should be set to \(n\). If the default is used, all the below parameters have no effect on DFO-LS. Default is npt-1.
growing.full_rank.use_full_rank_interp
- If growing.ndirs_initial is less than npt, whether to perturb the interpolated \(J_k\) to make it full rank, allowing the trust region step to include components in the full search space. Default is True if \(m\geq n\) and False otherwise (opposite to growing.perturb_trust_region_step).
growing.perturb_trust_region_step
- Whether to perturb the trust region step by an orthogonal direction not yet searched. This is an alternative to growing.full_rank.use_full_rank_interp. Default is False if \(m\geq n\) and True otherwise (opposite to growing.full_rank.use_full_rank_interp).
growing.delta_scale_new_dirns
- When adding new search directions, the length of the step as a multiple of \(\Delta_k\). Default is 1, or 0.1 if growing.perturb_trust_region_step=True.
growing.full_rank.scale_factor
- Magnitude of extra components added to \(J_k\). Default is \(10^{-2}\).
growing.full_rank.svd_scale_factor
- Floor singular values of \(J_k\) at this factor of the last nonzero value. Default is 1.
growing.full_rank.min_sing_val
- Absolute floor on singular values of \(J_k\). Default is \(10^{-6}\).
growing.full_rank.svd_max_jac_cond
- Cap on condition number of \(J_k\) after applying floors to singular values (effectively another floor on the smallest singular value, since the largest singular value is fixed). Default is \(10^8\).
growing.do_geom_steps
- While still growing the initial set, whether to do geometry-improving steps in the trust region algorithm, as per the usual algorithm. Default is False.
growing.safety.do_safety_step
- While still growing the initial set, whether to perform safety steps, or the regular trust region steps. Default is True.
growing.safety.reduce_delta
- While still growing the initial set, whether to reduce \(\Delta_k\) in safety steps. Default is False.
growing.safety.full_geom_step
- While still growing the initial set, whether to do a full geometry-improving step within safety steps (the same as the post-growing phase of the algorithm). Since this involves reducing \(\Delta_k\), this cannot be True if growing.safety.reduce_delta is True. Default is False.
growing.reset_delta
- Whether or not to reset trust region radius \(\Delta_k\) to its initial value at the end of the growing phase. Default is False.
growing.reset_rho
- Whether or not to reset trust region radius lower bound \(\rho_k\) to its initial value at the end of the growing phase. Default is False.
growing.gamma_dec
- Trust region decrease parameter during the growing phase. Default is tr_radius.gamma_dec.
growing.num_new_dirns_each_iter
- Number of new search directions to add with each iteration where we do not have a full set of search directions. Default is 0, as this approach is not recommended.
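Several of the defaults above depend on whether \(m\geq n\), and two of the safety-step options are mutually exclusive. The sketch below collects those stated rules into two hypothetical helpers; it mirrors the text of this section rather than the DFO-LS source, and taking npt = n+1 when npt is not given is an assumption based on the setup cost quoted above.

```python
def growing_defaults(m, n, npt=None):
    """Defaults stated above for m residuals and n variables."""
    if npt is None:
        npt = n + 1  # assumed default number of interpolation points
    perturb = m < n  # perturb the step only when m < n
    return {
        "growing.ndirs_initial": npt - 1,
        "growing.full_rank.use_full_rank_interp": not perturb,
        "growing.perturb_trust_region_step": perturb,
        "growing.delta_scale_new_dirns": 0.1 if perturb else 1.0,
    }


def check_growing(params):
    """Raise if both mutually exclusive safety-step options are enabled."""
    full_geom = params.get("growing.safety.full_geom_step", False)
    reduce_delta = params.get("growing.safety.reduce_delta", False)
    if full_geom and reduce_delta:
        raise ValueError(
            "growing.safety.full_geom_step cannot be True "
            "when growing.safety.reduce_delta is True"
        )
```

Running check_growing on a candidate user_params dictionary before calling the solver catches the invalid combination early.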
Dykstra’s Algorithm
dykstra.d_tol
- Tolerance on the stopping conditions of Dykstra’s algorithm. Default is \(10^{-10}\).
dykstra.max_iters
- The maximum number of iterations Dykstra’s algorithm is allowed to take before stopping. Default is \(100\).
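For context, Dykstra’s algorithm computes the projection of a point onto an intersection of convex sets using only the projection onto each individual set. The following is a minimal textbook sketch showing where a tolerance and an iteration cap enter; the stopping test (change in the iterate over one full sweep) is an assumption and may differ from the one DFO-LS uses, and all function names are hypothetical.

```python
import math


def project_box(x, lower, upper):
    """Project x onto the box {z : lower <= z <= upper}."""
    return [min(max(xi, lo), hi) for xi, lo, hi in zip(x, lower, upper)]


def project_halfspace(x, a, b):
    """Project x onto the half-space {z : a.z <= b}."""
    violation = sum(ai * xi for ai, xi in zip(a, x)) - b
    if violation <= 0.0:
        return list(x)
    scale = violation / sum(ai * ai for ai in a)
    return [xi - scale * ai for xi, ai in zip(x, a)]


def dykstra(x0, projections, d_tol=1e-10, max_iters=100):
    """Approximate the projection of x0 onto the intersection of the sets."""
    x = list(x0)
    # One correction vector per set, initially zero.
    corrections = [[0.0] * len(x) for _ in projections]
    for _ in range(max_iters):
        x_prev = x
        for i, proj in enumerate(projections):
            shifted = [xi + ci for xi, ci in zip(x, corrections[i])]
            x = proj(shifted)
            corrections[i] = [si - xi for si, xi in zip(shifted, x)]
        change = math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, x_prev)))
        if change <= d_tol:
            break
    return x
```

As a usage example, projecting \((2, 2)\) onto the intersection of the unit box \([0,1]^2\) and the half-space \(x_1+x_2\leq 1\) returns a point close to \((0.5, 0.5)\).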
Checking Matrix Rank
matrix_rank.r_tol
- Tolerance on the smallest value a diagonal entry in the QR factorization may take before being considered zero. Default is \(10^{-18}\).
Handling Regularizer
func_tol.criticality_measure
- Scale factor (of the current trust-region radius) to determine the accuracy of the calculated criticality/stationarity measure (smaller means more accurate). Default is \(10^{-3}\).
func_tol.tr_step
- Scale factor to determine the accuracy of the trust-region step (smaller is less accurate). Default is \(0.9\).
func_tol.max_iters
- Maximum number of subproblem (S-FISTA) iterations. Default is 500.
sfista.max_iters_scaling
- By what factor to increase the minimum number of subproblem (S-FISTA) iterations. Must be at least 1. Default is 2.
References
[CFMR2018] Coralia Cartis, Jan Fiala, Benjamin Marteau and Lindon Roberts, Improving the Flexibility and Robustness of Model-Based Derivative-Free Optimization Solvers, ACM Transactions on Mathematical Software, 45:3 (2019), pp. 32:1-32:41.
[HR2022] Matthew Hough and Lindon Roberts, Model-Based Derivative-Free Methods for Convex-Constrained Optimization, SIAM Journal on Optimization, 32:4 (2022), pp. 2552-2579.
[LLR2024] Yanjun Liu, Kevin H. Lam and Lindon Roberts, Black-box Optimization Algorithms for Regularized Least-squares Problems, arXiv preprint arXiv:2407.14915 (2024).