Advanced Usage

This section describes different optional user parameters available in DFO-LS.

In the last section (Using DFO-LS), we introduced dfols.solve(), which has the optional input user_params. This is a Python dictionary of user parameters. We will now go through the settings which can be changed in this way. More details are available in the papers [CFMR2018], [HR2022] and [LLR2024].

The default values, used if no override is given, in some cases vary depending on whether objfun has stochastic noise; that is, whether evaluating objfun(x) several times at the same x gives the same result or not. Whether or not this is the case is determined by the objfun_has_noise input to dfols.solve() (and not by inspecting objfun, for instance).

General Algorithm Parameters

general.rounding_error_constant - Internally, all interpolation points are stored with respect to a base point \(x_b\); that is, we store \(\{y_t-x_b\}\), which reduces the risk of roundoff errors. We shift \(x_b\) to \(x_k\) when \(\|s_k\| \leq \text{const}\|x_k-x_b\|\), where ‘const’ is this parameter. Default is 0.1.
general.safety_step_thresh - Threshold for when to call the safety step, \(\|s_k\| \leq \gamma_S \rho_k\). Default is \(\gamma_S =0.5\).
general.check_objfun_for_overflow - Whether to cap the value of \(r_i(x)\) when they are large enough that an OverflowError will be encountered when trying to evaluate \(f(x)\). Default is True.

Logging and Output

logging.n_to_print_whole_x_vector - If printing all function evaluations to screen/log file, the maximum len(x) for which the full vector x should be printed also. Default is 6.
logging.save_diagnostic_info - Flag so save diagnostic information at each iteration. Default is False.
logging.save_poisedness - If saving diagnostic information, whether to include the \(\Lambda\)-poisedness of \(Y_k\) in the diagnostic information. This is the most computationally expensive piece of diagnostic information. Default is True.
logging.save_xk - If saving diagnostic information, whether to include the full vector \(x_k\). Default is False.
logging.save_rk - If saving diagnostic information, whether to include the full vector \([r_1(x_k)\:\cdots\:r_m(x_k)]\). The value \(f(x_k)\) is always included. Default is False.

Initialization of Points

init.random_initial_directions - Build the initial interpolation set using random directions (as opposed to coordinate directions). Default as of version 1.2 is False.
init.random_directions_make_orthogonal - If building initial interpolation set with random directions, whether or not these should be orthogonalized. Default is True.
init.run_in_parallel - If using random directions or non-random with input npt at most len(x0)+1, whether or not to ask for all objfun to be evaluated at all points without any intermediate processing. Default is False.

Trust Region Management

tr_radius.eta1 - Threshold for unsuccessful trust region iteration, \(\eta_1\). Default is 0.1.
tr_radius.eta2 - Threshold for very successful trust region iteration, \(\eta_2\). Default is 0.7.
tr_radius.gamma_dec - Ratio to decrease \(\Delta_k\) in unsuccessful iteration, \(\gamma_{dec}\). Default is 0.5 for smooth problems or 0.98 for noisy problems (i.e. objfun_has_noise = True).
tr_radius.gamma_inc - Ratio to increase \(\Delta_k\) in very successful iterations, \(\gamma_{inc}\). Default is 2.
tr_radius.gamma_inc_overline - Ratio of \(\|s_k\|\) to increase \(\Delta_k\) by in very successful iterations, \(\overline{\gamma}_{inc}\). Default is 4.
tr_radius.alpha1 - Ratio to decrease \(\rho_k\) by when it is reduced, \(\alpha_1\). Default is 0.1 for smooth problems or 0.9 for noisy problems (i.e. objfun_has_noise = True).
tr_radius.alpha2 - Ratio of \(\rho_k\) to decrease \(\Delta_k\) by when \(\rho_k\) is reduced, \(\alpha_2\). Default is 0.5 for smooth problems or 0.95 for noisy problems (i.e. objfun_has_noise = True).

Termination on Small Objective Value

model.abs_tol - Tolerance on \(f(x_k)\); quit if \(f(x_k)\) is below this value. Default is \(10^{-12}\).
model.rel_tol - Relative tolerance on \(f(x_k)\); quit if \(f(x_k)/f(x_0)\) is below this value. Default is \(10^{-20}\).

Termination on Slow Progress

slow.history_for_slow - History used to determine whether the current iteration is ‘slow’. Default is 5.
slow.thresh_for_slow - Threshold for objective decrease used to determine whether the current iteration is ‘slow’. Default is \(10^{-4}\).
slow.max_slow_iters - Number of consecutive slow successful iterations before termination (or restart). Default is 20*len(x0).

Stochastic Noise Information

noise.quit_on_noise_level - Flag to quit (or restart) if all \(f(y_t)\) are within noise level of \(f(x_k)\). Default is False for smooth problems or True for noisy problems.
noise.scale_factor_for_quit - Factor of noise level to use in termination criterion. Default is 1.
noise.multiplicative_noise_level - Multiplicative noise level in \(f\). Can only specify one of multiplicative or additive noise levels. Default is None.
noise.additive_noise_level - Additive noise level in \(f\). Can only specify one of multiplicative or additive noise levels. Default is None.

Interpolation Management

interpolation.precondition - whether or not to scale the interpolation linear system to improve conditioning. Default is True.
interpolation.throw_error_on_nans - whether or not to throw numpy.linalg.LinAlgError if trying to interpolate to NaN objective values. If False, DFO-LS should terminate gracefully with an error flag. Default is False.

Regression Model Management

regression.num_extra_steps - In successful iterations, the number of extra points (other than accepting the trust region step) to move, useful when \(|Y_k|>n+1\) (\(n\) is len(x0)). Default is 0.
regression.increase_num_extra_steps_with_restart - The amount to increase regression.num_extra_steps by with each restarts, for instance if increasing the number of points with each restart. Default is 0.
regression.momentum_extra_steps - If moving extra points in successful iterations, whether to use the ‘momentum’ method. If not, uses geometry-improving steps. Default is False.

Multiple Restarts

restarts.use_restarts - Whether to do restarts when \(\rho_k\) reaches \(\rho_{end}\), or (optionally) when all points are within noise level of \(f(x_k)\). Default is False for smooth problems or True for noisy problems.
restarts.max_unsuccessful_restarts - Maximum number of consecutive unsuccessful restarts allowed (i.e.~restarts which did not reduce the objective further). Default is 10.
restarts.rhoend_scale - Factor to reduce \(\rho_{end}\) by with each restart. Default is 1.
restarts.use_soft_restarts - Whether to use soft or hard restarts. Default is True.
restarts.soft.num_geom_steps - For soft restarts, the number of points to move. Default is 3.
restarts.soft.move_xk - For soft restarts, whether to preserve \(x_k\), or move it to the best new point evaluated. Default is True.
restarts.increase_npt - Whether to increase \(|Y_k|\) with each restart. Default is False.
restarts.increase_npt_amt - Amount to increase \(|Y_k|\) by with each restart. Default is 1.
restarts.hard.increase_ndirs_initial_amt - Amount to increase growing.ndirs_initial by with each hard restart. To avoid a growing phase, it is best to set it to the same value as restarts.increase_npt_amt. Default is 1.
restarts.hard.use_old_rk - If using hard restarts, whether or not to recycle the objective value at the best iterate found when performing a restart. This saves one objective evaluation. Default is True.
restarts.max_npt - Maximum allowed value of \(|Y_k|\), useful if increasing with each restart. Default is npt, the input parameter to dfols.solve().
restarts.soft.max_fake_successful_steps - The maximum number of successful steps in a given run where the new (smaller) objective value is larger than the best value found in a previous run. Default is maxfun, the input to dfols.solve().
restarts.auto_detect - Whether or not to automatically determine when to restart. This is an extra condition, and restarts can still be triggered by small trust region radius, etc. Default is True.
restarts.auto_detect.history - How many iterations of data on model changes and trust region radii to store. There are two criteria used: trust region radius decreases (no increases over the history, more decreases than no changes), and change in model Jacobian (consistently increasing trend as measured by slope and correlation coefficient of line of best fit). Default is 30.
restarts.auto_detect.min_chgJ_slope - Minimum rate of increase of \(\log(\|J_k-J_{k-1}\|_F)\) over the past iterations to cause a restart. Default is 0.015.
restarts.auto_detect.min_correl - Minimum correlation of the data set \((k, \log(\|J_k-J_{k-1}\|_F))\) required to cause a restart. Default is 0.1.

Dynamically Growing Initial Set

growing.ndirs_initial - Number of initial points to add (excluding \(x_k\)). This should only be changed to a value less than \(n\), and only if the default setup cost of \(n+1\) evaluations of objfun is impractical. If this is set to be less than the default, the input value npt should be set to \(n\). If the default is used, all the below parameters have no effect on DFO-LS. Default is npt-1.
growing.full_rank.use_full_rank_interp - If growing.ndirs_initial is less than npt, whether to perturb the interpolated \(J_k\) to make it full rank, allowing the trust region step to include components in the full search space. Default is True if \(m\geq n\) and False otherwise (opposite to growing.perturb_trust_region_step).
growing.perturb_trust_region_step - Whether to perturb the trust region step by an orthogonal direction not yet searched. This is an alternative to growing.full_rank.use_full_rank_interp. Default is False if \(m\geq n\) and True otherwise (opposite to growing.full_rank.use_full_rank_interp).
growing.delta_scale_new_dirns - When adding new search directions, the length of the step as a multiple of \(\Delta_k\). Default is 1, or 0.1 if growing.perturb_trust_region_step=True.
growing.full_rank.scale_factor - Magnitude of extra components added to \(J_k\). Default is \(10^{-2}\).
growing.full_rank.svd_scale_factor - Floor singular values of \(J_k\) at this factor of the last nonzero value. Default is 1.
growing.full_rank.min_sing_val - Absolute floor on singular values of \(J_k\). Default is \(10^{-6}\).
growing.full_rank.svd_max_jac_cond - Cap on condition number of \(J_k\) after applying floors to singular values (effectively another floor on the smallest singular value, since the largest singular value is fixed). Default is \(10^8\).
growing.do_geom_steps - While still growing the initial set, whether to do geometry-improving steps in the trust region algorithm, as per the usual algorithm. Default is False.
growing.safety.do_safety_step - While still growing the initial set, whether to perform safety steps, or the regular trust region steps. Default is True.
growing.safety.reduce_delta - While still growing the initial set, whether to reduce \(\Delta_k\) in safety steps. Default is False.
growing.safety.full_geom_step - While still growing the initial set, whether to do a full geometry-improving step within safety steps (the same as the post-growing phase of the algorithm). Since this involves reducing \(\Delta_k\), cannot be True if growing.safety.reduce_delta is True. Default is False.
growing.reset_delta - Whether or not to reset trust region radius \(\Delta_k\) to its initial value at the end of the growing phase. Default is False.
growing.reset_rho - Whether or not to reset trust region radius lower bound \(\rho_k\) to its initial value at the end of the growing phase. Default is False.
growing.gamma_dec - Trust region decrease parameter during the growing phase. Default is tr_radius.gamma_dec.
growing.num_new_dirns_each_iter - Number of new search directions to add with each iteration where we do not have a full set of search directions. Default is 0, as this approach is not recommended.

Dykstra’s Algorithm

dykstra.d_tol - Tolerance on the stopping conditions of Dykstra’s algorithm. Default is \(10^{-10}\).
dykstra.max_iters - The maximum number of iterations Dykstra’s algorithm is allowed to take before stopping. Default is \(100\).

Checking Matrix Rank

matrix_rank.r_tol - Tolerance on what is the smallest posisble diagonal entry value in the QR factorization before being considered zero. Default is \(10^{-18}\).

Handling regularizer

func_tol.criticality_measure - scale factor (of the current trust-region radius) to determine the accuracy of the calculated criticality/stationarity measure (smaller means more accurate). Default is \(10^{-3}\).
func_tol.tr_step - scale factor to determine the accuracy of the trust-region step (smaller is less accurate). Default is \(0.9\).
func_tol.max_iters - maximum number of subproblem (S-FISTA) iterations. Default is 500.
sfista.max_iters_scaling - by what factor to increase the minimum number of subproblem (S-FISTA) iterations. Must be at least 1. Default is 2.

References

[CFMR2018]

Coralia Cartis, Jan Fiala, Benjamin Marteau and Lindon Roberts, Improving the Flexibility and Robustness of Model-Based Derivative-Free Optimization Solvers, ACM Transactions on Mathematical Software, 45:3 (2019), pp. 32:1-32:41 [preprint]

[HR2022]

Matthew Hough and Lindon Roberts, Model-Based Derivative-Free Methods for Convex-Constrained Optimization, SIAM Journal on Optimization, 21:4 (2022), pp. 2552-2579 [preprint].

[LLR2024]

Yanjun Liu, Kevin H. Lam and Lindon Roberts, Black-box Optimization Algorithms for Regularized Least-squares Problems, arXiv preprint arXiv:2407.14915 (2024).