Scheduler-Worker Model + Reliability Branching #766

nguidotti · 2026-01-13T17:26:58Z

This PR implements the schedule-worker model for B&B, where one thread (the scheduler) is responsible for scheduling tasks to the other threads (the workers). With this model, the CPU resources can be shared efficiently among the different tasks within the MIP solver. It also allows the scheduling policy to be changed in runtime (e.g., in the future, the solver can dynamically set the number of workers allocated to each diving heuristics at runtime).

This also implements a parallel reliability branching (Section 5.7 from [1]). This also fixes the incorrect pseudocost update in each node in the B&B tree.

Closes #526.
Closes #445.
Closes #700.

Performance over the MIPLIB2017 dataset (GH200):

================================================================================
main - 046eafd (1) vs reliability-branching-cuts (2)
================================================================================
Feasible solutions = 225 vs 224 (-1)
Optimal solutions = 44 vs 68 (+24)
Solutions with <0.1% primal gap = 109 vs 125 (+16)
Average nodes explored = 7298973 vs 4300451 (-2998522, -41.081%)
Shifted geomean for MIP gap = 0.2979 vs 0.2374 (-0.0605, -20.302%)
Average primal gap = 11.9935 vs 11.0833 (-0.9102, -7.589%)
Average primal integral per time = 19.3326 vs 30.7054 (+11.3728, +37.038%)
================================================================================

Reference

[1] T. Achterberg, “Constraint Integer Programming,” PhD, Technischen Universität Berlin, Berlin, 2007. doi: 10.14279/depositonce-1634.

Checklist

I am familiar with the Contributing Guidelines.
Testing
- New or existing tests cover these changes
- Added tests
- Created an issue to follow-up
- NA
Documentation
- The documentation is up to date with these changes
- Added new documentation
- NA

Summary by CodeRabbit

New Features
- New CLI option --reliability-branching and reliability-based trial branching in variable selection.
- New per-worker branch-and-bound framework and a CPU PCG random generator.
Improvements
- Master–worker scheduling for branch-and-bound, improved multithreaded efficiency and diagnostics.
- Safer concurrency (protected queues, reduced coarse locking) and refined pseudocost scoring for more robust variable choices.
- Adjusted threading and solver defaults.
Chores
- Simplified sub-MIP / RINS configuration for leaner execution.

… on the objective pseudcost estimate.

…eady explored.

…iteration and node limit to the diving threads.

…diving

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@cpp/src/dual_simplex/branch_and_bound.cpp`:
- Around line 203-214: feasible_solution_symbol is declared to take the removed
type bnb_worker_type_t causing build failures when SHOW_DIVING_TYPE is enabled;
change the function signature to accept search_strategy_t (i.e., inline char
feasible_solution_symbol(search_strategy_t strategy)) and update any callers if
necessary to pass a search_strategy_t value; keep the switch cases and return
values as-is to preserve behavior.

cpp/src/dual_simplex/branch_and_bound.cpp

chris-maes · 2026-02-05T21:00:14Z

cpp/src/dual_simplex/branch_and_bound.cpp

+  f_t lower_bound = get_lower_bound();
+  f_t abs_gap     = upper_bound_ - lower_bound;
+  f_t rel_gap     = user_relative_gap(original_lp_, upper_bound_.load(), lower_bound);
+


Not for this PR, but we should add the cuts in the single-threaded solve.

chris-maes · 2026-02-05T21:09:24Z

cpp/src/dual_simplex/branch_and_bound_worker.hpp

+template <typename i_t, typename f_t>
+class branch_and_bound_worker_t {
+ public:
+  const i_t worker_id;


Nit: There are a lot of member variables here. It would be good to add the _ post-fix so it is clear to the reader in the functions that they are accessing the member variables. For example, when reading the code it would have helped if I saw recompute_basis_ to know that you were accessing the member variable.

cpp/src/dual_simplex/pseudo_costs.cpp

chris-maes · 2026-02-05T21:22:59Z

cpp/src/dual_simplex/pseudo_costs.cpp

+    const int64_t alpha                = iter_factor * branch_and_bound_lp_iters;
+    const int64_t max_reliability_iter = alpha + reliability_branching_settings.bnb_lp_offset;
+
+    f_t gamma =


Can we have a more descriptive name than gamma?

cpp/src/dual_simplex/pseudo_costs.cpp

chris-maes · 2026-02-05T21:25:30Z

cpp/src/dual_simplex/pseudo_costs.cpp

+    reliable_threshold);
+
+  // Shuffle the unreliable list so every variable has the same chance to be selected.
+  if (unreliable_list.size() > max_num_candidates) { worker->rng.shuffle(unreliable_list); }


Where are you setting the seed from the random number generator? Could you set it to the seed inside simplex_solver_settings.hpp so that all generators use the same seed. That makes it possible for us to expose the seed easily later on

In the branch_and_bound_worker. Each worker has it own rng with its own seed and stream (note that the rng cannot be shared between workers due to the internal state). They need to have separated seeds and stream in order to ensure proper randomness.

cpp/src/dual_simplex/pseudo_costs.cpp

chris-maes · 2026-02-05T21:28:51Z

cpp/src/dual_simplex/pseudo_costs.cpp

+
+    if (toc(start_time) > settings.time_limit) { continue; }
+
+    pseudo_cost_mutex_down[j].lock();


Maybe not for this PR: but if two threads want deem the same variable unreliable, one thread will be locked out while the other solves the LP for the down branch. Could we have a thread try to grab the down lock and if it fails try to grab the up lock? That way we could have multiple threads working on the same variable.

Yes, but it do not think the additional complexity is worth it at the moment. According to the profiler results, there is little contention here as two threads are unlikely to try to initialize the same variable at the same time.

cpp/src/dual_simplex/pseudo_costs.cpp

chris-maes · 2026-02-05T21:34:12Z

cpp/src/dual_simplex/pseudo_costs.hpp

                         const std::vector<f_t>& solution,
                         logger_t& log);

+  i_t reliable_variable_selection(mip_node_t<i_t, f_t>* node_ptr,


Nit: We try to follow the Google style guide where input arguments (those with const) appear first, followed by input output arguments, and then pure output arguments.

https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs

"When ordering function parameters, put all input-only parameters before any output parameters. In particular, do not add new parameters to the end of the function just because they are new; place new input-only parameters before the output parameters. This is not a hard-and-fast rule. Parameters that are both input and output muddy the waters, and, as always, consistency with related functions may require you to bend the rule. Variadic functions may also require unusual parameter ordering."

chris-maes

Thanks for merging with cuts and congratulations on the number of problems solved to optimality with this PR Nicolas.

Some minor feedback. Would appreciate if max(..,0.0) was used for the objective and lock guard moved out of if predicate before merging. Thank you!

LGTM.

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/src/dual_simplex/pseudo_costs.cpp (1)
450-461: ⚠️ Potential issue | 🟡 Minor

Guard frac against near-zero values to prevent pseudo-cost corruption.

If fractional_val is extremely close to an integer due to floating-point precision, frac can be near zero, causing change_in_obj / frac to produce infinity and corrupt pseudo-cost statistics. The same concern was raised for the trial-branching path (lines 672–706); this call-site shares the same pattern.
🛡️ Suggested guard
   const f_t frac          = node_ptr->branch_dir == rounding_direction_t::DOWN
                               ? node_ptr->fractional_val - std::floor(node_ptr->fractional_val)
                               : std::ceil(node_ptr->fractional_val) - node_ptr->fractional_val;
 
+  if (frac < 1e-9) { return; }
+
   if (node_ptr->branch_dir == rounding_direction_t::DOWN) {

🧹 Nitpick comments (2)

cpp/src/dual_simplex/pseudo_costs.cpp (2)
571-571: Unused variable branch_and_bound_explored.

branch_and_bound_explored is assigned on line 571 but never referenced. This may trigger a compiler warning.
♻️ Suggested fix
   const int64_t branch_and_bound_lp_iters = bnb_stats.total_lp_iters;
-  const int64_t branch_and_bound_explored = bnb_stats.nodes_explored;
   const i_t branch_and_bound_lp_iter_per_node =
     branch_and_bound_lp_iters / bnb_stats.nodes_explored;
651-710: Manual lock()/unlock() is not exception-safe.

If trial_branching throws (e.g., std::bad_alloc during LP/vector copy at line 157 or 173–177), the mutex at lines 679/710 is never unlocked. Using std::unique_lock with the same lock/unlock pattern (or restructuring with std::lock_guard in a scoped block) provides automatic cleanup.
♻️ Suggested RAII pattern for the down-branch block
-    pseudo_cost_mutex_down[j].lock();
-    if (pseudo_cost_num_down[j] < reliable_threshold) {
+    {
+      std::lock_guard<omp_mutex_t> guard(pseudo_cost_mutex_down[j]);
+      if (pseudo_cost_num_down[j] < reliable_threshold) {
         // Do trial branching on the down branch
         f_t obj = trial_branching(worker->leaf_problem,
                                   ...
                                   strong_branching_lp_iter);
 
         if (!std::isnan(obj)) {
           f_t change_in_obj = std::max(obj - node_ptr->lower_bound, eps);
           f_t change_in_x   = solution[j] - std::floor(solution[j]);
           pseudo_cost_sum_down[j] += change_in_obj / change_in_x;
           pseudo_cost_num_down[j]++;
         }
+      }
     }
-    pseudo_cost_mutex_down[j].unlock();
Apply the same pattern to the up-branch block (lines 683–710).

nguidotti · 2026-02-06T14:57:57Z

/merge

nguidotti and others added 30 commits December 12, 2025 10:22

implemented diving heuristics. sorted starting nodes for diving based…

c9081a6

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

046a501

organized code. added toggle to disable each type of diving.

6ea6d72

restrict calling RINS to the best-first threads

5422b97

fix invalid branch var in line search diving

73c1a63

moved asserts

3a77cca

replace inf and max with STL calls

0f7af4e

Fix incorrect infeasible list

79368c3

implemented diving heuristics. sorted starting nodes for diving based…

6334ad7

… on the objective pseudcost estimate.

moved diving heuristics to a separated file

2c94a7c

organized code. added toggle to disable each type of diving.

0e815e1

moved asserts

29a2a33

unified global node heap. fathom node in the diving node that was alr…

5a3ef60

…eady explored.

refactoring code

c181ccf

fix style

501c20d

small fixes

0ee757a

unified the best-first and diving heap into a single object. imposed …

546b116

…iteration and node limit to the diving threads.

adjusted column spacing in bnb logs. added opening mode for logger.

06d531a

revert fixes for dual simplex. changed RINS and SubMIP to use guided …

ae742c2

…diving

moved bound propagation logs to debug mode

b5f2c7e

addressing code rabbit suggestions

eb5c695

added explicit conversion to float

5cf5ac0

missing code revert in basis update

d7dfb0d

fixed variable type

55e64cc

Merge remote-tracking branch 'origin/main' into node-queue

a26a816

Merge remote-tracking branch 'origin/main' into diving-heuristics

17d3d7c

added comments

0b1e994

added missing spacing

9effdc8

updated logs

6d43e03

Merge branch 'main' into adjust-bnb-logs

98f670f

nguidotti added 2 commits February 5, 2026 15:36

renaming variables

d65f258

split locks in pseudocost

725363d

coderabbitai bot reviewed Feb 5, 2026

View reviewed changes

cpp/src/dual_simplex/branch_and_bound.cpp Show resolved Hide resolved

fix crash in timtab1 due to double infinite pseudocost

d1989cc