-
Notifications
You must be signed in to change notification settings - Fork 125
Scheduler-Worker Model + Reliability Branching #766
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Scheduler-Worker Model + Reliability Branching #766
Conversation
… on the objective pseudcost estimate.
… on the objective pseudcost estimate.
…iteration and node limit to the diving threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@cpp/src/dual_simplex/branch_and_bound.cpp`:
- Around line 203-214: feasible_solution_symbol is declared to take the removed
type bnb_worker_type_t causing build failures when SHOW_DIVING_TYPE is enabled;
change the function signature to accept search_strategy_t (i.e., inline char
feasible_solution_symbol(search_strategy_t strategy)) and update any callers if
necessary to pass a search_strategy_t value; keep the switch cases and return
values as-is to preserve behavior.
| f_t lower_bound = get_lower_bound(); | ||
| f_t abs_gap = upper_bound_ - lower_bound; | ||
| f_t rel_gap = user_relative_gap(original_lp_, upper_bound_.load(), lower_bound); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not for this PR, but we should add the cuts in the single-threaded solve.
| template <typename i_t, typename f_t> | ||
| class branch_and_bound_worker_t { | ||
| public: | ||
| const i_t worker_id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: There are a lot of member variables here. It would be good to add the _ post-fix so it is clear to the reader in the functions that they are accessing the member variables. For example, when reading the code it would have helped if I saw recompute_basis_ to know that you were accessing the member variable.
| const int64_t alpha = iter_factor * branch_and_bound_lp_iters; | ||
| const int64_t max_reliability_iter = alpha + reliability_branching_settings.bnb_lp_offset; | ||
|
|
||
| f_t gamma = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have a more descriptive name than gamma?
| reliable_threshold); | ||
|
|
||
| // Shuffle the unreliable list so every variable has the same chance to be selected. | ||
| if (unreliable_list.size() > max_num_candidates) { worker->rng.shuffle(unreliable_list); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are you setting the seed from the random number generator? Could you set it to the seed inside simplex_solver_settings.hpp so that all generators use the same seed. That makes it possible for us to expose the seed easily later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the branch_and_bound_worker. Each worker has it own rng with its own seed and stream (note that the rng cannot be shared between workers due to the internal state). They need to have separated seeds and stream in order to ensure proper randomness.
|
|
||
| if (toc(start_time) > settings.time_limit) { continue; } | ||
|
|
||
| pseudo_cost_mutex_down[j].lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not for this PR: but if two threads want deem the same variable unreliable, one thread will be locked out while the other solves the LP for the down branch. Could we have a thread try to grab the down lock and if it fails try to grab the up lock? That way we could have multiple threads working on the same variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it do not think the additional complexity is worth it at the moment. According to the profiler results, there is little contention here as two threads are unlikely to try to initialize the same variable at the same time.
| const std::vector<f_t>& solution, | ||
| logger_t& log); | ||
|
|
||
| i_t reliable_variable_selection(mip_node_t<i_t, f_t>* node_ptr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: We try to follow the Google style guide where input arguments (those with const) appear first, followed by input output arguments, and then pure output arguments.
https://google.github.io/styleguide/cppguide.html#Inputs_and_Outputs
"When ordering function parameters, put all input-only parameters before any output parameters. In particular, do not add new parameters to the end of the function just because they are new; place new input-only parameters before the output parameters. This is not a hard-and-fast rule. Parameters that are both input and output muddy the waters, and, as always, consistency with related functions may require you to bend the rule. Variadic functions may also require unusual parameter ordering."
chris-maes
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for merging with cuts and congratulations on the number of problems solved to optimality with this PR Nicolas.
Some minor feedback. Would appreciate if max(..,0.0) was used for the objective and lock guard moved out of if predicate before merging. Thank you!
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
cpp/src/dual_simplex/pseudo_costs.cpp (1)
450-461:⚠️ Potential issue | 🟡 MinorGuard
fracagainst near-zero values to prevent pseudo-cost corruption.If
fractional_valis extremely close to an integer due to floating-point precision,fraccan be near zero, causingchange_in_obj / fracto produce infinity and corrupt pseudo-cost statistics. The same concern was raised for the trial-branching path (lines 672–706); this call-site shares the same pattern.🛡️ Suggested guard
const f_t frac = node_ptr->branch_dir == rounding_direction_t::DOWN ? node_ptr->fractional_val - std::floor(node_ptr->fractional_val) : std::ceil(node_ptr->fractional_val) - node_ptr->fractional_val; + if (frac < 1e-9) { return; } + if (node_ptr->branch_dir == rounding_direction_t::DOWN) {
🧹 Nitpick comments (2)
cpp/src/dual_simplex/pseudo_costs.cpp (2)
571-571: Unused variablebranch_and_bound_explored.
branch_and_bound_exploredis assigned on line 571 but never referenced. This may trigger a compiler warning.♻️ Suggested fix
const int64_t branch_and_bound_lp_iters = bnb_stats.total_lp_iters; - const int64_t branch_and_bound_explored = bnb_stats.nodes_explored; const i_t branch_and_bound_lp_iter_per_node = branch_and_bound_lp_iters / bnb_stats.nodes_explored;
651-710: Manuallock()/unlock()is not exception-safe.If
trial_branchingthrows (e.g.,std::bad_allocduring LP/vector copy at line 157 or 173–177), the mutex at lines 679/710 is never unlocked. Usingstd::unique_lockwith the same lock/unlock pattern (or restructuring withstd::lock_guardin a scoped block) provides automatic cleanup.♻️ Suggested RAII pattern for the down-branch block
- pseudo_cost_mutex_down[j].lock(); - if (pseudo_cost_num_down[j] < reliable_threshold) { + { + std::lock_guard<omp_mutex_t> guard(pseudo_cost_mutex_down[j]); + if (pseudo_cost_num_down[j] < reliable_threshold) { // Do trial branching on the down branch f_t obj = trial_branching(worker->leaf_problem, ... strong_branching_lp_iter); if (!std::isnan(obj)) { f_t change_in_obj = std::max(obj - node_ptr->lower_bound, eps); f_t change_in_x = solution[j] - std::floor(solution[j]); pseudo_cost_sum_down[j] += change_in_obj / change_in_x; pseudo_cost_num_down[j]++; } + } } - pseudo_cost_mutex_down[j].unlock();Apply the same pattern to the up-branch block (lines 683–710).
|
/merge |
This PR implements the schedule-worker model for B&B, where one thread (the scheduler) is responsible for scheduling tasks to the other threads (the workers). With this model, the CPU resources can be shared efficiently among the different tasks within the MIP solver. It also allows the scheduling policy to be changed in runtime (e.g., in the future, the solver can dynamically set the number of workers allocated to each diving heuristics at runtime).
This also implements a parallel reliability branching (Section 5.7 from [1]). This also fixes the incorrect pseudocost update in each node in the B&B tree.
Closes #526.
Closes #445.
Closes #700.
Performance over the MIPLIB2017 dataset (GH200):
Reference
[1] T. Achterberg, “Constraint Integer Programming,” PhD, Technischen Universität Berlin, Berlin, 2007. doi: 10.14279/depositonce-1634.
Checklist
Summary by CodeRabbit
New Features
Improvements
Chores