/all_test
GiGL Automation@ 24:19:30UTC : 🔄 @ 01:26:31UTC : ✅ Workflow completed successfully.
GiGL Automation@ 24:19:30UTC : 🔄 @ 24:25:57UTC : ✅ Workflow completed successfully.
GiGL Automation@ 24:19:31UTC : 🔄 @ 01:52:42UTC : ✅ Workflow completed successfully.
GiGL Automation@ 24:19:32UTC : 🔄 @ 24:28:30UTC : ✅ Workflow completed successfully.
GiGL Automation@ 24:19:32UTC : 🔄 @ 01:38:44UTC : ✅ Workflow completed successfully.
/all_test
GiGL Automation@ 23:59:14UTC : 🔄 @ 24:06:39UTC : ✅ Workflow completed successfully.
GiGL Automation@ 23:59:15UTC : 🔄 @ 01:22:05UTC : ✅ Workflow completed successfully.
GiGL Automation@ 23:59:17UTC : 🔄 @ 01:15:30UTC : ✅ Workflow completed successfully.
GiGL Automation@ 23:59:17UTC : 🔄 @ 24:07:40UTC : ✅ Workflow completed successfully.
GiGL Automation@ 23:59:17UTC : 🔄 @ 01:19:53UTC : ✅ Workflow completed successfully.
Expand the one-line docstring to include concrete examples showing how ROUND_ROBIN and CONTIGUOUS strategies distribute node IDs across compute nodes, including split filtering and fractional server assignment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
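The docstring examples themselves aren't shown in this thread, so here is a toy model of one plausible reading of the two strategies. It assumes each storage server holds a list of node IDs and each compute node asks for its share. `Strategy`, `assign_node_ids`, and the fractional-interval math are hypothetical illustrations, not GiGL's implementation.

```python
from enum import Enum
from fractions import Fraction
from typing import AbstractSet, List, Optional, Sequence


class Strategy(Enum):
    # Hypothetical stand-in for the real distribution-strategy enum.
    ROUND_ROBIN = "round_robin"
    CONTIGUOUS = "contiguous"


def _round_robin(servers: Sequence[Sequence[int]], num_compute_nodes: int, rank: int) -> List[List[int]]:
    # Compute node `rank` takes every num_compute_nodes-th ID from every server,
    # so every compute node ends up talking to every server.
    return [list(ids[rank::num_compute_nodes]) for ids in servers]


def _contiguous(servers: Sequence[Sequence[int]], num_compute_nodes: int, rank: int) -> List[List[int]]:
    # Compute node `rank` owns the server interval [rank*S/C, (rank+1)*S/C).
    # When S is not a multiple of C the endpoints are fractional, so a compute
    # node may receive only part of a server's IDs (fractional assignment).
    num_servers = len(servers)
    start = Fraction(rank * num_servers, num_compute_nodes)
    end = Fraction((rank + 1) * num_servers, num_compute_nodes)
    out: List[List[int]] = []
    for server, ids in enumerate(servers):
        lo, hi = max(start, server), min(end, server + 1)
        if hi <= lo:
            out.append([])  # this compute node has no overlap with this server
            continue
        n = len(ids)
        # Floor both fractional bounds to indices; neighbouring compute nodes
        # compute identical boundaries, so the IDs are partitioned exactly.
        out.append(list(ids[int((lo - server) * n):int((hi - server) * n)]))
    return out


def assign_node_ids(
    strategy: Strategy,
    servers: Sequence[Sequence[int]],
    num_compute_nodes: int,
    rank: int,
    split_ids: Optional[AbstractSet[int]] = None,
) -> List[List[int]]:
    """Return, per server, the node IDs compute node `rank` is responsible for."""
    if split_ids is not None:
        # Split filtering happens first, so only e.g. training IDs get distributed.
        servers = [[i for i in ids if i in split_ids] for ids in servers]
    if strategy is Strategy.ROUND_ROBIN:
        return _round_robin(servers, num_compute_nodes, rank)
    return _contiguous(servers, num_compute_nodes, rank)
```

With two servers holding IDs 0..5 and 6..11 and three compute nodes, CONTIGUOUS gives rank 1 `[[4, 5], [6, 7]]` (a third of each server), while ROUND_ROBIN gives it `[[1, 4], [7, 10]]`. In this model, CONTIGUOUS means each compute node talks to at most a couple of servers, which is one way fewer cross-cluster connections could fall out of the change.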
… check
The world_size != num_compute_nodes validation was unnecessarily restrictive: callers may legitimately pass a different world_size. Also extract the validator to a module-level function since it no longer needs self. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The sliced tensor holds a reference to the original, but in the contiguous flow the original is a local variable that goes out of scope, so the slice effectively owns the data. Removing clone() avoids an unnecessary copy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
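A minimal PyTorch illustration of why dropping `clone()` is safe here, assuming the contiguous flow slices a local 1-D ID tensor. `take_contiguous_slice` is a hypothetical stand-in, not the actual function in this PR.

```python
import torch


def take_contiguous_slice(num_nodes: int, start: int, end: int) -> torch.Tensor:
    # `all_ids` is a local: once this returns, the slice is the only reference
    # to its storage, so no .clone() is needed to keep the data valid.
    all_ids = torch.arange(num_nodes)
    return all_ids[start:end]  # basic slicing returns a view; nothing is copied


base = torch.arange(10)
view = base[2:5]            # shares base's storage, starting 2 elements in
copied = base[2:5].clone()  # what the removed .clone() did: allocate new storage

shares_memory = view.data_ptr() == base.data_ptr() + 2 * base.element_size()
clone_is_separate = copied.untyped_storage().data_ptr() != base.untyped_storage().data_ptr()

# Trade-off worth knowing: the view keeps the *whole* original buffer alive,
# not just the 3 elements it exposes.
sliced = take_contiguous_slice(10, 2, 5)
retained_bytes = sliced.untyped_storage().nbytes()  # 10 int64 values = 80 bytes
```

Skipping the copy saves an allocation per slice; the cost is that the full original buffer stays allocated for as long as the slice lives.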
Replace the all_reduce count comparison with all_gather + sorted tensor comparison to catch cases where counts match but actual node IDs differ between CONTIGUOUS and ROUND_ROBIN strategies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
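To show why a count-only check is weaker, here is a single-process sketch that simulates the two collectives with plain lists. The `simulated_*` helpers and the sample data are hypothetical; in a real job, ranks holding different numbers of IDs may need padding or `torch.distributed.all_gather_object` rather than a plain tensor `all_gather`.

```python
from typing import List, Sequence

PerRank = Sequence[Sequence[int]]  # node IDs held by each rank


def simulated_all_reduce_sum(values: Sequence[int]) -> int:
    # Stand-in for all_reduce with SUM: every rank sees the global total.
    return sum(values)


def simulated_all_gather(per_rank: PerRank) -> List[int]:
    # Stand-in for all_gather + concatenation: every rank sees every rank's IDs.
    return [node_id for ids in per_rank for node_id in ids]


def counts_match(a: PerRank, b: PerRank) -> bool:
    # Old check: only the global number of IDs is compared.
    return simulated_all_reduce_sum([len(ids) for ids in a]) == simulated_all_reduce_sum(
        [len(ids) for ids in b]
    )


def ids_match(a: PerRank, b: PerRank) -> bool:
    # New check: sort the gathered IDs so per-strategy ordering doesn't matter,
    # then compare element-wise.
    return sorted(simulated_all_gather(a)) == sorted(simulated_all_gather(b))


contiguous = [[0, 1], [2, 3]]
round_robin = [[0, 2], [1, 3]]
buggy = [[0, 2], [2, 3]]  # same count, but ID 1 is missing and ID 2 is duplicated
```

Here `counts_match(contiguous, buggy)` is true while `ids_match(contiguous, buggy)` is false, which is exactly the class of bug the new comparison is meant to catch.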
- Merge _make_rank_aware_async_mock and _make_rank_aware_ablp_async_mock into a single generic helper
- Remove _assert_contiguous_node_ids and _assert_contiguous_ablp_inputs helpers, inline assertions directly in tests
- Replace @parameterized.expand with separate named test methods for better readability
- Fix stale variable reference in integration test log line

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
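A sketch of what a single generic rank-aware async mock could look like, assuming the rank is passed as the first positional argument. `make_rank_aware_async_mock` and its signature are hypothetical; the merged helper in the PR may differ.

```python
import asyncio
from typing import Any, List, Mapping
from unittest.mock import AsyncMock


def make_rank_aware_async_mock(responses_by_rank: Mapping[int, Any]) -> AsyncMock:
    """Hypothetical generic helper: an AsyncMock whose awaited result depends
    on the rank passed as the first positional argument."""

    def _side_effect(rank: int, *args: Any, **kwargs: Any) -> Any:
        # A plain (non-async) side_effect: AsyncMock returns its result when awaited.
        # The payload is never inspected, so node-ID and ABLP-style responses
        # can share this one helper.
        return responses_by_rank[rank]

    return AsyncMock(side_effect=_side_effect)


async def _demo() -> List[Any]:
    node_ids_mock = make_rank_aware_async_mock({0: [0, 1], 1: [2, 3]})
    return [await node_ids_mock(0), await node_ids_mock(1)]


result = asyncio.run(_demo())
```

Because the helper only maps rank to response, the per-payload variants collapse into one, and each named test method can build its own mock inline.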
Annotate _mock_request_server, _mock_async_request_server, _patch_remote_requests, and _create_server_with_splits kwargs with proper type hints. Add Callable, Iterator, and Any imports. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
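For reference, a sketch of how `Callable`, `Iterator`, and `Any` typically show up in helpers like these. `RemoteClient` is a hypothetical patch target, and the signatures given to `_mock_request_server` and `_patch_remote_requests` here are assumptions, not the PR's actual ones.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator
from unittest.mock import patch


class RemoteClient:
    # Hypothetical stand-in for the module whose remote calls the tests patch out.
    @staticmethod
    def request_server(server_rank: int, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        raise RuntimeError("would issue a real RPC")


def _mock_request_server(server_rank: int, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    # Run the "remote" function in-process instead of sending it to `server_rank`.
    return fn(*args, **kwargs)


@contextmanager
def _patch_remote_requests() -> Iterator[None]:
    # A generator-based context manager that yields nothing is annotated Iterator[None].
    with patch.object(RemoteClient, "request_server", side_effect=_mock_request_server):
        yield


with _patch_remote_requests():
    doubled = RemoteClient.request_server(0, lambda x: x * 2, 21)
```

`Callable[..., Any]` keeps the mock agnostic to the remote function's signature, while `Iterator[None]` is the conventional return annotation for a `@contextmanager` generator.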
Making some changes to the way we distribute nodes for graph store mode.
This is one step toward reducing the producer load across the cluster, decreasing cluster spin-up time, and increasing overall stability.
I'm also introducing `gigl/distributed/graph_store/messages.py` for complicated RPC messages, so we don't have to rely on tuples for them.

There's also some minor cleanup in `remote_dist_dataset_test.py` to help reduce complexity there :)
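To illustrate the tuple-vs-message trade-off, here is a small sketch using frozen dataclasses. `GetNodeIdsRequest`, `GetNodeIdsResponse`, and `handle_get_node_ids` are hypothetical names, not necessarily anything defined in `messages.py`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Before: a positional tuple; every caller must remember what each slot means.
LegacyRequest = Tuple[str, int, int]  # (split, rank, world_size)


@dataclass(frozen=True)
class GetNodeIdsRequest:
    # Hypothetical message type for illustration only.
    split: str
    rank: int
    world_size: int


@dataclass(frozen=True)
class GetNodeIdsResponse:
    node_ids: Tuple[int, ...]


def handle_get_node_ids(req: GetNodeIdsRequest, splits: Dict[str, List[int]]) -> GetNodeIdsResponse:
    # Fields are read by name, so adding a field later can't silently shift
    # positions the way inserting a tuple slot would.
    ids = splits[req.split]
    return GetNodeIdsResponse(node_ids=tuple(ids[req.rank::req.world_size]))


resp = handle_get_node_ids(
    GetNodeIdsRequest(split="train", rank=1, world_size=2),
    {"train": [0, 1, 2, 3]},
)
```

Named, immutable fields make call sites self-documenting, and the types give checkers something to verify on both ends of the RPC.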