
Conversation

@realAsma (Contributor) commented Jan 8, 2026

What does this PR do?

Type of change: ?

Overview: ?

Usage

# Add a code snippet demonstrating how to use this

Testing

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Signed-off-by: realAsma <akuriparambi@nvidia.com>
realAsma requested a review from jenchen13 January 8, 2026 21:35
realAsma requested a review from a team as a code owner January 8, 2026 21:35
realAsma requested a review from cjluo-nv January 8, 2026 21:35
realAsma marked this pull request as draft January 8, 2026 21:35
copy-pr-bot bot commented Jan 8, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

codecov bot commented Jan 8, 2026

Codecov Report

❌ Patch coverage is 52.17391% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.63%. Comparing base (68d604d) to head (148c82c).
⚠️ Report is 2 commits behind head on main.

Files with missing lines                                    Patch %   Lines
.../torch/quantization/nn/modules/tensor_quantizer.py      16.66%    10 Missing ⚠️
modelopt/torch/quantization/model_calib.py                  90.90%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
- Coverage   74.65%   74.63%   -0.03%     
==========================================
  Files         192      192              
  Lines       18969    18984      +15     
==========================================
+ Hits        14162    14169       +7     
- Misses       4807     4815       +8     

☔ View full report in Codecov by Sentry.

Comment on lines +114 to +116
sync_quantizer_amax_across_dp_ep(
    child, module.parallel_state, get_module_device(module)
)
@realAsma (Contributor Author) commented Jan 8, 2026

Could you please test (locally) whether all MoE quantizers have amax after this line?

    if "experts" in name and "weight_quantizer" in name:
        assert child.amax is not None
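For reference, a minimal sketch of such a local check, assuming `model` is the calibrated MoE model and that the expert weight quantizers are reachable via named_modules() with an `amax` attribute as in the snippet above:

    # Hypothetical local check: every MoE expert weight quantizer should have a
    # calibrated amax after the sync above. `model` is assumed to be the
    # quantized MoE model under test.
    missing = []
    for name, child in model.named_modules():
        if "experts" in name and "weight_quantizer" in name:
            if getattr(child, "amax", None) is None:
                missing.append(name)
    assert not missing, f"quantizers missing amax: {missing}"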

if synced_amax is not None:
    # Move to target device
    if target_device is not None:
        synced_amax = synced_amax.to(target_device)
Contributor

Need to add

    synced_amax = synced_amax.clone().detach()

otherwise the sharding metadata of global_offset=(0, 0) on all ranks will be kept when saving the checkpoint.
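A sketch of the suggested change in context (variable names taken from the diff above; the exact placement inside the function is an assumption):

    if synced_amax is not None:
        # Clone and detach so the synced value becomes a plain local tensor; this
        # drops any sharding metadata (e.g. global_offset=(0, 0) on all ranks)
        # that would otherwise be carried into the saved checkpoint.
        synced_amax = synced_amax.clone().detach()
        # Move to target device
        if target_device is not None:
            synced_amax = synced_amax.to(target_device)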

Contributor Author

Good catch, I am hoping you could take over the PR and address this

Contributor

added below

Signed-off-by: jenchen13 <jennifchen@nvidia.com>
# Iterative max handles both scalar and tensor amax values
result = valid_amaxs[0]
for amax in valid_amaxs[1:]:
    result = torch.maximum(result, amax)
Contributor

What happens if this line compares a scalar vs a tensor? How does it determine the max?

Contributor Author

See https://docs.pytorch.org/docs/stable/generated/torch.maximum.html

It simply performs an element-wise maximum with broadcasting, so the shape does not matter as long as both operands are PyTorch tensors (including scalar tensors).
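A quick standalone illustration of that broadcasting behavior (not code from the PR):

    import torch

    # torch.maximum broadcasts its inputs, so a scalar (0-dim) amax and a
    # per-channel amax tensor are compared element-wise without special casing.
    scalar_amax = torch.tensor(3.0)
    per_channel_amax = torch.tensor([1.0, 4.0, 2.5])
    print(torch.maximum(scalar_amax, per_channel_amax))  # tensor([3.0000, 4.0000, 3.0000])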

Signed-off-by: realAsma <akuriparambi@nvidia.com>
"supported by the current distributed backend. This warning can be ignored"
"if happening during modelopt restore."
)
def sync_amax_across_distributed_group(
Contributor Author

The current sync_amax_across_distributed_group moves the amax to CPU; this accommodates the case where some amaxes are None and some are tensors. However, this typically happens only for MoEs.

So can we keep the old sync method for non-MoEs:

    dist.all_reduce(self._amax, op=dist.ReduceOp.MAX, group=parallel_group.group)

and use the sync-as-object-via-CPU path only for MoEs?
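A rough sketch of that split (not the actual modelopt implementation; the function signature is hypothetical and it assumes the quantizer exposes a settable amax). The MoE branch gathers amax values as Python objects via CPU and applies the same iterative maximum as above:

    import torch
    import torch.distributed as dist

    def sync_amax(quantizer, group, is_moe: bool):
        # Dense (non-MoE) quantizers always hold a tensor amax, so the fast
        # in-place all_reduce is sufficient.
        if not is_moe:
            dist.all_reduce(quantizer._amax, op=dist.ReduceOp.MAX, group=group)
            return
        # MoE quantizers may have amax=None on some ranks, so gather amax
        # values as Python objects via CPU and take the element-wise maximum.
        gathered = [None] * dist.get_world_size(group)
        local = None if quantizer.amax is None else quantizer.amax.detach().cpu()
        dist.all_gather_object(gathered, local, group=group)
        valid = [a for a in gathered if a is not None]
        if valid:
            result = valid[0]
            for a in valid[1:]:
                result = torch.maximum(result, a)
            quantizer.amax = result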
