add FP8 sweep and step_size flag #758
base: fridah/block-mse
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
- for step, multiplier in enumerate(multipliers):
-     candidate_amax = self._initial_amax * multiplier
+ for step, candidate in enumerate(candidates):
+     if self._fp8_scale_sweep:
nice!!
# For FP8 scale sweep, use FP8 values as multipliers of initial_amax
# This ensures we search in a reasonable range relative to max calibration
multiplier = candidate
candidate_amax = self._initial_amax * multiplier
Isn't candidate_amax = (fp8_by_448 * 6.0) in this case?
- # For FP8 scale sweep, use FP8 values as multipliers of initial_amax
- # This ensures we search in a reasonable range relative to max calibration
- multiplier = candidate
- candidate_amax = self._initial_amax * multiplier
+ candidate_amax = (candidate * global_amax).view_as(self._initial_amax)
why?
fp8_scale = FP8((block_eqv_amax / 6.0) * (448 / (global_amax / 6.0)))
The 6.0 factors cancel, so if we reverse-calculate block_eqv_amax from fp8_scale, we get:
block_eqv_amax = (fp8_scale / 448.0) * global_amax
So, assuming candidate is the FP8 value divided by 448:
candidate_amax = candidate * global_amax
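A minimal sketch of that reverse calculation in plain Python, in case it helps check the algebra. It assumes the sweep candidates are the positive finite FP8 E4M3 values divided by 448, and it leaves out the FP8 rounding step. The function names here are hypothetical and are not taken from this PR.

```python
FP8_E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
FP4_MAX = 6.0         # the 6.0 factor from the formula above


def fp8_e4m3_positive_values():
    """Enumerate the positive finite FP8 E4M3 values (4 exponent bits, 3 mantissa bits, bias 7)."""
    values = []
    for exp in range(16):
        for man in range(8):
            if exp == 15 and man == 7:
                continue  # this encoding is NaN in E4M3
            if exp == 0:
                if man == 0:
                    continue  # zero is not a useful candidate
                value = (man / 8.0) * 2.0 ** -6  # subnormal
            else:
                value = (1.0 + man / 8.0) * 2.0 ** (exp - 7)  # normal
            values.append(value)
    # 7 subnormals + 15 * 8 normals - 1 NaN encoding = 126 values
    return values


def forward_scale(block_amax, global_amax):
    """Scale formula from the comment above, without the final FP8 cast."""
    return (block_amax / FP4_MAX) * (FP8_E4M3_MAX / (global_amax / FP4_MAX))


def candidate_amaxes(global_amax):
    """Reverse calculation: candidate_amax = (fp8_value / 448) * global_amax."""
    return [(v / FP8_E4M3_MAX) * global_amax for v in fp8_e4m3_positive_values()]
```

Feeding each candidate amax back through `forward_scale` recovers the FP8 value it came from (up to floating-point error), which is why the 6.0 does not appear in the final expression.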
What does this PR do?
Type of change: ?
Overview: ?
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information