Skip to content

Enable correct RoiAlign max mode with env var#27554

Draft
tianleiwu wants to merge 2 commits intomainfrom
tlwu/20260304/fix_roi_align_max_mode
Draft

Enable correct RoiAlign max mode with env var#27554
tianleiwu wants to merge 2 commits intomainfrom
tlwu/20260304/fix_roi_align_max_mode

Conversation

@tianleiwu
Copy link
Contributor

@tianleiwu tianleiwu commented Mar 4, 2026

Description

Re-implements the changes from PR #7354 with additional improvements. Fixes #6921 and #6146.

Problem

The RoiAlign operator's original max mode implementation used max(w1d1, w2d2, w3d3, w4d4) — taking the max of individually weighted pixel values per sample point. While this theoretically matches the ONNX reference implementation, an alternative approach — bilinear interpolation first (w1d1 + w2d2 + w3d3 + w4d4), then max across sample points — is arguably more correct for applications.

Additionally:

  • The ROI size clamping (max(roi_width, 1)) for non-half_pixel mode was present in both CPU and CUDA providers and incorrectly rejected zero-size ROIs that should produce valid point sampling.
  • roi_bin_grid_h/w values were not clamped to a minimum of 1, which could cause issues with certain ROI configurations.

Solution

  • Default behavior: Keeps the ONNX spec mathematical behavior — max(w1d1, w2d2, w3d3, w4d4) for max mode pooling. This matches the existing CUDA behavior and ensures ONNX conformance tests pass on CPU.
  • Environment variable ORT_ROIALIGN_MAX_USE_BILINEAR_INTERPOLATION=1: Enables bilinear interpolation max mode on the CPU Execution Provider, matching PyTorch/Detectron2 behavior.
  • Removes the max(roi_width, 1) / max(roi_height, 1) clamping that prevented zero-size ROI point sampling (fixed in both CPU and CUDA providers).
  • Adds roi_bin_grid_h/w clamping to a minimum of 1 in both CPU and CUDA providers.
  • Removes the outdated warning about incorrect max mode summation.
  • Skips the mask_rcnn_R_50_FPN_1x model tests (model was generated with an incorrect RoiAlign implementation).

Note on CUDA Execution Provider:
The CUDA provider has been fully updated in this PR to align precisely with the CPU provider. It now supports the ORT_ROIALIGN_MAX_USE_BILINEAR_INTERPOLATION environment variable toggle. Additionally, opsets 10-15 and opset 16 have been properly registered, resolving silent fallbacks to the CPU Execution Provider for newer generic models.

Changes

File Change
roialign.h Added use_max_bilinear_interp_ member, env var parsing via ParseEnvironmentVariableWithDefault, removed outdated warning
roialign.cc (CPU) Removed ROI size clamping, added roi_bin_grid min-1 clamping, added conditional max mode (ONNX spec vs bilinear interp)
roialign_impl.cu (CUDA) Removed ROI size clamping, added roi_bin_grid min-1 clamping, and fully implemented the ORT_ROIALIGN_MAX_USE_BILINEAR_INTERPOLATION toggle down to the interpolation device function.
cuda_execution_provider.cc Added opsets 10-15 (versioned) and opset 16 kernel registrations for RoiAlign.
roialign_test.cc Updated MaxModePositive expected values for ONNX spec, added test_roialign_mode_max (from ONNX test suite), added scoped env var tests for both modes, all operational on CPU and CUDA.
TestCase.cc Skip mask_rcnn_R_50_FPN_1x
skip_model_tests.h Skip mask_rcnn_R_50_FPN_1x_opset10 on WinML

Test Results

All 13 RoiAlign tests pass:

[  PASSED  ] 13 tests.
  RoiAlignTest.AvgModePositive
  RoiAlignTest.AvgModePositive_half_pixel
  RoiAlignTest.AvgModePositive_output_half_pixel
  RoiAlignTest.OnnxTest
  RoiAlignTest.MaxModePositive
  RoiAlignTest.MaxModePositive_test_roialign_mode_max
  RoiAlignTest.AvgModeNegativeInvalidMode
  RoiAlignTest.AvgModeNegativeSamplingRatio
  RoiAlignTest.AvgModeNegativeInvalidNumRoiDims
  RoiAlignTest.AvgModeNegativeInvalidSecondRoiDims
  RoiAlignTest.MismatchNumRois
  RoiAlignTest.MaxModePositive_test_roialign_mode_max_bilinear_interp
  RoiAlignTest.MaxModePositive_test_roialign_mode_max_env_off

@tianleiwu tianleiwu changed the title Enable correct RoiAlign max mode in CPU with env var Enable correct RoiAlign max mode with env var Mar 4, 2026
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can commit the suggested changes from lintrunner.

Comment on lines +10 to +18
#define ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
10, \
15, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
10, \
15, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \
#define ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_VERSIONED_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
10, \
15, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \

Comment on lines +23 to +30
#define ADD_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
16, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define ADD_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
16, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \
#define ADD_TYPED_ROIALIGN_OP(T) \
ONNX_OPERATOR_TYPED_KERNEL_EX( \
RoiAlign, \
kOnnxDomain, \
16, \
T, \
kCudaExecutionProvider, \
(*KernelDefBuilder::Create()) \

Comment on lines 83 to +85
#define SPECIALIZED_COMPUTE(T) \
REGISTER_KERNEL_TYPED(T) \
ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ADD_TYPED_ROIALIGN_OP(T) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#define SPECIALIZED_COMPUTE(T) \
REGISTER_KERNEL_TYPED(T) \
ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ADD_TYPED_ROIALIGN_OP(T) \
#define SPECIALIZED_COMPUTE(T) \
ADD_VERSIONED_TYPED_ROIALIGN_OP(T) \
ADD_TYPED_ROIALIGN_OP(T) \

T* top_data, \
const bool is_mode_avg, \
const bool use_max_bilinear_interp, \
const bool half_pixel, \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const bool half_pixel, \
const bool half_pixel, \

@tianleiwu tianleiwu marked this pull request as draft March 4, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RoiAlign CPU is not aligned to pixel centers (per the Mask RCNN paper and Facebook's Detectron2 implementation)

1 participant