
Fix TBE v2 forward kernel for embedding dim > 1024 (#5326)#5569

Closed
cyyever wants to merge 1 commit into pytorch:main from cyyever:fix-tbe-v2-dim1024

Conversation

@cyyever
Contributor

@cyyever cyyever commented Apr 2, 2026

No description provided.

@meta-cla meta-cla Bot added the cla signed label Apr 2, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 6, 2026

@q10 has imported this pull request. If you are a Meta employee, you can view this in D99746894.

@cyyever cyyever force-pushed the fix-tbe-v2-dim1024 branch from f496f4f to 7736d40 Compare April 9, 2026 04:44
q10 pushed a commit to q10/FBGEMM that referenced this pull request Apr 15, 2026
Fix TBE v2 forward kernel for embedding dim > 1024 (pytorch#5569)

Summary: Pull Request resolved: pytorch#5569

Test Plan:
## Test Commands

```
buck2 test fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:forward -- ForwardTest.test_forward_gpu_no_cache_fp16
buck2 test fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:forward -- ForwardTest.test_forward_gpu_no_cache_fp32
```

## What Is Tested

1. **test_forward_gpu_no_cache_fp16** - Exercises the v2 forward kernel (when `use_experimental_tbe=True`, generated by Hypothesis) with `D` in {2..256 step 16, 1024, 1280, 1536, 2048}. Validates FP16 forward correctness for both the small-L and large-L paths with the dynamic early exit fix. `T`, `B`, `L` are capped (`T<=2`, `B<=16`, `L<=4`) for `D > 256` to prevent OOM.

2. **test_forward_gpu_no_cache_fp32** - Same `D` range as above with FP32 weights. Uses proportional `max_TBL = max(1, 2048/D)` scaling for large `D` to prevent OOM while still exercising the v2 kernel `max_num_warps_per_row` computation.

Both tests use Hypothesis-generated `use_experimental_tbe` in {True, False}, covering both the v1 (legacy) and v2 (experimental) kernel paths.
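The OOM-capping described above can be sketched as follows. This is an illustrative Python sketch of the stated rules only; the function names and structure are assumptions, not the actual FBGEMM test code.

```python
# Sketch of the test plan's dimension capping for large embedding dims (D).
# Assumption: names cap_fp16/cap_fp32 are illustrative, not FBGEMM code.

def cap_fp16(T: int, B: int, L: int, D: int) -> tuple:
    """FP16 test: for D > 256, clamp T<=2, B<=16, L<=4 to prevent OOM."""
    if D > 256:
        return min(T, 2), min(B, 16), min(L, 4)
    return T, B, L

def cap_fp32(T: int, B: int, L: int, D: int) -> tuple:
    """FP32 test: for D > 256, scale T, B, L down to max(1, 2048 // D)."""
    if D > 256:
        max_TBL = max(1, 2048 // D)
        return min(T, max_TBL), min(B, max_TBL), min(L, max_TBL)
    return T, B, L

# Small D passes through unchanged; large D is clamped.
print(cap_fp16(4, 32, 8, 128))   # -> (4, 32, 8)
print(cap_fp16(4, 32, 8, 1024))  # -> (2, 16, 4)
print(cap_fp32(4, 32, 8, 2048))  # -> (1, 1, 1)
```

The proportional `max(1, 2048 // D)` rule keeps total work roughly constant as `D` grows, so the v2 kernel's `max_num_warps_per_row` path is still exercised at `D = 2048` without exhausting GPU memory.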

BUCK target: fbcode//deeplearning/fbgemm/fbgemm_gpu/test/tbe:forward

Reviewed By: henrylhtsang

Differential Revision: D99746894

Pulled By: q10
@meta-codesync meta-codesync Bot closed this in f042789 Apr 16, 2026
@meta-codesync
Contributor

meta-codesync Bot commented Apr 16, 2026

@q10 merged this pull request in f042789.

@cyyever cyyever deleted the fix-tbe-v2-dim1024 branch April 16, 2026 03:23
