
feat: add xlogy_outscalar_other base #588

Closed

voltjia wants to merge 15 commits into feat/torch-codegen from codex/add-xlogy_outscalar_other-base

Conversation

@voltjia
Collaborator

voltjia commented May 5, 2026

Summary

  • Add the hand-written InfiniOps base class for xlogy_outscalar_other in src/base/xlogy_outscalar_other.h.
  • Let the torch code generator reuse src/base/xlogy_outscalar_other.h instead of emitting generated/base/xlogy_outscalar_other.h.
  • Apply the base class member-spacing convention required by scripts/check_conventions.py.

Motivation

This PR is part of the feat/torch-codegen base-header migration. The generated XlogyOutscalarOther base declaration is moved into src/base so code generation can reuse a reviewed hand-written header.

N/A: no linked issue.

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Iluvatar | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| MetaX | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Cambricon | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Moore | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Ascend | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
Full `pytest` output (optional)
N/A: pytest was intentionally not run because this PR targets `feat/torch-codegen`, not `master`, and only adds a reusable base header declaration.

Benchmark / Performance Impact

N/A. This PR only adds a base operator declaration for torch codegen reuse and does not add a runtime implementation.

Notes for Reviewers

  • This PR targets feat/torch-codegen, not master.
  • The branch diff against feat/torch-codegen contains only src/base/xlogy_outscalar_other.h.
  • Original branch validation reported clang-format 21 passing on src/base/xlogy_outscalar_other.h; the follow-up formatting commit applies the class member spacing required by scripts/check_conventions.py.

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • N/A: this automated batch uses existing codex/add-xlogy_outscalar_other-base PR branches targeting feat/torch-codegen; branch renaming is intentionally out of scope.
  • Each commit message follows Conventional Commits.
  • N/A: this batch intentionally keeps the base-header addition and convention-formatting follow-up as two meaningful, squashable commits.
  • N/A: this PR is based on feat/torch-codegen, not master; no master rebase is required for this integration target.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal - nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional and limited to the XlogyOutscalarOther base operator declaration used by torch codegen.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences - capitalized first letter, terminal punctuation - unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

  • Code follows the Google C++ Style Guide strictly.
  • clang-format (version 21, per .github/workflows/clang-format.yml) has been run against all modified .h, .cc, .cuh, and .mlu files; the diff is clean.
  • N/A: clang-tidy was not run because this PR only adds a base declaration header for feat/torch-codegen; no runtime implementation is added.
  • Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
  • N/A: this base declaration does not add C++ error paths or exceptions.
  • N/A: this base declaration does not add error or warning messages.
  • N/A: this base declaration does not add kernel files.
  • N/A: this base declaration does not add kernel launchers.
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members (functions and variables) within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
  • N/A: this PR adds only src/base/xlogy_outscalar_other.h for torch codegen reuse; platform implementations are out of scope.
  • No raw new/delete; RAII / smart pointers / existing allocators are used.

Python Specific (if Python files changed)

N/A: no Python files changed.

Testing

  • N/A: platform pytest was intentionally not run because this PR targets feat/torch-codegen, not master, and only adds a reusable base header declaration.
  • N/A: the table above records the reason platform testing was skipped.
  • N/A: no runtime functionality was added, so no new tests/ coverage is required.
  • N/A: no new pytest parameterization was added.
  • N/A: no Payload-returning test was added.
  • N/A: no dtype / device parameterization was added.
  • N/A: no flaky test was added.
  • N/A: this is not a runtime bug fix.

Build, CI, and Tooling

  • N/A: full platform builds were not run because this PR targets feat/torch-codegen, not master, and only adds a reusable base header declaration.
  • N/A: compile_commands.json behavior was not changed.
  • N/A: no new backend or device was added.
  • N/A: CUDA-like backend mutual exclusion was not changed.
  • Existing CI formatting expectations are preserved; original validation reported clang-format 21 passing on src/base/xlogy_outscalar_other.h.
  • N/A: no new runtime dependency was added.

Documentation

  • N/A: README.md, CONTRIBUTING.md, and developer workflow are unchanged.
  • N/A: XlogyOutscalarOther is an internal base declaration for torch codegen reuse; no user-facing documentation is required.
  • N/A: no user-visible breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A: no third-party code was added.
  • N/A: no unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia added 15 commits April 30, 2026 01:20
Frees the `infini::ops::Sigmoid` name for the auto-generated PyTorch
operator class emitted by the upcoming `generate_torch_ops.py`.
For each entry in `scripts/torch_ops.yaml`, the script finds the
matching `.out` variant in PyTorch's `native_functions.yaml` (fetched
from GitHub on first invocation, cached under `generated/.cache/`),
parses its schema, and emits an InfiniOps base class plus a PyTorch
backend specialization at slot 8 that wraps `at::<op>_out`.

Key strategies:

- Overload-aware lookup: prefers `<name>.out`, then any
  `<name>.<overload>_out`, picking the variant with the most tensor
  inputs (so `pow.Tensor_Tensor_out` wins over `pow.Tensor_Scalar_out`);
  see the sketch after this list.

- Hidden-parameter pattern: optional types (`Scalar?`, `int[]?`,
  `ScalarType?`, `Generator?`, …), `bool` defaults, numeric `int`/`float`
  defaults, `int[N]=[]` defaults, and ATen enum symbols (`Mean`, `Sum`)
  are filtered from the user-facing API and substituted at the ATen
  call site.  Unlocks reductions, scans, comparisons, losses, and
  multi-scalar activations from a single mechanism.

- Slot 8: reserved for PyTorch backends; native/vendor implementations
  use 0–7.  Also avoids a partial-specialization-after-instantiation
  conflict with `Operator<Op>` at index 0.

- Per-op metadata (`generated/torch_ops_metadata.json`): records the
  full parameter list per op for the test harness, so adding a new op
  to the allowlist requires no code changes.
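
For illustration, a minimal Python sketch of the overload-aware lookup; the `Schema` record and its fields are simplified stand-ins for the script's actual parse, not its real data structures.

```python
# Illustrative sketch of the overload-aware `.out` lookup; `Schema`
# and its fields are assumptions, not the generator's actual model.
from dataclasses import dataclass, field


@dataclass
class Schema:
    name: str       # base op name, e.g. "pow"
    overload: str   # e.g. "out", "Tensor_Tensor_out"
    inputs: list = field(default_factory=list)  # input type strings


def tensor_input_count(schema):
    return sum(1 for t in schema.inputs if t == "Tensor")


def find_out_variant(name, schemas):
    """Prefer `<name>.out`; otherwise pick the `<name>.<overload>_out`
    variant with the most tensor inputs."""
    for s in schemas:
        if s.name == name and s.overload == "out":
            return s
    candidates = [s for s in schemas
                  if s.name == name and s.overload.endswith("_out")]
    return max(candidates, key=tensor_input_count, default=None)


# `pow.Tensor_Tensor_out` (two tensor inputs) beats
# `pow.Tensor_Scalar_out` (one tensor input).
schemas = [Schema("pow", "Tensor_Scalar_out", ["Tensor", "Scalar"]),
           Schema("pow", "Tensor_Tensor_out", ["Tensor", "Tensor"])]
assert find_out_variant("pow", schemas).overload == "Tensor_Tensor_out"
```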
…ources

When `WITH_TORCH=ON`, `src/CMakeLists.txt` runs the generator at
configure time and globs `generated/torch/**/*.cc` into the `infiniops`
target.  `generated/` is added to the public include path so the
emitted wrappers can include `"base/<op>.h"` and `"torch/<op>/<op>.h"`.

`scripts/generate_wrappers.py` (the existing pybind binding generator)
is taught to scan both `src/base/` and `generated/base/` so the
auto-generated InfiniOps classes get Python bindings.  The `__call__`
lambda's `Self&` parameter is renamed to `op` to avoid colliding with
ATen's typical `self` argument name.
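
A minimal sketch of the two-directory lookup; only the layout (`src/base/` plus `generated/base/`) comes from the commit message, the helper name and body are assumed.

```python
# Illustrative `_find_base_header`-style lookup: hand-written headers
# in `src/base/` take precedence over generated ones.
from pathlib import Path

BASE_DIRS = [Path("src/base"), Path("generated/base")]


def find_base_header(op_name):
    """Return the first existing `<op_name>.h` across the base dirs."""
    for base_dir in BASE_DIRS:
        candidate = base_dir / f"{op_name}.h"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No base header found for `{op_name}`.")
```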
A single parametrized test reads `generated/torch_ops_metadata.json`,
builds inputs from the per-parameter info (tensor → `randn_strided`
with op-specific shape if listed in `_TENSOR_SHAPES`, scalar → per-op
or type default), runs the torch reference to discover output shape /
dtype / arity, calls the InfiniOps wrapper, and compares each output
tensor.

No signature-kind classification — multi-output, ternary, multi-scalar,
matrix, and everything in between fall out of the same code path.
Per-op overrides live in flat dicts (`_TENSOR_SHAPES`,
`_SCALAR_VALUES`).  Vendor-specific runtime errors and bool outputs
(InfiniOps `DataType` has no `kBool`) skip cleanly.

`conftest.py` switches to `torch.allclose(..., equal_nan=True)` for
floating outputs and `torch.equal` for bool/int outputs so domain
violations producing matched NaNs and integer-output ops both work.
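
A minimal sketch of that comparison policy, assuming the harness ends up with plain `torch.Tensor` pairs; the function name is illustrative.

```python
# Comparison rule from `conftest.py` as described above:
# `allclose(..., equal_nan=True)` for floating outputs, exact
# `equal` for bool/int outputs.
import torch


def outputs_match(actual, expected, rtol=1e-5, atol=1e-8):
    if expected.dtype.is_floating_point:
        return torch.allclose(actual, expected,
                              rtol=rtol, atol=atol, equal_nan=True)
    return torch.equal(actual, expected)


a = torch.tensor([1.0, float("nan")])
b = torch.tensor([1.0, float("nan")])
assert outputs_match(a, b)  # matched NaNs compare equal
```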
- Generator: emit one wrapper per ATen overload (e.g. `pow.Tensor_Tensor_out`
  → `PowTensorTensor`, `pow.Tensor_Scalar_out` → `PowTensorScalar`,
  `pow.Scalar_out` → `PowScalar`).  Class name = base + overload.
- Add type support: `SymInt`, `SymInt[]`, `Tensor?`, `Tensor?[]`,
  `str?`/`str` (when defaulted), `int[N]` / `SymInt[N]` with non-empty
  defaults (replicated `{0,0,...}` for `int[N]=0`).  Optional Tensor and
  Tensor list optionals hardcode to `at::nullopt`.
- `is_testable` relaxed to "has at least one out tensor" — generators
  like `arange.out` / `linspace.out` (no tensor input) are now in scope.
- Allowlist auto-discovered from the YAML: every base op name with at
  least one parsable `.out` overload (390 names → 486 wrappers).
- Test: handle `int[N]` / `SymInt[N]` defaults via `_LIST_SIZE_RE`-driven
  `_list_default`; pass `[0, 0, …]` of the right length.  Per-op
  `_TENSOR_SHAPES` and `_SCALAR_VALUES` overrides keyed by `aten_name`
  (so all overloads of an op share the same overrides).
- Generators now wipe their output dirs (`generated/{base,torch,bindings,
  src,include}/`) before regenerating, so files for ops we no longer emit
  do not linger and break the next build.
- Filter `Tensor[]` outputs (`split_copy`, `unbind_copy`,
  `split_with_sizes_copy`): would have emitted `at::<op>_out(at::Tensor,
  ...)` against the actual `at::TensorList` signature.
- Filter ops whose first non-out argument is not a Tensor
  (`pow.Scalar_out`, generators like `arange`/`empty`): `Operator::Make`
  dispatches on the first tensor's device, so these need a separate path.
- Spell out typed empty optionals (`c10::optional<at::Tensor>{}`,
  `c10::optional<at::Scalar>{}`, …) instead of bare `at::nullopt`: the
  latter is ambiguous on ops where overloads exist for both `optional<
  Scalar>` and `optional<Tensor>` (e.g. `clamp_out`).
- Convert YAML single-quoted string defaults (`'none'`) to C++
  double-quoted literals (`"none"`); the former parses as a char
  literal.  (This rule and the typed empty optionals above are both
  sketched after this list.)
- `generate_wrappers.py::_find_vector_tensor_params` now uses the
  shared `_find_base_header` helper, which checks `generated/base/`
  alongside `src/base/` (was hard-coded to `src/base/`).
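
A hypothetical sketch of the typed-optional and string-default substitutions from the bullets above; the mapping table and function are illustrative, not the generator's actual code.

```python
# Illustrative default-to-C++ substitution. The typed-optional
# spellings and the quote conversion come from the commit message;
# everything else here is assumed.
OPTIONAL_EMPTY = {
    "Tensor?": "c10::optional<at::Tensor>{}",
    "Scalar?": "c10::optional<at::Scalar>{}",
}


def cpp_default(type_str, default):
    if default is None:
        # Spell out the typed empty optional; a bare `at::nullopt` is
        # ambiguous when both `optional<Scalar>` and `optional<Tensor>`
        # overloads exist (e.g. `clamp_out`).
        return OPTIONAL_EMPTY[type_str]
    if default.startswith("'") and default.endswith("'"):
        # YAML single-quoted strings would otherwise land in C++ as
        # char literals; re-quote them as string literals.
        return '"' + default[1:-1] + '"'
    return default


assert cpp_default("str?", "'none'") == '"none"'
assert cpp_default("Tensor?", None) == "c10::optional<at::Tensor>{}"
```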

Test improvements (the runtime-skip policy is sketched after this list):
- Skip ops whose tensors use a dtype InfiniOps does not enumerate
  (`bool`, `complex64`, `complex128`, …); `DataTypeFromString` aborts
  the process on these.
- Catch a wider exception set (`ValueError`, `IndexError`,
  `NotImplementedError`) when the torch reference rejects our generic
  random inputs (`adaptive_avg_pool2d` needs at least 3 dims, etc.).
- Skip non-deterministic ops (`bernoulli`, `normal`, `multinomial`,
  `rand*`, `randperm`, `rrelu_with_noise`): independent draws diverge.
- Skip when the Python-facing function returns fewer outputs than the
  ATen `_out` schema declares (`adaptive_max_pool2d` hides `indices`
  behind `return_indices=True`).
- Add "Trying to resize storage that is not resizable" to the runtime
  skip patterns: ATen kernels for some loss ops use `out` as
  intermediate scratch and resize it before the final reduction; our
  `from_blob` outputs are non-resizable.
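
A minimal sketch of that skip policy, assuming pytest; the pattern list and helper are illustrative, not the harness's actual code.

```python
# Illustrative skip policy: tolerate references that reject generic
# random inputs, and skip on known runtime limitations instead of
# failing. Pattern strings are taken from the commit message.
import pytest

RUNTIME_SKIP_PATTERNS = (
    "Trying to resize storage that is not resizable",
    "pointer resides on host memory",
)


def call_or_skip(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except (ValueError, IndexError, NotImplementedError) as exc:
        pytest.skip(f"reference rejected generic inputs: {exc}")
    except RuntimeError as exc:
        if any(p in str(exc) for p in RUNTIME_SKIP_PATTERNS):
            pytest.skip(f"known runtime limitation: {exc}")
        raise
```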

Final state: 433 generated + 4 hand-written torch ops, full build
succeeds, `pytest tests/test_torch_ops.py --devices cpu` reports
1663 passed, 2234 skipped, 0 failed.
- Skip ops whose torch reference triggers a CUDA device-side assert on
  random fp32 inputs (`binary_cross_entropy` requires inputs in [0, 1];
  pooling/conv ops divide by `[0, 0]` placeholder kernel sizes our
  harness substitutes).  The Python-side `RuntimeError` is catchable,
  but the CUDA context is left poisoned and every subsequent test
  errors at setup, which masks the rest of the suite.
- Skip ops whose reference produces a 0-element output: on cuda,
  `torch.empty_like(zero_numel)` returns a tensor whose `data_ptr()`
  is unregistered with the device, so the wrapper trips on
  "pointer resides on host memory".

Final state: `pytest tests/test_torch_ops.py` (cpu + cuda) reports
3263 passed, 4531 skipped, 0 failed.
- Support `str` C++ type (`std::string`) for required string params,
  unlocking `index_reduce`, `scatter_reduce`, `scatter_reduce_two`.
- Relax `_find_out_entries` so it also matches multi-output schemas
  whose overload name reflects an output tensor instead of `_out`
  (`kthvalue.values`, `mode.values`).  Detection is now: name is
  `<op>.out`, ends in `_out`, or carries a `Tensor(<letter>!)`
  mutability annotation.
- Strip both the `_out` suffix and the `out_` prefix from the InfiniOps
  name derived from an overload (`div.out_mode` → `div_mode`, instead of
  `div_out_mode`); see the sketch after this list.
- Add per-op test values for the new ops (`reduce` modes, `k`/`dim`
  for `kthvalue`/`mode`).
- `scripts/torch_ops.yaml`: list `kthvalue`, `mode`, `index_reduce`,
  `scatter_reduce`.
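
A minimal sketch of that name derivation; the function is illustrative, and the handling of a plain `.out` overload is an assumption.

```python
# Illustrative overload-name cleanup: drop a trailing `_out` or a
# leading `out_` before joining with the base op name.
def infiniops_name(base, overload):
    part = overload
    if part == "out":          # assumed: `<op>.out` keeps the bare name
        part = ""
    elif part.endswith("_out"):
        part = part[: -len("_out")]
    elif part.startswith("out_"):
        part = part[len("out_"):]
    return f"{base}_{part}" if part else base


assert infiniops_name("div", "out_mode") == "div_mode"
assert infiniops_name("pow", "Tensor_Tensor_out") == "pow_Tensor_Tensor"
```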

Final state: 447 generated ops (up from 433).  `pytest
tests/test_torch_ops.py` (cpu + cuda) reports 3353 passed,
4693 skipped, 0 failed.
@voltjia voltjia force-pushed the codex/add-xlogy_outscalar_other-base branch from 3f39716 to cd1427c on May 7, 2026 09:59
@voltjia voltjia changed the title from feat: add XlogyOutscalarOther base to feat: add xlogy_outscalar_other base on May 7, 2026

namespace infini::ops {

class XlogyOutscalarOther : public Operator<XlogyOutscalarOther> {

The class name carries an `OutscalarOther` suffix.

@voltjia voltjia force-pushed the feat/torch-codegen branch from dc3b3b0 to 156e83f on May 9, 2026 07:32
@voltjia
Collaborator Author

voltjia commented May 9, 2026

Closing as superseded by the latest feat/torch-codegen.

The torch codegen has been updated to drop ATen overload-name suffixes (_grad_input, _outtensor, _n_scalar, _values, _x, etc.) from generated class names — those are ATen schema artefacts, not InfiniOps API. ATen overloads of the same base op are now overloaded operator() methods on a single class instead of separate classes.

This PR added src/base/xlogy_outscalar_other.h, but that ATen overload is now folded into xlogy as an additional operator() overload. The hand-written base for xlogy_outscalar_other is therefore no longer needed; the work continues in #589.
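
For illustration, a hypothetical Python sketch of that folding; the schema tuples and the shape of the emitted C++ are invented for this example and are not the codegen's actual output.

```python
# Illustrative folding: ATen overloads of one base op become
# `operator()` overloads on a single generated class rather than
# separate classes.
from collections import defaultdict


def fold_overloads(schemas):
    """Group `(base_name, params)` schemas by base name."""
    grouped = defaultdict(list)
    for base, params in schemas:
        grouped[base].append(params)
    return grouped


def emit_class(base, param_lists):
    decls = "\n".join(
        f"  void operator()({', '.join(params)});" for params in param_lists
    )
    return f"class {base.capitalize()} {{\n public:\n{decls}\n}};"


schemas = [
    ("xlogy", ["const at::Tensor& self", "const at::Tensor& other"]),
    ("xlogy", ["const at::Scalar& self", "const at::Tensor& other"]),
    ("xlogy", ["const at::Tensor& self", "const at::Scalar& other"]),
]
# Emits one `Xlogy` class with three `operator()` overloads, the last
# of which covers the former `xlogy_outscalar_other`.
print(emit_class("xlogy", fold_overloads(schemas)["xlogy"]))
```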

The original review feedback on this PR (naming, parameter exposure, scalar storage) has been incorporated directly into the codegen — the generated class uses the canonical name, stores visible scalars as members, and exposes default-valued bool / int / float parameters that were previously hidden.

No action required.

@voltjia voltjia closed this May 9, 2026