
feat: add xlogy_outscalar_other base #588

Closed

voltjia wants to merge 15 commits into feat/torch-codegen from codex/add-xlogy_outscalar_other-base

Conversation

@voltjia
Collaborator

voltjia commented May 5, 2026

Summary

  • Add the hand-written InfiniOps base class for xlogy_outscalar_other in src/base/xlogy_outscalar_other.h.
  • Let the torch code generator reuse src/base/xlogy_outscalar_other.h instead of emitting generated/base/xlogy_outscalar_other.h.
  • Apply the base class member-spacing convention required by scripts/check_conventions.py.

Motivation

This PR is part of the feat/torch-codegen base-header migration. The generated XlogyOutscalarOther base declaration is moved into src/base so code generation can reuse a reviewed hand-written header.

N/A: no linked issue.

Type of Change

  • feat - new feature / new operator / new platform
  • fix - bug fix
  • perf - performance improvement (no behavioral change)
  • refactor - code restructuring without behavior change
  • test - adding or fixing tests only
  • docs - documentation only
  • build / ci - build system or CI configuration
  • chore - tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Iluvatar | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| MetaX | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Cambricon | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Moore | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
| Ascend | N/A | Not run | Not required for this non-master feat/torch-codegen base-header PR; no runtime implementation is added. |
Full `pytest` output (optional)
N/A: pytest was intentionally not run because this PR targets `feat/torch-codegen`, not `master`, and only adds a reusable base header declaration.

Benchmark / Performance Impact

N/A. This PR only adds a base operator declaration for torch codegen reuse and does not add a runtime implementation.

Notes for Reviewers

  • This PR targets feat/torch-codegen, not master.
  • The branch diff against feat/torch-codegen contains only src/base/xlogy_outscalar_other.h.
  • Original branch validation reported clang-format 21 passing on src/base/xlogy_outscalar_other.h; the follow-up formatting commit applies the class member spacing required by scripts/check_conventions.py.

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • N/A: this automated batch uses existing codex/add-xlogy_outscalar_other-base PR branches targeting feat/torch-codegen; branch renaming is intentionally out of scope.
  • Each commit message follows Conventional Commits.
  • N/A: this batch intentionally keeps the base-header addition and convention-formatting follow-up as two meaningful, squashable commits.
  • N/A: this PR is based on feat/torch-codegen, not master; no master rebase is required for this integration target.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal - nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • Public API changes are intentional and limited to the XlogyOutscalarOther base operator declaration used by torch codegen.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks (e.g. the `seqlens_k` tensor) (CONTRIBUTING.md §Code/General).
  • All comments and error messages are in English (CONTRIBUTING.md §Code/General).
  • Comments and error messages are complete sentences - capitalized first letter, terminal punctuation - unless the language/framework convention says otherwise (CONTRIBUTING.md §Code/General; §Python).

C++ Specific (if C++ files changed)

  • Code follows the Google C++ Style Guide strictly.
  • clang-format (version 21, per .github/workflows/clang-format.yml) has been run against all modified .h, .cc, .cuh, and .mlu files; the diff is clean.
  • N/A: clang-tidy was not run because this PR only adds a base declaration header for feat/torch-codegen; no runtime implementation is added.
  • Operator parameter order is inputs first, outputs last; attributes are between inputs and outputs; naming follows PyTorch → ONNX → CUDA API precedence (CONTRIBUTING.md §C++).
  • N/A: this base declaration does not add C++ error paths or exceptions.
  • N/A: this base declaration does not add error or warning messages.
  • N/A: this base declaration does not add kernel files.
  • N/A: this base declaration does not add kernel launchers.
  • Constructor initializer list order matches member declaration order (CONTRIBUTING.md §C++).
  • Exactly one blank line between classes, between classes and functions, and between functions (CONTRIBUTING.md §C++).
  • Exactly one blank line between members (functions and variables) within a class (CONTRIBUTING.md §C++).
  • Exactly one blank line before and after the contents of a namespace (CONTRIBUTING.md §C++).
  • N/A: this PR adds only src/base/xlogy_outscalar_other.h for torch codegen reuse; platform implementations are out of scope.
  • No raw new/delete; RAII / smart pointers / existing allocators are used.

Python Specific (if Python files changed)

N/A: no Python files changed.

Testing

  • N/A: platform pytest was intentionally not run because this PR targets feat/torch-codegen, not master, and only adds a reusable base header declaration.
  • N/A: the table above records the reason platform testing was skipped.
  • N/A: no runtime functionality was added, so no new tests/ coverage is required.
  • N/A: no new pytest parameterization was added.
  • N/A: no Payload-returning test was added.
  • N/A: no dtype / device parameterization was added.
  • N/A: no flaky test was added.
  • N/A: this is not a runtime bug fix.

Build, CI, and Tooling

  • N/A: full platform builds were not run because this PR targets feat/torch-codegen, not master, and only adds a reusable base header declaration.
  • N/A: compile_commands.json behavior was not changed.
  • N/A: no new backend or device was added.
  • N/A: CUDA-like backend mutual exclusion was not changed.
  • Existing CI formatting expectations are preserved; original validation reported clang-format 21 passing on src/base/xlogy_outscalar_other.h.
  • N/A: no new runtime dependency was added.

Documentation

  • N/A: README.md, CONTRIBUTING.md, and developer workflow are unchanged.
  • N/A: XlogyOutscalarOther is an internal base declaration for torch codegen reuse; no user-facing documentation is required.
  • N/A: no user-visible breaking change.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A: no third-party code was added.
  • N/A: no unsafe pointer arithmetic, uninitialized reads, or missing bounds checks were introduced.

voltjia added 15 commits April 30, 2026 01:20
Frees the `infini::ops::Sigmoid` name for the auto-generated PyTorch
operator class emitted by the upcoming `generate_torch_ops.py`.
For each entry in `scripts/torch_ops.yaml`, the script finds the
matching `.out` variant in PyTorch's `native_functions.yaml` (fetched
from GitHub on first invocation, cached under `generated/.cache/`),
parses its schema, and emits an InfiniOps base class plus a PyTorch
backend specialization at slot 8 that wraps `at::<op>_out`.

Key strategies:

- Overload-aware lookup: prefers `<name>.out`, then any
  `<name>.<overload>_out`, picking the variant with the most tensor
  inputs (so `pow.Tensor_Tensor_out` wins over `pow.Tensor_Scalar_out`);
  see the sketch after this list.

- Hidden-parameter pattern: optional types (`Scalar?`, `int[]?`,
  `ScalarType?`, `Generator?`, …), `bool` defaults, numeric `int`/`float`
  defaults, `int[N]=[]` defaults, and ATen enum symbols (`Mean`, `Sum`)
  are filtered from the user-facing API and substituted at the ATen
  call site.  Unlocks reductions, scans, comparisons, losses, and
  multi-scalar activations from a single mechanism.

- Slot 8: reserved for PyTorch backends; native/vendor implementations
  use 0–7.  Also avoids a partial-specialization-after-instantiation
  conflict with `Operator<Op>` at index 0.

- Per-op metadata (`generated/torch_ops_metadata.json`): records the
  full parameter list per op for the test harness, so adding a new op
  to the allowlist requires no code changes.
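
For illustration, a minimal Python sketch of the overload-aware lookup; the `Schema` record and its fields are simplified stand-ins for the script's actual parse, not its real data structures.

```python
# Illustrative sketch of the overload-aware `.out` lookup; `Schema`
# and its fields are assumptions, not the generator's actual model.
from dataclasses import dataclass, field


@dataclass
class Schema:
    name: str       # base op name, e.g. "pow"
    overload: str   # e.g. "out", "Tensor_Tensor_out"
    inputs: list = field(default_factory=list)  # input type strings


def tensor_input_count(schema):
    return sum(1 for t in schema.inputs if t == "Tensor")


def find_out_variant(name, schemas):
    """Prefer `<name>.out`; otherwise pick the `<name>.<overload>_out`
    variant with the most tensor inputs."""
    for s in schemas:
        if s.name == name and s.overload == "out":
            return s
    candidates = [s for s in schemas
                  if s.name == name and s.overload.endswith("_out")]
    return max(candidates, key=tensor_input_count, default=None)


# `pow.Tensor_Tensor_out` (two tensor inputs) beats
# `pow.Tensor_Scalar_out` (one tensor input).
schemas = [Schema("pow", "Tensor_Scalar_out", ["Tensor", "Scalar"]),
           Schema("pow", "Tensor_Tensor_out", ["Tensor", "Tensor"])]
assert find_out_variant("pow", schemas).overload == "Tensor_Tensor_out"
```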
…ources

When `WITH_TORCH=ON`, `src/CMakeLists.txt` runs the generator at
configure time and globs `generated/torch/**/*.cc` into the `infiniops`
target.  `generated/` is added to the public include path so the
emitted wrappers can include `"base/<op>.h"` and `"torch/<op>/<op>.h"`.

`scripts/generate_wrappers.py` (the existing pybind binding generator)
is taught to scan both `src/base/` and `generated/base/` so the
auto-generated InfiniOps classes get Python bindings.  The `__call__`
lambda's `Self&` parameter is renamed to `op` to avoid colliding with
ATen's typical `self` argument name.
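
A minimal sketch of the two-directory lookup; only the layout (`src/base/` plus `generated/base/`) comes from the commit message, the helper name and body are assumed.

```python
# Illustrative `_find_base_header`-style lookup: hand-written headers
# in `src/base/` take precedence over generated ones.
from pathlib import Path

BASE_DIRS = [Path("src/base"), Path("generated/base")]


def find_base_header(op_name):
    """Return the first existing `<op_name>.h` across the base dirs."""
    for base_dir in BASE_DIRS:
        candidate = base_dir / f"{op_name}.h"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"No base header found for `{op_name}`.")
```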
A single parametrized test reads `generated/torch_ops_metadata.json`,
builds inputs from the per-parameter info (tensor → `randn_strided`
with op-specific shape if listed in `_TENSOR_SHAPES`, scalar → per-op
or type default), runs the torch reference to discover output shape /
dtype / arity, calls the InfiniOps wrapper, and compares each output
tensor.

No signature-kind classification — multi-output, ternary, multi-scalar,
matrix, and everything in between fall out of the same code path.
Per-op overrides live in flat dicts (`_TENSOR_SHAPES`,
`_SCALAR_VALUES`).  Vendor-specific runtime errors and bool outputs
(InfiniOps `DataType` has no `kBool`) skip cleanly.

`conftest.py` switches to `torch.allclose(..., equal_nan=True)` for
floating outputs and `torch.equal` for bool/int outputs so domain
violations producing matched NaNs and integer-output ops both work.
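
A minimal sketch of that comparison policy, assuming the harness ends up with plain `torch.Tensor` pairs; the function name is illustrative.

```python
# Comparison rule from `conftest.py` as described above:
# `allclose(..., equal_nan=True)` for floating outputs, exact
# `equal` for bool/int outputs.
import torch


def outputs_match(actual, expected, rtol=1e-5, atol=1e-8):
    if expected.dtype.is_floating_point:
        return torch.allclose(actual, expected,
                              rtol=rtol, atol=atol, equal_nan=True)
    return torch.equal(actual, expected)


a = torch.tensor([1.0, float("nan")])
b = torch.tensor([1.0, float("nan")])
assert outputs_match(a, b)  # matched NaNs compare equal
```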
- Generator: emit one wrapper per ATen overload (e.g. `pow.Tensor_Tensor_out`
  → `PowTensorTensor`, `pow.Tensor_Scalar_out` → `PowTensorScalar`,
  `pow.Scalar_out` → `PowScalar`).  Class name = base + overload.
- Add type support: `SymInt`, `SymInt[]`, `Tensor?`, `Tensor?[]`,
  `str?`/`str` (when defaulted), `int[N]` / `SymInt[N]` with non-empty
  defaults (replicated `{0,0,...}` for `int[N]=0`).  Optional Tensor and
  Tensor list optionals hardcode to `at::nullopt`.
- `is_testable` relaxed to "has at least one out tensor" — generators
  like `arange.out` / `linspace.out` (no tensor input) are now in scope.
- Allowlist auto-discovered from the YAML: every base op name with at
  least one parsable `.out` overload (390 names → 486 wrappers).
- Test: handle `int[N]` / `SymInt[N]` defaults via `_LIST_SIZE_RE`-driven
  `_list_default`; pass `[0, 0, …]` of the right length.  Per-op
  `_TENSOR_SHAPES` and `_SCALAR_VALUES` overrides keyed by `aten_name`
  (so all overloads of an op share the same overrides).
- Generators now wipe their output dirs (`generated/{base,torch,bindings,
  src,include}/`) before regenerating, so files for ops we no longer emit
  do not linger and break the next build.
- Filter `Tensor[]` outputs (`split_copy`, `unbind_copy`,
  `split_with_sizes_copy`): would have emitted `at::<op>_out(at::Tensor,
  ...)` against the actual `at::TensorList` signature.
- Filter ops whose first non-out argument is not a Tensor
  (`pow.Scalar_out`, generators like `arange`/`empty`): `Operator::Make`
  dispatches on the first tensor's device, so these need a separate path.
- Spell out typed empty optionals (`c10::optional<at::Tensor>{}`,
  `c10::optional<at::Scalar>{}`, …) instead of bare `at::nullopt`: the
  latter is ambiguous on ops where overloads exist for both `optional<
  Scalar>` and `optional<Tensor>` (e.g. `clamp_out`).
- Convert YAML single-quoted string defaults (`'none'`) to C++
  double-quoted literals (`"none"`); the former parses as a char
  literal.  (This rule and the typed empty optionals above are both
  sketched after this list.)
- `generate_wrappers.py::_find_vector_tensor_params` now uses the
  shared `_find_base_header` helper, which checks `generated/base/`
  alongside `src/base/` (was hard-coded to `src/base/`).
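
A hypothetical sketch of the typed-optional and string-default substitutions from the bullets above; the mapping table and function are illustrative, not the generator's actual code.

```python
# Illustrative default-to-C++ substitution. The typed-optional
# spellings and the quote conversion come from the commit message;
# everything else here is assumed.
OPTIONAL_EMPTY = {
    "Tensor?": "c10::optional<at::Tensor>{}",
    "Scalar?": "c10::optional<at::Scalar>{}",
}


def cpp_default(type_str, default):
    if default is None:
        # Spell out the typed empty optional; a bare `at::nullopt` is
        # ambiguous when both `optional<Scalar>` and `optional<Tensor>`
        # overloads exist (e.g. `clamp_out`).
        return OPTIONAL_EMPTY[type_str]
    if default.startswith("'") and default.endswith("'"):
        # YAML single-quoted strings would otherwise land in C++ as
        # char literals; re-quote them as string literals.
        return '"' + default[1:-1] + '"'
    return default


assert cpp_default("str?", "'none'") == '"none"'
assert cpp_default("Tensor?", None) == "c10::optional<at::Tensor>{}"
```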

Test improvements (the runtime-skip policy is sketched after this list):
- Skip ops whose tensors use a dtype InfiniOps does not enumerate
  (`bool`, `complex64`, `complex128`, …); `DataTypeFromString` aborts
  the process on these.
- Catch a wider exception set (`ValueError`, `IndexError`,
  `NotImplementedError`) when the torch reference rejects our generic
  random inputs (`adaptive_avg_pool2d` needs at least 3 dims, etc.).
- Skip non-deterministic ops (`bernoulli`, `normal`, `multinomial`,
  `rand*`, `randperm`, `rrelu_with_noise`): independent draws diverge.
- Skip when the Python-facing function returns fewer outputs than the
  ATen `_out` schema declares (`adaptive_max_pool2d` hides `indices`
  behind `return_indices=True`).
- Add "Trying to resize storage that is not resizable" to the runtime
  skip patterns: ATen kernels for some loss ops use `out` as
  intermediate scratch and resize it before the final reduction; our
  `from_blob` outputs are non-resizable.
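
A minimal sketch of that skip policy, assuming pytest; the pattern list and helper are illustrative, not the harness's actual code.

```python
# Illustrative skip policy: tolerate references that reject generic
# random inputs, and skip on known runtime limitations instead of
# failing. Pattern strings are taken from the commit message.
import pytest

RUNTIME_SKIP_PATTERNS = (
    "Trying to resize storage that is not resizable",
    "pointer resides on host memory",
)


def call_or_skip(fn, *args, **kwargs):
    try:
        return fn(*args, **kwargs)
    except (ValueError, IndexError, NotImplementedError) as exc:
        pytest.skip(f"reference rejected generic inputs: {exc}")
    except RuntimeError as exc:
        if any(p in str(exc) for p in RUNTIME_SKIP_PATTERNS):
            pytest.skip(f"known runtime limitation: {exc}")
        raise
```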

Final state: 433 generated + 4 hand-written torch ops, full build
succeeds, `pytest tests/test_torch_ops.py --devices cpu` reports
1663 passed, 2234 skipped, 0 failed.
- Skip ops whose torch reference triggers a CUDA device-side assert on
  random fp32 inputs (`binary_cross_entropy` requires inputs in [0, 1];
  pooling/conv ops divide by `[0, 0]` placeholder kernel sizes our
  harness substitutes).  The Python-side `RuntimeError` is catchable,
  but the CUDA context is left poisoned and every subsequent test
  errors at setup, which masks the rest of the suite.
- Skip ops whose reference produces a 0-element output: on cuda,
  `torch.empty_like(zero_numel)` returns a tensor whose `data_ptr()`
  is unregistered with the device, so the wrapper trips on
  "pointer resides on host memory".

Final state: `pytest tests/test_torch_ops.py` (cpu + cuda) reports
3263 passed, 4531 skipped, 0 failed.
- Support `str` C++ type (`std::string`) for required string params,
  unlocking `index_reduce`, `scatter_reduce`, `scatter_reduce_two`.
- Relax `_find_out_entries` so it also matches multi-output schemas
  whose overload name reflects an output tensor instead of `_out`
  (`kthvalue.values`, `mode.values`).  Detection is now: name is
  `<op>.out`, ends in `_out`, or carries a `Tensor(<letter>!)`
  mutability annotation.
- Strip both the `_out` suffix and the `out_` prefix from the InfiniOps
  name derived from an overload (`div.out_mode` → `div_mode`, instead of
  `div_out_mode`); see the sketch after this list.
- Add per-op test values for the new ops (`reduce` modes, `k`/`dim`
  for `kthvalue`/`mode`).
- `scripts/torch_ops.yaml`: list `kthvalue`, `mode`, `index_reduce`,
  `scatter_reduce`.
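
A minimal sketch of that name derivation; the function is illustrative, and the handling of a plain `.out` overload is an assumption.

```python
# Illustrative overload-name cleanup: drop a trailing `_out` or a
# leading `out_` before joining with the base op name.
def infiniops_name(base, overload):
    part = overload
    if part == "out":          # assumed: `<op>.out` keeps the bare name
        part = ""
    elif part.endswith("_out"):
        part = part[: -len("_out")]
    elif part.startswith("out_"):
        part = part[len("out_"):]
    return f"{base}_{part}" if part else base


assert infiniops_name("div", "out_mode") == "div_mode"
assert infiniops_name("pow", "Tensor_Tensor_out") == "pow_Tensor_Tensor"
```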

Final state: 447 generated ops (up from 433).  `pytest
tests/test_torch_ops.py` (cpu + cuda) reports 3353 passed,
4693 skipped, 0 failed.
@voltjia voltjia force-pushed the codex/add-xlogy_outscalar_other-base branch from 3f39716 to cd1427c on May 7, 2026 09:59
@voltjia voltjia changed the title from feat: add XlogyOutscalarOther base to feat: add xlogy_outscalar_other base on May 7, 2026

namespace infini::ops {

class XlogyOutscalarOther : public Operator<XlogyOutscalarOther> {

The class name carries an `OutscalarOther` suffix.

@voltjia voltjia force-pushed the feat/torch-codegen branch from dc3b3b0 to 156e83f on May 9, 2026 07:32
@voltjia
Collaborator Author

voltjia commented May 9, 2026

Closing as superseded by the latest feat/torch-codegen.

The torch codegen has been updated to drop ATen overload-name suffixes (_grad_input, _outtensor, _n_scalar, _values, _x, etc.) from generated class names — those are ATen schema artefacts, not InfiniOps API. ATen overloads of the same base op are now overloaded operator() methods on a single class instead of separate classes.

This PR added src/base/xlogy_outscalar_other.h, but that ATen overload is now folded into xlogy as an additional operator() overload. The hand-written base for xlogy_outscalar_other is therefore no longer needed; the work continues in #589.
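
For illustration, a hypothetical Python sketch of that folding; the schema tuples and the shape of the emitted C++ are invented for this example and are not the codegen's actual output.

```python
# Illustrative folding: ATen overloads of one base op become
# `operator()` overloads on a single generated class rather than
# separate classes.
from collections import defaultdict


def fold_overloads(schemas):
    """Group `(base_name, params)` schemas by base name."""
    grouped = defaultdict(list)
    for base, params in schemas:
        grouped[base].append(params)
    return grouped


def emit_class(base, param_lists):
    decls = "\n".join(
        f"  void operator()({', '.join(params)});" for params in param_lists
    )
    return f"class {base.capitalize()} {{\n public:\n{decls}\n}};"


schemas = [
    ("xlogy", ["const at::Tensor& self", "const at::Tensor& other"]),
    ("xlogy", ["const at::Scalar& self", "const at::Tensor& other"]),
    ("xlogy", ["const at::Tensor& self", "const at::Scalar& other"]),
]
# Emits one `Xlogy` class with three `operator()` overloads, the last
# of which covers the former `xlogy_outscalar_other`.
print(emit_class("xlogy", fold_overloads(schemas)["xlogy"]))
```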

The original review feedback on this PR (naming, parameter exposure, scalar storage) has been incorporated directly into the codegen — the generated class uses the canonical name, stores visible scalars as members, and exposes default-valued bool / int / float parameters that were previously hidden.

No action required.

@voltjia voltjia closed this May 9, 2026