WIP ETACE Calculator Interface and Training Assembly #313

jameskermode · 2025-12-23T19:37:41Z

Summary

Adds EquivariantTensors-based calculators for the ETACE backend, enabling:

AtomsCalculators-compatible energy/forces/virial evaluation
Linear least squares assembly for model training
Composable calculator stacking via StackedCalculator
GPU acceleration for energy and forces (requires Polynomials4ML v0.5.8+)

New Components

Core Calculators:

ETACEPotential (type alias): Standalone ACE calculator with full AtomsCalculators interface
WrappedSiteCalculator{M, PS, ST}: Generic wrapper for site energy models implementing the ETACE pattern
StackedCalculator: Combines multiple calculators with @generated loop unrolling

ET-Native Models:

ETACE: Many-body ACE basis using EquivariantTensors graph evaluation
ETPairModel: Pair potential model using ET graph evaluation
ETOneBody: One-body reference energies (zero gradients)

Conversion Functions:

convert2et(model): Convert many-body ACE to ETACE
convert2et_full(model, ps, st): Convert full model (E0 + pair + many-body) to StackedCalculator

Training Assembly

length_basis(calc) - Total number of linear parameters
get_linear_parameters(calc) / set_linear_parameters!(calc, θ) - Parameter management
potential_energy_basis(sys, calc) - Energy design matrix row
energy_forces_virial_basis(sys, calc) - Full EFV design matrix row
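As a rough illustration of how design-matrix rows feed a linear least squares fit, the sketch below stacks one energy row per structure and solves for θ. It does not call ACEpotentials: `toy_energy_basis`, the scalar "structures" in `xs`, and `θ_true` are hypothetical stand-ins for `potential_energy_basis(sys, calc)` and real training data.

```julia
using LinearAlgebra

# Hypothetical stand-in for potential_energy_basis(sys, calc):
# maps a toy "structure" (a scalar x) to a vector of basis values.
toy_energy_basis(x) = [1.0, x, x^2]

# Toy training set with energies generated from known parameters.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
θ_true = [0.3, -1.2, 0.7]
y = [dot(toy_energy_basis(x), θ_true) for x in xs]

# Assemble the design matrix: one row per structure.
A = reduce(vcat, [toy_energy_basis(x)' for x in xs])

# Solve the overdetermined linear least squares problem A θ ≈ y.
θ_fit = A \ y
```

In a real fit, force and virial rows from `energy_forces_virial_basis` would be stacked below the energy rows in the same way, typically with per-observable weights.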

Benchmark Results

GPU Many-Body Only (ETACE) - Energy:

Atoms	Edges	CPU (ms)	GPU (ms)	GPU Speedup
64	2146	3.38	0.54	6.3x
512	17176	27.77	0.66	41.9x
800	26868	37.12	0.78	47.6x

GPU Many-Body Only (ETACE) - Forces:

Atoms	Edges	CPU (ms)	GPU (ms)	GPU Speedup
64	2146	46.46	14.42	3.2x
512	17178	104.39	15.12	6.9x
800	26860	289.32	16.33	17.7x

GPU Full Model (E0 + Pair + Many-Body) - Energy:

Atoms	Edges	CPU (ms)	GPU (ms)	GPU Speedup
64	2140	3.40	0.94	3.6x
512	17166	31.18	0.95	32.9x
800	26858	45.16	1.24	36.4x

GPU Full Model (E0 + Pair + Many-Body) - Forces:

Atoms	Edges	CPU (ms)	GPU (ms)	GPU Speedup
64	2134	24.05	19.34	1.2x
512	17178	~110	~20	~5x
800	26860	~300	~22	~14x

CPU Full Model Forces (ETACE vs ACE):

Atoms	Edges	ACE CPU (ms)	ETACE CPU (ms)	ETACE Speedup
64	2146	73.6	30.5	2.4x
256	8596	307.7	74.4	4.1x
800	26886	975.0	225.6	4.3x

Notes:

GPU forces now work with Polynomials4ML v0.5.8+ (bug fix in P4ML release Dec 29, 2024)
GPU speedup grows with system size: for many-body energy it rises from 6.3x at 64 atoms to 47.6x at 800 atoms
Full model GPU speedups are lower than many-body only due to graph construction overhead
CPU forces are 2.4-4.3x faster with ETACE than with classic ACE due to Zygote AD through the ET graph

Design Note: Basis Index Handling

The current implementation handles species-separated basis indices at the calculator level (in energy_forces_virial_basis). Each species gets separate parameter indices via p = (s - 1) * nbasis + k, where s is the species index and k the basis function index within that species' block. This design choice may need discussion for future GPU training assembly use cases.
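The index formula can be sketched as follows. This is a minimal illustration, not the code in this PR: `param_index` and `toy_energy_row` are hypothetical names, and the species of each atom is assumed to be given as a 1-based integer.

```julia
# Flat parameter index for species s (1-based) and basis function k (1-based).
param_index(s, k, nbasis) = (s - 1) * nbasis + k

# Scatter per-atom basis values B[k, i] into a species-separated energy row.
# `species[i]` is the (hypothetical) 1-based species index of atom i.
function toy_energy_row(B::AbstractMatrix, species::AbstractVector{<:Integer}, nspecies::Integer)
    nbasis = size(B, 1)
    row = zeros(eltype(B), nbasis * nspecies)
    for (i, s) in enumerate(species), k in 1:nbasis
        row[param_index(s, k, nbasis)] += B[k, i]
    end
    return row
end

B = [1.0 2.0 3.0;
     10.0 20.0 30.0]          # nbasis = 2, three atoms
row = toy_energy_row(B, [1, 2, 1], 2)
```

Atoms 1 and 3 (species 1) accumulate into the first `nbasis` entries of `row`, and atom 2 (species 2) into the next `nbasis` entries.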

Files Added/Modified

New files:

src/et_models/et_ace.jl - ETACE model implementation
src/et_models/et_pair.jl - ETPairModel implementation
src/et_models/et_onebody.jl - ETOneBody implementation
src/et_models/et_calculators.jl - Calculator wrappers and conversion functions
src/et_models/stackedcalc.jl - StackedCalculator with @generated
src/et_models/convert.jl - Model conversion utilities
test/etmodels/test_etbackend.jl - ETACE tests
test/etmodels/test_etpair.jl - ETPairModel tests
test/etmodels/test_etonebody.jl - ETOneBody tests
benchmark/gpu_benchmark.jl - GPU benchmark script

Modified files:

src/et_models/et_models.jl - Updated includes and exports
Project.toml - EquivariantTensors compat bumped to 0.4.2
test/Project.toml - EquivariantTensors compat bumped to 0.4.2, added LuxCUDA

Test plan

All existing tests pass (945 passed, 1 known Julia 1.12 hash ordering issue)
New test files cover ETACE, ETPairModel, and ETOneBody
Finite difference validation for forces and virial
GPU benchmark confirms significant speedups for energy and forces
CPU benchmark confirms ETACE force speedups vs classic ACE

🤖 Generated with Claude Code

Implements Phase 1 of the ETACE calculator interface plan:
- ETACEPotential struct wrapping ETACE models
- AtomsCalculators interface (energy, forces, virial)
- Combined energy_forces_virial evaluation
- Tests comparing against original ACE model
- CPU and GPU performance benchmarks

Key implementation details:
- Forces computed via site_grads() + forces_from_edge_grads()
- Force sign: forces_from_edge_grads returns +∇E, negated for F=-∇E
- Virial: V = -∑ ∂E/∂𝐫ij ⊗ 𝐫ij

Performance results (8-atom Si/O cell, order=3, maxl=6):
- Energy: ETACE ~15% slower (graph construction overhead)
- Forces: ETACE ~6.5x faster (vectorized gradients)
- EFV: ETACE ~5x faster

GPU benchmarks use auto-detection from EquivariantTensors utils. GPU gradients skipped due to Polynomials4ML GPU compat issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
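The force sign and virial conventions above can be checked on a toy problem. The sketch below is not `forces_from_edge_grads`: it assumes a hypothetical pair energy `phi`, an all-pairs directed edge list, and the edge vector convention 𝐫ij = 𝐫j - 𝐫i, and accumulates F = -∇E and V = -∑ ∂E/∂𝐫ij ⊗ 𝐫ij from per-edge gradients.

```julia
using LinearAlgebra

# Hypothetical pair energy and its derivative, for illustration only.
phi(r) = (r - 1)^2
dphi(r) = 2 * (r - 1)

# Directed edges (i, j) over all ordered pairs; edge vector is rij = X[j] - X[i].
toy_edges(n) = [(i, j) for i in 1:n for j in 1:n if i != j]

pair_toy_energy(X) = sum(phi(norm(X[j] - X[i])) for (i, j) in toy_edges(length(X)))

function toy_forces_virial(X)
    F = [zeros(3) for _ in X]
    V = zeros(3, 3)
    for (i, j) in toy_edges(length(X))
        rij = X[j] - X[i]
        r = norm(rij)
        g = dphi(r) * rij / r     # ∂E/∂rij for this edge
        F[j] .-= g                # X[j] enters rij with +1, and F = -∇E
        F[i] .+= g                # X[i] enters rij with -1
        V .-= g * rij'            # V = -∑ ∂E/∂rij ⊗ rij
    end
    return F, V
end

X = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [0.3, 0.9, 0.1]]
F, V = toy_forces_virial(X)
```

A finite difference of `pair_toy_energy` with respect to one coordinate is a quick way to confirm the sign, which is the same kind of check the test plan lists for forces and virial.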

Implements a composable calculator architecture:
- SiteEnergyModel interface: site_energies(), site_energy_grads(), cutoff_radius()
- E0Model: One-body reference energies (constant per species, zero forces)
- WrappedETACE: Wraps ETACE model with SiteEnergyModel interface
- WrappedSiteCalculator: Converts site quantities to global (energy, forces, virial)
- StackedCalculator: Combines multiple AtomsCalculators by summing contributions

Architecture allows non-site-based calculators (e.g., Coulomb, dispersion) to be added directly to StackedCalculator without requiring site energy decomposition.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Move StackedCalculator to src/et_models/stackedcalc.jl for better separation of concerns - it's a generic utility for combining calculators, independent of ETACE-specific code. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Implements linear least squares training support:
- length_basis(): Returns number of linear parameters (nbasis * nspecies)
- energy_forces_virial_basis(): Compute basis values for E/F/V
- potential_energy_basis(): Faster energy-only basis computation
- get_linear_parameters(): Extract readout weights as flat vector
- set_linear_parameters!(): Set readout weights from flat vector

The basis functions allow linear fitting via:

    E = dot(E_basis, θ)
    F = F_basis * θ
    V = sum(θ .* V_basis)

Tests verify that linear combination of basis with current parameters reproduces the direct energy/forces/virial evaluation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
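A small sketch of these three linear combinations, using plain arrays in place of the static vector and matrix types the real code presumably uses. `toy_predict` and the array layouts (`F_basis[i, k]` as the force on atom i from basis function k, `V_basis[k]` as a 3×3 matrix) are assumptions made for illustration.

```julia
using LinearAlgebra

function toy_predict(E_basis, F_basis, V_basis, θ)
    E = dot(E_basis, θ)
    # With plain Vectors standing in for static vectors, the product
    # F_basis * θ is spelled out as a sum over basis functions.
    F = sum(F_basis[:, k] .* θ[k] for k in eachindex(θ))
    V = sum(θ .* V_basis)
    return E, F, V
end

θ = [0.5, -1.0]
E_basis = [1.0, 2.0]
# 2 atoms × 2 basis functions, each entry a 3-vector.
F_basis = [k == 1 ? [s, 0.0, 0.0] : [0.0, s, 0.0] for s in (1.0, -1.0), k in 1:2]
V_basis = [Matrix(1.0I, 3, 3), Matrix(2.0I, 3, 3)]

E, F, V = toy_predict(E_basis, F_basis, V_basis, θ)
```

Because every output is linear in θ, the same basis arrays can serve both as design-matrix rows for fitting and as a consistency check against direct evaluation.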

@generated

Use compile-time loop unrolling via @generated functions for efficient summation over calculators. The N type parameter allows generating specialized code like:

    E_1 = potential_energy(sys, calc.calcs[1])
    E_2 = potential_energy(sys, calc.calcs[2])
    return E_1 + E_2

instead of runtime loops. This enables better inlining and type inference when the number of calculators is small and known at compile time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
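The unrolling pattern can be reproduced in isolation. The sketch below is a self-contained toy, not the PR's `StackedCalculator`: `ToyStacked` and `toy_potential_energy` are hypothetical names, and the "calculators" are plain callables rather than AtomsCalculators.

```julia
using Base.Cartesian: @nexprs, @ntuple

# Hypothetical stand-in for StackedCalculator: N is the number of calculators.
struct ToyStacked{N, C<:Tuple}
    calcs::C
end
ToyStacked(calcs::Tuple) = ToyStacked{length(calcs), typeof(calcs)}(calcs)

# Hypothetical leaf "calculator": any callable mapping a system to an energy.
toy_potential_energy(sys, calc) = calc(sys)

# The generator sees N as a compile-time constant and emits
# E_1 = ..., E_2 = ..., up to E_N, then sums the tuple (E_1, ..., E_N).
@generated function toy_potential_energy(sys, calc::ToyStacked{N}) where {N}
    quote
        @nexprs $N i -> E_i = toy_potential_energy(sys, calc.calcs[i])
        return sum(@ntuple $N E)
    end
end

stacked = ToyStacked((s -> 1.0 * s, s -> 2.0 * s, s -> 0.5))
E_total = toy_potential_energy(2.0, stacked)
```

Each `calc.calcs[i]` is indexed with a literal integer, so the compiler knows the concrete type of every component even when the tuple is heterogeneous.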

- benchmark_comparison.jl: Energy benchmarks (CPU + GPU)
- benchmark_forces.jl: Forces benchmarks (CPU only)

Results show:
- Energy: ETACE CPU 1.7-2.2x faster, ETACE GPU up to 87x faster
- Forces: ETACE CPU 7.7-11.4x faster than ACE

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Phase 1, 2, 5 complete
- Phase 3 (E0/PairModel) assigned to maintainer
- Added benchmark results: GPU up to 87x faster, forces 8-11x faster
- Documented all new files and test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add ACEfit.basis_size dispatch for ETACEPotential
- Add ACEfit to test/Project.toml
- Test training assembly on multiple structures (5 random)
- Test multi-species parameter ordering (pure Si, pure O, mixed)
- Verify species-specific basis contributions are correctly separated
- Fix soft scope warnings with local declarations

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add ACEpotentials.ETModels to autodocs in all_exported.md
- Add comprehensive integration test for ETACE calculators based on test_silicon workflow
- Tests verify energy/forces/virial consistency with original ACE
- Tests verify training basis assembly and StackedCalculator composition

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Pre-allocate gradient buffer (∇Ei_buf) outside loop, reuse across iterations
- Eliminate W_unit matrix allocation by directly copying ∂𝔹[:, i, k]
- Pre-compute zero gradient element for species masking
- Pre-extract edge vectors for virial computation
- Use zero() instead of zeros() for SMatrix virial accumulator

Performance improvement (64-atom system):
- Time: 1597ms → 422ms (3.8x faster)
- Memory: 3.4 GiB → 412 MiB (8.4x reduction)

Also fix variable scoping in test_et_silicon.jl for Julia 1.10+ (added `global` keyword for loop variable updates).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ETACE only implements the many-body basis, not the pair potential. The test was incorrectly comparing full ACE (with pair) against ETACE.

Changes:
- Create model_nopair with Wpair=0 for fair comparison
- Compare ETACE against ACE many-body contribution only
- Fix E0Model constructor: use Symbol key (:Si) not Int (14)
- Skip isolated atoms in all tests (ETACE requires >= 2 atoms)
- Update test comments and summary to clarify scope

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@ntuple

- Extract core helpers (_core_site_energies, _core_site_grads) for shared evaluation logic between ETACEPotential and WrappedETACE
- Refactor WrappedETACE and ETACEPotential to use core helpers
- Simplify stackedcalc.jl: replace manual AST building (_gen_sum, _gen_broadcast_sum) with idiomatic @nexprs/@ntuple from Base.Cartesian
- Net reduction of ~50 lines while maintaining identical behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Removes duplicate AtomsCalculators interface and evaluation logic by:
- Making WrappedETACE mutable with co_ps field for training
- Defining ETACEPotential as const alias for WrappedSiteCalculator{WrappedETACE}
- Removing duplicate _evaluate_* functions and AtomsCalculators methods
- Adding accessor helpers (_etace, _ps, _st) for training functions

The evaluation now flows through WrappedSiteCalculator's generic methods which call site_energies/site_energy_grads on the WrappedETACE model. This removes ~66 lines of duplicated code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Remove duplicate E0Model in favor of upstream ETOneBody
- Unify WrappedSiteCalculator to work with all ETACE-pattern models directly
- Document that ETACE, ETPairModel, ETOneBody share identical interface
- Plan Phase 6 refactoring to eliminate WrappedETACE indirection
- Update architecture diagrams showing target unified structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Refactor WrappedSiteCalculator to store ps, st, rcut directly
- Remove E0Model (use upstream ETOneBody instead)
- Remove WrappedETACE (functionality merged into WrappedSiteCalculator)
- Remove old SiteEnergyModel interface (site_energies, site_energy_grads)
- Update ETACEPotential to be type alias for WrappedSiteCalculator{ETACE}
- Update training assembly accessors for new flat structure

All ETACE-pattern models (ETACE, ETPairModel, ETOneBody) now work directly with WrappedSiteCalculator via their common interface:
- model(G, ps, st) -> (site_energies, st)
- site_grads(model, G, ps, st) -> edge gradients

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
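To illustrate why a shared call signature lets one wrapper serve several models, here is a toy sketch of the `model(G, ps, st) -> (site_energies, st)` half of the interface. Everything in it is hypothetical: `ToyOneBody`, `ToyLinearSite`, `ToyWrapped`, and the NamedTuple standing in for the graph are not the types in this PR, and `site_grads` is not sketched.

```julia
# Toy one-body model: site energy depends only on the species.
struct ToyOneBody
    E0::Dict{Symbol, Float64}
end
(m::ToyOneBody)(G, ps, st) = ([m.E0[s] for s in G.species], st)

# Toy linear model: site energy is a weighted sum of per-atom features.
struct ToyLinearSite end
(m::ToyLinearSite)(G, ps, st) = ([sum(ps.W .* f) for f in G.features], st)

# Generic wrapper storing model, parameters, state and cutoff side by side.
struct ToyWrapped{M, PS, ST}
    model::M
    ps::PS
    st::ST
    rcut::Float64
end

# One method covers every model that follows the call convention.
function toy_total_energy(G, calc::ToyWrapped)
    Ei, _ = calc.model(G, calc.ps, calc.st)
    return sum(Ei)
end

G = (species = [:Si, :O, :Si], features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
onebody = ToyWrapped(ToyOneBody(Dict(:Si => -1.0, :O => -2.0)), NamedTuple(), NamedTuple(), 5.0)
linear = ToyWrapped(ToyLinearSite(), (W = [0.5, 0.25],), NamedTuple(), 5.0)
```

The wrapper never inspects the model type, so adding a new model only requires implementing the call convention.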

- Add convert2et_full() to convert complete ACE model to StackedCalculator
  - Combines ETOneBody (E0), ETPairModel (pair), and ETACE (many-body)
  - Returns StackedCalculator compatible with AtomsCalculators
- Add _copy_ace_params!() for many-body parameter copying
  - Copies radial basis Wnlq parameters
  - Copies readout WB parameters
- Add _copy_pair_params!() for pair potential parameter copying
  - Based on mapping from test/etmodels/test_etpair.jl
  - Copies pair radial basis and readout parameters

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Replace E0Model tests with ETOneBody (upstream) tests
- Remove WrappedETACE tests (no longer exists)
- Update WrappedSiteCalculator tests for new (model, ps, st, rcut) signature
- Update ETACEPotential construction test for direct model access
- Update silicon integration test to use ETOneBody and unified wrapper

Tests now use upstream models directly:
- ETOneBody instead of E0Model
- WrappedSiteCalculator(model, ps, st, rcut) instead of nested wrappers

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Return NamedTuple with empty edge_data matching ETACE/ETPairModel interface
- Remove unnecessary Zygote import (hand-coded since gradient is trivially zero)
- Update test to check isempty(∂G.edge_data) instead of zero norms

The calling code in et_calculators.jl checks isempty(∂G.edge_data) and returns zero forces/virial when true, which is the correct behavior for ETOneBody (energy depends only on atom types, not positions).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Comment out Pkg.activate in test_committee.jl that was switching away from the test project environment
- Update test_etonebody.jl gradient test to check for NamedTuple return type with .edge_data field (matching the updated ETOneBody interface that returns consistent structure with ETACE/ETPairModel)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

ET ACE (site_basis_jacobian):
- Remove ps.basis and st.basis from _jacobian_X call
- The upstream ET._jacobian_X for SparseACEbasis only takes 5 args: (basis, Rnl, Ylm, dRnl, dYlm)

ET Pair (site_grads):
- Implement hand-coded gradient using evaluate_ed instead of Zygote
- Avoids Zygote InplaceableThunk issue with upstream EdgeEmbed rrule
- Matches the pattern used in site_basis_jacobian

Also inline _apply_etpairmodel to avoid calling site_basis (cleaner).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Fix z.number → z.atomic_number in E0_dict creation
- Fix _copy_ace_params! path: rembed.basis.linl.W → rembed.post.W
- Fix _copy_pair_params! path: rembed.basis.rbasis.linl.W → rembed.rbasis.post.W
- Add benchmark comparing ACE vs ETACE StackedCalculator for full model

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Address moderator concern about commit 50ed668:
- Avoid forming O(nnodes * nbasis) dense intermediate matrix
- Compute edge gradients directly using loops
- Same numerical results, better memory characteristics

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Bump EquivariantTensors compat to 0.4.2 in main and test Project.toml
- Simplify site_basis_jacobian to use 5-arg _jacobian_X API (requires ET >= 0.4.2)
- Improve ETPairModel site_grads memory efficiency:
  - Avoid O(nnodes * nbasis) intermediate matrix allocation
  - Compute edge gradients directly using loops

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Now that test project uses ET 0.4.2 (which fixed InplaceableThunk bug in EdgeEmbed rrule), we can use the simpler Zygote-based gradient computation for ETPairModel. Also fix _jacobian_X call in ETACE to use 7-arg API (requires ps, st). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add benchmark/gpu_benchmark.jl for GPU energy/forces benchmarks
- Test both many-body only (ETACE) and full model (E0 + Pair + Many-Body)
- Add LuxCUDA to test/Project.toml for GPU testing support
- GPU forces now work with Polynomials4ML v0.5.8+ (bug fix Dec 29, 2024)

Results show significant GPU speedups:
- Many-body energy: 6x-48x speedup (64-800 atoms)
- Many-body forces: 3x-18x speedup (64-800 atoms)
- Full model energy: 3x-36x speedup (64-800 atoms)
- Full model forces: 1x-14x speedup (64-800 atoms)

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Mark all core phases as complete
- Add GPU benchmark results (energy and forces)
- Document outstanding work: pair training assembly, ACEfit integration
- Note basis index design discussion needed with maintainer
- Update dependencies: ET 0.4.2, P4ML 0.5.8+

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add ETPairPotential and ETOneBodyPotential type aliases
- Implement length_basis, energy_forces_virial_basis, potential_energy_basis for ETPairPotential, ETOneBodyPotential, and StackedCalculator
- Add get/set_linear_parameters for all calculator types
- Add ACEfit.basis_size dispatch for all calculator types
- Import and extend length_basis, energy_forces_virial_basis from Models
- ACEfit.assemble now works with full ETACE StackedCalculator

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…neBodyPotential, StackedCalculator

Tests cover:
- ETOneBodyPotential returns empty arrays (0 learnable parameters)
- ETPairPotential training assembly with learnable pair basis
- StackedCalculator concatenation of basis from all components
- Linear combinations reproduce energy/forces/virial
- get/set_linear_parameters round-trip
- ACEfit.basis_size dispatch

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

cortner · 2025-12-31T23:50:26Z

@jameskermode - please move the ET compat to 0.4.3 when you have a moment.

cortner · 2026-01-01T00:30:27Z

src/et_models/onebody.jl

+# Gradient w.r.t. positions is always zero.
+# Return NamedTuple matching Zygote gradient structure with empty edge_data.
+# The calling code checks isempty(∂G.edge_data) and returns zero forces/virial.
+function site_grads(l::ETOneBody, X::ET.ETGraph, ps, st)


I don't think this will lead to type stable code. If you look at the gradients produced by the other model components, they have quite a different form: there are additional fields in the returned NamedTuple, and the edge_data field contains VState or NamedTuple values, not PState variables. I think this may need a bit of iteration.

That said, it might be a bit much hoping for perfect type stability here. I'll follow up on Slack on how to proceed.

I think the correct code might be

return (; edge_data = fill(VState(), length(X.edge_data)))

The reason this should work is that

VState(r = SA[1.0, 2.0, 3.0]) + VState() == VState(r = SA[1.0, 2.0, 3.0])

i.e. an empty VState() just acts like zero when added to any other VState.

cortner · 2026-01-01T00:31:42Z

src/et_models/stackedcalc.jl

+
+@generated function _stacked_energy(sys::AbstractSystem, calc::StackedCalculator{N}) where {N}
+   quote
+      @nexprs $N i -> E_i = AtomsCalculators.potential_energy(sys, calc.calcs[i])


This refers to my other comment about the OneBody gradient. This is exactly the correct way to ensure type stability because you have N expressions for N calculators rather than a for loop. So I don't think there will be a type stability problem anywhere.

jameskermode changed the title ~~ETACE Calculator Interface and Training Assembly~~ WIP ETACE Calculator Interface and Training Assembly Dec 23, 2025

jameskermode marked this pull request as draft December 23, 2025 19:41

jameskermode force-pushed the jrk/etcalculators branch from fdadcb9 to 60212bb Compare December 23, 2025 21:34

jameskermode and others added 14 commits December 30, 2025 18:28

jameskermode force-pushed the jrk/etcalculators branch from c51a644 to 7d03f9d Compare December 30, 2025 21:19

jameskermode and others added 12 commits December 30, 2025 21:24

jameskermode and others added 2 commits December 31, 2025 16:34

cortner reviewed Jan 1, 2026
