Skip to content

Conversation

@jameskermode
Copy link
Collaborator

@jameskermode jameskermode commented Dec 23, 2025

Summary

Adds EquivariantTensors-based calculators for the ETACE backend, enabling:

  • AtomsCalculators-compatible energy/forces/virial evaluation
  • Linear least squares assembly for model training
  • Composable calculator stacking via StackedCalculator
  • GPU acceleration for energy and forces (requires Polynomials4ML v0.5.8+)

New Components

Core Calculators:

  • ETACEPotential (type alias): Standalone ACE calculator with full AtomsCalculators interface
  • WrappedSiteCalculator{M, PS, ST}: Generic wrapper for site energy models implementing the ETACE pattern
  • StackedCalculator: Combines multiple calculators with @generated loop unrolling

ET-Native Models:

  • ETACE: Many-body ACE basis using EquivariantTensors graph evaluation
  • ETPairModel: Pair potential model using ET graph evaluation
  • ETOneBody: One-body reference energies (zero gradients)

Conversion Functions:

  • convert2et(model): Convert many-body ACE to ETACE
  • convert2et_full(model, ps, st): Convert full model (E0 + pair + many-body) to StackedCalculator

Training Assembly

  • length_basis(calc) - Total number of linear parameters
  • get_linear_parameters(calc) / set_linear_parameters!(calc, θ) - Parameter management
  • potential_energy_basis(sys, calc) - Energy design matrix row
  • energy_forces_virial_basis(sys, calc) - Full EFV design matrix row

Benchmark Results

GPU Many-Body Only (ETACE) - Energy:

Atoms Edges CPU (ms) GPU (ms) GPU Speedup
64 2146 3.38 0.54 6.3x
512 17176 27.77 0.66 41.9x
800 26868 37.12 0.78 47.6x

GPU Many-Body Only (ETACE) - Forces:

Atoms Edges CPU (ms) GPU (ms) GPU Speedup
64 2146 46.46 14.42 3.2x
512 17178 104.39 15.12 6.9x
800 26860 289.32 16.33 17.7x

GPU Full Model (E0 + Pair + Many-Body) - Energy:

Atoms Edges CPU (ms) GPU (ms) GPU Speedup
64 2140 3.40 0.94 3.6x
512 17166 31.18 0.95 32.9x
800 26858 45.16 1.24 36.4x

GPU Full Model (E0 + Pair + Many-Body) - Forces:

Atoms Edges CPU (ms) GPU (ms) GPU Speedup
64 2134 24.05 19.34 1.2x
512 17178 ~110 ~20 ~5x
800 26860 ~300 ~22 ~14x

CPU Full Model Forces (ETACE vs ACE):

Atoms Edges ACE CPU (ms) ETACE CPU (ms) ETACE Speedup
64 2146 73.6 30.5 2.4x
256 8596 307.7 74.4 4.1x
800 26886 975.0 225.6 4.3x

Notes:

  • GPU forces now work with Polynomials4ML v0.5.8+ (bug fix in P4ML release Dec 29, 2024)
  • GPU shows excellent scaling: larger systems see better speedups
  • Full model GPU speedups are lower than many-body only due to graph construction overhead
  • CPU forces are 2-4x faster with ETACE due to Zygote AD through ET graph

Design Note: Basis Index Handling

The current implementation handles species-separated basis indices at the calculator level (in energy_forces_virial_basis). Each species gets separate parameter indices via p = (s - 1) * nbasis + k. This design choice may need discussion for future GPU training assembly use cases.

Files Added/Modified

New files:

  • src/et_models/et_ace.jl - ETACE model implementation
  • src/et_models/et_pair.jl - ETPairModel implementation
  • src/et_models/et_onebody.jl - ETOneBody implementation
  • src/et_models/et_calculators.jl - Calculator wrappers and conversion functions
  • src/et_models/stackedcalc.jl - StackedCalculator with @generated
  • src/et_models/convert.jl - Model conversion utilities
  • test/etmodels/test_etbackend.jl - ETACE tests
  • test/etmodels/test_etpair.jl - ETPairModel tests
  • test/etmodels/test_etonebody.jl - ETOneBody tests
  • benchmark/gpu_benchmark.jl - GPU benchmark script

Modified files:

  • src/et_models/et_models.jl - Updated includes and exports
  • Project.toml - EquivariantTensors compat bumped to 0.4.2
  • test/Project.toml - EquivariantTensors compat bumped to 0.4.2, added LuxCUDA

Test plan

  • All existing tests pass (945 passed, 1 known Julia 1.12 hash ordering issue)
  • New test files cover ETACE, ETPairModel, and ETOneBody
  • Finite difference validation for forces and virial
  • GPU benchmark confirms significant speedups for energy and forces
  • CPU benchmark confirms ETACE force speedups vs classic ACE

🤖 Generated with Claude Code

@jameskermode jameskermode changed the title ETACE Calculator Interface and Training Assembly WIP ETACE Calculator Interface and Training Assembly Dec 23, 2025
@jameskermode jameskermode marked this pull request as draft December 23, 2025 19:41
jameskermode and others added 14 commits December 30, 2025 18:28
Implements Phase 1 of the ETACE calculator interface plan:
- ETACEPotential struct wrapping ETACE models
- AtomsCalculators interface (energy, forces, virial)
- Combined energy_forces_virial evaluation
- Tests comparing against original ACE model
- CPU and GPU performance benchmarks

Key implementation details:
- Forces computed via site_grads() + forces_from_edge_grads()
- Force sign: forces_from_edge_grads returns +∇E, negated for F=-∇E
- Virial: V = -∑ ∂E/∂𝐫ij ⊗ 𝐫ij

Performance results (8-atom Si/O cell, order=3, maxl=6):
- Energy: ETACE ~15% slower (graph construction overhead)
- Forces: ETACE ~6.5x faster (vectorized gradients)
- EFV: ETACE ~5x faster

GPU benchmarks use auto-detection from EquivariantTensors utils.
GPU gradients skipped due to Polynomials4ML GPU compat issues.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a composable calculator architecture:

- SiteEnergyModel interface: site_energies(), site_energy_grads(), cutoff_radius()
- E0Model: One-body reference energies (constant per species, zero forces)
- WrappedETACE: Wraps ETACE model with SiteEnergyModel interface
- WrappedSiteCalculator: Converts site quantities to global (energy, forces, virial)
- StackedCalculator: Combines multiple AtomsCalculators by summing contributions

Architecture allows non-site-based calculators (e.g., Coulomb, dispersion) to be
added directly to StackedCalculator without requiring site energy decomposition.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Move StackedCalculator to src/et_models/stackedcalc.jl for better
separation of concerns - it's a generic utility for combining
calculators, independent of ETACE-specific code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements linear least squares training support:

- length_basis(): Returns number of linear parameters (nbasis * nspecies)
- energy_forces_virial_basis(): Compute basis values for E/F/V
- potential_energy_basis(): Faster energy-only basis computation
- get_linear_parameters(): Extract readout weights as flat vector
- set_linear_parameters!(): Set readout weights from flat vector

The basis functions allow linear fitting via:
  E = dot(E_basis, θ)
  F = F_basis * θ
  V = sum(θ .* V_basis)

Tests verify that linear combination of basis with current parameters
reproduces the direct energy/forces/virial evaluation.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use compile-time loop unrolling via @generated functions for efficient
summation over calculators. The N type parameter allows generating
specialized code like:

  E_1 = potential_energy(sys, calc.calcs[1])
  E_2 = potential_energy(sys, calc.calcs[2])
  return E_1 + E_2

instead of runtime loops. This enables better inlining and type inference
when the number of calculators is small and known at compile time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- benchmark_comparison.jl: Energy benchmarks (CPU + GPU)
- benchmark_forces.jl: Forces benchmarks (CPU only)

Results show:
- Energy: ETACE CPU 1.7-2.2x faster, ETACE GPU up to 87x faster
- Forces: ETACE CPU 7.7-11.4x faster than ACE

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Phase 1, 2, 5 complete
- Phase 3 (E0/PairModel) assigned to maintainer
- Added benchmark results: GPU up to 87x faster, forces 8-11x faster
- Documented all new files and test coverage

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ACEfit.basis_size dispatch for ETACEPotential
- Add ACEfit to test/Project.toml
- Test training assembly on multiple structures (5 random)
- Test multi-species parameter ordering (pure Si, pure O, mixed)
- Verify species-specific basis contributions are correctly separated
- Fix soft scope warnings with local declarations

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ACEpotentials.ETModels to autodocs in all_exported.md
- Add comprehensive integration test for ETACE calculators based on
  test_silicon workflow
- Tests verify energy/forces/virial consistency with original ACE
- Tests verify training basis assembly and StackedCalculator composition

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Pre-allocate gradient buffer (∇Ei_buf) outside loop, reuse across iterations
- Eliminate W_unit matrix allocation by directly copying ∂𝔹[:, i, k]
- Pre-compute zero gradient element for species masking
- Pre-extract edge vectors for virial computation
- Use zero() instead of zeros() for SMatrix virial accumulator

Performance improvement (64-atom system):
- Time: 1597ms → 422ms (3.8x faster)
- Memory: 3.4 GiB → 412 MiB (8.4x reduction)

Also fix variable scoping in test_et_silicon.jl for Julia 1.10+
(added `global` keyword for loop variable updates).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ETACE only implements the many-body basis, not the pair potential.
The test was incorrectly comparing full ACE (with pair) against ETACE.

Changes:
- Create model_nopair with Wpair=0 for fair comparison
- Compare ETACE against ACE many-body contribution only
- Fix E0Model constructor: use Symbol key (:Si) not Int (14)
- Skip isolated atoms in all tests (ETACE requires >= 2 atoms)
- Update test comments and summary to clarify scope

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extract core helpers (_core_site_energies, _core_site_grads) for shared
  evaluation logic between ETACEPotential and WrappedETACE
- Refactor WrappedETACE and ETACEPotential to use core helpers
- Simplify stackedcalc.jl: replace manual AST building (_gen_sum,
  _gen_broadcast_sum) with idiomatic @nexprs/@ntuple from Base.Cartesian
- Net reduction of ~50 lines while maintaining identical behavior

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Removes duplicate AtomsCalculators interface and evaluation logic by:
- Making WrappedETACE mutable with co_ps field for training
- Defining ETACEPotential as const alias for WrappedSiteCalculator{WrappedETACE}
- Removing duplicate _evaluate_* functions and AtomsCalculators methods
- Adding accessor helpers (_etace, _ps, _st) for training functions

The evaluation now flows through WrappedSiteCalculator's generic methods
which call site_energies/site_energy_grads on the WrappedETACE model.
This reduces ~66 lines of duplicated code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove duplicate E0Model in favor of upstream ETOneBody
- Unify WrappedSiteCalculator to work with all ETACE-pattern models directly
- Document that ETACE, ETPairModel, ETOneBody share identical interface
- Plan Phase 6 refactoring to eliminate WrappedETACE indirection
- Update architecture diagrams showing target unified structure

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
jameskermode and others added 12 commits December 30, 2025 21:24
- Refactor WrappedSiteCalculator to store ps, st, rcut directly
- Remove E0Model (use upstream ETOneBody instead)
- Remove WrappedETACE (functionality merged into WrappedSiteCalculator)
- Remove old SiteEnergyModel interface (site_energies, site_energy_grads)
- Update ETACEPotential to be type alias for WrappedSiteCalculator{ETACE}
- Update training assembly accessors for new flat structure

All ETACE-pattern models (ETACE, ETPairModel, ETOneBody) now work
directly with WrappedSiteCalculator via their common interface:
- model(G, ps, st) -> (site_energies, st)
- site_grads(model, G, ps, st) -> edge gradients

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add convert2et_full() to convert complete ACE model to StackedCalculator
  - Combines ETOneBody (E0), ETPairModel (pair), and ETACE (many-body)
  - Returns StackedCalculator compatible with AtomsCalculators

- Add _copy_ace_params!() for many-body parameter copying
  - Copies radial basis Wnlq parameters
  - Copies readout WB parameters

- Add _copy_pair_params!() for pair potential parameter copying
  - Based on mapping from test/etmodels/test_etpair.jl
  - Copies pair radial basis and readout parameters

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Replace E0Model tests with ETOneBody (upstream) tests
- Remove WrappedETACE tests (no longer exists)
- Update WrappedSiteCalculator tests for new (model, ps, st, rcut) signature
- Update ETACEPotential construction test for direct model access
- Update silicon integration test to use ETOneBody and unified wrapper

Tests now use upstream models directly:
- ETOneBody instead of E0Model
- WrappedSiteCalculator(model, ps, st, rcut) instead of nested wrappers

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Return NamedTuple with empty edge_data matching ETACE/ETPairModel interface
- Remove unnecessary Zygote import (hand-coded since gradient is trivially zero)
- Update test to check isempty(∂G.edge_data) instead of zero norms

The calling code in et_calculators.jl checks isempty(∂G.edge_data) and
returns zero forces/virial when true, which is the correct behavior for
ETOneBody (energy depends only on atom types, not positions).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Comment out Pkg.activate in test_committee.jl that was switching
  away from the test project environment
- Update test_etonebody.jl gradient test to check for NamedTuple
  return type with .edge_data field (matching the updated ETOneBody
  interface that returns consistent structure with ETACE/ETPairModel)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ET ACE (site_basis_jacobian):
- Remove ps.basis and st.basis from _jacobian_X call
- The upstream ET._jacobian_X for SparseACEbasis only takes 5 args:
  (basis, Rnl, Ylm, dRnl, dYlm)

ET Pair (site_grads):
- Implement hand-coded gradient using evaluate_ed instead of Zygote
- Avoids Zygote InplaceableThunk issue with upstream EdgeEmbed rrule
- Matches the pattern used in site_basis_jacobian

Also inline _apply_etpairmodel to avoid calling site_basis (cleaner).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix z.number → z.atomic_number in E0_dict creation
- Fix _copy_ace_params! path: rembed.basis.linl.W → rembed.post.W
- Fix _copy_pair_params! path: rembed.basis.rbasis.linl.W → rembed.rbasis.post.W
- Add benchmark comparing ACE vs ETACE StackedCalculator for full model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Address moderator concern about commit 50ed668:
- Avoid forming O(nnodes * nbasis) dense intermediate matrix
- Compute edge gradients directly using loops
- Same numerical results, better memory characteristics

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Bump EquivariantTensors compat to 0.4.2 in main and test Project.toml
- Simplify site_basis_jacobian to use 5-arg _jacobian_X API (requires ET >= 0.4.2)
- Improve ETPairModel site_grads memory efficiency:
  - Avoid O(nnodes * nbasis) intermediate matrix allocation
  - Compute edge gradients directly using loops

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Now that test project uses ET 0.4.2 (which fixed InplaceableThunk bug
in EdgeEmbed rrule), we can use the simpler Zygote-based gradient
computation for ETPairModel.

Also fix _jacobian_X call in ETACE to use 7-arg API (requires ps, st).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add benchmark/gpu_benchmark.jl for GPU energy/forces benchmarks
- Test both many-body only (ETACE) and full model (E0 + Pair + Many-Body)
- Add LuxCUDA to test/Project.toml for GPU testing support
- GPU forces now work with Polynomials4ML v0.5.8+ (bug fix Dec 29, 2024)

Results show significant GPU speedups:
- Many-body energy: 6x-48x speedup (64-800 atoms)
- Many-body forces: 3x-18x speedup (64-800 atoms)
- Full model energy: 3x-36x speedup (64-800 atoms)
- Full model forces: 1x-14x speedup (64-800 atoms)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Mark all core phases as complete
- Add GPU benchmark results (energy and forces)
- Document outstanding work: pair training assembly, ACEfit integration
- Note basis index design discussion needed with maintainer
- Update dependencies: ET 0.4.2, P4ML 0.5.8+

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
jameskermode and others added 2 commits December 31, 2025 16:34
- Add ETPairPotential and ETOneBodyPotential type aliases
- Implement length_basis, energy_forces_virial_basis, potential_energy_basis
  for ETPairPotential, ETOneBodyPotential, and StackedCalculator
- Add get/set_linear_parameters for all calculator types
- Add ACEfit.basis_size dispatch for all calculator types
- Import and extend length_basis, energy_forces_virial_basis from Models
- ACEfit.assemble now works with full ETACE StackedCalculator

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…neBodyPotential, StackedCalculator

Tests cover:
- ETOneBodyPotential returns empty arrays (0 learnable parameters)
- ETPairPotential training assembly with learnable pair basis
- StackedCalculator concatenation of basis from all components
- Linear combinations reproduce energy/forces/virial
- get/set_linear_parameters round-trip
- ACEfit.basis_size dispatch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@cortner
Copy link
Member

cortner commented Dec 31, 2025

@jameskermode - please move ET compat to 4.3 when you have a moment.

# Gradient w.r.t. positions is always zero.
# Return NamedTuple matching Zygote gradient structure with empty edge_data.
# The calling code checks isempty(∂G.edge_data) and returns zero forces/virial.
function site_grads(l::ETOneBody, X::ET.ETGraph, ps, st)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will lead to type stable code. If you look at the gradients that are produced by the other model components it will be of quite a different form: there will be additional fields in the returned NamedTuple and the edge_data field will contain VState or NamedTuple and not PState variables. I think this may need a bit of iteration.

That said, it might be a bit much hoping for perfect type stability here. I'll follow up on Slack on how to proceed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the correct code might be

return (; edge_data = fill(VState(), length(X.edge_data)))

The reason this should work is that

VState(r = SA[1.0, 2.0, 3.0]) + VState() == VState(r = SA[1.0, 2.0, 3.0])

i.e. an empty VState() just acts like zero when added on any other VState.


@generated function _stacked_energy(sys::AbstractSystem, calc::StackedCalculator{N}) where {N}
quote
@nexprs $N i -> E_i = AtomsCalculators.potential_energy(sys, calc.calcs[i])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This refers to my other comment about the OneBody gradient. This is exactly the correct way to ensure type stability because you have N expressions for N calculators rather than a for loop. So I don't think there will be a type stability problem anywhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants