
feat(pt_expt): implement .pte inference pipeline with dynamic shapes#5284

Merged
wanghan-iapcm merged 16 commits into deepmodeling:master from wanghan-iapcm:feat-pt-expt-infer
Mar 5, 2026
Conversation


@wanghan-iapcm wanghan-iapcm commented Mar 3, 2026

Implement the full pt_expt inference pipeline: serialize models to .pte files via torch.export, and load them for inference via DeepPot/DeepEval.

Key changes:

  • Add DeepEval backend for .pte files (deepmd/pt_expt/infer/deep_eval.py)
  • Add serialize/deserialize hooks (deepmd/pt_expt/utils/serialization.py)
  • Wire up backend hooks in deepmd/backend/pt_expt.py
  • Add forward_common_lower_exportable using make_fx + torch.export
  • Support dynamic nframes, nloc, and nall dimensions
  • Fix atomic_virial_corr to use explicit loop instead of vmap

Add xp_take_first_n helper to avoid torch.export contiguity guards on [:, :nloc] slices. When torch.export traces tensor[:, :nloc] on a tensor of size nall, it records a Ne(nall, nloc) guard from the view's contiguity check, which fails when nall == nloc (no PBC). Using torch.index_select instead creates a new tensor, avoiding the guard.
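
The helper's semantics can be sketched on the numpy path (the signature follows the xp_take_first_n(..., axis, n) call sites mentioned in the walkthrough, but this is an approximation, not the merged code):

```python
import numpy as np


def xp_take_first_n(a: np.ndarray, axis: int, n: int) -> np.ndarray:
    """Take the first ``n`` elements along ``axis`` as a new array, not a view.

    The torch fast path would use ``torch.index_select``, which materializes
    a new tensor and therefore avoids the contiguity guard that ``a[:, :n]``
    records under ``torch.export``.
    """
    return np.take(a, np.arange(n), axis=axis)


ext = np.arange(24).reshape(2, 4, 3)   # (nframes, nall, 3)
loc = xp_take_first_n(ext, 1, 2)       # first nloc=2 atoms per frame
```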

Summary by CodeRabbit

  • New Features

    • PyTorch-exportable inference for exported models with flexible batching and neighbor-list support.
    • Model serialize/deserialize flow that preserves dynamic shapes and includes round-trip import/export.
    • New neighbor-statistics utility for PyTorch.
    • Array utility to take the first N elements with a Torch-optimized path.
  • Improvements

    • Enhanced export/tracing reliability and array handling to preserve dynamic shapes.
    • Backend hooks implemented for PyTorch evaluation paths.
  • Tests

    • Extensive tests for inference consistency, serialization round-trips, and multi-frame scenarios.
  • Chores

    • Added SPDX license headers.

@wanghan-iapcm wanghan-iapcm requested review from iProzd and njzjz March 3, 2026 09:39
@github-actions github-actions bot added the Python label Mar 3, 2026
@dosubot dosubot bot added the new feature label Mar 3, 2026
Code scanning / CodeQL — Note: unused import

    if TYPE_CHECKING:
        import ase.neighborlist

Import of 'ase' is not used.

coderabbitai bot commented Mar 3, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Adds a PyTorch-exportable inference backend and supporting utilities: a DeepEval evaluator, a NeighborStat operator, serialization/deserialization for .pte exports, torch-aware array helpers, and export-tracing changes. Also replaces direct slicing with xp_take_first_n in multiple modules and adds comprehensive unit tests for pt_expt inference and serialization.

Changes

  • Backend property implementations (deepmd/backend/pt_expt.py): implements previously NotImplemented properties to lazily return DeepEval, NeighborStat, serialize_from_file, and deserialize_to_file.
  • DeepEval evaluator & tests (deepmd/pt_expt/infer/deep_eval.py, source/tests/pt_expt/infer/test_deep_eval.py): adds a DeepEval class for loading/evaluating .pte exports, neighbor-list handling (native/ASE), metadata accessors, batching, and extensive unit tests validating API and inference consistency.
  • Neighbor statistics operator (deepmd/pt_expt/utils/neighbor_stat.py): adds a torch-enabled NeighborStatOP wrapper and a NeighborStat class exposing neighbor-statistics iteration and execution on DEVICE.
  • Serialization utilities (deepmd/pt_expt/utils/serialization.py): new module for model (de)serialization: numpy↔JSON conversion, sample-input generation, a dynamic-shape spec builder, metadata collection, and serialize_from_file / deserialize_to_file for .pte round-trip export/import.
  • Export tracing & output transform (deepmd/pt_expt/model/make_model.py, deepmd/pt_expt/model/ener_model.py, deepmd/pt_expt/model/transform_output.py): introduces forward_common_lower_exportable using torch.fx make_fx symbolic tracing; adapts the forward_lower export closure and replaces vmap-based gradient code with explicit per-component loops for better traceability.
  • Array API & call-site updates (deepmd/dpmodel/array_api.py, deepmd/dpmodel/atomic_model/base_atomic_model.py, deepmd/dpmodel/atomic_model/dp_atomic_model.py, deepmd/dpmodel/utils/env_mat.py): adds xp_take_first_n and a torch-specific fast path for xp_take_along_axis; replaces direct slicing with xp_take_first_n(..., 1, nloc) at call sites.
  • License headers (deepmd/pt_expt/infer/__init__.py, source/tests/pt_expt/infer/__init__.py): adds SPDX license identifier headers.

Sequence Diagram

sequenceDiagram
    participant User as User
    participant DeepEval as DeepEval
    participant ExportedModule as ExportedModule
    participant NeighborList as NeighborList
    participant DataProc as DataProcessor

    User->>DeepEval: __init__(model_file, output_def)
    DeepEval->>ExportedModule: load .pte (module + metadata)
    ExportedModule-->>DeepEval: traced module + metadata
    DeepEval->>DeepEval: reconstruct ModelOutputDef

    User->>DeepEval: eval(coords, cells, atom_types, ...)
    DeepEval->>DataProc: determine nframes & natoms
    DataProc-->>DeepEval: nframes, natoms
    DeepEval->>NeighborList: build neighbor list (native or ASE)
    NeighborList-->>DeepEval: nlist, mapping
    DeepEval->>DataProc: extend coords/types, prepare tensors
    DataProc-->>DeepEval: extended_coord, extended_atype, fparam, aparam
    DeepEval->>ExportedModule: forward_common_lower(extended_coord, extended_atype, nlist, mapping, fparam, aparam)
    ExportedModule-->>DeepEval: internal output dict
    DeepEval->>DataProc: map & reshape outputs per ModelOutputDef
    DataProc-->>DeepEval: formatted numpy results
    DeepEval-->>User: return results (energy, force, virial, ...)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

enhancement

Suggested reviewers

  • njzjz
  • iProzd
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 68.06%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: the title 'feat(pt_expt): implement .pte inference pipeline with dynamic shapes' accurately describes the main change, adding a complete .pte inference pipeline with dynamic-shape support, which aligns with the substantial additions to DeepEval, serialization utilities, and exportable tracing throughout the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
deepmd/pt_expt/model/transform_output.py (1)

31-40: Trim autograd graph retention in the component loop.

You only need to retain the graph for the first two gradient calls; dropping it on the last iteration reduces peak memory.

♻️ Proposed adjustment
     for i in range(3):
         grad_output = torch.zeros_like(sumce)
         grad_output[:, i] = 1.0
         result = torch.autograd.grad(
             [sumce],
             [extended_coord],
             grad_outputs=[grad_output],
             create_graph=False,
-            retain_graph=True,
+            retain_graph=i < 2,
         )[0]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/model/transform_output.py` around lines 31 - 40, The loop that
computes gradients for each component currently sets retain_graph=True for all
iterations, causing unnecessary memory retention; update the torch.autograd.grad
call inside the for i in range(3) loop to set retain_graph=True only for the
first two iterations (e.g., retain_graph=(i < 2)) while leaving
create_graph=False, so the autograd graph is freed on the last gradient
computation; refer to the variables sumce and extended_coord and the
torch.autograd.grad call to locate and change the retain_graph condition.
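
For reference, the retain_graph pattern under discussion can be run in isolation (a toy differentiable quantity stands in for sumce; all names here are illustrative, not the PR's code):

```python
import torch

extended_coord = torch.randn(4, 9, requires_grad=True)
# Toy stand-in for sumce: a (nframes, 3) quantity differentiable w.r.t. coords.
sumce = torch.stack(
    [(extended_coord ** (i + 1)).sum(dim=1) for i in range(3)], dim=1
)

grads = []
for i in range(3):
    grad_output = torch.zeros_like(sumce)
    grad_output[:, i] = 1.0
    (g,) = torch.autograd.grad(
        [sumce],
        [extended_coord],
        grad_outputs=[grad_output],
        create_graph=False,
        retain_graph=i < 2,  # keep the graph for the first two calls only
    )
    grads.append(g)
```

On the final iteration the autograd graph is freed, which is exactly the peak-memory saving the comment suggests.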
deepmd/backend/pt_expt.py (1)

94-98: Align NeighborStat typing import with the runtime import path.

Line 25 type-checks NeighborStat from deepmd.utils.neighbor_stat, while this property returns the class from deepmd.dpmodel.utils.neighbor_stat. Keeping both paths consistent will avoid typing drift.

♻️ Proposed typing alignment
 if TYPE_CHECKING:
@@
-    from deepmd.utils.neighbor_stat import (
+    from deepmd.dpmodel.utils.neighbor_stat import (
         NeighborStat,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/backend/pt_expt.py` around lines 94 - 98, The typing import for
NeighborStat is from deepmd.utils.neighbor_stat but at runtime pt_expt.py
returns the class from deepmd.dpmodel.utils.neighbor_stat; update the type
annotation import to the same module path (deepmd.dpmodel.utils.neighbor_stat)
so the typing and runtime import for NeighborStat match (adjust the import
statement that declares NeighborStat used in annotations to reference
deepmd.dpmodel.utils.neighbor_stat).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 107-114: The constructor for the class accepts neighbor_list but
never stores or uses it; fix by storing the passed neighbor_list (e.g.,
self._neighbor_list = neighbor_list) in __init__ and then update _eval_model
(and the other occurrence around lines 350-357) to use the stored neighbor list
when present instead of always calling build_neighbor_list(...); if
self._neighbor_list is None, fall back to the existing build_neighbor_list(...)
call so existing behavior remains unchanged.
- Around line 230-238: The code assumes fparam/aparam already have shapes
(nframes, dim_fparam) and (nframes, natoms, dim_aparam) but the docstring allows
broadcast forms (dim_fparam,), (natoms, dim_aparam) and (dim_aparam,), so before
converting/reshaping to tensors normalize inputs: detect these cases and expand
them to full shape using np.broadcast_to or repeat (for fparam: if fparam.shape
== (dim_fparam,) -> broadcast to (nframes, dim_fparam); if fparam.shape ==
(nframes, dim_fparam) keep; for aparam: if aparam.shape == (dim_aparam,) ->
broadcast to (nframes, natoms, dim_aparam); if aparam.shape == (natoms,
dim_aparam) -> broadcast to (nframes, natoms, dim_aparam); if already (nframes,
natoms, dim_aparam) keep). Apply the same normalization in both places mentioned
(around the fparam/aparam reshape logic at the blocks near lines 230-238 and
370-382) before any tensor conversion so downstream reshape operations succeed.
- Around line 111-114: The signature currently includes unused parameters *args
and **kwargs which trigger Ruff ARG002; to fix, rename them to *_args and
**_kwargs (or prefix the specific unused named param with an underscore) in the
function/method signature so they remain available for interface compatibility
without lint failures (e.g., change "*args: Any" -> "*_args: Any" and "**kwargs:
Any" -> "**_kwargs: Any"); apply the same change for the second occurrence
mentioned (around the other signature at line ~210) and run ruff check . and
ruff format . before committing.
- Line 413: The tensor conversion at model_predict[odef.name].detach().numpy()
is unsafe when tensors live on CUDA; update the code that builds the output to
move the detached tensor to CPU before calling numpy (i.e., use
detach().cpu().numpy() when converting and reshaping), referencing the
model_predict[odef.name] expression and the DEVICE usage that may place tensors
on CUDA.

In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 137-144: The function _build_dynamic_shapes currently declares
parameters ext_coord, ext_atype, nlist, and mapping but does not use them (Ruff
ARG001); rename those four parameters to start with an underscore (e.g.,
_ext_coord, _ext_atype, _nlist, _mapping) in the _build_dynamic_shapes signature
to mark them as intentionally unused (leave fparam and aparam as-is), update any
internal references if present, and run ruff check . and ruff format . before
committing to keep CI green.
- Around line 23-31: The serializer lambda in _numpy_to_json_serializable
currently only handles np.ndarray and will raise when encountering torch.Tensor;
update the converter used by json.dumps (the function named
_numpy_to_json_serializable) to detect torch.Tensor and convert it to a numpy
array (e.g., via tensor.detach().cpu().numpy()) before producing the same dict
structure (with "@class":"np.ndarray", "@is_variable":True, "dtype", "value") so
torch tensors serialize identically to numpy arrays; ensure you import torch
conditionally or guard if torch isn't installed to avoid import errors.

---

Nitpick comments:
In `@deepmd/backend/pt_expt.py`:
- Around line 94-98: The typing import for NeighborStat is from
deepmd.utils.neighbor_stat but at runtime pt_expt.py returns the class from
deepmd.dpmodel.utils.neighbor_stat; update the type annotation import to the
same module path (deepmd.dpmodel.utils.neighbor_stat) so the typing and runtime
import for NeighborStat match (adjust the import statement that declares
NeighborStat used in annotations to reference
deepmd.dpmodel.utils.neighbor_stat).

In `@deepmd/pt_expt/model/transform_output.py`:
- Around line 31-40: The loop that computes gradients for each component
currently sets retain_graph=True for all iterations, causing unnecessary memory
retention; update the torch.autograd.grad call inside the for i in range(3) loop
to set retain_graph=True only for the first two iterations (e.g.,
retain_graph=(i < 2)) while leaving create_graph=False, so the autograd graph is
freed on the last gradient computation; refer to the variables sumce and
extended_coord and the torch.autograd.grad call to locate and change the
retain_graph condition.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a3db25a and a31fd18.

📒 Files selected for processing (13)
  • deepmd/backend/pt_expt.py
  • deepmd/dpmodel/array_api.py
  • deepmd/dpmodel/atomic_model/base_atomic_model.py
  • deepmd/dpmodel/atomic_model/dp_atomic_model.py
  • deepmd/dpmodel/utils/env_mat.py
  • deepmd/pt_expt/infer/__init__.py
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/model/ener_model.py
  • deepmd/pt_expt/model/make_model.py
  • deepmd/pt_expt/model/transform_output.py
  • deepmd/pt_expt/utils/serialization.py
  • source/tests/pt_expt/infer/__init__.py
  • source/tests/pt_expt/infer/test_deep_eval.py


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a31fd18ca7



codecov bot commented Mar 3, 2026

Codecov Report

❌ Patch coverage is 90.90909% with 34 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.30%. Comparing base (fdeff2b) to head (143d5c4).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
deepmd/pt_expt/infer/deep_eval.py 89.38% 26 Missing ⚠️
deepmd/backend/pt_expt.py 25.00% 6 Missing ⚠️
deepmd/pt_expt/utils/serialization.py 97.26% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5284      +/-   ##
==========================================
+ Coverage   82.25%   82.30%   +0.04%     
==========================================
  Files         765      767       +2     
  Lines       76631    76984     +353     
  Branches     3659     3659              
==========================================
+ Hits        63034    63360     +326     
- Misses      12425    12453      +28     
+ Partials     1172     1171       -1     

☔ View full report in Codecov by Sentry.

@njzjz njzjz linked an issue Mar 3, 2026 that may be closed by this pull request

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (3)
deepmd/pt_expt/infer/deep_eval.py (3)

623-623: ⚠️ Potential issue | 🔴 Critical

Move tensor to CPU before calling .numpy().

If DEVICE is CUDA, model_predict[odef.name].detach().numpy() raises at runtime.

Suggested fix
-                    out = model_predict[odef.name].detach().numpy().reshape(shape)
+                    out = (
+                        model_predict[odef.name]
+                        .detach()
+                        .cpu()
+                        .numpy()
+                        .reshape(shape)
+                    )
#!/bin/bash
# Verify unsafe tensor->numpy calls and CUDA-capable device usage in this file.
rg -n '\.detach\(\)\.numpy\(' deepmd/pt_expt/infer/deep_eval.py
rg -n 'DEVICE|device=' deepmd/pt_expt/infer/deep_eval.py
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` at line 623, The call that converts model
output to a NumPy array uses model_predict[odef.name].detach().numpy() which
will fail on CUDA tensors; change the sequence to move the tensor to CPU before
converting (e.g., detach() then cpu()/to('cpu') then numpy()) for the expression
around model_predict[odef.name] in deep_eval.py so that out is created from a
CPU tensor before reshape.

580-592: ⚠️ Potential issue | 🟠 Major

Implement documented fparam/aparam broadcast forms before tensor conversion.

The docstring allows scalar/per-frame/per-atom forms, but current code only supports already-expanded shapes and can fail on valid inputs (for example fparam.shape == (dim_fparam,), aparam.shape == (natoms, dim_aparam) with multiple frames).

Suggested fix
         if fparam is not None:
+            fparam = np.asarray(fparam)
+            dim_fparam = self.get_dim_fparam()
+            if fparam.ndim == 1:
+                fparam = np.broadcast_to(fparam.reshape(1, dim_fparam), (nframes, dim_fparam))
+            elif fparam.ndim == 2 and fparam.shape[0] == 1 and nframes > 1:
+                fparam = np.broadcast_to(fparam, (nframes, dim_fparam))
             fparam_t = torch.tensor(
                 fparam.reshape(nframes, self.get_dim_fparam()),
                 dtype=torch.float64,
                 device=DEVICE,
             )
         else:
             fparam_t = None

         if aparam is not None:
+            aparam = np.asarray(aparam)
+            dim_aparam = self.get_dim_aparam()
+            if aparam.ndim == 1:
+                aparam = np.broadcast_to(
+                    aparam.reshape(1, 1, dim_aparam), (nframes, natoms, dim_aparam)
+                )
+            elif aparam.ndim == 2:
+                aparam = np.broadcast_to(
+                    aparam.reshape(1, natoms, dim_aparam), (nframes, natoms, dim_aparam)
+                )
             aparam_t = torch.tensor(
                 aparam.reshape(nframes, natoms, self.get_dim_aparam()),
                 dtype=torch.float64,
                 device=DEVICE,
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 580 - 592, The code that
converts fparam/aparam to tensors should first accept and expand the documented
broadcast forms (scalar, per-frame, per-atom, and their combinations) before
calling torch.tensor; update the block around fparam/aparam handling (where
fparam_t and aparam_t are created and where get_dim_fparam()/get_dim_aparam(),
nframes and natoms are available) to detect shapes like () or (dim_fparam,),
(nframes, dim_fparam) for fparam and () or (dim_aparam,), (natoms, dim_aparam),
(nframes, natoms, dim_aparam) for aparam, and broadcast/reshape them (using
numpy.reshape/np.broadcast_to or equivalent) into the full (nframes, dim_fparam)
and (nframes, natoms, dim_aparam) shapes before converting to torch.tensor so
all documented input forms work.
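
The normalization being asked for amounts to plain numpy broadcasting; a minimal sketch (function names and shape conventions are taken from the comment above, not from the merged code):

```python
import numpy as np


def normalize_fparam(fparam, nframes: int, dim_fparam: int) -> np.ndarray:
    """Expand (dim_fparam,) or (1, dim_fparam) to the full (nframes, dim_fparam)."""
    fparam = np.asarray(fparam)
    if fparam.ndim == 1:
        fparam = fparam.reshape(1, dim_fparam)
    return np.broadcast_to(fparam, (nframes, dim_fparam))


def normalize_aparam(aparam, nframes: int, natoms: int, dim_aparam: int) -> np.ndarray:
    """Expand (dim_aparam,) or (natoms, dim_aparam) to (nframes, natoms, dim_aparam)."""
    aparam = np.asarray(aparam)
    if aparam.ndim == 1:
        aparam = aparam.reshape(1, 1, dim_aparam)
    elif aparam.ndim == 2:
        aparam = aparam.reshape(1, natoms, dim_aparam)
    return np.broadcast_to(aparam, (nframes, natoms, dim_aparam))
```

Running this normalization before any torch.tensor conversion makes the downstream reshape calls safe for every documented input form.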

112-115: ⚠️ Potential issue | 🟠 Major

Resolve Ruff ARG002 by prefixing intentionally unused interface parameters.

*args, **kwargs (constructor) and **kwargs (eval) are currently unused and flagged.

Suggested fix
-        *args: Any,
+        *_args: Any,
         auto_batch_size: bool | int | AutoBatchSize = True,
         neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
-        **kwargs: Any,
+        **_kwargs: Any,
@@
-        **kwargs: Any,
+        **_kwargs: Any,

As per coding guidelines: **/*.py: Always run ruff check . and ruff format . before committing changes or CI will fail.

Also applies to: 212-212

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 112 - 115, Rename
intentionally unused variadic parameters by prefixing them with an underscore to
satisfy Ruff ARG002: change *args to *_args and **kwargs to **_kwargs in the
class __init__ signature (the constructor that currently has "*args: Any,
auto_batch_size: ... , **kwargs: Any") and do the same for the eval method's
**kwargs parameter referenced elsewhere (the eval method flagged at the other
occurrence). This preserves the interface while marking them as intentionally
unused so Ruff no longer reports ARG002.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt_expt/infer/test_deep_eval.py`:
- Line 388: The unpacked unused variables ext_atype_ase, mapping_ase (and
mapping_nat at the other occurrence) should be prefixed with an underscore to
satisfy Ruff RUF059; update the unpacking statements (e.g., the tuple assignment
that defines ext_coord_ase, ext_atype_ase, nlist_ase, mapping_ase) to use
_ext_atype_ase and _mapping_ase, and similarly rename mapping_nat to
_mapping_nat where it is unpacked, and then run ruff check/format to ensure no
other references remain to those names.

---

Duplicate comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Line 623: The call that converts model output to a NumPy array uses
model_predict[odef.name].detach().numpy() which will fail on CUDA tensors;
change the sequence to move the tensor to CPU before converting (e.g., detach()
then cpu()/to('cpu') then numpy()) for the expression around
model_predict[odef.name] in deep_eval.py so that out is created from a CPU
tensor before reshape.
- Around line 580-592: The code that converts fparam/aparam to tensors should
first accept and expand the documented broadcast forms (scalar, per-frame,
per-atom, and their combinations) before calling torch.tensor; update the block
around fparam/aparam handling (where fparam_t and aparam_t are created and where
get_dim_fparam()/get_dim_aparam(), nframes and natoms are available) to detect
shapes like () or (dim_fparam,), (nframes, dim_fparam) for fparam and () or
(dim_aparam,), (natoms, dim_aparam), (nframes, natoms, dim_aparam) for aparam,
and broadcast/reshape them (using numpy.reshape/np.broadcast_to or equivalent)
into the full (nframes, dim_fparam) and (nframes, natoms, dim_aparam) shapes
before converting to torch.tensor so all documented input forms work.
- Around line 112-115: Rename intentionally unused variadic parameters by
prefixing them with an underscore to satisfy Ruff ARG002: change *args to *_args
and **kwargs to **_kwargs in the class __init__ signature (the constructor that
currently has "*args: Any, auto_batch_size: ... , **kwargs: Any") and do the
same for the eval method's **kwargs parameter referenced elsewhere (the eval
method flagged at the other occurrence). This preserves the interface while
marking them as intentionally unused so Ruff no longer reports ARG002.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 361ad727-4d3c-42cf-80da-7f316c7dba14

📥 Commits

Reviewing files that changed from the base of the PR and between a31fd18 and ebf90e3.

📒 Files selected for processing (2)
  • deepmd/pt_expt/infer/deep_eval.py
  • source/tests/pt_expt/infer/test_deep_eval.py


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
deepmd/pt_expt/infer/deep_eval.py (1)

108-116: ⚠️ Potential issue | 🟡 Minor

Prefix intentionally-unused interface parameters to satisfy Ruff ARG002.

*args / **kwargs here are kept for interface compatibility but unused. Rename to underscored forms to keep lint/CI green.

🔧 Suggested diff
     def __init__(
         self,
         model_file: str,
         output_def: ModelOutputDef,
-        *args: Any,
+        *_args: Any,
         auto_batch_size: bool | int | AutoBatchSize = True,
         neighbor_list: Optional["ase.neighborlist.NewPrimitiveNeighborList"] = None,
-        **kwargs: Any,
+        **_kwargs: Any,
     ) -> None:
@@
     def eval(
@@
-        **kwargs: Any,
+        **_kwargs: Any,
     ) -> dict[str, np.ndarray]:

As per coding guidelines: **/*.py: Always run ruff check . and ruff format . before committing changes or CI will fail.

Also applies to: 204-213

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 108 - 116, The __init__
signature in deep_eval.py uses unused variadic parameters *args and **kwargs
which trigger Ruff ARG002; rename them to underscored forms (e.g., _args and
_kwargs) in the DeepEval __init__ (and any other constructors or functions in
this file with intentionally-unused variadic params, including the later
signature around the same class) to signal they are intentionally unused and
keep lint/CI green, then run ruff check/format before committing.
deepmd/pt_expt/utils/serialization.py (1)

19-33: ⚠️ Potential issue | 🟠 Major

Handle torch.Tensor in JSON conversion path to avoid TypeError at save time.

_numpy_to_json_serializable currently converts only np.ndarray. If data contains torch.Tensor values, json.dumps(...) in the extra-files path will fail.

💡 Minimal fix
 def _numpy_to_json_serializable(model_obj: dict) -> dict:
     """Convert numpy arrays in a model dict to JSON-serializable lists."""
+    def _to_jsonable(x: object) -> object:
+        if isinstance(x, torch.Tensor):
+            x = x.detach().cpu().numpy()
+        if isinstance(x, np.ndarray):
+            return {
+                "@class": "np.ndarray",
+                "@is_variable": True,
+                "dtype": x.dtype.name,
+                "value": x.tolist(),
+            }
+        return x
+
     return traverse_model_dict(
         model_obj,
-        lambda x: (
-            {
-                "@class": "np.ndarray",
-                "@is_variable": True,
-                "dtype": x.dtype.name,
-                "value": x.tolist(),
-            }
-            if isinstance(x, np.ndarray)
-            else x
-        ),
+        _to_jsonable,
     )

Based on learnings: In the deepmd-kit PyTorch backend, modifier serialization uses pickle.dumps/pickle.loads because serialized dictionaries can contain torch.Tensor objects that are not JSON-serializable.

Also applies to: 285-290

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/utils/serialization.py` around lines 19 - 33, The JSON
conversion currently only handles np.ndarray in _numpy_to_json_serializable
called via traverse_model_dict; extend it to also detect torch.Tensor (use
isinstance(x, torch.Tensor)), convert tensors to CPU numpy with
x.detach().cpu().numpy(), and then serialize exactly like np.ndarray (include
"@class", "@is_variable", "dtype" from the numpy array and "value" as a list) to
keep the JSON path valid; ensure torch is imported and mirror the same change
for the other JSON-conversion helper in this file that performs the same
array-to-serializable transformation.
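
In isolation, the conversion rule being discussed looks like this (a self-contained sketch; the real helper threads the same transformation through traverse_model_dict):

```python
import json

import numpy as np


def to_jsonable(x: object) -> object:
    """Convert an ndarray to the tagged JSON dict form; pass everything else through."""
    if isinstance(x, np.ndarray):
        return {
            "@class": "np.ndarray",
            "@is_variable": True,
            "dtype": x.dtype.name,
            "value": x.tolist(),
        }
    return x


blob = json.dumps(to_jsonable(np.eye(2)))
restored = json.loads(blob)
```

The suggested fix extends the isinstance check with torch.Tensor, first lowering the tensor to numpy via detach().cpu().numpy() so it serializes through the same path.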
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 108-116: The __init__ signature in deep_eval.py uses unused
variadic parameters *args and **kwargs which trigger Ruff ARG002; rename them to
underscored forms (e.g., _args and _kwargs) in the DeepEval __init__ (and any
other constructors or functions in this file with intentionally-unused variadic
params, including the later signature around the same class) to signal they are
intentionally unused and keep lint/CI green, then run ruff check/format before
committing.

In `@deepmd/pt_expt/utils/serialization.py`:

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6de0d57e-40fe-4df7-aec7-e82b0d135e17

📥 Commits

Reviewing files that changed from the base of the PR and between ebf90e3 and 8fd1153.

📒 Files selected for processing (2)
  • deepmd/pt_expt/infer/deep_eval.py
  • deepmd/pt_expt/utils/serialization.py

@coderabbitai coderabbitai bot left a comment (Contributor)
Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@deepmd/pt_expt/utils/neighbor_stat.py`:
- Around line 82-84: The NumPy-to-torch conversions in neighbor_stat.py
(torch.from_numpy(coord), torch.from_numpy(atype), torch.from_numpy(cell)) must
set explicit dtypes to avoid accidental fp64 on GPU; convert coord and cell to
the project's float dtype by passing GLOBAL_PT_FLOAT_PRECISION (or using
.to(DEVICE, dtype=GLOBAL_PT_FLOAT_PRECISION)) and convert atype to torch.long
(dtype=torch.long) before moving to DEVICE, ensuring all three use the DEVICE
constant and follow project conventions.
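The conversion pattern this prompt describes could be sketched as follows. `GLOBAL_PT_FLOAT_PRECISION` and `DEVICE` stand in for the project constants, with assumed placeholder values:

```python
import numpy as np
import torch

GLOBAL_PT_FLOAT_PRECISION = torch.float64  # assumed project default
DEVICE = torch.device("cpu")  # assumed; normally taken from project config

# Example inputs as they would arrive from the numpy-based data pipeline.
coord = np.zeros((1, 3, 3), dtype=np.float32)
atype = np.zeros((1, 3), dtype=np.int32)
cell = np.eye(3, dtype=np.float32)[None]

# Explicit dtypes: floats follow the project precision, types become torch.long,
# and all three land on the same device.
coord_t = torch.from_numpy(coord).to(DEVICE, dtype=GLOBAL_PT_FLOAT_PRECISION)
atype_t = torch.from_numpy(atype).to(DEVICE, dtype=torch.long)
cell_t = torch.from_numpy(cell).to(DEVICE, dtype=GLOBAL_PT_FLOAT_PRECISION)
```

Without the explicit `dtype`, `torch.from_numpy` inherits whatever dtype the numpy array happens to have, which is how accidental fp32/fp64 mismatches sneak onto the GPU.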

ℹ️ Review info

Run ID: 8b0b8d09-e96d-4a83-a57a-f5e2f9c647bc

📥 Commits

Reviewing files that changed from the base of the PR and between 8fd1153 and e065c22.

📒 Files selected for processing (2)
  • deepmd/backend/pt_expt.py
  • deepmd/pt_expt/utils/neighbor_stat.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/backend/pt_expt.py

@coderabbitai coderabbitai bot left a comment (Contributor)
🧹 Nitpick comments (2)
deepmd/pt_expt/infer/deep_eval.py (2)

623-626: Unnecessary np.abs(shape) in fallback path.

Shape values should always be positive. Using np.abs(shape) could mask bugs if negative dimensions somehow appear. Consider removing the np.abs() call or adding an assertion to catch invalid shapes explicitly.

🔧 Suggested fix
             else:
                 shape = self._get_output_shape(odef, nframes, natoms)
-                results.append(
-                    np.full(np.abs(shape), np.nan, dtype=GLOBAL_NP_FLOAT_PRECISION)
-                )
+                results.append(
+                    np.full(shape, np.nan, dtype=GLOBAL_NP_FLOAT_PRECISION)
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 623 - 626, The fallback fills
results with arrays using np.abs(shape) which can mask invalid negative
dimensions; change this to use shape directly and add an explicit validation:
after calling self._get_output_shape(odef, nframes, natoms) validate that all
dimensions are non-negative (e.g., assert or raise a ValueError) and then call
np.full with the validated shape (reference symbols: self._get_output_shape,
shape, results, GLOBAL_NP_FLOAT_PRECISION). Ensure the assertion/error message
clearly indicates which output shape and odef caused the invalid value.
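A minimal sketch of the validated fallback this prompt suggests; the helper name is illustrative, and in the real code `shape` would come from `self._get_output_shape(odef, nframes, natoms)`:

```python
import numpy as np

GLOBAL_NP_FLOAT_PRECISION = np.float64  # assumed project default


def fill_nan_output(shape, name="output"):
    """Return a NaN-filled array, failing loudly on invalid dimensions."""
    # Validate instead of masking negative dims with np.abs.
    if any(d < 0 for d in shape):
        raise ValueError(f"invalid output shape for {name}: {shape}")
    return np.full(shape, np.nan, dtype=GLOBAL_NP_FLOAT_PRECISION)
```

The error message names the offending output, so a bad shape surfaces at the call site rather than propagating as a silently mis-shaped NaN array.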

343-356: Consider using torch-based neighbor list construction.

The current design builds neighbor lists using numpy-based dpmodel utilities, then converts to torch tensors. This works correctly, but as previously noted, using PyTorch-native neighbor list construction could:

  1. Avoid the numpy→torch conversion overhead
  2. Keep the entire inference pipeline on the same device (GPU)

If performance is a concern, consider if there's a torch-based equivalent for extend_coord_with_ghosts and build_neighbor_list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@deepmd/pt_expt/infer/deep_eval.py` around lines 343 - 356, The current code
calls extend_coord_with_ghosts and build_neighbor_list (which are numpy-based)
and then converts outputs to torch, causing overhead and CPU/GPU device
transfers; replace these calls with PyTorch-native implementations or utilities
so neighbor list construction runs on torch tensors on the target device: locate
uses of extend_coord_with_ghosts, build_neighbor_list, coord_normalized,
atom_types, cells, rcut and implement or call a torch-based
extend_coord_with_ghosts + build_neighbor_list that accepts torch tensors
(keeping distinguish_types semantics), returns torch tensors for
extended_coord/extended_atype/mapping and nlist with identical shapes and
indexing, and ensure subsequent code consumes those tensors without any numpy
conversion so all operations remain on GPU.
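For illustration only, a torch-native neighbor list for a small non-periodic system might look like the sketch below. This is a simplified assumption-laden example, not the deepmd-kit utility, which additionally handles ghost atoms, cells, and type-distinguished sorting:

```python
import torch


def simple_neighbor_list(coord: torch.Tensor, rcut: float, nnei: int) -> torch.Tensor:
    """coord: (nframes, nloc, 3). Returns (nframes, nloc, nnei) neighbor indices,
    padded with -1 where fewer than nnei neighbors fall inside rcut."""
    nframes, nloc, _ = coord.shape
    # All-pairs displacement and distance, computed entirely on coord's device.
    diff = coord.unsqueeze(2) - coord.unsqueeze(1)  # (nf, nloc, nloc, 3)
    dist = diff.norm(dim=-1)
    eye = torch.eye(nloc, dtype=torch.bool, device=coord.device)
    dist = dist.masked_fill(eye, float("inf"))  # exclude self-pairs
    # Keep the nnei closest candidates; out-of-range entries become -1.
    sorted_dist, sorted_idx = dist.sort(dim=-1)
    k = min(nnei, nloc)
    nlist = torch.full(
        (nframes, nloc, nnei), -1, dtype=torch.long, device=coord.device
    )
    within = sorted_dist[..., :k] <= rcut
    nlist[..., :k] = sorted_idx[..., :k].masked_fill(~within, -1)
    return nlist
```

Because every step is a torch op, the whole construction stays on the input tensor's device, which is exactly the overhead the comment above is concerned with.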
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 623-626: The fallback fills results with arrays using
np.abs(shape) which can mask invalid negative dimensions; change this to use
shape directly and add an explicit validation: after calling
self._get_output_shape(odef, nframes, natoms) validate that all dimensions are
non-negative (e.g., assert or raise a ValueError) and then call np.full with the
validated shape (reference symbols: self._get_output_shape, shape, results,
GLOBAL_NP_FLOAT_PRECISION). Ensure the assertion/error message clearly indicates
which output shape and odef caused the invalid value.
- Around line 343-356: The current code calls extend_coord_with_ghosts and
build_neighbor_list (which are numpy-based) and then converts outputs to torch,
causing overhead and CPU/GPU device transfers; replace these calls with
PyTorch-native implementations or utilities so neighbor list construction runs
on torch tensors on the target device: locate uses of extend_coord_with_ghosts,
build_neighbor_list, coord_normalized, atom_types, cells, rcut and implement or
call a torch-based extend_coord_with_ghosts + build_neighbor_list that accepts
torch tensors (keeping distinguish_types semantics), returns torch tensors for
extended_coord/extended_atype/mapping and nlist with identical shapes and
indexing, and ensure subsequent code consumes those tensors without any numpy
conversion so all operations remain on GPU.

ℹ️ Review info

Run ID: d7d1cb15-f653-48e6-957d-ef3a32be35a6

📥 Commits

Reviewing files that changed from the base of the PR and between e065c22 and 8da4064.

📒 Files selected for processing (1)
  • deepmd/pt_expt/infer/deep_eval.py

@wanghan-iapcm wanghan-iapcm requested a review from njzjz March 4, 2026 10:15
@njzjz njzjz linked an issue Mar 4, 2026 that may be closed by this pull request
@iProzd iProzd left a comment (Collaborator)
LGTM

@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Mar 4, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 4, 2026
@wanghan-iapcm wanghan-iapcm enabled auto-merge March 5, 2026 05:11
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Mar 5, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 5, 2026
# Conflicts:
#	deepmd/pt_expt/utils/neighbor_stat.py
#	source/tests/pt_expt/conftest.py
@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Mar 5, 2026
Merged via the queue into deepmodeling:master with commit 4a29836 Mar 5, 2026
70 checks passed
@wanghan-iapcm wanghan-iapcm deleted the feat-pt-expt-infer branch March 5, 2026 12:39

Development

Successfully merging this pull request may close these issues:
  • Model Serialization and Deserialization (PyTorch Exportable)
  • Python Inference Support (PyTorch Exportable + AOTI)