feat(pt_expt): add dp freeze support for pt_expt backend (.pte/.pt2) #5299

wanghan-iapcm wants to merge 4 commits into deepmodeling:master
- add support for serialization to .pt2
- deep eval with .pt2
- C/C++ inference API with .pt2
- add UT for Python inference
- add UT for C++/C inference: se_e2_a, fparam/aparam, multi-frames, dpa1
Replace flat (nf*nall,) indexing in dpa1.py and exclude_mask.py with xp_take_along_axis and xp_take_first_n. The flat indexing creates Ne(nall, nloc) assertions during torch.export tracing that fail when nall == nloc (NoPbc). The new gather-based approach avoids these shape constraints.

Unify PT/pt_expt test models so .pth and .pt2 are generated from the same dpmodel weights (type_one_side=True) via gen scripts. Remove deeppot_sea.pth, deeppot_dpa.pth, and fparam_aparam.pth from git tracking; they are now regenerated in CI via convert-models.sh.

Changes:
- dpa1.py, exclude_mask.py: xp_take_along_axis replaces flat indexing
- gen_dpa1.py: generates deeppot_dpa1.pth + .pt2 from dpmodel
- gen_fparam_aparam.py: generates fparam_aparam.pth + .pt2 from dpmodel
- convert-models.sh: regenerate .pth from yaml and run gen scripts in CI
- test_deeppot_dpa_ptexpt.cc: add NoPbc test fixture
- test_deeppot_dpa1_pt.cc: new DPA1 .pth test (PBC + NoPbc)
- test_deeppot_a_fparam_aparam_pt.cc: update reference values
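The gather-based replacement can be illustrated with a small NumPy sketch (toy shapes; `xp_take_along_axis` in the codebase wraps the array-API analogue of `np.take_along_axis`, and the exact call sites differ):

```python
import numpy as np

nf, nloc, nnei, nall = 2, 4, 3, 4  # NoPbc case: nall == nloc

rng = np.random.default_rng(0)
atype_ext = rng.integers(0, 3, size=(nf, nall))       # per-atom types
nlist = rng.integers(0, nall, size=(nf, nloc, nnei))  # neighbor indices

# Flat indexing: collapse (nf, nall) and offset each frame by nall.
# Tracing this under torch.export entangles nall with nloc via shape
# assertions, which is what fails when nall == nloc.
flat = atype_ext.reshape(-1)
offset = (np.arange(nf) * nall)[:, None, None]
via_flat = flat[(nlist + offset).reshape(-1)].reshape(nf, nloc, nnei)

# Gather: index along the atom axis per frame; no flattening, so no
# cross-dimension shape constraint for the exporter to assert on.
via_gather = np.take_along_axis(
    atype_ext[:, None, :],               # (nf, 1, nall)
    nlist.reshape(nf, 1, -1),            # (nf, 1, nloc * nnei)
    axis=2,
).reshape(nf, nloc, nnei)

assert (via_flat == via_gather).all()
```

Both paths compute the same neighbor-type gather; only the gather form stays shape-polymorphic under tracing.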
…ckends

Fix a latent bug in all 5 C++ backends (DeepPotPT, DeepPotPTExpt, DeepPotJAX, DeepSpinPT, DeepPotPD) where the ghost-to-local mapping was incorrectly indexed through fwd_map when virtual atoms are present. The old code `mapping[ii] = lmp_list.mapping[fwd_map[ii]]` treats the post-filter index as an original index, causing undefined behavior when fwd_map contains -1 entries for filtered atoms. The fix uses bkw_map to correctly translate: new_idx -> orig_idx -> mapped_orig -> new_mapped.

Also fix a use-after-free in DeepPotPTExpt.cc where torch::from_blob referenced a local vector's data after the vector went out of scope.

Additionally, fix dpmodel code (dpa2.py, repformers.py, make_model.py, nlist.py) to use xp_take_first_n and xp_expand_dims instead of slice indexing that creates shape constraints incompatible with torch.export.

Add DPA2 C++ test files for both .pth and .pt2 backends with PBC, NoPbc, and type_sel test cases. Reduce DPA2 test model size for machine-precision agreement between the jit and export paths.
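The index-translation chain (new_idx -> orig_idx -> mapped_orig -> new_mapped) can be sketched in Python with made-up toy data; the names fwd_map/bkw_map follow the commit message, not any real API:

```python
# Toy setup: 5 original atoms; atom 2 is a virtual atom filtered out.
# fwd_map: original index -> post-filter index (-1 if filtered out).
# bkw_map: post-filter index -> original index.
fwd_map = [0, 1, -1, 2, 3]
bkw_map = [0, 1, 3, 4]
# Ghost-to-local map in *original* indexing: atoms 3 and 4 are ghosts
# of local atoms 0 and 1; the virtual atom 2 maps to itself.
orig_mapping = [0, 1, 2, 0, 1]

n_real = len(bkw_map)

# Buggy: treats the post-filter index ii as an original index.  For
# ii == 2, fwd_map[2] is -1; Python wraps to the last element, while
# the C++ code hits undefined behavior.
buggy = [orig_mapping[fwd_map[ii]] for ii in range(n_real)]

# Fixed chain: new_idx -> orig_idx -> mapped_orig -> new_mapped.
fixed = [fwd_map[orig_mapping[bkw_map[ii]]] for ii in range(n_real)]

assert fixed == [0, 1, 0, 1]
assert buggy != fixed
```

The key point is that fwd_map and bkw_map translate in opposite directions, so the original mapping must be read in original indexing and its result translated back through fwd_map.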
Add a freeze() function that loads a training checkpoint, reconstructs the model, serializes it, and exports it via the existing deserialize_to_file pipeline. Wire the freeze command into the main() CLI dispatcher with checkpoint-dir resolution and automatic .pt2 suffix defaulting.
merge after #5298
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6145fe8d69
```python
    elif FLAGS.command == "freeze":
        if Path(FLAGS.checkpoint_folder).is_dir():
            checkpoint_path = Path(FLAGS.checkpoint_folder)
            latest_ckpt_file = (checkpoint_path / "checkpoint").read_text()
```
Handle pt_expt checkpoint dirs without a checkpoint file
Reading checkpoint_path / "checkpoint" unconditionally breaks dp --pt_expt freeze -c <dir> for directories produced by the pt_expt trainer, because deepmd/pt_expt/train/training.py writes model.ckpt-<step>.pt plus a model.ckpt.pt symlink and does not create a checkpoint pointer file; in that common flow (including the default -c .), freeze raises FileNotFoundError instead of resolving the latest checkpoint.
📝 Walkthrough

This PR introduces PyTorch-exportable (.pt2) model format support alongside the existing .pte format, using AOTInductor for compiled model execution. It refactors descriptor array operations using new utility functions (xp_take_first_n, xp_take_along_axis), adds a new PyTorchExportable backend enum, implements full C++ and C APIs for inference, provides model serialization/deserialization with ZIP-based metadata, adds a CLI freeze command for model export, and includes comprehensive test coverage across Python, C++, and C integration layers.
Sequence Diagram(s)

sequenceDiagram
actor User
participant CLI as Freeze Command
participant Serializer as Serialization
participant Tracer as PyTorch Tracer
participant Compiler as AOTInductor
participant Storage as File Storage
User->>CLI: freeze(checkpoint_path, output.pt2)
CLI->>CLI: Load checkpoint state_dict
CLI->>CLI: Reconstruct model from params
CLI->>Serializer: Collect metadata and serialize
Serializer->>Tracer: Create sample inputs
Tracer->>Tracer: Trace model forward pass
Tracer->>Compiler: Export with dynamic shapes
Compiler->>Compiler: Compile to AOTInductor format
Compiler->>Storage: Package model + metadata + extra/
Storage-->>User: Output .pt2 file (ZIP archive)
sequenceDiagram
actor User
participant DeepPot as DeepPot Loader
participant Format as Format Detector
participant PTExpt as DeepPotPTExpt
participant Loader as AOTILoader
participant Model as Compiled Model
participant Extractor as Output Extractor
User->>DeepPot: Load model file (.pt2)
DeepPot->>Format: Detect file type
Format-->>DeepPot: PyTorchExportable (.pt2)
DeepPot->>PTExpt: Create DeepPotPTExpt instance
PTExpt->>PTExpt: Read metadata from ZIP
PTExpt->>Loader: Initialize AOTIModelPackageLoader
Loader-->>PTExpt: Package loaded
User->>PTExpt: compute(coord, atype, box, ...)
PTExpt->>PTExpt: Build tensors and neighbor lists
PTExpt->>Model: run_model(inputs)
Model-->>PTExpt: Raw flat outputs as tensors
PTExpt->>Extractor: extract_outputs(flat_outputs)
Extractor-->>PTExpt: Keyed output map
PTExpt-->>User: energy, force, virial, ...
sequenceDiagram
participant Descriptor as DPA1/DPA2 Descriptor
participant ArrayAPI as Array API
participant Embedding as Embedding Layer
participant Gathering as Gather Operations
Descriptor->>ArrayAPI: xp_take_first_n(atype_embd_ext, 1, nloc)
ArrayAPI->>Gathering: First-n selection along axis
Gathering-->>Descriptor: atype_embd (local atoms only)
Descriptor->>ArrayAPI: xp_take_along_axis(atype_embd_nlist, indices_2d)
ArrayAPI->>Gathering: 2D along-axis gather
Gathering-->>Descriptor: atype_embd_nlist (neighbor embeddings)
Descriptor->>Descriptor: Reshape for concatenation
Descriptor->>Embedding: Concat embeddings for attention
Embedding-->>Descriptor: Combined features
Actionable comments posted: 16
🧹 Nitpick comments (7)
source/tests/infer/gen_dpa1.py (1)
143-164: Prefix unused unpacked variables with underscore for ruff compliance.

Static analysis flags several unused variables from the dp.eval() return tuples. Per coding guidelines, prefix them with _ to pass ruff check.

♻️ Proposed fix

```diff
-e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
 print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
 print_cpp_values("PBC reference values", ae1, f1, av1)

 # ---- 5. Run inference for NoPbc test ----
-e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
 print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
 print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

 # ---- 6. Verify .pth gives same results ----
 dp_pth = DeepPot(pth_path)
-e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(coord, box, atype, atomic=True)
+e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(coord, box, atype, atomic=True)
 print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
 print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
 print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201
-e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
     coord, None, atype, atomic=True
 )
```

As per coding guidelines: **/*.py: Always run `ruff check .` and `ruff format .` before committing changes or CI will fail.
source/tests/pt_expt/infer/test_deep_eval.py (1)
490-791: Mirror the dynamic-shape regression for .pt2 as well.

This new suite validates fixed-size .pt2 exports, but the refactor in this PR is specifically about avoiding torch.export shape constraints. Porting the existing dynamic-shape check here would cover the main failure mode this change is targeting.

source/api_cc/src/DeepPotPTExpt.cc (5)
805-808: Redundant resize() before assign() on datom_energy.

Line 805 resizes datom_energy to nall_real elements, then lines 806-808 immediately reassign it using assign(), which discards the previous resize. The resize() call is unnecessary.

♻️ Proposed fix

```diff
-  datom_energy.resize(nall_real, 0.0);
   datom_energy.assign(
       cpu_atom_energy_.data_ptr<VALUETYPE>(),
       cpu_atom_energy_.data_ptr<VALUETYPE>() + cpu_atom_energy_.numel());
```
526-535: Redundant catch (...) { throw; } block.

The catch block on lines 532-534 catches all exceptions and immediately rethrows them without any processing. This is redundant and can be removed.

♻️ Proposed fix

```diff
 DeepPotPTExpt::DeepPotPTExpt(const std::string& model,
                              const int& gpu_rank,
                              const std::string& file_content)
     : inited(false) {
-  try {
-    translate_error([&] { init(model, gpu_rank, file_content); });
-  } catch (...) {
-    throw;
-  }
+  translate_error([&] { init(model, gpu_rank, file_content); });
 }
```
1124-1129: Minor: get_type_map appends a trailing space.

The function appends a space after each type name, resulting in a trailing space in the output string. This may be intentional for compatibility but could cause issues if the caller expects clean output.

♻️ Proposed fix if the trailing space is unintended

```diff
 void DeepPotPTExpt::get_type_map(std::string& type_map_str) {
-  for (const auto& t : type_map) {
-    type_map_str += t;
-    type_map_str += " ";
+  for (size_t i = 0; i < type_map.size(); ++i) {
+    if (i > 0) {
+      type_map_str += " ";
+    }
+    type_map_str += type_map[i];
   }
 }
```
158-174: parse_bool() and parse_null() lack validation of the actual content.

parse_bool() only checks whether the input starts with "true" and assumes anything else is "false" without verifying the actual characters. Similarly, parse_null() blindly advances 4 positions without confirming the input is actually "null". Since this parses trusted model metadata, the risk is low, but adding validation would make it more robust.

♻️ Proposed fix to add validation

```diff
   JsonValue parse_bool() {
     JsonValue v;
     v.type = JsonValue::Bool;
     if (s_.substr(pos_, 4) == "true") {
       v.bool_val = true;
       pos_ += 4;
-    } else {
+    } else if (s_.substr(pos_, 5) == "false") {
       v.bool_val = false;
       pos_ += 5;
+    } else {
+      throw deepmd::deepmd_exception("Invalid JSON boolean");
     }
     return v;
   }

   JsonValue parse_null() {
+    if (s_.substr(pos_, 4) != "null") {
+      throw deepmd::deepmd_exception("Invalid JSON null");
+    }
     pos_ += 4;
     return JsonValue();
   }
```
601-614: Silent exception swallowing when setting thread counts.

The empty catch (...) blocks on lines 606-607 and 612-613 silently discard any exceptions from at::set_num_interop_threads() and at::set_num_threads(). This makes debugging thread configuration issues difficult. Consider logging a warning when thread configuration fails.

♻️ Proposed fix to add warning logging

```diff
   if (num_inter_nthreads) {
     try {
       at::set_num_interop_threads(num_inter_nthreads);
     } catch (...) {
+      std::cerr << "WARNING: Failed to set inter-op threads to "
+                << num_inter_nthreads << std::endl;
     }
   }
   if (num_intra_nthreads) {
     try {
       at::set_num_threads(num_intra_nthreads);
     } catch (...) {
+      std::cerr << "WARNING: Failed to set intra-op threads to "
+                << num_intra_nthreads << std::endl;
     }
   }
```
Inline comments:
In `@deepmd/dpmodel/descriptor/dpa1.py`:
- Line 1060: Remove the unused local assignment "nall = atype_ext.shape[1]" from
dpa1.py (the dead variable 'nall' referenced in the review) so the F841 lint
error is resolved; locate the assignment near the code that references
'atype_ext' and delete that single line, then run "ruff check ." and "ruff
format ." to ensure no other lint errors remain.
In `@deepmd/pt_expt/entrypoints/main.py`:
- Around line 247-250: The code reads the checkpoint pointer into
latest_ckpt_file with Path(...).read_text() which can include a trailing
newline, causing FLAGS.model to point to a non-existent path; update the logic
that sets latest_ckpt_file (used when FLAGS.checkpoint_folder is a directory) to
strip whitespace/newlines (e.g., .strip()) before calling
checkpoint_path.joinpath(...) so FLAGS.model is set to the clean filename used
by torch.load().
- Around line 163-208: The freeze function currently ignores the head parameter;
update freeze (the function handling model loading, get_model, ModelWrapper,
serialize, deserialize_to_file) to fail-fast when head is not None by raising a
clear exception (e.g., ValueError) indicating per-head freezing is not yet
supported, so callers are not silently misled; add the check near the start of
freeze before reconstructing the model and ensure the error message references
the unsupported head behavior, then run ruff check . and ruff format . to
satisfy linting/style rules.
In `@deepmd/pt_expt/infer/deep_eval.py`:
- Around line 167-169: The change clears self.exported_module after loading a
.pt2 model which breaks get_model() (which returns exported_module); update the
loader so that after aoti_load_package(model_file) you either set
self.exported_module = self._pt2_runner (so get_model() continues to work) or
update get_model() to detect the .pt2 case and return self._pt2_runner (or raise
a clear UnsupportedOperationError) — locate the code around
aoti_load_package(model_file), the attributes self._pt2_runner and
self.exported_module, and the get_model() method to implement one of these
fixes.
In `@deepmd/pt_expt/utils/serialization.py`:
- Around line 355-362: The code uses the private API
torch._inductor.aoti_compile_and_package (called with exported and
package_path=model_file after _trace_and_export), which is unstable; update the
call site to guard against API changes by either pinning the supported PyTorch
version in packaging/tests or adding a runtime compatibility layer: attempt an
import of aoti_compile_and_package from torch._inductor, fall back to an
alternative public API or raise a clear error that includes the PyTorch version
and helpful instructions; keep references to exported, _trace_and_export, and
model_file so the fallback/compatibility check wraps the exact call site and
logs actionable messages when the private symbol is unavailable or
signature-changed.
In `@source/api_cc/include/common.h`:
- Line 16: The new PyTorchExportable enumerator shifted existing ordinals and
can break binary compatibility for get_backend(); to fix, preserve existing enum
values by moving PyTorchExportable to the end just before Unknown (i.e., keep
the original order TensorFlow, PyTorch, Paddle, JAX, PyTorchExportable, Unknown)
or alternatively give each DPBackend an explicit integral value so TensorFlow,
PyTorch, Paddle, JAX retain their previous numeric values; update the DPBackend
enum definition accordingly to avoid renumbering.
In `@source/api_cc/src/common.cc`:
- Around line 1455-1461: The get_backend function (deepmd::get_backend)
currently checks for ".pt2" but not ".pte", causing ".pte" filenames to be
treated as unsupported; update the suffix checks in deepmd::get_backend so that
files ending with ".pte" are routed to deepmd::DPBackend::PyTorchExportable
(i.e., treat ".pte" the same as ".pt2")—modify the branch that returns
deepmd::DPBackend::PyTorchExportable to accept either suffix.
In `@source/api_cc/src/DeepPotPTExpt.cc`:
- Around line 65-66: The get() method lacks a bounds check and can read past s_
when the input is malformed; update get() (which uses pos_ and s_) to first
verify pos_ < s_.size() and handle the out-of-bounds case (e.g., return '\0' or
signal an error) instead of unconditionally indexing and incrementing pos_,
ensuring it mirrors the safety of peek().
- Around line 240-263: The ZIP parser can underflow when computing
content.size() - 22; before the EOCD search, add a guard that verifies
content.size() >= 22 (or return/throw an error) and avoid computing
content.size() - 22 when the file is smaller; update the EOCD search around
variables content and eocd_pos (and the loop that starts from content.size() -
22) to only run when the size check passes and otherwise throw the same
deepmd::deepmd_exception("Invalid ZIP file: " + zip_path) or a more specific
message.
- Around line 641-648: The size check in DeepPotPTExpt::extract_outputs
currently uses assert(), which is compiled out in release builds; replace it
with a runtime check (e.g., an if that compares flat_outputs.size() and
output_keys.size()) and throw a descriptive exception (std::runtime_error or
similar) if they differ to prevent undefined behavior when populating output_map
using output_keys and flat_outputs; keep the rest of the function logic the
same.
In `@source/api_cc/tests/test_deeppot_dpa_ptexpt.cc`:
- Around line 378-382: The loop that populates coord_vir only copies one scalar
per virtual atom into a buffer sized nvir*3, producing malformed virtual atom
coordinates; update the loop that fills coord_vir (and keeps atype_vir) to copy
all three coordinates per virtual atom by writing coord_vir[3*ii + 0..2] from
coord[3*ii + 0..2] for each ii (0..nvir-1) so each synthetic atom has full x,y,z
positions.
In `@source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc`:
- Around line 387-391: The coord_vir buffer is sized for nvir*3 but the loop
only copies one scalar per virtual atom, leaving two coordinates zero; update
the loop that fills coord_vir (the section using coord_vir, coord, and nvir) to
copy all three coordinates per atom (e.g., iterate j=0..2 and assign
coord_vir[3*ii + j] = coord[3*ii + j] or memcpy three values) so each virtual
atom gets its full (x,y,z) position rather than a single scalar.
In `@source/api_cc/tests/test_deeppot_ptexpt.cc`:
- Around line 360-364: The loop that initializes coord_vir only copies a single
coordinate per virtual atom into a buffer sized nvir*3; update the
initialization for coord_vir so that for each virtual atom index ii you copy all
three coordinates (e.g., assign coord_vir[3*ii + 0..2] = coord[0..2] or use
std::copy/memcpy) so the 3D coordinates for each synthetic virtual atom are
properly populated; keep atype_vir initialization as-is.
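The coord_vir fix described in the three test-file comments above is the same pattern each time; in Python terms, with toy data rather than the C++ test code:

```python
nvir = 2
coord = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # flattened x, y, z per atom

# Buggy: one scalar per virtual atom, leaving y and z zeroed.
coord_vir_bad = [0.0] * (nvir * 3)
for ii in range(nvir):
    coord_vir_bad[ii] = coord[ii]

# Fixed: copy the full (x, y, z) triple for each virtual atom.
coord_vir = [0.0] * (nvir * 3)
for ii in range(nvir):
    for j in range(3):
        coord_vir[3 * ii + j] = coord[3 * ii + j]

assert coord_vir == coord[: 3 * nvir]
```

The bug is easy to miss because the buffer has the right size either way; only the contents beyond the first nvir scalars are wrong.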
In `@source/tests/infer/gen_dpa2.py`:
- Around line 127-135: The current flow attempts
pt_deserialize_to_file(pth_path, ...) and then later still checks
os.path.exists(pth_path), which can cause loading a stale deeppot_dpa2.pth if
the export failed; change the logic so that after catching RuntimeError from
pt_deserialize_to_file you do not proceed to verify or load pth_path (e.g., set
a local success flag or return/continue) and only perform the
os.path.exists(pth_path) verification when the export actually succeeded; apply
the same fix for the second occurrence that mirrors lines 181-198.
- Around line 171-190: The lint error comes from unpacking unused returns from
dp.eval/dp_pth.eval; update the unused variable names by prefixing them with an
underscore (e.g., change v_np -> _v_np, ae_np -> _ae_np, av_np -> _av_np in the
dp.eval call and similarly use _ for the unused third return in dp_pth.eval and
prefix any unused ae_pth_np/av_pth_np as needed) so that eval(...) still returns
all values but ruff RUF059 is satisfied; locate uses around the dp.eval and
dp_pth.eval calls and rename only the unused symbols (leave e_np, f_np, e_pth,
f_pth etc. as-is).
In `@source/tests/infer/gen_fparam_aparam.py`:
- Around line 127-177: The new test generator declares unused variables (v,
nloc, ae_pth, av_pth) which causes ruff F841; fix by either using or marking
them as intentionally unused—rename them with a leading underscore (e.g., _v,
_nloc, _ae_pth, _av_pth) or replace their assignments with a single
underscore/tuple discard when calling DeepPot.eval (e.g., e_pth, f_pth, _ = ...
or e_pth, f_pth, *_ = ...) so that functions/methods DeepPot.eval and local
names v, nloc, ae_pth, av_pth no longer trigger unused-variable errors.
📒 Files selected for processing (39)
- deepmd/backend/pt_expt.py
- deepmd/dpmodel/descriptor/dpa1.py
- deepmd/dpmodel/descriptor/dpa2.py
- deepmd/dpmodel/descriptor/repformers.py
- deepmd/dpmodel/model/make_model.py
- deepmd/dpmodel/utils/exclude_mask.py
- deepmd/dpmodel/utils/nlist.py
- deepmd/pt_expt/entrypoints/main.py
- deepmd/pt_expt/infer/deep_eval.py
- deepmd/pt_expt/utils/serialization.py
- source/api_c/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
- source/api_c/tests/test_deeppot_a_ptexpt.cc
- source/api_cc/include/DeepPotPTExpt.h
- source/api_cc/include/common.h
- source/api_cc/src/DeepPot.cc
- source/api_cc/src/DeepPotJAX.cc
- source/api_cc/src/DeepPotPD.cc
- source/api_cc/src/DeepPotPT.cc
- source/api_cc/src/DeepPotPTExpt.cc
- source/api_cc/src/DeepSpinPT.cc
- source/api_cc/src/common.cc
- source/api_cc/tests/test_deeppot_a_fparam_aparam_nframes_ptexpt.cc
- source/api_cc/tests/test_deeppot_a_fparam_aparam_pt.cc
- source/api_cc/tests/test_deeppot_a_fparam_aparam_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa1_pt.cc
- source/api_cc/tests/test_deeppot_dpa2_pt.cc
- source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc
- source/api_cc/tests/test_deeppot_dpa_ptexpt.cc
- source/api_cc/tests/test_deeppot_ptexpt.cc
- source/api_cc/tests/test_utils.h
- source/tests/infer/convert-models.sh
- source/tests/infer/deeppot_dpa.pth
- source/tests/infer/deeppot_sea.pth
- source/tests/infer/fparam_aparam.pth
- source/tests/infer/gen_dpa1.py
- source/tests/infer/gen_dpa2.py
- source/tests/infer/gen_fparam_aparam.py
- source/tests/pt_expt/infer/test_deep_eval.py
- source/tests/pt_expt/test_freeze.py
```python
        )
        nf, nloc, nnei, _ = dmatrix.shape
        atype = atype_ext[:, :nloc]
        nall = atype_ext.shape[1]
```
Drop the dead nall assignment.
Ruff already flags this as F841, so the Python lint job will keep failing until this local is removed.
💡 Suggested fix
- nall = atype_ext.shape[1]
  atype = xp_take_first_n(atype_ext, 1, nloc)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
atype = xp_take_first_n(atype_ext, 1, nloc)
🧰 Tools
🪛 Ruff (0.15.4)
[error] 1060-1060: Local variable nall is assigned to but never used
Remove assignment to unused variable nall
(F841)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/dpmodel/descriptor/dpa1.py` at line 1060, Remove the unused local
assignment "nall = atype_ext.shape[1]" from dpa1.py (the dead variable 'nall'
referenced in the review) so the F841 lint error is resolved; locate the
assignment near the code that references 'atype_ext' and delete that single
line, then run "ruff check ." and "ruff format ." to ensure no other lint errors
remain.
def freeze(
    model: str,
    output: str = "frozen_model.pt2",
    head: str | None = None,
) -> None:
    """Freeze a pt_expt training checkpoint to .pte or .pt2 format.

    Parameters
    ----------
    model : str
        Path to the training checkpoint (.pt file).
    output : str
        Path for the frozen model output (.pte or .pt2).
    head : str or None
        Head to freeze in a multi-task model (not yet supported).
    """
    import torch

    from deepmd.pt_expt.model import (
        get_model,
    )
    from deepmd.pt_expt.train.wrapper import (
        ModelWrapper,
    )
    from deepmd.pt_expt.utils.env import (
        DEVICE,
    )
    from deepmd.pt_expt.utils.serialization import (
        deserialize_to_file,
    )

    state_dict = torch.load(model, map_location=DEVICE, weights_only=True)
    if "model" in state_dict:
        state_dict = state_dict["model"]
    model_params = state_dict["_extra_state"]["model_params"]

    # Reconstruct model and load weights
    pt_expt_model = get_model(model_params).to(DEVICE)
    wrapper = ModelWrapper(pt_expt_model)
    wrapper.load_state_dict(state_dict)
    pt_expt_model.eval()

    # Serialize to dict and export
    model_dict = pt_expt_model.serialize()
    deserialize_to_file(output, {"model": model_dict})
    log.info(f"Saved frozen model to {output}")
Don't silently ignore --head.
If the caller passes a head today, freeze() still exports the default/full model and never tells them the request was ignored. Fail fast until per-head freezing is actually implemented.
🛑 Suggested guard
def freeze(
model: str,
output: str = "frozen_model.pt2",
head: str | None = None,
) -> None:
@@
from deepmd.pt_expt.utils.serialization import (
deserialize_to_file,
)
+ if head is not None:
+ raise NotImplementedError(
+ "--head is not supported for the pt_expt freeze command yet."
+ )
+
     state_dict = torch.load(model, map_location=DEVICE, weights_only=True)
As per coding guidelines, "Always run ruff check . and ruff format . before committing changes or CI will fail."
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 166-166: Unused function argument: head
(ARG001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/entrypoints/main.py` around lines 163 - 208, The freeze
function currently ignores the head parameter; update freeze (the function
handling model loading, get_model, ModelWrapper, serialize, deserialize_to_file)
to fail-fast when head is not None by raising a clear exception (e.g.,
ValueError) indicating per-head freezing is not yet supported, so callers are
not silently misled; add the check near the start of freeze before
reconstructing the model and ensure the error message references the unsupported
head behavior, then run ruff check . and ruff format . to satisfy linting/style
rules.
if Path(FLAGS.checkpoint_folder).is_dir():
    checkpoint_path = Path(FLAGS.checkpoint_folder)
    latest_ckpt_file = (checkpoint_path / "checkpoint").read_text()
    FLAGS.model = str(checkpoint_path.joinpath(latest_ckpt_file))
Strip the checkpoint pointer before joining the path.
Path.read_text() preserves trailing newlines, so a line-based checkpoint file can produce FLAGS.model with a trailing \n and make torch.load() miss the file.
🧩 Suggested fix
- latest_ckpt_file = (checkpoint_path / "checkpoint").read_text()
- FLAGS.model = str(checkpoint_path.joinpath(latest_ckpt_file))
+ latest_ckpt_file = (checkpoint_path / "checkpoint").read_text().strip()
+ FLAGS.model = str(checkpoint_path / latest_ckpt_file)
📝 Committable suggestion
if Path(FLAGS.checkpoint_folder).is_dir():
    checkpoint_path = Path(FLAGS.checkpoint_folder)
    latest_ckpt_file = (checkpoint_path / "checkpoint").read_text().strip()
    FLAGS.model = str(checkpoint_path / latest_ckpt_file)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/entrypoints/main.py` around lines 247 - 250, The code reads
the checkpoint pointer into latest_ckpt_file with Path(...).read_text() which
can include a trailing newline, causing FLAGS.model to point to a non-existent
path; update the logic that sets latest_ckpt_file (used when
FLAGS.checkpoint_folder is a directory) to strip whitespace/newlines (e.g.,
.strip()) before calling checkpoint_path.joinpath(...) so FLAGS.model is set to
the clean filename used by torch.load().
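The pitfall is easy to reproduce in isolation; a minimal sketch (the file names below are illustrative, not from the repository):

```python
import tempfile
from pathlib import Path

# A line-oriented "checkpoint" pointer file typically ends with a newline.
ckpt_dir = Path(tempfile.mkdtemp())
(ckpt_dir / "model.ckpt-100.pt").touch()
(ckpt_dir / "checkpoint").write_text("model.ckpt-100.pt\n")

raw = (ckpt_dir / "checkpoint").read_text()
# read_text() keeps the trailing "\n", so the joined path misses the file...
assert not (ckpt_dir / raw).exists()
# ...while stripping recovers the real checkpoint path.
assert (ckpt_dir / raw.strip()).exists()
```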
# Load the AOTInductor model
self._pt2_runner = aoti_load_package(model_file)
self.exported_module = None
Don’t silently break get_model() for .pt2.
Line 169 clears self.exported_module, but get_model() still returns that attribute at Line 710. After this change, .pt2 callers get None instead of the loaded artifact. Either return _pt2_runner from get_model() for this format or make get_model() raise a clear unsupported-operation error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/infer/deep_eval.py` around lines 167 - 169, The change clears
self.exported_module after loading a .pt2 model which breaks get_model() (which
returns exported_module); update the loader so that after
aoti_load_package(model_file) you either set self.exported_module =
self._pt2_runner (so get_model() continues to work) or update get_model() to
detect the .pt2 case and return self._pt2_runner (or raise a clear
UnsupportedOperationError) — locate the code around
aoti_load_package(model_file), the attributes self._pt2_runner and
self.exported_module, and the get_model() method to implement one of these
fixes.
from torch._inductor import (
    aoti_compile_and_package,
)

exported, metadata, data_for_json, output_keys = _trace_and_export(data)

# Compile via AOTInductor into a .pt2 package
aoti_compile_and_package(exported, package_path=model_file)
🧩 Analysis chain
🌐 Web query:
PyTorch AOTInductor aoti_compile_and_package stable API
💡 Result:
torch._inductor.aoti_compile_and_package() is not a stable/public API in the PyTorch sense: it lives under the underscored torch._inductor namespace, which PyTorch uses for internal/developer-facing functionality that may change without a backward-compatibility guarantee (even if it’s documented). PyTorch’s docs explicitly warn that private/developer-facing APIs “may be subject to changes without warning.” [3]
What is “stable” around AOTInductor today is not the Python compile entrypoint, but the deployment/runtime ABI surface:
- The official AOTInductor tutorial documents `torch._inductor.aoti_compile_and_package(exported_program, *, package_path=None, inductor_configs=None) -> str` and `torch._inductor.aoti_load_package(...)` for producing/loading a `.pt2` artifact (PT2 archive). [1]
- For cross-version binary compatibility, PyTorch has a LibTorch Stable ABI effort that includes stable C headers (originating from AOTInductor) and higher-level `torch::stable` C++ wrappers, with an explicit ABI stability promise (not modifying headers for at least ~2 years after release). [2][4]
Practical takeaway: if you need “stable,” treat aoti_compile_and_package as version-pinned build tooling, and rely on the LibTorch Stable ABI / torch::stable side for the runtime integration guarantees. [2][4]
Sources: [1] AOTInductor docs (PyTorch) • [2] LibTorch Stable ABI docs (PyTorch) • [3] “Extending PyTorch” docs note on private APIs changing without warning • [4] Torch Stable API C++ docs.
torch._inductor.aoti_compile_and_package is a private/unstable PyTorch API that may change between versions.
While currently the documented way to use AOTInductor for .pt2 packages, torch._inductor is internal and not covered by PyTorch's backward-compatibility guarantees. Consider either pinning the PyTorch version or adding API compatibility checks to handle potential breaking changes in future releases.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@deepmd/pt_expt/utils/serialization.py` around lines 355 - 362, The code uses
the private API torch._inductor.aoti_compile_and_package (called with exported
and package_path=model_file after _trace_and_export), which is unstable; update
the call site to guard against API changes by either pinning the supported
PyTorch version in packaging/tests or adding a runtime compatibility layer:
attempt an import of aoti_compile_and_package from torch._inductor, fall back to
an alternative public API or raise a clear error that includes the PyTorch
version and helpful instructions; keep references to exported,
_trace_and_export, and model_file so the fallback/compatibility check wraps the
exact call site and logs actionable messages when the private symbol is
unavailable or signature-changed.
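A compatibility probe along the lines the prompt describes might look like the sketch below; the symbol name follows the AOTInductor tutorial, but since `torch._inductor` is private, the sketch only probes for it at runtime rather than assuming it exists:

```python
import importlib


def load_aoti_compiler():
    """Return torch._inductor.aoti_compile_and_package if available, else None.

    torch._inductor is a private namespace with no stability guarantee, so the
    symbol is looked up at runtime instead of being imported unconditionally.
    """
    try:
        inductor = importlib.import_module("torch._inductor")
    except ImportError:
        return None  # torch missing, or too old for AOTInductor packaging
    return getattr(inductor, "aoti_compile_and_package", None)


# Call sites can then fail with an actionable message, e.g.:
# compiler = load_aoti_compiler()
# if compiler is None:
#     raise RuntimeError(
#         "pt_expt .pt2 export needs torch._inductor.aoti_compile_and_package; "
#         "check your installed PyTorch version."
#     )
```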
std::vector<VALUETYPE> coord_vir(nvir * 3);
std::vector<int> atype_vir(nvir, 2);
for (int ii = 0; ii < nvir; ++ii) {
  coord_vir[ii] = coord[ii];
}
Copy all 3 coordinates for each synthetic virtual atom.
This loop only copies nvir scalars into a nvir * 3 coordinate buffer, so most of the virtual-atom coordinates stay zero. The type-selection test then runs against malformed positions.
💡 Suggested fix
- for (int ii = 0; ii < nvir; ++ii) {
- coord_vir[ii] = coord[ii];
- }
+ std::copy_n(coord.begin(), nvir * 3, coord_vir.begin());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/api_cc/tests/test_deeppot_dpa2_ptexpt.cc` around lines 387 - 391, The
coord_vir buffer is sized for nvir*3 but the loop only copies one scalar per
virtual atom, leaving two coordinates zero; update the loop that fills coord_vir
(the section using coord_vir, coord, and nvir) to copy all three coordinates per
atom (e.g., iterate j=0..2 and assign coord_vir[3*ii + j] = coord[3*ii + j] or
memcpy three values) so each virtual atom gets its full (x,y,z) position rather
than a single scalar.
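The off-by-a-factor-of-three bug is easy to see with concrete numbers; here is a Python rendering of the C++ loop above (the values are illustrative):

```python
nvir = 2  # two synthetic virtual atoms, three coordinates each
coord = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

# Buggy pattern: copies only nvir scalars into an nvir*3 buffer,
# so the y/z components of every virtual atom stay zero.
coord_vir = [0.0] * (nvir * 3)
for ii in range(nvir):
    coord_vir[ii] = coord[ii]
assert coord_vir == [1.0, 2.0, 0.0, 0.0, 0.0, 0.0]

# Fixed pattern (the std::copy_n equivalent): copy all nvir*3 scalars.
coord_vir = coord[: nvir * 3]
assert coord_vir == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```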
std::vector<VALUETYPE> coord_vir(nvir * 3);
std::vector<int> atype_vir(nvir, 2);
for (int ii = 0; ii < nvir; ++ii) {
  coord_vir[ii] = coord[ii];
}
Copy all 3 coordinates for each synthetic virtual atom.
This loop only fills the first nvir entries of a nvir * 3 buffer, so the remaining coordinates stay zero. The test then exercises the type-selection path with malformed virtual atoms.
💡 Suggested fix
- for (int ii = 0; ii < nvir; ++ii) {
- coord_vir[ii] = coord[ii];
- }
+ std::copy_n(coord.begin(), nvir * 3, coord_vir.begin());
📝 Committable suggestion
std::vector<VALUETYPE> coord_vir(nvir * 3);
std::vector<int> atype_vir(nvir, 2);
std::copy_n(coord.begin(), nvir * 3, coord_vir.begin());
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/api_cc/tests/test_deeppot_ptexpt.cc` around lines 360 - 364, The loop
that initializes coord_vir only copies a single coordinate per virtual atom into
a buffer sized nvir*3; update the initialization for coord_vir so that for each
virtual atom index ii you copy all three coordinates (e.g., assign
coord_vir[3*ii + 0..2] = coord[0..2] or use std::copy/memcpy) so the 3D
coordinates for each synthetic virtual atom are properly populated; keep
atype_vir initialization as-is.
pth_path = os.path.join(base_dir, "deeppot_dpa2.pth")
print(f"Exporting to {pth_path} ...")  # noqa: T201
try:
    pt_deserialize_to_file(pth_path, copy.deepcopy(data))
except RuntimeError as e:
    # Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
    # in all build environments; .pth generation is not critical.
    print(f"WARNING: .pth export failed ({e}), skipping.")  # noqa: T201
Don’t verify a stale .pth artifact after export failure.
If pt_deserialize_to_file() throws and an old deeppot_dpa2.pth is already present, the later os.path.exists(pth_path) branch will still load that stale artifact. That can silently validate the wrong weights and leave regenerated fixtures out of sync with the fresh .pt2 model.
💡 Suggested fix
pth_path = os.path.join(base_dir, "deeppot_dpa2.pth")
+ pth_generated = False
print(f"Exporting to {pth_path} ...") # noqa: T201
try:
+ if os.path.exists(pth_path):
+ os.remove(pth_path)
pt_deserialize_to_file(pth_path, copy.deepcopy(data))
+ pth_generated = True
except RuntimeError as e:
+ if os.path.exists(pth_path):
+ os.remove(pth_path)
# Custom ops (e.g. tabulate_fusion_se_t_tebd) may not be available
# in all build environments; .pth generation is not critical.
print(f"WARNING: .pth export failed ({e}), skipping.") # noqa: T201
@@
- if os.path.exists(pth_path):
+ if pth_generated:
Also applies to: 181-198
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/infer/gen_dpa2.py` around lines 127 - 135, The current flow
attempts pt_deserialize_to_file(pth_path, ...) and then later still checks
os.path.exists(pth_path), which can cause loading a stale deeppot_dpa2.pth if
the export failed; change the logic so that after catching RuntimeError from
pt_deserialize_to_file you do not proceed to verify or load pth_path (e.g., set
a local success flag or return/continue) and only perform the
os.path.exists(pth_path) verification when the export actually succeeded; apply
the same fix for the second occurrence that mirrors lines 181-198.
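The success-flag pattern the suggestion relies on can be distilled into a small helper; `export_fn` and `verify_fn` here are placeholders standing in for pt_deserialize_to_file and the DeepPot verification, not actual repository functions:

```python
import os


def export_and_verify(pth_path, export_fn, verify_fn):
    """Export best-effort, but only verify an artifact produced by this run."""
    generated = False
    try:
        if os.path.exists(pth_path):
            os.remove(pth_path)  # never leave a stale file around to "verify"
        export_fn(pth_path)
        generated = True
    except RuntimeError:
        pass  # export is optional here, mirroring the generator's warning path
    if generated:  # gate on success, not on os.path.exists(pth_path)
        verify_fn(pth_path)
    return generated
```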
e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
print(f"\n// PBC total energy: {e1[0, 0]:.18e}")  # noqa: T201
print_cpp_values("PBC reference values", ae1, f1, av1)

# ---- 5. Run inference for NoPbc test ----
e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
print(f"\n// NoPbc total energy: {e_np[0, 0]:.18e}")  # noqa: T201
print_cpp_values("NoPbc reference values", ae_np, f_np, av_np)

# ---- 6. Verify .pth gives same results ----
if os.path.exists(pth_path):
    dp_pth = DeepPot(pth_path)
    e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
        coord, box, atype, atomic=True
    )
    print(f"\n// .pth PBC total energy: {e_pth[0, 0]:.18e}")  # noqa: T201
    print(f"// .pth vs .pt2 energy diff: {abs(e1[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
    print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f1 - f_pth)):.2e}")  # noqa: T201

    e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
Prefix the intentionally unused eval outputs.
ruff is already reporting RUF059 here for the unused unpacked values, so this helper will keep failing lint until those names are marked as intentionally unused.
💡 Suggested fix
- e1, f1, v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
+ e1, f1, _v1, ae1, av1 = dp.eval(coord, box, atype, atomic=True)
@@
- e_np, f_np, v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
+ e_np, f_np, _v_np, ae_np, av_np = dp.eval(coord, None, atype, atomic=True)
@@
- e_pth, f_pth, v_pth, ae_pth, av_pth = dp_pth.eval(
+ e_pth, f_pth, _v_pth, _ae_pth, _av_pth = dp_pth.eval(
coord, box, atype, atomic=True
)
@@
- e_pth_np, f_pth_np, _, ae_pth_np, av_pth_np = dp_pth.eval(
+ e_pth_np, _f_pth_np, _, _ae_pth_np, _av_pth_np = dp_pth.eval(
coord, None, atype, atomic=True
    )
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 171-171: Unpacked variable v1 is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 176-176: Unpacked variable v_np is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 183-183: Unpacked variable v_pth is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 183-183: Unpacked variable ae_pth is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 183-183: Unpacked variable av_pth is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 190-190: Unpacked variable f_pth_np is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 190-190: Unpacked variable ae_pth_np is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 190-190: Unpacked variable av_pth_np is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/infer/gen_dpa2.py` around lines 171 - 190, The lint error comes
from unpacking unused returns from dp.eval/dp_pth.eval; update the unused
variable names by prefixing them with an underscore (e.g., change v_np -> _v_np,
ae_np -> _ae_np, av_np -> _av_np in the dp.eval call and similarly use _ for the
unused third return in dp_pth.eval and prefix any unused ae_pth_np/av_pth_np as
needed) so that eval(...) still returns all values but ruff RUF059 is satisfied;
locate uses around the dp.eval and dp_pth.eval calls and rename only the unused
symbols (leave e_np, f_np, e_pth, f_pth etc. as-is).
e, f, v, ae, av = dp.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)

nloc = 6
atom_energy = ae[0, :, 0]
force = f[0]
atom_virial = av[0]

# Print C++ format
print("\n// ---- Reference values for C++ test (shared by .pth and .pt2) ----")  # noqa: T201
print(f"// Total energy: {e[0, 0]:.18e}")  # noqa: T201
print()  # noqa: T201
print(" std::vector<VALUETYPE> expected_e = {")  # noqa: T201
for ii, ev in enumerate(atom_energy):
    comma = "," if ii < len(atom_energy) - 1 else ""
    print(f" {ev:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

print(" std::vector<VALUETYPE> expected_f = {")  # noqa: T201
force_flat = force.flatten()
for ii, fv in enumerate(force_flat):
    comma = "," if ii < len(force_flat) - 1 else ""
    print(f" {fv:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

print(" std::vector<VALUETYPE> expected_v = {")  # noqa: T201
virial_flat = atom_virial.flatten()
for ii, vv in enumerate(virial_flat):
    comma = "," if ii < len(virial_flat) - 1 else ""
    print(f" {vv:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

# ---- 5. Verify .pth gives same results ----
dp_pth = DeepPot(pth_path)
e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)
print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}")  # noqa: T201
Fix the Ruff failures in this generator.
v, nloc, ae_pth, and av_pth are unused here, so this new file will fail ruff check as-is.
♻️ Suggested cleanup
- e, f, v, ae, av = dp.eval(
+ e, f, _, ae, av = dp.eval(
coord,
box,
atype,
fparam=fparam_val,
aparam=aparam_val,
atomic=True,
)
- nloc = 6
atom_energy = ae[0, :, 0]
force = f[0]
atom_virial = av[0]
@@
- e_pth, f_pth, _, ae_pth, av_pth = dp_pth.eval(
+ e_pth, f_pth, _, _ae_pth, _av_pth = dp_pth.eval(
coord,
box,
atype,
fparam=fparam_val,
aparam=aparam_val,
atomic=True,
    )
As per coding guidelines, "Always run ruff check . and ruff format . before committing changes or CI will fail."
📝 Committable suggestion
e, f, _, ae, av = dp.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)

atom_energy = ae[0, :, 0]
force = f[0]
atom_virial = av[0]

# Print C++ format
print("\n// ---- Reference values for C++ test (shared by .pth and .pt2) ----")  # noqa: T201
print(f"// Total energy: {e[0, 0]:.18e}")  # noqa: T201
print()  # noqa: T201
print(" std::vector<VALUETYPE> expected_e = {")  # noqa: T201
for ii, ev in enumerate(atom_energy):
    comma = "," if ii < len(atom_energy) - 1 else ""
    print(f" {ev:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

print(" std::vector<VALUETYPE> expected_f = {")  # noqa: T201
force_flat = force.flatten()
for ii, fv in enumerate(force_flat):
    comma = "," if ii < len(force_flat) - 1 else ""
    print(f" {fv:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

print(" std::vector<VALUETYPE> expected_v = {")  # noqa: T201
virial_flat = atom_virial.flatten()
for ii, vv in enumerate(virial_flat):
    comma = "," if ii < len(virial_flat) - 1 else ""
    print(f" {vv:.18e}{comma}")  # noqa: T201
print(" };")  # noqa: T201

# ---- 5. Verify .pth gives same results ----
dp_pth = DeepPot(pth_path)
e_pth, f_pth, _, _ae_pth, _av_pth = dp_pth.eval(
    coord,
    box,
    atype,
    fparam=fparam_val,
    aparam=aparam_val,
    atomic=True,
)
print(f"\n// .pth vs .pt2 energy diff: {abs(e[0, 0] - e_pth[0, 0]):.2e}")  # noqa: T201
print(f"// .pth vs .pt2 force max diff: {np.max(np.abs(f - f_pth)):.2e}")  # noqa: T201
🧰 Tools
🪛 Ruff (0.15.4)
[warning] 127-127: Unpacked variable v is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[error] 136-136: Local variable nloc is assigned to but never used
Remove assignment to unused variable nloc
(F841)
[warning] 167-167: Unpacked variable ae_pth is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
[warning] 167-167: Unpacked variable av_pth is never used
Prefix it with an underscore or any other dummy variable pattern
(RUF059)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@source/tests/infer/gen_fparam_aparam.py` around lines 127 - 177, The new test
generator declares unused variables (v, nloc, ae_pth, av_pth) which causes ruff
F841; fix by either using or marking them as intentionally unused—rename them
with a leading underscore (e.g., _v, _nloc, _ae_pth, _av_pth) or replace their
assignments with a single underscore/tuple discard when calling DeepPot.eval
(e.g., e_pth, f_pth, _ = ... or e_pth, f_pth, *_ = ...) so that
functions/methods DeepPot.eval and local names v, nloc, ae_pth, av_pth no longer
trigger unused-variable errors.
Check failure — Code scanning / CodeQL: "Multiplication result converted to larger type" (High) is reported at each of these per-frame slicing and ghost-trimming sites:

std::vector<VALUETYPE> frame_fparam;
if (!fparam.empty()) {
  frame_fparam.assign(fparam.begin() + ff * dfp,
                      fparam.begin() + (ff + 1) * dfp);
}
std::vector<VALUETYPE> frame_aparam;
if (!aparam.empty()) {
  frame_aparam.assign(aparam.begin() + ff * dap,
                      aparam.begin() + (ff + 1) * dap);
}
std::vector<VALUETYPE> frame_coord(coord.begin() + ff * natoms * 3,
                                   coord.begin() + (ff + 1) * natoms * 3);
std::vector<int> frame_atype(atype.begin() + ff * natoms,
                             atype.begin() + (ff + 1) * natoms);

out.resize(static_cast<size_t>(nframes) * nloc * ndim);
for (int kk = 0; kk < nframes; ++kk) {
  std::copy(in.begin() + kk * nall * ndim,
            in.begin() + kk * nall * ndim + nloc * ndim,
            out.begin() + kk * nloc * ndim);
}
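What CodeQL is warning about: in expressions like `ff * dap`, both operands are `int`, so the multiplication happens in 32-bit arithmetic and only the (possibly wrapped) result is widened to the iterator's 64-bit difference type. A Python sketch simulating 32-bit signed wrap-around (the sizes are illustrative, not from these tests):

```python
def mul_i32(a, b):
    """Multiply the way two C ints do: wrap modulo 2**32, signed result."""
    r = (a * b) & 0xFFFFFFFF
    return r - 2**32 if r >= 2**31 else r


ff, natoms = 3, 1_000_000_000  # hypothetical frame index and atom count
assert ff * natoms == 3_000_000_000           # the intended 64-bit product
assert mul_i32(ff, natoms) == -1_294_967_296  # what int*int actually yields
# Widening one operand first (static_cast<size_t>(ff) * natoms in C++)
# keeps the whole computation in 64-bit arithmetic and avoids the wrap.
```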
Add a `freeze()` function that loads a training checkpoint, reconstructs the model, serializes it, and exports via the existing `deserialize_to_file` pipeline. Wire the freeze command into the `main()` CLI dispatcher with checkpoint-dir resolution and automatic .pt2 suffix defaulting.

Summary by CodeRabbit
Release Notes
New Features
Bug Fixes