Skip to content

Conversation

@calvinp0
Copy link
Member

Addition of CREST Adapter that complements the heuristic adapter.

Fixed Arkane to check LoT by ignoring the year suffix (plans to implement year in the input)
AutoTST does not attempt to import autotst (due to its reliance on RMG-Py env).
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows. Several installation scripts were improved with better environment management, PATH/PYTHONPATH handling, and more robust activation/deactivation hooks. The Arkane module gained flexible model chemistry matching that handles hyphenated method names and year suffixes.

Key Changes:

  • New CREST adapter module (arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow
  • Enhanced Arkane model chemistry lookup with normalization and year-aware matching for RMG quantum corrections database
  • Improved installation scripts with better conda environment isolation and conflict resolution (AutoTST/RMG-Py, GCN)

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
arc/job/adapters/ts/crest.py New CREST adapter with job setup, submission, monitoring, and H-abstraction atom detection
arc/job/adapters/ts/heuristics.py Integrates CREST TS search alongside heuristics, submits jobs and processes results
arc/job/adapters/ts/crest_test.py Unit tests for CREST input file generation
arc/statmech/arkane.py Adds flexible matching for LevelOfTheory strings, normalizing hyphens and handling year suffixes
arc/species/converter.py New reorder_xyz_string function for coordinate format/unit conversion with deprecation wrapper
arc/species/converter_test.py Tests for the new XYZ reordering function
arc/settings/settings.py CREST executable discovery across conda/mamba/micromamba environments and standalone builds
devtools/install_crest.sh New installer for CREST 2.12 via conda
devtools/install_autotst.sh Enhanced PATH management to isolate AutoTST from RMG-Py conflicts
devtools/install_rmg.sh Improved PATH/PYTHONPATH additions with duplicate prevention
devtools/install_gcn.sh Fixed activation hooks and switched to conda run for pip installs
devtools/install_pyrdl.sh Corrected cmake installation to target arc_env explicitly
devtools/install_torchani.sh Added LOGFILE variable for cleaner log handling
devtools/install_all.sh Added --no-rmg flag, reordered PyRDL after ARC, added CREST to external tools
devtools/crest_environment.yml Conda environment specification for CREST 2.12
Makefile Added install-crest target
AGENTS.md Repository guidelines file (added but also gitignored - should be removed)
.gitignore Ignores AGENTS.md
arc/species/species.py Added logging for similar TS guesses during clustering
arc/job/adapters/ts/autotst_ts.py Better error handling for missing AutoTST package in subprocess
Comments suppressed due to low confidence (1)

arc/job/adapters/ts/heuristics.py:289

  • When CREST generates TS guesses, they will have different timing information than the heuristics guesses, but all guesses (both Heuristics and CREST) are being assigned tsg.t0 and tsg.execution_time from the heuristics TSGuess object. CREST guesses should have their own timing information, or at minimum the execution_time should reflect the actual CREST job duration, not the heuristics generation time.
                                       t0=tsg.t0,
                                       execution_time=tsg.execution_time,

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 306 to 310
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.

Suggested change
atom for atom in closest_atoms if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in closest_atoms
atom for atom in atom_neighbours if not atom.startswith("H")
)
if_hydrogen_present = any(
atom
for atom in atom_neighbours

Copilot uses AI. Check for mistakes.
"H": extract_digits(best_occurrence["H"]),
"A": extract_digits(best_occurrence["A"]),
"B": extract_digits(best_occurrence["B"]),
}
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.

Suggested change
}
}
elif len(condition_occurrences) == 1:
occurrence = condition_occurrences[0]
return {
"H": extract_digits(occurrence["H"]),
"A": extract_digits(occurrence["A"]),
"B": extract_digits(occurrence["B"]),
}

Copilot uses AI. Check for mistakes.
Args:
xyz_str (str): The string xyz format to be converted.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').

Suggested change
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates.
units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr').

Copilot uses AI. Check for mistakes.
}
else:

# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling error: "Hyodrogen" should be "Hydrogen".

Suggested change
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate
# Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate

Copilot uses AI. Check for mistakes.
Comment on lines 143 to 148
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.

Suggested change
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
_strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; }
autotst_on () {
export AUTOTST_ROOT="__AUTOTST_PATH__"
export AUTOTST_OLD_PATH="$PATH"
export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}"

Copilot uses AI. Check for mistakes.
distances = dataframe.loc[hydrogen_key, heavy_atoms].sum()
occurrence_distances.append((occurrence, distances))
except KeyError as e:
print(f"Error accessing distances for occurrence {occurrence}: {e}")
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.

Suggested change
print(f"Error accessing distances for occurrence {occurrence}: {e}")
logger.error(f"Error accessing distances for occurrence {occurrence}: {e}")

Copilot uses AI. Check for mistakes.

def _iter_level_keys_from_section(file_path: str,
section_start: str,
section_end: str) -> list[str]:
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.

Suggested change
section_end: str) -> list[str]:
section_end: str) -> List[str]:

Copilot uses AI. Check for mistakes.
return method.lower().replace('-', '')


def _split_method_year(method_norm: str):
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].

Copilot uses AI. Check for mistakes.
Comment on lines 23 to 27
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment appears to contain commented-out code.

Suggested change
# try:
# from autotst.reaction import Reaction as AutoTST_Reaction
# except (ImportError, ModuleNotFoundError):
# HAS_AUTOTST = False

Copilot uses AI. Check for mistakes.

import datetime
import itertools
import os
Copy link

Copilot AI Nov 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'os' is not used.

Suggested change
import os

Copilot uses AI. Check for mistakes.
else:
raise ValueError("No valid hydrogen atom found.")

return {}

Check warning

Code scanning / CodeQL

Unreachable code Warning

This statement is unreachable.

Copilot Autofix

AI 29 days ago

The best fix is to delete the final unreachable return {} at line 378 in the get_h_abs_atoms function in arc/job/adapters/ts/crest.py. All execution paths in the function already either return with a dictionary or raise an exception. Removing this line does not alter any functionality, as it cannot be executed under any code path. No imports, method or variable definitions, or other changes are needed—just removal of this single line.


Suggested changeset 1
arc/job/adapters/ts/crest.py

Autofix patch

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/arc/job/adapters/ts/crest.py b/arc/job/adapters/ts/crest.py
--- a/arc/job/adapters/ts/crest.py
+++ b/arc/job/adapters/ts/crest.py
@@ -375,4 +375,3 @@
         else:
             raise ValueError("No valid hydrogen atom found.")
 
-    return {}
EOF
@@ -375,4 +375,3 @@
else:
raise ValueError("No valid hydrogen atom found.")

return {}
Copilot is powered by AI and may make mistakes. Always verify output.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants