-
Notifications
You must be signed in to change notification settings - Fork 24
CREST Adapter #807
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CREST Adapter #807
Conversation
Fixed Arkane to check LoT by ignoring the year suffix (plans to implement year in the input) AutoTST does not attempt to import autotst (due to its reliance on RMG-Py env).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds a CREST adapter to generate transition state (TS) conformer guesses for H_Abstraction reactions, complementing the existing heuristic approach. The integration includes CREST path detection, conda environment setup, and job submission/monitoring workflows. Several installation scripts were improved with better environment management, PATH/PYTHONPATH handling, and more robust activation/deactivation hooks. The Arkane module gained flexible model chemistry matching that handles hyphenated method names and year suffixes.
Key Changes:
- New CREST adapter module (
arc/job/adapters/ts/crest.py) integrating constrained TS conformer searches into the heuristics workflow - Enhanced Arkane model chemistry lookup with normalization and year-aware matching for RMG quantum corrections database
- Improved installation scripts with better conda environment isolation and conflict resolution (AutoTST/RMG-Py, GCN)
Reviewed changes
Copilot reviewed 19 out of 20 changed files in this pull request and generated 12 comments.
Show a summary per file
| File | Description |
|---|---|
arc/job/adapters/ts/crest.py |
New CREST adapter with job setup, submission, monitoring, and H-abstraction atom detection |
arc/job/adapters/ts/heuristics.py |
Integrates CREST TS search alongside heuristics, submits jobs and processes results |
arc/job/adapters/ts/crest_test.py |
Unit tests for CREST input file generation |
arc/statmech/arkane.py |
Adds flexible matching for LevelOfTheory strings, normalizing hyphens and handling year suffixes |
arc/species/converter.py |
New reorder_xyz_string function for coordinate format/unit conversion with deprecation wrapper |
arc/species/converter_test.py |
Tests for the new XYZ reordering function |
arc/settings/settings.py |
CREST executable discovery across conda/mamba/micromamba environments and standalone builds |
devtools/install_crest.sh |
New installer for CREST 2.12 via conda |
devtools/install_autotst.sh |
Enhanced PATH management to isolate AutoTST from RMG-Py conflicts |
devtools/install_rmg.sh |
Improved PATH/PYTHONPATH additions with duplicate prevention |
devtools/install_gcn.sh |
Fixed activation hooks and switched to conda run for pip installs |
devtools/install_pyrdl.sh |
Corrected cmake installation to target arc_env explicitly |
devtools/install_torchani.sh |
Added LOGFILE variable for cleaner log handling |
devtools/install_all.sh |
Added --no-rmg flag, reordered PyRDL after ARC, added CREST to external tools |
devtools/crest_environment.yml |
Conda environment specification for CREST 2.12 |
Makefile |
Added install-crest target |
AGENTS.md |
Repository guidelines file (added but also gitignored - should be removed) |
.gitignore |
Ignores AGENTS.md |
arc/species/species.py |
Added logging for similar TS guesses during clustering |
arc/job/adapters/ts/autotst_ts.py |
Better error handling for missing AutoTST package in subprocess |
Comments suppressed due to low confidence (1)
arc/job/adapters/ts/heuristics.py:289
- When CREST generates TS guesses, they will have different timing information than the heuristics guesses, but all guesses (both Heuristics and CREST) are being assigned
tsg.t0andtsg.execution_timefrom the heuristics TSGuess object. CREST guesses should have their own timing information, or at minimum the execution_time should reflect the actual CREST job duration, not the heuristics generation time.
t0=tsg.t0,
execution_time=tsg.execution_time,
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
arc/job/adapters/ts/crest.py
Outdated
| atom for atom in closest_atoms if not atom.startswith("H") | ||
| ) | ||
| if_hydrogen_present = any( | ||
| atom | ||
| for atom in closest_atoms |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic in get_h_abs_atoms is incorrect. Line 305-312 checks any(atom for atom in closest_atoms if not atom.startswith("H")) which iterates over keys of closest_atoms dict (strings like "H0", "C1"), not the values. It should check any(atom for atom in atom_neighbours if not atom.startswith("H")) to check the actual neighbors of the hydrogen. The same issue exists on lines 308-312 for if_hydrogen_present.
| atom for atom in closest_atoms if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in closest_atoms | |
| atom for atom in atom_neighbours if not atom.startswith("H") | |
| ) | |
| if_hydrogen_present = any( | |
| atom | |
| for atom in atom_neighbours |
| "H": extract_digits(best_occurrence["H"]), | ||
| "A": extract_digits(best_occurrence["A"]), | ||
| "B": extract_digits(best_occurrence["B"]), | ||
| } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When len(condition_occurrences) == 1, the function should return the single occurrence but doesn't. After the if len(condition_occurrences) > 1: block completes at line 341, control falls through to the else block at line 342. The single occurrence case needs to be handled explicitly, or the logic restructured to return condition_occurrences[0] when there's exactly one match.
| } | |
| } | |
| elif len(condition_occurrences) == 1: | |
| occurrence = condition_occurrences[0] | |
| return { | |
| "H": extract_digits(occurrence["H"]), | |
| "A": extract_digits(occurrence["A"]), | |
| "B": extract_digits(occurrence["B"]), | |
| } |
| Args: | ||
| xyz_str (str): The string xyz format to be converted. | ||
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring is missing the units parameter which is part of the function signature. This parameter should be documented to specify the units of the input coordinates (either 'angstrom' or 'bohr').
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| reverse_atoms (bool, optional): Whether to reverse the atoms and coordinates. | |
| units (str, optional): The units of the input coordinates (either 'angstrom' or 'bohr'). |
| } | ||
| else: | ||
|
|
||
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling error: "Hyodrogen" should be "Hydrogen".
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hyodrogen atom has the lowest distance aggregate | |
| # Check the all the hydrogen atoms, and see the closest two heavy atoms and aggregate their distances to determine which Hydrogen atom has the lowest distance aggregate |
devtools/install_autotst.sh
Outdated
| autotst_on () { | ||
| export AUTOTST_ROOT="__AUTOTST_PATH__" | ||
| export AUTOTST_OLD_PATH="$PATH" | ||
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | ||
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _strip_path function is defined inside the heredoc with double-quoted variables, but the function body uses single quotes for string operations. This creates a fragile quoting situation. Consider defining this as a regular shell function outside the heredoc, or ensure consistent quoting to avoid unexpected variable expansion issues.
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| _strip_path () { local needle=":$1:"; local haystack=":$2:"; echo "${haystack//$needle/:}" | sed 's/^://;s/:$//'; } | |
| autotst_on () { | |
| export AUTOTST_ROOT="__AUTOTST_PATH__" | |
| export AUTOTST_OLD_PATH="$PATH" | |
| export AUTOTST_OLD_PYTHONPATH="${PYTHONPATH:-}" |
arc/job/adapters/ts/crest.py
Outdated
| distances = dataframe.loc[hydrogen_key, heavy_atoms].sum() | ||
| occurrence_distances.append((occurrence, distances)) | ||
| except KeyError as e: | ||
| print(f"Error accessing distances for occurrence {occurrence}: {e}") |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use logger.error() instead of print() for consistency with the rest of the codebase. The logger is already imported and used elsewhere in this module.
| print(f"Error accessing distances for occurrence {occurrence}: {e}") | |
| logger.error(f"Error accessing distances for occurrence {occurrence}: {e}") |
|
|
||
| def _iter_level_keys_from_section(file_path: str, | ||
| section_start: str, | ||
| section_end: str) -> list[str]: |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type hint list[str] requires Python 3.9+. Since this project supports Python 3.7+ (as seen in environment files), you need to either use List[str] from typing (which is already imported on line 9) or add from __future__ import annotations at the top of the file.
| section_end: str) -> list[str]: | |
| section_end: str) -> List[str]: |
| return method.lower().replace('-', '') | ||
|
|
||
|
|
||
| def _split_method_year(method_norm: str): |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The return type annotation is missing for _split_method_year. Based on the docstring examples and implementation, it should return Tuple[str, Optional[int]].
arc/job/adapters/ts/autotst_ts.py
Outdated
| # try: | ||
| # from autotst.reaction import Reaction as AutoTST_Reaction | ||
| # except (ImportError, ModuleNotFoundError): | ||
| # HAS_AUTOTST = False | ||
|
|
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment appears to contain commented-out code.
| # try: | |
| # from autotst.reaction import Reaction as AutoTST_Reaction | |
| # except (ImportError, ModuleNotFoundError): | |
| # HAS_AUTOTST = False |
arc/job/adapters/ts/heuristics.py
Outdated
|
|
||
| import datetime | ||
| import itertools | ||
| import os |
Copilot
AI
Nov 28, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'os' is not used.
| import os |
| else: | ||
| raise ValueError("No valid hydrogen atom found.") | ||
|
|
||
| return {} |
Check warning
Code scanning / CodeQL
Unreachable code Warning
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI 29 days ago
The best fix is to delete the final unreachable return {} at line 378 in the get_h_abs_atoms function in arc/job/adapters/ts/crest.py. All execution paths in the function already either return with a dictionary or raise an exception. Removing this line does not alter any functionality, as it cannot be executed under any code path. No imports, method or variable definitions, or other changes are needed—just removal of this single line.
| @@ -375,4 +375,3 @@ | ||
| else: | ||
| raise ValueError("No valid hydrogen atom found.") | ||
|
|
||
| return {} |
Addition of CREST Adapter that complements the heuristic adapter.