Skip to content

Conversation

@kspieks
Copy link
Contributor

@kspieks kspieks commented Nov 27, 2021

This PR brings several improvements

  1. Improves rate estimates by separating the 1,3 sigmatropic rearrangement reaction template into the traditional ketoenol tautomerization and the 1,3 sigmatropic rearrangement. This essentially reverts PR Rename ketoenol to 1,3 sigmatropic rearrangement (Part 3) #532. The Ea for the simple H transfer with a traditional ketoenol reaction is much lower than the Ea for the heavy atom transfers with 1,3 sigmatropic rearrangement. Combining these trees messed up the rate estimates for ketoenol reactions since higher level nodes were averaging wildly different Arrhenius parameters. Now, the average error for ketoenol rate estimation (last line from running the ATG notebook) is much lower than it was before, and even slightly lower when compared to PR Refit Ketoenol Rate Rules #524 which had not changed the template yet.

  2. Adds a thermo library for all species that had been added as new training reactions for these families. Previously, GAV was used to estimate the thermo for the reverse reaction when fitting the rate tree. The new thermo library will improve the reverse rate estimate when fitting the rate tree. All species thermo was calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP with AECs and BACs that been added in PRs AEC BAC for ωB97X-D3/def2-TZVP and B97-D3/def2-mSVP #459 and Update CCSD(T)-F12 AECs and BACs #508.

  3. Performs a thorough conformer search for all species. No new training reactions were added in this PR. The existing training reactions calculated at CCSD(T)-F12a/cc-pVDZ-F12//ωB97X-D3/def2-TZVP from PRs Refit Ketoenol Rate Rules #524, Refit Ketoenol Rate Rules (Part 2) #527, and Rename ketoenol to 1,3 sigmatropic rearrangement (Part 3) #532 were separated into the new reaction templates. However, a thorough conformer search was done with ACS for all reactants, transition states, and products. The previous PRs that had added these reactions did not do a conformer search because the RDKit function CalcNumRotatableBonds incorrectly classified these species as rigid. I should've tested it more thoroughly. Upon closer inspection, the documentation's recommended settings do not give the expected result for all molecules, and it is actually better to use RotatableBondSmarts instead.

The zip file contains the Arkane outputs for thermo calculations and Arrhenius fitting. It also contains the notebooks used to fit the rate trees.
notebooks.zip

@kspieks
Copy link
Contributor Author

kspieks commented Nov 30, 2021

Tagging @davidfarinajr since I noticed he is working on a PR for 1,3 sigmatropic rearrangement

Copy link
Contributor

@davidfarinajr davidfarinajr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @kspieks ! I've had issues with some of the rates for 1,3 sigmatropic rearrangement family recently, and I think this PR will go a long way towards fixing those issues. I had a wide variety of comments on this review 😂 , some are requested changes (squashing a commit for example) and some are just thinking out loud so don't feel like you need to address them all.

2 *3 R!H u0 {1,S} {4,S}
3 *1 R!H u0 {1,D}
4 *4 R u0 {2,S}
4 *4 R!H u0 {2,S}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By not allowing *4 to be H, we will no longer generate reactions with *4H-*3C or *4H-*3N (since *3 is only allowed to be O and S for ketoenol family). Do we want to disallow these types of reactions? It seems the barrier is very high for *4H-*3C (~450 kJ/mol for a rxn I calculated here 0f01765) so maybe they will never be relevant and we shouldn't generate them as estimating them to be faster than they are might cause problems

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea that's a good observation. Technically this PR reduces coverage relative to PR #532. However, the 1,3 sigmatropic template in this PR still expands the coverage of possible reactions that RMG could explore relative to anything before PR #532, so we're not removing legacy reaction templates. Separately, as you point out, these reactions have incredibly high barriers, which almost certainly preclude them from being added to any RMG mechanism. Although we could add training reactions that teach RMG to estimate these ~450 kJ/mol barriers (I had added some in my previous PR), my understanding is that the quality of the estimate (or whether our templates even allow these reactions in the first place) won't have any practical impact since RMG would never include these high barrier reactions.

@@ -1,8 +1,8 @@
#!/usr/bin/env python
# encoding: utf-8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please squash this commit 1b42f5f into first commit 0a37dea that adds this thermo library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea absolutely :)

kinetics = ArrheniusBM(A=(3.50759e-62,'s^-1'), n=22.1275, w0=(750402,'J/mol'), E0=(-20429.3,'J/mol'), Tmin=(300,'K'), Tmax=(1500,'K'), uncertainty=RateUncertainty(mu=-0.8800459367924431, var=31.676078103554833, Tref=1000.0, N=56, data_mean=0.0, correlation='Root_N-1R!H-inRing',), comment="""BM rule fitted to 56 training reactions at node Root_N-1R!H-inRing
Total Standard Deviation in ln(k): 13.494121433083981"""),
label = "Root_1R!H-inRing",
kinetics = ArrheniusBM(A=(9.20285e+105,'s^-1'), n=-26.8654, w0=(654062,'J/mol'), E0=(462603,'J/mol'), Tmin=(300,'K'), Tmax=(1500,'K'), uncertainty=RateUncertainty(mu=1.596114397653623, var=262.29968162840566, Tref=1000.0, N=8, data_mean=0.0, correlation='Root_1R!H-inRing',), comment="""BM rule fitted to 8 training reactions at node Root_1R!H-inRing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An A of 9e105 and n of -26 concerns me a bit, especially because temperature range is 300-1500K and someone might run RMG hotter than that (I think large negative n is better than large positive when extrapolating to T>1500, but still a bit concerning). Do we know why there is such a big spread here? Which rxns are fit to this node and if there are any outliers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A larger spread is expected higher up the tree, so perhaps we should put constraints on the fitted params (-10<n<10 for example) so we don't get large negative or positive values that might fit better over the T range, but extrapolate more poorly? Or we could expand the T range for fitting, but maybe your training reaction kinetics are not valid over that range, or use the collision limit at high T as the high T rate, not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for bringing this up. To summarize the discussion you, Matt, and I had this morning at RMG developer hours, we ultimately care about the fit quality so it's okay if some of the fitted parameters are odd (very very odd haha) if they give a good fit. Regarding the temperature range, I will override the hardcoded values (link) from the fit_to_reactions method in the ArrheniusBM class to include 2000K. For ketoenol, 14 of the 17 reactions are valid from 300K to 2000K so I'd argue that it makes sense to ask the other 3 training reactions to extrapolate to 2000K rather than restrict all reactions to 1500 K. Similarly, 11 of the 12 training reactions for 1,3 sigmatropic were also fit to 3 parameter Arrhenius from 300K to 2000K. For now, I left these as separate commits so you can see that changes. Including 2000K actually made the fitted parameters more reasonable. I can squash these before you merge

['FORM_BOND', '*4', 1, '*1'],
])

boundaryAtoms = ["*2", "*3"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need boundary atoms? I'm not sure how these are currently used in RMG. Based on the Root, it looks like this would only match a 1,3 rearrangement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for asking about this at RMG developer hours today. To summarize your discussion, boundary atoms are deprecated. Since ATG, doesn't use them, I will remove this line.

Many reactants in this dataset are not unique. For these cases, only the first instance is included.
@kspieks
Copy link
Contributor Author

kspieks commented Dec 8, 2021

Thanks for reviewing @davidfarinajr ! I appreciate your comments and have left replies. For now, I pushed fixup commits that I can squash once this is ready to be merged

@kspieks kspieks force-pushed the separate_1_3_sigmatropic branch from 742f5d6 to 5d504aa Compare December 8, 2021 01:21
@davidfarinajr
Copy link
Contributor

Thanks for addressing my comments @kspieks ! This PR looks good to me! By including 2000K into the fit, it looks like the rate changes by an order of magnitude over 1500-2000 K (for this particular example anyways),
Screen Shot 2021-12-08 at 11 32 30 AM

So perhaps we should include 2000K point when fitting BM rules for default Trange settings.

I think this is ready to merge, if you want to squash those commits, I'll merge it in. Thanks!

@davidfarinajr davidfarinajr self-requested a review December 8, 2021 16:37
Reinstate the training reactions that were previously in this family (originally added in PR #273). Since this reaction family will exclusively handle ketoenol tautomerization reactions, training reactions from PR #216 will remain in the 1,3 sigmatropic rearrangement reaction family.
R!H ensures this family does not overlap with the new ketoenol template

Also remove the previous ketoenol training reactions that were previously merged into this template
@kspieks kspieks force-pushed the separate_1_3_sigmatropic branch from 5d504aa to 694a7dc Compare December 9, 2021 01:43
@kspieks
Copy link
Contributor Author

kspieks commented Dec 9, 2021

Thanks @davidfarinajr! I squashed the commits
I can make a mini PR tomorrow for adding 2000K to the ArrheniusBM :)

@davidfarinajr
Copy link
Contributor

I almost forgot about adding Ketoenol family back to recommended families 🤦 . Just did that here c47c48b. I'll merge when tests pass this time ✅

@davidfarinajr davidfarinajr merged commit b6cfa54 into main Dec 9, 2021
@davidfarinajr davidfarinajr deleted the separate_1_3_sigmatropic branch December 9, 2021 19:13
@kspieks
Copy link
Contributor Author

kspieks commented Dec 9, 2021

Thanks for reviewing and merging!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants