Computational NMR Prediction: A Microreview

March 14, 2023

Recently, I’ve been working to assign the relative configuration of some tricky diastereomers, which has led me to do a bit of a deep dive into the world of computational NMR prediction. Having spent the last week or so researching the current state-of-the-art in simulating experimental 1H NMR spectra, I’m excited to share some of my findings.

My main resource in this quest has been a new NMR benchmarking paper, published on March 7th by authors from Merck (and a few other places). Why this paper in particular? Although there have been many NMR benchmarks, not all of these papers are as useful as they seem. Broadly speaking, there are two ways to benchmark NMR shifts: (1) against high-level computed results or (2) against experimental NMR shifts.

The first strategy seems to be popular with theoretical chemists: NMR shifts at a very high level of theory are presumably very accurate, and so if we can just reproduce those values with a cheap method, we will have solved the NMR prediction problem. Of course, effects due to solvation and vibrational motion will be ignored, but these effects can always be corrected for later. In contrast, the second strategy is more useful for experimental chemists: if the calculation is going to be compared to experimental NMR spectra in CDCl3 solution, the match with experiment is much more important than the gas-phase accuracy of the functional employed.

Not only are these two approaches different in theory, they yield vastly different results in practice, as is nicely illustrated by the case of the double-hybrid functional DSD-PBEP86. DSD-PBEP86 was first reported in 2018 by Frank Neese and coworkers, who found it to be much superior to regular DFT methods or MP2-type wavefunction methods at reproducing CCSD(T) reference data.1 A subsequent benchmark by Kaupp and coworkers looked at a much larger set of compounds and confirmed that DSD-PBEP86 was indeed superior at reproducing CCSD(T) data, with a mean absolute error (MAE) for 1H of 0.06 ppm. In contrast, de Oliveira and coworkers found that DSD-PBEP86 and related double-hybrid methods were much worse at predicting experimental 1H NMR shifts, with a MAE of 0.20 ppm, making them no better than conventional DFT approaches.

The difference between these two mindsets is nicely demonstrated by Kaupp’s paper, which dismisses de Oliveira’s work as suffering from “methodological inadequacies” and states:

[Benchmarking] can be done by comparing approximative calculations to experimental data or to data computed using high-level ab initio methodologies. The latter helps to eliminate a number of factors that often complicate the direct comparison against experiment, such as environmental, ro-vibrational, or thermal contributions (possibly also relativistic effects, see below).

While Kaupp is correct that using gas-phase CCSD(T) data does eliminate “environmental” effects (e.g. from solvent), it’s not clear that these effects always ought to be eliminated! Although directly optimizing a computational method to reproduce a bunch of ill-defined environmental effects is perhaps inelegant, it’s certainly pragmatic.

The authors of the 2023 benchmark create a new set of well-behaved reference compounds that avoid troublesome heavy-atom effects (poorly handled by most conventional calculations) or low-lying conformational equilibria, and re-acquire experimental spectra (in chloroform) for every compound in the set. They then score a wide variety of computational methods against this dataset: functionals, basis sets, implicit solvent methods, and more.

In the end, Cramer’s WP04 functional is found to be best, which is perhaps unsurprising given that it was specifically optimized for the prediction of 1H shifts in chloroform.2 The WP04/6-311++G(2d,p)/PCM(chloroform) level of theory is optima, giving an MAE of 0.08 ppm against experiment, but WP04/jul-CC-PVDZ/PCM(chloroform) is cheaper and not much worse. B3LYP-D3/6-31G(d) works fine for geometry optimization, as do wB97X-D/6-31G(d) and M06-2X/6-31G(d).

Based on these results, my final workflow for predicting experimental proton spectra is:

  1. Run a conformational search using crest.
  2. Optimize each conformer using B3LYP-D3BJ/6-31G(d).
  3. Remove duplicate conformers with cctk.ConformationalEnsemble.eliminate_redundant().
  4. Predict NMR shifts for each conformer using WP04/6-311++G(2d,p)/PCM(chloroform).
  5. Combine conformer predictions through Boltzmann weighting, and apply a linear correction.

For small molecules, this workflow runs extremely quickly (just a few hours from start to finish), and has produced good-quality results that solved the problem I was trying to solve.

Nevertheless, the theoreticians have a point—although WP04 can account for a lot of environmental effects (essentially by overfitting to experimental data), there are plenty of systems for which this pragmatic approach cannot succeed. For instance, the DELTA50 dataset intentionally excludes molecules which might exhibit concentration-dependent aggregation behavior, which includes basically anything capable of hydrogen bonding or π–π stacking! If we hope to get beyond a certain level of accuracy, it seems likely that physically correct models of NMR shieldings, solvent effects, and aggregation will be necessary.

Footnotes

  1. CCSD(T) NMR shifts have to be computed in CFOUR.
  2. The WP04 functional is not technically in Gaussian, but can be employed with the following route card: #p nmr=giao BLYP IOp(3/ 76=1000001189,3/77=0961409999,3/78=0000109999) 6-311++G(2d,p) scrf=(pcm,solvent=chloroform).


If you want email updates when I write new posts, you can subscribe on Substack.