The Two Cultures in Atomistic Simulation

July 28, 2023

TW: stereotypes about molecular dynamics.

In his fantastic essay “The Two Cultures,” C. P. Snow observed that there was (in 1950s England) a growing divide between the academic cultures of science and the humanities:

Literary intellectuals at one pole—at the other scientists, and as the most representative, the physical scientists. Between the two a gulf of mutual incomprehension—sometimes (particularly among the young) hostility and dislike, but most of all lack of understanding. They have a curious distorted image of each other. Their attitudes are so different that, even on the level of emotion, they can't find much common ground.

He reflects on the origins of this phenomenon, which he contends is new to the 20th century, and argues that it ought to be opposed:

This polarisation is sheer loss to us all. To us as people, and to our society. It is at the same time practical and intellectual and creative loss, and I repeat that it is false to imagine that those three considerations are clearly separable. But for a moment I want to concentrate on the intellectual loss.

Snow’s essay is wonderful: his portrait of a vanishing cultural intellectual unity should inspire us all, scientists and otherwise, to improve ourselves, and the elegiac prose reminds the reader that even the best cultural institutions are fragile and fleeting things.

I want to make an analogous—but much less powerful—observation about the two cultures present in atomistic simulation. I’ll call these the “QM tribe” and the “MD tribe” for convenience: roughly, “people who use Gaussian/ORCA/Psi4 for their research” and “people who use Schrödinger/AMBER/OpenMM/LAMMPS for their research,” respectively. Although this dichotomy is crude, I contend there are real differences between these two groups, and that their disunity hurts scientific progress.

The Nature of Energy Surfaces

The most fundamental disagreement between these two cultures, I think, is in how they view energy surfaces. Most QM-tribe people think in terms of optimizing to discrete critical points on the potential energy surface (PES): one can perform some sort of gradient-informed optimization to a ground state, or follow the eigenvector of a negative Hessian eigenvalue to a transition state.

Implicit in this approach is the assumption that there exist well-defined critical points on the PES, and that finding such critical points is meaningful and productive. Conformers exist, and many people now compute properties as Boltzmann-weighted averages over conformational ensembles, but this is usually done for 10–100 conformers, not thousands or millions. Entropy and solvation, if they’re considered at all, are viewed as corrections, not key factors: since QM is so frequently used to study high-barrier bond-breaking processes where enthalpic factors dominate, one can often get reasonable results with cartoonish treatments of entropy.
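To make the QM-tribe picture concrete, here’s a minimal sketch on a made-up two-dimensional surface (not any real molecule): steepest descent from an arbitrary starting point down to a minimum, then classification of critical points by counting negative Hessian eigenvalues (zero for a minimum, one for a transition state). The surface and all function names are hypothetical, and real QM codes use quasi-Newton optimizers and dedicated transition-state searches rather than plain steepest descent.

```python
import math

# Hypothetical 2D model surface (not a real molecule): a double well along x plus a
# harmonic valley along y. Minima at (+1, 0) and (-1, 0); index-1 saddle at (0, 0).
def energy(x, y):
    return (x * x - 1.0) ** 2 + 2.0 * y * y

def gradient(x, y):
    return 4.0 * x * (x * x - 1.0), 4.0 * y

def hessian(x, y):
    # Analytic second derivatives; the mixed term is zero because the surface is separable.
    return (12.0 * x * x - 4.0, 0.0), (0.0, 4.0)

def eigenvalues_2x2(h):
    # Closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]].
    (a, b), (_, d) = h
    mean = 0.5 * (a + d)
    half_gap = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - half_gap, mean + half_gap

def steepest_descent(x, y, step=0.05, tol=1e-8, max_iter=10_000):
    # Walk downhill along the negative gradient until the gradient norm drops below tol.
    for _ in range(max_iter):
        gx, gy = gradient(x, y)
        if math.hypot(gx, gy) < tol:
            break
        x, y = x - step * gx, y - step * gy
    return x, y

def classify(x, y):
    # Only meaningful at a critical point: the number of negative Hessian
    # eigenvalues (the "index") labels what kind of stationary point it is.
    n_negative = sum(ev < 0 for ev in eigenvalues_2x2(hessian(x, y)))
    if n_negative == 0:
        return "minimum"
    if n_negative == 1:
        return "transition state"
    return f"index-{n_negative} saddle"

x_min, y_min = steepest_descent(0.3, 0.5)
print(x_min, y_min, classify(x_min, y_min))  # lands near (1, 0), a minimum
print(classify(0.0, 0.0))                    # the saddle between the two wells
```

Everything here is a discrete, visualizable point on the surface, which is exactly the mindset described above.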

In contrast, MD-tribe scientists generally don’t think about transition states as specific configurations of atoms—rather, a transition state can emerge from some sort of simulation involving biased sampling, but it’s just a position along some abstract reaction coordinate, rather than a structure which can be visualized in CYLview. Any information gleaned is statistical, rather than concretely visual (e.g. “what is the mean number of hydrogen bonds to this oxygen near this transition state”).
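The MD-tribe picture can be sketched as a free-energy profile along a chosen coordinate s, estimated from sampled configurations via F(s) = -kT ln P(s), with the “transition state” being nothing more than the maximum of that profile. This is a toy under stated assumptions: a hypothetical 1D double-well potential in reduced units (kT = 1) with a barrier low enough (2 kT) that plain, unbiased Metropolis Monte Carlo crosses it. A real study would use biased sampling (umbrella sampling, metadynamics, etc.) plus reweighting.

```python
import math
import random

KT = 1.0  # thermal energy in reduced units (assumption for this toy model)

def potential(s, height=2.0):
    # Hypothetical 1D double well: minima at s = -1 and s = +1, barrier of `height` at s = 0.
    return height * (s * s - 1.0) ** 2

def metropolis_samples(n_steps, step=0.5, seed=0):
    rng = random.Random(seed)
    s, u = -1.0, potential(-1.0)
    samples = []
    for _ in range(n_steps):
        trial = s + rng.uniform(-step, step)
        u_trial = potential(trial)
        # Accept with probability min(1, exp(-(U_trial - U) / kT)).
        if u_trial <= u or rng.random() < math.exp(-(u_trial - u) / KT):
            s, u = trial, u_trial
        samples.append(s)
    return samples

def free_energy_profile(samples, lo=-2.0, hi=2.0, n_bins=40):
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for s in samples:
        i = int((s - lo) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    centers = [lo + (i + 0.5) * width for i in range(n_bins)]
    # F(s) = -kT ln P(s); empty bins get +inf. Shifting so the lowest bin is zero
    # also removes the unknown normalization constant.
    f = [-KT * math.log(c) if c else math.inf for c in counts]
    f_min = min(f)
    return centers, [x - f_min for x in f]

def barrier(centers, f):
    # Locate the lowest bin on each side, then the highest bin between them.
    left = min((i for i, s in enumerate(centers) if s < 0), key=lambda i: f[i])
    right = min((i for i, s in enumerate(centers) if s > 0), key=lambda i: f[i])
    top = max(range(left, right + 1), key=lambda i: f[i])
    return centers[top], f[top]

centers, f = free_energy_profile(metropolis_samples(200_000))
s_ts, barrier_height = barrier(centers, f)
print(f"barrier near s = {s_ts:+.2f}, height ~ {barrier_height:.2f} kT")
```

Note that the output is a bin on a histogram, not a structure: nothing here corresponds to a single geometry you could render.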

Unlike the QM tribe, MD-tribe scientists generally cannot study bond-breaking processes, and so focus on conformational processes (protein folding, aggregation, nucleation, transport) where entropy and solvation are of critical importance: as such, free energy is almost always taken into consideration by MD-tribe scientists, and the underlying PES is rarely (to my knowledge) viewed as a worthy topic of study in and of itself.
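Why does free energy, rather than the PES, matter so much here? A toy illustration, assuming two hypothetical one-dimensional harmonic basins in reduced units: basin A is 1 kT lower in potential energy but ten times narrower than basin B. Computing each basin’s free energy as F = -kT ln Z (with Z integrated numerically) shows the ordering flipping once the width of each well, i.e. its entropy, is included. For harmonic wells with force constants k_A and k_B the analytic answer is ΔF = ΔE - (kT/2) ln(k_A/k_B).

```python
import math

KT = 1.0  # reduced units (assumption)

def basin_free_energy(u, lo, hi, n=20_001):
    # F = -kT ln Z, with Z approximated by a Riemann sum of exp(-U/kT) over the basin.
    dx = (hi - lo) / (n - 1)
    z = sum(math.exp(-u(lo + i * dx) / KT) for i in range(n)) * dx
    return -KT * math.log(z)

# Two hypothetical harmonic basins (1D, reduced units):
# A is lower in potential energy but narrow; B is 1 kT higher but 10x wider.
def u_a(x):
    return 0.5 * 100.0 * x * x

def u_b(x):
    return 1.0 + 0.5 * 1.0 * x * x

delta_e = u_b(0.0) - u_a(0.0)  # difference between the two PES minima
delta_f = basin_free_energy(u_b, -10.0, 10.0) - basin_free_energy(u_a, -10.0, 10.0)
print(f"dE = {delta_e:+.2f} kT, dF = {delta_f:+.2f} kT")  # dE = +1.00 kT, dF = -1.30 kT
```

The higher-energy minimum wins on free energy; with many soft degrees of freedom (as in a solvated protein) effects like this stack up, which is why the MD tribe rarely looks at the bare PES.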

Molecular Representations

This divide also affects how the two cultures view the task of molecular representation. QM-tribe scientists generally view a list of coordinates and atomic numbers as the most logical representation of a molecule (perhaps with charge and multiplicity information). To the QM tribe, a minimum on the PES represents a structure, and different minima naturally ought to have different representations. Bonding and bond order are not specified, because QM methods can figure that out without assistance (and it’s not uncommon for bonds to change in a QM simulation anyway).

In contrast, people in the MD tribe generally want a molecular representation that’s independent of conformation, since many different conformations will intrinsically be considered. (See Connor Coley’s presentation from a recent MolSSI workshop for a discussion of this.) Thus, it’s common to represent molecules through their topology, where connectivity and bond order are explicitly specified. This allows for some pretty wild simulations of species that would be reactive in a QM simulation, but also means that e.g. tautomers can be a massive problem in MD (ref), since protons can’t equilibrate freely.
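The two representations can be sketched as plain data structures. The classes and fields below are hypothetical (they are not RDKit’s or OpenMM’s actual objects): a QM-style `Geometry` is just atomic numbers, coordinates, charge, and multiplicity, while an MD-style `Topology` carries explicit bonds with bond orders and no coordinates at all. The keto/enol pair illustrates the tautomer problem: moving one proton yields a different `Topology`, which a fixed-topology simulation started from the other can never reach. Requires Python 3.9+ for the built-in generic annotations.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Geometry:
    """QM-style: atomic numbers + Cartesian coordinates (Å), plus charge and multiplicity."""
    atomic_numbers: tuple[int, ...]
    coords: tuple[tuple[float, float, float], ...]
    charge: int = 0
    multiplicity: int = 1

@dataclass(frozen=True)
class Topology:
    """MD-style: elements plus explicit (i, j, bond_order) bonds with i < j; no coordinates."""
    atomic_numbers: tuple[int, ...]
    bonds: frozenset[tuple[int, int, int]]
    formal_charge: int = 0

def make_topology(atomic_numbers, bonds, formal_charge=0):
    # Normalize each bond so that i < j, making equality independent of input order.
    normalized = frozenset((min(i, j), max(i, j), order) for i, j, order in bonds)
    return Topology(tuple(atomic_numbers), normalized, formal_charge)

# A water geometry: nothing but elements and positions (approximate, for illustration).
water = Geometry((8, 1, 1), ((0.0, 0.0, 0.0), (0.957, 0.0, 0.0), (-0.240, 0.927, 0.0)))

# Acetaldehyde vs. vinyl alcohol (C2H4O) with a shared, hypothetical atom numbering:
# 0 = C, 1 = C, 2 = O, 3-6 = H; H5 is the proton that moves from C0 to O2.
ELEMENTS = (6, 6, 8, 1, 1, 1, 1)
keto = make_topology(ELEMENTS, [(0, 1, 1), (1, 2, 2), (0, 3, 1), (0, 4, 1), (0, 5, 1), (1, 6, 1)])
enol = make_topology(ELEMENTS, [(0, 1, 2), (1, 2, 1), (0, 3, 1), (0, 4, 1), (2, 5, 1), (1, 6, 1)])

print(keto == enol)  # False: same atoms, different topologies
```

Any number of conformers of acetaldehyde share the one `keto` object, which is the conformation-independence the MD tribe wants; the price is that the proton’s position is baked in.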

For property prediction, an uneasy compromise can be reached wherein one takes a SMILES string, performs a conformational search, and then Boltzmann-averages properties over all different conformers, to return a set of values which are associated only with the SMILES string and not any individual conformation. (Matt Sigman does this, as does Bobby Paton for NMR prediction.) This is a lot of work, though.
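The Boltzmann-averaging step itself is simple arithmetic once the conformers are in hand. A minimal sketch, assuming relative conformer energies in kcal/mol (whether to use electronic energies or free energies is a methodological choice that varies between workflows) and one property value per conformer, such as a computed NMR shift. The function names and numbers are illustrative, not taken from any particular group’s code.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol·K)

def boltzmann_weights(rel_energies_kcal, temperature=298.15):
    kt = R_KCAL * temperature
    e_min = min(rel_energies_kcal)
    # Subtract the minimum before exponentiating to avoid overflow/underflow.
    factors = [math.exp(-(e - e_min) / kt) for e in rel_energies_kcal]
    z = sum(factors)
    return [f / z for f in factors]

def boltzmann_average(values, rel_energies_kcal, temperature=298.15):
    # The result is tied to the molecule (e.g. its SMILES), not to any one conformer.
    weights = boltzmann_weights(rel_energies_kcal, temperature)
    return sum(w * v for w, v in zip(weights, values))

# Hypothetical ensemble: three conformers and a made-up 13C shift (ppm) for each.
energies = [0.0, 0.5, 1.5]
shifts = [72.1, 70.4, 75.0]
print(f"{boltzmann_average(shifts, energies):.2f} ppm")
```

The arithmetic is cheap; the expensive part is generating and optimizing the conformers, which is why this is “a lot of work.”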

“A Gulf Of Mutual Incomprehension”

These differences also become apparent when comparing the software packages that different tribes use. Take, for instance, the task of predicting partial charges for a given small molecule. A QM-tribe scientist would expect these charges to be a function of the geometry, whereas an MD-tribe scientist would want the results to be explicitly geometry-independent (e.g.) so that they can be used for subsequent MD simulations.
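One way to bridge the two views is to collapse geometry-dependent charges into a single geometry-independent set tied to the topology. The sketch below is a deliberately simple assumption, not what any particular package does (production workflows use schemes such as RESP fitting or AM1-BCC): average each atom’s charge over conformers, make topologically equivalent atoms share one charge, and restore the total charge. The per-conformer charges for methanol are made-up numbers.

```python
def conformer_independent_charges(per_conformer, equivalent_groups, total_charge=0.0):
    n_atoms = len(per_conformer[0])
    # 1. Average each atom's charge over conformers (equal weights here; Boltzmann
    #    weights would be an equally reasonable choice).
    mean = [sum(q[i] for q in per_conformer) / len(per_conformer) for i in range(n_atoms)]
    # 2. Force topologically equivalent atoms (e.g. the three methyl H's) to share one
    #    charge; averaging within a group leaves the total unchanged.
    for group in equivalent_groups:
        shared = sum(mean[i] for i in group) / len(group)
        for i in group:
            mean[i] = shared
    # 3. Spread any residual evenly so the total matches the formal charge exactly.
    residual = (total_charge - sum(mean)) / n_atoms
    return [q + residual for q in mean]

# Hypothetical methanol charges, atom order C0, O1, H2 (hydroxyl), H3-H5 (methyl),
# "computed" on two different conformers.
methanol_conformers = [
    [0.12, -0.60, 0.40, 0.02, 0.03, 0.03],
    [0.10, -0.58, 0.39, 0.05, 0.01, 0.03],
]
charges = conformer_independent_charges(methanol_conformers, [(3, 4, 5)])
print([round(q, 3) for q in charges])
```

The output no longer refers to any geometry, which is what an MD engine needs, but it also throws away exactly the conformational dependence a QM-tribe scientist would consider the interesting part.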

The assumptions implicit to these worldviews mean that it’s often quite difficult to go from QM-tribe software packages to MD-tribe software packages or vice versa. I’ve been stymied before by trying to get OpenMM and openforcefield to work on organic molecules for which I had a list of coordinates and not e.g. a SMILES string—although obviously coordinates will at some point be needed in the MD simulation, most workflows expect you to start from a topology and not an xyz file.
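Going from an xyz file toward a topology starts with perceiving connectivity from interatomic distances. A minimal sketch, assuming single-bond covalent radii from Cordero et al. (2008) for a handful of elements and a heuristic 0.4 Å tolerance (both are assumptions, and the function names are hypothetical). It recovers which atoms are bonded but not bond orders or formal charges, which are the harder part; toolkits such as RDKit (its `rdDetermineBonds` module) and Open Babel offer more complete perception.

```python
import math

# Single-bond covalent radii in Å (Cordero et al. 2008); only a few elements for brevity.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66}

def parse_xyz(text):
    # Standard xyz layout: atom count, a comment line, then "symbol x y z" per atom.
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2 : 2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, (float(x), float(y), float(z))))
    return atoms

def perceive_bonds(atoms, tolerance=0.4):
    # Call two atoms bonded if they are closer than the sum of their radii plus a tolerance.
    bonds = set()
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            (si, ri), (sj, rj) = atoms[i], atoms[j]
            cutoff = COVALENT_RADII[si] + COVALENT_RADII[sj] + tolerance
            if math.dist(ri, rj) < cutoff:
                bonds.add((i, j))
    return bonds

WATER_XYZ = """3
water
O  0.000  0.000  0.000
H  0.957  0.000  0.000
H -0.240  0.927  0.000
"""
print(sorted(perceive_bonds(parse_xyz(WATER_XYZ))))  # [(0, 1), (0, 2)]
```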

Similarly, it’s very difficult to get the graphics package NGLView to illustrate the process of bonds breaking and forming—NGLView is typically used for MD, and expects that the system’s topology will be defined at the start of the simulation and never changed. (There are kludgy workarounds, like defining a new object for every frame, but it’s nevertheless true that NGLView is not made for QM-tribe people.)
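The mismatch with fixed-topology viewers is easy to see if connectivity is recomputed frame by frame. A minimal sketch, assuming a single hypothetical 1.3 Å bond cutoff for every atom pair and a made-up proton-transfer trajectory; it reports the frames where bonds form or break, which is exactly the information a viewer that fixes the topology at the first frame cannot represent. It does not use NGLView’s API at all.

```python
import math

def bonds_in_frame(coords, cutoff=1.3):
    # Hypothetical uniform distance cutoff in Å; real tools use element-specific radii.
    n = len(coords)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if math.dist(coords[i], coords[j]) < cutoff
    }

def bond_changes(trajectory, cutoff=1.3):
    # Return (frame, formed, broken) for every frame whose connectivity differs from the last.
    frames = [bonds_in_frame(coords, cutoff) for coords in trajectory]
    events = []
    for t in range(1, len(frames)):
        formed = sorted(frames[t] - frames[t - 1])
        broken = sorted(frames[t - 1] - frames[t])
        if formed or broken:
            events.append((t, formed, broken))
    return events

# Hypothetical proton transfer O0-H1...O2 -> O0...H1-O2, oxygens fixed 2.6 Å apart on x.
trajectory = [
    [(0.0, 0.0, 0.0), (x, 0.0, 0.0), (2.6, 0.0, 0.0)]
    for x in (1.0, 1.2, 1.4, 1.6)
]
for frame, formed, broken in bond_changes(trajectory):
    print(f"frame {frame}: formed {formed}, broken {broken}")  # one event, at frame 2
```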

(I’m sure that MD-tribe people are very frustrated by QM software as well, but I don’t have as much experience going in this direction. In general, MD tooling seems quite a bit more advanced than QM-tribe tooling; most MD people I’ve talked to seem to interact with QM software as little as possible, and I can’t say I blame them.)

“A Curious Distorted Image of Each Other”

There are also cultural factors to consider here. The questions that QM-tribe scientists think about are different than those that MD-tribe scientists think about: a somewhat famous QM expert once told me that they were “stuck on an ivory tower where people hold their nose when it comes to DFT, forget anything more approximate,” whereas MD-tribe scientists often seem alarmingly unconcerned about forcefield error but are obsessed with proper sampling and simulation convergence.

It seems that most people have only a vague sense of what their congeners in the other tribe actually work on. I don’t think most QM-tribe scientists I know have ever run or analyzed a regular molecular dynamics simulation using e.g. AMBER or OpenMM, nor do I expect that most MD-tribe scientists have tried to find a transition state in Gaussian or ORCA. In theory, coursework could remedy this, but education for QM alone already seems chaotic and ad hoc—trying to cram in MD, statistical mechanics, etc. is probably ill-advised at present.

Social considerations also play a role. There’s limited crosstalk between the two fields, especially at the trainee level. How many QM people even know who Pratyush Tiwary is, or Michael Shirts, or Jay Ponder? How many MD graduate students have heard of Frank Neese or Troy Van Voorhis? As always, generational talent manages to transcend narrow boundaries—but rank-and-file scientists would benefit immensely from increased contact with the other tribe.

“Unite Them”

I’m not an expert on the history of chemistry, but my understanding is that the two fields were not always so different: Martin Karplus, Arieh Warshel, and Bill Jorgensen, key figures in the development of modern MD, were also formidable quantum chemists. (If any famous chemists who read this blog care to share their thoughts on this history, please email me: you know who you are!)

And as the two fields advance, I think they will come closer together once more. As QM becomes capable of tackling larger and larger systems, QM-tribe scientists will be forced to deal with more and more complicated conformational landscapes: modern enantioselective catalysts routinely have hundreds of ground-state complexes to consider (ref), and QSimulate and Amgen recently reported full DFT calculations on protein–ligand complexes (ref).

Similarly, the increase in computing power means that many MD use cases (like FEP) are now limited not by insufficient sampling but by the poor energetics of the forcefields they employ. This is difficult to prove unequivocally, but I’ve heard this in interviews with industry folks, and there are certainly plenty of references complaining about poor forcefield accuracy (1, 2): a Psivant review dryly notes that “historically solvation energy errors on the order of 2–3 kcal/mol have been considered to be accurate,” which is hardly encouraging.
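To put that number in perspective: an error ΔΔG in a free energy becomes a multiplicative error of exp(ΔΔG/RT) in any equilibrium constant that depends on it. At 298 K, RT ≈ 0.593 kcal/mol, so errors of 2 to 3 kcal/mol correspond to roughly 30- to 160-fold errors. A quick check (the function name is just for illustration):

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol·K)

def fold_error(delta_g_error_kcal, temperature=298.15):
    """Multiplicative error in an equilibrium constant implied by a free-energy error."""
    return math.exp(delta_g_error_kcal / (R_KCAL * temperature))

for err in (1.0, 2.0, 3.0):
    print(f"{err:.0f} kcal/mol -> factor of ~{fold_error(err):.0f}")
```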

Many QM-tribe professors now work on dynamics: Dean Tantillo and Todd Martinez (who have long been voices “crying out in the wilderness” for dynamics) perhaps most prominently, but also Steven Lopez, Daniel Ess, Fernanda Duarte, Peng Liu, etc. And MD-tribe professors seem more and more interested in using ML mimics of QM to replace forcefields (e.g.), which will inevitably lead them down the speed–accuracy rabbit hole that is quantum chemistry. So it seems likely to me that the two fields will increasingly reunite, and that being a good 21st-century computational chemist will require competency in both areas.

If this is true, the conclusions for individual computational chemists are obvious: learn techniques outside your specialty, before you get forcibly dragged along by the current of scientific progress! There’s plenty to learn from the other culture of people that deals with more-or-less the same scientific problems you do, and no reason to wait.

As a denizen of quantum chemistry myself, I apologize for any misrepresentations or harmful stereotypes about practitioners of molecular dynamics, for whom I have only love and respect. I would be happy to hear any corrections over email.
