I’m writing my dissertation right now, and as a result I’m going back through a lot of old slides and references to fill in details that I left out for publication.
One interesting question that I’m revisiting is the following: when protonating benzaldehyde, what is the H/D equilibrium isotope effect at the aldehyde proton? This question was relevant for the H/D KIE experiments we conducted in our study of the asymmetric Prins cyclization. (The paper hasn’t gotten much attention, but it’s probably the most “classic” organic chemistry paper I’ve worked on, with a minimum of weird computational details or bizarre analytical techniques.)
Since the C–H/D bond isn’t made or broken in the reaction, we won’t see a primary effect; we have to think in terms of secondary effects instead. The most common cause of a secondary isotope effect is a change in hybridization: sp3 to sp2 gives a normal effect, whereas sp2 to sp3 gives an inverse effect. From this perspective, it looks like the effect should be unity, since the carbon in question is sp2 in both structures.
Reality, however, disagrees. Hall and Milosevich report an EIE of 0.94 for benzaldehyde in aq. sulfuric acid, and Gajewski and co-authors compute an EIE of 0.83 for acetaldehyde at the MP2/6-31G(d,p) level of theory. I performed my own calculations at the M06-2X/jun-cc-pVTZ level of theory and obtained an EIE of 0.851 with PyQuiver, qualitatively consistent with the above results.
Where does this EIE come from? It’s helpful to think of benzaldehyde as possessing multiple resonance forms:
We typically think of the neutral resonance form on the top left, but you can also imagine putting a positive charge on carbon and a negative charge on oxygen to create a zwitterion with a C–O single bond (bottom left). In neutral benzaldehyde, this resonance form is substantially disfavored, but in protonated benzaldehyde it doesn’t look any worse than the “normal” top resonance form!
If this is true, we’d expect the C–O bond order to decrease from 2 in neutral benzaldehyde to ~1.5 in protonated benzaldehyde. Indeed, in my calculations the bond length increases from 1.20 Å to 1.28 Å upon protonation—so it seems the double bond character is decreasing! It’s not quite the same as going from sp2 to sp3, but the inverse EIE begins to make sense.
(This is purely guesswork, but my guess would be that the differences between the two structures are attenuated in a polar solvent like water. The zwitterionic resonance form of the neutral structure will be stabilized and thus the neutral aldehyde will be more polar, making the change to the oxocarbenium less drastic. This might explain why the measured EIE in water is smaller—although this might also be due to counterion effects, or something completely unrelated.)
Let’s go a level deeper. According to Streitwieser, secondary KIEs associated with changes in hybridization originate from the creation or destruction of the c. 800 cm-1 out-of-plane bending vibrations of Csp2–H hydrogens, which are markedly lower in frequency than the c. 1350 cm-1 bending vibrations associated with Csp3–H hydrogens.
Raising the frequency of a mode increases the energy of its ground vibrational state (the “zero-point energy,” or ZPE)—but deuterium is heavier and vibrates more slowly, meaning that it possesses less ZPE and is less affected by these changes. So when an 800 cm-1 sp2 mode transforms to a 1350 cm-1 sp3 mode, the ZPE increases, but less for D than for H, so D is favored. Conversely, when a 1350 cm-1 sp3 mode transforms to an 800 cm-1 sp2 mode, the ZPE decreases, but less for D than for H, so H is favored. (For a more complete explanation, see this presentation by Rob Knowles.)
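In harmonic-oscillator terms, the bookkeeping looks roughly like this (my own gloss, keeping only the zero-point-energy contribution rather than the full Bigeleisen–Mayer treatment, and taking the usual ~1.35 ratio of C–H to C–D frequencies):

$$ \mathrm{ZPE} = \tfrac{1}{2}hc\,\tilde{\nu}, \qquad \tilde{\nu} \propto \sqrt{k/\mu} \;\Rightarrow\; \tilde{\nu}_\mathrm{D} \approx \tilde{\nu}_\mathrm{H}/1.35 $$

$$ \frac{K_\mathrm{H}}{K_\mathrm{D}} \approx \exp\!\left(-\,\frac{\Delta\mathrm{ZPE}_\mathrm{H} - \Delta\mathrm{ZPE}_\mathrm{D}}{k_\mathrm{B}T}\right) $$

where ΔZPE is the change in a given mode’s zero-point energy upon protonation. A mode whose frequency rises costs H more ZPE than D, favoring D (an inverse effect); a mode whose frequency falls favors H (a normal effect).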
This effect is complicated for benzaldehyde by the fact that the out-of-plane bend of the aldehyde couples to the out-of-plane bend of the phenyl ring, so there are several modes involving out-of-plane vibration of the aldehyde proton. When I compared the out-of-plane bend of the aldehyde H in both structures, I saw only minimal differences: 771, 963, 1040, and 1051 cm-1 for the neutral species, as compared to 790, 1003, and 1061 cm-1 for the protonated species. These small differences can’t be responsible for the observed effect.
In contrast, the in-plane C–H bend shows a big change—1430 cm-1 for benzaldehyde, but 1644 cm-1 for the oxocarbenium (it seems to couple to the C–O stretch; the reduced mass increases from 1.26 amu to 3.52 amu). Applying Streitwieser’s formula for estimating the isotope effect for a specific mode gives a pretty good match:
kH/kD ≈ exp(0.187/T * ∆ν) = exp(0.187/298 * (-214)) = 0.87
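As a sanity check, here’s that arithmetic in a few lines of Python (the function name is mine; the 0.187 K·cm prefactor is just (hc/2kB)(1 − 1/1.35) ≈ 0.19 K·cm, consistent with the ZPE bookkeeping above):

```python
import math

def streitwieser_eie(delta_nu_cm, temperature_k=298.0):
    """Estimate a per-mode H/D isotope effect from the change in a C-H
    vibrational frequency (in cm^-1), using the approximate prefactor above."""
    return math.exp(0.187 / temperature_k * delta_nu_cm)

# in-plane C-H bend: 1430 cm^-1 (benzaldehyde) vs. 1644 cm^-1 (oxocarbenium)
print(streitwieser_eie(1430 - 1644))  # ~0.87, an inverse effect
```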
I don’t understand this area well enough to comment on why there’s a change in the in-plane vibrational frequency and not the out-of-plane vibrational frequency, nor do I understand how to deconvolute the effects of mode-to-mode coupling. Nevertheless, this provides a tentative physical rationale for the observation.
On a more abstract level, this case study illustrates why isotope effects are such a good tool. Any transformation that perturbs the vibrational frequencies of a given molecule can, in principle, be monitored by isotope effects without affecting the electronic energy surface at all. So, although the precise nature and magnitude of the effect might be hard to predict a priori, it’s not surprising that a transformation as dramatic as protonating a functional group produces a sizable isotope effect.
I frequently wonder what the error bars on my life choices are. What are the chances I ended up a chemist? A scientist of any type? Having two children in graduate school?
If I had the ability, I would want to restart the World Simulator from the time I started high school, run a bunch of replicates, and see what happened to me in different simulations. And this wouldn’t just be useful for me personally—there are lots of things in the world that are just as contingent and path-dependent as one’s life choices. What would have happened if Charles the Bold hadn’t died in 1477 and Burgundy had preserved its independence? If the 1787 convention were rerun several times, how might the US Constitution differ?
Sadly, we’ll never know the answer to these questions. But what we can do instead is find cases in which analogous institutions evolved in parallel, and try to learn from the similarities and differences between them. It’s an imperfect substitute for rerunning the World Simulator, but it’s still pretty cool. (This is far from an original idea: see for instance Legal Systems Very Different From Ours.)
Lately, I’ve come to think about the tech/startup world as somewhat parallel to academic science in this manner. Why? For one, both tech and academia deal with hard problems that demand obscure/arcane domain-specific knowledge inaccessible to non-experts. (It’s true that the problems are typically scientific in academia and engineering-related in tech, but I’ve argued previously that this distinction is flimsier than it seems.) And in both fields, a few high performers vastly outperform the rest of the field, be it a “10x engineer” or a Nobel laureate.
Startups, like academic labs, are small and agile institutions which face the task of raising money, building a team, selecting a hard yet solvable problem, and finding a solution all within a few years. In both cases, too, there are nonlinear returns to success: moderate success is not much better than failure, pushing founders/assistant professors to be as ambitious as possible.
If we accept these two fields as vaguely analogous, what interesting differences can we observe?
I’ll quote from an essay by Paul Graham, founder of Y Combinator and noted startup sage:
Have you ever noticed how few successful startups were founded by just one person? Even companies you think of as having one founder, like Oracle, usually turn out to have more. It seems unlikely this is a coincidence.
What's wrong with having one founder? To start with, it's a vote of no confidence. It probably means the founder couldn't talk any of his friends into starting the company with him. That's pretty alarming, because his friends are the ones who know him best.
But even if the founder's friends were all wrong and the company is a good bet, he's still at a disadvantage. Starting a startup is too hard for one person. Even if you could do all the work yourself, you need colleagues to brainstorm with, to talk you out of stupid decisions, and to cheer you up when things go wrong.
Ever since I read this, I’ve wondered why no labs ever have multiple PIs. I guess this would mess with the semi-feudal organization of university bureaucracy, but it doesn’t seem intrinsically bad—after all, lots of startups seem to do just fine.
The VC strategy, as I understand it, is basically “fund a bunch of companies, and one or two of them will make it all worth our while.” This is a little bit different than how universities approach hiring assistant professors: each university will typically hire a small number of professors each year, after much deliberation, and they have a pretty high likelihood of giving them tenure, at least relative to the likelihood of any given startup succeeding. (Basically, startups are r-selected, whereas academic labs are K-selected.)
There are a lot of reasons why this might be. For one, faculty members aren’t just trying to pick a winner but also their future colleague, so personal considerations probably matter more. Failure in science seems more cruel, too: a failed startup founder can often negotiate the “sale” of their company and parlay that into a new job, and the constant churn of tech means there are always openings for talented ex-startup employees, but a tenure denial takes a toll on professor and students alike.
A hypothesis for why the success rate for new labs is so much higher than the success rate for new businesses is that many labs only succeed a little bit. They don’t actually achieve what they dreamed about in their initial proposals, but they pivot and accrue enough publications and cachet to earn tenure nevertheless. In business, it seems harder to succeed a little bit—the market is a harsher critic than one’s peers.
Paul Graham again, this time talking about the dangers of fundraising:
Raising money is terribly distracting. You're lucky if your productivity is a third of what it was before. And it can last for months.
I didn't understand (or rather, remember) precisely why raising money was so distracting till earlier this year. I'd noticed that startups we funded would usually grind to a halt when they switched to raising money, but I didn't remember exactly why till YC raised money itself. We had a comparatively easy time of it; the first people I asked said yes; but it took months to work out the details, and during that time I got hardly any real work done. Why? Because I thought about it all the time.
The broader conclusion, from this and other essays, is that any distractions from the core mission of the startup are very dangerous, and should be avoided at all costs. This is very different from the lifestyle of new PIs, who are typically juggling departmental responsibilities, writing a curriculum, lecturing for the first time, and writing grants all while trying to get their lab up and running.
In tech, people obsess about recruiting the best people possible—I reviewed a whole book about this last year. Hiring bad programmers is #6 on PG’s list of mistakes that kill startups, and there seems to be a general consensus that a great company takes great engineers, no matter what.
In contrast, professors don’t have full control over whom they hire (for graduate students), making recruiting much harder. Graduate students are selected through a complex two-stage system involving admission to a school and then a subsequent group-joining process (and new assistant professors sometimes aren’t even around for the first of these stages). You can obviously try to coax talented students to work for you, but the pool of accepted students interested in your subfield might be tiny, and they might all prefer to work for an established group…
(Plus, there’s not a good way to reward top performers in academia. All graduate students are equal, at least on paper—you can’t give someone a year-end bonus, or a promotion.)
A nice concrete example of this is how professors struggle to hire competent programmers, even as research scientists—they aren’t allowed to pay enough to match market rates, even when the expense would be well worth the money. To quote Bret Devereaux: “academic hiring, to be frank, is not conducted seriously” (he’s discussing the humanities, but the point stands).
As a startup succeeds, it grows: while a seed-stage startup typically has <15 people, startups at Series A often have 20–40, and startups at Series B–C might have as many as 300 employees (one ref; rough numbers broadly consistent with other sources). Good companies grow, while bad ones die.
In contrast, it’s rare for even the most successful US academic labs to grow past 30 people (although it occasionally happens), limiting the reach of top-performing professors. While a huge proportion of tech employees work for the best companies (Google, Meta, Amazon, etc), only a very small number of students work for the best professors.
The imperfect nature of the analogy means that some of these points might not be useful in a normative sense: universities are not really optimized to produce research as efficiently as possible, and maybe that’s fine. Likewise, startups aren’t optimized to produce unprofitable research or train future scientists, even if these activities may in the long run be beneficial. (This is why basic science is considered a public good, and why the government funds it at all!)
Nevertheless, I think there’s a lot that scientists can learn from startups. There is a whole army of people working to solve challenging technical problems in the most efficient way, and it’d be prudent to study the wisdom that emerges.
Thanks to Ari Wagen and Jacob Thackston for reading drafts of this piece.

One of the most distinctive parts of science, relative to other fields, is the practice of communicating findings through peer-reviewed journal publications. Why do scientists communicate in this way? As I see it, scientific journals provide three important services to the community: (1) communicating new results to the broader community, (2) vetting those results for correctness through peer review, and (3) sorting them by perceived impact.
(There are certainly other services that journals provide, like DOIs and typesetting, but these seem much less important to me.)
In this post, I want to (1) discuss the problems with scientific journals today, (2) briefly summarize the history of journals and explain how they came to be the way they are today, and (3) imagine how journals might evolve in the coming decades to adapt to the changing landscape of science. My central claim is that the scientific journal, as defined by the above criteria, has only existed since about the 1970s, and will probably not exist for very much longer—and that’s ok. (I’ll also try and explain the esoteric meme at the top.)
Many people are upset about scientific journals today, and for many different reasons.
The business model of scientific journals is, to put it lightly, unusual. Writing for The Guardian, Stephen Buranyi describes how “scientific publishers manage to duck most of the actual costs” that normal magazines incur by outsourcing their editorial duties to scientists: the very people who both write and read the articles:
It is as if the New Yorker or the Economist demanded that journalists write and edit each other’s work for free, and asked the government to foot the bill. Outside observers tend to fall into a sort of stunned disbelief when describing this setup. A 2004 parliamentary science and technology committee report on the industry drily observed that “in a traditional market suppliers are paid for the goods they provide”. A 2005 Deutsche Bank report referred to it as a “bizarre” “triple-pay” system, in which “the state funds most research, pays the salaries of most of those checking the quality of research, and then buys most of the published product”.
And this cost-dodging is very successful: scientific journals are a huge moneymaker, with Elsevier (one of the largest publishers) having a margin in excess of 30%, and ACS’s “information services” (publication) division close behind, with a profit margin of 27%.
The exorbitant fees charged by journals, and the concomitantly huge profits they earn, have led to increasing pushback against the paywall-based status quo. The Biden administration has called for all government-funded research to be free-to-read without any embargo by 2025, and other countries have been pursuing similar policies for some time. Similarly, MIT and the UC system recently terminated their subscriptions to Elsevier over open-access issues. (And the rise of SciHub means that, even without a subscription, most scientists can still read almost any article they want—threatening to completely destroy the closed-access model.)
In response to this pressure, journals have begun offering open-access alternatives, where the journal’s fees are paid by the submitting author rather than the reader. While in theory this is a solution to this problem, in practice the fees for authors are so high that it’s not a very good solution. The board of editors of NeuroImage recently resigned over their journal’s high open-access fees—and they’re not the first board of editors to do this. As a 2019 Vox summary put it: “Publishers are still going to get paid. Open access just means the paychecks come at the front end.”
In parallel, the “replication crisis” has led to growing skepticism about the value of peer review. In his article “The Rise and Fall of Peer Review,” Adam Mastroianni describes how experiments to measure its value have yielded dismal outcomes:
Scientists have run studies where they deliberately add errors to papers, send them out to reviewers, and simply count how many errors the reviewers catch. Reviewers are pretty awful at this. In this study reviewers caught 30% of the major flaws, in this study they caught 25%, and in this study they caught 29%. These were critical issues, like “the paper claims to be a randomized controlled trial but it isn’t” and “when you look at the graphs, it’s pretty clear there’s no effect” and “the authors draw conclusions that are totally unsupported by the data.” Reviewers mostly didn’t notice.
While the worst of the replication crisis seems to be contained to the social sciences, my own field—chemistry—is by no means exempt. As I wrote previously, “elemental analysis doesn’t work, integration grids cause problems, and even reactions from famous labs can’t be replicated.” There are a lot of bad results in the scientific literature, even in top journals—I don’t think many people in the field actually believe that a generic peer-reviewed publication is guaranteed to be correct.
And the process of soliciting peer reviews is by no means trivial: prominent professors are commonly asked to peer review tons of articles as an unpaid service to the community, which isn’t really rewarded in any way. As the number of journals and publications grows faster than the number of qualified peer reviewers, burnout can result.
The rise of preprint servers like arXiv, BioRxiv, and ChemRxiv also means journals aren’t necessary for communication of scientific results. More and more, preprints dominate discussions of cutting-edge science, while actual peer-reviewed publications lag months to years behind.
While in theory preprints aren’t supposed to be viewed as scientifically authoritative—since they haven’t been reviewed—in practice most preprints are qualitatively identical to the peer-reviewed papers that they give rise to. A retrospective analysis of early COVID preprints found that the vast majority of preprints survived peer review without any substantive changes to their conclusions (although this might be biased by the fact that the worst preprints will never be accepted at all).
If this is the case, why bother with journals at all? To a growing degree this seems to be the norm in CS and CS-adjacent fields: the landmark Google transformer paper from 2017, “Attention Is All You Need,” is still just a PDF on arXiv six years later, despite being potentially the most impactful discovery of the 2010s. Similarly, UMAP, which I discussed last week, is also just hanging out on arXiv, no peer-reviewed publication in sight. Still, in chemistry and other sciences we’re expected to publish in “real journals” if we want to graduate or get jobs.
An implicit assumption of the scientific journal is that high-impact publications can be distinguished from low-impact publications without the benefit of hindsight. Yet many of the most impactful scientific discoveries—like the Krebs cycle, the weak interaction, lasers, continental drift, and CRISPR—were rejected when first submitted to journals. How is this possible?
I’d argue that peer review creates a bias towards incrementalism. It’s easy to see how an improvement over something already known is significant; it’s perhaps harder to appreciate the impact of a field-defining discovery, or to believe that such a result could even be possible. To quote Antonio Garcia Martinez on startups: “If your idea is any good, it won’t get stolen, you’ll have to jam it down people’s throats instead.” True zero-to-one thinking can provoke a strong reaction from the establishment, and rarely a positive one.
(It’s worth noting that some of the highest-profile organic chemistry papers from 2022 were new takes on old, established reactions: Parasram and Leonori’s “ozonolysis without ozone” and Nagib’s “carbenes without diazoalkanes.” I love both papers—but I also think it’s easier for audiences to appreciate why “ozonolysis without ozone” is a big deal than to process an entirely new idea.)
Even for more quotidian scientific results, the value of impact-based peer review is limited. Matt Clancy at New Things Under the Sun writes that, for preprints, paper acceptance is indeed correlated with number of eventual citations, but that the correlation is weak: reviewers seem to be doing better than random chance, but worse than we might hope. (Similar results emerge when studying the efficacy of peer review for grants.) On the aggregate, it does seem true that the average JACS paper is better than the average JOC paper, but the trend is far from monotonic.
These concerns aren’t just mine; indeed, a growing number of scientists seek to reject impact-based refereeing altogether. The “impact-neutral reviewing” movement thinks that papers should be evaluated only on the basis of their scientific correctness, not their perceived potential impact. Although I wouldn’t say this is a mainstream idea, journals like PLOS One, Frontiers, and eLife have adopted versions of it, and perhaps more journals will follow in the years to come.
Taken together, these anecdotes demonstrate that all three pillars of the modern scientific journal—communication, peer review, and impact-based sorting—are threatened today.
How did we get here?
The importance of journals as a filter for low-quality work is a modern phenomenon. Of course, editors have always had discretion over what to publish, but until fairly recently the total volume of papers was much lower, meaning that it wasn’t so vital to separate the wheat from the chaff. In fact, Stephen Buranyi attributes the modern obsession with impact factor and prestige to the founding of Cell in 1974:
[Cell] was edited by a young biologist named Ben Lewin, who approached his work with an intense, almost literary bent. Lewin prized long, rigorous papers that answered big questions – often representing years of research that would have yielded multiple papers in other venues – and, breaking with the idea that journals were passive instruments to communicate science, he rejected far more papers than he published….
Suddenly, where you published became immensely important. Other editors took a similarly activist approach in the hopes of replicating Cell’s success. Publishers also adopted a metric called “impact factor,” invented in the 1960s by Eugene Garfield, a librarian and linguist, as a rough calculation of how often papers in a given journal are cited in other papers. For publishers, it became a way to rank and advertise the scientific reach of their products. The new-look journals, with their emphasis on big results, shot to the top of these new rankings, and scientists who published in “high-impact” journals were rewarded with jobs and funding. Almost overnight, a new currency of prestige had been created in the scientific world.
As Buranyi reports, the changes induced by Cell rippled across the journal ecosystem. The acceptance rate at Nature dropped from 35% to 13% over the following decade-and-a-half (coincidentally also the years when peer review was introduced), making journal editors the “kingmakers of science” (Buranyi).
Peer review is also a modern addition. In Physics Today, Melinda Baldwin recounts how peer review only became ubiquitous following a series of contentious House subcommittee hearings in 1974 that questioned the value of NSF-funded science:
Spending on both basic and applied research had increased dramatically in the 1950s and 1960s—but when doubts began to creep in about the public value of the work that money had funded, scientists were faced with the prospect of losing both public trust and access to research funding. Legislators wanted publicly funded science to be accountable; scientists wanted decisions about science to be left in expert hands. Trusting peer review to ensure that only the best and most essential science received funding seemed a way to split the difference.
Our expectation that journals ought to validate the correctness of the work they publish, too, is quite modern. Baldwin again:
It also seems significant that refereeing procedures were not initially developed to detect fraud or to ensure the accuracy of scientific claims…. Authors, not referees, were responsible for the contents of their papers. It was not until the 20th century that anyone thought a referee should be responsible for the quality of the scientific literature, and not until the Cold War that something had to be peer-reviewed to be seen as scientifically legitimate.
If journals didn’t do peer review and they didn’t do (as much) impact-based filtering before the 1970s, what did they do? The answer is simple: communication. Scientists started communicating in journals because writing books was too slow, and it was important that they be able to share results and get feedback on their ideas quickly. This was a founding aim of Nature:
…to aid Scientific men themselves, by giving early information of all advances made in any branch of Natural knowledge throughout the world, and by affording them an opportunity of discussing the various Scientific questions which arise from time to time.
Although perhaps underwhelming to a modern audience, this makes sense. Scientists back in the day didn’t have preprints, Twitter, or Zoom—so they published journal articles because it was “one of the fastest ways to bring a scientific issue or idea to their fellow researchers’ attention” (ref), not because it would look good on their CV. Journals became “the place to discuss big science questions” among researchers, and frequently featured acrimonious and public disputes—far from celebrated storehouses of truth, journals were simply the social media of scientists in the pre-telecommunications age.
So, is the solution “reject modernity, embrace tradition”? Should we go back to the way things used to be and stop worrying about whether published articles are correct or impactful?
Anyone who’s close to the scientific publishing process knows that this would be ridiculous and suicidal. We’ve come a long way from the intimate scientific community of 18th-century England, where scientists had reputations to uphold and weren’t incentivized to crank out a bunch of Tet. Lett. papers. Like it or not, today’s scientists have been trained to think of their own productivity in terms of publications, and the editorial standards we have today are just barely keeping a sea of low-quality crap at bay (cf. Goodhart’s Law). Sometimes it feels like peer reviewers are the only people who are willing to give me honest criticism about my work—if we get rid of them, what then?
We can understand the changes in journals by borrowing some thinking from economics: as the scale of communities increases, the norms and institutions of the community must progress from informal to formal. This process has been documented nicely for the development of property rights on frontiers: at first, land is abundant, and no property rights are necessary. Later on, inhabitants develop a de facto system of informal property rights to mediate disputes—and still later, these de facto property rights are transformed into de jure property rights, raising them to the status of law. Communities with 10,000 people need more formal institutions than communities with 100 people.
If we revisit the history of scientific journals, we can see an analogous process taking place. Centuries ago there were relatively few scientists, and so journals could simply serve as a bulletin board for whatever these scientists were up to. As the scale and scope of science expanded in the late 20th century, peer review became a way to deal with the rising number of scientific publications, sorting the good from the bad and providing feedback. Today, as the scale of science continues to increase and the communication revolution renders many of the historical functions of journals moot, it seems that journals will have to change again, to adapt to the new needs of the community.
To the extent that this post has a key prediction, it’s this: scientific journals are going to change a lot in the decade or two to come. If you’re a scientist today—even a relatively venerable one—you’ve lived your whole career in the era of universal peer review, and so I think people have gotten used to the status quo. Submitting papers to journals, getting referee reviews, etc. are part of what we’re taught “being a scientist” means. But this hasn’t always been true, and it may not be true within your lifetime!
Sadly, I don’t really have a specific high-confidence prediction for how journals will change, or how they should change. Instead, I want to sketch out nine little vignettes of what could happen to journals, good or bad. These options are neither mutually exclusive nor collectively exhaustive; this is meant simply as an exercise in creativity, and to provide a little basis set with which to imagine the future.
I’ll repost the initial image of the post here, for ambiance, and then walk through the possibilities.
One scenario is that journals, no longer being needed to distribute results widely, will double down on their role as defenders of scientific correctness. To a much greater degree, journals will focus on only publishing truly correct work, and thus make peer review their key “value add.” This is already being done post-replication crisis in some fields; Michael Nielsen and Kanjun Qiu describe the rise of “Registered Reports” in their essay on metascience:
The idea [behind Registered Reports] is for scientists to design their study in advance: exactly what data is to be taken, exactly what analyses are to be run, what questions asked. That study design is then pre-registered publicly, and before data is taken the design is refereed at the journal. The referees can't know whether the results are "interesting", since no data has yet been taken. There are (as yet) no results! Rather, they're looking to see if the design is sound, and if the questions being asked are interesting – which is quite different to whether the answers are interesting! If the paper passes this round of peer review, only then are the experiments done, and the paper completed.
This makes more sense for medicine or psychology than it does for more exploratory sciences—if you’re blundering around synthesizing novel low-valent Bi complexes, it’s tough to know what you’ll find or what experiments you’ll want to run! But there are other ways we could make science more rigorous, if we wanted to.
A start would be requiring original datafiles (e.g. for NMR spectra) instead of just providing a PDF with images, and having reviewers examine these data. ACS has made some moves in this direction (e.g.), although to my knowledge no ACS journal yet requires original data. One could also imagine requiring all figures to be linked to the underlying data, with code supplied by the submitting group (like a Jupyter notebook). A more drastic step would be to require all results to be independently reproduced by another research group, like Organic Syntheses does.
These efforts would certainly make the scientific literature more accurate, but at what cost? Preparing publications already consumes an excessive amount of time and energy, and making peer review stricter might just exacerbate this problem. Marc Edwards and Siddhartha Roy discuss this in a nice perspective on perverse incentives in modern science:
Assuming that the goal of the scientific enterprise is to maximize true scientific progress, a process that overemphasizes quality might require triple or quadruple blinded studies, mandatory replication of results by independent parties, and peer-review of all data and statistics before publication—such a system would minimize mistakes, but would produce very few results due to overcaution.
It seems good that there are some “overcautious” journals, like Org. Syn., but it also seems unlikely that all of science will adopt this model. In fact, a move in this direction might create a two-tiered system: some journals would adopt stringent policies, but there’s a huge incentive for some journals to defect and avoid these policies, since authors are lazy and would prefer not to do extra work. It seems unlikely that all of science could realistically be moved to a “bastion of truth” model in the near future, although perhaps we could push the needle in that direction.
If peer review is so vital, why not make it a real career? Imagine a world in which journals like Nature and Science have their own in-house experts, recruited to serve as professional overseers and custodians of science. Instead of your manuscript getting sent to some random editor, and thence to whomever deigns to respond to that editor’s request for reviewers, your manuscript would be scrutinized by a team of hand-picked domain specialists. This would certainly cost money, but journals seem to have a bit of extra cash to spare.
I call this scenario the “guild-approved periodical” because the professionals who determined which papers got published would essentially be managers, or leaders, of science—they would have a good amount of power over other scientists, to a degree that seems uncommon today. Thus, this model would amount to a centralization of science: if Nature says you have to do genomics a certain way, you have to do it that way or Nature journals won’t publish your work! I’m not sure whether this would be good or bad.
(It is a little funny that the editors of high-tier journals—arguably the most powerful people in their field—are chosen without the knowledge or consent of the field, through processes that are completely opaque to rank-and-file scientists. To the extent that this proposal allows scientists to choose their own governance, it might be good.)
This scenario envisions a world in which “publications” are freed from the tyranny of needing to be complete at a certain point. While that was true in the days when you actually got a published physical issue in the mail, it’s not necessary in the Internet age! Instead, one can imagine a dynamic process of publishing, where a journal article is continually updated in response to new data.
A 2020 article in FACETS proposes exactly this model:
The paper of the future may be a digital-only document that is not only discussed openly after the peer-review process but also regularly updated with time-stamped versions and complementary research by different authors… Living systematic reviews are another valuable way to keep research up to date in rapidly evolving fields. The papers of the future that take the form of living reviews can help our understanding of a topic keep pace with the research but will also add complexities. (citations removed from passage)
The idea of the living systematic review is being tried out by the Living Journal of Computational Molecular Science, which (among other things) has published a 60-page review of enhanced sampling methods in MD, which will continue being updated as the field evolves.
These ideas are cool, but I wonder what would happen if more research became “living.” Disputes and acrimony are part of the collective process of scientific truth-seeking. What will happen if bitter rivals start working on the same “living” publications—who will adjudicate their disputes?
Wikipedia manages to solve this problem through a plethora of editors, who can even lock down particularly controversial pages, and perhaps editors of living journals will assume analogous roles. But the ability of our collective scientific intelligence to simultaneously believe contradictory ideas seems like a virtue, not a vice, and I worry that living journals will squash this.
An even thornier question is who adjudicates questions of impact. The enhanced sampling review linked above has over 400 references, making it a formidable tome for a non-expert like myself. There’s a lot of merit in a non-comprehensive and opinionated introduction to the field, which takes some subjective editorial liberties, but it’s not clear to me how that would work in a collaborative living journal. What’s to stop me from linking to my own papers everywhere?
(I’m sure that there are clever organizational and administrative solutions to these problems; I just don’t know what they are.)
If “objective impact” is so hard to determine fairly, why not just accept that we’re basically just subjectively scoring publications based on how much we like them, and abandon the pretense of objectivity? One can imagine the rise of a new kind of figure: the editor with authorial license, who has a specific vision for what they think science should look like and publishes work in keeping with that vision. The role is as much aesthetic as it is analytic.
There’s some historical precedent for this idea—Eric Gilliam’s written about how Warren Weaver, a grant director for the Rockefeller Foundation, essentially created the field of molecular biology ex nihilo by following an opinionated thesis about what work ought to be funded. Likewise, one can envision starting a journal as an act of community-building, essentially creating a Schelling point for like-minded scientists to collaborate, share results, and develop a common approach to science.
We can see hints of this today: newsletters like “Some Items of Interest to Process Chemists” or Elliot Hershberg’s Century of Bio Substack highlight a particular vision of science, although they haven’t quite advanced to the stage of formally publishing papers themselves. But perhaps it will happen soon; new movements, like molecular editing or digital chemistry, might benefit from forming more tightly-knit communities.
If preprints take over every field of science as thoroughly as they have computer science, journals may find themselves almost completely divorced from the day-to-day practice of science, for better or for worse. Papers might still be submitted to journals, and the status of the journal might still mean something, but it wouldn’t be a guess anymore—journals could simply accept the advances already proven to be impactful and basically just publish a nicely formatted “version of record,” like a scientific Library of Congress.
This is essentially equivalent to the “publish first, curate second” proposal of Stern and O’Shea—preprints eliminate the need for journals to move quickly, so we can just see what work the community finds to be best and collect that into journals. The value of journals for specialists, who already need to be reading a large fraction of the papers in their area, would be much lower—journals would mainly be summarizing a field’s achievements for those out-of-field. In this scenario, “many specialized journals that currently curate a large fraction of the literature will become obsolete.”
(This already happens sometimes; I remember chuckling at the 2020 Numpy Nature paper. Numpy isn’t successful because it was published in Nature; Numpy got into Nature because it was already successful.)
Pessimistically, one can imagine a world in which journal publications still carry weight with the “old guard” and certain sentimental types, but the scientific community has almost completely moved to preprints for day-to-day communication. In this scenario, one might still have to publish journal articles to get a job, but it’s just a formality, like a dissertation: the active work of science is done through preprints. Like Blockbuster, journals might limp along for some time, but their fate is pretty much sealed.
Another reason why journals might persist in a world driven by preprints is the desire of philanthropic agencies to appear beneficent. If a certain organization, public or private, is giving tens of millions of dollars to support scientific progress, the only real reward it can reap in the short term is the prestige of having its name associated with a given discovery. Why not go one step further and control the means of publication?
In this Infinite Jest-like vision, funding a certain project buys you the right to publish its results in your own journal. We can imagine J. Pfizer-Funded Research and J. Navy Research competing to fund and publish the most exciting work in a given area, since no one wants to sponsor a loser. (Why stop there? Why not name things after corporate sponsors? We could have the Red Bull–Higgs Boson, or the Wittig–Genentech olefination.)
As discussed at the beginning of this article, the government “funds most research, pays the salaries of most of those checking the quality of research, and then buys most of the published product.” There’s a certain simplicity in a funding agency just taking over the whole process, but I doubt this would be good for scientists. Unifying the roles of funder, publisher, and editor would probably lower the agency of actual researchers to an untenably low level.
Another depressing scenario is one in which journals cease contributing to the positive progress of science, and start essentially just trying to monetize their existing intellectual property. As ML and AI become more important, legal ownership of data rights will presumably increase in economic value, and one can easily imagine the Elseviers of the world vacuuming up any smaller journals they can and then charging exorbitant fees for access to their data. (Goodbye, Reaxys…)
I hope this doesn’t happen.
The obvious alternative to these increasingly far-fetched scenarios is also the simplest: we get rid of journals altogether, and—just like in the 1700s—rely solely on communication-style preprints on arXiv, bioRxiv, ChemRxiv, etc. This has been termed a “preprint lake,” in analogy to “data lakes.”
To help scientists make sense of the lake, one can envision some sort of preprint aggregator: a Reddit or Hacker News for science, which sorts papers by audience feedback and permits PubPeer-type public comments on the merits and shortcomings of each publication. The home page of Reddit-for-papers could serve as the equivalent to Science; the chemistry-specific subpage, the equivalent to JACS. Peer review could happen in a decentralized fashion, and reviews would be public for all to see.
There’s an anarchic appeal to this proposal, but it has potential drawbacks too:
I mostly read preprints by people whose names I already recognize. When thousands of papers are thrown into the “level playing field” of bioRxiv, pre-existing markers of prestige end up taking an even greater role. This presumably will disadvantage up-and-coming scientists, or scientists without access to existing networks of prestige. That being said, one might make the same arguments for the Internet, and the real effect seems to have been exactly the opposite! So I’m not quite sure how to think about this.
The most likely scenario, to me, is that all of this sorta happens simultaneously. Most cutting-edge scientific discussion will move to the anarchic world of preprints, but there will still be plenty of room for more traditional journals: some journals will have very high standards and represent the magisterium of scientific authority, while other journals will act as living repositories of knowledge and still others will become subjectively curated editorial statements.
We can see journals moving in different directions even today: some journals are indicating that they’ll start requiring original data and implement more aggressive fraud detection, while others are moving away from impact-based reviewing. And I can’t help but notice that it seems to be increasingly acceptable to cite preprints in publications, suggesting that the needle might be moving towards the “anarchic preprint lake” scenario ever so slightly.
For my part, I plan to continue writing and submitting papers as necessary, reviewing papers when asked, and so forth—but I’m excited for the future, and to see how the new world order compares to the old.
Thanks to Melanie Blackburn, Jonathan Wong, Joe Gair, and my wife for helpful discussions, and Ari Wagen, Taylor Wagen, and Eugene Kwan for reading drafts of this piece.

In many applications, including cheminformatics, it’s common to have datasets that have too many dimensions to analyze conveniently. For instance, chemical fingerprints are typically 2048-length binary vectors, meaning that “chemical space” as encoded by fingerprints is 2048-dimensional.
To more easily handle these complex datasets (and to bypass the “curse of dimensionality”), it’s common practice to use a dimensionality reduction algorithm to convert the data to a low-dimensional space. In this post I want to compare and contrast three approaches to dimensionality reduction, and discuss the challenges with low-dimensional embeddings in general.
There are many approaches to dimensionality reduction, but I’m only going to talk about three here: PCA, tSNE, and UMAP.
Principal component analysis (PCA) is perhaps the most famous dimensionality reduction algorithm, and is commonly used in a variety of scientific fields. PCA works by transforming the data into a new set of coordinates such that the first coordinate vector explains the largest amount of the variance, the second coordinate vector the next most variance, and so on and so forth. It’s pretty common for the first 5–20 dimensions to capture >99% of the variance, meaning that the subsequent dimensions can essentially be discarded wholesale.
tSNE (t-distributed stochastic neighbor embedding) and UMAP (uniform manifold approximation and projection) are alternative dimensionality reduction approaches, based on much more complex algorithms. To quote Wikipedia:
The t-SNE algorithm comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects are assigned a higher probability while dissimilar points are assigned a lower probability. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback–Leibler divergence (KL divergence) between the two distributions with respect to the locations of the points in the map.
UMAP, at a high level, works in a very similar way, but uses some fancy topology to construct a “fuzzy simplicial complex” representation of the data in high-dimensional space, and then projects this representation down into a lower dimension (more detailed explanation). Practically, UMAP is a lot faster than tSNE, and is becoming the algorithm of choice for most cheminformatics applications. (Although, in fairness, there are ways to make tSNE faster.)
For the purposes of this post, I chose to study Abbie Doyle’s set of 2683 aryl bromides (obtained from Reaxys, with various filters applied). I used the RDKit fingerprint to generate a 2048-bit encoding of each aryl bromide, computed a distance matrix using Tanimoto/Jaccard distance, and then used each dimensionality reduction technique to generate a 2-dimensional embedding.
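For anyone who wants to play along at home, a minimal sketch of this pipeline might look something like the following. This is my reconstruction rather than the exact script used here; it assumes rdkit, scipy, scikit-learn, and umap-learn are installed, and `smiles_list` is a stand-in for the aryl bromide SMILES strings:

```python
import numpy as np
from rdkit import Chem
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap  # umap-learn

# `smiles_list` is a hypothetical list of SMILES strings for the aryl bromides
mols = [Chem.MolFromSmiles(s) for s in smiles_list]

# 2048-bit RDKit fingerprints, packed into a boolean matrix
X = np.array([list(Chem.RDKFingerprint(m, fpSize=2048)) for m in mols], dtype=bool)

# pairwise Tanimoto/Jaccard distance matrix
dist = squareform(pdist(X, metric="jaccard"))

# three 2D embeddings
pca_xy = PCA(n_components=2).fit_transform(X)
tsne_xy = TSNE(n_components=2, metric="precomputed", init="random").fit_transform(dist)
umap_xy = umap.UMAP(n_components=2, metric="precomputed").fit_transform(dist)
```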
Let’s look at PCA first:
PCA generally creates fuzzy-looking blobs, which sometimes show some amount of meaningful structure but don’t really display many sharp boundaries.
Now, let’s compare to tSNE:
tSNE creates “blob-of-blob” plots which show many tight clusters arranged together in some sort of vague pattern. The size and position of the clusters can be tuned by changing the “perplexity” hyperparameter (see this StackOverflow post for more discussion, and this excellent post for demonstrations of how tSNE can be misleading).
What about UMAP?
UMAP also creates tight tSNE-like clusters, but UMAP plots generally have a much more variable overall shape—the clusters themselves are tighter and scattered across more space. (These considerations are complicated by the fact that UMAP has multiple tunable hyperparameters, meaning that the exact appearance of the plot is substantially up to the end user.)
The debate between tSNE and UMAP is spirited (e.g.), but for whatever reason people in chemistry almost exclusively use UMAP. (See, for instance, pretty much every paper I talked about in this post.)
An important thing that I’m not showing here, but which bears mentioning, is that the clusters in all three plots are actually chemically meaningful. For instance, each cluster in the tSNE plot generally corresponds to a different functional group: carboxylic acids, alkynes, etc. So the graphs do in some real sense correspond to the intuition we have about molecular similarity, which is good! (You can use molplotly to visualize these plots very easily.)
How well are distances from the high-dimensional space preserved in the 2D embedding? Obviously the distances won’t all be the same, but ideally the mapping would be monotonic: if distance A is greater than distance B in the high-dimensional space, we would like distance A to also be greater than distance B in the low-dimensional space.
We can measure this with the Spearman correlation, which is like the Pearson correlation (the familiar “r” whose square gives r-squared) but without the assumption of linearity. A Spearman correlation coefficient of 1 indicates a perfect monotonic relationship, while a coefficient of 0 indicates no relationship. Let’s plot the pairwise distances from each embedding against the true distances and compare the Spearman coefficients:
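Concretely, the comparison is just a Spearman correlation between the two condensed distance vectors—a sketch, reusing the hypothetical `dist`, `pca_xy`, `tsne_xy`, and `umap_xy` variables from the pipeline sketch above:

```python
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

high_d = squareform(dist)  # back to a condensed vector of pairwise distances
for name, xy in [("PCA", pca_xy), ("tSNE", tsne_xy), ("UMAP", umap_xy)]:
    rho, _ = spearmanr(high_d, pdist(xy))  # Euclidean distances in the embedding
    print(f"{name}: Spearman rho = {rho:.2f}")
```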
In each case, the trend is in the right direction (i.e. increased distance in high-dimensional space is correlated with increased distance in low-dimensional space), but the relationship is far from monotonic. It’s clear that there will be plenty of cases where two points will be close in low-dimensional space and far in high-dimensional space.
Does this mean that UMAP, tSNE, and PCA are all failing? To understand this better, let’s plot a histogram of all the distances in each space:
We can see that the 2048-dimensional space has a very distinct histogram. Most of the compounds are pretty different from one another, and—crucially—most of the distances are about the same (0.8 or so). In chemical terms, this means that most of the fingerprints share a few epitopes in common, but otherwise are substantially different, which is unsurprising since fingerprints in general are quite sparse.
Unfortunately, “lots of equidistant points” is an extremely tough pattern to recapitulate in a low-dimensional space. We can see why with a toy example: in 2D space, we can only have 3 equidistant points (an equilateral triangle), and in 3D space, we can only have 4 equidistant points (a tetrahedron). More generally, if we want N equidistant points, we need to be in R^(N-1) (that is, (N-1)-dimensional Euclidean space). We can relax this requirement a little bit if we’re willing to accept approximate equidistance, but the general principle still holds: it’s hard to recapitulate lots of equidistant points in a low-dimensional space.
As expected, then, we can see that the histogram of each of our algorithms looks very different from the ideal distance histogram.
Both tSNE and UMAP take the nearest neighbors of each point explicitly into account, and claim to preserve the local structure of the points as much as possible. To put these claims to the test, I looked at the closest 30 neighbors of each point in high-dimensional space, and then checked how many of those neighbors made it into the closest 30 neighbors in low-dimensional space.
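(Here’s a sketch of that bookkeeping—again my own reconstruction, using the hypothetical variables defined in the pipeline sketch above:)

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def neighbor_overlap(high_dist, low_coords, k=30):
    """Average fraction of each point's k nearest high-dimensional neighbors
    that also appear among its k nearest low-dimensional neighbors."""
    high_nn = np.argsort(high_dist, axis=1)[:, 1:k + 1]  # skip self at index 0
    knn = NearestNeighbors(n_neighbors=k + 1).fit(low_coords)
    low_nn = knn.kneighbors(low_coords, return_distance=False)[:, 1:]
    return np.mean([len(set(h) & set(l)) / k for h, l in zip(high_nn, low_nn)])

for name, xy in [("PCA", pca_xy), ("tSNE", tsne_xy), ("UMAP", umap_xy)]:
    print(name, round(neighbor_overlap(dist, xy, k=30), 2))
```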
We can see that PCA only preserves about 30–40% of each point’s neighbors, whereas tSNE and UMAP generally preserve about 60% of the neighbors: not perfect, but much better.
I chose to look at 30 neighbors somewhat arbitrarily: what happens if we change this number?
We can see that UMAP and tSNE both preserve about 60% of the neighbors across a wide range of neighborhood sizes, while PCA gets better as we zoom out more. (At the limit where we consider all 2683 points as neighbors, every method will trivially achieve perfect accuracy.) tSNE does much better than UMAP for small neighborhoods; I’m not sure why!
Another way to think about this is in terms of the precision–recall tradeoff. In classification, “precision” refers to a classifier’s ability to avoid false positives, while “recall” refers to a classifier’s ability to avoid false negatives. What does this mean in the context of embedding?
Imagine looking at all points in the neighborhood of our central point in high-dimensional space, and then comparing to the points within a certain radius of our point in low-dimensional space. As we increase the radius, we expect to see more of the correct neighbor points in low-dimensional space, but we also expect to see more “incorrect neighbors” that aren’t really there in the high-dimensional space. (This paper discusses these issues nicely, as does this presentation.)
So low radii lead to high precision (most of the points are really neighbors) but low recall (we’re not finding most of the neighbors), while high radii lead to low precision and high recall. We can thus study the performance of our embedding by graphing the precision–recall curve for various neighborhood sizes. The better the embedding, the closer the curve will come to the top right:
We can see that tSNE does better in the high precision/low recall area of the curve (as we saw in the previous graph), but otherwise tSNE and UMAP are quite comparable. In contrast, PCA is just abysmal.
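(If you want to trace this kind of curve yourself, here’s a rough sketch of the procedure as I understand it—sweep a radius in the low-dimensional space and average precision and recall against the k = 30 high-dimensional neighbors; hypothetical variables as before:)

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def precision_recall_curve(high_dist, low_coords, k=30, n_radii=50):
    """Return (recall, precision) pairs for a sweep of low-dimensional radii."""
    n = high_dist.shape[0]
    low_dist = squareform(pdist(low_coords))
    true_nn = [set(row) for row in np.argsort(high_dist, axis=1)[:, 1:k + 1]]
    radii = np.quantile(low_dist[low_dist > 0], np.linspace(0.02, 0.98, n_radii))
    curve = []
    for r in radii:
        precisions, recalls = [], []
        for i in range(n):
            retrieved = set(np.flatnonzero(low_dist[i] <= r)) - {i}
            if not retrieved:
                continue
            hits = len(retrieved & true_nn[i])
            precisions.append(hits / len(retrieved))
            recalls.append(hits / k)
        curve.append((np.mean(recalls), np.mean(precisions)))
    return curve
```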
The big conclusion of this section is that, if you’re doing something that depends on the local structure of the data, you should avoid PCA.
Since the root of our issues here is trying to represent a 2048-dimensional distance matrix in 2 dimensions, one might wonder if we could do better by expanding to 3, 4, or more dimensions. This would make visualization tricky, but might still be suitable for other operations (like clustering).
tSNE gets very, very slow in higher dimensions, so I focused on PCA and UMAP for this study. I started out by comparing the Spearman correlation for PCA and UMAP up to 20 dimensions:
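(The scan itself is just a loop over the embedding dimension—a sketch, with the same hypothetical variables as above:)

```python
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr
from sklearn.decomposition import PCA
import umap

for n_dim in (2, 3, 5, 10, 20):
    pca_nd = PCA(n_components=n_dim).fit_transform(X)
    umap_nd = umap.UMAP(n_components=n_dim, metric="precomputed").fit_transform(dist)
    for name, emb in (("PCA", pca_nd), ("UMAP", umap_nd)):
        rho, _ = spearmanr(squareform(dist), pdist(emb))
        print(f"{n_dim}D {name}: Spearman rho = {rho:.2f}")
```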
Surprisingly, UMAP doesn’t seem to get any better in high dimensions, but PCA does. (Changing the number of neighbors didn’t help UMAP at all.)
How do our other metrics look with high-dimensional PCA?
As we increase the number of dimensions, the distance histogram starts to approach the correct distribution.
We also start to do a better job capturing the local structure of the graph, although we’re still not as good as tSNE or UMAP even at 10 dimensions.
And our precision–recall graph is still pretty dismal when compared to tSNE or UMAP. So, it seems like if distances are what matters, then high-dimensional PCA is an appealing choice—but if local structure is what matters, tSNE or UMAP is still superior.
My big takeaway from all of this is: dimensionality reduction is a lossy process, and one where you always have to make tradeoffs. You’re fundamentally throwing away information, and that always has a cost: there’s no such thing as a free lunch. As such, if you don’t have to perform dimensionality reduction, then my inclination would be to avoid it. (People in single-cell genomics seem to have come to a similar conclusion.)
If you really need your data to be in a low-dimensional space (e.g. for plotting), then keep in mind what you’re trying to study! PCA seems to do a slightly better job with distances (although I’m sure there are more sophisticated strategies for distance-preserving dimensionality reduction), while tSNE and UMAP seem to do much, much better with local structure.
Thanks to Michael Tartre for helpful conversations, and the students in Carnegie Mellon’s “Digital Molecular Design Studio” class for their thought-provoking questions on these topics.

(This is more of a housekeeping post than an actual post with content; apologies.)
Up until now, my blogging strategy has been to write new posts about once a week and publicize them on Twitter, which works great for people who are on Twitter but (obviously) fails for people who aren’t on Twitter. I’m frequently asked if there are non-Twitter ways to subscribe to the blog updates: given that I myself don’t love relying on Twitter to bring me content, and that Twitter itself feels increasingly dicey, I feel bad saying no every time.
I’m happy to announce that there are now two additional ways to read the blog: RSS and Substack.
RSS is a lovely way to get updates from sites, which is sadly limited by the fact that nobody uses it anymore. (Half the people I talk to these days don’t even know what it is.) You can use an RSS aggregator like Feedly, and simply subscribe to various sites, so that they’ll dependably show up in your feed. This is the main way I get journal updates and my news.
So, if you like using RSS, you can simply search “corinwagen.github.io” in Feedly, and the blog will come up:
Substack is a platform that helps people write and manage newsletters. It essentially solves the problem of “how do I create an email list”/“how do I manage subscriptions” for people who would rather not take care of hosting a web service and handling payments themselves, like me.
I initially didn’t want to use Substack because (1) I wanted the blog to be part of my website, (2) I liked being able to control every aspect of the design, and (3) I wasn’t sure if anyone would read the blog, and there’s nothing sadder than an empty Substack. As things stand, (3) is a non-issue, so the question is whether the added convenience of Substack outweighs my own personal design and website preferences. I suspect that it may, so I’ve capitulated and copied all existing posts over to my new Substack. (There are a few formatting issues in old posts, but otherwise things copied pretty well.)
For now, I plan to continue posting everything on the blog, and manually copying each post over to Substack (I write in plain HTML so this is not too hard). If Substack ends up totally outperforming the blog in terms of views, then I’ll probably switch to Substack entirely for blogging and just leave my website up as a sort of virtual CV.
(I have no plans to enable subscriptions at this point; that being said, if for some bizarre reason there’s sufficient demand I’ll probably try to think of something to reward subscribers.)
If you’d like to receive updates on Substack, you can subscribe below: