An important problem with simulating chemical reactions is that reactions generally take place in solvent, but most simulations are run without solvent molecules. This is a big deal, since much of the inaccuracy associated with simulation actually stems from poor treatment of solvation: when gas phase experimental data is compared to computations, the results are often quite good.
Why don’t computational chemists include solvent molecules in their models? It takes a lot of solvent molecules to accurately mimic bulk solvent (enough to cover the system with a few different layers, usually ~10³).1 Since most quantum chemical methods scale in practice as O(N²)–O(N³), adding hundreds of additional atoms has a catastrophic effect on the speed of the simulation.
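To put rough numbers on this, here’s a back-of-the-envelope sketch (the atom counts and the idealized cubic scaling are illustrative assumptions, not benchmarks of any particular method):

```python
# Rough cost estimate for adding explicit solvent to a QM calculation.
# All numbers are illustrative assumptions, not benchmarks.
solute_atoms = 50         # hypothetical bare-substrate system
solvent_atoms = 1000      # roughly a few solvation shells
scaling_exponent = 3      # idealized O(N^3) method

relative_cost = ((solute_atoms + solvent_atoms) / solute_atoms) ** scaling_exponent
print(f"each single-point calculation becomes ~{relative_cost:,.0f}x slower")
# -> roughly 9,000x slower, before any sampling is even considered
```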
To make matters worse, the additional degrees of freedom introduced by the solvent molecules are very “flat”—solvent molecules don’t usually have well-defined positions about the substrate, meaning that the number of energetically accessible conformations goes to infinity (with attendant consequences for entropy). This necessitates a fundamental change in how calculations are performed: instead of finding well-defined extrema on the electronic potential energy surface (ground states or transition states), molecular dynamics (MD) or Monte Carlo simulations must be used to sample from an underlying distribution of structures and reconstruct the free energy surface. Sufficient sampling usually requires consideration of 10⁴–10⁶ individual structures,2 meaning that each individual computation must be very fast (which is challenging for quantum chemical methods).
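The “reconstruct the free energy surface” step is conceptually simple even when the sampling isn’t: the relation is F(x) = −kBT ln P(x). Here’s a minimal sketch assuming unbiased sampling along a single reaction coordinate; the toy Gaussian data and variable names are mine, and real studies would use enhanced sampling plus reweighting:

```python
import numpy as np

k_B = 0.0019872   # Boltzmann constant, kcal/(mol·K)
T = 298.15        # temperature, K

# Pretend these are reaction-coordinate values (e.g. a forming C–C distance, in Å)
# harvested from an MD trajectory; here it's just toy Gaussian data.
samples = np.random.normal(loc=2.5, scale=0.3, size=100_000)

# Estimate the probability distribution P(x) by histogramming the samples...
counts, edges = np.histogram(samples, bins=100, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# ...then invert it: F(x) = -k_B * T * ln P(x)
mask = counts > 0
free_energy = -k_B * T * np.log(counts[mask])
free_energy -= free_energy.min()   # shift so the global minimum sits at zero
```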
Given the complexity this introduces, it’s not surprising that most computational organic chemists try to avoid explicit solvent at all costs. The typical workaround is to use “implicit solvent” models, which “reduce the complexity of individual solvent−solute interactions such as hydrogen-bond, dipole−dipole, and van der Waals interactions into a fictitious surface potential... scaled to reproduce the experimental solvation free energies” (Baik). This preserves the well-defined potential energy surfaces that organic chemists are accustomed to, so you can still find transition states by eigenvector following, etc.
Implicit solvent models like PCM, COSMO, or SMD are better than nothing, but are known to struggle for charged species. In particular, they don’t really describe explicit inner-sphere solvent–solute interactions (like hydrogen bonding), meaning that they’ll behave poorly when these interactions are important. Dan Singleton’s paper on the Baylis–Hillman reaction is a nice case study of how badly implicit solvent can fail: even high-level quantum chemical methods are useless when solvation free energies are 10 kcal/mol off from experiment!
This issue is well-known. To quote from Schreiner and Grimme:
An even more important but still open issue is solvation. In the opinion of the authors it is a ‘scandal’ that in 2018 no routine free energy solvation method is available beyond (moderately successful) continuum theories such as COSMO-RS and SMD and classical FF/MD-based explicit treatments.
When computational studies have been performed in explicit solvent, the results have often been promising: Singleton has studied diene hydrochlorination and nitration of toluene, and Peng Liu has recently conducted a nice study of chemical glycosylation. Nevertheless, these studies all require heroic levels of effort: quantum chemistry is still very slow, and so a single free energy surface might take months and months to compute.3
One promising workaround is using machine learning to accelerate quantum chemistry. Since these MD-type studies look at the same exact system over and over again, we could imagine first training some sort of ML model based on high-level quantum chemistry data, and then employing this model over and over again for the actual MD run. As long as (1) the ML model is faster than the QM method used to train it and (2) it takes less data to train the ML model than it would to run the simulation, this will save time: in most cases, a lot of time.
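Conditions (1) and (2) can be turned into a simple break-even estimate. Every timing below is an invented placeholder, chosen only to show the shape of the trade-off:

```python
# Break-even estimate for "train an ML potential, then run the MD with it."
# Every timing here is an invented placeholder.
t_qm = 600.0          # seconds per QM energy/force evaluation
t_ml = 0.05           # seconds per ML evaluation
n_train = 5_000       # QM calculations needed to build the training set
n_steps = 1_000_000   # MD steps needed for adequate sampling

cost_qm_only = n_steps * t_qm
cost_with_ml = n_train * t_qm + n_steps * t_ml

print(f"QM-only MD:   {cost_qm_only / 86400:,.0f} days")
print(f"ML-driven MD: {cost_with_ml / 86400:,.1f} days")
# The ML route wins whenever n_train << n_steps and t_ml << t_qm.
```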
(This is a somewhat different use case than e.g. ANI-type models, which aim to achieve decent accuracy for any organic molecule. Here, we already know what system we want to study, and we’re willing to do some training up front.)
A lot of people are working in this field right now, but today I want to highlight some work that I liked from Fernanda Duarte and co-workers. Last year, they published a paper comparing a few different ML methods for studying quasiclassical dynamics (in the gas phase), and found that atomic cluster expansion (ACE) performed better than Gaussian approximation potentials while training faster than NequIP. They then went on to show that ACE models could be trained automatically through active learning, and used the models to successfully predict product ratios for cycloadditions with post-TS bifurcations.
Their new paper, posted on ChemRxiv yesterday, applies the same ACE/active learning approach to studying reactions in explicit solvent, with the reaction of cyclopentadiene and methyl vinyl ketone chosen as a model system. This is more challenging than their previous work, because the ML model now not only has to recapitulate the solute reactivity but also the solute–solvent and solvent–solvent interactions. To try and capture all the different interactions efficiently, the authors ended up using four different sets of training data: substrates only, substrates with 2 solvent molecules, substrates with 33 solvent molecules, and clusters of solvent only.
Previously, the authors used an energy-based selector to determine if a structure should be added to the training set: they predicted the energy with the model, ran a QM calculation, and selected the structure if the difference between the two values was big enough. This approach makes a lot of sense, but has the unfortunate downside that a lot of QM calculations are needed, which is exactly what this ML-based approach is trying to avoid. Here, the authors found that they could use similarity-based descriptors to select data points to add to the training set: these descriptors are both more efficient (needing fewer structures to converge) and faster to compute, making them overall a much better choice. (This approach is reminiscent of the metadynamics-based approach previously reported by John Parkhill and co-workers.)
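To illustrate the flavor of a similarity-based selector (this is not the authors’ actual implementation; the descriptor computation, distance metric, and threshold are all placeholders), one could imagine something like:

```python
import numpy as np

def select_by_novelty(candidate_descriptors, training_descriptors, threshold):
    """Keep candidates whose nearest neighbor in the training set (measured in
    descriptor space) is farther away than `threshold`. No QM calls required."""
    train = np.asarray(training_descriptors, dtype=float)
    selected = []
    for i, d in enumerate(candidate_descriptors):
        distances = np.linalg.norm(train - np.asarray(d, dtype=float), axis=1)
        if distances.min() > threshold:
            selected.append(i)
            train = np.vstack([train, d])   # count it as "added" so near-duplicates aren't also picked
    return selected
```

The point is that the filter runs entirely in descriptor space, so no QM time is spent on structures the model has effectively already seen.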
With a properly trained model in hand, the authors went on to study the reaction with biased sampling MD. They find that the reaction is indeed accelerated in explicit water, and that the free energy surface begins to look stepwise, as opposed to the concerted mechanism predicted in implicit solvent. (Singleton has observed similar behavior before, and I’ve seen this too.) They do some other interesting studies: they look at the difference between methanol and water as solvents, argue that Houk is wrong about the role of water in the TS,4 and suggest that the hydrophobic effect drives solvent-induced rate acceleration.5
The results they find for this particular system are interesting, but more exciting is the promise that these techniques may soon become accessible to “regular” computational chemists. Duarte and co-workers have shown that ML can be used to solve an age-old problem in chemical simulation; if explicit solvent ML/MD simulations of organic reactions become easy enough for non-experts to run, I have no doubt that they will become a valued and essential part of the physical organic chemistry toolbox. Much work is needed to get to that point—new software packages, further validation on new systems, new ways to assess quality and check robustness of simulations, and much more—but the vision behind this paper is powerful, and I can’t wait until it comes to fruition.
Thanks to Croix Laconsay for reading a draft of this post.

TW: sarcasm.
Today, most research is done by academic labs funded mainly by the government. Many articles have been written on the shortcomings of academic research: Sam Rodriques recently had a nice post about how academia is ultimately an educational institution, and how this limits the quality of academic research. (It’s worth a read; I’ve written about these issues from a few angles, and will probably write more at a later date.)
The major alternative to academic research that people put forward is focused research organizations (FROs): large, non-profit research organizations capable of tackling big unsolved problems. These organizations, similar in scope and ambition to e.g. CERN or LIGO, are envisioned to operate with a budget of $20–100M over five years, making them substantially larger and more expensive than a single academic lab. This model is still being tested, but it seems likely that some version of FROs will prove effective for appropriately sized problems.
But FROs have some disadvantages, too: they represent a significant investment on the part of funders, and so it’s important to choose projects where there’s a high likelihood of impact in the given area. (In contrast, it’s expected that most new academic labs will focus on high-risk projects, and pivot if things don’t work out in a few years.) In this piece, I propose a new form of scientific organization that combines aspects of both FROs and academic labs: for-profit micro focused research organizations (FPµFROs).
The key insight behind FPµFROs is that existing financial markets could be used to fund scientific research when there is a realistic possibility for profit as a result of the research. This means that FPµFROs need not be funded by the government or philanthropic spending, but could instead raise capital from e.g. venture capitalists or angel investors, who have access to substantially more money and are used to making high-risk, high-reward investments.
FPµFROs would also be smaller and more nimble than full-fledged FROs, able to tackle high-risk problems just like academia. But unlike academic labs, FPµFROs would be able to spend more freely and hire more aggressively, thus circumventing the human capital issues that plague academic research. While most academic labs are staffed entirely with inexperienced trainees (as Rodriques notes above), FPµFROs could hire experienced scientists, engineers, and programmers, thus accelerating the rate of scientific progress.
One limitation of the FPµFRO model is that research would need to be profitable within a reasonable time frame. But this limitation might actually be a blessing in disguise: the need for profitability means that FPµFROs would be incentivized to provide real value to firms, thus preventing useless research through the magic of Adam Smith’s invisible hand.
Another disadvantage of FPµFROs is that they must be able to achieve success with relatively little funding (probably around $10M; big for academia, but small compared to a FRO). This means that their projects would have to be modest in scope. I think this is probably a blessing in disguise, though. Consider the following advice from Paul Graham:
Empirically, the way to do really big things seems to be to start with deceptively small things.… Maybe it's a bad idea to have really big ambitions initially, because the bigger your ambition, the longer it's going to take, and the further you project into the future, the more likely you'll get it wrong.
Thus, the need for FPµFROs to focus on getting a single “minimum viable product” right might be very helpful, and could even lead to more impactful firms later on.
In conclusion, FPµFROs could combine the best qualities of academic labs and FROs: they would be agile and risk-tolerant, like academic labs, but properly incentivized to produce useful research instead of publishing papers, like FROs. This novel model should be investigated further as a mechanism for generating new scientific discoveries at scale with immediate short-term utility.
Hopefully it’s clear by now that this is a joke: an FPµFRO is just a startup.
The point of this piece isn’t to criticize FROs or academia: both have their unique advantages relative to startups, and much has been written about the relative advantages and disadvantages of different sorts of research institutions (e.g.).
Rather, I want to remind people that startups can do really good scientific work, something that many people seem to forget. It’s true that basic research can be a public good, and something that’s difficult to monetize within a reasonable timeframe. But most research today isn’t quite this basic, which leads me to suspect that many activities today confined to academic labs could be profitably conducted in startups.
Academics are generally very skeptical of organizations motivated by profit. But all incentives are imperfect, and the drive to achieve profitability pushes companies to provide value to real customers, which is more than many academics motivated by publication or prestige ever manage to achieve. It seems likely that for organizations focused on applied research, profit is the least bad incentive.
I’ll close with a quote from Eric Gilliam’s recent essay on a new model for “deep tech” startups:
Our corporate R&D labs in most industries have taken a step back in how “basic” their research is. Meanwhile, what universities call ‘applied’ research has become much less applied than it used to be. This ‘middle’ of the deep tech pipeline has been hollowed out.
What Eric proposes in his piece, and what I’m arguing here, is that scientific startups can help fill this void: not by replacing FROs and academic research, but by complementing them.
Thanks to Ari Wagen for reading a draft of this piece.

The concept of pKa is introduced so early in the organic chemistry curriculum that it’s easy to overlook what a remarkable idea it is.
Briefly, for the non-chemists reading this: pKa is defined as the negative base-10 logarithm of the acidity constant of a given acid H–A:
pKa := -log10([A-][H+]/[HA])
Unlike pH, which describes the acidity of a bulk solution, pKa describes the intrinsic proclivity of a molecule to shed a proton—a given molecule in a given solvent will always have the same pKa, no matter the pH. This makes pKa a very useful tool for ranking molecules by their acidity (e.g. the Evans pKa table).
The claim implicit in the definition of pKa is that a single parameter suffices to describe the acidity of each molecule.1 In general, this isn’t true in chemistry—there’s no single “reactivity” parameter which describes how reactive a given molecule is. For various regions of chemical space a two-parameter model can work, but in general we don’t expect to be able to evaluate the efficacy of a given reaction by looking up the reactivity values of the reactants and seeing if they’re close enough.
Instead, structure and reactivity interact with each other in complex, high-dimensional ways. A diene will react with an electron-poor alkene and not an alcohol, while acetyl chloride doesn’t react with alkenes but will readily acetylate alcohols, and a free radical might ignore both the alkene and the alcohol and abstract a hydrogen from somewhere else. Making sense of this confusing morass of different behaviors is, on some level, what organic chemistry is all about. The fact that the reactivity of different functional groups depends on reaction conditions is key to most forms of synthesis!
But pKa isn’t so complicated. If I want to know whether acetic acid will protonate pyridine in a given solvent, all I have to do is look up the pKa values for acetic acid and pyridinium (pyridine’s conjugate acid). If pyridinium has a higher pKa, protonation will be favored; otherwise, it’ll be disfavored. More generally, one can predict the equilibrium distribution of protons amongst N different sites from a list of the corresponding pKas.
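As a concrete sketch of that bookkeeping (using approximate aqueous pKa values; in another solvent you’d substitute the appropriate numbers):

```python
# Will acetic acid protonate pyridine?
#   AcOH + pyridine <=> AcO- + pyridinium
#   K_eq = Ka(AcOH) / Ka(pyridinium) = 10 ** (pKa(pyridinium) - pKa(AcOH))
pKa_acetic_acid = 4.76   # approximate value in water
pKa_pyridinium = 5.23    # approximate value in water

K_eq = 10 ** (pKa_pyridinium - pKa_acetic_acid)
print(f"K_eq ≈ {K_eq:.1f}")   # > 1, so proton transfer to pyridine is mildly favorable
```

The same exponent arithmetic extends to distributing protons among N sites: at equal base concentrations, the protonated population of each site scales as 10 raised to the pKa of its conjugate acid.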
Why is pKa so well-behaved? The key assumption underlying the above definition is that ions are free and do not interact with one another. This allows us to neglect any specific ion–ion interactions, and makes the scale universal: if the pyridinium cation and the acetate anion never interact, then I can learn everything I need to about pyridinium acetate just by measuring the pKas of pyridine and acetic acid in isolation.
This assumption is quite good in solvents like water or DMSO, which excel at stabilizing charged species, but progressively breaks down as one travels to the realm of nonpolar solvents. As ions start to pair with one another, specific molecule–molecule interactions become important. The relative sizes of the ions can matter: in a nonpolar solvent, a small anion will be better stabilized by a small cation than by a large, diffuse one, meaning that e.g. acetic acid will appear more acidic when protonating smaller bases. Other more quotidian intermolecular interactions, like hydrogen bonding and π-stacking, can also play a role.
And the ions aren’t the only thing that can stick together: aggregation of acids is often observed in nonpolar solvents. Benzenesulfonic acid forms a trimer in benzonitrile solution, which is still pretty polar, and alcohols and carboxylic acids are known to aggregate under a variety of conditions as well.2 Even seemingly innocuous species like tetrabutylammonium chloride will aggregate at high concentrations (ref, ref).
To reliably extend pKa scales to nonpolar solvents, one must thus deliberately choose compounds which resist aggregation. As the dielectric constant drops, so does the number of such compounds. The clearest demonstration of this I’ve found is a series of papers (1, 2) by pKa guru Ivo Leito measuring the acidity of various fluorinated compounds in heptane.
This effort, while heroic, demonstrates the futility of measuring pKa in nonpolar media from the standpoint of the synthetic chemist. If only weird fluoroalkanes engineered not to aggregate can have pKa values, then the scale may be analytically robust, but it’s hardly useful for designing reactions!
The key point here is that the difficulty of measuring pKa in nonpolar media is not an analytical barrier which can be surmounted by new and improved technologies, but rather a fundamental breakdown in the idea of pKa itself. Even the best pKa measurement tool in the world can’t determine the pKa of HCl in hexanes, because no such value exists—the concept itself is borderline nonsensical. Chloride will ion-pair with everything in hexanes, hydrogen chloride will aggregate with itself, chloride will stick to hydrogen chloride, and so forth. Asking for a pKa in this context just doesn't make much sense.3
It’s important to remember, however, that just because the pKa scale no longer functions in nonpolar solvents doesn’t mean that acids don’t have different acidities. Triflic acid in toluene will still protonate just about everything, whereas acetic acid will not. Instead, chemists wishing to think about acidity in nonpolar media have to accept that no one-dimensional scale will be forthcoming. The idealized world of pKa we’re accustomed to may no longer function in nonpolar solvents, but chemistry itself still works just fine.
Thanks to Ivo Leito for discussing these topics with me over Zoom, and to Joe Gair for reading a draft of this post.

I’ve been pretty critical of peer review in the past, arguing that it doesn’t accomplish much, contributes to status quo bias, etc. But a few recent experiences remind me of the value that peer review provides: in today’s scientific culture, peer review is essentially the only time that scientists get honest and unbiased feedback on their work.
How can this be true? In experimental science, scientists typically work alongside other students and postdocs under the supervision of a professor. This body of people forms a lab, also known as a research group, and it’s to these people that you present most frequently. Your lab generally knows the techniques and methods that you employ very well: so if you’ve misinterpreted a piece of data or designed an experiment poorly, group meeting is a great place to get feedback.
But a lab is also biased in certain ways. People are attracted to a lab because they think the science is exciting and shows promise, and so they’re likely to be credulous about positive results. Certain labs also develop beliefs or dogmas about how to conduct science: the best ways to perform a mechanistic study, or the most useful reaction conditions. To some extent, every lab is a paradigm unto itself. This means that paradigm-shifting criticism is hard to find among one’s coworkers, even if it’s common in the outside world.
Here are some examples of controversial-in-the-field statements that are unlikely to be controversial within given labs:
In each of these cases, it’s unlikely that criticism along these lines is available internally: people who’ve chosen to do their PhDs studying ML in chemistry aren’t likely to criticize your paper for overemphasizing the importance of ML in chemistry!
More generally, internal criticism works best when a lab serves as a shared repository of expertise, i.e. when everyone in the lab has roughly the same skillset. Some labs focus instead on a single overarching goal and employ many different tools to get there: a given chemical biology group might have a synthetic chemist, an MS specialist, a genomics guru, a mechanistic enzymologist, and someone specializing in cell culture. If this is the case, your techniques are opaque to your coworkers: what advice can someone who does cell culture give about improving Q-TOF signal-to-noise?
Ideally, one’s professor is well-versed enough in each of the techniques employed that he or she can dispense criticism as needed. But professors are often busy, aren’t always operational experts at each of the techniques they oversee, and suffer from the same viewpoint biases that their students do (perhaps even more so).
So, it’s important to solicit feedback from external sources. Unfortunately, at least in my experience most external feedback is too positive: “great talk,” “nice job,” etc. Our scientific culture tries so hard to be supportive that I almost never get any meaningful criticism from people outside my group, either publicly or privately. (Ideally one’s committee would help, but I never really got to present research results to my committee, and this doesn’t help postdocs anyhow.)
Peer review, then, serves as the last bastion against low-quality science: reviewers are outside the lab, have no incentive to be nice, and are tasked specifically with poking holes in your argument or pointing out extra experiments that would improve it. Peer review has improved each one of my papers, and I’m grateful for it.1
What’s a little sad is that the excellent feedback that reviewers give only comes at the bitter end of a project, which for me has often meant that the results are more than a year old and my collaborators have moved on. Much more useful would be critical feedback delivered early on in a project, when my own thinking is more flexible and the barrier to running additional experiments is lower. And more useful still would be high-quality criticism available at every step of the project, given not anonymously but by people whom you can talk to and learn from.
What might this practically look like?
I don’t know what the right solution looks like here: the burden of peer review is already substantial, and I don’t mean to suggest that this work ought to be arbitrarily multiplied for free. But I do worry that eliminating peer review, absent other changes, would simply mean that one of the only meaningful chances to get unfiltered feedback on one’s science would be eliminated, and that this would be bad.
Thanks to Croix Laconsay and Lucas Karas for helpful feedback on this piece.