

What Achilles Said to the Tortoise About Binding-Affinity Prediction

March 3, 2025

This post is an attempt to capture some thoughts I have about ML models for predicting protein–ligand binding affinity, sequence- and structure-based approaches to protein modeling, and what the interplay between generative models and simulation may look like in the future. I have a lot of open questions about this space, and Abhishaike Mahajan’s recent Socratic dialogue on DNA foundation models made me curious to try the dialogue format here.

(With apologies to Lewis Carroll and Douglas Hofstadter.)



[The TORTOISE is sitting on a park bench with a thermos of tea and a stack of papers beside him. Enter ACHILLES, holding a stack of papers.]

ACHILLES: Hello, Mr. T. Mind if I join you on your bench?

TORTOISE: Of course, Achilles. What are you reading on this fine spring day?

ACHILLES: Right now, I’m reviewing some recent literature on the economics of seating in Mongolian yurts. And yourself?

TORTOISE: I’m looking through two fascinating papers criticizing modern protein–ligand co-folding methods.

The first is by Matthew Masters and co-workers and is entitled “Do Deep Learning Models for Co-Folding Learn the Physics of Protein–Ligand Interactions?” The authors show that AlphaFold 3 predicts the “correct” binding site for a variety of complexes even when the entire binding site is mutated to glycine, when bulky residues are added to fill the binding pocket, or when the polarity of key interactions is reversed. The authors argue that this demonstrates that AlphaFold is overfit to specific protein families, and that models need to be validated on “their compliance with physical and chemical principles.”

ACHILLES: Interesting, but not surprising.

TORTOISE: The second is by Peter Škrinjar and co-workers and is entitled “Have protein–ligand co-folding methods moved beyond memorization?” Here, the authors show that the success rate of co-folding methods is dictated by the similarity of structures to the training set. The models appear to perform well in cases where there is high train–test similarity, but on truly different structures their performance is dismal. The authors’ conclusion is even stronger than that of the first paper:

Incorporating physics-based terms to more accurately model protein-ligand interactions, potentially from simulations, conformational ensembles, or other sources, are likely needed to achieve more exciting results in this field.

Taken together, these papers make it clear that pure deep-learning-based approaches to solving these important scientific problems are doomed to fail.

ACHILLES: Well, let’s not rush ahead too quickly—perhaps we’ve been spending too much time together. It’s not surprising that these structure-based methods are prone to overfitting, but I expect that the next generation of sequence-only methods will overcome these hurdles.

TORTOISE: Hm, I admit this intuition leaves me in the dust. Can you enlighten me as to why your response to unphysical overfitting is to reject one of the only physical descriptors that we have—the 3D structures of the protein and the ligand? It seems to me that reducing the amount of available data is a peculiar way to improve the performance of one’s model.

ACHILLES: Of course, I’m happy to explain. Consider the problem from first principles. It’s not surprising that using 3D structures leads to overfitting: the dimensionality of these problems is vast, and our datasets are comparatively minuscule. So any given set of coordinates is virtually a guaranteed fingerprint for a particular protein or ligand, and we’re just training models that have one-hot encoded the structures they’ve seen. See for instance the recent work of Jain, Cleves, and Walters arguing that DiffDock is simply a fancy lookup table.

TORTOISE: Of course I agree, which is why it’s important that we find ways to generate more training data, not jettison what little data we have. The problem is not intractable; it seems that DiffDock-L already does better than DiffDock on this front. We need only wait for another order-of-magnitude increase in the amount of training data available to arrive at a robust deep-learning-based docking method.

ACHILLES: But, if you will, follow me a little further down this line of thinking. We know that protein–ligand structures are but a single snapshot of a dynamic ensemble of possibilities that interconvert smoothly in solution. This is why attempting to guess the binding affinity from a single pose is so futile, and why extensive sampling is needed for free-energy methods like FEP or TI.

Protein–ligand co-folding models must labor under the same constraints. Just because we’ve changed the scoring function from a forcefield to a neural network doesn’t mean that we can go back to considering a single averaged pose—let alone whatever pose happened to crystallize out of solution best. No, any method predicated on considering just a single pose is doomed to fail.

TORTOISE: So your proposal is to disregard all poses, and hope that “machine learning” can just call the right answer from the vasty deep? I fear that you’ve been spending too much time on LinkedIn, my dear friend. Perhaps it’s time for you to return to a time before computing, like 5th-century Greece.

ACHILLES: Au contraire, tortuga. We know that it’s possible to go from sequence to structure with machine learning, unless you’ve already forgotten about last year’s Nobel Prize. And others have shown you can generate structural ensembles this way: look at AlphaFlow, or BioEmu. One could imagine running these models to generate candidate structures, then feeding these structures into a docking model, then feeding the docked structures into a scoring model, then combining the scoring predictions to generate a single predicted binding affinity.

TORTOISE: I agree in principle, provided each of these models can be benchmarked and verified to follow proper thermodynamic and statistical-mechanical principles. But creating a perfect Boltzmann generator won’t be easy, and methods that do not reproduce the canonical ensemble lead to pathological failures in practice.

ACHILLES: Precisely! Many of these intermediate models are difficult to train, since we don’t have good ground truth for protein structural ensembles or individual binding affinities per pose. In fact, almost the only quantity we can reliably measure is the very one we want to predict: macroscopic protein–ligand binding affinity. So the entire problem becomes far more tractable if we simply combine the individual models into one end-to-end model so that we can backpropagate through the entire stack. Then we can scale to larger datasets that don’t have associated structural information, like DNA-encoded libraries or Terray’s microarray technology.

Thus, by combining the models into one, we at once simplify our task and make it possible to scale to much larger datasets: e pluribus unum.

TORTOISE: A surprisingly plausible vision, but I’m still not convinced. (And you ought to be speaking Greek, not Latin.)

Partitioning this problem into multiple models, each of which performs a defined task, means that there are verifiable, low-dimensional intermediate states that can be inspected. Structural ensembles can be saved to PDB files, and individual binding affinities can be sanity-checked. When we dump everything together into one massive mega-model, who knows what the model will try to do? These low-dimensional checkpoints might even be critical for giving us the right inductive bias to prevent overfitting.

By way of comparison, consider LLMs—we use textual checkpointing all the time, from chain-of-thought to retrieval-augmented generation. “Just train a model to do the entire task in a single pass” sounds like the accelerationist, AI-informed position, but in reality interpretability and modularity have proven to be valuable levers across many fields of machine learning. Gleefully jettisoning them hardly seems prudent.

ACHILLES: Perhaps. But forcing a model to go through a certain intermediate state only makes sense when that intermediate state is actually relevant to the task at hand. How will structure-based methods handle intrinsically disordered proteins?

TORTOISE: Even disordered proteins must have a structure.

[Enter CRAB.]

CRAB: Hullo, dear friends! Are we talking about ESM2? I fear that these methods are passé; if you haven’t heard yet, ascribing individual importance to mere proteins is an inadequate assumption now obsoleted by deep learning.

ACHILLES: Whatever do you mean?

CRAB: Exactly what I said! Proteins don’t exist in a vacuum—they possess different post-translational modifications, they aggregate, they float in and out of biomolecular condensates, and many of the most important cellular functions don’t even involve proteins.

ACHILLES: You’re correct, of course, but it’s clear that proteins are one of the key structural and functional elements of the cell. How else do you explain the history of successful therapeutics that target specific proteins?

CRAB: Selection bias, my dear friend. Of course the brute-force medicinal chemistry strategies of yesteryear managed to identify a handful of indications amenable to single-protein therapies, just like a handful of traits can be ascribed to single genes. But most traits that matter are polygenic, and most diseases are doubtless treatable only at the systems-biology level. Any lesser approximations are simply inadequate.

TORTOISE: Oh dear, I fear this is becoming a bit too much for me.

CRAB: I’ve just accepted a position at a biotech company personally backed by the high suzerains of artificial intelligence. We take millions of brightfield images of cells that have been exposed to different molecules and use deep learning to connect the observed cell-state modifications to molecular structure. Think phenotypic screening, but grander and more glorious.

ACHILLES: Now I feel out of my depth. Perhaps Mr. T is right and this new world is not for me. The 5th century does have a certain rustic charm…

TORTOISE: Wait, I think I understand. Previously, we discussed how, by training a single model, we could circumvent the need for explicitly generating protein structural ensembles and scoring individual docked poses—a single meta-model could implicitly perform all these tasks in an end-to-end differentiable fashion and simply learn all the patterns, or perhaps perform some more advanced and less constrained form of logic. Achilles, do you consider this a fair summary of your position?

ACHILLES: Yes, that seems fair enough, although I hardly see how my proposal connects to this outlandish suggestion.

TORTOISE: If we wanted to extrapolate this to entire cells, we could perform a similar exercise. We could enumerate all the proteins in the cell with all their various post-translational modifications, and then use Achilles’s model to score a given molecule’s interaction with all of them. It would be a mighty amount of work—but, in theory, it’s possible.

CRAB: Ah, but you’d still be neglecting the effects of environment, aggregation, and so on. Think of an E3 ligase: do you think you could model that one protein at a time? And what do you say to DNA, RNA, lipids, and so on and so forth?

TORTOISE: Touché. Perhaps “protein” is the wrong word here—but there must be some number of defined, localized structural entities in the cell which interact with an exogenous small molecule, and these entities must be at least somewhat separable per the principle of locality.

ACHILLES: Yes, that’s right. After all, a molecule can only be at one place at a time.

TORTOISE: So if we could use Achilles’s model to predict the interaction of the small molecule with each of these entities, we would have a sort of interaction fingerprint in entity space. We could then, with sufficient data, train a new model to learn the interaction network among these entities and predict an overall cell-level response. Do you agree, Mr. Crab?

CRAB: I suppose so, although it sounds ungainly. How exactly do you plan to study the effects of a bunch of small molecules on a particular region of chromatin?

TORTOISE: Ah, but this is where we use Achilles’s trick once more. Instead of learning one model that accounts for per-entity interactions, and another model that combines the individual per-entity predictions into a cell-level prediction, we can just learn a single model and backpropagate through the entire stack. So now our single foundation model is implicitly learning not only protein conformational ensembles, protein–ligand docking, and docking rescoring, but also post-translational modifications, systems biology, and so on.

ACHILLES: Ah, now I see. Our aquatic colleague here is taking my same logic a step further—instead of implicitly learning individual structures in the course of predicting a protein–ligand interaction, he’s implicitly learning individual protein–ligand interactions in the course of predicting a single cell response.

TORTOISE: Exactly. The question then becomes whether he’ll have enough data to learn the entire stack, or whether his model will suffer from the same overfitting problems as today’s protein–ligand interaction models.

ACHILLES: Right. It’s clear that at some scale, questions of information theory must predominate: every problem has some minimum amount of data that it takes to solve. Otherwise we’d all be able to solve drug toxicity just from the roughly 1,500 compounds in the ClinTox dataset.

TORTOISE: Precisely. We could imagine such a strategy working at the infinite-data limit, but in practice the mismatch between problem complexity and data availability seems vast, and slow to fill.

CRAB: This has been an interesting philosophical aside, but I’m afraid that trying to cram your preconceived notions about biological dogma into my model is ill-advised. Today’s scientists think of proteins because that’s all they know how to study—but true biological understanding can only come when we’re able to learn directly on cellular data without the foolish assumptions that have plagued biochemistry to date. Trying to interpret my cell-level models through the viewpoint of proteins is like trying to decompose a Cybertruck into a linear combination of horses.

But in any event, I must be off. An army of H100s awaits me, and I must deploy them!

[Exit CRAB.]

ACHILLES: That fellow has no scientific humility. Of course proteins are important! These Silicon-Valley types have no respect for the deep biological body of knowledge that came before them, and think they can just pour images and SMILES strings into a transformer and “solve biology.” But we’d better return to our previous discussion, or things may become too recursive.

TORTOISE: There seem to be more and more fellows like him around these days... but I suppose carcinization is a well-documented phenomenon. Where were we before this unexpected conversational loop?

ACHILLES: I was just proposing the idea that sequence-based models will implicitly learn structure where it’s helpful.

TORTOISE: Ah, yes. I am beginning to catch up with your lightning-fast intuition. Are you opposed to structure for ideological reasons, or because you think structural information will never be achievable on the scale required to solve this problem?

ACHILLES: Both. I’m opposed to structure because accurate structural ensembles, which are what’s needed here, will never be available. Even a billion cryo-EM structures won’t suffice, because single ground-state snapshots will never be enough.

TORTOISE: But you must concede that, for instance, molecular dynamics could provide a way to generate relevant structural information under non-ground-state conditions.

ACHILLES: I freely admit that the Platonic ideal of MD simulations might furnish us with such data, at the risk of sounding overly Greek. But you know as well as I do that MD simulations are unreliable and provide data that’s far worse than crystallography. What makes you think that dumping millions of AMBER trajectories into an ML model will do anything except increase demand for H100s?

TORTOISE: Improving MD simulations seems to be quite tractable. There have been a few papers over the past 12 months that use neural network potentials for protein simulation: consider GEMS, or AI2BMD, or even the most recent MACE-OFF preprint. Scaling NNPs works well; why not just scale them up and use them to run MD simulations?

ACHILLES: For one, NNPs are ridiculously slow compared to normal MD—capturing protein conformational motion through MD is expensive enough without making it three orders of magnitude slower. You may be content with slow and accurate simulations, but I myself feel the need to go quickly. MD simulations will never be fast enough for high-throughput virtual screening. And how are we supposed to verify the alleged accuracy of these simulations, anyway?

TORTOISE: NMR measurements, perhaps, or terahertz spectroscopy. The ingenuity of experimentalists should never be underestimated.

ACHILLES: I grant that this might work for a single protein. But you’ve managed to select methods that are even less scalable than growing crystals in a tray. This can’t be a general solution—it’s the age of “big data” now, not painstaking spectral analysis measured in graduate-student years.

TORTOISE: Ah, but we don’t need massive amounts of data for our benchmarks. NNPs and MD are physically motivated, so they’re much less prone to overfitting than the approaches you discuss. Generalization occurs naturally, without needing to resort to the sorts of paranoid dataset splits seen with sequence-only methods.

ACHILLES: Might this not simply arise from how small the models are today? Once an NNP must handle long-range forces, complex many-body interactions, and so on, these models will be just as susceptible to overfitting as co-folding methods. I know you like to hide in your shell from time to time, but robustness isn’t everything—if all you want is to prevent overfitting, you might as well go back to using AutoDock Vina.

TORTOISE: Not all approaches are equally susceptible to overfitting, and encoding proper inductive biases is one of the most important tasks of an ML researcher. The sorts of properties predicted by NNPs—forces, energy, charges, and so on—are intrinsically local and thus can be learned much more easily from a limited dataset. In fact, this is one of the strongest arguments for using a geometric GNN in the first place; we naturally account for the symmetries of the problem, as opposed to needing to learn them through vast datasets. Consider the analogies to Noether’s theorem.

ACHILLES: I must confess, I rarely revisit the 1910s.

TORTOISE: More fundamentally, learning energy as an intermediate variable is a central task, and it’s unlikely that we can avoid some version of it, particularly since diffusion models and AlphaFold are almost certainly both implicitly learning forcefields anyway.

Trying to one-shot the hardest problems in computational biochemistry and biophysics with “deep learning” will forever be hamstrung by memorization and overfitting, since the approach is fundamentally agnostic to the nature of the problem. I’m simply proposing that trying to learn physically motivated, verifiable, and practical models that correspond to our physical understanding of the world may be a more tractable strategy, even if it seems slower to you.

ACHILLES: You know that I respect your stepwise approach to scientific discovery, but I fear you’re confusing your own intrinsic conservatism for enlightenment. Haven’t you heard of Sutton’s “bitter lesson”? Encoding expert intuition always makes the researcher feel accomplished, and is often effective in the small-data regime, but never pays off in the end.

TORTOISE: Mr. Crab could say the same thing to you.

ACHILLES: Admittedly. But the task of the ML researcher is not dissimilar to that of the philosopher: to carve reality at its joints, as my kinsman Plato said, and find the natural partitions between concepts that make our tasks tractable. Choosing the right problem to tackle with deep learning might seem like encoding expert intuition in an un-Suttonian way, but really it’s a higher-order consideration, and one which itself still remains impervious to learning.

TORTOISE: And what, pray tell, makes your protein–ligand model a natural partition, and my NNPs an unnatural partition?

ACHILLES: The elegance of the protein–ligand task is that it corresponds to a real information bottleneck—all the complexity of the system can easily be distilled into a single number, and in practice the measurement is performed that way. In contrast, your model is only indirectly testable and verifiable.

TORTOISE: Only as indirectly as any other physics-based method is testable. Scientists have been doing this for some time, you know.

ACHILLES: And more fundamentally, even a “physics-based model” is anything but. Scratch the surface of an NNP-powered MD simulation and you’ll see an ocean of questionable assumptions: band-gap collapse, nuclear quantum effects, spin–orbit coupling, quantum vs. classical zero-point energy, and so on and so forth. Even a model trained on full-configuration-interaction calculations won’t perfectly reflect reality. At the end of the day, you’ll have spent ten million dollars on AWS generating gnostic simulated data, money that could have gone toward real, tangible results without approximations.

TORTOISE: I’m willing to concede that at some scale, what you’re proposing might work. But you have no idea how much data you need to learn protein–ligand interactions. Have you done a scaling study? Do you even have a back-of-the-envelope estimate for what your proposed model will cost? Who knows what the true dimensionality of protein–ligand interaction space is, or whether it’s remotely learnable with the general architectures you propose? Someone’s going to have to generate all this data, and it’s not cheap; even fleet-footed Achilles can’t outrun the fundamental limitations of laboratory science.

ACHILLES: Ah, let’s not let our conversation fold back on itself. Isn’t it possible that there are latent low-dimensional representations of protein–ligand interactions that can make my sequence-only training process more efficient?

TORTOISE: Possible, yes, but not guaranteed. To make matters worse, even if you train a protein–ligand model you’ll have to turn around and train another foundation model for protein–protein interactions, and another model for nucleotides, and another model for lipids, and so on and so forth.

ACHILLES: Presuming the first model succeeds, I would think this a fine outcome.

TORTOISE: We know what the scaling laws for NNPs are, and we know that they can scale across different domains of science even at sub-GPT-1 parameter counts. These are real advantages, and we ought not to be hasty in discarding them. Plus, it’s not like today’s methods are inconceivably far from where we want to go. Forcefield-based free-energy methods aren’t perfect, but they’re good enough to be useful. Doesn’t that suggest that we don’t need to get, e.g., nuclear quantum effects exactly right to build a useful model?

ACHILLES: Scaling simulation across the chemical sciences is intriguing. You should tell Adam Marblestone; maybe you can build an FRO out of this idea. But we must stay focused on running the race at hand first and worry about the whole decathlon later. Perhaps we’ll be able to perform evolutionary model merging and pull out conformational ensembles at a later date, but I fear that your bias towards legacy simulation methods blinds you to the task at hand.

And arguing that FEP+ is good enough to be useful proves too much. Simply creating a histogram of distances by atom type is good enough to be useful; even plastic model kits are useful. Being useful at the small-data limit and being a viable path towards the future are very different, and I fear you confuse them at your own peril.

TORTOISE: Think strategically, my tactical friend. Let’s say we’re trying to get to the ultimate protein–ligand prediction model, which I’ll call the Galapagos Giant Model. If I train an NNP that’s halfway there, I’ve built something that’s immediately practically useful and which I can deploy to real problems. If you build a one-shot prediction model that’s halfway there, you’re going to get an overfit and confused model that takes a SMILES string and a sequence and returns meaningless noise.

ACHILLES: (Of course, first I’d have to train a model that was halfway to being halfway complete…)

TORTOISE: The ability of startups and research programs to bootstrap their way through increasing complexity is a critical determinant of their success; this is why YC tells companies to ship and start talking to users as soon as possible. We know that NNPs are already useful. How can you accomplish a similar feat with your approach?

ACHILLES: Ah, but your line of argumentation seems to rely on its own conclusion. Why is my hypothetical half-baked model unusable but yours is useful? Isn’t it just as possible that my model is useful across many domains but struggles to generalize to bizarre systems, while your model manages to be deeply useful nowhere?

The greatest advantage of simulation—its exactitude—is also its greatest weakness. A simulation-based workflow is only as strong as its weakest link, or what one might call its Achilles heel.

TORTOISE: Science aside, I fear the self-reference here will soon become ponderous.

ACHILLES: This might explain why the data on using NNPs in FEP are pretty bleak with today’s models, even though these models are undeniably a big improvement over the predecessor forcefield methods. Furthermore, fine-tuning models to be better at specific tasks seems to make them less general.

TORTOISE: I caution you not to dismiss my approach prematurely. True ML FEP has never been tried, since the timescales remain inaccessible. Ligand-only corrections neglect the most important part of the system, the protein–ligand interactions, and we know that protein conformational motion is poorly described by forcefields, potentially biasing the entire simulation in deleterious ways. So no, I cannot feign surprise that these results are underwhelming.

ACHILLES: Still, you can’t deny that even the “overfit” ML methods of today like DiffDock are practically useful; it’s not like most drug programs deal with first-in-class structural families. How well do you think AlphaFold 3 works for kinase inhibitors? I would be surprised if the performance were not excellent.

TORTOISE: The dimensionality of ligand space is much higher than that of protein space.

ACHILLES: True. But it’s possible that generalization is easier in ligand space. I’m growing hungry—how about we continue this discussion over brunch?

TORTOISE: A capital idea. Shall we leave now?

ACHILLES: You are welcome to, but I may sit and read for a bit longer. As you know, I have a considerable speed advantage over you, and keeping up with the literature takes more and more of my time.

TORTOISE: Best of luck. We’ll see who gets there first!

[Exit TORTOISE.]



Thanks to Abhishaike Mahajan, Navvye Anand, Tony Kulesa, Pat Walters, and Ari Wagen for helpful conversations on these topics. I've also taken inspiration from talks I heard by Tom Sercu (EvolutionaryScale) and Pranam Chatterjee (Duke). Any errors are mine alone.

What Did The Early Church Think About Fasting?

February 3, 2025

(This is a bit of a departure from my usual chemistry-focused writing.)

Fasting is an important part of many religious traditions, but modern Protestant Christians don’t really have a unified stance on fasting (and have opposed systematic fasts for a while). That’s not to say that Protestants don’t fast, though: over just the past few years, I’ve met people doing water-only fasts, juice fasts, dinner-only fasts, “social media” fasts, and many more.

These fasts don’t really line up with what I see in neighboring faith traditions:

I’ve been a bit puzzled by all this, so I decided to do a “literature review” and find documents from the early Church that discussed fasting. This post collects and summarizes the sources that I found. The sources are listed in approximate chronological order, with emphasis added throughout—if you don’t want to read everything, you can skip to the end and read my brief takeaways.

Didache (c. 100 AD)

But before the baptism let the baptizer fast, and the baptized, and whatever others can; but you shall order the baptized to fast one or two days before….

But let not your fasts be with the hypocrites; for they fast on the second [Monday] and fifth day [Thursday] of the week; but fast on the fourth day [Wednesday] and the Preparation (Friday).

Shepherd of Hermas (c. 150–200 AD)

Thus, then, shall you observe the fasting which you intend to keep. First of all, be on your guard against every evil word, and every evil desire, and purify your heart from all the vanities of this world. If you guard against these things, your fasting will be perfect. And you will do also as follows. Having fulfilled what is written, in the day on which you fast you will taste nothing but bread and water; and having reckoned up the price of the dishes of that day which you intended to have eaten, you will give it to a widow, or an orphan, or to some person in want, and thus you will exhibit humility of mind, so that he who has received benefit from your humility may fill his own soul, and pray for you to the Lord. If you observe fasting, as I have commanded you, your sacrifice will be acceptable to God, and this fasting will be written down; and the service thus performed is noble, and sacred, and acceptable to the Lord. These things, therefore, shall you thus observe with your children, and all your house, and in observing them you will be blessed; and as many as hear these words and observe them shall be blessed; and whatsoever they ask of the Lord they shall receive.

On Fasting, Tertullian (c. 160–240 AD)

Now, if there has been temerity in our retracing to primordial experiences the reasons for God's having laid, and our duty (for the sake of God) to lay, restrictions upon food, let us consult common conscience. Nature herself will plainly tell with what qualities she is ever wont to find us endowed when she sets us, before taking food and drink, with our saliva still in a virgin state, to the transaction of matters, by the sense especially whereby things divine are handled; whether (it be not) with a mind much more vigorous, with a heart much more alive, than when that whole habitation of our interior man, stuffed with meats, inundated with wines, fermenting for the purpose of excremental secretion, is already being turned into a premeditatory of privies, (a premeditatory) where, plainly, nothing is so proximately supersequent as the savouring of lasciviousness…

This principal species in the category of dietary restriction may already afford a prejudgment concerning the inferior operations of abstinence also, as being themselves too, in proportion to their measure, useful or necessary. For the exception of certain kinds from use of food is a partial fast. Let us therefore look into the question of the novelty or vanity of xerophagies, to see whether in them too we do not find an operation alike of most ancient as of most efficacious religion… I return likewise to Elijah. When the ravens had been wont to satisfy him with bread and flesh, why was it that afterwards, at Beersheba of Judea, that certain angel, after rousing him from sleep, offered him, beyond doubt, bread alone, and water? Had ravens been wanting, to feed him more liberally? Or had it been difficult to the angel to carry away from some pan of the banquet-room of the king some attendant with his amply-furnished waiter, and transfer him to Elijah, just as the breakfast of the reapers was carried into the den of lions and presented to Daniel in his hunger? But it behooved that an example should be set, teaching us that, at a time of pressure and persecution and whatsoever difficulty, we must live on xerophagies…. Anyhow, wherever abstinence from wine is either exacted by God or vowed by man, there let there be understood likewise a restriction of food fore-furnishing a formal type to drink. For the quality of the drink is correspondent to that of the eating. It is not probable that a man should sacrifice to God half his appetite; temperate in waters, and intemperate in meats….

The apostle reprobates likewise such as bid to abstain from meats; but he does so from the foresight of the Holy Spirit, precondemning already the heretics who would enjoin perpetual abstinence to the extent of destroying and despising the works of the Creator; such as I may find in the person of a Marcion, a Tatian, or a Jupiter, the Pythagorean heretic of today; not in the person of the Paraclete. For how limited is the extent of our interdiction of meats! Two weeks of xerophagies in the year (and not the whole of these — the Sabbaths, to wit, and the Lord's days, being excepted) we offer to God; abstaining from things which we do not reject, but defer.

Letter 1, Athanasius (329 AD)

For since, as I before said, there are various proclamations, listen, as in a figure, to the prophet blowing the trumpet; and further, having turned to the truth, be ready for the announcement of the trumpet, for he says, 'Blow the trumpet in Sion: sanctify a fast.' This is a warning trumpet, and commands with great earnestness, that when we fast, we should hallow the fast. For not all those who call upon God, hallow God, since there are some who defile Him; yet not Him — that is impossible — but their own mind concerning Him; for He is holy, and has pleasure in the saints. And therefore the blessed Paul accuses those who dishonour God; 'Transgressors of the law dishonour God.' So then, to make a separation from those who pollute the fast, he says here, 'sanctify a fast.' For many, crowding to the fast, pollute themselves in the thoughts of their hearts, sometimes by doing evil against their brethren, sometimes by daring to defraud…

We begin the holy fast on the fifth day of Pharmuthi (March 31), and adding to it according to the number of those six holy and great days, which are the symbol of the creation of this world, let us rest and cease (from fasting) on the tenth day of the same Pharmuthi (April 5), on the holy sabbath of the week. And when the first day of the holy week dawns and rises upon us, on the eleventh day of the same month (April 6), from which again we count all the seven weeks one by one, let us keep feast on the holy day of Pentecost — on that which was at one time to the Jews, typically, the feast of weeks, in which they granted forgiveness and settlement of debts; and indeed that day was one of deliverance in every respect.

Catechetical Lecture 4, Cyril of Jerusalem (c. 350 AD)

And concerning food let these be your ordinances, since in regard to meats also many stumble. For some deal indifferently with things offered to idols, while others discipline themselves, but condemn those that eat: and in different ways men's souls are defiled in the matter of meats, from ignorance of the useful reasons for eating and not eating. For we fast by abstaining from wine and flesh, not because we abhor them as abominations, but because we look for our reward; that having scorned things sensible, we may enjoy a spiritual and intellectual feast; and that having now sown in tears we may reap in joy in the world to come. Despise not therefore them that eat, and because of the weakness of their bodies partake of food.

Apostolic Constitutions, Book V (c. 375 AD)

You should therefore fast on the days of the passover, beginning from the second day of the week until the preparation, and the Sabbath, six days, making use of only bread, and salt, and herbs, and water for your drink; but do you abstain on these days from wine and flesh, for they are days of lamentation and not of feasting….

We enjoin you to fast every fourth day of the week, and every day of the preparation, and the surplusage of your fast bestow upon the needy; every Sabbath day excepting one, and every Lord's day, hold your solemn assemblies, and rejoice: for he will be guilty of sin who fasts on the Lord's day, being the day of the resurrection, or during the time of Pentecost, or, in general, who is sad on a festival day to the Lord. For on them we ought to rejoice, and not to mourn.

Homily 1, Basil of Caesarea (330–379 AD)

Yet even life in Paradise is an image of fasting, not only insofar as man, sharing the life of the Angels, attained to likeness with them through being contented with little, but also insofar as those things which human ingenuity subsequently invented had not yet been devised by those living in Paradise, be it the drinking of wine, the slaughter of animals, or whatever else befuddles the human mind. Since we did not fast, we fell from Paradise; let us, therefore, fast in order that we might return thither….

Do not, however, define the benefit that comes from fasting solely in terms of abstinence from foods. For true fasting consists in estrangement from vices. “Loose every burden of iniquity.” Forgive your neighbor the distress he causes you; forgive him his debts. “Fast not for quarrels and strifes.” You do not eat meat, but you devour your brother. You abstain from wine, but do not restrain yourself from insulting others. You wait until evening to eat, but waste your day in law courts. Woe to those who get drunk, but not from wine. Anger is inebriation of the soul, making it deranged, just as wine does. Grief is also a form of intoxication, one that submerges the intellect. Fear is another kind of drunkenness, when we have phobias regarding inappropriate objects; for Scripture says: “Rescue my soul from fear of the enemy.” And in general, every passion which causes mental derangement may justly be called drunkenness.

De Elia Et Jejunio, Ambrose (c. 389)

GPT-4o translated this for me.
Fasting is the medicine of the soul, which teaches the body to abstain not only from vices but also from unnecessary desires. Just as the sick are often advised to abstain from certain foods, so too does the soul, wounded by sins, need the medicine of fasting, so that the allurements of pleasures may be removed and the purity of the heart may grow.

Thus, meat is to be avoided during fasts, for no sacrifice is pleasing if it nourishes the desires of the flesh. Likewise, wine must be tempered, lest the sweetness of drink weaken the fervor of devotion. For the holy Fathers abstained not only from food but also from drink, so that the entirety of body and soul might be consecrated to the Lord.

From this also arises the greater significance of fasting during Lent, so that not only is the external body afflicted, but the inner person is also renewed. For this reason, the number of forty days is sanctified, as the Lord fasted for forty days and nights in the desert and left this example for us, so that we may not falter in abstinence…

Fasting should not only be an abstinence from food but also a discipline of the soul. For one who abstains from food but does not abstain from sin harms himself more than he benefits. Thus fasting was pleasing to the holy men of old, as they neither consumed food nor committed sin. For it is written: 'Sanctify a fast' (Joel 2:15), meaning not only to observe a physical fast but also a spiritual one, free from sins, devoid of greed, unyielding to anger, and maintaining purity of mind and body.

As it is written, the fast is not broken before sunset, so that devotion is preserved throughout the entire day. For what benefit is fasting if the abstinence from food is not accompanied by discipline? The holy men of old fasted in such a way that the entire day was dedicated to prayer, and the fast itself became a pleasing sacrifice. This was also taught by the apostles, whose fasts combined not only abstinence from food but also persistent dedication to prayer.

For fasting alone is not enough; a virtuous life is also required. For what benefit is it to refrain from food if malice abounds? As the Lord said in the Gospel: "Do not be like the hypocrites, who appear gloomy" (Matt. 6:16). Fasting should be an internal sacrifice, so that not only is the body disciplined, but the soul is also purified.

The holy Fathers always observed this practice, ensuring that fasts were completed at evening time, reserving this period not only for abstinence but also for works of piety. After the day's labor, they devoted themselves to prayer and meditation on the divine law, for as evening approached, they offered a complete sacrifice of devotion to the Lord.

Homily 3 on the Statues, John Chrysostom (c. 347–407 AD)

I speak not, indeed, of such a fast as most persons keep, but of real fasting; not merely an abstinence from meats; but from sins too. For the nature of a fast is such, that it does not suffice to deliver those who practise it, unless it be done according to a suitable law. For the wrestler, it is said, is not crowned unless he strive lawfully. To the end then, that when we have gone through the labour of fasting, we forfeit not the crown of fasting, we should understand how, and after what manner, it is necessary to conduct this business; since that Pharisee also fasted, but afterwards went down empty, and destitute of the fruit of fasting….

I have said these things, not that we may disparage fasting, but that we may honour fasting; for the honour of fasting consists not in abstinence from food, but in withdrawing from sinful practices; since he who limits his fasting only to an abstinence from meats, is one who especially disparages it. Do you fast? Give me proof of it by your works! Is it said by what kind of works? If you see a poor man, take pity on him! If you see an enemy, be reconciled to him! If you see a friend gaining honour, envy him not! If you see a handsome woman, pass her by! For let not the mouth only fast, but also the eye, and the ear, and the feet, and the hands, and all the members of our bodies.

Homily 4 on the Statues, John Chrysostom (c. 347–407 AD)

And with respect to the two former precepts, we will discourse to you on another occasion; but we shall speak to you during the whole of the present week respecting oaths; thus beginning with the easier precept. For it is no labour at all to overcome the habit of swearing, if we would but apply a little endeavour, by reminding each other; by advising; by observing; and by requiring those who thus forget themselves, to render an account, and to pay the penalty. For what advantage shall we gain by abstinence from meats, if we do not also expel the evil habits of the soul? Lo, we have spent the whole of this day fasting; and in the evening we shall spread a table, not such as we did on yester-eve, but one of an altered and more solemn kind. Can any one of us then say that he has changed his life too this day; that he has altered his ill custom, as well as his food? Truly, I suppose not! Of what advantage then is our fasting? Wherefore I exhort, and I will not cease to exhort, that undertaking each precept separately, you should spend two or three days in the attainment of it; and just as there are some who rival one another in fasting, and show a marvellous emulation in it; (some indeed who spend two whole days without food; and others who, rejecting from their tables not only the use of wine, and of oil, but of every dish, and taking only bread and water, persevere in this practice during the whole of Lent); so, indeed, let us also contend mutually with one another in abolishing the frequency of oaths. For this is more useful than any fasting; this is more profitable than any austerity.

Homily 10 on the Statues, John Chrysostom (c. 347–407 AD)

What need then is there to say more? Stand only near the man who fasts, and you will straightway partake of his good odour; for fasting is a spiritual perfume; and through the eyes, the tongue, and every part, it manifests the good disposition of the soul. I have said this, not for the purpose of condemning those who have dined, but that I may show the advantage of fasting. I do not, however, call mere abstinence from meats, fasting; but even before this, abstinence from sin; since he who, after he has taken a meal, has come hither with suitable sobriety, is not very far behind the man who fasts; even as he who continues fasting, if he does not give earnest and diligent heed to what is spoken, will derive no great benefit from his fast.

Letter 130 To Demetrias, Jerome (414 AD)

After you have paid the most careful attention to your thoughts, you must then put on the armour of fasting and sing with David: I chastened my soul with fasting, and I have eaten ashes like bread, and as for me when they troubled me my clothing was sackcloth. Eve was expelled from paradise because she had eaten of the forbidden fruit. Elijah on the other hand after forty days of fasting was carried in a fiery chariot into heaven. For forty days and forty nights Moses lived by the intimate converse which he had with God, thus proving in his own case the complete truth of the saying, man does not live by bread only but by every word that proceeds out of the mouth of the Lord. The Saviour of the world, who in His virtues and His mode of life has left us an example to follow, was, immediately after His baptism, taken up by the spirit that He might contend with the devil, and after crushing him and overthrowing him might deliver him to his disciples to trample under foot. For what says the apostle? God shall bruise Satan under your feet shortly. And yet after the Saviour had fasted forty days, it was through food that the old enemy laid a snare for him, saying, If you be the Son of God, command that these stones be made bread. Under the law, in the seventh month after the blowing of trumpets and on the tenth day of the month, a fast was proclaimed for the whole Jewish people, and that soul was cut off from among his people which on that day preferred self-indulgence to self-denial.…

I do not, however, lay on you as an obligation any extreme fasting or abnormal abstinence from food. Such practices soon break down weak constitutions and cause bodily sickness before they lay the foundations of a holy life. It is a maxim of the philosophers that virtues are means, and that all extremes are of the nature of vice; and it is in this sense that one of the seven wise men propounds the famous saw quoted in the comedy, In nothing too much. You must not go on fasting until your heart begins to throb and your breath to fail and you have to be supported or carried by others. No; while curbing the desires of the flesh, you must keep sufficient strength to read scripture, to sing psalms, and to observe vigils. For fasting is not a complete virtue in itself but only a foundation on which other virtues may be built. The same may be said of sanctification and of that chastity without which no man shall see the Lord. Each of these is a step on the upward way, yet none of them by itself will avail to win the virgin's crown. The gospel teaches us this in the parable of the wise and foolish virgins; the former of whom enter into the bridechamber of the bridegroom, while the latter are shut out from it because not having the oil of good works they allow their lamps to fail. This subject of fasting opens up a wide field in which I have often wandered myself, and many writers have devoted treatises to the subject. I must refer you to these if you wish to learn the advantages of self-restraint and on the other hand the evils of over-feeding.

Church History Book V, Socrates of Constantinople (c. 439)

The fasts before Easter will be found to be differently observed among different people. Those at Rome fast three successive weeks before Easter, excepting Saturdays and Sundays. Those in Illyrica and all over Greece and Alexandria observe a fast of six weeks, which they term "The forty days' fast." Others commencing their fast from the seventh week before Easter, and fasting three five days only, and that at intervals, yet call that time "The forty days' fast." It is indeed surprising to me that thus differing in the number of days, they should both give it one common appellation; but some assign one reason for it, and others another, according to their several fancies. One can see also a disagreement about the manner of abstinence from food, as well as about the number of days. Some wholly abstain from things that have life: others feed on fish only of all living creatures: many together with fish, eat fowl also, saying that according to Moses, these were likewise made out of the waters. Some abstain from eggs, and all kinds of fruits: others partake of dry bread only; still others eat not even this: while others having fasted till the ninth hour, afterwards take any sort of food without distinction. And among various nations there are other usages, for which innumerable reasons are assigned. Since however no one can produce a written command as an authority, it is evident that the apostles left each one to his own free will in the matter, to the end that each might perform what is good not by constraint or necessity. Such is the difference in the churches on the subject of fasts.

Ecclesiastical History Chapter XXIII, Bede (731)

But [Bishop Cedd], desiring first to cleanse the place which he had received for the monastery from stain of former crimes, by prayer and fasting, and so to lay the foundations there, requested of the king that he would give him opportunity and leave to abide there for prayer all the time of Lent, which was at hand. All which days, except Sundays, he prolonged his fast till the evening, according to custom, and then took no other sustenance than a small piece of bread, one hen’s egg, and a little milk and water. This, he said, was the custom of those of whom he had learned the rule of regular discipline, first to consecrate to the Lord, by prayer and fasting, the places which they had newly received for building a monastery or a church.


To summarize my takeaways:

What Dates?

Early sources suggest fasting on Wednesday and Friday. Other sources introduce a Lenten fast, but the dates are a little unclear—sometimes just during Holy Week, sometimes just Good Friday and Holy Saturday, sometimes more.

Eating What?

There’s a mix: bread and water, bread and water and vegetables, or anything but meat and alcohol.

Eating When?

Often this isn’t mentioned at all, but sometimes it’s said that you shouldn’t eat anything until the evening.

Books from 2024

January 1, 2025

(Previously: 2022, 2023.)

#1. Baldassare Castiglione, The Book of the Courtier

This book gets cited from time to time as a sort of historical guide to "being cool," since the characters spend some time discussing the idea of sprezzatura, basically grace or effortlessness. More interesting to me were the differences between Renaissance conceptions of virtue, character, & masculinity / femininity and the way our culture is used to thinking about these concepts: "the past is a foreign country."

#2. Grant Cardone, Sell Or Be Sold
#3. Andrew Chen, The Cold Start Problem
#4–7. Stephenie Meyer, The Twilight Saga

Having never read or watched any Twilight before this year, I found them much weirder than I was expecting.

#8. Fuchsia Dunlop, Invitation to a Banquet

As featured on CWT!

#9. Iris Murdoch, The Black Prince
#10. David Kushner, Masters of Doom

A history of id Software, the company behind Wolfenstein 3D, Doom, Quake, and the fast inverse square root algorithm. John Carmack is a legendary figure in the software world, and after reading a fictionalized history inspired by id last year (Tomorrow and Tomorrow and Tomorrow) it was good to read the real thing.

#11. Michael Gerber, The E-Myth Revisited
#12. William Gibson, Neuromancer

A lot of old science fiction is hard to appreciate properly—the best ideas have been sucked out and copied a hundredfold, leaving only the author's weirder musings behind to be appreciated. Neuromancer's been copied as much as any novel, but I was impressed by the pace and general bleakness of this novel; it holds up well.

#13–26. Lois McMaster Bujold, The Vorkosigan Saga

I adored this series, which I read pretty steadily over the course of the year. Bujold writes satisfying, well-constructed plots that keep the focus on characters, not setting. The books fit together nicely, too: each story stands alone, but together paint a decades-long picture of her characters aging, gaining wisdom through their mistakes, and learning to handle the responsibilities placed on them. I think Captain Vorpatril's Alliance is my favorite one.

#27. R. F. Kuang, Babel
#28. Clayton Christensen, The Innovator’s Dilemma

As recommended by Jensen Huang; unlike most business books, this one is worth reading all the way through.

#29. Rob Fitzpatrick, The Mom Test

A canonical book for startup founders, which I probably should have read 1–2 years ago.

#30. Elena Ferrante, My Brilliant Friend

At its core, this is a very similar story to Wicked: a coming-of-age story focusing on the envious and unstable friendship between two women. I liked this book, but haven't yet picked up the rest of the Neapolitan Novels; somehow keeping track of the names must intimidate me on a subconscious level.

#31. Andy Grove, Only The Paranoid Survive
#32. Vernor Vinge, A Fire Upon The Deep

I liked this book a lot. I would have adored it if I'd read it as a kid, I think; there's something viscerally compelling about Vinge's "Zones of Thought."

#33. C. S. Lewis, The Discarded Image

This book examines what medieval Europeans thought of the world: how did they see their universe and their place in it? This is a surprisingly subtle question: obviously they were Christian, but their cosmology was considerably different than what even the most "traditional" modern people believe. Last year, I wrote this about The Canterbury Tales:

Reading Chaucer fills me with questions about the medieval mind. The stories are steeped in Christianity, as one might expect. Any argument goes back to the Bible, even those among animals, and Chaucer assumes a level of familiarity with e.g. the Psalms far exceeding that of most modern Christians. Yet at the same time the Greco-Roman world looms large: Roman gods appear as plot characters in three tales (the Knight’s Tale, the Merchant’s Tale, and the Manciple’s Tale), and Seneca is viewed as a moral authority on par with Scripture. I’m curious how all these beliefs and ideas fit together and welcome any recommendations on this subject.

The Discarded Image exactly answers these questions. If you're at all interested in medieval thought, I highly recommend it.

#34. Jim Collins, Good To Great
#35. R. T. France, The Gospel of Mark
#36. Nathan Azrin, Toilet Training In Less Than One Day

We didn't quite live up to the book's promise, but it took less than a week, so I'm happy.

#37. Tim Keller, Every Good Endeavor
#38. Brad Feld, Venture Deals

Another canonical book for startup founders, which I also probably should have read before now.

#39. Abigail Shrier, Bad Therapy

Shrier invites controversy here as with her other writing. Sweeping conclusions about American youth aside, I found this surprisingly compelling when viewed as a self-help book about how to be less fearful.

#40. Sheldon Vanauken, A Severe Mercy

Caused me to weep uncontrollably while stuck in a middle seat on a five-hour flight: you've been warned.

#41. Thich Nhat Hanh, You Are Here
#42. Gunther Hagen, This Is Germany: An Art Book
#43. Thomas Malory, Le Morte d’Arthur
#44. Georgette Heyer, A Civil Contract
#45. Alex Hormozi, $100M Offers
#46. R. F. Kuang, Yellowface
#47. Barry Werth, The Billion-Dollar Molecule

This book is crazy, and I can't believe I hadn't read it before, particularly since I'm not too distant from a lot of the action, professionally or physically. It's framed as a science story, but I think it works even better at conveying the sheer desperation of early-stage startup life.

#48. Diarmaid MacCulloch, The Reformation

The Reformation is much weirder than most people, Protestant or Catholic, realize: I was surprised by the diversity of pre-Reformation religious practice in Europe, which was mostly stamped out in the doctrinal standardization of the 1500s. For both Protestants and Catholics, it became very important to separate "us" from "them," which led to the rise of catechisms, inquisitions, and so on.

This book also soured me on the "Albion's Seed" idea, as popularized by the SSC book review. Viewed in isolation, the Puritans seem like a bunch of religious fanatics, but MacCulloch argues that the same impulse prevailed all over Europe in a "Reformation of Manners," from Charles Borromeo's Milan to Plymouth Colony. Perhaps it's less about the Puritans and more about the 1620s.

#49. Amy Chua, Battle Hymn Of The Tiger Mother

This book made it back into the discourse, so I decided I'd actually read it—it's much better than I was expecting, and I don't think most of Chua's critics really understand the book. Conclusions for my own parenting have yet to be determined.



I also read good chunks of a number of textbooks this year, including:

Overall, this was a good year for books. As the stress of Rowan has ramped up more, I've found it more difficult to write creatively in my free time, and easier to just read other people's words—this manifests in a much-diminished rate of blogging, and a lot more energy diverted to reading fiction.

Next year, I hope to read:

Happy new year, and feel free to leave book recommendations in the Substack comments!

Are Forcefields Able To Describe Protein Dynamics?

October 11, 2024

This post assumes some knowledge of molecular dynamics and forcefields/molecular mechanics. For readers unfamiliar with these topics, Abhishaike Mahajan has a great guide to these topics on his blog.

Although forcefields are commonplace in all sorts of biomolecular simulation today, there’s a growing body of evidence showing that they often give unreliable results. For instance, here’s Geoff Hutchison criticizing the use of forcefields for small-molecule geometry optimizations:

The use of classical MM methods for optimizing molecular structures having multiple torsional degrees of freedom is only advised if the precision and accuracy of the final structures and rankings obtained from the conformer searches is of little or no concern... current small molecule force fields should not be trusted to produce accurate potential energy surfaces for large molecules, even in the range of “typical organic compounds.” (emphasis added)

Here are a few other scattered case studies where forcefields have failed:

This list could be a lot longer, but I think the point is clear—even for normal, bio-organic molecules, forcefields often give bad or unreliable answers.

Despite all these results, though, it’s tough to know how bad the problem really is because there have been lots of scientific questions that can only be studied with forcefields. Studying protein conformational motion, for instance, is one of the tasks that forcefields have traditionally been developed for, and the scale and size of the systems in question makes it really challenging to study any other way. So although researchers can show that different forcefields give different answers, it’s tough to quantify how close any of these answers is to the truth, and it’s always been possible to hope that a good forcefield really is describing the underlying motion of the system quite well.

It’s for this reason that I’ve been so fascinated by this April 2024 work from Oliver Unke and co-workers, which studies the dynamics of peptides and proteins using neural network potentials (NNPs). NNPs allow scientists to approach the accuracy of quantum chemical calculations in a tiny fraction of the time by training ML models to reproduce the output of high-level QM-based simulations: although NNPs are still significantly slower than forcefields, they’re typically about 3–6 orders of magnitude faster than the corresponding high-level calculations would be, with only slightly lower accuracy.

A nice overview of the paper.

In this case, Unke and co-workers train a SpookyNet-based NNP to reproduce PBE0/def2-TZVPPD+MBD reference data comprising fragments from the precise systems under study. (MBD refers to Tkatchenko’s many-body dispersion correction, which can be thought of as a fancier alternative to pairwise dispersion corrections like D3 or D4.) In total, about 60 million atom-labeled data points were used to train the NNPs used in this study—which reportedly took 110,000 hours of CPU time to compute, equivalent to 12 CPU-years!

(This might be a nitpick, but I don’t love the use of PBE0 here. Range-separated hybrids are crucial for producing consistent and accurate results for large zwitterionic biomolecules (see e.g. this recent work from Valeev), so it’s possible that the underlying training data isn’t as accurate as it seems.)

The authors find that the resulting NNPs (“GEMS”) perform much better than existing forcefields in terms of overall error metrics: for instance, GEMS has an MAE of 0.45 meV/atom on snapshots of AceAla15Nme structures taken from MD simulations, while Amber has an MAE of 2.27 meV/atom. What’s much more interesting, however, is that GEMS gives significantly different dynamics than forcefields! While Amber simulations of AceAla15Nme predict that a stable α-helix will form at 300 K, GEMS predicts that a mixture of α- and 3₁₀-helices exist, which is exactly what’s seen in Ala-rich peptides experimentally. The CHARMM and GROMOS forcefields also get this system wrong, suggesting that GEMS really is significantly more accurate than forcefields at modeling the structure of peptides.

Amber-based simulations stay in one configuration, while GEMS-based simulations are significantly more flexible.

The authors next study crambin, a small 46-residue protein which is frequently chosen as a model system in papers like this. Similar to what was seen with the Ala15 helices, crambin is significantly more flexible when modeled by GEMS than when modeled with Amber (see below figure). The authors conduct a variety of other analyses, and argue that there are “qualitative differences between simulations with conventional FFs and GEMS on all timescales.” This is an incredibly significant result, and one that casts doubt on literal decades of forcefield-based MD simulations. Think about what this means for Relay’s MD-based platform, for instance!

A UMAP plot of protein motion through conformational space. (Yes, we all know UMAP is bad, but this is still a nice plot!)

Why do Amber and GEMS differ so much here? Here’s what Unke and coworkers think is going on:

AmberFF is a conventional FF, and as such, models bonded interactions with harmonic terms. Consequently, structural fluctuations on small timescales are mostly related to these terms. Intermediate-scale conformational changes as involved in, for example, the “flipping” of the dihedral angle in the disulfide bridges of crambin, on the other hand, can only be mediated by (nonbonded) electrostatic and dispersion terms, because the vast majority of (local) bonded terms stay unchanged for all conformations. On the other hand, GEMS makes no distinction between bonded and non-bonded terms, and individual contributions are not restricted to harmonic potentials or any other fixed functional form. Consequently, it can be expected that large structural fluctuations for AmberFF always correspond to “rare events” associated with large energy barriers, whereas GEMS dynamics arise from a richer interplay between chemical bonds and nonlocal interactions.

The overall idea that (1) forcefields impose an unphysical distinction between bonded and non-bonded interactions, and (2) this distinction leads to strange dynamical effects makes sense to me. There are parts of this discussion that I don’t fully understand: what’s to stop a large structural fluctuation in Amber from having a small barrier? Aren’t all high-barrier processes “rare events” irrespective of where the barrier comes from?
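To make the bonded/non-bonded split a bit more concrete, here's a toy sketch in Python of a classical forcefield energy. It loosely follows the Amber-style functional form (harmonic bonds, plus Lennard-Jones and Coulomb terms between non-bonded pairs) but leaves out angles and dihedrals, and every parameter is a made-up placeholder rather than a real Amber value. The Morse function is included only as one familiar example of an anharmonic bond term that can level off; it is not what GEMS does, since GEMS learns its energy with no fixed functional form at all.

```python
import math

# Approximate Coulomb constant in kcal*Å/(mol*e^2); the exact value
# differs slightly between MD codes. Used here only for illustration.
COULOMB_K = 332.06


def harmonic_bond(r, k, r0):
    """Amber-style bonded term k*(r - r0)^2: grows without bound, so the bond never breaks."""
    return k * (r - r0) ** 2


def morse_bond(r, d_e, a, r0):
    """Anharmonic alternative: approaches the dissociation energy d_e as r grows."""
    return d_e * (1.0 - math.exp(-a * (r - r0))) ** 2


def lennard_jones(r, eps, sigma):
    """Non-bonded van der Waals term, 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def coulomb(r, qi, qj):
    """Non-bonded electrostatic term between point charges qi and qj (in units of e)."""
    return COULOMB_K * qi * qj / r


def toy_energy(bonds, pairs):
    """Return (bonded, non-bonded) energies for a hand-specified toy system.

    bonds: iterable of (r, k, r0) tuples for atom pairs declared as bonded.
    pairs: iterable of (r, eps, sigma, qi, qj) tuples for non-bonded pairs.

    Every interaction is assigned to exactly one category before the
    simulation starts; this is the split the GEMS authors say a learned
    potential does not need to make.
    """
    e_bonded = sum(harmonic_bond(r, k, r0) for r, k, r0 in bonds)
    e_nonbonded = sum(
        lennard_jones(r, eps, sigma) + coulomb(r, qi, qj)
        for r, eps, sigma, qi, qj in pairs
    )
    return e_bonded, e_nonbonded


if __name__ == "__main__":
    # Stretch a hypothetical bond from r0 = 1.5 Å to 3.0 Å.
    print(harmonic_bond(3.0, k=300.0, r0=1.5))        # 675.0
    print(morse_bond(3.0, d_e=100.0, a=2.0, r0=1.5))  # stays below d_e = 100
```

Doubling the bond length makes the harmonic term climb quadratically, while the Morse term flattens out below its dissociation energy. That's a crude illustration of the quoted point: a harmonic bonded term only describes small fluctuations around one reference geometry, so in a classical forcefield anything larger has to come through the non-bonded terms.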

There are some obvious caveats here that mean this sort of strategy isn’t ready for widespread adoption yet. These aren’t foundation models; the authors create a new model for each peptide and protein under study by adding system-specific fragments to the training data and retraining the NNP. This takes “between 1 and 2 weeks, depending on the system,” not counting the cost of running all the DFT calculations, so this is far too expensive and slow for routine use. While this might seem like a failure, I think it’s worth reflecting on how tough this problem is. Crambin alone has thousands of degrees of freedom, not counting the surrounding water molecules, and accurately reproducing the results of the Schrödinger equation for this system is an incredible feat. The fact that we can’t automatically also solve this problem in a zero-shot manner for every other protein is hardly a failure, particularly because it seems very likely that scaling these models will dramatically improve their generalizability!

The other big limitation is inference speed: the SpookyNet-based NNPs are about 250x slower than a conventional forcefield, so it’s much tougher to access the long timescales that are needed to simulate processes like protein folding. There are a lot of techniques that can help address these problems: NNPs can become faster and not require system-specific retraining, coarse graining can reduce the number of particles in the system, and Boltzmann generators can reduce the number of evaluations needed. So the future is bright, but there’s clearly a lot of ML engineering and applied research that will be needed to help NNP-based simulations scale.
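To see what a 250x slowdown means in practice, here’s some back-of-the-envelope arithmetic. The 100 ns/day baseline for the conventional forcefield is a hypothetical round number, since real throughput depends heavily on system size, hardware, and software. Only the 250x factor comes from the paper.

```python
# Hypothetical baseline throughput for a conventional forcefield (illustrative only).
FF_NS_PER_DAY = 100.0
# Reported slowdown of the SpookyNet-based NNPs relative to a conventional forcefield.
NNP_SLOWDOWN = 250.0


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to accumulate target_ns of simulated time."""
    return target_ns / ns_per_day


# Under the assumed baseline, the NNP reaches 100 / 250 = 0.4 ns per day.
nnp_ns_per_day = FF_NS_PER_DAY / NNP_SLOWDOWN

for target_ns in (10.0, 1_000.0):  # 10 ns and 1 microsecond of simulated time
    ff_days = days_to_simulate(target_ns, FF_NS_PER_DAY)
    nnp_days = days_to_simulate(target_ns, nnp_ns_per_day)
    print(f"{target_ns:>7.0f} ns: forcefield {ff_days:.1f} days, NNP {nnp_days:.0f} days")
```

Whatever the baseline, the ratio is fixed: a microsecond-scale run that takes about a week and a half with the forcefield would take years with the NNP on the same hardware.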

But overall, I think this is a very significant piece of work, and one that should make anyone adjacent to forcefield-based MD pause and take note. One day it will be possible to run simulations like this just as quickly as people run regular MD simulations today, and I can’t wait to see what comes of that.

Thanks to Abhishaike Mahajan for helpful feedback on this post.

Robots Won't Solve Organic Synthesis

September 17, 2024

Abhishaike Mahajan recently wrote an excellent piece on how generative ML in chemistry is bottlenecked by synthesis (disclaimer: I gave some comments on the piece, so I may be biased). One common reaction to the piece has been that self-driving labs and robotics will soon solve this problem, a sentiment I’ve heard a lot.

Unfortunately, I think that the strongest version of this take is wrong: organic synthesis won’t be “solved” by just replacing laboratory scientists with robots, because (1) figuring out what reactions to run is hard, (2) running reactions is even harder, and (3) we need scientific advances to fix this, not just engineering.

Predicting What Reactions To Run Is Hard

Organic molecules are typically made through a sequence of reactions, and figuring out how to make a molecule involves both the strategic question of which reactions to run in what order and the tactical question of how to run each reaction.

There’s been a ton of work on both of these problems, and it’s certainly true that computer-assisted retrosynthesis tools have come a long way in the last decade! But retrosynthesis is one of those problems that’s (relatively) easy to be good at and almost impossible to be great at. In part, this is because data in this field tends to be very bad: publications and patents are full of irreproducible or misreported reactions, and negative results are virtually never reported. (This post by Derek Lowe is a good overview of some of the problems that the field faces.)

But also, the problems are just hard! I got the chance to try out one of the leading retrosynthesis software packages back in my career as an organic chemist, and when we fed it some of the tough synthetic problems we were facing, it gave us all the logical suggestions that we had already tried (unsuccessfully) and then began suggesting insane reactions to us. I can’t really blame the model for not being able to invent new chemistry—but this illustrates the limits of what pure retrosynthesis can accomplish, absent new scientific discoveries.

The tactical problem of optimizing reaction conditions is also difficult. In cases where there are a lot of continuous variables (like temperatures or concentrations), conventional optimization methods like design-of-experiments can work well—but where reagents or catalysts are involved, optimization becomes significantly more challenging. Lots of cheminformatics/small-data ML work has been done in this area, but it’s still not straightforward to reliably take a reaction drawn on paper and get it to work in the lab.
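To illustrate why categorical choices make optimization harder, here’s a minimal sketch of a two-level full-factorial design, with hypothetical factor names and levels. Three continuous factors need only 2³ = 8 runs, and a model fit to those runs can interpolate between the levels. Adding a catalyst and a solvent as categorical factors multiplies the run count by the number of options, and there is no meaningful “in between” two different catalysts to interpolate.

```python
from itertools import product

# Hypothetical continuous factors, each tested at a low and a high level.
continuous = {
    "temperature_C": (25, 80),
    "concentration_M": (0.1, 0.5),
    "equivalents": (1.1, 2.0),
}

# Hypothetical categorical factors: every option is its own discrete level.
categorical = {
    "catalyst": [f"cat_{i}" for i in range(20)],
    "solvent": [f"solv_{i}" for i in range(10)],
}


def full_factorial(factors: dict) -> list[dict]:
    """Enumerate every combination of factor levels (a full-factorial design)."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


continuous_runs = full_factorial(continuous)
all_runs = full_factorial({**continuous, **categorical})
print(len(continuous_runs), len(all_runs))
```

In practice people use fractional designs or sequential (e.g., Bayesian) optimization rather than full enumeration, but the underlying problem remains: categorical reagent choices can’t be explored by smooth interpolation the way temperatures and concentrations can.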

Running Reactions Is Even Harder

All of the above problems are, in principle, solvable. Where I think robotics is likely to struggle even more is in the actual execution of these routes. Synthetic organic chemistry is an arcane and specialized craft that typically requires at least five years of training to be decent at—most published reaction procedures assume that the reader is themselves a trained organic chemist, and omit most of the “obvious” details that are needed to unambiguously specify a sequence of steps. (The incredibly detailed procedures in Organic Syntheses illustrate just how much is missing from the average publication.)

My favorite illustration of how irreproducible organic chemistry can be is BlogSyn, a brief project that aimed to anonymously assess how easily published reactions could be reproduced. The second BlogSyn post found that a reported olefination of pyridine could not be reproduced; the original author of the paper, Jin-Quan Yu (Scripps), responded, and the shape of the reaction tube was ultimately found to be critical to reaction success.

The third BlogSyn post found that an IBX-mediated benzylic oxidation reported by Phil Baran (also of Scripps) could not be reproduced at all as written. Phil and his co-authors responded pretty aggressively, and after several weeks of back-and-forth it was ultimately found that the reaction could be reproduced after modifying virtually every parameter. A comment from Phil’s co-author Tamsyn illustrates some of the complexities at play:

There is in [BlogSyn’s] discussion a throw away comment about the 2-methylnaphthalene not being volatile. Have you never showered and then left your hair to dry at room temperature? – water evaporates at RT, just as 2-methylnaphthalene does at 95 ºC. I suggest to you that at the working temperatures of this reaction, the biggest problem may be substrate evaporation (or “hanging out” on the colder parts of the flask as Phil said)... We need fluorobenzene to reflux in these reactions and in so-doing wash substrate back into the reaction from the walls of the vessel, but it clearly slows/inhibits the reaction also – so, we need to tune this balance carefully and with patience. Scale will have a big influence on how well this process works.

Tamsyn is, of course, right—volatile substrates can evaporate, and part of setting up a reaction is thinking about the vapor pressure of your substrates and how you can address this. But this sort of thinking requires a trained chemist, and isn’t easily automated. There are a million judgment calls to make in organic synthesis—what concentration to use, how quickly to add the reagent, how to work up the reaction, what extraction solvent to use, and so on—and it’s hard enough to teach first-year graduate students how to do all this, let alone robots. Perhaps at the limit as robots achieve AGI this will be possible, but for now these remain difficult problems.

We Need Scientific Advances To Fix This

What can be done, then?

From a synthetic point of view, we need more robust reactions. Lots of academics work on reaction development, but the list of truly reliable reactions remains minuscule: amide couplings, Suzuki couplings, addition to Ellman auxiliaries, SuFEx chemistry, and so on. From a practical point of view, every reaction like this is worth a thousand random papers with a terrible substrate scope (Sharpless said it better in 2001 than I ever could; see also this 2015 study about how basically no new reactions are used in industry). Approaches like skeletal editing are incredibly exciting, but there’s a limit to how impactful any non-general methodology can be.

Perhaps even more important is finding better methods for reaction purification. Purification is one of those topics that doesn’t get a lot of academic attention, but being able to efficiently automate purification unlocks a whole new set of possibilities. Solid-phase synthesis (which makes purification as simple as rinsing off some beads) has always seen some amount of use in organic chemistry, but a lot of commonly used reactions aren’t compatible with solid support: either new supports or new reactions could address this problem. There are also cool approaches like Marty Burke’s “catch-and-release” boronate platform, which haven’t yet seen broad adoption.

Ultimately, I share the dream of the robotics enthusiasts: if we’re able to make organic synthesis routine, we can stop worrying about how to make molecules and start thinking about what to make! I’m very optimistic about the opportunity of new technologies to address synthetic bottlenecks and enable higher-throughput data generation in chemistry. But getting to this point will take not only laboratory automation but also a ton of scientific progress in organic chemistry, and the first step in solving these problems is actually taking them seriously and recognizing that they’re unsolved.

Thanks to Abhishaike Mahajan and Ari Wagen for helpful comments about this post.