(This is a bit of a departure from my usual chemistry-focused writing.)
Fasting is an important part of many religious traditions, but modern Protestant Christians don’t really have a unified stance on fasting (and have opposed systematic fasts for a while). That’s not to say that Protestants don’t fast, though: over just the past few years, I’ve met people doing water-only fasts, juice fasts, dinner-only fasts, “social media” fasts, and many more.
These fasts don’t really line up with what I see in neighboring faith traditions:
I’ve been a bit puzzled by all this, so I decided to do a “literature review” and find documents from the early Church that discussed fasting. This post collects and summarizes the sources that I found. The sources are listed in approximate chronological order, with emphasis added throughout—if you don’t want to read everything, you can skip to the end and read my brief takeaways.
But before the baptism let the baptizer fast, and the baptized, and whatever others can; but you shall order the baptized to fast one or two days before….
But let not your fasts be with the hypocrites; for they fast on the second [Monday] and fifth day [Thursday] of the week; but fast on the fourth day [Wednesday] and the Preparation (Friday).
Thus, then, shall you observe the fasting which you intend to keep. First of all, be on your guard against every evil word, and every evil desire, and purify your heart from all the vanities of this world. If you guard against these things, your fasting will be perfect. And you will do also as follows. Having fulfilled what is written, in the day on which you fast you will taste nothing but bread and water; and having reckoned up the price of the dishes of that day which you intended to have eaten, you will give it to a widow, or an orphan, or to some person in want, and thus you will exhibit humility of mind, so that he who has received benefit from your humility may fill his own soul, and pray for you to the Lord. If you observe fasting, as I have commanded you, your sacrifice will be acceptable to God, and this fasting will be written down; and the service thus performed is noble, and sacred, and acceptable to the Lord. These things, therefore, shall you thus observe with your children, and all your house, and in observing them you will be blessed; and as many as hear these words and observe them shall be blessed; and whatsoever they ask of the Lord they shall receive.
Now, if there has been temerity in our retracing to primordial experiences the reasons for God's having laid, and our duty (for the sake of God) to lay, restrictions upon food, let us consult common conscience. Nature herself will plainly tell with what qualities she is ever wont to find us endowed when she sets us, before taking food and drink, with our saliva still in a virgin state, to the transaction of matters, by the sense especially whereby things divine are handled; whether (it be not) with a mind much more vigorous, with a heart much more alive, than when that whole habitation of our interior man, stuffed with meats, inundated with wines, fermenting for the purpose of excremental secretion, is already being turned into a premeditatory of privies, (a premeditatory) where, plainly, nothing is so proximately supersequent as the savouring of lasciviousness…
This principal species in the category of dietary restriction may already afford a prejudgment concerning the inferior operations of abstinence also, as being themselves too, in proportion to their measure, useful or necessary. For the exception of certain kinds from use of food is a partial fast. Let us therefore look into the question of the novelty or vanity of xerophagies, to see whether in them too we do not find an operation alike of most ancient as of most efficacious religion… I return likewise to Elijah. When the ravens had been wont to satisfy him with bread and flesh, why was it that afterwards, at Beersheba of Judea, that certain angel, after rousing him from sleep, offered him, beyond doubt, bread alone, and water? Had ravens been wanting, to feed him more liberally? Or had it been difficult to the angel to carry away from some pan of the banquet-room of the king some attendant with his amply-furnished waiter, and transfer him to Elijah, just as the breakfast of the reapers was carried into the den of lions and presented to Daniel in his hunger? But it behooved that an example should be set, teaching us that, at a time of pressure and persecution and whatsoever difficulty, we must live on xerophagies…. Anyhow, wherever abstinence from wine is either exacted by God or vowed by man, there let there be understood likewise a restriction of food fore-furnishing a formal type to drink. For the quality of the drink is correspondent to that of the eating. It is not probable that a man should sacrifice to God half his appetite; temperate in waters, and intemperate in meats….
The apostle reprobates likewise such as bid to abstain from meats; but he does so from the foresight of the Holy Spirit, precondemning already the heretics who would enjoin perpetual abstinence to the extent of destroying and despising the works of the Creator; such as I may find in the person of a Marcion, a Tatian, or a Jupiter, the Pythagorean heretic of today; not in the person of the Paraclete. For how limited is the extent of our interdiction of meats! Two weeks of xerophagies in the year (and not the whole of these — the Sabbaths, to wit, and the Lord's days, being excepted) we offer to God; abstaining from things which we do not reject, but defer.
For since, as I before said, there are various proclamations, listen, as in a figure, to the prophet blowing the trumpet; and further, having turned to the truth, be ready for the announcement of the trumpet, for he says, 'Blow the trumpet in Sion: sanctify a fast' This is a warning trumpet, and commands with great earnestness, that when we fast, we should hallow the fast. For not all those who call upon God, hallow God, since there are some who defile Him; yet not Him — that is impossible — but their own mind concerning Him; for He is holy, and has pleasure in the saints. And therefore the blessed Paul accuses those who dishonour God; 'Transgressors of the law dishonour God' So then, to make a separation from those who pollute the fast, he says here, 'sanctify a fast.' For many, crowding to the fast, pollute themselves in the thoughts of their hearts, sometimes by doing evil against their brethren, sometimes by daring to defraud…
We begin the holy fast on the fifth day of Pharmuthi (March 31), and adding to it according to the number of those six holy and great days, which are the symbol of the creation of this world, let us rest and cease (from fasting) on the tenth day of the same Pharmuthi (April 5), on the holy sabbath of the week. And when the first day of the holy week dawns and rises upon us, on the eleventh day of the same month (April 6), from which again we count all the seven weeks one by one, let us keep feast on the holy day of Pentecost — on that which was at one time to the Jews, typically, the feast of weeks, in which they granted forgiveness and settlement of debts; and indeed that day was one of deliverance in every respect.'
And concerning food let these be your ordinances, since in regard to meats also many stumble. For some deal indifferently with things offered to idols, while others discipline themselves, but condemn those that eat: and in different ways men's souls are defiled in the matter of meats, from ignorance of the useful reasons for eating and not eating. For we fast by abstaining from wine and flesh, not because we abhor them as abominations, but because we look for our reward; that having scorned things sensible, we may enjoy a spiritual and intellectual feast; and that having now sown in tears we may reap in joy in the world to come. Despise not therefore them that eat, and because of the weakness of their bodies partake of food.
You should therefore fast on the days of the passover, beginning from the second day of the week until the preparation, and the Sabbath, six days, making use of only bread, and salt, and herbs, and water for your drink; but do you abstain on these days from wine and flesh, for they are days of lamentation and not of feasting….
We enjoin you to fast every fourth day of the week, and every day of the preparation, and the surplusage of your fast bestow upon the needy; every Sabbath day excepting one, and every Lord's day, hold your solemn assemblies, and rejoice: for he will be guilty of sin who fasts on the Lord's day, being the day of the resurrection, or during the time of Pentecost, or, in general, who is sad on a festival day to the Lord. For on them we ought to rejoice, and not to mourn.
Yet even life in Paradise is an image of fasting, not only insofar as man, sharing the life of the Angels, attained to likeness with them through being contented with little, but also insofar as those things which human ingenuity subsequently invented had not yet been devised by those living in Paradise, be it the drinking of wine, the slaughter of animals, or whatever else befuddles the human mind. Since we did not fast, we fell from Paradise; let us, therefore, fast in order that we might return thither….
Do not, however, define the benefit that comes from fasting solely in terms of abstinence from foods. For true fasting consists in estrangement from vices. “Loose every burden of iniquity.” Forgive your neighbor the distress he causes you; forgive him his debts. “Fast not for quarrels and strifes.” You do not eat meat, but you devour your brother. You abstain from wine, but do not restrain yourself from insulting others. You wait until evening to eat, but waste your day in law courts. Woe to those who get drunk, but not from wine. Anger is inebriation of the soul, making it deranged, just as wine does. Grief is also a form of intoxication, one that submerges the intellect. Fear is another kind of drunkenness, when we have phobias regarding inappropriate objects; for Scripture says: “Rescue my soul from fear of the enemy.” And in general, every passion which causes mental derangement may justly be called drunkenness.
Fasting is the medicine of the soul, which teaches the body to abstain not only from vices but also from unnecessary desires. Just as the sick are often advised to abstain from certain foods, so too does the soul, wounded by sins, need the medicine of fasting, so that the allurements of pleasures may be removed and the purity of the heart may grow.
Thus, meat is to be avoided during fasts, for no sacrifice is pleasing if it nourishes the desires of the flesh. Likewise, wine must be tempered, lest the sweetness of drink weaken the fervor of devotion. For the holy Fathers abstained not only from food but also from drink, so that the entirety of body and soul might be consecrated to the Lord.
From this also arises the greater significance of fasting during Lent, so that not only is the external body afflicted, but the inner person is also renewed. For this reason, the number of forty days is sanctified, as the Lord fasted for forty days and nights in the desert and left this example for us, so that we may not falter in abstinence…
Fasting should not only be an abstinence from food but also a discipline of the soul. For one who abstains from food but does not abstain from sin harms himself more than he benefits. Thus fasting was pleasing to the holy men of old, as they neither consumed food nor committed sin. For it is written: 'Sanctify a fast' (Joel 2:15), meaning not only to observe a physical fast but also a spiritual one, free from sins, devoid of greed, unyielding to anger, and maintaining purity of mind and body.
As it is written, the fast is not broken before sunset, so that devotion is preserved throughout the entire day. For what benefit is fasting if the abstinence from food is not accompanied by discipline? The holy men of old fasted in such a way that the entire day was dedicated to prayer, and the fast itself became a pleasing sacrifice. This was also taught by the apostles, whose fasts combined not only abstinence from food but also persistent dedication to prayer.
For fasting alone is not enough; a virtuous life is also required. For what benefit is it to refrain from food if malice abounds? As the Lord said in the Gospel: "Do not be like the hypocrites, who appear gloomy" (Matt. 6:16). Fasting should be an internal sacrifice, so that not only is the body disciplined, but the soul is also purified.
The holy Fathers always observed this practice, ensuring that fasts were completed at evening time, reserving this period not only for abstinence but also for works of piety. After the day's labor, they devoted themselves to prayer and meditation on the divine law, for as evening approached, they offered a complete sacrifice of devotion to the Lord.
I speak not, indeed, of such a fast as most persons keep, but of real fasting; not merely an abstinence from meats; but from sins too. For the nature of a fast is such, that it does not suffice to deliver those who practise it, unless it be done according to a suitable law. For the wrestler, it is said, is not crowned unless he strive lawfully. To the end then, that when we have gone through the labour of fasting, we forfeit not the crown of fasting, we should understand how, and after what manner, it is necessary to conduct this business; since that Pharisee also fasted, but afterwards went down empty, and destitute of the fruit of fasting….
I have said these things, not that we may disparage fasting, but that we may honour fasting; for the honour of fasting consists not in abstinence from food, but in withdrawing from sinful practices; since he who limits his fasting only to an abstinence from meats, is one who especially disparages it. Do you fast? Give me proof of it by your works! Is it said by what kind of works? If you see a poor man, take pity on him! If you see an enemy, be reconciled to him! If you see a friend gaining honour, envy him not! If you see a handsome woman, pass her by! For let not the mouth only fast, but also the eye, and the ear, and the feet, and the hands, and all the members of our bodies.
And with respect to the two former precepts, we will discourse to you on another occasion; but we shall speak to you during the whole of the present week respecting oaths; thus beginning with the easier precept. For it is no labour at all to overcome the habit of swearing, if we would but apply a little endeavour, by reminding each other; by advising; by observing; and by requiring those who thus forget themselves, to render an account, and to pay the penalty. For what advantage shall we gain by abstinence from meats, if we do not also expel the evil habits of the soul? Lo, we have spent the whole of this day fasting; and in the evening we shall spread a table, not such as we did on yester-eve, but one of an altered and more solemn kind. Can any one of us then say that he has changed his life too this day; that he has altered his ill custom, as well as his food? Truly, I suppose not! Of what advantage then is our fasting? Wherefore I exhort, and I will not cease to exhort, that undertaking each precept separately, you should spend two or three days in the attainment of it; and just as there are some who rival one another in fasting, and show a marvellous emulation in it; (some indeed who spend two whole days without food; and others who, rejecting from their tables not only the use of wine, and of oil, but of every dish, and taking only bread and water, persevere in this practice during the whole of Lent); so, indeed, let us also contend mutually with one another in abolishing the frequency of oaths. For this is more useful than any fasting; this is more profitable than any austerity.
What need then is there to say more? Stand only near the man who fasts, and you will straightway partake of his good odour; for fasting is a spiritual perfume; and through the eyes, the tongue, and every part, it manifests the good disposition of the soul. I have said this, not for the purpose of condemning those who have dined, but that I may show the advantage of fasting. I do not, however, call mere abstinence from meats, fasting; but even before this, abstinence from sin; since he who, after he has taken a meal, has come hither with suitable sobriety, is not very far behind the man who fasts; even as he who continues fasting, if he does not give earnest and diligent heed to what is spoken, will derive no great benefit from his fast.
After you have paid the most careful attention to your thoughts, you must then put on the armour of fasting and sing with David: I chastened my soul with fasting, and I have eaten ashes like bread, and as for me when they troubled me my clothing was sackcloth. Eve was expelled from paradise because she had eaten of the forbidden fruit. Elijah on the other hand after forty days of fasting was carried in a fiery chariot into heaven. For forty days and forty nights Moses lived by the intimate converse which he had with God, thus proving in his own case the complete truth of the saying, man does not live by bread only but by every word that proceeds out of the mouth of the Lord. The Saviour of the world, who in His virtues and His mode of life has left us an example to follow, was, immediately after His baptism, taken up by the spirit that He might contend with the devil, and after crushing him and overthrowing him might deliver him to his disciples to trample under foot. For what says the apostle? God shall bruise Satan under your feet shortly. And yet after the Saviour had fasted forty days, it was through food that the old enemy laid a snare for him, saying, If you be the Son of God, command that these stones be made bread. Under the law, in the seventh month after the blowing of trumpets and on the tenth day of the month, a fast was proclaimed for the whole Jewish people, and that soul was cut off from among his people which on that day preferred self-indulgence to self-denial.…
I do not, however, lay on you as an obligation any extreme fasting or abnormal abstinence from food. Such practices soon break down weak constitutions and cause bodily sickness before they lay the foundations of a holy life. It is a maxim of the philosophers that virtues are means, and that all extremes are of the nature of vice; and it is in this sense that one of the seven wise men propounds the famous saw quoted in the comedy, In nothing too much. You must not go on fasting until your heart begins to throb and your breath to fail and you have to be supported or carried by others. No; while curbing the desires of the flesh, you must keep sufficient strength to read scripture, to sing psalms, and to observe vigils. For fasting is not a complete virtue in itself but only a foundation on which other virtues may be built. The same may be said of sanctification and of that chastity without which no man shall see the Lord. Each of these is a step on the upward way, yet none of them by itself will avail to win the virgin's crown. The gospel teaches us this in the parable of the wise and foolish virgins; the former of whom enter into the bridechamber of the bridegroom, while the latter are shut out from it because not having the oil of good works they allow their lamps to fail. This subject of fasting opens up a wide field in which I have often wandered myself, and many writers have devoted treatises to the subject. I must refer you to these if you wish to learn the advantages of self-restraint and on the other hand the evils of over-feeding.
The fasts before Easter will be found to be differently observed among different people. Those at Rome fast three successive weeks before Easter, excepting Saturdays and Sundays. Those in Illyrica and all over Greece and Alexandria observe a fast of six weeks, which they term "The forty days' fast." Others commencing their fast from the seventh week before Easter, and fasting three five days only, and that at intervals, yet call that time "The forty days' fast." It is indeed surprising to me that thus differing in the number of days, they should both give it one common appellation; but some assign one reason for it, and others another, according to their several fancies. One can see also a disagreement about the manner of abstinence from food, as well as about the number of days. Some wholly abstain from things that have life: others feed on fish only of all living creatures: many together with fish, eat fowl also, saying that according to Moses, these were likewise made out of the waters. Some abstain from eggs, and all kinds of fruits: others partake of dry bread only; still others eat not even this: while others having fasted till the ninth hour, afterwards take any sort of food without distinction. And among various nations there are other usages, for which innumerable reasons are assigned. Since however no one can produce a written command as an authority, it is evident that the apostles left each one to his own free will in the matter, to the end that each might perform what is good not by constraint or necessity. Such is the difference in the churches on the subject of fasts.
But [Bishop Cedd], desiring first to cleanse the place which he had received for the monastery from stain of former crimes, by prayer and fasting, and so to lay the foundations there, requested of the king that he would give him opportunity and leave to abide there for prayer all the time of Lent, which was at hand. All which days, except Sundays, he prolonged his fast till the evening, according to custom, and then took no other sustenance than a small piece of bread, one hen’s egg, and a little milk and water. This, he said, was the custom of those of whom he had learned the rule of regular discipline, first to consecrate to the Lord, by prayer and fasting, the places which they had newly received for building a monastery or a church.
To summarize my takeaways:
Early sources suggest fasting on Wednesday and Friday. Other sources introduce a Lenten fast, but the dates are a little unclear—sometimes just during Holy Week, sometimes just Good Friday and Holy Saturday, sometimes more.
There’s a mix: bread and water, bread and water and vegetables, or anything but meat and alcohol.
Often this isn’t mentioned at all, but sometimes it’s said that you shouldn’t eat anything until the evening.
This book gets cited from time to time as a sort of historical guide to "being cool," since the characters spend some time discussing the idea of sprezzatura, basically grace or effortlessness. More interesting to me were the differences between Renaissance conceptions of virtue, character, & masculinity / femininity and how our culture's used to thinking about these concepts: "the past is a foreign country."
#2. Grant Cardone, Sell Or Be Sold
Having never read or watched any Twilight before this year, I found them much weirder than I was expecting.
#8. Fuchsia Dunlop, Invitation to a Banquet
As featured on CWT!
#9. Iris Murdoch, The Black Prince
A history of id Software, the company behind Wolfenstein 3D, Doom, Quake, and the fast inverse square root algorithm. John Carmack is a legendary figure in the software world, and after reading a fictionalized history inspired by id last year (Tomorrow and Tomorrow and Tomorrow) it was good to read the real thing.
#11. Michael Gerber, The E-Myth Revisited
A lot of old science fiction is hard to appreciate properly: the best ideas have been sucked out and copied a hundredfold, leaving only the author's weirder musings behind to be appreciated. Neuromancer's been copied as much as any novel, but I was impressed by the pace and general bleakness of this novel; it holds up well.
#13–26. Lois McMaster Bujold, The Vorkosigan Saga
I adored this series, which I read pretty steadily over the course of the year. Bujold writes satisfying, well-constructed plots that keep the focus on characters, not setting. The books fit together nicely, too: each story stands alone, but together paint a decades-long picture of her characters aging, gaining wisdom through their mistakes, and learning to handle the responsibilities placed on them. I think Captain Vorpatril's Alliance is my favorite one.
#27. R. F. Kuang, Babel
As recommended by Jensen Huang; unlike most business books, this one is worth reading all the way through.
#29. Rob Fitzpatrick, The Mom Test
A canonical book for startup founders, which I probably should have read 1–2 years ago.
#30. Elena Ferrante, My Brilliant Friend
At its core, this is a very similar story to Wicked: a coming-of-age story focusing on the envious and unstable friendship between two women. I liked this book, but haven't yet picked up the rest of the Neapolitan Novels; somehow keeping track of the names must intimidate me on a subconscious level.
#31. Andy Grove, Only The Paranoid Survive
I liked this book a lot. I would have adored it if I'd read it as a kid, I think; there's something viscerally compelling about Vinge's "Zones of Thought."
#33. C. S. Lewis, The Discarded Image
This book examines what medieval Europeans thought of the world: how did they see their universe and their place in it? This is a surprisingly subtle question: obviously they were Christian, but their cosmology was considerably different than what even the most "traditional" modern people believe. Last year, I wrote this about The Canterbury Tales:
Reading Chaucer fills me with questions about the medieval mind. The stories are steeped in Christianity, as one might expect. Any argument goes back to the Bible, even those among animals, and Chaucer assumes a level of familiarity with e.g. the Psalms far exceeding that of most modern Christians. Yet at the same time the Greco-Roman world looms large: Roman gods appear as plot characters in three tales (the Knight’s Tale, the Merchant’s Tale, and the Manciple’s Tale), and Seneca is viewed as a moral authority on par with Scripture. I’m curious how all these beliefs and ideas fit together and welcome any recommendations on this subject.
The Discarded Image exactly answers these questions. If you're at all interested in medieval thought, I highly recommend it.
#34. Jim Collins, Good To Great
We didn't quite live up to the book's promise, but it took less than a week, so I'm happy.
#37. Tim Keller, Every Good Endeavor
Another canonical book for startup founders, which I also probably should have read before now.
#39. Abigail Shrier, Bad Therapy
Shrier invites controversy here as with her other writing. Sweeping conclusions about American youth aside, I found this surprisingly compelling when viewed as a self-help book about how to be less fearful.
#40. Sheldon Vanauken, A Severe Mercy
Caused me to weep uncontrollably while stuck in a middle seat on a five-hour flight: you've been warned.
#41. Thich Nhat Hanh, You Are Here
This book is crazy, and I can't believe I hadn't read it before, particularly since I'm not too distant from a lot of the action, professionally or physically. It's framed as a science story, but I think it works even better at conveying the sheer desperation of early-stage startup life.
#48. Diarmaid MacCulloch, The Reformation
The Reformation is much weirder than most people, Protestant or Catholic, realize: I was surprised by the diversity of pre-Reformation religious practice in Europe, which was mostly stamped out in the doctrinal standardization of the 1500s. For both Protestants and Catholics, it became very important to separate "us" from "them," which led to the rise of catechisms, inquisitions, and so on.
This book also soured me on the "Albion's Seed" idea, as popularized by the SSC book review. Viewed in isolation, the Puritans seem like a bunch of religious fanatics, but really MacCulloch argues that the same impulse predominated all over Europe in a "Reformation of Manners," from Charles Borromeo's Milan to Plymouth Colony. Perhaps it's less about the Puritans and more about the 1620s.
#49. Amy Chua, Battle Hymn Of The Tiger Mother
This book made it back into the discourse, so I decided I'd actually read it; it's much better than I was expecting, and I don't think most of Chua's critics really understand the book. Conclusions for my own parenting have yet to be determined.
I also read good chunks of a number of textbooks this year, including:
Overall, this was a good year for books. As the stress of Rowan has ramped up more, I've found it more difficult to write creatively in my free time, and easier to just read other people's words—this manifests in a much-diminished rate of blogging, and a lot more energy diverted to reading fiction.
Next year, I hope to read:
Happy new year, and feel free to leave book recommendations in the Substack comments!
This post assumes some knowledge of molecular dynamics and forcefields/molecular mechanics. For readers unfamiliar with these topics, Abhishaike Mahajan has a great guide on his blog.
Although forcefields are commonplace in all sorts of biomolecular simulation today, there’s a growing body of evidence showing that they often give unreliable results. For instance, here’s Geoff Hutchison criticizing the use of forcefields for small-molecule geometry optimizations:
The use of classical MM methods for optimizing molecular structures having multiple torsional degrees of freedom is only advised if the precision and accuracy of the final structures and rankings obtained from the conformer searches is of little or no concern... current small molecule force fields should not be trusted to produce accurate potential energy surfaces for large molecules, even in the range of “typical organic compounds.” (emphasis added)
Here are a few other scattered case studies where forcefields have failed:
This list could be a lot longer, but I think the point is clear—even for normal, bio-organic molecules, forcefields often give bad or unreliable answers.
Despite all these results, though, it’s tough to know how bad the problem really is because there have been lots of scientific questions that can only be studied with forcefields. Studying protein conformational motion, for instance, is one of the tasks that forcefields have traditionally been developed for, and the scale and size of the systems in question make it really challenging to study any other way. So although researchers can show that different forcefields give different answers, it’s tough to quantify how close any of these answers is to the truth, and it’s always been possible to hope that a good forcefield really is describing the underlying motion of the system quite well.
It’s for this reason that I’ve been so fascinated by this April 2024 work from Oliver Unke and co-workers, which studies the dynamics of peptides and proteins using neural network potentials (NNPs). NNPs allow scientists to approach the accuracy of quantum chemical calculations in a tiny fraction of the time by training ML models to reproduce the output of high-level QM-based simulations: although NNPs are still significantly slower than forcefields, they’re typically about 3–6 orders of magnitude faster than the corresponding high-level calculations would be, with only slightly lower accuracy.
In this case, Unke and co-workers train a SpookyNet-based NNP to reproduce PBE0/def2-TZVPPD+MBD reference data comprising fragments from the precise systems under study. (MBD refers to Tkatchenko’s many-body dispersion correction, which can be thought of as a fancier alternative to pairwise dispersion corrections like D3 or D4.) In total, about 60 million atom-labeled data points were used to train the NNPs used in this study, which reportedly took 110,000 hours of CPU time to compute, equivalent to roughly 12.5 CPU-years!
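For anyone who wants to check the CPU-time conversion, here is a minimal sketch. It assumes a 365-day year and a single core running continuously; the function name is my own, and the only number taken from the post is the 110,000-hour figure.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years ignored (assumption)


def cpu_hours_to_years(cpu_hours: float) -> float:
    """Convert CPU-hours to CPU-years (one core running non-stop)."""
    return cpu_hours / HOURS_PER_YEAR


reported_cpu_hours = 110_000  # figure quoted above
cpu_years = cpu_hours_to_years(reported_cpu_hours)
print(f"{cpu_years:.1f} CPU-years")
```

Under those assumptions the result lands between 12 and 13 CPU-years.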
(This might be a nitpick, but I don’t love the use of PBE0 here. Range-separated hybrids are crucial for producing consistent and accurate results for large zwitterionic biomolecules (see e.g. this recent work from Valeev), so it’s possible that the underlying training data isn’t as accurate as it seems.)
The authors find that the resulting NNPs (“GEMS”) perform much better than existing forcefields in terms of overall error metrics: for instance, GEMS has an MAE of 0.45 meV/atom on snapshots of AceAla15Nme structures taken from MD simulations, while Amber has an MAE of 2.27 meV/atom. What’s much more interesting, however, is that GEMS gives significantly different dynamics than forcefields! While Amber simulations of AceAla15Nme predict that a stable α-helix will form at 300 K, GEMS predicts that a mixture of α- and 3₁₀-helices exists, which is exactly what’s seen in Ala-rich peptides experimentally. The CHARMM and GROMOS forcefields also get this system wrong, suggesting that GEMS really is significantly more accurate than forcefields at modeling the structure of peptides.
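For readers who haven't seen a per-atom error metric before, here's a minimal sketch of how an MAE in meV/atom might be computed. This is my own illustration, not the paper's evaluation code: the function name `mae_per_atom` and the toy numbers are made up, and I'm assuming the error is normalized by each structure's atom count before averaging.

```python
def mae_per_atom(predicted, reference, n_atoms):
    """Mean absolute energy error, normalized by atom count.

    predicted, reference: total energies per structure (same unit, e.g. meV)
    n_atoms: number of atoms in each structure
    Assumption: each structure's error is divided by its atom count,
    and the results are then averaged over structures.
    """
    if not (len(predicted) == len(reference) == len(n_atoms)):
        raise ValueError("all inputs must have the same length")
    errors = [abs(p - r) / n for p, r, n in zip(predicted, reference, n_atoms)]
    return sum(errors) / len(errors)


# Toy example (hypothetical energies in meV, not data from the paper).
toy_mae = mae_per_atom(
    predicted=[10.0, 20.0],
    reference=[8.0, 23.0],
    n_atoms=[2, 3],
)
print(toy_mae)  # prints 1.0
```

On this kind of metric, the numbers quoted above (0.45 vs. 2.27 meV/atom) correspond to roughly a five-fold reduction in error.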
The authors next study crambin, a small 46-residue protein which is frequently chosen as a model system in papers like this. Similar to what was seen with the Ala15 helices, crambin is significantly more flexible when modeled by GEMS than when modeled with Amber (see below figure). The authors conduct a variety of other analyses, and argue that there are “qualitative differences between simulations with conventional FFs and GEMS on all timescales.” This is an incredibly significant result, and one that casts doubt on literal decades of forcefield-based MD simulations. Think about what this means for Relay’s MD-based platform, for instance!
Why do Amber and GEMS differ so much here? Here’s what Unke and coworkers think is going on:
AmberFF is a conventional FF, and as such, models bonded interactions with harmonic terms. Consequently, structural fluctuations on small timescales are mostly related to these terms. Intermediate-scale conformational changes as involved in, for example, the “flipping” of the dihedral angle in the disulfide bridges of crambin, on the other hand, can only be mediated by (nonbonded) electrostatic and dispersion terms, because the vast majority of (local) bonded terms stay unchanged for all conformations. On the other hand, GEMS makes no distinction between bonded and non-bonded terms, and individual contributions are not restricted to harmonic potentials or any other fixed functional form. Consequently, it can be expected that large structural fluctuations for AmberFF always correspond to “rare events” associated with large energy barriers, whereas GEMS dynamics arise from a richer interplay between chemical bonds and nonlocal interactions.
The overall idea that (1) forcefields impose an unphysical distinction between bonded and non-bonded interactions, and (2) this distinction leads to strange dynamical effects makes sense to me. There are parts of this discussion that I don’t fully understand, though: what’s to stop a large structural fluctuation in Amber from having a small barrier? Aren’t all high-barrier processes “rare events” irrespective of where the barrier comes from?
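To make the "harmonic terms" in the quote a bit more concrete, here's a toy comparison between a harmonic bond-stretch term and a Morse potential with the same curvature at the minimum. This is only an illustration of what a fixed functional form does: the Morse potential is a textbook stand-in for an anharmonic bond, not what GEMS actually learns, and every parameter below is a hypothetical value in arbitrary units.

```python
import math


def harmonic(r, r0, k):
    """Harmonic bond-stretch energy: grows without bound as the bond stretches."""
    return 0.5 * k * (r - r0) ** 2


def morse(r, r0, depth, a):
    """Morse bond energy: plateaus at `depth` as the bond dissociates."""
    return depth * (1.0 - math.exp(-a * (r - r0))) ** 2


# Hypothetical parameters (arbitrary units).
r0, k, depth = 1.0, 500.0, 100.0
# Pick `a` so both curves have the same second derivative at r0:
# d2E/dr2 for Morse at r0 is 2 * depth * a**2, which we set equal to k.
a = math.sqrt(k / (2.0 * depth))

for r in (1.01, 1.5, 4.0):
    print(f"r={r:4.2f}  harmonic={harmonic(r, r0, k):9.3f}  "
          f"morse={morse(r, r0, depth, a):8.3f}")
```

Near the minimum the two curves are almost indistinguishable, but at large displacements the harmonic term keeps climbing while the Morse curve levels off. A model that isn't tied to one functional form is free to capture either behavior (or something else entirely).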
There are some obvious caveats here that mean this sort of strategy isn’t ready for widespread adoption yet. These aren’t foundation models; the authors create a new model for each peptide and protein under study by adding system-specific fragments to the training data and retraining the NNP. This takes “between 1 and 2 weeks, depending on the system,” not counting the cost of running all the DFT calculations, so this is far too expensive and slow for routine use. While this might seem like a failure, I think it’s worth reflecting on how tough this problem is. Crambin alone has thousands of degrees of freedom, not counting the surrounding water molecules, and accurately reproducing the results of the Schrödinger equation for this system is an incredible feat. The fact that we can’t automatically also solve this problem in a zero-shot manner for every other protein is hardly a failure, particularly because it seems very likely that scaling these models will dramatically improve their generalizability!
The other big limitation is inference speed: the SpookyNet-based NNPs are about 250x slower than a conventional forcefield, so it’s much tougher to access the long timescales that are needed to simulate processes like protein folding. There are a lot of techniques that can help address these problems: NNPs can become faster and not require system-specific retraining, coarse graining can reduce the number of particles in the system, and Boltzmann generators can reduce the number of evaluations needed. So the future is bright, but there’s clearly a lot of ML engineering and applied research that will be needed to help NNP-based simulations scale.
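To get a feel for what a 250x slowdown means in practice, here's a back-of-the-envelope sketch. The 250x factor is from the discussion above, but the forcefield throughput (500 ns/day) and the target trajectory length (1 µs) are hypothetical numbers I picked for illustration, not values from the paper.

```python
def days_required(target_ns, forcefield_ns_per_day, slowdown=1.0):
    """Wall-clock days needed to simulate `target_ns` nanoseconds.

    slowdown: how many times slower the method is than the forcefield
    (1.0 for the forcefield itself, 250.0 for the NNP discussed above).
    """
    ns_per_day = forcefield_ns_per_day / slowdown
    return target_ns / ns_per_day


# Hypothetical inputs: 1 microsecond target, forcefield running at 500 ns/day.
ff_days = days_required(1000.0, 500.0)
nnp_days = days_required(1000.0, 500.0, slowdown=250.0)

print(f"forcefield: {ff_days:.0f} days, NNP: {nnp_days:.0f} days")
# prints "forcefield: 2 days, NNP: 500 days"
```

Under these made-up assumptions, a simulation that takes a weekend with a forcefield takes well over a year with the NNP, which is why speedups on the model side and sampling side both matter.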
But overall, I think this is a very significant piece of work, and one that should make anyone adjacent to forcefield-based MD pause and take note. One day it will be possible to run simulations like this just as quickly as people run regular MD simulations today, and I can’t wait to see what comes of that.
Thanks to Abhishaike Mahajan for helpful feedback on this post.
Abhishaike Mahajan recently wrote an excellent piece on how generative ML in chemistry is bottlenecked by synthesis (disclaimer: I gave some comments on the piece, so I may be biased). One of the common reactions to this piece has been that self-driving labs and robotics will soon solve this problem—this is a pretty common sentiment, and one that I’ve heard a lot.
Unfortunately, I think that the strongest version of this take is wrong: organic synthesis won’t be “solved” by just replacing laboratory scientists with robots, because (1) figuring out what reactions to run is hard, (2) running reactions is even harder, and (3) we need scientific advances to fix this, not just engineering.
Organic molecules are typically made through a sequence of reactions, and figuring out how to make a molecule involves both the strategic question of which reactions to run in what order and the tactical question of how to run each reaction.
There’s been a ton of work on both of these problems, and it’s certainly true that computer-assisted retrosynthesis tools have come a long way in the last decade! But retrosynthesis is one of those problems that’s (relatively) easy to be good at and almost impossible to be great at. In part, this is because data in this field tends to be very bad: publications and patents are full of irreproducible or misreported reactions, and negative results are virtually never reported. (This post by Derek Lowe is a good overview of some of the problems that the field faces.)
But also, the problems are just hard! I got the chance to try out one of the leading retrosynthesis software packages back in my career as an organic chemist, and when we fed it some of the tough synthetic problems we were facing, it gave us all the logical suggestions that we had already tried (unsuccessfully) and then began suggesting insane reactions to us. I can’t really blame the model for not being able to invent new chemistry—but this illustrates the limits of what pure retrosynthesis can accomplish, absent new scientific discoveries.
The tactical problem of optimizing reaction conditions is also difficult. In cases where there are a lot of continuous variables (like temperatures or concentrations), conventional optimization methods like design-of-experiments can work well—but where reagents or catalysts are involved, optimization becomes significantly more challenging. Lots of cheminformatics/small-data ML work has been done in this area, but it’s still not straightforward to reliably take a reaction drawn on paper and get it to work in the lab.
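Here's a small sketch of why discrete variables make this harder. It enumerates a full-factorial design, which is the simplest kind of design-of-experiments layout (real DoE workflows usually use fractional or response-surface designs instead). The factor names and levels are hypothetical placeholders, not conditions from any real reaction.

```python
from itertools import product


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical continuous variables, sampled at a few levels each.
continuous = {
    "temperature_C": [25, 50, 75],
    "concentration_M": [0.1, 0.5],
}
# Hypothetical categorical variables: there's no ordering to interpolate along,
# so each option has to be tried explicitly.
categorical = {
    "catalyst": ["cat_A", "cat_B", "cat_C", "cat_D"],
    "base": ["base_1", "base_2", "base_3"],
}

design_small = full_factorial(continuous)
design_large = full_factorial({**continuous, **categorical})

print(len(design_small))  # prints 6
print(len(design_large))  # prints 72
```

With continuous variables you can fit a smooth model through a handful of points and interpolate; with catalysts and reagents, each new option multiplies the number of experiments, and knowing how catalyst A behaves tells you little about catalyst B.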
All of the above problems are, in principle, solvable. Where I think robotics is likely to struggle even more is in the actual execution of these routes. Synthetic organic chemistry is an arcane and specialized craft that typically requires at least five years of training to be decent at—most published reaction procedures assume that the reader is themselves a trained organic chemist, and omit most of the “obvious” details that are needed to unambiguously specify a sequence of steps. (The incredibly detailed procedures in Organic Syntheses illustrate just how much is missing from the average publication.)
My favorite illustration of how irreproducible organic chemistry can be is BlogSyn, a brief project that aimed to anonymously assess how easily published reactions could be reproduced. The second BlogSyn post found that a reported olefination of pyridine could not be reproduced. The original author of the paper, Jin-Quan Yu (Scripps), responded, and the shape of the reaction tube was ultimately found to be critical to reaction success.
The third BlogSyn post found that an IBX-mediated benzylic oxidation reported by Phil Baran (also of Scripps) could not be reproduced at all as written. Phil and his co-authors responded pretty aggressively, and after several weeks of back-and-forth it was ultimately found that the reaction could be reproduced after modifying virtually every parameter. A comment from Phil’s co-author Tamsyn illustrates some of the complexities at play:
There is in [BlogSyn’s] discussion a throw away comment about the 2-methylnaphthalene not being volatile. Have you never showered and then left your hair to dry at room temperature? – water evaporates at RT, just as 2-methylnaphthalene does at 95 ºC. I suggest to you that at the working temperatures of this reaction, the biggest problem may be substrate evaporation (or “hanging out” on the colder parts of the flask as Phil said)... We need fluorobenzene to reflux in these reactions and in so-doing wash substrate back into the reaction from the walls of the vessel, but it clearly slows/inhibits the reaction also – so, we need to tune this balance carefully and with patience. Scale will have a big influence on how well this process works.
Tamsyn is, of course, right—volatile substrates can evaporate, and part of setting up a reaction is thinking about the vapor pressure of your substrates and how you can address this. But this sort of thinking requires a trained chemist, and isn’t easily automated. There are a million judgment calls to make in organic synthesis—what concentration to use, how quickly to add the reagent, how to work up the reaction, what extraction solvent to use, and so on—and it’s hard enough to teach first-year graduate students how to do all this, let alone robots. Perhaps at the limit as robots achieve AGI this will be possible, but for now these remain difficult problems.
What can be done, then?
From a synthetic point of view, we need more robust reactions. Lots of academics work on reaction development, but the list of truly reliable reactions remains minuscule: amide couplings, Suzuki couplings, addition to Ellman auxiliaries, SuFEx chemistry, and so on. From a practical point of view, every reaction like this is worth a thousand random papers with a terrible substrate scope (Sharpless said it better in 2001 than I ever could; see also this 2015 study about how basically no new reactions are used in industry). Approaches like skeletal editing are incredibly exciting, but there’s a limit to how impactful any non-general methodology can be.
Perhaps even more important is finding better methods for reaction purification. Purification is one of those topics which doesn’t get a lot of academic attention, but being able to efficiently automate purification unlocks a whole new set of possibilities. Solid-phase synthesis (which makes purification as simple as rinsing off some beads) has always seen some amount of use in organic chemistry, but a lot of commonly-used reactions aren’t compatible with solid support: either new supports or new reactions could address this problem. There are also cool approaches like Marty Burke’s “catch-and-release” boronate platform which haven’t yet seen broad adoption.
Ultimately, I share the dream of the robotics enthusiasts: if we’re able to make organic synthesis routine, we can stop worrying about how to make molecules and start thinking about what to make! I’m very optimistic about the opportunity of new technologies to address synthetic bottlenecks and enable higher-throughput data generation in chemistry. But getting to this point will take not only laboratory automation but also a ton of scientific progress in organic chemistry, and the first step in solving these problems is actually taking them seriously and recognizing that they’re unsolved.
Thanks to Abhishaike Mahajan and Ari Wagen for helpful comments about this post.
I've been playing around with generating non-equilibrium conformations by molecular dynamics recently, and I've been thinking about how to best parse the outputs of a dynamics simulation. A technique I've seen quite often in the literature is "choose a dissimilar subset of conformers by RMSD." For instance, here's what the SPICE paper says:
For each of the 677 molecules, the dataset includes 50 conformations of which half are low energy and half are high energy. To generate them, RDKit 2020.09.3 was first used to generate 10 diverse conformations. Each was used as a starting point for 100 ps of molecular dynamics at a temperature of 500 K using OpenMM 7.6 and the Amber14 force field. A conformation was saved every 10 ps to produce 100 candidate high energy conformations. From these, a subset of 25 was selected that were maximally different from each other as measured by all atom RMSD.
This makes a good amount of sense: you want to choose conformers which cover as much chemical space as possible so that you get information about the PES as efficiently as possible, and RMSD is a cheap and reasonable way to do this. But how do you actually do this in practice? Nothing super helpful came up after a quick Google search, so I wrote a little script myself:
import cctk
import numpy as np
from sklearn.cluster import AgglomerativeClustering
import sys
import tqdm
import copy

e = cctk.XYZFile.read_file(sys.argv[1]).ensemble
molecules = e.molecule_list()

rmsd_matrix = np.zeros((len(molecules), len(molecules)))
comparison_atoms = molecules[0].get_heavy_atoms()

def compute_rmsd(mol1: cctk.Molecule, mol2: cctk.Molecule) -> float:
    geom1 = copy.deepcopy(mol1.geometry[comparison_atoms])
    geom1 -= geom1.mean(axis=0)
    geom2 = copy.deepcopy(mol2.geometry[comparison_atoms])
    geom2 -= geom2.mean(axis=0)
    return cctk.helper_functions.compute_RMSD(geom1, geom2)

for i in tqdm.tqdm(range(len(molecules))):
    for j in range(i + 1, len(molecules)):
        rmsd_matrix[i, j] = compute_rmsd(molecules[i], molecules[j])
        rmsd_matrix[j, i] = rmsd_matrix[i, j]

clustering = AgglomerativeClustering(
    n_clusters=50,
    metric="precomputed",
    linkage="average"
)
clustering.fit(rmsd_matrix)

selected_molecules: list[int] = []
for cluster_id in range(50):
    cluster_indices = np.where(clustering.labels_ == cluster_id)[0]
    selected_molecule = cluster_indices[
        np.argmin(rmsd_matrix[cluster_indices].sum(axis=1))
    ]
    selected_molecules.append(selected_molecule)

e2 = cctk.ConformationalEnsemble()
for idx in selected_molecules:
    e2.add_molecule(molecules[idx])

cctk.XYZFile.write_ensemble_to_file(sys.argv[2], e2)
print("done!")
This script uses agglomerative clustering to sort conformations into clusters, but could easily be adapted to work with other algorithms. To run this script, simply paste it into a file (choose_dissimilar.py) and run:
python choose_dissimilar.py input.xyz output.xyz
This will dump 50 output conformers into output.xyz. Hopefully this saves someone some time... happy computing!
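As one example of swapping in a different algorithm, here's a minimal sketch of greedy "max-min" (farthest-point) selection on a precomputed distance matrix. To be clear, this is a different technique from the agglomerative clustering in the script above, and I don't know whether it's what the SPICE authors actually used. The function name `greedy_maxmin_subset` and the toy data are my own; in the script above, you'd pass `rmsd_matrix` as `dist`.

```python
import numpy as np


def greedy_maxmin_subset(dist, n_select, start=0):
    """Greedily pick `n_select` indices that are far apart from each other.

    dist: square symmetric matrix of pairwise distances (e.g. RMSDs)
    start: index of the first point to include (an arbitrary choice)
    At each step, adds the point whose distance to its nearest
    already-selected point is largest.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    if dist.shape != (n, n):
        raise ValueError("dist must be a square matrix")
    if not 1 <= n_select <= n:
        raise ValueError("n_select must be between 1 and the number of points")

    selected = [start]
    # min_dist[i] = distance from point i to its nearest selected point.
    min_dist = dist[start].copy()
    min_dist[start] = -np.inf  # never re-pick a selected point
    for _ in range(n_select - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
        min_dist[nxt] = -np.inf
    return selected


# Toy example: four points on a line at x = 0, 1, 2, 10.
x = np.array([0.0, 1.0, 2.0, 10.0])
toy_dist = np.abs(x[:, None] - x[None, :])
toy_selection = greedy_maxmin_subset(toy_dist, n_select=3)
print(toy_selection)  # prints [0, 3, 2]
```

Unlike clustering, this approach tends to favor outliers (the point at x = 10 gets picked immediately), which may or may not be what you want depending on how noisy your high-energy conformers are.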