In scientific computation, where I principally reside these days, there’s a cultural and philosophical divide between physics-based and machine-learning-based approaches.
Physics-based approaches to scientific problems typically start with defining a simplified model or theory for the system. For instance, a scientist attempting to simulate protein dynamics might start by defining how the energy of a system depends on its coordinates by choosing (or creating) a forcefield. Once a model’s been selected, numerical simulations are run using the model and the output of these simulations is analyzed to produce a prediction.
Physical simulations like these are extensively used today in climate modeling, orbital mechanics, computational fluid dynamics, and computational chemistry. These approaches are typically robust and relatively easy to interpret, since the underlying theory is perfectly known. Still, it’s often possible to get very complex behavior from a small set of underlying rules. Application of Newtonian gravitation, for instance, gives rise to the chaotic “three-body problem.” This combination of simple physical rules and complex emergent behavior is what makes physical simulations so powerful: starting from a well-understood model, it’s possible to gain non-trivial insights about complex systems.
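As a toy illustration of how little machinery this takes (a sketch of mine, not a production integrator): three point masses evolving under the inverse-square law, integrated with velocity Verlet, already show sensitive dependence on initial conditions.

```python
import numpy as np

# Toy sketch: three point masses under Newton's inverse-square law.
G, m = 1.0, np.ones(3)

def accelerations(pos):
    acc = np.zeros_like(pos)
    for i in range(3):
        for j in range(3):
            if i != j:
                r = pos[j] - pos[i]
                acc[i] += G * m[j] * r / np.linalg.norm(r) ** 3
    return acc

def integrate(pos, vel, dt=1e-3, steps=50_000):
    # Velocity-Verlet integration: one simple rule, applied over and over.
    for _ in range(steps):
        acc = accelerations(pos)
        pos = pos + vel * dt + 0.5 * acc * dt**2
        vel = vel + 0.5 * (acc + accelerations(pos)) * dt
    return pos

pos0 = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 0.5]])
vel0 = np.array([[0.0, 0.3], [0.0, -0.3], [0.3, 0.0]])

# Two runs started 1e-8 apart drift apart over time (sensitive dependence
# on initial conditions), even though the underlying rule never changes.
print(integrate(pos0, vel0))
print(integrate(pos0 + 1e-8, vel0))
```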
ML-based approaches to scientific computation turn all of this upside down. Many ML models directly map input to output without any intermediate subtasks, making it difficult to interpret what’s going on. While physics-based methods can often extrapolate from simple test cases to complex emergent behaviors, ML-based methods frequently struggle to predict behaviors outside of what they’ve seen before. Still, ML-based methods are often far more efficient than physical methods and, since they’re not constrained by a specific theoretical framework, can handle complex phenomena where the underlying functional form is not known.
When these two subcultures collide, things often get nasty. Proponents of physics-based modeling frequently allege that ML is fundamentally unscientific, since science is all about extrapolation, and that the unreliability and lack of interpretability of machine learning make it ill-suited for anything except hacking benchmarks.
On the other side of the aisle, machine-learning advocates claim that physics-based modeling is basically a way for scientists to feel smart and look at pretty pictures and will never be able to capture sufficient complexity to be useful. If you think I’m exaggerating, I’m not—here’s Pranam Chatterjee discussing why structure-based models like AlphaFold are “irrelevant” to disease (emphasis added):
…if we believe Anfinsen’s hypothesis: the sequence should encode everything else, including structure. Why do you need to look at the structure of the protein to understand it? I have concluded people just feel more satisfied looking at a protein, rather than trusting an implicit language model to capture its features. Hence the Nobel Prize.
For Pranam, the physical assumptions made by models like AlphaFold—that proteins have a three-dimensional structure that’s relevant to their biology—mean that these models are incapable of describing the complexity of reality. Contra Pranam, the point that I want to make in this piece is that there’s no reason why physics- and ML-based approaches should be so opposed to one another. In fact, I’m becoming convinced that the future looks like some hybrid of both approaches.
Why? One of the big limitations of physics-based methods, at least in drug discovery, is that the underlying models often aren’t expressive enough and can’t easily be made more expressive. Molecular forcefields have only a handful of terms, which means that they can’t easily represent coupled dihedral rotation, π–π stacking, and so on, let alone bond-breaking or bond-forming processes. In principle one could add more terms, but optimizing the empirical parameters for all of them quickly becomes a laborious and hard-to-scale process. In contrast, ML-based methods can scale to millions of empirical parameters or beyond without becoming intractably complex or overfit.
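To make the forcefield point concrete, here's a rough sketch (with made-up parameters) of the functional forms a classical forcefield uses; each term is a simple fixed expression with a few empirical constants.

```python
import numpy as np

# Illustrative sketch of a classical forcefield's functional form.
# Parameter values here are made up; real forcefields fit thousands of
# such constants against experiment or quantum chemistry.

def bond_energy(r, r0=1.5, k=300.0):
    # Harmonic bond stretch
    return 0.5 * k * (r - r0) ** 2

def angle_energy(theta, theta0=np.deg2rad(109.5), k=50.0):
    # Harmonic angle bend
    return 0.5 * k * (theta - theta0) ** 2

def dihedral_energy(phi, v=1.0, n=3, gamma=0.0):
    # Periodic torsion; each dihedral is treated independently, so
    # coupled rotations aren't captured by this functional form.
    return 0.5 * v * (1 + np.cos(n * phi - gamma))

def lennard_jones(r, epsilon=0.1, sigma=3.4):
    # Pairwise dispersion/repulsion; there's no explicit term for effects
    # like pi-pi stacking, which must emerge (or not) from this form.
    return 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

# The total energy is just the sum of these terms over all bonds, angles,
# dihedrals, and nonbonded pairs: that's essentially the whole model.
```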
On the other hand, the big advantage of physics-based approaches is their generalizability—even simple rulesets, like cellular automata, can lead to immensely complex emergent behavior. This sort of extrapolation is rarely seen in scientific ML projects. A recent blog post from Pat Walters makes the observation that most cheminformatic models don’t seem capable of extrapolating outside the dynamic range of their training data. This is surprising, since even the simplest of “physical” models (like a linear equation) are capable of this extrapolation.
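Here's a toy illustration of the dynamic-range problem (my example, not Pat's): a random forest trained on a simple linear trend plateaus outside its training range, while ordinary least squares, about the simplest "physical" model there is, extrapolates the trend without trouble.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Train both models on y = 2x for x in [0, 10], then predict at x = 20,
# well outside the training range.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = 2 * X_train.ravel() + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(random_state=0).fit(X_train, y_train)
linear = LinearRegression().fit(X_train, y_train)

X_test = np.array([[20.0]])
print(forest.predict(X_test))  # ~20: stuck near the training-range maximum
print(linear.predict(X_test))  # ~40: extrapolates the learned trend
```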
If a hybrid approach is to work, it must be able to capture the strengths of both methods. One of the simplest ways to do this is an energy-based approach, where scientists don’t predict an output parameter directly but instead learn an energy function which can be used to indirectly predict an output. (Yann LeCun has advocated for this.) This adds some test-time complexity, since additional computation is needed after inference to generate the final output, but also constrains the model to obey certain physical laws. (If you want to predict forces, for instance, it’s often best to do so through differentiating the predictions of an energy model, since doing this guarantees that the outputs will be a valid gradient field.)
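Here's a rough sketch of that pattern (the architecture below is just a placeholder, not any real neural network potential): the model predicts a scalar energy, and forces come from differentiating that energy with respect to the coordinates, so the predicted force field is conservative by construction.

```python
import torch
import torch.nn as nn

# Placeholder energy model: maps 3N coordinates to a single scalar energy.
class EnergyModel(nn.Module):
    def __init__(self, n_atoms: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 * n_atoms, 128), nn.SiLU(),
            nn.Linear(128, 128), nn.SiLU(),
            nn.Linear(128, 1),
        )

    def forward(self, coords: torch.Tensor) -> torch.Tensor:
        return self.net(coords.flatten(-2)).squeeze(-1)

def energy_and_forces(model, coords):
    coords = coords.detach().requires_grad_(True)
    energy = model(coords)
    # Forces are the negative gradient of the energy with respect to the
    # coordinates, so the force field is a valid gradient field by construction.
    (grad,) = torch.autograd.grad(energy, coords)
    return energy.detach(), -grad

model = EnergyModel(n_atoms=5)
energy, forces = energy_and_forces(model, torch.randn(5, 3))
```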
Physics-informed ML approaches can also resolve the interpolation/extrapolation problem. Since physical equations are well-suited to extrapolation, applying a physical model to the output of an ML layer can convert interpolated intermediate states to extrapolated final results. This sort of extrapolation is well-documented in neural network potentials—for instance, ANI-1 was trained only on molecules with 8 or fewer heavy atoms but proved capable of extrapolating to significantly larger molecules with good accuracy.
One fair criticism of these hybrid physics–ML approaches is that they often require much more test-time compute than pure-ML methods: optimizing a geometry with a neural network potential can require hundreds of intermediate gradient calculations, while in theory an ML method could predict the correct geometry with a single step. But in practice I think this is an advantage. Scaling test-time compute has been an incredible boon for the LLM space, and chain-of-thought reasoning models like OpenAI’s o3 and DeepSeek-r1 are now the gold standard. To the extent that physics-informed ML methods give us the ability to spend more compute to get better answers, I suspect this will mostly be good for the field.
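Continuing the placeholder sketch from above, a crude geometry relaxation looks something like this: hundreds of small steps, each spending another energy-and-force evaluation's worth of test-time compute.

```python
# Relaxing a geometry with a learned energy model spends test-time compute
# on many repeated gradient evaluations rather than a single forward pass.
coords = torch.randn(5, 3)  # random starting geometry, purely illustrative
for step in range(300):
    energy, forces = energy_and_forces(model, coords)
    coords = coords + 0.01 * forces  # crude steepest-descent step downhill
# (Real workflows use smarter optimizers like L-BFGS or FIRE, but the
# compute pattern of hundreds of gradient evaluations is the same.)
```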
At Rowan, we’ve recently published a few papers in this area—Starling, an energy-based model for predicting microstate distributions and macroscopic pKa, and Egret-1, a family of neural network potentials for bioorganic simulation—and I expect we’ll keep working on hybrid physics–ML approaches in the future. Others seem to be moving in this direction too. The recent Boltz-1x model (now on Rowan!) from Gabri Corso and co-workers incorporates inference-time steering for superior physical accuracy, and Achira’s written about the merits of “a third way” in simulation built around “harmonization of the theoretical and the pragmatic.”
There’s immense room for creativity at the intersection of physics and machine learning. Virtually any part of a complex physics-based workflow can potentially be replaced with a bespoke ML model, leading to almost limitless combinatorial possibilities for experimentation. Which combinations of models, algorithms, and architectures will prove to be dead-ends, and which will unlock order-of-magnitude improvements in performance and accuracy? It’s a fun time to be working in this field.
I asked my co-founder Eli what he thought of this piece. He pointed out that the amount of physics one should incorporate into an ML model depends on the amount of data and the dimensionality of the problem. I asked him to explain more, and here’s what he wrote:
In ML terminology, incorporating information about the modality you're predicting on into the design of the model is called adding inductive bias. One example of this for atomistic simulation is building Euclidean equivariance into a model, which guarantees that any spatial transformation of the model's input is reflected correspondingly in its output. This ensures that physical laws are followed, but it also increases the computational complexity of the model, limiting both the system sizes that can be handled at inference time and the speed of inference.
Some neural network potentials like Equiformer enforce equivariance in every operation within the model to ensure physical laws are always followed, while others enforce it only for inputs and outputs, relaxing constraints for intermediate model operations. Models like Orb-v3 don’t enforce equivariance at all, but incorporate dataset augmentation and unsupervised pre-training techniques to improve sample efficiency and learn physical laws.
As dataset size and diversity increase, we may see less of a need for physical inductive bias in models. One place we’ve seen this is with convolutional neural networks (CNNs) and vision transformers (ViTs) in computer vision: CNNs have built-in translational invariance whereas ViTs do not include any spatial inductive bias but are theoretically more flexible. We see ViTs outperforming CNNs only as dataset size and model size pass a certain threshold. If we assume that a similar pattern will hold for atomistic simulation, we might expect that the dataset size at which this threshold occurs will be greater, as molecular simulation has more degrees of freedom than computer vision.
I think this is a good point—thanks Eli! More abstractly, we might expect that incorporating explicit physics will help more in the low-data regime, while pure ML approaches will be more effective as data and compute approach infinity.
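To make the equivariance Eli describes concrete, here's a rough sketch of the property being enforced, reusing the placeholder energy_and_forces helper from earlier. (A plain MLP like that one fails these checks, which is exactly why equivariant architectures constrain their internal operations, and why models like Orb-v3 lean on augmentation and pre-training instead.)

```python
# Rotating the input should leave the energy unchanged and rotate the
# predicted forces by the same rotation.
def random_rotation() -> torch.Tensor:
    q, _ = torch.linalg.qr(torch.randn(3, 3))
    return q * torch.det(q)  # flip sign if needed so det(R) = +1

coords = torch.randn(5, 3)
R = random_rotation()

e1, f1 = energy_and_forces(model, coords)
e2, f2 = energy_and_forces(model, coords @ R.T)

print(torch.allclose(e1, e2, atol=1e-5))        # energy invariance?
print(torch.allclose(f1 @ R.T, f2, atol=1e-5))  # force equivariance?

# The data-augmentation alternative: train on randomly rotated copies of
# each structure so the model can learn this symmetry from data instead.
def augment(coords, forces):
    R = random_rotation()
    return coords @ R.T, forces @ R.T
```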
Thanks to Ari Wagen, Jonathon Vandezande, and Eli Mann for reading drafts of this post.