The past few years of “AI for life science” have been all about the models: AlphaFold 3, neural-network potentials, protein language models, binder generation, docking, co-folding, ADME/tox prediction, and so on. But Chai-2 (and lots of related work) shows us that the vibes are shifting. Models themselves are becoming just building blocks; the real breakthroughs are going to happen at the workflow level, as we learn how to combine these models into robust and performant pipelines.
Workflows are the new models. To build a state-of-the-art computational stack for drug discovery (or protein engineering, or materials design, or anything else), it’s no longer enough to have a single state-of-the-art model. You need a suite of modular tools that you can combine in a way that makes sense for your task. (At Rowan, we’re seeing this happen all over the industry.)
What does this mean in practice? Here are two imaginary case studies illustrating what modern computational chemistry looks like in 2025:
A company is developing a new inorganic photocatalyst for bulk acid–alkene coupling (following Zhu and Nocera, 2020). Their workflow might look something like this:
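In code, one round of that loop might look like the following Python sketch. Every function here is a hypothetical stand-in for a real model or tool (a generative model, a neural-network potential, a DFT code), with dummy bodies so the pipeline runs end-to-end:

```python
import random

# Hypothetical screening round: cheap models prune the candidate pool
# before expensive ones run. All functions below are illustrative stubs.

def generate_candidates(seeds: list[str], n: int) -> list[str]:
    """Stand-in for a generative model proposing n new structures."""
    return [f"{random.choice(seeds)}-variant-{i}" for i in range(n)]

def nnp_energy(structure: str) -> float:
    """Stand-in for a fast neural-network-potential relaxation + energy."""
    return random.uniform(-1.0, 0.0)

def dft_activity_score(structure: str) -> float:
    """Stand-in for an expensive DFT estimate of photocatalytic activity."""
    return random.uniform(0.0, 1.0)

def screening_round(seeds: list[str], n_generate: int = 500, n_dft: int = 20) -> list[str]:
    candidates = generate_candidates(seeds, n_generate)
    # Fast filter first: keep only the most stable structures by NNP energy.
    stable = sorted(candidates, key=nnp_energy)[:n_dft]
    # Expensive scoring last: rank the survivors by predicted activity.
    return sorted(stable, key=dft_activity_score, reverse=True)[:5]

print(screening_round(["seed-A", "seed-B"]))
```

The essential design decision is ordering: each individual stage is commoditized, but the funnel itself (cheap and broad first, expensive and narrow last) is where the engineering lives.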
The entire cycle can be repeated ad nauseam to generate new candidates, with the focus gradually shifting from exploration to exploitation.
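One standard way to implement that shift is a decaying exploration fraction when assembling each round’s batch. A minimal sketch, assuming `ranked` comes from a scoring stage like the one above:

```python
import random

# Shift from exploration to exploitation: sample a decaying fraction of
# each batch at random, and fill the rest greedily from the top-ranked
# candidates. The decay constants here are arbitrary placeholders.

def select_batch(ranked: list[str], batch_size: int, round_idx: int) -> list[str]:
    explore_frac = 0.5 * (0.8 ** round_idx)  # 50% random at round 0, decaying
    n_explore = int(batch_size * explore_frac)
    explored = random.sample(ranked, n_explore)
    exploited = [c for c in ranked if c not in explored][: batch_size - n_explore]
    return explored + exploited
```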
A company has identified new CNS biological targets that they hope to inhibit with a small molecule. Their workflow might look something like this:
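As a Python sketch, with every function again a hypothetical stub (generative chemistry, ADME/tox prediction, docking or co-folding) and only the funnel shape meant literally:

```python
import random

# Hypothetical hit-finding funnel. Every function is an illustrative stub
# for a real model; only the shape of the pipeline is the point.

def generate_ligands(target_id: str, n: int) -> list[str]:
    """Stand-in for a structure-conditioned generative model."""
    return [f"{target_id}-ligand-{i}" for i in range(n)]

def passes_cns_filters(ligand: str) -> bool:
    """Stand-in for ADME/tox models plus CNS-specific checks
    (e.g., predicted blood-brain-barrier penetration)."""
    return random.random() > 0.3

def binding_score(ligand: str) -> float:
    """Stand-in for docking or co-folding confidence (higher is better)."""
    return random.random()

def hit_finding_round(target_id: str, n_generate: int = 1000, n_keep: int = 10) -> list[str]:
    ligands = generate_ligands(target_id, n_generate)
    # Cheap developability filters run before any expensive binding model.
    viable = [l for l in ligands if passes_cns_filters(l)]
    # Rank survivors by predicted binding; the top of the list moves forward.
    return sorted(viable, key=binding_score, reverse=True)[:n_keep]

print(hit_finding_round("target-X"))
```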
This cycle, too, can be repeated until a set of promising candidates is identified for synthesis (or until you run out of Modal credits, whichever comes first).
Neither of these case studies is based on a particular company; instead, they’re meant to illustrate the sort of ML-native workflows we’re seeing from early adopters across the chemical sciences. For simplicity, experimental integration isn’t shown here, but any sane scientist will obviously incorporate wet-lab testing as soon as possible and feed those insights back into the top of the funnel.
In any case, the overall point is clear: no single model can solve every problem on its own, and figuring out the right way to combine a set of models is itself a non-trivial system-design problem. It’s entirely possible to create a state-of-the-art workflow simply by combining “commoditized” open-source models in a new way, and so far the resultant workflows don’t seem obvious or easy to copy. This defies popular intuition about what constitutes a “moat” for AI companies.
More metaphysically, the line between workflows and models is blurring. Many ML-adjacent people think of models as the active unit of science: “they have a model for X” or “we’re building a model for Y.” But, as shown above, most state-of-the-art research today requires lots of individual ML models, and many “models” are already miniature workflows. For instance, running a single inference call through the Uni-pKa “model” requires enumerating all possible microstates, performing a conformer search, and running geometry optimizations on every individual conformer—just to generate the pairwise-distance matrix used as input for the actual ML model.
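A schematic paraphrase of that kind of composite “model” in Python makes the point concrete; the chemistry steps are stubbed out with dummy values, and only the control flow reflects the description above:

```python
import numpy as np

# Schematic composite "model": a single predict() call that is secretly a
# multi-stage workflow. All chemistry steps are stubbed with dummy values.

def enumerate_microstates(smiles: str) -> list[str]:
    """Stand-in: enumerate protonation microstates."""
    return [f"{smiles}+H", smiles, f"{smiles}-H"]

def conformer_search(microstate: str, n: int = 4) -> list[np.ndarray]:
    """Stand-in: propose n conformers as (n_atoms x 3) coordinate arrays."""
    return [np.random.rand(10, 3) for _ in range(n)]

def optimize_geometry(coords: np.ndarray) -> np.ndarray:
    """Stand-in: relax a conformer with a force field or NNP."""
    return coords

def distance_matrix(coords: np.ndarray) -> np.ndarray:
    """Pairwise interatomic distances: the actual input to the network."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def network(features: list[np.ndarray]) -> float:
    """Stand-in for the trained ML model itself."""
    return float(np.mean([f.mean() for f in features]))

def predict_pka(smiles: str) -> float:
    features = [
        distance_matrix(optimize_geometry(conf))
        for microstate in enumerate_microstates(smiles)
        for conf in conformer_search(microstate)
    ]
    # The "model" proper only runs here, at the very end of the workflow.
    return network(features)

print(predict_pka("CC(=O)O"))
```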
Why does this matter? Here are a few thoughts: