

Vibe-Coding a Coloring-Page Generator

September 6, 2025

I don’t write about my non-working life much on the Internet; my online presence has been pretty closely tied to Rowan and I try to adhere to some level of what Mary Harrington calls “digital modesty” regarding my family. Still, some basic demographics are helpful for context.

I have two kids (aged 2 and 4) and one more due in December. My children have many and varied interests: they like playing GeoGuessr with me, they like making forts, they like pretending to serve food, they like playing with Lego, and—most importantly for this post—they like coloring pictures.

Dad–daughter coloring time, ft. her toothbrush which she brought over for no apparent reason.

I find myself printing out a lot of coloring pages these days, and I’ve been generally disappointed by the quality of coloring pages on Google Images. My son often has very specific requests (e.g. “a picture of Starscream fighting”) and it’s difficult to find a coloring page that matches his vision. I have similar problems: I like coloring historical maps because it helps me understand history a bit better, but it’s hard to find good coloring pages of historical maps.

This problem has bothered me for a while. I tried using ChatGPT for this when 4o started being able to generate images, but the results were terrible. Here’s what ChatGPT thought a map of medieval Europe in 1000 AD should look like, for instance:

Points for making modern-day Romania part of the Eastern Roman empire (Ρωμανία), I suppose, but this is unusably bad.

After giving up for a few months, I revisited this problem again recently. My goal this time around was to vibe-code a way to convert any image into a coloring page. (If the idea of “vibe coding” is unfamiliar to you, refer to Andrej Karpathy’s post on X.) I gave the prompt to both GPT-5 and Claude Opus 4.1: while GPT-5 got confused and started creating epicyclically complex NumPy code, Claude Opus 4.1 gave me a pretty clean solution using OpenCV.

The full code is on GitHub. Here’s Claude’s summary of its approach, which I confess I don’t fully understand (in the spirit of vibe coding):

This code converts colored images into black-and-white line drawings using multiple computer vision techniques tailored to different image types. The implementation provides three main extraction methods: character-based, color boundary-based, and adaptive threshold-based processing.

The character extraction method combines three edge detection approaches. It first applies CLAHE (Contrast Limited Adaptive Histogram Equalization) with a clip limit of 3.0 and 8x8 tile grid to enhance local contrast. It then uses adaptive thresholding with a 9x9 Gaussian window, Canny edge detection (50-150 thresholds) on a Gaussian-blurred image, and Laplacian operators with a threshold of 30. These outputs are combined using bitwise AND operations to preserve edges while reducing noise. The pipeline includes connected component analysis to remove regions smaller than 25 pixels and morphological closing with a 2x2 kernel to fill gaps.
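To make this a bit more concrete, here’s a rough OpenCV sketch of the character-extraction pipeline, reconstructed from the summary above. The real code is in the GitHub repo; details like the threshold constants and exactly how the edge maps are combined are my guesses.

```python
import cv2
import numpy as np

def character_extraction_sketch(image_bgr, min_region_px=25):
    """Rough reconstruction of the character-extraction pipeline described above."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)

    # CLAHE to boost local contrast (clip limit 3.0, 8x8 tiles)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Adaptive threshold with a 9x9 Gaussian window (offset constant is a guess)
    adaptive = cv2.adaptiveThreshold(
        enhanced, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV, 9, 2)

    # Canny edges (50-150) on a Gaussian-blurred copy
    canny = cv2.Canny(cv2.GaussianBlur(enhanced, (5, 5), 0), 50, 150)

    # Laplacian edges, thresholded at 30
    laplacian = np.uint8(np.absolute(cv2.Laplacian(enhanced, cv2.CV_64F)))
    _, lap_edges = cv2.threshold(laplacian, 30, 255, cv2.THRESH_BINARY)

    # Combine the detectors; exactly how the ANDs/ORs are nested is a guess
    combined = cv2.bitwise_and(adaptive, cv2.bitwise_or(canny, lap_edges))

    # Drop connected components smaller than min_region_px to reduce noise
    n_labels, labels, stats, _ = cv2.connectedComponentsWithStats(combined, connectivity=8)
    cleaned = np.zeros_like(combined)
    for i in range(1, n_labels):
        if stats[i, cv2.CC_STAT_AREA] >= min_region_px:
            cleaned[labels == i] = 255

    # Morphological closing with a 2x2 kernel to fill small gaps in the lines
    closed = cv2.morphologyEx(cleaned, cv2.MORPH_CLOSE, np.ones((2, 2), np.uint8))

    # Invert so the page prints as black lines on a white background
    return cv2.bitwise_not(closed)
```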

The color boundary method operates in two parallel paths. The first path converts the image to LAB color space after bilateral filtering (d=5, sigmaColor=50, sigmaSpace=50) and runs Canny edge detection (40-80 thresholds) on each channel independently. The second path processes the grayscale image with a sharpening kernel, applies higher-threshold Canny detection (100-250), adaptive thresholding with a 7x7 window, and morphological gradient operations. Text features are preserved by combining these methods and removing components smaller than 4 pixels. Both paths merge via bitwise OR operations.

All methods support adjustable line thickness through erosion or dilation with 3x3 cross-shaped kernels. The adaptive threshold method uses CLAHE preprocessing (clip limit 2.0) followed by 11x11 Gaussian adaptive thresholding combined with Canny edges (30-100 thresholds). Final outputs undergo morphological opening with a 2x2 kernel for noise reduction.
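The line-thickness adjustment is the easiest piece to picture. A minimal version (my own sketch, not the repo’s code) looks something like this:

```python
import cv2

def adjust_line_thickness(page, steps=1):
    """page: black lines on a white background. Positive steps thicken the lines
    (erosion grows dark regions); negative steps thin them (dilation shrinks them)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    if steps > 0:
        return cv2.erode(page, kernel, iterations=steps)
    if steps < 0:
        return cv2.dilate(page, kernel, iterations=-steps)
    return page
```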

Here are a few examples. This is a picture of Optimus Prime from Google Images and the corresponding coloring page (for my son):

This is a map of medieval French duchies and counties and the corresponding coloring page (for me):

Neither of these is perfect! In both cases, the important boundaries are a bit lost in the details: Optimus’s outline is a bit unclear in some areas, and it’s tough to tell e.g. where Brittany’s boundaries are without the colored image as a reference. (The text also gets a little damaged in the map.) I’m sure someone who’s good at computer vision could do a better job here—using a tool like DINOv3 or Segment Anything could probably help, as could understanding what the above code is actually doing.

Still, this is good enough for routine usage and certainly better than the maps I could find floating around on the internet. I’m pretty happy with what a small amount of vibe-coding can accomplish, and I thought I’d share this anecdote in case other parents out there are looking for bespoke coloring pages.

Long-Range Forces and Neural Network Potentials

August 16, 2025
The Astronomer, Johannes Vermeer (c. 1668)

I’ve been sitting on this post for well over a year. This is the sort of thing I might consider turning into a proper review if I had more time—but I’m quite busy with other tasks, I don’t feel like this is quite comprehensive or impersonal enough to be a great review, and I’ve become pretty ambivalent about mainstream scientific publishing anyway.

Instead, I’m publishing this as a longform blog post on the state of NNP architectures, even though I recognize this may be interesting only to a small subset of my followers. Enjoy!



Atom-level simulation of molecules and materials has traditionally been limited by the immense complexity of quantum chemistry. Quantum-mechanics-based methods like density-functional theory struggle to scale to the timescales or system sizes required for many important applications, while simple approximations like molecular mechanics aren’t accurate enough to provide reliable models of many real-world systems. Despite decades of continual advances in computing hardware, algorithms, and theoretical chemistry, the fundamental tradeoff between speed and accuracy still limits what is possible for simulations of chemical systems.

Over the past two decades, machine learning has become an appealing alternative to the above dichotomy. In theory, a sufficiently advanced neural network potential (NNP) trained on high-level quantum chemical simulations can learn to reproduce the energy of a system to arbitrary precision, and once trained can reproduce the potential-energy surface (PES) many orders of magnitude faster than quantum chemistry, thus enabling simulations of unprecedented speed and accuracy. (If you’ve never heard of an NNP, Ari’s guide might be helpful.)

In practice, certain challenges arise in training an NNP to reproduce the PES calculated by quantum chemistry. Here’s what Behler and Parrinello say in their landmark 2007 paper:

[The basic architecture of neural networks] has several disadvantages that hinder its application to high-dimensional PESs. Since all weights are generally different, the order in which the coordinates of a configuration are fed into the NN [neural network] is not arbitrary, and interchanging the coordinates of two atoms will change the total energy even if the two atoms are of the same type. Another limitation related to the fixed structure of the network is the fact that a NN optimized for a certain number of degrees of freedom, i.e., number of atoms, cannot be used to predict energies for a different system size, since the optimized weights are valid only for a fixed number of input nodes.

To avoid these problems, Behler and Parrinello eschew directly training on the full 3N coordinates of each system, and instead learn a “short-range” potential about each atom that depends only on an atom’s neighbors within a given cutoff radius (in their work, 6 Å). Every atom of a given element has the same local potential, thus ensuring that energy is invariant with respect to permutation and making the potential more scalable and easier to learn.

Figure 2 from Behler and Parrinello.

This overall approach has served as the groundwork for most subsequent NNPs: although the exact form of the function varies, most NNPs basically work by learning local molecular representations within a given cutoff distance and extrapolating to larger systems. Today, most NNPs follow the “graph neural network” (GNN) paradigm, and the vast majority also incorporate some form of message passing (for more details, see this excellent review from Duval and co-workers).

There are intuitive and theoretical reasons why this is a reasonable assumption to make: “locality is a central simplifying concept in chemical matter” (Chan), “local cutoff is a powerful inductive bias for modeling intermolecular interactions” (Duval), and the vast majority of chemical phenomena are highly local. But a strict assumption of locality can cause problems. Different intermolecular interactions have different long-range behavior, and some interactions drop off only slowly with increasing distance. See, for instance, this chart from Knowles and Jacobsen:

Table 1 from Knowles and Jacobsen.

As the above chart shows, interactions involving charged species can remain significant even at long distances. For example, a positive charge and a negative charge 15 Å apart in the gas phase exert a force of 1.47 kcal/mol/Å on each other; for those outside the field, that’s quite large. (In the condensed phase, this is reduced by a constant factor corresponding to the dielectric constant ε of the medium: for water, ε ≈ 78.)
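If you want to check that number yourself, the arithmetic is a few lines (using the chemist’s Coulomb constant of roughly 332 kcal·Å/mol per unit charge squared):

```python
K = 332.06            # Coulomb constant, kcal*Angstrom/(mol*e^2)
r = 15.0              # separation in Angstrom
print(K / r)          # interaction energy: ~22.1 kcal/mol
print(K / r**2)       # force: ~1.48 kcal/mol/Angstrom
print(K / r**2 / 78)  # the same force screened by water's dielectric constant
```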

This creates problems for NNPs, as naïve application of a 6 Å cutoff scheme would predict no force between the above charges. While NNPs can still perform well for systems without substantial long-range forces without addressing this problem, lots of important biomolecules and materials contain charged or ionic species—making it a virtual certainty that NNPs will have to figure out these issues sooner or later.

Almost everyone that I’ve talked to agrees that this problem is important, but there’s very little agreement on the right path forward. I’ve spent the past few years talking to lots of researchers in this area about this question: while there are literally hundreds of papers on this topic, I think most approaches fall into one of three categories:

  1. Scale up existing local GNN approaches. Some approaches simply solve the long-range-force problem by making the NNPs larger—larger cutoff radii, more layers, and more data.
  2. Add explicit physics. Other approaches borrow from molecular mechanics and add explicit physics-based models of long-range forces.
  3. Use non-local neural networks. Still other approaches relax the locality of a pure GNN-based approach and train NNPs that can pass information over longer distances.

In this post, I’ll try to give a brief overview of all three paradigms. I’ll discuss how each approach works, point to evidence suggesting where it might or might not work, and discuss a select case study for each approach. This topic remains hotly debated in the NNP community, and I’m certainly not going to solve anything here. My hope is that this post can instead help to organize readers’ thoughts and, like any good review, bring some structure to the primal chaos of the underlying literature.

(There are literally hundreds of papers in this area, and while I’ve tried to cover a lot of ground, it’s a virtual certainty that I haven’t mentioned an important or influential paper. Please don’t take any omission as an intentional slight!)

Category 1 - Scaling Local GNN Approaches

Our first category is NNPs which don’t do anything special for long-range forces at all. This approach is often unfairly pilloried in the literature. Most papers advocating for explicit handling of long-range forces pretend that the alternative is simply discarding all forces beyond the graph cutoff: for instance, a recent review claimed that “interactions between particles more than 5 or 10 angstroms apart are all masked out” in short-range NNPs.

Representation of what an atom “sees” in a 7Å cutoff radius, from Meital Bojan and Sanketh Vedula.

This doesn’t describe modern NNPs at all. Almost all NNPs today use some variant of the message-passing architecture (Gilmer), which dramatically expands the effective cutoff of the model. Each round of message passing lets an atom exchange information with neighbors that are farther away, so a model with a graph cutoff radius of “5 Å” might actually have an effective cutoff of 20–30 Å, which is much more reasonable. It’s easy to find cases in which a force cutoff of 5 Å leads to pathological effects; it’s much harder to find cases in which a force cutoff of 20 Å leads to such effects.

Naïvely, one can calculate the effective cutoff radius as the product of the graph cutoff radius and the number of message-passing steps. Here’s how this math works for the recent eSEN-OMol25 models:
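(The numbers below are placeholders to illustrate the arithmetic, not the actual eSEN hyperparameters.)

```python
graph_cutoff = 6.0   # Angstrom per message-passing layer (placeholder value)
num_layers = 4       # number of message-passing steps (placeholder value)
effective_cutoff = graph_cutoff * num_layers  # naive receptive field: 24 Angstrom
```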

Since most discussions of long-range forces center around the 10–50 Å range, one might think that the larger eSEN models are easily able to handle long-range forces and this whole issue should be moot.

In practice, though, long-range communication in message-passing GNNs is fragile. The influence of distant features decays quickly because of “oversquashing” (the fixed size of messages compresses information that travels over multiple edges) and “oversmoothing” (repeated aggregation tends to make all node states similar). Furthermore, the gradients of remote features become tiny, so learning the precise functional form of long-range effects is difficult. As a result, even having a theoretical effective cutoff radius of “60 Å” is no guarantee that the model performs correctly over distances of 10 or 15 Å.

How long-ranged is long-ranged enough for a good description of the properties of interest? The short answer is that it’s not clear, and different studies find different results. There’s good evidence that long-range forces may not be crucial for proper description of many condensed-phase systems. Many bulk systems are able to reorient to screen charges, dramatically attenuating electrostatic interactions over long distances and making it much more reasonable to neglect these interactions. Here’s what Behler’s 2021 review says:

The main reason [why local NNPs are used] is that for many systems, in particular condensed systems, long-range electrostatic energy contributions beyond the cutoff, which cannot be described by the short-range atomic energies in [local NNPs], are effectively screened and thus very small.

There are a growing number of papers reporting excellent performance on bulk- or condensed-phase properties with local GNNs. To name a few:

Still, there are plenty of systems where one might imagine that strict assumptions of locality could lead to pathological behavior:

One common theme here is inhomogeneity—in accordance with previous theoretical work from Janacek, systems with interfaces or anisotropy are more sensitive to long-range forces than their homogeneous congeners.

It’s worth noting that none of the above studies were done with state-of-the-art NNPs like MACE-OFF2x or UMA, so it’s possible that these systems don’t actually fail with good local NNPs. There are theoretical reasons why this might be true. Non-polarizable forcefields rely largely on electrostatics to describe non-covalent interactions (and use overpolarized charges), while NNPs can typically describe short- to medium-range NCIs just fine without explicit electrostatics: cf. benchmark results on S22 and S66.

Case Study - The Schake Architecture

An interesting local GNN architecture was recently reported by Justin Airas and Bin Zhang. In the search for scalable implicit-solvent models for proteins, Airas and Zhang found that extremely large cutoff radii (>20 Å) were needed for accurate results, but that conventional equivariant GNNs were way too slow with these large cutoff radii. To address this, the authors use a hybrid two-step approach which combines an accurate short-range “SAKE” layer (5 Å cutoff) with a light-weight long-range “SchNet” layer (25 Å cutoff):

Figure 3 from Airas and Zhang.

This hybrid “Schake” approach seems to give the best of both worlds, at least for this use case. Schake combines the short-range accuracy of a SAKE-only model with the ability to efficiently retrieve information from longer ranges with SchNet. For large systems, 75–80% of the atom pairs are calculated with SchNet.

I like this approach because—much like a classic multipole expansion—it takes into account the fact that long-range interactions are much simpler and lower-dimensional than short-range interactions, and provides an efficient way to scale purely local approaches while retaining their simplicity.

(Random aside: How do non-physics-based NNPs handle ionic systems? While methods that depend only on atomic coordinates—like ANI, Egret-1, or MACE-OFF2x—can’t tell whether a given system is cationic, neutral, or anionic, it’s pretty easy to add the overall charge and spin as a graph-level feature. Given enough data, one might reasonably expect that an NNP could implicitly learn the atom-centered partial charges and the forces between them. Simeon and co-workers explore this topic in a recent preprint and find that global charge labels suffice to allow TensorNet-based models to describe ionic species with high accuracy over a few different datasets. This is what eSEN-OMol25-small-conserving and UMA do.)
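In code, “add the overall charge and spin as a graph-level feature” can be as simple as broadcasting the two labels to every atom’s feature vector before message passing. Here’s a hypothetical PyTorch-style sketch—one of several possible designs, not how any particular model actually does it:

```python
import torch

def add_charge_spin_features(node_feats, total_charge, spin_multiplicity, embed):
    """node_feats: (n_atoms, d) tensor; embed: e.g. torch.nn.Linear(2, d_extra).
    Broadcasts the graph-level labels so every atom can condition on them."""
    labels = torch.tensor([float(total_charge), float(spin_multiplicity)])
    g = embed(labels)                                   # (d_extra,)
    g = g.unsqueeze(0).expand(node_feats.shape[0], -1)  # (n_atoms, d_extra)
    return torch.cat([node_feats, g], dim=-1)           # (n_atoms, d + d_extra)
```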

Category 2 - Incorporating Explicit Physics

Previously, I mentioned that Coulombic forces decay only slowly with increasing distance, making them challenging to learn with traditional local NNP architectures. The astute reader might note that, unlike the complex short-range interactions which require extensive data to learn, charge–charge interactions have a well-defined expression and are trivial to compute. If the underlying physics is known, why not just use it? Here’s what a review from Unke and co-workers has to say on this topic:

While local models with sufficiently large cutoffs are able to learn the relevant effects in principle, it may require a disproportionately large amount of data to reach an acceptable level of accuracy for an interaction with a comparably simple functional form.

(This review by Dylan Anstine and Olexandr Isayev makes similar remarks.)

The solution proposed by these authors is simple: if we already know the exact answer from classical electrostatics, we can add this term to our model and just ∆-learn the missing interactions. This is the approach taken by our second class of NNPs, which employ some form of explicit physics-based long-range forces in addition to machine-learned short-range forces.

There are a variety of ways to accomplish this. Most commonly, partial charges are assigned to each atom, and an extra Coulombic term is added to the energy and force calculation, with the NNP trained to predict the additional “short-range” non-Coulombic force. (Implementations vary in the details: sometimes the Coulombic term is screened out at short distances, sometimes not. The exact choice of loss function also varies here.)

The architecture of TensorMol (Yao et al, 2018), which to my knowledge is the first NNP to incorporate this approach.
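Schematically, the energy decomposition in this family of models looks something like the toy sketch below—mine, not TensorMol’s or anyone else’s actual implementation, and ignoring the per-model differences in screening and loss functions mentioned above. Forces follow by differentiating both terms with respect to the coordinates.

```python
import numpy as np

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2)

def total_energy(coords, charges, short_range_nn, damp=lambda r: 1.0):
    """coords: (n_atoms, 3) array; charges: per-atom partial charges;
    short_range_nn: whatever learned short-range model you're using;
    damp(r): optional short-range screening function (details vary by model)."""
    e_short = short_range_nn(coords)
    e_coulomb = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            e_coulomb += damp(r) * COULOMB_K * charges[i] * charges[j] / r
    return e_short + e_coulomb
```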

Assigning partial charges to each atom is a difficult and non-trivial task. The electron density is a continuous function throughout space, and there’s no “correct” way to discretize it into various points: many different partitioning schemes have been developed, each with advantages and disadvantages (see this legendary Stack Exchange answer).

The simplest scheme is just to take partial charges from forcefields. While this can work, the atomic charges typically used in forcefields are overestimated to account for solvent-induced polarization, which can lead to unphysical results in more accurate NNPs. Additionally, using fixed charges means that changes in bonding cannot be described. Eastman, Pritchard, Chodera, and Markland explored this strategy in the “Nutmeg” NNP—while their model works well for small molecules, it’s incapable of describing reactivity and leads to poor bulk-material performance (though this may reflect dataset limitations and not a failure of the approach).

More commonly, local neural networks like those discussed above are used to predict atom-centered charges that depend on the atomic environment. These networks can be trained against DFT-derived partial charges or to reproduce the overall dipole moment. Sometimes, one network is used to predict both charges and short-range energy; other times, one network is trained to predict charges and a different network is used to predict short-range energy.

This strategy is flexible and capable of describing arbitrary environment-dependent changes—but astute readers may note that we’ve now recreated the same problems of locality that we had with force and energy prediction. What if there are long-range effects on charge, so that the partial charges assigned by a local network are incorrect? (This shouldn’t be surprising: we already know that charges interact with one another over long distances, and you can demonstrate effects like this in toy systems.)

Figure 1 from Ko et al, which shows how distant protonation states can impact atom-centered partial charges. I think this example is easy to solve if you use a message-passing GNN for charge prediction, though…

To make long-range effects explicit, many models use charge equilibration (QEq)—see this work from Ko & co-workers and this work from Jacobson & co-workers. Typically a neural network predicts environment-dependent electronegativities and hardnesses, and the atomic charges are determined by minimizing the energy subject to charge conservation. QEq naturally propagates electrostatics to infinite range, but it adds nontrivial overhead—pairwise couplings plus linear algebra that (naïvely) scales as O(N³), although speedups are possible through various approaches—and simple application of charge equilibration also leads to unphysically high polarizability and overdelocalization.
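To make the QEq step concrete, here’s a bare-bones sketch of the linear-algebra version: no periodic boundary conditions, no short-range screening of the Coulomb kernel, and the electronegativities and hardnesses would come from the neural network in the schemes described above.

```python
import numpy as np

COULOMB_K = 332.06  # kcal*Angstrom/(mol*e^2)

def charge_equilibration(coords, chi, hardness, total_charge=0.0):
    """Minimize sum_i (chi_i q_i + 0.5 J_i q_i^2) + sum_{i<j} k q_i q_j / r_ij
    subject to sum_i q_i = total_charge, via one (N+1)x(N+1) linear solve."""
    n = len(chi)
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    for i in range(n):
        A[i, i] = hardness[i]
        for j in range(i + 1, n):
            r = np.linalg.norm(coords[i] - coords[j])
            A[i, j] = A[j, i] = COULOMB_K / r
        A[i, n] = A[n, i] = 1.0  # Lagrange multiplier enforcing charge conservation
        b[i] = -chi[i]
    b[n] = total_charge
    solution = np.linalg.solve(A, b)  # the naive O(N^3) step mentioned above
    return solution[:n]               # atomic charges (last entry is the multiplier)
```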

Point-charge approaches like those we’ve been discussing aren’t the only solution. There’s a whole body of work on computing electrostatic interactions from the forcefield world, and many of these techniques can be bolted onto GNNs:

Case Study - BAMBOO

Last year, Sheng Gong and co-workers released BAMBOO, an NNP trained for modeling battery electrolytes. Since electrostatic interactions are very important in electrolyte solutions, BAMBOO splits its force and energy into three components: (1) a “semi-local” component learned by a graph equivariant transformer, (2) an electrostatic energy with point charges predicted by a separate neural network, and (3) a dispersion correction following the D3 paradigm.

To learn accurate atom-centered partial charges, the team behind BAMBOO used a loss function with four terms. The point-charge model was trained to reproduce:

Equation 44 from v4 of the BAMBOO arXiv paper.

With accurate partial charges in hand, the BAMBOO team are able to predict accurate liquid and electrolyte densities, even for unseen molecules. The model’s predictions of ionic conductivity and viscosity are also quite good, which is impressive. BAMBOO uses only low-order equivariant terms (angular momentum of 0 or 1) and can run far more quickly than short-range-only NNPs like Allegro or MACE that require higher-order equivariant terms.

I like the BAMBOO work a lot because it highlights both the potential advantages of adding physics to NNPs—much smaller and more efficient models—and the challenges of this approach. Physics is complicated, and even getting an NNP to learn atom-centered partial charges correctly requires sophisticated loss-function engineering and orthogonal sources of data (dipole moments and electrostatic potentials).

The biggest argument against explicit inclusion of long-range terms in neural networks, either through local NN prediction or global charge equilibration, is just pragmatism. Explicit handling of electrostatics adds a lot of complexity and doesn’t seem to matter most of the time.

While it’s possible to construct pathological systems where a proper description of long-range electrostatics is crucial, in practice it often seems to be true that purely local NNPs do just fine. This study from Preferred Networks found that adding long-range electrostatics to a GNN (NequIP) didn’t improve performance for most systems, and concluded that “effective cutoff radii can see charge transfer in the present datasets.” Similarly, Marcel Langer recently wrote on X that “most benchmark tasks are easily solved even by short-range message passing”, concluding that “we need more challenging benchmarks… or maybe LR behaviour is simply ‘not that complicated.’”

Still, as the quality of NNPs improves, it’s possible that we’ll start to see more and more cases where the lack of long-range interactions limits the accuracy of the model. The authors of the OMol25 paper speculate that this is the case for their eSEN-based models:

OMol25’s evaluation tasks reveal significant gaps that need to be addressed. Notably, ionization energies/electron affinity, spin-gap, and long range scaling have errors as high as 200-500 meV. Architectural improvements around charge, spin, and long-range interactions are especially critical here.

(For the chemists in the audience, 200–500 meV is approximately 5–12 kcal/mol. This is an alarmingly high error for an electron affinity or a spin gap.)

Category 3 - Adding Non-Local Features

The third category of NNPs comprises those which add an explicit learned non-local term to the network to model long-range effects. This can be seen as a hybrid approach: it’s not as naïve as “just make the network bigger,” since it recognizes that there can be non-trivial non-local effects in chemical systems, but neither does it enforce any particular functional form for these non-local effects.

The above description is pretty vague, which is by design. There are a lot of papers in this area, many of which are quite different: using a self-consistent-field two-model approach (SCFNN), learning long-range effects in reciprocal space, learning non-local representations with equivariant descriptors (LODE) or spherical harmonics (SO3KRATES), learning low-dimensional non-local descriptors (sGDML), or employing long-range equivariant message passing (LOREM). Rather than go through all these papers in depth, I’ve chosen a single example that’s both modern and (I think) somewhat representative.

Case Study - Latent Ewald Summation

In this approach, proposed by Bingqing Cheng in December 2024, a per-atom vector called a “hidden variable” is learned from each atom’s invariant features. These variables are then combined using long-range Ewald summation and converted to a long-range energy term. In the limit where the hidden variable is a scalar per-atom charge, this just reduces to using Ewald summation with Coulombic point-charge interactions—but with vector hidden variables, considerably more complex long-range interactions can be learned.
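Here’s a deliberately naïve sketch of what the long-range piece looks like: a plain reciprocal-space Ewald sum in which learned per-atom latent variables play the role of charges. The real LES implementation differs in prefactors, units, and efficiency—treat this as a cartoon, not a reference.

```python
import numpy as np

def latent_ewald_energy(coords, latent_q, cell, sigma=1.0, k_cutoff=2.0, nmax=4):
    """coords: (n_atoms, 3); latent_q: (n_atoms, n_channels) learned latent 'charges';
    cell: (3, 3) lattice vectors as rows. Brute-force k-space sum for clarity only."""
    volume = abs(np.linalg.det(cell))
    recip = 2 * np.pi * np.linalg.inv(cell).T  # rows are reciprocal lattice vectors
    energy = 0.0
    for n1 in range(-nmax, nmax + 1):
        for n2 in range(-nmax, nmax + 1):
            for n3 in range(-nmax, nmax + 1):
                if (n1, n2, n3) == (0, 0, 0):
                    continue
                k = n1 * recip[0] + n2 * recip[1] + n3 * recip[2]
                k2 = k @ k
                if k2 > k_cutoff ** 2:
                    continue
                # per-channel structure factor: S_c(k) = sum_i q_ic * exp(i k.r_i)
                phases = np.exp(1j * coords @ k)        # (n_atoms,)
                S = latent_q.T @ phases                 # (n_channels,)
                energy += np.exp(-sigma**2 * k2 / 2) / k2 * np.sum(np.abs(S) ** 2)
    return energy / (2 * volume)
```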

This approach (latent Ewald summation, or “LES”) can be combined with basically any short-range NNP architecture—a recent followup demonstrated adding LES to MACE, NequIP, CACE, and CHGNet. They also trained a larger “MACELES” model on the MACE-OFF23 dataset and showed that it outperformed MACE-OFF23(M) on a variety of properties, like organic liquid density:

Figure 8a from Kim et al (2025).

Entertainingly, the MACE-OFF23 dataset they use here is basically a filtered subset of SPICE that removes all the charged compounds—so there aren’t any ionic interactions here, which are the exact sorts of interactions you’d want long-range forces for (cf. BAMBOO, vide supra). I’m excited to see what happens when you train models like this on large datasets containing charged molecules, and to be able to benchmark such models myself. (These models aren’t licensed for commercial use, sadly, so we can’t run them through the Rowan benchmark suite.)

There’s something charmingly logical about this third family of approaches. If you think there are non-local effects in your system but you want to use machine learning, simply use an architecture which doesn’t enforce strict locality! Abstractly, representations can be “coarser” for long-distance interactions—while scaling the short-range representation to long ranges can be prohibitively expensive, it’s easy to find a more efficient way to handle long-range interactions. (This is essentially how physics-based schemes like the fast multipole method work.)

Still, there’s a cost. One advantage of purely local representations is that you don’t need to train on large systems, for which reference data are often expensive to compute. As we start to add learned components that only operate at long distances, we need correspondingly larger training structures—which becomes very expensive with DFT (and impossible with many functionals). This issue gets more and more acute as the complexity of the learnable long-range component increases—while atom-centered monopole models may be trainable simply with dipole and electrostatic-potential data (e.g. BAMBOO, ibid.), learning more complex long-range forces may require much more data.

Conclusions

I’ve been talking to people about this topic for a while; looking back through Google Calendar, I was surprised to realize that I met Simon Batzner for coffee to talk about long-range forces all the way back in February 2023. I started writing this post in June 2024 (over a year ago), but found myself struggling to finish because I didn’t know what to conclude from all this. Over a year later, I still don’t feel like I can predict what the future holds here—and I’m growing skeptical that anyone else can either.

While virtually everyone in the field agrees that (a) a perfectly accurate model of physics does need to account for long-range forces and (b) today’s models generally don’t account for long-range forces correctly, opinions differ sharply as to how much of a practical limitation this is. Some people see long-range forces as essentially a trivial annoyance that matters only for weird electrolyte-containing interfaces, while others see the mishandling of long-range forces as an insurmountable flaw for this entire generation of NNPs.

Progress in NNPs is happening quickly and chaotically enough that it’s difficult to find clear evidence in favor of any one paradigm. The success of GEMS—a model built on the charge- and dispersion-aware SpookyNet architecture—at predicting peptide/protein dynamics might push me towards the belief that “long-range physics is needed to generalize to the mesoscale,” except that the entirely local MACE-OFF24 model managed to reproduce the same behaviors. (I blogged about these GEMS results earlier.)

Amber-based simulations stay in one configuration, while GEMS-based simulations match the diversity of experimental helix folding.

This general pattern has been repeated many times:

  1. A specific shortcoming of naïve cutoff-based NNPs is observed, like how many models don’t predict stable MD simulations. (Right now, an example of this might be protein–ligand interactions.)
  2. New models that account for long-range forces through some clever mechanism are developed to fix this problem.
  3. But a new generation of larger naïve NNPs is also able to fix this problem, so there’s no clear need for long-range physics.

The above pattern is admittedly oversimplified! I’m not aware of any models without explicit long-range forces that can actually predict ionic conductivity like BAMBOO can, although I don’t think the newest OMol25-based models have been tried. But it surprises me that it’s not easier to find cases in which long-range forces are clearly crucial, and this observation makes me slightly more optimistic that simply scaling local GNNs is the way to go (cf. “bitter lessons in chemistry”, which has admittedly become a bit of a meme).

(There’s something nice about keeping everything in learnable parameters, too, even if the models aren’t strictly local. While explicit physics appeals to the chemist in me, I suspect that adding Coulombic terms will make it harder to do clever ML tricks like coarse-graining or transfer learning. So, ceteris paribus, this consideration favors non-physics-based architectures.)

I want to close by quoting from a paper that came out just as I was finishing this post: “Performance of universal machine-learned potentials with explicit long-range interactions in biomolecular simulations,” from Viktor Zaverkin and co-workers. The authors train a variety of models with and without explicit physics-based terms, but don’t find any meaningful consistent improvement from adding explicit physics. In their words:

Incorporating explicit long-range interactions, even in ML potentials with an effective cutoff radius of 10 Å, further enhances the model’s generalization capability. These improvements, however, do not translate into systematic changes in predicted physical observables… Including explicit long-range electrostatics also did not improve the accuracy of predicted densities and RDFs of pure liquid water and the NaCl-water mixture…. Similar results were obtained for Ala3 and Crambin, with no evidence that explicit long-range electrostatics improve the accuracy of predicted properties.

This is a beautiful illustration of just how confusing and counterintuitive results in this field can be. It seems almost blatantly obvious that for a given architecture, adding smarter long-range forces should be trivially better than not having long-range forces! But to the surprise and frustration of researchers, real-world tests often fail to show any meaningful improvement. This ambiguity, more than anything else, is what I want to highlight here.

While I’m not smart enough to solve these problems myself, my hope is that this post helps to make our field’s open questions a bit clearer and more legible. I’m certain that there are researchers training models and writing papers right now that will address some of these open questions—and I can’t wait to see what the next 12 months hold.



Thanks to Justin Airas, Simon Batzner, Sam Blau, Liz Decolvaere, Tim Duignan, Alexandre Duval, Gianni de Fabritiis, Ishaan Ganti, Chandler Greenwell, Olexandr Isayev, Yi-Lun Liao, Abhishaike Mahajan, Eli Mann, Djamil Maouene, Alex Mathiasen, Alby Musaelian, Mark Neumann, Sam Norwood, John Parkhill, Justin Smith, Guillem Simeon, Hannes Stärk, Kayvon Tabrizi, Moritz Thürlemann, Zach Ulissi, Jonathan Vandezande, Ari Wagen, Brandon Wood, Wenbin Xu, Zhiao Yu, Yumin Zhang, & Larry Zitnik for helpful discussions on these topics—and Tim Duignan, Sawyer VanZanten, & Ari Wagen for editing drafts of this post. Any errors are mine alone; I have probably forgotten some acknowledgements.

Creative Software Licenses

August 4, 2025
The Moneylender and his Wife, Quentin Matsys (1514)
Inspired by related writings from Scott Alexander, who is funnier than me. TW: fiction, but barely.

AlphaProteinStructure-2 is a deep learning model that can predict the structure of mesoscale protein complexes like amyloid fibrils. AlphaProteinStructure-2 is free for academic usage. Typically this means researchers at universities can use the model for free, while researchers at for-profit institutions are banned from using the model without an explicit license agreement.

This creates arbitrage opportunities. For-profit companies can “collaborate” with academic groups and use the model for free in exchange for other forms of compensation. Similarly, “academics” in the process of creating startups from their academic work are incentivized to maintain their institutional affiliations for as long as possible. Both of these loopholes deprive model creators of the chance to capture the value they’re creating, a problem which plagued AlphaProteinStructure-1.

AlphaProteinStructure-2 solves this by explicitly specifying that the model is free for academic usage, not for academic researchers. Running jobs for companies doesn’t count as academic usage, nor does research in support of a future startup. To use AlphaProteinStructure-2, scientists must explicitly disavow any future commercial applications of their work and pledge to maintain the highest standards of academic purity. Because of the inevitable diffusion of ideas within the university, this has led AlphaProteinStructure-2 to be completely banned by all major US research institutions.

The only academic users of AlphaProteinStructure-2 are a handful of land-grant universities whose tech-transfer offices have been shut down by federal regulators for abuse of the patent system. To ensure that no future commercialization is possible, all incoming professors, postdocs, and graduate students must symbolically run a single AlphaProteinStructure-2 calculation when they join. It is believed that important breakthroughs in Alzheimer’s research have occurred at one or more of these universities, but no scientific publisher has yet been able to stomach the legal risk needed to publish the results.


* * *

Rand-1 is a multimodal spectroscopy model developed by a decentralized anarcho-capitalist research organization. Rand-1 is not licensed for non-commercial use; only for-profit companies are allowed to use Rand-1 (in exchange for a license purchase). Model-hosting companies are allowed to host Rand-1 but cannot allow any academics to use the model through their platform. Researchers at for-profit universities are fine, though.


* * *

Evolv-1a is an RNA language model that’s free for benchmarking but requires a paid license agreement for business usage. The somewhat muddy line between “benchmarking” and “business usage” is enforced by vigorous litigation. Most companies have minimized legal risk by using a single model system for benchmarking and explicitly guaranteeing that they will never use this model system for any commercial application.

For sociological reasons, tRNA has become the go-to standard for assessing Evolv-1a and its competitors, with virtually every company using tRNA-based model systems as internal benchmarks. This consensus seemed quite safe until a family of tRNA structural mutations was implicated in treatment-resistant depression. 29 of the top 30 pharmaceutical companies had used tRNA as an RNA-language-model benchmark, leaving Takeda free to pursue this target virtually without opposition. Efforts by other companies to acquire tRNA assets from startups were blocked by litigation, while Takeda’s drug is expected to enter Phase 3 later this year.

In the future, it is expected that all RNA-language-model benchmarking will occur through shell corporations to mitigate risks of this sort.


* * *

DCD-2 is a pocket-conditioned generative model for macrocycles. DCD-2 is completely free to use: simply upload a protein structure, specify the pocket, and DCD-2 will output the predicted binding affinity (with uncertainty estimates) and the structure of the macrocycle in .xsdf format. Unfortunately, .xsdf is a proprietary file format, and decoding the structure back to regular .sdf format requires a special package with a $100K/year license.


* * *

PLM-3 is a protein language model that’s free for commercial entities as long as the usage isn’t directly beneficial to the business. The phrase “directly beneficial” is not clearly defined in the license agreement, though, leading to grey areas:

The company behind PLM-3 has been hiring large numbers of metaphysicists, suggesting that they plan to pursue aggressive litigation in this space.


* * *

Telos-1 is a Boltzmann generator for biopolymers. Telos-1 is free for any usage where the ultimate purpose is charitable—so research towards the public good is permitted, but research that’s intended to make money is banned. This worked as intended until Novo Nordisk sued, arguing that since they’re owned by the non-profit Novo Nordisk Foundation, the ultimate purpose of all their research is charitable. The lawsuit is ongoing.


* * *

NP-2 is a neural network potential that can only be used by non-profits. Originally, this was construed as covering only organizations possessing 501(c)(3) non-profit status—but after heartfelt appeals from small biotech startups, the company behind NP-2 agreed that companies that were losing money could also qualify as non-profits, since they weren’t actually making any profit.

This led to a predictable wave of financial engineering, and most pharmaceutical companies started outsourcing all calculations to shell corporations. These corporations must be “losing money” each quarter, but this simply refers to the operating cash flow. So the shell corporation can simply be spun out with an initial capital outlay of ten million dollars or so, and then calculations can be sold below cost to the parent company until the money runs out.

These companies were originally intended to be disposable, but it turns out that the business model of “sell ML inference to pharma below cost” was very appealing to venture capitalists. Negative unit margins are commonplace in AI right now, and unlike other AI-for-drug-design startups, the shell corporations actually had meaningful enterprise traction. The largest of these companies, EvolvAI (formerly “Merck Sharp Dolme Informatics Solutions 1”) just closed a $200M Series D financing round despite no conceivable path to profitability.



Thanks to Abhishaike Mahajan, Spencer Schneider, Jonathon Vandezande, and Ari Wagen for reading drafts of this post.

Book Review: I See Satan Fall Like Lightning

July 24, 2025
Gustave Doré, illustration for Paradise Lost (1866)

In center-right tech-adjacent circles, it’s common to reference René Girard. This is in part owing to the influence of Girard’s student Peter Thiel. Thiel is one of the undisputed titans of Silicon Valley—founder of PayPal, Palantir, and Founders Fund; early investor and board member at Facebook; creator of the Thiel Fellowship; and mentor to J.D. Vance—and his intellectual influence has pushed Girard’s ideas into mainstream technology discourse. (One of Thiel’s few published writings is “The Straussian Moment,” a 2007 essay which explores the ideas of Girard and Leo Strauss in the wake of 9/11.)

As a chemistry grad student exploring startup culture, I remember going to events and being confused why everyone kept describing things as “Girardian.” In part this is because Girard’s ideas are confusing. But the intrinsic confusingness of Girard is amplified by the fact that almost no one has actually read any of Girard’s writings. Instead, people learn what it means to be “Girardian” by hearing the term in conversation or on podcasts and, with practice, start using it themselves.1

A post on X from Jiankui He.

I’ve been in the startup scene for a few years now, so I figured it was time I read some Girard myself rather than (1) avoiding any discussions about Girard or (2) blindly imitating how everyone else uses the word “Girardian” and hoping nobody finds out. What follows is a brief review of Girard’s 2001 book I See Satan Fall Like Lightning and an introduction to the ideas contained within, with the caveat that I’m not a Girard expert and I’ve only read this one book. (I’ve presented the ideas in a slightly different order than Girard does because they made more sense to me this way; apologies if this upsets any true Girardians out there!)


* * *

I See Satan Fall Like Lightning opens with a discussion of the 10th Commandment:

You shall not covet your neighbor's house; you shall not covet your neighbor's wife, or his male servant, or his female servant, or his ox, or his donkey, or anything that is your neighbor's. (Ex 20:17, ESV)

Girard argues that the archaic word “covet” is misleading and that a better translation is simply “desire.” This commandment, to Girard, illustrates a fundamental truth of human nature—we desire what others have, not out of an intrinsic need but simply because others have them. This is why the 10th Commandment begins by enumerating a list of objects but closes with a prohibition on desiring “anything that is your neighbor’s.” The neighbor is the source of the desire, not the object.

The essential and distinguishing characteristic of humans, in Girard’s framing, is the ability to have flexible and changing desires. Animals have static desires—food, sex, and so on—but humans have the capacity for “mimetic desire,” or learning to copy another’s desire. This is both good and bad:

Once their natural needs are satisfied, humans desire intensely, but they don’t know exactly what they desire, for no instinct guides them. We do not each have our own desire, one really our own… Mimetic desire enables us to escape from the animal realm. It is responsible for the best and the worst in us, for what lowers us below the animal level as well as what elevates us above it. Our unending discords are the ransom of our freedom. (pp. 15–16)

Mimetic desire leads us to copy others’ desires: we see that our friend has a house and we learn that this house is desirable. Mimetic desire, however, also brings us into conflict with our neighbor. The house that we desire is the one that someone already has; the partner we desire is someone else’s.

To make matters worse, our own desire for what our neighbor has validates and intensifies their own desire through the same mechanism, leading to a cycle of “mimetic escalation” in which the conflict between the two parties increases without end. The essential sameness of the two parties makes violence inescapable. The Montagues and Capulets are two noble houses “both alike in dignity” (Romeo and Juliet, prologue)—their similarity makes them mimetic doubles, doomed to destruction.

Artistic depiction of the Battle of Agincourt.

While the above discussion focuses only on two parties, Girard then extends the logic to whole communities, arguing that unchecked mimetic desire would lead to the total destruction of society. To prevent this, something surprising happens: the undirected violence that humans feel towards their neighbors is reoriented onto a single individual, who is then brutally murdered in a collective ritual that cleanses the society of its mutual anger and restores order. Here’s Girard again:

The condensation of all the separated scandals into a single scandal is the paroxysm of a process that begins with mimetic desire and its rivalries. These rivalries, as they multiply, create a mimetic crisis, the war of all against all. The resulting violence of all against all would finally annihilate the community if it were not transformed, in the end, into a war of all against one, thanks to which the unity of the community is reestablished. (p. 24, emphasis original)

(Note that the word “scandal” here is Girard’s transliteration of the Greek σκάνδαλον, meaning “something that causes one to sin” and not the contemporary and more frivolous meaning of the word.)

This process is called the “single-victim mechanism”; the victim is chosen more or less at random as a Schelling point for society’s anger and frustration, not because of any actual guilt. Girard recognizes that this process seems foreign to modern audiences and mounts a vigorous historical defense of its validity. He recounts an episode from Philostratus’s Life of Apollonius of Tyana (link) in which Apollonius ends a plague in Ephesus by convincing the crowd to stone a beggar. Philostratus explains this by saying that the beggar was actually a demon in disguise, but Girard argues that the collective violence cures the Ephesians of social disorder, and that the victim is retrospectively demonized (literally) as a way for the Ephesians to justify their actions.

(How does ending social disorder cure a plague? Girard argues that our use of “plague” to specifically mean an outbreak of infectious disease is anachronistic, and that ancient writers didn’t differentiate between biological and social contagion—in this case, the plague must have been one of social disorder, not a literal viral or bacterial outbreak.)

While we no longer openly kill outsiders to battle plagues, Girard argues that the violent impulses of the single-victim mechanism are still visible in modern societies: he uses the examples of lynch mobs, the Dreyfus affair, witch hunts, and violence against “Jews, lepers, foreigners, the disabled, the marginal people of every kind” (p. 72) to illustrate our propensity towards collective violence. Racism, sexism, ableism, and religious persecution are all different aspects of what Girard argues is a fundamental urge towards collective majority violence.

Ritual human sacrifice is well-documented in the historical record: see inter alia the Athenian pharmakoi, the Mayan custom of throwing victims into cenotes to drown, and the sinister rites of the ancient Celts. While modern anthropologists are typically perplexed by these rituals, Girard argues that they ought not to be downplayed. The development of the single-victim mechanism across so many cultures is not an accident. Instead, this mechanism is the foundation of all human culture, because it replaces primitive violence (which leads to anarchy) with ritual violence and allows society to persist and create institutions.

The single-victim mechanism is necessary cultural technology, which is why so many cultures share the myth of a “founding murder” (Abel, Remus, Apsu, Tammuz, and so on). But the founding murder doesn’t just herald the creation of human society. The collective murder of the victim does so much to restore harmony to the community that the transformation seems miraculous. The people, having witnessed a miracle, decide that the one they killed must now be divine:

Unanimous violence has reconciled the community and the reconciling power is attributed to the victim, who is already “guilty,” already “responsible” for the crisis. The victim is thus transfigured twice: the first time in a negative, evil fashion; the second time in a positive, beneficial fashion. Everyone thought that this victim had perished, but it turns out he or she must be alive since this very one reconstructs the community immediately after destroying it. He or she is clearly immortal and thus divine. (pp. 65–66)

This might seem bizarre—it did to me when I read it—but many other writers have discussed the peculiar motif of the “dying then rising” deity. A more historical example is Caesar, who is first killed and then deified as the founder of the Roman Empire, with his murder being the central event in the advent of the new age.

The Apotheosis of Hercules, Noël Coypel (c. 1700)

Pagan myths and deities, Girard argues, are the echoes of a shadowy pre-Christian era of violent catharsis, a time in which “the strong do what they can and the weak suffer what they must” (Thucydides). Behind the sanitized modern stories of Mount Olympus and Valhalla—dark even in their original, non-Percy-Jackson retellings—are sinister records of outcasts who were first killed and then deified.

Girard argues that the Bible stands in opposition to this mimetic cycle. Collective violence threatens figures like Joseph, Job, and John the Baptist, but the Biblical narrative both defends their innocence and maintains their humanity. The Psalms repeatedly defend the innocence of the Psalmist against the unjust accusation of crowds (cf. Psalms 22, 35, 69). Uniquely among ancient documents, the Bible takes the side of the victim and not the crowd.

In the Gospels, Jesus opposes the single-victim mechanism. In the story of the woman caught in adultery, he tells the accusers “Let him who is without sin among you be the first to throw a stone at her” (John 8:7, ESV). This is the exact opposite of Apollonius of Tyana:

Saving the adulterous woman from being stoned, as Jesus does, means that he prevents the violent contagion from getting started. Another contagion in the reverse direction is set off, however, a contagion of nonviolence. From the moment the first individual gives up stoning the adulterous woman, he becomes a model who is imitated more and more until finally all the group, guided by Jesus, abandons its plan to stone the woman. (p. 57)

(Unfortunately, it’s unlikely that the story of the woman accused of adultery is in the original text of John. The oldest New Testament sources we have, like Codex Vaticanus, Codex Sinaiticus, and various papyri, lack this story.)

Jesus becomes the target of mimetic violence himself, of course, culminating in his crucifixion and death at Calvary. But Girard argues that what seems like the ultimate victory of mimetic violence—the brutal murder of the person who sought to stop it—is actually its moment of defeat, what he calls the “triumph of the cross” in a paraphrase of Colossians 2. He writes:

The principle of illusion or victim mechanism cannot appear in broad daylight without losing its structuring power. In order to be effective, it demands the ignorance of persecutors who “do not know what they are doing.” It demands the darkness of Satan to function adequately. (pp. 147–148)

Unlike in previous cases, where the actions of the crowd were unanimous and the victim perished alone, a minority remains to proclaim the innocence of Jesus. While the majority of the crowd doesn’t follow them, this is enough—the collective violence of the crowd can only function properly as long as the crowd remains ignorant of what they’re doing. Clearly explaining the victim mechanism also serves to destroy it, and so the victim mechanism is ended by the testimony of the early Christians, a stone “cut out by no human hand” that grows to fill the whole world and destroys the opposing kingdoms (Daniel 2:34–35, ESV).

The Gospel narrative exposes the workings of the victim mechanism and defeats it. While Satan thinks he’s winning by killing Jesus, Jesus’ death will make the single-victim mechanism clear and destroy the ignorance in which the Prince of Darkness must work. Girard explains this as the theological idea of “Satan duped by the cross” (which modern listeners may recognize from The Lion, The Witch, and the Wardrobe):

God in his wisdom had foreseen since the beginning that the victim mechanism would be reversed like a glove, exposed, placed in the open, stripped naked, and dismantled in the Gospel Passion texts… In triggering the victim mechanism against Jesus, Satan believed he was protecting his kingdom, defending his possession, not realizing that, in fact, he was doing the very opposite. (p. 151)

While the story of Jesus’s resurrection might seem to have many parallels with the divinization of sacrificial victims, Girard says that these are “externally similar but radically opposed” (p. 131). Indeed, he argues that the Gospel writers intentionally highlighted similarities between the false resurrection of collective violence and the true resurrection of Jesus: Mark 6:16 shows Herod anxious that John the Baptist, whom he killed, has come back to life, while Luke 23:12 shows how Jesus’ death makes Herod and Pilate friends in the same way the victim mechanism always does. In Girard’s words, “the two writers emphasize these resemblances in order to show the points where the satanic imitations of the truth are most impressive and yet ineffectual” (p. 135). The divinization of victims is pathetic and flimsy next to the true resurrection.

Crucifixion, Bartolomeo Bulgarini (c. 1330)

In the new Christian age, Jesus invites us to avoid mimetic contagion not by attempting to avoid having desires in a Buddhist way—to do so is to deny our nature as humans—but by presenting himself as the model for humans to imitate. Only Jesus, the Bible argues, is a fitting template for human desire: hence Paul’s command to “imitate me as I imitate Christ” (1 Cor 11:1), and the consistent call throughout Scripture to love what God loves and hate what God hates (cf. Ps. 139).

Girard ends his book with a discussion of modernity and our culture’s now-total embrace of victims. Girard’s writing in this section is powerful, concise, and difficult to summarize, so I’ll quote from the final pages at length.

All through the twentieth century, the most powerful mimetic force was never Nazism and related ideologies, all those that openly opposed the concern for victims and that readily acknowledged its Judeo-Christian origin. The most powerful anti-Christian movement is the one that takes over and “radicalizes” the concern for victims in order to paganize it. The powers and principalities want to be “revolutionary” now, and they reproach Christianity for not defending victims with enough ardor. In Christian history they see nothing but persecutions, acts of oppression, inquisitions.

This other totalitarianism presents itself as the liberator of humanity. In trying to usurp the place of Christ, the powers imitate him in the way a mimetic rival imitates his model in order to defeat him. They denounce the Christian concern for victims as hypocritical and a pale imitation of the authentic crusade against oppression and persecution for which they would carry the banner themselves.

In the symbolic language of the New Testament, we would say that in our world Satan, trying to make a new start and gain new triumphs, borrows the language of victims. Satan imitates Christ better and better and pretends to surpass him. This imitation by the usurper has long been present in the Christianized world, but it has increased enormously in our time. The New Testament evokes this process in the language of the Antichrist.

The Antichrist boasts of bringing to human beings the peace and tolerance that Christianity promised but has failed to deliver. Actually, what the radicalization of contemporary victimology produces is a return to all sorts of pagan practices: abortion, euthanasia, sexual undifferentiation, Roman circus games galore but without real victims, etc.

Neo-paganism would like to turn the Ten Commandments and all of Judeo-Christian morality into some alleged intolerable violence, and indeed its primary objective is their complete abolition. (pp. 180–181, emphasis original)

Satan, whose previous work through the victim mechanism was defeated by Christianity, now seeks to imitate God’s people and use their own arguments against them. Although Christians invented the idea of having sympathy for the victim (in Girard’s view), Satan now argues that Christianity itself is intrinsically a form of violence against victims, with the only solution being the “complete abolition” of Christian morals. Girard alleges that this is bad and, if implemented, would lead to the pagan anarchy of long ago.


* * *

Girard’s style is hard to pin down: he bounces between anthropology, close reading of ancient texts, history, and theology without breaking stride. I enjoy reading his writing a lot, but sometimes I wish he cited more sources: he alludes to a fascinating series of historical interviews with tribes that practiced human sacrifice, for instance, but never provides a reference to the original interviews.

As a work of theology, I See Satan Fall Like Lightning is interesting but peculiar. It’s not clear to me how interested Girard is in orthodox Christian thought. He alternates between referring to Satan as a person and as an abstract concept, for instance, and uses few ideas that would be familiar to students of mainstream systematic theology. This isn’t necessarily wrong, but it leaves me with a lot of open questions: what does Girard make of the sacrifice of Isaac, or the concept of sanctification, or the role of faith in all this? These might be answered in his other writings, but they weren’t answered here.

As a work of history, I See Satan Fall Like Lightning is downright bizarre. The book’s literal claim—that all non-Christian “gods” are deified victims of ritual mass murder—is hard for me to accept at face value. That being said, it’s hard enough to reason about the recent past, let alone ancient pre-history. Maybe he’s right about all this? Evidence feels scarce, though, and I See Satan Fall Like Lightning hardly conducts a careful meta-analysis of all available ancient myths.

Perhaps the most similar work to Girard’s in scope and ambition is Julian Jaynes’s The Origin of Consciousness in the Breakdown of the Bicameral Mind (Wikipedia, Slate Star Codex).2 Jaynes argues that ancient people didn’t actually have theory of mind or a concept of the self: instead, they personified their internal monologue and viewed it as the voice of the gods. We usually don’t notice this when we read ancient texts, because we subconsciously assume that they were similar to us. But, to quote Scott’s review:

Every ancient text is in complete agreement that everyone in society heard the gods’ voices very often and usually based decisions off of them. Jaynes is just the only guy who takes this seriously.

Much like Jaynes, Girard starts from a surprising historical observation—the ubiquity of human sacrifice and the fact that ancient people saw it as essential to the health of their society—and takes it seriously, building an entire argument that ritual human sacrifice is the original cultural technology and the root of all civilization. While I’m not fully convinced, the intellectual commitment is admirable.

Most bizarre of all, though, is the fact that these esoteric ideas have become a mainstream part of “Grey Tribe” thought. If I’d read this book in a vacuum, I wouldn’t expect that any of the ideas would have achieved much popularity—but in today’s world, describing things as “Girardian” or “mimetic” is de rigueur for the aspiring thought leader.

This observation itself is perhaps the best testimony to the strength of Girard’s ideas. What force other than mimetic rivalry could be strong enough to convince thousands of venture capitalists, each attempting to craft a contrarian high-conviction persona, to all reference the same Christian philosophical anthropologist?

A post on X from Tobias Huber.
Thanks to my wife and Ari Wagen for reading drafts of this piece, and to Micah from church for lending me a copy of this book.

Footnotes

  1. I took a stab at discussing some of Girard’s ideas previously on the blog, although in that case I did actually read the essay I was discussing.
  2. Full transparency: I’ve just read the SSC book review and not (yet) the full book, thus committing the very sin which I accuse others of committing above! “Let he who is without sin cast the first stone” &c.

How To Give An External Research Talk

July 21, 2025

(This post is copied from some notes I gave to our summer interns at Rowan almost without modification. Hopefully people outside Rowan find this useful too!)

This is a brief and opinionated guide on how to give a research talk to an external audience. Some initial points of clarification—this guide is for a research talk, not a sales call or a VC pitch. Research talks have their own culture and norms; treating a research talk as a sales call is likely to backfire disastrously.

When is a research talk appropriate? Generally, if you’re talking to scientists, you should view your talk as a research talk, unless specifically advertised otherwise.

This advice is not directly applicable for talks to internal audiences; collaborators need much less context than external audiences and you can streamline your talk accordingly. (Note that “external” and “internal” here refer to projects, not corporations—a talk to a different division of your company is “external” to the project even if you technically have the same employer.)

Goal

The goal of an external research talk is to teach the audience something. This is generally underappreciated. Many people act as if the goal of the talk is to show that they’re smart, or to show how impressive their research is, or to dump all the data from a given paper onto slides. All of these lead to talks that are mediocre at best and barbarous at worst.

If you learn something from a talk, it’s a good talk. This is a forgiving standard, too: it’s easy to learn something, and even bad results can be instructive. For talks in fields far from my own, I often learn more from the introduction than from the actual results section (which goes over my head).

Audience

In any given talk, there might be several categories of people: fans who already know your work, critics who are skeptical of it, listeners who are new to the topic, and folks with no relevant background at all.

A perfect talk has something for everyone; you can give the fans something they didn’t read in the paper, mollify the critics, teach most listeners something new, and maybe even interest the clueless folks.

Rough Structure

My contrarian take is that the background/history should comprise 30–40% of the talk. Most people have less context for your work than you expect; “context is that which is scarce.” It’s almost always worth spending more time explaining why what you’re doing is important, what other people have done, and how people in your field think about problems.

Paradoxically, the longer you spend on background, the more impactful your results might be. You’re both building tension, as the audience wonders when you’ll get to your research, and positioning your work so that when you share your actual idea the audience will be maximally excited. (I always picture this like an old-school samurai duel; the longer you wait before you “strike” with your results, the better.)

Many scientists overemphasize discussing their implementation and results. These topics dominate the actual work, because they take the bulk of the time, and papers focus on them too, since they require the most detail and supporting data. But talks that just explain mathematics, statistics, or data cleaning in detail are usually boring or painful to listen to—just because you suffered through the details of this process doesn’t mean the audience needs to suffer too.

(Paradoxically, the fact that papers focus on implementation/results means that your talk is free to deemphasize them. People who are following the field may have already read your paper; people who aren’t following the field probably won’t understand the methodology or detailed results anyway.)

One exception is if there’s some personal or non-obvious story about implementation and results—if you wasted time on the wrong architecture or had some interesting realization that led to a breakthrough, these are great to include. I’ve often gone to talks just so I can hear any narrative details that aren’t “neat” enough to be in the paper.

The ending discussion can vary a lot from talk to talk, but I think it’s worth stating clearly what conclusions you want the audience to draw from your work. What should someone in the field take away? What should someone in an adjacent field take away? What do you predict the future of this area of research will be?

Here’s the last slide from a talk I gave at MSU in February:

It’s always good to save at least 10 minutes within the allotted time for Q&A, and to leave time after the talk for additional questions.

Slides

Different people and fields have different customs here; there’s no hard-and-fast rule. I rarely make new figures for talks, because it takes forever—I’ll instead take images from existing papers and put the citations at the bottom.

A few disjointed thoughts: