AlphaProteinStructure-2 is a deep learning model that can predict the structure of mesoscale protein complexes like amyloid fibrils. AlphaProteinStructure-2 is free for academic usage. Typically this means researchers at universities can use the model for free, while researchers at for-profit institutions are banned from using the model without an explicit license agreement.
This creates arbitrage opportunities. For-profit companies can “collaborate” with academic groups and use the model for free in exchange for other forms of compensation. Similarly, “academics” in the process of creating startups from their academic work are incentivized to maintain their institutional affiliations for as long as possible. Both of these loopholes deprive model creators of the chance to capture the value they’re creating, a problem which plagued AlphaProteinStructure-1.
AlphaProteinStructure-2 solves this by explicitly specifying that the model is free for academic usage, not for academic researchers. Running jobs for companies doesn’t count as academic usage, nor does research in support of a future startup. To use AlphaProteinStructure-2, scientists must explicitly disavow any future commercial applications of their work and pledge to maintain the highest standards of academic purity. Because of the inevitable diffusion of ideas within the university, this has led AlphaProteinStructure-2 to be completely banned by all major US research institutions.
The only academic users of AlphaProteinStructure-2 are a handful of land-grant universities whose tech-transfer offices have been shut down by federal regulators for abuse of the patent system. To ensure that no future commercialization is possible, all incoming professors, postdocs, and graduate students must symbolically run a single AlphaProteinStructure-2 calculation when they join. It is believed that important breakthroughs in Alzheimer’s research have occurred at one or more of these universities, but no scientific publisher has yet been able to stomach the legal risk needed to publish the results.
Rand-1 is a multimodal spectroscopy model developed by a decentralized anarcho-capitalist research organization. Rand-1 is not licensed for non-commercial use; only for-profit companies are allowed to use Rand-1 (in exchange for a license purchase). Model-hosting companies are allowed to host Rand-1 but cannot allow any academics to use the model through their platform. Researchers at for-profit universities are fine, though.
Evolv-1a is an RNA language model that’s free for benchmarking but requires a paid license agreement for business usage. The somewhat muddy line between “benchmarking” and “business usage” is enforced by vigorous litigation. Most companies have minimized legal risk by using a single model system for benchmarking and explicitly guaranteeing that they will never use this model system for any commercial application.
For sociological reasons, tRNA has become the go-to standard for assessing Evolv-1a and its competitors, with virtually every company using tRNA-based model systems as internal benchmarks. This consensus seemed quite safe until a family of tRNA structural mutations was implicated in treatment-resistant depression. Twenty-nine of the top 30 pharmaceutical companies had used tRNA as an RNA-language-model benchmark, leaving Takeda free to pursue this target virtually without opposition. Efforts by other companies to acquire tRNA assets from startups were blocked by litigation, while Takeda’s drug is expected to enter Phase 3 later this year.
In future, it is expected that all RNA-language-model benchmarking will occur through shell corporations to mitigate risks of this sort.
DCD-2 is a pocket-conditioned generative model for macrocycles. DCD-2 is completely free to use: simply upload a protein structure, specify the pocket, and DCD-2 will output the predicted binding affinity (with uncertainty estimates) and the structure of the macrocycle in .xsdf format. Unfortunately, .xsdf is a proprietary file format, and decoding the structure back to regular .sdf format requires a special package with a $100K/year license.
PLM-3 is a protein language model that’s free for commercial entities as long as the usage isn’t directly beneficial to the business. The phrase “directly beneficial” is not clearly defined in the license agreement, though, leading to grey areas:
The company behind PLM-3 has been hiring large numbers of metaphysicists, suggesting that they plan to pursue aggressive litigation in this space.
Telos-1 is a Boltzmann generator for biopolymers. Telos-1 is free for any usage where the ultimate purpose is charitable—so research towards the public good is permitted, but research that’s intended to make money is banned. This worked as intended until Novo Nordisk sued, arguing that since they’re owned by the non-profit Novo Nordisk Foundation, the ultimate purpose of all their research is charitable. The lawsuit is ongoing.
NP-2 is a neural network potential that can only be used by non-profits. Originally, this was construed to mean only organizations possessing 501(c)(3) non-profit status, but after heartfelt appeals from small biotech startups, the company behind NP-2 agreed that companies that were losing money could also qualify as non-profits, since they weren’t actually making any profit.
This led to a predictable wave of financial engineering, and most pharmaceutical companies started outsourcing all calculations to shell corporations. These corporations must be “losing money” each quarter, but this simply refers to the operating cash flow. So the shell corporation can simply be spun out with an initial capital outlay of ten million dollars or so, and then calculations can be sold below cost to the parent company until the money runs out.
These companies were originally intended to be disposable, but it turns out that the business model of “sell ML inference to pharma below cost” was very appealing to venture capitalists. Negative unit margins are commonplace in AI right now, and unlike other AI-for-drug-design startups, the shell corporations actually had meaningful enterprise traction. The largest of these companies, EvolvAI (formerly “Merck Sharp Dolme Informatics Solutions 1”), just closed a $200M Series D financing round despite no conceivable path to profitability.