As a veteran of the life sciences industry with over two decades of experience navigating the complex corridors of the NIH and top-tier consultancies like Monitor Deloitte, I have witnessed firsthand the cyclical nature of biotech hype. The current fervor surrounding artificial intelligence in drug discovery feels different—more urgent, more capitalized, and more fraught with high-stakes technical nuances. We are standing at a crossroads where the initial wave of Silicon Valley optimism is meeting the cold, hard reality of clinical trials, and the next twenty-four months will likely define the sector for a generation.
This conversation explores the critical distinction between biological “appearance” and biological “mechanism,” examining why some AI models have successfully cracked the code of protein folding while others struggle to outperform basic arithmetic in clinical settings. We delve into the three distinct camps of AI drug discovery—phenotypic, surrogate, and mechanistic—and analyze why the quality of training data, rather than the sheer size of the model, is the ultimate arbiter of success. Furthermore, we address the pressing need to move beyond simply accelerating the rate of failure to actually increasing the probability of a drug reaching the patient.
Roughly sixty billion dollars has flowed into AI drug discovery since 2019, yet zero AI-originated drugs have been approved by the FDA. Should this lack of clinical success be a cause for alarm for investors and scientists?
The headline of zero approvals can feel like a heavy weight, especially when you consider that roughly $60 billion has been funneled into this space since 2019. However, we have to look at the timeline of drug development, which traditionally moves at a glacial pace; having about 175 AI-originated programs currently in human trials is actually a massive surge in volume. While the zero is sobering, it is not entirely surprising given that the most advanced candidates are only just now reaching the critical mid-to-late-stage readout phase. We are essentially approaching the final turn of a very long race, where the next two years will provide the real verdict. The anxiety in the industry isn’t about the current lack of an approved drug, but rather the fear that we might just be getting better at failing faster rather than succeeding more often.
The researchers behind DeepMind’s AlphaFold shared the 2024 Nobel Prize in Chemistry for protein structure prediction. Why did this specific application of AI work so well when so many other drug discovery models are struggling?
AlphaFold was a triumphant moment for the industry because it attacked a problem where the training data perfectly matched the question being asked. It treated biology like a language, utilizing a corpus of more than 250 million sequences and a finite alphabet of twenty amino acids to uncover a hidden grammar. This statistical fingerprint, where positions in a sequence that touch during folding tend to mutate together over evolutionary time, was sitting right there in the data, waiting for an attention-based architecture of the kind that also powers large language models to find it. When the mechanism you are trying to recover is natively encoded in the training set, the AI can perform wonders. The struggle for the rest of the field is that they are often training on “noisy” data like patient records or published literature, where the true causal mechanism of a disease is hidden behind layers of secondary symptoms or observation bias.
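To make the "mutate together" signal concrete, here is a minimal sketch that scores co-variation between two columns of a sequence alignment using mutual information. This is a classic, much simpler statistic than anything AlphaFold itself computes, and the four-sequence alignment and the function name are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 1 and 2 always change together
# (K pairs with L, R pairs with V); column 3 varies independently.
toy_msa = ["AKLE", "AKLD", "ARVE", "ARVD"]

coupled = mutual_information(toy_msa, 1, 2)
uncoupled = mutual_information(toy_msa, 1, 3)
print(f"columns 1,2: {coupled:.2f} bits; columns 1,3: {uncoupled:.2f} bits")
```

In this toy case the coupled pair carries one full bit of shared information and the independent pair carries none. Real alignments are far noisier, but the point stands: the signal is present in the sequences themselves.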
We’ve seen some high-profile setbacks with companies like BenevolentAI and Recursion. What does the failure of their lead candidates tell us about the limitations of using phenotypic data for training models?
The story of BenevolentAI is a cautionary tale about the difference between a molecule working as designed and a molecule actually helping a patient. They used a massive knowledge graph of scientific literature to successfully identify a COVID-19 treatment, which led to a public listing and a valuation of roughly £1.3 billion, but their lead eczema drug failed because the biological target itself didn’t translate to a clinical benefit. Similarly, Recursion’s REC-994 met its safety goals but failed to show sustained improvement, illustrating that even if an AI can analyze perturbed cell images with high fidelity, that does not mean it has captured the underlying disease mechanism. In fact, independent benchmarking in 2025 showed that some of these massive cell-based foundation models with a hundred million parameters were no better at predicting new responses than a trivial baseline of simple arithmetic. If the causal link isn’t in the data, the model is essentially just guessing based on how things look, not how they actually work.
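For a sense of how simple such a baseline can be, the sketch below predicts the response to a combined perturbation by adding up the effects measured for each perturbation alone. It is one plausible form of a "simple arithmetic" baseline, not necessarily the one used in the benchmarking mentioned above, and every number and name in it is hypothetical.

```python
def additive_baseline(effect_a, effect_b):
    """Predict the combined effect as the sum of the two single effects."""
    return [a + b for a, b in zip(effect_a, effect_b)]


def mean_abs_error(predicted, observed):
    """Average absolute gap between prediction and measurement."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical per-gene expression changes (log fold change) for three genes.
effect_a = [1.0, 0.0, -0.5]   # perturbation A alone
effect_b = [0.5, 0.25, 0.0]   # perturbation B alone
observed_ab = [1.25, 0.25, -0.75]  # made-up measurement for A and B together

predicted_ab = additive_baseline(effect_a, effect_b)
baseline_error = mean_abs_error(predicted_ab, observed_ab)
print(predicted_ab, round(baseline_error, 3))
```

Any learned model claiming to understand cellular mechanism should show a clearly lower error than this kind of two-line baseline on held-out perturbations.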
There is a second group of companies focusing on surrogate data to close the loop between design and testing. How does this approach change the timeline of discovery, and where does it fall short?
The surrogate data approach, championed by firms like Exscientia, has undeniably mastered the “velocity” aspect of the equation. We saw them move the drug DSP-1181 into trials in just 12 months, which is a staggering improvement over the traditional five-year timeline for medicinal chemistry. This “test-learn” loop creates its own labels, effectively teaching the model whether a molecule will bind to a specific target with incredible efficiency. The fatal flaw, however, is that “binding to a target” is not the same thing as “curing a disease.” As we saw when that specific molecule was discontinued in 2021, the AI was brilliant at the chemistry but perhaps less certain about the biology of the target itself. You can have the fastest car in the world, but if you’re driving in the wrong direction because the target selection was flawed, you’ll just reach the dead end sooner.
You’ve mentioned a third group focusing on mechanistic data, particularly in antibody design. What makes this approach feel like a more stable bet compared to the others?
The third group, which includes Generate Biomedicines and Isomorphic Labs, is essentially trying to replicate the AlphaFold “magic” by focusing on data that is inherently sequence-encoded. Antibody design is particularly exciting because it sits on a nearly unlimited supply of natural and engineered repertoires that act very much like a language. We are already seeing clinical signals here; for example, Generate Biomedicines has an anti-TSLP antibody, GB-0895, entering Phase 3 trials for severe asthma, and another COVID-neutralizing antibody that successfully hit a target previously thought to be undruggable. Because these models are built on the actual structural language of the proteins they are trying to influence, they start from a much more sound biological foundation. They aren’t just looking at images of sick cells; they are writing the actual instructions for the “keys” that unlock or lock biological “doors.”
Many people talk about “Eroom’s Law,” where drug discovery becomes more expensive and slower over time. Can AI actually reverse this trend, or is it just making the process more efficient at the wrong stages?
This is the billion-dollar question for the industry. Currently, AI-designed drugs are clearing Phase 1 safety gates at a rate that is better than the industry standard, but they are hitting a wall in later stages where efficacy is tested. If AI only speeds up the early discovery phase, it might actually make Eroom’s Law worse by funneling more “fast failures” into the most expensive part of the process: the mid-to-late-stage clinical trials. To truly reverse the trend, AI must do more than just increase the velocity of discovery; it must fundamentally increase the probability of success. We don’t just need more candidates; we need better-validated targets so that the massive investments we make in Phase 2 and Phase 3 aren’t wasted on molecules that were destined to fail from the start.
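The arithmetic behind the velocity-versus-probability point can be sketched in a few lines. The candidate counts, success rates, and trial costs below are purely hypothetical round numbers rather than industry figures, and the function name is made up for this illustration.

```python
def late_stage_economics(n_candidates, p_success, cost_per_trial):
    """Return (expected approvals, total spend, spend per expected approval)."""
    expected_approvals = n_candidates * p_success
    total_spend = n_candidates * cost_per_trial
    return expected_approvals, total_spend, total_spend / expected_approvals


# Hypothetical scenarios; cost is in millions of dollars per late-stage program.
baseline = late_stage_economics(n_candidates=10, p_success=0.1, cost_per_trial=100)
faster = late_stage_economics(n_candidates=20, p_success=0.1, cost_per_trial=100)
better = late_stage_economics(n_candidates=10, p_success=0.2, cost_per_trial=100)

for name, result in [("baseline", baseline), ("faster", faster), ("better", better)]:
    approvals, spend, per_approval = result
    print(f"{name}: {approvals:.1f} approvals, {spend} spent, {per_approval:.0f} per approval")
```

Under these assumptions, doubling the number of candidates doubles total spend while leaving the cost per approval unchanged, whereas doubling the probability of success halves the cost per approval without spending more.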
When an investor or a scientist looks at a new AI drug discovery platform, what is the single most important question they should ask to see through the marketing hype?
The instinct is always to look at the “shiny” objects: the size of the neural network, the compute power involved, or the impressive benchmarks in a controlled demo. However, the most vital question you can ask is whether the training data underneath that model actually encodes the specific causal link you are asking the model to predict. If a company claims their tool can predict patient responses, you need to verify if the data holds the actual mechanism of the treatment-response relationship or if it simply records a list of who received which drug. As we have seen, if the mechanism isn’t in the data, no amount of algorithmic sophistication or parameter scaling can recover it. You cannot squeeze blood from a stone, and you cannot extract biological truth from data that only captures the appearance of disease.
What is your forecast for the AI drug discovery landscape over the next twenty-four months?
I believe the next two years will serve as a definitive sorting event that separates the “data tourists” from the true innovators. We are going to see a series of high-stakes trial readouts that will be interpreted by the public as a verdict on AI in medicine, but savvy observers will see it as a verdict on data quality. Where the training data successfully carried the underlying biological mechanism—as we see with antibody design and protein-language models—I expect to see robust clinical success and the first wave of FDA approvals. Conversely, the companies relying on phenotypic appearance or noisy literature graphs will likely face a reckoning, leading to a consolidation in the market where the industry shifts its focus away from “all-purpose” AI toward specialized models built on high-fidelity, mechanistic data. Ultimately, we will stop talking about “AI drugs” as a monolithic category and start valuing them based on the integrity of the biological insights that birthed them.
