Research Notes
Technical writing from the team that builds the models.
Binding affinity, ADMET prediction, off-target screening, and the mechanics of running a target-to-lead campaign in silico — written by the people doing the work, not a content team.
Scaffold Hopping with Generative Models: When It Works and When It Hallucinates
Generative models have changed scaffold hopping throughput and scope — but the hallucination problem is real. Here's what distinguishes proposals that bind from those that only satisfy pharmacophore geometry.
ADMET Species Translation: Why Rat Liver Microsomes Don't Predict Human Clearance
Rat liver microsome data is cheap and fast. It's also an unreliable predictor of human hepatic clearance for a meaningful fraction of drug-like chemical space. Here's what drives the gap and how we model the translation.
Multi-Objective Lead Optimization: Balancing Potency, Selectivity, and Synthesizability
Potency, selectivity, and synthesizability don't point in the same direction. We use Pareto frontier methods to navigate that tension — and the frontier reveals trade-off structure that composite scoring hides.
Allosteric Site Discovery with Graph Neural Networks: Lessons from 40 Targets
We ran GNN-based allosteric site detection across 40 protein targets. Here's what the models find reliably, where they miss, and what we still need crystallography to confirm.
KRAS G12C: Why Covalent Inhibitor Design Rewards Computational Methods
KRAS G12C was considered undruggable until covalent chemistry opened the switch-II pocket of the GDP-bound protein. The covalent warhead placement problem is exactly where physics-based and ML scoring models diverge most.
Where Computational Drug Discovery Actually Saves Money (and Where It Doesn't)
The $2.6 billion drug development cost figure is often cited to justify every computational tool ever made. An honest accounting of where in silico methods genuinely reduce cost.
What IND-Enabling Studies Look Like When Your Lead Came From a Computer
When a candidate reaches IND-enabling stage having never been synthesized before, the wet-lab package looks different. Here's what preclinical teams should expect.
Computational Hit-to-Lead: How We Do SAR Without Synthesizing Anything
Structure-activity relationship optimization traditionally requires iterative synthesis rounds. We do SAR computationally — generating and scoring scaffold variants until the model converges.
Beyond Virtual Screening: The Case for Integrated In Silico Campaigns
Running binding prediction, ADMET, and off-target screening as separate tools with manual hand-offs loses information at every step. Here's what changes when you treat them as one integrated model.
Pan-Proteome Off-Target Screening: What 2,300 Structures Reveal
Selectivity problems account for 22% of Phase II failures. We run off-target docking against 2,300 protein structures before synthesis. This is what that screen consistently surfaces, and what it misses.
AlphaFold Changed Our Inputs. It Didn't Change the Scoring Problem.
AlphaFold-predicted structures now routinely feed docking pipelines. The structural coverage problem is largely solved. The scoring function problem — predicting whether a ligand actually binds — is not.
Running Target-to-Lead Entirely In Silico: What It Actually Takes
Three years ago, purely computational target-to-lead was a stretch claim. Today it's a workflow. Here's what the actual computational stack looks like and where the remaining hard problems are.
Physics-Based Scoring vs. Machine Learning: Not a Competition
The debate between force-field docking and GNN scoring misses the point. The real question is where each method fails and how to combine them. We've run both on the same targets.
ADMET Prediction in 2024: Eight Endpoints Are Not Enough
Standard ADMET tools cover CYP inhibition, solubility, and a few permeability endpoints. We surveyed the causes of Phase I failures over 10 years and found the endpoints that keep being missed.
A Practitioner's Guide to Binding Affinity Prediction Models in 2024
AutoDock Vina, Glide, FEP+, GNN-based models — a systematic comparison of what each does well, where each breaks down, and the training data coverage problem that most comparisons miss.
Why 95% of Virtual Screening Hits Fail When They Reach the Wet Lab
The hit rate problem isn't a wet lab problem — it's a scoring function problem. Here's why standard docking scores correlate poorly with experimental binding, and what the data says about fixing it.