Computational Hit-to-Lead: How We Do SAR Without Synthesizing Anything

Traditional hit-to-lead chemistry has a rhythm: you identify a hit series in a biochemical assay, a medicinal chemist proposes 10-20 analogs exploring key SAR positions, synthesis takes 2-4 weeks per round, and you iterate. It typically takes 3-5 rounds to get from a 10 µM hit to a 100 nM lead with acceptable physicochemical properties, which adds up to six to twelve months of elapsed time in a reasonably fast-moving program, and often more.

The computational hit-to-lead cycle we run today looks different. After hit identification (which may itself be entirely in silico, or may come from a small wet-lab HTS), we spend 2-3 weeks running what amounts to a SAR campaign in silico: generating scaffold variants systematically, scoring each against binding affinity and ADMET models, and using that gradient to guide the next generation of structures. The first compound we synthesize is not the starting hit. It is the end product of 30-50 computational SAR iterations and, by prediction, should already be a lead-quality molecule.

This post is about what that computational SAR process actually involves technically, where the limits of "SAR without synthesis" lie, and how we decide when to stop computing and start making compounds.

The Scaffold Variant Generation Problem

SAR optimization is a navigation problem in chemical space. From a starting scaffold, you want to identify the modifications — R-group substitutions, ring replacements, linker changes, stereochemical variants — that improve one or more properties without degrading others. The space of single-step modifications from any drug-like molecule is on the order of 10⁶ to 10⁹ enumerable variants, depending on how liberal you are with reaction transforms.

Exhaustive enumeration and scoring is computationally feasible for the first modification layer, and sometimes the second. But exhaustive search doesn't work as a sustained strategy. The combinatorial explosion is real, and, more importantly, exhaustive enumeration doesn't distinguish useful structural variation from noise. You can enumerate 10 million compounds from a hit, but 95% of them will sit in a similar region of the potency landscape and won't advance your understanding of the SAR.
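
To make the combinatorial point concrete, here is a minimal Python sketch of exhaustive R-group enumeration by string substitution into a SMILES template. The template, its two variable positions, and the tiny substituent lists are hypothetical, and the sketch performs no chemical validation; a real pipeline would build and canonicalize molecules with a cheminformatics toolkit such as RDKit.

```python
from itertools import product

# Hypothetical hit: an aryl amide with two variable positions.
# {R1} is the amide N-substituent, {R2} the para substituent on the phenyl ring.
TEMPLATE = "O=C(N{R1})c1ccc({R2})cc1"

# Toy substituent libraries (SMILES fragments); real libraries are far larger.
R_GROUPS = {
    "R1": ["C", "CC", "C1CC1", "CC(C)C"],
    "R2": ["F", "Cl", "C", "OC", "C(F)(F)F"],
}


def enumerate_variants(template, r_groups):
    """Yield one SMILES string per combination of substituents (no validity checks)."""
    positions = sorted(r_groups)
    for combo in product(*(r_groups[p] for p in positions)):
        yield template.format(**dict(zip(positions, combo)))


variants = list(enumerate_variants(TEMPLATE, R_GROUPS))
print(len(variants))  # 20 (4 x 5)
print(variants[0])    # O=C(NC)c1ccc(F)cc1
```

The count is simply the product of the library sizes at each position: 4 × 5 = 20 here, but three positions with 1,000 substituents each would already give 10⁹ variants, which is where exhaustive scoring stops being a sustainable strategy.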

What we actually want is directed exploration: generate modifications that probe different structural hypotheses (this H-bond donor is necessary; this aromatic ring can be replaced by a heterocycle; the stereocenters are load-bearing) while simultaneously improving the multi-objective score across binding, ADMET, and selectivity. This is a reinforcement learning framing, and we treat it as one: the model proposes a modification, receives a multi-objective scoring signal, and uses that signal to guide the next proposal.
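
The propose/score/update loop can be illustrated with about the simplest member of that family, a gradient bandit (the softmax-preference update described in Sutton and Barto's reinforcement learning textbook). This is a sketch under stated assumptions, not our production generator: the four substituent actions and the noisy `toy_score` function are hypothetical stand-ins for the generator's proposals and the multi-objective scorer.

```python
import math
import random

ACTIONS = ["F", "Cl", "OMe", "CF3"]  # hypothetical substituent choices at one position


def softmax(prefs):
    m = max(prefs.values())  # subtract the max for numerical stability
    exps = {a: math.exp(h - m) for a, h in prefs.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def toy_score(action, rng):
    """Hypothetical stand-in for the multi-objective scorer: fixed mean plus noise."""
    true_mean = {"F": 0.1, "Cl": 0.3, "OMe": -0.2, "CF3": 0.6}
    return true_mean[action] + rng.gauss(0.0, 0.1)


def gradient_bandit(actions, score_fn, steps=3000, alpha=0.1, seed=0):
    rng = random.Random(seed)
    prefs = {a: 0.0 for a in actions}
    baseline = 0.0
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        # Propose: sample a modification from the current policy.
        action = rng.choices(actions, weights=[probs[a] for a in actions])[0]
        # Score: receive the (noisy) multi-objective signal.
        reward = score_fn(action, rng)
        advantage = reward - baseline
        baseline += (reward - baseline) / t  # running mean of all rewards so far
        # Update: raise the chosen action's preference if it beat the baseline,
        # and lower the others in proportion to their current probability.
        for a in actions:
            if a == action:
                prefs[a] += alpha * advantage * (1.0 - probs[a])
            else:
                prefs[a] -= alpha * advantage * probs[a]
    return softmax(prefs)


policy = gradient_bandit(ACTIONS, toy_score)
print(max(policy, key=policy.get))  # the policy should concentrate on "CF3"
```

The baseline (a running mean of rewards) means a modification is only reinforced when it scores better than what the policy has been achieving on average, which is the behavior you want when the scoring signal is noisy.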

The Generation Model

Our scaffold variant generator is built on a transformer-based molecular generation architecture fine-tuned for fragment-constrained design. The key constraint: we're not doing de novo generation from scratch. We're doing constrained generation around a fixed scaffold — the core ring system of the hit compound is preserved, and the generator explores substituent space and linker geometry. This produces compounds that are structurally recognizable variants of the hit, not entirely novel chemical matter, which matters for IP positioning and for experimental tractability downstream.

Concretely: given a hit compound encoded as SMILES with designated variable positions marked (R1, R2, etc., defined by the medicinal chemist based on binding site topology and synthetic accessibility), the generator proposes substituents at each variable position. Each generated variant is immediately scored by the integrated binding-ADMET-selectivity model. Variants that improve the multi-objective score propagate to the next generation — either by seeding the next round of generation directly, or by informing the generator's learned preference weights for which structural changes in this chemical series tend to correlate with score improvement.
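
A minimal sketch of that loop, assuming a two-position template, hypothetical substituent lists, and a toy additive scorer in place of the integrated binding-ADMET-selectivity model. It shows only the "improvers seed the next round" path (elitist selection with single-position mutations); it does not model the learned preference weights.

```python
import random

TEMPLATE = "O=C(N{R1})c1ccc({R2})cc1"  # hypothetical hit, variable positions R1 and R2
SUBSTITUENTS = {
    "R1": ["C", "CC", "C1CC1", "CC(C)C", "CCO"],
    "R2": ["F", "Cl", "C", "OC", "C(F)(F)F"],
}
# Hypothetical additive contributions standing in for the real scoring model.
TOY_CONTRIB = {"C": 0.0, "CC": 0.2, "C1CC1": 0.5, "CC(C)C": 0.1, "CCO": 0.3,
               "F": 0.1, "Cl": 0.4, "OC": -0.1, "C(F)(F)F": 0.6}


def score(variant):
    return sum(TOY_CONTRIB[s] for s in variant.values())


def mutate(parent, rng):
    """Propose a child by swapping the substituent at one randomly chosen position."""
    child = dict(parent)
    pos = rng.choice(sorted(SUBSTITUENTS))
    child[pos] = rng.choice(SUBSTITUENTS[pos])
    return child


def run_generations(start, n_generations=30, batch_size=20, n_elite=5, seed=0):
    rng = random.Random(seed)
    elite = [start]
    seen = {tuple(sorted(start.items()))}
    for _ in range(n_generations):
        new = []
        for _ in range(batch_size):
            child = mutate(rng.choice(elite), rng)
            key = tuple(sorted(child.items()))
            if key not in seen:  # skip variants already proposed in earlier rounds
                seen.add(key)
                new.append(child)
        # The best variants found so far seed the next generation.
        elite = sorted(elite + new, key=score, reverse=True)[:n_elite]
    return elite, len(seen)


elite, n_seen = run_generations({"R1": "C", "R2": "F"})
print(TEMPLATE.format(**elite[0]), round(score(elite[0]), 2), n_seen)
```

Keeping the previous elites in the selection pool guarantees the best score never decreases between generations, at some cost in diversity; the `seen` set stops the loop from re-proposing variants it has already evaluated.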

We run 30-50 generations, each producing 200-500 variants, before reviewing the output. The total in silico SAR exploration covers roughly 10,000-20,000 distinct compounds per program. Runtime is 18-24 hours on our available GPUs, so we usually launch the run overnight once the hit compound is defined.
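
As a back-of-envelope check on those figures (arithmetic on the quoted ranges only, assuming the 18-24 hour runtime covers generation and scoring together and that repeat proposals across generations are removed before counting distinct compounds), with G generations of V variants each, a runtime T, and N distinct compounds:

$$
N_{\text{raw}} = G \cdot V,\qquad G \in [30, 50],\; V \in [200, 500] \;\Rightarrow\; N_{\text{raw}} \in [6{,}000,\ 25{,}000]
$$

$$
t_{\text{per compound}} = \frac{T}{N} \in \left[\frac{18 \times 3600\ \text{s}}{20{,}000},\ \frac{24 \times 3600\ \text{s}}{10{,}000}\right] = [3.24\ \text{s},\ 8.64\ \text{s}]
$$

A budget of a few seconds of amortized wall-clock time per compound is consistent with the learned scoring model described above rather than, say, hours-long physics-based free-energy simulations.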

Reading the SAR Landscape Computationally

One thing that surprised us when we started running this process systematically was how much structural information emerges from the predicted SAR landscape before any wet-lab confirmation. By plotting the multi-objective score distribution across the variant space, position by position, you can read a predicted SAR without any experimental data.

Sharp potency cliffs around specific positions indicate load-bearing interactions. If replacing a hydrogen bond donor with a methyl group weakens the predicted binding free energy by more than 2 kcal/mol (roughly a 30-fold loss in affinity at room temperature) across 80% of the variants that try it, that's strong evidence the H-bond donor contacts a key residue and should be treated as fixed. When the scoring model is calibrated and the target is within its training distribution, these predicted cliffs typically correlate with the experimentally measured activity cliffs in the eventual wet-lab SAR.
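
The cliff rule translates directly into a per-substitution check. In this sketch, each list holds the predicted ΔΔG (kcal/mol, positive meaning weaker binding, an assumed sign convention) for every variant that made a given change; the 2 kcal/mol and 80% thresholds come from the text, and the example values are made up. The helper `fold_change` converts ΔΔG into an affinity ratio via exp(ΔΔG/RT).

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def fold_change(ddg_kcal, temp_k=298.15):
    """Affinity ratio implied by a free-energy difference: exp(ddG / RT)."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


def cliff_check(ddg_values, threshold_kcal=2.0, min_fraction=0.8):
    """Return (is_cliff, fraction of variants losing more than threshold_kcal)."""
    if not ddg_values:
        raise ValueError("no variants made this substitution")
    frac = sum(d > threshold_kcal for d in ddg_values) / len(ddg_values)
    return frac >= min_fraction, frac


# Made-up predictions: positive ddG means weaker predicted binding.
predicted_ddg = {
    "R1: NH -> CH3": [2.4, 2.9, 3.1, 1.7, 2.6],
    "R3: H -> F": [0.2, -0.1, 0.4, 0.3, 0.0],
}
for change, ddgs in predicted_ddg.items():
    cliff, frac = cliff_check(ddgs)
    print(f"{change}: cliff={cliff} ({frac:.0%} of variants beyond 2 kcal/mol)")

print(round(fold_change(2.0)))  # about 29-fold at 298 K
```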

Flat SAR at a position (small predicted affinity changes across a wide range of substituents) indicates a solvent-exposed position or a region of the pocket with no specific contacts. This is your medicinal chemistry optimization space: you can tune physicochemical properties (logP, solubility, metabolic stability) at these positions without sacrificing potency. The ADMET head of the model simultaneously predicts which substitutions at flat-SAR positions improve the ADMET profile.
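
Flat positions, and the ADMET-improving substituents at them, can be pulled out in the same way. A minimal sketch, assuming per-position lists of predicted ΔΔG values and, for each substituent, a hypothetical `admet_delta` (for example, the change in the number of ADMET endpoints passed). The 0.5 kcal/mol tolerance is an illustrative choice, not a value from our pipeline.

```python
from typing import NamedTuple


class Substitution(NamedTuple):
    substituent: str
    ddg_kcal: float      # predicted change in binding free energy vs. the parent
    admet_delta: float   # hypothetical change in ADMET score (higher = better)


def flat_positions(ddg_by_position, tolerance_kcal=0.5):
    """Positions where every tried substituent stays within +/- tolerance of the parent."""
    return [pos for pos, ddgs in ddg_by_position.items()
            if ddgs and max(abs(d) for d in ddgs) <= tolerance_kcal]


def admet_improvers(subs, tolerance_kcal=0.5):
    """Substituents that cost little predicted potency and improve ADMET, best first."""
    keep = [s for s in subs if abs(s.ddg_kcal) <= tolerance_kcal and s.admet_delta > 0]
    return sorted(keep, key=lambda s: s.admet_delta, reverse=True)


ddg_by_position = {"R1": [2.4, 2.9, 3.1], "R2": [0.1, -0.3, 0.2, 0.4], "R3": [0.0, 0.9, -0.2]}
print(flat_positions(ddg_by_position))  # ['R2']

r2_options = [
    Substitution("OMe", 0.1, 1.0),
    Substitution("morpholinyl", 0.2, 3.0),
    Substitution("CF3", 0.4, 2.0),
    Substitution("F", -0.3, 0.0),
]
print([s.substituent for s in admet_improvers(r2_options)])  # ['morpholinyl', 'CF3', 'OMe']
```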

In practice, after a 50-generation run, we produce a SAR summary with the following outputs: the predicted binding affinity distribution across all scored variants (mean, standard deviation, top decile), the predicted ADMET pass rate across variants, fixed versus variable positions identified by sensitivity analysis, and a ranked list of the top 50 candidates for synthesis consideration.
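
Most of those summary statistics are straightforward to compute once every variant has been scored. A standard-library sketch, assuming each variant is a dict with hypothetical keys `pred_pic50` (higher is more potent), `admet_pass` (bool), and `score` (multi-objective, higher is better), and interpreting "top decile" as the mean of the best 10% of variants. The fixed-versus-variable position calls would come from cliff and flat-SAR checks and are omitted here.

```python
import statistics


def sar_summary(variants, top_n=50):
    """Summarize a scored run. Each variant: dict with 'pred_pic50', 'admet_pass', 'score'."""
    pic50 = [v["pred_pic50"] for v in variants]
    best_first = sorted(pic50, reverse=True)
    n_decile = max(1, len(best_first) // 10)
    return {
        "pic50_mean": statistics.mean(pic50),
        "pic50_sd": statistics.stdev(pic50) if len(pic50) > 1 else 0.0,
        "pic50_top_decile_mean": statistics.mean(best_first[:n_decile]),
        "admet_pass_rate": sum(bool(v["admet_pass"]) for v in variants) / len(variants),
        "ranked": sorted(variants, key=lambda v: v["score"], reverse=True)[:top_n],
    }


# Made-up run of 20 variants, just to exercise the function.
run = [{"id": i, "pred_pic50": 5.0 + 0.1 * i, "admet_pass": i % 2 == 0, "score": i}
       for i in range(20)]
summary = sar_summary(run, top_n=5)
print(round(summary["pic50_mean"], 2), summary["admet_pass_rate"])
print([v["id"] for v in summary["ranked"]])
```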

Convergence Criteria and When We Stop

We stop the computational SAR when any one of three criteria is met: the top-decile multi-objective score plateaus, improving by less than 0.05 kcal/mol per generation for 10 or more consecutive generations; the top candidates all share a common scaffold, indicating that the generator has converged on a local optimum in chemical space; or we have found at least 5 candidates that meet all lead criteria simultaneously (predicted IC50 below 100 nM, predicted ADMET pass on at least 12 of 14 endpoints, and a selectivity index above 30-fold against the defined counterscreens).
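
The three stopping rules can be expressed as a single check run after each generation. This sketch assumes a score where higher is better (so the 0.05 threshold is in the score's own units), a precomputed scaffold key per candidate (hypothetical, for example a Murcko scaffold SMILES from a toolkit), and predicted IC50, ADMET endpoint count, and selectivity already attached to each candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    scaffold: str                 # precomputed scaffold key (hypothetical)
    pred_ic50_nm: float
    admet_endpoints_passed: int   # out of 14
    selectivity_fold: float


def meets_lead_criteria(c):
    return c.pred_ic50_nm < 100 and c.admet_endpoints_passed >= 12 and c.selectivity_fold > 30


def plateaued(top_decile_history, window=10, min_gain=0.05):
    """True if each of the last `window` generation-to-generation gains is below `min_gain`."""
    if len(top_decile_history) < window + 1:
        return False
    recent = top_decile_history[-(window + 1):]
    return all(later - earlier < min_gain for earlier, later in zip(recent, recent[1:]))


def stop_reason(top_decile_history, top_candidates, all_candidates, min_leads=5):
    """Return why the run should stop, or None to keep generating."""
    if plateaued(top_decile_history):
        return "plateau"
    if len(top_candidates) > 1 and len({c.scaffold for c in top_candidates}) == 1:
        return "converged on one scaffold"
    if sum(meets_lead_criteria(c) for c in all_candidates) >= min_leads:
        return "lead criteria met"
    return None


rising = [0.1 * g for g in range(15)]
flat = [1.0, 1.5, 2.0] + [2.0 + 0.01 * g for g in range(11)]
leads = [Candidate(f"S{i}", 50.0, 13, 40.0) for i in range(5)]
print(stop_reason(rising, leads[:2], leads))  # lead criteria met
print(stop_reason(flat, leads[:2], leads))    # plateau
```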

When these criteria aren't met after 50 generations, it usually means one of two things: the starting hit has a fundamental scaffold problem (a ring system that cannot simultaneously satisfy binding and ADMET requirements), or the model's training data is too sparse in this region of chemical space to give a reliable gradient. The second case is harder to diagnose computationally, and it is the main reason we insist on at least one experimental round before extensive in silico SAR on completely novel scaffold classes.
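
One common heuristic for the sparse-training-data case, offered as an illustration rather than as our diagnostic, is to measure each generated compound's nearest-neighbour Tanimoto similarity to the scoring model's training set. The fingerprints here are hypothetical sets of on-bits (in practice they would come from a toolkit, e.g. Morgan fingerprints), and the 0.4 cutoff is an arbitrary illustrative value. Low coverage is a warning sign, not proof, which fits with this case being hard to diagnose computationally.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def domain_coverage(query_fps, training_fps, threshold=0.4):
    """Fraction of query compounds whose nearest training neighbour meets `threshold`."""
    nearest = [max(tanimoto(q, t) for t in training_fps) for q in query_fps]
    covered = sum(s >= threshold for s in nearest) / len(nearest)
    return covered, nearest


# Made-up fingerprints: the first query resembles a training compound, the second does not.
training = [{1, 2, 3, 4}, {10, 11, 12}]
queries = [{1, 2, 3, 5}, {20, 21, 22}]
covered, nearest = domain_coverage(queries, training)
print(covered, nearest)  # 0.5 [0.6, 0.0]
```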

The Synthesis Decision

At the end of the computational SAR, we select 8-15 compounds for synthesis. The selection criteria are: coverage of diverse scaffolds within the top-decile predictions (we don't want 15 nearly identical analogs, which won't teach us anything new); a synthetic accessibility score above a threshold (calculated using our RA score model and reviewed by a medicinal chemist); and deliberate inclusion of 2-3 compounds chosen specifically to test key SAR hypotheses in the wet lab, that is, whether the predicted cliffs and flat positions are real.
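
One way to combine the three selection criteria: drop compounds below a synthetic accessibility cutoff, reserve slots for the hypothesis-testing probes, then fill the remaining slots by greedy MaxMin diversity picking on Tanimoto distance. MaxMin is a standard diversity-selection technique substituted here for illustration; the `Cand` fields, the SA convention (higher means easier to make), the cutoff, and the fingerprints are all assumptions.

```python
from typing import FrozenSet, List, NamedTuple


class Cand(NamedTuple):
    name: str
    score: float          # multi-objective score, higher is better
    sa: float             # synthetic accessibility, assumed higher = easier to make
    fp: FrozenSet[int]    # fingerprint on-bits (hypothetical, precomputed)


def tanimoto(a, b):
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_for_synthesis(cands: List[Cand], probes: List[str], n_total=12, sa_min=0.5):
    accessible = [c for c in cands if c.sa >= sa_min]
    # Hypothesis probes are kept regardless of score, provided they pass the SA cutoff.
    picked = [c for c in accessible if c.name in probes]
    pool = sorted((c for c in accessible if c.name not in probes),
                  key=lambda c: c.score, reverse=True)
    if not picked and pool:
        picked.append(pool.pop(0))  # seed with the best-scoring accessible compound
    while pool and len(picked) < n_total:
        # MaxMin: take the compound farthest from its nearest already-picked neighbour.
        best = max(pool, key=lambda c: min(1 - tanimoto(c.fp, p.fp) for p in picked))
        picked.append(best)
        pool.remove(best)
    return [c.name for c in picked]


cands = [
    Cand("A", 0.90, 0.8, frozenset({1, 2, 3, 4})),
    Cand("B", 0.89, 0.8, frozenset({1, 2, 3, 5})),  # near-duplicate of A
    Cand("C", 0.85, 0.7, frozenset({10, 11, 12})),
    Cand("D", 0.80, 0.2, frozenset({20, 21})),      # fails the SA cutoff
    Cand("E", 0.70, 0.9, frozenset({30, 31, 32})),  # hypothesis probe
]
print(pick_for_synthesis(cands, probes=["E"], n_total=3))  # ['E', 'A', 'C']
```

Because the pool is sorted by score before picking, ties in the diversity criterion go to the higher-scoring compound (Python's `max` returns the first maximal item). In the example, B is skipped as a near-duplicate of A, and D never enters the pool because it fails the accessibility cutoff.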

The last criterion is important and worth explaining. The computational SAR is not ground truth. It's a prediction. We synthesize compounds that will either validate or refute the predicted SAR structure, not just compounds that score highest. If the wet-lab results confirm the predicted SAR — even partially — we have calibrated information about how much to trust the computational gradient for this specific scaffold class against this target. If the predicted and measured SAR diverge significantly, we know we're in an out-of-distribution regime and need to adjust.
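
A simple way to quantify how well the predicted SAR held up is a rank correlation between predicted and measured potency for the synthesized set. Below is a sketch of Spearman's rho in plain Python (Pearson correlation on average ranks, so ties are handled). The pIC50 values are made up, and the 0.5 cutoff separating a usable gradient from diverging SAR is a hypothetical threshold, not one from the text; with only 8-15 compounds, rho is a coarse signal at best.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("all values identical; correlation undefined")
    return cov / (sx * sy)


def calibration_check(pred_pic50, measured_pic50, min_rho=0.5):
    rho = spearman(pred_pic50, measured_pic50)
    verdict = "gradient looks usable" if rho >= min_rho else "predicted and measured SAR diverge"
    return rho, verdict


# Made-up first-round results for five synthesized compounds.
pred = [7.1, 6.5, 5.9, 7.8, 6.0]
meas = [6.8, 6.4, 5.5, 7.2, 6.1]
print(calibration_check(pred, meas))
```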

We're not claiming that computational SAR eliminates the need for experimental SAR. We're saying it front-loads the structural understanding so that the first synthesis round is maximally informative rather than exploratory. The question isn't "which compound should I make" — it's "which compound, if synthesized, will teach me the most about whether the computational model is reliable in this chemical series." That's a different, and in our view more productive, framing of what the first synthesis round is for.

Where This Breaks Down

Flat binding landscapes with shallow SAR are harder for the model to navigate. If a target's binding site is promiscuous — accepting a wide range of scaffolds with similar predicted affinity — the generator doesn't get a strong signal from the binding head and tends to optimize primarily on ADMET, which sometimes produces candidates that are overly Lipinski-compliant at the expense of any interesting chemical matter. This tends to occur on targets where the co-crystal structure is missing or uncertain, leaving the binding prediction underconfident.

Macrocycles and natural product-like scaffolds are outside the model's reliable range. The fragment-constrained generator handles drug-like chemical space well up to about 550 Da with standard ring systems; beyond that, ring closure chemistry, macrocycle conformational sampling, and the limited training data in that MW range combine to make predictions unreliable. For programs that land in macrocycle territory, we treat computational SAR as advisory and rely more heavily on parallel synthesis.
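
The reliability boundary described above can be applied as a simple routing rule before a program commits to computational SAR. A sketch assuming molecular weight and ring sizes are precomputed (for example by a cheminformatics toolkit); the 550 Da cutoff is from the text, while the 12-atom macrocycle threshold is a common convention rather than a value stated here.

```python
def in_reliable_range(mw_da, ring_sizes, mw_max=550.0, macrocycle_min_ring=12):
    """False for compounds above the MW cutoff or containing a macrocyclic ring."""
    has_macrocycle = any(size >= macrocycle_min_ring for size in ring_sizes)
    return mw_da <= mw_max and not has_macrocycle


def sar_mode(mw_da, ring_sizes):
    """Route a program: computational SAR leads, or it is advisory next to parallel synthesis."""
    return "computational-first" if in_reliable_range(mw_da, ring_sizes) else "advisory"


print(sar_mode(420.5, [6, 6, 5]))  # computational-first
print(sar_mode(610.0, [6, 6]))     # advisory (above 550 Da)
print(sar_mode(480.0, [14, 6]))    # advisory (14-membered ring)
```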