Running Target-to-Lead Entirely In Silico: What It Actually Takes

When we started building the Manas AI platform in 2023, "fully in silico target-to-lead" was a phrase that required qualification — a proviso about what "fully" meant, followed by an explanation of which wet lab steps you'd still need before calling the result a lead. By late 2024, the qualification is smaller, the workflow is more mature, and we've completed enough programs to describe what the computational stack actually looks like, which steps still carry the most uncertainty, and where the remaining irreducible wet lab dependencies lie.

This is a description of practice, not a capabilities claim. We're describing the workflow as it exists at this stage of our programs — not projecting forward to where it could be with additional development, and not glossing over the cases where the in silico prediction failed to anticipate experimental reality.

Stage 1: Target Characterization and Binding Site Definition

Before any docking or scoring can occur, the target structure needs to be prepared to a quality standard that supports reliable pose prediction. In practice this means:

Structure source assessment: For targets with experimental crystal structures in the PDB at resolution ≤ 2.5 Å, we use the experimental structure as the primary docking receptor, with side-chain optimization via Prime (Schrödinger) or equivalent. For targets where only AlphaFold2 models are available, we treat the model as reliable for core secondary structure but flag loop regions with pLDDT < 70 as high-uncertainty zones where docked poses will be unreliable.
Binding site identification: For targets with a known binding mode for any compound class, we use the co-crystallized ligand position to define the grid. For targets with no known small-molecule binding mode, we use SiteMap or fpocket to identify druggable cavities, rank them by druggability score (Dscore in SiteMap), and run preliminary docking on the top two candidates to assess whether either produces geometrically sensible poses for fragment-sized probes (MW < 250, 2-5 rotatable bonds).
Water network curation: Structural water molecules in the binding site are individually assessed for how favorable they are to displace, using WaterMap or an equivalent explicit solvation calculation. Waters predicted to be favorable to displace are removed from the docking receptor; waters predicted to be unfavorable to displace are retained; waters in bridging positions between protein and putative ligand are treated as conditional and modeled as an additional hydrogen bond donor/acceptor at that position.
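
The pLDDT flagging in the structure assessment step can be sketched in a few lines. This is a minimal illustration, assuming the AlphaFold model is a fixed-column PDB file in which per-residue pLDDT is stored in the B-factor field. The function name, the helper that builds demo records, and the three example residues are hypothetical; only the cutoff of 70 comes from the text.

```python
from typing import Iterable, List, Tuple

PLDDT_CUTOFF = 70.0  # cutoff quoted in the text


def low_confidence_residues(
    pdb_lines: Iterable[str], cutoff: float = PLDDT_CUTOFF
) -> List[Tuple[str, int]]:
    """Return sorted (chain, residue number) pairs with pLDDT below cutoff.

    Assumes fixed-column PDB ATOM records with pLDDT in the B-factor
    field (columns 61-66).
    """
    flagged = set()
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain = line[21]              # column 22
        resnum = int(line[22:26])     # columns 23-26
        plddt = float(line[60:66])    # columns 61-66
        if plddt < cutoff:
            flagged.add((chain, resnum))
    return sorted(flagged)


def _demo_atom_line(serial: int, resname: str, chain: str,
                    resnum: int, plddt: float) -> str:
    """Build a CA-only ATOM record with dummy coordinates (demo helper)."""
    return (f"ATOM  {serial:>5} {'CA':<4} {resname:>3} {chain}{resnum:>4}    "
            f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{plddt:>6.2f}")


# Hypothetical three-residue stretch: one confident residue, two loop residues.
example_lines = [
    _demo_atom_line(1, "LEU", "A", 10, 92.5),
    _demo_atom_line(2, "GLY", "A", 11, 65.2),
    _demo_atom_line(3, "SER", "A", 12, 48.9),
]
uncertain = low_confidence_residues(example_lines)
print(uncertain)
```

Residues returned by this check would be the ones whose neighborhood is treated as a high-uncertainty zone when docked poses are reviewed.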

The water network step is often skipped in rapid screening protocols because it adds computation time. In our experience it consistently affects which compounds from a diverse virtual library rank highest, particularly in binding sites with one or two structural waters that are partially displaceable. Skipping it means accepting systematic bias in the docking scores for compounds that interact with those water positions.

Stage 2: Library Design and Virtual Screening

For a new target program, we start from three library sources simultaneously:

Commercially available screening compounds: We use a filtered subset of the Enamine REAL database (approximately 7 million compounds selected by Ro5 compliance, SAScore ≤ 4, and PAINS filter) as the primary diversity library. The REAL database's advantage is on-demand synthesis in 2-4 weeks, which means virtual screening hits can proceed directly to synthesis without a separate CRO sourcing step.
Fragment-based virtual screening: A library of ~50,000 fragments (MW 150-300, clogP ≤ 3, ≤ 5 rotatable bonds), screened with more permissive score thresholds than the diversity library and intended to identify fragment binding modes that can be grown into leads using fragment-growing algorithms (virtual FBDD).
De novo generation: For targets where the binding site has an unusual geometry (non-Ro5 pocket size, extended binding channel, macrocyclic-permissive cavity), we run a generative model conditioned on the binding site shape to propose compounds not present in the screening library. This generates SAScore-filtered candidates that are then docked as primary screening objects.
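
The three filters named for the diversity library can be illustrated as a simple predicate over precomputed descriptors. This is a sketch under stated assumptions, not our production filter: the Compound record and its field names are hypothetical, the descriptors are assumed to have been computed upstream, and Ro5 is applied strictly here (no violations tolerated), which is one of several common conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Precomputed descriptors for one library compound (hypothetical schema)."""
    compound_id: str
    mol_weight: float
    clogp: float
    h_donors: int
    h_acceptors: int
    sa_score: float
    pains_alert: bool


def passes_ro5(c: Compound) -> bool:
    """Lipinski's rule of five, applied strictly (assumption: zero violations)."""
    return (c.mol_weight <= 500 and c.clogp <= 5
            and c.h_donors <= 5 and c.h_acceptors <= 10)


def passes_library_filter(c: Compound, max_sa_score: float = 4.0) -> bool:
    """Ro5 compliance, SAScore at or below the cutoff, and no PAINS alert."""
    return passes_ro5(c) and c.sa_score <= max_sa_score and not c.pains_alert


# Hypothetical compounds, one failing each filter in turn.
library = [
    Compound("cmpd-1", 342.4, 2.8, 2, 5, 2.9, False),   # passes everything
    Compound("cmpd-2", 612.7, 4.1, 3, 9, 3.5, False),   # MW above 500
    Compound("cmpd-3", 298.3, 1.9, 1, 4, 5.2, False),   # SAScore above 4
    Compound("cmpd-4", 310.3, 3.3, 2, 6, 3.1, True),    # PAINS alert
]
kept_ids = [c.compound_id for c in library if passes_library_filter(c)]
print(kept_ids)
```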

The virtual screening cascade for the diversity library runs Vina at 5 CPU-hours per thousand compounds (on our current infrastructure, 7M compounds complete in 10-12 days with 8 parallel GPU/CPU nodes), followed by Glide XP rescoring on the top 3% (~210,000 compounds), followed by GNN rescoring on the top 0.5% of the library (~35,000 compounds). The GNN rescoring step takes approximately 6-8 hours for 35,000 compounds, a small fraction of the 10-12 days taken by the initial Vina pass.
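
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable and function names are illustrative, and the core-count estimate assumes perfect parallel scaling, which real runs will not achieve.

```python
LIBRARY_SIZE = 7_000_000
VINA_CPU_HOURS_PER_THOUSAND = 5
GLIDE_FRACTION = 0.03   # top 3% of the library
GNN_FRACTION = 0.005    # top 0.5% of the library
NODES = 8

# Total Vina cost and the size of each rescoring tier.
vina_cpu_hours = LIBRARY_SIZE / 1000 * VINA_CPU_HOURS_PER_THOUSAND
glide_count = round(LIBRARY_SIZE * GLIDE_FRACTION)
gnn_count = round(LIBRARY_SIZE * GNN_FRACTION)


def implied_cores(cpu_hours: float, wall_days: float) -> float:
    """Cores that must run concurrently to finish in wall_days (ideal scaling)."""
    return cpu_hours / (wall_days * 24)


print(f"Vina total: {vina_cpu_hours:,.0f} CPU-hours")
print(f"Glide XP tier: {glide_count:,} compounds")
print(f"GNN tier: {gnn_count:,} compounds")
for days in (10, 12):
    cores = implied_cores(vina_cpu_hours, days)
    print(f"{days} days: ~{cores:.0f} cores total, ~{cores / NODES:.0f} per node")
```

Under these assumptions the Vina pass costs 35,000 CPU-hours, which corresponds to roughly 120-146 cores busy for the full 10-12 days.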

Stage 3: Computational Hit Selection and Clustering

The output of the virtual screening cascade is a ranked list of approximately 500-1,000 compounds per target (after filtering by SAScore, PAINS, and structural diversity). Converting this ranked list into a set of compounds to advance requires three additional steps that are easy to describe but require careful judgment in practice:

Scaffold Clustering and Coverage Selection

Ranked compounds are clustered by Bemis-Murcko scaffold. From each cluster, we select the highest-ranked member as representative, then apply a maximum-diversity selection across cluster representatives to ensure the final selection covers distinct chemotypes rather than multiple analogs of a single scaffold. This step explicitly trades absolute rank for scaffold diversity: a compound ranked 150th overall but representing a chemically distinct cluster may be selected over a compound ranked 20th that is the fifth analog of an already-represented scaffold.
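
The two-step selection can be sketched as follows. This is a minimal illustration under several assumptions: scaffolds and fingerprints are taken as precomputed (in practice they would come from a cheminformatics toolkit), fingerprints are represented as sets of on-bit indices, and a greedy MaxMin picker on Tanimoto similarity stands in for whatever maximum-diversity method is actually used. All names and the four example hits are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple


class Hit(NamedTuple):
    compound_id: str
    rank: int                    # 1 = best overall rank from the cascade
    scaffold: str                # precomputed Bemis-Murcko scaffold SMILES
    fingerprint: FrozenSet[int]  # on-bit indices of a binary fingerprint


def cluster_representatives(hits: List[Hit]) -> List[Hit]:
    """Keep the best-ranked member of each scaffold cluster, sorted by rank."""
    best = {}
    for hit in hits:
        current = best.get(hit.scaffold)
        if current is None or hit.rank < current.rank:
            best[hit.scaffold] = hit
    return sorted(best.values(), key=lambda h: h.rank)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def maxmin_select(reps: List[Hit], n_select: int) -> List[Hit]:
    """Greedy MaxMin: seed with the best-ranked representative, then keep
    adding the one whose most similar already-selected neighbor is least similar."""
    if not reps or n_select <= 0:
        return []
    selected = [reps[0]]
    remaining = list(reps[1:])
    while remaining and len(selected) < n_select:
        nxt = min(
            remaining,
            key=lambda r: max(tanimoto(r.fingerprint, s.fingerprint)
                              for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


hits = [
    Hit("A1", 1, "c1ccccc1", frozenset({1, 2, 3, 4})),
    Hit("A2", 2, "c1ccccc1", frozenset({1, 2, 3, 5})),    # analog of A1's scaffold
    Hit("B1", 20, "c1ccncc1", frozenset({1, 2, 3, 6})),   # similar to A1
    Hit("C1", 150, "C1CCOCC1", frozenset({7, 8, 9})),     # chemically distinct
]
reps = cluster_representatives(hits)
selected_ids = [h.compound_id for h in maxmin_select(reps, 2)]
print(selected_ids)
```

In this toy case the rank-150 compound is picked ahead of the rank-20 one because it shares no fingerprint bits with the seed, which is the rank-for-diversity trade described above.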

ADMET Triage

All compounds in the hit pool are profiled through our 22-endpoint ADMET prediction suite before synthesis decisions are made. In our hit-selection workflow, ADMET filtering operates as a hard filter on flag endpoints (hERG, reactive metabolite alert, CYP TDI flag) and a soft filter on scored endpoints (CYP inhibition IC50 prediction, aqueous solubility, Caco-2 permeability). Compounds flagged on hard filters are excluded regardless of binding score; compounds with poor soft-filter scores are deprioritized but not automatically excluded if the binding evidence is strong and the medicinal chemistry modification to fix the ADMET issue is clear.
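
The hard-filter and soft-filter logic can be expressed as a small triage function. This sketch is illustrative only: the endpoint keys, the normalization of soft scores to a 0-1 scale where higher is better, and the 0.5 cutoff are all assumptions made for the example, and the manual override for strongly binding compounds is left to human review rather than encoded.

```python
from dataclasses import dataclass
from typing import Dict

# Flag endpoints named in the text; the key spellings are hypothetical.
HARD_FLAG_ENDPOINTS = ("herg", "reactive_metabolite", "cyp_tdi")


@dataclass
class AdmetProfile:
    compound_id: str
    flags: Dict[str, bool]         # endpoint -> True if flagged
    soft_scores: Dict[str, float]  # endpoint -> score in [0, 1], higher is better


def triage(profile: AdmetProfile, soft_cutoff: float = 0.5) -> str:
    """Return 'exclude', 'deprioritize', or 'advance' for one compound."""
    # Hard filter: any flag excludes, regardless of binding score.
    if any(profile.flags.get(name, False) for name in HARD_FLAG_ENDPOINTS):
        return "exclude"
    # Soft filter: poor scores lower priority but do not exclude.
    poor = [name for name, score in profile.soft_scores.items()
            if score < soft_cutoff]
    return "deprioritize" if poor else "advance"


profiles = [
    AdmetProfile("cmpd-1", {"herg": True},
                 {"solubility": 0.9, "caco2": 0.8, "cyp_inhibition": 0.7}),
    AdmetProfile("cmpd-2", {},
                 {"solubility": 0.3, "caco2": 0.8, "cyp_inhibition": 0.7}),
    AdmetProfile("cmpd-3", {"herg": False},
                 {"solubility": 0.8, "caco2": 0.6, "cyp_inhibition": 0.9}),
]
decisions = {p.compound_id: triage(p) for p in profiles}
print(decisions)
```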

Selectivity Pre-screening

For targets where selectivity against a closely related family member is important (e.g., a kinase where selectivity against a related kinase subtype matters), we run a rapid off-target docking on the top 200 hits against the selectivity liabilities before final selection. Compounds with favorable binding scores against the off-target liabilities are flagged; compounds with selectivity ratios below our threshold are deprioritized.
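
One way to express the two checks is shown below. This is a hedged sketch rather than our actual criterion: it assumes docking scores in kcal/mol-like units where more negative is better, it expresses selectivity as a score margin between the target and the best-scoring off-target instead of a ratio, and both cutoffs are placeholder values chosen for the example.

```python
from typing import Dict


def selectivity_calls(
    target_score: float,
    off_target_scores: Dict[str, float],
    off_target_cutoff: float = -8.0,  # hypothetical: at or below this counts as favorable
    min_margin: float = 1.5,          # hypothetical minimum score gap
) -> dict:
    """Flag favorable off-target scores and deprioritize on a small margin."""
    best_off_target = min(off_target_scores.values())
    # Positive margin means the primary target scored better than any off-target.
    margin = round(best_off_target - target_score, 2)
    flagged = sorted(name for name, score in off_target_scores.items()
                     if score <= off_target_cutoff)
    return {
        "flagged_off_targets": flagged,
        "margin": margin,
        "deprioritize": margin < min_margin,
    }


# Hypothetical scores for two hits against two related kinases.
call_a = selectivity_calls(-10.5, {"kinase_B": -8.2, "kinase_C": -6.0})
call_b = selectivity_calls(-9.0, {"kinase_B": -8.5, "kinase_C": -7.1})
print(call_a)
print(call_b)
```

Note that a compound can be flagged without being deprioritized, as in the first example, which keeps the flag visible for later review without removing the compound from consideration.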

Stage 4: In Silico SAR Iteration

The most conceptually unusual part of our workflow — and the step that most distinguishes it from traditional SBDD — is that SAR iteration begins computationally before any compound is synthesized. Once we have a validated virtual hit scaffold (i.e., a compound that passed all three stages above and represents a chemotype we're committing to explore), we generate and score a library of analogs in silico to define the preliminary SAR landscape before synthesis.

This in silico SAR library is generated by systematic enumeration of substituent variations at positions identified from the docked pose as having space for modification. For a typical hit scaffold with 3-4 modifiable positions, we enumerate 200-500 analogs using RDKit's enumeration utilities, filter by SAScore and commercial availability, and score the full enumerated set through the Glide XP + GNN cascade. The output is a predicted SAR map that identifies which positions tolerate substitution and in which direction.
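
The combinatorics of the enumeration step can be illustrated without any cheminformatics dependency. The sketch below is an assumption-laden stand-in for toolkit-based enumeration: it fills placeholders in a SMILES-like template by plain string substitution, performs no chemical validation, and uses a hypothetical core and hypothetical substituent lists.

```python
from itertools import product
from typing import Dict, List


def enumerate_analogs(core_template: str,
                      substituents: Dict[str, List[str]]) -> List[str]:
    """Fill every placeholder in core_template with every combination of
    substituents. Strings are SMILES-like and are not validated here."""
    positions = sorted(substituents)
    analogs = []
    for combo in product(*(substituents[p] for p in positions)):
        analogs.append(core_template.format(**dict(zip(positions, combo))))
    return analogs


# Hypothetical core with three modifiable positions.
core = "c1c({R1})cc({R2})cc1{R3}"
substituents = {
    "R1": ["[H]", "F", "Cl", "C", "OC", "C#N"],
    "R2": ["[H]", "F", "C", "N", "O", "C(F)(F)F"],
    "R3": ["[H]", "C", "CC", "C1CC1", "OC", "N(C)C", "F", "Cl"],
}
analogs = enumerate_analogs(core, substituents)
print(len(analogs))   # 6 * 6 * 8 combinations
print(analogs[0])
```

Six, six, and eight options at three positions give 288 analogs, which falls inside the 200-500 range quoted above; the filtering and scoring steps would then run on this enumerated set.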

We're not claiming this predicted SAR is accurate in absolute terms — it isn't. The uncertainty on any individual prediction within a congeneric series is still approximately ±1 log unit for K_d, which is too large to use as a sole basis for synthesis prioritization. What the in silico SAR does is identify the compounds with the highest expected binding affinity within the explored chemical space, and — equally importantly — identify the compounds with the highest model uncertainty, which are the ones where experimental data would most resolve our uncertainty about the scaffold's SAR.
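
Two points in this paragraph lend themselves to a short worked example: what ±1 log unit means in free-energy terms, and how a synthesis shortlist can combine high-expectation and high-uncertainty picks. The sketch below assumes 298.15 K and assumes that each prediction comes with a per-compound uncertainty estimate (for instance an ensemble standard deviation); the function names and the five example predictions are hypothetical.

```python
import math
from typing import List, Tuple

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def log_units_to_kcal(log_units: float = 1.0,
                      temperature_k: float = 298.15) -> float:
    """Free-energy equivalent of a change in log10(K_d): RT * ln(10) * log_units."""
    return R_KCAL * temperature_k * math.log(10) * log_units


Prediction = Tuple[str, float, float]  # (compound id, predicted pKd, uncertainty)


def pick_candidates(predictions: List[Prediction],
                    n_exploit: int, n_explore: int):
    """Pick the highest predicted affinities, then the most uncertain of the rest."""
    by_affinity = sorted(predictions, key=lambda p: p[1], reverse=True)
    exploit = [p[0] for p in by_affinity[:n_exploit]]
    rest = [p for p in predictions if p[0] not in exploit]
    by_uncertainty = sorted(rest, key=lambda p: p[2], reverse=True)
    explore = [p[0] for p in by_uncertainty[:n_explore]]
    return exploit, explore


print(f"1 log unit is about {log_units_to_kcal():.2f} kcal/mol at 298.15 K")

predictions = [
    ("a1", 7.8, 0.4),
    ("a2", 7.5, 1.3),
    ("a3", 6.9, 0.6),
    ("a4", 6.2, 1.6),
    ("a5", 7.1, 0.9),
]
exploit, explore = pick_candidates(predictions, n_exploit=2, n_explore=2)
print(exploit, explore)
```

An error of about 1.4 kcal/mol is comparable to the affinity difference between many close analogs, which is why the predicted ranking is treated as a guide for where to spend synthesis effort rather than as a result.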

Where the Remaining Hard Problems Are

Having run this workflow on five programs, we find that the steps that most consistently require revisiting after experimental data comes in are:

First, induced fit for allosteric compounds. Twice we have identified virtual hits that dock favorably into the orthosteric site but whose actual binding mode (determined crystallographically after synthesis) involved an induced conformational change that opened a cryptic pocket adjacent to the primary site. Both compounds were confirmed binders, but their binding mode was different from the predicted one, and the SAR deductions from the incorrect binding mode would have been misleading had we not obtained structural data early.

Second, protonation state effects in charged binding sites. On a program targeting an aspartic protease active site with two catalytic aspartates, we systematically mispredicted the relative ranking of a series of amine-containing inhibitors because the protonation states of the catalytic aspartates and of the ligand amine (pKa ~8.5) were handled inconsistently across the series in our preparation pipeline. Resolving this required implementing a per-compound protonation state enumeration at the active site rather than relying on the default Epik pKa predictions.
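
The idea behind per-compound enumeration can be sketched as follows. This is an illustration, not our pipeline: it enumerates every protonated/deprotonated combination of the two catalytic aspartates and the ligand amine so that each compound is evaluated against the same explicit set of states, and it uses the Henderson-Hasselbalch relation to show why the amine cannot be assigned a single state by default. The site labels and the pH of 7.4 are assumptions for the example, and solution pKa values can shift substantially inside a binding site.

```python
from itertools import product
from typing import Dict, List, Sequence


def fraction_protonated(pka: float, ph: float) -> float:
    """Henderson-Hasselbalch fraction of the protonated form of one site."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def enumerate_site_states(sites: Sequence[str]) -> List[Dict[str, bool]]:
    """All protonation combinations; True means the site carries the proton."""
    return [dict(zip(sites, combo))
            for combo in product((False, True), repeat=len(sites))]


# Hypothetical labels for the two catalytic aspartates and the ligand amine.
SITES = ("asp_1", "asp_2", "ligand_amine")
states = enumerate_site_states(SITES)
print(len(states))  # 2 ** 3 combinations to prepare and score per compound

# In bulk solution at an assumed pH of 7.4, an amine with pKa 8.5 is mostly,
# but not entirely, protonated.
amine_fraction = fraction_protonated(pka=8.5, ph=7.4)
print(f"{amine_fraction:.2f}")
```

Scoring every compound in the series against the same enumerated states, and then comparing like with like, avoids the situation where two analogs are ranked against each other under different implicit protonation assumptions.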

Third, selectivity predictions at closely related targets. Our off-target selectivity predictions are less reliable than primary binding predictions because the off-target structures are more likely to be AlphaFold-derived or low-resolution crystal structures. When selectivity against a structurally similar off-target is critical for the program, the computational selectivity prediction is a rough guide, not a confident prediction.

The common thread: all three hard problems involve the same underlying limitation — scoring functions and structure preparation methods that perform well on average but degrade in specific structural scenarios that require more physics than the current tools provide. The workflow as it stands is genuinely useful for advancing a program to a defined lead compound before wet lab synthesis. It's not a replacement for structural biology, and we're explicit with collaborators about which predictions carry structural data support and which are purely computational extrapolations.