Beyond Virtual Screening: The Case for Integrated In Silico Campaigns

Virtual screening — docking a library of compounds against a target, ranking by predicted binding affinity, selecting the top scorers for synthesis — is a well-understood workflow. It's also one that loses substantial information by treating binding affinity as the primary filter with everything else as a downstream check. The practical consequence: you synthesize a compound with excellent predicted potency, run your in vitro assay, confirm binding, and then discover it fails the ADMET profile you ran three weeks later. You go back to the library. The next compound has a selectivity issue. More wet-lab time. The cycle repeats.

This isn't a hypothetical sequence of bad luck. It's the standard experience when binding, ADMET, and off-target predictions are run as separate sequential tools with manual interpretation between each stage. Each tool was built independently, trained on different data distributions, and returns scores in different units with different calibration baselines. Integrating their outputs is a human judgment task every time, which means information is filtered through individual mental models rather than optimized simultaneously.

The alternative — treating binding, ADMET, and selectivity as simultaneous constraints on a shared molecular representation — changes what gets selected before any synthesis occurs. This post describes what that integration actually looks like technically, where the information gains come from, and where the architecture still has gaps.

The Information Loss in Sequential Pipelines

Consider what happens computationally when you run a standard sequential workflow. Your docking model encodes the molecule as a 3D conformer placed in a binding site, evaluated by a scoring function that captures van der Waals contacts, hydrogen bonds, and electrostatic complementarity. Your ADMET model encodes the same molecule as a 2D SMILES string or Morgan fingerprint, trained on experimental assay data for solubility, permeability, metabolic stability, CYP inhibition, and plasma protein binding. Your off-target screen produces another set of docking scores with a different scoring function calibrated on a different ligand set.

Each model sees a different representation of the molecule. The 3D binding conformation is invisible to the ADMET model. The ADMET endpoints are invisible to the docking model. The implicit hydrogen bonding pattern that makes a compound a strong Pgp substrate — which will kill CNS exposure — is nowhere in the binding affinity calculation. You're not actually predicting "how will this compound perform as a drug candidate." You're predicting three separate properties with three separate representations and then manually asking whether the results are jointly acceptable.

When we built shared molecular representation into our pipeline, the hypothesis was simple: if the same learned embedding encodes the molecule for binding affinity, ADMET, and selectivity simultaneously, the scoring function implicitly learns correlations between these properties. A molecular feature that improves on-target binding but correlates with metabolic lability in training data should show up as a penalty in the integrated score — without needing a separate ADMET model to explicitly penalize it. The optimization has more information available.
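A toy sketch of the difference, with made-up numbers. The compound names, scores, thresholds, and weights are all hypothetical, and the fixed weighted sum is only a stand-in for a learned joint score. It is not our scoring function.

```python
# Toy comparison of sequential filtering vs. joint scoring.
# All compounds, scores, thresholds, and weights below are invented.
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    binding: float      # predicted potency, higher is better (0-1)
    admet: float        # predicted ADMET acceptability (0-1)
    selectivity: float  # predicted selectivity (0-1)


LIBRARY = [
    Compound("cpd_A", 0.95, 0.20, 0.90),
    Compound("cpd_B", 0.92, 0.85, 0.15),
    Compound("cpd_C", 0.80, 0.80, 0.85),
    Compound("cpd_D", 0.78, 0.90, 0.80),
    Compound("cpd_E", 0.60, 0.95, 0.95),
]


def sequential(library, top_k=2, min_admet=0.5, min_sel=0.5):
    """Rank by binding alone, keep top_k, then apply the later filters."""
    shortlist = sorted(library, key=lambda c: c.binding, reverse=True)[:top_k]
    return [c.name for c in shortlist
            if c.admet >= min_admet and c.selectivity >= min_sel]


def joint(library, top_k=2, weights=(0.4, 0.3, 0.3)):
    """Rank by a single score that sees all three properties at once."""
    w_bind, w_admet, w_sel = weights

    def score(c):
        return w_bind * c.binding + w_admet * c.admet + w_sel * c.selectivity

    return [c.name for c in sorted(library, key=score, reverse=True)[:top_k]]


sequential_picks = sequential(LIBRARY)
joint_picks = joint(LIBRARY)
print("sequential:", sequential_picks)
print("joint:     ", joint_picks)
```

In this toy library the two strongest binders each fail a later filter, so the sequential shortlist comes back empty, while the joint ranking surfaces slightly weaker binders that are acceptable on all three axes.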

Architecture of the Integrated Model

The core of our integrated scoring model is a graph neural network that encodes each molecule as a graph of atoms and bonds, with 3D coordinates incorporated through distance and angle features derived from the docked conformation. The same embedding feeds three prediction heads: an on-target binding affinity head trained on ChEMBL IC50 data for the relevant target class, an ADMET multi-task head trained on assay data spanning 14 endpoints (detailed in our earlier ADMET post), and a selectivity head trained on cross-target IC50 ratios from public kinome and GPCRome profiling datasets.

Crucially, the encoder is jointly trained across all three heads simultaneously, not pretrained for one task and fine-tuned for others. The gradient signal from each head propagates back through the shared encoder, forcing the embedding to capture molecular features relevant to all three predictions. What this means in practice: the embedding learns that certain aromatic systems are simultaneously good for pi-stacking in binding sites and poor for microsomal stability. It learns that extended hydrophobic chains improve binding but predict broad kinome promiscuity. These correlations don't need to be explicitly programmed — they emerge from joint training over the full compound-assay dataset.
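To make the gradient-sharing point concrete, here is a deliberately tiny sketch: a scalar "encoder" weight feeding three scalar heads, with the joint loss taken as a plain sum of squared errors. The weights, input, targets, and the unweighted sum are assumptions chosen for illustration. The production model is a GNN, and nothing here reflects its actual loss weighting.

```python
# Toy shared encoder with three heads, using scalars so the gradient
# can be written by hand. All numbers are invented for illustration.

def joint_loss(w_enc, heads, x, targets):
    """Sum of squared errors over all heads; every head reads the same embedding."""
    z = w_enc * x  # shared embedding
    return sum((a * z - t) ** 2 for a, t in zip(heads, targets))


def encoder_grad_per_head(w_enc, heads, x, targets):
    """Analytic d(loss_k)/d(w_enc) for each head k."""
    z = w_enc * x
    return [2.0 * (a * z - t) * a * x for a, t in zip(heads, targets)]


w_enc, x = 0.5, 2.0
heads = [1.0, -0.5, 2.0]     # stand-ins for binding, ADMET, selectivity heads
targets = [0.8, 0.1, 1.5]

per_head = encoder_grad_per_head(w_enc, heads, x, targets)
total_grad = sum(per_head)   # the encoder update sees all three contributions

# Central finite difference on the joint loss as an independent check.
eps = 1e-6
numeric_grad = (joint_loss(w_enc + eps, heads, x, targets)
                - joint_loss(w_enc - eps, heads, x, targets)) / (2 * eps)

print("per-head contributions:", per_head)
print("summed:", total_grad, "numeric:", numeric_grad)
```

The encoder gradient is the sum of one term per head, so a feature that helps one head and hurts another pulls the shared weight in both directions at once. A model pretrained on one task and fine-tuned on the others would instead see those signals one after another.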

We're not claiming this makes the predictions perfect. The training data quality problem doesn't go away — ChEMBL IC50 data is noisy, assay protocols vary, and coverage is uneven across chemical space. What we're claiming is that the integrated model makes fewer systematic errors of the form "excellent binding prediction but completely predictable ADMET failure that a human would have caught." Those errors are expensive and they were common.

What Changes in Practice: A Concrete Example

In a program targeting a bromodomain (BRD4, an acetyl-lysine reader implicated in transcriptional dysregulation in multiple tumor types), we ran both the standard sequential approach and the integrated model on the same 15,000-compound enumerated library, then compared the top-200 outputs from each before doing any synthesis.

The sequential approach (docking rank, then ADMET filter, then selectivity check) returned 200 compounds with excellent predicted BRD4 affinity. Of these, 67 were filtered by the ADMET check (primarily CYP3A4 inhibition and low microsomal stability), leaving 133. Of those, 41 had selectivity index concerns against BRD2 and BRD3, leaving 92 candidates.

The integrated model, scoring all three objectives simultaneously, returned 200 compounds from a different region of chemical space — lower average on-target binding affinity by GNN score (mean −10.1 kcal/mol vs −10.9 kcal/mol for the sequential top-200), but with 94% passing the ADMET thresholds applied after the fact and 87% showing selectivity index above 30-fold against BRD2/BRD3. The integrated model traded raw binding score for compound-level viability.
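The yield comparison is easier to see as arithmetic. The sketch below uses only the counts reported above and assumes the ADMET and selectivity criteria are comparable between the two arms. Because the integrated model's two pass rates are reported separately, the number of compounds passing both can only be bounded, not stated exactly.

```python
# Funnel arithmetic for the BRD4 comparison, using only the reported counts.
TOP_N = 200

# Sequential: filters are applied one after another, so the counts are exact.
after_admet = TOP_N - 67                # removed by the ADMET check
after_selectivity = after_admet - 41    # removed by the BRD2/BRD3 check
sequential_yield = after_selectivity / TOP_N

# Integrated: pass rates are reported per criterion, so the number passing
# both is bounded by inclusion-exclusion rather than known exactly.
admet_pass = round(0.94 * TOP_N)
selectivity_pass = round(0.87 * TOP_N)
both_min = max(0, admet_pass + selectivity_pass - TOP_N)
both_max = min(admet_pass, selectivity_pass)

print(f"sequential: {after_selectivity}/{TOP_N} ({sequential_yield:.0%})")
print(f"integrated: between {both_min} and {both_max} of {TOP_N}")
```

Under that assumption, 92 of 200 (46%) survive the sequential funnel, while somewhere between 162 and 174 of the integrated top-200 (81% to 87%) pass both checks.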

We synthesized 20 compounds from each list. The sequential top-20 gave 7 confirmed binders below 100 nM in the biochemical assay, 2 of which had acceptable ADMET in follow-on assays. The integrated top-20 gave 5 confirmed binders below 100 nM, 4 with acceptable ADMET. This is a small synthesis batch and the data are preliminary, so we're not overgeneralizing. But the pattern is directionally consistent with the model's design intent.
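For a sense of how preliminary, a two-sided Fisher exact test on these counts can be written with the standard library alone. This is a sanity check we are adding here, not part of the original analysis, and it assumes both counts are taken against the 20 compounds synthesized per list.

```python
# Two-sided Fisher exact test from first principles (standard library only).
from math import comb


def fisher_exact_two_sided(hits_a, n_a, hits_b, n_b):
    """P-value for a 2x2 table of hits/misses in list A vs. list B."""
    total_hits = hits_a + hits_b
    total = n_a + n_b

    def weight(k):
        # Unnormalised hypergeometric weight for k hits in list A (exact integer).
        return comb(total_hits, k) * comb(total - total_hits, n_a - k)

    lo = max(0, total_hits - n_b)
    hi = min(total_hits, n_a)
    observed = weight(hits_a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return extreme / comb(total, n_a)


p_binders = fisher_exact_two_sided(7, 20, 5, 20)  # confirmed binders below 100 nM
p_admet = fisher_exact_two_sided(2, 20, 4, 20)    # acceptable ADMET in follow-on assays
print(f"binders: p = {p_binders:.2f}, ADMET: p = {p_admet:.2f}")
```

Both p-values come out well above 0.05, which is what you would expect from 20 compounds per arm: the result is a direction, not a demonstrated effect.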

Where Simultaneous Optimization Actually Helps Most

The integration benefit isn't uniform across all program types. It's most pronounced in three specific situations.

CNS programs are the clearest case. Blood-brain barrier penetration, Pgp efflux, and CYP-mediated first-pass metabolism all have molecular feature signatures that are also relevant to binding affinity for typical CNS targets. The integrated model sees these as correlated constraints from the start rather than sequential filters applied after the binding problem is solved. CNS attrition rates from ADMET failure are disproportionately high; the joint optimization gain is disproportionately large.

Kinase programs with selectivity requirements are the second high-value case. If you need greater than 100-fold selectivity against the broader kinome, binding-first optimization will explore kinome-promiscuous chemical space that the selectivity head simply deprioritizes in integrated scoring. You end up in a different region of scaffold space earlier.

The third case is any program where the physicochemical space you're targeting is inherently constrained — for example, if the target requires low molecular weight for CNS penetration but also a specific H-bond donor count for binding site contacts. Sequential tools can't negotiate these simultaneous constraints in one pass; the integrated model does.

What Integration Does Not Fix

It's worth being direct about what the integrated architecture doesn't solve, because this class of tool tends to accumulate inflated claims.

Training data coverage remains the dominant limitation. For targets where we have only a few hundred IC50 data points in ChEMBL and the program's scaffolds are structurally distinct from the training set, the on-target binding prediction is uncertain regardless of whether it's in an integrated model or a standalone scorer. Integration doesn't improve out-of-distribution generalization for any individual head; it improves information sharing between heads on in-distribution molecules. Novel scaffold classes see limited benefit on the binding side until we accumulate in-house assay data to fine-tune.
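One common way to flag "structurally distinct from the training set" is nearest-neighbor Tanimoto similarity on fingerprints. The sketch below is a generic illustration of that check, not a description of our pipeline. Fingerprints are represented as plain sets of on-bit indices, and the bit sets and the 0.4 cutoff are invented.

```python
# Flag query molecules that sit far from the training set in fingerprint space.
# Fingerprints are sets of "on" bit indices; bits and cutoff are invented.

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def nearest_training_similarity(query, training):
    """Similarity of the query to its closest training-set neighbor."""
    return max(tanimoto(query, fp) for fp in training)


TRAINING = [
    frozenset({1, 4, 9, 16, 25}),
    frozenset({1, 4, 9, 30, 31}),
    frozenset({2, 4, 8, 16, 32}),
]
in_domain_query = frozenset({1, 4, 9, 16, 26})   # shares most bits with a training compound
novel_query = frozenset({3, 5, 7, 11, 13})       # shares no bits with any training compound

CUTOFF = 0.4  # hypothetical applicability-domain threshold
in_domain_sim = nearest_training_similarity(in_domain_query, TRAINING)
novel_sim = nearest_training_similarity(novel_query, TRAINING)

for name, sim in [("in-domain", in_domain_sim), ("novel", novel_sim)]:
    status = "ok" if sim >= CUTOFF else "out of domain: treat binding score as uncertain"
    print(f"{name}: nearest similarity {sim:.2f} ({status})")
```

A check like this does not fix the coverage problem. It only marks which binding predictions deserve less trust until in-house data is available.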

Covalent mechanisms require separate treatment. The GNN encoder handles non-covalent binding well; covalent docking requires explicit reaction coordinate modeling that doesn't fit the current architecture. For programs with covalent warheads, we run separate covalent docking alongside the integrated model and reconcile manually.

And as with any in silico result: the integrated model outputs probabilities of wet-lab success, not certainties. We're not suggesting it replaces experimental validation. We're saying it makes the synthesis list better before the first compound is made — which is where early-stage programs have the most to gain from better computational decision-making.
