Methods

Predicting what chemists used to have to synthesize.

Our computational stack integrates structure-based docking, QSAR ensemble modeling, and off-target proteome screening into one ranked output.

Architecture

From PDB structure to ranked dossier: seven stages, one data pipeline.

Stage	Detail
PDB Structure	mmCIF / homology model
Conformer Gen.	RDKit ETKDG + tautomers
Physics Docking	AutoDock-Vina variant
GNN Scorer	ΔG + uncertainty
ADMET Ensemble	48 endpoints, calibrated
Off-Target Screen	2,300 structures
Ranked Dossier	Top 20 + rationale

Binding Affinity

Binding affinity from graph-level protein-ligand representation

We represent each protein-ligand complex as a heterogeneous graph: atoms as nodes, bonds and inter-atomic contacts as edges. A message-passing GNN propagates atom-level features through 8 layers, then pools them into a complex-level ΔG estimate. The model is trained on 14.3M complexes from PDB, ChEMBL, and proprietary assay sets, and reaches Pearson r = 0.91 on the held-out PDBbind v2020 set (vs. 0.76 for standard AutoDock Vina).
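To make the idea of message passing and pooling concrete, here is a minimal sketch in plain Python. It is not our production model: it has no learned weights, treats all edges alike, and updates each atom by averaging its features with those of its neighbors. The function names, the averaging rule, and the `readout_weights` vector are illustrative assumptions; only the 8-layer depth and the "propagate, then pool to one number" shape come from the description above.

```python
from typing import Dict, List

Vector = List[float]


def message_passing_layer(
    features: Dict[int, Vector], neighbors: Dict[int, List[int]]
) -> Dict[int, Vector]:
    """One round of message passing: each node takes the mean of its own
    feature vector and its neighbors' vectors (toy rule, no learned weights)."""
    updated: Dict[int, Vector] = {}
    for node, h in features.items():
        group = [h] + [features[n] for n in neighbors.get(node, [])]
        updated[node] = [
            sum(vec[k] for vec in group) / len(group) for k in range(len(h))
        ]
    return updated


def predict_delta_g(
    features: Dict[int, Vector],
    neighbors: Dict[int, List[int]],
    readout_weights: Vector,  # hypothetical stand-in for a learned readout
    num_layers: int = 8,
) -> float:
    """Run num_layers rounds of message passing, sum-pool the node vectors
    into one complex-level vector, and apply a linear readout."""
    h = features
    for _ in range(num_layers):
        h = message_passing_layer(h, neighbors)
    pooled = [sum(vec[k] for vec in h.values()) for k in range(len(readout_weights))]
    return sum(w * p for w, p in zip(readout_weights, pooled))


# Toy three-atom chain 0-1-2 with two features per atom (made-up values).
toy_features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
toy_neighbors = {0: [1], 1: [0, 2], 2: [1]}
toy_score = predict_delta_g(toy_features, toy_neighbors, [-1.0, -0.5])
```

In a real model the averaging step would be replaced by learned, edge-type-aware message and update functions, and the readout would be a trained network that also emits an uncertainty estimate.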

Pearson r = 0.91 on PDBbind v2020 held-out set

ADMET

48 ADMET endpoints, not just logP and solubility

Most computational ADMET tools cover 8-12 endpoints. We run 48. The additional endpoints, including reactive metabolite flags, time-dependent CYP inhibition, mitochondrial toxicity, and phototoxicity, are routinely missed early and surface as Phase I surprises. Predictions come from an ensemble of 6 model architectures (RF, XGBoost, MPNN, AttentiveFP, GCN, DNN) with calibrated uncertainty estimates.

Endpoint	Category
CYP3A4 inhibition	Metabolic
CYP2D6 inhibition	Metabolic
hERG block	Cardiotoxicity
PAMPA permeability	Absorption
Pgp substrate	Transport
BBB penetration	Distribution
Plasma protein binding	Distribution
Aqueous solubility (kinetic)	Physical
Reactive metabolite flag	Metabolic
Hepatotoxicity flag	Toxicity
Phototoxicity flag	Toxicity
Mitochondrial toxicity	Toxicity
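
As a rough illustration of how an ensemble can yield both a flag and an uncertainty signal for one endpoint, the sketch below averages per-model probabilities and uses their spread as a disagreement measure. The probabilities, the 0.5 decision threshold, and the 0.15 spread cutoff are hypothetical values chosen for the example. Disagreement between models is only a proxy for uncertainty; probability calibration is a separate step that is not shown here.

```python
from statistics import mean, pstdev
from typing import Dict


def ensemble_flag(
    model_probs: Dict[str, float],
    threshold: float = 0.5,    # hypothetical decision threshold
    max_spread: float = 0.15,  # hypothetical cutoff for "models agree"
) -> dict:
    """Combine per-model probabilities for a single endpoint.

    Returns the ensemble mean, the population standard deviation across
    models, whether the endpoint is flagged, and whether the models agree
    closely enough to treat the call as confident.
    """
    probs = list(model_probs.values())
    avg = mean(probs)
    spread = pstdev(probs)
    return {
        "probability": avg,
        "spread": spread,
        "flagged": avg >= threshold,
        "confident": spread <= max_spread,
    }


# Made-up hERG-block probabilities from the six architectures named above.
herg = ensemble_flag(
    {"RF": 0.71, "XGBoost": 0.68, "MPNN": 0.80,
     "AttentiveFP": 0.77, "GCN": 0.74, "DNN": 0.66}
)
```

A flagged endpoint with a large spread is a natural candidate for early experimental follow-up, since the models disagree about it.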

Off-Target

2,300 off-target structures. Screened before synthesis.

Selectivity failures cause 22% of Phase II discontinuations. We run pan-proteome docking against 2,300 non-redundant protein structures covering all major target classes, including kinases (519), GPCRs (387), nuclear receptors (48), ion channels (156), proteases (234), and epigenetic regulators (211). For each candidate we report a selectivity index (SI): the ratio of off-target docking score to primary target score. This screen does not replace experimental selectivity assays; it prioritizes which off-targets to test first, reducing the number of counterscreens from hundreds to a targeted panel of 8–12.
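
The selectivity index and the panel selection can be sketched in a few lines. This example assumes Vina-style docking scores in kcal/mol, where more negative means stronger predicted binding, so an SI near or above 1 means the off-target is predicted to bind about as well as the primary target. The target names, scores, and the `prioritize_off_targets` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def selectivity_index(off_target_score: float, primary_score: float) -> float:
    """SI = off-target docking score / primary target docking score.

    Assumes both scores are negative binding energies, so a higher SI
    means the off-target is predicted to bind closer to primary strength.
    """
    if primary_score >= 0:
        raise ValueError("primary score must be negative (predicted binder)")
    return off_target_score / primary_score


def prioritize_off_targets(
    primary_score: float,
    off_target_scores: Dict[str, float],
    panel_size: int = 12,
) -> List[Tuple[str, float]]:
    """Rank off-targets by SI, highest first, and keep a short panel
    to send to experimental counterscreens."""
    ranked = sorted(
        ((name, selectivity_index(score, primary_score))
         for name, score in off_target_scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:panel_size]


# Made-up scores for one candidate against three off-target structures.
panel = prioritize_off_targets(
    primary_score=-10.0,
    off_target_scores={"target_A": -5.0, "target_B": -9.0, "target_C": -2.0},
    panel_size=2,
)
```

With these made-up numbers, `target_B` ranks first because its docking score is closest to the primary target's.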

Validation

Our models are validated against experimental data, not internal benchmarks.

~0.91 binding affinity correlation vs. ITC / SPR binding assays

< 8% false-negative rate on ADMET flags across 12-assay external panel

5 candidates advanced to IND-enabling studies with zero early ADMET failures

Validation is prospective: we score candidates, then compare against experimental results from our collaborating wet labs. We do not back-fill training data with our own predictions.
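
For readers who want to reproduce this kind of comparison on their own data, the sketch below computes the two headline metrics from paired predictions and measurements: Pearson r for binding affinity and the false-negative rate for ADMET flags. The function names and the example values are illustrative assumptions, not our validation code or data.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured affinities."""
    n = len(predicted)
    if n != len(measured) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_m)


def false_negative_rate(
    predicted_flags: Sequence[bool], observed_flags: Sequence[bool]
) -> float:
    """Fraction of experimentally observed liabilities the model missed:
    FN / (FN + TP)."""
    if len(predicted_flags) != len(observed_flags):
        raise ValueError("flag series must have equal length")
    positives = sum(1 for o in observed_flags if o)
    if positives == 0:
        raise ValueError("no observed positives to evaluate against")
    missed = sum(
        1 for p, o in zip(predicted_flags, observed_flags) if o and not p
    )
    return missed / positives


# Made-up example: scores recorded before the assay results came back.
r = pearson_r([-9.1, -7.4, -8.2, -6.0], [-9.4, -7.0, -8.5, -6.3])
fnr = false_negative_rate([True, False, True, False], [True, True, True, False])
```

Keeping the prediction timestamp ahead of the assay date is what makes such a comparison prospective; the arithmetic itself is the same as for a retrospective benchmark.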

Request the methods brief.

We share a technical methods document — model architecture details, training data provenance, validation protocols, and calibration certificates — with research collaborators. Write to us with your institutional email and the target class you're working on.
