Predicting what chemists used to have to synthesize.

Our computational stack integrates structure-based docking, QSAR ensemble modeling, and off-target proteome screening into one ranked output.

From PDB structure to ranked dossier: seven stages, one data pipeline.

Binding affinity from graph-level protein-ligand representation

We represent each protein-ligand complex as a heterogeneous graph, with atoms as nodes and bonds and inter-atomic contacts as edges. A message-passing GNN propagates atom-level features through 8 layers, then pools them into a complex-level ΔG estimate. The model is trained on 14.3M complexes from PDB, ChEMBL, and proprietary assay sets. On the held-out PDBbind v2020 set it reaches Pearson r = 0.91, vs. 0.76 for standard AutoDock Vina.

Pearson r = 0.91 on PDBbind v2020 held-out set
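
To make the message-passing idea concrete, here is a minimal sketch in plain Python. It is not our production architecture: the mean-aggregation update rule, the two-dimensional toy features, the readout weights, and every function name are assumptions chosen for illustration. Edge types (bond vs. contact) are ignored, and nothing here is trained.

```python
from typing import Dict, List, Tuple

Features = Dict[str, List[float]]


def message_passing_round(features: Features, edges: List[Tuple[str, str]]) -> Features:
    """One round: new feature = own feature + mean of neighbour features.

    Assumption: undirected edges, no edge types, no learned weights.
    """
    neighbours: Dict[str, List[str]] = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated: Features = {}
    for node, feat in features.items():
        nbrs = neighbours[node]
        if nbrs:
            agg = [sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(len(feat))]
        else:
            agg = [0.0] * len(feat)  # isolated atom receives no messages
        updated[node] = [f + a for f, a in zip(feat, agg)]
    return updated


def run_layers(features: Features, edges: List[Tuple[str, str]], n_layers: int = 8) -> Features:
    """Apply the same toy update n_layers times (8 mirrors the layer count in the text)."""
    for _ in range(n_layers):
        features = message_passing_round(features, edges)
    return features


def pool_to_score(features: Features, readout_weights: List[float]) -> float:
    """Mean-pool node features, then take a weighted sum as a graph-level score.

    The result is a toy number, not a calibrated ΔG.
    """
    n = len(features)
    pooled = [sum(f[i] for f in features.values()) / n for i in range(len(readout_weights))]
    return sum(w * x for w, x in zip(readout_weights, pooled))


# Hypothetical three-atom graph: one ligand atom (C1) and two pocket atoms.
toy_features = {"C1": [1.0, 0.0], "O1": [0.0, 1.0], "N1": [1.0, 1.0]}
toy_edges = [("C1", "O1"), ("O1", "N1")]

after_one_round = message_passing_round(toy_features, toy_edges)
print(pool_to_score(after_one_round, readout_weights=[-1.0, -0.5]))
```

A real model would replace the fixed update with learned, edge-type-aware transformations; the sketch only shows how information flows from atoms to a single per-complex number.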

48 ADMET endpoints, not just logP and solubility

Most computational ADMET tools cover 8–12 endpoints. We run 48. The additional endpoints, including reactive metabolite flags, time-dependent CYP inhibition, mitochondrial toxicity, and phototoxicity, are routinely missed early and surface as Phase I surprises. Predictions come from an ensemble of 6 model architectures (RF, XGBoost, MPNN, AttentiveFP, GCN, DNN) with calibrated uncertainty estimates.

Endpoint | Category
CYP3A4 inhibition | Metabolic
CYP2D6 inhibition | Metabolic
hERG block | Cardiotoxicity
PAMPA permeability | Absorption
Pgp substrate | Transport
BBB penetration | Distribution
Plasma protein binding | Distribution
Aqueous solubility (kinetic) | Physical
Reactive metabolite flag | Metabolic
Hepatotoxicity flag | Toxicity
Phototoxicity flag | Toxicity
Mitochondrial toxicity | Toxicity
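
As a rough illustration of how an ensemble can report both a prediction and a confidence signal for one endpoint, the sketch below averages per-model probabilities and uses their spread as a disagreement measure. This is a simplification under stated assumptions: the probabilities, the thresholds, and the function name are hypothetical, and ensemble spread is only a proxy for uncertainty, not the calibration procedure itself.

```python
from statistics import mean, pstdev
from typing import Dict


def ensemble_summary(
    predictions: Dict[str, float],
    flag_threshold: float = 0.5,   # assumed cut-off for raising a flag
    max_spread: float = 0.15,      # assumed limit on model disagreement
) -> Dict[str, object]:
    """Combine per-model probabilities for a single ADMET endpoint."""
    probs = list(predictions.values())
    avg = mean(probs)
    spread = pstdev(probs)  # population standard deviation across models
    return {
        "mean": avg,
        "spread": spread,
        "flagged": avg >= flag_threshold,
        "confident": spread <= max_spread,
    }


# Hypothetical hERG-block probabilities from the six architectures.
herg_predictions = {
    "RF": 0.70,
    "XGBoost": 0.80,
    "MPNN": 0.75,
    "AttentiveFP": 0.65,
    "GCN": 0.85,
    "DNN": 0.75,
}
print(ensemble_summary(herg_predictions))
```

An endpoint that is flagged but not confident is a natural candidate for early experimental follow-up, since the models disagree about it.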

2,300 off-target structures. Screened before synthesis.

Selectivity failures cause 22% of Phase II discontinuations. We run pan-proteome docking against 2,300 non-redundant protein structures covering all major target classes, including kinases (519), GPCRs (387), nuclear receptors (48), ion channels (156), proteases (234), and epigenetic regulators (211). A selectivity index (SI) is reported per candidate as the ratio of off-target docking score to primary target score. This screen does not replace experimental selectivity assays. It prioritizes which off-targets to test first, reducing the number of counterscreens from hundreds to a targeted panel of 8–12.
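
The selectivity index and the panel selection can be sketched in a few lines of Python. The sign convention is an assumption: docking scores are taken to be negative, with more negative meaning stronger predicted binding, so an SI near or above 1 means the off-target is predicted to bind about as strongly as the primary target. The target names, scores, and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def selectivity_index(off_target_score: float, primary_score: float) -> float:
    """SI = off-target docking score / primary target docking score."""
    if primary_score == 0:
        raise ValueError("primary target score must be non-zero")
    return off_target_score / primary_score


def prioritise_counterscreens(
    primary_score: float,
    off_target_scores: Dict[str, float],
    panel_size: int = 10,  # the text describes a panel of 8 to 12
) -> List[Tuple[str, float]]:
    """Rank off-targets by SI, highest predicted liability first."""
    ranked = sorted(
        ((name, selectivity_index(score, primary_score))
         for name, score in off_target_scores.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:panel_size]


# Hypothetical scores in kcal/mol-like units.
panel = prioritise_counterscreens(
    primary_score=-10.0,
    off_target_scores={"kinase_A": -9.0, "GPCR_B": -4.0, "protease_C": -7.5},
    panel_size=2,
)
print(panel)
```

The output is a shortlist for experimental counterscreening, not a verdict on selectivity.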

Our models are validated against experimental data, not internal benchmarks.

Pearson r ≈ 0.91 between predicted and measured binding affinity (ITC / SPR biophysical assays)
< 8% false-negative rate on ADMET flags across 12-assay external panel
5 candidates advanced to IND-enabling studies with zero early ADMET failures

Validation is prospective: we score candidates, then compare against experimental results from our collaborating wet labs. We do not back-fill training data with our own predictions.
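
For readers who want to reproduce the two headline metrics on their own data, the sketch below shows how a Pearson correlation and a false-negative rate are computed from paired predictions and measurements. The function names and the example values are illustrative assumptions, not our validation data or tooling.

```python
from math import sqrt
from typing import Sequence


def pearson_r(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Pearson correlation between predicted and measured affinities."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_p * var_m)


def false_negative_rate(predicted_flags: Sequence[bool], observed_flags: Sequence[bool]) -> float:
    """Share of experimentally observed liabilities that the model did not flag."""
    positives = sum(1 for o in observed_flags if o)
    if positives == 0:
        raise ValueError("no observed positives; rate is undefined")
    missed = sum(1 for p, o in zip(predicted_flags, observed_flags) if o and not p)
    return missed / positives


# Hypothetical paired values: model scores first, assay results second.
print(pearson_r([-9.1, -7.4, -8.2, -6.0], [-9.4, -7.0, -8.5, -6.3]))
print(false_negative_rate([True, False, True, False], [True, True, True, False]))
```

In a prospective setting, the predictions are frozen before the assay results arrive, and only then are these functions applied.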

Request the methods brief.

We share a technical methods document — model architecture details, training data provenance, validation protocols, and calibration certificates — with research collaborators. Write to us with your institutional email and the target class you're working on.