Allosteric Site Discovery with Graph Neural Networks: Lessons from 40 Targets

Most proteins have more than one place you can bind a small molecule — and the orthosteric site, the canonical active site everyone docks into, is frequently the wrong choice. Competitive inhibitors at the active site face endogenous substrate concentrations in the hundreds of micromolar. An allosteric modulator binding 20 Å away can achieve the same functional effect at orders-of-magnitude lower occupancy, with better selectivity and often with more tractable physicochemical properties. The problem has always been finding those distal sites computationally before committing to crystallography. Over the past year we've run GNN-based allosteric detection across 40 targets in our pipeline. Here is an honest accounting of what the approach delivers.

Why Allosteric Sites Resist Classical Pocket Detection

Traditional pocket-finding algorithms — fpocket, SiteMap, DoGSiteScorer — detect geometric depressions in a static surface. They are excellent at finding orthosteric cavities because those sites are evolutionarily optimized to cradle substrates and therefore have well-defined geometry. Allosteric sites are different. Many exist only in specific conformational states, appearing and disappearing as the protein breathes through its dynamic ensemble. Others are cryptic: the surface looks flat in any single crystal structure but a pocket transiently opens during molecular dynamics. A geometry-based detector operating on a single PDB structure will simply miss these. Graph neural networks change the framing by encoding the protein not as a surface but as a communication network. Every residue is a node; edges represent physical contacts and coevolutionary couplings. The model learns which communication pathways, when perturbed, propagate signal to the active site — and the sites where that perturbation is energetically accessible become candidates.
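To make the graph framing concrete, here is a minimal sketch of a residue contact network built from Cα coordinates, with a breadth-first search that counts how many contact hops separate a candidate residue from an active-site residue. The 8 Å cutoff and both function names are illustrative assumptions, and hop count is only a crude stand-in for the learned communication pathways described above; this is not the model itself.

```python
import math
from collections import deque

def build_contact_graph(ca_coords, cutoff=8.0):
    """Link residues whose C-alpha atoms lie within `cutoff` angstroms.

    The 8.0 default is an assumed, commonly used contact cutoff.
    """
    n = len(ca_coords)
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph

def shortest_path_length(graph, source, target):
    """Number of contact edges on the shortest path, or None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return depth
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, depth + 1))
    return None

# Toy example: four residues spaced 3.8 angstroms apart along a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
graph = build_contact_graph(coords)
print(shortest_path_length(graph, 0, 3))  # prints 2
```

A real pipeline would add the coevolutionary edges as well, so that two residues far apart in space can still be directly connected in the graph.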

Our GNN Architecture and Training Data

We built on a message-passing architecture where node features encode residue identity, secondary structure assignment, solvent accessibility, and B-factor as a proxy for local flexibility. Edge features encode Cα distance, contact type (backbone-backbone, backbone-sidechain, sidechain-sidechain), and mutual information derived from a multiple sequence alignment of the target family. The model was trained on a curated set of proteins with experimentally validated allosteric sites: structures where both the allosteric pocket and its functional coupling to the active site have been confirmed by mutagenesis or isothermal titration calorimetry. We held out entire protein families from training so that close homologs could not leak between the training and evaluation sets. The training set is not large by deep learning standards. Experimentally validated allosteric sites are sparse in public databases, and many entries are annotated at low confidence. We supplemented with negative examples from proteins where extensive crystallography has found no allosteric pockets, accepting that some of those negatives are probably mislabeled, because a site nobody has found yet is not the same as a site that does not exist. This is an ongoing limitation.
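The mutual-information edge feature can be illustrated with a small sketch. It computes the plain mutual information, in bits, between two alignment columns. The function name is hypothetical, and the sketch makes simplifying assumptions that a production version would likely revisit: gaps are treated as ordinary symbols, sequences are unweighted, and no background correction is applied.

```python
import math
from collections import Counter

def column_mutual_information(col_a, col_b):
    """Mutual information (bits) between two MSA columns of equal length.

    Each column is a sequence of one-letter residue codes, one per aligned
    sequence. Assumption: no gap handling or sequence reweighting.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must come from the same alignment")
    n = len(col_a)
    if n == 0:
        raise ValueError("columns must not be empty")
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += p_ab * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi

# Perfectly coupled columns carry 1 bit; independent columns carry 0 bits.
print(column_mutual_information("AAGG", "LLVV"))  # prints 1.0
print(column_mutual_information("AAGG", "LVLV"))  # prints 0.0
```

Raw mutual information is inflated by phylogenetic relatedness and small alignments, so treat the value as a relative signal between residue pairs rather than an absolute measure of coupling.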

What the Model Finds Reliably

Across our 40-target survey, the GNN performed best on kinases, nuclear receptors, and GPCRs, protein families with both high structural coverage in public databases and a reasonable experimental allosteric literature to train against. For kinases specifically, the model consistently surfaces the back-pocket adjacent to the DFG motif that Type II inhibitors occupy, the αC-helix-out pocket associated with Type III inhibitor binding, and the myristoyl-binding groove in Abl-family kinases. These are not surprising results, since all three sites have established medicinal chemistry, but the model ranks them above false-positive surface pockets with a top-5 precision exceeding 70%, which is practically useful. For nuclear receptors, the coactivator-binding surface comes up reliably as an allosteric candidate, which aligns with the extensive AF-2 coactivator peptide competition literature. GPCRs present a more interesting case: the intracellular allosteric pocket accessible to nanobody-class modulators is consistently detected, but the extracellular vestibule sites, which are conformationally dynamic and appear only in specific activation states, are less reliably found in single-structure mode. Running the detector over an MD ensemble of 200 snapshots improved recall on those sites substantially.
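For readers unfamiliar with the metric, top-5 precision is the fraction of the five highest-scoring predictions on a target that match a known site. A minimal sketch follows; the function name, site labels, and scores are made up for illustration and are not data from the survey.

```python
def precision_at_k(scored_sites, validated, k=5):
    """Fraction of the k highest-scoring sites that are in `validated`.

    scored_sites: iterable of (site_id, score) pairs for one target.
    validated: set of site_ids with experimental support.
    """
    ranked = sorted(scored_sites, key=lambda pair: pair[1], reverse=True)[:k]
    if not ranked:
        return 0.0
    hits = sum(1 for site_id, _ in ranked if site_id in validated)
    return hits / len(ranked)

# Hypothetical predictions for a single kinase target.
predictions = [
    ("dfg_back_pocket", 0.91),
    ("crystal_contact", 0.85),
    ("myristoyl_groove", 0.80),
    ("alpha_c_pocket", 0.75),
    ("surface_patch_a", 0.40),
    ("surface_patch_b", 0.30),
]
known_sites = {"dfg_back_pocket", "myristoyl_groove", "alpha_c_pocket"}
print(precision_at_k(predictions, known_sites))  # prints 0.6
```

Averaging this value over targets gives a survey-level figure; whether to average per target or pool all predictions is a design choice the metric itself does not settle.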

Where It Fails and Why

The model struggles with intrinsically disordered proteins, which is predictable given the architecture's reliance on defined contact geometry. It also underperforms on targets where the allosteric site is formed at a protein-protein interface rather than within a single chain: the cooperative pocket that forms when a regulatory subunit docks does not appear in any monomer structure, and the GNN has no way to infer it without the complex structure. We observed systematic false positives at crystal contacts in high-resolution structures, where neighboring symmetry partners create artifactual pockets that score highly because they have favorable geometry and are surface-exposed. Filtering by residue conservation removes most of these, but not all. Finally, there is a calibration issue: the model's confidence scores are not well calibrated as probabilities. A score of 0.8 does not mean an 80% chance of an experimentally valid site. We treat the output as a ranking, not a probability, and apply a fixed threshold determined by precision-recall analysis on the held-out validation set.
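One way to turn a precision-recall analysis into a fixed threshold is sketched below: walk down the ranked validation predictions and keep the lowest score cutoff at which precision still meets a chosen target. The function name and the selection rule are assumptions for illustration; the text above does not specify which operating point was chosen or how.

```python
def pick_threshold(scores, labels, min_precision):
    """Lowest score cutoff whose validation precision is >= min_precision.

    scores: model scores for held-out candidate sites.
    labels: 1 if the site is experimentally validated, else 0.
    Returns (threshold, precision, recall), or None if no cutoff qualifies.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("validation set contains no positive sites")
    pairs = sorted(zip(scores, labels), reverse=True)
    best = None
    true_positives = 0
    for rank, (score, label) in enumerate(pairs, start=1):
        true_positives += label
        # Tied scores cannot be separated by a cutoff, so evaluate only
        # after the last member of each tied group.
        if rank < len(pairs) and pairs[rank][0] == score:
            continue
        precision = true_positives / rank
        if precision >= min_precision:
            best = (score, precision, true_positives / total_positives)
    return best

# Toy validation set: accept every site scoring at or above the threshold.
print(pick_threshold([0.9, 0.8, 0.7, 0.6, 0.5], [1, 1, 0, 1, 0], 0.75))
# prints (0.6, 0.75, 1.0)
```

Because the rule only uses the ordering of scores, it stays valid even though the scores are not calibrated probabilities.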

Integration with the Downstream Campaign

For a predicted allosteric site to enter the docking funnel, it has to pass three filters beyond the GNN score. First, it must achieve a minimum druggability score on SiteMap (volume, hydrophobicity, enclosure), because GNN-detected communication hubs are not always geometrically tractable for small molecules. Second, at least two independent MD snapshots must each show the pocket open, filtering out transient artifacts. Third, we require a positive conservation signal at the site: if the residues lining the predicted pocket are poorly conserved across the target family, the pocket is more likely a structural accident than a functional communication node. Roughly 40% of GNN-predicted allosteric candidates survive all three filters. That subset proceeds to docking and, if hits emerge, to crystallographic confirmation. So far we have obtained crystallographic confirmation for 6 of the 11 predicted allosteric sites we have taken to structure. The 5 misses were all cases where the MD ensemble showed the pocket opening in fewer than 15% of frames, a threshold we are now applying prospectively.
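The funnel described above can be sketched as a single predicate over candidate sites. The two-snapshot minimum and the 15% open-frame floor come from the text; the druggability and conservation cutoffs, the field names, and the example candidates are placeholders chosen for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    gnn_score: float      # used upstream for ranking, not in this filter
    druggability: float   # e.g. a SiteMap-style druggability score
    open_frames: int      # MD snapshots in which the pocket is open
    total_frames: int     # MD snapshots examined
    conservation: float   # assumed 0-1 conservation of lining residues

    @property
    def open_fraction(self):
        if self.total_frames == 0:
            return 0.0
        return self.open_frames / self.total_frames

def passes_funnel(c, min_druggability=0.8, min_open_frames=2,
                  min_open_fraction=0.15, min_conservation=0.5):
    """Apply the three post-GNN filters plus the 15% open-frame floor.

    min_druggability and min_conservation are hypothetical cutoffs.
    """
    return (
        c.druggability >= min_druggability
        and c.open_frames >= min_open_frames
        and c.open_fraction >= min_open_fraction
        and c.conservation >= min_conservation
    )

# Hypothetical candidates scored over a 200-snapshot ensemble.
candidates = [
    Candidate("site_a", 0.90, 1.0, 60, 200, 0.8),   # open in 30% of frames
    Candidate("site_b", 0.95, 1.0, 20, 200, 0.8),   # open in 10% of frames
    Candidate("site_c", 0.70, 0.5, 100, 200, 0.9),  # low druggability
]
print([c.name for c in candidates if passes_funnel(c)])  # prints ['site_a']
```

Keeping the cutoffs as keyword arguments makes it easy to tighten one filter retrospectively and see which past candidates would have been dropped.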

What We Still Need Crystallography For

The GNN tells us where to look. It does not tell us the binding mode, the pharmacophore geometry, or the conformational change the protein undergoes when a ligand occupies the site. All of those require structural data, and we have not found a computational substitute. What we have changed is the order of operations: instead of unbiased crystallographic fragment screening across the entire surface — expensive and low-throughput — we focus fragment soaks on the two or three GNN-predicted candidates. That directs crystallography effort rather than replacing it. For a 40-target program, replacing unbiased fragment screens with focused soaks against GNN-predicted sites has meaningfully reduced crystallography cost. More importantly, it surfaces allosteric candidates early enough in the campaign to influence the scaffold selection decision, rather than arriving after lead optimization has already committed to an orthosteric chemotype. That timing shift is where most of the value lies.