KRAS-mutant oncology comes with a canonical "undruggable" narrative: the GDP/GTP binding pocket has picomolar affinity for its nucleotide substrate, the protein surface is largely featureless for small-molecule binding, and decades of drug discovery effort produced nothing that made it past Phase I. The G12C mutation changed that narrative, not by making KRAS easier to drug in general, but by creating a specific druggable handle: a mutant cysteine residue (Cys12) sitting adjacent to the GDP-binding pocket that can be targeted by covalent acrylamide warheads when the protein is in the GDP-bound inactive state.
The approval of sotorasib (AMG 510) and adagrasib (MRTX849) validated the covalent approach. Both work through the same mechanism: an acrylamide warhead forms an irreversible covalent bond with Cys12, locking the protein in the inactive GDP-bound state and preventing exchange of GDP for GTP. The clinical validation is clear. What's less discussed, and what I want to address here, is why designing optimized covalent inhibitors for KRAS G12C (and the next generation of mutant-specific covalent targets) is exactly where computational methods have their highest return, and where they also face their most technically demanding challenges.
Why Covalent Warhead Placement Is a Hard Computational Problem
Non-covalent inhibitor optimization is, at its core, an affinity optimization problem. You want to maximize binding energy in a defined pocket. The scoring functions — whether physics-based or GNN-based — are trained on equilibrium binding data, and the prediction task is: given this 3D pose, what is the predicted binding affinity?
Covalent inhibitor optimization has a different structure. The final binding affinity of an irreversible covalent inhibitor is largely irrelevant once the covalent bond forms. What matters for efficacy is the rate at which the covalent bond forms (kinact) and the non-covalent binding affinity of the reversible encounter complex (Ki). Their ratio, kinact/Ki, is the second-order efficiency constant usually reported for covalent inhibitors. These quantities are distinct from equilibrium binding affinity and require separate modeling.
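To make the kinact and Ki distinction concrete, here is a minimal sketch of the standard two-step inactivation scheme (E + I ⇌ E·I → E-I). It assumes the inhibitor is in large excess and that the reversible step equilibrates quickly. The two compounds and their parameter values are hypothetical, chosen only so that both share the same kinact/Ki.

```python
import math

def k_obs(conc_M, kinact_per_s, Ki_M):
    """Pseudo-first-order inactivation rate (s^-1) for E + I <-> E.I -> E-I.

    Assumes inhibitor in excess over target and a fast reversible step.
    """
    return kinact_per_s * conc_M / (Ki_M + conc_M)

def fraction_modified(conc_M, kinact_per_s, Ki_M, time_s):
    """Fraction of target covalently modified after time_s seconds."""
    return 1.0 - math.exp(-k_obs(conc_M, kinact_per_s, Ki_M) * time_s)

# Two hypothetical compounds with the same kinact/Ki of 1,000 M^-1 s^-1.
fast_weak = {"kinact_per_s": 1e-2, "Ki_M": 1e-5}   # fast chemistry, weak binding
slow_tight = {"kinact_per_s": 1e-3, "Ki_M": 1e-6}  # slow chemistry, tight binding

for label, cpd in (("fast_weak", fast_weak), ("slow_tight", slow_tight)):
    efficiency = cpd["kinact_per_s"] / cpd["Ki_M"]
    low = fraction_modified(1e-7, time_s=3600, **cpd)   # 100 nM for 1 hour
    high = fraction_modified(1e-4, time_s=60, **cpd)    # 100 uM for 1 minute
    print(f"{label}: kinact/Ki={efficiency:.0f} M^-1 s^-1, "
          f"low-dose fraction={low:.2f}, high-dose fraction={high:.2f}")
```

Well below Ki the two compounds behave almost identically, because k_obs reduces to (kinact/Ki)·[I]. Near saturation they diverge, because k_obs is capped at kinact. An equilibrium affinity score sees neither effect.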
The rate of covalent bond formation depends on several simultaneous factors: the intrinsic electrophilicity of the warhead (acrylamide, vinylsulfonamide, cyanoacrylamide, and other electrophilic chemotypes have meaningfully different intrinsic reactivities); the geometric proximity of the warhead to Cys12 in the non-covalent encounter complex; and the trajectory of nucleophilic attack — whether the sulfur of Cys12 is geometrically aligned for Michael addition to the acrylamide beta-carbon in the lowest-energy bound conformation.
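The proximity and trajectory checks can be sketched as a simple geometric filter on a docked pose. This is an illustration only: the function names, the 4.0 Å distance cutoff, and the 90-125° angle window are assumptions made for the example, not calibrated values, and a real workflow would read coordinates from the docking output rather than hard-coding them.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def distance(a, b):
    """Euclidean distance between two 3D points (same units as input, here angstroms)."""
    return _norm(_sub(a, b))

def angle_deg(a, vertex, c):
    """Angle a-vertex-c in degrees."""
    v1, v2 = _sub(a, vertex), _sub(c, vertex)
    cos_theta = sum(x * y for x, y in zip(v1, v2)) / (_norm(v1) * _norm(v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))

def attack_geometry(cys_sg, c_beta, c_alpha,
                    max_dist=4.0, angle_window=(90.0, 125.0)):
    """Return (S...C-beta distance, S...C-beta=C-alpha angle, passes_filter).

    max_dist and angle_window are hypothetical defaults for illustration.
    """
    d = distance(cys_sg, c_beta)
    theta = angle_deg(cys_sg, c_beta, c_alpha)
    ok = d <= max_dist and angle_window[0] <= theta <= angle_window[1]
    return d, theta, ok

# Hypothetical pose: acrylamide C-beta at the origin, C-alpha along +x,
# Cys12 sulfur approaching from above and slightly behind the double bond.
d, theta, ok = attack_geometry(cys_sg=(-1.0, 0.0, 3.3),
                               c_beta=(0.0, 0.0, 0.0),
                               c_alpha=(1.34, 0.0, 0.0))
print(f"distance={d:.2f} A, angle={theta:.1f} deg, passes={ok}")
```

A filter like this only screens ground-state poses. It says nothing about the barrier height, which is why the reaction coordinate still has to be modeled explicitly.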
Standard non-covalent docking predicts the encounter complex geometry but doesn't model the reaction coordinate. It can tell you whether the warhead is proximate to the target cysteine. It cannot tell you whether the attack trajectory is favorable, whether the transition state is geometrically accessible, or whether the intrinsic warhead reactivity is appropriate for the reaction geometry. Getting any of these wrong produces a compound that either doesn't react with Cys12 at a useful rate, reacts too rapidly with off-target cysteines (causing toxicity), or forms the encounter complex in the wrong pose for efficient covalent bond formation.
Reaction Coordinate Modeling for Warhead Placement
To design covalent warheads with appropriate kinetics, you need to model the reaction coordinate explicitly — not just the ground-state binding pose. Our approach uses QM/MM (quantum mechanics/molecular mechanics) calculations on candidate warhead-Cys12 geometries to estimate the activation energy for Michael addition as a function of warhead geometry and attack trajectory. This is substantially more computationally expensive than non-covalent docking: a QM/MM energy scan across the reaction coordinate for a single compound takes roughly 4-8 hours on a GPU, compared to seconds for a docking pose evaluation.
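One common way to relate a computed activation free energy to a rate constant is the Eyring equation from transition state theory. The sketch below assumes a transmission coefficient of 1 and a temperature of 310.15 K. It illustrates the general relationship, not our calibrated mapping, and raw QM/MM barriers typically need calibration against known inhibitors before absolute rates mean anything.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H = 6.62607015e-34      # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)

def rate_from_barrier(dg_kcal_per_mol, temp_K=310.15):
    """Eyring rate constant (s^-1) for a given activation free energy.

    Assumes a transmission coefficient of 1 (plain transition state theory).
    """
    return (K_B * temp_K / H) * math.exp(-dg_kcal_per_mol / (R_KCAL * temp_K))

def barrier_from_rate(k_per_s, temp_K=310.15):
    """Inverse of rate_from_barrier: activation free energy in kcal/mol."""
    return -R_KCAL * temp_K * math.log(k_per_s * H / (K_B * temp_K))

for dg in (18.0, 20.0, 22.0):
    print(f"barrier {dg:.1f} kcal/mol -> k = {rate_from_barrier(dg):.3e} s^-1")
```

The exponential dependence is the practical point: at body temperature roughly 1.4 kcal/mol of barrier corresponds to a tenfold change in rate, so errors of a few kcal/mol in the computed barrier translate into orders of magnitude in predicted kinact. That sensitivity is one reason to use the barriers for ranking rather than as absolute predictions.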
The practical consequence is that covalent warhead optimization cannot be done by exhaustive enumeration of thousands of candidates. You select a small panel of candidate warhead-scaffold combinations based on non-covalent docking of the encounter complex, then run QM/MM reaction coordinate analysis on the 20-50 most promising candidates to rank them by predicted kinact. This narrows the synthesis list to high-confidence candidates with favorable predicted kinetics before any wet-lab work begins.
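The two-stage triage described above can be sketched as follows. Everything here is hypothetical scaffolding: the `Candidate` class, the `triage` function, the convention that a more negative docking score is better, and the 20 kcal/mol cutoff are placeholders. The `estimate_barrier` callable stands in for the expensive QM/MM step.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Candidate:
    name: str
    docking_score: float                  # assumed: more negative is better
    barrier_kcal: Optional[float] = None  # filled in only for shortlisted designs

def triage(candidates: List[Candidate],
           estimate_barrier: Callable[[Candidate], float],
           top_n: int = 50,
           barrier_cutoff_kcal: float = 20.0) -> List[Candidate]:
    """Cheap docking filter first, expensive barrier estimate second."""
    # Stage 1: keep only the top_n designs by non-covalent docking score.
    shortlisted = sorted(candidates, key=lambda c: c.docking_score)[:top_n]
    # Stage 2: run the expensive estimate on the shortlist only.
    for cand in shortlisted:
        cand.barrier_kcal = estimate_barrier(cand)
    passing = [c for c in shortlisted if c.barrier_kcal <= barrier_cutoff_kcal]
    return sorted(passing, key=lambda c: c.barrier_kcal)

# Toy usage with a stub in place of a real QM/MM calculation.
designs = [Candidate(f"design_{i}", docking_score=-5.0 - 0.1 * i) for i in range(10)]
stub_barriers = {d.name: 17.0 + (i % 4) * 1.5 for i, d in enumerate(designs)}
ranked = triage(designs, lambda c: stub_barriers[c.name], top_n=5)
print([(c.name, c.barrier_kcal) for c in ranked])
```

The structure matters more than the numbers: the expensive function is called exactly `top_n` times, so the compute budget is set by the shortlist size rather than the library size.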
In a KRAS G12C program we ran as part of an academic collaboration, we screened 340 covalent candidate designs using non-covalent docking of the encounter complex against the KRAS G12C GDP-bound structure (PDB: 6OIM). The top 50 by non-covalent score went to QM/MM reaction coordinate analysis. Of these, 12 had predicted activation energies below a threshold consistent with kinact values in a useful range based on our calibration data from known covalent inhibitors. We synthesized 8 of those 12. Three of the 8 (about 38%) showed kinact/Ki values above 1,000 M⁻¹ s⁻¹ in biochemical kinetics assays, a hit rate we consider favorable for the complexity of the target and the selectivity requirements against the broader cysteine proteome.
The Switch II Pocket and Induced Fit
The druggable pocket on KRAS G12C is not the GDP-binding site itself — it's the Switch II Pocket (S-IIP), a cryptic allosteric site that opens when GDP is bound and the protein is in the inactive state. The S-IIP is only partially present in the apo crystal structure; it forms fully upon ligand binding through an induced-fit mechanism involving repositioning of the Switch II loop (residues 58-76) and the alpha3 helix.
This induced fit is critical for KRAS G12C drug design because the binding site you're optimizing into is not the site visible in the apo crystal structure. Rigid-receptor docking against the apo structure underestimates the available binding volume and produces compounds that are predicted to clash with residues that move upon ligand binding. The approved inhibitors all engage an S-IIP that is substantially larger than the apo structure suggests: sotorasib binds a pocket volume of approximately 600 Å³, while the apo S-IIP volume is roughly 200 Å³.
We handle this through induced-fit docking using the structure of sotorasib-bound KRAS G12C (PDB: 6OIM) as the template for the open-pocket conformation, with ensemble docking against multiple S-IIP conformations extracted from MD simulation trajectories. The conformational ensemble approach recovers the binding geometry of known inhibitors with lower RMSD than rigid-receptor docking (mean 1.1 Å vs 2.3 Å across a test set of 12 KRAS G12C co-crystal structures) and better predicts the measured activity of novel analogs.
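Two pieces of bookkeeping sit behind numbers like these: picking the best-scoring receptor conformation for each ligand, and measuring how far a predicted pose lies from the crystallographic one. The sketch below shows both in simplified form. It assumes matched atom ordering, coordinates already in the same receptor frame, and a lower-is-better score. It ignores symmetry-equivalent atoms, which a production RMSD would need to handle. The conformer labels and scores are made up.

```python
import math

def pose_rmsd(pred, ref):
    """Heavy-atom RMSD between two poses given as lists of (x, y, z) tuples.

    No superposition is performed: docking RMSD is assumed to be measured
    in the shared receptor frame with identical atom ordering.
    """
    if len(pred) != len(ref) or not pred:
        raise ValueError("poses must be non-empty and the same length")
    sq = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(sq / len(pred))

def best_over_ensemble(scores_by_conformer):
    """Pick the receptor conformer with the best (lowest) docking score."""
    best = min(scores_by_conformer, key=scores_by_conformer.get)
    return best, scores_by_conformer[best]

# Hypothetical scores for one ligand docked into three S-IIP conformations.
scores = {"xtal_open": -9.1, "md_frame_120": -10.4, "md_frame_455": -8.7}
print(best_over_ensemble(scores))

ref_pose = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
pred_pose = [(0.5, 0.0, 0.0), (2.0, 0.0, 0.0), (2.0, 1.5, 0.0)]
print(f"RMSD = {pose_rmsd(pred_pose, ref_pose):.2f} A")
```

Taking the minimum over conformers is the simplest aggregation rule. Averaging or Boltzmann-weighting across the ensemble are alternatives with different failure modes.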
Selectivity Against the Cysteine Proteome
The covalent reactivity that makes KRAS G12C drugs work also creates a selectivity challenge that doesn't exist for non-covalent inhibitors. An electrophilic warhead that reacts with Cys12 can, in principle, react with any accessible cysteine residue in the proteome. The selectivity depends on (a) the non-covalent encounter complex driving substrate specificity before the reaction happens, and (b) the warhead reactivity being appropriately tuned — reactive enough to modify Cys12 at therapeutic doses, not so reactive that it broadly alkylates off-target cysteines.
The current generation of KRAS G12C inhibitors uses acrylamide warheads calibrated for moderate intrinsic reactivity. Cyanoacrylamide and chloroacetamide warheads with higher intrinsic reactivity show more off-target reactivity in proteomic studies, consistent with the reactivity tuning hypothesis. This creates a design trade-off: higher warhead reactivity may improve kinact at KRAS G12C but at the cost of broader cysteine reactivity. Lower warhead reactivity improves selectivity but may require a lower Ki (tighter non-covalent binding) to compensate.
Computational modeling of this trade-off requires covalent off-target profiling alongside on-target optimization. We run covalent docking against a library of 180 cysteines in the human proteome known to be targeted by covalent drugs or identified as reactive by isoTOP-ABPP proteomics. For each candidate warhead, we compare the predicted covalent efficiency at KRAS G12C versus the distribution of predicted covalent efficiencies at off-target cysteines to produce a selectivity index for covalent engagement.
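One way such a selectivity index could be defined is as the log ratio of on-target efficiency to a high percentile of the off-target distribution. The sketch below uses that definition purely for illustration: the 95th-percentile reference, the nearest-rank percentile method, the tenfold flagging window, and the protein names and values are all assumptions made for the example.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[rank - 1]

def selectivity_index(on_target_eff, off_target_effs, q=95.0):
    """log10 of on-target efficiency over the q-th percentile off-target value.

    Efficiencies are predicted kinact/Ki-like quantities in the same units.
    """
    return math.log10(on_target_eff / percentile(off_target_effs, q))

def flag_off_targets(on_target_eff, off_targets, window=10.0):
    """Names of off-target cysteines predicted within `window`-fold of on-target."""
    return sorted(name for name, eff in off_targets.items()
                  if eff * window >= on_target_eff)

# Hypothetical predicted efficiencies (M^-1 s^-1) for one candidate warhead.
off_targets = {"protA_C45": 3.0, "protB_C112": 12.0, "protC_C9": 150.0, "protD_C301": 0.8}
on_target = 1200.0
print(f"SI = {selectivity_index(on_target, list(off_targets.values())):.2f}")
print("flagged:", flag_off_targets(on_target, off_targets))
```

A percentile-based reference is less sensitive to a single outlier than the maximum, but it can also hide one genuinely dangerous off-target, which is why the sketch flags individual near-neighbors separately.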
This is computationally expensive and less well-calibrated than non-covalent selectivity screening, simply because the training data for covalent selectivity prediction is much thinner. The validated covalent proteomics datasets cover perhaps 1,000-2,000 high-confidence cysteine-ligand pairs — a small fraction of the training data available for non-covalent binding prediction. We treat covalent selectivity predictions as higher-uncertainty than non-covalent predictions and require experimental proteomics confirmation before advancing any covalent candidate with flagged off-target signals.
What Comes After G12C: The Next Mutant-Specific Opportunity
KRAS G12C is the most drugged KRAS allele, but it represents roughly 13% of KRAS-mutant cancers. G12D (the most prevalent KRAS mutation, at approximately 36% of KRAS-mutant tumors) lacks the mutant cysteine handle. G12V, G12A, and G13D together account for another 35-40% of KRAS mutations and also lack covalent handles under current chemistry.
For these alleles, the next generation of KRAS inhibitors must work through non-covalent mechanisms: either by achieving sufficient potency through the S-IIP in a competitive fashion, by targeting alternative sites (KRAS4B membrane orientation or KRAS-effector interfaces), or by building on the emerging G12D-selective non-covalent inhibitors that exploit the different electrostatic environment of the G12D pocket. Each of these is a distinct structural problem that requires fresh computational analysis rather than direct analog-of-sotorasib optimization.
This is where computational methods have compounding value relative to empirical screening for KRAS allele selectivity: the structural differences between KRAS G12C, G12D, and G12V pockets are subtle (single amino acid changes in a shallow binding site), and empirical SAR against each allele would require separate hit-finding campaigns with large synthesis investments. Computational allele selectivity profiling — docking and scoring the same chemical series against all three allele structures simultaneously — identifies the structural features that differentiate G12C selectivity from G12D or G12V activity at the design stage, before any synthesis.
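At the bookkeeping level, profiling a series against several alleles at once amounts to a score table with one row per compound and one column per allele, plus a margin between the intended allele and its closest competitor. The sketch below assumes a lower-is-better predicted score. The compound names and values are invented, and how the scores themselves are produced is outside its scope.

```python
def allele_margin(scores, target="G12C"):
    """Return (margin, closest_other_allele) for one compound.

    scores maps allele name -> predicted score, lower meaning tighter binding
    (an assumed convention). A positive margin means the compound is predicted
    to favor the target allele over every other allele profiled.
    """
    others = {allele: s for allele, s in scores.items() if allele != target}
    if target not in scores or not others:
        raise ValueError("need the target allele and at least one other allele")
    closest = min(others, key=others.get)
    return others[closest] - scores[target], closest

# Hypothetical predicted scores for three analogs in one scaffold series.
series = {
    "analog_1": {"G12C": -10.2, "G12D": -7.9, "G12V": -8.3},
    "analog_2": {"G12C": -9.0, "G12D": -9.4, "G12V": -7.1},
    "analog_3": {"G12C": -8.8, "G12D": -8.1, "G12V": -8.6},
}

for name, scores in series.items():
    margin, rival = allele_margin(scores)
    print(f"{name}: margin vs {rival} = {margin:+.1f}")
```

Ranking a series by margin rather than by raw on-target score surfaces analogs like the hypothetical analog_2, which scores well on G12C but is predicted to prefer G12D.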
The GNN scoring function we use encodes the electrostatic and steric differences between allele pockets with sufficient resolution to predict allele-selective binding, validated against published allele-selectivity data for known KRAS allele-selective tool compounds. We're not claiming this predicts absolute potency reliably across all three alleles — out-of-distribution scaffold classes are an issue as always. But for allele selectivity within a given scaffold series, the predictions are directionally reliable and meaningfully reduce the synthesis required to profile allele selectivity in a new chemical series.