ESM-2 Validates Why LIMK2-Selective Drug Design Must Be Pocket-Level
TL;DR
We embedded 8 SMA-relevant proteins with Meta's ESM-2 (650M) foundation model, mean-pooled the per-residue embeddings, and computed pairwise cosine similarities. LIMK1 ↔ LIMK2 = 0.990 and ROCK1 ↔ ROCK2 = 0.998. This is the empirical justification for screening at pocket level (DiffDock, PocketXMol, MD contact maps) rather than by sequence similarity: paralog kinases are nearly indistinguishable in global embedding space, so any sequence-based selectivity shortcut would collapse LIMK1 and LIMK2 together.
The cosine matrix (key cells)
| | LIMK1 | LIMK2 | ROCK1 | ROCK2 |
|---|---|---|---|---|
| LIMK1 | 1.000 | 0.990 | 0.928 | 0.931 |
| LIMK2 | 0.990 | 1.000 | 0.948 | 0.949 |
| ROCK1 | 0.928 | 0.948 | 1.000 | 0.998 |
| ROCK2 | 0.931 | 0.949 | 0.998 | 1.000 |
- LIMK1/LIMK2 = 0.990: essentially identical in ESM-2 space
- ROCK1/ROCK2 = 0.998: even closer, consistent with Fasudil acting as a pan-ROCK inhibitor
- CFL2 sits at ~0.79 to the kinases (different fold family), consistent with its role as an orthogonal readout rather than a ligandable target
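To make the gap between paralog pairs and cross-family pairs concrete, the key cells can be ranked directly. This is a minimal sketch using only the four-by-four values quoted above; the `ranked_pairs` helper and the "first four letters = family" rule are illustrative assumptions, not part of the published script.

```python
from itertools import combinations

# Key cells copied from the matrix above (symmetric, so upper triangle only).
KEYS = ["LIMK1", "LIMK2", "ROCK1", "ROCK2"]
SIM = {
    ("LIMK1", "LIMK2"): 0.990,
    ("LIMK1", "ROCK1"): 0.928,
    ("LIMK1", "ROCK2"): 0.931,
    ("LIMK2", "ROCK1"): 0.948,
    ("LIMK2", "ROCK2"): 0.949,
    ("ROCK1", "ROCK2"): 0.998,
}


def ranked_pairs(keys, sim):
    """Return (a, b, similarity) tuples sorted from most to least similar."""
    pairs = [(a, b, sim[(a, b)]) for a, b in combinations(keys, 2)]
    return sorted(pairs, key=lambda t: t[2], reverse=True)


def same_family(a, b):
    """Hypothetical rule: names sharing their first four letters are paralogs."""
    return a[:4] == b[:4]


ranking = ranked_pairs(KEYS, SIM)
for a, b, s in ranking:
    tag = "paralog" if same_family(a, b) else "cross-family"
    print(f"{a:>5} / {b:<5} {s:.3f}  {tag}")

# Margin between the weakest paralog pair and the strongest cross-family pair.
paralog_min = min(s for a, b, s in ranking if same_family(a, b))
cross_max = max(s for a, b, s in ranking if not same_family(a, b))
gap = paralog_min - cross_max
print(f"paralog vs cross-family gap: {gap:.3f}")
```

The two paralog pairs rank first, and the closest cross-family pair (LIMK2/ROCK2) trails the weaker paralog pair by about 0.04. That is enough to group kinases by family, but it says nothing about separating LIMK1 from LIMK2.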
Why this matters for the SMA platform
We designed our entire LIMK2 drug-discovery pipeline around structure-based scoring: DiffDock v2.2, PocketXMol DFG-out generation, MMPBSA with POCKET_FIXED placement, and MD contact persistence. This result supplies the empirical justification. A selectivity shortcut based on global sequence similarity would have scored a LIMK1 cross-reactor almost identically to a truly LIMK2-selective compound, because the two targets sit at 0.990 cosine similarity. Pocket geometry (DFG-in/out state, hinge residues, gatekeeper) is the only reliable differentiator.
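The arithmetic behind this argument can be sketched in a few lines: mean-pool per-residue vectors, then take the cosine of the pooled vectors. The example below uses synthetic 16-dimensional vectors rather than real ESM-2 output, and the "ten pocket residues out of 300" setup is an illustrative assumption. It shows only that averaging over all residues dilutes a local difference; it makes no claim about ESM-2 internals.

```python
import math
import random


def mean_pool(per_residue):
    """Average a list of per-residue vectors into one fixed-size vector."""
    n = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[d] for vec in per_residue) / n for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Synthetic stand-ins for per-residue embeddings (NOT real ESM-2 output).
rng = random.Random(0)
DIM, LENGTH = 16, 300


def random_residue():
    return [rng.gauss(1.0, 0.5) for _ in range(DIM)]


protein_a = [random_residue() for _ in range(LENGTH)]

# protein_b is a copy of protein_a with ten "pocket" residues replaced.
protein_b = [list(vec) for vec in protein_a]
for i in range(100, 110):
    protein_b[i] = random_residue()

# Global view: pooled vectors are almost identical.
global_sim = cosine(mean_pool(protein_a), mean_pool(protein_b))

# Local view: pool only the ten residues that actually differ.
pocket_sim = cosine(mean_pool(protein_a[100:110]), mean_pool(protein_b[100:110]))

print(f"global mean-pooled cosine: {global_sim:.4f}")
print(f"pocket-only cosine:        {pocket_sim:.4f}")
```

In this toy setup the pocket-only comparison is lower than the global one because the ten changed residues are no longer averaged against 290 identical ones. This is the same reasoning that motivates scoring at the pocket rather than over the whole sequence.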
What this does NOT say
- ESM-2 is not "bad": mean-pooled embeddings are suited to homology search, not selectivity discrimination
- This does not replace wet-lab selectivity assays (KinomeScan, enzymatic IC50)
- SMN2's ~0.85 similarity to the kinases is a mean-pooling length artifact, not a mechanistic signal
Reproducibility
Compute: single RTX 3090, ~5 minutes, ~$0.08. Script and artifacts open-source under CC-BY-4.0:
- esm2_embeddings.npy: (8, 1280) float32 matrix
- esm2_similarity_matrix.npy: (8, 8) cosine matrix
- esm2_similarity_keys.json: protein order
- esm2_embed.py: exact pipeline
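One way to sanity-check the published artifacts is to recompute the cosine matrix from the stored embeddings and compare it with the stored matrix. The sketch below assumes NumPy is installed, that the three data files sit in one directory, and that `esm2_similarity_keys.json` holds a flat JSON list of protein names. The helper names `cosine_matrix` and `check_artifacts` are hypothetical and do not come from `esm2_embed.py`.

```python
import json
from pathlib import Path

import numpy as np


def cosine_matrix(emb):
    """Row-normalise an (n, d) matrix and return the (n, n) cosine matrix."""
    emb = np.asarray(emb, dtype=np.float64)
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def check_artifacts(directory="."):
    """Reload the artifacts and report how far a recomputed matrix deviates.

    Assumes the keys file is a flat JSON list (an assumption, see lead-in).
    Returns (keys, max_abs_difference).
    """
    root = Path(directory)
    emb = np.load(root / "esm2_embeddings.npy")
    stored = np.load(root / "esm2_similarity_matrix.npy")
    keys = json.loads((root / "esm2_similarity_keys.json").read_text())

    if not (emb.shape[0] == stored.shape[0] == stored.shape[1] == len(keys)):
        raise ValueError("embedding rows, matrix size, and key count disagree")

    recomputed = cosine_matrix(emb)
    max_diff = float(np.abs(recomputed - stored).max())
    return keys, max_diff
```

Calling `check_artifacts("path/to/artifacts")` should return a small `max_diff` if the stored matrix was produced the same way. Because the embeddings are float32, a tolerance around 1e-3 is a more realistic expectation than exact equality.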
Part of Bryzant-Labs/sma-research (CC-BY-4.0). Full finding: /findings/2026-04-10/FINDING_2026-04-10_ESM2_kinase_similarity.md.