Discovery · Apr 10, 2026 · SMA Research Platform

ESM-2 Validates Why LIMK2-Selective Drug Design Must Be Pocket-Level

#2026-04-10 #ESM2 #LIMK2 #LIMK1 #ROCK1 #ROCK2 #foundation-model #selectivity #methodology

TL;DR

We embedded 8 SMA-relevant proteins with Meta's ESM-2 (650M) foundation model, mean-pooled the per-residue embeddings into one vector per protein, and computed pairwise cosine similarities. LIMK1 ↔ LIMK2 = 0.990 and ROCK1 ↔ ROCK2 = 0.998. This empirically justifies why our pipeline screens at the pocket level (DiffDock, PocketXMol, MD contact maps) rather than by sequence similarity: paralog kinases are nearly indistinguishable in global embedding space, so any sequence-based selectivity shortcut would collapse LIMK1 and LIMK2 together.
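The two operations named above, mean-pooling and cosine similarity, can be sketched without any dependencies. This is a minimal illustration, not the platform's esm2_embed.py: the function names and the 3-dimensional toy vectors are made up for the example, and a real run would pool the 1280-dimensional per-residue outputs of ESM-2.

```python
import math

def mean_pool(residue_embeddings):
    """Average per-residue vectors (one list per residue) into one protein vector."""
    n = len(residue_embeddings)
    return [sum(column) / n for column in zip(*residue_embeddings)]

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy 3-dimensional "embeddings", invented for illustration only
# (the real matrix in this post is 8 proteins x 1280 dimensions).
protein_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]                   # 2 residues
protein_b = [[2.0, 0.0, 2.0], [2.0, 0.0, 2.0], [2.0, 0.0, 2.0]]  # 3 residues

vec_a = mean_pool(protein_a)
vec_b = mean_pool(protein_b)
similarity = cosine_similarity(vec_a, vec_b)
```

The two toy proteins differ in length and in their per-residue values, yet both pool to the same vector, so their cosine similarity is 1 up to floating-point rounding. That is the general limitation at work here: mean-pooling keeps the average and discards where in the sequence any difference sits.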

The cosine matrix (key cells)

             LIMK1  LIMK2  ROCK1  ROCK2
LIMK1        1.000  0.990  0.928  0.931
LIMK2        0.990  1.000  0.948  0.949
ROCK1        0.928  0.948  1.000  0.998
ROCK2        0.931  0.949  0.998  1.000

LIMK1/LIMK2 = 0.990: essentially identical in ESM-2 space
ROCK1/ROCK2 = 0.998: even closer (consistent with Fasudil being pan-ROCK)
CFL2 sits at ~0.79 to the kinases (different fold family), consistent with its role as an orthogonal readout rather than a ligandable target
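To make the ranking concrete, the sketch below takes the four-by-four excerpt of the cosine matrix exactly as printed above and sorts the off-diagonal pairs. The variable and function names are illustrative and not taken from the platform's code.

```python
from itertools import combinations

# Values copied from the key cells of the cosine matrix in this post.
names = ["LIMK1", "LIMK2", "ROCK1", "ROCK2"]
cosine = [
    [1.000, 0.990, 0.928, 0.931],
    [0.990, 1.000, 0.948, 0.949],
    [0.928, 0.948, 1.000, 0.998],
    [0.931, 0.949, 0.998, 1.000],
]

# Every unordered pair, highest similarity first.
pairs = sorted(
    ((cosine[i][j], names[i], names[j])
     for i, j in combinations(range(len(names)), 2)),
    reverse=True,
)

def nearest_neighbor(name):
    """Return the other protein with the highest cosine similarity to `name`."""
    i = names.index(name)
    best = max((j for j in range(len(names)) if j != i),
               key=lambda j: cosine[i][j])
    return names[best]

for score, a, b in pairs:
    print(f"{a}-{b}: {score:.3f}")
```

The two paralog pairs rank first and second, and each kinase's nearest neighbor is its own paralog. A global embedding score therefore groups LIMK1 with LIMK2 rather than separating them.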

Why this matters for the SMA platform

We designed our entire LIMK2 drug-discovery pipeline around structure-based scoring: DiffDock v2.2, PocketXMol DFG-out generation, MMPBSA with POCKET_FIXED placement, and MD contact persistence. This result is the empirical justification. At a cosine similarity of 0.990, a sequence-similarity shortcut would score a LIMK1 cross-reactor almost identically to a truly LIMK2-selective compound. Pocket geometry (DFG-in/out, hinge residues, gatekeeper) is the only reliable differentiator.

What this does NOT say

ESM-2 is not "bad": mean-pooled embeddings are suited to homology search, not selectivity discrimination
This does not replace wet-lab selectivity assays (KinomeScan, enzymatic IC50)
SMN2's ~0.85 similarity to the kinases is a mean-pooling length artifact, not a mechanistic signal

Reproducibility

Compute: single RTX 3090, ~5 minutes, ~$0.08. The script and artifacts are open-source under CC-BY-4.0:

esm2_embeddings.npy — (8, 1280) float32 matrix
esm2_similarity_matrix.npy — (8, 8) cosine matrix
esm2_similarity_keys.json — protein order
esm2_embed.py — exact pipeline
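A minimal sketch of reading the similarity artifacts back, assuming NumPy is installed. The layout of esm2_similarity_keys.json is not documented here, so the code assumes it is a flat JSON list of protein names in row order. The helper names load_similarity and lookup are hypothetical and do not come from esm2_embed.py.

```python
import json

import numpy as np

def load_similarity(matrix_path, keys_path):
    """Load the cosine matrix and its key order, with basic consistency checks."""
    matrix = np.load(matrix_path)
    with open(keys_path) as handle:
        keys = json.load(handle)  # ASSUMPTION: a flat JSON list of protein names
    if matrix.shape != (len(keys), len(keys)):
        raise ValueError(
            f"matrix shape {matrix.shape} does not match {len(keys)} keys"
        )
    if not np.allclose(matrix, matrix.T):
        raise ValueError("a cosine similarity matrix should be symmetric")
    return keys, matrix

def lookup(keys, matrix, a, b):
    """Return the similarity between proteins `a` and `b` as a Python float."""
    return float(matrix[keys.index(a), keys.index(b)])

# Example usage, assuming the artifacts are in the working directory and
# the key names match the labels used in this post:
# keys, matrix = load_similarity("esm2_similarity_matrix.npy",
#                                "esm2_similarity_keys.json")
# print(lookup(keys, matrix, "LIMK1", "LIMK2"))
```

Checking shape and symmetry before any lookup catches a mismatched key file early, which matters because the matrix itself carries no labels.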

Part of Bryzant-Labs/sma-research (CC-BY-4.0). Full finding: /findings/2026-04-10/FINDING_2026-04-10_ESM2_kinase_similarity.md.
