SMA Research Platform

Evidence graph for Spinal Muscular Atrophy

Biology-first target discovery
Christian Fischer / Bryzant Labs
1,145 Targets · 453 Trials · 60 Drugs · 7 Datasets · 34,514 Sources · 43,071 Claims · 46,973 Evidence · 29,625 Hypotheses
discovery · Apr 10, 2026 · SMA Research Platform

ESM-2 Validates Why LIMK2-Selective Drug Design Must Be Pocket-Level

#2026-04-10 #ESM2 #LIMK2 #LIMK1 #ROCK1 #ROCK2 #foundation-model #selectivity #methodology

TL;DR

We embedded 8 SMA-relevant proteins with Meta's ESM-2 (650M) foundation model, mean-pooled the per-residue embeddings into one vector per protein, and computed pairwise cosine similarities. LIMK1 ↔ LIMK2 = 0.990 and ROCK1 ↔ ROCK2 = 0.998. This empirically justifies why our pipeline screens at pocket level (DiffDock, PocketXMol, MD contact maps) rather than by sequence similarity: paralog kinases are nearly indistinguishable in global embedding space, so any sequence-based selectivity shortcut would collapse LIMK1 and LIMK2 together.

The cosine matrix (key cells)

             LIMK1  LIMK2  ROCK1  ROCK2
LIMK1        1.000  0.990  0.928  0.931
LIMK2        0.990  1.000  0.948  0.949
ROCK1        0.928  0.948  1.000  0.998
ROCK2        0.931  0.949  0.998  1.000
  • LIMK1/LIMK2 = 0.990, essentially identical in ESM-2 space
  • ROCK1/ROCK2 = 0.998, even closer (consistent with Fasudil acting as a pan-ROCK inhibitor)
  • CFL2 sits at ~0.79 to the kinases (different fold family), consistent with its role as an orthogonal readout rather than a ligandable target
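
Each cell above is a cosine similarity between two mean-pooled embeddings. The following is a minimal sketch of that arithmetic, not the contents of esm2_embed.py: the function names are hypothetical, the per-residue arrays are random stand-ins rather than ESM-2 output, and whether the real pipeline drops special tokens before pooling is not stated in this post.

```python
import numpy as np


def mean_pool(per_residue):
    """Collapse an (L, d) per-residue embedding into one (d,) vector."""
    return np.asarray(per_residue, dtype=np.float64).mean(axis=0)


def cosine_matrix(pooled):
    """Pairwise cosine similarity for an (n, d) array of pooled vectors."""
    pooled = np.asarray(pooled, dtype=np.float64)
    # Normalise each row to unit length; the dot products are then cosines.
    unit = pooled / np.linalg.norm(pooled, axis=1, keepdims=True)
    return unit @ unit.T


# Toy stand-ins for per-residue embeddings (ASSUMPTION: random data, d = 1280
# to match the (8, 1280) artifact shape; lengths are arbitrary).
rng = np.random.default_rng(0)
toy_proteins = {
    "protein_a": rng.normal(size=(120, 1280)),
    "protein_b": rng.normal(size=(300, 1280)),
}

pooled = np.stack([mean_pool(x) for x in toy_proteins.values()])
sim = cosine_matrix(pooled)  # (2, 2), symmetric, ones on the diagonal
```

Because pooling averages over every residue, the result has the same shape regardless of protein length, which is what makes an (8, 8) comparison possible and also what discards position-specific pocket information.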

Why this matters for the SMA platform

We designed our entire LIMK2 drug-discovery pipeline around structure-based scoring: DiffDock v2.2, PocketXMol DFG-out generation, MMPBSA with POCKET_FIXED placement, and MD contact persistence. This result supplies the empirical justification. A shortcut based on global sequence similarity cannot separate two targets whose embeddings are 0.990 similar, so it would have given a LIMK1 cross-reactor essentially the same score as a truly LIMK2-selective compound. Pocket geometry (DFG-in/out state, hinge residues, gatekeeper) is the only reliable differentiator.

What this does NOT say

  • ESM-2 is not "bad": mean-pooled embeddings suit homology search, not selectivity discrimination
  • This does not replace wet-lab selectivity assays (KinomeScan, enzymatic IC50)
  • SMN2's ~0.85 similarity to kinases is a mean-pooling length artifact, not mechanistic
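
One way to see how mean pooling by itself can inflate similarity between unrelated sequences is a toy model. The sketch below is our illustration, not an analysis of the real embeddings: it assumes every residue vector is a shared offset plus independent noise, and the function name and parameters are hypothetical. Under that assumption, averaging over more residues shrinks the noise while the shared offset survives, so the cosine between two unrelated "proteins" drifts toward 1 as length grows.

```python
import numpy as np


def pooled_cosine(length, dim=1280, offset=0.5, seed=0):
    """Cosine between mean-pooled embeddings of two unrelated toy 'proteins'.

    ASSUMPTION: each residue vector = shared offset + unit-variance Gaussian
    noise. This is synthetic data, not ESM-2 output.
    """
    rng = np.random.default_rng(seed)
    a = (offset + rng.normal(size=(length, dim))).mean(axis=0)
    b = (offset + rng.normal(size=(length, dim))).mean(axis=0)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


short_pair = pooled_cosine(length=10)    # few residues: noise still visible
long_pair = pooled_cosine(length=1000)   # many residues: offset dominates
```

In this model the expected cosine is roughly offset² / (offset² + 1/length), so the value depends on sequence length rather than on any shared mechanism, which is the sense in which a high pooled similarity should not be read mechanistically.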

Reproducibility

Compute: single RTX 3090, ~5 minutes, ~$0.08. Script and artifacts open-source under CC-BY-4.0:

  • esm2_embeddings.npy — (8, 1280) float32 matrix
  • esm2_similarity_matrix.npy — (8, 8) cosine matrix
  • esm2_similarity_keys.json — protein order
  • esm2_embed.py — exact pipeline
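
A minimal sketch of how the published matrix could be queried once the files are downloaded. The file names come from the list above; everything else is an assumption: the helper names are hypothetical, and the keys file is assumed to be a flat JSON list of protein names in row order (the post only says it holds the "protein order").

```python
import json
from pathlib import Path

import numpy as np


def load_similarity(artifact_dir):
    """Load (keys, matrix) from a directory containing the artifact files."""
    artifact_dir = Path(artifact_dir)
    matrix = np.load(artifact_dir / "esm2_similarity_matrix.npy")
    # ASSUMPTION: the JSON file is a flat list of names in matrix row order.
    keys = json.loads((artifact_dir / "esm2_similarity_keys.json").read_text())
    if matrix.shape != (len(keys), len(keys)):
        raise ValueError(
            f"matrix shape {matrix.shape} does not match {len(keys)} keys"
        )
    return keys, matrix


def lookup(keys, matrix, protein_a, protein_b):
    """Return the cosine similarity between two proteins named in keys."""
    return float(matrix[keys.index(protein_a), keys.index(protein_b)])
```

With the real artifacts in place, a call such as lookup(keys, matrix, "LIMK1", "LIMK2") would be expected to return the 0.990 reported above, provided the keys use those exact names.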

Part of Bryzant-Labs/sma-research (CC-BY-4.0). Full finding: /findings/2026-04-10/FINDING_2026-04-10_ESM2_kinase_similarity.md.
