50 Orphan MD Trajectories Analyzed — LIMK2 Pipeline Validated, 4-AP SMN2 Pocket Rediscovered, CFL2 Claim Retracted
TL;DR
50 completed MD trajectories (47.8 GB total) had been sitting on the cluster without analysis. We ran MDAnalysis-based backbone/ligand/contact/pocket-retention analysis on 44 of them (6 missing topology files) using a single CPU pass per trajectory. The results forced one retraction, validated one pipeline, and rediscovered a hidden positive that earlier MD metadata had wrongly marked as negative.
Three headline findings
1. 4-AP + SMN2 Tudor was a hidden positive — topology atom-count artifact hid the binding
The April 2 4-AP + SMN2 holo MD finished with metadata claiming binding_contacts: [], that is, zero stable contacts over 18.5 ns. We interpreted this as a strong negative: "4-AP does not bind SMN2." That interpretation was wrong. The orphan analysis re-analyzed the same trajectory with a topology-fix step (the topology PDB had 140,793 waters; the DCD had 140,658 atoms; a 405-atom mismatch caused the ligand atom selection to silently return empty). With the fix:
- 4-AP engaged 100% of frames over 18.5 ns
- Pocket Cα distance 4.6 Å throughout — ligand never leaves
- Top contacts: PRO268 (92%), VAL413 (92%), ASN270 (92%), SER271 (89%), PHE266 (81%), VAL267 (81%), ILE269 (74%), TYR657 (63%)
- Verdict: WEAK_BINDER (engaged, but protein flexibility is high)
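The contact percentages above are per-residue persistence values: the fraction of frames in which a residue sits within some cutoff of the ligand. The following is a minimal sketch of that calculation, not the code in analyze_orphan_trajectory.py. The function name, the input layout, and the 4.0 Å cutoff are assumptions made for illustration, and periodic boundaries are ignored here.

```python
from math import dist

def contact_persistence(frames, cutoff=4.0):
    """Fraction of frames in which each residue contacts the ligand.

    frames: list of (ligand_xyz, residues) tuples, one per frame, where
      ligand_xyz is a list of (x, y, z) ligand atom coordinates and
      residues maps a residue label to a list of (x, y, z) coordinates.
    cutoff: contact distance in angstroms (assumed value, not from the report).
    """
    counts = {}
    for ligand_xyz, residues in frames:
        for label, atoms in residues.items():
            # A residue is "in contact" if any of its atoms is within
            # the cutoff of any ligand atom in this frame.
            in_contact = any(
                dist(a, l) <= cutoff for a in atoms for l in ligand_xyz
            )
            counts[label] = counts.get(label, 0) + int(in_contact)
    n_frames = len(frames)
    if n_frames == 0:
        return {}
    return {label: c / n_frames for label, c in counts.items()}
```

An empty ligand coordinate list yields 0.0 for every residue, which is exactly the silent failure mode described in this post, so the ligand selection should be checked before this function is trusted.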
Cross-connection: Riluzole on SMN2 (SMN2_Riluzole_holo) binds the SAME pocket (GLY294, SER271, VAL272, CYS658, PRO268, TYR657 — shared with 4-AP at PRO268, SER271, TYR657). Two structurally different compounds, same pocket → this is a real druggable site, not a co-solvent artifact.
2. LIMK2 pipeline validated by reference compound contact persistence
Several LIMK2 reference compound MDs (BMS-5, LIMKi3) showed clean, stable contact patterns at the predicted ATP-pocket residues, validating the structure-based scoring + POCKET_FIXED placement protocol that our active LIMK2-selective screen depends on.
3. CFL2 + 4-AP cross-connection RETRACTED
The April 10 CROSS_CONNECTIONS document (Insight 1) claimed a 4-AP + CFL2 simulation existed and connected the LIMK2-selective campaign to the 4-AP campaign through a shared CFL2 readout. The orphan analysis revealed that CFL2_gpu33887147.dcd is actually an APO CFL2 simulation (35,150 atoms = protein + solvent only, no ligand). The claimed 4-AP + CFL2 MD never happened. Insight 1 is retracted. A revised insight (the SMN2 pocket shared with Riluzole, finding #1 above) replaced it.
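One way to catch an apo system mislabeled as holo is to scan the topology for residue names that are neither protein nor solvent or ions. The sketch below assumes a PDB-format topology. The function name and the residue-name sets are illustrative and not exhaustive; force-field-specific names may need to be added.

```python
# Assumed, non-exhaustive residue-name sets for illustration.
PROTEIN = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "HID", "HIE", "HIP", "HSD", "HSE", "HSP", "CYX", "ACE", "NME",
}
SOLVENT_AND_IONS = {
    "HOH", "WAT", "TIP3", "SOL", "NA", "CL", "K", "SOD", "CLA", "POT",
    "NA+", "CL-", "MG", "CA", "ZN",
}

def candidate_ligand_residues(pdb_lines):
    """Return residue names that are neither protein nor solvent/ions.

    An empty result suggests an apo system (protein + solvent only).
    """
    names = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # PDB residue name is in columns 18-20; many MD tools spill a
            # fourth character into column 21 (e.g. TIP3), so read both.
            resname = line[17:21].strip().upper()
            if resname and resname not in PROTEIN | SOLVENT_AND_IONS:
                names.add(resname)
    return names
```

Usage would look like `candidate_ligand_residues(open(path))` for a hypothetical topology path; an empty set for a file that is supposed to contain a ligand is a reason to stop and check the system before interpreting anything.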
The topology atom-count learning
The 4-AP SMN2 false negative was caused by a silent atom-count mismatch between the topology PDB and the DCD trajectory. MDAnalysis silently dropped the frames where the ligand selection returned empty, which the contact analysis then read as "no binding." This is a class of bug that turns positives into negatives without any error message. Hard rule going forward:
Before writing any "no binding" or "unstable" conclusion from an MD trajectory, verify topology atom count matches DCD atom count, verify the ligand selection is non-empty, and cross-check surprising negatives with a second analysis.
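The first two checks in that rule can be made mechanical. Here is a hedged sketch of a guard that refuses to let a negative conclusion through when the inputs are suspect. The exception and function names are hypothetical and this is not the project's actual code. With MDAnalysis, the three counts would typically come from `universe.atoms.n_atoms`, `universe.trajectory.n_atoms`, and `universe.select_atoms(...).n_atoms`.

```python
class SuspectNegativeError(RuntimeError):
    """Raised when a 'no binding' result cannot be trusted."""

def check_negative_is_trustworthy(topology_n_atoms, trajectory_n_atoms,
                                  ligand_n_atoms):
    """Validate the inputs behind a negative MD result.

    Raises SuspectNegativeError instead of returning False so that a
    pipeline cannot silently continue past a failed check.
    """
    if topology_n_atoms != trajectory_n_atoms:
        raise SuspectNegativeError(
            f"topology has {topology_n_atoms} atoms but trajectory has "
            f"{trajectory_n_atoms}; atom indices may be misaligned"
        )
    if ligand_n_atoms == 0:
        raise SuspectNegativeError(
            "ligand selection is empty; zero contacts would be meaningless"
        )
    return True
```

The third check, a second independent analysis of any surprising negative, is a process step and is not covered by this guard.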
Full learning published at: https://github.com/Bryzant-Labs/sma-research/blob/main/docs/learnings/topology_atom_count.md
Method
- analyze_orphan_trajectory.py: single-pass MDAnalysis 2.10 analysis (Kabsch protein RMSD, minimum-image PBC ligand-pocket distance, contact persistence, energy drift)
- batch_analyze_orphans.py: runner with topology hint map
- fix_topology_atoms.py: strips tail waters from topology to match DCD atom count
- 44/50 trajectories analyzed (6 missing topology files)
- Compute: 8-core CPU, 1-60 s per trajectory
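For readers unfamiliar with the minimum-image convention used in the ligand-pocket distance, here is a minimal sketch. It assumes an orthorhombic (rectangular) box, which is a simplification: triclinic boxes need a more general treatment. The function name is illustrative and this is not the code in analyze_orphan_trajectory.py.

```python
import math

def minimum_image_distance(a, b, box):
    """Distance between points a and b under periodic boundaries.

    a, b: (x, y, z) coordinates.
    box:  (lx, ly, lz) edge lengths of an orthorhombic box (assumed).
    """
    total = 0.0
    for ai, bi, length in zip(a, b, box):
        d = ai - bi
        # Shift the separation by whole box lengths so that it lies
        # within half a box length of zero (nearest periodic image).
        d -= length * round(d / length)
        total += d * d
    return math.sqrt(total)
```

Without this correction, a ligand that crosses a box face can appear to jump far from its pocket in the raw coordinates even though it never left.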
Where the data lives
- Full report: https://github.com/Bryzant-Labs/sma-research/blob/main/findings/2026-04-10/ORPHAN_TRAJECTORY_ANALYSIS.md
- Cross-connections (with retraction): https://github.com/Bryzant-Labs/sma-research/blob/main/findings/2026-04-10/CROSS_CONNECTIONS_2026-04-10.md
- Topology learning: https://github.com/Bryzant-Labs/sma-research/blob/main/docs/learnings/topology_atom_count.md
- Analysis scripts: https://github.com/Bryzant-Labs/sma-research/tree/main/scripts/md_analysis
Why this matters
Three scientific principles, all enforced in this single analysis run:
- Don't waste compute — completed runs that never get analyzed are scientific waste
- Verify negative results twice — a topology bug nearly closed two real discoveries
- Publish retractions with the same rigor as discoveries — Insight 1 is retracted publicly, not silently fixed
CC-BY-4.0. All artifacts open-source.