Evidence Calibration
Grade A
Bayesian back-testing of convergence scores against known drug outcomes, the critical self-check that separates rigorous research from speculation. For each drug with a known clinical outcome, the platform asks: did our evidence scoring predict the right outcome?
How does the calibration process work?
(1) Outcome collection: gather real-world drug approval and failure data from ClinicalTrials.gov and FDA records.
(2) Score comparison: compare each drug’s convergence score against its actual clinical outcome. Approved drugs (nusinersen, risdiplam, onasemnogene) should score high; failed drugs should score low.
(3) Bayesian updating: compute posterior probabilities measuring how well evidence mass predicts clinical success.
(4) Grade assignment: assign a grade from A (well calibrated) to F (poorly calibrated).
A well-calibrated platform means researchers can trust the convergence scores when evaluating novel, untested targets.
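The Bayesian updating in step (3) is not specified in detail here, so the following is only a minimal sketch of one common approach: a Beta-Binomial update on the success rate of drugs whose convergence score clears a cutoff. The function name `posterior_success_rate`, the 0.5 threshold, and the uniform Beta(1, 1) prior are all assumptions made for illustration, not the platform's actual method.

```python
def posterior_success_rate(records, threshold=0.5, prior_a=1.0, prior_b=1.0):
    """Beta-Binomial posterior for P(success | score >= threshold).

    records: iterable of (score, succeeded) pairs, score in [0, 1].
    threshold, prior_a, prior_b: hypothetical defaults, not from the source.
    Returns (posterior_mean, (a, b)) where Beta(a, b) is the posterior.
    """
    # Keep only the outcomes of drugs the scoring flagged as promising.
    high = [succeeded for score, succeeded in records if score >= threshold]
    successes = sum(1 for succeeded in high if succeeded)
    failures = len(high) - successes

    # Conjugate update: add observed counts to the prior pseudo-counts.
    a = prior_a + successes
    b = prior_b + failures
    return a / (a + b), (a, b)


# Made-up records purely for demonstration (not platform data).
example = [(0.8, True), (0.7, True), (0.9, False), (0.2, False)]
mean, (a, b) = posterior_success_rate(example)
print(f"posterior mean success rate for high scorers: {mean:.2f}")
```

A posterior mean close to the scores being assigned would suggest the scores behave like probabilities; a large gap would suggest miscalibration.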
Calibration Curve
Each bin groups drugs by their convergence score. The green line shows actual clinical success rate within that bin. The dashed line is perfect calibration (predicted = actual). Bins with n=0 have no scored outcomes.
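The binning behind a curve like this can be sketched as follows. This is an illustration under assumptions: equal-width bins over scores in [0, 1], five bins, and a hypothetical function name `calibration_bins`; the platform's real bin edges are not stated here.

```python
def calibration_bins(records, n_bins=5):
    """Group (score, succeeded) pairs into equal-width score bins.

    Returns one tuple per bin: (low, high, n, mean_score, success_rate).
    Bins with n == 0 report None for both mean_score and success_rate.
    """
    bins = [[] for _ in range(n_bins)]
    for score, succeeded in records:
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(score * n_bins), n_bins - 1)
        bins[idx].append((score, succeeded))

    rows = []
    for i, members in enumerate(bins):
        low, high = i / n_bins, (i + 1) / n_bins
        n = len(members)
        if n == 0:
            rows.append((low, high, 0, None, None))
            continue
        mean_score = sum(s for s, _ in members) / n          # predicted
        success_rate = sum(1 for _, ok in members if ok) / n  # actual
        rows.append((low, high, n, mean_score, success_rate))
    return rows


# Made-up records purely for demonstration (not platform data).
demo = [(0.10, False), (0.15, False), (0.65, True), (0.62, False), (0.95, True)]
for low, high, n, predicted, actual in calibration_bins(demo):
    print(f"[{low:.1f}, {high:.1f}) n={n} predicted={predicted} actual={actual}")
```

Perfect calibration corresponds to `mean_score` equal to `success_rate` in every populated bin, which is the dashed diagonal on the plot.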
Calibration Metrics
Convergence by Outcome Group
| Outcome | Count | Scored | Mean Convergence | Median | Min | Max |
|---|---|---|---|---|---|---|
| success | 177 | 133 | 63.0% | 62.9% | 62.9% | 63.2% |
| failure | 5 | 0 | — | — | — | — |
| ongoing | 20 | 8 | 63.0% | 62.9% | 62.9% | 63.2% |
Uncertainty Quantification
Uncertainty is quantified with Wilson score confidence intervals on each target's support ratio. Grades run from A (narrow CI, high certainty) to D (wide CI, more evidence needed). CI width reflects evidence volume, source diversity, and claim consistency per target.
Low certainty -- limited sources or wide CI
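For reference, the Wilson score interval for a support ratio can be computed with the standard formula, sketched below. The function name `wilson_interval`, the argument names, and the 95% level (z = 1.96) are assumptions for illustration. The width thresholds that map an interval to grades A through D are not given here, so the sketch stops at the interval itself. A plain Wilson interval depends only on the counts, so any adjustment for source diversity or claim consistency would have to happen elsewhere.

```python
import math


def wilson_interval(supporting, total, z=1.96):
    """Wilson score interval for the proportion supporting / total.

    z=1.96 corresponds to roughly 95% confidence (an assumed default).
    With no observations the interval is the whole range [0, 1].
    """
    if total == 0:
        return (0.0, 1.0)
    p = supporting / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total
                               + z * z / (4 * total * total)) / denom
    return (max(0.0, center - half_width), min(1.0, center + half_width))


# Same 80% support ratio, different evidence volume (made-up counts).
for supporting, total in [(8, 10), (80, 100)]:
    low, high = wilson_interval(supporting, total)
    print(f"{supporting}/{total}: [{low:.3f}, {high:.3f}] width={high - low:.3f}")
```

The two calls show why more evidence narrows the interval: the ratio is unchanged, but the tenfold larger sample gives a noticeably tighter interval.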