SAFETYGATE QEC DEMO

Postselect decoded QEC shots by matching-weight confidence. V7 temporal scorer hits 41.3% LER reduction on IBM Fez (block-stratified, 5-fold CV). Drift-stratified weight-only baseline: ~7.7%.

SURFACE CODE CONFIGURATION

Calibrated to hardware validation data from IBM Heron and Rigetti Ankaa-3 (Feb 2026).

WITHOUT SAFETYGATE

All shots accepted (no veto). Reported counters: Total Shots, Logical Errors, Logical Error Rate (LER).

WITH SAFETYGATE

High-risk shots vetoed. Reported counters: Accepted, Vetoed, Errors, Logical Error Rate (LER).

MULTI-VENDOR HARDWARE RESULTS

SafetyGate adapts to each vendor's noise profile. Calibration windows last hours, not days, so results are reported per regime: stable sessions deliver a larger uplift, drift-stratified sessions a smaller one. The safety guarantee is the same in both.

Stable-regime sessions: 41.3% LER reduction (IBM Torino + Rigetti Ankaa-3)

Drift-stratified (IBM Fez): 7.7% LER reduction (CW matching-weight, stable blocks)

Throughput retained: 71-96% (regime-dependent)

Total validated: 26,000+ shots (4 backends, 2 vendors)

Hardware validation: IBM Heron (Torino, Fez, Marrakesh) + Rigetti Ankaa-3 (Feb-Apr 2026). The Fez session on April 7 showed intra-session drift (a monotonic detection-rate ramp, R² = 0.992), which is expected physics for superconducting qubits between calibration windows. The earlier “23.1%” CW number has been retired; the honest drift-stratified baseline is 7.7%. See the SafetyGate guide for the full drift-audit methodology.
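One way to quantify a detection-rate ramp like the one described above is a straight-line fit over per-block detection rates, reporting the slope and R². The sketch below is only an illustration of that idea: the function names and the sample values are hypothetical, and it is not the drift-audit procedure from the SafetyGate guide.

```python
def linear_fit_r2(xs, ys):
    """Least-squares line through (xs, ys). Returns (slope, intercept, r2)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear fit, R^2 equals the squared Pearson correlation.
    # It is undefined when ys has no variance at all.
    r2 = float("nan") if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


def is_monotonic_ramp(ys):
    """True if the series never decreases from one block to the next."""
    return all(b >= a for a, b in zip(ys, ys[1:]))


# Hypothetical per-block detection rates (illustrative values, not measured data).
block_index = [0, 1, 2, 3, 4, 5]
detection_rate = [0.100, 0.104, 0.109, 0.113, 0.118, 0.121]

slope, intercept, r2 = linear_fit_r2(block_index, detection_rate)
print(f"slope={slope:.5f} per block, R^2={r2:.3f}, "
      f"monotonic={is_monotonic_ramp(detection_rate)}")
```

A positive slope with R² close to 1 and a non-decreasing series is the signature of a steady ramp rather than random block-to-block scatter.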

THE SCIENCE

Surface Code QEC

Surface codes encode logical qubits in a 2D grid of physical qubits. Repeated syndrome measurements detect faults, and a decoder attempts to correct them. However, correlated errors can create patterns that fool the decoder.

SafetyGate Veto

SafetyGate analyzes error indicators to compute a risk score. Shots likely to cause errors are vetoed before they can corrupt results.
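The veto step can be pictured as a threshold on a per-shot risk score. The following minimal sketch assumes each shot already has a score and a known decoding outcome; `gate_shots`, the threshold value, and the sample data are all hypothetical and do not describe SafetyGate's actual scorer.

```python
def gate_shots(risk_scores, logical_errors, threshold):
    """Veto shots whose risk score exceeds `threshold` and summarize the effect.

    risk_scores    -- one float per shot (higher means riskier)
    logical_errors -- one 0/1 flag per shot (1 = decoded result was wrong)
    """
    if len(risk_scores) != len(logical_errors) or not risk_scores:
        raise ValueError("need equal-length, non-empty inputs")
    total = len(risk_scores)
    accepted = [err for score, err in zip(risk_scores, logical_errors)
                if score <= threshold]
    baseline_ler = sum(logical_errors) / total
    # Gated LER is undefined if every shot was vetoed.
    gated_ler = sum(accepted) / len(accepted) if accepted else None
    return {
        "total": total,
        "accepted": len(accepted),
        "vetoed": total - len(accepted),
        "baseline_ler": baseline_ler,
        "gated_ler": gated_ler,
    }


# Illustrative data only: eight shots, three of which decode incorrectly.
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.1, 0.2, 0.7]
errors = [0, 0, 1, 0, 1, 0, 1, 0]
summary = gate_shots(scores, errors, threshold=0.5)
print(summary)
```

In this toy example the gate removes two of the three failing shots but also vetoes one shot that would have decoded correctly, which is the throughput cost of postselection.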

HARDWARE VALIDATED (FEB 2026)

Validated on IBM Heron and Rigetti Ankaa-3 quantum processors. SafetyGate reduces logical error rate by 39-57% on real hardware, across 26,000 shots on 4 backends from 2 vendors.

LER Reduction: 39-57%

Hardware Shots: 26,000

Vendors: 2

Backends: 4

Three-Regime Governance

SafetyGate adapts behaviour to the current calibration regime automatically. Calibration windows drift; SafetyGate responds.

| Regime | Backend | Shots | Baseline LER | Gated LER | Action |
|---|---|---|---|---|---|
| Below threshold | Marrakesh | 2,000 | 0.00% | 0.00% | 99.9% allow |
| Near threshold | Torino | 18,000 | 0.34% | 0.21% | 39% reduction |
| Near threshold | Ankaa-3 (Rigetti) | 8,000 | 0.21% | 0.09% | 57% reduction |
| Above threshold | Torino (IBM) | 12,000 | 38-49% | n/a | 100% block |

d=3 repetition code. IBM Heron: 6 jobs, 62 doomed shots. Rigetti Ankaa-3: 1 job, 17 doomed shots.

For Operators

SafetyGate vetoes 29% of shots, retaining 71% throughput. Among allowed shots, logical error rate drops 39%. Net effect: higher reliability at moderate throughput cost.
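The reduction figures follow from the baseline and gated error rates as a relative change. As a quick check, the sketch below recomputes them from the rounded near-threshold rates in the governance table; the helper name is an assumption for illustration.

```python
def relative_ler_reduction(baseline_ler, gated_ler):
    """Fractional drop in logical error rate among accepted shots."""
    if baseline_ler <= 0:
        raise ValueError("baseline LER must be positive")
    return 1.0 - gated_ler / baseline_ler


# Rounded rates as published in the near-threshold rows.
near_threshold = {
    "Torino": (0.0034, 0.0021),
    "Ankaa-3": (0.0021, 0.0009),
}

for backend, (baseline, gated) in near_threshold.items():
    reduction = relative_ler_reduction(baseline, gated)
    print(f"{backend}: {reduction:.1%} reduction")
```

From the rounded inputs this gives roughly 38% for Torino and 57% for Ankaa-3. The small gap to the quoted 39% for Torino is consistent with the published rates being rounded to two decimal places.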

For QEC Researchers

Multi-round syndrome analysis is the key differentiator. Detection accuracy degrades significantly without temporal context, confirming that single-round heuristics miss correlated failure modes.
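To illustrate what temporal context adds, the sketch below turns a sequence of syndrome rounds into detection events (the XOR of consecutive rounds) and derives a simple cross-round feature. The feature names are hypothetical, the first round is compared against an all-zero reference as a simplifying assumption, and none of this is the V7 temporal scorer itself.

```python
def detection_events(syndrome_rounds):
    """XOR each syndrome round with the previous one.

    syndrome_rounds -- list of rounds, each a list of 0/1 stabilizer outcomes.
    The first round is compared against an all-zero reference (an assumption).
    """
    if not syndrome_rounds:
        return []
    events = []
    previous = [0] * len(syndrome_rounds[0])
    for current in syndrome_rounds:
        events.append([a ^ b for a, b in zip(previous, current)])
        previous = current
    return events


def temporal_features(syndrome_rounds):
    """Toy features; only `repeat_fires` needs more than one round."""
    events = detection_events(syndrome_rounds)
    per_round = [sum(row) for row in events]
    # Count stabilizers that fire in two consecutive rounds, a crude proxy
    # for time-correlated faults that a single-round view cannot see.
    repeat_fires = sum(
        1
        for prev, cur in zip(events, events[1:])
        for a, b in zip(prev, cur)
        if a and b
    )
    return {
        "total_events": sum(per_round),
        "max_events_in_round": max(per_round, default=0),
        "repeat_fires": repeat_fires,
    }


rounds = [[1, 0], [0, 0], [1, 1]]  # illustrative outcomes for 2 stabilizers
print(detection_events(rounds))
print(temporal_features(rounds))
```

A single-round heuristic would only see `total_events` or `max_events_in_round`; the repeat count is available only when rounds are analyzed together.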

Fail-Closed Governance

When the backend drifts outside its published calibration window, SafetyGate blocks 100% of shots rather than returning results with understated error bars. This prevents compute waste and protects downstream applications from results they cannot defend.
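Fail-closed behaviour means that any doubt about calibration status resolves to a block rather than an allow. A minimal sketch of such a decision function follows; the `SessionStatus` type, the `decide` function, and the threshold are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionStatus:
    # True = inside the published calibration window, False = outside,
    # None = unknown (for example, calibration metadata unavailable).
    in_calibration_window: Optional[bool]


def decide(shot_risk: Optional[float], status: SessionStatus,
           veto_threshold: float) -> str:
    """Return "allow", "veto", or "block" for a single shot."""
    # Fail closed: anything other than a confirmed in-window session blocks.
    if status.in_calibration_window is not True:
        return "block"
    # A missing or invalid score is treated as unsafe, not as low risk.
    if shot_risk is None or math.isnan(shot_risk):
        return "block"
    return "allow" if shot_risk <= veto_threshold else "veto"


print(decide(0.1, SessionStatus(True), 0.5))   # allow
print(decide(0.9, SessionStatus(True), 0.5))   # veto
print(decide(0.1, SessionStatus(False), 0.5))  # block
print(decide(0.1, SessionStatus(None), 0.5))   # block
```

The key design choice is that the default branch is the restrictive one, so a missing status or score can never let a shot through.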

VALIDATED PERFORMANCE MATRIX

100,000 Stim-simulated episodes across 4 code distances and 5 noise regimes. Wilson 95% confidence intervals. Every cell shows 88-100% LER reduction.
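For reference, the Wilson score interval for a binomial proportion can be computed as in the sketch below, using the standard formula with z = 1.96 for a 95% interval. The function name and the example counts are illustrative and are not taken from the validation data.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion; returns (low, high)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials * trials)
    )
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Illustrative counts: 0 logical errors observed in 100 accepted shots.
low, high = wilson_interval(0, 100)
print(f"95% interval: [{low:.4f}, {high:.4f}]")
```

Unlike the naive normal approximation, the Wilson interval stays informative when zero errors are observed, which matters for cells where the gated error count is very small.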

LER Reduction: 88-100%

Episodes Validated: 100,000

Test Configurations: 20

Typical FAR-bad: <1%

| Distance | STABLE | AGING | STEP CHANGE | GLITCH | LEAKAGE |
|---|---|---|---|---|---|
| d=3 | 97.4% (FAR: 1.4%, Allow: 51%) | 95.8% (FAR: 1.8%, Allow: 42%) | 93.8% (FAR: 2.4%, Allow: 39%) | 91.4% (FAR: 4.7%, Allow: 54%) | 100.0% (FAR: 0.0%, Allow: 52%) |
| d=5 | 96.7% (FAR: 1.0%, Allow: 31%) | 97.3% (FAR: 0.4%, Allow: 15%) | 96.0% (FAR: 0.6%, Allow: 14%) | 95.9% (FAR: 1.1%, Allow: 28%) | 98.4% (FAR: 0.5%, Allow: 28%) |
| d=7 | 96.6% (FAR: 0.7%, Allow: 19%) | 99.0% (FAR: 0.1%, Allow: 6%) | 95.8% (FAR: 0.2%, Allow: 5%) | 96.1% (FAR: 0.6%, Allow: 16%) | 94.8% (FAR: 0.9%, Allow: 17%) |
| d=9 | 94.6% (FAR: 0.7%, Allow: 12%) | 88.3% (FAR: 0.2%, Allow: 2%) | 91.8% (FAR: 0.1%, Allow: 2%) | 95.3% (FAR: 0.4%, Allow: 8%) | 96.5% (FAR: 0.3%, Allow: 10%) |

LER Reduction %. FAR-bad = fraction of allowed shots that carry logical errors. Higher allow rate = more throughput retained.

SCALING WITH CODE DISTANCE

As code distance grows, temporal correlation awareness becomes essential. SafetyGate maintains 88%+ LER reduction at every distance, with mean FAR-bad dropping below 1% at d=5 and above.

LER REDUCTION (%): 88% floor across all configs

MEAN BLOCK RATE (%): 51% allow (d=3), 31% allow (d=5), 19% allow (d=7), 12% allow (d=9)

MEAN FAR-BAD (%): below 1% at d=5+

OPERATING FRONTIER

The operating frontier shows the tradeoff between throughput cost and safety benefit. Each point is a validated noise regime.

[Chart: LER Reduction (%) versus Block Rate (%), one point per noise regime: STABLE, GLITCH, LEAKAGE, AGING, STEP CHANGE]

Current Operating Point: d=5, STABLE

LER Reduction: 96.7%

Block Rate: 69.5%

Throughput Retained: 30.5%

FAR-bad: 0.99%

Baseline LER: 24.6%

Gated LER: 0.80%

96.7% of errors eliminated

Each point shows how SafetyGate adapts to a different noise condition. A higher block rate means more aggressive filtering: better safety at a throughput cost.
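A frontier of this kind can be traced by sweeping the veto threshold and recording the block rate and LER reduction at each setting. The sketch below assumes per-shot risk scores and known outcomes; the function name, thresholds, and data are hypothetical and serve only to show the shape of the tradeoff.

```python
def operating_frontier(risk_scores, logical_errors, thresholds):
    """Return one (block rate, LER reduction) point per usable threshold."""
    total = len(risk_scores)
    baseline_errors = sum(logical_errors)
    if total == 0 or total != len(logical_errors) or baseline_errors == 0:
        raise ValueError("need matching inputs with at least one logical error")
    baseline_ler = baseline_errors / total
    points = []
    for threshold in thresholds:
        kept = [err for score, err in zip(risk_scores, logical_errors)
                if score <= threshold]
        if not kept:
            continue  # every shot blocked, so gated LER is undefined
        gated_ler = sum(kept) / len(kept)
        points.append({
            "threshold": threshold,
            "block_rate": 1.0 - len(kept) / total,
            "ler_reduction": 1.0 - gated_ler / baseline_ler,
        })
    return points


# Illustrative data only: eight shots, three of which decode incorrectly.
scores = [0.1, 0.2, 0.9, 0.3, 0.8, 0.1, 0.2, 0.7]
errors = [0, 0, 1, 0, 1, 0, 1, 0]
for point in operating_frontier(scores, errors, [0.15, 0.5, 1.0]):
    print(point)
```

In this toy data, the strictest threshold blocks 75% of shots and removes every error, while the loosest blocks nothing and gives no reduction.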

FAQ

Does this replace the decoder?

No. SafetyGate runs alongside existing decoders and does not modify correction logic.

Can it veto good shots?

Yes. SafetyGate is a conservative safety layer. It may veto shots that would decode correctly in order to reduce overall logical risk.

Is this backend-specific?

SafetyGate uses vendor-specific noise profiles, but the approach is hardware-agnostic. It has been validated on IBM Heron (3 backends) and Rigetti Ankaa-3, with <0.5% cross-backend variance within IBM.

Technical Notes

SafetyGate computes a risk score from error indicators.

High scores indicate error patterns that statistically correlate with failure.