Every agent decision is traced with OpenInference, scored by LLM-as-judge, and queryable by the agent itself.
| Trace | Operation | Latency | Verdict | G · P · R |
|---|---|---|---|---|
| TR-9F3A3F | investigate · UPI | 2.1s | allow | 93 · 90 · 88 |
| TR-9F3A3E | investigate · IMPS | 2.6s | stepup | 90 · 86 · 83 |
| TR-9F3A3D | investigate · UPI | 1.9s | allow | 94 · 92 · 90 |
| TR-9F3A3C | investigate · UPI | 3.1s | block | 86 · 81 · 74 |
| TR-9F3A3B | re-score · UPI | 2.0s | allow | 92 · 91 · 89 |
| TR-9F3A3A | investigate · NEFT | 2.9s | hold | 88 · 85 · 78 |
| TR-9F3A39 | investigate · UPI | 1.9s | allow | 95 · 93 · 91 |
| TR-9F3A38 | investigate · UPI | 2.4s | stepup | 91 · 87 · 85 |
| TR-9F3A37 | investigate · UPI | 1.9s | allow | 94 · 92 · 90 |
Simulated example of the agent reading its own failing traces from Phoenix.
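
A minimal sketch of that self-review step, in plain Python and without any Phoenix client calls: it re-encodes the trace rows from the table as records and selects the ones where any judge score falls below a cutoff. It assumes the G · P · R column holds three two-digit scores (groundedness, policy-fit, reason-code, matching the experiment columns). `TraceRow`, `failing_traces`, and the cutoff of 80 are hypothetical names and values chosen for illustration, not part of the described system.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRow:
    """One row of the trace table (hypothetical record shape)."""
    trace_id: str
    operation: str
    latency_s: float
    verdict: str
    groundedness: int
    policy_fit: int
    reason_code: int

    def min_score(self) -> int:
        # Lowest of the three judge scores for this trace.
        return min(self.groundedness, self.policy_fit, self.reason_code)


# Rows copied from the trace table; scores read as G, P, R (assumption).
TRACES = [
    TraceRow("TR-9F3A3F", "investigate · UPI", 2.1, "allow", 93, 90, 88),
    TraceRow("TR-9F3A3E", "investigate · IMPS", 2.6, "stepup", 90, 86, 83),
    TraceRow("TR-9F3A3D", "investigate · UPI", 1.9, "allow", 94, 92, 90),
    TraceRow("TR-9F3A3C", "investigate · UPI", 3.1, "block", 86, 81, 74),
    TraceRow("TR-9F3A3B", "re-score · UPI", 2.0, "allow", 92, 91, 89),
    TraceRow("TR-9F3A3A", "investigate · NEFT", 2.9, "hold", 88, 85, 78),
    TraceRow("TR-9F3A39", "investigate · UPI", 1.9, "allow", 95, 93, 91),
    TraceRow("TR-9F3A38", "investigate · UPI", 2.4, "stepup", 91, 87, 85),
    TraceRow("TR-9F3A37", "investigate · UPI", 1.9, "allow", 94, 92, 90),
]


def failing_traces(rows: list[TraceRow], min_score: int = 80) -> list[TraceRow]:
    """Return rows whose lowest judge score is below `min_score`.

    The cutoff is a hypothetical value, not one stated by the source.
    """
    return [row for row in rows if row.min_score() < min_score]


if __name__ == "__main__":
    for row in failing_traces(TRACES):
        print(row.trace_id, row.verdict, row.min_score())
```

With the illustrative cutoff of 80, only TR-9F3A3C and TR-9F3A3A are selected, both because of their reason-code score.
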
| Experiment | Source | Cases | Groundedness | Policy-fit | Reason-code | Status |
|---|---|---|---|---|---|---|
| EXP-2210 | failing_reasoncodes | 34 | +10 pts | +11 pts | +18 pts | pending |
| EXP-2188 | scam_cluster_block | 120 | +2 pts | +5 pts | +1 pts | shipped |
| EXP-2090 | verdict_rationale | 88 | +4 pts | +1 pts | +3 pts | shipped |
| EXP-1977 | min_score_sweep | 64 | +1 pts | −2 pts | −3 pts | rolled-back |
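
The experiment rows can be checked mechanically for regressions. The sketch below is one possible gate, not the system's documented rule: it flags any experiment whose delta on any of the three judge metrics is negative. The `Experiment` record and `regressed_metrics` helper are hypothetical names, and the deltas are copied from the table.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """One row of the experiment table (hypothetical record shape)."""
    exp_id: str
    source: str
    cases: int
    groundedness: int  # delta in points
    policy_fit: int    # delta in points
    reason_code: int   # delta in points
    status: str


# Rows copied from the experiment table.
EXPERIMENTS = [
    Experiment("EXP-2210", "failing_reasoncodes", 34, 10, 11, 18, "pending"),
    Experiment("EXP-2188", "scam_cluster_block", 120, 2, 5, 1, "shipped"),
    Experiment("EXP-2090", "verdict_rationale", 88, 4, 1, 3, "shipped"),
    Experiment("EXP-1977", "min_score_sweep", 64, 1, -2, -3, "rolled-back"),
]


def regressed_metrics(exp: Experiment) -> list[str]:
    """Names of metrics whose delta is negative (assumed gate condition)."""
    deltas = {
        "groundedness": exp.groundedness,
        "policy_fit": exp.policy_fit,
        "reason_code": exp.reason_code,
    }
    return [name for name, delta in deltas.items() if delta < 0]


if __name__ == "__main__":
    for exp in EXPERIMENTS:
        bad = regressed_metrics(exp)
        label = "regressed on " + ", ".join(bad) if bad else "no regressions"
        print(exp.exp_id, exp.status, label)
```

Under this assumed rule, the only flagged row is EXP-1977, which is also the only one the table lists as rolled back.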