A Tiny, Exact Lab for Judge-Policy Self-Play
A fully enumerable toy experiment on evaluator drift, policy collapse, and the ceiling on self-improvement in judge-policy co-evolution.