Research

Long-form research on AI-agent reliability: how multi-agent systems fail, how far faults propagate, and whether the evals meant to certify those systems hold up.