Research

I build the scientific foundations for AI evaluation and the institutional structures that let governance act on evaluation evidence. My work spans three connected layers: developing psychometric and measurement-theoretic standards that determine when evaluation results genuinely justify capability or risk claims; analyzing the evaluation ecosystem to understand how reporting practices and aggregation rules shape what evidence reaches decision-makers; and diagnosing where technical analysis is necessary for AI governance and how to embed it into institutional processes.

My research has been published at ICLR, IUI, FAccT, AIES, and TMLR, has been recognized with orals and spotlights at ICML and NeurIPS, and has been covered by MIT Technology Review and The New York Times, among other outlets.

You can find the most up-to-date list of my papers on my Google Scholar profile.

Publications

Reuel, A., Hardy, A., Smith, C., Lamparth, M., & Kochenderfer, M. (2024). BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices. Under review at the 2024 Conference on Neural Information Processing Systems.

Reuel, A., Bucknall, B., Casper, S., Fist, T., Soder, L., Aarne, O., Hammond, L., Ibrahim, L., Chan, A., Wills, P., Anderljung, M., Garfinkel, B., Heim, L., Trask, A., Mukobi, G., Schaeffer, R., Baker, M., Hooker, S., Solaiman, I., Luccioni, A. S., Rajkumar, N., Moës, N., Ladish, J., Guha, N., Newman, J., Bengio, Y., South, T., Pentland, A., Koyejo, S., Kochenderfer, M. J., & Trager, R. (2024). Open Problems in Technical AI Governance. arXiv preprint arXiv:2407.14981.

Reuel, A., Soeder, L., Bucknall, B., & Undheim, T. A. (2024). On The Importance of Technical Research and Talent for AI Governance. 2024 International Conference on Machine Learning. Accepted as oral – top 1.5% of papers.

Reuel, A. & Ma, D. (2024). Fairness in Reinforcement Learning: A Survey. AAAI AI, Ethics & Society 2024.

Rivera, J.-P.*, Mukobi, G.*, Reuel, A.*, Lamparth, M., Smith, C., & Schneider, J. (2024). Escalation Risks from Language Models in Military and Diplomatic Decision-Making. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), 836-898.

Reuel, A.* & Undheim, T. A.* (2024). Generative AI Needs Adaptive Governance. Under review at Digital Policy, Regulation and Governance.

Undheim, T. A. & Reuel, A. (2024). A Literature Review of AI Governance Trends, 2020-2024. Under review at AI & Society.

Trager, R., Harack, B., Reuel, A., Carnegie, A., Heim, L., Ho, L., Kreps, S., Lall, R., Larter, O., Ó hÉigeartaigh, S., Staffell, S., & Villalobos, J. (2023). International Governance of Civilian AI: A Jurisdictional Certification Approach. arXiv preprint arXiv:2308.15514.

Nie, A., Reuel, A., & Brunskill, E. (2023). Understanding the Impact of Reinforcement Learning Personalization on Subgroups of Students in Math Tutoring. International Conference on Artificial Intelligence in Education, pp. 688–694.

Schuett, J., Reuel, A., & Carlier, A. (2023). How to design an AI ethics board. AI & Ethics.

Lamparth, M., & Reuel, A. (2023). Analyzing And Editing Inner Mechanisms Of Backdoored Language Models. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24).

Reuel, A., Peralta, S., Sedoc, J., Sherman, G., & Ungar, L. (2022). Measuring the Language of Self-Disclosure across Corpora. Findings of the 60th Annual Meeting of the Association for Computational Linguistics 2022.

Reuel, A., Koren, M., Corso, A., & Kochenderfer, M. (2021). Using Adaptive Stress Testing to Identify Paths to Ethical Dilemmas in Autonomous Systems. Proceedings of the AAAI-22 Workshop on Artificial Intelligence Safety.

*Equal contribution.