Steven Basart
Computer Science PhD
Education
University of Chicago PhD in Computer Science (CS) Dec 2020
University of Miami B.S. in Biochemistry and CS May 2014
Technical Skills
Languages (in order of proficiency): Python, C, C++, Rust, JavaScript
Machine Learning: PyTorch, TensorFlow, scikit-learn
Web Tools: HTML/CSS, Node.js, Selenium
Other Tools: Git, Docker, Linux, Bash
Professional History
Center for AI Safety (CAIS)
Research Manager May 2023 to present
Led technical research teams and hired all full-time technical and research staff; teams collectively produce 10 research papers per year.
Cofounder/Research Engineer May 2022 to May 2023
Research Engineer responsibilities: Collected data, evaluated and fine-tuned Hugging Face models, and led small technical teams on safety research projects.
Reliability Engineer responsibilities: Designed the company's technical infrastructure and procedures, including GitHub Actions pipelines and a formalized code review process. Set up the CAIS compute cluster, now used by over 300 users for safety research.
SpaceX
Software Engineer II February 2021 to May 2022
Updated the Kubernetes environment to support new satellites. Worked on the Starlink mobility effort to enable mobile User Terminals (UTs). Built a low-level RAM interface used to store critical satellite ephemeris data.
Autobon AI
Head of AI August 2019 to September 2020
Led development of AI/ML infrastructure at Autobon, designing data ingestion into Amazon AWS, constructing labeling tasks, and performing quality assurance over the labeled data.
Google Brain
Research Intern May 2018 to September 2018
Researched fact checking, developed solutions to the problem of content abuse, and collaborated with the Google News team.
Publications
Remote Labor Index: Measuring AI Automation of Remote Work 2025
Built a broadly multi-sector benchmark of real-world, economically valuable remote-work projects to evaluate end-to-end agent performance in practical settings.
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? NeurIPS 2024
Quantified which AI safety benchmarks actually measure safety by analyzing whether task performance improves automatically as models scale.
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning ICML 2024
Designed the research methodology and developed a benchmark for measuring information hazards and a baseline unlearning method for removing dangerous knowledge from large language models.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal ICML 2024
Designed a standardized evaluation framework for probing models against harmful behaviors including misinformation, cybercrime, and chemical/drug synthesis.
Representation Engineering: A Top-Down Approach to AI Transparency 2024
Introduced a top-down approach to probing model internals, enabling both interpretability and control of learned representations.
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark ICML 2023
Coordinated research planning and resource allocation for the benchmark evaluating ethical trade-offs in agent decision-making.
Scaling Out-of-Distribution Detection for Real-World Settings ICML 2022
Constructed synthetic and curated real-world datasets for anomaly segmentation, demonstrating that classic approaches can match or exceed modern methods.
Measuring Coding Challenge Competence With APPS NeurIPS 2021
Collected and built an evaluation benchmark for converting natural language programming problems into executable Python, with automated comparison against ground truth solutions.
Measuring Mathematical Problem Solving With the MATH Dataset NeurIPS 2021
Developed procedural generation functions for creating math problems across difficulty levels to evaluate model competency.
Aligning AI With Shared Human Values ICLR 2021
Created a benchmark measuring how well models align with human moral judgments.
Towards Robustness of Neural Networks Thesis 2021
Unified prior robustness work and extended it to few-shot learning settings.
Many Faces of Robustness: An Analysis of Out-of-Distribution Generalization ICCV 2021
Collected a new dataset and introduced a technique achieving state-of-the-art out-of-distribution generalization.
Natural Adversarial Examples ICML 2019
Constructed a dataset capturing long-tail distributions to expose generalization failures in current models.
DIODE: A Dense Indoor and Outdoor DEpth Dataset 2019
Captured indoor and outdoor scenes with a single depth sensor to produce the highest-accuracy depth dataset at time of publication.
Analysis of Generative Adversarial Models 2017
Introduced a quantitative measure for assessing generative model quality and a method for using GANs to interpret classifiers.