Independent Researcher | AGI Safety • Interpretability • Reinforcement Learning
UC Berkeley (CHAI), MATS Program
[email protected] | 7vik.io | Scholar | LinkedIn | GitHub
On a quest to understand intelligence and ensure that advanced AGI is safe and beneficial.
I'm an independent AI safety researcher currently working with:
- CHAI, UC Berkeley – Optimal exploration and long-horizon planning in RL.
- Adrià Garriga-Alonso (FAR AI) – Studying deceptive behavior in frontier AI systems at the MATS Program.
- Nandi Schoots (Oxford) – Hierarchical representations and modular training for interpretability.
Previously:
- Microsoft Research – Worked with Neeraj Kayal on representation learning theory, and with Amit Sharma and Amit Deshpande on in-context learning (ICL) robustness in LLMs.
- Wadhwani AI – Formulated AI problems in public health and built robust, interpretable ML systems for large-scale deployments in India.
- Mentored a SPAR 2025 project on zero-knowledge auditing for undesired behaviors.
- AI Alignment & Safety
- Interpretability & Feature Geometry
- Long-horizon RL & Planning
- Representation Learning & Theory
*Equal contribution; full list at Google Scholar
- Intricacies of Feature Geometry in Large Language Models
  ICLR 2025 (poster); Runner-up, ICLR Blog Awards
  Code | Blog
- Among Us: A Sandbox for Measuring and Detecting Agentic Deception
  Under Review
  Poster | Blog
- Auditing Language Models for Hidden Objectives
  Anthropic (external collaboration)
  Anthropic Blog | Blog
- Progress Measures for Grokking on Real-world Tasks
  ICML 2024 Workshop on High-dimensional Learning Dynamics
  Code
- Challenges in Mechanistically Interpreting Model Representations
  ICML 2024 Workshop on Mechanistic Interpretability
  Code
- A is for Absorption: Studying Feature Splitting and Absorption in SAEs
  Under Review
- CataractBot: An LLM-Powered Expert-in-the-Loop Chat System
  IMWUT / UbiComp 2025
  Code
- Predicting Treatment Adherence of Tuberculosis Patients at Scale
  PMLR 2022; Outstanding Paper, NeurIPS 2022
  Media Coverage
- AmongUs – Agentic deception sandbox
- nice-icl – ICL optimization tools
- grokking – Measuring grokking dynamics
- byoeb – Healthcare LLM deployment platform
- Email: [email protected]
- Website: 7vik.io
- LinkedIn: @7vik
- Open to collaborations in interpretability, alignment, deception audits, and theoretical ML.