Independent Researcher | AGI Safety • Interpretability • Reinforcement Learning
UC Berkeley (CHAI), MATS Program
[email protected] | 7vik.io | Scholar | LinkedIn | GitHub
On a quest to understand intelligence and ensure that advanced AGI is safe and beneficial.
I'm an independent AI safety researcher currently working with:
- CHAI, UC Berkeley – Optimal exploration and long-horizon planning in RL.
- Adrià Garriga-Alonso (FAR AI) – Studying deceptive behavior in frontier AI systems at the MATS Program.
- Nandi Schoots (Oxford) – Hierarchical representations and modular training for interpretability.
Previously:
- Microsoft Research – Worked with Neeraj Kayal on representation learning theory, and with Amit Sharma and Amit Deshpande on in-context learning (ICL) robustness in LLMs.
- Wadhwani AI – Formulated AI problems in public health and built robust, interpretable ML systems for large-scale deployments in India.
- Mentored a SPAR 2025 project on zero-knowledge auditing for undesired behaviors.
- AI Alignment & Safety
- Interpretability & Feature Geometry
- Long-horizon RL & Planning
- Representation Learning & Theory
*Equal contribution; full list at Google Scholar
- Intricacies of Feature Geometry in Large Language Models
  ICLR 2025 (poster); Runner-up, ICLR Blog Awards
  Code | Blog
- Among Us: A Sandbox for Measuring and Detecting Agentic Deception
  Under Review
  Poster | Blog
- Auditing Language Models for Hidden Objectives
  Anthropic (external collaboration)
  Anthropic Blog | Blog
- Progress Measures for Grokking on Real-world Tasks
  ICML 2024 Workshop on High-dimensional Learning Dynamics
  Code
- Challenges in Mechanistically Interpreting Model Representations
  ICML 2024 Workshop on Mechanistic Interpretability
  Code
- A is for Absorption: Studying Feature Splitting and Absorption in SAEs
  Under Review
- CataractBot: An LLM-Powered Expert-in-the-Loop Chat System
  IMWUT / UbiComp 2025
  Code
- Predicting Treatment Adherence of Tuberculosis Patients at Scale
  PMLR 2022; Outstanding Paper, NeurIPS 2022
  Media Coverage
- AmongUs – Agentic deception sandbox
- nice-icl – ICL optimization tools
- grokking – Measuring grokking dynamics
- byoeb – Healthcare LLM deployment platform
- Email: [email protected]
- Website: 7vik.io
- LinkedIn: @7vik
- Open to collaborations in interpretability, alignment, deception audits, and theoretical ML.