Vedant Shah

I am a first-year Ph.D. student at Mila and Université de Montréal, where I am advised by Prof. Aaron Courville. Previously, I completed my Master's at Mila and UdeM under the supervision of Prof. Yoshua Bengio and Dr. Anirudh Goyal. My recent work has focused on improving and evaluating the reasoning capabilities of large language models (LLMs) through RL-based post-training, synthetic data generation, and test-time scaling. I am broadly interested in investigating self-improvement in LLM reasoning and how these advancements can be leveraged for open-ended mathematical discovery. My previous work involved incorporating inductive biases into deep learning models for low-compute unlearning and for improving out-of-distribution generalization in multi-agent reinforcement learning.

I completed my undergraduate studies in Electronics and Communications Engineering at BITS Pilani, Goa Campus, where I worked with Prof. Ashwin Srinivasan and Prof. Tanmay Verlekar at APPCAIR. I also spent a summer interning with Dr. Gautam Shroff at TCS Research and was selected for Google Summer of Code 2021.

Email / Scholar / LinkedIn / X / Github

Recent News

December 2025: New pre-print studying different KL estimators for RL finetuning of LLMs.
September 2025: New pre-print on leveraging CoT aggregation in LLMs for efficient test-time scaling.
September 2025: Started my PhD!
January 2025: Received the UNIQUE Excellence Scholarship of $10,000 for my MSc studies!
October 2024: Our work AI-Assisted Generation of Difficult Math Questions has been accepted at the MATH-AI Workshop at NeurIPS 2024!
April 2024: Selected to attend the 11th Heidelberg Laureate Forum!
March 2024: Our work Towards DNA-Encoded Library Generation with GFlowNets has been accepted at the GEM Workshop at ICLR 2024!
March 2024: Our work Efficient Causal Graph Discovery Using Large Language Models has been accepted at the How Far are we from AGI? workshop at ICLR 2024!
November 2023: Check out our new pre-print Unlearning via Sparse Representations.
September 2023: Started as a Research Master's student at Mila and Université de Montréal!
May 2023: Gave a lecture covering the fundamentals of Natural Language Processing as part of the AI4Good Lab lecture series.
April 2023: I will be attending the DLRL Summer School 2023 in Montreal.
January 2023: SAF has been accepted at ICLR 2023! Latest version can be found here.
November 2022: Gave a talk on our work SAF at the Berkeley MARL reading group.
January 2022: Selected for Google Research Week 2022.
September 2021: Our work Forecasting Market Prices using DL with Data Augmentation and Meta-learning: ARIMA still wins! was accepted at the ICBINB Workshop at NeurIPS 2021.
September 2021: We at SAiDL and APPCAIR are organizing an AI Symposium. Register here.
August 2021: Joined as a Research intern at Mila. I'll be advised by Anirudh Goyal and Yoshua Bengio!
August 2021: Our work Adapting Deep Neural Networks for Pedestrian-Detection to Low-Light Conditions without Re-training was accepted at the 1st TradiCV Workshop at ICCV 2021.
May 2021: Selected for Google Summer of Code, 2021. Take a look at my project here.
May 2021: Started working as a research intern at TCS Research. I'll be working on investigating market data forecasting using Deep Learning.
January 2021: I'll be working at APPCAIR with Oyla Inc. on Pedestrian Detection in low-light conditions.

Selected Publications and Pre-prints

For a complete list of publications, please visit my Google Scholar page.

A Comedy of Estimators: On KL Regularization in RL Training of LLMs
Vedant Shah*, Johan Obando-Ceron*, Vineet Jain*, Brian Bartoldson, Bhavya Kailkhura, Sarthak Mittal, Glen Berseth, Pablo Samuel Castro, Yoshua Bengio, Nikolay Malkin, Moksh Jain, Siddarth Venkataraman, Aaron Courville
Pre-print
arXiv
We study the gradient bias (and its effects) induced by using different configurations of the KL regularization penalty in RL training of LLMs.

Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
Siddarth Venkataraman*, Vineet Jain*, Sarthak Mittal*, Vedant Shah, Johan Obando-Ceron, Yoshua Bengio, Brian Bartoldson, Bhavya Kailkhura, Guillaume Lajoie, Glen Berseth, Nikolay Malkin, Moksh Jain
Pre-print
arXiv / Website
Using recursive aggregation of the chain of thought (CoT) of LLMs to unlock powerful test-time scaling capabilities.

AI-Assisted Generation of Difficult Math Questions
Vedant Shah, Dingli Yu, Kaifeng Lyu, Simon Park, Jiatong Yu, Yinghui He, Nan Rosemary Ke, Michael Mozer, Yoshua Bengio, Sanjeev Arora, Anirudh Goyal
MATH-AI Workshop, NeurIPS 2024
arXiv / Website / Poster
We use frontier LLMs such as GPT-4, Claude 3, and Gemini to generate challenging mathematical questions by asking them to compose two domain skills at once. Based on this, we present a new math eval on which a large number of open-source as well as proprietary LLMs show significant drops relative to a standard eval.

Unlearning via Sparse Representations
Vedant Shah, Frederik Träuble, Ashish Malik, Hugo Larochelle, Michael Mozer, Sanjeev Arora, Yoshua Bengio, Anirudh Goyal
TMLR 2025
Openreview
We propose a nearly compute-free approach for class unlearning in multi-class classification settings.

Stateful Active Facilitator: Coordination and Environmental Heterogeneity in Cooperative Multi-Agent Reinforcement Learning
Dianbo Liu*, Vedant Shah*, Oussama Boussif*, Cristian Meo, Anirudh Goyal, Tianmin Shu, Michael Mozer, Nicolas Heess, Yoshua Bengio
ICLR 2023
arXiv / Talk
Investigating the problem of environmental heterogeneity and using behavioural priors to tackle the problems of coordination and heterogeneity in MARL.

Efficient Causal Graph Discovery Using Large Language Models
Thomas Jiralerspong, Xiaoyin Chen, Yash More, Vedant Shah, Yoshua Bengio
How Far are We From AGI? Workshop, ICLR 2024
arXiv / Poster
We propose an approach to discover full causal graphs in real-world settings using LLMs such as GPT-4 by augmenting LLM querying with a BFS-like procedure.

Towards DNA-Encoded Library Generation with GFlowNets
Michał Koziarski, Mohammed Abukalam, Vedant Shah, Louis Vaillancourt, Doris Alexandra Schuetz, Moksh Jain, Almer van der Sloot, Mathieu Bourgey, Anne Marinier, Yoshua Bengio
GEM Workshop, ICLR 2024
arXiv / Poster
We use reward models based on a PPI modulation task to train GFlowNets for the combinatorially challenging task of generating DNA-Encoded Libraries (DELs).

Experience

	Sep 2023 - Present: Graduate Student Researcher. Advisors - Dr. Anirudh Goyal and Prof. Yoshua Bengio
	Sep 2021 - July 2023: Research Intern. Advisors - Dr. Anirudh Goyal and Prof. Yoshua Bengio
	Jan 2022 - Aug 2022: Undergraduate Researcher. Advisor - Prof. Ashwin Srinivasan
	Jan 2021 - Aug 2021: Undergraduate Researcher. Advisor - Prof. Tanmay Verlekar
	May 2021 - Sep 2021: Research Intern. Advisor - Dr. Gautam Shroff
	May 2021 - Aug 2021: Student Developer. Organisation - GFOSS (Suborganisation - OpenDR)
	Aug 2020 - Nov 2020: Research Intern. Advisor - Prof. GC Nandi
