ACADEMIC CHRONICLE • EST. 2022 • VOL. 4 NO. 7

ADITHYA BHASKAR

NATURAL LANGUAGE PROCESSING • PRINCETON UNIVERSITY

Publications

Please see my resume for a more up-to-date list.
PUBLICATIONS

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization

Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, and Boris Hanin
ICLR 2025
Sometimes, preference optimization reduces the likelihood of the preferred responses. We shed light on this curious phenomenon.
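
For intuition, here is the standard DPO objective in its usual notation (a generic sketch, not reproduced from the paper):

    \mathcal{L}_{\mathrm{DPO}}(\theta)
      = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log\sigma\!\left(
          \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
        - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]

The loss depends only on the gap between the two log-ratios, so it can keep decreasing even while log π_θ(y_w | x) itself falls, provided log π_θ(y_l | x) falls faster. That is the kind of displacement the paper analyzes.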

Finding Transformer Circuits with Edge Pruning

Adithya Bhaskar, Alexander Wettig, Dan Friedman, and Danqi Chen
NeurIPS 2024 (Spotlight)
A faster and more precise circuit-finding method that also scales to multi-billion parameter models.

The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models

Adithya Bhaskar, Dan Friedman, and Danqi Chen
ACL 2024 (Oral)
Structured pruning reveals surprising insights about how pretrained LMs generalize.

Benchmarking and Improving Text-to-SQL Generation Under Ambiguity

Adithya Bhaskar*, Tushar Tomar*, Ashutosh Sathe, and Sunita Sarawagi
EMNLP 2023 (Main)
Current text-to-SQL systems fall flat on their faces when the question is ambiguous. We demonstrate this by introducing a new benchmark (AmbiQT), then propose a novel method that improves coverage by up to 2.5x.
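
As a toy illustration of the kind of ambiguity involved (the schema, question, and queries below are hypothetical, not taken from AmbiQT):

    # Hypothetical ambiguous text-to-SQL instance; illustrative only.
    question = "List the names of all singers."

    candidate_sqls = [
        # Interpretation 1: "name" refers to the stage_name column.
        "SELECT stage_name FROM singer;",
        # Interpretation 2: "name" refers to the legal_name column.
        "SELECT legal_name FROM singer;",
    ]

    # Under ambiguity, what matters is whether a system's top-k outputs
    # cover every valid interpretation, not a single exact match.
    for sql in candidate_sqls:
        print(sql)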

Prompted Opinion Summarization with GPT-3.5

Adithya Bhaskar, Alex Fabbri, and Greg Durrett
ACL 2023 (Findings)
Novel evaluation metrics for summarization in the GPT-3.5 era.

Performance Bounds for LASSO under Multiplicative Noise: Applications to Pooled RT-PCR Testing

Richeek Das, Aaron Jerry Ninan, Adithya Bhaskar, and Ajit Rajwade
Signal Processing, Vol. 214, January 2024
Performance bounds for group testing, with applications to pooled COVID-19 RT-PCR testing.
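
For context, the pooled-testing setup can be written generically as follows (standard notation; the paper's precise noise model and estimator variants may differ):

    y_i = (A x)_i \,(1 + \epsilon_i), \quad i = 1, \dots, m,
    \qquad
    \hat{x} \in \arg\min_{x}\; \tfrac{1}{2}\,\lVert y - A x \rVert_2^2 + \lambda \lVert x \rVert_1

Here A \in \{0,1\}^{m \times n} records which of the n samples enter each of the m pools, x is the sparse vector of viral loads, and \epsilon is multiplicative measurement noise; the paper bounds how well the LASSO estimate recovers x in this regime.
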
PREPRINTS

Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs?

Adithya Bhaskar*, Alexander Wettig*, Tianyu Gao, Yihe Dong, and Danqi Chen
arXiv preprint, arXiv:2506.17121
How should we compare various KV compression methods? The answer turns out to be trickier than one might think. We also introduce our own method, PruLong.
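
To see one reason the comparison is subtle, here is a back-of-the-envelope KV-cache size calculation (a generic sketch with an illustrative helper and roughly Llama-2-7B-like dimensions; not the paper's evaluation protocol):

    # Rough KV-cache footprint for a decoder-only transformer, per sequence.
    # Dimensions are illustrative (roughly Llama-2-7B: 32 layers, 32 KV heads,
    # head_dim 128, fp16); this is a sketch, not the paper's methodology.
    def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                       seq_len: int, bytes_per_value: int = 2) -> int:
        per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value  # keys + values
        return per_token * seq_len

    full = kv_cache_bytes(32, 32, 128, seq_len=128_000)
    kept = kv_cache_bytes(32, 32, 128, seq_len=32_000)
    print(f"full cache : {full / 2**30:.1f} GiB")   # ~62.5 GiB
    print(f"25% of KVs : {kept / 2**30:.1f} GiB")   # ~15.6 GiB

Savings like these depend on when and how entries are dropped (during prefill or decoding, with or without extra bookkeeping), which already makes naive head-to-head comparisons slippery.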

Continual Memorization of Factoids in Language Models

Howard Chen, Jiayi Geng, Adithya Bhaskar, Dan Friedman, and Danqi Chen
arXiv preprint, arXiv:2411.01715
Finetuning LMs on new facts makes them forget previously learned ones. Surprisingly, mixing in generic data during finetuning prevents this forgetting.
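
A minimal sketch of the mixing idea (hypothetical helper and datasets, with an illustrative mixing ratio; the paper's actual recipe differs in its details):

    import random

    # Placeholder data: new factoids to memorize and generic text to mix in.
    factoids = [f"Fact #{i}: ..." for i in range(1000)]
    generic = [f"Generic passage #{i}: ..." for i in range(1000)]

    def build_mixed_finetuning_set(factoids, generic, mix_ratio=0.5, seed=0):
        """Blend generic data into the factoid finetuning set.

        mix_ratio is the fraction of the final set drawn from the generic
        corpus; the value here is illustrative, not the paper's setting.
        """
        rng = random.Random(seed)
        n_generic = int(len(factoids) * mix_ratio / (1 - mix_ratio))
        mixed = list(factoids) + rng.sample(generic, min(n_generic, len(generic)))
        rng.shuffle(mixed)
        return mixed

    train_set = build_mixed_finetuning_set(factoids, generic)
    print(len(train_set), "training examples")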

Improving Language Understanding from Screenshots

Tianyu Gao, Zirui Wang, Adithya Bhaskar, and Danqi Chen
arXiv preprint, arXiv:2402.14073
Multimodal language models can't read well. We introduce a novel patch-and-text loss to remedy that.
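
Schematically, pairing an image-patch objective with a text objective can be sketched as below (hypothetical tensors, shapes, and a made-up 0.5/0.5 weighting; this is not the paper's actual objective):

    import torch
    import torch.nn.functional as F

    # Stand-ins for model outputs; shapes are illustrative only.
    patch_pred = torch.randn(8, 196, 768, requires_grad=True)     # predicted patches
    patch_target = torch.randn(8, 196, 768)                       # ground-truth patches
    text_logits = torch.randn(8, 128, 32000, requires_grad=True)  # next-token logits
    text_labels = torch.randint(0, 32000, (8, 128))               # gold tokens

    # Combine a patch-reconstruction term with a standard language-modeling term.
    patch_loss = F.mse_loss(patch_pred, patch_target)
    text_loss = F.cross_entropy(text_logits.reshape(-1, 32000), text_labels.reshape(-1))
    loss = 0.5 * patch_loss + 0.5 * text_loss
    loss.backward()
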
FOR COMPLETE PUBLICATION LIST AND CITATION METRICS, PLEASE SEE CV