Intro

I am Atharva Naik, a second-year PhD student (and former MLT student) advised by Prof. Carolyn Rose at the Language Technologies Institute (LTI), Carnegie Mellon University (CMU).
I also collaborate closely with Prof. Daniel Fried and Prof. David Mortensen.
My research focuses on synthetic data generation and post-training/self-improvement methods for enhancing the quality of LLM-generated code, LLM reasoning, and code safety/alignment.
I recently completed MetaLint, a project that trains LLMs to flag non-idiomatic Python and Java code and adapt to evolving best practices by framing the task as instruction following and applying preference tuning.
Part of this work was conducted during my summer 2025 internship at Oracle Labs.
I also developed PBEBench, a dynamic, contamination-free, and scalable synthetic benchmark for evaluating knowledge-free inductive reasoning through multi-step string rewrites.
Previously, I worked on CRScore (NAACL 2025, Oral), a reference-free metric for code review comments that uses LLMs and static analysis to generate code quality rubrics and embedding models to score comments against them along key dimensions. I also conducted an investigative study (ACL 2025, Oral, top 10%) framing sound law induction as programming by example and analyzing which synthetic data distributions yield the best downstream performance on low-resource tasks.
I also led the CMU team to the finals of the 2025 Amazon Nova AI Challenge, training LLMs to generate useful, safe, and CWE-free code by combining GRPO with deliberative alignment (chain-of-thought reasoning) and static analyzers such as AWS CodeGuru, Bandit, and CodeQL.
I’ve also explored using LLMs to generate reflection questions in collaborative SQL programming activities to support peer learning. Our first study was accepted at AIED 2024, where it received Best Paper and Best Student Paper nominations, and our follow-up study was accepted by the British Journal of Educational Technology (BJET), a high-impact educational technology journal.
Education
| Aug 2024 - Present |
PhD in Language Technologies
Carnegie Mellon University
|
| Aug 2022 - May 2024 |
Master of Language Technologies
Carnegie Mellon University
GPA: 4.14/4
|
| July 2018 - May 2022 |
Bachelor of Technology, Computer Science and Engineering
Indian Institute of Technology, Kharagpur
GPA: 9.66/10
|
Experience
| Jan 2023 - Present |
Research Assistant, Carnegie Mellon University
|
| Aug 2021 - Dec 2021 |
Research Assistant Intern, Technische Universität Darmstadt
|
| Apr 2021 - Sep 2021 |
Research Assistant Intern, University of Alberta
|
| May 2021 - Aug 2021 |
Research Intern, Adobe, Bangalore
|
| Feb 2019 - Mar 2020 |
Undergraduate Student Researcher, Autonomous Ground Vehicle Research Group, IIT Kharagpur
|
Publications
2025
MetaLint: Generalizable Idiomatic Code Quality Analysis through Instruction-Following and Easy-to-Hard Generalization
Atharva Naik, Lawanya Baghel, Dhakshin Govindarajan, Darsh Agrawal, Daniel Fried, Carolyn Rose
arXiv
|
PBEBench: A Multi-Step Programming by Examples Reasoning Benchmark inspired by Historical Linguistics
Atharva Naik, Prakam*, Darsh Agrawal*, Yash Mathur, Manav Kapadnis, Yuwei An, Clayton Marr, Carolyn Rose, David Mortensen
arXiv
|
Emergent (Mis)Alignment: Exploring the Hidden Link Between Code Security and AI Alignment
Atharva Naik*, Abhinav Rao*, Alex Xie, Anmol Agarwal, Shubham Gandhi, Michael Hilton, Carolyn Rose
Blogpost (WIP)
|
Programming by Examples Meets Historical Linguistics: A Large Language Model Based Approach to Sound Law Induction
Atharva Naik, Darsh Agrawal, Hong Sng, Clayton Marr, Kexun Zhang, Nathaniel R Robinson, Kalvin Chang, Rebecca Byrnes, Aravind Mysore, Carolyn Rose, David R Mortensen
ACL 2025 (to appear)
|
CRScore: Grounding the Evaluation of Code Review Comments in Code Claims and Smells
Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rosé
NAACL 2025
|
2024
Providing tailored reflection instructions in collaborative learning using large language models
Atharva Naik, Jessica Ruhan Yin, Anusha Kamath, Qianou Ma, Sherry Tongshuang Wu, R. Charles Murray, Christopher Bogart, Majd Sakr, Carolyn P. Rosé
British Journal of Educational Technology (BJET)
|
Can Large Language Models Code Like a Linguist?: A Case Study in Low Resource Sound Law Induction
Atharva Naik, Kexun Zhang, Nathaniel Robinson, Aravind Mysore, Clayton Marr, Hong Sng, Rebecca Byrnes, Anna Cai, Kalvin Chang, David Mortensen
arXiv
|
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
arXiv
|
Generating Situated Reflection Triggers about Alternative Solution Paths: A Case Study of Generative AI for Computer-Supported Collaborative Learning
Atharva Naik, Jessica Ruhan Yin, Anusha Kamath, Qianou Ma, Sherry Tongshuang Wu, Charles Murray, Christopher Bogart, Majd Sakr, Carolyn P. Rose
🏆 Best Paper Nominee
🏆 Best Student Paper Nominee
AIED (2024)
|
Tricking LLMs into Disobedience: Understanding, Analyzing, and Preventing Jailbreaks
Abhinav Rao, Atharva Naik*, Sachin Vashistha*, Somak Aditya, Monojit Choudhury
LREC-COLING (2024): 16802-16830
(* - indicates equal contribution)
|
SkillCLIP: Skill Aware Modality Fusion Visual Question Answering (Student Abstract)
Atharva Naik, Yash Parag Butala, Navaneethan Vaikunthan, Raghav Kapoor
AAAI Student Abstracts (2024): 23592-23593
|
2023
Data Augmentation for Code Translation with Comparable Corpora and Multiple References
Yiqing Xie, Atharva Naik, Daniel Fried, Carolyn Rose
Findings of EMNLP (2023)
|
SYNC: A Structurally Guided Hard Negative Curriculum for Generalizable Neural Code Search
Atharva Naik, Soumitra Das, Jyothi Vedurada, Somak Aditya
AACL (2023)
|
Weakly Supervised Explainable Phrasal Reasoning with Neural Fuzzy Logic
Zijun Wu, Zi Xuan Zhang, Atharva Naik, Zhijian Mei, Mauajama Firdaus, Lili Mou
ICLR (2023)
|
2022
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei, Anjana Arunkumar, Arjun Ashok, Arut Selvan Dhanasekaran, Atharva Naik, ... , Chitta Baral, Yejin Choi, Noah A. Smith, Hannaneh Hajishirzi, Daniel Khashabi
EMNLP (2022): 5085–5109
|
Representation Learning for Conversational Data using Discourse Mutual Information Maximization
Bishal Santra, Sumegh Roychowdhury, Aishik Mandal, Vasu Gurram, Atharva Naik, Manish Gupta, Pawan Goyal
NAACL (2022): 1718–1734
|
Towards Enabling Synchronous Digital Creative Collaboration: Codifying Conflicts in Co-Coloring
Suryateja BV, Jeet Patel, Atharva Naik, Yash Parag Butala, Sristy Sharma, Niyati Chhaya
CHI Extended Abstracts (2022): 1-7
|
2021
Contact
Email: arnaik [at] andrew [dot] cmu [dot] edu