Yichi Zhang
Indiana University Bloomington
Title: Random-walk Debiased Contextual Preference Inference for Large Language Model Evaluation
Abstract: Large language models (LLMs) such as ChatGPT, Claude, Llama, and DeepSeek excel across a wide range of tasks, including translation, data analysis, code generation, medical assistance, and reasoning. Their rapid proliferation demands rigorous and scalable evaluation of LLM performance across diverse, context-dependent settings. To address this need, this talk studies statistical inference for pairwise comparisons of context-dependent preference score functions across domains. Focusing on the contextual Bradley-Terry-Luce model, we develop a semiparametrically efficient estimator that automates debiased inference by aggregating weighted residual balancing terms across the comparison graph. Efficiency is achieved when the weights are derived from a novel random walk-inspired construction on the comparison graph. Our inference procedure is valid for general score function estimators, accommodating practitioners' need to implement flexible deep learning methods. We further extend the procedure to multiple hypothesis testing, using a Gaussian multiplier bootstrap that controls the familywise error rate, and to distributional shift, via a cross-fitted importance-sampling adjustment for target-domain inference. Numerical studies, including language model evaluations under diverse contexts, demonstrate the accuracy, efficiency, and practical utility of our method. The talk will conclude with an outlook on future directions in trustworthy statistical inference for modern and complex data settings.
Bio: Dr. Yichi Zhang is an Assistant Professor in the Department of Statistics at Indiana University Bloomington. Before that, he was a Postdoctoral Scholar in the Fuqua School of Business and the Department of Biostatistics & Bioinformatics at Duke University. He received his Ph.D. in Statistics from North Carolina State University in 2023. His research focuses on causal learning and uncertainty quantification for complex and high-dimensional data.
Date: Friday, April 3, 2026, 10:00 AM, AUST 313
WebEx link: https://uconn-cmr.webex.com/uconn-cmr/j.php?MTID=mad410104ae7b47204704bad4810eefa9
Coffee will be available at 9:30 AM in the Noether Lounge (AUST 326)
For more information, contact: Yuwen Gu at yuwen.gu@uconn.edu