STATISTICS COLLOQUIUM
Joint colloquium with UConn Dept. of Operations and Information Management
Michael N. Katehakis
Professor and Chair
Department of Management Science and Information Systems
Rutgers University
On the Asymptotically Optimal Behavior of Sequential
Allocation Policies
ABSTRACT
Consider the problem of sampling sequentially from a finite number of N ≥ 2 populations or ‘bandits’, where each population i is specified by a sequence of iid random variables representing the reward received the every time a population is sampled. For any slowly increasing function g, subject to mild regularity constraints, we construct two policies (the g-Forcing, and the g-Inflated Sample Mean) that achieve a measure of regret of order O(g(n)) almost surely as n → ∞. In our constructions, the function g effectively controls the ‘exploration’ of the classical ‘exploration/exploitation’ tradeoff.
When additional parametric assumptions can be made, one can construct policies that are asymptotically optimal in the sense of achieving the lower bound on the logarithmic rate of increase of the regret of Burnetas and Katehakis (1996). We present such asymptotically optimal policies for the cases in which the iid random variables are Normal with unknown means and unknown variances and Uniform with unknown supports.
Finally we present asymptotically optimal policies for case in which the iid variables are of unknown, not necessarily finite, means and support, such as in the case of Pareto Bandits.
DATE: Friday, September 18, 2015
TIME: 10:30 a.m. -12:00 p.m.
PLACE: Business School, Rm. 106
For more information, contact: Tracy Burke at tracy.burke@uconn.edu