Course Outline

 

01/08/07  (Murphy) Introduction to multi-stage decision making: reinforcement learning compared to supervised learning, to classical dynamic programming and to approximate dynamic programming. Reinforcement learning & constructing dynamic treatment regimes in health applications. Chapter 2 in Handbook of Learning and Approximate Dynamic Programming, edited by J. Si, A. Barto, W. Powell, D. Wunsch, IEEE Press (2004). (you have the password).

 

01/15/07  No class (Martin Luther King Day); rescheduled to 01/17/07

 

01/17/07 in room MH2449 (Eric Laber) Chapter 5 in G. Parmigiani, Modeling in Medical Decision Making: A Bayesian Approach, 2002, Wiley  (you have the password)

&

(Ying Ding) Evaluating multiple treatment courses in clinical trials by P. Thall, R. Millikan and H. Sung, 2000 in Statistics in Medicine, vol. 19, pg. 1011-1028    (you have the password). Ying presentation

 

01/22/07 (Danny Almirall) Selecting Therapeutic Strategies Based on Efficacy and Death in Multi-Course Clinical Trials by P. Thall, H. Sung and E. Estey, 2002 in Journal of the American Statistical Association, vol 97, pg 29-39.  (you have the password).

 

Reinforcement Learning

 

01/29/07 (Murphy) Chapter 13, T. Mitchell, Machine Learning, 1997, WCB/McGraw-Hill; if desired also use Reinforcement Learning: An Introduction by R.S. Sutton and A.G. Barto 1998, MIT Press.  &

 (Murphy) Intro to Markov Decision Processes and Q-learning (Ch. 6 in  Sutton and Barto, Reinforcement Learning)

 

02/05/07 (Dan Ruan) Remainder of Ch. 6 in Sutton and Barto, including SARSA & generalized policy iteration and, if possible, eligibility traces from Ch. 7.  Dan presentation

 

02/12/07 (Murphy) Review

 

02/19/07 (Min Qian) Benjamin van Roy’s chapter on Neuro-dynamic Programming.  Min presentation

 

02/26/07 Spring Break.

03/05/07 (Mark Kliger) Least-Squares Policy Iteration by Michail G. Lagoudakis, Ronald Parr, JMLR, 4(Dec):1107-1149, 2003. http://jmlr.csail.mit.edu/papers/volume4/lagoudakis03a/lagoudakis03a.pdf

 

 

03/12/07 Susan is out of town.

 

03/19/07 Review of homework.

 

03/26/07 (Ou Zhao) Kernel-based Reinforcement Learning by Dirk Ormoneit and Saunak Sen, Machine Learning, 49, pg. 161-178, 2002.  (you have the password).  Ou Presentation

 

Connections to Causal Inference

 

04/02/07 Susan is out of town; rescheduled to 04/04/07

 

04/04/07 in MLB B131 (Murphy) Causal Inference Introduction; (Bibhas Chakraborty) Bias Correction in Non-Differentiable Estimating Equations for Optimal Dynamic Treatment Regimes  by Erica Moodie

 

Brief return to Reinforcement Learning:

 04/09/07 (Ali Shojaie) Tree-Based Batch Mode Reinforcement Learning by Damien Ernst, Pierre Geurts and Louis Wehenkel, JMLR, 6(2005)  503-556.

 

04/11/07 in MLB B131 (Bodhi Sen) Optimal Structural Nested Models for Optimal Sequential Decisions by James Robins with corrections.

 

04/16/07  Overflow and ending discussion.