Randomized Controlled Trials via Reinforcement Learning from Human Feedback

Kweku A. Opoku-Agyemang

Working Paper Class 12

This paper studies the design and evaluation of policy interventions using reinforcement learning from human feedback (RLHF) and randomized controlled trials (RCTs). We propose a two-stage framework: first, RLHF generates a set of candidate interventions based on human preferences and feedback; second, RCTs compare their effectiveness against a control group. We analyze the regret of this framework, defined as the difference between the expected outcome of the optimal intervention and that of the chosen intervention. We derive a regret bound that depends on the number of interventions generated by RLHF, the sample complexity and variance of each intervention, the horizon and discount factor of the RCT phase, and the approximation error of the RCT algorithm. We also discuss practical challenges and potential solutions for implementing this framework in real-world settings. Our results provide novel insights and guidance for combining RLHF and RCTs in policy evaluation and optimization.
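The second stage and the regret notion can be illustrated with a minimal simulation. The sketch below is not the paper's algorithm: the function name, the Gaussian noise model for trial outcomes, and the numeric effect sizes are all illustrative assumptions. It estimates each RLHF-generated candidate's effect from a fixed number of trial observations, selects the candidate with the highest estimate, and reports the regret relative to the truly optimal candidate.

```python
import random

def run_rct_selection(true_means, n_samples, noise_sd=1.0, seed=0):
    """Illustrative RCT stage: estimate each candidate intervention's
    effect from n_samples noisy observations, pick the apparent best,
    and compute the regret against the truly optimal candidate."""
    rng = random.Random(seed)
    estimates = []
    for mu in true_means:
        # Simulated trial outcomes for this candidate (Gaussian noise
        # around its true effect -- a modeling assumption, not the paper's).
        draws = [rng.gauss(mu, noise_sd) for _ in range(n_samples)]
        estimates.append(sum(draws) / n_samples)
    chosen = max(range(len(true_means)), key=lambda k: estimates[k])
    # Regret: expected outcome of the optimal intervention minus
    # that of the intervention actually chosen.
    regret = max(true_means) - true_means[chosen]
    return chosen, regret

# Hypothetical true effects for four RLHF-generated candidates.
true_means = [0.10, 0.25, 0.40, 0.15]
chosen, regret = run_rct_selection(true_means, n_samples=500)
```

Regret is always nonnegative here, and shrinks as the per-candidate sample size grows, mirroring the dependence of the paper's bound on sample complexity and outcome variance.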

The views in this Working Paper Class are those of the author, not necessarily of Machine Learning X Doing.

Opoku-Agyemang, Kweku A. (2023). "Randomized Controlled Trials via Reinforcement Learning from Human Feedback." Machine Learning X Doing Working Paper Class 12. Machine Learning X Doing.

Copyright © 2023 Machine Learning X Doing Incorporated. All Rights Reserved.