Experimentation with Endogenously Changing Arms
I build on a Poisson bandit environment with a single risky arm and explore how the ability to change the arm endogenously affects the incentives for experimentation. More specifically, I assume that the decision maker can choose to improve the risky arm at a fixed cost, switching a bad arm to a good one. Such a setup can be applied to analyse how economic agents adapt to new opportunities, such as a new technology, a change in career path, or hiring a new trainee, when the agents can invest in increasing the likelihood of a successful outcome.
In contrast to standard good-news Poisson bandits, I find that beliefs may evolve non-monotonically and that the decision maker may prefer to remain at a certain belief and invest forever at some intensity, conditional on obtaining no news. In the context of adaptation, this means that the agent may keep pursuing the new opportunity forever despite never achieving a success, whereas traditional experimentation models predict that the agent necessarily gives up. On the technical front, I show that the resulting value function may violate smooth pasting, and that identifying a solution may require an extra condition on top of the traditionally used value matching and smooth pasting.
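As a rough illustration of how a stationary belief can arise, consider the following sketch. It is not the paper's actual specification: the symbols $p$ (belief that the arm is good), $\lambda$ (arrival rate of news on a good arm), $k$ (investment intensity) and $\mu$ (rate at which investment turns a bad arm good) are all assumptions introduced here for illustration.

```latex
% Standard good-news benchmark: with no news, the belief only drifts down.
\dot{p}_t = -\lambda\, p_t (1 - p_t)

% Hypothetical extension: investing at intensity k converts a bad arm
% into a good one at rate \mu k, which adds an upward drift.
\dot{p}_t = -\lambda\, p_t (1 - p_t) + \mu k\,(1 - p_t)
          = (1 - p_t)\,(\mu k - \lambda p_t)

% Under this assumed law of motion, the belief is stationary at
p^{*} = \frac{\mu k}{\lambda} \qquad \text{(provided } \mu k < \lambda\text{)}.
```

Under these assumptions, the no-news belief rises when $p_t < p^{*}$ and falls when $p_t > p^{*}$, so an agent who keeps investing at intensity $k$ can stay at $p^{*}$ indefinitely. If the chosen intensity varies with the belief, the drift can change sign along the path, which is one way non-monotonic beliefs could emerge.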