Configuration
observations
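Thompson Sampling: Maintains a posterior distribution over each arm's success rate. At every step it draws one sample from each posterior and plays the arm with the highest sample. Arms with few observations have wide posteriors and occasionally produce high samples, so exploration and exploitation are balanced automatically.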
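The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: it assumes binary (0/1) rewards, a uniform Beta(1, 1) prior on every arm, and hypothetical names (`ThompsonSampler`, `run_simulation`, `true_rates`) chosen only for the example.

```python
import random


class ThompsonSampler:
    """Beta-Bernoulli Thompson Sampling (assumes binary 0/1 rewards)."""

    def __init__(self, n_arms, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior; this choice is an assumption.
        self.successes = [1] * n_arms  # alpha parameter per arm
        self.failures = [1] * n_arms   # beta parameter per arm

    def select_arm(self):
        # Draw one sample from each arm's Beta posterior.
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        # Play the arm whose sample is highest.
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # A success raises alpha, a failure raises beta.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1


def run_simulation(true_rates, n_steps, seed=0):
    """Run the sampler against arms with the given hidden success rates."""
    env_rng = random.Random(seed)
    sampler = ThompsonSampler(len(true_rates), seed=seed + 1)
    total_reward = 0
    pulls = [0] * len(true_rates)
    for _ in range(n_steps):
        arm = sampler.select_arm()
        reward = 1 if env_rng.random() < true_rates[arm] else 0
        sampler.update(arm, reward)
        total_reward += reward
        pulls[arm] += 1
    return total_reward, pulls


if __name__ == "__main__":
    total, pulls = run_simulation([0.2, 0.5, 0.8], n_steps=1000)
    print("total reward:", total)
    print("pulls per arm:", pulls)
```

With enough steps, most pulls should concentrate on the arm with the highest true rate, while the weaker arms are still tried now and then as long as their posteriors remain wide.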
Results
Current Step: 0
Total Reward: 0