Summary: I used a Q-learner (reinforcement learning) to find an optimal buy/sell strategy for a given time period. The learner was trained on several market indicators: simple moving average, Bollinger Bands, and relative strength index. I configured my strategy learner to be relatively conservative, so at times of high uncertainty the trader pulls out of the market. I compared the results to a benchmark that buys the maximum allotment of stock on day 1 and holds until the end. In addition, I used the same three market indicators to devise a manual strategy to compare the learner against. The in-sample and out-of-sample results are shown in figures 1 and 2.
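As a rough illustration of the three indicators named above, here is a minimal sketch in plain Python. The function names, the window lengths, the band width of two standard deviations, and the simple-average form of RSI are all assumptions made for illustration; the exact formulas and parameters used in the project are not stated in this summary.

```python
from statistics import mean, pstdev


def sma(prices, window):
    """Simple moving average; None until a full window is available."""
    return [
        None if i + 1 < window else mean(prices[i + 1 - window : i + 1])
        for i in range(len(prices))
    ]


def bollinger_position(prices, window, num_std=2.0):
    """Where the price sits inside the Bollinger Bands.

    0.0 means the lower band, 1.0 the upper band, 0.5 the moving average.
    num_std=2.0 is a common default, assumed here.
    """
    out = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
            continue
        win = prices[i + 1 - window : i + 1]
        mid, sd = mean(win), pstdev(win)
        if sd == 0:
            out.append(0.5)  # flat window: bands collapse onto the average
            continue
        lower, upper = mid - num_std * sd, mid + num_std * sd
        out.append((prices[i] - lower) / (upper - lower))
    return out


def rsi(prices, window):
    """Relative strength index from simple sums of gains and losses.

    This is the simple-average variant (an assumption); Wilder's smoothed
    variant would give somewhat different values.
    """
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if gains == 0 and losses == 0:
            out[i] = 50.0  # no movement at all: treat as neutral
        elif losses == 0:
            out[i] = 100.0
        else:
            out[i] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out
```

For a tabular Q-learner, values like these would typically be discretized into bins and combined into a single state index; that step is omitted here.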
The manual strategy outperformed the strategy learner in-sample, since the manual strategy was hand-tuned (over-trained) to that time period. However, it also performed remarkably well out-of-sample, something that I was not expecting. I suspect that if I tried both models on other data sets, the strategy learner would perform better on average, since it is not as overfit to the original in-sample data set. Therefore, I expect it to be the better generalized strategy for out-of-sample data sets.
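One way to make the comparison against the buy-and-hold benchmark concrete is to compute each strategy's cumulative return over the same price series, once for the in-sample window and once for the out-of-sample window. The sketch below is only an illustration: `portfolio_values`, `cumulative_return`, `benchmark_holdings`, `start_cash`, and `max_shares` are hypothetical names, the starting cash and allotment size are not given in this summary, and commissions and market impact are ignored.

```python
def portfolio_values(prices, holdings, start_cash):
    """Daily portfolio value for a list of target share holdings.

    holdings[i] is the number of shares held after trading on day i
    (negative would mean a short position). Trades fill at that day's
    price; commissions and market impact are ignored in this sketch.
    """
    cash, shares, values = start_cash, 0, []
    for price, target in zip(prices, holdings):
        cash -= (target - shares) * price  # buy or sell the difference
        shares = target
        values.append(cash + shares * price)
    return values


def cumulative_return(values):
    """Fractional gain from the first day to the last."""
    return values[-1] / values[0] - 1.0


def benchmark_holdings(n_days, max_shares):
    """Buy the maximum allotment on day 1 and hold until the end."""
    return [max_shares] * n_days


# Toy numbers, purely illustrative (not results from the project).
prices = [10.0, 11.0, 12.0]
bench = portfolio_values(prices, benchmark_holdings(len(prices), 100), 10_000.0)
early_exit = portfolio_values(prices, [100, 0, 0], 10_000.0)  # leaves the market on day 2
```

Running the same functions on other date ranges or other symbols would be a direct way to check the suspicion that the learner generalizes better than the manual strategy.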