Take the Policy Gradient Actor-Critic Solution implemented here
[login to view URL]
and modify it to work with the attached CSV file.
The system shall:
1) Load in the input file
2) Attempt to maximize the average reward obtained across N configurable features (in this case examining columns 5-14, and beginning with N=4)
3) Output the step taken at each epoch along with the corresponding reward obtained
4) Loop until the algorithm has converged for X epochs (where X is configurable but is initialized to 12)
The system shall use NumPy, SciPy, and TensorFlow, as in the linked implementation.
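As a rough illustration of requirements 1, 2 and 4 above, the loading and stopping logic could be sketched as follows. This is only a sketch under stated assumptions, not the linked implementation: it assumes "columns 5-14" is 1-based and inclusive, that the CSV has one header row and numeric values in those columns, that "the first N candidate columns" is an acceptable placeholder for feature selection, and that "converged" means the reward varied by less than a tolerance over the last X epochs. The names load_features, has_converged, and tol are hypothetical.

```python
import numpy as np

N_FEATURES = 4                  # N from the posting (configurable)
PATIENCE_EPOCHS = 12            # X from the posting (configurable)
FEATURE_COLUMNS = range(4, 14)  # assumption: "columns 5-14" is 1-based, inclusive


def load_features(source, n_features=N_FEATURES, skip_header=1):
    """Load the CSV and keep the first n_features candidate columns.

    Assumption: one header row and numeric data in the candidate columns.
    `source` may be a file path or a file-like object.
    """
    cols = list(FEATURE_COLUMNS)[:n_features]
    data = np.genfromtxt(source, delimiter=",", skip_header=skip_header, usecols=cols)
    # Always return a 2-D array of shape (rows, n_features).
    return data.reshape(-1, len(cols))


def has_converged(rewards, patience=PATIENCE_EPOCHS, tol=1e-3):
    """Return True once the last `patience` rewards vary by less than `tol`.

    Assumption: this is one possible reading of "converged for X epochs".
    """
    if len(rewards) < patience:
        return False
    window = np.asarray(rewards[-patience:], dtype=float)
    return float(window.max() - window.min()) < tol
```

In a training loop, the per-epoch reward would be appended to a list and has_converged checked after each epoch; the actor-critic update itself would come from the linked implementation and is not shown here.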
14 freelancers are bidding an average of $156 for this job
Hello! I am a Python developer. I looked at your project and it seems interesting. I have all the skills required for this project. Ping me to discuss in detail.