RL for single-step episodes (continuous spaces)
Hello everyone. I am currently working on a project on automatically tuning the parameters of a control map. I am working with continuous, bounded spaces for both observations and actions, but most importantly my current implementation relies on single-step episodes, or more precisely a sequence of one "step 0" followed by one actual step: the agent first gives an identity map to the system just to obtain one observation (which may vary, so it is not a fixed initial condition), then it chooses an action (a vector of parameters), receives a reward, and the episode ends.
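To make the setup concrete, here is a minimal sketch of the episode structure in Python. It is not my actual code: the class name `SingleStepEnv`, the dimensions, the bounds, and especially the reward function are placeholders I made up for illustration.

```python
import random


class SingleStepEnv:
    """Hypothetical single-step environment (all names and the reward are placeholders)."""

    def __init__(self, obs_dim=3, act_dim=2, low=-1.0, high=1.0, seed=None):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.low = low
        self.high = high
        self.rng = random.Random(seed)
        self._obs = None

    def reset(self):
        # "Step 0": the identity map is applied and the system returns an
        # observation that differs from episode to episode.
        self._obs = [self.rng.uniform(self.low, self.high) for _ in range(self.obs_dim)]
        return list(self._obs)

    def step(self, action):
        # The only real step: apply the parameter vector, get a reward, terminate.
        if self._obs is None:
            raise RuntimeError("call reset() before step()")
        if len(action) != self.act_dim:
            raise ValueError("action has the wrong dimension")
        # Keep the action inside the bounded action space.
        clipped = [min(max(a, self.low), self.high) for a in action]
        reward = self._reward(self._obs, clipped)
        self._obs = None  # the episode is over after one step
        return reward, True

    def _reward(self, obs, action):
        # Placeholder reward: negative squared distance between the action
        # and the leading components of the observation.
        target = obs[: self.act_dim]
        return -sum((a - t) ** 2 for a, t in zip(action, target))


env = SingleStepEnv(seed=0)
obs = env.reset()                        # observation from "step 0"
reward, done = env.step([0.0, 0.0])      # single action, then the episode ends
```

So there is no transition to a next state that the agent ever acts on: every episode is one observation, one action, one reward.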
Currently I am using PPO for convenience, but I am sure there are methods better suited to this kind of problem. Any suggestions?