Abstract

Deep reinforcement learning is one of the most widely applied machine learning paradigms: software agents learn by interacting with their environment to maximize the rewards they receive. In practical scenarios the rewards are usually delayed, so it is difficult for a learning agent to determine which actions caused them. This is known as the credit assignment problem, and it slows down the learning process. It is also important to study how agent performance is affected when input observations are lost or corrupted. To speed up training, the presented work develops a novel training framework based on the state-of-the-art asynchronous advantage actor-critic (A3C) algorithm. The framework has three phases: 1) an asynchronous variant of behavioral-cloning supervised learning that we develop, named ABC; 2) jointly learning ABC annealed into A3C; and 3) learning A3C alone for self-improvement. Furthermore, a dual-view architecture is developed to enhance the robustness of the agent under partial input loss. Compared with the standard A3C algorithm, the proposed framework achieves a significant training speed improvement of up to 2.5X, and the dual-view architecture shows more robust performance under partial data loss.
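The second phase blends the supervised behavioral-cloning objective into the A3C objective via annealing. The abstract does not specify the schedule, so the sketch below assumes a simple linear decay of the imitation weight; the function name `annealed_loss` and the parameter `anneal_steps` are illustrative, not from the thesis.

```python
def annealed_loss(bc_loss: float, a3c_loss: float,
                  step: int, anneal_steps: int) -> float:
    """Blend a behavioral-cloning (ABC) loss into an A3C loss.

    The imitation weight w decays linearly from 1 to 0 over
    `anneal_steps` updates (phase 2); once w reaches 0, training
    continues on the A3C loss alone (phase 3). Linear decay is an
    assumption for illustration.
    """
    w = max(0.0, 1.0 - step / anneal_steps)  # annealing coefficient
    return w * bc_loss + (1.0 - w) * a3c_loss


# At step 0 the agent trains purely by imitation; after the
# annealing horizon it trains purely by reinforcement.
print(annealed_loss(2.0, 4.0, step=0, anneal_steps=100))    # 2.0
print(annealed_loss(2.0, 4.0, step=50, anneal_steps=100))   # 3.0
print(annealed_loss(2.0, 4.0, step=200, anneal_steps=100))  # 4.0
```

In practice each asynchronous A3C worker would apply this blended loss to its own gradient updates, so the imitation signal fades uniformly across workers as training progresses.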