Title
Improving computer games performance using batch reinforcement learning /
Author
Younes, Hebatullah Rashed Mohammed.
Preparation Committee
Researcher / Hebatullah Rashed Mohammed Younes
Supervisor / محمد الحسيني أبوالسعود
Supervisor / شاهنده صلاح الدين حسين سرحان
Subject
Computer games - Programming. Reinforcement learning.
Publication Date
2015.
Number of Pages
82 p.
Language
English
Degree
Master's
Specialization
Computer Science Applications
Defense Date
1/1/2015
Awarding Institution
Mansoura University - Faculty of Computers and Information - Computer Science Department.
Table of Contents
Only 14 pages (from 97) are available for public view.

Abstract

Reinforcement learning (RL) is learning how to map situations to actions so as to maximize a numerical reward signal. It helps an agent discover, through trial and error and delayed reward, which actions yield the most reward and which yield the most punishment. A reinforcement learning agent must also balance exploration of unknown areas against exploitation of its current knowledge. Batch reinforcement learning (BRL) is a subfield of dynamic-programming-based reinforcement learning that has grown immensely in recent years. Batch RL is mainly used where the complete body of learning experience, usually a set of transitions sampled from the system, is fixed and given a priori. The learning system's main concern is then to derive an optimal policy from this given batch of samples. Batch reinforcement learning algorithms aim for the utmost data efficiency by storing experienced data and applying aggregate batches of updates to the learned policy.

Since the establishment of Artificial Intelligence (AI), game playing has had a central role, because computer games depend on AI. With the recent rapid advent of video games and the growing numbers of players, only challenging games with rich policies, actions, and tactics survive. How a game responds to an opponent's actions is the key issue for popular games. Many algorithms have been proposed to solve this problem, such as Least-Squares Policy Iteration (LSPI) and State-Action-Reward-State-Action (SARSA), but they mainly depend on discrete actions, while an agent in such a setting has to learn from the consequences of its continuous actions in order to maximize the total reward over time. In this research, we therefore propose a new algorithm based on LSPI, called Least-Squares Continuous Action Policy Iteration (LSCAPI).
LSCAPI was implemented and tested on two different game genres: the 8-queens board game, and the real-time strategy (RTS) games Glest and StarCraft: Brood War.
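The LSPI foundation the abstract builds on can be illustrated with a minimal sketch. Nothing below comes from the thesis itself: the tiny two-state MDP, the one-hot features, and the small ridge term are all illustrative assumptions. The loop shows the LSTDQ step at the core of standard LSPI, which repeatedly solves a least-squares system over a fixed batch of (state, action, reward, next state) transitions and derives a greedy policy from the resulting weights.

```python
import numpy as np

# Illustrative sketch of LSPI's inner LSTDQ step on a made-up
# 2-state, 2-action MDP with one-hot state-action features.
n_states, n_actions, gamma = 2, 2, 0.9
k = n_states * n_actions            # feature dimension

def phi(s, a):
    """One-hot feature vector for a state-action pair."""
    f = np.zeros(k)
    f[s * n_actions + a] = 1.0
    return f

def greedy(w, s):
    """Action maximizing the linear Q-estimate in state s."""
    return int(np.argmax([phi(s, a) @ w for a in range(n_actions)]))

# Fixed batch of experience, given a priori: (s, a, r, s') tuples.
batch = [(0, 0, 0.0, 1), (0, 1, 1.0, 0), (1, 0, 2.0, 1), (1, 1, 0.0, 0)]

w = np.zeros(k)
for _ in range(20):                     # policy-iteration loop
    A = np.eye(k) * 1e-6                # tiny ridge term for invertibility
    b = np.zeros(k)
    for s, a, r, s2 in batch:           # LSTDQ: one pass over the batch
        f = phi(s, a)
        f2 = phi(s2, greedy(w, s2))     # next action under current policy
        A += np.outer(f, f - gamma * f2)
        b += r * f
    w = np.linalg.solve(A, b)           # least-squares fixed point

print(greedy(w, 0), greedy(w, 1))       # prints: 0 0
```

In this toy MDP, action 0 in state 1 pays reward 2 and keeps the agent in state 1, so the learned policy takes action 0 everywhere, even though the data itself was collected without any exploration: the policy is extracted entirely from the fixed batch, which is the defining trait of batch RL described above. LSCAPI, per the abstract, extends this discrete-action scheme toward continuous actions.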