Title
Robust Visual Tracking by Using Deep Learning /
Author
Al-Basiouny, Eman Mohammad Reda Ahmad
Committee
Researcher / Eman Mohammad Reda Ahmad Al-Basiouny
Supervisor / Hazem Mahmoud Abbas
Examiner / Hassan Taher Dorrah
Examiner / Mahmoud Ibrahim Khalil
Publication Date
2023
Number of Pages
100p.:
Language
English
Degree
Doctorate
Specialization
Systems and Control Engineering
Approval Date
1/1/2023
Awarding Institution
Ain Shams University - Faculty of Engineering - Electrical (Computers)
Table of Contents
Only 14 pages (from 128) are available for public view

Abstract

Deep learning algorithms provide an unprecedented level of visual tracking robustness, but achieving acceptable performance is still difficult due to the natural, continuous changes in the characteristics of foreground and background objects across videos. Among the most influential factors on the robustness of tracking algorithms is the selection of network hyperparameters, especially the network architecture and depth. We constructed two models on an ordinary convolutional neural network (CNN), which consists of a feature extraction network and a binary classifier network. We integrated a generative adversarial network (GAN) into the CNN to enhance the tracking results through an adversarial learning process performed during the training phase. We used the discriminator as a classifier and the generator as a source that produces feature-level data with different appearances by applying masks to the extracted features.
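The arrangement described above (CNN backbone, mask-producing generator, discriminator as binary classifier) can be sketched in PyTorch. All module names, layer widths, and the per-channel sigmoid mask below are illustrative assumptions, not the thesis's actual architecture:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Ordinary CNN backbone mapping frames to feature maps (widths assumed)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.conv(x)

class MaskGenerator(nn.Module):
    """Generator producing a [0, 1] mask that is applied to the features,
    yielding feature-level samples with varied appearance."""
    def __init__(self, channels=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid(),
        )
    def forward(self, feats):
        return self.net(feats)

class Discriminator(nn.Module):
    """Binary classifier: target vs. background."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, 2),
        )
    def forward(self, feats):
        return self.head(feats)

# Training-phase flow: mask the extracted features before classification.
frames = torch.randn(4, 3, 64, 64)
feats = FeatureExtractor()(frames)            # (4, 64, 16, 16)
masked = MaskGenerator()(feats) * feats       # same shape, varied appearance
logits = Discriminator()(masked)              # (4, 2)
```

In an actual adversarial loop the generator would be trained to produce masks that make classification harder, while the discriminator learns to stay robust to them.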
In our first model, we present a competition between convolutional and multilayer perceptron (MLP) generative models of the same depth to determine which architecture is better able to extract the most robust features from the input video frames. The two networks are trained offline via an adversarial learning process; the tracking and online fine-tuning process then starts after the generator is removed. The experiments showed that the MLP generator is better at extracting the main features that persist over a long temporal span, whereas the convolutional generator is better at extracting the spatial features that occur in individual frames. Both networks were strongly competitive with state-of-the-art visual trackers.
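To make the "same depth, different architecture" comparison concrete, the two competing generators can be sketched as follows. The depth, channel count, and spatial size are hypothetical placeholders; only the structural contrast (spatial convolutions vs. fully connected layers over flattened features) reflects the text:

```python
import torch
import torch.nn as nn

def conv_generator(channels=64, depth=3):
    """Convolutional generator: keeps the spatial layout of the feature map."""
    layers = []
    for _ in range(depth - 1):
        layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
    layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid()]
    return nn.Sequential(*layers)

def mlp_generator(channels=64, spatial=4, depth=3):
    """MLP generator of the same depth: operates on flattened features,
    so every output unit sees the whole feature map."""
    dim = channels * spatial * spatial
    layers = [nn.Flatten()]
    for _ in range(depth - 1):
        layers += [nn.Linear(dim, dim), nn.ReLU()]
    layers += [nn.Linear(dim, dim), nn.Sigmoid(),
               nn.Unflatten(1, (channels, spatial, spatial))]
    return nn.Sequential(*layers)

# Both take the same feature map and emit a mask of the same shape,
# so they can be swapped inside one adversarial training setup.
feats = torch.randn(2, 64, 4, 4)
conv_mask = conv_generator()(feats)
mlp_mask = mlp_generator()(feats)
```

The design point is that with matched depth and identical input/output shapes, any difference in tracking robustness can be attributed to the architecture rather than to capacity or interface mismatches.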
A robust visual tracking network using a very deep generator (RTDG) is proposed in the second model. In this study, we investigated the role of increasing the number of fully connected (FC) layers in generative adversarial networks and their impact on robustness. For the first time, we used a very deep FC network with 22 layers as a high-performance generator. This generator is used via adversarial learning to augment the positive samples, reducing the gap between the data-hungry deep learning algorithm and the available training data and thereby achieving robust visual tracking. The experiments showed that the proposed framework performed well against state-of-the-art trackers.
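A 22-layer FC generator of the kind described can be sketched as below. The feature dimension, activation choices, and the use of the output as a multiplicative mask on positive samples are assumptions for illustration; the layer count matches the text:

```python
import torch
import torch.nn as nn

def deep_fc_generator(dim=512, num_layers=22):
    """Very deep fully connected generator: num_layers Linear layers with
    ReLU in between and a Sigmoid output, usable as a feature-level mask."""
    layers = []
    for _ in range(num_layers - 1):
        layers += [nn.Linear(dim, dim), nn.ReLU()]
    layers += [nn.Linear(dim, dim), nn.Sigmoid()]
    return nn.Sequential(*layers)

gen = deep_fc_generator()
n_linear = sum(1 for m in gen if isinstance(m, nn.Linear))
print(n_linear)  # 22

# Augmenting positive samples at the feature level: each pass yields a
# differently-masked variant of the same positive features.
pos_feats = torch.randn(16, 512)
augmented = gen(pos_feats) * pos_feats
```

In practice such a deep plain FC stack is hard to train; adversarial training here effectively uses it as a source of diverse positive-sample variants rather than as a standalone image generator.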