Title
Acceleration of Artificial Neural Networks Using a Hardware Platform /
Author
El-Sokkary, Salma Khaled Ali
Preparation Committee
Researcher / Salma Khaled Ali El-Sokkary
Supervisor / Mohamed Mahmoud Ahmed Taher
Examiner / Mohamed Mahmoud Ahmed Taher
Examiner / Mohamed Watheq Ali Kamel El-Kharashi
Publication Date
2023
Number of Pages
115 p.
Language
English
Degree
Master's
Specialization
Electrical and Electronic Engineering
Approval Date
1/1/2023
Place of Approval
Ain Shams University - Faculty of Engineering - Computer Engineering
Table of Contents
Only 14 pages (from 159) are available for public view

Abstract

The integration of Convolutional Neural Networks (CNNs) into a broad spectrum of applications is undeniable. Applications such as image recognition and classification, natural language processing, speech recognition, and object detection all depend on CNNs. Because CNNs are known to be computationally intensive, accelerating CNN operations has been widely researched in recent years.
Internet of Things (IoT) edge computing devices come in various sizes and computational powers, from processors or Field Programmable Gate Arrays (FPGAs) to FPGA Systems on Chip (SoCs) containing both an FPGA fabric and an Advanced RISC Machine (ARM) processor. This thesis targets a dual-platform edge computing device containing both an FPGA fabric and an ARM processor. By dividing the computations between the two platforms, the first contribution of this thesis is a flexible approach to CNN acceleration. That is, for the same image classification problem, we designed, implemented, and evaluated a range of acceleration rates, each with a corresponding resource utilization, so that, depending on the SoC resources available to the edge computing device, the best-suited combination can be chosen. Across twelve different combinations, three division scenarios are introduced: “Image Division”, “Single input channel Filter Division”, and “Multi input channel Filter Division”. Each scenario has four implementation combinations that alternate between the FPGA fabric and the ARM processor. Compared with fully implementing the network on the ARM processor, the largest acceleration rate of the twelve is 19.73 times and the smallest is 3.26 times, while fully implementing the design on the FPGA gives a 49.04-times acceleration.
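The thesis realizes these division scenarios in hardware, which is not reproduced here; purely as an illustrative software model (the function names and the NumPy-only setup are assumptions, not the thesis implementation), the core idea behind “Image Division” can be sketched as splitting the input rows between two compute units, with a small halo overlap so that the stitched result matches the monolithic convolution:

```python
import numpy as np

def conv2d_valid(image, kernel):
    # Plain "valid" 2-D convolution-style sliding window (no padding, stride 1).
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def image_division(image, kernel, split_row):
    # "Image Division" (software model): each compute unit processes one
    # horizontal slice. The top slice carries (kh - 1) extra "halo" rows so
    # the concatenated outputs equal the undivided convolution.
    kh = kernel.shape[0]
    top = conv2d_valid(image[:split_row + kh - 1], kernel)
    bottom = conv2d_valid(image[split_row:], kernel)
    return np.vstack([top, bottom])

rng = np.random.default_rng(0)
img = rng.standard_normal((28, 28))  # MNIST-sized input
k = rng.standard_normal((5, 5))      # LeNet-5-style 5x5 filter
# Dividing the image must not change the result of the layer.
assert np.allclose(image_division(img, k, 14), conv2d_valid(img, k))
```

In the thesis the two slices would run on different platforms (FPGA fabric vs. ARM processor); here both run in NumPy only to show why the halo rows are needed.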
The second contribution of this thesis is that the designs accelerate the CNN at the layer level rather than at the network level. After studying the different types of CNN layers, we used a modified LeNet-5 CNN as a prototype of a network with diverse layer types: it has both a single-channel-input convolutional layer and a multi-channel-input convolutional layer, max-pooling and dense layers, and ReLU and SoftMax activation functions. The network was evaluated on the MNIST dataset as an image classification problem, achieving 97.54% accuracy.
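The abstract names ReLU and SoftMax as the network's activation functions. A minimal NumPy sketch of the two (not the thesis's hardware implementation), using the standard subtract-the-max trick so the exponentials in SoftMax stay numerically stable:

```python
import numpy as np

def relu(x):
    # ReLU: elementwise max(0, x).
    return np.maximum(0.0, x)

def softmax(z):
    # Numerically stable SoftMax: shifting by max(z) leaves the result
    # unchanged but prevents overflow in np.exp.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, -1.0])
probs = softmax(logits)
assert np.isclose(probs.sum(), 1.0)            # probabilities sum to 1
assert np.all(relu(np.array([-1.0, 0.5])) == np.array([0.0, 0.5]))
```

In a classifier like the modified LeNet-5, ReLU follows the convolutional and hidden dense layers, while SoftMax turns the final dense layer's logits into class probabilities over the ten MNIST digits.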