Title
Building a System for Medical Text Summarization
Author
Ahmed, AlShimaa Mohamed.
Committee
Researcher / AlShimaa Mohamed Ahmed Ibrahim
Supervisor / Mostafa Mohamed Aref
Supervisor / Marco Alfonse
Supervisor / Abeer Mahmoud Mahmoud
Publication date
2024.
Number of pages
89 p.
Language
English
Degree
Master's
Specialization
Computer Science
Date of approval
1/1/2024
Place of approval
Ain Shams University - Faculty of Computer and Information Sciences - Computer Science
Index
Only 14 of 89 pages are available for public view.

Abstract

Medical text summarization is the process of condensing medical articles and extracting their most relevant information. The number of medical publications is growing daily, especially since the Coronavirus disease (COVID-19) pandemic, and text summarization techniques can reduce the time required to summarize medical papers manually. First, the study provides an overview of recent publications on medical text summarization from 2018 to 2022. The collection consists of fifteen papers that address various approaches. In addition, the study describes the most recent datasets and evaluation metrics used in medical text summarization.
Text summarization and classification are critical techniques for dealing with the massive and ever-increasing volume of text documents published every day. Feature selection (FS) is a significant issue in both. The study summarizes the most common FS methods and recent FS-based techniques used in text summarization and classification, drawing on 13 scientific publications. The FS methods discussed are unsupervised methods, such as correlation, and supervised methods, such as filter, wrapper, and embedded methods.
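As an illustration of a supervised filter method, the sketch below scores each term independently of any model using the chi-square statistic for a 2x2 contingency table (term present or absent against a binary class). It is a minimal example and not the thesis's implementation: the function names `chi_square_scores` and `select_top_k`, the toy documents, and the binary labels are all assumptions made for the example.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score terms by chi-square between term presence and a binary label.

    docs:   list of token lists (hypothetical, pre-tokenized input)
    labels: list of 0/1 class labels, one per document
    """
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos

    # Count, per class, the number of documents that contain each term.
    present_pos, present_neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        target = present_pos if label == 1 else present_neg
        for term in set(tokens):
            target[term] += 1

    scores = {}
    for term in set(present_pos) | set(present_neg):
        a = present_pos[term]  # term present, class 1
        b = present_neg[term]  # term present, class 0
        c = n_pos - a          # term absent,  class 1
        d = n_neg - b          # term absent,  class 0
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # Standard 2x2 chi-square: N (ad - bc)^2 / product of the marginals.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:k]


# Toy data, invented for the example.
docs = [
    ["patient", "fever", "cough"],
    ["patient", "vaccine", "trial"],
    ["fever", "cough", "virus"],
    ["trial", "dose", "patient"],
]
labels = [1, 0, 1, 0]

scores = chi_square_scores(docs, labels)
top_terms = select_top_k(scores, 3)
```

Because a filter method never trains a model, scoring is a single pass over the documents; wrapper and embedded methods, by contrast, evaluate features through a learning algorithm.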
Pre-trained models, such as the Bidirectional Encoder Representations from Transformers (BERT)-base model, are now standard for addressing medical text summarization challenges. This study introduces a new system for summarizing medical papers based on deep learning techniques. The system combines token classification by Part-of-Speech (POS) tagging with the χ² statistic (chi-square) feature selection technique, uses the feature selection output as input to the pre-trained BERT-base model, and then applies clustering algorithms for sentence selection. The main contribution is that the proposed model achieved high speed and accuracy in comparison to earlier summarization techniques. An extensive experiment was run on a randomly chosen open-access dataset from BioMed Central (BMC). The proposed model is less complex and performs well compared to other models that require a large amount of training time. For evaluation, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric was employed. The model's results are ROUGE-1 = 0.7611, ROUGE-2 = 0.3205, and ROUGE-L = 0.4544.
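To make the reported metrics concrete, the sketch below shows how ROUGE-1, ROUGE-2, and ROUGE-L are commonly computed from n-gram overlap and the longest common subsequence. It is a simplified illustration with whitespace tokenization and no stemming, not the evaluation code used in the thesis; the abstract does not state whether its scores are recall, precision, or F1, so all three are returned. The helper names and the example sentences are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, cand_total, ref_total):
    """Precision, recall, and F1 from an overlap count."""
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"precision": p, "recall": r, "f1": f}


def rouge_n(candidate, reference, n):
    """ROUGE-N: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter & takes the minimum counts
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: based on the longest common subsequence of tokens."""
    return prf(lcs_length(candidate, reference), len(candidate), len(reference))


# Invented example sentences, tokenized on whitespace.
reference = "the model summarizes medical papers quickly".split()
candidate = "the model summarizes papers".split()

r1 = rouge_n(candidate, reference, 1)
r2 = rouge_n(candidate, reference, 2)
rl = rouge_l(candidate, reference)
```

In this toy case the candidate recovers 4 of the 6 reference unigrams and 2 of the 5 reference bigrams, so ROUGE-2 recall is lower than ROUGE-1 recall, the same ordering seen in the reported scores.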