Language Engineering
Paper Details
 
Paper number: 9002017
Novel Image Preprocessing Approach for Automatic Speech Recognition
Research area: Speech Processing, Recognition and Synthesis
  Amr M. Gody (amg00@fayoum.edu.eg), main author
  Yossra A. Emam (eng.ussraemam@yahoo.com)
  Nashaat M. Hussein (Nmh01@fayoum.edu.eg)
  Keywords: English Phone Recognition, Automatic Speech Recognition (ASR), Mel Scale, DCT, Wavelet Packets, HTK, BTE, MFCC.
  Abstract: This research proposes a novel approach to automatic speech recognition (ASR) based on image recognition. It introduces a hybrid two-dimensional image and Hidden Markov Model (2DI-HMM) approach to handle the preprocessing classification task in an ASR system, and its focus is on that classification task. Because the proposed approach is novel and addresses only one task within the complete ASR system, it is evaluated by relative comparison with other popular approaches that perform the same task on the same database. A hybrid Gaussian Mixture Model (GMM)-HMM system using Mel-Frequency Cepstral Coefficient (MFCC) features provides the reference results. The research introduces a new method of mapping the speech signal into a two-dimensional space. The speech stream is segmented, and the frequency content of each segment is projected into the frequency domain by a balanced tree-structured filter, implemented with the wavelet packet technique. The resulting tree structure is captured as an image. A database of encoded images is constructed, and the images are then segregated into speech classes. The images are encoded with hybrid Discrete Cosine Transform (DCT)-based features and modeled with an HMM per class, and this DCT-HMM system is evaluated against MFCC-HMM on the same classification problem. Compared with MFCC-HMM, the proposed hybrid model gives more balanced results across the different classes. The classes considered are vowels, consonants, plosives, and silence.
The KED-TIMIT corpus is used as the source of speech data. The approach shows promising results, especially for silence and vowel detection.
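The abstract outlines a front end in which each speech segment is decomposed by a balanced wavelet packet tree, the tree is captured as an image, and the image is encoded with DCT-based features that per-class HMMs then model. This page does not give the exact filters, image layout, or feature selection, so the following is only a minimal sketch under stated assumptions, not the authors' implementation: Haar filters, a full decomposition to a hypothetical depth of 3, one image row per tree level with each node's raw energy repeated across the columns it covers, and the top-left block of an orthonormal 2D DCT-II as the feature vector. The function names, segment size, hop, and coefficient block size are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical front end illustrating the pipeline described in the abstract:
# segment -> balanced wavelet packet tree -> tree image -> 2D DCT features.
# Haar filters, raw node energies, and the coefficient block size are assumptions.


def frames(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """Split a signal into overlapping fixed-length segments (trailing partial segment dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def haar_split(x: List[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar analysis step: low-pass and high-pass halves, each downsampled by 2."""
    s = 1 / math.sqrt(2)
    low = [(x[i] + x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    high = [(x[i] - x[i + 1]) * s for i in range(0, len(x) - 1, 2)]
    return low, high


def wavelet_packet_tree(frame: List[float], depth: int) -> List[List[List[float]]]:
    """Full (balanced) wavelet packet tree; level k holds 2**k nodes in natural order.

    Assumes len(frame) is divisible by 2**depth.
    """
    levels = [[list(frame)]]
    for _ in range(depth):
        children: List[List[float]] = []
        for node in levels[-1]:
            children.extend(haar_split(node))  # adds the low and high child nodes
        levels.append(children)
    return levels


def tree_image(levels: List[List[List[float]]]) -> List[List[float]]:
    """Capture the tree as an image: one row per level, each node's energy
    repeated across the 2**(depth - k) columns it covers."""
    width = 2 ** (len(levels) - 1)
    image = []
    for nodes in levels:
        span = width // len(nodes)
        row: List[float] = []
        for node in nodes:
            energy = sum(v * v for v in node)
            row.extend([energy] * span)
        image.append(row)
    return image


def dct_1d(v: List[float]) -> List[float]:
    """Orthonormal DCT-II of a sequence (direct O(n^2) form)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def dct_2d(image: List[List[float]]) -> List[List[float]]:
    """Separable 2D DCT-II: transform each row, then each column."""
    rows = [dct_1d(r) for r in image]
    cols = [dct_1d([row[c] for row in rows]) for c in range(len(rows[0]))]
    return [[cols[c][r] for c in range(len(cols))] for r in range(len(rows))]


def dct_features(frame: List[float], depth: int = 3, keep: Tuple[int, int] = (2, 4)) -> List[float]:
    """Feature vector: top-left (low-order) block of the 2D DCT of the tree image."""
    coeffs = dct_2d(tree_image(wavelet_packet_tree(frame, depth)))
    n_rows, n_cols = keep
    return [coeffs[r][c] for r in range(n_rows) for c in range(n_cols)]


if __name__ == "__main__":
    rate = 8000  # hypothetical sampling rate
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(1024)]
    vectors = [dct_features(seg) for seg in frames(tone, size=256, hop=128)]
    print(len(vectors), len(vectors[0]))  # 7 segments, 8 coefficients each
```

Keeping only low-order DCT coefficients compacts the coarse energy layout of the tree image into a short vector, much as MFCC extraction keeps the first cepstral coefficients. In this sketch the block size is a tunable assumption, and the resulting vectors would be modeled per class (vowel, consonant, plosive, silence) by HMMs, for example with HTK.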