Author ORCID Identifier
https://orcid.org/0000-0001-9520-9584
Date of Award
4-11-2024
Document Type
Thesis
School
School of Computing
Programme
Ph.D. - Doctor of Philosophy
First Advisor
Dr. B. Santhi
Keywords
Impaired Speech Recognition, Speech Assistive Tool, Neurological Disorder, Machine Learning, Deep Learning
Abstract
Speech Assistive Tools have emerged in recent years to support individuals with cognitive and neurological disorders in the field of assistive technology. People affected by neurological disorders such as autism, stroke, cerebral palsy, dysarthria, Parkinson’s disease, and brain injury often find it difficult to articulate desired sounds, resulting in impaired speech. As the population of impaired speakers continues to increase every year, there is a strong need to develop intelligent speech recognition systems for affected individuals. The primary objective of this research is to develop an Impaired Speech Recognition (ISR) system for the Tamil language. Word Recognition Accuracy (WRA) is used as the performance metric, and a new dataset called the Impaired Speech Corpus in Tamil is created using speech samples collected from individuals with varying neurological disorders and intelligibility levels.
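As a concrete illustration of the metric named above, the following is a minimal sketch of Word Recognition Accuracy for isolated-word recognition; the function name and the romanized word labels are illustrative, not taken from the thesis:

```python
def word_recognition_accuracy(reference, hypothesis):
    """WRA for isolated-word recognition: the fraction of test words
    whose recognized label matches the reference label."""
    if len(reference) != len(hypothesis):
        raise ValueError("expected one hypothesis per reference word")
    correct = sum(r == h for r, h in zip(reference, hypothesis))
    return correct / len(reference)

# Hypothetical labels for four test utterances (3 of 4 recognized correctly)
ref = ["thanneer", "saapadu", "uthavi", "vali"]
hyp = ["thanneer", "saapadu", "vali", "vali"]
print(f"WRA = {word_recognition_accuracy(ref, hyp):.2%}")  # prints "WRA = 75.00%"
```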
The proposed ISR system incorporates a Deep Neural Network–Hidden Markov Model (DNN-HMM) framework trained using the Lattice Free Maximum Mutual Information (LF-MMI) approach for effective recognition of impaired Tamil speech. Training and testing samples are collected from speakers with high, medium, low, and very low intelligibility levels. The recognition performance is evaluated and compared with baseline approaches on two datasets: a 20-word dataset of acoustically similar words and the 50-word Impaired Speech Corpus in Tamil.
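The MMI criterion that LF-MMI optimizes can be sketched numerically. The snippet below is a toy per-utterance version over a closed word list; actual LF-MMI computes the denominator over a lattice-free phone-level graph, which this sketch does not attempt to reproduce:

```python
import math

def mmi_objective(log_liks, log_priors, correct):
    """Toy MMI criterion for one utterance: log posterior of the
    correct word, i.e. the correct word's joint score minus the
    log-sum over all competing words in a closed vocabulary."""
    joint = [ll + lp for ll, lp in zip(log_liks, log_priors)]
    denom = math.log(sum(math.exp(j) for j in joint))
    return joint[correct] - denom

# Three competing words, uniform priors; the correct word (index 0)
# has the best acoustic score, so the objective is close to 0 from below.
score = mmi_objective([0.0, -2.0, -3.0], [math.log(1 / 3)] * 3, correct=0)
```

Maximizing this quantity pushes probability mass toward the correct transcription relative to all competitors, which is the discriminative intuition behind MMI-style training.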
To address noisy, incomplete, and severely degraded impaired speech samples, an Enhancement Generative Adversarial Network (EGAN) is proposed for waveform enhancement. This approach improves the quality of impaired speech utterances and leads to better recognition performance on both the Tamil impaired speech datasets and the Universal Access benchmark database. The enhanced speech signals contribute to improved robustness and accuracy in impaired speech recognition.
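The abstract reports enhancement gains through recognition accuracy; waveform enhancement itself is commonly sanity-checked with a signal-to-noise ratio against a clean reference. A minimal SNR sketch follows (this is a standard measure, not the thesis's evaluation protocol, and no GAN is reproduced here):

```python
import math

def snr_db(clean, degraded):
    """Signal-to-noise ratio in dB between a clean reference waveform
    and a degraded version; higher values indicate less distortion."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((c - d) ** 2 for c, d in zip(clean, degraded))
    return 10 * math.log10(signal_power / noise_power)

# Toy waveform with a small constant distortion added
clean = [1.0, 0.0, 1.0, 0.0]
degraded = [1.1, 0.1, 1.1, 0.1]
snr = snr_db(clean, degraded)
```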
Learning compact and efficient representations for disordered speech is challenging due to the limited availability of impaired speech data. To overcome this issue, a novel sequence-to-vector representation based on HMM state sequences (HMM-SS) is proposed. This compact representation performs effectively on small datasets and is evaluated using four datasets: 50 words from TORGO, 100 common words from UA-SPEECH, and 50 help-seeking words and 100 common words from the Tamil impaired speech corpus. The proposed approach consistently outperforms baseline HMM, DNN-HMM, and state-of-the-art methods.
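The abstract does not detail the HMM-SS mapping itself. One simple way to turn a variable-length decoded state sequence into a fixed-length vector is a normalized state-occupancy histogram, sketched here purely as an assumption about what such a sequence-to-vector mapping could look like:

```python
from collections import Counter

def state_sequence_to_vector(state_seq, num_states):
    """Map a variable-length decoded HMM state sequence to a
    fixed-length vector of normalized state-occupancy counts.
    (Illustrative reading only; the thesis's exact HMM-SS
    construction may differ.)"""
    counts = Counter(state_seq)
    total = len(state_seq)
    return [counts.get(s, 0) / total for s in range(num_states)]

# Two utterances of different lengths map to vectors of the same dimension,
# which is what makes the representation usable with small datasets.
v1 = state_sequence_to_vector([0, 0, 1, 2, 2, 2], num_states=4)
v2 = state_sequence_to_vector([0, 1, 1, 3], num_states=4)
```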
Finally, self-supervised and spectrogram-based approaches are explored to further improve impaired speech recognition. A Self-Supervised Learning (SSL) based Wav2Word framework using the wav2vec 2.0 encoder is proposed and evaluated on Tamil and English impaired speech datasets, achieving superior performance over conventional methods. In addition, a Denoising Convolutional Autoencoder (DCAE) is introduced to enhance spectrogram representations prior to CNN-based recognition. The proposed DCAE approach achieves significant performance improvements, with a maximum Word Recognition Accuracy of 96.07% on the Impaired Speech Corpus in Tamil, demonstrating its effectiveness for rehabilitation-oriented assistive technologies.
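The DCAE itself cannot be reconstructed from the abstract, but the spectrogram front-end such a model denoises can be sketched. The naive framing-plus-DFT below is an illustration only; real pipelines use windowed FFTs and typically log-mel features:

```python
import cmath

def frames(signal, frame_len, hop):
    """Slice a waveform into overlapping frames, the first step
    in computing a spectrogram."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitude of one frame (kept to the non-negative
    frequency bins, as is conventional for real-valued signals)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

# A spectrogram is one magnitude spectrum per frame; this toy waveform
# has period 4, so its energy lands in a single frequency bin.
signal = [0.0, 1.0, 0.0, -1.0] * 8
spec = [magnitude_spectrum(f) for f in frames(signal, frame_len=8, hop=4)]
```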
Recommended Citation
S, Vishnika Veni Ms, "Impaired Speech Recognition of Neurological Disorder Persons Using Machine Learning and Deep Learning Techniques" (2024). Theses and Dissertations. 156.
https://knowledgeconnect.sastra.edu/theses/156