Date of Award
16-4-2024
Document Type
Thesis
School
School of Computing
Programme
Ph.D. - Doctor of Philosophy
First Advisor
Dr. R. Elakkiya
Keywords
Deep Learning, Sign Language, Video Generation, Generative Adversarial Networks, Neural Machine Translation
Abstract
This dissertation presents a deep neural network based sign language video generation framework for translating multilingual sentences into sign videos. The thesis addresses the challenges that persist in sign language video generation: (i) handling longer input sentences and new words, (ii) pose estimation with higher accuracy, (iii) high-quality, photo-realistic sign gesture video generation, and (iv) improving realism in generated sign videos. Accordingly, the thesis makes four contributions to address these issues.
The first contribution of this thesis automates the translation of multilingual sentences into sign glosses without manual intervention by combining hybrid Neural Machine Translation with an attention mechanism. To handle longer sequences and new words, a deep stacked GRU approach is introduced, and the attention mechanism is incorporated to produce accurate translation results.
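As a hedged illustration of this translation component, the sketch below shows a deep stacked GRU encoder-decoder with additive attention in PyTorch. The class name GlossTranslator, the hyperparameters, and the attention formulation are illustrative assumptions, not the thesis's actual configuration.

```python
# Minimal sketch (PyTorch) of a stacked-GRU encoder-decoder with additive
# attention for text-to-gloss translation. All names and hyperparameters
# are illustrative assumptions, not the thesis's exact architecture.
import torch
import torch.nn as nn

class GlossTranslator(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=256, hid=512, layers=3):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # Deep stacked GRUs: multiple recurrent layers help the model
        # cope with longer input sentences.
        self.encoder = nn.GRU(emb, hid, num_layers=layers, batch_first=True)
        self.decoder = nn.GRU(emb + hid, hid, num_layers=layers, batch_first=True)
        self.attn = nn.Linear(2 * hid, 1)   # additive attention score
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, h = self.encoder(self.src_emb(src))   # enc_out: (B, S, H)
        logits = []
        for t in range(tgt.size(1)):                   # teacher forcing
            # Score each source position against the current decoder state.
            query = h[-1].unsqueeze(1).expand(-1, enc_out.size(1), -1)
            score = self.attn(torch.cat([enc_out, query], dim=-1))
            ctx = (torch.softmax(score, dim=1) * enc_out).sum(1, keepdim=True)
            step_in = torch.cat([self.tgt_emb(tgt[:, t:t+1]), ctx], dim=-1)
            dec_out, h = self.decoder(step_in, h)
            logits.append(self.out(dec_out))
        return torch.cat(logits, dim=1)                # (B, T, tgt_vocab)
```

The attention context lets each decoding step attend over the whole source sentence, which is one standard way to keep translation quality from degrading on long inputs.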
The second contribution develops a Dynamic GAN framework to generate cost-effective, photo-realistic, high-quality sign videos for the hearing-impaired community. A conditional GAN approach is introduced for sign video generation, and the incorporation of pixel normalization, de-blurring, and video completion techniques further facilitates high-quality output.
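The following is a minimal sketch of a conditional GAN generator with per-pixel feature normalization, assuming a PyTorch implementation. The layer sizes, the PixelNorm placement, and the concatenation-based conditioning are assumptions for illustration and do not reproduce the Dynamic GAN architecture itself.

```python
# Minimal sketch (PyTorch) of a conditional generator for sign-frame
# synthesis. Sizes and conditioning scheme are illustrative assumptions.
import torch
import torch.nn as nn

class PixelNorm(nn.Module):
    def forward(self, x, eps=1e-8):
        # Normalize each pixel's feature vector to unit length.
        return x * torch.rsqrt(x.pow(2).mean(dim=1, keepdim=True) + eps)

class Generator(nn.Module):
    def __init__(self, z_dim=100, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim + cond_dim, 256, 4, 1, 0),  # 1x1 -> 4x4
            PixelNorm(), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, 2, 1),               # 4x4 -> 8x8
            PixelNorm(), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 3, 4, 2, 1),                 # 8x8 -> 16x16
            nn.Tanh(),
        )

    def forward(self, z, cond):
        # Concatenate noise with the gloss/pose condition, reshape to a
        # 1x1 spatial map, and upsample to an RGB frame.
        x = torch.cat([z, cond], dim=1).unsqueeze(-1).unsqueeze(-1)
        return self.net(x)
```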
The third contribution develops an end-to-end framework for sign gesture synthesis that attains high realism by combining basic NLP techniques for translating sentences into sign glosses. The proposed VidGenGAN model generates sign videos using deep stacked GRU approaches; a sketch of how such a pipeline could be wired together follows.
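This hedged sketch composes the illustrative components above into a sentence-to-video pipeline. The translate and embed_gloss helpers are hypothetical, and the sketch does not reproduce VidGenGAN itself.

```python
# Illustrative end-to-end pipeline: spoken sentence -> sign glosses ->
# generated frames. GlossTranslator and Generator are the sketches above;
# translate() and embed_gloss() are hypothetical helpers.
import torch

def sentence_to_sign_video(sentence, translator, generator, embed_gloss):
    glosses = translator.translate(sentence)   # hypothetical decode helper
    frames = []
    for g in glosses:
        cond = embed_gloss(g)                  # gloss -> condition vector
        z = torch.randn(1, 100)                # per-frame noise
        frames.append(generator(z, cond.unsqueeze(0)))
    return torch.cat(frames, dim=0)            # (T, 3, H, W) video tensor
```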
Finally, this thesis thoroughly assesses the proposed frameworks through both subjective and quantitative experiments on real-time signing videos drawn from diverse sign language corpora: the RWTH-PHOENIX-Weather 2014T dataset for German Sign Language, the self-created ISL-CSLTR dataset for Indian Sign Language, and the How2Sign dataset for American Sign Language. The results show that the system achieves plausible performance on video generation tasks and produces high-quality sign videos from spoken language sentences.
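As one example of a quantitative measure commonly used when comparing generated frames against reference footage, a PSNR computation might look as follows; the thesis's exact metrics and evaluation protocol are not reproduced here.

```python
# Peak signal-to-noise ratio between generated and reference frames,
# assuming pixel values scaled to [0, 1]. Illustrative only.
import torch

def psnr(fake, real, max_val=1.0):
    mse = torch.mean((fake - real) ** 2)
    return 10 * torch.log10(max_val ** 2 / mse)
```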
Recommended Citation
B, Natarajan, "Development of Deep Neural Architecture for Continuous Sign Language Video Generation" (2024). Theses and Dissertations. 63.
https://knowledgeconnect.sastra.edu/theses/63