Author ORCID Identifier
https://orcid.org/0000-0003-3406-5382
Date of Award
26-11-2024
Document Type
Thesis
School
Srinivasa Ramanujan Centre
Programme
Ph.D. - Doctor of Philosophy
First Advisor
Dr. D. Narasimhan
Keywords
Deduplication, Audio, Cloud Computing, Machine Learning
Abstract
Cloud computing has become an integral part of modern internet-based services, with users relying heavily on cloud environments as primary storage solutions. However, the exponential growth in data volume presents a challenge: the proliferation of duplicated content within cloud repositories. Deduplication techniques provide a promising approach to mitigate this issue. This research focuses on detecting redundant audio content within a cloud environment, specifically targeting the sharing of large audio files, such as those in Waveform Audio File Format (WAV). The study proposes the Refined Super Subset Identification Algorithm (RSSIA) to efficiently identify redundant content and segments within existing audio files, refining the deduplicated content. Experimental results demonstrate the accuracy of the algorithm in identifying duplicate content spanning various audio files, validating its effectiveness in real-time environments and in eliminating redundant content. The algorithm also efficiently finds redundant portions of an audio file that are spread across multiple files.
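The idea of finding portions of one audio file that already exist in other stored files can be illustrated with a generic chunk-hashing sketch. This is not RSSIA, whose details the abstract does not give; the function names, the fixed chunk size, and the use of SHA-256 are all assumptions made for illustration. Fixed-size chunking only matches segments that fall on the same chunk boundaries, which is one reason a more refined approach may be needed in practice.

```python
import hashlib


def chunk_digests(samples: bytes, chunk_size: int) -> list[str]:
    """Hash each fixed-size chunk of raw audio bytes (a final partial chunk is included)."""
    return [
        hashlib.sha256(samples[i:i + chunk_size]).hexdigest()
        for i in range(0, len(samples), chunk_size)
    ]


def redundant_fraction(candidate: bytes, stored: list[bytes], chunk_size: int = 4096) -> float:
    """Return the fraction of the candidate's chunks already present in any stored file.

    Hypothetical helper: a real system would keep a persistent index of
    digests instead of re-hashing every stored file on each call.
    """
    known: set[str] = set()
    for data in stored:
        known.update(chunk_digests(data, chunk_size))

    digests = chunk_digests(candidate, chunk_size)
    if not digests:
        return 0.0
    hits = sum(1 for d in digests if d in known)
    return hits / len(digests)


# Toy data: the candidate's first two chunks are spread across two stored files.
stored_files = [b"A" * 8 + b"B" * 8, b"C" * 8]
candidate_file = b"B" * 8 + b"C" * 8 + b"D" * 8
fraction = redundant_fraction(candidate_file, stored_files, chunk_size=8)
```

Here two of the candidate's three chunks are found in the stored files, even though no single stored file contains both.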
Additionally, ensuring the security of voice-based message sharing over public networks poses a significant challenge. The proposed Multilayer Protection with Deduplication (MLPD) methodology enhances voice message security by employing multiple protection layers. The audio file is converted into numeric values, and the derived values are shuffled, transposed, and byte-swapped. They are then arranged into a matrix, which is used to create the encrypted audio. MLPD outperforms existing encryption algorithms such as DES and the encryption and decryption technique proposed by Jihad Nadir et al. [96].
The storage pattern of an audio file plays a crucial role in storage. A WAV file can be stored in either direction (forward or reverse). As cloud storage grows in popularity, maintaining server performance amid escalating data volumes is crucial. This study leverages deduplication to alleviate server burdens by identifying redundant audio content and reversed audio files within existing data. It extends RSSIA further so that reversed audio content is detected effectively.
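One simple way to treat a file and its reversed copy as the same item is a direction-independent fingerprint. The sketch below is a generic illustration, not the extended RSSIA: the function names are hypothetical, the input is assumed to be raw PCM data with a fixed frame size (2 bytes for 16-bit mono), and only exact, whole-file reversal is detected.

```python
import hashlib


def reverse_frames(pcm: bytes, frame_size: int = 2) -> bytes:
    """Reverse the order of audio frames while keeping the bytes inside each frame intact."""
    if len(pcm) % frame_size:
        raise ValueError("data length must be a multiple of the frame size")
    frames = [pcm[i:i + frame_size] for i in range(0, len(pcm), frame_size)]
    return b"".join(reversed(frames))


def direction_free_digest(pcm: bytes, frame_size: int = 2) -> str:
    """Hash whichever of the forward or reversed form sorts first.

    A recording and its reversed copy therefore share one digest.
    """
    backward = reverse_frames(pcm, frame_size)
    return hashlib.sha256(min(pcm, backward)).hexdigest()


forward = bytes([1, 0, 2, 0, 3, 0])   # three 16-bit little-endian samples: 1, 2, 3
backward = reverse_frames(forward)    # samples 3, 2, 1
other = bytes([9, 0, 2, 0, 3, 0])
same_item = direction_free_digest(forward) == direction_free_digest(backward)
```

Reversing at frame granularity matters: reversing the raw byte string would also flip the byte order inside each sample and produce different audio.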
This research endeavors to refine audio deduplication techniques to discern between audio files with identical content but different metadata or attributes. By addressing this challenge, the study aims to optimize server performance and enhance user experience within cloud storage environments, ultimately advancing the efficiency and effectiveness of audio deduplication processes.
Furthermore, the research extends its scope to content-based deduplication of audio files. Audio files that have similar content but different phonetics, for example because of the influence of the speaker's mother tongue, are treated as duplicates. Their similarity is determined using the Wav2Vec2 model for transcription and Mel-frequency cepstral coefficients (MFCC) for feature extraction, combined with Dynamic Time Warping (DTW) for similarity measurement. The methods are evaluated on a standard dataset comprising similar audio content, where they identify duplicates and analyse audio data characteristics. The results highlight the potential of these methods in various applications, including digital archiving, content management, and data pre-processing.
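The DTW step can be sketched in plain Python. This is the textbook dynamic-programming formulation, shown only to make the similarity measure concrete; it is not taken from the thesis. The input sequences stand in for per-frame MFCC vectors, which are assumed to have been computed elsewhere, and the toy values below are invented for illustration.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> float:
    """Classic DTW cost between two sequences of equal-dimension feature vectors.

    Local cost is the Euclidean distance between frames; allowed steps are
    insertion, deletion, and match.
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # frame of a aligned with a repeated frame of b
                cost[i][j - 1],      # frame of b aligned with a repeated frame of a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Toy one-dimensional "features": the second sequence is a slower rendition of the first.
reference = [(0.0,), (1.0,), (2.0,)]
stretched = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]
unrelated = [(5.0,), (5.0,), (5.0,)]

close = dtw_distance(reference, stretched)
far = dtw_distance(reference, unrelated)
```

Because DTW allows frames to repeat, the time-stretched sequence aligns with the reference at zero cost, while the unrelated sequence does not. A duplicate decision would then compare the cost against a threshold, whose value is a tuning choice not specified here.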
Recommended Citation
K, Venkatesh Mr, "Identifying Redundant Audio Content over Cloud Environment Using Deduplication Techniques" (2024). Theses and Dissertations. 152.
https://knowledgeconnect.sastra.edu/theses/152