ISSN: 2582-5208

www.irjmets.com

Paper Key: IRJ************385
Authors: Prof. Runal Pawar, Vaishnav Ghenge, Jayesh Nikumbh, Atharva Nigal, Atharva Tapkir
Date Published: 06 Apr 2024
Abstract
Video conferencing has emerged as a crucial tool for remote communication and collaboration, facilitating interaction among individuals and organizations across geographical boundaries. However, traditional video conferencing systems often overlook the needs of users with hearing impairments or those operating in noisy environments, creating accessibility challenges and communication barriers. Real-time subtitle generation offers a promising solution by providing synchronized text captions alongside the video feed, enhancing accessibility and comprehension for all participants. In this paper, we propose a novel approach to real-time subtitle generation for video conferencing applications. Our system leverages automatic speech recognition (ASR) to transcribe spoken dialogue into text, which is then synchronized with the corresponding video frames in real time. The architecture consists of three main components: audio processing, speech recognition, and subtitle generation. The audio processing module preprocesses the incoming audio stream to remove noise and enhance speech clarity; the speech recognition module applies state-of-the-art deep learning techniques to transcribe the processed audio into text; and the subtitle generation module synchronizes the transcribed text with the video frames to produce real-time subtitles. Overall, this research contributes to the advancement of real-time subtitle generation technology and its integration with video conferencing platforms. By integrating ASR seamlessly with video conferencing systems, our approach offers an efficient and effective way to improve accessibility and comprehension for users with hearing impairments or those in noisy environments. Future work may involve further optimization for accuracy and scalability, as well as additional features such as multi-language support and speaker identification.
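To make the three-module architecture concrete, the sketch below models it as a producer-consumer pipeline. This is a minimal illustration, not the authors' implementation: the chunk format, queue layout, and function names are assumptions, and the ASR step is a stub standing in for a streaming deep-learning recognizer.

# Minimal sketch of the three-stage pipeline described in the abstract.
# All names are illustrative; the paper does not specify an implementation.
import queue
import threading
import time

audio_q = queue.Queue()   # denoised audio chunks awaiting transcription
text_q = queue.Queue()    # timestamped transcripts awaiting display

def audio_processing():
    """Stage 1: capture and denoise fixed-size audio chunks."""
    for _ in range(5):                      # stand-in for a live capture loop
        chunk = {"t_start": time.time(), "samples": b"..."}  # placeholder audio
        # A real system would apply noise suppression / speech enhancement here.
        audio_q.put(chunk)
        time.sleep(0.5)                     # simulate a 500 ms chunk cadence
    audio_q.put(None)                       # sentinel: end of stream

def speech_recognition():
    """Stage 2: transcribe each denoised chunk into text (stubbed ASR)."""
    while (chunk := audio_q.get()) is not None:
        text = f"[transcript of chunk @ {chunk['t_start']:.2f}]"  # ASR stub
        text_q.put({"t_start": chunk["t_start"], "text": text})
    text_q.put(None)

def subtitle_generation():
    """Stage 3: align transcripts with stream time and emit captions."""
    while (item := text_q.get()) is not None:
        latency = time.time() - item["t_start"]
        # A real renderer would overlay this caption on the matching frames.
        print(f"SUBTITLE (+{latency:.2f}s): {item['text']}")

threads = [threading.Thread(target=f)
           for f in (audio_processing, speech_recognition, subtitle_generation)]
for t in threads: t.start()
for t in threads: t.join()

Decoupling the stages with queues lets the caption renderer lag audio capture by roughly one chunk, so end-to-end subtitle latency is bounded by the chunk duration plus per-chunk transcription time.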
DOI: 10.56726/IRJMETS51897 (https://www.doi.org/10.56726/IRJMETS51897)