This chatbot is built on the idea of retrieval-augmented generation (RAG). The pipeline has the following steps:
First: It asks the user to enter a YouTube video URL.
Video Processor:
After receiving the URL, it downloads the video to a temporary directory using Python's tempfile library.
It then converts the video into image frames (how many frames depends on the FPS value in your settings) and transcribes the video's audio into text.
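The frame sampling and temporary directory handling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the "frames" and "audio" subfolder names, and the reading of the FPS setting as a target sampling rate are all assumptions. Downloading and decoding the video are left out because they depend on third-party tools.

```python
import tempfile
from pathlib import Path


def frame_indices_to_keep(total_frames: int, native_fps: float, target_fps: float) -> list[int]:
    """Return the indices of the frames to save when sampling at target_fps.

    Hypothetical helper: assumes the FPS setting means "frames to keep per
    second of video".
    """
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # Never sample faster than the video itself: the step is at least 1 frame.
    step = max(native_fps / target_fps, 1.0)
    indices = []
    position = 0.0
    while int(position) < total_frames:
        indices.append(int(position))
        position += step
    return indices


def make_work_dirs() -> tuple[tempfile.TemporaryDirectory, Path, Path]:
    """Create a temporary workspace with one folder for frames and one for audio.

    The folder names are assumptions made for this sketch.
    """
    tmp = tempfile.TemporaryDirectory(prefix="video_rag_")
    root = Path(tmp.name)
    frames_dir = root / "frames"
    audio_dir = root / "audio"
    frames_dir.mkdir()
    audio_dir.mkdir()
    # The caller keeps `tmp` alive and calls tmp.cleanup() when finished.
    return tmp, frames_dir, audio_dir
```

For example, a 30 FPS video sampled at 10 FPS keeps every third frame. Calling cleanup() on the returned TemporaryDirectory removes the downloaded video, the frames, and the audio in one step.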
Retriever Processor:
This processor receives the folder path of the image frames and audio, and indexes that data for retrieval.
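To make the indexing step concrete, here is a toy in-memory index over transcript text. It is only a sketch under stated assumptions: the real project most likely uses an embedding model and a vector store, and indexing the image frames would need an image or multimodal model, which is not shown. The TinyIndex class, its methods, and the word-count cosine scoring are stand-ins invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """Hypothetical stand-in for the retriever's index (text only)."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, source: str, text: str) -> None:
        # Store a word-count vector for each transcript chunk.
        self.entries.append((source, Counter(tokenize(text))))

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        # Score every chunk against the query and drop chunks with no overlap.
        query_vec = Counter(tokenize(query))
        scored = [(source, cosine(query_vec, vec)) for source, vec in self.entries]
        scored = [item for item in scored if item[1] > 0]
        scored.sort(key=lambda item: item[1], reverse=True)
        return scored[:top_k]
```

A possible usage: call add() once per transcript chunk after the Video Processor finishes, then call search() with the user's question to get the most relevant chunks.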
Second: After the video has been processed (this may take a few minutes), the user can type text messages and
ask about the information in the video.
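In a RAG pipeline, the user's question is usually combined with the retrieved snippets before it is sent to the language model. The sketch below shows one way to assemble such a prompt. The build_prompt function, the prompt wording, and the character budget are assumptions for illustration and are not taken from the project.

```python
def build_prompt(question: str, snippets: list[str], max_chars: int = 2000) -> str:
    """Combine retrieved snippets and the user's question into one prompt.

    Hypothetical helper: snippets are added in order until the character
    budget would be exceeded.
    """
    context_lines = []
    used = 0
    for number, snippet in enumerate(snippets, start=1):
        line = f"[{number}] {snippet.strip()}"
        if used + len(line) > max_chars:
            break  # stop before exceeding the context budget
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines) if context_lines else "(no relevant context found)"
    return (
        "Answer the question using only the video context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
    )
```

Numbering the snippets makes it easier for the model to point back to the part of the video it used in its answer.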