
feluda's People

Contributors

aatmanvaidya, dennyabrain, duggalsu, gpstrange, ngm9, snehil-shah, swairshah, tarunima, variablenerd


feluda's Issues

Reduce RAM Usage of the Audio Operator for large audio files

Operators are core components of Feluda: modules that help us analyse media items.

One such operator is the Audio Vec Embedding; it takes an audio file as input and converts it into a 2048-dimensional vector.
The operator uses a pretrained CNN model to convert a file into a vector. All the code for how the operator works can be found in the src/core/operators/audio-cnn-model/ folder.

When processing large audio files, the operator's RAM usage is significantly high; here are some profiling results for the operator. For a 5-minute audio clip, roughly 2 GB of RAM is used.

The task is to reduce the RAM usage and ensure efficient processing of large audio files.

  • This is an open-ended issue, and analysis of the audio-cnn-model will have to be done to figure out what is causing the high usage.
  • Another solution to think about could be smart sampling of the audio: instead of loading the entire audio file, can we select key frames and let the operator process those? A sketch of this idea follows this list.
  • One thing to remember is that, in the process of reducing RAM usage, search result quality should not degrade.
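For illustration, here is a minimal sketch of the chunked-processing idea, assuming librosa for streaming decode; embed_chunk is a hypothetical stand-in for the operator's CNN forward pass, not Feluda's actual code:

    import numpy as np
    import librosa

    def embed_chunk(block, sr=22050):
        # Hypothetical stand-in for the audio-cnn-model forward pass;
        # here it just summarises a mel spectrogram of one block.
        mel = librosa.feature.melspectrogram(y=block, sr=sr)
        return mel.mean(axis=1)

    def embed_large_audio(path):
        # Stream the file in fixed-size blocks instead of loading it whole,
        # so peak RAM depends on block size rather than file length.
        stream = librosa.stream(
            path, block_length=256, frame_length=4096, hop_length=1024
        )
        vectors = [embed_chunk(block) for block in stream]
        # Mean-pool the per-chunk vectors into a single embedding.
        return np.mean(vectors, axis=0)

Whether mean-pooling chunk embeddings preserves search quality would need to be validated against the current whole-file approach.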

[DMP 2024]: Clustering large amounts of audio

Ticket Contents

Description

Feluda allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is audio. The scope of this task is to explore various automated techniques suited for grouping similar audio together and visualizing the groups. After consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of audio.

Goals

  • Review literature with our team and do research and prototyping to evaluate state-of-the-art ML and classical DSP techniques
  • Optimize the solution for consistent RAM and CPU usage (limit the spikes caused by variables like file size, video length etc.), since it will need to scale up to millions of files.
  • Integrate the solution into Feluda by creating an operator that adheres to the Feluda operator interface

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analysing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this.
The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), specify and implement the data processing pipeline, and decide where the result will be stored. Our current implementation uses S3 and a SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.

Acceptance Criteria

  • Regular interactive demos with the team using a public Jupyter notebook pushed to our experiments repository
  • A working Feluda operator with tests that can be run as an independent worker in the cloud to schedule processing jobs over a large dataset
  • Output structured data that can be passed on to a UI service (web or mobile) for downstream use cases

Implementation Details

One way we have approached this is by using vector embeddings, to great success in surfacing visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensions of the vector embeddings so we could display them in a 2D visualization (a sketch of this workflow follows). It can be viewed here
A detailed report on Feluda's usage in a project to analyze images can be read here
The relevant Feluda operator can be studied here
The code for t-SNE is here
A prior study of various ways to get insights out of images has been documented here
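As a reference, here is a minimal sketch of this embeddings-plus-t-SNE workflow, assuming torchvision's pretrained ResNet-18 and scikit-learn; the file list is illustrative:

    import numpy as np
    import torch
    from PIL import Image
    from sklearn.manifold import TSNE
    from torchvision import models, transforms

    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = torch.nn.Identity()  # drop the classifier head, keep 512-d features
    model.eval()

    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    def embed(path):
        img = Image.open(path).convert("RGB")  # force 3 channels
        with torch.no_grad():
            return model(preprocess(img).unsqueeze(0)).squeeze(0).numpy()

    image_paths = ["img_001.jpg", "img_002.jpg", "img_003.jpg"]  # illustrative
    vectors = np.stack([embed(p) for p in image_paths])
    # Project to 2D for plotting; perplexity must stay below the sample count.
    points = TSNE(n_components=2,
                  perplexity=min(30, len(vectors) - 1)).fit_transform(vectors)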

Mockups/Wireframes

This is an interactive visualization of Image clustering done using Feluda.

Doing UI development or integrating with any UI software is not part of this project but it might help to see what sort of downstream applications we use Feluda for.

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Machine Learning, Python

Mentor(s)

@dennyabrain @duggalsu

Category

Data Science, Machine Learning, Research

Search text within videos

If we work under the assumption that the keyframe extraction function does a good job of extracting important keyframes, it's likely that these keyframes contain text. So running them through the text extractor and storing the results in the index would make videos very searchable.

We could also try some version of audio-to-text on these videos to get textual metadata from them.
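A hedged sketch of that pipeline, assuming pytesseract and OpenCV; keyframe_paths stands in for whatever the keyframe extraction function produces:

    import cv2
    import pytesseract

    def text_from_keyframes(keyframe_paths, langs="eng+hin"):
        # Run OCR over each extracted keyframe and collect non-empty
        # results so they can be stored in the index alongside the video.
        texts = []
        for path in keyframe_paths:
            frame = cv2.imread(path)
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # OCR tends to do better on grayscale
            text = pytesseract.image_to_string(gray, lang=langs).strip()
            if text:
                texts.append(text)
        return "\n".join(texts)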

Operator to detect language in an Audio file

Operators are core components of Feluda: modules that help us analyse media items - text, image, video, audio etc. All the code for operators can be found at src/core/operators

The task is to create an operator that can detect the language in an audio media item. This is an open-ended issue; you will have to investigate which methods are most effective.
For now, the operator should be able to identify/detect English, Hindi, Telugu and Tamil.
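One possible starting point, sketched with openai-whisper's built-in language identification; other dedicated language-ID models should be evaluated too, especially once RAM/CPU profiling comes into play:

    import whisper

    model = whisper.load_model("base")
    audio = whisper.load_audio("sample.wav")  # path is illustrative
    audio = whisper.pad_or_trim(audio)        # whisper scores a 30-second window
    mel = whisper.log_mel_spectrogram(audio).to(model.device)

    # detect_language returns per-language probabilities ("en", "hi", "te", "ta", ...)
    _, probs = model.detect_language(mel)
    print(max(probs, key=probs.get))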

Expected Outcome

  • Write an operator and profile it for RAM and CPU usage. A key outcome is stabilised RAM usage. Some methods of profiling have been documented on the Wiki here.
  • Write a test for the operator
  • Document the operator on the Wiki

Retrieve portions containing non-consensual sexual imagery in a video

Overview

RATI (Rights. Action. Technology. Inclusion.) Foundation is based out of Mumbai and works to address the issue of violence against children and women in both on-ground and online spaces.

To create safe online spaces, they report problematic content to social media platforms like Meta, Snapchat etc. This work involves reviewing large numbers of videos on the internet that might contain intimate content. Since this work can be stressful and traumatic, we want to evaluate the feasibility of building an automated solution that reduces the manual labour needed to do it.

Please note that the goal of the task is not to assign classification labels like "nsfw" or "porn" to an entire video. Instead, the goal is to detect portions where non-consensual sexual imagery might be present so that they can be reviewed by humans.

The scope of this task would be to

  1. Review literature about computer vision and machine learning techniques that would be suitable for this task
  2. Evaluate off-the-shelf pre-trained models and available FOSS software projects
  3. Find responsible and ethical ways of sourcing data (if needed) for this
  4. Present your work to the RATI Foundation to get feedback on concerns around victim privacy and safety in your work
  5. Integrate the tool as an operator in Feluda

Safe Practices

Because of the nature of media involved in this work, we recommend only starting your work after discussing your approach with us and RATI. Feel free to comment on the issue to schedule an introductory call with us.

Potential Impact

Social media platforms rely on take-down requests from users and organizations like RATI to keep their platforms safe. The amount of user-generated content being uploaded and shared on the internet is increasing exponentially, and manual review of this content is one of the bottlenecks to scaling up any effort to take it down. An open source solution that demonstrably combines technology and community for RATI's use case would set a good precedent for developing community-managed FOSS software for online safety.

Domains

Online Trust and Safety, Content Moderation, Machine Learning, Social Science

Try out Embedding models and evaluate clustering

Try out ResNet, CLIP, ViT, VideoMAE (or something you like) and use t-SNE (or other approaches) to evaluate clustering visually. You can do this in a Jupyter notebook and show the results. Use a publicly available dataset. Evaluate whether any of these models can be fine-tuned.

Create a benchmark dataset of Audio Deepfakes

Goal

To create a benchmark dataset for audio files to assist evaluation of deepfake detection tools.

Overview

During the first quarter after the launch of DAU, a trend that has emerged is the presence of various manipulation techniques in audio content. This also includes video files whose audio is manipulated. As such, being able to reliably identify manipulated portions of an audio file is essential. The manipulation techniques noted so far are:

  1. Splicing synthetically generated media into a natural audio recording
  2. Overdubbing a video with mimicry (by a human, hence no synthetic media)
  3. Using tools like ElevenLabs to generate synthetic media in a celebrity's voice from text

While work is underway to create techniques that can detect the various manipulation techniques used in audio files received by the DAU, we want to create a standard benchmark dataset of audio files. The goal is for this dataset to be a useful tool in evaluating the performance of the various proprietary and open source tools we might use in the project.

Working Definitions

To avoid confusion, we will use the following definitions while working on this issue:

  1. Natural Audio: a recording of a person made using a microphone and saved to a digital file
  2. Synthetic Audio: audio generated from scratch using techniques like generative AI and consumer apps like Midjourney, Canva etc.
  3. Audio Effects: the application of any DSP technique, such as stretching or slowing down, to a natural audio file

Scope of the task

  1. List about 10-15 public figures, split by language, accent and gender.
  2. Get their audio recordings from publicly available repositories like YouTube.
  3. Strip the audio and generate different versions of it (e.g. single sentence, long speech, monologue) where applicable.
  4. Automatically generate transcripts of their speech.
  5. Convert the transcripts back to synthetic audio using open and proprietary models. The dataset will include a column recording how the synthetic media was generated.

Deliverable

An open dataset with the following columns (a schema sketch follows the list):

  1. Name of the celebrity
  2. Language being spoken in the audio
  3. Gender
  4. Quality of the audio
  5. Natural or synthetic
  6. If synthetic, the tool used
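As a sketch of how one row could look, using pandas; the column names mirror the list above and the values are purely illustrative:

    import pandas as pd

    rows = [{
        "celebrity": "Example Person",    # name of the celebrity
        "language": "Hindi",              # language spoken in the audio
        "gender": "female",
        "audio_quality": "high",
        "label": "synthetic",             # "natural" or "synthetic"
        "generation_tool": "ElevenLabs",  # empty for natural audio
    }]
    pd.DataFrame(rows).to_csv("audio_benchmark.csv", index=False)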

Approach

Let's plan to work on this collaboratively. We can discuss:

  1. which celebrity's data we are working on
  2. which transcription tool we are using
  3. which tool we are using to generate synthetic audio

Having a mix of techniques and transcription tools shouldn't hurt. But it would be nice if we keep sharing our progress here, so we're not re-solving problems that already have a working solution.

Clustering videos using vector-similarity

Related to #81

Description

@dennyabrain I tried clustering around 300 videos (from this dataset) using algorithms from your experiments repo.

Google Colab notebook

I first used your approach of taking 5 frames of a video, extracting their features using the ResNet model and taking their average to generate the final embedding. Then, using your t-SNE reduction approach, I plotted the thumbnails on a graph:

[output: t-SNE plot of video thumbnails]
Observations listed in the notebook
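For reference, a minimal sketch of the frame-averaging step described above, assuming OpenCV for frame grabs; embed_frame is a hypothetical per-frame feature extractor (e.g. a ResNet with its classifier head removed), not the author's actual notebook code:

    import cv2
    import numpy as np

    def video_embedding(path, embed_frame, n_frames=5):
        # Sample n_frames evenly across the video, embed each one, and
        # average the per-frame vectors into a single video embedding.
        cap = cv2.VideoCapture(path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        vectors = []
        for idx in np.linspace(0, total - 1, n_frames, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
            ok, frame = cap.read()
            if ok:
                vectors.append(embed_frame(frame))
        cap.release()
        return np.mean(vectors, axis=0)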

I will also be doing some R&D on other ways to extract features from videos, and on using different models in our current approach (like CLIP, which I have used before).

I will now be working on setting up Feluda and studying how Feluda operators work. Would appreciate some directions...

Video Operator should process video of any length and size

Operators are a core component of Feluda: modules that help us analyse media items such as text, image, video and audio. One key operator is the video operator, which takes a video file as input and generates a vector representation of the file as output.

Link to the Video Vec Operator - https://github.com/tattle-made/feluda/blob/main/src/core/operators/vid_vec_rep_resnet.py

Currently, the video operator has a size limit on the videos it can process. The task is to modify the operator so that it can process large video files. When we say a video of any length and size should be processed, what we mean is that RAM usage should be constant and not increase linearly with video size/length.
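A minimal sketch of the constant-RAM idea, assuming OpenCV: decode frames one at a time from a generator instead of materialising the whole video, so memory stays flat regardless of length:

    import cv2

    def iter_frames(path, every_nth=10):
        # Yields every nth frame; at most one decoded frame is held in
        # memory at a time, independent of video length or file size.
        cap = cv2.VideoCapture(path)
        i = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % every_nth == 0:
                yield frame
            i += 1
        cap.release()

Downstream code can then compute a running average of per-frame embeddings instead of stacking them all in memory.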

Expected Outcome

  • Refactor the video operator to support large files.
  • Profile the modified operator for CPU and RAM usage against a variety of large video files (15 minutes to 1 hour). Some methods of profiling have been documented on the Wiki here.
  • Write a unit test for the handling of large video files.

Files to look at

Tesseract Worker with support for Hindi, English, Tamil, Telugu

Feluda has an operator that extracts text from images (OCR) using Tesseract.

Currently the operator supports extraction of text in English and Hindi.

The task:

  1. Add support for the Tamil and Telugu languages to the operator (see the sketch after this list)
  2. Write a Worker for the Tesseract operator
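A minimal sketch of the language extension, assuming the pytesseract wrapper and that the Tamil ("tam") and Telugu ("tel") traineddata files are installed alongside English and Hindi:

    import pytesseract
    from PIL import Image

    def extract_text(image_path, langs="eng+hin+tam+tel"):
        # Tesseract accepts multiple language codes joined with "+",
        # so a single call can cover all four target languages.
        return pytesseract.image_to_string(Image.open(image_path), lang=langs)

Whether to pass all four languages at once or detect the script first is a speed/accuracy trade-off worth profiling.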

What is a worker?

Workers are used to deploy operators at scale. They bring all the core parts of Feluda - Store, Queue and Operators - together. A worker picks up a media file from the Queue and uses Operators to analyse it. It then stores the analyzed data (the output of the operators, in vector form) in Elasticsearch and sends a report containing the results back to the Queue.

A detailed documentation of the workers can be found here

All the worker files can be found at src/worker/ folder.
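A stripped-down sketch of that loop, assuming pika for RabbitMQ; the queue names and the run_operator/store_result helpers are hypothetical placeholders, not Feluda's actual identifiers:

    import json
    import pika

    def handle(ch, method, properties, body):
        media = json.loads(body)           # payload describing one media file
        result = run_operator(media)       # hypothetical: run operators on the file
        store_result(media["id"], result)  # hypothetical: index vectors into Elasticsearch
        ch.basic_publish(exchange="", routing_key="report-queue",
                         body=json.dumps({"id": media["id"], "status": "indexed"}))
        ch.basic_ack(delivery_tag=method.delivery_tag)

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    channel.queue_declare(queue="media-queue")
    channel.queue_declare(queue="report-queue")
    channel.basic_consume(queue="media-queue", on_message_callback=handle)
    channel.start_consuming()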

Expected Outcome

  • Modify the Tesseract Operator to support Tamil and Telugu.
  • Write a worker for the Tesseract operator
    • Create the necessary Dockerfiles
    • Create the worker and worker payload writer files
  • Document the Worker on the Wiki

This issue can also be split into two issues, depending on how much time needs to be spent on it.

fix: known bugs and issues

  • in the Elasticsearch test - test_es_vec.py - fix es.refresh() and es.delete_indices()
  • in the Dockerfile we are installing using requirements.txt, which assumes someone has already generated it; this requirements.in -> requirements.txt transformation should happen in the Dockerfile too
  • in test_index_api_as_client.py - fix the POST request for image and video when multipart/form-data is sent (do not send the header for this request); fix the fromRequestPayload() function
  • in test_index_api_as_client.py - testIndexVideo() passes the test but does not return an ID
  • in test_search_api_as_client.py - fix testIndexVideo()
  • move all the tests to a single folder

Extraction of unique keyframes from video

Ticket Contents

Related to #81
@dennyabrain
I was exploring ways to extract unique frames from video. I tried extracting unique keyframes from some videos using ffmpeg (to extract keyframes from the video) and k-means (to extract unique keyframes from those extracted by ffmpeg), and here are the results:

(We can select one image from each cluster as the representation of that cluster; further, we can use image captioning models to generate small captions for each image. Next, we can combine all the captions to generate a final caption for the video, or use them to classify the video accurately.)

Google Colab Notebook

Video Link : https://drive.google.com/file/d/1Qr08m4Bf0JjTszExDLoey2LCqcJjJl3n/view?usp=drive_link
Clusters: [cluster images]

Video 2 link : https://drive.google.com/file/d/1QnupjsK7ILQUYrqlPT2pTdTAzoy8Wi-C/view?usp=drive_link
Clusters: [cluster images]

I'll now be working on ways to cluster the images such that the number of clusters is selected automatically. Please give your reviews and directions for future work.

Goals

(We can select one image from each cluster as the representation of that cluster; further, we can use image captioning models to generate small captions for each image. Next, we can combine all the captions to generate a final caption for the video, or use them to classify the video accurately.)

Implementation Details

ffmpeg to extract keyframes from the video, and k-means to extract unique keyframes from those extracted by ffmpeg

Product Name

Feluda

Organisation Name

Tattle

Domain

Learning & Development

Tech Skills Needed

Computer Vision

Category

Machine Learning

Add basic authentication to Feluda's /search endpoint.

All API requests from clients go via the Kosh API server, and this has worked well for us in the past. When it comes to searching for images or videos, however, going to Feluda via the Kosh server adds a latency that makes our search feel very slow. Our image search engine actually returns results in milliseconds, but adding these intermediary nodes between the client and the search server causes a latency of 3-5 seconds. In our current code for search operations, clients talk directly to Feluda and the experience is great, but this can't be deployed as-is because it would be insecure. A sketch of one way to add basic authentication follows.
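A minimal sketch of gating /search behind HTTP Basic Auth, assuming a Flask-style server; the credential check is deliberately naive, and a real deployment should compare against hashed secrets from configuration:

    import os
    from functools import wraps
    from flask import Flask, request, abort

    app = Flask(__name__)

    def require_basic_auth(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            auth = request.authorization  # parsed from the Authorization header
            if (not auth
                    or auth.username != os.environ.get("FELUDA_USER")
                    or auth.password != os.environ.get("FELUDA_PASS")):
                abort(401)
            return f(*args, **kwargs)
        return wrapper

    @app.route("/search", methods=["POST"])
    @require_basic_auth
    def search():
        return {"results": []}  # placeholder for the real search handler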

rename project to search

Going forward we are using the convention of prefixing "Tattle" to the names of all our services and products,
e.g. Tattle Archive, Tattle TikTok Scraper.
This project will be called Tattle Search.

Write Test for Queue [RabbitMQ]

Feluda relies on tests to make sure each component is functioning properly. They help us ensure code reliability and verify expected behaviour.

All the tests can be found at src/tests/ folder. The documentation for running the tests can be found on the Wiki here and here

The task is to write a test for RabbitMQ to test the connection and creation of queues.

Expected Outcome

  • Write a test in the folder src/tests/queue/
  • The test should connect to RabbitMQ and check that all the queues are being created properly (a sketch follows this list).
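A sketch of what such a test could look like, assuming pika; the queue names are placeholders for whatever Feluda's config actually declares:

    import pika

    def test_queues_are_created():
        connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        channel = connection.channel()
        for name in ["index-queue", "report-queue"]:  # placeholder names
            # passive=True asserts the queue already exists without creating
            # it; pika raises ChannelClosedByBroker if the queue is missing.
            channel.queue_declare(queue=name, passive=True)
        connection.close()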

Python profiling for the Tesseract operator

[DMP 2024]: Clustering large amounts of videos

Ticket Contents

Description

Feluda allows researchers, fact-checkers and journalists to explore and analyze large quantities of multimedia content. One important modality on Indian social media is video. The scope of this task is to explore various automated techniques suited for this task and, after consultation with the team, implement an end-to-end workflow that can be used to surface visual or temporal trends in a large collection of videos.

Goals

  • Review literature with our team and do research and prototyping to evaluate state-of-the-art ML and classical DSP techniques
  • Optimize the solution for consistent RAM and CPU usage (limit the spikes caused by variables like file size, video length etc.), since it will need to scale up to millions of videos.
  • Integrate the solution into Feluda by creating an operator that adheres to the Feluda operator interface

Expected Outcome

Feluda's goal is to provide a simple CLI or scriptable interface for analysing multimodal social media data. In that vein, all the work that you do should be executable and configurable via scripts and config files. The solution should look at Feluda's architecture and its various components to identify the best ways to enable this.
The solution should have a way to configure the data source (a database with file IDs or an S3 bucket with files), specify and implement the data processing pipeline, and decide where the result will be stored. Our current implementation uses S3 and a SQL database as data sources and Elasticsearch for storing results, but additional sources or stores can be added if apt for this project.

Acceptance Criteria

  • Regular interactive demos with the team using a public Jupyter notebook pushed to our experiments repository
  • A working Feluda operator with tests that can be run as an independent worker in the cloud to schedule processing jobs over a large dataset
  • Output structured data that can be passed on to a UI service (web or mobile) for downstream use cases

Implementation Details

One way we have approached this is by using vector embeddings, to great success in surfacing visual trends in images. We used a ResNet model to generate vector embeddings and stored them in Elasticsearch. We also used t-SNE to reduce the dimensions of the vector embeddings so we could display them in a 2D visualization (this workflow is sketched earlier on this page, under the audio clustering ticket). It can be viewed here
A detailed report on Feluda's usage in a project to analyze images can be read here
The relevant Feluda operator can be studied here
The code for t-SNE is here
A prior study of various ways to get insights out of images has been documented here

Mockups/Wireframes

This is an interactive visualization of Image clustering done using Feluda.
Doing UI development or integrating with any UI software is not part of this project but it might help to see what sort of downstream applications we use Feluda for.

Product Name

Feluda

Organisation Name

Tattle

Domain

Open Source Library

Tech Skills Needed

Computer Vision, Docker, Machine Learning, Performance Improvement, Python

Mentor(s)

@dennyabrain @duggalsu

Category

Data Science, Machine Learning

upload_image does not handle certain PNGs

When querying the application locally with the following image: https://tattle-services.s3.ap-south-1.amazonaws.com/115c08f0-d90c-11e9-ba13-838b5376152a

using the command:

curl -X POST $LIVE_SERVER/upload_image -H "Content-Type: application/json" -d '{"image_url": "https://tattle-services.s3.ap-south-1.amazonaws.com/115c08f0-d90c-11e9-ba13-838b5376152a"}'

gives the error:

  File "/mnt/d/tattle-api/application.py", line 190, in upload_image                                               
    image_vec = resnet18.extract_feature(image)                                                                    
  File "/mnt/d/tattle-api/analyzer.py", line 67, in extract_feature                                                
    self.to_tensor(self.scaler(img))).unsqueeze(0))                                                                
  File "/home/kjoshi/py36/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 166, in __call__ 
    return F.normalize(tensor, self.mean, self.std, self.inplace)                                                  
  File "/home/kjoshi/py36/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 217, in normalize
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])                                                      
RuntimeError: The size of tensor a (4) must match the size of tensor b (3) at non-singleton dimension 0            

but it works for other PNGs, such as: https://png.pngtree.com/element_our/sm/20180327/sm_5aba147bcacf2.png (The 4-vs-3 tensor size mismatch suggests the failing image has an alpha channel, i.e. RGBA; converting the image to RGB before applying the transforms would likely fix this.)

[DMP 2024]: K-means clustering of chroma vectors generated from audio data in wav format

This is an issue raised in relation to issue #82, with a concrete idea based on the third approach given there. I am Shashank Shekhar Singh, a sophomore at IIT BHU, India, with interests in machine learning model development and deployment.

Approach chosen (among the three mentioned):

  1. "Recreate our code on a jupyter notebook or google collab notebook
    We already have some code that takes audio files and converts them into vectors. We also have code that takes these vectors and clusters them. I would take this approach if you are a software engineer with some ML engineering skills and you know your way around using ML models. Once you get this working on your notebook we can try out different pretrained models to evaluate performance."

Link to my first approach: My Colab Notebook

Background: Reference

Future work planned for this approach:

  • Data pre-processing using various techniques, data augmentation, and compatibility with all audio formats, including WAV, MP3, etc.
  • Support Vector Machines (SVMs): SVMs can be used for genre classification, artist classification, and mood classification. They work by finding a hyperplane that best separates the data points belonging to different classes.
  • Deep Neural Networks (DNNs): DNNs are a class of artificial neural networks that have multiple hidden layers. They can be used for a wide variety of tasks, including music genre classification, and music recommendation. Convolutional Neural Networks (CNNs) are a specific type of DNN that are well-suited for tasks that involve analyzing sequential data, such as audio.
  • Hidden Markov Models (HMMs): HMMs are statistical models that can be used to model sequences of events. They can be used for music genre classification, music segmentation (dividing a song into different sections), and music rhythm analysis.

Please provide me with any feedback related to this development so that I can delve deeper into it. @dennyabrain @duggalsu

Improve Resilience

I have noticed two issues that prevent Feluda from being resilient to errors:

Feluda's API server, indexer and reporter should be able to scale up individually without issues. The most useful one here is the indexer: it can take anything from 1 to 10 seconds to index an image or video, so being able to spin up multiple instances of the indexer can be very useful for processing large batches of index requests quickly.

When the indexer or reporter lose connectivity with RabbitMQ or Elasticsearch, they don't seem to recover well. But since they are still running as processes, Kubernetes does not register this as a crashing or dead container, which means they have to be restarted manually. A sketch of a recoverable consume loop follows.
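A sketch of one way to make the consumers recover, assuming pika: wrap the consume loop in a retry that re-dials RabbitMQ on connection errors instead of leaving a zombie process behind:

    import time
    import pika

    def consume_forever(amqp_url, queue, callback, retry_delay=5):
        while True:
            try:
                connection = pika.BlockingConnection(pika.URLParameters(amqp_url))
                channel = connection.channel()
                channel.basic_consume(queue=queue, on_message_callback=callback)
                channel.start_consuming()
            except pika.exceptions.AMQPConnectionError:
                # Connection dropped: wait, then re-dial instead of hanging.
                time.sleep(retry_delay)

Alternatively (or additionally), exposing a liveness probe that checks the broker connection would let Kubernetes restart the container automatically.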

We can build on the work here to audit Feluda's performance and identify bottlenecks:
tattle-made/services-infrastructure#6

index and retrieve the source of a post

Is there a way to filter the response based on the author of a post?

Let's say we use two services to index media:

  1. a WhatsApp bot that scrapes data from WhatsApp groups
  2. a web scraper that scrapes data from fact-checking websites

On the portal we have a way to differentiate sources: since each user/bot has a user ID, we are able to say (with some granularity) where a post originated from.

So if we index a media item along with its doc_id and user_id, can I query tattle-api like this: send me the doc IDs of all media that is similar to this image AND that was indexed by user_id X?

Is this possible in its current iteration? And if not, can we support it? (A sketch of one possible query follows.)
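This should be expressible as a single Elasticsearch query. A sketch, assuming a dense_vector field named "vector" and a keyword field named "user_id"; both field names are illustrative, not the actual index mapping:

    query_vector = [0.1] * 512  # embedding of the query image (illustrative)

    query = {
        "query": {
            "script_score": {
                # Restrict candidates to one source before scoring by similarity.
                "query": {"term": {"user_id": "X"}},
                "script": {
                    "source": "cosineSimilarity(params.q, 'vector') + 1.0",
                    "params": {"q": query_vector},
                },
            }
        }
    }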

Move Documentation from Gatsby Docs to Wiki

Some of the older documentation of Feluda is present in a Gatsby website in the docs folder.

Link to older documentation - feluda/docs

Documenting the codebase is always an ongoing task. However, for this issue, here are the specific deliverables:

  1. Transfer all documentation from the Gatsby website to the GitHub Wiki.
    • This will include moving all the existing pages from the website to the Wiki. Think of an effective way to structure things on the Wiki.
  2. Create an Architecture page on the Wiki explaining the core parts of Feluda in brief. Add descriptions for Operators, Store, Queue, Server, Config and Worker.
  3. Update the Home page of the Wiki with a Table of Contents linking to the right pages.
  4. Consider shortening the README.md and moving relevant content to the Wiki. Ensure README.md contains links to the Wiki where necessary.
  5. In the Worker documentation, add for each Worker:
    • what the input and output look like
    • their Docker Hub image tags

This issue will be worked on jointly with @aatmanvaidya

Operator to detect language in a Video file

Operators are core components of Feluda: modules that help us analyse media items - text, image, video, audio etc. All the code for operators can be found at src/core/operators

The task is to create an operator that can detect the language in a video media item. This is an open-ended issue; you will have to investigate which methods are most effective.
For now, the operator should be able to identify/detect English, Hindi, Telugu and Tamil.

Expected Outcome

  • Write an operator and profile it for RAM and CPU usage. A key outcome is stabilised RAM usage. Some methods of profiling have been documented on the Wiki here.
  • Write a test for the operator
  • Document the operator on the Wiki

clustering of audio data using feluda's audio-cnn operator

Related to #82

Hi @dennyabrain

Check out this notebook. I tried clustering audio data (Indic languages) using Feluda's audio-cnn operator, and also an implementation of a basic autoencoder architecture for the same task. The dataset consists of audio clippings processed and converted from YouTube videos using these scripts. The scripts take a YouTube video and produce processed .wav files by breaking the entire video into small clips. To represent real-world social media data, I took videos in different Indic languages such as Assamese, Hindi, Telugu and Kannada, and various themes like education, entertainment, motivation, politics, horoscope/devotion and business are represented in the dataset.

dataset: https://drive.google.com/drive/folders/1kzdQrvNs0cG-9wuKv17m41xdXujoaHWh?usp=sharing

thumbnails: https://drive.google.com/drive/folders/1M8HEmZ654kTdmpcY54g27yg-tEkhz9Pr?usp=sharing

The thumbnails folder also contains the trained autoencoder model.

[Image: t-SNE clustering of the audio data using the audio-cnn operator]

[Image: t-SNE clustering of the audio data using the autoencoder]

I would be happy to hear any feedback and suggestions for improvement. Furthermore, I will now be doing some research on different approaches to obtaining transcriptions of the audio, since transcriptions are important for efficient thematic clustering. I have found Indic Subtitler to be a good starting point. I would appreciate any directions.

Use AudioVec Operator for Clustering Audio

  • Set up Feluda
  • Convert video files to audio
  • Try out Feluda AudioVec (or something you like) and use t-SNE (or other approaches) to evaluate clustering visually
    • This can be done in a Jupyter notebook; show the results
  • Look at other pre-trained models that convert audio files to embeddings.
