pliang279 / workspace

2.0 1.0 1.0 24.96 MB

License: MIT License

workspace's Introduction

workspace

explorations in multisensory and multiaction global neuronal workspace

workspace's People

Forkers

applexi

workspace's Issues

Description

We want to detect whether speech is present in a video's audio track. We use the Whisper API to implement this.
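A minimal sketch of the speech check, assuming the transcription comes from the open-source `whisper` Python package, whose `transcribe` result is a dict with a `segments` list carrying `start`, `end`, `text` and `no_speech_prob`. A hosted Whisper endpoint may return a different shape. The helper names and the 0.5 threshold are illustrative assumptions, not part of the repository.

```python
from typing import Any, Dict, List, Tuple


def speech_segments(
    result: Dict[str, Any], max_no_speech_prob: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start, end) times of segments that look like real speech.

    `result` is assumed to have the shape returned by the open-source
    whisper package's `model.transcribe(path)`.
    """
    kept = []
    for seg in result.get("segments", []):
        has_text = bool(seg.get("text", "").strip())
        # Threshold is an illustrative assumption; tune it on real data.
        likely_speech = seg.get("no_speech_prob", 0.0) < max_no_speech_prob
        if has_text and likely_speech:
            kept.append((seg["start"], seg["end"]))
    return kept


def has_speech(result: Dict[str, Any], max_no_speech_prob: float = 0.5) -> bool:
    """True if at least one segment passes the speech check."""
    return len(speech_segments(result, max_no_speech_prob)) > 0
```

In use, the audio track would first be transcribed (for example with `whisper.load_model(...)` followed by `model.transcribe(...)`) and the resulting dict passed to `has_speech`.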

Additional Information

No response

Description

Need to add the Video-LLaMA model and prediction scripts for video understanding.

Additional Information

No response

[FEAT]: Support Wolfram Alpha API

Description

We want to give our model the ability to do mathematical calculations. We want to use the Wolfram Alpha API for this.
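One way this could look, assuming the Wolfram Alpha Short Answers endpoint (`/v1/result` with `appid` and `i` query parameters) is the API meant here. The function names are hypothetical and the AppID has to be supplied by the caller. This is a sketch, not the repository's implementation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wolfram Alpha Short Answers API.
WOLFRAM_SHORT_ANSWERS_URL = "https://api.wolframalpha.com/v1/result"


def build_wolfram_url(query: str, app_id: str) -> str:
    """Build the request URL; the query is URL-encoded ('+' becomes '%2B')."""
    return WOLFRAM_SHORT_ANSWERS_URL + "?" + urlencode({"appid": app_id, "i": query})


def ask_wolfram(query: str, app_id: str, timeout: float = 10.0) -> str:
    """Send the query and return the plain-text answer (needs network access)."""
    with urlopen(build_wolfram_url(query, app_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Keeping URL construction separate from the network call makes the encoding easy to check offline.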

Additional Information

No response

[FEAT]: Update the readme with PR requirements and brief introduction

Description

We want to update the README with necessary information.

Additional Information

No response

[FEAT]: Support Google Search API

Description

We want to support the Google Search Python API so that our model can access the internet.
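A rough sketch of turning search results into text the model can read. It assumes a response shaped like the Google Custom Search JSON API, where each entry in `items` has `title`, `link` and `snippet`. The issue does not say which Google client is used, so both that shape and the helper names are assumptions.

```python
from typing import Any, Dict, List


def top_results(response: Dict[str, Any], limit: int = 3) -> List[Dict[str, str]]:
    """Keep only the fields the model needs from the first `limit` results."""
    results = []
    for item in response.get("items", [])[:limit]:
        results.append(
            {
                "title": item.get("title", ""),
                "link": item.get("link", ""),
                "snippet": item.get("snippet", ""),
            }
        )
    return results


def as_context(results: List[Dict[str, str]]) -> str:
    """Format results as numbered lines that can be prepended to a prompt."""
    return "\n".join(
        f"[{i}] {r['title']}: {r['snippet']} ({r['link']})"
        for i, r in enumerate(results, start=1)
    )
```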

Additional Information

No response

Description

Add a .gitignore so that checkpoints and other unimportant files are not uploaded.

Additional Information

No response

[FEAT]: Add video slicing code

Description

We want to add code that slices videos into clips, for Ego4D or other video data.
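A minimal sketch of fixed-length slicing, assuming clips are cut with the `ffmpeg` command-line tool. The function names and the fixed-window scheme are illustrative. The issue does not specify how clip boundaries are chosen.

```python
from typing import List, Tuple


def clip_windows(duration: float, clip_len: float) -> List[Tuple[float, float]]:
    """Split [0, duration] into consecutive windows of at most clip_len seconds."""
    if clip_len <= 0:
        raise ValueError("clip_len must be positive")
    windows = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * clip_len < duration:
        start = i * clip_len
        windows.append((start, min(start + clip_len, duration)))
        i += 1
    return windows


def ffmpeg_clip_command(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that copies one window to dst."""
    return [
        "ffmpeg",
        "-ss", str(start),       # seek to the window start
        "-i", src,
        "-t", str(end - start),  # clip duration in seconds
        "-c", "copy",            # stream copy: fast, but cuts land on keyframes
        dst,
    ]
```

Each command list can be passed to `subprocess.run`. Because stream copy cuts on keyframes, re-encoding instead of `-c copy` would be the option to consider if exact boundaries matter.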

Additional Information

No response

[FEAT]: Flask wrapper for API calls

Description

Wrap the API calls for Google Search and Mathematica.

Additional Information

No response

[FEAT]: Have access to data examples

Description

We want to create a demo based on the Ego4D dataset. We plan to cherry-pick several video segments as examples for the demo.

Additional Information

No response

[FEAT]: Support Face Detection

Description

We want to detect faces that appear in the frames of our video data, using either an API or a locally run model. On top of the detected faces, we want to build emotion classification and gender classification.

Additional Information

No response

[FEAT]: Support a multimodal model that generates synergistic information that only exists when modalities are considered together

Description

One possible source of synergistic information is disagreement between modalities, or visual-QA-style tasks.

Additional Information

No response

[FEAT]: Support Audio Emotion Detection

Description

We want to classify the speaker's emotion from their audio. We use the HuBERT model as the encoder.

Additional Information

No response

[FEAT]: Wrap Video-LLaMA as an API with Flask

Description

Need to expose each processor model as an API for later inference.

Additional Information

Ideally, we should write a generic Flask script that works for each model; each model should wrap its prediction function so that the Flask script can call it.
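The model-agnostic part of that idea can be sketched without Flask: a registry that each model adds its prediction function to, plus a single handler that a generic route could call. All names here (`register`, `handle_request`, the `echo` model) are hypothetical placeholders, not code from the repository.

```python
from typing import Any, Callable, Dict, Tuple

PredictFn = Callable[[Dict[str, Any]], Any]

# Maps a model name to its wrapped prediction function.
_REGISTRY: Dict[str, PredictFn] = {}


def register(name: str) -> Callable[[PredictFn], PredictFn]:
    """Decorator each model uses to expose its prediction function."""
    def decorator(fn: PredictFn) -> PredictFn:
        _REGISTRY[name] = fn
        return fn
    return decorator


def handle_request(name: str, payload: Dict[str, Any]) -> Tuple[Dict[str, Any], int]:
    """Dispatch a JSON-like payload to a model; return (body, HTTP status)."""
    fn = _REGISTRY.get(name)
    if fn is None:
        return {"error": f"unknown model: {name}"}, 404
    try:
        return {"model": name, "output": fn(payload)}, 200
    except Exception as exc:  # report model failures instead of crashing
        return {"error": str(exc)}, 500


@register("echo")
def echo_predict(payload: Dict[str, Any]) -> str:
    """Stand-in model used only to show the wiring."""
    return payload.get("text", "").upper()
```

A single Flask route such as `/predict/<name>` could then pass `request.get_json()` to `handle_request` and return the body through `jsonify` with the given status code, so adding a model only requires a new `@register(...)` function.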
