
multimodal-depression-from-video's Introduction

Adrian Cosma

Fantasies have to be unrealistic because the moment, the second that you get what you seek, you don't, you can't want it anymore. In order to continue to exist, desire must have its objects perpetually absent. It's not the "it" that you want, it's the fantasy of "it". [...] What it means to be fully human is to strive to live by ideas and ideals and not to measure your life by what you've attained in terms of your desires but those small moments of integrity, compassion, rationality, even self-sacrifice. Because in the end, the only way that we can measure the significance of our own lives is by valuing the lives of others. - David Gale, The Life of David Gale

A (non-exhaustive) list of projects I've worked on:

🚶🏻 Gait Analysis 🚶🏻:

😶‍🌫️ Psychology and Mental Health 😶‍🌫️:

📖 Romanian Corpora 📖

🔨 Home-Made Tools 🔨


multimodal-depression-from-video's Issues

Implement positional embeddings

We're processing sequences in order. Use nn.Embedding() to map each position (an integer between 0 and sequence_length - 1) to a vector, and add each position vector to the corresponding latent vector.
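A minimal sketch of the idea, assuming latent vectors of shape (batch, seq_len, embed_size); the class and parameter names are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn

class LearnedPositionalEmbedding(nn.Module):
    def __init__(self, max_len: int, embed_size: int):
        super().__init__()
        # one learned row per position 0..max_len-1
        self.pos_emb = nn.Embedding(max_len, embed_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # positions 0..seq_len-1, broadcast over the batch dimension
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_emb(positions)

x = torch.randn(2, 16, 64)  # (batch, seq_len, embed_size)
out = LearnedPositionalEmbedding(max_len=512, embed_size=64)(x)
print(out.shape)  # torch.Size([2, 16, 64])
```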

For hand gestures, add "token_type_ids"

Each hand should have a different embedding to differentiate between them. The embedding (nn.Embedding(...)) should be added to the projected coordinates, similar to the "token_type_ids" in the huggingface transformers library.
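A sketch of per-hand "token_type" embeddings, assuming the projected hand coordinates have shape (batch, seq_len, 2, embed_size) where dimension 2 indexes [left, right]; names and shapes are assumptions:

```python
import torch
import torch.nn as nn

class HandTypeEmbedding(nn.Module):
    def __init__(self, embed_size: int):
        super().__init__()
        self.type_emb = nn.Embedding(2, embed_size)  # 0 = left hand, 1 = right hand

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        hand_ids = torch.tensor([0, 1], device=x.device)
        # (2, embed_size) broadcasts over the batch and time dimensions
        return x + self.type_emb(hand_ids)

x = torch.randn(2, 16, 2, 64)  # (batch, seq_len, hands, embed_size)
out = HandTypeEmbedding(64)(x)
print(out.shape)  # torch.Size([2, 16, 2, 64])
```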

Add mean predictions to TemporalEvaluator

Evaluating the baseline model with the temporal evaluator currently takes only the final prediction into account.

Include the mean prediction over the whole video in TemporalEvaluator: add a line to the .csv results file with kind = 'mean', and tag the current line with kind = 'last'.
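A sketch of adding a 'mean' row next to the existing 'last' row; the column layout of the real results .csv is assumed, not known:

```python
import csv
import io
import torch

def aggregate_predictions(window_preds: torch.Tensor) -> dict:
    # window_preds: per-window depression scores for one video, in order
    return {"last": window_preds[-1].item(),
            "mean": window_preds.mean().item()}

preds = torch.tensor([0.2, 0.6, 0.7, 0.9])
rows = aggregate_predictions(preds)

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["kind", "prediction"])
writer.writeheader()
for kind, value in rows.items():
    writer.writerow({"kind": kind, "prediction": value})
print(buf.getvalue())
```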

Implement Temporal Evaluator

Take the windows in order and process all of them sequentially (with the Perceiver or not?). Take the majority vote or the last prediction.

We can use this to visualize the depression score over time and make nice plots for the paper.
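The two aggregation rules above can be sketched over per-window binary predictions taken in temporal order (illustrative only):

```python
from collections import Counter

def majority_vote(window_preds):
    # most common label across all windows of a video
    return Counter(window_preds).most_common(1)[0][0]

def last_prediction(window_preds):
    # label of the final window only
    return window_preds[-1]

preds = [0, 1, 1, 0, 1]  # one prediction per window, in order
print(majority_vote(preds))    # 1
print(last_prediction(preds))  # 1
```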

Implement a simple baseline

After each modality is encoded, let's try a simple baseline: a plain transformer encoder that encodes all modalities and does the classification per window.

Evaluation is majority vote.
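The baseline described above might look like the following sketch: already-encoded modality features for one window go through a transformer encoder, are mean-pooled over time, and classified. All dimensions and names are assumptions, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

class SimpleBaseline(nn.Module):
    def __init__(self, embed_size: int = 64, n_heads: int = 4,
                 n_layers: int = 2, n_classes: int = 2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embed_size, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(embed_size, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, embed_size), one window per batch item
        h = self.encoder(x).mean(dim=1)  # mean-pool over time
        return self.classifier(h)        # (batch, n_classes)

logits = SimpleBaseline()(torch.randn(8, 32, 64))
print(logits.shape)  # torch.Size([8, 2])
```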

correct structure to store the videos

Hi @david-gimeno, I have downloaded all the videos from the D-vlog dataset. Should I split them based on the IDs given in the test, train, and validation .csv files? Or is there a separate video_id.csv file, used by python3 ./scripts/feature_extraction/dvlog/extract_wavs.py --csv-path ./data/D-vlog/video_ids.csv --column-video-id video_id --video-dir $VIDEO_DIR --dest-dir $WAV_DIR, which is missing from the repo? Please help me with this.

Implement Perceiver Architecture

  • Implement Time-Unaware Perceiver Architecture
  • Discuss when modality embeddings are needed
  • Discuss details & Implement Time-Aware Perceiver Architecture
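A rough sketch of the Perceiver idea for reference: a small set of learned latents cross-attends to a long (multimodal) input sequence, so compute does not grow quadratically with input length. This is purely illustrative, not the architecture the issue will settle on:

```python
import torch
import torch.nn as nn

class PerceiverBlock(nn.Module):
    def __init__(self, n_latents: int = 32, embed_size: int = 64, n_heads: int = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, embed_size))
        self.cross_attn = nn.MultiheadAttention(embed_size, n_heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(embed_size, n_heads, batch_first=True)

    def forward(self, inputs: torch.Tensor) -> torch.Tensor:
        # inputs: (batch, seq_len, embed_size)
        lat = self.latents.unsqueeze(0).expand(inputs.size(0), -1, -1)
        lat, _ = self.cross_attn(lat, inputs, inputs)  # latents attend to inputs
        lat, _ = self.self_attn(lat, lat, lat)         # latents refine themselves
        return lat                                     # (batch, n_latents, embed_size)

out = PerceiverBlock()(torch.randn(2, 500, 64))
print(out.shape)  # torch.Size([2, 32, 64])
```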

Implement per Modality Embedding

Differentiate between modalities. Use nn.Embedding() to map a modality id to a vector representation, and add that vector to each latent vector of the corresponding modality.
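A sketch of the per-modality embedding; the modality names, ids, and sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

MODALITY_IDS = {"face": 0, "body": 1, "hands": 2, "audio": 3}  # hypothetical set

class ModalityEmbedding(nn.Module):
    def __init__(self, n_modalities: int, embed_size: int):
        super().__init__()
        self.mod_emb = nn.Embedding(n_modalities, embed_size)

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        idx = torch.tensor(MODALITY_IDS[modality], device=x.device)
        # the same modality vector is added to every latent vector in x
        return x + self.mod_emb(idx)

x = torch.randn(2, 16, 64)  # latents of one modality
out = ModalityEmbedding(len(MODALITY_IDS), 64)(x, "audio")
print(out.shape)  # torch.Size([2, 16, 64])
```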

Fix normalization for landmarks

Decide on how to normalize the landmarks.

Use a batch-norm over the last dimension, probably, before applying the projection.
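A sketch of that option, normalizing the landmark features before the linear projection. BatchNorm1d expects the channel dimension second, hence the transposes; the feature size below is just the hands example:

```python
import torch
import torch.nn as nn

feat_dim = 2 * 21 * 5  # e.g. 2 hands x 21 landmarks x 5 coordinates
norm = nn.BatchNorm1d(feat_dim)
proj = nn.Linear(feat_dim, 64)

x = torch.randn(8, 32, feat_dim)            # (batch, frames, features)
x = norm(x.transpose(1, 2)).transpose(1, 2)  # normalize the feature dimension
out = proj(x)                                # project to embed_size
print(out.shape)  # torch.Size([8, 32, 64])
```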

Common Positional Encodings for all modalities

It's kind of weird that we have different positional encodings for each modality; we should probably share the same ones across all of them. But how do we take the different framerates into account?

Maybe some framerate-aware positional encoding? (i.e., don't use learned positional encodings, but "classical" sin/cos P.E.)

Something like this (Fractional positional encoding): https://opus.bibliothek.uni-augsburg.de/opus4/frontdoor/deliver/index/docId/96742/file/96742.pdf

Repository: https://github.com/philm5/fpe-vtt/blob/master/vtt_transformer.py#L85

The implementation is in TensorFlow; we need to convert it to torch (use ChatGPT?).
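A rough torch sketch of the framerate-aware idea only, not a faithful port of the linked TF implementation: positions are expressed in seconds (frame_index / fps), so modalities with different framerates get aligned encodings:

```python
import math
import torch

def fractional_sincos(seq_len: int, fps: float, embed_size: int) -> torch.Tensor:
    # time of each frame in seconds, instead of integer frame indices
    pos = torch.arange(seq_len, dtype=torch.float32) / fps
    i = torch.arange(0, embed_size, 2, dtype=torch.float32)
    div = torch.exp(-math.log(10000.0) * i / embed_size)
    pe = torch.zeros(seq_len, embed_size)
    pe[:, 0::2] = torch.sin(pos[:, None] * div)
    pe[:, 1::2] = torch.cos(pos[:, None] * div)
    return pe

audio_pe = fractional_sincos(seq_len=100, fps=100.0, embed_size=64)
video_pe = fractional_sincos(seq_len=25, fps=25.0, embed_size=64)
# audio frame 4 and video frame 1 both sit at t = 0.04 s,
# so they receive the same encoding despite different framerates
print(audio_pe.shape, video_pe.shape)
```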

Experiments before final experiments

  • Exploring different schedulers & learning rates for the baseline model
  • Check if the perceiver is learning something, check if the same batch size works for it
  • Ablation study on number of windows for perceiver, check if it works
  • Small ablation on presence_threshold, maybe lower is better (more data), or maybe higher is better (more quality)
  • Ablation on the modalities. Which combination works better?

while setting variables

Hello, can you please help me with what values I should set for the variables in each file? For example:
PASE_CONFIG=
PASE_CHCK_PATH=
USE_AUTH_TOKEN=
What should the values be?

I do not know how to remove the issue: Implement Hand Landmarks Projection

When dealing with hand landmarks, we have 21 landmarks with 5 coordinates each, for each of the N frames in each of the W windows:

  • Original Shape: (W, N, 2, 21, 5)
  • Input Transformer Shape: (N, embed_size)

So, we need a linear projection layer to map each modality's dimension to the embed_size of the Transformer. I am a bit confused about how to preserve the distinction between the two hands, because the order in the forward pass would be:

  1. x = linear_proj(x)
  2. x += pos_emb
  3. x += modality_emb
  4. x[:, :, :embed_size//2] += left_hand_type_emb & x[:, :, embed_size//2:] += right_hand_type_emb
  5. x = transformer(x)

The other option is feeding linear_proj a tensor reshaped to (W, N, 2, 21*5), but then we have other problems when reshaping back to the shape it should be.
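The second option could be sketched like this: flatten each hand's landmarks, project per hand, add a per-hand type embedding, then merge the hand axis. The shapes follow the issue; the layer names and the mean-merge are illustrative choices, not a decided design:

```python
import torch
import torch.nn as nn

W, N, embed_size = 4, 16, 64
x = torch.randn(W, N, 2, 21, 5)  # (windows, frames, hands, landmarks, coords)

linear_proj = nn.Linear(21 * 5, embed_size)
hand_type_emb = nn.Embedding(2, embed_size)  # 0 = left, 1 = right

h = linear_proj(x.reshape(W, N, 2, 21 * 5))  # (W, N, 2, embed_size)
h = h + hand_type_emb(torch.tensor([0, 1]))  # distinguish the hands
h = h.mean(dim=2)                            # merge hands -> (W, N, embed_size)
print(h.shape)  # torch.Size([4, 16, 64])
```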

This is for next week. See you tomorrow :)

Probably need another dataset

If we want ECIR, we need to add another dataset with some other particularity.

  • Pre-process DAIC-WOZ dataset + Implement specific dataloader
  • Pre-process Extended DAIC-WOZ dataset + Implement specific dataloader
  • Experiments DAIC-WOZ
  • Experiments E-DAIC-WOZ
