gerardplanella / multilingual_stereotypes

Study on stereotype transfer across multilingual language models for English, Spanish, French, Greek and Croatian. The emotion profiles for different social groups are obtained from pre-trained XLM-RoBERTa and fine-tuned versions of it.


Equal Contributors : ADAM VALIN, VELJKO KOVAC, GERARD PLANELLA

STEREOTYPES IN MULTILINGUAL LANGUAGE MODELS

Description

This repository contains all the code used for the ATCS project "Stereotypes in Multilingual Language Models". In this project, we study stereotypes that emerge within pre-trained Multilingual Language Models (MLMs). These models are typically trained on large-scale multilingual text corpora, learning to represent the different languages in a shared embedding space. This shared representation allows the model to generalize knowledge learned in one language to other languages, a phenomenon known as cross-lingual transfer (Conneau et al., 2019). Our objective is to build upon the work of Choenni et al., 2021 by first contrasting emotion profiles of identical social groups across diverse languages within these MLMs.

Installing the Project Environment

This project uses an Anaconda environment for managing dependencies. Follow the instructions below to set up the environment:

First, ensure that you have Anaconda or Miniconda installed on your system. If not, you can download Anaconda from https://www.anaconda.com/products/distribution.

Clone the repository and navigate to the project directory:

```bash
git clone https://github.com/username/project.git
cd project
```

The environment.yaml file in the project root contains the specifications for the project's conda environment. Create the environment using the following command:

```bash
conda env create -f environment.yaml
```

Once the environment is created, you can activate it using:

```bash
conda activate env_name
```

Replace env_name with the name of the environment specified in the environment.yaml file.
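
If you are unsure which name to use, it can be read from the `name:` key of the environment file. The helper below is a minimal sketch, not part of the repository: `parse_env_name` is a hypothetical function, and the sample text (including the name `demo_env`) is made up for illustration. It assumes the name is given on an unindented `name:` line, as is usual for conda environment files.

```python
def parse_env_name(text):
    """Return the value of the top-level `name:` key, or None if absent.

    `text` is the content of a conda environment file. Only unindented
    lines are considered, so nested keys called `name` are ignored.
    """
    for line in text.splitlines():
        if line.startswith("name:"):
            value = line.split(":", 1)[1]
            # Drop a trailing comment, then surrounding whitespace and quotes.
            value = value.split("#", 1)[0].strip().strip("'\"")
            return value or None
    return None


# Made-up sample content; the real file lives in the project root.
sample = "name: demo_env  # illustrative only\nchannels:\n  - defaults\n"
print(parse_env_name(sample))
```

To apply it to the real file, pass `pathlib.Path("environment.yaml").read_text()` from the project root.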

To check that the environment was installed correctly, you can list the environments:

```bash
conda env list
```

You should see your new environment in the list.

Now you are ready to start working on the project!

Remember to replace username, project, and env_name with the appropriate values for your project.

Generating the priors

The priors are already generated and pushed to the repository, so this step can be skipped. If needed, you can regenerate them by running create_priors.py with the appropriate arguments.

Creating the emotion profiles

The first step is to generate emotion profiles for the given social groups and languages. Running run_test_normalization.py generates an emotion profile for each social group and each language for a given model. Use the script's arguments to change the model or top_k, the number of predictions taken into account per social group per prompt when building the emotion profiles.
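
To make the role of top_k concrete, here is a minimal sketch of how ranked predictions for one prompt could be turned into an emotion profile. It is an illustration under stated assumptions, not the code in run_test_normalization.py: the function `emotion_profile`, the toy lexicon and the example predictions are all hypothetical, and the sketch ignores the priors entirely.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to emotions. A real run would
# need a full emotion lexicon, which this sketch does not provide.
TOY_LEXICON = {
    "angry": {"anger"},
    "friendly": {"joy", "trust"},
    "dangerous": {"fear"},
    "kind": {"joy", "trust"},
}


def emotion_profile(predictions, lexicon, top_k):
    """Turn ranked predictions into a normalised emotion distribution.

    predictions: words ordered from most to least likely for one prompt.
    Only the first top_k words contribute to the profile.
    """
    counts = Counter()
    for word in predictions[:top_k]:
        for emotion in lexicon.get(word.lower(), ()):
            counts[emotion] += 1
    total = sum(counts.values())
    if total == 0:
        return {}  # none of the top_k words carry an emotion label
    return {emotion: n / total for emotion, n in counts.items()}


# Made-up predictions for a single prompt about one social group.
preds = ["friendly", "angry", "tall", "kind", "dangerous"]
profile = emotion_profile(preds, TOY_LEXICON, top_k=3)
print(profile)
```

With a larger top_k, lower-ranked predictions such as "dangerous" in this toy example start to influence the profile, which is why the value is worth reporting alongside any results.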

You can generate the emotion profiles for a specific fine-tuned model by setting the --finetuned_model flag to 'french', 'english', 'greek', 'spanish' or 'base'.
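
To generate profiles for every model in one go, the invocations can be scripted. The sketch below only builds the command lines. The script name and the --finetuned_model flag come from this README, while `build_commands` is a hypothetical helper and any other arguments the script may need are not covered here.

```python
import sys

# Model names listed in this README for the --finetuned_model flag.
MODELS = ["base", "english", "french", "greek", "spanish"]


def build_commands(script="run_test_normalization.py", models=MODELS):
    """Build one argv list per model, suitable for subprocess.run."""
    return [[sys.executable, script, "--finetuned_model", m] for m in models]


# Print the commands instead of running them, so they can be checked first.
for cmd in build_commands():
    print(" ".join(cmd))
```

From the repository root, each list can then be passed to `subprocess.run(cmd, check=True)` to run the models one after another.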

Correlations

Ideally, you want to compare the baseline emotion profiles with the fine-tuned emotion profiles to see how the model's outputs changed with fine-tuning. To do so, first generate emotion profiles with the pre-trained XLM-R model and with a fine-tuned model. Then run run_correlations.py with the corresponding fine-tuned model.
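
As an illustration of what such a comparison measures, the sketch below computes a Pearson correlation between two emotion profiles. This README does not state which coefficient run_correlations.py uses, so Pearson is an assumption. The function names and both profiles are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def profile_correlation(base, tuned):
    """Correlate two emotion profiles given as {emotion: score} dicts.

    Emotions missing from one profile are treated as having score 0.
    """
    emotions = sorted(set(base) | set(tuned))
    xs = [base.get(e, 0.0) for e in emotions]
    ys = [tuned.get(e, 0.0) for e in emotions]
    return pearson(xs, ys)


# Made-up profiles for one social group before and after fine-tuning.
base = {"anger": 0.2, "fear": 0.3, "joy": 0.5}
tuned = {"anger": 0.4, "fear": 0.3, "joy": 0.3}
print(profile_correlation(base, tuned))
```

A value close to 1 would indicate that fine-tuning left the profile largely unchanged, while a low or negative value would point to a shift in the emotions associated with the group.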

Fine-tuning of the models

Fine-tuning is done with the finetuning/train_*.py scripts, which can be run directly. You can also change the flags for parameters such as batch_size, the number of epochs and output_directory.

Fine-tuned Models

The weights of the already trained models that are discussed in our paper can be found here: https://drive.google.com/drive/folders/1VdK-m7a8uM2A6pQref9VNrMq_lryzCEV?usp=sharing

Emotion Profile Analysis

To perform an analysis of the shifts in emotion profiles before and after fine-tuning similar to the one in our paper, follow the notebook "Emotion Profile Analysis.ipynb".
