
bias-in-llms

Occupational Bias in Open-Source Pretrained Large Language Models: Analyzing Polarity towards Creative and Technical Professions

Student Researcher: Minh Phuc (Phineas) Pham
Supervisor: Dr. Sarah Supp



Author

Minh Phuc (Phineas) Pham
[email protected]
Senior at Denison University majoring in Computer Science and Data Analytics.

Project Overview

Background:

As Large Language Models (LLMs) transform the tech industry, their integration into numerous applications raises concerns about potential biases. While these powerful models enable rapid prototyping and ideation, their training process, which often relies on internet data, can lead to unequal representation and biased language understanding. This research investigates the occupational bias present in some of the most widely used open-source LLMs. By analyzing their outputs, I found that all of the selected models generate more positive language for technical jobs than for creative professions, and that larger models tend to display greater occupational bias. Although the study covers only a limited number of LLMs, which constrains how far its conclusions generalize, it serves as a starting point for further research into evaluating and mitigating bias in language models. Identifying the root causes of bias is crucial for developing training methods that reduce bias in LLMs and ensure their outputs align with social values and promote inclusivity. As generative AI continues to shape the tech landscape, addressing bias in LLMs is paramount to harnessing their full potential while upholding ethical standards and promoting fair representation across all occupations and domains.

Research Design:

Because pretrained language models generate text probabilistically, researchers have developed several approaches for assessing bias by measuring how a model performs on a purpose-built dataset for text generation. The most common framework in prior work evaluates a model's bias based on the text it is likely to generate under a defined context. The dataset is constructed according to the type of bias under study (e.g., gender, race, or religious bias), the language model architecture (autoregressive or masked language models), and the assessment methodology (Gallegos et al., 2023). In this research, I build an evaluation pipeline (Figure 1) around the BOLD (Bias in Open-ended Language Generation) dataset, created by researchers at Amazon to systematically study and benchmark social biases in open-ended language generation (Dhamala et al., 2021). The dataset contains prompts covering many contexts and works with any model capable of open-ended text generation; I evaluate occupational bias based on the text each model generates when continuing these prompts.

Research Design

Figure 1: Measuring language polarity with Evaluate. This is the overall pipeline of the evaluation process. The first stage, "Dataset", covers filtering and splitting the BOLD dataset into two sets of prompts: creative and technical occupations. Each set of prompts is fed to each LLM to collect its text generations. The generations from each model are then scored with the Evaluate library from Hugging Face (Hugging Face, 2024) to produce polarity scores such as positive, negative, neutral, and other.
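
As a rough, hedged illustration of the scoring stage, the snippet below applies the regard measurement from Hugging Face's evaluate library to a few placeholder generations. The generation strings are hypothetical, and the exact output keys may differ slightly between evaluate versions.

# Minimal sketch of the scoring stage, assuming the evaluate package from
# requirements.txt is installed; output keys may vary between versions.
import evaluate

regard = evaluate.load("regard", module_type="measurement")

# Hypothetical generations produced by one model for each prompt group.
creative_generations = [
    "An artist is often underpaid and overlooked.",
    "A dancer inspires audiences around the world.",
]
technical_generations = [
    "A software engineer solves hard problems every day.",
    "A data scientist is highly valued by employers.",
]

# Average the per-label scores (positive, negative, neutral, other) per group.
creative_scores = regard.compute(data=creative_generations, aggregation="average")
technical_scores = regard.compute(data=technical_generations, aggregation="average")
print(creative_scores)
print(technical_scores)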

Quickstart

This repository contains the codebase to reproduce the results in the paper.

Prerequisites

Before you begin, make sure you have the following installed:

  • Git
  • Conda (or any other Python environment manager)

Step 1: Clone the Repository

Open your terminal or command prompt and navigate to the directory where you want to clone the repository. Then, run the following command:

git clone https://github.com/Ph1n-Pham/bias-in-llms.git

Step 2: Create a Conda Environment

Next, create a new Conda environment and install the required dependencies. Navigate to the cloned repository and run the following commands:

conda create -n myenv python=3.10
conda activate myenv
pip install -r requirements.txt

This will create a new Conda environment named myenv with Python 3.10 and install the required packages listed in the requirements.txt file.

Step 3: Run the Sample Script

Once the dependencies are installed, you can run the sample script prompt.py to generate text based on a given prompt and reproduce regard results for the predefined models. Navigate to the repository's root directory and run the following command:

python prompt.py --model_path openlm-research/open_llama_3b_v2 --tokenizer_path openlm-research/open_llama_3b_v2 

This command uses prompt.py to load the openlm-research/open_llama_3b_v2 model and its tokenizer from the Hugging Face Hub, prompt the model, and reproduce the regard results for that model. Example commands for the other models used in this project can be found in job.sh.
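
As a rough sketch of what the generation step inside prompt.py amounts to (a simplified illustration with the transformers library, not the script's exact code; the prompt string and sampling settings are placeholders):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "openlm-research/open_llama_3b_v2"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

prompt = "A carpenter is known for"  # placeholder BOLD-style prompt
inputs = tokenizer(prompt, return_tensors="pt")

# Generate a short continuation; the sampling settings are illustrative only.
output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))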

Contributing:

If you'd like to contribute to this project, please follow the standard GitHub workflow:

  • Fork the repository
  • Create a new branch (git checkout -b feature/your-feature)
  • Commit your changes (git commit -am 'Add some feature')
  • Push to the branch (git push origin feature/your-feature)
  • Create a new Pull Request

Data

Source Data and Acquisition

Our source data comes from the paper "BOLD: Dataset and metrics for measuring biases in open-ended language generation" (Dhamala et al., 2021). We acquired the data through the Hugging Face API.

Data Preprocessing

To measure occupational language polarity, I split the profession prompts from this source into two groups: creative and technical occupations. More detail on how these prompts are grouped is available in the paper and in BOLD-dataset/profession_prompts.
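
The split prompt files ship with this repository, so they can be read directly. A minimal sketch, assuming one prompt per line in each file:

from pathlib import Path

prompt_dir = Path("BOLD-dataset/profession_prompts")

# Read the pre-split prompt sets (assumed to be one prompt per line).
creative_prompts = (prompt_dir / "creative_occ_prompts.txt").read_text().splitlines()
technical_prompts = (prompt_dir / "technical_occ_prompts.txt").read_text().splitlines()

print(len(creative_prompts), "creative prompts")
print(len(technical_prompts), "technical prompts")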

Code structure

The codebase of this project is structured as follows:

├── BOLD-dataset/
│   ├── profession_prompts
│   │   ├── creative_occ_prompts.txt
│   │   ├── technical_occ_prompts.txt
│   ├── prompts
│   │   ├── gender_prompt.json
│   │   ├── political_ideology_prompt.json
│   │   ├── profession_prompt.json
│   │   ├── race_prompt.json
│   │   ├── religious_ideology_prompt.json
│   ├── wikipedia
│   │   ├── gender_wiki.json
│   │   ├── political_ideology_wiki.json
│   │   ├── profession_wiki.json
│   │   ├── race_wiki.json
│   │   ├── religious_ideology_wiki.json
│   ├── CODE_OF_CONDUCT.md
│   ├── CONTRIBUTING.md
│   ├── LICENSE.md
│   ├── README.md
├── regard_result/
│   ├── allenai_OLMo-1B_bias.txt
│   ├── allenai_OLMo-7B-Twin-2T_bias.txt
│   ├── allenai_OLMo-7B_bias.txt
│   ├── lmsys_vicuna-13b-v1.5_bias.txt
│   ├── lmsys_vicuna-7b-v1.5_bias.txt
│   ├── openlm-research_open_llama_13b_bias.txt
│   ├── openlm-research_open_llama_3b_v2_bias.txt
│   ├── openlm-research_open_llama_7b_v2_bias.txt
│   ├── tiiuae_falcon-7b_bias.txt
├── .gitignore
├── LICENSE
├── README.md
├── job.sh
├── official_paper.pdf
├── prompt.py
└── requirements.txt

BOLD-dataset Directory

  • profession_prompts

    • creative_occ_prompts.txt: Contains text prompts related to creative occupations.
    • technical_occ_prompts.txt: Contains text prompts related to technical occupations.
  • prompts

    • gender_prompt.json: JSON file containing prompts related to gender.
    • political_ideology_prompt.json: JSON file containing prompts related to political ideology.
    • profession_prompt.json: JSON file containing prompts related to professions.
    • race_prompt.json: JSON file containing prompts related to race.
    • religious_ideology_prompt.json: JSON file containing prompts related to religious ideology.
  • wikipedia

    • gender_wiki.json: JSON file containing Wikipedia data related to gender.
    • political_ideology_wiki.json: JSON file containing Wikipedia data related to political ideology.
    • profession_wiki.json: JSON file containing Wikipedia data related to professions.
    • race_wiki.json: JSON file containing Wikipedia data related to race.
    • religious_ideology_wiki.json: JSON file containing Wikipedia data related to religious ideology.
  • CODE_OF_CONDUCT.md: Markdown file outlining the code of conduct for contributors.

  • CONTRIBUTING.md: Markdown file providing guidelines for contributing to the project.

  • LICENSE.md: Markdown file containing the licensing information for the project.

  • README.md: Markdown file with an overview and general information about the dataset.

regard_result Directory

  • allenai_OLMo-1B_bias.txt: Text file containing bias analysis results for the AllenAI OLMo-1B model.
  • allenai_OLMo-7B-Twin-2T_bias.txt: Text file containing bias analysis results for the AllenAI OLMo-7B-Twin-2T model.
  • allenai_OLMo-7B_bias.txt: Text file containing bias analysis results for the AllenAI OLMo-7B model.
  • lmsys_vicuna-13b-v1.5_bias.txt: Text file containing bias analysis results for the LMSys Vicuna-13B-v1.5 model.
  • lmsys_vicuna-7b-v1.5_bias.txt: Text file containing bias analysis results for the LMSys Vicuna-7B-v1.5 model.
  • openlm-research_open_llama_13b_bias.txt: Text file containing bias analysis results for the OpenLM-Research Open Llama 13B model.
  • openlm-research_open_llama_3b_v2_bias.txt: Text file containing bias analysis results for the OpenLM-Research Open Llama 3B v2 model.
  • openlm-research_open_llama_7b_v2_bias.txt: Text file containing bias analysis results for the OpenLM-Research Open Llama 7B v2 model.
  • tiiuae_falcon-7b_bias.txt: Text file containing bias analysis results for the TIIUAE Falcon-7B model.

Root Directory

  • .gitignore: Git ignore file specifying files and directories ignored by Git.
  • LICENSE: License file for the project.
  • README.md: Main readme file with an overview and instructions for the project.
  • job.sh: Command templates used to run the prompt.py script for the different models.
  • official_paper.pdf: PDF version of the original research paper.
  • prompt.py: Python script for prompting LLMs.
  • requirements.txt: Python dependencies required to run the project.

Result

Language Polarity towards Technical Occupations

Occupational Bias Result

Overall, every selected model shows a difference in positive scores (creative minus technical) below 0, indicating that all models generate more positive text for technical-job prompts than for creative-job prompts. The differences in negative scores are small (less than 0.05), but most of them are above 0, meaning these models also generate more negative text for creative jobs than for technical ones.
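
As a hedged illustration of how these differences are read off the averaged scores (the numbers below are placeholders, not results from the paper):

# Hypothetical averaged regard scores for one model (placeholder numbers).
creative_scores = {"positive": 0.42, "negative": 0.05, "neutral": 0.37, "other": 0.16}
technical_scores = {"positive": 0.51, "negative": 0.03, "neutral": 0.32, "other": 0.14}

# Difference of positive scores (creative minus technical): a value below 0
# means the model writes more positively about technical jobs.
positive_diff = creative_scores["positive"] - technical_scores["positive"]

# Difference of negative scores: a value above 0 means the model writes more
# negatively about creative jobs.
negative_diff = creative_scores["negative"] - technical_scores["negative"]

print(f"positive difference: {positive_diff:+.3f}")
print(f"negative difference: {negative_diff:+.3f}")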

For a more detailed explanation, please refer to the official paper.

Future work

This research serves as a starting point for further investigations into the root causes of bias in language models and the development of strategies to build more equitable and socially responsible AI systems. By fostering interdisciplinary collaborations between computer scientists, social scientists, and domain experts, we can work towards creating language models that truly reflect the diversity and richness of human experiences, free from the constraints of historical biases and prejudices. Ultimately, the goal should be to harness the immense potential of LLMs while ensuring that their outputs align with societal values of fairness, inclusivity, and respect for all individuals and communities, regardless of their chosen profession or creative pursuits.

Acknowledgments and references:

I would like to express my deepest appreciation to Dr. Sarah Supp and Dr. Matthew Lavin from the Denison University Data Analytics Program for their supervision and feedback throughout the project. Additionally, this endeavor would not have been possible without the computing resources from the Ohio Supercomputer Center and the Denison Computer Science Department.

I am also grateful to my friends Hung Tran and Linda Contreras Garcia for their writing help, late-night study sessions, and emotional support. Their support, in many ways, kept the research moving forward throughout the semester.

Lastly, words cannot express my gratitude to my family members, especially my mom. Their belief in me kept me motivated during downtimes throughout the project.

References:

A Brief History of Large Language Models (LLM) | LinkedIn. (n.d.). Retrieved February 8, 2024, from https://www.linkedin.com/pulse/brief-history-large-language-models-llm-feiyu-chen/

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. Advances in neural information processing systems, 30.

OpenAI. (2022, November 30). ChatGPT: Optimizing Language Models for Dialogue. https://openai.com/blog/chatgpt/

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.

Khashabi, D., Khot, T., Sabharwal, A., Clark, P., Etzioni, O., & Roth, D. (2020). Unifiedqa: Crossing format boundaries with a single qa system. arXiv preprint arXiv:2005.00700.

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big?. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.

Sheng, E., Chang, K. W., Natarajan, P., & Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. arXiv preprint arXiv:1909.01326.

Liang, P. P., Wu, C., Baral, C., & Tian, Y. (2022). Towards understanding and mitigating social biases in language models. arXiv preprint arXiv:2202.08918.

Dinan, E., Roller, S., Shuster, K., Fan, A., Boureau, Y. L., & Weston, J. (2019). Wizard of wikipedia: Knowledge-powered conversational agents. arXiv preprint arXiv:1811.01241.

Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological review, 65(6), 386.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683.

Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150.

Free Software Foundation. (2007). GNU General Public License. https://www.gnu.org/licenses/gpl-3.0.en.html

Massachusetts Institute of Technology. (n.d.). The MIT License. https://opensource.org/licenses/MIT

Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (technology) is power: A critical survey of "bias" in NLP. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (pp. 5454-5476).

Bolukbasi, T., Chang, K. W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in neural information processing systems, 29.

OpenLM-Research. (2023). Open_llama [GitHub repository]. https://github.com/openlm-research/open_llama

Groeneveld, D., Beltagy, I., Walsh, P., Bhagia, A., Kinney, R., Tafjord, O., ... & Hajishirzi, H. (2024). OLMo: Accelerating the science of language models. arXiv preprint arXiv:2402.00838.

Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., ... & Xing, E. P. (2023). Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality. https://lmsys.org/blog/2023-03-30-vicuna/

Almazrouei, E., Alobeidli, H., Alshamsi, A., Cappelli, A., Cojocaru, R., Debbah, M., ... & Penedo, G. (2023). The Falcon series of open language models. arXiv preprint arXiv:2311.16867.

Tan, S., Tunuguntla, D., & van der Wal, O. (2022). You reap what you sow: On the Challenges of Bias Evaluation Under Multilingual Settings. https://openreview.net/forum?id=rK-7NhfSIW5

Tatman, R. (2017). Gender and Dialect Bias in YouTube's Automatic Captions. Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, 53-59. https://doi.org/10.18653/v1/W17-1606

Tripodi, F. (2023). Ms. Categorized: Gender, notability, and inequality on Wikipedia. New Media & Society, 25(7), 1687-1707.

Turpin, M., Michael, J., Perez, E., & Bowman, S.R. (2023). Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting. arXiv:2305.04388

U.S. Bureau of Labor Statistics. (2022). Employed persons by detailed occupation, sex, race, and Hispanic or Latino ethnicity. https://www.bls.gov/cps/cpsaat11.htm

Vanmassenhove, E., Hardmeier, C., & Way, A. (2018). Getting Gender Right in Neural Machine Translation. Proceedings of EMNLP 2018, 3003-3008. https://doi.org/10.18653/v1/D18-1334

Venkit, P.N., Gautam, S., Panchanadikar, R., Huang, T.H., & Wilson, S. (2023). Nationality Bias in Text Generation. arXiv:2302.02463

Venkit, P.N., Srinath, M., & Wilson, S. (2022). A Study of Implicit Bias in Pretrained Language Models against People with Disabilities. Proceedings of COLING 2022, 1324-1332.

Dhamala, J., Sun, T., Kumar, V., Krishna, S., Pruksachatkun, Y., Chang, K.-W., & Gupta, R. (2021). BOLD: Dataset and metrics for measuring biases in open-ended language generation. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 862-872. https://doi.org/10.1145/3442188.3445924

Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183-186.

Sap, M., Card, D., Gabriel, S., Choi, Y., & Smith, N. A. (2019). The risk of racial bias in hate speech detection. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1668-1678).

Sheng, E., Chang, K. W., Natarajan, P., & Peng, N. (2021). The Societal Biases in Language Datasets and their Impact on Model Prediction. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (pp. 1711-1724).

Mukaka, M. M. (2012). A guide to appropriate use of correlation coefficient in medical research. Malawi Medical Journal, 24(3), 69-71.

Sedgwick, P. (2012). Pearson's correlation coefficient. BMJ, 345, e4483. https://doi.org/10.1136/bmj.e4483

Taylor, R. (1990). Interpretation of the correlation coefficient: A basic review. Journal of Diagnostic Medical Sonography, 6(1), 35-39. https://doi.org/10.1177/875647939000600106

Wasserstein, R. L., & Lazar, N. A. (2016). The ASA statement on p-values: Context, process, and purpose. The American Statistician, 70(2), 129-133. https://doi.org/10.1080/00031305.2016.1154108

License

This project is licensed under the MIT License.
