Apply the emerging field of prompt engineering to identify and measure social bias in language models.

We created custom prompts for detecting bias in BERT, ALBERT, and RoBERTa. The dataset follows the same format as the CrowS-Pairs dataset (https://github.com/nyu-mll/crows-pairs/blob/master/data/crows_pairs_anonymized.csv).
Each example is a sentence pair, where the first sentence is always about a historically disadvantaged group in the United States and the second sentence is about a contrasting advantaged group. The first sentence can demonstrate or violate a stereotype. The other sentence is a minimal edit of the first sentence: The only words that change between them are those that identify the group. Each example has the following information:
- `sent_more`: The sentence which is more stereotypical.
- `sent_less`: The sentence which is less stereotypical.
- `stereo_antistereo`: The stereotypical direction of the pair. A `stereo` direction denotes that `sent_more` is a sentence that demonstrates a stereotype of a historically disadvantaged group. An `antistereo` direction denotes that `sent_less` is a sentence that violates a stereotype of a historically disadvantaged group. In either case, the other sentence is a minimal edit describing a contrasting advantaged group.
- `bias_type`: The type of bias present in the example.
- `annotations`: The bias-type annotations from crowdworkers.
- `anon_writer`: The anonymized id of the writer.
- `anon_annotators`: The anonymized ids of the annotators.
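
Because the dataset shares the CrowS-Pairs CSV schema, it can be loaded with pandas using the column names listed above. A minimal sketch (the filename `custom_prompts.csv` is a hypothetical placeholder for our dataset file):

```python
import pandas as pd

# Hypothetical filename; the file follows the CrowS-Pairs CSV schema.
df = pd.read_csv("custom_prompts.csv")

# Each row is a minimally different sentence pair plus metadata.
for _, row in df.head(3).iterrows():
    print(row["stereo_antistereo"], "|", row["bias_type"])
    print("  more:", row["sent_more"])
    print("  less:", row["sent_less"])
```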
For the evaluation metric we use pseudo-log-likelihood MLM scoring. Original source code: https://github.com/nyu-mll/crows-pairs/blob/master/metric.py
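
To illustrate the scoring idea, below is a minimal sketch of pseudo-log-likelihood scoring with Hugging Face `transformers`: each token is masked in turn and the log-probability the masked LM assigns to the original token is summed. Note this is a simplification; the CrowS-Pairs `metric.py` additionally restricts masking to the tokens shared by both sentences of a pair, so treat this as an illustration rather than the full metric. Any `AutoModelForMaskedLM` checkpoint (BERT, ALBERT, RoBERTa) can be substituted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

def pseudo_log_likelihood(sentence, model, tokenizer):
    """Mask one token at a time and sum the log-probability
    the MLM assigns to each original token."""
    model.eval()
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Skip the special tokens at the ends ([CLS] and [SEP]).
    for i in range(1, len(ids) - 1):
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        total += log_probs[ids[i]].item()
    return total

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
print(pseudo_log_likelihood("The doctor finished her shift.", model, tokenizer))
```

A sentence pair is then compared by scoring `sent_more` and `sent_less` and checking which one the model assigns the higher pseudo-log-likelihood.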
Next steps:
- Expand the custom dataset to 100 samples.
- Re-evaluate the MLM scoring metric on all of them.
- Extend the metric to autoregressive models (GPT-2); this will require modifying the original code (see the sketch below).
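
One way the GPT-2 extension could look: autoregressive models assign an exact log-likelihood to a sentence token by token, so the masking loop of the MLM metric is not needed. A minimal sketch, assuming the standard Hugging Face GPT-2 classes (not the final modified metric):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def autoregressive_log_likelihood(sentence, model, tokenizer):
    """Sum of log-probabilities GPT-2 assigns to each token,
    conditioned on its prefix (exact, no masking required)."""
    model.eval()
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    # Logits at position t predict the token at position t + 1.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    target = ids[:, 1:]
    token_ll = log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return token_ll.sum().item()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
print(autoregressive_log_likelihood("The doctor finished her shift.", model, tokenizer))
```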