
dgen's Introduction

Hi there 👋

😉 I am Siyu Ren.

🎓 I received my Bachelor's degree from Tongji University and my Ph.D. from Shanghai Jiao Tong University.

🔎 Currently, my research interests include efficient methods for NLP/large language models and techniques for the mechanistic understanding of LLMs.

📚 For my academic publications, please refer to https://drsy.github.io/.


dgen's People

Contributors

drsy


dgen's Issues

What dataset did you use for evaluation in the paper?

Hi! @DRSY

I am researching your paper Knowledge-Driven Distractor Generation for Cloze-Style Multiple Choice
Questions for my graduation project. I have already trained a model, and I want to use your dataset to evaluate it.

Could you tell me which dataset you used for evaluation in the paper?
Is it the file DGen/Layer1/dataset/total_new_cleaned_test.json?

Thanks!

A guess about missing information

In the code, there is a comment mentioning "cosine similarity from BERT-based embeddings but observed longer inference time and
similar performance", and this version of the code uses SBERT similarity via a missing file: "from Layer2.Fine_tuned_BERT import get_similarity_from_SBERT".

So, in order to reproduce the paper's settings, we need to replace this SBERT similarity function with the LDA similarity function.
(i.e., replace the call

probabilities_of_concepts = self.__calculate_probs_of_concepts_bert(

with a call to the LDA-based method

def __calculate_probs_of_concepts(self, concepts, sentence, debug):

)
The code is fragmented, but it seems to contain all the important information. Thank you for providing the code for the feature-similarity calculation.
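For context, whichever embedding model supplies the vectors (SBERT or otherwise), the similarity itself is just the cosine between two embeddings. A minimal self-contained sketch of that computation — the function name and inputs here are illustrative, not the repo's actual get_similarity_from_SBERT API:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors.

    Returns a value in [-1, 1]; for the non-negative similarity
    scores discussed in the issue, it falls in [0, 1].
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Identical vectors score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Swapping SBERT for LDA only changes how the vectors are produced; the comparison step stays the same.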

Evaluation Metrics

Hey, can you please elaborate on how you computed recall@3? The cosine similarity between the true distractors and the predicted distractors lies between 0 and 1. Please explain how you converted this fraction into a number such that your recall@3 = 12.98.
Please let me know if I have misunderstood.
Let me elaborate on my doubt:
For example, the original distractors were ['red', 'black', 'blue']
and the predicted distractors are ['red', 'yellow', 'green'].
Then the cosine similarities (the values returned by the word2vec similarity function) would be: [1, 0.8, 0.5].
Similarly, for n generated questions you get n such lists of length 3, that is:
[
[1, 0.8, 0.5],
[0.7, 0, 0.04],
[0.3, 0.8, 0.2],
...
[0.2, 0.4, 0.6]
]
Now, how did you calculate recall@3 or precision@3?
@DRSY
