clinc / oos-eval Goto Github PK

Repository that accompanies "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction" (EMNLP 2019)

License: Other

oos-eval's Issues

Question about threshold in paper

Hi,
In paper "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction", you made comparisons between three approaches: oos-train, oss-threshold and oss-binary.
But the paper did not clearly tell which threshold was used ? 0.5, 0.6, ....

Can you please give me that information ?
Rgds,

What are the 10 domains covered in this dataset?

Could not find the answer to this question in the documentation.
What are the different domains covered in the dataset?
Is there a domain - intent mapping, to extract the data of interest?

Clarification re results in Table 2

Hi,

Do the numbers in Table 2 for oos-train correspond to the test set? I am trying to replicate the BERT results (using transformers and datasets library) with the hyperparameters provided in this repo, but there's a near 10 point difference in the test set performance; however, my validation set performance is quite close to the numbers provided in the paper. It'd be great if you could clarify this.

Thanks,
Gaurav.

Hyperparameter details for fine tuning bert-large in the oos-train setting

Hi,

I am using huggingface to fine-tune bert large on the CLINC dataset. I follow the hyperparameters mentioned in hyperparams.csv but there's ~3 point difference in inscope accuracy for the oos-train setting (93.49 v/s 96.9 for Full version of the dataset; similarly for the OOS-Plus setting). I am wondering if this is due to some HF defaults, for e.g., HF defaults to 1.0 for gradient clipping, I am not sure what did you use. Would it be possible to clarify a bit more about your fine-tuning process? It'd be very helpful.

Thanks,
Gaurav.

confusing about threshold setting

In our evaluation, the out-of-scope threshold was chosen to be the value which yielded the highest validation score across all intents, treating out-of-scope as its own intent.

I am a little confused by this sentence. Does it mean that we select oos's highest score on the known intention as the threshold in the validation set? If so, isn't oos's recall equal to 1 in each epoch of validation set, how do we early stop and select hyper-parameters?

confusing about Table 3 in paper.

I'm confusing about Table 3 in paper.

What is the experimental process?

I guess the binary classifier (oos detector) is first trained on "binary_undersample.json" (or "binary_wiki_aug.json" in wiki aug experiment), to detection whether the utterances are "in" or "oos", then build downstream multi-classes classifier (e.g. 150 classes for in-scope data) to deal with "in" samples from upstream oos detector.

In-Scope Accuracy was evaluated on "test" in "data_oos_plus.json", and Out-of-Scope Recall was evaluated on "oos_test" in "data_oos_plus.json".

Re-partitioning Data: Which worker wrote which query?

Hi! I want re-partition the dataset to create 5 different train/valid/test splits for my analyses. In the paper, you mention that all queries from a given crowd worker were place in a single split. Is it possible to share information about which queries were generated by the same worker? I'd like to minimize any in-scope biases in my splits as well.

clinc / oos-eval Goto Github PK

oos-eval's Issues

Question about threshold in paper

What are the 10 domains covered in this dataset?

Clarification re results in Table 2

Hyperparameter details for fine tuning bert-large in the oos-train setting

confusing about threshold setting

confusing about Table 3 in paper.

Re-partitioning Data: Which worker wrote which query?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent