
oos-eval's Issues

Question about threshold in paper

Hi,
In the paper "An Evaluation Dataset for Intent Classification and Out-of-Scope Prediction", you compare three approaches: oos-train, oos-threshold, and oos-binary.
However, the paper does not clearly state which threshold was used: 0.5, 0.6, ...?

Could you please share that information?
Regards,

Clarification re results in Table 2

Hi,

Do the numbers in Table 2 for oos-train correspond to the test set? I am trying to replicate the BERT results (using the transformers and datasets libraries) with the hyperparameters provided in this repo, but there is a nearly 10-point difference in test set performance, even though my validation set performance is quite close to the numbers reported in the paper. It would be great if you could clarify this.

Thanks,
Gaurav.

Hyperparameter details for fine tuning bert-large in the oos-train setting

Hi,

I am using Hugging Face to fine-tune BERT-large on the CLINC dataset. I follow the hyperparameters listed in hyperparams.csv, but there is a ~3-point difference in in-scope accuracy for the oos-train setting (93.49 vs. 96.9 for the Full version of the dataset; similarly for the OOS-Plus setting). I wonder whether this is due to some HF defaults; e.g., HF defaults to 1.0 for gradient clipping, and I am not sure what you used. Would it be possible to clarify a bit more about your fine-tuning process? It would be very helpful.

Thanks,
Gaurav.

Confused about threshold setting

In our evaluation, the out-of-scope threshold was chosen to be the value which yielded the highest validation score across all intents, treating out-of-scope as its own intent.

I am a little confused by this sentence. Does it mean that we take the highest score that oos examples receive for a known intent on the validation set and use it as the threshold? If so, wouldn't oos recall equal 1 in every validation epoch? How would we then do early stopping and select hyperparameters?
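For what it's worth, here is one possible reading of the quoted sentence, as a minimal sketch and not necessarily what the authors did: sweep a set of candidate thresholds over the classifier's validation confidences, relabel any prediction below the threshold as oos, and keep the threshold with the best overall validation accuracy when oos counts as just another intent. The function names, the toy confidences, and the candidate grid are all assumptions made up for illustration.

```python
def predict_with_threshold(confidences, predicted_intents, threshold, oos_label="oos"):
    """Keep the predicted intent if its confidence reaches the threshold, else output oos."""
    return [
        intent if conf >= threshold else oos_label
        for conf, intent in zip(confidences, predicted_intents)
    ]


def accuracy(predictions, gold):
    """Fraction of exact matches, with oos counted like any other intent."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def choose_threshold(confidences, predicted_intents, gold, candidates):
    """Return the candidate threshold with the highest validation accuracy (first one on ties)."""
    best_threshold, best_score = None, -1.0
    for threshold in candidates:
        predictions = predict_with_threshold(confidences, predicted_intents, threshold)
        score = accuracy(predictions, gold)
        if score > best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Toy validation data (hypothetical): top-1 confidence, top-1 intent, gold label.
val_conf = [0.95, 0.40, 0.80, 0.30, 0.55]
val_pred = ["balance", "transfer", "weather", "balance", "weather"]
val_gold = ["balance", "oos", "weather", "oos", "weather"]

candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9 (assumed grid)
best_t, best_acc = choose_threshold(val_conf, val_pred, val_gold, candidates)
print(best_t, best_acc)
```

Under this reading the threshold trades oos recall against in-scope accuracy, so oos recall on the validation set is not forced to 1.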

Confused about Table 3 in the paper

I'm confused about Table 3 in the paper.

What is the experimental process?

I guess the binary classifier (oos detector) is first trained on "binary_undersample.json" (or "binary_wiki_aug.json" in the wiki aug experiment) to detect whether an utterance is "in" or "oos". A downstream multi-class classifier (e.g. 150 classes for the in-scope data) is then built to handle the samples that the upstream oos detector labels as "in".

In-Scope Accuracy was evaluated on "test" in "data_oos_plus.json", and Out-of-Scope Recall was evaluated on "oos_test" in "data_oos_plus.json".
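To make the guess above concrete, here is a minimal sketch of that two-stage process. It is only my reading of the setup, not the authors' confirmed pipeline. The detector and intent classifier are toy stand-ins (hypothetical keyword rules), and the example lists are made-up data shaped as (text, intent) pairs rather than the contents of "data_oos_plus.json".

```python
def two_stage_predict(text, is_in_scope, classify_intent, oos_label="oos"):
    """Stage 1: binary oos detector. Stage 2: intent classifier on accepted inputs only."""
    if not is_in_scope(text):
        return oos_label
    return classify_intent(text)


def in_scope_accuracy(examples, predict):
    """Accuracy over in-scope test examples; a wrong 'oos' rejection counts as an error."""
    return sum(predict(text) == intent for text, intent in examples) / len(examples)


def oos_recall(oos_examples, predict, oos_label="oos"):
    """Fraction of out-of-scope test examples that the pipeline labels as oos."""
    return sum(predict(text) == oos_label for text, _ in oos_examples) / len(oos_examples)


# Hypothetical stand-ins for the two trained models.
def toy_detector(text):
    return "bank" in text or "weather" in text


def toy_intent_classifier(text):
    return "balance" if "bank" in text else "weather"


# Made-up examples playing the roles of the "test" and "oos_test" splits.
test = [
    ("what is my bank balance", "balance"),
    ("weather today", "weather"),
    ("how hot is it outside", "weather"),  # rejected by the toy detector
]
oos_test = [
    ("tell me a joke about cats", "oos"),
    ("who won the game", "oos"),
]


def predict(text):
    return two_stage_predict(text, toy_detector, toy_intent_classifier)


acc = in_scope_accuracy(test, predict)
rec = oos_recall(oos_test, predict)
print(acc, rec)
```

In this reading, an in-scope query that the detector wrongly rejects lowers in-scope accuracy, which is why the third toy example counts as an error.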

Re-partitioning Data: Which worker wrote which query?

Hi! I want to re-partition the dataset to create 5 different train/valid/test splits for my analyses. In the paper, you mention that all queries from a given crowd worker were placed in a single split. Is it possible to share information about which queries were generated by the same worker? I'd like to minimize any in-scope biases in my splits as well.
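For context, this is roughly how I would build the splits if a per-query worker identifier were available. It is a minimal sketch: the `worker_id` field is hypothetical (it is exactly the information requested above), the records are made up, and the 60/20/20 proportions are my own assumption.

```python
import random
from collections import defaultdict


def split_by_worker(records, fractions=(0.6, 0.2), seed=0):
    """Assign whole workers to train/valid/test so no worker's queries span two splits.

    records: iterable of (worker_id, text, intent) tuples.
    fractions: share of workers for train and valid; the remainder goes to test.
    """
    by_worker = defaultdict(list)
    for record in records:
        by_worker[record[0]].append(record)

    workers = sorted(by_worker)            # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(workers)

    train_end = round(fractions[0] * len(workers))
    valid_end = train_end + round(fractions[1] * len(workers))
    assignment = {
        "train": workers[:train_end],
        "valid": workers[train_end:valid_end],
        "test": workers[valid_end:],
    }
    return {
        name: [record for worker in assigned for record in by_worker[worker]]
        for name, assigned in assignment.items()
    }


# Made-up records: 10 hypothetical workers, 3 queries each.
records = [(f"w{i % 10}", f"query {i}", "some_intent") for i in range(30)]

# Five different partitions, one per random seed.
five_splits = [split_by_worker(records, seed=s) for s in range(5)]
```

Splitting at the worker level like this only controls for worker overlap. Balancing intents across splits would need an extra stratification step.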
