Comments (6)
Generative models search for answers in a very large space, whereas discriminative models only search within the context or a limited label pool. But there are two cases of QA.
In the case of extractive QA, i.e., the answers are in the context, we can give the generative model a prompt indicating that the task is extractive. The model then knows it should obtain the answer only from the context, and it can be more confident. This prompt needs to appear in pretraining, of course.
But for QA where the answers are not in the context, we can only use generative models. It is exactly the nature of generative models to generate versatile outputs. For this part, it depends on the task you are dealing with and the dataset trained on. What specific example can you give?
from i-code.
Yes, I should have clarified my interest was in extractive QA, like the DocVQA dataset. When asking questions to a model trained on DocVQA, it can be very helpful to use the confidence to filter the answer and only show an end user the most confident predictions. Of course, this only works when predicting simple start/end positions of answers.
The paper doesn't specify exactly, but I'm assuming you used UDOP as a generative model when training/evaluating on DocVQA?
Yes, it is a generative model.
Usually we use beam search or greedy search to obtain the best sequence. The confidence here is the joint probability of the searched tokens. I am not sure why finding the confidence is an issue.
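As a toy sketch of what that joint probability looks like (the per-token log-probabilities below are made up, not from any real model):

```python
import math

def sequence_confidence(token_logprobs):
    """Joint probability of a generated sequence: the product of the
    per-token probabilities, computed as exp of the summed log-probs."""
    return math.exp(sum(token_logprobs))

# Hypothetical per-token log-probabilities for a three-token answer.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
confidence = sequence_confidence(logprobs)  # 0.9 * 0.8 * 0.95 = 0.684
```

Summing log-probs rather than multiplying raw probabilities avoids numerical underflow on long sequences; beam search implementations score candidates the same way.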
Finding the confidence is easy enough for sure! In the past, I've just found the confidence of the generated string to be less helpful than the traditional start/end logits. Most recently I was using the Donut model after training on DocVQA. Often the confidences do not correlate with an answer being right or wrong (>95% confidence but a wrong answer was a common finding). Instead, I had to rely on the model predicting an "answer not found" token, but even that was not as helpful.
Contrasted with traditional start/end position predictions, those confidences are an excellent indicator of a correct or wrong answer.
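For comparison, here is a minimal sketch of that traditional extractive scoring (the logits are toy numbers, not real model outputs): softmax the start and end logits independently, then take the product over the best span.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def best_span(start_logits, end_logits):
    """Pick the most likely start, then the most likely end at or after it;
    the span confidence is p(start) * p(end)."""
    p_start = softmax(np.asarray(start_logits, dtype=float))
    p_end = softmax(np.asarray(end_logits, dtype=float))
    s = int(p_start.argmax())
    e = s + int(p_end[s:].argmax())  # constrain end >= start
    return s, e, float(p_start[s] * p_end[e])

# Hypothetical start/end logits over a 4-token context.
s, e, conf = best_span([0.1, 5.0, 0.2, 0.3], [0.1, 0.2, 6.0, 0.3])
```

Because the two distributions are normalized over the (bounded) context, this confidence is comparable across documents, which is part of why it thresholds well.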
In any case, I am excited to see if this situation has improved with UDOP :)
Why are confidences an excellent indicator in traditional start/end position predictions? Do you have any experiments or stats to support this claim? For example, you could show that choosing the most confident answer beats choosing from the top 5 of a generative model. >95% does not mean anything absolutely, only relatively; >95% does not mean the model is confident.
Yes, 95% is relative. But if you don't use softmax, you can get a better indication of absolute confidence. So maybe I shouldn't be using percentages here, haha; it's more of a signal strength then.
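To illustrate why a raw logit can serve as a "signal strength" where a softmax percentage cannot (toy numbers, purely for illustration): softmax is shift-invariant, so it discards the absolute magnitude of the logits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

a = np.array([10.0, 10.0, 18.0])
b = a - 8.0  # the same logits shifted down: [2.0, 2.0, 10.0]

# Both vectors produce an identical ">99%" softmax confidence...
same = np.allclose(softmax(a), softmax(b))
# ...but the raw max logit still distinguishes them.
strength_a, strength_b = a.max(), b.max()  # 18.0 vs 10.0
```

So two predictions can report the same softmax percentage while the network's underlying activation, the quantity a threshold can act on, differs substantially.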
I've been doing a lot of work with extractive QA. Unfortunately, it's all NDA and whatnot 🙄, but broadly speaking, I am extracting specific fields from documents, and each field has multiple questions that might lead to a desired answer. However, there is no guarantee each field is in the document.
For each field, one can determine an appropriate threshold, after testing on a large enough test set, such that any extracted answer above it is most likely correct.
Usually, these confidence thresholds range from 20 to 60. But in my experiments with generative models like Donut, this process starts to fall apart. I wish I could share the data, but it seems others on the Donut GitHub have shared similar thoughts.
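A minimal sketch of that per-field filtering (the field names and threshold values below are hypothetical, not from any real pipeline):

```python
# Hypothetical per-field thresholds on a 0-100 "signal strength" scale,
# in the 20-60 range mentioned above.
THRESHOLDS = {"invoice_number": 40.0, "due_date": 25.0}
DEFAULT_THRESHOLD = 50.0  # assumed fallback for untuned fields

def filter_predictions(predictions):
    """Keep only answers whose confidence clears their field's threshold."""
    return {
        field: answer
        for field, (answer, confidence) in predictions.items()
        if confidence >= THRESHOLDS.get(field, DEFAULT_THRESHOLD)
    }

shown = filter_predictions({
    "invoice_number": ("INV-001", 55.0),   # above threshold: shown
    "due_date": ("2023-01-01", 10.0),      # below threshold: suppressed
})
```

The whole scheme rests on the confidence correlating with correctness per field, which is exactly the property that breaks down for the generative-model confidences described above.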
In any case, this is just my thoughts/experiences from a commercial perspective. I'm still very excited to take UDOP for a spin 💪🏻