Comments (6)
Generative models search for answers in a very large space, whereas discriminative models only search within the context or a limited label pool. But there are two cases of QA.
In the case of extractive QA, i.e., the answers are in the context, we can give the generative model a prompt indicating that the task is extractive. The model then knows it should obtain the answer only from the context, and it can be more confident. This prompt needs to appear in pretraining, of course.
But for QA where the answers are not in the context, we can only use generative models. It is exactly the nature of generative models to generate versatile outputs. For this part, it depends on the task you are dealing with and the dataset trained on. What specific example can you give?
from i-code.
Yes, I should have clarified my interest was in extractive QA, like the DocVQA dataset. When asking questions to a model trained on DocVQA, it can be very helpful to use the confidence to filter the answer and only show an end user the most confident predictions. Of course, this only works when predicting simple start/end positions of answers.
The paper doesn't specify exactly, but I'm assuming you used UDOP as a generative model when training/evaluating on DocVQA?
Yes, it is a generative model.
Usually we use beam search or greedy search to obtain the best sequence. The confidence here is the joint probability of the searched tokens. I am not sure why finding the confidence is an issue.
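As a toy sketch of what that joint probability looks like (the per-token log-probabilities below are made up, not from any real model):

```python
import math

def sequence_confidence(token_logprobs):
    """Joint probability of a generated sequence: the product of the
    per-token probabilities, computed as exp of the summed log-probs."""
    return math.exp(sum(token_logprobs))

# Hypothetical per-token log-probabilities for a three-token answer.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.95)]
confidence = sequence_confidence(logprobs)  # 0.9 * 0.8 * 0.95 = 0.684
```

Summing log-probs rather than multiplying raw probabilities avoids numerical underflow on long sequences; beam search implementations score candidates the same way.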
Finding the confidence is easy enough for sure! In the past, I've just found the confidence of the generated string to be less helpful than the traditional start/end logits. Most recently I was using the Donut model after training on DocVQA. Often the confidences do not correlate with an answer being right or wrong (>95% confidence but a wrong answer was a common finding). Instead, I had to rely on the model predicting an "answer not found" token, but even that was not as helpful.
Contrasted with traditional start/end position predictions, those confidences are an excellent indicator of a correct or wrong answer.
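For comparison, here is a minimal sketch of that traditional extractive scoring (the logits are toy numbers, not real model outputs): softmax the start and end logits independently, then take the product over the best span.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def best_span(start_logits, end_logits):
    """Pick the most likely start, then the most likely end at or after it;
    the span confidence is p(start) * p(end)."""
    p_start = softmax(np.asarray(start_logits, dtype=float))
    p_end = softmax(np.asarray(end_logits, dtype=float))
    s = int(p_start.argmax())
    e = s + int(p_end[s:].argmax())  # constrain end >= start
    return s, e, float(p_start[s] * p_end[e])

# Hypothetical start/end logits over a 4-token context.
s, e, conf = best_span([0.1, 5.0, 0.2, 0.3], [0.1, 0.2, 6.0, 0.3])
```

Because the two distributions are normalized over the (bounded) context, this confidence is comparable across documents, which is part of why it thresholds well.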
In any case, I am excited to see if this situation has improved with UDOP :)
Why are confidences an excellent indicator in traditional start/end position predictions? Do you have any experiments or stats to support this claim? For example, you could show that choosing the most confident answer beats choosing from the top 5 of a generative model. >95% does not mean anything absolutely, only relatively; >95% does not mean the model is confident.
Yes, 95% is relative. But if you don't use softmax, you can get a better indication of absolute confidence. So maybe I shouldn't be using percentages here, haha; it's more of a signal strength then.
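To illustrate why a raw logit can serve as a "signal strength" where a softmax percentage cannot (toy numbers, purely for illustration): softmax is shift-invariant, so it discards the absolute magnitude of the logits.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

a = np.array([10.0, 10.0, 18.0])
b = a - 8.0  # the same logits shifted down: [2.0, 2.0, 10.0]

# Both vectors produce an identical ">99%" softmax confidence...
same = np.allclose(softmax(a), softmax(b))
# ...but the raw max logit still distinguishes them.
strength_a, strength_b = a.max(), b.max()  # 18.0 vs 10.0
```

So two predictions can report the same softmax percentage while the network's underlying activation, the quantity a threshold can act on, differs substantially.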
I've been doing a lot of work with extractive QA. Unfortunately, it's all NDA and whatnot 🙄, but broadly speaking, I am extracting specific fields from documents, and each field has multiple questions that might lead to a desired answer. However, there is no guarantee each field is in the document.
For each field, one can determine an appropriate threshold, after testing on a large enough test set, such that any extracted answer above it is most likely correct.
Usually, these confidence thresholds range from 20 to 60. But in my experiments with generative models like Donut, this process starts to fall apart. I wish I could share the data, but it seems others on the Donut GitHub have shared similar thoughts.
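A minimal sketch of that per-field filtering (the field names and threshold values below are hypothetical, not from any real pipeline):

```python
# Hypothetical per-field thresholds on a 0-100 "signal strength" scale,
# in the 20-60 range mentioned above.
THRESHOLDS = {"invoice_number": 40.0, "due_date": 25.0}
DEFAULT_THRESHOLD = 50.0  # assumed fallback for untuned fields

def filter_predictions(predictions):
    """Keep only answers whose confidence clears their field's threshold."""
    return {
        field: answer
        for field, (answer, confidence) in predictions.items()
        if confidence >= THRESHOLDS.get(field, DEFAULT_THRESHOLD)
    }

shown = filter_predictions({
    "invoice_number": ("INV-001", 55.0),   # above threshold: shown
    "due_date": ("2023-01-01", 10.0),      # below threshold: suppressed
})
```

The whole scheme rests on the confidence correlating with correctness per field, which is exactly the property that breaks down for the generative-model confidences described above.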
In any case, this is just my thoughts/experiences from a commercial perspective. I'm still very excited to take UDOP for a spin 💪🏻