Comments (3)
I manually disabled autocasting in the linear blocks and got the forward pass to work, but now I'm getting NaNs in the backward pass.
I'll update with more details if I'm able to train the model stably in mixed precision.
from i-bert.
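NaNs appearing only in the fp16 backward pass are often a symptom of gradient overflow/underflow rather than a modeling bug. As a hedged sketch (this is generic PyTorch AMP usage, not code from the i-bert repo), dynamic loss scaling with `GradScaler` is the standard mitigation:

```python
import torch

# Minimal AMP training step with dynamic loss scaling. GradScaler multiplies
# the loss before backward() so small fp16 gradients don't flush to zero, and
# it skips the optimizer step whenever it detects inf/NaN gradients.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling is only meaningful for CUDA fp16; on CPU it is disabled.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
target = torch.randn(8, 4, device=device)

# CUDA autocast uses fp16; CPU autocast supports bfloat16 instead.
autocast_dtype = torch.float16 if device == "cuda" else torch.bfloat16
with torch.autocast(device_type=device, dtype=autocast_dtype):
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales grads, steps unless inf/NaN found
scaler.update()                # adjusts the scale factor for the next step
```

If the NaNs persist even with loss scaling, the usual next step is locating the specific op that overflows in half precision (e.g. with `torch.autograd.set_detect_anomaly(True)`) and forcing just that op to fp32.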
Hi, thanks for your interest and I apologize for my late response.
Do you still encounter the same problem?
Since we haven't tried with lower precision than the default 8-bit setting, we have not encountered the same issue.
It is probably because we haven't taken into account lower bit precisions when writing the code and there might be some corner cases we haven't debugged.
If you have already found a solution for this, we would greatly appreciate it if you could open a PR.
Thanks for your response!
I got around that issue by disabling autocasting in the linear blocks, but I then realized that this defeats the purpose of mixed-precision training: most of the computation happens in the linear layers, which now run in fp32, so it yielded no improvement in training time.
I've given up on it for now because I had to move on to other things, but I'll definitely provide an update if I get it working in the future.
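For reference, "disabling autocasting in the linear blocks" can be done locally with a nested `torch.autocast(..., enabled=False)` region. The sketch below is a hypothetical wrapper (the class name and structure are illustrative, not I-BERT's `QuantLinear`), showing why this forces the dominant matmuls back to fp32:

```python
import torch
import torch.nn as nn

class NoAutocastLinear(nn.Module):
    """Hypothetical wrapper: runs its linear layer in full fp32,
    even when called inside a surrounding autocast region."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Locally disable autocast so the matmul executes in fp32
        # instead of the reduced precision of the outer context.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return self.linear(x.float())

layer = NoAutocastLinear(8, 4)
x = torch.randn(2, 8)
with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = layer(x)
print(out.dtype)  # torch.float32 despite the surrounding autocast
```

Since transformer FLOPs are dominated by these linear layers, opting them out of autocast leaves essentially the whole model in fp32, which matches the observation above that training time did not improve.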
Related Issues (20)
- How is the scaling factor S implemented with integer?
- Is ibert-roberta-base on huggingface model hub the same as roberta-base
- why is Integer-only finetuning is much more slower than fp32 finetune
- Can not inference the quantilized model in my device by int8 HOT 1
- How can we change the quantization settings?
- Another setting for quantization
- IBert problems of quant_model=true HOT 1
- Wrong script of downloading GLUE datasets
- Arguments in run.py
- Latency 20x with quant_mode = true HOT 1
- Storing both float32 and int parameters
- Pre-trained weights for specific tasks
- About scaling_factor
- 0
- Where can I find the integer-sqrt kernel ?
- Task name references in strings are wrong HOT 1
- Cannot perform quantization-aware-finetuning due to NaN values
- About CUDA out of memory
- python run.py --arch roberta_base --task STS-B
- error when runing download_glue_data.py