Comments (4)
In a lot of cases, the error is not very meaningful.
What do you mean by meaningful, each xfunc usually adds more context on top of the error (e.g. a path), and it's up to the upper layer to decide whether the error is meaningful or not.
In some cases, we lose some info (ex: xopen, xclose will return 1 error type only)
They all embed errno or something else so no you don't lose info.
In some cases, we might even fail while failing, ex: xstrdup, that can fail due to a bad alloc. If we fail, we will go to error_set, which will call asprintf, which will alloc memory. Yes!
How is it failing exactly? In the case of strdup it's just trying to allocate an error string. You could be lucky and be able to do that, if you can't, you simply don't have an error msg populated.
in other cases, the error check we have to do after is the same (xdlopen ?)
What do you mean?
My opinion is that these functions SHOULD exit when failing. This lib has no fallback when something goes wrong. Errors are set, but not handled, so we can either crash, or have UBs.
My suggestion is rewrite the whole file, and each function, if failing should:
write a meaningful message to stderr
abort
What do you mean by errors are not handled? When the library fails you usually have a pretty good idea why. What crash or UB are you experiencing?
No and no!
First, you don't get to abort a program from a shared library, especially not a container runtime. You have no idea what has been done and what needs to be undone and you don't even know what your abort is going to do, maybe you're PID 1, maybe you're a subreaper, maybe you're inside a cgo constructor...
Same applies to stderr, you have no idea where it's pointing at and whether it's safe or not to use it.
from libnvidia-container.
Hi Jonathan,
What do you mean by meaningful
They all embed errno or something else so no you don't lose info.
Well, from what I understood, this file is supposed to give helpers that will simplify error handling. errno and non-errno function unified with a common API that set the error message. errno function will give you more details, example, xopen will give the reason behind the failure. But here, we loose it. We only know opening Y file failed. And the user of the functions has to use errno to get the value. But if he does that, we loose the "uniformization" effort brought the file.
How is it failing exactly? In the case of strdup it's just trying to allocate an error [...]
This rely on the underlying library to handler properly allocations. But sometime, it's not the case, so I think it would be safer to have a code which does not use dynamic allocation when handling dynamic allocation errors. Just a matter of logic.
in other cases, the error check we have to do after is the same (xdlopen ?)
I think this function has the same issue than the open function. We have an uniform API with struct error, and we don't use it since we just say "load library error". We would like to call dlerror here, to get more context.
unhandled errors
The close case by example, which is not checked.
First, you don't get to abort a program from a shared library
yes, that's true...
Same applies to stderr
Right, I agree on that. But couldn't we have a better way to output error log ?
should we rewrite the whole thing ?
This is clearly a clumsy over-statement. Do you think there is some part we could rework here ?
from libnvidia-container.
Again you do not lose anything, if you look at how the error functions work, they append the system error string to the input message (similar to err
) and they store the error code. dlerror
is called in the case of dso errors.
close is not checked because it's pointless if you look at where it's used.
There are already logging facilities implemented, you can easily get comprehensive logs with the environment variable NVC_DEBUG_FILE
.
It's clearly not perfect but the points you're making are not really valid as is. Error handling in a reentrant library can be tricky, if you're curious you can check how others do it in the industry, for example libtls has a very similar approach.
from libnvidia-container.
if you look at how the error functions work
oops, missed the error_vset wrapper, see that.
close is not checked
logging facilities
ok
Ok right, see your point.
from libnvidia-container.
Related Issues (20)
- libnvidia-container ubuntu22.04/amd64 HOT 4
- libnvidia_container fails to compile with mold HOT 3
- Issue in permissions checking in nvcgo/internal/cgroup/ebpf.go ? HOT 2
- nvidia-container-runtime segfault HOT 2
- sudo yum install -y nvidia-container-toolkit failed - No such device
- nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
- Warning of Key is stored in legacy trusted.gpg keyring HOT 2
- Unprivileged `nvidia-container-cli --user configure`
- ldconfig-free deployment
- Unable to use more than 5 GPU cards HOT 2
- Building libnvidia-container 1.14.5 builds 1.14.4 HOT 19
- nvidia-container-cli: mount error: failed to add device rules: unable to generate new device filter program from existing programs: unable to create new device filters program: load program: invalid argument: 0: (69) r2 = *(u16 *)(r1 +0)
- Trouble Running NVIDIA GPU Containers on Custom Yocto-Based Distro on HPE Server with NVIDIA A40 GPU HOT 5
- How to mirror this Nvidia libnividia rmp repo with artifactory rpm repo HOT 1
- versions.mk and common.mk use PATCH variable for different things
- Support for Ubuntu 24.04 HOT 3
- Error linking when the library version on the host is lower than that in the image
- Setting up nvidia drivers to work with docker container HOT 1
- Ubuntu22.04 make err HOT 1
- Broken link on Unsupported Distribution github page HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from libnvidia-container.