Comments (16)
It looks like the same issue as #1795.
When opening the media driver fails, the media driver exits abnormally because it does not handle the error. This should be fixed in the media driver.
In your case, the monitor is connected to an iGPU that the media driver doesn't support, so we expect the media driver to exit cleanly without any crash.
from media-driver.
Please paste the crash log if you have it. Does the crash happen in the media driver, libva, or the VPL runtime?
Does either the vainfo or the vpl-inspect command crash?
If not, and you do not have a backtrace [1] showing that the crash originates in one of the Intel media libraries, this looks more like OBS not supporting multiple VAAPI drivers listing GPUs (i.e. you should file it against OBS).
[1] Some alternatives for getting backtraces:
- easy, but lots of output: strace -f -k <app>
- 30x slowdown, but much better backtraces: valgrind <app>
- allows further debugging: start gdb, and run the application from it
(On most distros, the above tools come from packages with the same name.)
I apologize for not responding sooner. Only today did I have an opportunity to give this the time and attention needed. I ran all the requested tests, and I am attaching the output for the various tests. It did crash when I ran vpl-inspect. Oddly enough, it did not crash when I ran OBS via Valgrind. Therefore I am not attaching anything from valgrind. I also am unable to attach the output from strace because the log file is about 46 MiB in size.
gdb.txt
vainfo.txt
valgrind.txt
vpl-inspect.txt
I unintentionally attached a file for valgrind above, but it was just the console output from OBS. Here is the actual Valgrind output after properly directing it to a log file.
Thanks! The OBS gdb backtrace shows that it's not an OBS bug:
double free or corruption (!prev)
Thread 1 "obs" received signal SIGABRT, Aborted.
0x00007ffff4aaea9c in ?? () from /usr/lib64/libc.so.6
(gdb) bt full
#0 0x00007ffff4aaea9c in ??? () at /usr/lib64/libc.so.6
...
#7 0x00007ffff4abd253 in free () at /usr/lib64/libc.so.6
#8 0x00007fffba037d37 in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#9 0x00007fffb9e824ca in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#10 0x00007fffba59a56b in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#11 0x00007fffb9ffae26 in __vaDriverInit_1_21 () at /usr/lib64/va/drivers/iHD_drv_video.so
#12 0x00007ffff49fb7e2 in vaInitialize () at /usr/lib64/libva.so.2
It's a bit weird that this happens both when vpl-inspect and obs initialize VA-API, but not when vainfo does.
As this can also be seen with vpl-inspect, which is a simpler, Intel-only stack, that's the best way to demonstrate the issue.
The Valgrind output shows that both the initial alloc and the extra free happen within media-driver (not in the VA-API library):
==30669== Invalid read of size 8
==30669== at 0x35ABBFC5: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35ABBD01: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
...
==30669== Address 0x14f88930 is 1,600 bytes inside a block of size 63,544 free'd
==30669== at 0x484395F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==30669== by 0x35D4A275: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35A3E27D: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35ABBC08: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
...
==30669== Block was alloc'd at
==30669== at 0x4847E43: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==30669== by 0x35ABBB28: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669== by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
==30669== by 0x37C523CE: check_adapter(void*, char const*, unsigned int) (in /usr/lib64/obs-plugins/obs-qsv11.so)
What I assume happened in this case is that libva calls the media driver init twice. On one call the media driver recognizes the GPU, on the other it does not, and it does not handle that correctly.
This is a bit odd, as the media driver is tested and works on setups with multiple Intel dGPUs, so multiple successful init calls should not be a problem.
I was able to reproduce the vpl-inspect abort with the following stack:
- libva: 2.20.0
- GMMlib: intel-gmmlib-22.3.17
- Media: intel-media-24.1.2
But no longer with a newer stack than yours:
- libva: 2.21.0
- GMMlib: intel-gmmlib-22.3.19
- Media: intel-media-24.2.1
=> @PaddyMac Could you try upgrading?
This was on a server with a non-Intel iGPU and several Intel dGPUs:
# head -3 /sys/class/drm/card?/device/uevent
==> /sys/class/drm/card0/device/uevent <==
DRIVER=ast
PCI_CLASS=30000
PCI_ID=1A03:2000
==> /sys/class/drm/card1/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:56C0
==> /sys/class/drm/card2/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:56C0
Note: the ast iGPU offers only the card0 DRM node, not a renderXXX one.
When testing with a Hades Canyon (Kaby Lake) NUC that has both an Intel KBL iGPU and an AMD Vega dGPU in the same package:
# head -3 /sys/class/drm/card?/device/uevent
==> /sys/class/drm/card0/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:591B
==> /sys/class/drm/card1/device/uevent <==
DRIVER=amdgpu
PCI_CLASS=30000
PCI_ID=1002:694C
The issue was not reproducible. I guess the reason was that the Intel GPU was listed before the non-Intel one.
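The `head -3` listings above show each card's driver and PCI vendor:device, and card order is what matters here. The following minimal sketch reads the same sysfs uevent fields so that a script or test can check whether the first card is an Intel one (PCI vendor 8086). It assumes the sysfs layout shown in the listings. `read_card_uevent` and `parse_uevent` are hypothetical helper names, not part of media-driver or libva.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read /sys/class/drm/card<N>/device/uevent into buf.
 * len must be > 0. Returns bytes read, or -1 if the file cannot be opened. */
static long read_card_uevent(int card, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/sys/class/drm/card%d/device/uevent", card);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, len - 1, f);   /* leave room for the terminator */
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}

/* Hypothetical helper: extract DRIVER= and PCI_ID=<vendor>:<device> from
 * uevent text. driver_len must be > 0. Returns 0 if both were found, else -1. */
static int parse_uevent(const char *text, char *driver, size_t driver_len,
                        unsigned *vendor, unsigned *device)
{
    int have_driver = 0, have_id = 0;
    const char *line = text;

    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        if (len > 7 && strncmp(line, "DRIVER=", 7) == 0) {
            size_t n = len - 7;
            if (n >= driver_len)
                n = driver_len - 1;          /* truncate overly long names */
            memcpy(driver, line + 7, n);
            driver[n] = '\0';
            have_driver = 1;
        } else if (len > 7 && strncmp(line, "PCI_ID=", 7) == 0 &&
                   sscanf(line + 7, "%x:%x", vendor, device) == 2) {
            have_id = 1;                     /* e.g. 8086:591B -> 0x8086, 0x591B */
        }
        line = end ? end + 1 : NULL;         /* advance to the next line */
    }
    return (have_driver && have_id) ? 0 : -1;
}
```

Calling `read_card_uevent()` for each card number present and then `parse_uevent()` reproduces the listing above. A `vendor == 0x8086` check on the lowest-numbered card answers the "Intel GPU first?" question directly.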
=> @PaddyMac If you cannot upgrade the driver, or the upgrade does not help, could you check whether the double free goes away if the Intel GPU comes before the AMD one?
Ah, I read the bug description more closely. The bug is triggered by the i7-3770S being Ivy Bridge, i.e. having an Intel iGPU that is not supported by this Intel driver project (only by the legacy i965 media driver). That is the same case as a non-Intel GPU. And as it's an iGPU, if it's enabled, it will be the first GPU found => one will need a fixed media-driver.
Regarding #1795 (suggested above as the same issue): that one also happens with media driver v24.2.1, whereas in my testing this issue seems to be fixed in that release, see #1789 (comment).
@PaddyMac Can you confirm whether your issue is fixed by upgrading to the v24.2.1 (or newer) media driver: https://github.com/intel/media-driver/tags ?
Most probably the culprit code is:
int mos_get_device_id(int fd, uint32_t *deviceId)
{
    int device_type = mos_query_device_type(fd);
    if (DEVICE_TYPE_I915 == device_type)
    {
        return mos_get_dev_id_i915(fd, deviceId);
    }
#ifdef ENABLE_XE_KMD
    else if (DEVICE_TYPE_XE == device_type)
    {
        return mos_get_dev_id_xe(fd, deviceId);
    }
#endif
    return -ENODEV;
}
Here it returns a negative value, which is used in:
if (mos_get_device_id(fd, &devId))
{
    MOS_OS_ASSERTMESSAGE("Failed to get the chipset id\n");
    return MOS_STATUS_INVALID_HANDLE;
}
so this branch will never be hit...
I don't have a system with both an Intel GPU and a non-Intel GPU.
You may try pull request #1805.
If it doesn't help, a gdb crash log from a release-internal build (built with -DBUILD_TYPE=release-internal) would be much appreciated.
@MicroYY The driver returning an error code here doesn't matter, since it is opening an unsupported iGPU. The problem is that the driver fails to exit with all resources released, and the software stack crashes inside the media driver. In this case, the application may have no chance to handle the error.
Auto Created VSMGWL-73921 for further analysis.
@MicroYY This bug is reported for a setup with only Intel GPUs. Both of them are supported by the Intel KMD, but the first one is not supported by the media driver.
You can create a similar setup by using an older-GEN host with an iGPU and building a media driver version where support for that GEN is disabled. Alternatively, use two different dGPUs, assign the first one to the Xe KMD and the second to i915, and build the media driver without Xe KMD support.
I think I reproduced it on an ADL + DG2 system via vpl-inspect:
...
vpl-inspect opens the iGPU first and then the dGPU. When querying the dGPU's engines (to be exact, its vebox engine), no vebox is returned by the ioctl. The media driver then crashes with a double free because vebox context creation fails.
Modified code of mos_bufmgr_get_driver_info:
drvInfo->hasVebox = 0;
retValue = 0;
if (mos_get_param(fd, I915_PARAM_HAS_VEBOX, &retValue))
{
    drvInfo->hasVebox = !!retValue;
    printf("fd %d has vebox %u\n", fd, retValue);
}
I investigated further; here is a summary. The double free is reproducible on an iGPU + dGPU system where the dGPU's vebox engine cannot be queried from the KMD. In that case the vebox context on the dGPU cannot be created, and some allocations are double-freed by the media driver.
The correct behavior would be for vebox to be queried from the KMD and the context to be created successfully on the dGPU; the KMD may have some issues exposing the HW engine. Nevertheless, the media driver should handle all such cases and return either 0 or an error code.
Updated #1805 to fix double free crash.
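The failure described above is an init error path that releases the same allocation twice. A common C pattern for making such paths safe has two parts. First, clear each pointer right after freeing it. Second, route every failure through one idempotent release function, so any later teardown sees NULL (and free(NULL) is a no-op). This is a minimal, hypothetical sketch of that pattern. `struct dev_ctx`, `dev_ctx_init` and `dev_ctx_destroy` are illustrative names, not media-driver code, and this is not the actual change in #1805.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-device driver state (not the media-driver layout). */
struct dev_ctx {
    void *render_state;   /* allocated early in init */
    void *vebox_state;    /* allocated only if a vebox context can be created */
};

/* Free a pointer and clear it, so a later cleanup pass sees NULL. */
static void free_and_clear(void **p)
{
    free(*p);
    *p = NULL;
}

/* Safe to call any number of times on a fully or partially initialized ctx. */
static void dev_ctx_destroy(struct dev_ctx *ctx)
{
    free_and_clear(&ctx->vebox_state);
    free_and_clear(&ctx->render_state);
}

/* Returns 0 on success or a negative errno; on failure ctx is fully released.
 * has_vebox simulates whether the KMD reported a vebox engine. */
static int dev_ctx_init(struct dev_ctx *ctx, bool has_vebox)
{
    ctx->render_state = NULL;
    ctx->vebox_state = NULL;

    ctx->render_state = calloc(1, 4096);     /* placeholder size */
    if (!ctx->render_state)
        return -ENOMEM;

    if (!has_vebox) {
        /* vebox context cannot be created: release what we own, exactly once */
        dev_ctx_destroy(ctx);
        return -ENODEV;
    }

    ctx->vebox_state = calloc(1, 4096);      /* placeholder size */
    if (!ctx->vebox_state) {
        dev_ctx_destroy(ctx);
        return -ENOMEM;
    }
    return 0;
}
```

With this shape, a caller that calls dev_ctx_destroy() again after a failed dev_ctx_init() does no harm, and the error code still reaches the application.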