
Comments (16)

Jexu commented on July 17, 2024

It looks like same issue in #1795

When the media driver fails to open a device, it exits abnormally because the error is not handled. This should be fixed in the media driver.

In your case, the monitor is connected to an iGPU that the media driver doesn't support, so we expect the media driver to exit normally without any crash.

from media-driver.

Jexu commented on July 17, 2024

Please paste the crash log if you have it. Does the crash happen in the media driver, libva, or the VPL runtime?

eero-t commented on July 17, 2024

Does either the vainfo or the vpl-inspect command crash?

If not, and you do not have a backtrace [1] showing that the crash originates in one of the Intel media libraries, this looks more like OBS not supporting multiple VA-API drivers listing GPUs (i.e. you should file it against OBS).

[1] Some alternatives for getting backtraces:

  • easy, but lots of output: strace -f -k <app>
  • 30x slowdown, but much better backtraces: valgrind <app>
  • allows further debugging: start gdb, and run application from it

(On most distros, above tools come from packages with the same name.)

PaddyMac commented on July 17, 2024

I apologize for not responding sooner. Only today did I have an opportunity to give this the time and attention needed. I ran all the requested tests, and I am attaching the output for the various tests. It did crash when I ran vpl-inspect. Oddly enough, it did not crash when I ran OBS via Valgrind. Therefore I am not attaching anything from valgrind. I also am unable to attach the output from strace because the log file is about 46 MiB in size.

gdb.txt
vainfo.txt
valgrind.txt
vpl-inspect.txt

PaddyMac commented on July 17, 2024

I unintentionally attached a file for valgrind above, but it was just the console output from OBS. Here is the actual Valgrind output after properly directing it to a log file.

valgrind.txt

eero-t commented on July 17, 2024

Thanks! The OBS gdb backtrace shows that it's not an OBS bug:

double free or corruption (!prev)

Thread 1 "obs" received signal SIGABRT, Aborted.
0x00007ffff4aaea9c in ?? () from /usr/lib64/libc.so.6
(gdb) bt full
#0  0x00007ffff4aaea9c in ??? () at /usr/lib64/libc.so.6
...
#7  0x00007ffff4abd253 in free () at /usr/lib64/libc.so.6
#8  0x00007fffba037d37 in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#9  0x00007fffb9e824ca in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#10 0x00007fffba59a56b in ??? () at /usr/lib64/va/drivers/iHD_drv_video.so
#11 0x00007fffb9ffae26 in __vaDriverInit_1_21 () at /usr/lib64/va/drivers/iHD_drv_video.so
#12 0x00007ffff49fb7e2 in vaInitialize () at /usr/lib64/libva.so.2

It is a bit odd that this happens when both vpl-inspect and OBS initialize VA-API, but not when vainfo does.

Since it also reproduces with vpl-inspect, which is simpler and an Intel-only stack, that is the best way to demonstrate the issue.

eero-t commented on July 17, 2024

The Valgrind output shows that both the initial allocation and the extra free happen within media-driver (not in the VA-API library):

==30669== Invalid read of size 8
==30669==    at 0x35ABBFC5: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35ABBD01: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
...
==30669==  Address 0x14f88930 is 1,600 bytes inside a block of size 63,544 free'd
==30669==    at 0x484395F: free (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==30669==    by 0x35D4A275: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35A3E27D: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35ABBC08: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
...
==30669==  Block was alloc'd at
==30669==    at 0x4847E43: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==30669==    by 0x35ABBB28: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x359064C9: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x3601E56A: ??? (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0x35A7EE25: __vaDriverInit_1_21 (in /usr/lib64/va/drivers/iHD_drv_video.so)
==30669==    by 0xA6EC7E1: vaInitialize (in /usr/lib64/libva.so.2.2100.0)
==30669==    by 0x37C523CE: check_adapter(void*, char const*, unsigned int) (in /usr/lib64/obs-plugins/obs-qsv11.so)

My assumption is that libva calls the media driver init twice: on one call the media driver recognizes the GPU, on the other it does not, and it does not handle that case correctly.

This is a bit odd, as the media driver is tested and works on setups with multiple Intel dGPUs, so multiple successful init calls should not be a problem.
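
The kind of failure suspected here (a block allocated during init, freed on an error path, then touched or freed again during teardown) can be sketched as follows. This is only an illustration under assumptions, not the media driver's code: `DriverCtx`, `ctx_release_tables`, and `ctx_init` are hypothetical names, and the `supported` flag stands in for whatever check rejects the GPU.

```c
#include <stdlib.h>

/* Hypothetical per-device driver state; not a real media-driver type. */
typedef struct {
    void *tables;   /* block allocated during init */
} DriverCtx;

/* Release the block and clear the pointer so a second call is a no-op.
 * Without the reset, calling this from both the error path and the
 * generic teardown would free the same block twice. */
static void ctx_release_tables(DriverCtx *ctx)
{
    free(ctx->tables);   /* free(NULL) is defined to do nothing */
    ctx->tables = NULL;
}

/* Returns 0 on success, -1 if the device is rejected or allocation fails. */
static int ctx_init(DriverCtx *ctx, int supported)
{
    ctx->tables = calloc(1, 4096);
    if (ctx->tables == NULL)
        return -1;
    if (!supported) {
        ctx_release_tables(ctx);   /* error path cleans up after itself */
        return -1;
    }
    return 0;
}
```

With this shape, a caller that runs `ctx_release_tables()` again after a failed `ctx_init()` does not trigger a second `free()` of the same block.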

eero-t commented on July 17, 2024

I was able to reproduce the vpl-inspect abort with the following stack:

  • libva: 2.20.0
  • GMMlib: intel-gmmlib-22.3.17
  • Media: intel-media-24.1.2

But no longer with a stack newer than yours:

  • libva: 2.21.0
  • GMMlib: intel-gmmlib-22.3.19
  • Media: intel-media-24.2.1

=> @PaddyMac Could you try upgrading?

This was on a server with a non-Intel iGPU and several Intel dGPUs:

# head -3 /sys/class/drm/card?/device/uevent
==> /sys/class/drm/card0/device/uevent <==
DRIVER=ast
PCI_CLASS=30000
PCI_ID=1A03:2000

==> /sys/class/drm/card1/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:56C0

==> /sys/class/drm/card2/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:56C0

Note: the ast iGPU offers only the card0 DRM node, not a renderXXX one.

When testing with a Hades Canyon (Kaby Lake) NUC that has both an Intel KBL iGPU and an AMD Vega dGPU in the same package:

# head -3 /sys/class/drm/card?/device/uevent
==> /sys/class/drm/card0/device/uevent <==
DRIVER=i915
PCI_CLASS=38000
PCI_ID=8086:591B

==> /sys/class/drm/card1/device/uevent <==
DRIVER=amdgpu
PCI_CLASS=30000
PCI_ID=1002:694C

The issue was not reproducible. I guess the reason is that the Intel GPU comes before the non-Intel one.
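
To check device order programmatically rather than with `head`, the uevent files listed above can be parsed for their `DRIVER=` and `PCI_ID=` lines. A minimal sketch, assuming the same `/sys/class/drm/card<N>/device/uevent` layout shown in the listings; `uevent_get` and `card_driver` are hypothetical helper names.

```c
#include <stdio.h>
#include <string.h>

/* Copy the value of KEY= from uevent-style text ("KEY=value\n" lines)
 * into out. Returns 1 if the key was found, 0 otherwise. */
static int uevent_get(const char *text, const char *key,
                      char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *line = text;

    if (out_size == 0)
        return 0;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t line_len = end ? (size_t)(end - line) : strlen(line);

        if (line_len > key_len &&
            strncmp(line, key, key_len) == 0 && line[key_len] == '=') {
            size_t val_len = line_len - key_len - 1;
            if (val_len >= out_size)
                val_len = out_size - 1;   /* truncate to fit */
            memcpy(out, line + key_len + 1, val_len);
            out[val_len] = '\0';
            return 1;
        }
        line += line_len;
        if (*line == '\n')
            line++;
    }
    return 0;
}

/* Read /sys/class/drm/card<N>/device/uevent and report its DRIVER value.
 * Returns 0 if the file cannot be read or has no DRIVER line. */
static int card_driver(int card, char *out, size_t out_size)
{
    char path[64], buf[1024];
    FILE *f;
    size_t n;

    snprintf(path, sizeof path, "/sys/class/drm/card%d/device/uevent", card);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return uevent_get(buf, "DRIVER", out, out_size);
}
```

Calling `card_driver(0, ...)` and `card_driver(1, ...)` in turn shows which kernel driver owns the first card, which is the ordering that seems to matter here.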

=> @PaddyMac If you cannot upgrade the driver, or upgrade does not help, could you try whether double free goes away if you have Intel GPU before AMD one?

eero-t commented on July 17, 2024

=> @PaddyMac If you cannot upgrade the driver, or upgrade does not help, could you try whether double free goes away if you have Intel GPU before AMD one?

Ah, I read the bug description more closely. The bug is triggered by the i7-3770S being Ivy Bridge, i.e. having an Intel iGPU that is not supported by this Intel driver project (only by the legacy i965 media driver), which is the same case as a non-Intel GPU. And since it is an iGPU, if it is enabled it will be the first GPU found => a fixed media-driver is needed.

eero-t commented on July 17, 2024

It looks like same issue in #1795

That one happens also with media driver v24.2.1, whereas in my testing this issue seems to be fixed in that release, see #1789 (comment).

@PaddyMac Can you confirm whether your issue is fixed by upgrading to the v24.2.1 (or newer) media driver: https://github.com/intel/media-driver/tags ?

MicroYY commented on July 17, 2024

Most probably the culprit code is:

int mos_get_device_id(int fd, uint32_t *deviceId)
{
    int device_type = mos_query_device_type(fd);

    if (DEVICE_TYPE_I915 == device_type)
    {
        return mos_get_dev_id_i915(fd, deviceId);
    }
#ifdef ENABLE_XE_KMD
    else if (DEVICE_TYPE_XE == device_type)
    {
        return mos_get_dev_id_xe(fd, deviceId);
    }
#endif
    return -ENODEV;
}

This returns a negative value, which is then used in:

if (mos_get_device_id(fd, &devId))
{
    MOS_OS_ASSERTMESSAGE("Failed to get the chipset id\n");
    return MOS_STATUS_INVALID_HANDLE;
}

so this branch will never be hit...

I don't have systems with both intel GPU and non-intel GPU.
You may try pull request #1805.
If it doesn't help, a gdb crash log from a release-internal build (built with -DBUILD_TYPE=release-internal) would be much appreciated.
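
For reference, the check quoted above tests the return value for truthiness. In C, any nonzero value counts as true in an `if`, and `-ENODEV` is nonzero. The sketch below uses stand-in functions (`fake_get_device_id` and `probe` are hypothetical names, not the driver's code) to show how a negative errno return behaves in a check of that shape.

```c
#include <errno.h>
#include <stdint.h>

/* Stand-in for a lookup that fails with a negative errno for unknown fds. */
static int fake_get_device_id(int fd, uint32_t *device_id)
{
    if (fd < 0)
        return -ENODEV;
    *device_id = 0x56C0;   /* arbitrary example value */
    return 0;
}

/* Mirrors the shape of the quoted check: nonzero return means failure. */
static int probe(int fd)
{
    uint32_t dev_id = 0;

    if (fake_get_device_id(fd, &dev_id))
        return -1;   /* taken for -ENODEV, since it is nonzero */
    return 0;
}
```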

Jexu commented on July 17, 2024

@MicroYY
The driver returning an error code here is not the problem, since it is opening an unsupported iGPU. The problem is that the driver cannot exit with all resources released, so the software stack crashes inside the media driver. In that case, the application may have no chance to handle the error.

intel-mediadev commented on July 17, 2024

Auto Created VSMGWL-73921 for further analysis.

eero-t commented on July 17, 2024

I don't have systems with both intel GPU and non-intel GPU.

@MicroYY This bug is reported for a setup with only Intel GPUs. Both are supported by the Intel KMD, but the first one is not supported by the media driver.

You can create a similar setup by using an older GEN host with an iGPU and building a media driver version where support for that GEN is disabled. Alternatively, use two different dGPUs, assign the first one to the Xe KMD and the second to i915, and build the media driver without Xe KMD support.

MicroYY commented on July 17, 2024

I think I reproduced it on an ADL+DG2 system via vpl-inspect...
vpl-inspect opens the iGPU first and then the dGPU. When querying the dGPU's engines (the vebox engine, to be exact), the ioctl reports no vebox. The media driver then crashes with a double free because vebox context creation fails.

Modified code of mos_bufmgr_get_driver_info:

    drvInfo->hasVebox = 0;
    retValue = 0;
    if (mos_get_param(fd, I915_PARAM_HAS_VEBOX, &retValue))
    {
        drvInfo->hasVebox = !!retValue;
        printf("fd %d has vebox %u\n", fd, retValue);
    }

MicroYY commented on July 17, 2024

I investigated further; here is a summary. The double free is reproducible on an iGPU+dGPU (i+d) system where the dGPU's vebox engine cannot be queried from the KMD. In that case the vebox context on the dGPU cannot be created, and some allocations are freed twice by the media driver.
The correct expectation is that vebox is reported by the KMD and the context is created successfully on the dGPU. The KMD may have some issue exposing the hardware engine. Nevertheless, the media driver should handle every case and return either 0 or an error code.
Updated #1805 to fix the double free crash.
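
One way to read "return either 0 or an error code" is sketched below: context creation either succeeds, possibly without a vebox context, or fails with nothing left allocated. The types and functions (`EngineInfo`, `DeviceCtx`, `device_ctx_create`, `device_ctx_destroy`) are hypothetical and do not reflect the change in #1805. Whether a missing vebox should be tolerated or reported as an error is a design decision; this sketch assumes it is tolerated.

```c
#include <stdlib.h>

/* Hypothetical types for illustration; not media-driver definitions. */
typedef struct {
    int has_vebox;      /* 1 if the kernel driver reported a vebox engine */
} EngineInfo;

typedef struct {
    void *render_ctx;
    void *vebox_ctx;    /* stays NULL when no vebox engine is available */
} DeviceCtx;

/* Returns 0 on success or -1 on allocation failure. On failure nothing is
 * left allocated, so the caller must not free anything. */
static int device_ctx_create(DeviceCtx *dev, const EngineInfo *info)
{
    dev->render_ctx = NULL;
    dev->vebox_ctx = NULL;

    dev->render_ctx = calloc(1, 128);
    if (dev->render_ctx == NULL)
        return -1;

    if (info->has_vebox) {
        dev->vebox_ctx = calloc(1, 128);
        if (dev->vebox_ctx == NULL) {
            free(dev->render_ctx);      /* undo the earlier step once */
            dev->render_ctx = NULL;
            return -1;
        }
    }
    return 0;
}

/* Free whatever was created and clear the pointers. */
static void device_ctx_destroy(DeviceCtx *dev)
{
    free(dev->vebox_ctx);
    free(dev->render_ctx);
    dev->vebox_ctx = NULL;
    dev->render_ctx = NULL;
}
```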
