Issue livepeer_bench from current

OK, so I pushed <a href="https://github.com/livepeer/lpms/commit/5d47719ef96bcec5a8696

To clarify the issue a bit further: <a href="https://github.co

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Transcoding performance issue on specialized hardware about lpms HOT 12 CLOSED

cyberj0g commented on July 29, 2024

Transcoding performance issue on specialized hardware

from lpms.

Comments (12)

MikeIndiaAlpha commented on July 29, 2024 1

Some progress. Change described above triggers the inefficiency, but if the hw_type is not propagated into encoder.c (for example by replacing encoder.c:241 octx->hw_type = ictx->hw_type; by octx->hw_type = 0;), then everything is fine again. I am now investigating what kind of change hw_type causes in encoder.c

from lpms.

MikeIndiaAlpha commented on July 29, 2024 1

OK, so I pushed the commit that sits on top of the branch and seems to work. Also performance seems indeed a bit better than in previous perf commit.

from lpms.

cyberj0g commented on July 29, 2024

With fuzzy error gone now, the performance issue is compressed down to a single (revert) commit. Prior to this commit, livepeer_bench performance is degraded, but code closely matches master branch. Most likely, it is related to re-opening the encoder for each segment in transcoder.c:180:

if (!h->initialized || ((AV_HWDEVICE_TYPE_NONE == octx->hw_type || AV_HWDEVICE_TYPE_MEDIACODEC == octx->hw_type )&& !ictx->transmuxing)) {
  printf("open_output.... h->initialized %d\n", h->initialized);
  ret = open_output(octx, ictx);
  if (ret < 0) LPMS_ERR(transcode_cleanup, "Unable to open output");
  if (ictx->transmuxing) {
    octx->oc->flags |= AVFMT_FLAG_FLUSH_PACKETS;
    octx->oc->flush_packets = 1;
  }
  continue;
}

Despite hw_type-dependent logic in multiple places of encoder.c, transcoder.c and decoder.c, the actual value of this flag is never populated from params structure in open_video_decoder, except for CUDA mode (decoder.c:256):

    if (params->hw_type == AV_HWDEVICE_TYPE_CUDA) {
      // First set the hw device then set the hw frame
      ret = av_hwdevice_ctx_create(&ctx->hw_device_ctx, params->hw_type, params->device, NULL, 0);
      if (ret < 0) LPMS_ERR(open_decoder_err, "Unable to open hardware context for decoding")
-->      ctx->hw_type = params->hw_type;
      vc->hw_device_ctx = av_buffer_ref(ctx->hw_device_ctx);
      vc->get_format = get_hw_pixfmt;
    }

That's why no useful information on how AV_HWDEVICE_TYPE_MEDIACODEC (which is associated with Netint hardware) supposed to be handled can be extracted from the code, and one has to follow trial and error path, when enabling the propagation. Furthermore, there are other changes in 'high-performance' version, which are applied both in software and Netint mode. Handling Netint as software codec on master branch causes 'No streams to mux' error which needs to be investigated separately.

Next steps

Current version on jai/netint branch is compatible with master go-livepeer. To merge Netint support into master, following sub-tasks need to be completed:

Enable propagation of hw_type to input_ctx and output_ctx for all values
Test livepeer_bench with CUDA and software codec to make sure there are no issues or performance regression
Replicate logic from good commit for Netint codec, keeping in mind that all AV_HWDEVICE_TYPE_MEDIACODEC checks in the commit are meaningless because parameter value does not propagate to input_ctx and output_ctx
Test software, CUDA, Netint modes noting the performance

from lpms.

yondonfu commented on July 29, 2024

Overall I think these next steps make sense, but I have a few questions:

Enable propagation of hw_type to input_ctx and output_ctx for all values

Do we still need to make sure there are checks on hw_type to ensure that the only allowable values correspond to hardware that LPMS supports? i.e. AV_HWDEVICE_TYPE_MEDIACODEC (for NETINT), AV_HWDEVICE_TYPE_CUDA (for Nvidia)

Replicate logic from good commit for Netint codec, keeping in mind that all AV_HWDEVICE_TYPE_MEDIACODEC checks in the commit are meaningless because parameter value does not propagate to input_ctx and output_ctx

Do the AV_HWDEVICE_TYPE_MEDIACODEC checks go away once we enable propgation of hw_type to input_ctx and output_ctx?

Also is it possible for LPMS to handle all hardware acceleration in the same way right now (i.e. using flushing, keeping decoder + encoder sessions open, etc.) so that Nvidia + NETINT use the same code paths? Or is there additional NETINT specific logic that we would need to have which would require checking if hw_type is AV_HW_DEVICE_TYPE_MEDIACODEC?

from lpms.

cyberj0g commented on July 29, 2024

Do the AV_HWDEVICE_TYPE_MEDIACODEC checks go away once we enable propgation of hw_type to input_ctx and output_ctx?
Also is it possible for LPMS to handle all hardware acceleration in the same way right now

Unlikely, there seem to be a subtle difference between how CUDA and NETINT should be handled, especially in the latest version, that's what still needs to be investigated.

from lpms.

cyberj0g commented on July 29, 2024

To clarify the issue a bit further:

good commit 'works', which means livepeer_bench reports Real-Time Duration Ratio (RTDR) less than 1 for 46 concurrent sessions
bad commit doesn't work => RTDR > 1 @ 46 sessions
we actually need all changes from bad commit + performance issue fixed to further work on merging the NETINT support into master
please use tip of the branch for fixes, there are few other changes on top of bad=>good commit, which doesn't cause performance issue.

Most details of that as well as very likely cause are above.
@MikeIndiaAlpha happy to answer any questions and provide access to the server with hardware attached once you are ready.

from lpms.

MikeIndiaAlpha commented on July 29, 2024

For the record: I am working on this issue. I did some quick checks, including testing the Ivan's idea that hw_type is not set correctly. This proved not to be the case, in transcoder.c I can see correct value (11) being set in both input and output contexts. I can also see that hw_type is set for these context unconditionally, at least in the "bad" commit (febc41, "Keep encoder and decoder open..."). So unless I misunderstood what Ivan wrote, issue is somewhere else. I'll try to analyse the changes more systematically now.

from lpms.

cyberj0g commented on July 29, 2024

@MikeIndiaAlpha good insights, but I think there's misunderstanding here, because I'm sure hw_type does not propagate correctly from input_params struct into input_ctx in 'good' commit (the tip of the branch). It propagates only for CUDA. The merge of this branch with master was attempted before performance was fixed back by reverting to old version of these files (took some efforts to isolate the changes), so things are in reverse on jai/netint branch.

from lpms.

MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Oh, it is my bad then. What I am doing is moving between good and bad commits and checking the differences, in order to isolate the performance issue. Obviously, if you meant the tip of the branch that doesn't apply at all. Still, I have some progress, and have some guesses now. Will check them soon.

from lpms.

cyberj0g commented on July 29, 2024

@MikeIndiaAlpha the issue is clearly isolated in specific commit mentioned above, I just suggested using the tip of the branch for changes, so we won't need to rebase the trivial commits which are after performance issue fixing commit.

from lpms.

MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Yep, I am aware of that. My commit (once I have it) will go on top of all other changes.
But right now I am still digging to find out what the exact issue might be. I now think I understand what you have in mind wrt hw_type. The misunderstanding before was because I was looking at the "bad" commit (I know that it is bad, just was trying to reason which of the removed parts of it may trigger inefficiency), and what you have in mind is "good" commit.

I've made one more observation: restoring just one line (ctx->hw_type = params->hw_type; decoder.c:282) in a "good" commit triggers inefficiency back. Now I am hoping that I can carefully follow the changes in the flow of the program triggered by this change, and maybe find the root cause.

from lpms.

MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Looked at this today - since #318 is merged and closed I guess that this can be closed, too?

from lpms.

Transcoding performance issue on specialized hardware about lpms HOT 12 CLOSED

Comments (12)

Next steps

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent