Giter Club home page Giter Club logo

Comments (12)

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024 1

Some progress. Change described above triggers the inefficiency, but if the hw_type is not propagated into encoder.c (for example by replacing encoder.c:241 octx->hw_type = ictx->hw_type; by octx->hw_type = 0;), then everything is fine again. I am now investigating what kind of change hw_type causes in encoder.c

from lpms.

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024 1

OK, so I pushed the commit that sits on top of the branch and seems to work. Also performance seems indeed a bit better than in previous perf commit.

from lpms.

cyberj0g avatar cyberj0g commented on July 29, 2024

With fuzzy error gone now, the performance issue is compressed down to a single (revert) commit. Prior to this commit, livepeer_bench performance is degraded, but code closely matches master branch. Most likely, it is related to re-opening the encoder for each segment in transcoder.c:180:

if (!h->initialized || ((AV_HWDEVICE_TYPE_NONE == octx->hw_type || AV_HWDEVICE_TYPE_MEDIACODEC == octx->hw_type )&& !ictx->transmuxing)) {
  printf("open_output.... h->initialized %d\n", h->initialized);
  ret = open_output(octx, ictx);
  if (ret < 0) LPMS_ERR(transcode_cleanup, "Unable to open output");
  if (ictx->transmuxing) {
    octx->oc->flags |= AVFMT_FLAG_FLUSH_PACKETS;
    octx->oc->flush_packets = 1;
  }
  continue;
}

Despite hw_type-dependent logic in multiple places of encoder.c, transcoder.c and decoder.c, the actual value of this flag is never populated from params structure in open_video_decoder, except for CUDA mode (decoder.c:256):

    if (params->hw_type == AV_HWDEVICE_TYPE_CUDA) {
      // First set the hw device then set the hw frame
      ret = av_hwdevice_ctx_create(&ctx->hw_device_ctx, params->hw_type, params->device, NULL, 0);
      if (ret < 0) LPMS_ERR(open_decoder_err, "Unable to open hardware context for decoding")
-->      ctx->hw_type = params->hw_type;
      vc->hw_device_ctx = av_buffer_ref(ctx->hw_device_ctx);
      vc->get_format = get_hw_pixfmt;
    }

That's why no useful information on how AV_HWDEVICE_TYPE_MEDIACODEC (which is associated with Netint hardware) supposed to be handled can be extracted from the code, and one has to follow trial and error path, when enabling the propagation. Furthermore, there are other changes in 'high-performance' version, which are applied both in software and Netint mode. Handling Netint as software codec on master branch causes 'No streams to mux' error which needs to be investigated separately.

Next steps

Current version on jai/netint branch is compatible with master go-livepeer. To merge Netint support into master, following sub-tasks need to be completed:

  1. Enable propagation of hw_type to input_ctx and output_ctx for all values
  2. Test livepeer_bench with CUDA and software codec to make sure there are no issues or performance regression
  3. Replicate logic from good commit for Netint codec, keeping in mind that all AV_HWDEVICE_TYPE_MEDIACODEC checks in the commit are meaningless because parameter value does not propagate to input_ctx and output_ctx
  4. Test software, CUDA, Netint modes noting the performance

from lpms.

yondonfu avatar yondonfu commented on July 29, 2024

Overall I think these next steps make sense, but I have a few questions:

Enable propagation of hw_type to input_ctx and output_ctx for all values

Do we still need to make sure there are checks on hw_type to ensure that the only allowable values correspond to hardware that LPMS supports? i.e. AV_HWDEVICE_TYPE_MEDIACODEC (for NETINT), AV_HWDEVICE_TYPE_CUDA (for Nvidia)

Replicate logic from good commit for Netint codec, keeping in mind that all AV_HWDEVICE_TYPE_MEDIACODEC checks in the commit are meaningless because parameter value does not propagate to input_ctx and output_ctx

Do the AV_HWDEVICE_TYPE_MEDIACODEC checks go away once we enable propgation of hw_type to input_ctx and output_ctx?

Also is it possible for LPMS to handle all hardware acceleration in the same way right now (i.e. using flushing, keeping decoder + encoder sessions open, etc.) so that Nvidia + NETINT use the same code paths? Or is there additional NETINT specific logic that we would need to have which would require checking if hw_type is AV_HW_DEVICE_TYPE_MEDIACODEC?

from lpms.

cyberj0g avatar cyberj0g commented on July 29, 2024

Do the AV_HWDEVICE_TYPE_MEDIACODEC checks go away once we enable propgation of hw_type to input_ctx and output_ctx?
Also is it possible for LPMS to handle all hardware acceleration in the same way right now

Unlikely, there seem to be a subtle difference between how CUDA and NETINT should be handled, especially in the latest version, that's what still needs to be investigated.

from lpms.

cyberj0g avatar cyberj0g commented on July 29, 2024

To clarify the issue a bit further:

  • good commit 'works', which means livepeer_bench reports Real-Time Duration Ratio (RTDR) less than 1 for 46 concurrent sessions
  • bad commit doesn't work => RTDR > 1 @ 46 sessions
  • we actually need all changes from bad commit + performance issue fixed to further work on merging the NETINT support into master
  • please use tip of the branch for fixes, there are few other changes on top of bad=>good commit, which doesn't cause performance issue.

Most details of that as well as very likely cause are above.
@MikeIndiaAlpha happy to answer any questions and provide access to the server with hardware attached once you are ready.

from lpms.

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024

For the record: I am working on this issue. I did some quick checks, including testing the Ivan's idea that hw_type is not set correctly. This proved not to be the case, in transcoder.c I can see correct value (11) being set in both input and output contexts. I can also see that hw_type is set for these context unconditionally, at least in the "bad" commit (febc41, "Keep encoder and decoder open..."). So unless I misunderstood what Ivan wrote, issue is somewhere else. I'll try to analyse the changes more systematically now.

from lpms.

cyberj0g avatar cyberj0g commented on July 29, 2024

@MikeIndiaAlpha good insights, but I think there's misunderstanding here, because I'm sure hw_type does not propagate correctly from input_params struct into input_ctx in 'good' commit (the tip of the branch). It propagates only for CUDA. The merge of this branch with master was attempted before performance was fixed back by reverting to old version of these files (took some efforts to isolate the changes), so things are in reverse on jai/netint branch.

from lpms.

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Oh, it is my bad then. What I am doing is moving between good and bad commits and checking the differences, in order to isolate the performance issue. Obviously, if you meant the tip of the branch that doesn't apply at all. Still, I have some progress, and have some guesses now. Will check them soon.

from lpms.

cyberj0g avatar cyberj0g commented on July 29, 2024

@MikeIndiaAlpha the issue is clearly isolated in specific commit mentioned above, I just suggested using the tip of the branch for changes, so we won't need to rebase the trivial commits which are after performance issue fixing commit.

from lpms.

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Yep, I am aware of that. My commit (once I have it) will go on top of all other changes.
But right now I am still digging to find out what the exact issue might be. I now think I understand what you have in mind wrt hw_type. The misunderstanding before was because I was looking at the "bad" commit (I know that it is bad, just was trying to reason which of the removed parts of it may trigger inefficiency), and what you have in mind is "good" commit.

I've made one more observation: restoring just one line (ctx->hw_type = params->hw_type; decoder.c:282) in a "good" commit triggers inefficiency back. Now I am hoping that I can carefully follow the changes in the flow of the program triggered by this change, and maybe find the root cause.

from lpms.

MikeIndiaAlpha avatar MikeIndiaAlpha commented on July 29, 2024

@cyberj0g Looked at this today - since #318 is merged and closed I guess that this can be closed, too?

from lpms.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.