Comments (12)
Some progress. Change described above triggers the inefficiency, but if the hw_type
is not propagated into encoder.c (for example by replacing encoder.c:241 octx->hw_type = ictx->hw_type;
by octx->hw_type = 0;
), then everything is fine again. I am now investigating what kind of change hw_type causes in encoder.c
from lpms.
OK, so I pushed the commit that sits on top of the branch and seems to work. Also performance seems indeed a bit better than in previous perf commit.
from lpms.
With fuzzy error gone now, the performance issue is compressed down to a single (revert) commit. Prior to this commit, livepeer_bench
performance is degraded, but code closely matches master
branch. Most likely, it is related to re-opening the encoder for each segment in transcoder.c:180
:
if (!h->initialized || ((AV_HWDEVICE_TYPE_NONE == octx->hw_type || AV_HWDEVICE_TYPE_MEDIACODEC == octx->hw_type )&& !ictx->transmuxing)) {
printf("open_output.... h->initialized %d\n", h->initialized);
ret = open_output(octx, ictx);
if (ret < 0) LPMS_ERR(transcode_cleanup, "Unable to open output");
if (ictx->transmuxing) {
octx->oc->flags |= AVFMT_FLAG_FLUSH_PACKETS;
octx->oc->flush_packets = 1;
}
continue;
}
Despite hw_type
-dependent logic in multiple places of encoder.c
, transcoder.c
and decoder.c
, the actual value of this flag is never populated from params
structure in open_video_decoder
, except for CUDA mode (decoder.c:256):
if (params->hw_type == AV_HWDEVICE_TYPE_CUDA) {
// First set the hw device then set the hw frame
ret = av_hwdevice_ctx_create(&ctx->hw_device_ctx, params->hw_type, params->device, NULL, 0);
if (ret < 0) LPMS_ERR(open_decoder_err, "Unable to open hardware context for decoding")
--> ctx->hw_type = params->hw_type;
vc->hw_device_ctx = av_buffer_ref(ctx->hw_device_ctx);
vc->get_format = get_hw_pixfmt;
}
That's why no useful information on how AV_HWDEVICE_TYPE_MEDIACODEC
(which is associated with Netint hardware) supposed to be handled can be extracted from the code, and one has to follow trial and error path, when enabling the propagation. Furthermore, there are other changes in 'high-performance' version, which are applied both in software and Netint mode. Handling Netint as software codec on master branch causes 'No streams to mux' error which needs to be investigated separately.
Next steps
Current version on jai/netint
branch is compatible with master go-livepeer
. To merge Netint support into master, following sub-tasks need to be completed:
- Enable propagation of
hw_type
toinput_ctx
andoutput_ctx
for all values - Test
livepeer_bench
with CUDA and software codec to make sure there are no issues or performance regression - Replicate logic from good commit for Netint codec, keeping in mind that all
AV_HWDEVICE_TYPE_MEDIACODEC
checks in the commit are meaningless because parameter value does not propagate toinput_ctx
andoutput_ctx
- Test software, CUDA, Netint modes noting the performance
from lpms.
Overall I think these next steps make sense, but I have a few questions:
Enable propagation of hw_type to input_ctx and output_ctx for all values
Do we still need to make sure there are checks on hw_type
to ensure that the only allowable values correspond to hardware that LPMS supports? i.e. AV_HWDEVICE_TYPE_MEDIACODEC
(for NETINT), AV_HWDEVICE_TYPE_CUDA
(for Nvidia)
Replicate logic from good commit for Netint codec, keeping in mind that all AV_HWDEVICE_TYPE_MEDIACODEC checks in the commit are meaningless because parameter value does not propagate to input_ctx and output_ctx
Do the AV_HWDEVICE_TYPE_MEDIACODEC
checks go away once we enable propgation of hw_type
to input_ctx
and output_ctx
?
Also is it possible for LPMS to handle all hardware acceleration in the same way right now (i.e. using flushing, keeping decoder + encoder sessions open, etc.) so that Nvidia + NETINT use the same code paths? Or is there additional NETINT specific logic that we would need to have which would require checking if hw_type
is AV_HW_DEVICE_TYPE_MEDIACODEC
?
from lpms.
Do the AV_HWDEVICE_TYPE_MEDIACODEC checks go away once we enable propgation of hw_type to input_ctx and output_ctx?
Also is it possible for LPMS to handle all hardware acceleration in the same way right now
Unlikely, there seem to be a subtle difference between how CUDA and NETINT should be handled, especially in the latest version, that's what still needs to be investigated.
from lpms.
To clarify the issue a bit further:
- good commit 'works', which means
livepeer_bench
reports Real-Time Duration Ratio (RTDR) less than 1 for 46 concurrent sessions - bad commit doesn't work => RTDR > 1 @ 46 sessions
- we actually need all changes from bad commit + performance issue fixed to further work on merging the NETINT support into master
- please use tip of the branch for fixes, there are few other changes on top of bad=>good commit, which doesn't cause performance issue.
Most details of that as well as very likely cause are above.
@MikeIndiaAlpha happy to answer any questions and provide access to the server with hardware attached once you are ready.
from lpms.
For the record: I am working on this issue. I did some quick checks, including testing the Ivan's idea that hw_type
is not set correctly. This proved not to be the case, in transcoder.c I can see correct value (11) being set in both input and output contexts. I can also see that hw_type is set for these context unconditionally, at least in the "bad" commit (febc41, "Keep encoder and decoder open..."). So unless I misunderstood what Ivan wrote, issue is somewhere else. I'll try to analyse the changes more systematically now.
from lpms.
@MikeIndiaAlpha good insights, but I think there's misunderstanding here, because I'm sure hw_type
does not propagate correctly from input_params
struct into input_ctx
in 'good' commit (the tip of the branch). It propagates only for CUDA. The merge of this branch with master was attempted before performance was fixed back by reverting to old version of these files (took some efforts to isolate the changes), so things are in reverse on jai/netint
branch.
from lpms.
@cyberj0g Oh, it is my bad then. What I am doing is moving between good and bad commits and checking the differences, in order to isolate the performance issue. Obviously, if you meant the tip of the branch that doesn't apply at all. Still, I have some progress, and have some guesses now. Will check them soon.
from lpms.
@MikeIndiaAlpha the issue is clearly isolated in specific commit mentioned above, I just suggested using the tip of the branch for changes, so we won't need to rebase the trivial commits which are after performance issue fixing commit.
from lpms.
@cyberj0g Yep, I am aware of that. My commit (once I have it) will go on top of all other changes.
But right now I am still digging to find out what the exact issue might be. I now think I understand what you have in mind wrt hw_type. The misunderstanding before was because I was looking at the "bad" commit (I know that it is bad, just was trying to reason which of the removed parts of it may trigger inefficiency), and what you have in mind is "good" commit.
I've made one more observation: restoring just one line (ctx->hw_type = params->hw_type;
decoder.c:282) in a "good" commit triggers inefficiency back. Now I am hoping that I can carefully follow the changes in the flow of the program triggered by this change, and maybe find the root cause.
from lpms.
@cyberj0g Looked at this today - since #318 is merged and closed I guess that this can be closed, too?
from lpms.
Related Issues (20)
- TestTranscoder_DetectionFreq failing in CI HOT 4
- Single IDR frame segment as input HOT 19
- Undefined constant or missing '(' in 'high' error when transcoding certain segments using NETINT HOT 12
- Same number of frames should be in transcoded output HOT 2
- Failing test - potential DNN filter issue HOT 1
- TestTranscoder_FormatOptions fails due to .flv muxer not being available HOT 1
- TestTranscoder_StreamCopyAndDrop fails because of .md5 "muxer" not being available. HOT 1
- There are some audio missed segments during transcoding HOT 3
- B gets some broken mpeg7 hash files when comparison of two hashes HOT 2
- Document HW session re-initialization behavior when switching from no audio to audio segment HOT 1
- P3 - Make sure LPMS works without video HOT 4
- Scaling bugs when using HW encoder HOT 4
- `GetCodecInfo()` returns resolution of 0x0 HOT 2
- Only mark CUDA_ERROR_ILLEGAL_ADDRESS errors as unrecoverable errors HOT 3
- P2 - Add docs for running transcode test using failed segment bucket
- P2 - Clear all segments that can be transcoded from the livepeer-production-failed-transcodes bucket HOT 1
- Add rotation step after decoder and before outputs video filtergrapth HOT 3
- LPMS_USE_NEW_TRANSCODE places sentinel frames in output HOT 1
- Transcoder produces corrupt frames HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from lpms.