
Comments (14)

MushMal commented on July 24, 2024

@masda70 there are a number of issues that I would like to cover.

  • 0x32000002 coming from the Producer SDK indicates that your media pipeline (encoder) has produced a frame with a decoding or presentation timestamp which is less than the fragment start timestamp. This is rather odd and I haven't seen this before. It is possibly an issue with the encoder. I suggest updating the Pi firmware and rebooting. Please let us know if this persists and how often you get it. It's also possible that the RPi gets CPU loads which block the media pipeline/encoder and possibly cause a clock drift - not sure.
  • The log snippet you've attached indicates an INVALID_MKV_DATA error ACK returned from the backend. This could happen if the SDK buffer gets overrun and the tail frames which are partially sent get dropped. You need to ensure you have good connectivity. Can you capture the metrics output from the logs before these errors? It can tell us what the buffer depth is when we get the dropped frames.
  • The curl error could happen when the C++ SDK doesn't "detect" a dropped connection after the error. We have done a lot of fixes in the latest SDK updates. Please update to the latest and let us know if it persists.

from amazon-kinesis-video-streams-producer-sdk-cpp.

runtheops commented on July 24, 2024

Hi guys,

Facing somewhat similar issues here on the latest SDK.
I'm using the demo GStreamer RTSP producer for testing, with multiple instances of it each streaming into a corresponding Kinesis Video stream.
One of the main requirements for the whole installation is resilience. That is, I'm generally OK with the producer process occasionally dying if it can't keep pushing streams properly: that at least can be trivially detected, the process can be restarted, and whoever it may concern notified.
What I'm facing, though, is that in the long run the producer falls into an endless loop of
viewItemRemoved(): Reporting a dropped frame/fragment ...

One of the latest cases of producers getting stuck is directly related to a temporary network outage. There's a bunch of errors like
ERROR - curl perform failed for url https://kinesisvideo.eu-west-1.amazonaws.com/getDataEndpoint with result Timeout was reached: Resolving timed out after 5517 milliseconds
preceding that endless loop. The thing is, though, the network came back up within a few minutes, but the producers never recovered until they were manually killed and restarted.
The following is a CloudWatch PutMedia.IncomingBytes graph for 3 streams in this particular case.
[image: streams_dead]

I do also occasionally get the INVALID_MKV_DATA error, but that one appears to be detected, and the producer tries to restart the stream.

And though in this case, with the network outage, all of the streams died simultaneously, I had a few occasions where one or a couple of them stopped working randomly with the same warnings while the others survived. I will try to catch that moment as well.

MushMal commented on July 24, 2024

@runtheops - very cool post indeed! Thanks. Here are some comments that hopefully will shed some light.

  • The applications are just sample applications - you can extend or reuse the code in your application. They are not handling higher-level recovery logic.
  • You can have a single producer object but multiple streams - this way you can optimize for the storage as it's shared across the streams.
  • You can configure your buffer duration to store longer duration - based on your scenario and the available storage.
  • There is a latency pressure callback/notification that gets triggered before the frames are dropped. Ideally, your application reacts to the latency pressure (configurable) to avoid dropped frames. Dropped frames will produce an INVALID_MKV_DATA error, which is recoverable, but you risk losing older frames. Say you have a buffer duration of 3 minutes. You can configure your max latency to be 2 minutes, so you will get a notification when the buffered duration reaches 2 minutes but before the frames are dropped. This way you can alarm, drop frames at the source, drop resolution/frame rate to reduce the density of the stream, drop other streams to allow higher priority streams to go through, etc.
  • There is also staleness notification - this happens when the backend application layer ACKs are not received for a given time - this is an indication that the stream is progressing but the backend is not receiving the frames. Could possibly indicate an edge or the LB dropping the packets silently.
  • We will be publishing a new API shortly to "pulse" the stream - instead of tearing down the stream and re-creating it, it will introduce a reset which will still allow a new connection to be acquired and the existing buffer to be streamed.
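
To make the 3-minute / 2-minute example above concrete, here is a minimal sketch of the threshold logic in C++. It does not use the SDK at all: `BufferPolicy`, `BufferAction` and `classify` are hypothetical names made up for illustration, and in a real application the detection would come from the SDK's latency pressure and dropped frame callbacks rather than from your own comparison.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical names for illustration only; none of this is Producer SDK API.
enum class BufferAction {
    None,                    // buffered duration is below the max latency
    ReactToLatencyPressure,  // past max latency: alarm, reduce frame rate, etc.
    FramesAtRisk             // buffer is full: older frames are about to be dropped
};

struct BufferPolicy {
    std::chrono::seconds buffer_duration;  // total buffer capacity, e.g. 3 minutes
    std::chrono::seconds max_latency;      // pressure threshold, e.g. 2 minutes
};

// Decide what the application should do for a given buffered duration.
BufferAction classify(const BufferPolicy& policy, std::chrono::seconds buffered) {
    // The pressure threshold only helps if it fires before the buffer is full.
    assert(policy.max_latency < policy.buffer_duration);
    if (buffered >= policy.buffer_duration) {
        return BufferAction::FramesAtRisk;
    }
    if (buffered >= policy.max_latency) {
        return BufferAction::ReactToLatencyPressure;
    }
    return BufferAction::None;
}
```

With a 180 s buffer and a 120 s max latency, anything between 120 s and 180 s of buffered content is the window in which the application can still act before frames are lost.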

The producer stream has a state machine that knows which state to go to on which error/condition and knows the retry count with progressive back-offs. In the case of a multi-minute timeout, the likely case is that the state machine goes back to either the DescribeStream state or the GetDataEndpoint state, where it times out the MAX retry number of times and gives up. This is the case when the producer simply "fails hard" and returns an error. The error will be bubbled up through various channels - most likely in your case it will be a putFrame API call failure. The reason for the error being bubbled up through putFrame is that the producer is "lifeless" - it doesn't have any threads and utilizes the caller's threads to push the work. There are two sources of "lifeness" - the media pipeline which calls putFrame and the networking threads calling in with the network IO results.

The sample applications do not handle the putFrame errors - they are just samples. They also don't do much when callbacks like the dropped frame callback get fired - they simply print out the info and do nothing.
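
As a rough illustration of "retry with progressive back-offs, then fail hard", here is a small standalone C++ sketch. It is not the SDK's state machine: `retryWithBackoff` and `RetryResult` are hypothetical names, doubling the delay is an assumption (the actual back-off schedule is not stated here), and the sleep is injected as a function so the logic can be exercised without real waiting.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper types for illustration; not part of the Producer SDK.
struct RetryResult {
    bool succeeded;                                 // false means "fail hard"
    int attempts;                                   // how many times the call ran
    std::vector<std::chrono::milliseconds> delays;  // back-offs that were applied
};

// Run `call` up to `max_attempts` times, waiting longer after each failure.
// The delay doubles each time (an assumed schedule). `sleep_fn` performs the wait.
RetryResult retryWithBackoff(
        const std::function<bool()>& call,
        int max_attempts,
        std::chrono::milliseconds base_delay,
        const std::function<void(std::chrono::milliseconds)>& sleep_fn) {
    assert(max_attempts > 0);
    RetryResult result{false, 0, {}};
    std::chrono::milliseconds delay = base_delay;
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        result.attempts = attempt;
        if (call()) {
            result.succeeded = true;
            return result;
        }
        if (attempt < max_attempts) {
            // Back off before the next attempt; no wait after the final failure.
            sleep_fn(delay);
            result.delays.push_back(delay);
            delay *= 2;
        }
    }
    // All attempts failed: the caller has to surface the error and decide how to recover.
    return result;
}
```

The point for an application is the last line: once the retries are exhausted, nothing retries on its own any more, so whoever receives the failure has to restart the stream or the process.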

We would love to hear back from you and others on the fail/recovery cases and see how much we can optimize the behavior.

runtheops commented on July 24, 2024

@MushMal thank you for the quick and meaningful reply, this definitely helps a lot. I mean like, it probably isn't even possible to do a better job explaining the SDK behavior than you did.

The following is just my perception and general thoughts on the samples, given my experience with them so far. The samples are built on top of GStreamer (great idea, btw), which is itself capable of handling so many use cases with trivial modifications of the pipeline, and for which there are numerous examples and so many answered questions. So it is relatively easy for a general audience to pick up a sample and evolve it into something meaningful from the GStreamer perspective. But not so much from the Kinesis SDK perspective, since even basic error handling (like the case with the network outage) requires digging into SDK behavior, state transitions and error propagation paths.
That said, it would be really valuable for me to have solid (from a common error handling perspective) Kinesis-specific sample(s). Parsing, decoding/encoding, muxing/demuxing and otherwise manipulating a/v streams is more or less general knowledge, while Kinesis Video is more of a niche, and this is where samples are priceless.

MushMal commented on July 24, 2024

@runtheops we hear you. In fact, we will continue working on both the samples and the SDK to make it even easier to integrate. The philosophy we take is that the higher layers should be extremely simple to use even without too much investment in them. The core layer is super flexible and can be used in small-footprint device integration scenarios - like firmware or lesser-known platforms.

As you noticed, it's extremely powerful to be able to plug into an established media pipeline like gstreamer. We also have a sample application that integrates with Android's media pipeline. In both of these cases we don't currently do a great job in demonstrating how to recover from hard-failures, etc... We will work on it.

masda70 commented on July 24, 2024

@MushMal: I've upgraded to the latest RPi firmware and the latest version of this repository. My main issue (the infinite "dropped frame" problem) hasn't been fixed. For the other issues, I can't tell.
However, @runtheops has put me on the right track, the "dropped frame" problem I encounter is due to network instability. It's easy to reproduce by taking the network down for a couple of minutes. The buffer then starts growing but before it blows, curl throws a timeout error, followed by:

DEBUG - Connection for Kinesis Video stream: raspberrypi closed.
INFO - Network thread for Kinesis Video stream: raspberrypi with upload handle: 0 exited. http status: 0
WARN - Stream for raspberrypi has exited without triggering end-of-stream. Service call result: 599

kinesisVideoStreamTerminated(): Stream terminated event.
DEBUG - describeStreamHandler invoked

Once the network is back, the application indefinitely throws "Dropped frame!".

Thank you @runtheops! I'm interested in a definite solution to this problem. In the meantime, if you know of any workaround, please let me know. Perhaps a simple script that checks every 30 seconds or so whether the application has gone down and then restarts it? Are any of the above log lines sufficient to say the application is broken?

MushMal commented on July 24, 2024

@masda70 these are real-time low-latency solutions and as such your real application should respond accordingly to the pressures. In your case, the streaming is at or below the capacity of the observed bandwidth (if I remember correctly; if you paste some of the metrics output from the logs I can double check). You should try to reduce the density of the stream, for example by lowering the frame rate.

The SDK provides the callbacks/notifications for various latency/pressures and it's up to the application to respond to them before the frames are dropped.

The sample application currently doesn't do much - it just prints out some information. We will enhance this application to do something better to illustrate how a real commercial application would respond to these situations. For example, it could pause the media pipeline, or reduce the frame rate or resolution at the source.

Shumakriss commented on July 24, 2024

@runtheops @masda70 Are either of you able to view your stream either from the AWS console or from the parser library examples? I am seeing a similar dropped frames issue with Raspberry Pi (not a Zero) and I am trying to determine if we have the same issue. Thanks!

runtheops commented on July 24, 2024

@masda70, I don't think it makes sense to invest time in a workaround. The only right way to do this is the one @MushMal pointed out - implement proper error handling on top of what the SDK provides. The authors will eventually implement it in the samples, but their approach might not necessarily work correctly for your particular use case; there are just too many of them. If (better say once) we come up with something internally, though, I'll definitely share it.

@Shumakriss, yes, can see em all. I had a few occasions where I couldn't see the stream, but that was cos it never really made it to Kinesis, for various reasons. If you share some more details, logs specifically - we'll figure it out.

Shumakriss commented on July 24, 2024

Thanks @runtheops, I actually do not see anything. My Java parser crashes almost immediately and after some time, my producer starts dropping all of its frames. I think I am dealing with a separate issue: why the bit of data I send causes the parser to crash and why it starts dropping frames so early. However, once I figure that out, if I am still dropping frames I will look to this thread. It's been helpful to see how things are handled!

MushMal commented on July 24, 2024

@masda70 , we will try to publish something as soon as possible. In the interim, you can play with it by pulsing the stream, stopping, tearing down and re-creating it in case of dropped frames for example.

@Shumakriss can I ask you to open a separate issue in the consumer parser library project so it can be tracked properly?

Please close this issue if it's resolved and open separate ones so we can focus on them one by one.

masda70 commented on July 24, 2024

Small update: I tried modifying the sample app code to trigger a restart of the stream after a network timeout happens. However, my understanding of the code was insufficient to properly write such logic. When such an error happens, the streamer retries 5 times and then enters a state that I do not know how to recover from. Even after changing the maximum number of retries to infinite, the streamer is unable to continue. Because of this, I implemented a workaround. My current code triggers a g_main_loop_quit when a stream error happens, and the app is run by a script which restarts the application if it exits in this way. I confess it is not a pretty solution, but in my current application long network outages are infrequent enough that I don't need to recover the missing data frames (which could cover over a minute). For reference, here is the modified SampleStreamCallbackProvider code:

class SampleStreamCallbackProvider : public StreamCallbackProvider {
public:
    SampleStreamCallbackProvider(CustomData* data);
[...]
    StreamErrorReportFunc getStreamErrorReportCallback() override {
        // Workaround: quit the GLib main loop so the wrapper script can restart the app.
        if (data->main_loop == nullptr) {
            LOG_AND_THROW("Failed to create Kinesis Video Stream.");
        }
        g_main_loop_quit(data->main_loop);
        return streamErrorReportHandler;
    }

[...]

private:
    CustomData* data;

[...]
};

[...]

SampleStreamCallbackProvider::SampleStreamCallbackProvider(CustomData* data) : data(data) {}

The callback provider is now created by passing the CustomData object.

    unique_ptr<StreamCallbackProvider> stream_callback_provider = make_unique<SampleStreamCallbackProvider>(data);

MushMal commented on July 24, 2024

@masda70 sorry for the delay. I don't have all the details about main_loop but here are a few observations/comments.

  • As the core of the SDK is "live-less" - aka doesn't have its own threads - it can bubble up errors through the calling path. It can be the putFrame call during run-time or one of the events going in. Checking the errors and acting accordingly is a good idea. For example, you could check the result of the putFrame call.
  • The error callback will be fired on stream error - sometimes a dropped frame can cause an INVALID_MKV_DATA error.
  • There is a stream latency pressure, stream staleness and dropped frame callbacks. You can handle these callbacks - for example, on stream pressure you can print and get statistics and perhaps "pulse" the stream by calling resetConnection() API to re-start the current connection without touching the buffer - this can help if the connection is somehow stale.
  • You can handle the dropped frame callback too, this is the case that the reset connection didn't help and you might want to entirely drop the stream and re-create without affecting other streams (if your producer streams multiple streams). In this case, for example, you could stop the stream, then, you need to free the stream after which you can re-create the stream and re-start streaming.
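
The escalation described in the last two bullets (first "pulse" the connection, and only if that doesn't help, stop, free and re-create the stream) could be sketched roughly like this. This is a standalone illustration, not SDK code: `StreamControl`, `RecoveryCoordinator` and `RecordingStream` are hypothetical names, the method names other than resetConnection are made up, and threading is ignored entirely. A real application would call into its actual stream object from the corresponding callbacks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical interface standing in for the application's handle on one stream.
struct StreamControl {
    virtual ~StreamControl() = default;
    virtual void resetConnection() = 0;  // "pulse": new connection, buffer untouched
    virtual void stopStream() = 0;
    virtual void freeStream() = 0;
    virtual void recreateStream() = 0;
};

// Escalating recovery for a single stream; other streams are not affected.
class RecoveryCoordinator {
public:
    explicit RecoveryCoordinator(StreamControl* stream) : stream_(stream) {
        assert(stream_ != nullptr);
    }

    // Latency pressure: try the cheap fix first.
    void onLatencyPressure() {
        stream_->resetConnection();
        reset_attempted_ = true;
    }

    // Dropped frame: if a reset was already tried, rebuild the stream.
    void onDroppedFrame() {
        if (!reset_attempted_) {
            onLatencyPressure();
            return;
        }
        stream_->stopStream();
        stream_->freeStream();
        stream_->recreateStream();
        reset_attempted_ = false;
    }

private:
    StreamControl* stream_;
    bool reset_attempted_ = false;
};

// Test double that records the order of the calls it receives.
struct RecordingStream : StreamControl {
    std::vector<std::string> calls;
    void resetConnection() override { calls.push_back("resetConnection"); }
    void stopStream() override { calls.push_back("stopStream"); }
    void freeStream() override { calls.push_back("freeStream"); }
    void recreateStream() override { calls.push_back("recreateStream"); }
};
```

The design choice here is simply that the expensive path is only taken after the cheap one has been tried once, which matches the order suggested in the bullets above.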

You could, however, as in your case, drop the entire application and restart it too - it's just too harsh in my opinion.

MushMal commented on July 24, 2024

Closing for now. Please re-open if the issue still stands
