Giter Club home page Giter Club logo

Comments (8)

tommedema avatar tommedema commented on July 20, 2024

The retry class as a whole is very questionable / naive. For example:
Screen Shot 2021-03-13 at 7 25 24 PM

"Just toss it"? No, this is a case that should never occur-- it is clearly worth raising an exception for. It's a little scary to use something like this in production since this means that companies are going to drop production events without knowing it. There's no way to catch these cases because you're simply swallowing them.

At the current stage I'd advocate putting a big beta flag on this SDK and warn against using it in production

from amplitude-node.

kelvin-lu avatar kelvin-lu commented on July 20, 2024

Thanks @tommedema for the feedback. Sorry about the friction you've experienced using the SDK, we'll be taking it to make improvements.

As for RetryClass, it's exported as part of @amplitude/types but it definitely can be exported as well, as well as the default transport. We can create a BaseRetry class that has utilities to make this easier as well and you can extend it.

No, this is a case that should never occur-- it is clearly worth raising an exception for.

The goals of the lines around error 400 for events send against the HTTP v2 API is to find and remove the events that caused the API to raise an invalid payload response, like this utility that's used to prune out events. This case is for an invalid payload of length 1 - in which case it would not make sense to retry afterwards as it means this event would not ever be successfully uploaded.

Is await client.logEvent() and await client.flush() not returning Response objects with error codes on them? We could also throw them as exceptions, but the errors are not entirely gone since the same response being passed to onEventsError is also returned in sendEventsWithRetry

from amplitude-node.

kelvin-lu avatar kelvin-lu commented on July 20, 2024

Additionally, it seems like the retry logic is currently ignoring the most common kind of throttling.

Right, we don't read from throttledUsers or throttledDevices - what were your plans around using these two? If the user/device is not in the exceededDaily... fields, we retry the events on loop after seconds pause, since the throttle window is a second.

from amplitude-node.

tommedema avatar tommedema commented on July 20, 2024

The goals of the lines around error 400 for events send against the HTTP v2 API is to find and remove the events that caused the API to raise an invalid payload response, like this utility that's used to prune out events. This case is for an invalid payload of length 1 - in which case it would not make sense to retry afterwards as it means this event would not ever be successfully uploaded.

This doesn't address my point. You're talking about an invalid state, yet you're not throwing any exceptions. This requires basic error handling rather than simply swallowing these errors. I don't disagree that you can't retry them, but you shouldn't swallow the errors either. This should throw an exception so the consumer can handle them.

from amplitude-node.

tommedema avatar tommedema commented on July 20, 2024

Is await client.logEvent() and await client.flush() not returning Response objects with error codes on them? We could also throw them as exceptions, but the errors are not entirely gone since the same response being passed to onEventsError is also returned in sendEventsWithRetry

Perhaps I'm not explaining myself correctly. The fact that you return a response doesn't help if that response is not a good representation of what actually happened. For example if I log an event without a userId, this is clearly an application error and I should know about this, yet you simply remove this event in _pruneEvents and then return a success response.

I.e. you say it's successful yet clearly it was not because you just lost an event.

from amplitude-node.

tommedema avatar tommedema commented on July 20, 2024

Right, we don't read from throttledUsers or throttledDevices - what were your plans around using these two? If the user/device is not in the exceededDaily... fields, we retry the events on loop after seconds pause, since the throttle window is a second.

In the docs it says you should retry after 30 seconds in these cases:

You should pause sending events for that user / device for a period of 30 seconds before retrying and continue retrying until you no longer receive a 429 response.

from amplitude-node.

kelvin-lu avatar kelvin-lu commented on July 20, 2024

Perhaps I'm not explaining myself correctly. The fact that you return a response doesn't help if that response is not a good representation of what actually happened. For example if I log an event without a userId, this is clearly an application error and I should know about this, yet you simply remove this event in _pruneEvents and then return a success response.

The client itself should return a SKIPPED response in newer versions, but I agree that this is dangerous without that layer. We will take this out ASAP from _pruneEvents because it can make it very confusing to debug

from amplitude-node.

tommedema avatar tommedema commented on July 20, 2024

Perhaps I'm not explaining myself correctly. The fact that you return a response doesn't help if that response is not a good representation of what actually happened. For example if I log an event without a userId, this is clearly an application error and I should know about this, yet you simply remove this event in _pruneEvents and then return a success response.

The client itself should return a SKIPPED response in newer versions, but I agree that this is dangerous without that layer. We will take this out ASAP from _pruneEvents because it can make it very confusing to debug

Sure, do note that you're doing this in many other places too. For example here, here, here, here, here, here, etc.

We ended up replacing the entire class with something that doesn't retry but at least bubbles up what is going wrong:

https://gist.github.com/tommedema/04de4004ee5b0f17fcb3fd29b2ad92b1

This then allows the consumer to actually retry or throw an application error:

https://gist.github.com/tommedema/0f80ae4c7334e6531c56c598643f0eb7

from amplitude-node.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.