Giter Club home page Giter Club logo

Comments (12)

TMRh20 avatar TMRh20 commented on May 27, 2024

From my experience, the symptoms you identify usually point to a power supply issue. The RPi specifications list 50ma max current draw from the 3.3v pins, so if you are using a module (ie: LNA + PA) that draws more, it is likely you just don't have enough current for the radio module to operate, so it resets.

As far as the infinite looping, I look at it this way:

  1. Actual hardware failures will be very few and far between. After many years of using many types of modules, I have yet to see one actually fail.
  2. In your specific case, if the radio has reset due to drawing too much current, the worst thing to do is keep resetting the radio and attempting to get it to function with obvious hardware issues. This is why many computer systems force users to shut down when a hardware failure is encountered. It is usually best to stop everything and fix the hardware issue.

In an extremely mission critical or high availability situation, I can see how having a timeout period would allow error checking and reporting of hardware failures, but again most users should never encounter the issue, and with an Arduino/RPi, I see it potentially causing more harm than good.

from rf24.

sven337 avatar sven337 commented on May 27, 2024

On 2014-05-15 16:34, TMRh20 wrote:

From my experience, the symptoms you identify usually point to a power
supply issue. The RPi specifications list 50ma max current draw from
the 3.3v pins, so if you are using a module (ie: LNA + PA) that draws
more, it is likely you just don't have enough current for the radio
module to operate, so it resets.

I don't have an external amplifier on this module so I doubt the current
limits of the Pi are exceeded. I'll double check. Thanks for the
suggestion!

As far as the infinite looping, I look at it this way:

  1. Actual hardware failures will be very few and far between. After
    many years of using many types of modules, I have yet to see one
    actually fail.

Well, I'm apparently seeing one :)

This
is why many computer systems force users to shut down when a hardware
failure is encountered. It is usually best to stop everything and fix
the hardware issue.

This is not an accurate statement. As long as the hardware can be made
to work, most device drivers will do their best to, and if they can't,
they'll report an error. Infinite loop or crashes are never expected
behavior.

In an extremely mission critical or high availability situation, I can
see how having a timeout period would allow error checking and
reporting of hardware failures, but again most users should never
encounter the issue, and with an Arduino/RPi, I see it potentially
causing more harm than good.

I just encountered it. Error reporting is important, you can't have a
lib that goes into an infinite loop. Imagine I'm running a Pi a RF24
module and your lib, no network, and no keyboard. If a stray finger or
whatever disconnects the module, your lib will enter an infinite loop,
and there will be no way to fix that short of electrically resetting the
Pi.
Your changes to the original lib claim better performance and better
reliability. Don't enter infinite loops and you'll have reliability :)

Thanks

from rf24.

TMRh20 avatar TMRh20 commented on May 27, 2024

Well, I'm apparently seeing one :)

You keep saying you have encountered a hardware failure. I would argue that the cause has not been shown to be anything more than user error in wiring or configuration of hardware, which should not be addressed by software.

This is not an accurate statement. As long as the hardware can be made
to work, most device drivers will do their best to, and if they can't,
they'll report an error. Infinite loop or crashes are never expected
behavior.

I guess you have never seen the error message " "Windows has been shut down to prevent damage to your computer."

The point is that it depends on the type of error and potential outcomes. My statement was meant to explain that the current functionality is designed to protect your hardware. With the timeout added, the radio could constantly be reset and keep overloading the circuits for hours on end, potentially causing additional hardware issues or actual failures.

I just encountered it. Error reporting is important, you can't have a
lib that goes into an infinite loop. Imagine I'm running a Pi a RF24
module and your lib, no network, and no keyboard. If a stray finger or
whatever disconnects the module, your lib will enter an infinite loop,
and there will be no way to fix that short of electrically resetting the
Pi.

If you disconnect a module, you have to reconnect it and restart the program. That is all, regardless of whether there is a timeout or not, the fix is exactly the same.

Your changes to the original lib claim better performance and better
reliability. Don't enter infinite loops and you'll have reliability :)

Definition of reliability:

  1. Capable of being relied on; dependable: a reliable assistant; a reliable car.
  2. Yielding the same or compatible results in different clinical experiments or statistical trials.

Reliability engineering, the ability of a system or component to perform its required functions under stated conditions for a specified period of time.

The stated conditions of operation do not include hardware issues. Fix the hardware.

from rf24.

sven337 avatar sven337 commented on May 27, 2024

On 2014-05-15 17:54, TMRh20 wrote:

Well, I'm apparently seeing one :)

You keep saying you have encountered a hardware failure.

I encountered a situation where your driver gets into an infinite loop,
when it is not required to do that, and can easily be fixed not to
behave like that. That's what is relevant.

This is not an accurate statement. As long as the hardware can be made
to work, most device drivers will do their best to, and if they can't,
they'll report an error. Infinite loop or crashes are never expected
behavior.

I guess you have never seen the error message " "Windows has been shut
down to prevent damage to your computer."

I have, and it was my job to investigate those cases and make it so the
driver doesn't trigger that the next time. And I'm telling you: drivers
attempt to avoid that situation. Your statement was simply not true,
computer systems do not fail as long as they can work.

The point is that it depends on the type of error and potential
outcomes. My statement was meant to explain that the current
functionality is designed to protect your hardware. With the timeout
added, the radio could constantly be reset and keep overloading the
circuits for hours on end, potentially causing additional hardware
issues or actual failures.

I don't buy that justification at all. With a timeout, you can detect
the error and actually report that there is a problem. Resetting the
radio is just one example of something that can be done (and that would
be perfectly valid if you're running from say a supercap and a solar
panel), other possibilities that make more sense are to send a message,
light a LED, ... Without a timeout, you just can't detect that there's a
problem.

If you disconnect a module, you have to reconnect it and restart the
program. That is all, regardless of whether there is a timeout or not,
the fix is exactly the same.

That is not true. With a timeout, you could query the radio after the
timeout occurs and notice that it's not there (reading all 1s) or has
resetted, and you can then handle the error situation without human
intervention. Without a timeout, your program is stuck, and the final
solution has reliability issues. I'm not sure why you're quoting a
dictionary?

The stated conditions of operation do not include hardware issues. Fix
the hardware.

I don't question that the hardware needs to be fixed, still, I'd very
much rather the driver allowed me to detect hardware issue. A timeout is
a simple thing to add, you've removed it in your patch with a comment
that doesn't really justify doing so. I'm not sure why you so vehemently
oppose adding a timeout, which is very standard behavior in all drivers
and libraries on the planet.

You're doing the right thing by not clearing the interrupt flags in the
wrong place, so I figure you must know what you're doing. How come,
then, that I end up having to argue for a one liner that prevents
infinite loops as happened to one of your users? Timeouts are a standard
reliability feature.

from rf24.

TMRh20 avatar TMRh20 commented on May 27, 2024

Your arguments have so many flaws, contradictions and holes in them, this is going nowhere, so I'll keep it simple.

Demonstrate how these issues can be detected, resolved, and the user notified without any human intervention, and I will include a timeout.

from rf24.

sven337 avatar sven337 commented on May 27, 2024

I've argued enough about it. I've seen my share of stupid arguments in open source but you're probably arguing one of the least defendable changes I've ever seen as a driver developer. I can't work with you when you're of bad faith.
You're contributing to the sorry state of the RF24 environment, which is already pitiful. That's too bad, because you've done otherwise good work.

Anyhow, failure detection is trivial:

  • attempt write
  • if attempt failed, check MAX_RT
  • if MAX_RT not reached, potential hardware issue to investigate
  • check if radio channel has reset to 0x02 instead of whatever was configured, if yes, we had a hardware reset (you can double check with other registers, it's the easiest thing on the planet)
  • check if various registers all read as 0xFF, if yes, we had a hardware disconnect
  • in case of hardware issue, notify error by toggling a GPIO PIN or sending a message over Wifi or whatever

Resolving the problem in the "reset" case is just about re-running radio.begin(). It will magically recover (until the next issue) - unless the lib prevents it for no good reason.

from rf24.

TMRh20 avatar TMRh20 commented on May 27, 2024

Do you really think calling my arguments "stupid" and calling all the work I and fellow contributors have done "pitiful" is the proper way to handle the situation?

You could simply wait for others to weigh in. I'm sure that if I am that far off base, many other users will share your opinion.

from rf24.

henrikekblad avatar henrikekblad commented on May 27, 2024

Woah... some hard words here! Calm it down.

Sven, couldn't you create a Pull Request with your suggestions?

from rf24.

mannkind avatar mannkind commented on May 27, 2024

Woah -- With such a trivial change, I'm sure you could implement it (without issues, bugs, and remaining backwards compatible) sven.

from rf24.

zephyrr avatar zephyrr commented on May 27, 2024

I hear two philosophies of error handling at odds here:

TMRh20: "My statement was meant to explain that the current functionality is designed to protect your hardware. With the timeout added, the radio could constantly be reset and keep overloading the circuits for hours on end, potentially causing additional hardware issues or actual failures."

Sven337: "With a timeout, you could query the radio after the timeout occurs and notice that it's not there (reading all 1s) or has resetted, and you can then handle the error situation without human
intervention"

Both views may make sense in some use cases. Can we reach a compromise which allows the user to chose the approach that makes the most sense to their situation?

How about an OPTIONAL timeout which attempts to put the radio in a safe state (eg: power down) and sets a disable flag that inhibits restarting until manually cleared. That should protect the hardware from accidental repeated resets, one of TMRh20's concerns. The library caller can however clear the disable flag and restart the radio if that makes sense for their use case, that is they have the option to consciously over-ride the default protective approach TMRh20 advocates.

Perhaps even conditionally compile this option.

I'd rather that than have the code base forked just to add timeouts for some users. This repository has the makings of one nRF24L01+ library for most purposes. ATMega Arduinos, ATtiny, Due, Teensy and RPi support, fast and flexible - and still improving.

Up to you, TMRh20, of course.

from rf24.

TMRh20 avatar TMRh20 commented on May 27, 2024

Since we are talking philosophy here, mine is that this is open-source and I am no dictator, I try to be more of quality control and developement.
Even if I am completely opposed to a change, I will approve it if there is general support for it OR if I can be convinced of its necessity. I will even code it myself if there is enough user support for it. Please remember that I do not profit from any of this, (although one contributor was awesome enough to send me a nice little NRF24 PCB!) the goal is just to make things better in the RF24 world. I didn't say no, but I still don't know what the best solution is.

That being said, as always, you do make a pretty convincing argument.

from rf24.

TMRh20 avatar TMRh20 commented on May 27, 2024

Ok, I've finally made some changes to the code that I hope will suit all users and potential scenarios. I'm still not sure this is the best solution, so feel free to make suggestions. Functionality and reasoning are as follows:

  1. I added failure handling/timeouts as optional for a couple of reasons. Again, not 100% sure this is best, but I think it is, mainly for simplicity and just to keep code size down.
  2. Enable by un-commenting #define FAILURE_HANDLING in RF24_config.h
  3. When a failure is occurring the following actions will now be taken:
    a: Writes will return 0
    b: Set radio.failureDetected = 1
    c: If debug is enabled, error message "FAIL_DETECT" will be printed
  4. To recover from an error:
if(radio.failureDetected == 1){ 
   radio.failureDetected = 0;
   printf("FAILURE_DETECTED\n\r");
   radio.begin();    
   radio.openWritingPipe(pipes[0]);
   radio.openReadingPipe(1,pipes[1]);
}

5, Any other previously configured settings should be re-configured

Notes:
a) This may not catch all errors. If a reset occurs very quickly, or between sends, configuration will be lost, but the radio may not timeout.
b) Due to a), the failureDetected boolean is of limited value. It may be best for users to poll a known configuration value as well, to detect all errors and timeouts.

This issue will be left open for a while in case further changes are needed.

from rf24.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.