
Comments (8)

ginoledesma avatar ginoledesma commented on July 29, 2024

Thanks for letting me know. Out of curiosity, how many panels do you have? I’ve seen similar behavior from someone who has 30+ panels.

Also, does it resolve on its own (for example, through some automatic recovery after running out of memory), or does it require manual intervention like a reboot? I imagine reducing the scrape frequency would just delay this.

Officially, SunPower has stopped “supporting” this API access, so I wouldn’t be surprised if they’re gonna disable it in a future build.

I’m looking at alternate ways to capture the data — whether tapping into the micro inverter stream itself or the connection back upstream.

from sunpower-pvs-exporter.

aqua avatar aqua commented on July 29, 2024

16 panels. A relationship between inverter count and the memory load occurred to me too, especially because the DeviceList and Stop commands both tend to take ~4-6 seconds under normal conditions. I'm holding out a faint hope that the bug arises from fragile coding in the PVS6 service interface around session start/stop, and that a more generous timeout may avoid triggering the leak. Probably not, but if larger systems are more prone to it, it seems a possibility. I agree that if not, adjusting the polling interval is just going to extend the time to failure.

The PVS6 does not appear to recover on its own if the exporter is disabled, though I think the longest I've let it sit that way is about a day; there's either no watchdog in the unit or it's a very tolerant one.

With Prometheus on a 60s collection interval, time to failure after a PVS6 reset seems to be about 14 hours. The shortest my data shows is 10h. If a longer collection interval linearly extends the TTF, as I suspect, I'll toy with the timeouts a bit also.
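A rough back-of-the-envelope sketch of the linear hypothesis above: if the failure is driven by the number of polls rather than by elapsed time, the observed TTF at a 60s interval implies a fixed poll budget, and a longer interval stretches the TTF proportionally. The function names are made up for illustration, and the linear model is only the suspicion stated above, not a measured result.

```python
def polls_until_failure(ttf_hours: float, interval_s: float) -> float:
    """Number of scrapes that fit into the observed time to failure."""
    return ttf_hours * 3600 / interval_s


def projected_ttf_hours(polls: float, interval_s: float) -> float:
    """TTF if the same number of scrapes triggers the failure (linear assumption)."""
    return polls * interval_s / 3600


if __name__ == "__main__":
    for observed_hours in (10, 14):  # shortest and typical TTF at a 60s interval
        budget = polls_until_failure(observed_hours, 60)
        print(observed_hours, budget, projected_ttf_hours(budget, 300))
```

If that assumption holds, a 5m interval would push the failure out to somewhere between 50 and 70 hours.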


jeeftor avatar jeeftor commented on July 29, 2024

Is there a reason we need to call start and stop ... doesn't the DeviceList call work w/out them? (not sure if that's related at all)


ginoledesma avatar ginoledesma commented on July 29, 2024

Probably not. I think the start and stop calls are to indicate a locking-like behavior for configuring the PVS.


jeeftor avatar jeeftor commented on July 29, 2024

That's my understanding as well


aqua avatar aqua commented on July 29, 2024

I let the unit run a couple days with polling reduced to 5m intervals and --timeout=240. After getting gradually slower over the next 18h it seemed to stop getting worse, with call latencies up in the tens of seconds and loadavg up around 80 (status LED went orange), and it ran that way for about another 24h, then abruptly cleaned up and slowly degraded again as before.

With slow polls and very generous deadlines it's actually more or less producing useful data, albeit at low resolution and with a lot of dropouts (about once an hour GridProfileGet will fail and the scan_time metric jumps to many minutes, perhaps when the PVS6 does its push to Sunpower's service).

I'm giving it another shot with the start/stop calls disabled, and if that doesn't make things better I might also try without the Get_Comm calls (those seem to be pretty consistently the slowest calls, at around 50s once in the bad state, so perhaps more likely to trip on a memory management or driver bug).

[Screenshot attached: 2022-01-26, 12:15:09 AM]


aqua avatar aqua commented on July 29, 2024

A couple more mostly-unhelpful observations:

  • Eliminating the start/stop calls doesn't prevent the issue (but it doesn't cause any problems either)
  • Eliminating the Get_Comm call doesn't fix it (but does make collection more stable since the Get_Devices call can usually finish in ~20s or so even when the load is high)
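The experiments above amount to toggling which calls go into each scrape. A minimal sketch of that idea, with several assumptions: the `?Command=` query format, the `Start` spelling, the call order, and the `pvs.example` host are guesses for illustration and are not taken from the exporter's source.

```python
from typing import List
from urllib.parse import urlencode

BASE = "http://pvs.example/cgi-bin/dl_cgi"  # hypothetical host


def plan_commands(use_session: bool = True, use_comm: bool = True,
                  use_grid_profile: bool = True) -> List[str]:
    """Return the commands for one scrape, honoring the per-call toggles."""
    commands = ["DeviceList"]  # always collected
    if use_comm:
        commands.append("Get_Comm")
    if use_grid_profile:
        commands.append("GridProfileGet")
    if use_session:
        # Wrap the scrape in the session start/stop calls (assumed names).
        commands = ["Start", *commands, "Stop"]
    return commands


def command_url(command: str, base: str = BASE) -> str:
    """Build the request URL for one command (query format is an assumption)."""
    return f"{base}?{urlencode({'Command': command})}"


if __name__ == "__main__":
    for cmd in plan_commands(use_session=False, use_comm=False):
        print(command_url(cmd))
```

Keeping the toggles in one place makes it easy to rule calls out one at a time while the rest of the scrape stays the same.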

I'm giving it a whirl with GridProfileGet disabled as well, and will let that run a while to see whether the periodic self-recovery recurs or was a fluke.


aqua avatar aqua commented on July 29, 2024

I never did find a viable workaround (disabling GridProfileGet doesn't help). While digging through installer manuals, though, I noticed that the only documented reason for a steady orange status LED (which would usually surface once the loadavg got up there) is a failure to update the device. That got me thinking, and since I forced the PVS6 to update itself to 2022.4, build 60630, exporter scan times and loadavg have been low and stable for ~24h, which never happened before. So possibly the act of forcing an upgrade stopped the issue from triggering, or possibly Sunpower fixed the underlying bug sometime in the last several months. I don't think it's just a failure to update, since the status LED never went orange when I wasn't polling for monitoring data, but conceivably it's a buggy interaction between the two.

For the record, the way to force an upgrade is first to do a GET request to /cgi-bin/dl_cgi/firmware/new_version, which returns a short JSON blob like so:

{
	"url":	"https://fw-assets-pvs6-dev.dev-edp.sunpower.com/release-13.3/60630/fwup/fwup.lua"
}

... and then to do a POST to /cgi-bin/dl_cgi/firmware/upgrade?url=THAT_URL. The actual update process looks like it A/B swaps the root and kernel partitions, and it takes about five minutes.
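The two steps above could be scripted roughly as follows. This is an untested sketch: the two paths come from the description above, but the base address, the empty POST body, and passing the URL through unencoded are assumptions, and the helper names are made up for illustration.

```python
import json
import urllib.request

NEW_VERSION_PATH = "/cgi-bin/dl_cgi/firmware/new_version"
UPGRADE_PATH = "/cgi-bin/dl_cgi/firmware/upgrade"


def parse_firmware_url(body: str) -> str:
    """Extract the "url" field from the new_version JSON blob."""
    return json.loads(body)["url"]


def build_upgrade_url(base: str, firmware_url: str) -> str:
    """Build the upgrade request URL.

    The firmware URL is passed through verbatim, as in the description
    above; whether the PVS6 accepts a percent-encoded value is unknown.
    """
    return f"{base}{UPGRADE_PATH}?url={firmware_url}"


def force_upgrade(base: str, timeout: float = 30.0) -> bytes:
    """GET the new firmware URL, then POST it back to start the upgrade."""
    with urllib.request.urlopen(base + NEW_VERSION_PATH, timeout=timeout) as resp:
        firmware_url = parse_firmware_url(resp.read().decode("utf-8"))
    request = urllib.request.Request(
        build_upgrade_url(base, firmware_url),
        data=b"",  # assumption: an empty body is sufficient
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read()
```

Calling `force_upgrade("http://pvs.example")` (hypothetical address) would perform both requests; the two pure helpers can be checked without touching the device.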

