Comments (3)
On a 1.0.3 system, via the jumphost:
support@oxz_switch:/tmp$ ./humility --archive gimlet_sp-gimlet-e-1.0.3.tar.gz -i fe80::aa40:25ff:fe04:5c1%gimlet10 dump --task=sprot --initialize-dump-agent
humility: connecting to fe80::aa40:25ff:fe04:5c1%gimlet10
humility: using UDP dump agent
humility dump failed: invalid response: Err(DumpAreaInUse)
from hubris.
Seeing the same thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', /home/matt/humility/humility-dump-agent/src/lib.rs:536:35 error on Rack 3. Attempting to dump a single task (as Alan did above) also fails. Single task dumps do work, though, by running humility -a <archive> --ip <sled> dump --initialize-dump-agent on its own first, followed by the task request.
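For reference, the panic text quoted above is what Rust prints when code indexes element 0 of an empty slice or Vec. The following is only a minimal sketch of the failure mode and of a gentler alternative; it is not the code at lib.rs:536, and the name first_area and its error string are hypothetical.

```rust
/// Hypothetical helper: return the first entry of a list of dump areas,
/// or a descriptive error when the list is empty. Writing `areas[0]`
/// instead would panic with "index out of bounds: the len is 0 but the
/// index is 0" on an empty slice.
fn first_area(areas: &[u32]) -> Result<u32, String> {
    areas
        .first()
        .copied()
        .ok_or_else(|| "dump agent reported no dump areas".to_string())
}
```

With this shape, an empty reply turns into an error the caller can report instead of a panic.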
As with many things, there is a lot going on here.
First, I don't think this is what is happening on Rack 3, but the attest task in the RoT is panic looping:
$ humility tasks
humility: attached to 1fc9:0143:PVQ3SGS11OJT3 via CMSIS-DAP
system time = 36470
ID TASK          GEN   PRI STATE
0  jefe          0     0   recv, notif: fault timer(T+30)
1  hiffy         0     6   ready
2  idle          0     7   ready
3  update_server 0     3   recv
4  syscon_driver 0     2   recv
5  gpio_driver   0     3   recv
6  sprot         0     6   ready
7  swd           0     4   recv
8  dumper        0     5   recv
9  sp_measure    0     6   not started
10 attest        14621 5   RUNNING
This is due to a stack overflow:
$ humility jefe -H attest
humility: attached to 1fc9:0143:PVQ3SGS11OJT3 via CMSIS-DAP
humility: successfully changed disposition for attest
$ humility tasks attest
humility: attached to 1fc9:0143:PVQ3SGS11OJT3 via CMSIS-DAP
system time = 205098
ID TASK   GEN   PRI STATE
10 attest 79417 5   FAULT: stack overflow; sp=0x20008b18 (was: ready)
Making the stack size much larger (19304 below) and then running humility stackmargin reveals that the current size (9304) falls well short of the maximum depth (9984):
$ humility stackmargin
humility: attached to 1fc9:0143:PVQ3SGS11OJT3 via CMSIS-DAP
ID TASK          STACKBASE  STACKSIZE MAXDEPTH MARGIN
0  jefe          0x20006680 1536      296      1240
1  hiffy         0x20008dc0 2048      616      1432
2  idle          0x20005c80 256       256      0
3  update_server 0x20007700 2560      1664     896
4  syscon_driver 0x20005d80 1024      192      832
5  gpio_driver   0x20006200 1024      256      768
6  sprot         0x2000b680 16384     6704     9680
7  swd           0x20005000 1000      328      672
8  dumper        0x20008180 2600      1072     1528
9  sp_measure    0x20006ce0 2048      104      1944
10 attest        0x20010420 19304     9984     9320
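The arithmetic behind the attest row can be checked with a short sketch. The helper names margin and shortfall are hypothetical and not part of Humility; the inputs are the STACKSIZE and MAXDEPTH values from the table plus the original stack size of 9304.

```rust
/// MARGIN as shown in the table: STACKSIZE minus MAXDEPTH.
/// Returns None when the observed depth does not fit in the stack at all.
fn margin(stack_size: u32, max_depth: u32) -> Option<u32> {
    stack_size.checked_sub(max_depth)
}

/// How far short a stack is of an observed maximum depth (0 if it fits).
fn shortfall(stack_size: u32, max_depth: u32) -> u32 {
    max_depth.saturating_sub(stack_size)
}
```

With the enlarged stack, margin(19304, 9984) matches the 9320 in the table. With the original size, margin(9304, 9984) is None and shortfall(9304, 9984) is 680, which is consistent with the stack overflow fault.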
Alleviating this problem and then taking the dump reveals the reason for the failure:
$ humility ringbuf dumper
humility: attached to 1fc9:0143:PVQ3SGS11OJT3 via CMSIS-DAP
humility: ring buffer task_dumper::__RINGBUF in dumper:
NDX LINE GEN COUNT PAYLOAD
0   125  60  1     Writing(0x30020fc0, 0x8, 0xea8)
1   99   60  1     ReadingRegister(0x0)
2   125  60  1     Writing(0x30020fc8, 0x8, 0xeb0)
3   99   60  1     ReadingRegister(0x0)
4   125  60  1     Writing(0x30020fd0, 0x8, 0xeb8)
5   99   60  1     ReadingRegister(0x0)
6   125  60  1     Writing(0x30020fd8, 0x8, 0xec0)
7   99   60  1     ReadingRegister(0x0)
8   125  60  1     Writing(0x30020fe0, 0x8, 0xec8)
9   99   60  1     ReadingRegister(0x0)
10  125  60  1     Writing(0x30020fe8, 0x8, 0xed0)
11  99   60  1     ReadingRegister(0x0)
12  125  60  1     Writing(0x30020ff0, 0x8, 0xed8)
13  99   60  1     ReadingRegister(0x0)
14  138  60  1     Done(Err(OutOfRegisterSpace))
15  166  60  1     Resumed
Note that it's attempting to dump register 0 over and over again. This is because the fix for #1532 changed the semantics of the code (which were admittedly a little odd) so that it no longer increments the register value. That's the root problem, but there's another annoyance here: the RoT knew that the dump failed but nonetheless returned success -- which put us on a collision course with the issue in Humility that is the actual panic (which should also be modified to bail a tad more cleanly).
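To make the failure mode concrete, here is a minimal sketch of a register-dumping loop whose index never advances. It is not the dumper task's actual code: Area, dump_registers, and the advance flag are hypothetical stand-ins, and only the OutOfRegisterSpace name is taken from the ring buffer above.

```rust
#[derive(Debug, PartialEq)]
enum DumpError {
    OutOfRegisterSpace,
}

/// Hypothetical stand-in for a dump area with room for `capacity` register records.
struct Area {
    capacity: usize,
    written: Vec<u32>,
}

impl Area {
    fn new(capacity: usize) -> Self {
        Area { capacity, written: Vec::new() }
    }

    /// Record one register index, failing once the area is full.
    fn write(&mut self, reg: u32) -> Result<(), DumpError> {
        if self.written.len() >= self.capacity {
            return Err(DumpError::OutOfRegisterSpace);
        }
        self.written.push(reg);
        Ok(())
    }
}

/// Dump registers 0..nregs. With `advance == false` the index never moves,
/// so register 0 is written repeatedly until the area runs out of space.
fn dump_registers(area: &mut Area, nregs: u32, advance: bool) -> Result<(), DumpError> {
    let mut ndx = 0;
    while ndx < nregs {
        area.write(ndx)?; // the error is returned to the caller, not swallowed
        if advance {
            ndx += 1;
        }
    }
    Ok(())
}
```

Calling dump_registers with advance set to false on an eight-slot area fills it with register 0 and returns Err(OutOfRegisterSpace), mirroring the ring buffer; with advance set to true and four registers it returns Ok(()) having written 0, 1, 2, 3. Because the function returns the Result rather than discarding it, the caller at least has the chance to report the failure.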
So we need to fix the stack size for attest, restore the correct register dumping semantics, propagate an error to denote a dump failure, and fix humility dump to bail a little more cleanly. PRs coming!