
fabric-tna's Issues

Update all scripts to use public Docker image for stratum_bfrt

We currently use the image hosted in the Aether private docker registry. To make it easier for the community to run PTF tests, we should use the version of the image that is available on DockerHub.

If we haven't published stratum_bfrt to Docker Hub yet, we should do that ASAP.

Pipeline drops half of the INT reports

We observed this issue in the production pod; the cause is still unclear.

This is especially evident when monitoring high-bandwidth (~10 Gbps) TCP flows generated by iperf: DeepInsight shows a dropped-report rate that is proportional to, and in most cases the same as, the rate of successfully processed reports:
[Screenshot: DeepInsight dropped vs. processed report rates, 2020-10-05]

DeepInsight uses the seq_no field in the INT report fixed header to detect dropped reports. In an iperf test, the INT reports delivered to the server have missing seq_nos. From this pcap trace, we see reports with the following seq_no values:

07 88 0b a0
07 88 0b a1
07 88 0b a3 # skipped 1
07 88 0b a5 # skipped 1
07 88 0b a7 # skipped 1
07 88 0b a9 # skipped 1
07 88 0b ab # skipped 1
07 88 0b ad # skipped 1
07 88 0b b1 # skipped 3
07 88 0b b3 # skipped 1
07 88 0b b5 # skipped 1
07 88 0b b7 # skipped 1
07 88 0b b9 # skipped 1
07 88 0b bb # skipped 1
07 88 0b bd # skipped 1
07 88 0b bf # skipped 1
07 88 0b c3 # skipped 2
...

We don't believe it's an issue with seq_no computation in Tofino, as the issue cannot be reproduced when generating low-bit-rate traffic. Instead, we believe this is connected to how we use mirroring sessions and/or recirculation ports, and to the fact that the port attached to the DI server is a 10G one. The issue does not manifest when running a similar test on the staging server, where the DI port is 40G. On 03/30/2020 we observed the same issue on the staging pod, which uses 40G interfaces for the collector.
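
For reference, a minimal sketch of how one could count the missing seq_nos from a pcap like the one above (assumptions: reports are UDP-encapsulated, REPORT_UDP_PORT is the collector port used in the capture, and seq_no is the 32-bit field at bytes 4-7 of the telemetry report fixed header):

import struct
from scapy.all import rdpcap, UDP

REPORT_UDP_PORT = 32766  # hypothetical collector port, adjust to the capture

def count_seq_no_gaps(pcap_path):
    prev = None
    gaps = 0
    for pkt in rdpcap(pcap_path):
        if UDP not in pkt or pkt[UDP].dport != REPORT_UDP_PORT:
            continue
        fixed_hdr = bytes(pkt[UDP].payload)
        seq_no = struct.unpack("!I", fixed_hdr[4:8])[0]
        if prev is not None and seq_no != (prev + 1) & 0xFFFFFFFF:
            gaps += (seq_no - prev - 1) & 0xFFFFFFFF
        prev = seq_no
    return gaps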

Direction field of spgw source interface table is optimized out causing read/write asymmetry

When reading from the device, direction is always 0x0. The Stratum log suggests it's one of those cases where the compiler optimizes out action arguments that write to unused PHVs.

19:56:55.347 WARN [P4RuntimeFlowRuleProgrammable] Table entry obtained from device device:leaf1 is different from one in in translation store: device=PiTableEntry{tableId=FabricIngress.spgw_ingress.interface_lookup, matchKey={ipv4_dst_addr=0xc0a8fb01/32, gtpu_is_valid=0x1}, tableAction=FabricIngress.spgw_ingress.set_source_iface(skip_spgw=0x0, src_iface=0x1, direction=0x0), priority=N/A, timeout=PERMANENT}, store=PiTableEntry{tableId=FabricIngress.spgw_ingress.interface_lookup, matchKey={ipv4_dst_addr=0xc0a8fb01/32, gtpu_is_valid=0x1}, tableAction=FabricIngress.spgw_ingress.set_source_iface(skip_spgw=0x0, src_iface=0x1, direction=0x1), priority=N/A, timeout=PERMANENT}
19:56:55.348 WARN [P4RuntimeFlowRuleProgrammable] Table entry obtained from device device:leaf1 is different from one in in translation store: device=PiTableEntry{tableId=FabricIngress.spgw_ingress.interface_lookup, matchKey={ipv4_dst_addr=0xafa0000/16, gtpu_is_valid=0x0}, tableAction=FabricIngress.spgw_ingress.set_source_iface(skip_spgw=0x0, src_iface=0x2, direction=0x0), priority=N/A, timeout=PERMANENT}, store=PiTableEntry{tableId=FabricIngress.spgw_ingress.interface_lookup, matchKey={ipv4_dst_addr=0xafa0000/16, gtpu_is_valid=0x0}, tableAction=FabricIngress.spgw_ingress.set_source_iface(skip_spgw=0x0, src_iface=0x2, direction=0x2), priority=N/A, timeout=PERMANENT}

test.FabricIPv4UnicastGtpTest fails for fabric-spgw profile

************************************************
STARTING PTF TESTS...
************************************************
python -u ptf_runner.py --device stratum-bfrt --port-map port_map.veth.json --ptf-dir fabric.ptf --cpu-port 320 --device-id 1 --grpc-addr "127.0.0.1:28000" --p4info /p4c-out/p4info.txt --tofino-pipeline-tar /p4c-out/pipeline.tar.bz2 test.FabricIPv4UnicastGtpTest
INFO:PTF runner:Sending P4 config
INFO:PTF runner:Executing PTF command: ptf --test-dir fabric.ptf -i 0@veth1 -i 1@veth3 -i 2@veth5 -i 3@veth7 -i 4@veth9 -i 5@veth11 -i 6@veth13 -i 7@veth15 --test-params=p4info='/p4c-out/p4info.txt';grpcaddr='127.0.0.1:28000';device_id='1';cpu_port='320';device='stratum-bfrt' test.FabricIPv4UnicastGtpTest
WARNING: No route found for IPv6 destination :: (no default route?)
test.FabricIPv4UnicastGtpTest ... FAIL

======================================================================
FAIL: test.FabricIPv4UnicastGtpTest
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/fabric-p4test/tests/ptf/base_test.py", line 813, in handle
    return f(*args, **kwargs)
  File "fabric.ptf/test.py", line 115, in runTest
    self.runIPv4UnicastTest(pkt, next_hop_mac=HOST2_MAC)
  File "/fabric-p4test/tests/ptf/fabric_test.py", line 864, in runIPv4UnicastTest
    testutils.verify_packet(self, exp_pkt, self.port2)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2546, in verify_packet
    % (device, port, result.format()))
AssertionError: Expected packet was not received on device 0, port 2.
========== EXPECTED ==========
dst        : DestMACField         = '00:00:00:00:00:02' (None)
src        : SourceMACField       = '00:00:00:00:aa:01' (None)
type       : XShortEnumField      = 2048            (0)
--
version    : BitField             = 4               (4)
ihl        : BitField             = None            (None)
tos        : XByteField           = 0               (0)
len        : ShortField           = None            (None)
id         : ShortField           = 1               (1)
flags      : FlagsField           = 0               (0)
frag       : BitField             = 0               (0)
ttl        : ByteField            = 63              (64)
proto      : ByteEnumField        = 17              (0)
chksum     : XShortField          = None            (None)
src        : Emph                 = '10.0.3.1'      (None)
dst        : Emph                 = '10.0.4.1'      ('127.0.0.1')
options    : PacketListField      = []              ([])
--
sport      : ShortEnumField       = 2152            (53)
dport      : ShortEnumField       = 2152            (53)
len        : ShortField           = None            (None)
chksum     : XShortField          = None            (None)
--
version    : BitField             = 1               (1)
PT         : BitField             = 1               (1)
reserved   : BitField             = 0               (0)
E          : BitField             = 0               (0)
S          : BitField             = 0               (0)
PN         : BitField             = 0               (0)
gtp_type   : ByteField            = 255             (255)
length     : ShortField           = None            (None)
teid       : IntField             = 4009738480      (0)
--
version    : BitField             = 4               (4)
ihl        : BitField             = None            (None)
tos        : XByteField           = 0               (0)
len        : ShortField           = None            (None)
id         : ShortField           = 1               (1)
flags      : FlagsField           = 0               (0)
frag       : BitField             = 0               (0)
ttl        : ByteField            = 64              (64)
proto      : ByteEnumField        = 17              (0)
chksum     : XShortField          = None            (None)
src        : Emph                 = '10.0.1.1'      (None)
dst        : Emph                 = '10.0.2.1'      ('127.0.0.1')
options    : PacketListField      = []              ([])
--
sport      : ShortEnumField       = 5061            (53)
dport      : ShortEnumField       = 5060            (53)
len        : ShortField           = None            (None)
chksum     : XShortField          = None            (None)
--
load       : StrField             = '\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab' ('')
--
0000   00 00 00 00 00 02 00 00  00 00 AA 01 08 00 45 00   ..............E.
0010   00 C0 00 01 00 00 3F 11  60 2B 0A 00 03 01 0A 00   ......?.`+......
0020   04 01 08 68 08 68 00 AC  08 D4 30 FF 00 9C EE FF   ...h.h....0.....
0030   C0 F0 45 00 00 9C 00 01  00 00 40 11 63 4F 0A 00   [email protected]..
0040   01 01 0A 00 02 01 13 C5  13 C4 00 88 D5 68 AB AB   .............h..
0050   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0060   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0070   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0080   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0090   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
00a0   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
00b0   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
00c0   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB         ..............
========== RECEIVED ==========
1 total packets. Displaying most recent 1 packets:
------------------------------
dst        : DestMACField         = '00:00:00:00:00:02' (None)
src        : SourceMACField       = '00:00:00:00:aa:01' (None)
type       : XShortEnumField      = 2048            (0)
--
version    : BitField             = 3L              (4)
ihl        : BitField             = 0L              (None)
tos        : XByteField           = 255             (0)
len        : ShortField           = 156             (None)
id         : ShortField           = 61183           (1)
flags      : FlagsField           = 6L              (0)
frag       : BitField             = 240L            (0)
ttl        : ByteField            = 69              (64)
proto      : ByteEnumField        = 0               (0)
chksum     : XShortField          = 192             (None)
src        : Emph                 = '0.1.0.0'       (None)
dst        : Emph                 = '63.17.96.43'   ('127.0.0.1')
options    : PacketListField      = [<IPOption  copy_flag=0L optclass=control option=experimental_measurement length=0 value='\x03\x01\n\x00\x04\x01\x08h\x08h\x00\xac\x08\xd4\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab' |>, <IPOption_MTU_Probe  copy_flag=1L optclass=1L option=mtu_probe length=171 |>] ([])
--
load       : StrField             = '\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab\xab' ('')
--
0000   00 00 00 00 00 02 00 00  00 00 AA 01 08 00 30 FF   ..............0.
0010   00 9C EE FF C0 F0 45 00  00 C0 00 01 00 00 3F 11   ......E.......?.
0020   60 2B 0A 00 03 01 0A 00  04 01 08 68 08 68 00 AC   `+.........h.h..
0030   08 D4 AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0040   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0050   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0060   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0070   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0080   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
0090   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
00a0   AB AB AB AB AB AB AB AB  AB AB AB AB AB AB AB AB   ................
00b0   AB AB                                              ..
==============================

Incorrect IPv4 total length for SPGW downlink packets

We observed this issue in production with

  • Pipeconf: org.stratumproject.fabric-spgw.stratum_bfrt.mavericks_sde_9_2_0
  • Latest stratum-bfrt

The attached evidence includes:

  • pcap capture at the router side
  • pcap capture at the enb side
  • switch flows and groups

The ping reply coming out of the switch is delivered at the eNB with a wrong IPv4 total length. The reported length is 16 bytes longer than the captured packet:
[Screenshot: packet capture showing the IPv4 total length mismatch, 2020-09-22]

evidence.zip

Add SCTP packet type to PTF tests

In Aether, fabric switches programmed with fabric-tna are supposed to forward SCTP traffic between the eNB and the mobile control plane. For this reason, we should add SCTP to the packet types tested with PTF (currently TCP, UDP, GTP, and ICMP).
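
As a starting point, a minimal sketch of an SCTP test packet built with scapy, in the style of the other packet builders (the function name and default values are placeholders, not existing test helpers):

from scapy.layers.l2 import Ether
from scapy.layers.inet import IP
from scapy.layers.sctp import SCTP, SCTPChunkData

def simple_sctp_packet(eth_src, eth_dst, ip_src, ip_dst,
                       sport=5000, dport=5001, payload="ab" * 20):
    # SCTP common header plus a single DATA chunk carrying the payload
    return (Ether(src=eth_src, dst=eth_dst) /
            IP(src=ip_src, dst=ip_dst) /
            SCTP(sport=sport, dport=dport) /
            SCTPChunkData(data=payload))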

Consider using sub-parsers to simplify handling of INT reports in egress

Today we use egress-to-egress mirroring to generate INT reports. When processing such a mirrored packet, we use the egress parser to remove headers we don't want to show up in the INT report (e.g., GTP-U, MPLS). However, that results in a quite intricate parser implementation with:

  • Parser states branching on the validity of INT headers (check-parse pattern, e.g., check_mpls→strip/parse_mpls→check_ipv4→strip/parse_ipv4...)
  • Many #ifdef enabling/disabling such branching inside the parser states for the different profiles

It looks like we could obtain the same behavior in a much simpler and more elegant way by using sub-parsers:
https://p4.org/p4-spec/docs/P4-16-v1.0.0-spec.html#sec-invoke-subparser

In the egress parser, we could have two sub-parsers: (1) for regular packets and (2) for INT report mirrors. Upon detecting the packet type, we could invoke (1) or (2). (2) would have states that by default skip parsing unwanted headers.

If we care about preserving parser resources for the different profiles, we would have to use #ifdef only to wrap (2) and its invocation.

Note that if/when DeepInsight becomes capable of handling tunneling protocols, we won't need to remove headers at all...

Loopback tests failing on hardware

The switch info table is currently initialized in the PacketIn test's run function. In loopback mode, dataplane packets also arrive as packet-ins; since cpu_port is set only in the PacketIn test, all the other loopback tests fail. The plan is to move the switch info table init to the test setUp(), as sketched below.
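
A minimal sketch of the proposed change (class and helper names are illustrative, not the actual ones in the test code):

class FabricTest(P4RuntimeTest):  # illustrative base class

    def setUp(self):
        super(FabricTest, self).setUp()
        # Program the switch_info table with the CPU port in every test,
        # so that loopback tests relying on packet-ins no longer depend on
        # the PacketIn test having run first.
        self.setup_switch_info(cpu_port=self.cpu_port)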

Missing context.json when running tests with tofino-model

Because of #118, we no longer pass the necessary context.json and other files required by tofino-model to produce logs with symbol names (table names, action names, etc.)

Possible options:

  • In the run script, unpack the binary config file to a temporary directory and pass that to tofino-model (see the sketch after this list)
  • Update run script to mount the content of ./tmp instead of ./src/main/resources/p4c-out inside the tofino-model container (maybe move/rename ./tmp to something more meaningful such as ./p4src/build)
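
A minimal sketch of the first option (the tarball path is illustrative):

import tarfile
import tempfile

def unpack_pipeline(tar_path="./tmp/fabric-tna/pipeline.tar.bz2"):
    # Extract context.json, tofino.bin, etc. to a temp dir that the run
    # script can bind-mount into the tofino-model container.
    out_dir = tempfile.mkdtemp(prefix="tofino-model-")
    with tarfile.open(tar_path, "r:bz2") as tar:
        tar.extractall(out_dir)
    return out_dir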

Remove code related to simple next

The "simple" next table was introduced in fabric-v1model as an alternative to the hashed table for HW targets that did not support action selectors. We do have support for action selectors in TNA so that logic is obsolete and should be removed.

That means removing:

  • P4 code wrapped in #ifdef WITH_SIMPLE_NEXT
  • Java code that references simple-related tables (look for TODO: add profile with simple next or remove references)

Make stratum_bfrt auto-detect tofino-model

Issue with register reset annotations and tofino-model:

  • It takes ~70ms to reset 65k register cells on HW
    • Better to spin up separate thread
  • Add --testing-tofino-model (default false) to:
    • Disable register reset
    • Use longer timeouts for counter syncs
  • OR: Use pal API to get type of device (model or asic)
    • include/tofino/bf_pal/pltfm_intf.h
    • bf_status_t bf_pal_pltfm_type_get(bf_dev_id_t dev_id, bool *is_sw_model);

Fix INT terminology

We use the term "INT" everywhere, but that's not correct. Our current implementation does not support inband telemetry. We do support the standard telemetry report format (v0.5), but we're not appending INT metadata to data packets.

We should replace the current references to int with something else, like dtel (as in data plane telemetry) or tel.

Increase table size

Current sizing is the default for the bmv2 build of the old fabric-v1model, which is too small. We have plenty of memory resources to make our tables bigger without worrying about pipeline optimizations.

Not sure why we abandoned #6

pipeline tarball contains more than what we are allowed to publish

Last time we asked BF they said:

We can allow to publish p4 program + context.json, tofino.bin, and bfrt.json/p4info.txt, no other compilation artifacts.

The .conf file is not mentioned, but we already publish part of it as part of Stratum: tofino_skip_p4.conf

The diff between tofino_skip_p4.conf and fabric-tna.conf is mostly file paths. I can't see any other sensitive information.

However, the tarball currently contains much more, which must be removed (a pruning sketch follows the listing):

.
├── bfrt.json
├── fabric-tna.conf
├── fabric-tna.p4pp
├── manifest.json
├── p4info.txt
└── pipe
    ├── context.json
    ├── fabric-tna.bfa
    ├── fabric-tna.dynhash.json
    ├── fabric-tna.prim.json
    ├── graphs
    │   ├── FabricEgress.dot
    │   ├── FabricEgressDeparser.dot
    │   ├── FabricEgressParser.dot
    │   ├── FabricIngress.dot
    │   ├── FabricIngressDeparser.dot
    │   ├── FabricIngressParser.dot
    │   ├── dep.json
    │   ├── egress.power.dot
    │   ├── ingress.power.dot
    │   ├── placement_graph.dot
    │   ├── power_graph.dot
    │   ├── program_graph.dot
    │   └── table_dep_graph_placement_0.dot
    ├── logs
    │   ├── flexible_packing.log
    │   ├── mau.characterize.log
    │   ├── mau.json
    │   ├── mau.resources.log
    │   ├── metrics.json
    │   ├── pa.characterize.log
    │   ├── pa.results.log
    │   ├── parser.characterize.log
    │   ├── parser.log
    │   ├── phv.json
    │   ├── phv_allocation_0.log
    │   ├── power.json
    │   ├── pragmas.log
    │   ├── resources.json
    │   ├── table_dependency_graph.log
    │   ├── table_placement_1.log
    │   └── table_summary.log
    └── tofino.bin
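
A minimal sketch of pruning the tarball down to the allowed artifacts (the exact allow-list is an assumption based on the quote above):

import os
import tarfile

ALLOWED = ("p4info.txt", "bfrt.json", "context.json", "tofino.bin",
           "fabric-tna.conf")

def prune(src="pipeline.tar.bz2", dst="pipeline-public.tar.bz2"):
    with tarfile.open(src, "r:bz2") as src_tar, \
            tarfile.open(dst, "w:bz2") as dst_tar:
        for member in src_tar.getmembers():
            # Keep only files whose name is on the allow-list, preserving paths
            if member.isfile() and os.path.basename(member.name) in ALLOWED:
                dst_tar.addfile(member, src_tar.extractfile(member))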

Missing metadata in packet-out

stratum-bfrt complains about:

E20200916 20:57:42.347164    40 bfrt_packetio_manager.cc:272] Return Error: DeparsePacketOut(packet, &buf) failed with StratumErrorSpace::ERR_INVALID_PARAM: 'it != packet.metadata().end()' is false. Missing metadata with Id 2 in PacketOut payload: "\034I{\266\036\025T\207\336\255\276\357\010\006\000\001\010\000\006\004\000\002T\207\336\255\276\357\300\250\373\001\034I{\266\036\025\300\250\373\313" metadata { metadata_id: 1 value: "\000\275" }

A fix for this already exists in #40, but we should fix master ASAP if we're not planning to merge #40 soon, as packet-out is broken when using stratum-bfrt.
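
For reference, a minimal sketch of a packet-out that carries both metadata fields defined in p4info (what metadata id 2 maps to, e.g. padding or cpu_loopback_mode, depends on the pipeline and is an assumption here):

from p4.v1 import p4runtime_pb2

def build_packet_out(payload, egress_port_value):
    pkt_out = p4runtime_pb2.PacketOut()
    pkt_out.payload = payload
    md1 = pkt_out.metadata.add()
    md1.metadata_id = 1            # the field present in the log above
    md1.value = egress_port_value
    md2 = pkt_out.metadata.add()
    md2.metadata_id = 2            # the field stratum-bfrt reports as missing
    md2.value = b"\x00"
    return pkt_out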

Update ONOS behaviors to work with chip-independent pipeconf

Even if the P4 program is now chip-independent (#40), we still produce two pipeconf IDs for the different Tofino chip types (montara, mavericks), since some behaviors depend on the pipeconf ID to decide which entries to insert (e.g., for the switch_info table, or INT mirroring sessions).

The plan is to update the behaviors to avoid depending on the pipeconf ID and instead retrieve the chip type via gNMI. This might require support in the ONOS core drivers.

Fix mirror

We now replace the clone session/mirror with the copy_to_cpu flag for flows that copy packets to the CPU.
However, we still need a mirror for INT.
The current implementation of the mirror is not correct: the egress parser should check the mirror flag and parse the additional mirror header.

Cleanup after PTF tests

The TM container will always add veth pairs and set up DMA when running the tests.
We should do some cleanup after the tests, like removing all veth pairs and reverting the DMA settings; a minimal sketch for the veth part follows.
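
A minimal sketch of the veth cleanup (assuming the usual vethN/vethN+1 pairs created by the scripts; deleting one end of a pair also removes its peer; reverting the DMA setting would need a separate step):

import subprocess

def remove_veth_pairs(num_pairs=8):
    for i in range(num_pairs):
        # veth0/veth1, veth2/veth3, ...: deleting the even end removes both
        subprocess.call(["ip", "link", "delete", "veth%d" % (i * 2)])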

MPLS TTL behaviour might be wrong

Currently, we set the MPLS TTL to a default value (64); however, we should copy the TTL from the IP header.
We also need to copy the TTL back to the IP header when we pop the MPLS label.

The original MPLS architecture RFC (RFC 3031) does not give much detail on how to handle the TTL of IP packets:
https://tools.ietf.org/html/rfc3031#section-3.23

But there are some rules in RFC 3032 (MPLS Label Stack Encoding):
https://tools.ietf.org/html/rfc3032#section-2.4.3

2.4.3. IP-dependent rules

   We define the "IP TTL" field to be the value of the IPv4 TTL field,
   or the value of the IPv6 Hop Limit field, whichever is applicable.

   When an IP packet is first labeled, the TTL field of the label stack
   entry MUST BE set to the value of the IP TTL field.  (If the IP TTL
   field needs to be decremented, as part of the IP processing, it is
   assumed that this has already been done.)

   When a label is popped, and the resulting label stack is empty, then
   the value of the IP TTL field SHOULD BE replaced with the outgoing
   TTL value, as defined above.  In IPv4 this also requires modification
   of the IP header checksum.

   It is recognized that there may be situations where a network
   administration prefers to decrement the IPv4 TTL by one as it
   traverses an MPLS domain, instead of decrementing the IPv4 TTL by the
   number of LSP hops within the domain.

Also, there are some explanations on these websites:
https://www.ciscopress.com/articles/article.asp?p=680824&seqNum=4
http://wiki.kemot-net.com/mpls-ttl-behavior

These explain that we need to set the TTL to the previous header's TTL minus 1 on push/swap/pop; a sketch of the expected behavior follows.
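
A minimal sketch of the expected-packet logic under the RFC 3032 rules, using scapy (this only illustrates the TTL handling for the tests, it is not the pipeline code):

from scapy.contrib.mpls import MPLS
from scapy.layers.inet import IP
from scapy.layers.l2 import Ether

def push_mpls(pkt, label):
    ip = pkt[IP].copy()
    ip.ttl -= 1  # IP processing decrements the TTL before labeling
    return (Ether(src=pkt[Ether].src, dst=pkt[Ether].dst, type=0x8847) /
            MPLS(label=label, s=1, ttl=ip.ttl) /
            ip)

def pop_mpls(pkt):
    ip = pkt[MPLS].payload.copy()
    ip.ttl = pkt[MPLS].ttl  # outgoing TTL replaces the IP TTL
    del ip.chksum           # force IPv4 checksum recomputation
    return Ether(src=pkt[Ether].src, dst=pkt[Ether].dst, type=0x0800) / ip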

Make tests more readable

Currently we use multiple layers of for-loops to create combinations of input parameters, for example:

@autocleanup
def doRunTest(self, param1, param2, ..., paramN):
    # run the test with parameters

def runTest(self):
    for param1 in [...]:
        for param2 in [...]:
            # skip test for some combinations
            if param1 == A and param2 == B:
                continue
            for param3 in [...]:
                doRunTest(param1, param2, ..., paramN)

See comment below

We might be able to use Python decorators to make the tests more readable, for example (a sketch of how such decorators could be implemented follows this example):

@autocleanup
@test_param('param1', .......)
@test_param('param2', .......)
@skip_test_if(param1=A, param2=B)
@test_param('param3', .......)
def doRunTest(self, param1, param2, param3):
    # run the test with parameters

def runTest(self):
    doRunTest()
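
A minimal sketch of how such decorators could be implemented (this is not existing code in base_test.py; decorators apply bottom-up, so inner test_param loops nest inside outer ones and skip_test_if sees the parameters bound so far):

import functools

def test_param(name, values):
    """Run the decorated test once for every value of `name`."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(self, **kwargs):
            for value in values:
                f(self, **dict(kwargs, **{name: value}))
        return wrapper
    return decorator

def skip_test_if(**conditions):
    """Skip parameter combinations matching all the given values."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(self, **kwargs):
            if all(kwargs.get(k) == v for k, v in conditions.items()):
                return
            f(self, **kwargs)
        return wrapper
    return decorator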

Inconsistent table entry for `switch_info` table

23:18:04.727 WARN [P4RuntimeFlowRuleProgrammable] Table entry obtained from device device:leaf1 is different from one in in translation store: device=PiTableEntry{tableId=FabricEgress.pkt_io_egress.switch_info, matchKey=DEFAULT-ACTION, tableAction=NoAction(), priority=N/A, timeout=PERMANENT}, store=PiTableEntry{tableId=FabricEgress.pkt_io_egress.switch_info, matchKey=DEFAULT-ACTION, tableAction=FabricEgress.pkt_io_egress.set_cpu_port(cpu_port=0x140), priority=N/A, timeout=PERMANENT}

The issue was observed after reloading the fabric-tna pipeconf app in a running ONOS instance (production pod).

The ONOS core does not contain any flow for the switch_info table, which is weird. It seems like a bug with the ONOS translation store and/or with the fabric-tna pipeline initialization logic.

ONOS pushes the wrong pipeconf

The current pipeconf build for bfrt is broken and always builds the old PI pipeline format. This causes the pipeline push to fail. It's a runtime issue.

I suspect commit e87824a broke something.

First bytes of the pushed pipeline:

00000000: 3f00 0000 6f72 672e 7374 7261 7475 6d70  ?...org.stratump
00000010: 726f 6a65 6374 2e66 6162 7269 632d 7370  roject.fabric-sp
00000020: 6777 2d69 6e74 2e73 7472 6174 756d 5f62  gw-int.stratum_b
00000030: 662e 6d6f 6e74 6172 615f 7364 655f 395f  f.montara_sde_9_
00000040: 325f 30ac d217 0000 0000 48b4 0000 0002  2_0.......H.....
00000050: 6173 6d5f 7665 7273 696f 6e00 0600 0000  asm_version.....

ONOS log:

onos_1  | 21:43:10.030 INFO  [PipelineConfigClientImpl] Setting pipeline config for device:leaf1 to org.stratumproject.fabric-spgw-int.stratum_bf.montara_sde_9_2_0...

FabricIPv4UnicastGroupTest* failed randomly

FAIL: test.FabricIPv4UnicastGroupTestAllPortTcpSport
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/fabric-p4test/tests/ptf/base_test.py", line 1006, in handle
    return f(*args, **kwargs)
  File "/fabric-p4test/tests/ptf/base_test.py", line 982, in handle
    return f(*args, **kwargs)
  File "fabric.ptf/test.py", line 256, in runTest
    [exp_pkt_to2, exp_pkt_to3], [self.port2, self.port3])
  File "/fabric-p4test/tests/ptf/base_test.py", line 443, in verify_any_packet_any_port
    return testutils.verify_any_packet_any_port(self, pkts, ports)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2707, in verify_any_packet_any_port
    verify_no_other_packets(test, device_number=device_number)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2579, in verify_no_other_packets
    "packets.\n%s" % (result.device, result.port, result.format()))
AssertionError: A packet was received on device 0, port 3, but we expected no packets.
========== RECEIVED ==========
0000   00 00 00 00 00 03 00 00  00 00 AA 01 08 00 45 00   ..............E.
0010   00 56 00 01 00 00 3F 06  64 A0 0A 00 01 01 0A 00   .V....?.d.......
0020   02 01 04 E5 00 50 00 00  00 00 00 00 00 00 50 02   .....P........P.
0030   20 00 77 6B 00 00 00 01  02 03 04 05 06 07 08 09    .wk............
0040   0A 0B 0C 0D 0E 0F 10 11  12 13 14 15 16 17 18 19   ................
0050   1A 1B 1C 1D 1E 1F 20 21  22 23 24 25 26 27 28 29   ...... !"#$%&'()
0060   2A 2B 2C 2D                                        *+,-
==============================

FAIL: test.FabricIPv4UnicastGroupTestAllPortIpSrc
----------------------------------------------------------------------
Traceback (most recent call last):
  File "fabric.ptf/test.py", line 405, in runTest
    self.IPv4UnicastGroupTestAllPortL4SrcIp("udp")
  File "/fabric-p4test/tests/ptf/base_test.py", line 1006, in handle
    return f(*args, **kwargs)
  File "/fabric-p4test/tests/ptf/base_test.py", line 982, in handle
    return f(*args, **kwargs)
  File "fabric.ptf/test.py", line 379, in IPv4UnicastGroupTestAllPortL4SrcIp
    [exp_pkt_to2, exp_pkt_to3], [self.port2, self.port3])
  File "/fabric-p4test/tests/ptf/base_test.py", line 443, in verify_any_packet_any_port
    return testutils.verify_any_packet_any_port(self, pkts, ports)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2711, in verify_any_packet_any_port
    "device %d.\n%s" % (ports, device_number, result.format()))
AssertionError: Did not receive any expected packet on any of ports [2, 3] for device 0.

FAIL: test.FabricIPv4UnicastGroupTestAllPortTcpDport
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/fabric-p4test/tests/ptf/base_test.py", line 1006, in handle
    return f(*args, **kwargs)
  File "/fabric-p4test/tests/ptf/base_test.py", line 982, in handle
    return f(*args, **kwargs)
  File "fabric.ptf/test.py", line 317, in runTest
    [exp_pkt_to2, exp_pkt_to3], [self.port2, self.port3])
  File "/fabric-p4test/tests/ptf/base_test.py", line 443, in verify_any_packet_any_port
    return testutils.verify_any_packet_any_port(self, pkts, ports)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2711, in verify_any_packet_any_port
    "device %d.\n%s" % (ports, device_number, result.format()))
AssertionError: Did not receive any expected packet on any of ports [2, 3] for device 0.
========== RECEIVED ==========
0 total packets.
==============================


FAIL: test.FabricIPv4UnicastGroupTestAllPortIpDst
----------------------------------------------------------------------
Traceback (most recent call last):
  File "fabric.ptf/test.py", line 475, in runTest
    self.IPv4UnicastGroupTestAllPortL4DstIp("tcp")
  File "/fabric-p4test/tests/ptf/base_test.py", line 1006, in handle
    return f(*args, **kwargs)
  File "/fabric-p4test/tests/ptf/base_test.py", line 982, in handle
    return f(*args, **kwargs)
  File "fabric.ptf/test.py", line 450, in IPv4UnicastGroupTestAllPortL4DstIp
    [exp_pkt_to2, exp_pkt_to3], [self.port2, self.port3])
  File "/fabric-p4test/tests/ptf/base_test.py", line 443, in verify_any_packet_any_port
    return testutils.verify_any_packet_any_port(self, pkts, ports)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2707, in verify_any_packet_any_port
    verify_no_other_packets(test, device_number=device_number)
  File "/usr/local/lib/python2.7/dist-packages/ptf/testutils.py", line 2579, in verify_no_other_packets
    "packets.\n%s" % (result.device, result.port, result.format()))
AssertionError: A packet was received on device 0, port 2, but we expected no packets.
========== RECEIVED ==========
0000   00 00 00 00 00 02 00 00  00 00 AA 01 08 00 45 00   ..............E.
0010   00 56 00 01 00 00 3F 06  64 41 0A 00 01 01 0A 00   .V....?.dA......
0020   02 60 04 D2 00 50 00 00  00 00 00 00 00 00 50 02   .`...P........P.
0030   20 00 77 1F 00 00 00 01  02 03 04 05 06 07 08 09    .w.............
0040   0A 0B 0C 0D 0E 0F 10 11  12 13 14 15 16 17 18 19   ................
0050   1A 1B 1C 1D 1E 1F 20 21  22 23 24 25 26 27 28 29   ...... !"#$%&'()
0060   2A 2B 2C 2D                                        *+,-
==============================

No reports generated when using iperf in UDP mode

Given a watchlist entry matching only on IPv4 source and destination address, we get reports when running iperf in TCP mode, but we don't get any when using UDP mode.

The issue was experienced on the production pod.

gen-p4-constants.py should contain constants for all profiles

Currently, we generate constants for fabric-spgw-int, as we assume that profile is the one with all preprocessor flags enabled, but that might no longer be the case. It seems better to pass a list of p4info files to the script and have it output the constants for all profiles; a sketch follows.
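
A minimal sketch of the proposed interface (the protobuf import paths are the standard generated P4Runtime modules; the formatting logic of the real script is omitted):

import argparse
from google.protobuf import text_format
from p4.config.v1 import p4info_pb2

def load_p4info(path):
    p4info = p4info_pb2.P4Info()
    with open(path) as f:
        text_format.Merge(f.read(), p4info)
    return p4info

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("p4info", nargs="+", help="one p4info.txt per profile")
    args = parser.parse_args()
    tables = {}
    for path in args.p4info:
        for table in load_p4info(path).tables:
            # Union over all profiles, keyed by fully qualified name
            tables[table.preamble.name] = table.preamble.id
    for name, p4_id in sorted(tables.items()):
        print("%s = %#x" % (name, p4_id))

if __name__ == "__main__":
    main()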

Include enums to the P4InfoConstants.java file

Now we can use the serializable_enums from the p4info file to generate constants. Here is an example of serializable_enums (a sketch of generating Java constants from these entries follows the example):

type_info {
  serializable_enums {
    key: "BridgedMdType_t"
    value {
      underlying_type {
        bitwidth: 8
      }
      members {
        name: "INVALID"
        value: "\000"
      }
      members {
        name: "INGRESS_TO_EGRESS"
        value: "\001"
      }
      members {
        name: "EGRESS_MIRROR"
        value: "\002"
      }
      members {
        name: "INGRESS_MIRROR"
        value: "\003"
      }
    }
  }
}
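
A minimal sketch of generating Java constants from these entries (the naming scheme and the p4info loading helper are assumptions; the real generator lives in gen-p4-constants.py):

def emit_enum_constants(p4info):
    # p4info: a parsed p4.config.v1.p4info_pb2.P4Info message
    for enum_name, enum in p4info.type_info.serializable_enums.items():
        prefix = enum_name.replace("_t", "").upper()  # e.g. BRIDGEDMDTYPE
        for member in enum.members:
            value = 0
            for byte in bytearray(member.value):  # big-endian bytes to int
                value = (value << 8) | byte
            yield "public static final long %s_%s = %dL;" % (
                prefix, member.name, value)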

Consider decapsulating GTPU packets in the parser to reduce pipeline complexity

The current approach of maintaining two copies of inner/outer L4 headers adds annoying complexity and hogs PHV resources. An alternative approach is to resubmit packets that require decapsulation and simply skip the outer headers in the parser. This would greatly simplify the pipeline and parser at the cost of reducing bandwidth, which should be OK given Tofino's very high maximum bandwidth. It is also currently unclear how much resubmission really reduces the maximum bandwidth; this needs further investigation.

Use "exactMatch" to check if captured flow rule is expected one

Some tests use code like the following:

flowRuleService.applyFlowRules(eq(expectedFlow));
expectLastCall().andVoid().once();
verify(flowRuleService);

The eq matcher uses the object's equals method, and DefaultFlowRule's equals does not actually compare the treatment.

We should capture the flow rule and use exactMatch to compare the result.

INT watchlist causes table hit for ARP and LLDP pkt

The watchlist is designed to match only on IPv4 flows. This is probably happening because of PHV conflicts (ARP/LLDP fields ending up in the same PHVs as IPv4/UDP/TCP fields).

We should update the watchlist table to either:

  • match on ipv4 validity bit
  • wrap table apply with gateway condition (if (hdr.ipv4.isValid()) watchlist.apply())

We should check all the other tables in the pipeline. Do we check header validity when matching certain fields?

I wish the compiler were capable of emitting warnings for such conditions.

FabricMPLSSegmentRoutingTest packet mismatch on hardware

FabricMPLSSegmentRoutingTest fails with packet mismatch for next_hop_spine_True scenarios. All the tests are run on hardware in loopback mode.

=== RUN   FabricMplsSegmentRoutingTest0
=== RUN   FabricMplsSegmentRoutingTest0/tcp_next_hop_spine_True
WARN[2020-10-22T10:20:13.412Z]p4rt.go:141 verifyPacketIn() Payloads don't match
Expected: 00 00 00 00 00 02 00 00 00 00 aa 01 88 47 00 06 41 3f 45 00 00 42 00 01 00 00 40 06 63 b4 0a 00 01 01 0a 00 02 01 04 d2 00 50 00 00 00 00 00 00 00 00 50 02 20 00 d6 fb 00 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19
Actual  : 00 00 00 00 00 02 00 00 00 00 aa 01 08 00 45 00 00 42 00 01 00 00 40 06 63 b4 0a 00 01 01 0a 00 02 01 04 d2 00 50 00 00 00 00 00 00 00 00 50 02 20 00 d6 fb 00 00 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 
=== RUN   FabricMplsSegmentRoutingTest0/Undo_Write_Requests
--- FAIL: FabricMplsSegmentRoutingTest0 (0.03s)
    --- FAIL: FabricMplsSegmentRoutingTest0/tcp_next_hop_spine_True (0.02s)
    --- PASS: FabricMplsSegmentRoutingTest0/Undo_Write_Requests (0.00s)
=== RUN   FabricMplsSegmentRoutingTest1
=== RUN   FabricMplsSegmentRoutingTest1/tcp_next_hop_spine_False
=== RUN   FabricMplsSegmentRoutingTest1/Undo_Write_Requests
--- PASS: FabricMplsSegmentRoutingTest1 (0.02s)
    --- PASS: FabricMplsSegmentRoutingTest1/tcp_next_hop_spine_False (0.01s)
    --- PASS: FabricMplsSegmentRoutingTest1/Undo_Write_Requests (0.00s)
=== RUN   FabricMplsSegmentRoutingTest2
=== RUN   FabricMplsSegmentRoutingTest2/udp_next_hop_spine_True
WARN[2020-10-22T10:20:13.447Z]p4rt.go:141 verifyPacketIn() Payloads don't match
Expected: 00 00 00 00 00 02 00 00 00 00 aa 01 88 47 00 06 41 3f 45 00 00 42 00 01 00 00 40 11 63 a9 0a 00 01 01 0a 00 02 01 04 d2 00 50 00 2e 8c 04 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25
Actual  : 00 00 00 00 00 02 00 00 00 00 aa 01 08 00 45 00 00 42 00 01 00 00 40 11 63 a9 0a 00 01 01 0a 00 02 01 04 d2 00 50 00 2e 8c 04 00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 20 21 22 23 24 25 
=== RUN   FabricMplsSegmentRoutingTest2/Undo_Write_Requests
--- FAIL: FabricMplsSegmentRoutingTest2 (0.02s)
    --- FAIL: FabricMplsSegmentRoutingTest2/udp_next_hop_spine_True (0.01s)
    --- PASS: FabricMplsSegmentRoutingTest2/Undo_Write_Requests (0.00s)
=== RUN   FabricMplsSegmentRoutingTest3
=== RUN   FabricMplsSegmentRoutingTest3/udp_next_hop_spine_False
=== RUN   FabricMplsSegmentRoutingTest3/Undo_Write_Requests
--- PASS: FabricMplsSegmentRoutingTest3 (0.02s)
    --- PASS: FabricMplsSegmentRoutingTest3/udp_next_hop_spine_False (0.02s)
    --- PASS: FabricMplsSegmentRoutingTest3/Undo_Write_Requests (0.00s)
=== RUN   FabricMplsSegmentRoutingTest4
=== RUN   FabricMplsSegmentRoutingTest4/icmp_next_hop_spine_True
WARN[2020-10-22T10:20:13.487Z]p4rt.go:141 verifyPacketIn() Payloads don't match
Expected: 00 00 00 00 00 02 00 00 00 00 aa 01 88 47 00 06 41 3f 45 00 00 42 00 01 00 00 40 01 63 b9 0a 00 01 01 0a 00 02 01 08 00 64 6c 00 00 00 00 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
Actual  : 00 00 00 00 00 02 00 00 00 00 aa 01 08 00 45 00 00 42 00 01 00 00 40 01 63 b9 0a 00 01 01 0a 00 02 01 08 00 64 6c 00 00 00 00 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 
=== RUN   FabricMplsSegmentRoutingTest4/Undo_Write_Requests
--- FAIL: FabricMplsSegmentRoutingTest4 (0.02s)
    --- FAIL: FabricMplsSegmentRoutingTest4/icmp_next_hop_spine_True (0.01s)
    --- PASS: FabricMplsSegmentRoutingTest4/Undo_Write_Requests (0.00s)
=== RUN   FabricMplsSegmentRoutingTest5
=== RUN   FabricMplsSegmentRoutingTest5/icmp_next_hop_spine_False
=== RUN   FabricMplsSegmentRoutingTest5/Undo_Write_Requests
--- PASS: FabricMplsSegmentRoutingTest5 (0.02s)
    --- PASS: FabricMplsSegmentRoutingTest5/icmp_next_hop_spine_False (0.01s)
    --- PASS: FabricMplsSegmentRoutingTest5/Undo_Write_Requests (0.00s)
FAIL

Linked Jenkins build here:
https://jenkins.stratumproject.org/blue/organizations/jenkins/fabric-tna-hardware-loopback/detail/fabric-tna-hardware-loopback/45/pipeline/73

Bridged packet matching VLAN broadcast entry even when unicast entry exists

I'm sending the following packet:

21:24:22.548175 00:1e:67:d2:ee:ea > 00:1e:67:d2:ce:52, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 54276, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.250.20 > 192.168.250.21: ICMP echo request, id 8766, seq 62, length 64

But ONOS reports that the VLAN broadcast entry is the one being matched, even though there is a unicast entry for dest MAC 00:1e:67:d2:ce:52:
[Screenshot: ONOS flow table showing the VLAN broadcast entry being hit, 2020-09-16]
