amzn-drivers's Introduction

amzn-drivers's People

Contributors

akiyano, amitbern-aws, awsnb, davidarinzon, dkkranz, emagutu, evgeny17387new, firasj, gal-pressman, gerwand, gtzalik, i-gor-c, kensington, mpeg2tom, mrgolin, netanelbelgazal, reelieuglie, rwespetal, sargun, segevido, semihalf-gorecki-dawid, semihalf-kozik-rafal, semihalf-rojek-artur, shaharitzko, shaibran, shayagros, yonatannachum, ysam12345, zhngaj, zorikm

amzn-drivers's Issues

Release notes and stable version

Hi,

Could you please start adding release notes to the ENA driver releases so that one can make an informed decision about whether to upgrade to the latest releases or not?

Besides, adding a stable tag or something would also be of great help.

Thanks.

Cannot turn on adaptive-rx

I get an error turning on adaptive rx:

# ethtool -C eth4 adaptive-rx on
Cannot set device coalesce parameters: Operation not supported

Info about module:

# cat /sys/module/ena/version 
1.5.0g

I don't see anything in sysfs related to this:

# find /sys|grep intr_moderation
# cat /boot/config-$(uname -r)|grep ^CONFIG_SYSFS=
CONFIG_SYSFS=y

Downloaded version 1.3.0 but got 1.0.2 after compiling

Hi,
I downloaded the release 1.3.0 archive from this GitHub repository. After following the instructions, modinfo reports version 1.0.2 instead of 1.3.0.

Download link: https://github.com/amzn/amzn-drivers/releases/tag/ena_linux_1.3.0

OS installed: CentOS 7 (1708)
Steps:

# yum install gcc kernel-devel-$(uname -r)
# cd amzn-drivers-ena_linux_1.3.0/kernel/linux/ena
# make
# insmod ena.ko
# modinfo ena
filename:       /lib/modules/3.10.0-693.5.2.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        1.0.2
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
rhelversion:    7.4
srcversion:     0ADEBA934369F8D450E5CE4
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.10.0-693.5.2.el7.x86_64 SMP mod_unload modversions
signer:         CentOS Linux kernel signing key
sig_key:        C7:57:A9:FB:BD:0D:82:C9:E5:40:52:02:9A:09:08:D1:7C:F1:AD:C7
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all) (int)

Where did I go wrong?

Please submit this for inclusion in the Linux kernel

Maintaining an out of tree driver is unhelpful to you, unhelpful to your users and unhelpful to Linux kernel developers. Having this included in the kernel means everyone can benefit from using it on AWS without having to deliberately go and jump through hoops to install it. Please, we'll all love you more for doing so!

memory corruption panic on FreeBSD11.0 with ena0

I hit a panic with the ena driver on FreeBSD 11.0. I built the ena driver for FreeBSD 11.0 and tried it on an AWS instance (r4.large). The system comes up and then panics. The console messages look like this:
...
ena0: mem 0x83000000-0x83003fff at device 3.0 on pci0
ena0: Elastic Network Adapter (ENA)ena v0.6 (Jan 12, 2017)

ena0: Allocated msix_entries, vectors (cnt: 3)
ena0: initalize 2 io queues
ena0: Ethernet address: 02:b8:a4:5f:b0:a2
...
ena0: link state changed to UP
...
ena0: ena_setup_io_intr vector: 2
ena0: ena_setup_io_intr vector: 3
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: RX Soft LRO[0] Initialized
ena0: RX Soft LRO[1] Initialized
DHCPDISCOVER on ena0 to 255.255.255.255 port 67 interval 5
DHCPOFFER from 172.31.16.1
DHCPREQUEST on ena0 to 255.255.255.255 port 67
DHCPACK from 172.31.16.1
ena0: device is going DOWN
ena0: ena_free_rx_resources qid 0
ena0: ena_free_rx_resources qid 1
ena0: ena_setup_io_intr vector: 2
ena0: ena_setup_io_intr vector: 3
ena0: queue 0 - cpu 0
ena0: queue 1 - cpu 1
ena0: RX Soft LRO[0] Initialized
ena0: RX Soft LRO[1] Initialized
bound to 172.31.29.238 -- renewal in 1800 seconds.
Starting Network: lo0 ena0.
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
inet 127.0.0.1 netmask 0xff000000
groups: lo
ena0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 9001
options=1043a<TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,LRO,VLAN_HWFILTER>
ether 02:b8:a4:5f:b0:a2
inet 172.31.29.238 netmask 0xfffff000 broadcast 172.31.31.255
media: Ethernet autoselect (10Gbase-T )
status: active
Enabling pfpanic: Memory modified after free 0xffff8480059bf150(8) val=0 @ 0xffff8480059bf150

cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffff860039949140
vpanic() at vpanic+0x114/frame 0xffff860039949180
panic() at panic+0x43/frame 0xffff8600399491e0
trash_ctor() at trash_ctor+0x61/frame 0xffff860039949200
uma_zalloc_arg() at uma_zalloc_arg+0x588/frame 0xffff860039949270
counter_u64_alloc() at counter_u64_alloc+0x1b/frame 0xffff860039949290
pfioctl() at pfioctl+0x16b5/frame 0xffff860039949780
devfs_ioctl_f() at devfs_ioctl_f+0x159/frame 0xffff8600399497e0
kern_ioctl() at kern_ioctl+0x15e/frame 0xffff860039949840
sys_ioctl() at sys_ioctl+0x173/frame 0xffff860039949920
amd64_syscall() at amd64_syscall+0x305/frame 0xffff860039949ab0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xffff860039949ab0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800dfc3ca, rsp = 0x7fffffffce18, rbp = 0x7fffffffda40 ---

One thing that looks odd is that the link goes DOWN and then comes UP during DHCP.

error: nested redefinition of 'enum pkt_hash_types'

This happens on CentOS 7.3.1611 with kernel 3.10.0-693.2.2.el7.x86_64, whereas the build worked with kernel 3.10.0-514.26.2.el7.x86_64.

make[1]: Entering directory `/usr/src/kernels/3.10.0-693.2.2.el7.x86_64'
  CC [M]  /opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena/ena_netdev.o
In file included from /opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena/ena_netdev.h:44:0,
   from /opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena/ena_netdev.c:53:
/opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena/kcompat.h:382:6: error: nested redefinition of 'enum pkt_hash_types'
 enum pkt_hash_types {
      ^
compilation terminated due to -Wfatal-errors.
make[2]: *** [/opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/opt/applause/src/amzn-drivers-ena_linux_1.1.3/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-693.2.2.el7.x86_64'
make: *** [all] Error 2

error: static declaration of ‘devm_ioremap_wc’ follows non-static declaration

# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.5 LTS
Release:        14.04
Codename:       trusty
# uname -a
Linux ip-10-165-8-226 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# make
make -C /lib/modules/3.13.0-129-generic/build M=/usr/src/amzn-drivers/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-129-generic'
  CC [M]  /usr/src/amzn-drivers/kernel/linux/ena/ena_netdev.o
In file included from /usr/src/amzn-drivers/kernel/linux/ena/ena_netdev.h:44:0,
                 from /usr/src/amzn-drivers/kernel/linux/ena/ena_netdev.c:53:
/usr/src/amzn-drivers/kernel/linux/ena/kcompat.h:557:29: error: static declaration of ‘devm_ioremap_wc’ follows non-static declaration
 static inline void __iomem *devm_ioremap_wc(struct device *dev,
                             ^
compilation terminated due to -Wfatal-errors.
make[2]: *** [/usr/src/amzn-drivers/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/usr/src/amzn-drivers/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-129-generic'
make: *** [all] Error 2

Build failed for RedHat RHEL 7.5 beta kernel

Build failed for RedHat RHEL 7.5 beta kernel:

$ make
make -C /lib/modules/3.10.0-830.el7.x86_64/build M=/home/fg/amzn-drivers/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-830.el7.x86_64'
  CC [M]  /home/fg/amzn-drivers/kernel/linux/ena/ena_netdev.o
In file included from /home/fg/amzn-drivers/kernel/linux/ena/ena_netdev.h:44:0,
                 from /home/fg/amzn-drivers/kernel/linux/ena/ena_netdev.c:53:
/home/fg/amzn-drivers/kernel/linux/ena/kcompat.h:416:6: error: nested redefinition of 'enum pkt_hash_types'
 enum pkt_hash_types {
      ^
compilation terminated due to -Wfatal-errors.
make[2]: *** [/home/fg/amzn-drivers/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/home/fg/amzn-drivers/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-830.el7.x86_64'
make: *** [all] Error 2
$ rpm -qip kernel-3.10.0-830.el7.x86_64.rpm 
warning: kernel-3.10.0-830.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID fd431d51: NOKEY
Name        : kernel
Version     : 3.10.0
Release     : 830.el7
Architecture: x86_64
Install Date: (not installed)
Group       : System Environment/Kernel
Size        : 64339119
License     : GPLv2
Signature   : RSA/SHA256, Mon Jan 15 21:45:08 2018, Key ID 199e2f91fd431d51
Source RPM  : kernel-3.10.0-830.el7.src.rpm
Build Date  : Mon Jan 15 17:47:58 2018
Build Host  : x86-040.build.eng.bos.redhat.com
Relocations : (not relocatable)
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Vendor      : Red Hat, Inc.
URL         : http://www.kernel.org/
Summary     : The Linux kernel
Description :
The kernel package contains the Linux kernel (vmlinuz), the core of any
Linux operating system.  The kernel handles the basic functions
of the operating system: memory allocation, process allocation, device
input and output, etc.

cannot build amzn-drivers for Ubuntu 3.13.0-29-generic

I was following these steps to enable ENA on an Ubuntu AWS instance: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html

:~# uname -a
Linux ip-x-x-x-x 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

It failed at Step 7 - Sub step b

Error:
root@ip-x-x-x-x:~# sudo dkms build -m amzn-drivers -v 1.0.0

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area....
make KERNELRELEASE=3.13.0-29-generic -C kernel/linux/ena/ BUILD_KERNEL=3.13.0-29-generic....(bad exit status: 2)
ERROR (dkms apport): binary package for amzn-drivers: 1.0.0 not found
Error! Bad return status for module build on kernel: 3.13.0-29-generic (x86_64)
Consult /var/lib/dkms/amzn-drivers/1.0.0/build/make.log for more information.

Full log: /var/lib/dkms/amzn-drivers/1.0.0/build/make.log

DKMS make.log for amzn-drivers-1.0.0 for kernel 3.13.0-29-generic (x86_64)
Mon Apr 23 12:28:40 UTC 2018
make: Entering directory `/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena'
make -C /lib/modules/3.13.0-29-generic/build M=/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/linux-headers-3.13.0-29-generic'
  CC [M]  /var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.o
/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.c: In function ‘ena_set_rx_hash’:
/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.c:1052:22: error: storage size of ‘hash_type’ isn’t known
  enum pkt_hash_types hash_type;
                      ^
compilation terminated due to -Wfatal-errors.
make[2]: *** [/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/linux-headers-3.13.0-29-generic'
make: *** [all] Error 2
make: Leaving directory `/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena'

4.11 Compile Errors

I cannot compile this driver against the 4.11 kernel (4.11.2-1.el7.elrepo.x86_64).

+ make -C /usr/src/kernels/4.11.2-1.el7.elrepo.x86_64 M=/root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena 'NOSTDINC_FLAGS=-I /root/rpmbuild/BUILD/ena-1.1.3/obj/default/include'
make: Entering directory `/usr/src/kernels/4.11.2-1.el7.elrepo.x86_64'
  LD      /root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena/built-in.o
  CC [M]  /root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena/ena_netdev.o
/root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena/ena_netdev.c: In function 'ena_init_napi':
/root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena/ena_netdev.c:1493:3: error: implicit declaration of function 'napi_hash_add' [-Werror=implicit-function-declaration]
   napi_hash_add(&adapter->ena_napi[i].napi);
   ^
compilation terminated due to -Wfatal-errors.

Error when compiling on RHEL 6.7

Compilation fails on RHEL 6.7 (EC2). The kernel headers and devel packages are installed.

make -C /lib/modules/2.6.32-696.el6.x86_64/build M=/home/ec2-user/amzn-drivers/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/kernels/2.6.32-696.el6.x86_64'
  CC [M]  /home/ec2-user/amzn-drivers/kernel/linux/ena/ena_netdev.o
/home/ec2-user/amzn-drivers/kernel/linux/ena/ena_netdev.c: In function ‘ena_intr_msix_io’:
/home/ec2-user/amzn-drivers/kernel/linux/ena/ena_netdev.c:1264: error: implicit declaration of function ‘__napi_schedule_irqoff’
compilation terminated due to -Wfatal-errors.
make[2]: *** [/home/ec2-user/amzn-drivers/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/home/ec2-user/amzn-drivers/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/2.6.32-696.el6.x86_64'
make: *** [all] Error 2

DPDK ENA PMD - TX L4 offload flags set in RX path

When using the ENA PMD with an application that does IPsec tunneling, I have encountered a problem. Some packets (UDP or TCP) that are sent over an IPsec tunnel fail the integrity check at the receiving side. Other IP protocols (ICMP) are not affected; they pass the integrity check and are successfully received. I debugged and found that the following is occurring:

  • When a UDP or TCP packet arrives on an ENA interface, ena_rx_mbuf_prepare() sets PKT_TX_TCP_CKSUM or PKT_TX_UDP_CKSUM on ol_flags for the packet mbuf.
  • The DPDK AESNI MB cryptodev encrypts the original packet buffer in place. An ESP header and IP header are prepended in the packet headroom.
  • The modified packet is sent out an ENA interface. Since PKT_TX_TCP_CKSUM or PKT_TX_UDP_CKSUM is set on ol_flags for the packet mbuf, ena_tx_mbuf_prepare() enables hardware L4 checksum generation for the L4 protocol of the original unencrypted packet.
  • The ENA hardware generates a TCP or UDP checksum and writes it at the offset that would be appropriate for a TCP or UDP packet. Since the next header after the IP header is now an ESP header (instead of the original TCP or UDP header), this ends up writing an L4 checksum into a location that is not valid for the modified packet. If the original packet was a UDP packet, the lower 16 bits of the ESP sequence number are overwritten. If the original packet was a TCP packet, the TCP checksum is written into the payload containing the padded/encrypted version of the packet.
  • The packet is forwarded to the IPsec tunnel peer, which calculates an ICV using the ESP header & encrypted packet. The hash doesn’t match the received ICV, which was generated based on the packet contents prior to their modification by the hardware, so the packet is dropped.
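The overwrite locations described in the list above can be checked with a little offset arithmetic. The following is only a user-space sketch, not driver or DPDK code: the helper names are hypothetical, and it assumes the standard header layouts (UDP checksum at byte offset 6, TCP checksum at byte offset 16, and an ESP header that starts with a 4-byte SPI followed by a 4-byte sequence number).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the checksum field inside a UDP and a TCP header. */
enum { UDP_CSUM_OFFSET = 6, TCP_CSUM_OFFSET = 16 };

/* Fixed part of an ESP header: 4-byte SPI, then 4-byte sequence number. */
enum { ESP_SPI_LEN = 4, ESP_SEQ_LEN = 4, ESP_HDR_LEN = ESP_SPI_LEN + ESP_SEQ_LEN };

/* Hypothetical helper: store a 16-bit value (network byte order) at the
 * position where hardware would place an L4 checksum. */
static void write_l4_csum(uint8_t *l4, size_t csum_offset, uint16_t csum)
{
    l4[csum_offset] = (uint8_t)(csum >> 8);
    l4[csum_offset + 1] = (uint8_t)(csum & 0xff);
}

/* Read the big-endian ESP sequence number from the start of an ESP header. */
static uint32_t esp_seq(const uint8_t *esp)
{
    return ((uint32_t)esp[4] << 24) | ((uint32_t)esp[5] << 16) |
           ((uint32_t)esp[6] << 8) | (uint32_t)esp[7];
}

/* Returns 1 if a 2-byte write at csum_offset stays inside the fixed ESP
 * header, 0 if it lands in the bytes that follow it (the ESP payload). */
static int lands_in_esp_header(size_t csum_offset)
{
    return csum_offset + 2 <= ESP_HDR_LEN;
}
```

Writing a value at UDP_CSUM_OFFSET into a buffer that actually holds an ESP header replaces bytes 6 and 7, which are the low 16 bits of the sequence number. TCP_CSUM_OFFSET is past the 8-byte fixed header, so that write lands in the payload bytes instead.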

This seems to be unique to the ENA PMD. The same behavior does not occur when tunneling packets over IPsec using the same application running against some other NIC types (ixgbe, i40e, e1000). I have not tested other types of tunnels (IP-in-IP, GRE, etc), but it seems like a very good possibility that packets that are sent over those types of tunnels could also be affected.

A patch similar to the following resolves the issue for the application running in my tests:

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 22db895..6f982f6 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -261,16 +261,6 @@ static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
 {
 	uint64_t ol_flags = 0;
 
-	if (ena_rx_ctx->l4_proto == ENA_ETH_IO_L4_PROTO_TCP)
-		ol_flags |= PKT_TX_TCP_CKSUM;
-	else if (ena_rx_ctx->l4_proto == ENA_ETH_IO_L4_PROTO_UDP)
-		ol_flags |= PKT_TX_UDP_CKSUM;
-
-	if (ena_rx_ctx->l3_proto == ENA_ETH_IO_L3_PROTO_IPV4)
-		ol_flags |= PKT_TX_IPV4;
-	else if (ena_rx_ctx->l3_proto == ENA_ETH_IO_L3_PROTO_IPV6)
-		ol_flags |= PKT_TX_IPV6;
-
 	if (unlikely(ena_rx_ctx->l4_csum_err))
 		ol_flags |= PKT_RX_L4_CKSUM_BAD;
 	if (unlikely(ena_rx_ctx->l3_csum_err))

Would it be possible to make a change to the PMD code similar to the above that leaves the TX offload flags unset when the packet is received? This would allow the application to decide whether to set the TX checksum offload flags. If it is necessary to store information about the L3/L4 protocols of a received packet so the application can use that information while processing the packet, the mbuf packet_type could be used for that purpose.
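A rough sketch of what that suggestion could look like is shown here. It is not the PMD's code: every type, name, and bit value is a hypothetical stand-in so the example builds without DPDK headers. The idea is only that the RX path records the protocol in packet_type and leaves the TX offload bits in ol_flags untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins so the sketch builds without DPDK headers. All names and bit
 * values here are hypothetical placeholders, not the real definitions. */
enum sketch_l3_proto { SKETCH_L3_OTHER, SKETCH_L3_IPV4, SKETCH_L3_IPV6 };
enum sketch_l4_proto { SKETCH_L4_OTHER, SKETCH_L4_TCP, SKETCH_L4_UDP };

#define SKETCH_PTYPE_L3_IPV4   0x0010u
#define SKETCH_PTYPE_L3_IPV6   0x0040u
#define SKETCH_PTYPE_L4_TCP    0x0100u
#define SKETCH_PTYPE_L4_UDP    0x0200u
#define SKETCH_RX_L4_CKSUM_BAD (1ULL << 3)
#define SKETCH_RX_IP_CKSUM_BAD (1ULL << 4)

struct sketch_rx_ctx {
    enum sketch_l3_proto l3_proto;
    enum sketch_l4_proto l4_proto;
    int l3_csum_err;
    int l4_csum_err;
};

struct sketch_mbuf {
    uint64_t ol_flags;     /* offload flags; only RX bits are set here */
    uint32_t packet_type;  /* protocol information for the application */
};

/* Record L3/L4 protocol in packet_type; set only RX checksum-error flags. */
static void sketch_rx_mbuf_prepare(struct sketch_mbuf *mbuf,
                                   const struct sketch_rx_ctx *ctx)
{
    uint64_t ol_flags = 0;
    uint32_t packet_type = 0;

    if (ctx->l4_proto == SKETCH_L4_TCP)
        packet_type |= SKETCH_PTYPE_L4_TCP;
    else if (ctx->l4_proto == SKETCH_L4_UDP)
        packet_type |= SKETCH_PTYPE_L4_UDP;

    if (ctx->l3_proto == SKETCH_L3_IPV4)
        packet_type |= SKETCH_PTYPE_L3_IPV4;
    else if (ctx->l3_proto == SKETCH_L3_IPV6)
        packet_type |= SKETCH_PTYPE_L3_IPV6;

    if (ctx->l4_csum_err)
        ol_flags |= SKETCH_RX_L4_CKSUM_BAD;
    if (ctx->l3_csum_err)
        ol_flags |= SKETCH_RX_IP_CKSUM_BAD;

    mbuf->ol_flags = ol_flags;
    mbuf->packet_type = packet_type;
}
```

With this split, an application that wants hardware checksum generation on transmit would set the TX flags itself, and a tunneling application would simply not set them.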

Compilation error with 4.15 kernel for ena driver

Compiling this driver on 4.15 kernel results in the following error:

  CC [M]  /var/lib/dkms/ena/1.5.0/build/kernel/linux/ena/ena_netdev.o
/var/lib/dkms/ena/1.5.0/build/kernel/linux/ena/ena_netdev.c: In function ‘ena_refill_rx_bufs’:
/var/lib/dkms/ena/1.5.0/build/kernel/linux/ena/ena_netdev.c:544:12: error: ‘__GFP_COLD’ undeclared (first use in this function); did you mean ‘__GFP_COMP’?
            __GFP_COLD | GFP_ATOMIC | __GFP_COMP);
            ^~~~~~~~~~
            __GFP_COMP
compilation terminated due to -Wfatal-errors.

__GFP_COLD has been removed from the kernel: torvalds/linux@453f85d
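One common way to keep such an expression compiling on both older and newer kernels is a compatibility shim that defines the missing flag as zero. The snippet below is a user-space sketch of that pattern only; whether the driver adopted this approach is not stated here, and the GFP_ATOMIC and __GFP_COMP values are placeholders rather than the kernel's.

```c
#include <assert.h>

/* Stand-ins for the kernel's gfp.h so the sketch builds in user space.
 * The numeric values are placeholders, not the kernel's. */
#define GFP_ATOMIC 0x20u
#define __GFP_COMP 0x4000u

/* Compatibility shim: where __GFP_COLD is no longer defined, make it a
 * no-op so existing flag expressions keep compiling unchanged. */
#ifndef __GFP_COLD
#define __GFP_COLD 0u
#endif

/* Same flag expression as in the failing line of ena_refill_rx_bufs. */
static unsigned int rx_page_alloc_flags(void)
{
    return __GFP_COLD | GFP_ATOMIC | __GFP_COMP;
}
```

Because the shim value is zero, the resulting flags are the same as if __GFP_COLD had been dropped from the expression.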

RSS hash config support

I am trying to read/configure the RSS hash key as part of a DPDK application running on an i3.4xlarge AWS instance with enhanced networking support running Ubuntu 16.04.

The function rte_eth_dev_rss_hash_conf_get() returns an error because dev_ops->rss_hash_conf_get is null (not supported). Is this expected?

I noticed the Linux kernel ENA driver README states that the user can provide the RSS hash function and RSS key via ethtool. However, the device does not seem to support this:

$ sudo ethtool --show-rxfh ens3
Cannot get RX flow hash indir size and/or key size: Operation not permitted

Am I missing something? Which AWS instances (if any) support reading/updating the RSS config?

error: wrong stats when using ena_get_stats (kernel < 2.6.36)

Hi,

I observed that the statistics are wrong when using ENA driver 1.2.0 on kernel 2.6.32 (CentOS 6).

$ ethtool -i eth0
driver: ena
version: 1.2.0g
firmware-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

$ cat /proc/net/dev; sudo ethtool -S eth0 | grep tx_bytes
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 6919841279 29352140    0    0    0     0          0         0 6919841279 29352140    0    0    0     0       0          0
  eth0: 6225205581 376664102    0    0    0     0          0         0 20661073610 284794167    0    0    0     0       0          0
     queue_0_tx_bytes: 4087952612
     queue_1_tx_bytes: 6640832540
     queue_2_tx_bytes: 4582622571
     queue_3_tx_bytes: 5800756290
     queue_4_tx_bytes: 4048643657
     queue_5_tx_bytes: 4257765549
     queue_6_tx_bytes: 4023754776
     queue_7_tx_bytes: 4398625609

eth0 TX bytes should be about 37.84G according to ethtool. However, /proc/net/dev shows that the TX bytes are 20.66G.

It seems that bytes, packets and rx_drops in ena_get_stats have the wrong types. IMHO, they should be unsigned long instead of unsigned int.
http://elixir.free-electrons.com/linux/v2.6.32.71/source/include/linux/netdevice.h#L128

Kernels >= 2.6.36 are not affected since ena_get_stats64 is used.
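The numbers in the output above are consistent with that explanation. As a quick check, the sketch below sums the eight per-queue tx_bytes values twice: once at full width and once after truncating each value to 32 bits, which is what a cast to unsigned int does on a platform where unsigned int is 32 bits wide (an assumption, modelled here with uint32_t). The function names are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-queue tx_bytes values copied from the ethtool -S output. */
static const uint64_t queue_tx_bytes[] = {
    4087952612ULL, 6640832540ULL, 4582622571ULL, 5800756290ULL,
    4048643657ULL, 4257765549ULL, 4023754776ULL, 4398625609ULL,
};
#define NUM_QUEUES (sizeof(queue_tx_bytes) / sizeof(queue_tx_bytes[0]))

/* Sum the counters without losing any bits. */
static uint64_t sum_full(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Sum the counters after truncating each one to 32 bits, mimicking the
 * (unsigned int) casts, then accumulate into a 64-bit total. */
static uint64_t sum_truncated32(const uint64_t *v, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += (uint32_t)v[i];
    return total;
}
```

The full sum is 37840953604 (the 37.84G figure), and the truncated sum is 20661084420. Four of the eight counters exceed 2^32, so four wraps of 4294967296 bytes are lost. The truncated sum differs from the 20661073610 reported by /proc/net/dev by only 10810 bytes, which is plausibly traffic sent between the two reads.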

I have confirmed that the following patch fixes the problem.

--- a/kernel/linux/ena/ena_netdev.c
+++ b/kernel/linux/ena/ena_netdev.c
@@ -2545,20 +2545,20 @@ static struct net_device_stats *ena_get_stats(struct net_device *netdev)
 {
 	struct ena_adapter *adapter = netdev_priv(netdev);
 	struct ena_ring *rx_ring, *tx_ring;
-	unsigned int rx_drops;
+	unsigned long rx_drops;
 	struct net_device_stats *stats = &netdev->stats;
 	unsigned int start;
 	int i;
 
 	memset(stats, 0, sizeof(*stats));
 	for (i = 0; i < adapter->num_queues; i++) {
-		unsigned int  bytes, packets;
+		unsigned long  bytes, packets;
 
 		tx_ring = &adapter->tx_ring[i];
 		do {
 			start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
-			packets = (unsigned int)tx_ring->tx_stats.cnt;
-			bytes = (unsigned int)tx_ring->tx_stats.bytes;
+			packets = (unsigned long)tx_ring->tx_stats.cnt;
+			bytes = (unsigned long)tx_ring->tx_stats.bytes;
 		} while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
 
 		stats->tx_packets += packets;
@@ -2568,8 +2568,8 @@ static struct net_device_stats *ena_get_stats(struct net_device *netdev)
 
 		do {
 			start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
-			packets = (unsigned int)rx_ring->rx_stats.cnt;
-			bytes = (unsigned int)rx_ring->rx_stats.bytes;
+			packets = (unsigned long)rx_ring->rx_stats.cnt;
+			bytes = (unsigned long)rx_ring->rx_stats.bytes;
 		} while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
 
 		stats->rx_packets += packets;
@@ -2578,7 +2578,7 @@ static struct net_device_stats *ena_get_stats(struct net_device *netdev)
 
 	do {
 		start = u64_stats_fetch_begin_irq(&tx_ring->syncp);
-		rx_drops = (unsigned int)adapter->dev_stats.rx_drops;
+		rx_drops = (unsigned long)adapter->dev_stats.rx_drops;
 	} while (u64_stats_fetch_retry_irq(&tx_ring->syncp, start));
 
 	stats->rx_dropped = rx_drops;

Unable to load ena module in FreeBSD

FreeBSD c12-nc1c-OPNsense-111.0-RELEASE-p12 FreeBSD 11.0-RELEASE-p12 #0 02581be96(stable/17.7): Sat Aug 26 11:00:39 CEST 2017 root@sensey64:/usr/obj/usr/src/sys/SMP amd64

This is the uname -a output. It doesn't provide a kernel revision number, so I went ahead and compiled using the stable kernel sources. (This is a FreeBSD installation bootstrapped to run OPNsense, a fork of pfSense, on top of it.) The compiled module fails to load with:

KLD if_ena.ko: depends on kernel - not available or version mismatch
linker_load_file: Unsupported file type

Googling around, I found that it might be due to a mismatch between kern.osreldate and the BSD_VERSION in param.h. I modified the param.h files located at /usr/src/sys/sys/param.h and /usr/include/sys/param.h to reflect the sysctl value and recompiled. It still fails with the same error. Any help or guidance is appreciated. I have not been able to find a way to force-load the module.

Thanks.

Build failed with Oracle UEK4 kernel

$ make BUILD_KERNEL=4.1.12-124.8.1.el7uek.x86_64
make -C /lib/modules/4.1.12-124.8.1.el7uek.x86_64/build M=/home/fg/amzn-drivers/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/kernels/4.1.12-124.8.1.el7uek.x86_64'
/home/fg/amzn-drivers/kernel/linux/ena/Makefile:30: *** only UEK3 with kernel version 3.8.13 is suppported.  Stop.
make[1]: *** [_module_/home/fg/amzn-drivers/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/4.1.12-124.8.1.el7uek.x86_64'
make: *** [all] Error 2

CentOS 7.4 complaining non-stop in /var/log/messages

Nothing special has been done to install ena support other than updating to the latest release of CentOS 7.4.

[root@machine ~]# modinfo ena
filename:       /lib/modules/3.10.0-693.17.1.el7.x86_64/kernel/drivers/net/ethernet/amazon/ena/ena.ko.xz
version:        1.0.2
license:        GPL
description:    Elastic Network Adapter (ENA)
author:         Amazon.com, Inc. or its affiliates
rhelversion:    7.4
srcversion:     3A6B9F1766C9A0B5CBC7D01
alias:          pci:v00001D0Fd0000EC21sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd0000EC20sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00001EC2sv*sd*bc*sc*i*
alias:          pci:v00001D0Fd00000EC2sv*sd*bc*sc*i*
depends:        
intree:         Y
vermagic:       3.10.0-693.17.1.el7.x86_64 SMP mod_unload modversions 
signer:         CentOS Linux kernel signing key
sig_key:        50:6C:68:68:80:9D:2C:BF:54:0B:F0:D9:83:D5:C6:70:9D:BC:4F:22
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all) (int)
[root@webwest2 ~]# uname -a
Linux machine.domain 3.10.0-693.17.1.el7.x86_64 #1 SMP Thu Jan 25 20:13:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
[root@machine ~]# cat /etc/redhat-release 
CentOS Linux release 7.4.1708 (Core) 
[root@machine ~]# ethtool -i eth0
driver: ena
version: 1.0.2
firmware-version: 
expansion-rom-version: 
bus-info: 0000:00:05.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
[root@machine ~]#  tail -f /var/log/messages
Feb  8 21:50:32 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:35 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:38 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:41 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:44 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:47 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:50 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:53 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:56 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:50:59 machine kernel: ena: Feature 27 isn't supported
Feb  8 21:51:02 machine kernel: ena: Feature 27 isn't supported

FreeBSD: ASSERT panic in RX path when doing scp of a large file

I am hitting a panic with the latest driver bits (commit a5c5750).

An scp of a large file from my on-prem laptop to the AWS instance triggers the panic. iperf3 from another AWS VM works fine, though, so this looks like a timing-related race.

Panic details:

ena_rx_mbuf() [TID:100097]: Assert failed on src/freebsd/sys/dev/ena/ena.c:ena_rx_mbuf:1441:Invalid alloc frag buffer

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0x1c
fault code              = supervisor write data, page not present
instruction pointer     = 0x820:0xffff860000347017
stack pointer           = 0x828:0xffff8600399998c0
frame pointer           = 0x828:0xffff8600399999c0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 11 (irq258: ena0)
1c: L4 (0) 0
trap number             = 12
panic: page fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffff8600399994b0
vpanic() at vpanic+0x114/frame 0xffff8600399994f0
panic() at panic+0x43/frame 0xffff860039999550
trap_fatal() at trap_fatal+0x36b/frame 0xffff8600399995b0
trap_pfault() at trap_pfault+0x171/frame 0xffff8600399995f0
trap() at trap+0x39f/frame 0xffff860039999800
calltrap() at calltrap+0x8/frame 0xffff860039999800
--- trap 0xc, rip = 0xffff860000347017, rsp = 0xffff8600399998d0, rbp = 0xffff8600399999c0 ---
ena_handle_msix() at ena_handle_msix+0x237/frame 0xffff8600399999c0
intr_event_execute_handlers() at intr_event_execute_handlers+0x9e/frame 0xffff860039999a10
ithread_loop() at ithread_loop+0xb6/frame 0xffff860039999a70
fork_exit() at fork_exit+0x84/frame 0xffff860039999ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xffff860039999ab0

Looking at the code in ena.c -

1680         for (i = 0; i < CLEAN_BUDGET; ++i) {
1681                 rxc = ena_rx_cleanup(rx_ring);
1682 
1683                 /* Protection from calling ena_tx_cleanup from ena_start_xmit */
1684                 ENA_RING_MTX_LOCK(tx_ring);
1685                 txc = ena_tx_cleanup(tx_ring);
1686                 ENA_RING_MTX_UNLOCK(tx_ring);

It looks like we need some kind of protection on line 1681. I tried adding ENA_RING_MTX_LOCK(rx_ring), but with that change it panicked in a different place:

panic: Memory modified after free 0xffff86004790c000(16384) val=85006e02 @ 0xffff86004790c000

cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffff86003998f690
vpanic() at vpanic+0x114/frame 0xffff86003998f6d0
panic() at panic+0x43/frame 0xffff86003998f730
trash_ctor() at trash_ctor+0x61/frame 0xffff86003998f750
mb_ctor_clust() at mb_ctor_clust+0x18/frame 0xffff86003998f780
uma_zalloc_arg() at uma_zalloc_arg+0x588/frame 0xffff86003998f7f0
m_getjcl() at m_getjcl+0xe3/frame 0xffff86003998f840
ena_refill_rx_bufs() at ena_refill_rx_bufs+0xb3/frame 0xffff86003998f8d0
ena_handle_msix() at ena_handle_msix+0x5b6/frame 0xffff86003998f9c0
intr_event_execute_handlers() at intr_event_execute_handlers+0x9e/frame 0xffff86003998fa10
ithread_loop() at ithread_loop+0xb6/frame 0xffff86003998fa70
fork_exit() at fork_exit+0x84/frame 0xffff86003998fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xffff86003998fab0

napi_hash_add is now static

Compilation error against 4.10 kernel

/root/rpmbuild/BUILD/ena-1.1.3/obj/default/kernel/linux/ena/ena_netdev.c:1493:3: error: implicit declaration of function 'napi_hash_add' [-Werror=implicit-function-declaration]
   napi_hash_add(&adapter->ena_napi[i].napi);
   ^
compilation terminated due to -Wfatal-errors.
cc1: some warnings being treated as errors

Possibly related to this: https://www.mail-archive.com/[email protected]/msg136423.html

FreeBSD: too many descriptors. Last segment: 16!

I'm sometimes getting this message on the console.

too many descriptors. Last segment: 16!

It looks like the DMA tag in the driver is being set up incorrectly. Here is the check that is failing:

        if (nsegs > (adapter->max_tx_sgl_size - 2)) {
                device_printf(adapter->pdev,
                    "too many descriptors. Last segment: %d!\n", nsegs);
                for (i = 0; i <= nsegs; i++) { 
                        ena_trace(ENA_WARNING, "frag[%d]: addr:0x%llx, len 0x%x",                           
                            i, (unsigned long long)tx_info->bufs[i].paddr,
                            tx_info->bufs[i].len);
                }

                counter_u64_add(tx_ring->tx_stats.dma_mapping_err, 1);

                rc = ENA_COM_INVAL;
                goto dma_error;
        }

You check for (adapter->max_tx_sgl_size - 2) segments; however, when you set up the DMA tag, you do not subtract two when passing the maximum number of segments.

        /* Create DMA tag for Tx buffers */
        err = bus_dma_tag_create(bus_get_dma_tag(adapter->pdev),
            1, 0,                       /* alignment, bounds    */
            BUS_SPACE_MAXADDR,          /* lowaddr              */
            BUS_SPACE_MAXADDR,          /* highaddr             */
            NULL, NULL,                 /* filter, filterarg    */
            ENA_TSO_MAXSIZE,            /* maxsize              */
            adapter->max_tx_sgl_size,   /* nsegments            */
            ENA_TSO_MAXSIZE,            /* maxsegsize           */
            0,                          /* flags                */
            NULL,                       /* lockfunc             */
            NULL,                       /* lockfuncarg          */
            &tx_ring->buf_tag);

The mismatch is what will cause this error.
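
To make the gap concrete, here is a minimal userspace sketch (not driver code). The value 17 for max_tx_sgl_size is hypothetical, chosen only so that 16 segments falls between the two limits as in the console message. `tag_accepts()` and `driver_accepts()` are illustrative stand-ins for the DMA tag limit and the transmit-path check.

```c
#include <stdbool.h>

/* Hypothetical value for illustration; the real one comes from the device. */
#define MAX_TX_SGL_SIZE 17

/* Stand-in for the DMA tag limit: created with nsegments = max_tx_sgl_size. */
static bool tag_accepts(int nsegs)
{
    return nsegs >= 1 && nsegs <= MAX_TX_SGL_SIZE;
}

/* Stand-in for the quoted check: rejects nsegs > max_tx_sgl_size - 2. */
static bool driver_accepts(int nsegs)
{
    return nsegs <= MAX_TX_SGL_SIZE - 2;
}

/* Counts the segment counts that the tag allows but the check rejects. */
static int count_mismatches(void)
{
    int n = 0;
    for (int nsegs = 1; nsegs <= MAX_TX_SGL_SIZE; nsegs++) {
        if (tag_accepts(nsegs) && !driver_accepts(nsegs))
            n++;
    }
    return n;
}
```

Any mapping that lands in that gap is produced successfully by the DMA layer and then rejected by the driver. Using the same bound in both places would close the gap; which bound is the right one is a question for the driver maintainers.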

FreeBSD: missing TSO and RXCSUM support

I am running FreeBSD 11.1-RC2 #0 on an AWS r4.large instance and see the following in the 'options' field -

ena0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=422<TXCSUM,JUMBO_MTU,LRO>

I don't see TSO in the 'options' field. I see in the README file -
The ENA driver supports:

  • TSO over IPv4/IPv6

Is this a limitation of the device or of the FreeBSD ena driver?
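
For reference, the options value can be decoded by hand. The sketch below assumes the IFCAP_* bit values from FreeBSD's net/if.h, copied here under shortened, hypothetical `CAP_*` names so it builds anywhere; `has_cap()` is an illustrative helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bits, assumed to match IFCAP_* in FreeBSD's <net/if.h>. */
#define CAP_RXCSUM    0x00001u
#define CAP_TXCSUM    0x00002u
#define CAP_JUMBO_MTU 0x00020u
#define CAP_TSO4      0x00100u
#define CAP_TSO6      0x00200u
#define CAP_LRO       0x00400u

/* True when the given capability bit is set in an ifconfig "options" value. */
static bool has_cap(uint32_t options, uint32_t cap)
{
    return (options & cap) != 0;
}
```

With options=0x422, only the TXCSUM, JUMBO_MTU and LRO bits are set, which matches the names ifconfig prints; the RXCSUM, TSO4 and TSO6 bits are clear.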

iflib support

Have a look at the iflib man pages on -CURRENT: iflibdd(9), iflibtxrx(9)

The driver should be using this for queue management and txrx. I don't see anything in the NIC design that is incompatible with this.

Compile errors on kernel 3.2 (Debian 7 Wheezy 64 bits / 3.2.0-4-amd64)

I'm trying to compile the ENA driver on Debian Wheezy 7.8.aws.1 (ami-e0efab88 / us-east-1), but it fails with the error below. I checked the kcompat.h file, and it seems to me that 3.2 kernels are supported on other distros.

I'm not an expert here, so I'm asking whether support for this can be added.

Thanks

Carlos

###
# Debian 7 Wheezy
# uname -a
# Linux wheezy64-base 3.2.0-4-amd64 #1 SMP Debian 3.2.81-1 x86_64 GNU/Linux
# master @ a5c5750
###

root@wheezy64-base:/usr/src/amzn-drivers-1.1.3# dkms add -m amzn-drivers -v 1.1.3

Creating symlink /var/lib/dkms/amzn-drivers/1.1.3/source ->
                 /usr/src/amzn-drivers-1.1.3

DKMS: add completed.
root@wheezy64-base:/usr/src/amzn-drivers-1.1.3# dkms build -m amzn-drivers -v 1.1.3

Kernel preparation unnecessary for this kernel.  Skipping...

Building module:
cleaning build area....
make KERNELRELEASE=3.2.0-4-amd64 -C kernel/linux/ena/ BUILD_KERNEL=3.2.0-4-amd64....(bad exit status: 2)
Error! Bad return status for module build on kernel: 3.2.0-4-amd64 (x86_64)
Consult /var/lib/dkms/amzn-drivers/1.1.3/build/make.log for more information.
root@wheezy64-base:/usr/src/amzn-drivers-1.1.3# 
root@wheezy64-base:/usr/src/amzn-drivers-1.1.3# cat /var/lib/dkms/amzn-drivers/1.1.3/build/make.log
DKMS make.log for amzn-drivers-1.1.3 for kernel 3.2.0-4-amd64 (x86_64)
Fri Jun 23 15:29:02 GMT 2017
make: Entering directory `/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena'
make -C /lib/modules/3.2.0-4-amd64/build M=/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/linux-headers-3.2.0-4-amd64'
  CC [M]  /var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena/ena_netdev.o
In file included from /var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena/ena_netdev.h:44:0,
                 from /var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena/ena_netdev.c:53:
/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena/kcompat.h:242:8: error: redefinition of ‘struct dev_ext_attribute’
compilation terminated due to -Wfatal-errors.
make[4]: *** [/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena/ena_netdev.o] Error 1
make[3]: *** [_module_/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena] Error 2
make[2]: *** [sub-make] Error 2
make[1]: *** [all] Error 2
make[1]: Leaving directory `/usr/src/linux-headers-3.2.0-4-amd64'
make: *** [all] Error 2
make: Leaving directory `/var/lib/dkms/amzn-drivers/1.1.3/build/kernel/linux/ena'
root@wheezy64-base:/usr/src/amzn-drivers-1.1.3#

Compiling ENA driver on RHEL 6.5 with 3.16.49 kernel fails

Hi,

I'm trying to compile kernel/linux/ena driver on RHEL 6.5 with 3.16.49 kernel on an EC2 instance but it fails with:

  CC [M]  /usr/src/amzn-drivers-1.0.0/kernel/linux/ena/ena_netdev.o
/usr/src/amzn-drivers-1.0.0/kernel/linux/ena/ena_netdev.c: In function ‘ena_probe’:
/usr/src/amzn-drivers-1.0.0/kernel/linux/ena/ena_netdev.c:3480: error: implicit declaration of function ‘devm_ioremap_wc’

I see the function devm_ioremap_wc is defined at the bottom of kcompat.h if KERNEL_VERSION is less than 4.1.0 but that doesn't seem to be working for my case. linux/version.h has:

#define LINUX_VERSION_CODE 200753
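
As a sanity check, the version code can be decoded by hand. The sketch below reuses the encoding of the kernel's KERNEL_VERSION() macro under the hypothetical name `KVER`; `decode_kver()` and `needs_compat_shim()` are illustrative helpers, not part of kcompat.h.

```c
/* Same encoding as the kernel's KERNEL_VERSION(a, b, c) macro. */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

struct kver {
    int major;
    int minor;
    int patch;
};

/* Splits a LINUX_VERSION_CODE value back into its three components. */
static struct kver decode_kver(unsigned int code)
{
    struct kver v;
    v.major = (int)(code >> 16);
    v.minor = (int)((code >> 8) & 0xff);
    v.patch = (int)(code & 0xff);
    return v;
}

/* Mirrors a guard of the form "LINUX_VERSION_CODE < KERNEL_VERSION(4,1,0)". */
static int needs_compat_shim(unsigned int code)
{
    return code < KVER(4, 1, 0);
}
```

200753 decodes to 3.16.49, which is below 4.1.0, so a guard based purely on the version number should have selected the fallback. Why it did not is not determinable from the output shown.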

ena driver 1.1.3 failure after a few days of uptime

Hello! I'm running an i3.4xlarge instance with ENA, on Amazon Linux, and I'm encountering intermittent network outages and ena driver error output.

Originally I was running on Amazon Linux (ECS Optimized) 2016.09.g, which is kernel 4.4.51-40.58.amzn1.x86_64. After a few times when my i3 instances were terminated by my ASG for failed instance status checks, I disabled termination, and when the issue reoccurred I was able to pull these logs from the machine: https://gist.github.com/mfenniak/56f050b9a4877d1b07388e4d01de829c. These logs represent a period when the server was unable to use the network for about 20 minutes, but eventually recovered.

I upgraded the same i3 instance to the latest available kernel package (4.9.27-14.31.amzn1.x86_64) and ena driver (1.1.3). After about 4 days, the same symptoms occurred; a network outage, similar log output: https://gist.github.com/mfenniak/d1bfa2c1c94ee9980113e70ba19ca563. This time the instance rebooted (cause unknown) and recovered.

It seems suspicious that both occurrences I've logged happened at roughly the same time after boot (335605, about 93 hrs after boot; 365557, about 101 hrs after boot), and in both instances the server would've been under similar load. I suppose I can expect to see the same issue again in about another 4 days.

Please let me know if there is any additional information I can capture, or any other way I can help diagnose or resolve this issue.

No COPYING file included

Your source says:

 * This software is available to you under a choice of one of two
 * licenses.  You may choose to be licensed under the terms of the GNU
 * General Public License (GPL) Version 2, available from the file
 * COPYING in the main directory of this source tree, or the
 * BSD license below:

But you do not include the COPYING file that you mention.

Compilation fails on kernel 3.10.0-514.2.2.el7.x86_64

DKMS install, on a CentOS 7 / RHEL 7 instance, fails with DKMS errors:


DKMS make.log for amzn-drivers-1.1.2 for kernel 3.10.0-514.2.2.el7.x86_64 (x86_64)
Tue Dec 13 16:56:11 UTC 2016
make: Entering directory `/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena'
make -C /lib/modules/3.10.0-514.2.2.el7.x86_64/build M=/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-514.2.2.el7.x86_64'
make[1]: warning: jobserver unavailable: using -j1.  Add `+' to parent make rule.
  CC [M]  /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.o
In file included from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.h:43:0,
                 from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c:53:
/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/kcompat.h:298:0: warning: "GENMASK" redefined [enabled by default]
 #define GENMASK(h, l) (((U32_C(1) << ((h) - (l) + 1)) - 1) << (l))
 ^
In file included from include/linux/kernel.h:10:0,
                 from include/linux/cpumask.h:9,
                 from include/linux/cpu_rmap.h:13,
                 from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c:36:
include/linux/bitops.h:21:0: note: this is the location of the previous definition
 #define GENMASK(h, l) \
 ^
In file included from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.h:43:0,
                 from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c:53:
/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/kcompat.h:299:0: warning: "GENMASK_ULL" redefined [enabled by default]
 #define GENMASK_ULL(h, l) (((U64_C(1) << ((h) - (l) + 1)) - 1) << (l))
 ^
In file included from include/linux/kernel.h:10:0,
                 from include/linux/cpumask.h:9,
                 from include/linux/cpu_rmap.h:13,
                 from /var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c:36:
include/linux/bitops.h:24:0: note: this is the location of the previous definition
 #define GENMASK_ULL(h, l) \
 ^
/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c: In function ‘ena_select_queue’:
/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.c:2203:3: error: implicit declaration of function ‘__netdev_pick_tx’ [-Werror=implicit-function-declaration]
   qid = __netdev_pick_tx(dev, skb);
   ^
compilation terminated due to -Wfatal-errors.
cc1: some warnings being treated as errors
make[2]: *** [/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-514.2.2.el7.x86_64'
make: *** [all] Error 2
make: Leaving directory `/var/lib/dkms/amzn-drivers/1.1.2/build/kernel/linux/ena'
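
The two "redefined" warnings in this log come from kcompat.h defining macros that the kernel headers already provide. One common way to avoid that kind of clash is an #ifndef guard, sketched below in userspace with UINT32_C standing in for the kernel's U32_C. `extract_bits()` is a hypothetical helper added only to exercise the macro. This addresses the warnings only; the error that stops the build is the `__netdev_pick_tx` one.

```c
#include <stdint.h>

/* Define the fallback only when no header has provided GENMASK already. */
#ifndef GENMASK
#define GENMASK(h, l) (((UINT32_C(1) << ((h) - (l) + 1)) - 1) << (l))
#endif

/* Returns bits h..l of value; with this fallback, h - l + 1 must be below 32. */
static uint32_t extract_bits(uint32_t value, unsigned h, unsigned l)
{
    return (value & GENMASK(h, l)) >> l;
}
```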

kABI compliance?

AMZN recommends DKMS as a method for keeping up with kernel updates.
This approach does not always work for enterprise customers for policy reasons.
A quick check against the RHEL 7.3 whitelist shows a lot of non-compliant symbols.
Are there any plans to make it kABI whitelist compliant, at least for RHEL?

cannot get ena device speed with ethtool (eu-west)

I'm running an m4.16xlarge with the ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20180126 (ami-1b791862) image.
Running ethtool ens3 yields the following output:
Settings for ens3:
Current message level: 0x000004e3 (1251)
drv probe ifup rx_err tx_err tx_done
Link detected: yes

That is, the speed is not printed.

I traced it to the following print (enabled via dynamic debug):
echo "file ena_com.c line 796 +p" > /sys/kernel/debug/dynamic_debug/control
which yields:
[ 863.844540] ena: Feature 27 isn't supported
when running the ethtool command.

It seems that ENA_ADMIN_LINK_CONFIG is not supported by the device.
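
A message like this usually comes from a capability check. The sketch below assumes the device advertises its supported features as a 32-bit mask indexed by feature id; that is an assumption about the driver's internals, not something the output above confirms. `FEATURE_LINK_CONFIG` and `feature_supported()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature id taken from the debug message above. */
#define FEATURE_LINK_CONFIG 27u

/* Hypothetical helper: true when bit feature_id is set in the reported mask. */
static bool feature_supported(uint32_t supported_features, unsigned feature_id)
{
    if (feature_id >= 32)
        return false;
    return (supported_features & (UINT32_C(1) << feature_id)) != 0;
}
```

Under that assumption, a device whose mask has bit 27 clear cannot answer a link-configuration query, and ethtool would have no speed to print.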

dmesg during initialization:
[ 10.828330] ena: Elastic Network Adapter (ENA) v1.3.0K
[ 10.831006] ena 0000:00:03.0: Elastic Network Adapter (ENA) v1.3.0K
[ 10.843325] ena: ena device version: 0.10
[ 10.845477] ena: ena controller version: 0.0.1 implementation version 1
[ 10.854098] AVX2 version of gcm_enc/dec engaged.
[ 10.856801] AES CTR mode by8 optimization enabled
[ 10.938797] ena 0000:00:03.0: creating 8 io queues. queue size: 1024
[ 10.946848] ena 0000:00:03.0: Elastic Network Adapter (ENA) found at mem f3000000, mac addr 02:db:fc:5d:93:e8 Queues 8
[ 10.953476] ena 0000:00:03.0 ens3: renamed from eth0


ethtool -i ens3
driver: ena
version: 1.3.0K
firmware-version:
expansion-rom-version:
bus-info: 0000:00:03.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no

----------------------------------------------- lspci relevant lines---------------------------------------------------------
00:03.0 Ethernet controller: Device 1d0f:ec20
Physical Slot: 3
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Latency: 0
Region 0: Memory at f3000000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [70] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <1us, L1 <8us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM not supported, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable+ Count=9 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Kernel driver in use: ena
Kernel modules: ena

Is there any way to verify that the device is configured for 25G?

fail to compile on centos 6.6

# uname -r
2.6.32-504.30.3.el6.x86_64

# yum list kernel-devel
Loaded plugins: fastestmirror, presto
Loading mirror speeds from cached hostfile
 * base: centos.aol.com
 * epel: s3-mirror-us-east-1.fedoraproject.org
 * extras: centos.mirror.constant.com
 * updates: bay.uchicago.edu
Installed Packages
kernel-devel.x86_64                                                                  2.6.32-504.30.3.el6                                                                   installed
Available Packages
kernel-devel.x86_64                                                                  2.6.32-696.18.7.el6                                                                   updates

# cat /etc/redhat-release
CentOS release 6.6 (Final)

I've tried using make and building the source RPM with rpmbuild; both hit the same issue.

make -C /usr/src/kernels/2.6.32-504.30.3.el6.x86_64 M=/root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena 'NOSTDINC_FLAGS=-I /root/rpmbuild/BUILD/ena-1.5.0/obj/default/include'
make: Entering directory `/usr/src/kernels/2.6.32-504.30.3.el6.x86_64'
  LD      /root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena/built-in.o
  CC [M]  /root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena/ena_netdev.o
/root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena/ena_netdev.c: In function 'ena_calc_io_queue_num':
/root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena/ena_netdev.c:3286: error: implicit declaration of function 'pci_msix_vec_count'
compilation terminated due to -Wfatal-errors.
make[1]: *** [/root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena/ena_netdev.o] Error 1
make: *** [_module_/root/rpmbuild/BUILD/ena-1.5.0/obj/default/kernel/linux/ena] Error 2
make: Leaving directory `/usr/src/kernels/2.6.32-504.30.3.el6.x86_64'
error: Bad exit status from /var/tmp/rpm-tmp.yJ2QIl (%build)

High retransmits with single connection on newer kernels on 20gbps instances

Background:
When we first started using ENA, we were on kernel 3.18 with the first publicly published ENA driver (before a release was tagged). At that time, iperf3 tests showed very few retransmits when using a single connection. Now we have hosts running kernel 4.4 mainline and kernel 4.9 mainline with ena driver version 1.1.3 and we see absurdly high retransmits when using a single connection, but zero retransmits when using multiple parallel connections. I verified that this is fully reproducible on the amazon linux 2017 AMI ami-a4c7edb2 with default settings.

We are running m4.16xlarge instances in placement groups. I suspect that the issue is that a single connection on an m4.16xlarge is capped at 10gbps while the NIC is capped at 20gbps so when using a single connection, linux is flooding the NIC with double the packets it can handle.

iperf3 output with single connection (kernel 4.9.35, ena 1.1.3):

[  4] local 172.18.8.230 port 17684 connected to 172.18.0.223 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.33 GBytes  20.0 Gbits/sec  222    227 KBytes       
[  4]   1.00-2.00   sec  1.18 GBytes  10.1 Gbits/sec  6239    210 KBytes       
[  4]   2.00-3.00   sec  1.18 GBytes  10.1 Gbits/sec  6933    122 KBytes       
[  4]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec  6970    297 KBytes       
[  4]   4.00-5.00   sec  1.18 GBytes  10.1 Gbits/sec  6114    157 KBytes       
[  4]   5.00-6.00   sec  1.17 GBytes  10.1 Gbits/sec  6660    218 KBytes       
[  4]   6.00-7.00   sec  1.18 GBytes  10.1 Gbits/sec  6381    122 KBytes       
[  4]   7.00-8.00   sec  1.17 GBytes  10.1 Gbits/sec  6341    122 KBytes       
[  4]   8.00-9.00   sec  1.18 GBytes  10.1 Gbits/sec  6468    175 KBytes       
[  4]   9.00-10.00  sec  1.17 GBytes  10.1 Gbits/sec  6565    218 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec  58893             sender
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec                  receiver

iperf Done.

iperf3 output with -P 2 (kernel 4.9.35, ena 1.1.3):

[  4] local 172.18.8.230 port 17714 connected to 172.18.0.223 port 5201
[  6] local 172.18.8.230 port 17716 connected to 172.18.0.223 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   0.00-1.00   sec  1.20 GBytes  10.3 Gbits/sec    0    856 KBytes       
[SUM]   0.00-1.00   sec  2.41 GBytes  20.7 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   1.00-2.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   1.00-2.00   sec  1.20 GBytes  10.3 Gbits/sec    0    856 KBytes       
[SUM]   1.00-2.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   2.00-3.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   2.00-3.00   sec  1.20 GBytes  10.3 Gbits/sec    0    856 KBytes       
[SUM]   2.00-3.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   3.00-4.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   3.00-4.00   sec  1.20 GBytes  10.3 Gbits/sec    0    856 KBytes       
[SUM]   3.00-4.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   4.00-5.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   4.00-5.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   4.00-5.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   5.00-6.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   5.00-6.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   5.00-6.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   6.00-7.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   6.00-7.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   6.00-7.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   7.00-8.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   7.00-8.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   7.00-8.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   8.00-9.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   8.00-9.00   sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   8.00-9.00   sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[  4]   9.00-10.00  sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[  6]   9.00-10.00  sec  1.20 GBytes  10.3 Gbits/sec    0    900 KBytes       
[SUM]   9.00-10.00  sec  2.40 GBytes  20.6 Gbits/sec    0             
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[  4]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[  6]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec    0             sender
[  6]   0.00-10.00  sec  12.0 GBytes  10.3 Gbits/sec                  receiver
[SUM]   0.00-10.00  sec  24.0 GBytes  20.6 Gbits/sec    0             sender
[SUM]   0.00-10.00  sec  24.0 GBytes  20.6 Gbits/sec                  receiver

iperf Done.

iperf3 output with server on kernel: 3.18.46, ena: 1.0.0 and client on kernel: 4.9.35, ena: 1.1.3 shows no improvement in retransmits:

Connecting to host black-sheep, port 5201
[  4] local 172.18.8.230 port 17746 connected to 172.18.0.223 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.17 GBytes  18.6 Gbits/sec    0   1.12 MBytes       
[  4]   1.00-2.00   sec  1.34 GBytes  11.5 Gbits/sec  6530    218 KBytes       
[  4]   2.00-3.00   sec  1.18 GBytes  10.1 Gbits/sec  4897    830 KBytes       
[  4]   3.00-4.00   sec  1.18 GBytes  10.1 Gbits/sec  5302    236 KBytes       
[  4]   4.00-5.00   sec  1.17 GBytes  10.1 Gbits/sec  6478    201 KBytes       
[  4]   5.00-6.00   sec  1.18 GBytes  10.1 Gbits/sec  7039    507 KBytes       
[  4]   6.00-7.00   sec  1.18 GBytes  10.1 Gbits/sec  7764    184 KBytes       
[  4]   7.00-8.00   sec  1.17 GBytes  10.1 Gbits/sec  6485    280 KBytes       
[  4]   8.00-9.00   sec  1.18 GBytes  10.1 Gbits/sec  6277    236 KBytes       
[  4]   9.00-10.00  sec  1.17 GBytes  10.1 Gbits/sec  6111    218 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec  56883             sender
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec                  receiver

iperf Done.

iperf3 output when client is on 3.18.46 and server is on 4.9.35 improves:

[  4] local 172.18.0.223 port 9283 connected to 172.18.8.230 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.34 GBytes  20.1 Gbits/sec  2063    201 KBytes       
[  4]   1.00-2.00   sec  1.18 GBytes  10.1 Gbits/sec  402    332 KBytes       
[  4]   2.00-3.00   sec  1.17 GBytes  10.1 Gbits/sec  400    184 KBytes       
[  4]   3.00-4.00   sec  1.17 GBytes  10.1 Gbits/sec  317    192 KBytes       
[  4]   4.00-5.00   sec  1.18 GBytes  10.1 Gbits/sec  375    192 KBytes       
[  4]   5.00-6.00   sec  1.18 GBytes  10.1 Gbits/sec  329    236 KBytes       
[  4]   6.00-7.00   sec  1.17 GBytes  10.1 Gbits/sec  362    280 KBytes       
[  4]   7.00-8.00   sec  1.18 GBytes  10.1 Gbits/sec  448    201 KBytes       
[  4]   8.00-9.00   sec  1.18 GBytes  10.1 Gbits/sec  345    323 KBytes       
[  4]   9.00-10.00  sec  1.17 GBytes  10.1 Gbits/sec  369    184 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec  5410             sender
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec                  receiver

iperf Done.

iperf3 output when client and server are both on 3.18.36 is roughly the same, minus the initial retransmit spike:

[  4] local 172.18.8.230 port 31533 connected to 172.18.0.223 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.03 GBytes  17.5 Gbits/sec    0   1.01 MBytes       
[  4]   1.00-2.00   sec  1.46 GBytes  12.5 Gbits/sec  353    245 KBytes       
[  4]   2.00-3.00   sec  1.18 GBytes  10.1 Gbits/sec  314    253 KBytes       
[  4]   3.00-4.00   sec  1.19 GBytes  10.2 Gbits/sec  263    201 KBytes       
[  4]   4.00-5.00   sec  1.17 GBytes  10.1 Gbits/sec  238    288 KBytes       
[  4]   5.00-6.00   sec  1.17 GBytes  10.1 Gbits/sec  307    288 KBytes       
[  4]   6.00-7.00   sec  1.18 GBytes  10.1 Gbits/sec  300    210 KBytes       
[  4]   7.00-8.00   sec  1.17 GBytes  10.1 Gbits/sec  290    114 KBytes       
[  4]   8.00-9.00   sec  1.17 GBytes  10.1 Gbits/sec  226    341 KBytes       
[  4]   9.00-10.00  sec  1.18 GBytes  10.1 Gbits/sec  282    210 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec  2573             sender
[  4]   0.00-10.00  sec  12.9 GBytes  11.1 Gbits/sec                  receiver

iperf Done.

If I cap iperf3's throughput at 10gbps or use tc to cap the system's throughput at 10gbps, there are zero retransmits on both kernels. I'm wondering what changed that caused the retransmits to get so out of hand. Does something need to be tuned to make the system pace itself at a max of 10gbps per-connection?
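
One application-side way to cap a single connection is the Linux socket option SO_MAX_PACING_RATE, which takes a rate in bytes per second and is typically enforced by the fq queueing discipline. The sketch below is untested against this setup and is not a confirmed fix for the retransmits. `gbps_to_bytes_per_sec()` and `cap_socket_rate()` are hypothetical helper names, and the fallback value 47 is an assumption that holds on most Linux architectures.

```c
#include <stdint.h>
#include <sys/socket.h>

/* Fallback for headers that lack the option; 47 is the usual Linux value. */
#ifndef SO_MAX_PACING_RATE
#define SO_MAX_PACING_RATE 47
#endif

/* Converts Gbit/s to bytes per second; the result fits 32 bits up to 34 Gbit/s. */
static uint32_t gbps_to_bytes_per_sec(unsigned gbps)
{
    return (uint32_t)((uint64_t)gbps * 1000000000ull / 8);
}

/* Caps the pacing rate of one socket; returns 0 on success, -1 on error. */
static int cap_socket_rate(int fd, unsigned gbps)
{
    uint32_t rate = gbps_to_bytes_per_sec(gbps);
    return setsockopt(fd, SOL_SOCKET, SO_MAX_PACING_RATE, &rate, sizeof(rate));
}
```

Calling `cap_socket_rate(fd, 10)` on a connected TCP socket requests a 10 Gbit/s ceiling for that connection only, leaving other connections free to use the remaining bandwidth.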

Instances not starting as c5.* with wheezy

3.2.0-4-amd64 #1 SMP Debian 3.2.96-2 x86_64 GNU/Linux

filename: /lib/modules/3.2.0-4-amd64/updates/dkms/ena.ko
version: 1.5.1g
license: GPL
description: Elastic Network Adapter (ENA)
author: Amazon.com, Inc. or its affiliates
srcversion: 121BE61723081CBF841045A
alias: pci:v00001D0Fd0000EC21svsdbcsci*
alias: pci:v00001D0Fd0000EC20svsdbcsci*
alias: pci:v00001D0Fd00001EC2svsdbcsci*
alias: pci:v00001D0Fd00000EC2svsdbcsci*
depends:
vermagic: 3.2.0-4-amd64 SMP mod_unload modversions
parm: debug:Debug level (0=none,...,16=all) (int)

Unfortunately, I can't provide additional logs, since the instances just plain hang and I have to force-stop them and set --no-ena-support to get them working again.

Please let me know if there's a way to provide additional debug output.

Network interface reset due to lost Tx packets

We experienced a network disconnection on our instance that lasted a little under 1 minute.

Driver info

Jul 12 07:08:09 docker-linux kernel: [20045236.927756] ena: Elastic Network Adapter (ENA) v1.1.2
Jul 12 07:08:09 docker-linux kernel: [20045236.930919] ena 0000:00:03.0: Elastic Network Adapter (ENA) v1.1.2
Jul 12 07:08:09 docker-linux kernel: [20045236.965275] AES CTR mode by8 optimization enabled
Jul 12 07:08:09 docker-linux kernel: [20045237.036805] ena: ena device version: 0.10
Jul 12 07:08:09 docker-linux kernel: [20045237.039665] ena: ena controller version: 0.0.1 implementation version 1
Jul 12 07:08:09 docker-linux kernel: [20045238.496839] ena 0000:00:03.0: creating 8 io queues. queue size: 1024
Jul 12 07:08:09 docker-linux kernel: [20045238.505780] ena 0000:00:03.0: Elastic Network Adapter (ENA) found at mem 83000000, mac addr th:is:is:not:my:mac Queues 8

Issue

The instance had these soft lockups because some processes were killed by the OOM killer:

Jul 12 15:12:04 docker-linux-build-1-dh kernel: [20074223.938094] NMI watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [java:40646]
Jul 12 15:12:04 docker-linux-build-1-dh kernel: [20074251.939111] NMI watchdog: BUG: soft lockup - CPU#48 stuck for 22s! [java:40646]
Jul 12 15:12:04 docker-linux-build-1-dh kernel: [20074291.940437] NMI watchdog: BUG: soft lockup - CPU#48 stuck for 23s! [java:40646]

Checking /var/log/syslog, we found messages like these complaining about Tx packets that weren't completed on time (times are in UTC):

Jul 12 15:12:04 docker-linux kernel: [20074195.605296] ena 0000:00:03.0 eth0: Found a Tx that wasn't completed on time, qid 5, index 0.
Jul 12 15:12:04 docker-linux kernel: [20074195.605298] ena 0000:00:03.0 eth0: Found a Tx that wasn't completed on time, qid 5, index 1.
...
Jul 12 15:12:04 docker-linux kernel: [20074195.605867] ena 0000:00:03.0 eth0: Found a Tx that wasn't completed on time, qid 5, index 1022.
Jul 12 15:12:04 docker-linux kernel: [20074195.605871] ena 0000:00:03.0 eth0: Found a Tx that wasn't completed on time, qid 5, index 1023.

It was also complaining that the number of lost Tx completions was above the threshold, and this was the action that triggered the reset:

Jul 12 15:12:04 docker-linux kernel: [20074195.605531] ena 0000:00:03.0 eth0: The number of lost tx completion is above the threshold (129 > 128). Reset the device                                                                                                  
Jul 12 15:12:04 docker-linux kernel: [20074195.605535] ena 0000:00:03.0 eth0: The number of lost tx completion is above the threshold (130 > 128). Reset the device
...
Jul 12 15:12:04 docker-linux kernel: [20074195.605869] ena 0000:00:03.0 eth0: The number of lost tx completion is above the threshold (230 > 128). Reset the device                                                                                                  
Jul 12 15:12:04 docker-linux kernel: [20074195.605872] ena 0000:00:03.0 eth0: The number of lost tx completion is above the threshold (231 > 128). Reset the device

The reset bit was set on the interface according to this line:
https://github.com/amzn/amzn-drivers/blob/master/kernel/linux/ena/ena_netdev.c#L2849

Finally, the interface was reset:
Jul 12 15:12:04 docker-linux-build-1-dh kernel: [20074195.606002] ena 0000:00:03.0 eth0: Trigger reset is on
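
The sequence in these logs (per-index "wasn't completed on time" messages, then a count compared against 128) suggests a watchdog of roughly the following shape. This is a sketch inferred from the messages only, not the driver's code. `struct tx_slot`, `should_reset()` and the timestamp convention are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-descriptor bookkeeping: sent_at == 0 means nothing in flight. */
struct tx_slot {
    uint64_t sent_at;
};

/* True when more than `threshold` in-flight slots are older than `timeout`. */
static bool should_reset(const struct tx_slot *ring, size_t n,
                         uint64_t now, uint64_t timeout, unsigned threshold)
{
    unsigned missed = 0;

    for (size_t i = 0; i < n; i++) {
        if (ring[i].sent_at != 0 && now - ring[i].sent_at > timeout)
            missed++;
    }
    return missed > threshold;
}
```

With a 1024-entry queue and a threshold of 128, as in the log, 128 stale entries are tolerated and the 129th triggers the reset, which matches the first "(129 > 128)" message.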

We want to know why this happens and how to prevent it from happening again.

Let me know if I am missing any information here.

ena interface keeps restarting on FreeBSD

I have two FreeBSD instances with the same OS and kernel versions. Unfortunately, on one instance the ena adapter keeps restarting every minute. Help is much appreciated.

Oct 2 19:59:25 nfs2 kernel: ena0: ENA admin queue is not in running state!
Oct 2 19:59:25 nfs2 kernel: ena0: Trigger reset is on
Oct 2 19:59:25 nfs2 kernel: ena0: device is going DOWN
Oct 2 19:59:26 nfs2 kernel: ena0: Allocated msix_entries, vectors (cnt: 9)
Oct 2 19:59:26 nfs2 kernel: ena0: ena0: device is going UP
Oct 2 19:59:26 nfs2 kernel: ena0: link is UP
Oct 2 19:59:26 nfs2 kernel: queue 0 - cpu 8
Oct 2 19:59:26 nfs2 kernel: ena0: queue 1 - cpu 9
Oct 2 19:59:26 nfs2 kernel: ena0: queue 2 - cpu 10
Oct 2 19:59:26 nfs2 kernel: ena0: queue 3 - cpu 11
Oct 2 19:59:26 nfs2 kernel: ena0: queue 4 - cpu 12
Oct 2 19:59:26 nfs2 kernel: ena0: queue 5 - cpu 13
Oct 2 19:59:26 nfs2 kernel: ena0: queue 6 - cpu 14
Oct 2 19:59:26 nfs2 kernel: ena0: queue 7 - cpu 15
Oct 2 19:59:33 nfs2 kernel: ena0: ENA admin queue is not in running state!
Oct 2 19:59:33 nfs2 kernel: ena0: Trigger reset is on
Oct 2 19:59:33 nfs2 kernel: ena0: device is going DOWN
Oct 2 19:59:34 nfs2 kernel: ena0: Allocated msix_entries, vectors (cnt: 9)
Oct 2 19:59:34 nfs2 kernel: ena0: ena0: device is going UP
Oct 2 19:59:34 nfs2 kernel: ena0: link is UP
Oct 2 19:59:34 nfs2 kernel: queue 0 - cpu 0
Oct 2 19:59:34 nfs2 kernel: ena0: queue 1 - cpu 1
Oct 2 19:59:34 nfs2 kernel: ena0: queue 2 - cpu 2
Oct 2 19:59:34 nfs2 kernel: ena0: queue 3 - cpu 3
Oct 2 19:59:34 nfs2 kernel: ena0: queue 4 - cpu 4
Oct 2 19:59:34 nfs2 kernel: ena0: queue 5 - cpu 5
Oct 2 19:59:34 nfs2 kernel: ena0: queue 6 - cpu 6
Oct 2 19:59:34 nfs2 kernel: ena0: queue 7 - cpu 7

FreeBSD nfs2 11.1-RELEASE-p1 FreeBSD 11.1-RELEASE-p1 #0: Wed Aug 9 11:55:48 UTC 2017 [email protected]:/usr/obj/usr/src/sys/GENERIC amd64

Module Verification Failure on Ubuntu 16.04 AMI

When following the instructions for Ubuntu 16.04 (and disabling systemd dynamic network port naming, as required on CentOS/RHEL 7), the following error appears in syslog:

ena: module verification failed: signature and/or required key missing - tainting kernel

My knowledge of kernel module signing is limited. Happy to dive deeper into the code or run further debugging if you can point me in the right direction.

Please tag a release

Rather than having to read the README for the version number, it would be nice to know it from the URL of the release tarball. A tagged release is also a stable point rather than a moving target. Please tag a release.

Error Installing ENA networking kernel driver under Debian Jessie

Hi.

While building a Debian Jessie image using bootstrap-vz, I hit an issue installing the ENA network drivers.

Bootstrap-vz pulls the amzn-drivers master branch, so I hit this while using master (currently this rev).

Here is the output from bootstrap-vz:

2017-09-26 09:29:00,185 - bootstrapvz.base.tasklist.run - INFO - Installing ENA networking kernel driver using DKMS
2017-09-26 09:29:00,233 - bootstrapvz.common.tools.root.git.handle_stderr - ERROR - Cloning into '/target/5521ed29/root/usr/src/amzn-drivers-1.0.0'...
2017-09-26 09:29:07,728 - bootstrapvz.common.tools.root.chroot.handle_stderr - ERROR - Error! Bad return status for module build on kernel: 3.16.0-4-amd64 (x86_64)
2017-09-26 09:29:07,728 - bootstrapvz.common.tools.root.chroot.handle_stderr - ERROR - Consult /var/lib/dkms/amzn-drivers/1.0.0/build/make.log for more information.

Contents of make.log

root@box:~# chroot /target/cae4cc9e/root
root@box:/# cat /var/lib/dkms/amzn-drivers/1.0.0/build/make.log
DKMS make.log for amzn-drivers-1.0.0 for kernel 3.16.0-4-amd64 (x86_64)
Tue 26 Sep 10:01:05 UTC 2017
make: Entering directory '/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena'
make -C /lib/modules/3.16.0-4-amd64/build M=/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena modules
make[1]: Entering directory '/usr/src/linux-headers-3.16.0-4-amd64'
make[1]: Entering directory `/usr/src/linux-headers-3.16.0-4-amd64'
CC [M] /var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.o
/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.c: In function ‘ena_probe’:
/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.c:3480:3: error: implicit declaration of function ‘devm_ioremap_wc’ [-Werror=implicit-function-declaration]
ena_dev->mem_bar = devm_ioremap_wc(&pdev->dev,
^
compilation terminated due to -Wfatal-errors.
cc1: some warnings being treated as errors
/usr/src/linux-headers-3.16.0-4-common/scripts/Makefile.build:262: recipe for target '/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.o' failed
make[4]: *** [/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena/ena_netdev.o] Error 1
/usr/src/linux-headers-3.16.0-4-common/Makefile:1354: recipe for target 'module/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena' failed
make[3]: *** [module/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena] Error 2
Makefile:181: recipe for target 'sub-make' failed
make[2]: *** [sub-make] Error 2
Makefile:8: recipe for target 'all' failed
make[1]: *** [all] Error 2
make[1]: Leaving directory '/usr/src/linux-headers-3.16.0-4-amd64'
Makefile:19: recipe for target 'all' failed
make: *** [all] Error 2
make: Leaving directory '/var/lib/dkms/amzn-drivers/1.0.0/build/kernel/linux/ena'
root@box:/#
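The failing symbol points at a kernel-version mismatch: `devm_ioremap_wc` is not declared in the 3.16 headers used for this build. As a rough pre-flight check, a build script could compare the target kernel release against the version in which the helper appeared. The sketch below assumes that version is 4.3 (an assumption to verify against the kernel history), and the function names are hypothetical. Distro kernels with backports can also make a plain version comparison misleading.

```python
import re

# Assumption: devm_ioremap_wc() first appeared in mainline Linux 4.3.
DEVM_IOREMAP_WC_SINCE = (4, 3)

def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.16.0-4-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))

def has_devm_ioremap_wc(release):
    """True if the kernel is new enough under the assumption above."""
    return kernel_version(release) >= DEVM_IOREMAP_WC_SINCE

if __name__ == "__main__":
    # Kernel release taken from the make.log above.
    print(has_devm_ioremap_wc("3.16.0-4-amd64"))
```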

I have worked around this issue (purely to unblock myself) by hardcoding a different version of amzn-drivers. By default bootstrap-vz seems to pull master, so I pinned it to ena_linux_1.2.0.

This issue has been raised under bootstrap-vz here.

Thanks
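For anyone hitting the same failure, the pinning workaround described in this report can be sketched roughly as follows. This is only an illustration: the helper name `pinned_clone_cmd` and the destination path are made up for the example, and bootstrap-vz's own configuration mechanism is not shown. Only the tag `ena_linux_1.2.0` comes from the report.

```python
REPO_URL = "https://github.com/amzn/amzn-drivers.git"

def pinned_clone_cmd(tag, dest):
    """Build a git command that clones a specific tag instead of master."""
    # --branch also accepts a tag name; --depth 1 skips the rest of the history.
    return ["git", "clone", "--branch", tag, "--depth", "1", REPO_URL, dest]

if __name__ == "__main__":
    # Hypothetical destination directory, chosen for illustration only.
    cmd = pinned_clone_cmd("ena_linux_1.2.0", "/usr/src/amzn-drivers-1.2.0")
    print(" ".join(cmd))
    # To run it for real, pass cmd to subprocess.run(cmd, check=True).
```

Building the command as a list keeps the tag in one place, so moving to a newer tag later is a one-line change.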

DPDK ena driver performance issue

I measured 10 Gb/s network bandwidth between two i3.large instances using iperf (using the Linux kernel ena driver for the NIC). However, when I run a DPDK application such as dpdk-iperf (now using the DPDK ena driver for the NIC) on the same two instances, I only see up to 2.5 Gb/s bandwidth.

What is the expected per-core throughput achievable with the DPDK ena driver? Is there a particular benchmark you recommend running to verify proper DPDK setup on EC2?

I am using DPDK v17.05 and I have applied the patches provided in this repo. I have ensured that my EC2 instances are in the same placement group. I am using the Amazon Linux AMI for my instances. However, I am only getting 2.5 Gb/s instead of 10 Gb/s. Any suggestions?

cannot build drivers

I am trying to build the drivers for the Oracle 3.10.0-862.el7.x86_64 kernel.
Here is the error I get for 1.5.0, 1.5.1 and 1.5.2:

+ echo 'Building for kernel: 3.10.0-862.el7.x86_64 flavors: '\''default'\'''
Building for kernel: 3.10.0-862.el7.x86_64 flavors: 'default'
+ echo 'Build var: kmodtool = /root/amzn-drivers/kernel/linux/rpm/kmodtool'
Build var: kmodtool = /root/amzn-drivers/kernel/linux/rpm/kmodtool
+ echo 'Build var: kverrel = 3.10.0-862.el7.x86_64'
Build var: kverrel = 3.10.0-862.el7.x86_64
+ for flavor in default
+ rm -rf obj/default
+ cp -r source obj/default
+ symvers=source/Module.symvers-x86_64
+ '[' -e source/Module.symvers-x86_64 ']'
++ '[' default = default ']'
+ make -C /usr/src/kernels/3.10.0-862.el7.x86_64 M=/root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena 'NOSTDINC_FLAGS=-I /root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/include'
make[1]: Entering directory `/usr/src/kernels/3.10.0-862.el7.x86_64'
  LD      /root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/built-in.o
  CC [M]  /root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/ena_netdev.o
In file included from /root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/ena_netdev.h:44:0,
                 from /root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/ena_netdev.c:53:
/root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/kcompat.h:414:6: error: nested redefinition of 'enum pkt_hash_types'
 enum pkt_hash_types {
      ^
compilation terminated due to -Wfatal-errors.
make[2]: *** [/root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena/ena_netdev.o] Error 1
make[1]: *** [_module_/root/amzn-drivers/kernel/linux/rpm/build/ena-1.5.1/obj/default/kernel/linux/ena] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-862.el7.x86_64'
error: Bad exit status from /var/tmp/rpm-tmp.i4sc4a (%build)

Performance in terms of PPS

Hi,

I'm trying to use Amazon ENA to generate RFC 2544 traffic in our solution. However, on a c5.9xlarge running Ubuntu I can only reach about 500k PPS (64 B UDP packets) at ~2% CPU. At anything higher, CPU usage skyrockets to 100% and I get packet loss.

The flow is built from a single UDP packet and is sent between two interfaces (eni-d4765480 and eni-d6765482). Both are attached to the same EC2 instance. The instance ID is i-04485508cf26bfcf6 (eu-central-1).

To generate the traffic I used TRex or pktgen. Both use DPDK.

Are those numbers expected? If so, is there any way to get ~5 MPPS on AWS?

Regards,
Adam
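To put the packet rates in this report into bandwidth terms, here is a small worked calculation. It assumes "64B" means a 64-byte Ethernet frame (the smallest RFC 2544 frame size) and adds the usual 20 bytes of per-frame overhead of physical Ethernet (preamble, start-of-frame delimiter, inter-frame gap). Whether that overhead applies on the EC2 virtual network is not something the report states, so treat the results as rough upper-level estimates only.

```python
# Per-frame overhead on physical Ethernet:
# preamble (7) + start-of-frame delimiter (1) + inter-frame gap (12).
ETH_OVERHEAD_BYTES = 20

def line_rate_gbps(pps, frame_bytes=64):
    """Bandwidth used on the wire, in Gb/s, for a given packet rate."""
    bits_per_frame = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return pps * bits_per_frame / 1e9

if __name__ == "__main__":
    # The observed rate and the target rate from the report.
    for pps in (500_000, 5_000_000):
        print(f"{pps:>9} pps -> {line_rate_gbps(pps):.3f} Gb/s")
```

Under these assumptions, 500k PPS corresponds to about 0.34 Gb/s and 5 MPPS to about 3.4 Gb/s, so a small-packet test like this is limited by packet rate rather than by raw bandwidth.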
