Thank you for reaching out to us.
We will conduct our benchmark test on c6in.metal internally and provide an update accordingly.
from amzn-drivers.
I apologize for overlooking that you are using a c6in.metal instance. It is built on the Nitro v4 system, which requires the memory BAR of the ENA to be mapped as write combined (WC); otherwise, performance may be degraded.
Please refer to our guide, which includes instructions for enabling WC. If this does not resolve your issue, please provide your instance ID and the test timeframe (UTC) so we can inspect our logs.
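A quick way to see whether the BAR ended up write combined is to locate the ENA's prefetchable BAR address and look it up in the kernel's PAT memtype list. This is a sketch only; the sample `lspci` line below is taken from later in this thread, and on a live instance you would feed in the real `lspci -v -s <BDF>` output.

```shell
# Sketch of a WC check. The sample line is from this thread; on the
# instance, use the actual `lspci -v -s 0000:09:00.0` output instead.
line="Memory at 21fffc000000 (64-bit, prefetchable) [size=4M]"
bar=$(echo "$line" | awk '{ print $3 }')   # third field is the BAR address
echo "BAR address: $bar"
# Then, on the instance (needs root):
#   grep "$bar" /sys/kernel/debug/x86/pat_memtype_list
# "write-combining" means WC is active; "uncached-minus" means it is not.
```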
I am pretty sure that I enabled WC, and I was using igb_uio.
The instance ID is i-046d3db433ead3d2a.
I am not clear on what timeframe I should provide. Can you please give me a bit more detail? Do you want me to send burst packets with different configurations and record the timeframe?
> Do you want me to send burst packets with different configurations and write the timeframe?

Exactly. Please provide the test start time in UTC so we can inspect EC2 internal logs during your test. In addition:
- Please share which UIO driver you use and the command sequence you used to bring it up and bind to it. Metal instances support the IOMMU, but you may need to disable it if you are working with igb_uio or a generic UIO driver, which do not support it.
- Please see the guide for verifying that the memory was mapped as WC.
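A simple way to confirm the IOMMU state is to look for `intel_iommu=on` on the kernel command line. The sketch below runs against a sample command line (assumed for illustration); on the instance, the same `grep` works against the live `/proc/cmdline`.

```shell
# Hypothetical check (sample cmdline shown; use /proc/cmdline on the
# instance): igb_uio needs the IOMMU off, i.e. no intel_iommu=on.
cmdline="BOOT_IMAGE=/vmlinuz-5.10 root=/dev/xvda1 console=ttyS0"
if echo "$cmdline" | grep -q 'intel_iommu=on'; then
  status="IOMMU enabled"
else
  status="IOMMU not enabled"
fi
echo "$status"
```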
1. Basically I followed the instructions in this link. Here are the commands that I used:

```
cd ~/dpdk_24_07/
git clone git://dpdk.org/dpdk-kmods
cd ~/dpdk_24_07/dpdk-kmods/linux/igb_uio/
make
sudo modprobe uio
sudo rmmod igb_uio
sudo insmod ./igb_uio.ko wc_enabled=1
cd ~/dpdk_24_07/dpdk/usertools/
sudo python3 dpdk-devbind.py --status
sudo python3 dpdk-devbind.py --unbind 0000:09:00.0
sudo python3 dpdk-devbind.py --bind=igb_uio 0000:09:00.0
```

I know that when using vfio-pci, we need to enable the IOMMU following the instructions in this link. When testing with igb_uio, I reverted the changes in /etc/default/grub, i.e. removed iommu=1 intel_iommu=on from the file, then ran grub2-mkconfig and rebooted, so the IOMMU should be disabled.
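The grub revert described above amounts to stripping the two IOMMU options from `GRUB_CMDLINE_LINUX`. The sketch below applies the edit to a sample line (the exact contents of `/etc/default/grub` and the grub.cfg path vary by distro, so treat both as assumptions):

```shell
# Sketch of the grub edit (sample line shown): remove the IOMMU options
# from GRUB_CMDLINE_LINUX, then regenerate grub.cfg and reboot.
grub_line='GRUB_CMDLINE_LINUX="console=ttyS0 iommu=1 intel_iommu=on"'
new_line=$(echo "$grub_line" | sed 's/ *iommu=1 *intel_iommu=on//')
echo "$new_line"
# Then: sudo grub2-mkconfig -o /boot/grub2/grub.cfg && sudo reboot
```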
2. I also verified that WC was enabled, using the guide you provided. I will post a screenshot eventually.
@shaibran Here is the WC check result on the c6in.metal instance; it looks like WC is not enabled properly.

```
[root@ip-172-31-41-20 ec2-user]# lspci -v -s 0000:09:00.0
09:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Subsystem: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Memory at 9d202000 (32-bit, non-prefetchable) [size=8K]
        Memory at 9d200000 (32-bit, non-prefetchable) [size=8K]
        Memory at 21fffc000000 (64-bit, prefetchable) [size=4M]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=132 Masked-
        Capabilities: [100] #19
        Capabilities: [150] Transaction Processing Hints
        Kernel driver in use: igb_uio
        Kernel modules: ena
[root@ip-172-31-41-20 ec2-user]# cat /sys/kernel/debug/x86/pat_memtype_list | grep 21fffc000000
uncached-minus @ 0x21fffc000000-0x21fffc400000
uncached-minus @ 0x21fffc000000-0x21fffc400000
```

I sent some packets with igb_uio using 16 TX lcores, from about Wed Aug 7 14:36:33 UTC 2024 to Wed Aug 7 14:37:33 UTC 2024, 273365929 packets in total.
@shaibran What could be the reason why WC is not enabled correctly?
Tried with DPDK 23.11.1: sent 532783050 packets using 16 TX lcores, from Wed Aug 7 14:51:40 UTC 2024 to Wed Aug 7 14:52:40 UTC 2024.
The WC state is the same as with DPDK 24.07, not enabled properly.

```
[root@ip-172-31-41-20 ec2-user]# lspci -v -s 0000:09:00.0
09:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Subsystem: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Physical Slot: 2
        Flags: bus master, fast devsel, latency 0, NUMA node 0
        Memory at 9d202000 (32-bit, non-prefetchable) [size=8K]
        Memory at 9d200000 (32-bit, non-prefetchable) [size=8K]
        Memory at 21fffc000000 (64-bit, prefetchable) [size=4M]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable+ Count=132 Masked-
        Capabilities: [100] #19
        Capabilities: [150] Transaction Processing Hints
        Kernel driver in use: igb_uio
        Kernel modules: ena
[root@ip-172-31-41-20 ec2-user]# cat /sys/kernel/debug/x86/pat_memtype_list | grep 21fffc000000
uncached-minus @ 0x21fffc000000-0x21fffc400000
uncached-minus @ 0x21fffc000000-0x21fffc400000
```
Oops, sorry, I was using the wrong command. I should use sudo insmod ./igb_uio.ko wc_activate=1 instead of wc_enabled=1. I will try again with this option.
I enabled WC properly this time.

```
[root@ip-172-31-41-20 ec2-user]# lspci -v -s 0000:09:00.0
09:00.0 Ethernet controller: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Subsystem: Amazon.com, Inc. Elastic Network Adapter (ENA)
        Physical Slot: 2
        Flags: fast devsel, NUMA node 0
        Memory at 9d202000 (32-bit, non-prefetchable) [size=8K]
        Memory at 9d200000 (32-bit, non-prefetchable) [size=8K]
        Memory at 21fffc000000 (64-bit, prefetchable) [size=4M]
        Capabilities: [40] Power Management version 3
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [b0] MSI-X: Enable- Count=132 Masked-
        Capabilities: [100] #19
        Capabilities: [150] Transaction Processing Hints
        Kernel driver in use: igb_uio
        Kernel modules: ena
[root@ip-172-31-41-20 ec2-user]# cat /sys/kernel/debug/x86/pat_memtype_list | grep 21fffc000000
write-combining @ 0x21fffc000000-0x21fffc400000
```

The TX packets per second improved a lot: I was able to send 17.971 Mpps with 16 TX lcores. The timeframe is from Wed Aug 7 15:16:08 UTC 2024 to Wed Aug 7 15:17:08 UTC 2024.
However, I could not reach 20 Mpps (which I believe is the limit of this AWS instance) in any case, which is disappointing. I was also able to send basically the same amount with only 8 TX lcores, meaning that increasing the number of TX lcores does not increase the total throughput for some reason.
Also, there is a big difference when I enable/disable UDP port cycling. When I sent from a fixed UDP source port to a fixed UDP destination port, the performance decreased a lot. When I sent from a continuously changing UDP source port to a changing UDP destination port, the performance was much better than with fixed ports. I was wondering why. @shaibran
Another timeframe: from Wed Aug 7 15:30:17 UTC 2024 to Wed Aug 7 15:31:17 UTC 2024. Sent with 10 TX lcores, 1095894515 packets in total, which is equivalent to 18.264 Mpps. This result is almost the same as when I used the vfio-pci module.
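As a sanity check on the rate arithmetic: the timeframe above is a 60-second window, so the packet count converts to Mpps directly.

```shell
# Rate check: packets sent over the 60-second window, converted to Mpps.
total_pkts=1095894515
duration_s=60
mpps=$(awk -v p="$total_pkts" -v d="$duration_s" 'BEGIN { printf "%.3f", p / d / 1e6 }')
echo "$mpps Mpps"   # rounds to 18.265, matching the ~18.264 figure above
```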
Can I ask what the optimized configuration is to get the maximum TX PPS? @shaibran
On the c6in.metal instance, I have 128 virtual cores on 2 NUMA nodes, and each NUMA node has one physical NIC attached. I also know there are 32 TX and 32 RX rings for each NIC.
During some experiments, I found that we could reach the limit, i.e. 20 Mpps, when using 32 TX lcores with each TX lcore sending packets from a fixed UDP port to a fixed UDP port (though the destination port for each TX ring was different).
But with 32 TX lcores, when I added UDP port cycling for each packet, i.e. incrementing the destination UDP port for each packet in each TX thread, the performance decreased a lot (to below 15 Mpps).
As documented, each EC2 instance has a maximum PPS performance based on its type and size. The metrics that AWS provides can be queried using aws ec2 describe-instance-types, and they include only bandwidth.
We tested the c6in metal instance and found that it met the Key Performance Indicators (KPIs). A review of your instance's internal logs did not show any issues.
Matching applications and algorithms to the underlying architecture is beyond the PMD scope, but we can share some best practices:
- Identify the NUMA socket to which the adapter is attached and use only the lcores on that NUMA node. Note that this varies from platform to platform and cannot be assumed to be sequential. Avoid using lcore 0, as it is the primary core used by Linux.
- Ensure the application handles pushback from the device. If the application floods the device, handling the dropped packets will consume CPU resources and lead to performance degradation.
- Track the instance allowance via xstats (e.g., pps_allowance_exceeded).
All the best,
Shai
Thank you.
- I am sure I used only the lcores on the corresponding NUMA nodes. I admit that I used lcore 0, but I am not sure it would impact the performance too much. I can try avoiding lcore 0, though.
- Can you please explain in more detail? How can I know if there is pushback from the device, and how can I avoid or handle it properly?
- Can we get xstats when I am using DPDK? If I understand correctly, I cannot get stats information using the ethtool command on DPDK-bound NICs.
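For reference, DPDK exposes these counters through its xstats API (rte_eth_xstats_get() in an application, or `show port xstats 0` in testpmd) rather than through ethtool. A sketch of filtering the allowance counters, using assumed sample output (the counter values here are hypothetical; the names come from the ENA PMD discussion above):

```shell
# Filter the allowance counters from captured xstats output. Sample lines
# are assumed for illustration; real output comes from the DPDK app.
xstats='tx_good_packets: 273365929
pps_allowance_exceeded: 1024
bw_out_allowance_exceeded: 0'
filtered=$(echo "$xstats" | awk -F': ' '/allowance_exceeded/ { print $1 "=" $2 }')
echo "$filtered"
```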
You said "each EC2 instance has a maximum PPS performance based on its type and size"; where can I find this value? If I understand correctly, I can only find bandwidth, not a PPS limit.
You also mentioned, "We tested the c6in metal instance and found that it met the Key Performance Indicators (KPI)." Can you tell me which KPIs you met and what kind of test you conducted? I am particularly interested in how you assigned lcores when using DPDK, what values you used for TX descriptor numbers, etc., and how many PPS you were able to reach.
- The metrics that AWS provides externally are those available via the aws ec2 describe-instance-types command. Our KPIs are not available for sharing.
- All statistics should be retrieved from the application bound to the network interface; please refer to the provided instructions. Developing a DPDK application falls outside the scope of PMD expertise, so we recommend consulting the open-source DPDK documentation and example applications, such as testpmd.
- As you already observed during your testing, you should utilize the instance CPU resources in order to improve PPS (multiple flows).
@shaibran I have checked the performance benchmark test results here.
In the results of Test #10, Zero packet loss test on Intel server of c6in.8xlarge AWS cloud instance, when the frame size is 500 bytes, the frame rate is 5.4 Mpps and the line rate is 44.93%. There is a note claiming that throughput is limited by the AWS instance configuration on the ENA NIC driver. I wonder what exactly this limit is: is it a limit on PPS, BPS, or something else?
Also, in Test #10 of other similar tests, why can't we achieve full line rate even when sending 1518-byte frames?
Additionally, which parameters did they (I am not sure if this test was done by an Intel or AWS engineer) use for the TRex packet generator? Were the packets TCP or UDP? What were the distributions of IP addresses and ports of the sent packets?
Thanks in advance.
This report was not crafted by AWS; please reach out to the publisher of that report for the technical details.
EC2 virtualized instance types have performance limiters across all metrics. However, these limiters are enforced at the underlying infrastructure level, not within the driver as the report suggests.
As I wrote before, the performance metrics that AWS provides externally are those available via the aws ec2 describe-instance-types command. Our KPIs are not available for sharing.