Coyote's Introduction


OS for FPGAs

Coyote is a framework that offers operating system abstractions and a variety of shared networking (RDMA, TCP/IP), memory (DRAM, HBM) and accelerator (GPU) services for modern heterogeneous platforms with FPGAs, targeting data centers and cloud environments.

Some of Coyote's features:

  • Multiple isolated virtualized vFPGA regions (with individual VMs)
  • Nested dynamic reconfiguration (independently reconfigurable layers: Static, Service and Application)
  • RTL and HLS user logic support
  • Unified host and FPGA memory with striping across virtualized DRAM/HBM channels
  • TCP/IP service
  • RDMA RoCEv2 service (compliant with Mellanox NICs)
  • GPU service
  • Runtime scheduler for different host user processes
  • Multithreading support

Prerequisites

The full Vivado/Vitis suite is needed to build the hardware side of things. The Vivado Hardware Server is enough for deployment-only scenarios. Coyote runs with Vivado 2022.1; previous versions can be used at one's own peril.

We currently only actively support the AMD Alveo U55C accelerator card. The codebase offers some legacy support for the following platforms: VCU118, Alveo U50, Alveo U200, Alveo U250 and Alveo U280, but we are not actively working with these cards anymore. Coyote is currently being developed on the HACC cluster at ETH Zurich. For more information and possible external access, check out the following link: https://systems.ethz.ch/research/data-processing-on-modern-hardware/hacc.html

CMake is used for project creation. Additionally, the Jinja2 template engine for Python is used for some of the code generation. The API is written in C++; C++17 should suffice (for now).

If networking services are used, you will need a valid UltraScale+ Integrated 100G Ethernet Subsystem license set up in Vivado/Vitis to generate the design.

To run virtual machines on top of individual vFPGAs, the following packages are needed: qemu-kvm, build-essential and kmod.
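
As a quick sanity check before booting a VM, the presence of these packages can be verified with a short script. This is a minimal sketch assuming a Debian/Ubuntu host with dpkg available (the package names above are Debian-style):

```shell
# check_vm_prereqs: report which of the VM-related packages from this README
# are installed. Assumes a Debian/Ubuntu host; dpkg -s succeeds only for
# installed packages.
check_vm_prereqs() {
    for pkg in qemu-kvm build-essential kmod; do
        if dpkg -s "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing"
        fi
    done
}

check_vm_prereqs
```

Missing packages can then be installed with the distribution's package manager, e.g. `sudo apt install qemu-kvm build-essential kmod` on Ubuntu.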

Quick Start

Clone the repo and initialize all submodules:

$ git clone --recurse-submodules https://github.com/fpgasystems/Coyote

Build HW

To build an example hardware project (generate a shell image):

$ mkdir build_hw && cd build_hw
$ cmake <path_to_cmake_config> -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

It is good practice to generate the hardware build in a subfolder of examples_hw, since this directory already contains the CMakeLists.txt that needs to be referenced. In this case, the procedure looks like this:

$ mkdir examples_hw/build_hw && cd examples_hw/build_hw 
$ cmake ../ -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

The available target examples are specified in examples_hw/CMakeLists.txt and allow you to build a variety of interesting design constellations; e.g., rdma_perf will create an RDMA-capable Coyote NIC.

Generate all projects and compile all bitstreams:

$ make project 
$ make bitgen

The bitstreams will be generated under the bitstreams directory. This initial bitstream can be loaded via JTAG. Further custom shell bitstreams can then all be loaded dynamically.

A netlist with the official static-layer image is already provided under hw/checkpoints. We suggest you build your shells on top of this image. This default image is built with -DEXAMPLE=static.

Build SW

The provided software applications (as well as any others) can be built with the following commands:

$ mkdir build_sw && cd build_sw
$ cmake <path_to_cmake_config>
$ make

Similar to building the HW, it makes sense to build within the examples_sw directory for direct access to the provided CMakeLists.txt:

$ mkdir examples_sw/build_sw && cd examples_sw/build_sw 
$ cmake ../ -DEXAMPLE=<target_example> -DVERBOSITY=<ON or OFF>
$ make

The software stack can be built in verbose mode, which generates extensive printouts during execution. This is controlled via the VERBOSITY toggle in the CMake call; by default, verbosity is turned off.

Build Driver

After the bitstream is loaded, the driver can be inserted once for the initial static image.

$ cd driver && make
$ insmod coyote_drv.ko <any_additional_args>
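
To confirm that the module actually loaded, a quick check like the following can help. This is a hedged sketch: the module name coyote_drv is taken from the insmod command above, and the /dev/fpga_<d>_v<i> device naming follows reports elsewhere on this page; adjust both if your setup differs.

```shell
# check_coyote_loaded: verify that the coyote_drv module shows up in lsmod.
check_coyote_loaded() {
    if lsmod 2>/dev/null | grep -q '^coyote_drv'; then
        echo "coyote_drv loaded"
    else
        echo "coyote_drv not loaded"
        return 1
    fi
}

# Usage (after insmod):
#   check_coyote_loaded && ls -la /dev/fpga_*
```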

Provided examples

Coyote comes with a number of pre-configured example applications that can be used to test the shell capabilities and system performance, or as a starting point for your own developments around networking or memory offloading. The following list (to be continued in the future) gives an overview of the existing example apps, how to set them up in hardware and software, and how to use them:

kmeans

multithreading

perf_fpga

perf_local

rdma_service

reconfigure_shell

streaming_service

tcp_iperf

Deploying on the ETHZ HACC-cluster

The ETHZ HACC is a premier cluster for research in systems, architecture and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware equipment provides an ideal environment for Coyote-based experiments, since users can book up to 10 servers with U55C accelerator cards connected via a fully switched 100G network. User accounts for this platform can be obtained following the explanation on the homepage cited above.

Interaction with the HACC cluster can be simplified by using the sgutil run-time commands, which also make it easy to program the accelerator with a Coyote bitstream and insert the driver. For this purpose, the script program_coyote.sh has been generated. Under the assumption that the hardware project has been created in examples_hw/build and the driver has already been compiled in driver, the workflow looks like this:

$ bash program_coyote.sh examples_hw/build/bitstreams/cyt_top.bit driver/coyote_drv.ko

Obviously, the paths to cyt_top.bit and coyote_drv.ko need to be adapted if a different build structure was chosen. Successful completion of this process can be checked via a call to

$ dmesg

If the driver insertion went through, the last printed message should be probe returning 0. Furthermore, the dmesg printout should contain a line set network ip XXXXXXXX, mac YYYYYYYYYYYY, which displays the IP and MAC of the Coyote NIC if networking has been enabled in the system configuration.
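
The dmesg check can also be scripted. A minimal sketch, grepping for the success string quoted above (adjust the pattern if your driver version logs differently):

```shell
# coyote_probe_ok: succeed if the given kernel-log text contains the driver's
# success message ("probe returning 0", as quoted in this README).
coyote_probe_ok() {
    echo "$1" | grep -q "probe returning 0"
}

# Usage on a real system:
#   coyote_probe_ok "$(dmesg | tail -n 100)" && echo "driver OK"
# If networking is enabled, the NIC's IP/MAC line can be extracted with:
#   dmesg | grep "set network ip"
```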

Publication

If you use Coyote, please cite us:

@inproceedings{coyote,
    author = {Dario Korolija and Timothy Roscoe and Gustavo Alonso},
    title = {Do {OS} abstractions make sense on FPGAs?},
    booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
    year = {2020},
    pages = {991--1010},
    url = {https://www.usenix.org/conference/osdi20/presentation/roscoe},
    publisher = {{USENIX} Association}
}

License

Copyright (c) 2023 FPGA @ Systems Group, ETH Zurich

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Coyote's People

Contributors

d-kor, dgiantsidi, jedichen121, jonasdann, linvogel, maximilianheer, rbshi, twk119, zhenhaohe


Coyote's Issues

Fail to compile for example pr_scheduling

I'm trying to test the partial reconfiguration feature in Coyote. I've successfully compiled and run the examples bmark_host and bmark_fpga. However, for pr_scheduling it couldn't compile and showed the following error:

**** Synthesis passed
****
**** CERR: ERROR: [Common 17-53] User Exception: No open design. Please open an elaborated, synthesized or implemented design before executing this command.

****
INFO: [Common 17-206] Exiting Vivado at Thu Aug 25 12:05:21 2022...
make[3]: *** [CMakeFiles/compile.dir/build.make:57: CMakeFiles/compile] Error 1
make[2]: *** [CMakeFiles/Makefile2:68: CMakeFiles/compile.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:75: CMakeFiles/compile.dir/rule] Error 2
make: *** [Makefile:118: compile] Error 2

This happened both for Alveo U50 (on Ubuntu 20.04) and Alveo U280 (on Ubuntu 22.04). I'm using Vivado 2021.2. Can you give me some suggestions on how to fix this problem? Thanks a lot!

unable to specify path for vivado hls

Hi
Unable to specify the path for Vivado HLS; also, Vivado HLS is deprecated and should now be Vitis HLS. Please help.

/home/peter/workspace/Coyote/hw/build>cmake .. -DFDEV_NAME=u250 -DVIVADO_HLS_ROOT_DIR=/home/peter/tools/Xilinx/Vitis_HLS
CMake Warning at /home/peter/workspace/Coyote/cmake/FindVivadoHLS.cmake:21 (message):
Vivado HLS not found.
Call Stack (most recent call first):
ext/network/hls/arp_server_subnet/CMakeLists.txt:30 (find_package)

CMake Error at ext/network/hls/arp_server_subnet/CMakeLists.txt:32 (message):
Vivado HLS not found.

-- Configuring incomplete, errors occurred!
See also "/home/peter/workspace/Coyote/hw/build/CMakeFiles/CMakeOutput.log".

OS hanging when fpga_tlb_miss_isr():(irq=107) page fault ISR

@d-kor ,

Hello d-kor, when I was testing Coyote I encountered the following problem:

When a page fault prompt appears and the page-fault exception is handled, the operating system hangs.

dmesg information is as follows,
fpga_tlb_miss_isr():(irq=107) page fault ISR
fpga_tlb_miss_isr():page fault, vaddr 7ffbc5402000, length 40, cpid 0
tlb_get_user_pages():pid found = 2823
tlb_get_user_pages():allocated 8 bytes for page pointer array for 1 pages @0x000000001bd3b688, passed size 64.
tlb_get_user_pages():pages=0x000000001bd3b688
tlb_get_user_pages():first = 7ffbc5402, last = 7ffbc5402
tlb_get_user_pages():get_user_pages_remote(7ffbc5402000, n_pages = 1, page start = 7a19000000, hugepages = 0)
tlb_get_user_pages():could not get all user pages, -14
fpga_tlb_miss_isr():pages could not be obtained

When I use huge pages (HUGE_2M), the dmesg information on a page fault is similar to the above, and it still shows hugepages = 0 (i.e., huge pages are not enabled).

QDMA Support

Hello,

This is Hongshi from NUS HACC. May I know if there is any schedule for supporting QDMA?

If not, I would like to contribute to QDMA-related features for Coyote. Since the current project is quite large, please let me know if there are any details, like the milestones, etc., I need to follow.

[Need help]: IO port is missing a buffer

When I run implementation, I encounter the following errors. I really want to explore your project, but I'm very new to this, and I'd really appreciate your help in solving them.

Starting DRC Task
INFO: [DRC 23-27] Running DRC with 8 threads
ERROR: [DRC INBB-3] Black Box Instances: Cell 'inst_dynamic/inst_user_wrapper_0' of type 'design_user_wrapper_0' has undefined contents and is considered a black box.  The contents of this cell must be defined for opt_design to complete successfully.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_bscanid_en should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_capture should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_drck should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_runtest should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_sel should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_shift should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_tdo should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port m_axis_dyn_out_0_tdata[0] should be connected to an IO cell such as an [IO]BUF*.

ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port m_axis_dyn_out_0_tdata[183] should be connected to an IO cell such as an [IO]BUF*.
INFO: [Common 17-14] Message 'DRC RPBF-1' appears 100 times and further instances of the messages will be disabled. Use the Tcl command set_msg_config to change the current settings.
INFO: [Project 1-461] DRC finished with 4148 Errors
INFO: [Project 1-462] Please refer to the DRC report (report_drc) for more information.
ERROR: [Vivado_Tcl 4-78] Error(s) found during DRC. Opt_design not run.

Time (s): cpu = 00:00:03 ; elapsed = 00:00:02 . Memory (MB): peak = 7068.531 ; gain = 0.000 ; free physical = 48516 ; free virtual = 55234
INFO: [Common 17-83] Releasing license: Implementation
128 Infos, 199 Warnings, 9 Critical Warnings and 102 Errors encountered.
opt_design failed
ERROR: [Common 17-39] 'opt_design' failed due to earlier errors.

typo in main.cpp of bmark_fpga

In line 143: << std::setprecision(2) << std::setw(5) << vctr_avg(time_bench_wd) << " [ns]" << std::endl;.
It should be time_bench_wr instead of time_bench_wd.

Build system misconfiguration for refdesign with Coyote + RDMA + U250

Reproduction

source <VITIS_INSTALL>/settings64.sh
cd "test/refdesigns"
make MODE=coyote_rdma PLATFORM=xilinx_u250_gen3x16_xdma_4_1_202210_1

Error message

"Number of DDR channels misconfigured"

Details

N_DDR_CHAN is not correctly set for U250 + RDMA in 'test/refdesigns/Coyote/hw/config.cmake'

Error Building Static Application using CMake (U250)

I'm getting an error message when building the static application for u250. I run the following command:

cmake ../CMakeLists.txt -DFDEV_NAME=u250 -DEXAMPLE=static

Error Message

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Vivado at /home/celeris/tools/Xilinx_2022.1/Vivado/2022.1.
-- Found Vivado/Vitis HLS at /home/celeris/tools/Xilinx_2022.1/Vitis_HLS/2022.1.
** Vitis toolchain
** Static design flow
** Target platform u250
CMake Error at /home/celeris/Documents/aqdas/coyote-v2/Coyote/cmake/FindCoyoteHW.cmake:398 (message):
  Number of DDR channels misconfigured.
Call Stack (most recent call first):
  CMakeLists.txt:31 (validation_checks_hw)


-- Configuring incomplete, errors occurred!
See also "/home/celeris/Documents/aqdas/coyote-v2/Coyote/examples_hw/CMakeFiles/CMakeOutput.log".

Can you please look into the issue and provide an update?

bitgen error

user_logic_error

bitgen_error

The vFPGA_top.sv instantiates some modules that cannot be found. Can anyone help locate them?

rdma_base_slv inst_rdma_base_slv (
    .aclk(aclk),
    .aresetn(aresetn),

    .axi_ctrl(axi_ctrl),

    .mux_ctid(mux_ctid)
);

mux_host_card_rd_rdma inst_mux_send (
    .aclk(aclk),
    .aresetn(aresetn),

    .mux_ctid(mux_ctid),
    .s_rq(rq_rd),
    .m_sq(sq_rd),

    .s_axis_host(axis_host_recv[0]),
    .s_axis_card(axis_card_recv[0]),
    .m_axis(/*axis_rdma_send[0]*/)
);

mux_host_card_wr_rdma inst_mux_recv (
    .aclk(aclk),
    .aresetn(aresetn),

    .mux_ctid(mux_ctid),
    .s_rq(rq_wr),
    .m_sq(sq_wr),

    .s_axis(/*axis_rdma_recv[0]*/),
    .m_axis_host(axis_host_send[0]),
    .m_axis_card(axis_card_send[0])
);

A bug occurred when canceling AVX

After I create the project by executing the following instructions
cmake -DFDEV_NAME=u50 -DEXAMPLE=gbm_dtrees -DEN_AVX=0 ..
make shell
make compile

The following error occurred at compile time:
ERROR: [Synth 8-524] part-select [95:64] out of range of prefix 'axi_rdata_bram' [/home/me/Coyote/hw/hdl/slave/cnfg_slave.sv:685]
ERROR: [Synth 8-6156] failed synthesizing module 'cnfg_slave' [/home/me/Coyote/hw/hdl/slave/cnfg_slave.sv:33]
ERROR: [Synth 8-6156] failed synthesizing module 'tlb_region_top' [/home/me/Coyote/hw/hdl/mmu/tlb_region_top.sv:42]
ERROR: [Synth 8-196] conditional expression could not be resolved to a constant [/home/me/Coyote/hw/hdl/mmu/tlb_top.sv:158]
ERROR: [Synth 8-6156] failed synthesizing module 'tlb_top' [/home/me/Coyote/hw/hdl/mmu/tlb_top.sv:41]
ERROR: [Synth 8-6156] failed synthesizing module 'design_dynamic_wrapper' [/home/me/Coyote/hw/build/lynx/hdl/wrappers/common/dynamic_wrapper.sv:8]
ERROR: [Synth 8-6156] failed synthesizing module 'top' [/home/me/Coyote/hw/build/lynx/hdl/wrappers/common/top_u50.sv:11]

rdma test failed

@d-kor hi,
I tested RDMA_PERF, but it failed, indicating that it cannot connect.

I successfully built the build_perf_rdma_host_hw and build_perf_rdma_card_hw projects
I successfully insmod driver and compiled build_perf_rdma_sw. 

My testing method is as follows,
host0:  fpga0 with build_perf_rdma_host_hw bit    IP:192.168.0.4
host1:  fpga1 with build_perf_rdma_card_hw bit    IP:192.168.0.5
Pinging 192.168.0.4 and 192.168.0.5 works.

I execute build_perf_rdma_sw as follows,
on host0:   sudo ./build_perf_rdma_sw/main  --reps 100  --mins 128  --maxs 2048
on host1:   sudo ./build_perf_rdma_sw/main  --tcpaddr 192.168.0.4  --reps 100  --mins 128  --maxs 2048

The information displayed after executing the application build_perf_rdma_sw is as follows
on host0:
                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...

on host1 :
               
                -- PARAMS
                -----------------------------------------------
                TCP master IP address: 192.168.0.4
                IBV IP address: 192.168.0.5
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Slave side exchange started ...
                terminate called after throwing an instance of 'std::runtime_error'
                         what():  Could not connect to master: 192.168.0.4:18488
                Aborted

The information displayed after terminating the application build_perf_rdma_sw on host0 is as follows:

                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...
                ^Cterminate called after throwing an instance of 'std::runtime_error'
                what():  Accept failed

Coyote v2 RDMA fails under certain benchmark

I was testing Coyote v2 with the rdma perf hw design and the rdma services sw application.

  1. The RDMA read benchmark is unstable and fails under the default number of repetitions specified in the sw. The experiment below does not return.

./bin/test -d 0 -i 0 -t 10.1.212.177 -x 2048
Queue pair:
Local : QPN 0x000000, PSN 0x22b267, VADDR 00007fe912200000, SIZE 00010000, IP 0x0afd4a60
Remote: QPN 0x000000, PSN 0x30c5c7, VADDR 00007feefbc00000, SIZE 00010000, IP 0x0afd4a5c
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 19.94 [MB/s], latency: 33100.42 [ns]
2048 [bytes], thoughput: 2124.81 [MB/s], latency: 8167.80 [ns]

  2. The RDMA write benchmark does not scale beyond 4K message size:

./bin/test -d 0 -i 0 -t 10.1.212.175 -x 1024 -r 10 -l 10 -w 1
Queue pair:
Local : QPN 0x000000, PSN 0x9bd652, VADDR 00007fbc23e00000, SIZE 00010000, IP 0x0afd4a58
Remote: QPN 0x000000, PSN 0xa03ec3, VADDR 00007fe9b5400000, SIZE 00010000, IP 0x0afd4a54
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 870.19 [MB/s], latency: 5824.05 [ns]
2048 [bytes], thoughput: 1976.83 [MB/s], latency: 6007.90 [ns]
4096 [bytes], thoughput: 3813.60 [MB/s], latency: 6559.50 [ns]
^Cterminate called after throwing an instance of 'std::runtime_error'
what(): Stalled, SIGINT caught
Aborted

Setting both EN_HLS and EN_PR does not work

Setting both EN_HLS=1 and EN_PR=1 leads to a synthesis issue.

Steps to reproduce for Alveo U200 (but I guess it does not work on any platform):

cd hw
mkdir build
cd build
cmake .. -DFDEV_NAME=u200 -DEXAMPLE=hyperloglog -DEN_PR=1 ..
make shell compile

The first error in the file lynx/lynx.runs/design_user_wrapper_c0_0_synth_1/runme.log is:

ERROR: [Synth 8-439] module 'design_user_hls_c0_0' not found [REDACTED/lynx/hdl/wrappers/config_0/user_wrapper_c0_0.sv:121]

TCP stack?

HI,

The TCP stack does not seem to be supported; network_top.sv only has a wrapper for RoCE. Any plans to add TCP?

Best,
Yang

Clean up req_t/ack_t

The request/completion structures contain several stale fields which should be cleaned up. Also, mode should be merged into opcode, as it doesn't currently provide any additional information.

Specifically, the following fields look like they can be removed:

  • remote
  • host
  • rdma
  • mode (can be replaced by inspecting verb)

set data_width=8, network hls error

After set(DATA_WIDTH 8 CACHE STRING "Data width"), the HLS integration of the network stack reports an error when I make the shell. The error message is as follows:

(1)ERROR: [HLS 200-70] Compilation errors found: In file included from hw/services/network/hls/ip_handler/ip_handler.cpp:1:
hw/services/network/hls/ip_handler/ip_handler.cpp:539:2: error: no matching function for call to 'ip_handler_compute_ipv4_checksum'
ip_handler_compute_ipv4_checksum(ipDataMetaFifo, ipDataCheckFifo, iph_subSumsFifoOut);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(2)hw/services/network/hls/ip_handler/ip_handler.cpp:656:2: note: in instantiation of function template specialization 'ip_handler<64>' requested here
ip_handler<DATA_WIDTH>(s_axis_raw,
(3)hw/services/network/hls/ip_handler/../ipv4/ipv4.hpp:361:6: note: candidate function not viable: no known conversion from 'hls::stream<net_axis<64> >' to 'hls::stream<net_axis<512> > &' for 1st argument;
void ip_handler_compute_ipv4_checksum( hls::stream<net_axis<512> >& dataIn,

N_OUTSTANDING

N_OUTSTANDING is a parameter when using CMake to generate the project, but in hw/hdl/network/rdma/rdma_flow.sv, RDMA_N_OST is hardcoded as 16. Is this a bug or a feature?

Runtime error in perf_host

I built the perf_host design for U280. I could generate the bitstream, program the alveo, and load the driver. However, I got the following error when I tried to run the sw executable.

Number of regions: 3
Hugepages: 0
Mapped pages: 1
Number of allocated pages: 8
Number of repetitions: 100
Starting transfer size: 128
Ending transfer size: 32768
terminate called after throwing an instance of 'std::runtime_error'
what(): cProcess could not be obtained, vfid: 0
Aborted (core dumped)

Any thoughts on how to debug?
Thanks!

Kernel BUG on probe

I am experiencing a kernel BUG at Coyote/driver/fpga_dev.c:108 (accl_integration branch)
I am currently using an Alveo U280. Linux Kernel: 6.5

My steps:

  • I made sure that coyote driver was not loaded
  • I used Vivado Hardware Manager 2022.2 to load the bitstream
  • I ran sw/util/hot_reset
  • lspci | grep Xilinx shows that the board is correctly identified
  • I launch sudo insmod coyote_drv.ko

This last command hangs, because of the kernel bug.

I also experienced this error previously, with the same setup apart from the Linux kernel, which was 5.15

RDMA Server Exits Abruptly

I've synthesized the design and run the hardware on an AU-250. It works properly, shows RDMA Enabled 1 in the dmesg logs, and provides me with a MAC and IP.

When I try to run the rdma_server application, it gives the following output and exits abruptly.

I'm assuming that the device and vfid are the numbers listed in ls -la /dev/fpga_{d}_v{i}. I use the following command to run the rdma_server:

./bin/test -d 0 -i 0

which gives the output

Forking...


Has anyone faced anything similar, or is there a bug that needs to be addressed?

rocev2 test cases

Hi,
Could you please provide test cases for the RoCE module? In make.tcl it mentions the following:

write_read_read_large_receiver.in
write_read_read_large_receiver.out
rdma_txwriteread.in
rdma_txwriteread.out

Or, if anyone has any of the above, could you share it?
Thanks!

Out-of tree SW builds fail

The instructions to build software in the Coyote documentation don't work for out-of-tree builds because FindCoyoteSW.cmake uses CMAKE_SOURCE_DIR to identify sources, and this variable is set to the location of the caller's CMakeLists.txt. In the case of the Coyote software examples in examples_sw, it just happens that the relative paths overlap.

You can confirm that this is the case as follows:

  1. move examples_sw to a new directory e.g. Coyote/foo/examples_sw and cd into it
  2. edit line 4 of Coyote/foo/examples_sw/CMakeLists.txt to point to the Coyote top-level folder
  3. run cmake -DEXAMPLES=reconfigure_shell .

The output of this will be

CMake Error at Coyote/cmake/FindCoyoteSW.cmake:93 (add_library):
No SOURCES given to target: Coyote
Call Stack (most recent call first):
CMakeLists.txt:7 (find_package)

This can be fixed by using CMAKE_CURRENT_LIST_DIR instead of CMAKE_SOURCE_DIR in FindCoyoteSW.cmake.

OSDI paper

Hi, fascinating work and thanks for open-sourcing it. Is it possible to make your OSDI paper public? Can't wait to read it!

Segmentation fault

After I finished compiling the hw section and flashing the bitstream through the JTAG interface onto the VCU118, I executed
insmod coyote_drv.ko
A segmentation fault occurred, as shown below:
[image]
And the printed kernel information is as follows:
[Screenshot from 2022-11-18 10-50-01]
The read configuration information is all 1s; I do not know the cause.

cProcess::invoke(...) potential bug (avx disabled)

Coyote/sw/src/cProcess.cpp

Lines 442 to 460 in 78026e5

cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::VADDR_RD_REG)] = reinterpret_cast<uint64_t>(cs_invoke.src_addr);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::LEN_RD_REG)] = cs_invoke.src_len;
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::CTRL_REG)] =
(isRead(cs_invoke.oper) ? CTRL_START_RD : 0x0) |
(cs_invoke.clr_stat ? CTRL_CLR_STAT_RD : 0x0) |
(cs_invoke.stream ? CTRL_STREAM_RD : 0x0) |
((cs_invoke.dest & CTRL_DEST_MASK) << CTRL_DEST_RD) |
((cpid & CTRL_PID_MASK) << CTRL_PID_RD) |
(cs_invoke.oper == CoyoteOper::SYNC ? CTRL_SYNC_WR : 0x0);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::VADDR_WR_REG)] = reinterpret_cast<uint64_t>(cs_invoke.dst_addr);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::LEN_WR_REG)] = cs_invoke.dst_len;
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::CTRL_REG)] =
(isWrite(cs_invoke.oper) ? CTRL_START_WR : 0x0) |
(cs_invoke.clr_stat ? CTRL_CLR_STAT_WR : 0x0) |
(cs_invoke.stream ? CTRL_STREAM_WR : 0x0) |
((cpid & CTRL_PID_MASK) << CTRL_PID_WR) |
(cs_invoke.oper == CoyoteOper::OFFLOAD ? CTRL_SYNC_RD : 0x0);

Coyote/sw/include/cDefs.hpp

Lines 299 to 309 in 78026e5

constexpr auto isRead(CoyoteOper oper) {
return oper == CoyoteOper::READ || oper == CoyoteOper::OFFLOAD || oper == CoyoteOper::TRANSFER;
}
constexpr auto isWrite(CoyoteOper oper) {
return oper == CoyoteOper::WRITE || oper == CoyoteOper::SYNC || oper == CoyoteOper::TRANSFER;
}
constexpr auto isSync(CoyoteOper oper) {
return oper == CoyoteOper::OFFLOAD || oper == CoyoteOper::SYNC;
}

Do we need to swap lines 450 and 460?
Are reading and writing defined from the perspective of the FPGA card?
If so, the following applies for the data flow direction:

Reading:
Host RAM |--- CTRL_STREAM_RD set ---> vfpga x (x=1,..)
Host RAM |--- ::OFFLOAD set ---> local fpga memory

Writing :
Host RAM <--- CTRL_STREAM_WR ---| vfpga x
Host RAM <--- ::SYNC ---| local fpga memory

Is this correct ?

make compile error

When I run compile.tcl, the error is as follows:

set i 1

while {[file isdirectory "$proj_dir/hdl/config_$i"]} {

incr i

}

set_property STEPS.WRITE_BITSTREAM.TCL.POST "$build_dir/post.tcl" [get_runs "impl_$i"]

WARNING: [Runs 36-537] File /home/crizy/pro/Coyote/hw/build/post.tcl is not part of fileset utils_1, but has specified as a Tcl hook script for run(s) impl_3. This file will not be handled as part of the project for archive and other project based functionality.

set cmd "reset_run impl_1 -prev_step "

eval $cmd

if {$cfg(en_pr) eq 1} {

set cmd "reset_run "

for {set j 1} {$j <= $i} {incr j} {

append cmd "impl_$j "

}

} else {

set cmd "reset_run "

append cmd "impl_1 "

}

eval $cmd

ERROR: [Common 17-165] Too many positional options when parsing 'impl_3', please type 'reset_runs -help' for usage info.

Segmentation fault

When I used the gbm_dtrees example for the VCU118, I compiled the hw section and installed the driver:

insmod coyote_drv.ko

A segmentation fault occurred:
[image]

OFFLOAD and SYNC behave strangely

OFFLOAD and SYNC behave unexpectedly. I understand that OFFLOAD sends data from host memory to FPGA memory, and SYNC sends data from FPGA memory to host memory. Based on my understanding, I wrote the code below. The setting is the same as in the perf_mem example (EN_MEM is enabled).

cProcess cproc(0, getpid());

// Memory allocation 
int* fpga_mem1 = (int*)cproc.getMem({CoyoteAlloc::HUGE_2M, 1});
int* host_mem1 = (int*)cproc.getMem({CoyoteAlloc::HOST_2M, 1});
for (int i=0; i<32; i++) host_mem1[i] = i;
int* host_mem2 = (int*)cproc.getMem({CoyoteAlloc::HOST_2M, 1});
for (int i=0; i<32; i++) host_mem2[i] = i+1;

// Print host_mem1 and host_mem2
printf("host_mem1: ");
for (int i=0; i<32; i++) printf("%d ", host_mem1[i]);
printf("\n");
printf("host_mem2: ");
for (int i=0; i<32; i++) printf("%d ", host_mem2[i]);
printf("\n");

// Data transfer
cproc.invoke({CoyoteOper::OFFLOAD, host_mem1, fpga_mem1, 128, 128});
cproc.invoke({CoyoteOper::SYNC, fpga_mem1, host_mem2, 128, 128});
printf("----- Data Transfer -----\n");

// Print host_mem1 and host_mem2 after data transfer
printf("host_mem1: ");
for (int i=0; i<32; i++) printf("%d ", host_mem1[i]);
printf("\n");
printf("host_mem2: ");
for (int i=0; i<32; i++) printf("%d ", host_mem2[i]);
printf("\n");

// Memory free
cproc.freeMem(fpga_mem1);
cproc.freeMem(host_mem1);
cproc.freeMem(host_mem2);

I expect the data of host_mem1 to be sent to host_mem2 via fpga_mem1, and then the output of host_mem1 and host_mem2 should be the same. However, the output is as below.

host_mem1: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
host_mem2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
----- Data Transfer -----
host_mem1: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
host_mem2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The data of host_mem2 has changed, but I don't know why it is filled with zeros.
How can I transfer the data correctly so that the data of host_mem1 and host_mem2 is the same?

perf_tcp board test is stuck

I ran the perf_tcp board test; when I execute the sw main, the test gets stuck, and the information is as follows:

sudo ./build_perf_tcp_sw/main
usecon:1, useIP:1, pkgWordCount:8,port:5001, local ip:c0a800e2, target ip:c0a800dc, time:250000000, is server:0, transferBytes:1024
Start

I built perf_tcp_hw, successfully generated the bit file, downloaded the bit to the FPGA, and inserted the driver OK.
lspci -vvd 10ee is OK; I set the local IP to 192.168.0.226 and the destination IP to 192.168.0.220.

I successfully pinged the local machine (192.168.0.226) from the remote machine (192.168.0.220).
