Coyote's Introduction


OS for FPGAs

Coyote is a framework that offers operating system abstractions and a variety of shared networking (RDMA, TCP/IP), memory (DRAM, HBM) and accelerator (GPU) services for modern heterogeneous platforms with FPGAs, targeting data centers and cloud environments.

Some of Coyote's features:

  • Multiple isolated virtualized vFPGA regions (with individual VMs)
  • Nested dynamic reconfiguration (independently reconfigurable layers: Static, Service and Application)
  • RTL and HLS user logic support
  • Unified host and FPGA memory with striping across virtualized DRAM/HBM channels
  • TCP/IP service
  • RDMA RoCEv2 service (compliant with Mellanox NICs)
  • GPU service
  • Runtime scheduler for different host user processes
  • Multithreading support

Prerequisites

The full Vivado/Vitis suite is needed to build the hardware side of things. The Vivado Hardware Server is enough for deployment-only scenarios. Coyote runs with Vivado 2022.1; previous versions can be used at one's own peril.

We currently only actively support the AMD Alveo U55C accelerator card. The codebase offers some legacy support for the following platforms: VCU118, Alveo U50, Alveo U200, Alveo U250 and Alveo U280, but we are not actively working with these cards anymore. Coyote is currently being developed on the HACC cluster at ETH Zurich. For more information and possible external access, check out the following link: https://systems.ethz.ch/research/data-processing-on-modern-hardware/hacc.html

CMake is used for project creation. Additionally, the Jinja2 template engine for Python is used for some of the code generation. The API is written in C++; C++17 should suffice (for now).

If networking services are used, you will need a valid UltraScale+ Integrated 100G Ethernet Subsystem license set up in Vivado/Vitis to generate the design.

To run virtual machines on top of individual vFPGAs, the following packages are needed: qemu-kvm, build-essential and kmod.
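
As a quick sanity check before booting a VM, the presence of these packages can be verified with a short script. This is a minimal sketch assuming a Debian/Ubuntu host with dpkg available (the package names above are Debian-style):

```shell
# check_vm_prereqs: report which of the VM-related packages from this README
# are installed. Assumes a Debian/Ubuntu host; dpkg -s succeeds only for
# installed packages.
check_vm_prereqs() {
    for pkg in qemu-kvm build-essential kmod; do
        if dpkg -s "$pkg" >/dev/null 2>&1; then
            echo "$pkg: installed"
        else
            echo "$pkg: missing"
        fi
    done
}

check_vm_prereqs
```

Missing packages can then be installed with the distribution's package manager, e.g. `sudo apt install qemu-kvm build-essential kmod` on Ubuntu.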

Quick Start

Clone the repo and initialize all submodules:

$ git clone --recurse-submodules https://github.com/fpgasystems/Coyote

Build HW

To build an example hardware project (generate a shell image):

$ mkdir build_hw && cd build_hw
$ cmake <path_to_cmake_config> -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

It is good practice to generate the hardware build in a subfolder of examples_hw, since this directory already contains the CMakeLists.txt that needs to be referenced. In this case, the procedure looks like this:

$ mkdir examples_hw/build_hw && cd examples_hw/build_hw 
$ cmake ../ -DFDEV_NAME=<target_device>  -DEXAMPLE=<target_example>

The available target examples are specified in examples_hw/CMakeLists.txt and allow you to build a variety of interesting design constellations; e.g., rdma_perf will create an RDMA-capable Coyote NIC.

Generate all projects and compile all bitstreams:

$ make project 
$ make bitgen

The bitstreams will be generated under the bitstreams directory. This initial bitstream can be loaded via JTAG. Further custom shell bitstreams can then all be loaded dynamically.

A netlist with the official static-layer image is already provided under hw/checkpoints. We suggest you build your shells on top of this image. This default image is built with -DEXAMPLE=static.

Build SW

The provided software applications (as well as any others) can be built with the following commands:

$ mkdir build_sw && cd build_sw
$ cmake <path_to_cmake_config>
$ make

Similar to building the HW, it makes sense to build within the examples_sw directory for direct access to the provided CMakeLists.txt:

$ mkdir examples_sw/build_sw && cd examples_sw/build_sw 
$ cmake ../ -DEXAMPLE=<target_example> -DVERBOSITY=<ON or OFF>
$ make

The software stack can be built in verbose mode, which generates extensive printouts during execution. This is controlled via the VERBOSITY toggle in the CMake call; by default, verbosity is turned off.

Build Driver

After the bitstream is loaded, the driver can be inserted once for the initial static image.

$ cd driver && make
$ insmod coyote_drv.ko <any_additional_args>
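
To confirm that the module actually loaded, a quick check like the following can help. This is a hedged sketch: the module name coyote_drv is taken from the insmod command above, and the /dev/fpga_<d>_v<i> device naming follows reports elsewhere on this page; adjust both if your setup differs.

```shell
# check_coyote_loaded: verify that the coyote_drv module shows up in lsmod.
check_coyote_loaded() {
    if lsmod 2>/dev/null | grep -q '^coyote_drv'; then
        echo "coyote_drv loaded"
    else
        echo "coyote_drv not loaded"
        return 1
    fi
}

# Usage (after insmod):
#   check_coyote_loaded && ls -la /dev/fpga_*
```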

Provided examples

Coyote comes with a number of pre-configured example applications that can be used to test the shell capabilities and system performance, or as a starting point for your own developments around networking or memory offloading. The following list (to be continued in the future) gives an overview of the existing example apps, how to set them up in hardware and software, and how to use them:

kmeans

multithreading

perf_fpga

perf_local

rdma_service

reconfigure_shell

streaming_service

tcp_iperf

Deploying on the ETHZ HACC-cluster

The ETHZ HACC is a premier cluster for research in systems, architecture and applications (https://github.com/fpgasystems/hacc/tree/main). Its hardware equipment provides an ideal environment for Coyote-based experiments, since users can book up to 10 servers with U55C accelerator cards connected via a fully switched 100G network. User accounts for this platform can be obtained following the explanation on the homepage cited above.

Interaction with the HACC cluster can be simplified by using the sgutil run-time commands, which also make it easy to program the accelerator with a Coyote bitstream and insert the driver. For this purpose, the script program_coyote.sh has been generated. Under the assumption that the hardware project has been created in examples_hw/build and the driver has already been compiled in driver, the workflow looks like this:

$ bash program_coyote.sh examples_hw/build/bitstreams/cyt_top.bit driver/coyote_drv.ko

Obviously, the paths to cyt_top.bit and coyote_drv.ko need to be adapted if a different build structure was chosen. Successful completion of this process can be checked via a call to

$ dmesg

If the driver insertion went through, the last printed message should be probe returning 0. Furthermore, the dmesg printout should contain a line set network ip XXXXXXXX, mac YYYYYYYYYYYY, which displays the IP and MAC of the Coyote NIC if networking has been enabled in the system configuration.
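
The dmesg check can also be scripted. A minimal sketch, grepping for the success string quoted above (adjust the pattern if your driver version logs differently):

```shell
# coyote_probe_ok: succeed if the given kernel-log text contains the driver's
# success message ("probe returning 0", as quoted in this README).
coyote_probe_ok() {
    echo "$1" | grep -q "probe returning 0"
}

# Usage on a real system:
#   coyote_probe_ok "$(dmesg | tail -n 100)" && echo "driver OK"
# If networking is enabled, the NIC's IP/MAC line can be extracted with:
#   dmesg | grep "set network ip"
```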

Publication

If you use Coyote, please cite us:

@inproceedings{coyote,
    author = {Dario Korolija and Timothy Roscoe and Gustavo Alonso},
    title = {Do {OS} abstractions make sense on FPGAs?},
    booktitle = {14th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 20)},
    year = {2020},
    pages = {991--1010},
    url = {https://www.usenix.org/conference/osdi20/presentation/roscoe},
    publisher = {{USENIX} Association}
}

License

Copyright (c) 2023 FPGA @ Systems Group, ETH Zurich

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Coyote's People

Contributors

d-kor, dgiantsidi, jedichen121, jonasdann, linvogel, maximilianheer, rbshi, twk119, zhenhaohe


Coyote's Issues

Fail to compile for example pr_scheduling

I'm trying to test the partial reconfiguration feature in Coyote. I've successfully compiled and run the examples bmark_host and bmark_fpga. However, for pr_scheduling it couldn't compile and showed the following error:

**** Synthesis passed
****
**** CERR: ERROR: [Common 17-53] User Exception: No open design. Please open an elaborated, synthesized or implemented design before executing this command.

****
INFO: [Common 17-206] Exiting Vivado at Thu Aug 25 12:05:21 2022...
make[3]: *** [CMakeFiles/compile.dir/build.make:57: CMakeFiles/compile] Error 1
make[2]: *** [CMakeFiles/Makefile2:68: CMakeFiles/compile.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:75: CMakeFiles/compile.dir/rule] Error 2
make: *** [Makefile:118: compile] Error 2

This happened both for Alveo U50 (on Ubuntu 20.04) and Alveo U280 (on Ubuntu 22.04). I'm using Vivado 2021.2. Can you give me some suggestions on how to fix this problem? Thanks a lot!

unable to specify path for vivado hls

Hi
Unable to specify the path for Vivado HLS; also, Vivado HLS is deprecated and should now be Vitis HLS. Please help.

/home/peter/workspace/Coyote/hw/build>cmake .. -DFDEV_NAME=u250 -DVIVADO_HLS_ROOT_DIR=/home/peter/tools/Xilinx/Vitis_HLS
CMake Warning at /home/peter/workspace/Coyote/cmake/FindVivadoHLS.cmake:21 (message):
Vivado HLS not found.
Call Stack (most recent call first):
ext/network/hls/arp_server_subnet/CMakeLists.txt:30 (find_package)

CMake Error at ext/network/hls/arp_server_subnet/CMakeLists.txt:32 (message):
Vivado HLS not found.

-- Configuring incomplete, errors occurred!
See also "/home/peter/workspace/Coyote/hw/build/CMakeFiles/CMakeOutput.log".

OS hanging when fpga_tlb_miss_isr():(irq=107) page fault ISR

@d-kor ,

Hello d-kor, when I was testing Coyote I encountered the following problem:

When a page fault prompt appears and the page-fault exception is handled, the operating system hangs.

dmesg information is as follows,
fpga_tlb_miss_isr():(irq=107) page fault ISR
fpga_tlb_miss_isr():page fault, vaddr 7ffbc5402000, length 40, cpid 0
tlb_get_user_pages():pid found = 2823
tlb_get_user_pages():allocated 8 bytes for page pointer array for 1 pages @0x000000001bd3b688, passed size 64.
tlb_get_user_pages():pages=0x000000001bd3b688
tlb_get_user_pages():first = 7ffbc5402, last = 7ffbc5402
tlb_get_user_pages():get_user_pages_remote(7ffbc5402000, n_pages = 1, page start = 7a19000000, hugepages = 0)
tlb_get_user_pages():could not get all user pages, -14
fpga_tlb_miss_isr():pages could not be obtained

When I use huge pages (HUGE_2M), the dmesg information on a page fault is similar to the above, and it still shows hugepages = 0 (i.e., huge pages are not enabled).

QDMA Support

Hello,

This is Hongshi from NUS HACC. May I know if there is any schedule for supporting QDMA?

If not, I would like to contribute to QDMA-related features for Coyote. Since the current project is quite large, please let me know if there are any details, like the milestones, etc., I need to follow.

[Need help]: IO port is missing a buffer

When I run implementation, I encounter the following errors. I really want to explore your project, but I'm very new to this, and I'd really appreciate your help in solving them.

Starting DRC Task
INFO: [DRC 23-27] Running DRC with 8 threads
ERROR: [DRC INBB-3] Black Box Instances: Cell 'inst_dynamic/inst_user_wrapper_0' of type 'design_user_wrapper_0' has undefined contents and is considered a black box.  The contents of this cell must be defined for opt_design to complete successfully.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_bscanid_en should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_capture should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_drck should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_runtest should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_sel should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_shift should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port dyn_bscan_tdo should be connected to an IO cell such as an [IO]BUF*.
ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port m_axis_dyn_out_0_tdata[0] should be connected to an IO cell such as an [IO]BUF*.

ERROR: [DRC RPBF-1] IO port is missing a buffer: Device port m_axis_dyn_out_0_tdata[183] should be connected to an IO cell such as an [IO]BUF*.
INFO: [Common 17-14] Message 'DRC RPBF-1' appears 100 times and further instances of the messages will be disabled. Use the Tcl command set_msg_config to change the current settings.
INFO: [Project 1-461] DRC finished with 4148 Errors
INFO: [Project 1-462] Please refer to the DRC report (report_drc) for more information.
ERROR: [Vivado_Tcl 4-78] Error(s) found during DRC. Opt_design not run.

Time (s): cpu = 00:00:03 ; elapsed = 00:00:02 . Memory (MB): peak = 7068.531 ; gain = 0.000 ; free physical = 48516 ; free virtual = 55234
INFO: [Common 17-83] Releasing license: Implementation
128 Infos, 199 Warnings, 9 Critical Warnings and 102 Errors encountered.
opt_design failed
ERROR: [Common 17-39] 'opt_design' failed due to earlier errors.

typo in main.cpp of bmark_fpga

In line 143: << std::setprecision(2) << std::setw(5) << vctr_avg(time_bench_wd) << " [ns]" << std::endl;.
It should be time_bench_wr instead of time_bench_wd.

Build system misconfiguration for refdesign with Coyote + RDMA + U250

Reproduction

source <VITIS_INSTALL>/settings64.sh
cd "test/refdesigns"
make MODE=coyote_rdma PLATFORM=xilinx_u250_gen3x16_xdma_4_1_202210_1

Error message

"Number of DDR channels misconfigured"

Details

N_DDR_CHAN is not correctly set for U250 + RDMA in 'test/refdesigns/Coyote/hw/config.cmake'

Error Building Static Application using CMake (U250)

I'm getting an error message when building the static application for u250. I run the following command:

cmake ../CMakeLists.txt -DFDEV_NAME=u250 -DEXAMPLE=static

Error Message

-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Vivado at /home/celeris/tools/Xilinx_2022.1/Vivado/2022.1.
-- Found Vivado/Vitis HLS at /home/celeris/tools/Xilinx_2022.1/Vitis_HLS/2022.1.
** Vitis toolchain
** Static design flow
** Target platform u250
CMake Error at /home/celeris/Documents/aqdas/coyote-v2/Coyote/cmake/FindCoyoteHW.cmake:398 (message):
  Number of DDR channels misconfigured.
Call Stack (most recent call first):
  CMakeLists.txt:31 (validation_checks_hw)


-- Configuring incomplete, errors occurred!
See also "/home/celeris/Documents/aqdas/coyote-v2/Coyote/examples_hw/CMakeFiles/CMakeOutput.log".

Can you please look into the issue and provide an update?

bitgen error

user_logic_error

bitgen_error

The vFPGA_top.sv instantiates some modules that cannot be found. Can anyone help locate them?

rdma_base_slv inst_rdma_base_slv (
    .aclk(aclk),
    .aresetn(aresetn),

    .axi_ctrl(axi_ctrl),

    .mux_ctid(mux_ctid)
);

mux_host_card_rd_rdma inst_mux_send (
    .aclk(aclk),
    .aresetn(aresetn),

    .mux_ctid(mux_ctid),
    .s_rq(rq_rd),
    .m_sq(sq_rd),

    .s_axis_host(axis_host_recv[0]),
    .s_axis_card(axis_card_recv[0]),
    .m_axis(/*axis_rdma_send[0]*/)
);

mux_host_card_wr_rdma inst_mux_recv (
    .aclk(aclk),
    .aresetn(aresetn),

    .mux_ctid(mux_ctid),
    .s_rq(rq_wr),
    .m_sq(sq_wr),

    .s_axis(/*axis_rdma_recv[0]*/),
    .m_axis_host(axis_host_send[0]),
    .m_axis_card(axis_card_send[0])
);

A bug occurred when canceling AVX

After I create the project by executing the following instructions
cmake -DFDEV_NAME=u50 -DEXAMPLE=gbm_dtrees -DEN_AVX=0 ..
make shell
make compile

The following error occurred at compile time:
ERROR: [Synth 8-524] part-select [95:64] out of range of prefix 'axi_rdata_bram' [/home/me/Coyote/hw/hdl/slave/cnfg_slave.sv:685]
ERROR: [Synth 8-6156] failed synthesizing module 'cnfg_slave' [/home/me/Coyote/hw/hdl/slave/cnfg_slave.sv:33]
ERROR: [Synth 8-6156] failed synthesizing module 'tlb_region_top' [/home/me/Coyote/hw/hdl/mmu/tlb_region_top.sv:42]
ERROR: [Synth 8-196] conditional expression could not be resolved to a constant [/home/me/Coyote/hw/hdl/mmu/tlb_top.sv:158]
ERROR: [Synth 8-6156] failed synthesizing module 'tlb_top' [/home/me/Coyote/hw/hdl/mmu/tlb_top.sv:41]
ERROR: [Synth 8-6156] failed synthesizing module 'design_dynamic_wrapper' [/home/me/Coyote/hw/build/lynx/hdl/wrappers/common/dynamic_wrapper.sv:8]
ERROR: [Synth 8-6156] failed synthesizing module 'top' [/home/me/Coyote/hw/build/lynx/hdl/wrappers/common/top_u50.sv:11]

rdma test failed

@d-kor hi,
I tested RDMA_PERF, but it failed, indicating that it cannot connect.

I successfully built the build_perf_rdma_host_hw and build_perf_rdma_card_hw projects
I successfully insmod driver and compiled build_perf_rdma_sw. 

My testing method is as follows,
host0:  fpga0 with build_perf_rdma_host_hw bit    IP:192.168.0.4
host1:  fpga1 with build_perf_rdma_card_hw bit    IP:192.168.0.5
Pinging 192.168.0.4 and 192.168.0.5 works.

I execute build_perf_rdma_sw as follows,
on host0:   sudo ./build_perf_rdma_sw/main  --reps 100  --mins 128  --maxs 2048
on host1:   sudo ./build_perf_rdma_sw/main  --tcpaddr 192.168.0.4  --reps 100  --mins 128  --maxs 2048

The information displayed after executing the application build_perf_rdma_sw is as follows
on host0:
                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...

on host1 :
               
                -- PARAMS
                -----------------------------------------------
                TCP master IP address: 192.168.0.4
                IBV IP address: 192.168.0.5
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Slave side exchange started ...
                terminate called after throwing an instance of 'std::runtime_error'
                         what():  Could not connect to master: 192.168.0.4:18488
                Aborted

The information displayed after terminating the application build_perf_rdma_sw on host0 is as follows:

                -- PARAMS
                -----------------------------------------------
                IBV IP address: 192.168.0.4
                Number of allocated pages: 1
                Read operation
                Min size: 128
                Max size: 2048
                Number of reps: 100
                Queue pair created, qpid: 0
                Master side exchange started ...
                ^Cterminate called after throwing an instance of 'std::runtime_error'
                what():  Accept failed

Coyote v2 RDMA fails under certain benchmark

I was testing Coyote v2 with the rdma perf hw design and the rdma services sw application.

  1. The RDMA read benchmark is unstable and fails under the default number of repetitions specified in the sw. The experiment below does not return.

./bin/test -d 0 -i 0 -t 10.1.212.177 -x 2048
Queue pair:
Local : QPN 0x000000, PSN 0x22b267, VADDR 00007fe912200000, SIZE 00010000, IP 0x0afd4a60
Remote: QPN 0x000000, PSN 0x30c5c7, VADDR 00007feefbc00000, SIZE 00010000, IP 0x0afd4a5c
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 19.94 [MB/s], latency: 33100.42 [ns]
2048 [bytes], thoughput: 2124.81 [MB/s], latency: 8167.80 [ns]

  2. The RDMA write benchmark does not scale beyond 4K message size:

./bin/test -d 0 -i 0 -t 10.1.212.175 -x 1024 -r 10 -l 10 -w 1
Queue pair:
Local : QPN 0x000000, PSN 0x9bd652, VADDR 00007fbc23e00000, SIZE 00010000, IP 0x0afd4a58
Remote: QPN 0x000000, PSN 0xa03ec3, VADDR 00007fe9b5400000, SIZE 00010000, IP 0x0afd4a54
Client registered
Sent payload

RDMA BENCHMARK
1024 [bytes], thoughput: 870.19 [MB/s], latency: 5824.05 [ns]
2048 [bytes], thoughput: 1976.83 [MB/s], latency: 6007.90 [ns]
4096 [bytes], thoughput: 3813.60 [MB/s], latency: 6559.50 [ns]
^Cterminate called after throwing an instance of 'std::runtime_error'
what(): Stalled, SIGINT caught
Aborted

Setting both EN_HLS and EN_PR does not work

Setting both EN_HLS=1 and EN_PR=1 leads to a synthesis issue.

Steps to reproduce for Alveo U200 (but I guess it does not work on any platform):

cd hw
mkdir build
cd build
cmake .. -DFDEV_NAME=u200 -DEXAMPLE=hyperloglog -DEN_PR=1 ..
make shell compile

The first error in the file lynx/lynx.runs/design_user_wrapper_c0_0_synth_1/runme.log is:

ERROR: [Synth 8-439] module 'design_user_hls_c0_0' not found [REDACTED/lynx/hdl/wrappers/config_0/user_wrapper_c0_0.sv:121]

TCP stack?

HI,

The TCP stack does not seem to be supported; network_top.sv only has a wrapper for RoCE. Any plans to add TCP?

Best,
Yang

Clean up req_t/ack_t

The request/completion structures contain several stale fields which should be cleaned up. Also, mode should be merged into opcode, as it doesn't currently provide any additional information.

Specifically, the following fields look like they can be removed:

  • remote
  • host
  • rdma
  • mode (can be replaced by inspecting verb)

set data_width=8, network hls error

After set(DATA_WIDTH 8 CACHE STRING "Data width"), the HLS integration of the network stack reports an error when I make the shell. The error message is as follows:

(1)ERROR: [HLS 200-70] Compilation errors found: In file included from hw/services/network/hls/ip_handler/ip_handler.cpp:1:
hw/services/network/hls/ip_handler/ip_handler.cpp:539:2: error: no matching function for call to 'ip_handler_compute_ipv4_checksum'
ip_handler_compute_ipv4_checksum(ipDataMetaFifo, ipDataCheckFifo, iph_subSumsFifoOut);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(2)hw/services/network/hls/ip_handler/ip_handler.cpp:656:2: note: in instantiation of function template specialization 'ip_handler<64>' requested here
ip_handler<DATA_WIDTH>(s_axis_raw,
(3)hw/services/network/hls/ip_handler/../ipv4/ipv4.hpp:361:6: note: candidate function not viable: no known conversion from 'hls::stream<net_axis<64> >' to 'hls::stream<net_axis<512> > &' for 1st argument;
void ip_handler_compute_ipv4_checksum( hls::stream<net_axis<512> >& dataIn,

N_OUTSTANDING

N_OUTSTANDING is a parameter when using CMake to generate the project, but in hw/hdl/network/rdma/rdma_flow.sv, RDMA_N_OST is hardcoded as 16. Is this a bug or a feature?

Runtime error in perf_host

I built the perf_host design for U280. I could generate the bitstream, program the alveo, and load the driver. However, I got the following error when I tried to run the sw executable.

Number of regions: 3
Hugepages: 0
Mapped pages: 1
Number of allocated pages: 8
Number of repetitions: 100
Starting transfer size: 128
Ending transfer size: 32768
terminate called after throwing an instance of 'std::runtime_error'
what(): cProcess could not be obtained, vfid: 0
Aborted (core dumped)

Any thoughts on how to debug?
Thanks!

Kernel BUG on probe

I am experiencing a kernel BUG at Coyote/driver/fpga_dev.c:108 (accl_integration branch)
I am currently using an Alveo U280. Linux Kernel: 6.5

My steps:

  • I made sure that coyote driver was not loaded
  • I used Vivado Hardware Manager 2022.2 to load the bitstream
  • I ran sw/util/hot_reset
  • lspci | grep Xilinx shows that the board is correctly identified
  • I launch sudo insmod coyote_drv.ko

This last command hangs, because of the kernel bug.

I also experienced this error previously, with the same setup apart from the Linux kernel, which was 5.15

RDMA Server Exits Abruptly

I've synthesized the design and run the hardware on an AU-250. It works properly, shows RDMA Enabled 1 in the dmesg logs, and provides me with a MAC and IP.

When I try to run the rdma_server application, it gives the following output and exits abruptly.

I'm assuming that the device and vfid are the numbers listed in ls -la /dev/fpga_{d}_v{i}. I use the following command to run the rdma_server:

./bin/test -d 0 -i 0

which gives the output

Forking...


Has anyone faced anything similar, or is there a bug that needs to be addressed?

rocev2 test cases

Hi,
Could you please provide test cases for the RoCE module? In make.tcl it mentions the following:

write_read_read_large_receiver.in
write_read_read_large_receiver.out
rdma_txwriteread.in
rdma_txwriteread.out

Or, if anyone has any of the above, could you share it?
Thanks!

Out-of tree SW builds fail

The instructions to build software in the Coyote documentation don't work for out-of-tree builds because FindCoyoteSW.cmake uses CMAKE_SOURCE_DIR to identify sources, and this variable is set to the location of the caller's CMakeLists.txt. In the case of the Coyote software examples in examples_sw, it just happens that the relative paths overlap.

You can confirm that this is the case as follows:

  1. move examples_sw to a new directory e.g. Coyote/foo/examples_sw and cd into it
  2. edit line 4 of Coyote/foo/examples_sw/CMakeLists.txt to point to the Coyote top-level folder
  3. run cmake -DEXAMPLES=reconfigure_shell .

The output of this will be

CMake Error at Coyote/cmake/FindCoyoteSW.cmake:93 (add_library):
No SOURCES given to target: Coyote
Call Stack (most recent call first):
CMakeLists.txt:7 (find_package)

This can be fixed by using CMAKE_CURRENT_LIST_DIR instead of CMAKE_SOURCE_DIR in FindCoyoteSW.cmake.

OSDI paper

Hi, fascinating work and thanks for open-sourcing it. Is it possible to make your OSDI paper public? Can't wait to read it!

Segmentation fault

After I finished compiling the hw section and flashing the bitstream through the JTAG interface onto the VCU118, I executed
insmod coyote_drv.ko
A segmentation fault occurred, as shown below:
[image]
And the printed kernel information is as follows:
[Screenshot from 2022-11-18 10-50-01]
The read configuration information is all 1s; I do not know the cause.

cProcess::invoke(...) potential bug (avx disabled)

Coyote/sw/src/cProcess.cpp

Lines 442 to 460 in 78026e5

cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::VADDR_RD_REG)] = reinterpret_cast<uint64_t>(cs_invoke.src_addr);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::LEN_RD_REG)] = cs_invoke.src_len;
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::CTRL_REG)] =
(isRead(cs_invoke.oper) ? CTRL_START_RD : 0x0) |
(cs_invoke.clr_stat ? CTRL_CLR_STAT_RD : 0x0) |
(cs_invoke.stream ? CTRL_STREAM_RD : 0x0) |
((cs_invoke.dest & CTRL_DEST_MASK) << CTRL_DEST_RD) |
((cpid & CTRL_PID_MASK) << CTRL_PID_RD) |
(cs_invoke.oper == CoyoteOper::SYNC ? CTRL_SYNC_WR : 0x0);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::VADDR_WR_REG)] = reinterpret_cast<uint64_t>(cs_invoke.dst_addr);
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::LEN_WR_REG)] = cs_invoke.dst_len;
cnfg_reg[static_cast<uint32_t>(CnfgLegRegs::CTRL_REG)] =
(isWrite(cs_invoke.oper) ? CTRL_START_WR : 0x0) |
(cs_invoke.clr_stat ? CTRL_CLR_STAT_WR : 0x0) |
(cs_invoke.stream ? CTRL_STREAM_WR : 0x0) |
((cpid & CTRL_PID_MASK) << CTRL_PID_WR) |
(cs_invoke.oper == CoyoteOper::OFFLOAD ? CTRL_SYNC_RD : 0x0);

Coyote/sw/include/cDefs.hpp

Lines 299 to 309 in 78026e5

constexpr auto isRead(CoyoteOper oper) {
return oper == CoyoteOper::READ || oper == CoyoteOper::OFFLOAD || oper == CoyoteOper::TRANSFER;
}
constexpr auto isWrite(CoyoteOper oper) {
return oper == CoyoteOper::WRITE || oper == CoyoteOper::SYNC || oper == CoyoteOper::TRANSFER;
}
constexpr auto isSync(CoyoteOper oper) {
return oper == CoyoteOper::OFFLOAD || oper == CoyoteOper::SYNC;
}

Do we need to swap lines 450 and 460?
Are reading and writing defined from the perspective of the FPGA card?
If so, the following applies for the data flow direction:

Reading:
Host RAM |--- CTRL_STREAM_RD set ---> vfpga x (x=1,..)
Host RAM |--- ::OFFLOAD set ---> local fpga memory

Writing :
Host RAM <--- CTRL_STREAM_WR ---| vfpga x
Host RAM <--- ::SYNC ---| local fpga memory

Is this correct ?

make compile error

When I run compile.tcl, the error is as follows:

set i 1

while {[file isdirectory "$proj_dir/hdl/config_$i"]} {

incr i

}

set_property STEPS.WRITE_BITSTREAM.TCL.POST "$build_dir/post.tcl" [get_runs "impl_$i"]

WARNING: [Runs 36-537] File /home/crizy/pro/Coyote/hw/build/post.tcl is not part of fileset utils_1, but has specified as a Tcl hook script for run(s) impl_3. This file will not be handled as part of the project for archive and other project based functionality.

set cmd "reset_run impl_1 -prev_step "

eval $cmd

if {$cfg(en_pr) eq 1} {

set cmd "reset_run "

for {set j 1} {$j <= $i} {incr j} {

append cmd "impl_$j "

}

} else {

set cmd "reset_run "

append cmd "impl_1 "

}

eval $cmd

ERROR: [Common 17-165] Too many positional options when parsing 'impl_3', please type 'reset_runs -help' for usage info.

Segmentation fault

When I used the gbm_dtrees example for the VCU118, I compiled the hw section and installed the driver:

insmod coyote_drv.ko

A segmentation fault occurred:
[image]

OFFLOAD and SYNC behave strangely

OFFLOAD and SYNC behave unexpectedly. I understand that OFFLOAD sends data from host memory to FPGA memory, and SYNC sends data from FPGA memory to host memory. Based on my understanding, I wrote the code below. The setting is the same as in the perf_mem example (EN_MEM is enabled).

cProcess cproc(0, getpid());

// Memory allocation 
int* fpga_mem1 = (int*)cproc.getMem({CoyoteAlloc::HUGE_2M, 1});
int* host_mem1 = (int*)cproc.getMem({CoyoteAlloc::HOST_2M, 1});
for (int i=0; i<32; i++) host_mem1[i] = i;
int* host_mem2 = (int*)cproc.getMem({CoyoteAlloc::HOST_2M, 1});
for (int i=0; i<32; i++) host_mem2[i] = i+1;

// Print host_mem1 and host_mem2
printf("host_mem1: ");
for (int i=0; i<32; i++) printf("%d ", host_mem1[i]);
printf("\n");
printf("host_mem2: ");
for (int i=0; i<32; i++) printf("%d ", host_mem2[i]);
printf("\n");

// Data transfer
cproc.invoke({CoyoteOper::OFFLOAD, host_mem1, fpga_mem1, 128, 128});
cproc.invoke({CoyoteOper::SYNC, fpga_mem1, host_mem2, 128, 128});
printf("----- Data Transfer -----\n");

// Print host_mem1 and host_mem2 after data transfer
printf("host_mem1: ");
for (int i=0; i<32; i++) printf("%d ", host_mem1[i]);
printf("\n");
printf("host_mem2: ");
for (int i=0; i<32; i++) printf("%d ", host_mem2[i]);
printf("\n");

// Memory free
cproc.freeMem(fpga_mem1);
cproc.freeMem(host_mem1);
cproc.freeMem(host_mem2);

I expect the data of host_mem1 to be sent to host_mem2 via fpga_mem1, and then the output of host_mem1 and host_mem2 should be the same. However, the output is as below.

host_mem1: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
host_mem2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
----- Data Transfer -----
host_mem1: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
host_mem2: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The data of host_mem2 has changed, but I don't know why it is filled with zeros.
How can I transfer the data correctly so that the data of host_mem1 and host_mem2 is the same?

perf_tcp board test is stuck

I ran the perf_tcp board test; when I execute the sw main, the test gets stuck, and the information is as follows:

sudo ./build_perf_tcp_sw/main
usecon:1, useIP:1, pkgWordCount:8,port:5001, local ip:c0a800e2, target ip:c0a800dc, time:250000000, is server:0, transferBytes:1024
Start

I built perf_tcp_hw, successfully generated the bit file, downloaded the bit to the FPGA, and inserted the driver OK.
lspci -vvd 10ee is OK; I set the local IP to 192.168.0.226 and the destination IP to 192.168.0.220.

I successfully pinged the local machine (192.168.0.226) from the remote machine (192.168.0.220).
