fpga-runtime's Introduction

FPGA Runtime

This project, FPGA Runtime (FRT), provides a convenient runtime for PCIe-based FPGAs programmed under the OpenCL host-kernel model. Both Intel and Xilinx platforms are supported.

Prerequisites

  • Ubuntu 20.04+

Install from Binary

./install.sh

Usage

Invoking

template <typename... Args>
fpga::Instance Invoke(const std::string& bitstream, Args&&... args);

This invokes the kernel contained in the file named by bitstream. The file must be readable via std::ifstream; it may be a pipe, provided the pipe delivers a proper EOF. args are the arguments to the kernel. Any argument that is not a scalar must be wrapped in one of the following wrappers:

ReadOnly(T* ptr, size_t n);
WriteOnly(T* ptr, size_t n);
ReadWrite(T* ptr, size_t n);

The wrapper tells the runtime the direction of data exchange and how many elements are allocated. Directions are relative to the host, not the device, because this is host code. Passing a raw host pointer directly does not work; it does not even compile.
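As a rough illustration of why the wrappers exist, the following self-contained sketch models them with stand-in types. Everything in the sketch namespace (Direction, Buffer, MakeBuffer, HostToDeviceBytes, DeviceToHostBytes) is hypothetical and is not FRT's implementation. It assumes, following the host-relative reading of the directions, that data the host only writes is copied to the device and data the host only reads is copied back.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

namespace sketch {  // hypothetical stand-ins, not the FRT API

enum class Direction { kReadOnly, kWriteOnly, kReadWrite };

// A pointer alone carries neither a length nor a direction; the wrapper does.
template <typename T>
struct Buffer {
  T* ptr;
  std::size_t n;
  Direction direction;
  std::size_t SizeInBytes() const { return n * sizeof(T); }
};

template <typename T>
Buffer<T> MakeBuffer(T* ptr, std::size_t n, Direction direction) {
  assert(ptr != nullptr || n == 0);  // an empty buffer may have a null pointer
  return Buffer<T>{ptr, n, direction};
}

template <typename T>
Buffer<T> ReadOnly(T* ptr, std::size_t n) {
  return MakeBuffer(ptr, n, Direction::kReadOnly);
}
template <typename T>
Buffer<T> WriteOnly(T* ptr, std::size_t n) {
  return MakeBuffer(ptr, n, Direction::kWriteOnly);
}
template <typename T>
Buffer<T> ReadWrite(T* ptr, std::size_t n) {
  return MakeBuffer(ptr, n, Direction::kReadWrite);
}

// Scalars need no transfer. Raw pointers are rejected at compile time.
template <typename T>
std::size_t HostToDeviceBytes(const T&) {
  static_assert(!std::is_pointer<T>::value,
                "wrap pointers in ReadOnly, WriteOnly, or ReadWrite");
  return 0;
}

// Assumption: anything the host writes must reach the device.
template <typename T>
std::size_t HostToDeviceBytes(const Buffer<T>& buffer) {
  return buffer.direction == Direction::kReadOnly ? 0 : buffer.SizeInBytes();
}

template <typename T>
std::size_t DeviceToHostBytes(const T&) {
  static_assert(!std::is_pointer<T>::value,
                "wrap pointers in ReadOnly, WriteOnly, or ReadWrite");
  return 0;
}

// Assumption: anything the host reads must come back from the device.
template <typename T>
std::size_t DeviceToHostBytes(const Buffer<T>& buffer) {
  return buffer.direction == Direction::kWriteOnly ? 0 : buffer.SizeInBytes();
}

}  // namespace sketch
```

With an 8-element float array a, sketch::HostToDeviceBytes(sketch::WriteOnly(a, 8)) yields 8 * sizeof(float), while sketch::HostToDeviceBytes(a) fails to compile because the raw pointer carries no size or direction.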

Device Selection

By default, FRT selects devices using metadata from the bitstream. This may not always work as expected, typically for one of the following reasons:

  1. Xilinx 2RP shell platforms must be flashed by an administrator (root) before running any user logic.
  2. FRT may not know how to match the device name in the bitstream and the runtime device name. If you encounter this issue, please feel free to file a bug.

Selecting Xilinx Device by PCIe BDF

For Xilinx devices, it is possible to select the device by its PCIe BDF.

To do this, make sure gflags is parsed in your main function:

#include <gflags/gflags.h>
...
int main(int argc, char* argv[]) {
  gflags::ParseCommandLineFlags(&argc, &argv, /*remove_flags=*/true);
  ...
}

When running the host program, add --xocl_bdf=<bdf>, e.g.,

./host --xocl_bdf=0000:d8:00.1 ...
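For readers unfamiliar with the notation, a BDF string such as the one above has the form domain:bus:device.function, written in hexadecimal. The sketch below splits such a string into its fields. PcieBdf and ParseBdf are hypothetical names used only for illustration; this is not how FRT itself handles the flag.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

namespace sketch {  // hypothetical helper, not part of FRT

struct PcieBdf {
  unsigned domain;
  unsigned bus;
  unsigned device;
  unsigned function;
};

// Parses "DDDD:BB:DD.F" (hexadecimal fields). Returns false on malformed input.
bool ParseBdf(const std::string& text, PcieBdf* out) {
  assert(out != nullptr);
  PcieBdf bdf = {};
  int consumed = 0;
  const int fields =
      std::sscanf(text.c_str(), "%4x:%2x:%2x.%1x%n", &bdf.domain, &bdf.bus,
                  &bdf.device, &bdf.function, &consumed);
  // All four fields must match and nothing may trail the function digit.
  if (fields != 4 || static_cast<std::size_t>(consumed) != text.size()) {
    return false;
  }
  // PCIe allows 32 devices per bus and 8 functions per device.
  if (bdf.device > 0x1f || bdf.function > 7) {
    return false;
  }
  *out = bdf;
  return true;
}

}  // namespace sketch
```

For the example value 0000:d8:00.1, this yields domain 0, bus 0xd8, device 0, and function 1.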

Profiling

Invoke returns an fpga::Instance object that contains profiling information.

double Instance::LoadTimeSeconds();
double Instance::ComputeTimeSeconds();
double Instance::StoreTimeSeconds();
double Instance::LoadThroughputGbps();
double Instance::StoreThroughputGbps();

Streaming

Streaming is supported (on legacy Xilinx platforms).

class fpga::ReadStream;
class fpga::WriteStream;

The streams need to be created and passed to fpga::Invoke as parameters. If the arguments to fpga::Invoke contain a stream, it will not wait for the kernel to finish; instead, it returns an fpga::Instance object immediately. The host program can then read from fpga::ReadStream and/or write to fpga::WriteStream. When all stream I/O is done, instance.Finish() should be called to wait until the kernel finishes.

fpga-runtime's People

Contributors

blaok, jskimko

fpga-runtime's Issues

Option to build without XRT

It might be possible to build FRT without having XRT installed, for example to produce cores for use in the Vivado flow.

A crude approximation is vkomenda@c97beaf.

However, the resulting FRT library is missing required symbols:

$ g++ -o vadd -O2 vadd.cpp vadd-host.cpp -ltapa -lfrt -lglog -lgflags -lOpenCL
/usr/bin/ld: warning: libboost_context.so.1.65.1, needed by /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libtapa.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libtapa.so: undefined reference to `ontop_fcontext'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libtapa.so: undefined reference to `jump_fcontext'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libfrt.so: undefined reference to `fpga::internal::XilinxOpenclDevice::New(std::vector<std::vector<unsigned char, std::allocator<unsigned char> >, std::allocator<std::vector<unsigned char, std::allocator<unsigned char> > > > const&)'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libtapa.so: undefined reference to `make_fcontext'
/usr/bin/ld: /usr/lib/gcc/x86_64-linux-gnu/11/../../../../lib/libfrt.so: undefined reference to `fpga::internal::XilinxOpenclDevice::GetEnviron[abi:cxx11]()'
collect2: error: ld returned 1 exit status

Querying constrained devices crashes

When devices are constrained via cgroups, which can prevent users from accessing certain physical devices, the call that queries available devices on the system via clGetDeviceInfo crashes prematurely in xilinx_opencl_device.cpp:100 because of a strict CL_CHECK guard.

Suppose there are 4 FPGA devices on a machine and the user has access to only the 4th. The expected behavior is that frt queries each device in sequence, failing on the first three, until it reaches the 4th device, which it returns successfully. Given the call site logic in src/frt/devices/opencl_device.cpp:165, the following patch was tested and should be sufficient to address this issue:

diff --git a/src/frt/devices/xilinx_opencl_device.cpp b/src/frt/devices/xilinx_opencl_device.cpp
index 4edc1ec..6a0b7d6 100644
--- a/src/frt/devices/xilinx_opencl_device.cpp
+++ b/src/frt/devices/xilinx_opencl_device.cpp
@@ -97,8 +97,8 @@ class DeviceMatcher : public OpenclDeviceMatcher {
     const std::string device_name = device.getInfo<CL_DEVICE_NAME>();
     char bdf[32];
     size_t bdf_size = 0;
-    CL_CHECK(clGetDeviceInfo(device.get(), CL_DEVICE_PCIE_BDF, sizeof(bdf), bdf,
-                             &bdf_size));
+    cl_int rc = clGetDeviceInfo(device.get(), CL_DEVICE_PCIE_BDF, sizeof(bdf), bdf, &bdf_size);
+    if (rc != CL_SUCCESS) { return ""; }
     const std::string device_name_and_bdf =
         Concat({device_name, " (bdf=", bdf, ")"});
     LOG(INFO) << "Found device: " << device_name_and_bdf;

Is there any chance this patch could be applied? Our use case only involves Xilinx devices, but this should be applicable to other devices as well.
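The expected behavior described in this issue can be sketched independently of OpenCL. In the following illustration, Device, QueryBdf, and FirstUsableDevice are hypothetical stand-ins (QueryBdf plays the role of the clGetDeviceInfo call, with the accessible field modeling a cgroup restriction); it is not the actual FRT matching code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical stand-ins, not FRT or OpenCL types

struct Device {
  std::string name;
  bool accessible;  // false models a device hidden from the user by cgroups
};

// Stand-in for the device query: returns 0 on success, nonzero on failure.
int QueryBdf(const Device& device, std::string* bdf) {
  assert(bdf != nullptr);
  if (!device.accessible) {
    return -1;
  }
  *bdf = "bdf-of-" + device.name;  // placeholder value for illustration
  return 0;
}

// Returns the index of the first device whose query succeeds, or -1 if none.
int FirstUsableDevice(const std::vector<Device>& devices) {
  for (std::size_t i = 0; i < devices.size(); ++i) {
    std::string bdf;
    if (QueryBdf(devices[i], &bdf) != 0) {
      continue;  // skip the inaccessible device instead of aborting
    }
    return static_cast<int>(i);
  }
  return -1;
}

}  // namespace sketch
```

With four devices of which only the last is accessible, FirstUsableDevice returns index 3 rather than aborting on the first failed query.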

Adding Versal Device Support

Here is my current script to generate the xclbins for the vadd example. I had to use the absolute path of the platform folder because it is not located under /opt/xilinx/platforms on our server. Also note that the boot_mode qspi I used for the package step might be wrong: the Xilinx documentation says the vck190 does not support qspi, but it is the only mode that has worked for me so far.

# TARGET=hw
TARGET=hw_emu
DEBUG=-g

TOP=VecAdd
XO='/home/jakeke/tapa/apps/vadd/run/VecAdd.xilinx_vck190_base_202310_1.xo'
CONFIG_FILE='/home/jakeke/tapa/apps/vadd/link_config.ini'
>&2 echo "Using the default clock target of the platform."
PLATFORM=/mnt/software/xilinx/Vitis/2023.1/base_platforms/xilinx_vck190_base_202310_1
OUTPUT_DIR="$(pwd)/vitis_run_${TARGET}"

MAX_SYNTH_JOBS=8
STRATEGY="Explore"
PLACEMENT_STRATEGY="EarlyBlockPlacement"

v++ ${DEBUG} \
  --link \
  --output "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.xsa" \
  --kernel ${TOP} \
  --platform ${PLATFORM}/xilinx_vck190_base_202310_1.xpfm \
  --target ${TARGET} \
  --report_level 2 \
  --temp_dir "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.temp" \
  --report_dir "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.temp/reports" \
  --log_dir "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.temp/logs" \
  --optimize 3 \
  --connectivity.nk ${TOP}:1:${TOP} \
  --save-temps \
  "${XO}" \
  --vivado.synth.jobs ${MAX_SYNTH_JOBS} \
  --vivado.prop=run.impl_1.STEPS.PHYS_OPT_DESIGN.IS_ENABLED=1 \
  --vivado.prop=run.impl_1.STEPS.OPT_DESIGN.ARGS.DIRECTIVE=$STRATEGY \
  --vivado.prop=run.impl_1.STEPS.PLACE_DESIGN.ARGS.DIRECTIVE=$PLACEMENT_STRATEGY \
  --vivado.prop=run.impl_1.STEPS.PHYS_OPT_DESIGN.ARGS.DIRECTIVE=$STRATEGY \
  --vivado.prop=run.impl_1.STEPS.ROUTE_DESIGN.ARGS.DIRECTIVE=$STRATEGY \
  --config "${CONFIG_FILE}"

emconfigutil --platform ${PLATFORM}/xilinx_vck190_base_202310_1.xpfm --od "${OUTPUT_DIR}/"

# data center
v++ ${DEBUG} \
  --package \
  "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.xsa" \
  --target ${TARGET} \
  --platform ${PLATFORM}/xilinx_vck190_base_202310_1.xpfm \
  --save-temps \
  --temp_dir "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.temp/package.build" \
  --package.out_dir "${OUTPUT_DIR}/package" \
  --package.boot_mode qspi \
  -o "${OUTPUT_DIR}/${TOP}_xilinx_vck190_base_202310_1.xclbin"

cannot install FRT with Xilinx OpenCL ext 2022

Hi,

We recently upgraded to XRT/Vitis 2022.1. It seems that Xilinx has made a lot of changes to their OpenCL extension APIs, and as a result, FRT cannot be compiled. Here is an excerpt of the errors from compiling FRT from source:

[ 66%] Building CXX object CMakeFiles/frt_static.dir/src/frt/devices/xilinx_opencl_device.cpp.o
/scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp: In destructor ‘virtual fpga::internal::XilinxOpenclStream::~XilinxOpenclStream()’:
/scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp:22:39: error: too many arguments to function ‘void clReleaseStream()’
     auto err = clReleaseStream(stream_);
                                       ^
In file included from /opt/xilinx/xrt/include/CL/cl_ext.h:39:0,
                 from /usr/include/CL/opencl.h:50,
                 from /usr/include/CL/cl2.hpp:504,
                 from /scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.h:6,
                 from /scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp:1:
/opt/xilinx/xrt/include/CL/cl_ext_xilinx.h:342:13: note: declared here
 extern void clReleaseStream();
             ^~~~~~~~~~~~~~~
/scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp:22:10: error: ‘void err’ has incomplete type
     auto err = clReleaseStream(stream_);
          ^~~
/scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp: In constructor ‘fpga::internal::XilinxOpenclStream::XilinxOpenclStream(const string&, cl::Device, cl::Kernel, int, fpga::internal::Tag)’:
/scratch/users/sx233/fpga-runtime/src/frt/devices/xilinx_opencl_stream.cpp:33:3: error: ‘cl_stream_flags’ was not declared in this scope
   cl_stream_flags flags;
