abs-tudelft / fletcher
Fletcher: A framework to integrate FPGA accelerators with Apache Arrow
Home Page: https://abs-tudelft.github.io/fletcher/
License: Apache License 2.0
To improve the user experience, I'd like to propose including the echo runtime in our Python runtime wheels. This requires two changes, both involving auditwheel.
Describe the bug
If the -o option is given, the AxiTop and SimTop_tc files are not created. The path is hardcoded in fletchgen.cc@L104. If it is changed from std::string axi_file_path = "vhdl/AxiTop.vhd"; to std::string axi_file_path = "/output/vhdl/AxiTop.vhd"; it does work. It also works without the -o option, saving to the current directory.
To Reproduce
fletchgen -o /output -i /source/ff_in.fbs /source/ff_out.fbs -l vhdl --axi --sim -r /source/Float_data.rb -s /source/Float_data.srec --force -n FletcherFloat
vs
cd /output && fletchgen -i /source/ff_in.fbs /source/ff_out.fbs -l vhdl --axi --sim -r /source/Float_data.rb -s /source/Float_data.srec --force -n FletcherFloat
Expected behavior
Both should result in the same output.
Currently, files related to Buffer(Reader/Writer), Column(Reader/Writer) and the internal bus infrastructure (Bus(Read/Write)...) are all placed together in vhdl/arrow. It might be nicer to split them up over subdirectories:
hardware/vhdl/bus
|- read - all files for reading from host memory bus
|- write - all files for writing ...
hardware/vhdl/buffer
|- read - etc..
|- write
hardware/vhdl/column
|- read
|- write
|- common
This applies to writers, but potentially to readers as well.
While working on #181 I ran into the following bug:
Describe the bug
[INFO ]: Loading RecordBatch(es) from recordbatch.rb
[INFO ]: Creating SchemaSet.
[DEBUG]: Schema ExampleBatch, Direction: in
[INFO ]: Generating Mantle...
terminate called after throwing an instance of 'std::runtime_error'
what(): Not implemented.
To Reproduce
Make a release build of fletchgen and work through the tutorial.
Expected behavior
It should work like the debug build.
Additional context
Confirmed it's not a Cython side-effect by running a release build locally (this is macOS):
./fletchgen -n Sum -r recordbatch.rb 0.05s 11:44:20
[INFO ]: Loading RecordBatch(es) from recordbatch.rb
[INFO ]: Creating SchemaSet.
[DEBUG]: Schema ExampleBatch, Direction: in
[INFO ]: Generating Mantle...
libc++abi.dylib: terminating with uncaught exception of type std::runtime_error: Not implemented.
This is a tracking issue for Alveo card support in Fletcher.
It seems that there are currently three methods to develop for Alveo cards:
SDx development depends on XRT. In order to support SDx kernel development (development method 1 and 2) for Fletcher we need to provide a build target which generates the wrapping kernel for the accelerator and a platform runtime which wraps around XRT for execution. This method makes sense if Arrow has native or OpenCL kernel support.
A custom flow with Vivado (development method 3) makes more sense now, but can't be used with XRT. Based on this, we can use QDMA with this flow (GitHub).
I'll investigate.
There should be a reference document for the standard / expected MMIO register layout to clarify which implementations are "correct".
Current standard registers (needs to be verified):
This header is a C-only header intended for platforms and other tools that might not be written in C++.
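As a sketch of what such a reference document could pin down, the layout below assumes only the control and status indices that appear as constants elsewhere in these issues (REG_CONTROL = 0, REG_STATUS = 1); the byte-offset arithmetic is illustrative, and the full register list still needs to be verified:

```python
from enum import IntEnum

# Hypothetical sketch of the expected MMIO register layout; only the
# control and status indices match constants used elsewhere in this
# document, and the rest of the standard layout remains to be verified.
class MmioReg(IntEnum):
    CONTROL = 0  # start / stop / reset bits
    STATUS = 1   # idle / busy / done bits

REG_WIDTH = 32  # register width in bits

def reg_byte_offset(reg: MmioReg) -> int:
    """Byte offset of a register in the MMIO address space."""
    return int(reg) * (REG_WIDTH // 8)
```

Having the canonical indices and widths in one place would make it possible to say which platform implementations are "correct" by checking them against a single table.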
The goal is to build a Cerata frontend and backend for IP-XACT.
The frontend should parse an IP-XACT package to Cerata graphs, and the backend should allow bundling Cerata graphs into an IP-XACT package.
More info on IP-XACT
Info
Schemas
User guide
kactus2
kactus2-code
MMIO register writes by the user hardware are not propagated back to the register input in the simulation top level.
regs_in: to user hardware
regs_out: from user hardware
regs_out_en: write enable signal from user hardware
regs: register state in simulation
regs_out is properly propagated to regs, but not to regs_in. This is not trivial, since the values of regs_in and regs are determined in separate processes.
Example with an additional write enable, connecting regs instead of regs_in to the user hardware:
regs_out_proc: process(acc_clk) is
begin
  if rising_edge(acc_clk) then
    for I in 0 to NUM_REGS-1 loop
      if regs_out_en(I) = '1' then
        regs((I+1)*REG_WIDTH-1 downto I*REG_WIDTH) <= regs_out((I+1)*REG_WIDTH-1 downto I*REG_WIDTH);
      elsif regs_sim_we(I) = '1' then
        regs((I+1)*REG_WIDTH-1 downto I*REG_WIDTH) <= regs_in((I+1)*REG_WIDTH-1 downto I*REG_WIDTH);
      end if;
    end loop;
  end if;
end process;
This might be an interesting starting point.
In some cases, StreamPipelineControl seems to size its output FIFO smaller than the pipeline depth, resulting in (oscillating) backpressure. NUM_PIPE_REGS=2 and MIN_CYCLES_PER_TRANSFER=1 replicates the issue.
Using the AWS C library that interfaces with XSIM through Vivado DPI,
it's possible to create an FPGAPlatform that uses this library.
We can then write applications on top of this library once, for both simulation and run time.
Fletchgen does not exit if the input file fails to open. Is this intended behavior?
[INFO ]: Loading RecordBatch(es) from asdf
[ERROR]: Could not open file for reading. asdf ARROW:[IOError: Failed to open local file 'asdf', error: No such file or directory]
[INFO ]: Creating SchemaSet.
[INFO ]: Generating Mantle...
[DEBUG]: MMIO Type already exists in default pool.
[INFO ]: Generating DOT output.
[INFO ]: DOT: Generating output for Graph: Mantle
[INFO ]: DOT: Generating output for Graph: asdf
[INFO ]: Generating VHDL output.
[INFO ]: VHDL: Transforming Component Mantle to VHDL-compatible version.
[INFO ]: VHDL: Generating sources for component Mantle
[DEBUG]: VHDL: Transforming Cerata graph to VHDL-compatible.
[DEBUG]: VHDL: Resolve port-to-port connections...
[DEBUG]: VHDL: Materialize stream abstraction...
[DEBUG]: VHDL: Expanding type MMIO_A32_D32:Rec
[INFO ]: VHDL: Saving design to: ./vhdl/Mantle.vhd
[INFO ]: VHDL: Transforming Component asdf to VHDL-compatible version.
[INFO ]: VHDL: Generating sources for component asdf
[DEBUG]: VHDL: Transforming Cerata graph to VHDL-compatible.
[DEBUG]: VHDL: Resolve port-to-port connections...
[DEBUG]: VHDL: Materialize stream abstraction...
[INFO ]: VHDL: Saving design to: ./vhdl/asdf.vhd
[INFO ]: VHDL: Generated output for 2 graphs.
[INFO ]: Saving simulation top-level design to: ./vhdl/SimTop_tc.vhd
[DEBUG]: SIM: Generating MMIO writes for 0 RecordBatches.
[INFO ]: fletchgen completed.
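A minimal sketch of the fail-fast alternative, with hypothetical names (load_recordbatches is not fletchgen's actual API): if loading fails, report the error and stop rather than generating a design around a component literally named after the missing file.

```python
import sys

def load_recordbatches(path):
    """Hypothetical loader: returns a list of raw batches, or None on failure."""
    try:
        with open(path, "rb") as f:
            return [f.read()]  # placeholder for real RecordBatch parsing
    except OSError as err:
        print(f"[ERROR]: Could not open file for reading. {path} ({err})",
              file=sys.stderr)
        return None

def main(path) -> int:
    batches = load_recordbatches(path)
    if batches is None:
        # Fail fast instead of continuing with an empty SchemaSet.
        return 1
    return 0
```

Whether the current behavior is intended or not, exiting with a non-zero status here would at least make the failure visible to scripts driving fletchgen.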
When a ColumnReader has delivered all elements to the stream, the sink considers the ColumnReader done with its job. It might therefore reset the ColumnReader before all bus transfers on the host side have been handled. This can cause a deadlock on the host-side bus, as the reset ColumnReader will not acknowledge any incoming transfers that might still be open from a large burst request.
The solution is probably to provide a "done" signal on the output of the ColumnReader, or to "un-fix" the burst length so we never burst past our last byte of interest.
Writers currently do not properly support separate clock domains for the bus and accelerator, despite the availability of separate clock/reset ports. The read side should support it, though enabling the CDC logic is inconveniently done for each reader separately through the CFG string. Multi-clock-domain test code should also be created or improved.
Currently, the AXI top-level output of fletchgen (using --axi) doesn't provide a simulation environment like the simulation top-level it can generate using the --sim flag.
It would be nice to wrap the AXI top into a simulation top level, so the same simulation capabilities of the normal simulation top-level become available to the AXI top-level as well, testing the AXI top-level itself along the way for all different codegen tests.
This would also enable the possibility of adding a protocol checker on the axi-simulation top level.
Alternatively we could just make the simulation top-level always instantiate the AXI top wrapper.
When supplying an unsupported or unknown output language, fletchgen doesn't react to this at all.
It would be nice if it at least logged a warning or error about the unknown target language.
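A minimal sketch of the kind of check that could produce such a message; the set of supported languages here is assumed for illustration, not taken from fletchgen's actual sources:

```python
# Hypothetical validation of the output-language flag. The supported
# set below is an assumption; fletchgen's real targets may differ.
KNOWN_LANGUAGES = {"vhdl", "dot"}

def select_language(lang: str) -> str:
    """Return the language if supported, otherwise fail loudly
    instead of silently ignoring the flag."""
    if lang not in KNOWN_LANGUAGES:
        raise ValueError(
            f"Unknown output language: {lang!r}. "
            f"Supported: {sorted(KNOWN_LANGUAGES)}"
        )
    return lang
```

The key point is that an unknown value should surface as a logged error (or a non-zero exit), rather than being dropped on the floor.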
Currently, paths in Cerata and fletchgen are typed as std::string. This makes handling them not very ergonomic (see #139), which is why I'd like to suggest we use the filesystem library for this. Since we're already using C++17 features, this should not be an issue. It may, however, require some people to update their compilers.
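The ergonomics argument can be illustrated with Python's pathlib, whose path type behaves much like C++17's std::filesystem::path: joining an output directory onto a relative file path becomes a single operator instead of string concatenation. The concrete paths below are illustrative only:

```python
from pathlib import PurePosixPath

# With a path type, composing the -o output directory with a relative
# file path is one operation; std::filesystem::path offers the same
# via operator/ in C++17.
output_dir = PurePosixPath("/output")
axi_file = output_dir / "vhdl" / "AxiTop.vhd"
```

Normalization, parent directories, and extension handling similarly come for free, instead of being reimplemented with string surgery at every call site.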
Describe the bug
Calling the function queue_record_batch with an input batch that results from a RecordBatchFileReader.get_batch call results in a segfault.
To Reproduce
import argparse
import pyarrow as pa
import pyfletcher as pf

parser = argparse.ArgumentParser()
parser.add_argument("schema_path")
parser.add_argument("input_path")
parser.add_argument("output_path")
args = parser.parse_args()
schema = pa.read_schema(args.schema_path)
# Set up a RecordBatch reader and read the RecordBatch.
reader = pa.RecordBatchFileReader(args.input_path)
input_batch = reader.get_batch(0)
output_batch = pa.RecordBatch.from_arrays([pa.array([0] * input_batch.num_rows, pa.uint32())],schema)
print("Got the batches.")
platform = pf.Platform("snap", False) # Create an interface to an auto-detected FPGA Platform.
platform.init() # Initialize the Platform.
print("Initialized platform.")
context = pf.Context(platform) # Create a Context for our data on the Platform.
print("Created context.")
context.queue_record_batch(input_batch) # Queue the RecordBatch to the Context.
print("Queued record output batch.")
context.queue_record_batch(output_batch) # Queue the RecordBatch to the Context.
print("Queued record output batch.")
context.enable() # Enable the Context, (potentially transferring the data to FPGA).
Run this and the code will fail right after printing: Created context.
Here, schema_path is the output schema used for the output_path.
Expected behavior
To continue, just like the same function call in the C++ runtime.
Calling the FPGAPlatform::prepare_column_chunks function twice leads to the buffer addresses of the former column being overwritten.
In order to prepare multiple columns, this function could be modified to take a vector of arrow columns as an argument instead.
A dirty example for preparing two Arrow columns:
uint64_t FPGAPlatform::prepare_column_chunks(const std::shared_ptr<arrow::Column>& column0,
                                             const std::shared_ptr<arrow::Column>& column1)
{
  uint64_t bytes = 0;
  std::vector<BufConfig> host_bufs;
  std::vector<BufConfig> dest_bufs;

  auto chunks0 = column0->data()->chunks();
  auto chunk0 = chunks0[0];
  std::vector<BufConfig> chunk_config0;
  append_chunk_buffer_config(chunk0->data(), column0->field(), chunk_config0);
  host_bufs.insert(host_bufs.end(), chunk_config0.begin(), chunk_config0.end());

  auto chunks1 = column1->data()->chunks();
  auto chunk1 = chunks1[0];
  std::vector<BufConfig> chunk_config1;
  append_chunk_buffer_config(chunk1->data(), column1->field(), chunk_config1);
  host_bufs.insert(host_bufs.end(), chunk_config1.begin(), chunk_config1.end());

  LOGD("Host side buffers:" << std::endl << ToString(host_bufs));
  bytes += this->organize_buffers(host_bufs, dest_bufs);
  LOGD("Destination buffers: " << std::endl << ToString(dest_bufs));

  size_t nbufs = host_bufs.size();
  this->_argument_offset += nbufs;
  LOGD("Configured " << nbufs << " buffers. " "Argument offset starting at " << this->argument_offset());

  return bytes;
}
Note that this implementation still only supports a single chunk per column.
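The vector-of-columns variant could look like the following sketch (in Python for brevity; the column/buffer structures and helper names are stand-ins for the Arrow and Fletcher types used above). The idea is to gather every column's buffer configs into one list before organizing buffers, so later columns cannot overwrite the addresses of earlier ones:

```python
# Hypothetical sketch of generalizing prepare_column_chunks to any
# number of columns. Columns are modeled as plain dicts here; the real
# code would take a std::vector of arrow::Column pointers.
def chunk_buffer_configs(column):
    """Stand-in for append_chunk_buffer_config: one config per buffer
    of the first chunk (single-chunk columns only, as in the original)."""
    return list(column["chunks"][0]["buffers"])

def prepare_columns(columns):
    host_bufs = []
    for column in columns:
        # Accumulate instead of restarting, so a second column
        # extends the list rather than clobbering the first.
        host_bufs.extend(chunk_buffer_configs(column))
    # The real implementation would now call
    # organize_buffers(host_bufs, dest_bufs) exactly once.
    return host_bufs
```

Doing the accumulation in one call, followed by a single organize_buffers pass, is what removes the overwrite hazard of calling the function twice.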
Currently the length stream to the Fletcher list ArrayWriters can only support one length per cycle. For short strings especially it would be beneficial to the throughput if it were possible to stream multiple lengths per cycle.
To prevent ambiguity between the stream handshake valid, the empty list (dvalid=0), and Arrow's validity bit, the proposal is to change:
not(dvalid) -> empty
null -> validity
The -o parameter is ignored and always takes the default.
In general working with AWS EC2 F1 reminds one of the myth of Pandora's box.
I suggest that at some point we come up with something that makes our lives less painful when working with this platform. I'd much rather just have the following folders in an aws-fpga platform-specific project:
root
|- src/
|- sim/
|- ip/
And then have some script so that we can simply run:
make sim
make sim-gui
make bitstream
That then just un-cluster-messes the aws-fpga toolchain, regardless of their version, and runs simulation, simulation with waveform gui, or just builds the F1 bitstream.
Issues #5, #33 and #90 are related, and I will close them for now so we can fix all of them in a pull request on this issue.
I have fixed the sum example for now in #106, but all other examples should be checked before this issue can be closed.
hardware/vhdl/arrow/BusArbiter.vhd contains port names which are the inverse of the conventions that AXI commonly uses. It might be worthwhile to refactor this.
When the user core implementation is done through Vivado HLS, it's possible to integrate Fletcher streams with Vivado's stream format (for hls::stream arguments to HLS functions).
This could be done automatically if the user would supply some argument saying they want the stream to be Vivado HLS compatible.
Vivado uses the following signaling for hls::stream. For example, on an HLS-generated top-level we find that an input stream has:
values_V_V_dout : IN STD_LOGIC_VECTOR (31 downto 0);
values_V_V_empty_n : IN STD_LOGIC;
values_V_V_read : OUT STD_LOGIC;
The handshake protocol should be checked, but it seems it is assumed that the HLS component will pull from a FIFO. empty_n and read look like valid and ready, but this needs to be checked to be sure.
Best case outcome would be that a user can just instantiate the HLS core in the generated template without having to type a single line of HDL, similar to OpenPOWER SNAP.
See ghdl/ghdl#606
A desired implementation to solve this could be to have a separate MMIO register to signal that there is an implicit null bitmap (i.e. all data is valid).
This bit should end up in the command stream from the user core to a ColumnReader/Writer.
Simulation of the stringread example (hardware/test/fletchgen/stringread) is broken for the latest fletchgen. The hardware/test/fletchgen/stringread/test.vhd needs to be updated to work with the test_wrapper file generated by the latest version of fletchgen.
/src/test/fletchgen/stringread/test_wrapper.vhd:228:3: for default port binding of component instance "test_inst":
/src/test/fletchgen/stringread/test_wrapper.vhd:228:3: signal interface "idx_first" has no association in entity "test"
/src/test/fletchgen/stringread/test_wrapper.vhd:228:3: signal interface "idx_last" has no association in entity "test"
We are impacted by this: https://discuss.python.org/t/libcrypt-so-1-removal-in-fedora-30-impacting-manylinux-builds/1961
Build of Python 3.7 manylinux2010 wheels is broken.
2019-07-31T09:04:33.9373085Z Traceback (most recent call last):
2019-07-31T09:04:34.7234380Z File "/opt/python/cp35-cp35m/bin/auditwheel", line 10, in <module>
2019-07-31T09:04:34.7235875Z sys.exit(main())
2019-07-31T09:04:34.7236483Z File "/opt/_internal/cpython-3.5.7/lib/python3.5/site-packages/auditwheel/main.py", line 50, in main
2019-07-31T09:04:34.7236769Z rval = args.func(args, p)
2019-07-31T09:04:34.7237789Z File "/opt/_internal/cpython-3.5.7/lib/python3.5/site-packages/auditwheel/main_repair.py", line 83, in execute
2019-07-31T09:04:34.7238280Z update_tags=args.UPDATE_TAGS)
2019-07-31T09:04:34.7238930Z File "/opt/_internal/cpython-3.5.7/lib/python3.5/site-packages/auditwheel/repair.py", line 101, in repair_wheel
2019-07-31T09:04:34.7239162Z needed = elf_read_dt_needed(path)
2019-07-31T09:04:34.7239661Z File "/opt/_internal/cpython-3.5.7/lib/python3.5/site-packages/auditwheel/elfutils.py", line 16, in elf_read_dt_needed
2019-07-31T09:04:34.7240179Z raise ValueError('Could not find soname in %s' % fn)
2019-07-31T09:04:34.7240694Z ValueError: Could not find soname in pyfletcher/.libs/libcrypt-2-cd9d3846.12.so
Going over the 'Hello world' tutorial, I was thinking it would be nice if we provided a Python wrapper for Fletchgen. This would mean people would not have to build Fletchgen from source, but could simply run pip install pyfletchgen.
In order to then run the 'Hello world' tutorial:
pip install pyfletcher pyfletchgen vhdeps
The complete tutorial can then be run from a notebook:
import pyarrow
import pyfletcher
import pyfletchgen
import vhdeps
# step 1 - schema
schema = pyarrow.schema(...)
# step 2 - recordbatch
recordbatch = pyarrow.recordbatch(schema, ...)
# step 3 - fletchgen
template = pyfletchgen.generate(schema, recordbatch, ...)
# step 4 - kernel
# here we can put the vhdl for the kernel implementation without the wrapping
sum = r"""
-- Registers used:
constant REG_CONTROL : natural := 0;
constant REG_STATUS : natural := 1;
...
"""
# this takes the kernel and puts it in the wrapper
kernel = pyfletchgen.kernel(sum, ...)
# step 5 - simulate
test = pyfletchgen.tests(...)
assert vhdeps.ghdl(test)
# step 6 - host-software
platform = pyfletcher.platform(...)
...
For step 4, I'm not sure if it makes sense to put the kernel in the notebook, but in theory it could be done. It's just nice to have everything that's required in one place. With jupyterlab we can also put a text editor next to the tutorial snippets for the kernel implementation.
Then in the future we can build a waveform plot lib for Python to make simulation from a notebook even better, and we can build a Fletcher build tool for step 7 with a Python interface to build and deploy to FPGAs.
The configurable output slice in BufferReader.vhd (through RESP_OUT_SLICE / MST_DAT_SLICE on dev) is not controllable through the config string of a ColumnReader.
In a ColumnWriter with the following parameters:
BUS_BURST_STEP_LEN=1 beat
BUS_BURST_MAX_LEN= 64 beats
BUS_DATA_WIDTH=512 bits
BUS_ADDR_WIDTH=64 bits
Even though by design it should not be possible to cross BUS_BURST_BOUNDARY=4096 bytes (currently set to AXI spec), AXI protocol checkers indicate that the bus burst boundary is sometimes still exceeded when BUS_BURST_MAX_LEN=64 beats. It doesn't seem to happen when BUS_BURST_MAX_LEN=32 beats.
Important to note is that BUS_BURST_STEP_LEN * BUS_DATA_WIDTH should currently not exceed 64 bytes, else buffers that start at an address > n * BUS_BURST_BOUNDARY-BUS_BURST_STEP_LEN * BUS_DATA_WIDTH will always burst over the boundary. This means that whenever BUS_DATA_WIDTH would become larger than 64 bytes the current design of the BusReq units will not suffice. The nicest solution would be to change the Arrow spec to work with cache lines larger than just the ones Intel uses.
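As a sanity check of the numbers above: with a 512-bit (64-byte) bus, a 64-beat burst spans exactly 4096 bytes, so only bursts starting exactly on a 4096-byte boundary stay inside it, while a 32-beat (2048-byte) burst has slack. A minimal sketch of the boundary-crossing condition:

```python
# Worked check of the burst-boundary arithmetic for the parameters in
# this issue: BUS_DATA_WIDTH = 512 bits, boundary = 4096 bytes (AXI 4 KiB rule).
BUS_DATA_WIDTH = 512       # bits
BUS_BURST_BOUNDARY = 4096  # bytes

def crosses_boundary(start_addr: int, beats: int) -> bool:
    """True if a burst of `beats` beats starting at `start_addr`
    crosses a BUS_BURST_BOUNDARY-byte boundary."""
    beat_bytes = BUS_DATA_WIDTH // 8
    last_byte = start_addr + beats * beat_bytes - 1
    return (start_addr // BUS_BURST_BOUNDARY) != (last_byte // BUS_BURST_BOUNDARY)
```

This matches the observation that BUS_BURST_MAX_LEN=64 is the marginal case: any misaligned start address makes a full-length burst cross, whereas 32-beat bursts only cross from a small window of start addresses.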
When multiple physical memories are available, application throughput can be improved if it is currently limited by memory bandwidth on a single memory interface.
Some things that we can automate in fletchgen to make it possible to utilize full system bandwidth:
Eventually (but not required to close this issue) it might be nice to provide a Fletcher crossbar unit to share e.g. PCI masters and CR/CW masters with all the physical memories.
Describe the bug
-f and --force seem to be ignored, resulting in vhdt files.
To Reproduce
docker run --rm -v /<pathto>/fletcherfiltering_test_workspace/Float:/source -v /<pathto>/FloatSnapAction/hw:/output -t fletchgen:develop -o /output -i /source/ff_in.fbs /source/ff_out.fbs -l vhdl --axi --sim -r /source/Float_data.rb -s /source/Float_data.srec --force -n FletcherFloat
or
docker run --rm -v /<pathto>/fletcherfiltering_test_workspace/Float:/source -v /<pathto>/FloatSnapAction/hw:/output --entrypoint /bin/bash -it fletchgen:develop
fletchgen -o /output -i /source/ff_in.fbs /source/ff_out.fbs -l vhdl --axi --sim -r /source/Float_data.rb -s /source/Float_data.srec --force -n FletcherFloat
Expected behavior
FletcherFloat.vhd
should be overwritten.
Additional context
Using a build of version 142f6de.
This seems to just hard-code the overwrite option to false; options->overwrite is only ever used to overwrite the AxiTop and the like.
https://github.com/abs-tudelft/fletcher/blob/142f6de/codegen/fletchgen/src/fletchgen/design.cc#L97
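A minimal sketch of what honoring the flag for every generated file could look like; save_output and its signature are hypothetical, not fletchgen's actual API:

```python
from pathlib import Path

def save_output(path: Path, contents: str, overwrite: bool) -> bool:
    """Write a generated file only when it is absent or when
    overwriting was explicitly requested (the --force behavior).
    Returns True if the file was written."""
    if path.exists() and not overwrite:
        return False
    path.write_text(contents)
    return True
```

Routing every file emission through one such helper, with overwrite wired to options->overwrite, would make --force apply uniformly instead of only to AxiTop and friends.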
It also does not like --axi or --sim for some reason (that is a different issue, #176).
Source docker file:
FROM mbrobbel/libarrow:0.14.1
RUN curl -L https://github.com/Kitware/CMake/releases/download/v3.14.5/cmake-3.14.5-Linux-x86_64.tar.gz | tar xz -C $HOME
WORKDIR /fletcher
ADD . .
WORKDIR /fletcher/codegen/fletchgen
RUN mkdir -p build && cd build && $HOME/cmake-3.14.5-Linux-x86_64/bin/cmake .. && make -j && make install
ENV FLETCHER_DIR="/fletcher"
ENV FLETCHER_RUNTIME_DIR=$FLETCHER_DIR/runtime
ENV FLETCHER_CODEGEN_DIR=$FLETCHER_DIR/codegen
ENV FLETCHER_HARDWARE_DIR=$FLETCHER_DIR/hardware
ENV FLETCHER_PLATFORM_DIR=$FLETCHER_DIR/platforms
ENV FLETCHER_EXAMPLES_DIR=$FLETCHER_DIR/examples
ENTRYPOINT ["fletchgen"]
Tie off the AXI write channel in the generated axi_top entity when no writers are present:
m_axi_awvalid <= '0';
m_axi_wvalid <= '0';
Only on the AWS platform, when e.g. the StringWriter has written a dataset to on-board memory and that data is read back by the host afterwards, the design cannot be restarted.
This behavior is not seen in simulation or on the SNAP platform, so it is suspected to have something to do with either the AWS platform itself or the Fletcher AWS runtime lib.
Simulation is functional now but for building this still needs to be applied:
https://github.com/aws/aws-fpga/blob/master/hdk/docs/AWS_Shell_V1.4_Migration_Guidelines.md