jcxue / rdma-tutorial
A tutorial on RDMA based programming using code examples
License: Apache License 2.0
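Two ConnectX-3 cards are connected back to back. I measured latency with ib_write_lat, but I don't understand what the results mean.
For example, t_avg for 1024 bytes comes out as 8299 usec, i.e. about 8 ms, which seems far too slow. If I drop the -C option it becomes 2 usec, which seems too fast. What is going on? Looking at the source, why is the value divided by the CPU frequency when -C is not given, but not divided when -C is given?
So where can I get sensible numbers? Or am I reading the output the wrong way?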
[root@localhost perftest-master]# ib_write_lat -a -n 5 -F -C
RDMA_Write Latency Test
The run above used the -C option; the reported latency looks far too large. The run below, without -C, looks too small.
[root@localhost perftest-master]# ib_write_lat -a -n 5 -F
RDMA_Write Latency Test
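A possible reading of the two results, offered as a sketch rather than a definitive answer: perftest documents -C (--report-cycles) as reporting times in CPU cycle units instead of microseconds. If that is what happened here, 8299 would be a cycle count, and dividing it by the CPU frequency in MHz (cycles per microsecond) gives the time in usec. The helper name cycles_to_usec and the 3500 MHz frequency used below are assumptions for illustration only; the poster's actual CPU frequency is not given.

```c
#include <assert.h>

/*
 * Convert a cycle count to microseconds.
 * cpu_mhz is the CPU frequency in MHz, i.e. cycles per microsecond.
 * Hypothetical helper, not part of perftest.
 */
static double cycles_to_usec(double cycles, double cpu_mhz)
{
    assert(cpu_mhz > 0.0);   /* a zero or negative frequency makes no sense */
    return cycles / cpu_mhz;
}
```

With an assumed 3500 MHz clock, cycles_to_usec(8299.0, 3500.0) is roughly 2.37, which is in the same range as the 2 usec reported without -C.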
Currently I can get the RDMA Write-Only operation to work, but not the RDMA Fetch&Add operation. I have used ibv_modify_qp()
on the server to modify the access properties of the queue pairs to support atomic operations, and I have used ibv_post_send
on the client to construct and send Fetch&Add operations, but I only receive NAK messages from the server. Here is some code I wrote; I hope you can help me find the problem.
This is the C++ code that executes the Fetch&Add operation on the client:
#include <infiniband/verbs.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

static int perform_rdma_fetch_and_add(struct ibv_qp *qp, struct ibv_mr *mr, uint64_t compare_add_value, uint64_t remote_addr, uint32_t rkey) {
    struct ibv_send_wr send_wr = {};   // value-initialized, so no memset is needed
    struct ibv_send_wr *bad_wr = nullptr;
    struct ibv_sge atomic_sge = {};

    // Local buffer that receives the value the remote location held before the add
    atomic_sge.addr = (uint64_t)mr->addr;
    // Atomic operations operate on 8-byte values
    atomic_sge.length = sizeof(uint64_t);
    // Local key of the memory region this buffer was registered with
    atomic_sge.lkey = mr->lkey;

    // send_wr.wr_id = 0; // Optional, can be used for tracking purposes
    send_wr.num_sge = 1;
    send_wr.sg_list = &atomic_sge;
    send_wr.opcode = IBV_WR_ATOMIC_FETCH_AND_ADD;
    send_wr.send_flags = IBV_SEND_SIGNALED; // Request a completion for this work request
    send_wr.wr.atomic.remote_addr = remote_addr; // Target of the atomic operation
    send_wr.wr.atomic.rkey = rkey;
    send_wr.wr.atomic.compare_add = compare_add_value; // Value to add

    int ret = ibv_post_send(qp, &send_wr, &bad_wr);
    if (ret) {
        // ibv_post_send returns the error code rather than setting errno
        fprintf(stderr, "ibv_post_send for RDMA Fetch&Add: %s\n", strerror(ret));
        return -1;
    }
    return 0;
}
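The post does not show the server side, so the cause of the NAK cannot be determined from it. A few conditions are commonly checked for atomics: the remote memory region and the responder QP both allow IBV_ACCESS_REMOTE_ATOMIC, the responder QP's max_dest_rd_atomic is non-zero, and the remote address is 8-byte aligned. The last one can be checked before posting, as in this minimal sketch. The helper name is_atomic_aligned is hypothetical and not part of libibverbs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if remote_addr is a valid target for an 8-byte atomic
 * operation as far as alignment is concerned.
 * Hypothetical helper for illustration; it does not check access
 * flags or QP attributes.
 */
static bool is_atomic_aligned(uint64_t remote_addr)
{
    return (remote_addr % sizeof(uint64_t)) == 0;
}
```

A caller could reject a misaligned remote_addr before calling ibv_post_send, which separates this cause from the access-flag and QP-attribute ones.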
I'm trying example 1 (git checkout 65893ec), as the wiki states.
If I raise the concurrency by setting config_info.num_concurr_msgs = 2; (main.c:38), the program crashes with a local protection error.
The error is caused by an incorrect implementation of the ring buffer, i.e. (client.c:53)
    buf_offset = (buf_offset + msg_size) % buf_size;
    buf_ptr += buf_offset;
should be changed to
    buf_offset = (buf_offset + msg_size) % buf_size;
    buf_ptr = ib_res.ib_buf + buf_offset;
Otherwise the buffer pointer never wraps around and eventually runs past the end of the buffer.
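The difference between the two versions can be illustrated with a small standalone sketch. It tracks positions as integers instead of real pointers so that the buggy path can be followed safely. The function names are hypothetical, and the sizes used in the note below assume buf_size = msg_size * num_concurr_msgs, which the report does not state explicitly.

```c
#include <stddef.h>

/* Advance a ring-buffer offset by one message, wrapping at buf_size. */
static size_t next_offset(size_t buf_offset, size_t msg_size, size_t buf_size)
{
    return (buf_offset + msg_size) % buf_size;
}

/*
 * Position (relative to the buffer start) reached by the original code,
 * which adds the wrapped offset to the pointer on every step
 * (buf_ptr += buf_offset).
 */
static size_t buggy_position(size_t msg_size, size_t buf_size, int steps)
{
    size_t offset = 0, position = 0;
    for (int i = 0; i < steps; i++) {
        offset = next_offset(offset, msg_size, buf_size);
        position += offset;          /* accumulates, never wraps */
    }
    return position;
}

/*
 * Position reached by the proposed fix, which recomputes the pointer
 * from the buffer base on every step (buf_ptr = base + buf_offset).
 */
static size_t fixed_position(size_t msg_size, size_t buf_size, int steps)
{
    size_t offset = 0;
    for (int i = 0; i < steps; i++)
        offset = next_offset(offset, msg_size, buf_size);
    return offset;                   /* always < buf_size */
}
```

With msg_size = 64 and buf_size = 128 (two concurrent messages under the assumption above), the buggy position reaches 128 after three steps, which is already outside the buffer, while the fixed position stays at 64. With buf_size = 64 the offset is always 0, which would explain why the problem only appears once the concurrency is raised.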
Hi, after I run the command ./rdma-tutorial lsy 1234, I get a segmentation fault. Any help?
In example 2, I want to use multi-threaded transport, so I changed the value of num_threads. When I run the program from the command line, nothing is printed. After I forced the process to exit, I found some IB configuration information in the log, so the QP connection succeeded, but nothing was transported.
Error
d7dd63e#r33301093
Warning -Werror
f73736e#r33301112
Hi,
Thanks for the great tutorial.
I have an issue when running the first example on both server and client.
================ IB Echo Server ================
************ Configuraion ************
is_server = true
msg_size = 64
num_concurr_msgs = 1
sock_port = 8080
************ End of Configuraion ************
[ERROR] (setup_ib.c:184:setup_ib: errno: Cannot allocate memory) Failed to create qp
[ERROR] (main.c:39:main: errno: None) Failed to setup IB
================ Run Finished ================
The output is as follows:
gcc -Wall -Werror -O2 -c -o config.o config.c
config.c: In function ‘get_rank’:
config.c:105:5: error: ‘strncpy’ output may be truncated copying 64 bytes from a string of length 64 [-Werror=stringop-truncation]
strncpy (hostname, utsname_buf.nodename, sizeof(hostname));
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
gcc does this for security reasons, and it seems that many projects have hit this warning [1].
I think we can use strlcpy [2] instead :)
[1]openwall/john#3127 (comment)
[2]https://linux.die.net/man/3/strlcpy
https://stackoverflow.com/questions/2114896/why-are-strlcpy-and-strlcat-considered-insecure
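Since strlcpy is not available in every C library (glibc only added it in version 2.38), one portable alternative is a small snprintf-based copy that always NUL-terminates and reports truncation. This is a sketch of that alternative, not the fix the project adopted; the helper name copy_string is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Copy src into dst (capacity dst_size), always NUL-terminating when
 * dst_size > 0. Returns 0 if the whole string fit, -1 if it was
 * truncated or an error occurred.
 * Hypothetical helper, shown as an alternative to strncpy/strlcpy.
 */
static int copy_string(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return (n >= 0 && (size_t)n < dst_size) ? 0 : -1;
}
```

In get_rank, the strncpy call could then become copy_string(hostname, sizeof(hostname), utsname_buf.nodename), with the return value checked if a truncated hostname should be treated as an error.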