
Comments (30)

githubfoam commented on July 17, 2024

@Hunter21007 I also tried the GlusterFS RDMA transport with kernel 4.9.0. What do you mean by "same machine"? I have 2 VMs, each with 2 NICs (1x NAT, 1x host-only). Did you get the GlusterFS RDMA transport running with Soft-RoCE?

from rxe-dev.

yonatanco commented on July 17, 2024

Hello.
I know this issue. It's a race condition that we recently fixed.
I sent a fix for 4.8-rc5.
You can work with upstream instead of GitHub to stay up to date.

BTW: are you working with a Mellanox HCA, or with an Ethernet NIC from a vendor like Intel or Broadcom?

mcfatealan commented on July 17, 2024

Hi @yonatanco, thanks so much for your response! I'm trying 4.8-rc5 now; I'll send you my feedback later.

BTW, here's my hardware info:

mcfatealan@mcfatealan-desktop:~$ lspci | grep 'Ethernet'
05:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

mcfatealan commented on July 17, 2024

Oops, I still get the same problem:

mcfatealan@mcfatealan-desktop:~$ uname -a
Linux mcfatealan-desktop 4.8.0-rc5 #1 SMP Mon Sep 12 14:12:15 CST 2016 x86_64 x86_64 x86_64 GNU/Linux

mcfatealan@mcfatealan-desktop:~/librxe-dev$ ibv_rc_pingpong -g 0 -d rxe0 -i 1
  local address:  LID 0x0000, QPN 0x000011, PSN 0xeefb20, GID fe80::be5f:f4ff:fe3a:cd36
  remote address: LID 0x0000, QPN 0x000012, PSN 0x365328, GID fe80::be5f:f4ff:fe3a:cd36
//Hanging..

mcfatealan@mcfatealan-desktop:~/librxe-dev$ ibv_rc_pingpong -g 0 -d rxe0 -i 1
  local address:  LID 0x0000, QPN 0x000011, PSN 0xeefb20, GID fe80::be5f:f4ff:fe3a:cd36
  remote address: LID 0x0000, QPN 0x000012, PSN 0x365328, GID fe80::be5f:f4ff:fe3a:cd36
//Hanging..

mcfatealan@mcfatealan-desktop:~$ ibv_rc_pingpong -g 0 -d rxe0 -i 1 192.168.10.19
  local address:  LID 0x0000, QPN 0x000012, PSN 0x365328, GID fe80::be5f:f4ff:fe3a:cd36
  remote address: LID 0x0000, QPN 0x000011, PSN 0xeefb20, GID fe80::be5f:f4ff:fe3a:cd36
//Hanging..


yonatanco commented on July 17, 2024

You are using GID 0. Try with GID 1:

ibv_rc_pingpong -g 1 -d rxe0 -i 1
ibv_rc_pingpong -g 1 -d rxe0 -i 1 192.168.10.19
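Which index to pass to `-g` depends on what the device's GID table actually contains. A minimal sketch for listing the populated entries, assuming the usual sysfs layout `/sys/class/infiniband/<device>/ports/<port>/gids/`; the helper name `list_gids` and its optional third argument are hypothetical and not part of any RDMA tool:

```shell
#!/bin/sh
# List populated GID table entries for one port of an RDMA device.
# Assumption: entries live in <root>/<dev>/ports/<port>/gids/<index>.
list_gids() {
    dev="$1"
    port="${2:-1}"
    root="${3:-/sys/class/infiniband}"   # overridable so the helper can be tried without a device
    dir="$root/$dev/ports/$port/gids"
    [ -d "$dir" ] || { echo "no GID table at $dir" >&2; return 1; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Some kernels return an error when an unused slot is read; skip those.
        gid=$(cat "$f" 2>/dev/null) || continue
        case "$gid" in
            ""|0000:0000:0000:0000:0000:0000:0000:0000) continue ;;  # empty slot
        esac
        printf '%s\t%s\n' "$(basename "$f")" "$gid"
    done
}
```

Running `list_gids rxe0 1` prints one `index<TAB>gid` pair per populated slot. An IPv4-mapped entry such as `::ffff:192.168.10.19` would appear in expanded form, ending in `ffff:c0a8:0a13`.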

mcfatealan commented on July 17, 2024

Thanks for the reminder, @yonatanco. The result stays the same:

mcfatealan@mcfatealan-desktop:~/librxe-dev$ ibv_rc_pingpong -g 1 -d rxe0 -i 1
  local address:  LID 0x0000, QPN 0x000012, PSN 0x5e5383, GID ::ffff:192.168.10.19
  remote address: LID 0x0000, QPN 0x000013, PSN 0x4c0dd8, GID ::ffff:192.168.10.19

mcfatealan@mcfatealan-desktop:~$ ibv_rc_pingpong -g 1 -d rxe0 -i 1 192.168.10.19 
  local address:  LID 0x0000, QPN 0x000013, PSN 0x4c0dd8, GID ::ffff:192.168.10.19
  remote address: LID 0x0000, QPN 0x000012, PSN 0x5e5383, GID ::ffff:192.168.10.19

yonatanco commented on July 17, 2024

Are you trying to ping using the same host (loopback)?

mcfatealan commented on July 17, 2024

@yonatanco, sorry for the late reply. I didn't receive a notification on the main page.
I'm wondering whether there is any issue with running rping against myself? I can do that on some of my other machines that have an RNIC. I'm new to RDMA, so maybe I'm being silly here...

anthonyliubin commented on July 17, 2024

Hello @yonatanco @mcfatealan,
I am testing Soft-RoCE right now and hit the same issue you described.
First I used rxe-dev-master with kernel 4.0.0, but running rping fails with RDMA_CM_EVENT_ADDR_ERROR.
Then I switched to rxe-dev-rxe_submission_v18 with kernel 4.7.0-rc3. rping runs, but it hangs: the server never receives the RDMA_CM_EVENT_CONNECT_REQUEST event, so the rping server side blocks in sem_wait(). This was a loopback test on a single VM.

Since you mentioned the loopback issue, I also tested between two machines: one Linux PC and one VM (NAT connection). We ran the rping server on the PC, but when we ran the client on the VM, the server crashed and no longer responded to anything.

You said that you tried 4.8-rc5. How did you do it: with the rxe-dev branch, or by just upgrading the kernel? I want to continue testing, thanks!

Best Regards
Anthony

mcfatealan commented on July 17, 2024

Hi @anthonyliubin, I'm sorry to hear that you've had the same issue. Unfortunately, I never got the test to pass. My goal was to find a temporary way to test my RDMA code before our server was fixed. The time spent on this exceeded what I had budgeted, so I had to give up. But I'd still like to thank @yonatanco for all of his help!

About 4.8-rc5, I just upgraded my kernel.

It's kinda embarrassing that my answer might not provide any help. Anyway, that's all I know. Hope for the best!

anthonyliubin commented on July 17, 2024

Hi @mcfatealan,
Thanks for your response.
I have a question: if we do not use the rxe-dev branch and just upgrade the kernel, how do we keep the rxe-dev package in the new kernel? As I understand it, a newly compiled kernel does not include rxe. Do we need to port rxe? That would be a lot of work.
If you could give a simple explanation of how you upgraded, it would help us a lot. Thanks.

Best Regards
Anthony

mcfatealan commented on July 17, 2024

I'm not 100% sure since it's been a while, but according to @yonatanco's description, it seems that rxe-dev is already included in 4.8.0? I suggest you give it a try :)

anthonyliubin commented on July 17, 2024

Hi @mcfatealan,
Thanks for your help. rxe-dev is already included in 4.8-rc5.
We have tested this on 4.7 and 4.8-rc5, and both now work (ibv_rc_pingpong and rping).
Our test setup needs two PCs with a bridged connection (no NAT if using VMs), and the iptables rules must be cleared first.
ibv_rc_pingpong needs to use GID 1.
Loopback testing is not supported.

Best Regards
Anthony
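The checklist above presupposes a kernel that already ships the rxe module. A minimal pre-flight sketch, assuming (as reported in this thread) that mainline kernels include it from 4.8 onward; the function name `kernel_has_mainline_rxe` is hypothetical, and a kernel built from the rxe-dev branch may carry the module even when this check fails:

```shell
#!/bin/sh
# Return 0 if a kernel release string is 4.8 or newer, 1 otherwise.
# With no argument, the running kernel (uname -r) is checked.
kernel_has_mainline_rxe() {
    rel="${1:-$(uname -r)}"
    major=${rel%%.*}            # text before the first dot, e.g. "4"
    rest=${rel#*.}              # text after the first dot, e.g. "8.0-rc5"
    minor=${rest%%[!0-9]*}      # leading digits of the remainder, e.g. "8"
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "${minor:-0}" -ge 8 ]; }
}
```

For example, `kernel_has_mainline_rxe 4.4.0-116-generic || echo "kernel too old for in-tree rxe"` reports the stock Ubuntu 16.04 kernel mentioned later in the thread as too old. Running `modinfo rdma_rxe` is a more direct way to confirm that the module is actually installed.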

mcfatealan commented on July 17, 2024

@anthonyliubin congrats! so glad to hear that :) The points you mentioned are very helpful. Maybe I will test again next time according to your experience.

oTTer-Chief commented on July 17, 2024

Hi,
I am trying to get rxe running on Debian 8.7 with kernel 4.8.15 (rdma_rxe version 0.2) and face exactly the same issues. Neither rping nor ibv_rc_pingpong sends data if both ends run on the same machine.

In case of rping I get:

hutter@cbm01:~$ tail -n1 /var/log/messages
Jan 25 13:28:05 cbm01 kernel: [54644.488129] detected loopback device

Should this work over loopback (on the same machine), or is this unsupported?

anthonyliubin commented on July 17, 2024

Hi @oTTer-Chief,

In my testing, it did not work over loopback (on the same machine).
You could try it with two PCs.

Best Regards
Anthony

oTTer-Chief commented on July 17, 2024

Hi @anthonyliubin ,

I tried testing between 2 VMs and this worked.
Nevertheless, I wonder whether loopback is intended to work and there is an error in my setup, or whether loopback is explicitly unsupported.
With real RDMA hardware such as InfiniBand I am able to send to the same machine, so I would assume the software implementation can do this as well.

Hunter21007 commented on July 17, 2024

Hi all,

Communication with the same machine is also required for the GlusterFS RDMA transport (which I was not able to get working with Linux 4.9).

Peng-git-hub commented on July 17, 2024

You may try this.
First, make sure that messages can pass through the firewall:
iptables -F; iptables -t mangle -F
Then add the IP addresses of both server and client to the "trusted" zone:
firewall-cmd --zone=trusted --add-source=1.1.1.1 --permanent
firewall-cmd --zone=trusted --add-source=1.1.1.2 --permanent
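Flushing every rule is a blunt fix. A narrower sketch, under two assumptions that are not stated in this thread: Soft-RoCE traffic arrives on UDP port 4791 (the RoCEv2 port), and `ibv_rc_pingpong` exchanges its connection details over its default TCP port 18515. The helper name `rxe_peer_rules` is hypothetical, and it only prints the rules so they can be reviewed before being applied:

```shell
#!/bin/sh
# Print (do not apply) iptables rules that admit RDMA test traffic from one peer.
# Assumptions: RoCEv2 on UDP 4791; ibv_rc_pingpong default TCP port 18515.
rxe_peer_rules() {
    peer="$1"
    [ -n "$peer" ] || { echo "usage: rxe_peer_rules <peer-ip>" >&2; return 1; }
    echo "iptables -I INPUT -p udp -s $peer --dport 4791 -j ACCEPT"
    echo "iptables -I INPUT -p tcp -s $peer --dport 18515 -j ACCEPT"
}
```

After checking the output, the rules could be applied as root with `rxe_peer_rules 1.1.1.2 | sh`, run on each host with the other host's address.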

Hunter21007 commented on July 17, 2024

Is this also necessary if the firewall is disabled?

Peng-git-hub commented on July 17, 2024

The default firewall rule rejects unknown connections, so a direct test will be rejected by the remote firewall.

byronyi commented on July 17, 2024

Any updates? Seems the loopback interface is not functioning for RDMACM, which is crucial for testing and local development.

monis410 commented on July 17, 2024

Maintenance of the RXE project on GitHub has stopped. You should move to upstream Linux for the kernel module and to rdma-core (https://github.com/linux-rdma/rdma-core) for the userspace library to get the latest features and bug fixes.
Note that some of the bugs you hit may have fixes in drivers/infiniband/core (which means they are common to all InfiniBand providers).

byronyi commented on July 17, 2024

Thanks for your comment!

byronyi commented on July 17, 2024

@githubfoam According to my inquiry on the linux-rdma mailing list, several RXE bugs were fixed in 4.9/10/11, and the suggestion is to upgrade to 4.14/15 (e.g. Ubuntu 18.04 or Debian unstable). If the problem persists, let us know.

Hunter21007 commented on July 17, 2024

@githubfoam "Host only" means the GlusterFS server and client run on the same machine via 127.0.0.1. No, I did not manage to make it work. And now it is out of scope anyway, because GlusterFS RDMA support was dropped. So this is not relevant anymore.

githubfoam commented on July 17, 2024

@Hunter21007 could you provide a link showing that glusterfs-rdma support was dropped? The page below suggests two links, but both lead nowhere.
https://docs.gluster.org/en/v3/Administrator%20Guide/RDMA%20Transport/
Normally I build two NAT-ed servers in the same network. The GlusterFS server/client works over TCP, but the RDMA transport does not.

githubfoam commented on July 17, 2024

@byronyi I tried what's suggested on the wiki page below. After installing the kernel and user-space parts, my nodes run "Ubuntu 16.04.4 LTS, Linux 4.7.0-rc3+". I can't play ping-pong; rxe testing fails. I don't understand how you upgrade to the 4.14/4.15 kernels. With this kernel package the system is upgraded from "4.4.0-116-generic" to "4.7.0-rc3+".
https://github.com/SoftRoCE/rxe-dev/wiki/rxe-dev:-Home
One of the contributors says this GitHub repository is not maintained anymore and suggests following the "upstream kernel + rdma-core" method, which is described at the link below.
https://community.mellanox.com/docs/DOC-2184
So I started trying this. After the kernel and rdma-core installations, my nodes run "Ubuntu 16.04.4 LTS, Kernel: Linux 4.17.0-rc6". The problem is that the instructions are missing steps, such as "sudo make install". At the bottom of the page someone tried it, and their steps are different.

lalith-b commented on July 17, 2024

I am able to do ping-pong with rxe, but rdma_cm fails when it comes to gluster-rdma support. Port 24008 is never opened because rdma_cm fails with [No Device Found].

githubfoam commented on July 17, 2024

@lalith-b if you read the whole thread, GlusterFS RDMA support had been dropped by that time. If you have information that says otherwise, could you please share it? Where I left off, TCP worked but RDMA did not.
