ns3-rdma's Introduction

NS-3 simulator for RDMA

This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes the implementation of DCQCN, TIMELY, PFC, ECN and Broadcom shared buffer switch.

It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained here.

Note

The TIMELY implementation is in the "timely" branch and has not been merged into the master branch, so you may not be able to simulate DCQCN and TIMELY simultaneously at this moment.

Quick Start

Build

To compile it out of the box, you need Visual Studio 2015 (not 2013 or 2017). People have successfully built it with the free version, which can be downloaded here. Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.

If you cannot get a Windows machine or Visual Studio for any reason, you may try building it with the original Makefile. We did this a while back, but you will probably need to edit a few things in waf to make it work now.

Run

The binary will be generated at windows/ns-3-dev/x64/Release/main.exe. We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt. Execute main.exe in windows/ns-3-dev/x64/Release/:

cd windows\ns-3-dev\x64\Release\
main.exe mix\config.txt

It runs a 2:1 incast at 40Gbps for 1 second. Please allow a few minutes for it to finish. The trace will be generated at mix/mix.tr, as defined by mix/config.txt.

There are quite a few options in mix/config.txt. We will gradually add documentation. In the meantime, you can check the code (project "main", source files, "third.cc") to see how these options are parsed. You can also raise issues if you have any questions.

What did we add exactly?

point-to-point/model/qbb-net-device.cc and all other qbb-* files:

The DCQCN and PFC implementation. It also includes go-back-to-N and go-back-to-0 recovery, which handle packet drops due to corruption.
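The difference between the two recovery modes can be sketched as follows. This is only an illustration, not the code in qbb-net-device.cc: the names `RecoveryMode` and `NextSeqAfterNack` are hypothetical, and the sketch assumes a NACK reports the sequence number the receiver expects next.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names for illustration; not taken from qbb-net-device.cc.
enum class RecoveryMode { GoBackN, GoBack0 };

// Returns the sequence number the sender resumes from after a NACK.
// Assumption: the NACK reports the next sequence number the receiver expects.
uint64_t NextSeqAfterNack(RecoveryMode mode, uint64_t expectedSeq) {
    if (mode == RecoveryMode::GoBackN) {
        return expectedSeq;  // resend starting at the first missing packet
    }
    return 0;                // go-back-to-0: restart the whole message
}
```

Go-back-to-0 wastes more bandwidth per loss, which is why the two modes are worth comparing in simulation.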

In 2013, we obtained a very basic NS-3 PFC implementation from somewhere and built on it. We can no longer find the original repository.

network/model/broadcom-node.cc and .h:

This implements a Broadcom ASIC switch model, which mostly performs buffer threshold-related operations. These include deciding whether PFC should be triggered, whether ECN should be marked, and whether the buffer is so full that packets should be dropped. It supports both static and dynamic thresholds for PFC.
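To make the static/dynamic distinction concrete, here is a minimal sketch. It is not the logic in broadcom-node.cc: the struct, the function names, and the specific dynamic rule (threshold = alpha times the free shared buffer, with alpha expressed as a power of two) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type; the real parameters live in broadcom-node.cc and the config file.
struct SharedBuffer {
    uint64_t totalBytes;  // size of the shared pool
    uint64_t usedBytes;   // bytes currently occupied across all ports
};

// Static threshold: pause once an ingress queue exceeds a fixed byte count.
bool ShouldPauseStatic(uint64_t queueBytes, uint64_t thresholdBytes) {
    return queueBytes > thresholdBytes;
}

// Dynamic threshold (assumed form): the limit shrinks as the shared pool fills.
// alphaShift expresses alpha as 1 / 2^alphaShift to stay in integer arithmetic.
bool ShouldPauseDynamic(uint64_t queueBytes, const SharedBuffer& buf, unsigned alphaShift) {
    const uint64_t freeBytes =
        buf.totalBytes > buf.usedBytes ? buf.totalBytes - buf.usedBytes : 0;
    return queueBytes > (freeBytes >> alphaShift);
}
```

With a 1000-byte pool, 200 bytes used, and `alphaShift = 2`, the dynamic limit is 800 / 4 = 200 bytes, so a queue holding 201 bytes would trigger a pause under this rule.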

Disclaimer: this module is based purely on the authors' personal understanding of the Broadcom ASIC. It does not reflect any official confirmation from either Microsoft or Broadcom.

network/utils/broadcom-egress-queue.cc and .h:

This is the actual MMU that buffers packets. It also includes the switch scheduler: when the upper layer asks for a packet to send, the scheduler decides which queue to dequeue from. Strategies such as strict priority and round robin are supported.
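The two scheduling strategies can be sketched like this. The helper names are hypothetical and the sketch assumes that a lower queue index means higher priority; the real scheduler in broadcom-egress-queue.cc may differ in both respects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helpers; queueLen[i] is the number of packets waiting in queue i.
// Both return the index of the queue to dequeue from, or -1 if all queues are empty.

// Strict priority: always serve the lowest-numbered non-empty queue
// (assumption: lower index means higher priority).
int PickStrictPriority(const std::vector<std::size_t>& queueLen) {
    for (std::size_t i = 0; i < queueLen.size(); ++i) {
        if (queueLen[i] > 0) return static_cast<int>(i);
    }
    return -1;
}

// Round robin: start searching just after the queue served last time.
// lastServed is -1 if nothing has been served yet.
int PickRoundRobin(const std::vector<std::size_t>& queueLen, int lastServed) {
    const int n = static_cast<int>(queueLen.size());
    for (int step = 1; step <= n; ++step) {
        const int i = (lastServed + step) % n;
        if (queueLen[static_cast<std::size_t>(i)] > 0) return i;
    }
    return -1;
}
```

Strict priority can starve low-priority queues, while round robin gives every backlogged queue a turn.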

applications/model/udp-echo-client.cc:

We implement the RDMA client here, which aligns with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble when PFC pauses the link: it keeps sending packets at line rate, so it soon builds up a huge queue and runs out of memory. Here we throttle the sending rate when PFC pushes back.
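The idea of throttling under back-pressure can be sketched as below. This is a hypothetical rule, not the logic in udp-echo-client.cc; the function name, the queue limit, and the way the pause state is exposed are all assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical back-pressure rule; not the logic in udp-echo-client.cc.
// Returns how many packets the application may hand to the NIC right now.
uint32_t PacketsAllowed(bool pausedByPfc, uint32_t nicQueueLen,
                        uint32_t nicQueueLimit, uint32_t wanted) {
    // While the link is paused or the NIC queue is full, stop generating packets
    // instead of letting the queue grow without bound.
    if (pausedByPfc || nicQueueLen >= nicQueueLimit) return 0;
    return std::min(wanted, nicQueueLimit - nicQueueLen);
}
```

Bounding the queue this way keeps memory use constant no matter how long the link stays paused.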

internet/model/seq-ts-header.cc and .h:

We didn't implement the full InfiniBand header. Instead, all we really need is the sequence number (for detecting corruption drops, and also to help us understand the throughput) and the timestamp (required by TIMELY). This is where we encode this information into packets.
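As a rough illustration of such a header, the sketch below packs a sequence number and a timestamp into bytes and reads them back. The layout (4-byte sequence number, 8-byte timestamp, big-endian) and the names `SeqTs`, `Encode`, and `Decode` are assumptions; the actual field widths in seq-ts-header.cc may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout: 4-byte sequence number followed by an 8-byte timestamp,
// both big-endian. The actual field widths in seq-ts-header.cc may differ.
struct SeqTs {
    uint32_t seq;
    uint64_t timestampNs;
};

// Serializes the header into 12 bytes.
std::array<uint8_t, 12> Encode(const SeqTs& h) {
    std::array<uint8_t, 12> out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = static_cast<uint8_t>(h.seq >> (24 - 8 * i));
    for (std::size_t i = 0; i < 8; ++i)
        out[4 + i] = static_cast<uint8_t>(h.timestampNs >> (56 - 8 * i));
    return out;
}

// Reconstructs the header from the 12 bytes produced by Encode.
SeqTs Decode(const std::array<uint8_t, 12>& in) {
    SeqTs h{0, 0};
    for (std::size_t i = 0; i < 4; ++i) h.seq = (h.seq << 8) | in[i];
    for (std::size_t i = 0; i < 8; ++i) h.timestampNs = (h.timestampNs << 8) | in[4 + i];
    return h;
}
```

The receiver would use the decoded sequence number to detect gaps, and the timestamp to compute the delay samples that TIMELY relies on.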

main/third.cc:

The main() function.

There may be other edits here and there; in particular, trace generation is scattered among various network stacks. But the above are the major ones.

Q&A

Q: Why do you port it to Windows?

A: This is a Microsoft project. Visual Studio, including the free version, works well.

Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?

A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well.

Q: I don't understand ... (some part of the code or configuration)

A: Raise issues on GitHub, so that your questions can also help others. If you really do not want others to know you are working on this, you can email [email protected]

Q: What papers should I cite, if I also publish?

A: Below are the ones you should definitely check. They are ranked from most to least relevant. That said, all of them are quite relevant:

ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY, CoNEXT'16 (this project was released with this paper; we ask that you at least cite it if you use this code)

Congestion Control for Large-scale RDMA Deployments, SIGCOMM'15 (DCQCN)

TIMELY: RTT-based Congestion Control for the Datacenter, SIGCOMM'15 (TIMELY)

RDMA over Commodity Ethernet at Scale, SIGCOMM'16 (discussed go-back-to-N)

Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them, HotNets'16 (PFC deadlock analysis, directly used this simulator.)

ns3-rdma's People

Contributors

bobzhuyb

ns3-rdma's Issues

some questions about the implementation of tcp

Hello, Yibo.
I have tested the TCP flow in the project, but the sequence number in mix.tr is always 42. Does this mean that the receiver cannot receive the packets and sends back a NACK?
Also, is the code that generates NACKs and ACKs in qbb-net-device used for UDP only?
Waiting for your reply.

some questions about how to realize rdma on server

Is DCQCN over RoCEv2 the only way to realize RDMA? I compared the code with the basic ns3 UDP code, but I cannot find any code for kernel bypass. Have you implemented it?
Thanks for your reply.

Some question about simulation(about fluid model)

Hi, Yibo,
I am trying to use ns3 to verify the fluid model you proposed in 'Congestion Control for Large-Scale RDMA Deployments'.
I get some strange performance.
For example, I changed the parameter BYTE_COUNTER to 10MB, the value from your paper, but the host rate does not converge and the queue length at the bottleneck varies a lot. Then I found some parameters that I don't understand:
CLAMP_TARGET_RATE
CLAMP_TARGET_RATE_AFTER_TIMER
If I set them both to 0, the host rate converges, but both it and the queue length at the bottleneck still oscillate a lot.
I attach the performance figure (2 flows) below.

[Figure: performance with 2 flows]

Thank you!

How to adjust the sending rate?

Could you tell me how to set the sending rate of a sender to a fixed value? Where should I change the code?
Thank you in advance.

Question about some WARNING and ERROR

Hi, Yibo,

I have ported your Windows version to Linux, based on ns-3.18, but I am not quite sure whether I have done things right.

Anyway, the build process terminates successfully.

But there are some ERROR and WARNING messages when I run the default application (third.cc w/ mix/config.txt).

ERROR: Sendingbuffer miss!
WARNING: shouldn't reach here -- socket.h

So, is the simulation still correct? What do these messages indicate?

some questions about the implementation of Timely

Hi,
I have studied the code in the timely branch, but I am confused.
Where is the TIMELY algorithm implemented?
In the main function, if I don't use qbb-device, can I still simulate the TIMELY protocol?
Thank you for your reply.

I want to know how many times the queues drop packets and mark ecn ?

I find that network/utils/broadcom-egress-queue.cc::BEgressQueue has a queue limit and network/model/broadcom-node.cc also has a buffer limit.
1. What is the difference?
2. If I change the queue's MaxBytes in network/utils/broadcom-egress-queue.cc, where else should I change it?
3. I need to count the number of drops and ECN marks. Where should I add the counters?

how to visualize simulation?

Hi,
On Ubuntu, pyviz and netanim can be used to visualize a simulation. Is there any tool supported by the ns3-rdma project to visualize simulations? I tried to generate an .xml file in the simulation and then open it in Ubuntu, but netanim told me "This XML format is not supported. Minimum Version:3.106" (the version of netanim I used is 3.107, and the version of ns-3 is 3.26). Do you have any suggestions about visualization?
Thank you in advance!

Can you tell me what the output is?

Hello, friend. I ran your code and got the output file mix.tr, but I don't know what the output means. Can you tell me what each item in the output means? Thank you very much.

Build Errors with visual studio 2015 community.

Hi,

I am using Visual Studio 2015 Community and tried to build the solution, but got errors like this:
[Image: build errors]

There are 3 kinds of errors: C1083, MSB3073 and LNK2001.
I searched for solutions, but none of them helped.
Can you help? Or is there anything I did wrong?
The C1083 error is also confusing, because I didn't make any changes to the code.

Thx. Nice day :)

Hi, I tried to run it but something seems wrong

Hi friend.
I used VS 2015 to build the project and it built successfully.
But when I try to run the simulation with the command in the README, it takes a very long time. It has been running for 10 hours and is still running. Is this normal? Also, I can't see the trace file in mix.

some questions about the DCQCN and Timely

Hello, I have read your paper on the analysis of DCQCN and TIMELY. TIMELY has a control engine that inserts delays between segments to achieve the target rate. How does DCQCN achieve the target rate in a real NIC or in ns3? Thank you.

WARNING: Drop because egress Q buffer full

I use BCube as my network topology, with 8 switches and 16 nodes. This error occurred when I ran the simulation. Could you tell me why? How can I solve this problem? Thank you very much.

Run time error "cannot add the same kind of tag twice"

Hi Yibo,

Have you ever encountered the error "cannot add the same kind of tag twice"?
I get this error whenever a CNP passes through two switch nodes.

This error is thrown here:
"src/point-to-point/model/qbb-net-device.cc:459-461 "

if (ipv4h.GetProtocol() != 0xFE) //not PFC
{
        packet->AddPacketTag(FlowIdTag(m_ifIndex));
......

Here, it seems that when a CNP (ipv4h.GetProtocol() == 0xFF) arrives at a switch node,
the packet tag is added.

But the tag is not removed when the packet leaves the switch, in this scope:
"src/point-to-point/model/qbb-net-device.cc:349 "

if (m_queue->GetLastQueue() == qCnt - 1)//this is a pause or cnp, send it immediately!

I traced back and found that the error occurs when the CNP arrives at the next switch node.

Then I added a code snippet to remove the tag within the 'if' scope in
"src/point-to-point/model/qbb-net-device.cc:349"

if (m_queue->GetLastQueue() == qCnt - 1)//this is a pause or cnp, send it immediately!
  {
+      if (h.GetProtocol() != 0xFE) //not PFC , here h refers to ipv4header
+        {
+            p->RemovePacketTag(t);
+        }
	TransmitStart(p);
  }

The error no longer occurs.

I wonder whether this is a bug or whether I have missed something in the configuration.
I hit this in an incast test scenario.

Thanks,
Ge

How to compile and run with WAF under Ubuntu?

Hello, I learned a lot from your project. I have a question about compiling with WAF. I have run it successfully under Windows, and now I want to merge it into an ns-3.29 project under Ubuntu. What should I do? I look forward to your reply!

M1 MacBook installation

What is the best way to build this for M1 Macs? Is it necessary to have a Windows VM or Visual Studio 2015? Would it be easier to change the waf build and adapt it for M1 Macs?

PAUSE never triggered

Hi,
I was running a simulation with an N:1 (N is large) incast topology, and the buffer quickly filled up, leading to egress drops without triggering PAUSE.
So I went back to check the code and found that PAUSE generation is checked (checkqueuefull()) inside the send() function, not the receive() function. It is also checked after the ingress and egress admission checks, and ingress and egress admission are checked at the same time.
I thought that in receive() we should first check PAUSE generation and ingress admission, and then check egress admission in send().
So I am confused, since the simulation keeps telling me the buffer is full without triggering any PAUSE generation.
Thank you.

How to implement RSVP protocol?

Recently, I have been studying the emulation of various commonly used protocols. Can NS3 emulate the RSVP protocol? I have not seen anyone implement RSVP online, so I am skeptical about its feasibility.

As for the DCQCN on RDMA READ flows

According to my limited understanding, DCQCN depends on ECN to detect network congestion and uses marked ACKs to notify the sender to reduce its sending rate. However, for RDMA READ operations, the payload as well as the ACKs are carried in the response messages. Further, according to the IB transport, there are no further ACKs for a "read response". So how does DCQCN control the rate of RDMA READ flows? Does the NP of DCQCN implement an additional ACK mechanism beyond the original IB transport? Thanks.

How to read and analyse the output trace file?

I have run the example configuration file and got the following output in the mix.tr file:
2.000002 /1 1.2>1.1 u 29348 0 3
...
What does each number mean? Where did you define them?

Thank you.

No output in the cmd

Hello,
I have successfully run the simulator on my Windows machine, and the default program generated the trace file.
However, when running the hello-simulator or first project, I do not get any output in the cmd window.
I tried adding a std::clog to the project and it worked fine.
Is NS_LOG_UNCOND deactivated in this version, or is something else wrong?
Thanks.

questions about Timely

Sorry to bother you, but I really need some help. I built the solution in the timely branch, but I didn't get main.exe in Release; instead I got it in a new directory, "Debug". I ran it using config.txt, but it did not generate a trace file. I don't know whether this is normal or something went wrong.
The output shows that it just generated some files in Debug.

flow-level ECMP may not work properly

Hi,
I tried to use flow-level ECMP in a fat-tree topology, but I found that the total throughput of the receiver was below the maximum. I think you use the original ns-3 source code for flow-level ECMP, so I ran the same topology using TCP in ns-3 and visualized it. Surprisingly, only some core switches were used, so I suspect that flow-level ECMP does not work properly and that this is the main cause of the low throughput. Have you ever noticed this? Or did I use it wrongly?

thanks.

Other version of ns3

Hello, is there a way to simulate DCQCN on other versions of ns3, such as ns-3.30/ns-3.35/ns-3.36?

some questions about pfc pause frame

Hi, I'm interested in your project.
I want to know how a pause frame gets to the upstream device (NIC). Pause frames are transmitted at L2. Did you use global routing?

time run error

Hello, sir!
I have a question for you. I opened config.txt, but the run still failed. Please give me some suggestions for modification.
Hoping for your reply. Thanks!
[Image: run error]

Missing of build file

Hello, I have a question about the link to the build file. The link does not work right now. Can you update it? It would really help me.

Some questions about how modules work in simulator.

Hi.
I have run main.exe correctly and now I want to implement an algorithm on the simulator.
Could you explain how the simulator works, such as the relationship between qbb-device and broadcom-node, and the role of qbb-device in the simulation?
I also want to know where, and by which component, the decision to send a PFC packet is made.
Thank you for your reply.

Some questions about the output

Can you tell me where the content in the mix.tr file comes from? What is the meaning of qFb in the figure below? Thank you.
[Image]

Can I turn off buffer sharing among ports on a switch?

Hi Yibo,
From your replies in the issue "Some questions about how modules work in simulator", I learned that all the ports on a switch share the same queue buffer: "A node can have multiple qbb-net-device (especially on a switch), which share the same m_broadcom and m_queue."
I would like to know, can I turn off the buffer sharing and let each port have its own fixed-size buffer? In this way I can control the buffer resource allocated to each switch port.

Thanks~

What does the hops in qbb-net-device mean?

Hi Yibo,

I am reading the code of qbb-net-device. I see many per-flow variables also have hop indexes (e.g., m_alpha[fCnt][maxHop], m_targetRate[fCnt][maxHop]). What does the hop mean here? I do not see this concept in the DCQCN paper.

Thanks,
Yuliang

How to run with visual studio 2015?

Hello, I'm very impressed with the work you've done. However, I still have some questions about how to use this project in Visual Studio 2015. Could you write a tutorial to teach us how to use it?

Output interpretation

Hello,

I was wondering what a single entry in mix.tr signifies.

2.000002 /1 1.2>1.1 u 25671 0 6

2.000002 is the timestamp
1.2 is the source
1.1 is the destination
u for udp
25671: I am not sure what it stands for.
0 is the packet number
6 is the priority

Please correct me if I am wrong. Also, if possible, could anyone please let me know what 25671 signifies?
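For anyone who wants to post-process the trace while the column meanings are being clarified, here is a small parsing sketch. The field names follow the guesses in this issue and are not confirmed by the authors; `TraceEntry` and `ParseTraceLine` are hypothetical names, and the fifth column is deliberately left unnamed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Field names follow the guesses in this issue; they are not confirmed by the authors.
struct TraceEntry {
    bool ok = false;        // true if the line matched the expected shape
    double time = 0.0;      // first column
    std::string node;       // second column, e.g. "/1"; meaning unconfirmed
    std::string src;        // left of '>' in the third column
    std::string dst;        // right of '>' in the third column
    char proto = '?';       // e.g. 'u'
    long unknown = 0;       // fifth column; meaning not established here
    long packetNumber = 0;  // sixth column (guess)
    int priority = 0;       // seventh column (guess)
};

// Parses one whitespace-separated trace line such as
// "2.000002 /1 1.2>1.1 u 25671 0 6".
TraceEntry ParseTraceLine(const std::string& line) {
    TraceEntry e;
    std::istringstream in(line);
    std::string flow;
    if (!(in >> e.time >> e.node >> flow >> e.proto >> e.unknown >>
          e.packetNumber >> e.priority)) {
        return e;  // e.ok stays false
    }
    const std::string::size_type gt = flow.find('>');
    if (gt == std::string::npos) return e;
    e.src = flow.substr(0, gt);
    e.dst = flow.substr(gt + 1);
    e.ok = true;
    return e;
}
```

Once the authors confirm the columns, only the struct field names need to change.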

Hi, I ran into a problem when building the solution with VS2012

Hi,
I opened the file ns-3-dev.sln with VS2012. When I build the solution I get this error, and it occurs many times:

c:\users\ns3-rdma-master\windows\ns-3-dev\headers\ns3\nstime.h(145): error C3861: "lround": identifier not found
Maybe you can suggest a solution. That would be great!
Thanks

a small bug in code in ReceiverCheckSeq function

Hi, Yibo,
I think I found a small bug in the ReceiverCheckSeq function in qbb-net-device.cc: it does nothing when seq<expected, that is, when the NIC receives a duplicate data packet. Consider the case where ack(n=4000) is lost. The sender does not receive the ACK, so it waits for a period of time and then begins to retransmit. Unfortunately, the receiver does nothing when it receives the duplicate data packets, so the sender never receives ack(n=4000). This is what I hit when I set the loss rate to a deterministic 0.01 (drop 1 out of every 100 packets passing the switch) and ack(n=16000) was lost, which caused a livelock in the network. I think the algorithm should check the seq even when seq<expected, and when (seq+1)%m_chunk==0, the receiver should send a "duplicate" ACK back to the sender.
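The proposed behavior can be sketched as a pure decision function. This is not the actual ReceiverCheckSeq code: the names `RxAction` and `ClassifyData` are hypothetical, sequence numbers are assumed to count packets, and the NACK on a gap is an assumption about how the receiver reacts to out-of-order data.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the proposal in this issue; not the code in qbb-net-device.cc.
enum class RxAction { None, Ack, Nack };

// Decides what the receiver sends back for a data packet with sequence number seq.
// expected is the next in-order sequence number; chunk is the ACK interval (m_chunk).
// The caller is responsible for advancing expected when seq == expected.
RxAction ClassifyData(uint64_t seq, uint64_t expected, uint64_t chunk) {
    if (seq > expected) {
        return RxAction::Nack;  // assumed: a gap triggers a retransmission request
    }
    // In-order packets and duplicates (seq < expected) are treated alike:
    // acknowledge at every chunk boundary, so a lost ACK is eventually resent.
    return ((seq + 1) % chunk == 0) ? RxAction::Ack : RxAction::None;
}
```

With `chunk = 1000` and `expected = 4000`, a retransmitted packet 3999 yields an ACK again, which is the duplicate ACK that would break the livelock described above.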

a specific derivation process for R_AI

Hello, I would like to ask about the rate increment R_AI used during the additive increase phase of the DCQCN algorithm. Is there a specific derivation for R_AI, and what factors determine this rate increment? Is there a specific expression? It doesn't seem to be mentioned in the paper.
