remotedialer's Introduction

Reverse Tunneling Dialer

remotedialer creates a two-way connection between a server and a client, so that a net.Dial can be performed on the server and actually connects to the client running services accessible on localhost or behind a NAT or firewall.

Architecture

Abstractions

remotedialer consists of structs that organize and abstract a TCP connection between a server and a client. Both client and server use Sessions, which provide a means to make a network connection to a remote endpoint and keep track of incoming connections. A server uses the Server object, which contains a sessionManager that governs one or more Sessions, while a client creates a Session directly. A connection implements the io.Reader and io.WriteCloser interfaces, so it can be read from and written to directly. The connection's internal readBuffer monitors the size of the data it is carrying and uses backPressure to pause incoming data transfer until the amount of data is below a threshold.

Data flow

A client establishes a session with a server using the server's URL. The server upgrades the connection to a websocket connection which it uses to create a Session. The client then also creates a Session with the websocket connection.

The client sits in front of some kind of HTTP server, most often a Kubernetes API server, and acts as a reverse proxy for that HTTP resource. When a user requests something from this remote resource, the request first goes to the remotedialer server. The application containing the server is responsible for routing the request to the right client connection.

The request is sent through the websocket connection to the client and read into the client's connection buffer. A pipe is created between the client and the HTTP service, which continually copies data between the two sockets. The request is forwarded through the pipe to the remote service, draining the buffer. The service's response is copied back to the client's buffer, then forwarded back to the server and copied into the server connection's own buffer. As the user reads the response, the server's buffer is drained.

The pause/resume mechanism checks the size of the buffer for both the client and server. If it is greater than the threshold, a PAUSE message is sent back to the remote connection, as a suggestion not to send any more data. As the buffer is drained, either by the pipe to the remote HTTP service or the user's connection, the size is checked again. When it is lower than the threshold, a RESUME message is sent, and the data transfer may continue.
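The threshold check described above can be sketched in a few lines of Go. This is a minimal illustration only, not remotedialer's actual readBuffer: the flowBuffer type, its fields, the signal callback and the threshold of 8 bytes are all assumptions made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// flowBuffer is a hypothetical, simplified stand-in for a connection's
// read buffer. It is not the type used by remotedialer.
type flowBuffer struct {
	mu        sync.Mutex
	data      []byte
	threshold int              // size above which the sender is asked to pause
	paused    bool             // true after PAUSE was signalled and before RESUME
	signal    func(msg string) // called with "PAUSE" or "RESUME"
}

// Offer appends incoming bytes and signals PAUSE once the buffered
// amount exceeds the threshold. PAUSE is only a suggestion, so data
// that is already in flight is still accepted.
func (b *flowBuffer) Offer(p []byte) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.data = append(b.data, p...)
	if !b.paused && len(b.data) > b.threshold {
		b.paused = true
		b.signal("PAUSE")
	}
}

// Read drains up to len(p) bytes and signals RESUME once the buffered
// amount has dropped below the threshold again.
func (b *flowBuffer) Read(p []byte) int {
	b.mu.Lock()
	defer b.mu.Unlock()
	n := copy(p, b.data)
	b.data = b.data[n:]
	if b.paused && len(b.data) < b.threshold {
		b.paused = false
		b.signal("RESUME")
	}
	return n
}

func main() {
	var sent []string
	buf := &flowBuffer{threshold: 8, signal: func(m string) { sent = append(sent, m) }}

	buf.Offer(make([]byte, 6)) // 6 bytes buffered, below the threshold
	buf.Offer(make([]byte, 6)) // 12 bytes buffered, above the threshold
	buf.Read(make([]byte, 10)) // 2 bytes left, below the threshold

	fmt.Println(sent) // prints [PAUSE RESUME]
}
```

The paused flag keeps the sketch from sending a PAUSE for every chunk that arrives while the buffer is still over the threshold.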

remotedialer in the Rancher ecosystem

remotedialer is used to connect Rancher to the downstream clusters it manages, enabling a user agent to access the cluster through an endpoint on the Rancher server. remotedialer is used in three main ways:

Agent config and tunnel server

When the agent starts, it initially makes a client connection to the endpoint /v3/connect/register, which runs an authorizer that sets some initial data about the node. The agent continues to connect to the endpoint /v3/connect on a loop. On each connection, it runs an OnConnect handler which pulls down node configuration data from /v3/connect/config.

Steve Aggregation

The steve aggregation server on the agent establishes a remotedialer Session with Rancher, making the steve API on the downstream cluster accessible from the Rancher server and facilitating resource watches.

Health Check

The clusterconnected controller in Rancher uses the established tunnel to check that clusters are still responsive and sets alert conditions on the cluster object if they are not.

HA operation (peering)

remotedialer supports a mode where multiple servers can be configured as peers. In that mode all servers maintain a mapping of all remotedialer client connections to all other servers, and can route incoming requests appropriately.

Therefore, HTTP requests referring to any of the remotedialer clients can be resolved by any of the peer servers. This is useful for high availability, and Rancher leverages that functionality to distribute downstream clusters (running agents acting as remotedialer clients) among replica pods (acting as remotedialer server peers). If one Rancher replica pod breaks down, Rancher will reassign its downstream clusters to the others.

Peers authenticate to one another via a shared token.

Running Locally

remotedialer provides an example client and server which can be run in standalone mode FOR TESTING ONLY. These are found in the server/ and client/ directories.

Compile

Compile the server and client:

make server
make client

Run

Start the server first.

./server/server

The server has debug mode off by default. Enable it with --debug.

The client proxies requests from the remotedialer server to a web server, so it needs to be run somewhere where it can access the web server you want to target. The remotedialer server also needs to be reachable by the client.

For testing purposes, a basic HTTP file server is provided. Build the server with:

make dummy

Create a directory with files to serve, then run the web server from that directory:

mkdir www
cd www
echo 'hello' > bigfile
/path/to/dummy

Run the client with

./client/client

Both server and client can be run with even more verbose logging:

CATTLE_TUNNEL_DATA_DEBUG=true ./server/server --debug
CATTLE_TUNNEL_DATA_DEBUG=true ./client/client

Usage

If the remotedialer server is running on 192.168.0.42, and the web service that the client can access is running at address 127.0.0.1:8125, make proxied requests like this:

curl http://192.168.0.42:8123/client/foo/http/127.0.0.1:8125/bigfile

where foo is the hardcoded client ID for this test server.

This test server only supports GET requests.

HA Usage

To test remotedialer in HA mode, first start the dummy server from an appropriate directory, e.g.:

cd /tmp
mkdir www
cd www
echo 'hello' > bigfile
/path/to/dummy -listen :8125

Then start two peer remotedialer servers with the -peers id:token:url flag:

./server/server -debug -id first -token aaa -listen :8123 -peers second:aaa:ws://localhost:8124/connect &
./server/server -debug -id second -token aaa -listen :8124 -peers first:aaa:ws://localhost:8123/connect

Then connect a client to the first server, e.g.:

./client/client -id foo -connect ws://localhost:8123/connect

Finally, use the second server to make a request to the client via the first server:

curl http://localhost:8124/client/foo/http/127.0.0.1:8125/

remotedialer's People

Contributors

aledbf, alexellis, aruiz14, brandond, cbron, chrisbulgaria, cmurphy, daxmc99, dramich, erikwilson, ibuildthecloud, joncrowther, liut, luthermonson, moio, nflynt, paraglade, renovate-rancher[bot], rmweir, snasovich, tomleb

remotedialer's Issues

Potential goroutine leak in readbuffer

If a deadline is set, r.cond.Broadcast() is called without obtaining a lock. That opens up the possibility of a race where the broadcast is called first and r.cond.Wait() misses it, potentially leading to a blocked goroutine.

t = time.AfterFunc(r.deadline.Sub(now), func() { r.cond.Broadcast() })

This is a follow-up from the following PR review discussion: #68 (comment). So far, it has not been observed in Rancher user installations.

Potential lock-up?

Hi @cmurphy @cbron thanks for your work on remotedialer updating the README and documenting the new flow control mechanism.

We've been using it to help people connect private services to a central Kubernetes cluster and most TCP/HTTP traffic works exactly as expected.

Recently we were told that tcpdump was showing TCP congestion followed by a lock up (non technical term). Vendoring in Darren's pause/resume flow control work apparently made this better. But when running a script to pull in ~ 100MB of data - it only works the first time, then it locks up for the second time we run the script. When running the same test over a remote forwarded SSH tunnel, the script that pulls this data runs multiple times without locking up. The underlying protocol is a form of LDAP.

If we wait 5 minutes, and see the proxied connection close, then the script to pull data can be run again.

Oddly, running iperf3 with 1-2GB of data completes sequentially, over and over without issue with the flow control patch. I've also seen the Rancher and Harvester teams mention uploading 2-5GB ISO images through remotedialer. So this seems to perhaps be related to a slow reader / fast writer?

I've tried enabling metrics and also debug info - but suspect that maybe there's a deadlock where a pause was sent and everything is on hold? The metrics do not show a gauge for pause/resume - if that was added, do you think it would help show if remotedialer had got into a deadlock?

We see dozens if not hundreds of messages like this:

time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20440112/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="WRITE 7206386832554118841 RESUME       [2]"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20472880/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20505648/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20538416/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20571184/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20603952/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20636720/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20669488/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20702256/20724079 32768 <nil>"
time="2023/04/27 08:16:55" level=debug msg="READ    [2] 20724079/20724079 21823 <nil>"
time="2023/04/27 08:16:55" level=debug msg="resetting remotedialer buffer id=2 to zero, old cap 12582912"
time="2023/04/27 08:16:55" level=debug msg="REQUEST 3665852071688446406 DATA         [2]: buffered"
time="2023/04/27 08:16:55" level=debug msg="ONDATA  [2] 20724079/20725256"
time="2023/04/27 08:16:55" level=debug msg="REQUEST 3665852071688446407 DATA         [2]: buffered"
time="2023/04/27 08:16:55" level=debug msg="ONDATA  [2] 20725256/20758024"
time="2023/04/27 08:16:55" level=debug msg="REQUEST 3665852071688446408 DATA         [2]: buffered"
time="2023/04/27 08:16:55" level=debug msg="ONDATA  [2] 20725256/20790792"

When viewing logs on the server, there are also a large number of messages saying that the buffer was exceeded.

time="2023/04/27 08:16:53" level=debug msg="remotedialer buffer exceeded id=2, length: 4203768"
time="2023/04/27 08:16:53" level=debug msg="ONDATA  [2] 12772400/16976168"
time="2023/04/27 08:16:53" level=debug msg="REQUEST 3665852071688443237 DATA         [2]: buffered"
time="2023/04/27 08:16:53" level=debug msg="remotedialer buffer exceeded id=2, length: 4204963"
time="2023/04/27 08:16:53" level=debug msg="ONDATA  [2] 12772400/16977363"
time="2023/04/27 08:16:53" level=debug msg="REQUEST 3665852071688443238 DATA         [2]: buffered"
time="2023/04/27 08:16:53" level=debug msg="remotedialer buffer exceeded id=2, length: 4206126"
time="2023/04/27 08:16:53" level=debug msg="ONDATA  [2] 12772400/16978526"
time="2023/04/27 08:16:53" level=debug msg="REQUEST 3665852071688443239 DATA         [2]: buffered"
time="2023/04/27 08:16:53" level=debug msg="remotedialer buffer exceeded id=2, length: 4207281"
time="2023/04/27 08:16:53" level=debug msg="ONDATA  [2] 12772400/16979681"
time="2023/04/27 08:16:53" level=debug msg="REQUEST 3665852071688443240 DATA         [2]: buffered"

This is what we can see in the metrics:

# TYPE session_server_total_add_websocket_session counter
session_server_total_add_websocket_session{clientkey="73372f0260954018b17a522d2701d440",peer="false"} 1
# HELP session_server_total_receive_bytes Total bytes received
# TYPE session_server_total_receive_bytes counter
session_server_total_receive_bytes{clientkey="73372f0260954018b17a522d2701d440"} 1.17589617e+08
# HELP session_server_total_transmit_bytes Total bytes transmitted
# TYPE session_server_total_transmit_bytes counter
session_server_total_transmit_bytes{clientkey="73372f0260954018b17a522d2701d440"} 316234

This looks like ~117.6 MB received and ~316 KB sent.

  1. Server opens two local ports
  2. Client connects to server and remotely forwards two ports to upstreams in its private network
  3. Script queries an LDAP daemon running on the server which pulls in ~ 100MB from the two ports
  4. 2nd running of script fails - just hangs

The above works multiple times with remote SSH tunnelling.

Any suggestions would be appreciated

it's better to support load-balancing in the case of HA

First of all, thanks for the package, which helped me understand Rancher's tunnel. As the introduction says

This framework can hurt your head trying to conceptualize.

It's kind of confusing at first... Also, I have to say it's not easy to read a project that lacks the necessary comments.

After reading through the code, I'm thinking about load balancing to support multiple servers with multiple clients. In the real world, that might be an HA Rancher with multiple cluster agents. We could improve the throughput of the tunnel.

motivation

I deployed Rancher to test the throughput. I found that the k8s throughput is fine, but Rancher's is worse. In my environment, the TPS of requests from Rancher to the backend cluster is about 4000, while k8s can support a higher TPS.

Control plane throughput is not that important, because we usually have a rate limiter, and the heaviest part is the data plane.

But is it better to support load-balancing in remotedialer?

proposal

The session manager stores the sessions of a client in an array. When generating a dialer, it uses sessions[0].

If multiple clients representing the same cluster have registered with the server, we can use round robin or another load-balancing algorithm.

If I haven't misunderstood the code, I'm willing to contribute.
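A round-robin choice over a client's sessions could look roughly like the sketch below. It only illustrates the proposal: the session and sessionPool types and the pick method are hypothetical and do not exist in remotedialer.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// session is a hypothetical placeholder for one registered client session.
type session struct{ name string }

// sessionPool holds all sessions registered under the same client key.
type sessionPool struct {
	next     uint64 // number of picks made so far, updated atomically
	sessions []*session
}

// pick returns the sessions in rotation instead of always returning
// sessions[0]. It returns nil when no session is registered.
func (p *sessionPool) pick() *session {
	if len(p.sessions) == 0 {
		return nil
	}
	n := atomic.AddUint64(&p.next, 1) - 1
	return p.sessions[n%uint64(len(p.sessions))]
}

func main() {
	pool := &sessionPool{sessions: []*session{{name: "agent-a"}, {name: "agent-b"}}}
	for i := 0; i < 3; i++ {
		fmt.Println(pool.pick().name) // agent-a, agent-b, agent-a
	}
}
```

The sketch assumes the session slice does not change while pick is running; a real session manager would also need to guard against sessions being added or removed concurrently.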

Potential refactoring: proper use of context in readbuffer

The readbuffer.go code seems prone to leaks, race conditions and similar bugs because of its use of Cond (which is generally discouraged) and the lack of "typical" use of contexts.

Although I am not aware of any new specific bug, I believe a refactoring of this part could be beneficial for future readers given the crucial importance of this piece of architecture to the whole Rancher product family.

cc @aruiz14

Feature request: local tunnels

Hi Darren, thanks to your earlier PR, we're now using remotedialer in inlets and it seems to be performing well for most use-cases.

I have a potential client who has seen value in the remote port-forwarding that inlets gives - a local port in a private cluster becomes available on a public cluster or node. One of their requirements is for the opposite to be true, for a private endpoint in the remote public cluster to be made available in the private cluster / VM.

Today inlets is a little like ssh -R, where the SSH server is like "inlets server" and the ssh client issuing ssh -R is "inlets client".

So inlets client --upstream http://127.0.0.1:3000 --remote ws://remote-site makes port 3000 on my local machine available on "remote-site" on port 3000.

They want this and also the opposite, where a private service on 127.0.0.1:3000 on the remote-site ("inlets server") could be brought back to the private side ("inlets client").

I've played around with the code and have been able to inject some code in the client.go file to make it open a connection and send a string, but it's very hacky and a lot is missing.

Would you consider adding support for the client to maintain a session and for it to send data / Dial in the way that the server can today?

Alex

[rancher/rancher-agent] Unable to go through proxy

Hello,

I'm running Rancher behind a proxy, and it seems that I cannot tell the Rancher agent WebSocket connection to go through my proxy (nothing in my access log, proxy-side).

I suspect that this library is the cause.

  1. Can this library be proxied through an HTTP/S proxy?
  2. Is there an environment variable to provide to make it work?

Thanks for your answers!

remote dialer does not use proxy env

What kind of request is this (question/bug/enhancement/feature request):
Bug

Steps to reproduce (least amount of steps as possible):
Import an existing cluster into Rancher, with a proxy between that cluster and Rancher. Add the required HTTP_PROXY, HTTPS_PROXY and NO_PROXY envs to the agent yaml.

Expected: Rancher cluster/node can contact Rancher and import is successful

Result:
Rancher agent wss calls fail. Log:
time="2019-12-19T09:39:08Z" level=info msg="Connecting to wss://rancher.foobar.com with token $token"
time="2019-12-19T09:39:08Z" level=info msg="Connecting to proxy" url="wss://rancher.foobar.com/v3/connect/register"
time="2019-12-19T09:39:18Z" level=error msg="Failed to connect to proxy. Empty dialer response" error="dial tcp :443: i/o timeout"
time="2019-12-19T09:39:18Z" level=error msg="Remotedialer proxy error" error="dial tcp $IP_OF_RANCHER:443: i/o timeout"

Other details that may be helpful:
This is the same issue as #24681 and #22750 in the rancher repository, and also the same as #9 in the remotedialer repository. Sorry for opening a new one, but those are probably not clear enough.
The problem is in remotedialer/client.go line 26
dialer = &websocket.Dialer{HandshakeTimeout: HandshakeTimeOut}
should read instead
dialer = &websocket.Dialer{HandshakeTimeout: HandshakeTimeOut, Proxy: http.ProxyFromEnvironment}

Environment information

Rancher version (rancher/rancher or rancher/server image tag, or shown bottom left in the UI):
2.3.4 single install

Please consider the condition of error printing buffer exceeded

Refer: harvester/harvester#1956
harvester/harvester#1956 (comment)

The error message "remotedialer buffer exceeded, length:" is seen. By comparison, the length is in a reasonable range: it does not exceed 7 MB, so it is not sky high.

From our test, could you consider printing the error only when the size is over e.g. 8 MB or 16 MB?
When networking is extremely fast, the pause/resume mechanism may still let a bit more data be cached than expected.

time="2022-03-11T10:29:13Z" level=error msg="remotedialer buffer exceeded, length: 4227072"
time="2022-03-11T10:29:13Z" level=error msg="remotedialer buffer exceeded, length: 4259840"
time="2022-03-11T10:29:13Z" level=error msg="remotedialer buffer exceeded, length: 4292608"
time="2022-03-11T10:29:13Z" level=error msg="remotedialer buffer exceeded, length: 4325376"
..
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4259840"
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4292608"
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4325376"
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4358144"
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4390912"
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4423680"
..
time="2022-03-11T10:29:20Z" level=error msg="remotedialer buffer exceeded, length: 4685824"

Another point:
"remotedialer buffer exceeded" could be a warning or info; error seems too strict, since it is self-healing.

Thank you

Hello,

Found out about this amazing repo while digging through the code of inlets, where I was trying to find a way to let inexperienced users of this project expose their own instance on the internet without any knowledge of port forwarding, creation of SSL certificates, ...

That is all amazing, thank you!

Release for 0.2.6

Hi Darren, it looks like HEAD (commits in Apr) is different from the latest release of 0.2.5 (made in Jan).

Could you cut a release for 0.2.6, so that it can be referenced in upstream projects by tag?

Alex
