
Comments (24)

urosgruber commented on July 29, 2024

In case it helps, here is the tcpdump output:

01:55:56.845104 IP (tos 0x0, ttl 64, id 54246, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->a8d3)!)
    localhost.24595 > localhost.6379: Flags [S], cksum 0x894f (correct), seq 1246980252, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1139938987 ecr 0], length 0
01:55:56.845216 IP (tos 0x0, ttl 64, id 42741, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->d5c4)!)
    localhost.6379 > localhost.24595: Flags [S.], cksum 0xaee1 (correct), seq 24767610, ack 1246980253, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 2011919485 ecr 1139938987], length 0
01:55:56.845227 IP (tos 0x0, ttl 64, id 24903, offset 0, flags [none], proto TCP (6), length 40, bad cksum 0 (->1b87)!)
    localhost.24595 > localhost.6379: Flags [R], cksum 0x85ef (correct), seq 1246980253, win 0, length 0
01:57:01.350634 IP (tos 0x0, ttl 64, id 39193, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->e3a0)!)
    localhost.34341 > localhost.6379: Flags [S], cksum 0xfe1c (correct), seq 1362524936, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1384207341 ecr 0], length 0
01:57:01.350764 IP (tos 0x0, ttl 64, id 12705, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->4b19)!)
    localhost.6379 > localhost.34341: Flags [S.], cksum 0x4a5a (correct), seq 2189045235, ack 1362524937, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1923387551 ecr 1384207341], length 0
01:57:01.350781 IP (tos 0x0, ttl 64, id 22800, offset 0, flags [none], proto TCP (6), length 52, bad cksum 0 (->23b2)!)
    localhost.34341 > localhost.6379: Flags [.], cksum 0xae4e (correct), seq 1, ack 1, win 1275, options [nop,nop,TS val 1384207341 ecr 1923387551], length 0
01:57:01.350868 IP (tos 0x0, ttl 64, id 64981, offset 0, flags [none], proto TCP (6), length 67, bad cksum 0 (->7edd)!)
    localhost.34341 > localhost.6379: Flags [P.], cksum 0xf8c5 (correct), seq 1:16, ack 1, win 1275, options [nop,nop,TS val 1384207341 ecr 1923387551], length 15
01:57:01.351023 IP (tos 0x0, ttl 64, id 42171, offset 0, flags [none], proto TCP (6), length 57, bad cksum 0 (->d801)!)
    localhost.6379 > localhost.34341: Flags [P.], cksum 0x2dd6 (correct), seq 1:6, ack 16, win 1275, options [nop,nop,TS val 1923387551 ecr 1384207341], length 5

The first connection is the one that times out; the second one is fine. As you can see, there is an R (reset) flag, and I don't know why it is actually sent.

agentzh commented on July 29, 2024

@urosgruber Several notes:

  1. It's very possible that the connect() requests from your Nginx exceed the hard-coded backlog limit in the Redis server (which is a very small value, 511). Because it is your kernel that drops the (SYN) packets, you won't see the dropped connections in your Redis debug logs (which live in user land). You can try increasing the backlog limit in your Redis server or just scaling out your Redis server.
  2. When the SYN packet is dropped, retransmitting it can easily exceed the timeout setting in your Lua code (that is, 1 second).
  3. The "attempt to send data on a closed socket" error message means that the socket was already closed due to an earlier error. You should see the actual error message before these ones. Basically, you should stop using the current socket as soon as you see an error on it (see the sketch after the links below).
  4. If you already handle the socket errors yourself in your Lua code, it's recommended to turn off the lua_socket_log_errors directive: https://github.com/chaoslawful/lua-nginx-module#lua_socket_log_errors
  5. Because you are on FreeBSD, you cannot directly use the diagnostic tools in my Nginx Systemtap Toolkit and stap++. You may consider porting some of the relevant tools in these two projects over to dtrace on FreeBSD:

https://github.com/agentzh/nginx-systemtap-toolkit#readme

https://github.com/agentzh/stapxx#samples
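
A minimal sketch of the error handling described in points 3 and 4 above (host, port, key, and the 1 s timeout are just placeholders); with lua_socket_log_errors turned off, these explicit ngx.log() calls are the only place the socket errors will show up:

local redis = require "resty.redis"
local red = redis:new()

red:set_timeout(1000)  -- 1 s; raise this if SYN retransmits are in play

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    -- Stop right here: do NOT keep calling methods on this cosocket,
    -- otherwise you get "attempt to send data on a closed socket".
    ngx.log(ngx.ERR, "failed to connect to redis: ", err)
    return ngx.exit(500)
end

local res, err = red:get("some_key")
if not res then
    ngx.log(ngx.ERR, "failed to run GET: ", err)
    return ngx.exit(500)
end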

agentzh commented on July 29, 2024

And I should add that, when connect() times out, nginx closes the connection right away, so subsequent (late) response packets from the Redis server side will be answered with RST.

urosgruber commented on July 29, 2024
  1. The Redis server is actually doing nothing at all. This is a test server where the only client is me. Could the backlog limit still be the problem?
  2. I'll try with a 10 s timeout.
  3. I've turned this setting off.

Btw, is it better to use set_keepalive for the Redis client, or to disable keepalive?

agentzh commented on July 29, 2024

Hello!

On Sun, Nov 10, 2013 at 5:36 PM, Uroš Gruber wrote:

The Redis server is actually doing nothing at all. This is a test server
where the only client is me. Could the backlog limit still be the problem?

Yes, of course. This has actually been a common problem :) The Redis
server is a single-threaded process while Nginx can have multiple
worker processes, which means Nginx usually has way more CPU
resources with which to exhaust the Redis server.

Btw, is it better to use set_keepalive for the Redis client, or to disable keepalive?

Enabling the Redis connection pool on the client side (i.e., in Nginx)
can save the overhead of TCP handshakes during connect() and close(),
which can definitely help. But it's still possible (and easy) to
exhaust the Redis server with Redis queries BTW.
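
For example, a minimal sketch of enabling the client-side pool with lua-resty-redis (the 10 s max idle time and pool size of 100 below are just illustrative values):

local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(1000)

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "connect failed: ", err)
    return
end

-- ... issue Redis commands here ...

-- Put the connection back into the per-worker pool instead of closing it,
-- so later requests can skip the TCP handshake:
local ok, err = red:set_keepalive(10000, 100)  -- 10 s max idle time, up to 100 connections
if not ok then
    ngx.log(ngx.ERR, "failed to set keepalive: ", err)
end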

Regards,
-agentzh

urosgruber commented on July 29, 2024

With the timeout set to 10 s things are looking better. I also checked the backlog limit on this server; it is probably a bit low: kern.ipc.somaxconn: 128

and checking netstat -Lan

gives me this

tcp4  0/0/128        192.168.168.150.80
tcp4  0/0/128        172.20.0.1.5000
tcp4  0/0/128        172.20.0.3.5000
tcp4  0/0/128        127.0.0.1.6379

Port 80 is the frontend nginx proxy, and the backend servers are on port 5000.

urosgruber commented on July 29, 2024

Hm, and the issue is back. The server was idle: there were no open connections and no pf states. When I refreshed the web page I received a 500 error; I checked the log and it was the timeout problem again.

urosgruber commented on July 29, 2024

I also tried the Unix domain socket configuration, connecting through /tmp/redis.sock, and it works without any issues.

Btw, increasing kern.ipc.somaxconn to 1024 does change how large a backlog Redis gets (it's now 512), but it still does not help. Random timeouts on an idle server still happen.

agentzh commented on July 29, 2024

Another possibility for timeout issues is that your Nginx worker process's event loop is heavily blocked by something. You can try porting the epoll-loop-blocking-distr tool over to kqueue and FreeBSD:

https://github.com/agentzh/stapxx#epoll-loop-blocking-distr

and the off-CPU flame graph tool over to dtrace on FreeBSD too:

https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt-off-cpu

And it's wise to monitor the latency in the accept() and recv() queues in your kernel, for example, see the tools for Linux:

https://github.com/agentzh/nginx-systemtap-toolkit#tcp-accept-queue

https://github.com/agentzh/nginx-systemtap-toolkit#tcp-recv-queue

You can also write your own custom dtrace scripts to trace, on the fly, the various user-land and kernel-space events and socket states associated with the connections that time out.

Also, please distinguish between timeout errors in the different operations, like connect(), send(), and receive(). They mean different things :)
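
A minimal sketch of telling these phases apart in Lua (the timeout value, host, port, and key are placeholders); the call that returns the error tells you which operation timed out:

local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(1000)  -- applies to the following connect, send, and read operations

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    -- a "timeout" here means the TCP handshake itself did not complete in time
    ngx.log(ngx.ERR, "connect() failed: ", err)
    return
end

local res, err = red:get("some_key")
if not res then
    -- a "timeout" here means the request was sent but the reply did not
    -- arrive in time, which is a different problem
    ngx.log(ngx.ERR, "GET failed: ", err)
    return
end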

urosgruber commented on July 29, 2024

For now I'll try to stick with /tmp/redis.sock because I don't have enough knowledge of DTrace. Btw, you have a lot of nice tools there :)

I also need to try the same setup with some other servers.

xidianwlc commented on July 29, 2024

Hello, agentzh:
I've recently been testing the Redis connection pool. After simply calling local ok, err = red:set_keepalive(10000, 100) and running an ab stress test, I found that times, err = red:get_reused_times() always returns times=0, and the log keeps reporting: lua tcp socket connect timed out, client: 192.168.6.7. Checking the number of connections from the test server to the Redis port, I saw a very large number of them, so my initial suspicion is that something is wrong with the Lua Redis connection pool. My test server runs CentOS 6.5, and nginx is the latest version of openresty.

agentzh commented on July 29, 2024

@xidianwlc Ah, please no Chinese here. This place is English only. If you really want to use Chinese, please join the openresty (Chinese) mailing list instead. Please see https://openresty.org/#Community

Regarding your question, it's already an FAQ. See my (Chinese) replies (to similar questions from others) on the aforementioned mailing list:

https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ

https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ

xidianwlc commented on July 29, 2024

Hi, agentzh:
I tried using "local ok, err = red:set_keepalive(10000, 100)" in my Lua script and "times, err = red:get_reused_times()" to check whether the connection pool was being used. Unexpectedly, times was always 0, and a sample error message is "lua tcp socket connect timed out, client: 192.168.6.7". My nginx version is ngx_openresty-1.7.10.1.
My test tool is ab and the command is ab -r -k -n 20000 -c 10000 "http://192.168.6.7:8088/redis/mget?key=bk_1006725"

agentzh commented on July 29, 2024

@xidianwlc When you see "connect timed out", it usually means

  1. your redis server is too busy to accept new connections and the TCP stack on the redis server host is dropping SYN packets because the accept() queue is full, or
  2. your nginx server is too busy to respond to new I/O events (for example, because it is doing CPU-intensive computations or blocking on system calls).

You can consider

  1. scaling your redis backend out to multiple nodes so as to utilize more CPU resources,
  2. increasing the backlog setting in your redis.conf file (it determines the length limit of the accept() queue on your redis server, and the default is very small),
  3. checking whether your nginx server is too busy doing CPU-intensive work or blocking system calls (like disk I/O syscalls) by using the flame graph tools: https://openresty.org/#Profiling
  4. increasing the timeout limit in your Lua code (maybe you specified too short a timeout value on the client side?), and
  5. automatically retrying connect() in your Lua code one more time when the first connect() call fails (see the sketch below).
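
A minimal sketch of such a retry-once pattern (host, port, the timeout, and the try_connect helper name are all made up for illustration):

local redis = require "resty.redis"

-- try_connect is just an illustrative helper, not part of lua-resty-redis
local function try_connect(host, port, timeout_ms, retries)
    local red = redis:new()
    red:set_timeout(timeout_ms)

    local ok, err
    for attempt = 1, retries + 1 do
        ok, err = red:connect(host, port)
        if ok then
            return red
        end
        ngx.log(ngx.WARN, "connect attempt ", attempt, " failed: ", err)
    end
    return nil, err
end

local red, err = try_connect("127.0.0.1", 6379, 1000, 1)  -- retry once
if not red then
    ngx.log(ngx.ERR, "redis is unreachable: ", err)
    return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end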

agentzh commented on July 29, 2024

@xidianwlc Regarding the get_reused_times always returning 0 problem, have you checked the return values of set_keepalive? You should always do proper error handling in Lua there (and elsewhere).

My hunch is that you always fail to put your connections back to the connection pool in the first place so that your pool is always empty.

xidianwlc commented on July 29, 2024

@agentzh
local ok, err = red:set_keepalive(10000, 800)
if not ok then
    FORMAT_RETURN.msg = "failed to set keepalive: " .. err
    local value = cjson.encode(FORMAT_RETURN)
    ngx.say(value)
    ngx.log(ngx.ERR, value)
    return
end

there was no nginx error log entry for this

xidianwlc commented on July 29, 2024

@agentzh I found some posts at https://groups.google.com/forum/#!topic/openresty/h3l6jAo3aD0
In them you said that if the number of connections is bigger than the connection pool size, short connections will be used.

agentzh commented on July 29, 2024

@xidianwlc Isn't it already stated in the mailing list posts I mentioned earlier?

https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ

https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ

Alas, you didn't read my comments carefully.

agentzh commented on July 29, 2024

@xidianwlc It's also officially documented:

https://github.com/openresty/lua-nginx-module#tcpsocksetkeepalive

"When the connection pool exceeds the available size limit, the least recently used (idle) connection already in the pool will be closed to make room for the current connection."

RobinTsai commented on July 29, 2024

I also ran into the error "lua tcp socket connect timed out, when connecting to xxx:xxx" (xxx:xxx is the Redis master server). Our group had been working on it for several days. I checked a lot of issues but found no resolution. In the end we used the connection pool to reduce the frequency of this error, but that did not fix it.

It's in a production environment, and I'm sure the configuration is not mistaken. Across 3 different production environments, one produces this error message frequently while the others rarely do. The problematic one has only 2 cores (another has 8 cores and uses 4 of them for nginx), but the rest of the system configuration is exactly the same, and so is the Redis configuration. Some more information:

  1. We set a 100 ms connect timeout when connecting to the Redis master. When we raise it to 5 s, the errors become less frequent but do not disappear. (The other environments also use a 100 ms connect timeout, but they do not see nearly as many errors.)
  2. I captured the TCP traffic (only with the 100 ms connect timeout setting), and all of the recorded connections look normal: three-way handshake, SELECT db, get data, four-way teardown. (Maybe the timeout is so small that there are no retransmissions. So I even suspected that the CPU simply did not get around to processing the event within the configured timeout; a small measurement sketch follows below.)
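
A minimal sketch of one way to test that suspicion, by measuring how long connect() actually takes from the worker's point of view (host, port, and the 100 ms timeout are placeholders):

local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(100)  -- the suspect 100 ms connect timeout

ngx.update_time()
local t0 = ngx.now()

local ok, err = red:connect("127.0.0.1", 6379)

ngx.update_time()
local elapsed_ms = (ngx.now() - t0) * 1000

if not ok then
    -- If elapsed_ms is much larger than 100 ms, the worker's event loop was
    -- likely blocked and the timeout was processed late; if it is close to
    -- 100 ms, the handshake genuinely did not complete in time.
    ngx.log(ngx.ERR, "connect failed after ", elapsed_ms, " ms: ", err)
end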

zhuizhuhaomeng commented on July 29, 2024

Did the timeout issue occur while you were capturing the TCP traffic?

RobinTsai commented on July 29, 2024

Nope, there was no clue about the connect timeout issue in the capture.

There were no SYN retransmissions, and no packets that looked abnormal around the SYN.

zhuizhuhaomeng commented on July 29, 2024

What did you mean by "abnormal at SYN"?

RobinTsai commented on July 29, 2024

If there were a connect timeout, I thought, there might:

  1. be SYN retransmission packets. >>> But there were not.
  2. be an RST in the TCP stream. >>> But there was not.
  3. or be an incomplete 3-way handshake in the TCP stream. >>> But there was not.

The TCP streams I saw all looked normal, as expected.
