Comments (24)
If it helps, here is the tcpdump output:
01:55:56.845104 IP (tos 0x0, ttl 64, id 54246, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->a8d3)!)
localhost.24595 > localhost.6379: Flags [S], cksum 0x894f (correct), seq 1246980252, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1139938987 ecr 0], length 0
01:55:56.845216 IP (tos 0x0, ttl 64, id 42741, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->d5c4)!)
localhost.6379 > localhost.24595: Flags [S.], cksum 0xaee1 (correct), seq 24767610, ack 1246980253, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 2011919485 ecr 1139938987], length 0
01:55:56.845227 IP (tos 0x0, ttl 64, id 24903, offset 0, flags [none], proto TCP (6), length 40, bad cksum 0 (->1b87)!)
localhost.24595 > localhost.6379: Flags [R], cksum 0x85ef (correct), seq 1246980253, win 0, length 0
01:57:01.350634 IP (tos 0x0, ttl 64, id 39193, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->e3a0)!)
localhost.34341 > localhost.6379: Flags [S], cksum 0xfe1c (correct), seq 1362524936, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1384207341 ecr 0], length 0
01:57:01.350764 IP (tos 0x0, ttl 64, id 12705, offset 0, flags [none], proto TCP (6), length 60, bad cksum 0 (->4b19)!)
localhost.6379 > localhost.34341: Flags [S.], cksum 0x4a5a (correct), seq 2189045235, ack 1362524937, win 65535, options [mss 16344,nop,wscale 6,sackOK,TS val 1923387551 ecr 1384207341], length 0
01:57:01.350781 IP (tos 0x0, ttl 64, id 22800, offset 0, flags [none], proto TCP (6), length 52, bad cksum 0 (->23b2)!)
localhost.34341 > localhost.6379: Flags [.], cksum 0xae4e (correct), seq 1, ack 1, win 1275, options [nop,nop,TS val 1384207341 ecr 1923387551], length 0
01:57:01.350868 IP (tos 0x0, ttl 64, id 64981, offset 0, flags [none], proto TCP (6), length 67, bad cksum 0 (->7edd)!)
localhost.34341 > localhost.6379: Flags [P.], cksum 0xf8c5 (correct), seq 1:16, ack 1, win 1275, options [nop,nop,TS val 1384207341 ecr 1923387551], length 15
01:57:01.351023 IP (tos 0x0, ttl 64, id 42171, offset 0, flags [none], proto TCP (6), length 57, bad cksum 0 (->d801)!)
localhost.6379 > localhost.34341: Flags [P.], cksum 0x2dd6 (correct), seq 1:6, ack 16, win 1275, options [nop,nop,TS val 1923387551 ecr 1384207341], length 5
The first connection is the one that timed out; the second is OK. As you can see, the first one carries an R (reset) flag. I don't know why it is actually sent.
from lua-resty-redis.
@urosgruber Several notes:
- It's very possible that the connect() requests from your Nginx exceed the hard-coded backlog limit in the Redis server (which is a very small value, 511). Because it is your kernel dropping (SYN) packets, you won't see the dropped connections in your Redis's debug logs (which are on the user-land). You can try increasing the backlog limit in your Redis server or just scaling your Redis server.
- When the SYN packet is dropped, re-sending the packet can easily exceed your timeout setting in your Lua code (that is, 1 second).
- When you get the "attempt to send data on a closed socket" error message that means that the socket is closed due to an earlier error. You should see the actual error message before these error messages. Basically you should stop using the current socket if you see an error.
- If you already handle the socket errors yourself in your Lua code, it's recommended to turn off the lua_socket_log_errors directive: https://github.com/chaoslawful/lua-nginx-module#lua_socket_log_errors
- Because you are on the FreeBSD system, you cannot directly use the diagnosing tools in my Nginx Systemtap Toolkit and stap++. You may consider porting some of the relevant tools in these two projects over to FreeBSD's dtrace port:
https://github.com/agentzh/nginx-systemtap-toolkit#readme
https://github.com/agentzh/stapxx#samples
from lua-resty-redis.
And I should add that, when connect() times out, nginx will close the current connection right away, and subsequent (late) response packets from the Redis server side will yield RST.
from lua-resty-redis.
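The advice above (stop using a socket once it has reported an error, and look at the first error rather than the later "closed socket" ones) could be sketched roughly as follows. This is only an illustration: the helper name `fetch_key` and the stage labels "connect" and "command" are made up for this example, and `red` is assumed to be a lua-resty-redis client object created elsewhere. Only `set_timeout`, `connect`, `get`, and `close` are calls from the library itself.

```lua
-- Sketch only. `red` is assumed to be a lua-resty-redis client, e.g.
--   local red = require("resty.redis"):new()
-- `fetch_key` and the stage labels are hypothetical names for this example.
local function fetch_key(red, host, port, key, timeout_ms)
    -- One timeout (in milliseconds) for the operations that follow.
    red:set_timeout(timeout_ms)

    local ok, err = red:connect(host, port)
    if not ok then
        -- No connection was established; do not send commands on it.
        return nil, "connect", err
    end

    local res, cmd_err = red:get(key)
    if not res then
        -- nil or false: treat the connection as unusable and drop it
        -- instead of reusing it. The result of close() is ignored here.
        red:close()
        return nil, "command", cmd_err
    end

    return res
end
```

Returning the stage together with the error makes it possible to tell a connect timeout apart from a timeout while sending or reading a command, which point at different causes.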
- The Redis server is actually doing nothing at all. This is a test server where the only client is me. Would the backlog limit still be the problem?
- I'll try with 10s.
- I have set this directive to off.
Btw, is it better to use set_keepalive for the redis client or to disable keepalive?
from lua-resty-redis.
Hello!
On Sun, Nov 10, 2013 at 5:36 PM, Uroš Gruber wrote:
> Redis server is actually doing nothing at all. This is test server
> when only client is me. Would this still be the problem with backlog
> limit?
Yes, of course. This has actually been a common problem :) The Redis
server is a single-threaded process while Nginx can have multiple
worker processes, which means Nginx can usually have way more CPU
resources to exhaust the Redis server.
> Btw is it better to have set_keepalive for redis client or to disable keepalive?
Enabling the Redis connection pool on the client side (i.e., in Nginx)
can save the overhead of TCP handshakes during connect() and close(),
which can definitely help. But it's still possible (and easy) to
exhaust the Redis server with Redis queries BTW.
Regards,
-agentzh
from lua-resty-redis.
With the timeout set to 10s things are looking better. I also checked the backlog limit on this server, and it is probably a bit low: kern.ipc.somaxconn: 128
Checking netstat -Lan gives me this:
tcp4 0/0/128 192.168.168.150.80
tcp4 0/0/128 172.20.0.1.5000
tcp4 0/0/128 172.20.0.3.5000
tcp4 0/0/128 127.0.0.1.6379
Port 80 is the frontend nginx proxy, and the backend servers are on port 5000.
from lua-resty-redis.
Hm, and the issue is back. The server was idle. There were no open connections and no pf states. When I refreshed the web page I received a 500 error; I checked the log and it was the timeout problem again.
from lua-resty-redis.
I also tried the unix socket configuration, connecting through /tmp/redis.sock, and it works without any issues.
Btw, increasing kern.ipc.somaxconn to 1024 does change the backlog limit that is set for redis (it's now 512), but it still does not help. Random timeouts on the idle server still happen.
from lua-resty-redis.
Another possibility for timeout issues is that your Nginx worker process's event loop is heavily blocked by something. You can try porting the epoll-loop-blocking-distr tool over to kqueue and FreeBSD:
https://github.com/agentzh/stapxx#epoll-loop-blocking-distr
and the off-CPU flamegraph tool over to FreeBSD's dtrace too:
https://github.com/agentzh/nginx-systemtap-toolkit#sample-bt-off-cpu
And it's wise to monitor the latency in the accept() and recv() queues in your kernel, for example, see the tools for Linux:
https://github.com/agentzh/nginx-systemtap-toolkit#tcp-accept-queue
https://github.com/agentzh/nginx-systemtap-toolkit#tcp-recv-queue
Also you can write your own custom dtrace scripts to trace various user-land and kernel-space events and socket states associated with those timeout connections on-the-fly on your side.
Also please distinguish timeout errors in different operations like connect(), send(), and receive(). They mean different things :)
from lua-resty-redis.
For now I'll try to stick with /tmp/redis.sock because I don't have enough knowledge of DTrace. Btw, you have a lot of nice tools there :)
I also need to try the same setup with some other servers.
from lua-resty-redis.
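[Translated from Chinese] Hello, Chun Ge:
I have recently been testing the redis connection pool. I call local ok, err = red:set_keepalive(10000, 100) directly and then run a stress test with ab. I found that times, err = red:get_reused_times() always returns times=0, and the log keeps reporting the error: lua tcp socket connect timed out, client: 192.168.6.7. When I check the number of connections from the test server to the redis port, there are a great many, so my initial suspicion is that something is wrong with the Lua redis connection pool. My test server runs CentOS 6.5, and nginx is the latest version of openresty.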
from lua-resty-redis.
@xidianwlc Ah, please no Chinese here. This place is English only. If you really want to use Chinese, please join the openresty (Chinese) mailing list instead. Please see https://openresty.org/#Community
Regarding your question, it's already an FAQ. See my (Chinese) replies (to similar questions from others) on the aforementioned mailing list:
https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ
https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ
from lua-resty-redis.
hi, agentzh:
I tried to use "local ok, err = red:set_keepalive(10000, 100)" in my Lua script and "times, err = red:get_reused_times()" to check whether the connection pool was being used. Unexpectedly, times was always 0, and the error log shows messages like "lua tcp socket connect timed out, client: 192.168.6.7". My nginx version is ngx_openresty-1.7.10.1.
My test tool is ab and the command is ab -r -k -n 20000 -c 10000 "http://192.168.6.7:8088/redis/mget?key=bk_1006725"
from lua-resty-redis.
@xidianwlc When you see "connect timed out", it usually means
- your redis server is too busy to accept new connections and the redis server's TCP stack is dropping packets when the accept() queue is full, or
- your nginx server is too busy to respond to new I/O events (like doing CPU intensive computations or blocking on system calls)
You can consider
- scale your redis backend to multiple nodes so as to utilize more CPU resources,
- increase the backlog setting in your redis.conf file (the default is very small, which determines the length limit of your accept() queue on your redis server).
- check whether your nginx server is too busy doing CPU intensive work or blocking system calls (like disk I/O syscalls) by using the flame graph tools: https://openresty.org/#Profiling
- increase the timeout limit in your Lua code. Maybe you specify a too short timeout value on the client side?
- automatically retry connect() in your Lua code for one more time when your connect() call fails.
from lua-resty-redis.
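The last suggestion in the list above (retry connect() once more when it fails) might look something like the sketch below. The helper name `connect_with_retry` and its parameters are hypothetical; `connect_fn` stands for any function that returns `ok, err` in the usual Lua style, for example a closure around `red:connect(host, port)`.

```lua
-- Sketch only: `connect_with_retry` is a hypothetical helper, not part of
-- lua-resty-redis. `connect_fn` is expected to return ok, err, e.g.
--   function() return red:connect("127.0.0.1", 6379) end
local function connect_with_retry(connect_fn, max_attempts)
    local last_err
    for attempt = 1, max_attempts do
        local ok, err = connect_fn()
        if ok then
            -- Also report how many attempts were needed, for logging.
            return ok, nil, attempt
        end
        last_err = err
    end
    -- Every attempt failed: hand back the most recent error.
    return nil, last_err, max_attempts
end
```

With `max_attempts` set to 2 this gives the "one more time" behaviour described above. Keep in mind that each attempt can take up to the full connect timeout, so the worst-case latency grows with the number of attempts.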
@xidianwlc Regarding the get_reused_times always returning 0 problem, have you checked the return values of set_keepalive? You should always do proper error handling in Lua there (and elsewhere).
My hunch is that you always fail to put your connections back to the connection pool in the first place so that your pool is always empty.
from lua-resty-redis.
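As a rough illustration of the error handling asked for above, the sketch below checks every step between connect() and set_keepalive(), and reads get_reused_times() right after connecting, while the connection is still held by the request. The helper name `run_and_release` and the `work` callback are hypothetical; `red` is assumed to be a lua-resty-redis client created elsewhere.

```lua
-- Sketch only: `run_and_release` and `work` are hypothetical names.
-- `red` is assumed to be a lua-resty-redis client object.
local function run_and_release(red, host, port, work, idle_ms, pool_size)
    local ok, err = red:connect(host, port)
    if not ok then
        return nil, "connect failed: " .. tostring(err)
    end

    -- 0 means a freshly opened connection; > 0 means it came from the pool.
    local times, times_err = red:get_reused_times()
    if not times then
        return nil, "get_reused_times failed: " .. tostring(times_err)
    end

    local res, work_err = work(red)
    if res == nil then
        -- Do not put a connection that just failed back into the pool.
        red:close()
        return nil, "command failed: " .. tostring(work_err)
    end

    local kept, ka_err = red:set_keepalive(idle_ms, pool_size)
    if not kept then
        return nil, "set_keepalive failed: " .. tostring(ka_err)
    end

    return res, nil, times
end
```

If the third return value stays at 0 under load even though no step reports an error, that would suggest connections are being opened faster than the pool can hand idle ones back.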
@agentzh
local ok, err = red:set_keepalive(10000, 800)
if not ok then
    -- report the failure both to the client and to the nginx error log
    FORMAT_RETURN.msg = "failed to set keepalive: " .. err
    local value = cjson.encode(FORMAT_RETURN)
    ngx.say(value)
    ngx.log(ngx.ERR, value)
    return
end
There was no entry in the nginx error log from this.
from lua-resty-redis.
@agentzh I found a post at https://groups.google.com/forum/#!topic/openresty/h3l6jAo3aD0
In that post you said that if there are more connections than the connection pool can hold, the extra ones are used as short-lived connections.
from lua-resty-redis.
@xidianwlc Isn't it already stated in the mailing list posts I mentioned earlier?
https://groups.google.com/d/msg/openresty/e-r69KtAWek/wJ3cdzxluhUJ
https://groups.google.com/d/msg/openresty/h3l6jAo3aD0/UvQGlF77cUwJ
Alas, you didn't read my comments carefully.
from lua-resty-redis.
@xidianwlc It's also officially documented:
https://github.com/openresty/lua-nginx-module#tcpsocksetkeepalive
"When the connection pool exceeds the available size limit, the least recently used (idle) connection already in the pool will be closed to make room for the current connection."
from lua-resty-redis.
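The documented behaviour quoted above can be illustrated with a toy model in plain Lua. This is not the actual lua-nginx-module implementation; the names `new_pool`, `release`, and `acquire` are invented for this sketch, and it only mimics the rule that a full pool closes its least recently used idle connection to make room for the one being released.

```lua
-- Toy model only; not the real lua-nginx-module pool.
-- idle[1] is the least recently used connection, idle[#idle] the most recent.
local function new_pool(size)
    return { size = size, idle = {}, closed = 0 }
end

-- Models set_keepalive(): put a connection back, evicting the least
-- recently used idle one when the pool is already full.
local function release(pool, conn)
    if #pool.idle >= pool.size then
        table.remove(pool.idle, 1)
        pool.closed = pool.closed + 1
    end
    pool.idle[#pool.idle + 1] = conn
end

-- Models connect(): reuse an idle connection if there is one; a nil
-- result means the caller has to open a new connection.
local function acquire(pool)
    return table.remove(pool.idle)
end

-- Example: a pool of 2 receiving 3 released connections closes one.
local pool = new_pool(2)
release(pool, "c1")
release(pool, "c2")
release(pool, "c3")
print(pool.closed, #pool.idle)
```

Under this model, a burst of many concurrent requests against a much smaller pool ends with most connections being closed rather than kept, so they behave like short-lived connections.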
I also ran into the error "lua tcp socket connect timed out, when connecting to xxx:xxx" (xxx:xxx is the master redis server). Our group has been working on it for several days. I checked a lot of issues but found no resolution. In the end we used a connection pool to reduce this error, but that did not fix it.
It's in a production environment, and I'm sure the configuration is not mistaken. Of 3 different production environments, one shows this error message very often, while the others show it rarely. This one only has 2 cores (another one has 8 cores and uses 4 of them for nginx), but the rest of the system configuration is exactly the same, and so is redis. Here is some info:
- We set a 100ms connect timeout when connecting to the redis master. When we raised it to 5s, the errors became less frequent but did not disappear. (The other environments also use a 100ms connect timeout, but they do not show nearly as many errors.)
- I captured the TCP traffic (only while the connect timeout was set to 100ms), and every record shows a normal TCP connection: three-way handshake, select db, get data, four-way close. (Maybe the timeout is so short that there are no retransmissions. So I even suspected that the CPU simply did not process this event within the timeout period.)
from lua-resty-redis.
Did the timeout issue occur while you were capturing the TCP traffic?
from lua-resty-redis.
Nope, there wasn't any clue about the connection timeout issue.
There was no SYN retransmission, and no packet that was abnormal at SYN.
from lua-resty-redis.
What did you mean by "abnormal at SYN"?
from lua-resty-redis.
If there had been a connect timeout, I thought there might be:
- SYN retransmit packets. >>> But there were none.
- An RST in the TCP stream. >>> But there was none.
- Or an incomplete three-way handshake in the TCP stream. >>> But there was none.
All the TCP streams I saw were normal, as expected.
from lua-resty-redis.