projecteru / redis-cerberus
Redis Cluster Proxy
License: MIT License
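Can read_slave be set dynamically?
The reason: read_slave yes was set at startup, but during peak hours I would like to dynamically set read_slave no on some of the proxies, so that read requests are spread onto the masters and the load on the slaves is reduced.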
$ g++ --version
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ make COMPILER=g++ STATIC_LINK=1
clang++ -c -std=c++0x -D_XOPEN_SOURCE -DELPP_THREAD_SAFE -Wall -Wextra -Wold-style-cast -Werror -O3 -DELPP_DISABLE_DEBUG_LOGS -DELPP_NO_DEFAULT_LOG_FILE -I. core/command.cpp -o ./build/command.o
core/command.cpp:261:25: error: no member named 'accumulate' in namespace 'std'
return std::accumulate(
~~~~~^
1 error generated.
# I then fixed this by inserting #include <numeric> into core/command.cpp, but the following errors are still reported:
In file included from core/slot_map.cpp:5:0:
core/slot_map.hpp:83:31: error: 'std::vector' has not been declared
void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
^~~~~~
core/slot_map.hpp:83:37: error: expected ',' or '...' before '<' token
void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
^
core/slot_map.hpp:90:10: error: 'vector' in namespace 'std' does not name a template type
std::vector<RedisNode> parse_slot_map(std::string const& nodes_info,
^~~~~~
core/slot_map.cpp:56:6: error: prototype for 'void cerb::SlotMap::replace_map(const std::vector<cerb::RedisNode>&, cerb::Proxy*)' does not match any in class 'cerb::SlotMap'
void SlotMap::replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy)
^~~~~~~
In file included from core/slot_map.cpp:5:0:
core/slot_map.hpp:83:14: error: candidate is: void cerb::SlotMap::replace_map(int)
void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
^~~~~~~~~~~
core/slot_map.cpp:111:76: error: 'std::vector<cerb::RedisNode> cerb::parse_slot_map(const string&, const string&)' should have been declared inside 'cerb'
std::string const& default_host)
^
/tmp/tmp.s7Hv765rf0:8: recipe for target 'build/slot_map.o' failed
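Both groups of errors look like missing standard headers: std::accumulate is declared in <numeric> and std::vector in <vector>, and a header that happened to be pulled in transitively by one standard library version may not be by another. A minimal sketch of the pattern, assuming the remaining errors go away once core/slot_map.hpp includes <vector> itself (the helper below is hypothetical and only illustrates the two includes):

```cpp
#include <numeric>  // declares std::accumulate
#include <vector>   // declares std::vector

// Hypothetical helper, not taken from the cerberus source:
// sums a list of lengths using both headers explicitly.
int total_length(std::vector<int> const& lengths)
{
    return std::accumulate(lengths.begin(), lengths.end(), 0);
}
```

Including every header a file uses directly, rather than relying on another header to bring it in, keeps the build independent of the compiler and standard library version.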
redis-benchmark -h 127.0.0.1 -p 8889 -P 10 -t get,set -c 1000 -d 10000 -l -q
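The command above can easily crash a single-threaded cerberus.
The crash usually happens when a batch of set request tests has just finished and the get requests are just starting.
This is not because the newly started get requests crash cerberus. Rather, when the set requests finish, some clients close their sockets directly; cerberus then treats the clients behind those sockets, together with their buffers, as invalid (the memory is reclaimed). However, if those buffers were not fully written out by the previous writev call, the next writev call that continues writing them causes the crash.
Typical web applications under light load do not trigger the crash, because a single writev call is enough to write everything out.
Buffer::writev: force continuous write-out
Some services require auth. Could the auth command be implemented at the proxy layer? The password could simply be written in the cerberus configuration file, with an option to turn the feature on or off.
I can submit a PR.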
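On the writev crash reported above: writev may write fewer bytes than requested, so the unwritten tail has to be tracked and the memory behind it kept alive until it has been sent or deliberately dropped. The following is a minimal sketch of that bookkeeping, not the cerberus implementation; consume_written is a hypothetical helper name.

```cpp
#include <sys/uio.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Drop the first `written` bytes from a pending iovec list after a
// (possibly partial) writev. The memory the remaining entries point to
// must stay valid until the list is empty.
void consume_written(std::vector<iovec>& pending, std::size_t written)
{
    std::size_t done = 0;
    // Skip entries that were written out completely.
    while (done < pending.size() && written >= pending[done].iov_len) {
        written -= pending[done].iov_len;
        ++done;
    }
    pending.erase(pending.begin(),
                  pending.begin() + static_cast<std::ptrdiff_t>(done));
    // Advance into the first entry that was only partly written.
    if (written > 0) {
        assert(!pending.empty());
        pending[0].iov_base = static_cast<char*>(pending[0].iov_base) + written;
        pending[0].iov_len -= written;
    }
}
```

A caller would pass the non-negative return value of writev to consume_written and retry later with whatever remains; if the peer has closed the socket, the pending list should be cleared before its buffers are released.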
gdb log:
Core was generated by `./cerberus example.conf'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000003a47432625 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003a47432625 in raise () from /lib64/libc.so.6
#1 0x0000003a47433e05 in abort () from /lib64/libc.so.6
#2 0x0000003a4a46007d in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x0000003a4a45e0e6 in __cxxabiv1::__terminate (handler=) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x0000003a4a45e131 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00000000005236ff in std::thread::~thread() ()
#6 0x0000000000523be2 in std::default_delete<std::thread>::operator()(std::thread*) const ()
#7 0x00000000005253a9 in std::unique_ptr<std::thread, std::default_delete<std::thread> >::~unique_ptr() ()
#8 0x0000000000525250 in util::sptr<std::thread>::~sptr() ()
#9 0x0000000000526976 in cerb::ListenThread::~ListenThread() ()
#10 0x000000000052699c in void std::_Destroy<cerb::ListenThread>(cerb::ListenThread*) ()
#11 0x000000000052693a in void std::_Destroy_aux<false>::__destroy<cerb::ListenThread*>(cerb::ListenThread*, cerb::ListenThread*) ()
#12 0x0000000000526912 in void std::_Destroy<cerb::ListenThread*>(cerb::ListenThread*, cerb::ListenThread*) ()
#13 0x00000000005268ed in void std::_Destroy<cerb::ListenThread*, cerb::ListenThread>(cerb::ListenThread*, cerb::ListenThread*, std::allocator<cerb::ListenThread>&) ()
#14 0x00000000005268a9 in std::vector<cerb::ListenThread, std::allocator<cerb::ListenThread> >::~vector() ()
#15 0x0000003a47435b22 in exit () from /lib64/libc.so.6
#16 0x000000000052772d in (anonymous namespace)::exit_on_int(int) ()
#17
#18 0x0000003a478082fb in pthread_join () from /lib64/libpthread.so.0
#19 0x0000003a4a4bb627 in __gthread_join (__value_ptr=0x0, __threadid=)
at /root/tmp/gcc-4.9.3/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:668
#20 std::thread::join (this=) at ../../../.././libstdc++-v3/src/c++11/thread.cc:107
#21 0x00000000005224b0 in cerb::ListenThread::join() ()
#22 0x000000000052828e in (anonymous namespace)::run((anonymous namespace)::Configuration const&) ()
#23 0x0000000000528839 in main ()
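I believe this is caused by cerb_global::all_threads being a global variable: Ctrl+C causes exit to be called, the vector's destructor runs, and at that point the threads have not finished. I suggest having all_threads store ListenThread pointers rather than ListenThread objects; alternatively, the SIGINT handler could be removed to eliminate this dump. What does the author think?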
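The backtrace is consistent with a documented std::thread rule: destroying a std::thread that is still joinable calls std::terminate. A minimal sketch of one way to avoid that, using a hypothetical JoiningThread wrapper that is not part of cerberus:

```cpp
#include <atomic>
#include <functional>
#include <thread>
#include <utility>

// Hypothetical wrapper, not part of cerberus: joins the thread when the
// owner is destroyed, so the destructor never reaches std::terminate.
class JoiningThread {
public:
    explicit JoiningThread(std::function<void()> body)
        : _thread(std::move(body))
    {}

    JoiningThread(JoiningThread const&) = delete;
    JoiningThread& operator=(JoiningThread const&) = delete;

    ~JoiningThread()
    {
        if (_thread.joinable()) {
            _thread.join();
        }
    }

private:
    std::thread _thread;
};

// Runs a worker inside a scope and reports whether it had finished by
// the time the scope was left.
bool worker_finished_before_scope_exit()
{
    std::atomic<bool> done{false};
    {
        JoiningThread worker([&done] { done = true; });
    }  // the destructor joins here instead of terminating
    return done.load();
}
```

Joining in a destructor only helps if the thread's loop is told to stop first; a worker blocked in its event loop would make the join wait indefinitely, so a real fix would also need a shutdown signal for the listen threads.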
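When a master goes down, cerberus re-fetches the slot distribution until every SlotsUpdater has reported failure; if the full_coverage option is yes, the proxy answers everything in the retry buffer with cluster down. Can this be understood to mean that, if promoting the new slave to master is slow, write commands issued during that window are not buffered and retried?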
In Client::_push_awaitings_to_ready() in core/client.cpp:
void Client::_push_awaitings_to_ready()
{
    if (this->_awaiting_count != 0 || (
            !this->_ready_groups.empty() &&
            this->_awaiting_groups.size() + this->_ready_groups.empty() > MAX_RESPONSES
        ))
    {
        return;
    }
    for (util::sptr<CommandGroup>& g: this->_awaiting_groups) {
        g->append_buffer_to(this->_output_buffer_set);
        this->_ready_groups.push_back(std::move(g));
    }
    this->_awaiting_groups.clear();
    if (!this->_output_buffer_set.empty()) {
        this->_proxy->set_conn_poll_rw(this);
    }
}
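This function moves the commands in _awaiting_groups into _ready_groups. The first if returns without doing anything when _awaiting_count is non-zero, or when _ready_groups is not empty and
this->_awaiting_groups.size() + this->_ready_groups.empty() > MAX_RESPONSES
In that branch _ready_groups is not empty, so _ready_groups.empty() is false, which converts to 0. Isn't this a mistake? It should be
this->_awaiting_groups.size() + this->_ready_groups.size() > MAX_RESPONSES
The intent here should be that at most MAX_RESPONSES groups are written to the output buffer at a time: when the number of _awaiting_groups plus the number of _ready_groups exceeds MAX_RESPONSES, the ready groups should be processed first, before the awaiting groups are handled.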
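The difference between the two conditions comes from the bool returned by empty() being converted to 0 or 1 before the addition. A small sketch with plain counts makes that visible; MAX_RESPONSES_SKETCH is a made-up limit, since the real MAX_RESPONSES value is not shown in the issue.

```cpp
#include <cstddef>

// Made-up limit for illustration only.
constexpr std::size_t MAX_RESPONSES_SKETCH = 4;

// Condition as quoted from the source: empty() yields a bool, which is
// converted to 0 or 1 when added to the size.
bool over_limit_as_written(std::size_t awaiting, std::size_t ready)
{
    bool const ready_empty = (ready == 0);
    return ready != 0 && awaiting + ready_empty > MAX_RESPONSES_SKETCH;
}

// Condition as proposed in the issue: add the two sizes.
bool over_limit_as_proposed(std::size_t awaiting, std::size_t ready)
{
    return ready != 0 && awaiting + ready > MAX_RESPONSES_SKETCH;
}
```

With 3 awaiting and 3 ready groups, the quoted form compares 3 against the limit of 4 and does not hold, while the proposed form compares 6 against 4 and does.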
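The cppformat in the code is the fmtlib/fmt library, right? Why was it copied into the project separately? Wouldn't it be better to bring it in as a submodule? Also, fmt really isn't used in many places in the code.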
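While using redis-cerberus in production, we found that cerberus occasionally crashes, with low probability. We investigated, and the cause of the crash is a segmentation fault.
Segmentation fault log: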
2017-02-19 18:41:31,669 I 140625852184320 Slot map updated
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 45f76c trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 45f997 trac::print_trace(std::ostream&) 19
/opt/tiger/twemproxy_bin/bin/cerberus 462dc9 0
/lib/x86_64-linux-gnu/libpthread.so.0 aa4368d0 f8d0
/opt/tiger/twemproxy_bin/bin/cerberus 454355 cerb::Server::pop_client(cerb::Client*) 165
/opt/tiger/twemproxy_bin/bin/cerberus 43a41c cerb::Client::~Client() 2c
/opt/tiger/twemproxy_bin/bin/cerberus 43a58d cerb::Client::after_events(std::set<cerb::Connection*, std::less<cerb::Connection*>, std::allocator<cerb::Connection*> >&) 2d
/opt/tiger/twemproxy_bin/bin/cerberus 44e3ff cerb::Proxy::handle_events(epoll_event*, int) 30f
/opt/tiger/twemproxy_bin/bin/cerberus 446a62 0
/opt/tiger/twemproxy_bin/bin/cerberus 446f5d 0
/usr/lib/x86_64-linux-gnu/libstdc++.so.6 aa1d2970 b6970
/lib/x86_64-linux-gnu/libpthread.so.0 aa42f0a4 80a4
/lib/x86_64-linux-gnu/libc.so.6 a994287d clone 6d
terminate called without an active exception
Cerberus version 0.7.9-2016-08-18 Copyright (c) HunanTV Platform developers
The log shows that the crash happened in pop_client.
Because the crash is rare in production and we cannot add the -g flag there, we added logging:
void Server::pop_client(Client* cli)
{
    LOG(INFO) << "Server::pop_client, before erase_if";
    util::erase_if(
        this->_commands,
        [&](util::sref<DataCommand> cmd)
        {
            return cmd->group->client.is(cli);
        });
    LOG(INFO) << "Server::pop_client, after erase_if";
    for (util::sref<DataCommand>& cmd: this->_sent_commands) {
        LOG(INFO) << "Server::pop_client, in for";
        if(cmd.not_nul()){
            LOG(INFO) << "Server::pop_client, cmd->group:" << cmd->group.nul();
            LOG(INFO) << "Server::pop_client, cmd->group->client:" << cmd->group->client.nul();
        }
        if (cmd.not_nul() && cmd->group->client.is(cli)) {
            LOG(INFO) << "Server::pop_client, in if";
            cmd.reset();
        }
    }
}
The output was as follows:
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, before erase_if
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, after erase_if
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, in for
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, cmd->group:0
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 4600dc trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 460307 trac::print_trace(std::ostream&) 19
/opt/tiger/twemproxy_bin/bin/cerberus 463739 0
/lib/x86_64-linux-gnu/libpthread.so.0 eaeb28d0 f8d0
/opt/tiger/twemproxy_bin/bin/cerberus 456069 cerb::Server::pop_client(cerb::Client*) 4c9
/opt/tiger/twemproxy_bin/bin/cerberus 43a47c cerb::Client::~Client() 2c
/opt/tiger/twemproxy_bin/bin/cerberus 43a5ed cerb::Client::after_events(std::set<cerb::Connection*, std::less<cerb::Connection*>, std::allocator<cerb::Connection*> >&) 2d
/opt/tiger/twemproxy_bin/bin/cerberus 44e45f cerb::Proxy::handle_events(epoll_event*, int) 30f
/opt/tiger/twemproxy_bin/bin/cerberus 446ac2 0
/opt/tiger/twemproxy_bin/bin/cerberus 446fbd 0
/usr/lib/x86_64-linux-gnu/libstdc++.so.6 eac4e970 b6970
/lib/x86_64-linux-gnu/libpthread.so.0 eaeab0a4 80a4
/lib/x86_64-linux-gnu/libc.so.6 ea3be87d clone 6d
terminate called without an active exception
Cerberus version 0.7.9-2016-08-18 Copyright (c) HunanTV Platform developers
Printing the address of cmd->group->client:
if(cmd.not_nul()){
    LOG(INFO) << "Server::pop_client, cmd->group:" << cmd->group.nul();
    try {
        LOG(INFO) << "Server::pop_client, cmd->group->client.address:" << &(cmd->group->client);
    } catch (std::exception& e) {
        LOG(INFO) << "Server::pop_client, exception:" << e.what();
    }
}
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, in for
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, cmd->group:1
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, cmd->group->client.address:0x8
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 46012c trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 460357 trac::print_trace(std::ostream&) 19
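So, in summary:
cmd->group is sometimes null and sometimes not null.
In some cases cmd->group.nul() may be false, yet accessing cmd->group->client still crashes because it is nul.
This shows that sometimes the sref structure itself is NULL and sometimes the ptr inside the sref is NULL, which leads to the segmentation fault.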
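One possible reading of the last log, offered as an interpretation rather than a confirmed diagnosis: cmd->group:1 means nul() returned true, and the address 0x8 is what taking the address of a member located 8 bytes into an object yields when the object pointer is null. The sketch below uses hypothetical stand-in types (the real cerberus layouts are not shown in the issue) to illustrate the offset and a check that tests each link before following it.

```cpp
#include <cstddef>

// Hypothetical stand-ins; field order and types are assumptions.
struct Client {};
struct CommandGroup {
    void* first_member;  // assumed pointer-sized field placed before `client`
    Client* client;
};
struct DataCommand {
    CommandGroup* group;
};

// Offset of `client` inside CommandGroup under the assumed layout.
constexpr std::size_t CLIENT_OFFSET = offsetof(CommandGroup, client);

// Test every link in the chain before dereferencing the next one.
bool belongs_to(DataCommand const* cmd, Client const* cli)
{
    return cmd != nullptr
        && cmd->group != nullptr
        && cmd->group->client == cli;
}
```

A null check of this kind cannot detect a reference to an object that has already been destroyed, which may be what the earlier run shows, where cmd->group:0 was logged and the crash followed anyway.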
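Our team thinks the redis-cerberus code quality is good, and we plan to develop several features on top of this codebase.
Is there any documentation describing the important classes in the code? Ideally there would also be documentation describing the typical data flow paths.
Thanks.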
When using redis_cerberus in production, it occasionally crashes. The segmentation fault log is as follows:
2020-06-10 10:13:53,614 I 139989652322048 Poll elapse=0.14279 events=1 clients=13 long_clients=0 slots_map_updated=true
Segmentation fault:
/usr/bin/cerberus 45a03c trac::stacktrace() 29
/usr/bin/cerberus 45a26d trac::print_trace(std::ostream&) 19
/usr/bin/cerberus 45d5b9 0
/lib64/libpthread.so.0 1e80f7e0 0
/lib64/libc.so.6 1e48a0c5 0
/lib64/libc.so.6 1e4839d9 memmove 129
/usr/bin/cerberus 436840 cerb::Buffer::truncate_from_begin(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, cerb::BufferStatAllocator> >) 40
/usr/bin/cerberus 44ebeb cerb::split_server_response(cerb::Buffer&) 55b
/usr/bin/cerberus 4502a9 cerb::Server::_recv_from() 59
/usr/bin/cerberus 450891 cerb::Server::on_events(int) 41
/usr/bin/cerberus 44b584 cerb::Proxy::handle_events(epoll_event*, int) 574
/usr/bin/cerberus 444ea2 0
/usr/bin/cerberus 44539d 0
/usr/lib64/libstdc++.so.6 210d940c /usr/lib64/libstdc+ 15731
/lib64/libpthread.so.0 1e807aa1 0
/lib64/libc.so.6 1e4e8c4d clone 6d
cerberus version info: Cerberus version 0.8.0-2018-05-02 Copyright (c) HunanTV Platform developers
Hello, does redis-cerberus support HA? Or would a redis-cerberus + keepalived setup be feasible? Any advice would be appreciated.
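When a master and its slave go down at the same time, redis-cluster should remain partially available.
In redis cluster, enable: cluster-require-full-coverage yes
With the default configuration, if that master holds data and both nodes go down, the cluster goes down. If the setting above is changed to no, the cluster does not go down, and only requests routed to the failed master fail. Although redis itself no longer goes down, redis-cerberus still returns cluster down: redis-cerberus hardcodes that it returns cluster down whenever fewer than 16384 slots are covered.
Please don't hardcode this, and please also support continuing to work when the failed node holds no data.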
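The requested behaviour can be sketched as a per-slot check that only falls back to an all-or-nothing answer when full coverage is required. This is a hypothetical policy function, not the cerberus implementation; the names can_serve and require_full_coverage are assumptions.

```cpp
#include <bitset>
#include <cstddef>

constexpr std::size_t SLOT_COUNT = 16384;  // number of hash slots in Redis Cluster

// Hypothetical policy check. With require_full_coverage the proxy treats
// the cluster as down unless every slot is served; without it, only
// requests for slots that currently have no node fail.
bool can_serve(std::bitset<SLOT_COUNT> const& covered, std::size_t slot,
               bool require_full_coverage)
{
    if (require_full_coverage) {
        return covered.all();
    }
    return covered.test(slot);
}
```

Making the flag a proxy configuration option would let the proxy follow whatever cluster-require-full-coverage value the cluster itself uses.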
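A question about the routing rules in rw mode:
In the code we checked out a year ago, rw mode sent everything to the masters only, so we added our own mode that writes to the masters and reads from all masters and slaves.
We now see that there have been quite a few updates and are reading through the code to catch up. Does the official version still send both reads and writes only to the masters? What is the reasoning behind that?
Thanks for your answer.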
Our redis cluster runs in pods. After putting the pods' service DNS addresses into the configuration file, startup reports these errors:
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-0.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-0.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-1.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-1.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-2.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-2.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-3.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-3.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-4.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-4.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-5.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-5.ext-redis-cache-cluster-svc
Does redis-cerberus currently not support host names?
If we wanted to add support, which part of the source should be changed?
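Where the "Unknown host" error is raised in cerberus is not shown in the issue, so the following is only a sketch of the usual building block for host name support: resolving the name with the POSIX getaddrinfo call before connecting. The function name resolve_ipv4 is hypothetical.

```cpp
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <string>
#include <vector>

// Resolve a host name (or numeric address) to IPv4 address strings.
// Returns an empty vector when resolution fails.
std::vector<std::string> resolve_ipv4(std::string const& host)
{
    addrinfo hints{};
    hints.ai_family = AF_INET;        // IPv4 only, to keep the sketch short
    hints.ai_socktype = SOCK_STREAM;  // TCP, as used for Redis connections

    addrinfo* results = nullptr;
    if (getaddrinfo(host.c_str(), nullptr, &hints, &results) != 0) {
        return {};
    }

    std::vector<std::string> addresses;
    for (addrinfo* p = results; p != nullptr; p = p->ai_next) {
        auto const* addr = reinterpret_cast<sockaddr_in const*>(p->ai_addr);
        char text[INET_ADDRSTRLEN];
        if (inet_ntop(AF_INET, &addr->sin_addr, text, sizeof text) != nullptr) {
            addresses.emplace_back(text);
        }
    }
    freeaddrinfo(results);
    return addresses;
}
```

getaddrinfo blocks while it waits for the DNS reply, so in an event-driven proxy it would more likely be called at startup or from a separate thread than from inside the event loop.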
The application reports a timeout error when connecting directly: PHP Fatal error: Uncaught RedisException: Connection timed out
The cerberus error log is as follows:
2020-05-07 16:50:10,758 I 139821100205824 Start accepting - Acceptor(16@0x7f2aa6f830f8)
2020-05-07 16:50:10,758 W 139821100205824 Too many open files. Stop accepting from Acceptor(16@0x7f2aa6f830f8)
2020-05-07 16:50:10,758 W 139821100205824 version:0.8.0-2018-05-02
threads:8
cluster_ok:1
read_slave:0
clients_count:149,115,114,125,108,124,107,107
accepting:0,1,0,0,0,0,1,0
long_connections_count:0,0,0,0,0,0,0,0
used_cpu_sys:5016.74
used_cpu_user:5436.59
mem_buffer_alloc:5727980,5757925,6057566,5053367,5545437,6056406,6060881,5886104
completed_commands:31235142
total_process_elapse:16135
total_remote_cost:13903.9
last_command_elapse:0.000243825,0.0186795,0.0141551,0.000359905,0.0203328,0.00054563,0.000973783,0.000646515
last_remote_cost:0.000197395,0.0186091,0.0139204,0.000297576,0.0202605,0.000488308,0.000921514,0.000564282
At this point most of the accepting values are 0, and connecting directly to the cerberus port stalls. What is the cause?
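"Too many open files" is the system message for running out of file descriptors, and the log shows the acceptor pausing when that happens. The clients_count values above add up to 949, and each thread also holds connections to the Redis nodes, so a per-process limit of 1024 would fit these numbers; that limit is an assumption, since the actual value is not in the report. A minimal sketch for inspecting and raising the limit from inside a process with the POSIX getrlimit and setrlimit calls (the function names are hypothetical):

```cpp
#include <sys/resource.h>

// Read the current soft and hard limits on open file descriptors.
// Returns false if the limits could not be read.
bool read_nofile_limit(rlimit& out)
{
    return getrlimit(RLIMIT_NOFILE, &out) == 0;
}

// Raise the soft limit up to the hard limit, which an unprivileged
// process is normally allowed to do. Returns false on failure.
bool raise_nofile_limit_to_hard_max()
{
    rlimit lim{};
    if (!read_nofile_limit(lim)) {
        return false;
    }
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}
```

The same limit can be checked from the shell that starts cerberus with ulimit -n.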