
redis-cerberus's People

Contributors

baiwfg2 · cmgs · dongzerun · maralla · zheplusplus


redis-cerberus's Issues

Can read_slave be set dynamically?

Can read_slave be set dynamically?
The reason: read_slave yes is set at startup, but during peak hours we would like to dynamically switch some proxies to read_slave no, so that read requests are spread back onto the masters and the pressure on the slaves is relieved.

Build errors under debian-stretch-slim

$ g++ --version
g++ (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ make COMPILER=g++ STATIC_LINK=1

clang++ -c -std=c++0x -D_XOPEN_SOURCE -DELPP_THREAD_SAFE    -Wall -Wextra -Wold-style-cast -Werror -O3 -DELPP_DISABLE_DEBUG_LOGS -DELPP_NO_DEFAULT_LOG_FILE -I. core/command.cpp -o ./build/command.o
core/command.cpp:261:25: error: no member named 'accumulate' in namespace 'std'
            return std::accumulate(
                   ~~~~~^
1 error generated.

# Then I inserted #include <numeric> in core/command.cpp, which fixed this, but the following errors remain:

In file included from core/slot_map.cpp:5:0:
core/slot_map.hpp:83:31: error: 'std::vector' has not been declared
         void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
                               ^~~~~~
core/slot_map.hpp:83:37: error: expected ',' or '...' before '<' token
         void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
                                     ^
core/slot_map.hpp:90:10: error: 'vector' in namespace 'std' does not name a template type
     std::vector<RedisNode> parse_slot_map(std::string const& nodes_info,
          ^~~~~~
core/slot_map.cpp:56:6: error: prototype for 'void cerb::SlotMap::replace_map(const std::vector<cerb::RedisNode>&, cerb::Proxy*)' does not match any in class 'cerb::SlotMap'
 void SlotMap::replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy)
      ^~~~~~~
In file included from core/slot_map.cpp:5:0:
core/slot_map.hpp:83:14: error: candidate is: void cerb::SlotMap::replace_map(int)
         void replace_map(std::vector<RedisNode> const& nodes, Proxy* proxy);
              ^~~~~~~~~~~
core/slot_map.cpp:111:76: error: 'std::vector<cerb::RedisNode> cerb::parse_slot_map(const string&, const string&)' should have been declared inside 'cerb'
                                             std::string const& default_host)
                                                                            ^
/tmp/tmp.s7Hv765rf0:8: recipe for target 'build/slot_map.o' failed

redis-benchmark under high load can crash cerberus

How to reproduce

redis-benchmark -h 127.0.0.1 -p 8889 -P 10 -t get,set -c 1000 -d 10000 -l -q

can easily crash a single-threaded cerberus.

Symptoms

The crash usually happens right after a batch of set requests finishes, just as the get requests begin.

Analysis

It is not the freshly started get requests that crash cerberus. When the set requests finish, some clients close their sockets outright; cerberus then treats those clients and their buffers as invalid (the memory is reclaimed). However, if those buffers were not fully flushed by the previous writev call, the next writev that continues flushing them crashes.

Impact on everyday use

A typical web application under moderate load will not trigger the crash, because

  • request buffers are small and a single writev is enough to flush them
  • sockets connected to cerberus are not dropped arbitrarily; even when a client connection is closed, the web server usually waits for the redis command to return

Fixes

  • fall back to the non-incremental retry scheme: Buffer::writev forces a complete, contiguous write
  • switch buffer memory management to a shared scheme; the shared-memory-page buffer implementation used to contain this code, though its efficiency still needs to be evaluated

Can authentication be supported?

Some of our services require auth. Could the proxy layer implement the AUTH command? Writing the password in cerberus's config file, with a switch to enable it, would be enough.

I can submit a PR.

When a slave goes down, are requests redirected to the master?

gdb log:

Core was generated by `./cerberus example.conf'.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000003a47432625 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x0000003a47432625 in raise () from /lib64/libc.so.6
#1 0x0000003a47433e05 in abort () from /lib64/libc.so.6
#2 0x0000003a4a46007d in __gnu_cxx::__verbose_terminate_handler () at ../../.././libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x0000003a4a45e0e6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x0000003a4a45e131 in std::terminate () at ../../.././libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x00000000005236ff in std::thread::~thread() ()
#6 0x0000000000523be2 in std::default_delete<std::thread>::operator()(std::thread*) const ()
#7 0x00000000005253a9 in std::unique_ptr<std::thread, std::default_delete<std::thread> >::~unique_ptr() ()
#8 0x0000000000525250 in util::sptr<std::thread>::~sptr() ()
#9 0x0000000000526976 in cerb::ListenThread::~ListenThread() ()
#10 0x000000000052699c in void std::_Destroy<cerb::ListenThread>(cerb::ListenThread*) ()
#11 0x000000000052693a in void std::_Destroy_aux<false>::__destroy<cerb::ListenThread*>(cerb::ListenThread*, cerb::ListenThread*) ()
#12 0x0000000000526912 in void std::_Destroy<cerb::ListenThread*>(cerb::ListenThread*, cerb::ListenThread*) ()
#13 0x00000000005268ed in void std::_Destroy<cerb::ListenThread*, cerb::ListenThread>(cerb::ListenThread*, cerb::ListenThread*, std::allocator<cerb::ListenThread>&) ()
#14 0x00000000005268a9 in std::vector<cerb::ListenThread, std::allocator<cerb::ListenThread> >::~vector() ()
#15 0x0000003a47435b22 in exit () from /lib64/libc.so.6
#16 0x000000000052772d in (anonymous namespace)::exit_on_int(int) ()
#17 <signal handler called>
#18 0x0000003a478082fb in pthread_join () from /lib64/libpthread.so.0
#19 0x0000003a4a4bb627 in __gthread_join (__value_ptr=0x0, __threadid=<optimized out>)
at /root/tmp/gcc-4.9.3/x86_64-unknown-linux-gnu/libstdc++-v3/include/x86_64-unknown-linux-gnu/bits/gthr-default.h:668
#20 std::thread::join (this=<optimized out>) at ../../../.././libstdc++-v3/src/c++11/thread.cc:107
#21 0x00000000005224b0 in cerb::ListenThread::join() ()
#22 0x000000000052828e in (anonymous namespace)::run((anonymous namespace)::Configuration const&) ()
#23 0x0000000000528839 in main ()

I believe this is caused by cerb_global::all_threads being a global variable: Ctrl+C makes exit() run, the vector's destructor is invoked, but at that point the threads have not yet finished. I suggest storing ListenThread pointers in all_threads instead of ListenThread objects; alternatively, the SIGINT handler could be removed to eliminate this core dump. What does the author think?

Is the code in Client::_push_awaitings_to_ready() wrong?

In Client::_push_awaitings_to_ready() in core/client.cpp:

void Client::_push_awaitings_to_ready()
{
    if (this->_awaiting_count != 0 || (
            !this->_ready_groups.empty() &&
            this->_awaiting_groups.size() + this->_ready_groups.empty() > MAX_RESPONSES
        ))
    {
        return;
    }
    for (util::sptr<CommandGroup>& g: this->_awaiting_groups) {
        g->append_buffer_to(this->_output_buffer_set);
        this->_ready_groups.push_back(std::move(g));
    }
    this->_awaiting_groups.clear();
    if (!this->_output_buffer_set.empty()) {
        this->_proxy->set_conn_poll_rw(this);
    }
}

This function moves the commands in _awaiting_groups into _ready_groups. The first if returns without doing anything when
_awaiting_count is non-zero,
or when
_ready_groups is non-empty, and
this->_awaiting_groups.size() + this->_ready_groups.empty() > MAX_RESPONSES
Isn't `_ready_groups.empty()` a mistake? It evaluates to false, i.e. 0. It should be
this->_awaiting_groups.size() + this->_ready_groups.size() > MAX_RESPONSES
The intent here should be that at most MAX_RESPONSES groups are written to the output buffer at a time: when the number of awaiting groups plus the number of ready groups exceeds MAX_RESPONSES, the ready ones should be drained first before the awaiting ones are handled.

About the fmt library

The cppformat in the codebase is the fmtlib/fmt library, right? Why is it vendored into the project instead of being pulled in as a submodule? Also, fmt is actually used in very few places in the code.

segment fault when pop_client

While running redis-cerberus in production we found that cerberus occasionally crashes with low probability. We investigated; the cause is a segmentation fault.

Segfault log:

2017-02-19 18:41:31,669 I 140625852184320 Slot map updated
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 45f76c trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 45f997 trac::print_trace(std::ostream&) 19
/opt/tiger/twemproxy_bin/bin/cerberus 462dc9  0
/lib/x86_64-linux-gnu/libpthread.so.0 aa4368d0  f8d0
/opt/tiger/twemproxy_bin/bin/cerberus 454355 cerb::Server::pop_client(cerb::Client*) 165
/opt/tiger/twemproxy_bin/bin/cerberus 43a41c cerb::Client::~Client() 2c
/opt/tiger/twemproxy_bin/bin/cerberus 43a58d cerb::Client::after_events(std::set<cerb::Connection*, std::less<cerb::Connection*>, std::allocator<cerb::Connection*> >&) 2d
/opt/tiger/twemproxy_bin/bin/cerberus 44e3ff cerb::Proxy::handle_events(epoll_event*, int) 30f
/opt/tiger/twemproxy_bin/bin/cerberus 446a62  0
/opt/tiger/twemproxy_bin/bin/cerberus 446f5d  0
/usr/lib/x86_64-linux-gnu/libstdc++.so.6 aa1d2970  b6970
/lib/x86_64-linux-gnu/libpthread.so.0 aa42f0a4  80a4
/lib/x86_64-linux-gnu/libc.so.6 a994287d clone 6d
terminate called without an active exception
Cerberus version 0.7.9-2016-08-18 Copyright (c) HunanTV Platform developers

The log shows the crash happens during pop_client.
Because the crash is rare in production and we cannot build with -g, we added logging:

void Server::pop_client(Client* cli)
{
    LOG(INFO) << "Server::pop_client, before erase_if";
    util::erase_if(
        this->_commands,
        [&](util::sref<DataCommand> cmd)
        {
            return cmd->group->client.is(cli);
        });
    LOG(INFO) << "Server::pop_client, after erase_if";
    for (util::sref<DataCommand>& cmd: this->_sent_commands) {
        LOG(INFO) << "Server::pop_client, in for";
        if(cmd.not_nul()){
            LOG(INFO) << "Server::pop_client, cmd->group:" << cmd->group.nul();
            LOG(INFO) << "Server::pop_client, cmd->group->client:" << cmd->group->client.nul();
        }
        if (cmd.not_nul() && cmd->group->client.is(cli)) {
            LOG(INFO) << "Server::pop_client, in if";
            cmd.reset();
        }
    }
}

The output:

2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, before erase_if
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, after erase_if
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, in for
2017-02-21 05:39:22,306 I 140140096595712 Server::pop_client, cmd->group:0
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 4600dc trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 460307 trac::print_trace(std::ostream&) 19
/opt/tiger/twemproxy_bin/bin/cerberus 463739  0
/lib/x86_64-linux-gnu/libpthread.so.0 eaeb28d0  f8d0
/opt/tiger/twemproxy_bin/bin/cerberus 456069 cerb::Server::pop_client(cerb::Client*) 4c9
/opt/tiger/twemproxy_bin/bin/cerberus 43a47c cerb::Client::~Client() 2c
/opt/tiger/twemproxy_bin/bin/cerberus 43a5ed cerb::Client::after_events(std::set<cerb::Connection*, std::less<cerb::Connection*>, std::allocator<cerb::Connection*> >&) 2d
/opt/tiger/twemproxy_bin/bin/cerberus 44e45f cerb::Proxy::handle_events(epoll_event*, int) 30f
/opt/tiger/twemproxy_bin/bin/cerberus 446ac2  0
/opt/tiger/twemproxy_bin/bin/cerberus 446fbd  0
/usr/lib/x86_64-linux-gnu/libstdc++.so.6 eac4e970  b6970
/lib/x86_64-linux-gnu/libpthread.so.0 eaeab0a4  80a4
/lib/x86_64-linux-gnu/libc.so.6 ea3be87d clone 6d
terminate called without an active exception
Cerberus version 0.7.9-2016-08-18 Copyright (c) HunanTV Platform developers

Printing the address of cmd->group->client:

if(cmd.not_nul()){
    LOG(INFO) << "Server::pop_client, cmd->group:" << cmd->group.nul();
    try {
        LOG(INFO) << "Server::pop_client, cmd->group->client.address:" << &(cmd->group->client);
    } catch (std::exception& e) {
        LOG(INFO) << "Server::pop_client, exception:" << e.what();
    }
}
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, in for
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, cmd->group:1
2017-02-24 05:48:22,789 I 140241907771136 Server::pop_client, cmd->group->client.address:0x8
Segmentation fault:
/opt/tiger/twemproxy_bin/bin/cerberus 46012c trac::stacktrace() 29
/opt/tiger/twemproxy_bin/bin/cerberus 460357 trac::print_trace(std::ostream&) 19

So:
cmd->group is sometimes null and sometimes not.
It can happen that cmd->group.nul() is false, yet by the time cmd->group->client is reached it is null, and the process crashes.

In other words, sometimes the sref struct itself is NULL, and sometimes the ptr inside the sref is NULL, which causes the segmentation fault.

Design documentation

Our team finds the redis-cerberus code quality quite good and plans to develop several features on top of it.

Is there any documentation describing the important classes in the code? Ideally with a description of the typical data flow paths.

Thanks

Segfault in Buffer::truncate_from_begin

When using redis-cerberus in production it occasionally crashes; the segfault log is as follows:

2020-06-10 10:13:53,614 I 139989652322048 Poll elapse=0.14279 events=1 clients=13 long_clients=0 slots_map_updated=true
Segmentation fault:
/usr/bin/cerberus 45a03c trac::stacktrace() 29
/usr/bin/cerberus 45a26d trac::print_trace(std::ostream&) 19
/usr/bin/cerberus 45d5b9 0
/lib64/libpthread.so.0 1e80f7e0 0
/lib64/libc.so.6 1e48a0c5 0
/lib64/libc.so.6 1e4839d9 memmove 129
/usr/bin/cerberus 436840 cerb::Buffer::truncate_from_begin(__gnu_cxx::__normal_iterator<unsigned char*, std::vector<unsigned char, cerb::BufferStatAllocator> >) 40
/usr/bin/cerberus 44ebeb cerb::split_server_response(cerb::Buffer&) 55b
/usr/bin/cerberus 4502a9 cerb::Server::_recv_from() 59
/usr/bin/cerberus 450891 cerb::Server::on_events(int) 41
/usr/bin/cerberus 44b584 cerb::Proxy::handle_events(epoll_event*, int) 574
/usr/bin/cerberus 444ea2 0
/usr/bin/cerberus 44539d 0
/usr/lib64/libstdc++.so.6 210d940c /usr/lib64/libstdc+ 15731
/lib64/libpthread.so.0 1e807aa1 0
/lib64/libc.so.6 1e4e8c4d clone 6d

cerberus version: Cerberus version 0.8.0-2018-05-02 Copyright (c) HunanTV Platform developers

Cluster health && availability checks...

Scenario

A master and its slave go down at the same time, yet redis-cluster is required to stay partially available.

Countermeasure

Set in the redis cluster configuration: cluster-require-full-coverage no

Problem

Under the default configuration, if a master that holds data goes down entirely, the cluster goes down. With the setting above at no, it no longer goes down; only requests routed to the dead master fail. But even though redis itself no longer goes down, redis-cerberus still returns cluster down: the code hardcodes returning cluster down whenever fewer than all 16384 slots are covered.

Request....

Drop the hardcoding, and also keep working when the dead node holds no data.
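For reference, this is the standard redis.conf switch the issue refers to (a real redis cluster option, shown here only as a configuration fragment):

```
# redis.conf: with "no", the cluster keeps serving the hash slots that
# are still covered even when some master and all its replicas are down;
# requests for the uncovered slots fail instead of the whole cluster
# going down.
cluster-require-full-coverage no
```

For this to help, the proxy would also have to stop treating "fewer than 16384 covered slots" as cluster down, which is exactly the hardcoded check the issue asks to remove.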

About rw mode

A question about the routing rules in rw mode:

In the code we checked a year ago, rw mode routed everything to the master only; we added our own behavior of writing to the master and reading from all masters and slaves.
Seeing quite a few updates since then, we are studying the new code. Does the official version still route both reads and writes to the master only? If so, what is the reasoning?

Thanks in advance

The node lines in the config file do not support k8s DNS names

Our redis cluster runs in pods. After putting the pods' service DNS addresses into the config file, startup fails:

2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-0.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-0.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-1.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-1.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-2.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-2.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-3.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-3.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-4.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-4.ext-redis-cache-cluster-svc
2019-09-02 09:56:24,367 E 140477375293184 Disconnect ext-redis-cache-cluster-5.ext-redis-cache-cluster-svc:6379 for Unknown host: ext-redis-cache-cluster-5.ext-redis-cache-cluster-svc

Does redis-cerberus not support hostnames at the moment?
If we wanted to add support, where in the source should it be fixed?

Connection timeout

A directly connected client reports a timeout: PHP Fatal error: Uncaught RedisException: Connection timed out
The cerberus error log is as follows:
2020-05-07 16:50:10,758 I 139821100205824 Start accepting - Acceptor(16@0x7f2aa6f830f8)
2020-05-07 16:50:10,758 W 139821100205824 Too many open files. Stop accepting from Acceptor(16@0x7f2aa6f830f8)
2020-05-07 16:50:10,758 W 139821100205824 version:0.8.0-2018-05-02
threads:8
cluster_ok:1
read_slave:0
clients_count:149,115,114,125,108,124,107,107
accepting:0,1,0,0,0,0,1,0
long_connections_count:0,0,0,0,0,0,0,0
used_cpu_sys:5016.74
used_cpu_user:5436.59
mem_buffer_alloc:5727980,5757925,6057566,5053367,5545437,6056406,6060881,5886104
completed_commands:31235142
total_process_elapse:16135
total_remote_cost:13903.9
last_command_elapse:0.000243825,0.0186795,0.0141551,0.000359905,0.0203328,0.00054563,0.000973783,0.000646515
last_remote_cost:0.000197395,0.0186091,0.0139204,0.000297576,0.0202605,0.000488308,0.000921514,0.000564282

At this point the accepting metric is mostly 0, and connecting directly to the cerberus port stalls. What could be the cause?
