cloud-config's People

Contributors

acechef, jayanring, jlerxky, k4nzdroid, naughtydogofschrodinger, pencil-yao, rink1969, whfuyn

Forkers

acechef

cloud-config's Issues

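Container fails to start with an error

When deploying a chain on an Alibaba Cloud k8s cluster, the container fails to start with this error: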

container init: error mounting "/etc/timezone" to rootfs at "/etc/timezone": mount /etc/timezone:/etc/timezone (via /proc/self/fd/6), flags: 0x5000: not a directory: unknown

typo

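In the README, the sentence

该实例设置了5个节点: ("This instance sets up 5 nodes:")

should read "This example sets up 5 nodes:", that is, "实例" (instance) -> "示例" (example).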

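Disable rolling updates for pods

When a chain node is restarted through rancher, the default is a rolling update: a new pod is created first, and only then is the old pod deleted.
While the new and old pods coexist, the database LOCK file is very likely to be held already, so the new pod cannot acquire it.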


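To disable rolling updates, set the spec.strategy.type field of the Deployment (not of the Pod) to Recreate. With spec.strategy.type: Recreate, Kubernetes deletes the old pods first and only then starts the new ones, so old and new pods never compete for the same resources. Note that spec.strategy exists on Deployments only; a StatefulSet has no such field and uses spec.updateStrategy (RollingUpdate or OnDelete) instead.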

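The spec.strategy.type field has been part of the Deployment API since its early versions (this comment cites v1.6), so any cluster running Kubernetes v1.6 or later can use it to set the update strategy.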

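A node added after a deletion should no longer reuse the deleted node's index

For example:

  1. Delete node test-chain-4, addr: 0x1234
  2. Add a node with addr 0xabcd; its node directory name is test-chain-4
    The newly added node ends up with the index of the deleted node.

This should be changed to:

  1. Delete node test-chain-4, addr: 0x1234
  2. Add a node with addr 0xabcd; its node directory name is test-chain-5
  3. Add a node with addr 0x1234; its node directory name is test-chain-4

This approach makes it easier to derive the TLS common name.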
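
The requested behavior amounts to binding each node address to a stable index that survives deletion. The following is only a minimal sketch of that idea, not cloud-config's actual implementation: `IndexAllocator`, `index_for`, and `node_dir` are hypothetical names, and the real tool would have to persist the mapping in the chain's config.

```rust
use std::collections::HashMap;

/// Hypothetical allocator that binds every node address to a stable index.
#[derive(Default)]
struct IndexAllocator {
    /// Every address ever seen, mapped to its index.
    /// Entries are kept when a node is deleted.
    assigned: HashMap<String, usize>,
    /// The next index that has never been handed out.
    next: usize,
}

impl IndexAllocator {
    /// Returns the index for `addr`: the old one if the address was seen
    /// before, otherwise a fresh, never-used index.
    fn index_for(&mut self, addr: &str) -> usize {
        if let Some(&idx) = self.assigned.get(addr) {
            return idx;
        }
        let idx = self.next;
        self.next += 1;
        self.assigned.insert(addr.to_string(), idx);
        idx
    }
}

/// Builds a node directory name such as `test-chain-4`.
fn node_dir(chain_name: &str, index: usize) -> String {
    format!("{}-{}", chain_name, index)
}
```

With five addresses registered and 0x1234 holding index 4, `index_for("0xabcd")` yields 5 (directory test-chain-5), and a later `index_for("0x1234")` yields 4 again (directory test-chain-4), matching the sequence above.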

consensus raft lost log_level

The consensus_raft configuration file is missing the log_level option.

The log_level argument given to init-node should be written into the consensus_raft configuration file.

import-account does not create node_address

$ docker run -it --rm -v $(pwd):/data -w /data $DOCKER_REGISTRY/$DOCKER_REPO/cloud-config:$RELEASE_VERSION cloud-config import-account --chain-name $CHAIN_NAME --privkey 0x2241d9b09a86f759dff2badad0791437cd8cc23363c85e5736a35c557ce445a9
node address: eb9b2ef5551babdb66d11e9882e2d2f9359c547c validator address: eb9b2ef5551babdb66d11e9882e2d2f9359c547c

$ ls sla-bft/accounts/a6ca9b322580e65c064af352ab194f7e98c2ee55/
node_address  private_key  validator_address
$ ls sla-bft/accounts/eb9b2ef5551babdb66d11e9882e2d2f9359c547c/
private_key  validator_address

An account created with new-account gets three files in its directory.
An account created with import-account, however, is missing the node_address file.

Nodes failed with Permission denied

# kubectl get pod
NAME                 READY   STATUS             RESTARTS       AGE
test-chain-node0-0   2/6     CrashLoopBackOff   23 (68s ago)   5m45s
test-chain-node1-0   2/6     CrashLoopBackOff   23 (55s ago)   5m38s
test-chain-node2-0   2/6     CrashLoopBackOff   23 (57s ago)   5m35s
test-chain-node3-0   2/6     CrashLoopBackOff   23 (44s ago)   5m30s
# kubectl logs test-chain-node0-0 -c network
...
Thread main panicked at called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }, src/util.rs:30`
...
# kubectl logs test-chain-node0-0 -c consensus
...
Thread main panicked at called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }, src/authority_manage.rs:34
...
# kubectl logs test-chain-node0-0 -c executor
...
Thread main panicked at called `Result::unwrap()` on an `Err` value: Internal("Failed to create RocksDB directory: `Os { code: 13, kind: PermissionDenied, message: \"Permission denied\" }`."), src/core_executor/libexecutor/executor.rs:51
...
# kubectl logs test-chain-node0-0 -c storage
...
Thread main panicked at called `Result::unwrap()` on an `Err` value: Error { message: "Failed to create RocksDB directory: `Os { code: 13, kind: PermissionDenied, message: \"Permission denied\" }`." }, src/db.rs:58
...

Adding a node breaks the kms microservice

Given a chain with 4 nodes,
add one node with the following command:

helm install increase-single ./cita-cloud-config --set config.action.type=increaseSingle --set config.action.increaseSingle.kmsPassword=123456

The kms on the existing nodes then reports this error:

2022-03-23T02:38:24.797259385+00:00 WARN kms::kms - sign_message get account(id: 1) failed: SqliteFailure(Error { code: NotADatabase, extended_code: 26 }, Some("file is not a database"))

add subcmd for localcluster(statefulset)

conventions:

  • node network host uses a headless service
  • node network port is 40000
  • node domain is i
  • nodes have same kms password
  • grpc ports start from 50000
  • is_stdout is true
  • append and delete as LILO

debug container fails to start

After the enable-debug switch is turned on, the added debug container fails to start with this error:

The directory /usr/share/nginx/html is not mounted.
Therefore, over-writing the default index.html file with some useful information:
tee: /usr/share/nginx/html/index.html: Permission denied
Praqma Network MultiTool (with NGINX) - d9a0ef5e09cb - 172.17.0.2 - HTTP: 80 , HTTPS: 443
/docker/entrypoint.sh: line 26: can't create /usr/share/nginx/html/index.html: Permission denied

========================= IMPORTANT ==============================

cat: can't open '/root/press-release.md': Permission denied

==================================================================

/docker/entrypoint.sh: exec: line 61: illegal option --

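The cause is that we previously changed the user the containers run as from root to a regular user, so the debug container's entrypoint no longer has the permissions it needs.

Fix: set an entrypoint that is just an idle loop, overriding the image's own entrypoint.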

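init-node subcommand hangs

Running the init-node subcommand hung in one somewhat unusual environment.

Analysis showed that it hangs while recursively copying a directory.
The guess is that the recursive copy runs into ".", "..", or some other special directory.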
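
For reference, here is a minimal sketch of a recursive directory copy that avoids the usual ways such a copy can loop forever. It is an illustration under assumptions, not the code cloud-config uses: `copy_dir_all` is a hypothetical helper. It relies on `std::fs::read_dir`, which already skips the "." and ".." entries, skips symlinks and other special entries instead of following them, and refuses to copy a directory into itself.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: recursively copies `src` into `dst`.
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    // Copying a directory into its own subtree would keep discovering
    // the freshly created destination, so reject it up front
    // (a purely lexical check on the given paths).
    if dst.starts_with(src) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "destination is inside the source directory",
        ));
    }
    fs::create_dir_all(dst)?;
    // read_dir does not yield "." or "..".
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        // file_type() does not follow symlinks.
        let file_type = entry.file_type()?;
        let target = dst.join(entry.file_name());
        if file_type.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else if file_type.is_file() {
            fs::copy(entry.path(), &target)?;
        }
        // Symlinks and other special entries are skipped on purpose.
    }
    Ok(())
}
```

Skipping symlinks is a deliberate simplification: a link that points back into the tree being copied is one plausible cause of a hang, but whether that is what happened in the reported environment is not confirmed.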

update node panic

When a user calls update-node, it panics at line 79 with the error "No such file".
This happens when the account passed to init-node was given with a 0x prefix.
So we should validate user arguments for cases like this.
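
One way to do such a check is to normalize the account argument before it is used as a directory name. The sketch below is an assumption about how this could look, not the fix that was actually merged: `normalize_account` is a hypothetical helper, and the 40-hex-digit length is inferred from the account directory names shown elsewhere on this page.

```rust
/// Hypothetical helper: strips an optional 0x/0X prefix and checks that
/// the rest is a 40-digit hex string (length assumed from the account
/// directory names seen in other issues).
fn normalize_account(input: &str) -> Result<String, String> {
    let trimmed = input.trim();
    let hex = trimmed
        .strip_prefix("0x")
        .or_else(|| trimmed.strip_prefix("0X"))
        .unwrap_or(trimmed);
    if hex.len() != 40 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("invalid account address: {}", input));
    }
    // Lowercase so the result matches on-disk directory names.
    Ok(hex.to_ascii_lowercase())
}
```

Returning a `Result` lets the caller print a clear message instead of hitting a later "No such file" panic.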

delete-node reports that the node cannot be found

cloud-config delete-node --chain-name chain-810889132908875776 --domain 10
thread 'main' panicked at 'Can't found node that want to delete! 10 not in [NodeNetworkAddress { host: "192.168.130.2", port: 30510, domain: "0", cluster: "764824418512932864", svc_port: 40000 }, NodeNetworkAddress { host: "192.168.130.2", port: 30520, domain: "1", cluster: "764824418512932864", svc_port: 40000 }, NodeNetworkAddress { host: "192.168.130.2", port: 30530, domain: "2", cluster: "764824418512932864", svc_port: 40000 }, NodeNetworkAddress { host: "192.168.130.2", port: 30540, domain: "3", cluster: "764824418512932864", svc_port: 40000 }, NodeNetworkAddress { host: "192.168.130.2", port: 30550, domain: "10", cluster: "764824418512932864", svc_port: 40000 }, NodeNetworkAddress { host: "192.168.130.2", port: 30560, domain: "11", cluster: "764824418512932864", svc_port: 40000 }]', src/delete_node.rs:57:19

Domain 10 does exist, yet it is reported as not found.

The node lookup here uses binary_search, and that function requires the slice to be sorted.
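
The domains in the panic message are stored as strings in insertion order ("0", "1", "2", "3", "10", "11"), which is not sorted under string comparison, so a binary search over them is unreliable. A minimal sketch of an order-independent lookup follows; the struct is a trimmed-down stand-in for the real `NodeNetworkAddress`, and `find_by_domain` is a hypothetical helper rather than the project's actual fix.

```rust
/// Trimmed-down stand-in for the real NodeNetworkAddress
/// (only the fields needed for this illustration).
#[derive(Debug, Clone, PartialEq)]
struct NodeNetworkAddress {
    host: String,
    port: u16,
    domain: String,
}

/// Hypothetical helper: a linear scan that works regardless of ordering.
/// Returns the position of the node whose domain equals `domain`.
fn find_by_domain(nodes: &[NodeNetworkAddress], domain: &str) -> Option<usize> {
    nodes.iter().position(|node| node.domain == domain)
}
```

A linear scan is O(n), which is negligible for a node list of this size. The alternative is to keep the list sorted by domain at all times and continue to use binary_search.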
