cockroachdb / cockroach-operator
k8s operator for CRDB
License: Apache License 2.0
Provide a mechanism to override crdb defaults when starting the database. e.g. override cache size
This issue is for me to track the first phase of work. This is brainstorming so that we have an initial list of stuff.
Here is a breakdown of tasks:
Administrative Items
@chrisseto can you sometime walk me through 3ef3a93 and whether we still need it?
Provide a mechanism for updating the configuration of an existing crdb cluster (e.g. update cache settings, etc)
As an operator, I can perform major version upgrades via rolling restarts,
so that my application can take advantage of new features and use a
more stable version of CockroachDB without losing availability.
Enable pod replacement with new patch (e.g. 19.2.x) for maintenance purposes
manage upgrade workflow for feature releases (e.g. 19.1.x to 19.2.x) with manual finalize step
I think this is a duplicate issue, but here is the comment that we need to remove.
We are leaving behind PVs and PVCs when we run unit tests. We need to wait for the default grace period and then delete the disks.
Enable configurable settings for the following:
Our helm chart currently has features not found in this operator. One of our goals for the operator is to be able to deprecate and replace the helm chart.
There's some upcoming maintenance work that we'd like to be able to avoid due to infrastructure changes in the helm project. The deadline to either do that work or deprecate the helm chart is Aug 13.
This project will need a release process. We probably want stable, beta, and dev versions.
As an operator, I can configure a CockroachDB license on a new
CockroachDB cluster in Kubernetes, so that I initialize a cluster that
meets my expected deployment needs.
Customers need to be able to apply an enterprise license that exists in cluster settings (enterprise.license).
example.yaml uses a generated secret and the operator is not creating the secret
When applying the CRD (kubectl apply -f config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml), it requires using the --validate=false flag.
At the moment, I receive this error:
error: error validating "config/crd/bases/crdb.cockroachlabs.com_crdbclusters.yaml": error validating data: ValidationError(CustomResourceDefinition.spec.validation.openAPIV3Schema.properties.spec.properties.dataStore.properties.emptyDir.properties.sizeLimit): unknown field "x-kubernetes-int-or-string" in io.k8s.apiextensions-apiserver.pkg.apis.apiextensions.v1beta1.JSONSchemaProps; if you choose to ignore these errors, turn validation off with --validate=false
Environment:
Running on a k8s cluster on GKE, following these instructions: https://www.cockroachlabs.com/docs/stable/orchestrate-cockroachdb-with-kubernetes.html
kubectl version:
Client Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.5", GitCommit:"20c265fef0741dd71a66480e35bd69f18351daea", GitTreeState:"clean", BuildDate:"2019-10-15T19:16:51Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.36", GitCommit:"34a615f32e9a0c9e97cdb9f749adb392758349a6", GitTreeState:"clean", BuildDate:"2020-04-06T16:33:17Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}
I ran into this error, which might be a flake or a framework error:
TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster: e2e_test.go:109:
Error Trace: e2e_test.go:109
Error: Received unexpected error:
an error on the server ("Internal Server Error: \"/apis/crd.projectcalico.org/v1/namespaces/crdb-test-8vk9nj/ne
failed to list objects in namespace crdb-test-8vk9nj
github.com/cockroachdb/cockroach-operator/pkg/testutil/env.listAllObjs
/ws/pkg/testutil/env/sandbox.go:193
github.com/cockroachdb/cockroach-operator/pkg/testutil/env.(*DiffingSandbox).Diff
/ws/pkg/testutil/env/sandbox.go:174
github.com/cockroachdb/cockroach-operator/e2e.TestCreatesSecureClusterWithGeneratedCert.func1
/ws/e2e/e2e_test.go:108
testing.tRunner
/usr/local/go/src/testing/testing.go:991
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1373
Test: TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster
It does not seem to be an error with the operator.
.PHONY statements
We are getting some test failures because of Calico annotations:
TestCreatesInsecureCluster/creates_1-node_insecure_cluster: assert.go:10: unexpected result (-want +got):
strings.Join({
... // 14 identical lines
"kind: Pod",
"metadata:",
+ " annotations:",
+ " cni.projectcalico.org/podIP: 10.56.1.5/32",
" labels:",
" app.kubernetes.io/component: database",
... // 233 identical lines
}, "\n")
--- FAIL: TestCreatesInsecureCluster (24.40s)
--- FAIL: TestCreatesInsecureCluster/creates_1-node_insecure_cluster (21.38s)
TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster: assert.go:10: unexpected result (-want +got):
strings.Join({
... // 15 identical lines
"kind: Pod",
"metadata:",
+ " annotations:",
+ " cni.projectcalico.org/podIP: 10.56.1.6/32",
" labels:",
" app.kubernetes.io/component: database",
... // 299 identical lines
}, "\n")
--- FAIL: TestCreatesSecureClusterWithGeneratedCert (45.40s)
--- FAIL: TestCreatesSecureClusterWithGeneratedCert/creates_1-node_secure_cluster (42.37s)
As an operator, I can get the latest OpenShift certified CockroachDB
Kubernetes Operator on the Red Hat Marketplace, so that I can more
easily purchase and deploy CockroachDB in my OpenShift environment.
When we have a PR, we need to run the following make targets:
make docker/build/operator-ubi
make docker/build/test-runner
make test
make lint
https://github.com/chrislovecnm/gcp-dev-server-terraform should do the trick
As an operator, I can get the latest Google Anthos certified
CockroachDB Kubernetes Operator on the Google Cloud Marketplace,
so that I can purchase and deploy CockroachDB on my Google Cloud
Anthos-based environment.
We are using kustomize for some of the acceptance testing scripts.
This script is a bit messy since it modifies one of the files, and git then wants to check it in. We need to copy the file to a temp folder and then use kustomize to deploy those files.
We do not have the current container image in https://github.com/cockroachdb/cockroach-operator/blob/master/deploy/operator.yaml#L181.
This does not allow for external use.
We have the following value in the API
cockroach-operator/api/v1alpha1/cluster_types.go
Lines 14 to 15 in bbfc23a
The terms Node and Nodes are also used by Kubernetes:
https://kubernetes.io/docs/concepts/architecture/nodes/
@johnrk I recommend renaming this to something like PodReplicaCount, Replicas, or something else that does not use the term Node.
I am not the best at figuring out names that make sense, but I am good at figuring out overloaded terms, and this one is overloaded.
I think we should probably rename this to cockroachdb-operator (or maybe cockroach-operator) for consistency with everything else - we don't really use crdb in formal contexts much. It'll get a lot harder to change after this ships, so if we're going to change it we should do it soon.
This is a partial log dump. The pod restarted and came up, but I am wondering if this is a timing issue.
This is from running the container cockroachdb/cockroach:v19.2.6
Factory (0x7ff46d8153e8)
cache_index_and_filter_blocks: 0
cache_index_and_filter_blocks_with_high_priority: 0
pin_l0_filter_and_index_blocks_in_cache: 0
pin_top_level_index_and_filter: 1
index_type: 0
data_block_index_type: 0
index_shortening: 1
data_block_hash_table_util_ratio: 0.750000
hash_index_allow_collision: 1
checksum: 1
no_block_cache: 0
block_cache: 0x7ff46d976210
block_cache_name: LRUCache
block_cache_options:
capacity : 650437632
num_shard_bits : 4
strict_capacity_limit : 0
memory_allocator : None
high_pri_pool_ratio: 0.000
block_cache_compressed: (nil)
persistent_cache: (nil)
block_size: 32768
block_size_deviation: 10
block_restart_interval: 16
index_block_restart_interval: 1
metadata_block_size: 4096
partition_filters: 0
use_delta_encoding: 1
filter_policy: rocksdb.BuiltinBloomFilter
whole_key_filtering: 0
verify_compression: 0
read_amp_bytes_per_bit: 0
format_version: 2
enable_index_compression: 1
block_align: 0
I200708 20:44:20.993521 22 storage/engine/rocksdb.go:120 Options.write_buffer_size: 67108864
I200708 20:44:20.993609 22 storage/engine/rocksdb.go:120 Options.max_write_buffer_number: 4
I200708 20:44:20.993704 22 storage/engine/rocksdb.go:120 Options.compression: Snappy
I200708 20:44:20.993773 22 storage/engine/rocksdb.go:120 Options.bottommost_compression: Disabled
I200708 20:44:20.993837 22 storage/engine/rocksdb.go:120 Options.prefix_extractor: cockroach_prefix_extractor
I200708 20:44:20.993909 22 storage/engine/rocksdb.go:120 Options.memtable_insert_with_hint_prefix_extractor: nullptr
I200708 20:44:20.993975 22 storage/engine/rocksdb.go:120 Options.num_levels: 7
I200708 20:44:20.994039 22 storage/engine/rocksdb.go:120 Options.min_write_buffer_number_to_merge: 1
I200708 20:44:20.994047 22 storage/engine/rocksdb.go:120 Options.max_write_buffer_number_to_maintain: 0
I200708 20:44:20.994053 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.window_bits: -14
I200708 20:44:20.994073 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.level: 32767
I200708 20:44:20.994170 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.strategy: 0
I200708 20:44:20.994193 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.max_dict_bytes: 0
I200708 20:44:20.994200 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.zstd_max_train_bytes: 0
I200708 20:44:20.994206 22 storage/engine/rocksdb.go:120 Options.bottommost_compression_opts.enabled: false
I200708 20:44:20.994212 22 storage/engine/rocksdb.go:120 Options.compression_opts.window_bits: -14
I200708 20:44:20.994219 22 storage/engine/rocksdb.go:120 Options.compression_opts.level: 32767
I200708 20:44:20.994228 22 storage/engine/rocksdb.go:120 Options.compression_opts.strategy: 0
I200708 20:44:20.994234 22 storage/engine/rocksdb.go:120 Options.compression_opts.max_dict_bytes: 0
I200708 20:44:20.994239 22 storage/engine/rocksdb.go:120 Options.compression_opts.zstd_max_train_bytes: 0
I200708 20:44:20.994245 22 storage/engine/rocksdb.go:120 Options.compression_opts.enabled: false
I200708 20:44:20.994251 22 storage/engine/rocksdb.go:120 Options.level0_file_num_compaction_trigger: 2
I200708 20:44:20.994259 22 storage/engine/rocksdb.go:120 Options.level0_slowdown_writes_trigger: 950
I200708 20:44:20.994265 22 storage/engine/rocksdb.go:120 Options.level0_stop_writes_trigger: 1000
I200708 20:44:20.994271 22 storage/engine/rocksdb.go:120 Options.target_file_size_base: 4194304
I200708 20:44:20.994276 22 storage/engine/rocksdb.go:120 Options.target_file_size_multiplier: 2
I200708 20:44:20.994282 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_base: 67108864
I200708 20:44:20.994291 22 storage/engine/rocksdb.go:120 Options.level_compaction_dynamic_level_bytes: 1
I200708 20:44:20.994299 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier: 10.000000
I200708 20:44:20.994304 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[0]: 1
I200708 20:44:20.994310 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[1]: 1
I200708 20:44:20.994315 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[2]: 1
I200708 20:44:20.994324 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[3]: 1
I200708 20:44:20.994329 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[4]: 1
I200708 20:44:20.994335 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[5]: 1
I200708 20:44:20.994340 22 storage/engine/rocksdb.go:120 Options.max_bytes_for_level_multiplier_addtl[6]: 1
I200708 20:44:20.994346 22 storage/engine/rocksdb.go:120 Options.max_sequential_skip_in_iterations: 8
I200708 20:44:20.994354 22 storage/engine/rocksdb.go:120 Options.max_compaction_bytes: 104857600
I200708 20:44:20.994360 22 storage/engine/rocksdb.go:120 Options.arena_block_size: 8388608
I200708 20:44:20.994365 22 storage/engine/rocksdb.go:120 Options.soft_pending_compaction_bytes_limit: 2199023255552
I200708 20:44:20.994371 22 storage/engine/rocksdb.go:120 Options.hard_pending_compaction_bytes_limit: 4400193994752
I200708 20:44:20.994380 22 storage/engine/rocksdb.go:120 Options.rate_limit_delay_max_milliseconds: 100
I200708 20:44:20.994385 22 storage/engine/rocksdb.go:120 Options.disable_auto_compactions: 0
I200708 20:44:20.994393 22 storage/engine/rocksdb.go:120 Options.compaction_style: kCompactionStyleLevel
I200708 20:44:20.994404 22 storage/engine/rocksdb.go:120 Options.compaction_pri: kMinOverlappingRatio
I200708 20:44:20.994409 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.size_ratio: 1
I200708 20:44:20.994418 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.min_merge_width: 2
I200708 20:44:20.994425 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.max_merge_width: 4294967295
I200708 20:44:20.994431 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.max_size_amplification_percent: 200
I200708 20:44:20.994437 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.compression_size_percent: -1
I200708 20:44:20.994443 22 storage/engine/rocksdb.go:120 Options.compaction_options_universal.stop_style: kCompactionStopStyleTotalSize
I200708 20:44:20.994452 22 storage/engine/rocksdb.go:120 Options.compaction_options_fifo.max_table_files_size: 1073741824
I200708 20:44:20.994458 22 storage/engine/rocksdb.go:120 Options.compaction_options_fifo.allow_compaction: 0
I200708 20:44:20.994467 22 storage/engine/rocksdb.go:120 Options.table_properties_collectors: TimeBoundTblPropCollectorFactory; DeleteRangeTblPropCollectorFactory;
I200708 20:44:20.994474 22 storage/engine/rocksdb.go:120 Options.inplace_update_support: 0
I200708 20:44:20.994480 22 storage/engine/rocksdb.go:120 Options.inplace_update_num_locks: 10000
I200708 20:44:20.994497 22 storage/engine/rocksdb.go:120 Options.memtable_prefix_bloom_size_ratio: 0.000000
I200708 20:44:20.994503 22 storage/engine/rocksdb.go:120 Options.memtable_whole_key_filtering: 0
I200708 20:44:20.994509 22 storage/engine/rocksdb.go:120 Options.memtable_huge_page_size: 0
I200708 20:44:20.994515 22 storage/engine/rocksdb.go:120 Options.bloom_locality: 0
I200708 20:44:20.994520 22 storage/engine/rocksdb.go:120 Options.max_successive_merges: 0
I200708 20:44:20.994529 22 storage/engine/rocksdb.go:120 Options.optimize_filters_for_hits: 1
I200708 20:44:20.994535 22 storage/engine/rocksdb.go:120 Options.paranoid_file_checks: 0
I200708 20:44:20.994540 22 storage/engine/rocksdb.go:120 Options.force_consistency_checks: 0
I200708 20:44:20.994880 22 storage/engine/rocksdb.go:120 Options.report_bg_io_stats: 0
I200708 20:44:20.994961 22 storage/engine/rocksdb.go:120 Options.ttl: 0
I200708 20:44:20.995010 22 storage/engine/rocksdb.go:120 Options.periodic_compaction_seconds: 0
I200708 20:44:20.995873 22 storage/engine/rocksdb.go:120 [db/version_set.cc:4286] Recovered from manifest file:/cockroach/cockroach-data/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
I200708 20:44:20.996005 22 storage/engine/rocksdb.go:120 [db/version_set.cc:4295] Column family [default] (ID 0), log number is 0
I200708 20:44:21.002097 22 storage/engine/rocksdb.go:120 DB pointer 0x7ff46d963000
I200708 20:44:21.002442 22 server/config.go:502 [n?] 1 storage engine initialized
I200708 20:44:21.002585 22 server/config.go:505 [n?] RocksDB cache size: 684 MiB
I200708 20:44:21.002669 22 server/config.go:505 [n?] store 0: RocksDB, max size 0 B, max open file limit 1043576
W200708 20:44:21.003037 22 gossip/gossip.go:1517 [n?] no incoming or outgoing connections
I200708 20:44:21.003225 22 server/server.go:1391 [n?] no stores bootstrapped and --join flag specified, awaiting init command or join with an already initialized node.
I200708 20:44:21.017215 75 gossip/client.go:124 [n?] started gossip client to crdb-0.crdb.default:26257
I200708 20:44:21.019346 22 server/node.go:645 [n?] connecting to gossip network to verify cluster ID...
I200708 20:44:21.019582 22 server/node.go:665 [n?] node connected via gossip and verified as part of cluster "cf11fb7a-2dc4-4976-9af7-87b88567520e"
I200708 20:44:21.036451 22 server/node.go:381 [n?] new node allocated ID 2
I200708 20:44:21.036825 22 gossip/gossip.go:394 [n2] NodeDescriptor set to node_id:2 address:<network_field:"tcp" address_field:"crdb-1.crdb.default.svc.cluster.local:26257" > attrs:<> locality:<> ServerVersion:<major_val:19 minor_val:2 patch:0 unstable:0 > build_tag:"v19.2.6" started_at:1594241061036645861 cluster_name:"" sql_address:<network_field:"tcp" address_field:"crdb-1.crdb.default.svc.cluster.local:26257" >
I200708 20:44:21.037014 22 storage/stores.go:240 [n2] read 0 node addresses from persistent storage
I200708 20:44:21.037204 22 storage/stores.go:259 [n2] wrote 1 node addresses to persistent storage
I200708 20:44:21.088507 22 server/node.go:620 [n2] bootstrapped store [n2,s2]
I200708 20:44:21.090884 22 server/node.go:512 [n2] node=2: started with [<no-attributes>=/cockroach/cockroach-data] engine(s) and attributes []
I200708 20:44:21.091299 22 server/server.go:1519 [n2] starting http server at [::]:8080 (use: crdb-1.crdb.default.svc.cluster.local:8080)
I200708 20:44:21.091457 22 server/server.go:1526 [n2] starting grpc/postgres server at [::]:26257
I200708 20:44:21.091671 22 server/server.go:1527 [n2] advertising CockroachDB node at crdb-1.crdb.default.svc.cluster.local:26257
W200708 20:44:21.124763 22 jobs/registry.go:340 [n2] unable to get node liveness: node not in the liveness table
I200708 20:44:21.196382 172 sql/event_log.go:130 [n2,intExec=add-constraints-ttl] Event: "set_zone_config", target: 25, info: {Target:TABLE system.public.replication_constraint_stats Config: Options:"gc.ttlseconds" = 600 User:root}
I200708 20:44:21.243416 180 sql/event_log.go:130 [n2,intExec=add-replication-status-ttl] Event: "set_zone_config", target: 27, info: {Target:TABLE system.public.replication_stats Config: Options:"gc.ttlseconds" = 600 User:root}
I200708 20:44:21.247871 185 sql/sqlbase/structured.go:1529 [n2,intExec=update-reports-meta-generated] publish: descID=28 (reports_meta) version=2 mtime=1970-01-01 00:00:00 +0000 UTC
I200708 20:44:22.618476 60 gossip/gossip.go:1531 [n2] node has connected to cluster via gossip
I200708 20:44:22.618881 60 storage/stores.go:259 [n2] wrote 1 node addresses to persistent storage
I200708 20:44:31.110661 143 server/status/runtime.go:498 [n2] runtime stats: 130 MiB RSS, 138 goroutines, 78 MiB/46 MiB/125 MiB GO alloc/idle/total, 2.4 MiB/3.4 MiB CGO alloc/total, 0.0 CGO/sec, 0.0/0.0 %(u/s)time, 0.0 %gc (9x), 69 KiB/59 KiB (r/w)net
E200708 20:44:31.258105 204 sql/flowinfra/flow_registry.go:234 [n2,intExec=count-leases] flow id:f166c019-b942-4adf-bec6-493b3f5f4875 : 1 inbound streams timed out after 10s; propagated error throughout flow
E200708 20:44:31.508664 22 util/log/crash_reporting.go:537 [n2] Reported as error a5c43e29b97e4a328e5ce96e5e88b509
F200708 20:44:31.508900 22 server/server.go:1592 [n2] error with attached stack trace:
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal.func1
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:472
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:569
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).ExecWithUser
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:320
github.com/cockroachdb/cockroach/pkg/sqlmigrations.glob..func1
/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:242
github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations
/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:552
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start
/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1586
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
- error with embedded safe details: update-reports-meta-generated
- update-reports-meta-generated:
- error with attached stack trace:
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal.func1
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:472
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).execInternal
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:569
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).queryInternal
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:252
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).Query
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:223
github.com/cockroachdb/cockroach/pkg/sql.(*InternalExecutor).QueryRow
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:278
github.com/cockroachdb/cockroach/pkg/sql.CountLeases
/go/src/github.com/cockroachdb/cockroach/pkg/sql/lease.go:535
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).checkTableTwoVersionInvariant
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:505
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).commitSQLTransaction
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:577
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).handleAutoCommit
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:1255
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState.func5
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:211
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmtInOpenState
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:446
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execStmt
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor_exec.go:98
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).execCmd
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1243
github.com/cockroachdb/cockroach/pkg/sql.(*connExecutor).run
/go/src/github.com/cockroachdb/cockroach/pkg/sql/conn_executor.go:1172
github.com/cockroachdb/cockroach/pkg/sql.(*internalExecutorImpl).initConnEx.func1
/go/src/github.com/cockroachdb/cockroach/pkg/sql/internal.go:202
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
- error with embedded safe details: count-leases
- count-leases:
- no inbound stream connection
github.com/cockroachdb/cockroach/pkg/sql/flowinfra.init.ializers
/go/src/github.com/cockroachdb/cockroach/pkg/sql/flowinfra/flow_registry.go:30
runtime.main
/usr/local/go/src/runtime/proc.go:188
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
failed to run migration "change reports fields from timestamp to timestamptz"
github.com/cockroachdb/cockroach/pkg/sqlmigrations.(*Manager).EnsureMigrations
/go/src/github.com/cockroachdb/cockroach/pkg/sqlmigrations/migrations.go:553
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start
/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1586
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1337
goroutine 22 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc00041c300, 0xc00041c300, 0x0, 0xc00063c6b8)
/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1024 +0xb1
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x753dbe0, 0xc000000004, 0x6ceef15, 0x10, 0x638, 0xc000ecd900, 0x12c2)
/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:871 +0x95b
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x4a895a0, 0xc00068c540, 0x4000000000000004, 0x2, 0x40efecb, 0x3, 0xc000a5ce40, 0x1, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:66 +0x2cc
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x4a895a0, 0xc00068c540, 0x1, 0xc000000004, 0x40efecb, 0x3, 0xc000a5ce40, 0x1, 0x1)
/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:69 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Fatalf(...)
/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:180
github.com/cockroachdb/cockroach/pkg/server.(*Server).Start(0xc0009a0000, 0x4a895a0, 0xc000d40390, 0x0, 0x0)
/go/src/github.com/cockroachdb/cockroach/pkg/server/server.go:1592 +0x2b9b
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3.2(0xc0006767e0, 0xc000010538, 0xc0002b4220, 0x4a895a0, 0xc000d40390, 0xc00008b700, 0x30cb81ed, 0xed6982724, 0x0, 0x7c716e, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:699 +0x10d
github.com/cockroachdb/cockroach/pkg/cli.runStart.func3(0xc000010538, 0x4a895a0, 0xc000d40390, 0x4af3940, 0xc000b889a0, 0xc0006767e0, 0xc0002b4220, 0x0, 0x30cb81ed, 0xed6982724, ...)
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:814 +0x12e
created by github.com/cockroachdb/cockroach/pkg/cli.runStart
/go/src/github.com/cockroachdb/cockroach/pkg/cli/start.go:655 +0x8f1
****************************************************************************
This node experienced a fatal error (printed above), and as a result the
process is terminating.
Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
problem in CockroachDB. With your help, the support team at Cockroach Labs
will try to determine the root cause, recommend next steps, and we can
improve CockroachDB based on your report.
Please submit a crash report by following the instructions here:
https://github.com/cockroachdb/cockroach/issues/new/choose
If you would rather not post publicly, please contact us directly at:
[email protected]
The Cockroach Labs team appreciates your feedback.
I200708 20:44:31.509939 1 util/stop/stopper.go:542 quiescing; tasks left:
1 [async] intent_resolver_ir_batcher
1 [async] intent_resolver_gc_batcher
1 [async] closedts-subscription
1 [async] closedts-rangefeed-subscriber
We need to figure out why
make generate
does not produce new code.
@johnrk, what platforms does this operator need to be tested against? We are building this to support on-prem as well as cloud-deployed k8s. What types of on-prem installations? I think this is a shortlist of testing that we can start with:
k delete -f ../config/examples/example.yaml
and we get
2020-07-08T18:52:28.256Z ERROR controller.CrdbCluster failed to retrieve CrdbCluster resource {"CrdbCluster": "default/crdb", "error": "CrdbCluster.crdb.cockroachlabs.com \"crdb\" not found"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/[email protected]/zapr.go:128
github.com/cockroachdb/cockroach-operator/pkg/controller.(*ClusterReconciler).Reconcile
/workspace/pkg/controller/cluster_controller.go:50
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:256
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:232
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:211
k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:152
k8s.io/apimachinery/pkg/util/wait.JitterUntil
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:153
k8s.io/apimachinery/pkg/util/wait.Until
/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:88
We are using kustomize in this project now and we need documentation around it
This is a larger issue tracking sub-tasks. We need to get this project building within TeamCity.
Note: we need to ensure that #36 will run on the build servers or use a container.
Testing against strings that represent objects is a bit too fragile for my preference. I am wondering if
var expectedPod *v1.Pod
cmp.Diff(expectedPod, actualPod)
Would work better. When we update our API, or the k8s API is updated, I do not want to modify a ton of text documents. Also, as I ran into with Calico, the objects are even modified by the Kubernetes server; I had to remove the Pod annotations added by Calico in order to get the e2e testing to pass.
I like it when unit tests won't compile when APIs are updated. I am uncertain about how to address this well.
I would love feedback on this design.
I am testing against GKE and I am getting a kubectl auth timeout. The container that is running expects to be able to access the gcloud binary that is set in my kubeconfig file. This command is the helper command that refreshes the auth token for the cluster.
For instance:
cmd-path: /Users/clove/Downloads/google-cloud-sdk/bin/gcloud
The container needs to run gcloud auth inside of the container, and not rely on my kubeconfig file.
gcloud container clusters get-credentials "$CLUSTER_NAME" --zone "$ZONE"
There are a bunch of ways to fix this, but the auth for kubectl is different between cloud providers and k8s cluster types on cloud providers. I would like this to work against multiple different cloud providers, so I need to work on this a bit.
Provide a mechanism for updating the pod config for an existing CRDB cluster (add more pods, change persistent volume size, change CPU count, etc)
We can set a storageClassName, but we are not validating its existence
cc @johnrk
List of the first possible tasks that I need to complete. We can figure out which ones you want me to knock out first.
From the requirements document:
As an operator, I can configure CockroachDB custom resources on a
new cluster in Kubernetes, so that I can initialize a cluster that will meet
my deployment needs.
What is missing currently? Do we have a diff on what still needs implementation in the statefulset?
As an operator, I can perform minor version upgrades via rolling restarts,
so that my application can start using a more stable version of
CockroachDB without losing availability.
Minor version upgrades do not require a finalization step.
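The upgrade workflow therefore needs to tell the two cases apart. A minimal sketch, assuming version strings of the form "major.minor.patch" (the function name is made up for illustration): an upgrade that crosses a feature release boundary (e.g. 19.1.x to 19.2.x) needs the manual finalize step, while a patch upgrade within a release does not.

```go
package main

import (
	"fmt"
	"strings"
)

// needsFinalize reports whether an upgrade crosses a feature release
// boundary (different major.minor), which requires manual finalization;
// patch upgrades within a release (same major.minor) do not.
// Versions are assumed to be "major.minor.patch" strings.
func needsFinalize(from, to string) bool {
	release := func(v string) string {
		parts := strings.SplitN(v, ".", 3)
		return parts[0] + "." + parts[1]
	}
	return release(from) != release(to)
}

func main() {
	fmt.Println(needsFinalize("19.2.1", "19.2.5")) // patch upgrade
	fmt.Println(needsFinalize("19.1.8", "19.2.1")) // feature release
}
```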
Unable to make changes to an existing k8s cluster
One of the expected features in the K8s Operator MVP is watching for changes, but it appears the Operator is not watching for any changes to a Cockroach cluster. When starting up a new cluster, it appears to properly apply the spec.
I think Vlad intended to watch for changes here: https://github.com/cockroachdb/crdb-operator/blob/master/pkg/controller/cluster_controller.go#L103
Test Details
Expected Result: K8s is watching the status of the pods, notices a discrepancy between the status of the pods and their spec, in reconcile function, updates the pods
Actual Result: no restarts happen, when looking at the node in gcp and when opening a crdb client, nothing changed
I need to look at the Pod manifest that we are deploying with the operator. I think it still has the auth token mounted. I should also double-check the container security as well. I might be wrong on the auth token, but we may need to remove it.
When we delete a database we do not seem to delete the PVCs correctly
# you need a bad storage class name or request over quota in the example
# this step has to fail
./hack/apply-apply-crdb-example.sh -c test
# then delete
./hack/apply-delete-crdb-example.sh -c test
You can then see the hanging PVC
$ k get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-crdb-0 Pending crdb-io1 3m43s
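For reference, newer Kubernetes releases (1.23+, via the StatefulSetAutoDeletePVC feature) can make the StatefulSet clean up its own PVCs, which would cover this case without extra operator code. A fragment sketch (the StatefulSet name is illustrative; only the retention-policy field is the point here):

```yaml
# Hypothetical StatefulSet fragment: on Kubernetes 1.23+ the
# persistentVolumeClaimRetentionPolicy field tells the control plane to
# delete PVCs created from volumeClaimTemplates when the StatefulSet is
# deleted, removing hanging PVCs like datadir-crdb-0 above.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: crdb
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete
    whenScaled: Retain
```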
For PR testing, using k3s is a great idea.
When we release a tag, I recommend doing full platform testing, including GKE.
Most of what is in the README.md is developer documentation. We probably should move that into a developer document, and start working on user documentation in the README.md.
Since upgrades are a big deal, I need to do some work on unit testing the upgrade code.
cockroach-operator/pkg/actor/upgrade.go
Line 38 in bbfc23a
@johnrk I do not have access to configure a project for this repo, but can we do that during a standup?
Downgraded a cluster from cockroachdb/cockroach:v20.1.3 to cockroachdb/cockroach:v20.1.2 and got
I200713 14:46:01.758684 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:02.466477 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:02.466658 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.468843 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.469082 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.758602 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:03.758686 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:04.469214 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:04.469488 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
W200713 14:46:04.593992 234 kv/kvserver/node_liveness.go:563 [n1,liveness-hb] slow heartbeat took 4.5s
W200713 14:46:04.594057 234 kv/kvserver/node_liveness.go:488 [n1,liveness-hb] failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.5s
(1) operation "node liveness heartbeat" timed out after 4.5s
Wraps: (2) context deadline exceeded
Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
I200713 14:46:05.471587 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:05.471726 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:06.471659 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:06.472125 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.474348 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.474422 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.577865 227 server/status/runtime.go:498 [n1] runtime stats: 182 MiB RSS, 227 goroutines, 76 MiB/43 MiB/115 MiB GO alloc/idle/total, 19 MiB/25 MiB CGO alloc/total, 14.9 CGO/sec, 1.4/0.7 %(u/s)time, 0.0 %gc (0x), 46 KiB/60 KiB (r/w)net
I200713 14:46:07.759240 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:07.759331 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.104190 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.474522 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:08.474745 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
W200713 14:46:09.094304 234 kv/kvserver/node_liveness.go:563 [n1,liveness-hb] slow heartbeat took 4.5s
W200713 14:46:09.094403 234 kv/kvserver/node_liveness.go:488 [n1,liveness-hb] failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 4.5s
(1) operation "node liveness heartbeat" timed out after 4.5s
Wraps: (2) context deadline exceeded
Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
I200713 14:46:09.477148 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:09.759314 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:09.759448 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:10.477071 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:10.477388 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.479423 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.479698 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.656282 15212 rpc/nodedialer/nodedialer.go:160 [n1] unable to connect to n3: failed to connect to n3 at crdb-tls-enabled-2.crdb-tls-enabled.default.svc.cluster.local:26257: initial connection heartbeat failed: rpc error: code = Unknown desc = client requested node ID 3 doesn't match server node ID 4
I200713 14:46:11.759730 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.760068 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.975049 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:11.975415 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:12.479738 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:12.480122 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:13.483348 6839 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:26257
I200713 14:46:13.483692 6703 gossip/server.go:227 [n1] received initial cluster-verification connection from crdb-tls-enabled-1.crdb-tls-enabled.default.svc.cluster.local:262
Pods are not starting
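A guard in the operator could catch this class of mistake before any pods are restarted, by comparing the current and requested image tags and refusing (or at least warning on) a downgrade. A rough sketch (parseVersion and isDowngrade are hypothetical helpers; a real implementation would likely use a semver library):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a tag like "v20.1.3" into numeric parts.
func parseVersion(tag string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(tag, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("unexpected version %q", tag)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, err
		}
		v[i] = n
	}
	return v, nil
}

// isDowngrade reports whether moving from one tag to another lowers the
// version; the operator could reject the change before rolling any pods.
func isDowngrade(from, to string) (bool, error) {
	f, err := parseVersion(from)
	if err != nil {
		return false, err
	}
	t, err := parseVersion(to)
	if err != nil {
		return false, err
	}
	for i := range f {
		if t[i] != f[i] {
			return t[i] < f[i], nil
		}
	}
	return false, nil
}

func main() {
	down, _ := isDowngrade("v20.1.3", "v20.1.2") // the case from the report above
	fmt.Println(down)
}
```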