k8up-io / k8up
Kubernetes and OpenShift Backup Operator
Home Page: https://k8up.io/
License: Apache License 2.0
The operator should have the ability to limit how many jobs run concurrently.
This limit should be configurable by job type (prune, backup, check, etc.).
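A minimal sketch of how this could be configured on the operator Deployment, assuming hypothetical environment variables (the names below are illustrative only, not an existing K8up interface):
# Hypothetical operator settings; variable names are illustrative only.
env:
  - name: BACKUP_GLOBAL_CONCURRENT_BACKUP_JOBS_LIMIT
    value: "3"
  - name: BACKUP_GLOBAL_CONCURRENT_PRUNE_JOBS_LIMIT
    value: "1"
  - name: BACKUP_GLOBAL_CONCURRENT_CHECK_JOBS_LIMIT
    value: "1"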
Add unit tests to all packages in the operator.
Some operator-sdk guidelines on adding tests: https://sdk.operatorframework.io/docs/building-operators/golang/testing/
With #114 merged the new implementation lives on the development branch.
Description of the issue:
The k8up application does not provide options to set the region.
I think this makes it compatible with us-east-1 only: github.com/minio/minio-go/pull/1188
Error message:
Connection to S3 endpoint not possible: The authorization header is malformed; the region 'us-east-1' is wrong; expecting 'xx-yyy'
k8up version: docker.io/vshn/k8up:v0.1.6
Proposed Solution:
Allow region as another env variable
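For illustration, a region setting could sit next to the existing global S3 variables on the operator; the BACKUP_GLOBALS3REGION name below is an assumption of this proposal, not an existing variable:
# Sketch only; BACKUP_GLOBALS3REGION does not exist in v0.1.6.
env:
  - name: BACKUP_GLOBALS3ENDPOINT
    value: https://s3.eu-central-1.amazonaws.com
  - name: BACKUP_GLOBALS3BUCKET
    value: k8up-backup
  - name: BACKUP_GLOBALS3REGION
    value: eu-central-1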
As K8up User
I want more information in the status fields of objects
So that I know in what state my backups and restores are and where to investigate failures.
Error handling is currently done via logs. But we cannot assume that every K8up user has access to view the K8up operator logs. Thus we should report errors directly via the status field of the affected resource.
Given an existing K8up CRD object
When K8up reconciles an object
Then the object's .status field should be updated
Given an existing K8up CRD object
When K8up encounters an error during reconcile
Then the object's .status field should be updated with a clear error message
Given an existing K8up CRD object with error in the status field
When K8up reconciles successfully after a failed attempt
Then the object's .status field should be cleared of any errors.
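As an illustration of these acceptance criteria, a populated status could look roughly like the following; the exact field names are assumptions, not a defined K8up schema:
status:
  started: true
  finished: false
  conditions:
    - type: Ready
      status: "False"
      reason: BackupFailed
      message: "Connection to S3 endpoint not possible"
Once a later reconcile succeeds, the error condition would be cleared again.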
As a user
I want to have composable configuration
So that I can re-use defaults and repository configs
Currently, defaults for backups can be defined by setting environment variables. While this serves its purpose, it's not very flexible and rather cumbersome to configure.
So I propose a new way to configure and schedule backups. We should split the configuration:
Given a `k8up.io/v2/Schedule` spec
When I refer to a `k8up.io/v2/JobPlan` spec
Then K8up can spawn backups using the configuration provided in the `JobPlan` spec.
Given a `k8up.io/v2/JobPlan` spec
When I specify what PVCs I want backed up
Then K8up will only backup those PVCs
Given a `k8up.io/v2/JobPlan` spec
When I specify what pods I want for prebackup commands
Then K8up will only backup those pods via prebackup commands
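A rough sketch of how the split could look; both kinds, the apiVersion and all field names below are part of this proposal and do not exist today:
apiVersion: k8up.io/v2
kind: JobPlan
metadata:
  name: default-plan
spec:
  backend:
    s3:
      endpoint: https://objects.cloudscale.ch
      bucket: k8up_backup
  pvcs:
    - my-data            # only the listed PVCs are backed up
---
apiVersion: k8up.io/v2
kind: Schedule
metadata:
  name: daily-backup
spec:
  jobPlanRef:
    name: default-plan   # re-use the defaults and repository config from the plan
  backup:
    schedule: '0 2 * * *'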
Add the archive job logic to the new implementation: https://github.com/vshn/k8up/tree/master/service/archive
The controller is already implemented for that. Now we need to implement an Executor that actually triggers the job.
With #114 merged the new implementation lives on the development branch.
Dear all
I found another bug. Here is what I did:
Deleting the pods with STATUS=ContainerCreating gives them a second attempt and usually works.
Unless at least one of the PVCs has been deleted in the meantime.
Then all jobs created will be STATUS=pending forever: default-scheduler persistentvolumeclaim "gitlab-prometheus" not found.
This could also happen if between the automated creation of a job and the execution of the pod any of the found PVCs get deleted.
Workaround: delete old entries in jobs.batch manually.
Are there plans to automate this? Or mitigate it in another way?
From the looks of it, k8up then stops creating new backups.
Cheers,
Stefan
Implement a mechanism to backup Kubernetes objects. Probably based on #12.
refs APPU-1626
As K8up user
I want to interact with K8up snapshots via the K8s API
So that I don't need any other tools to trigger a restore
Right now if a user wants to trigger a restore we need to use restic to find the right snapshot. After the right snapshot is found we then have to create a K8up restore with the given ID.
Given a K8up snapshot,
When using `kubectl -n $namespace get snapshots`,
Then present an accurate list of snapshots to the user for this namespace
Given a K8up snapshot,
When using `kubectl -n $namespace get snapshots`,
Then list all snapshots for this namespace,
And include paths, tags, date and ID for each snapshot
Given a K8up snapshot,
When listing it via the K8s API for namespace `default`,
Then only list snapshots from the namespace `default`
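To illustrate the idea, a snapshot surfaced through the K8s API could look roughly like this; kind, group and fields are assumptions for this story, not an existing resource:
apiVersion: backup.appuio.ch/v1alpha1   # assumed group/version for illustration
kind: Snapshot
metadata:
  name: 7ebf084d
  namespace: default
status:
  id: 7ebf084d04
  date: "2021-03-01T02:00:00Z"
  paths:
    - /data/test-claim0
  tags:
    - daily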
restic snapshots needs the repository configurations
hostname field for PVC backups
Hi all
I am currently guessing from the code that it should work without a functioning backup.
However, is a prebackup-pod needed? (Suggested by #10)
Best,
Stefan
Given the announcement of Project Syn
Is Project Syn going to take precedence over this repository?
Should I skip straight to Project Syn, or will support for k8up continue?
Thanks !
Add the prune job logic to the new implementation: https://github.com/vshn/k8up/tree/master/service/prune
The controller is already implemented for that. Now we need to implement an Executor that actually triggers the job.
That one should be quite trivial to implement.
With #114 merged the new implementation lives on the development branch.
The restic command seems to run with uid 1001 and is therefore not able to back up files owned by other users (e.g. root).
Is there a possibility to run the restic command as root (by setting the securityContext of the job pod, for example)?
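For reference, in plain Kubernetes terms this would mean a securityContext like the one below on the backup job's pod; whether and how K8up exposes such a knob would need to be confirmed:
# Generic Kubernetes snippet; K8up would have to set this on the job it creates.
securityContext:
  runAsUser: 0   # run restic as root so it can read files owned by other users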
As K8up user
I want to specify smart schedules
So that I can let K8up figure out optimal schedules for optimal resource usage
The scheduler should be able to accept "auto" or "smart" schedules.
These can be similar to cron's predefined schedules:
They should behave in such a way that the jobs will be run at least once during that defined time. The old cron syntax should still be supported for use cases where specific time is necessary.
The idea for the auto schedules would be to be triggered at any time in the given frame. For example daily should mean that the job should run at least once every day sometime between 00:00 and 23:59. When exactly should be determined by the operator.
This feature is intended mostly for jobs that need exclusive access to the backup repository like prune and check. They don't have any impact on the applications and can thus run whenever no backups are running.
For prune and check jobs the operator has to figure out the best time between backups to a repository when the jobs can run. One idea could be that the prune could be triggered right after all backups have finished. That would eliminate the need for a separate prune schedule completely.
Additional from #118:
Also, the cron library used by k8up has some predefined schedules like @daily or @weekly, so for the auto schedules we'd have to define them without an @ so as not to break those pre-defined schedules: https://pkg.go.dev/github.com/robfig/cron#hdr-Predefined_schedules
And finally, the cron library supports intervals which could come in very handy for this feature.
Given a schedulable K8up object
When a standardized cron syntax is specified
Then schedule the resource at the specified times
(this keeps existing functionality)
Given a schedulable K8up object
When a non-standardized predefined cron syntax is specified
Then schedule the resource at randomized times within the given timeframe, with a stable randomization seed
(e.g. @hourly-randomized could result in a <random number between 0 and 59> * * * * schedule, in contrast to the predefined 0 * * * *, which would defeat the purpose of smart scheduling)
Given a schedulable K8up object
When a non-standardized predefined cron syntax with @every is specified
Then schedule the resource at randomized times with the given interval
(e.g. @every 2h could result in a schedule that runs every 2h with a random start time)
Given a schedulable K8up object
When any sort of non-standardized predefined cron syntax is specified
Then store the resulting schedule in the status field of the object
(to make the actual schedule transparent for the user)
@hourly -> same as https://pkg.go.dev/github.com/robfig/cron#hdr-Predefined_schedules
@hourly-randomized -> this will trigger randomization of the schedule and then generate the actual, standard cron syntax (e.g. for @daily-randomized) and fill that in until we get a stable, random schedule like 23 5 * * *
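A sketch of how this might be used in a Schedule, assuming the proposed -randomized suffixes get implemented on top of the existing schedule fields:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: smart-schedule
spec:
  backup:
    schedule: '*/5 * * * *'        # standard cron syntax keeps working
  prune:
    schedule: '@weekly-randomized' # proposed: operator picks a stable random slot within the week
  check:
    schedule: '@daily-randomized'  # proposed: runs once per day at an operator-chosen time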
Dear vshn team
It would be very useful to have a per-namespace control to set the default behaviour for the PVCs.
I am guessing this should be possible by modifying this section:
https://github.com/vshn/k8up/blob/bf8c6386bb50d31a14aa73addffb6a30a5dd1c4e/service/backup/backupRunner.go#L116-L123
Cheers
Add the check job logic to the new implementation: https://github.com/vshn/k8up/tree/master/service/check
The controller is already implemented for that. Now we need to implement an Executor that actually triggers the job.
That one should be quite trivial to implement.
With #114 merged the new implementation lives on the development branch.
Automate the docs handling and bring the docs up to date, into a state where every Kubernetes admin is able to install and operate K8up and every Kubernetes user is able to get the best out of the K8up custom resources.
refs APPU-1545
When I try to create a PVC backup, the following error occurs:
No repository available, initialising...
created restic repository 7ebf084d04 at s3:https://minio-backup.local.example/volumes
Please note that knowledge of your password is required to access
the repository. Losing your password means that your data is
irrecoverably lost.
Removing locks...
created new cache in /.cache/restic
successfully removed locks
Listing all pods with annotation k8up.syn.tools/backupcommand in namespace test
Listing snapshots
snapshots command:
0 Snapshots
backing up...
Starting backup for folder test-claim0
could not parse restic output: invalid character 'S' looking for beginning of value
could not parse restic output: invalid character 'S' looking for beginning of value
could not parse restic output: invalid character 'S' looking for beginning of value
could not parse restic output: invalid character 'S' looking for beginning of value
could not parse restic output: invalid character 'S' looking for beginning of value
Did I do something wrong? I've retried the backup job multiple times.
Log the version number and build information when the operator starts.
I created the following Schedule:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
name: backup-netbox
spec:
archive:
schedule: '0 0 1 * *'
backup:
schedule: '*/5 * * * *'
keepJobs: 6
check:
schedule: '0 1 * * 1'
prune:
schedule: '0 1 * * 0'
retention:
keepLast: 5
keepDaily: 14
and then a Deployment with a Pod and the following Annotation:
appuio.ch/backupcommand: PGPASSWORD=$(cat /etc/secrets/stolon-stolonsupg_su_password) pg_dump -h stolon-proxy -U $(cat /etc/secrets/stolon-stolonsupg_su_username) -d netbox
All PVCs in the namespace have appuio.ch/backup: "false", and then the Operator crashes:
2019/05/22 15:14:14 [INFO] Registering prune schedule backup-netbox in namespace netbox
2019/05/22 15:14:14 [INFO] Registering check schedule backup-netbox in namespace netbox
2019/05/22 15:14:14 [INFO] Registering backup schedule backup-netbox in namespace netbox
E0522 15:14:14.010293 1 runtime.go:66] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:72
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:522
/usr/local/go/src/runtime/panic.go:513
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/go/src/github.com/vshn/k8up/service/schedule/scheduleRunner.go:225
/go/src/github.com/vshn/k8up/service/schedule/scheduler.go:59
/go/src/github.com/vshn/k8up/operator/handler.go:29
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:305
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:279
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:248
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:224
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:208
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:208
/usr/local/go/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xfb3047]
goroutine 192 [running]:
github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/runtime.HandleCrash(0x0, 0x0, 0x0)
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:58 +0x108
panic(0x1123fa0, 0x1e02a20)
/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/vshn/k8up/service/schedule.(*scheduleRunner).Start(0xc00074e1e0, 0x10c2900, 0xc0002a0ae0)
/go/src/github.com/vshn/k8up/service/schedule/scheduleRunner.go:225 +0x877
github.com/vshn/k8up/service/schedule.(*Schedule).Ensure(0xc00036ff80, 0x13cb0a0, 0xc0000b65a0, 0x4, 0x4)
/go/src/github.com/vshn/k8up/service/schedule/scheduler.go:59 +0x381
github.com/vshn/k8up/operator.(*handler).Add(0xc0003ae800, 0x13e0600, 0xc0002f6180, 0x13cb0a0, 0xc0000b65a0, 0x13f2b40, 0x1e37598)
/go/src/github.com/vshn/k8up/operator/handler.go:29 +0x47
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).handleAdd(0xc0003de210, 0x13e0600, 0xc0002f6180, 0xc0003c0620, 0x14, 0x13cb0a0, 0xc0000b65a0, 0x0, 0x0)
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:305 +0x4b1
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).processJob(0xc0003de210, 0x13e0600, 0xc0002f6150, 0xc0003c0620, 0x14, 0xc0002f6150, 0x13f2b40)
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:279 +0xf7
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).getAndProcessNextJob(0xc0003de210, 0xc0001ac500)
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:248 +0x21a
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).runWorker(0xc0003de210)
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:224 +0x2b
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).runWorker-fm()
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:208 +0x2a
github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil.func1(0xc00005d7b0)
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133 +0x54
github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc000a11fb0, 0x3b9aca00, 0x0, 0x1, 0xc00003e900)
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134 +0xbe
github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait.Until(0xc00005d7b0, 0x3b9aca00, 0xc00003e900)
/go/src/github.com/vshn/k8up/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88 +0x4d
github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).run.func1(0xc0003de210, 0xc00003e900)
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:208 +0x5c
created by github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller.(*generic).run
/go/src/github.com/vshn/k8up/vendor/github.com/spotahome/kooper/operator/controller/generic.go:207 +0x1e1
Not yet sure if I'm just missing anything in the Schedule CRD, but I guess the Operator should be more robust.
The Operator runs with the following ENVs:
- name: BACKUP_IMAGE
value: docker.io/vshn/wrestic:v0.0.10
- name: BACKUP_GLOBALS3ENDPOINT
value: https://objects.cloudscale.ch
- name: BACKUP_GLOBALS3BUCKET
value: k8up_backup
- name: BACKUP_GLOBALACCESSKEYID
value: ....
- name: BACKUP_GLOBALSECRETACCESSKEY
value: ....
- name: BACKUP_GLOBALREPOPASSWORD
value: ....
- name: BACKUP_GLOBALRESTORES3ENDPOINT
value: https://objects.cloudscale.ch
- name: BACKUP_GLOBALRESTORES3BUCKET
value: k8up_restore
- name: BACKUP_GLOBALRESTORES3ACCESKEYID
value: ...
- name: BACKUP_GLOBALRESTORES3SECRETACCESSKEY
value: ...
As K8up admin
I want to override the default resource request and limits of Pods generated by K8up
So that I can optimize resource usage or comply with cluster or namespace resource policies.
On clusters with a default pod cpu/memory limit, backup jobs are currently limited to these default values, because there is no possibility to override or remove them from k8up.
Resource Limits
Type Resource Min Max Default Request Default Limit Max Limit/Request Ratio
---- -------- --- --- --------------- ------------- -----------------------
Container cpu 1m - 100m 250m -
Container memory 1Mi - 128Mi 256Mi -
This limits the use cases on such clusters heavily.
Given a K8up Schedule object with per-schedule-specified resources
When K8up schedules Jobs
Then the containers in Pods are scheduled with configured resource request and limits
Given a K8up Schedule object outside of the cluster admin's responsibility
When K8up schedules Jobs
Then the containers in Pods are scheduled with configured global default resource request and limits
(in a multi-tenant cluster, customers can create schedules, while cluster-admins can provide global defaults in case customer doesn't define those)
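A sketch of what per-schedule resources could look like; the placement of the resources field is an assumption of this story, not a confirmed API:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: backup-with-resources
spec:
  backup:
    schedule: '*/5 * * * *'
    resources:             # proposed per-job-type override
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        memory: 512Mi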
global defaults < schedule defaults < job type specifics (right overrides left)
Good evening
I think I found a bug with the cleanup of the finished jobs.
What I did:
What I expected:
Workaround:
kubectl get jobs --all-namespaces | rg backupjob | rg 1/1
Then delete all completed jobs manually.
Cheers,
Stefan
The rewrite currently doesn't contain any k8up metrics.
The operator-sdk provides a default metrics endpoint though. So use that one to add the custom metrics, if possible.
Most metrics are defined in the observer or the scheduling in the old version:
https://github.com/vshn/k8up/blob/master/service/schedule/scheduleRunner.go#L30
https://github.com/vshn/k8up/blob/master/service/observe/subscription.go#L23
With #114 merged the new implementation lives on the development branch.
For #16 we need to disable automatic CRD management by the operator. At least make it configurable on operator startup.
OLM wants to manage the CRD itself and isn't happy if an operator does CRD management.
The BACKUP_FILEEXTENSIONANNOTATION and BACKUP_BACKUPCOMMANDANNOTATION env variables are used to set the annotations on the PreBackupPod Pods created by wrestic, which is great.
The problem is that these env variables are not passed into the backup pod itself, so wrestic is still searching for its own defaults.
With #91 the defaults in k8up have changed to be k8up.syn.tools/*, which is great, but they have not been updated in wrestic; this will be fixed with k8up-io/wrestic#21. But in my understanding, just changing them in k8up should be enough, as they should be passed to the backup pod by k8up, where they are then picked up by wrestic?
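For clarity, this is roughly how the two variables are set on the operator Deployment; the fileextension value below is assumed and should be checked against the actual k8up defaults:
- name: BACKUP_BACKUPCOMMANDANNOTATION
  value: k8up.syn.tools/backupcommand
- name: BACKUP_FILEEXTENSIONANNOTATION
  value: k8up.syn.tools/fileextension   # assumed default, verify in the k8up docs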
Currently we're on 0.19. It's probably easier to upgrade to the latest version of the operator-sdk before adding lots of code.
Migration guide:
https://sdk.operatorframework.io/docs/building-operators/golang/migration/
If a prune job is already running for a given repository and another one is triggered (either manually or some schedule), it should get skipped.
This ensures that multiple prune jobs don't clog up the operator and starve out the time for actual backups.
The same applies to check jobs, too.
When I try to create a PVC backup, the following error occurs:
Starting backup for folder nextcloud-claim0
done: 0.00%
done: 63.95%
error cannot open on file /data/nextcloud-claim0/config/config.php
backup finished! new files: 0 changed files: 0 bytes added: 356
Listing snapshots
snapshots command:
35 Snapshots
config.php stat:
-rw-r----- 1 www-data www-data 1.6K Aug 1 21:17 config.php
This permission error occurs in other containers too.
If a prune job is already running for a given repository and another one is triggered (either manually or some schedule), it should get skipped.
This ensures that multiple prune jobs don't clog up the operator and starve out the time for actual backups.
The same applies to check jobs, too.
I was instructed in this HN thread:
https://news.ycombinator.com/item?id=20772971
... to formally request a working SFTP remote. We (rsync.net) already support restic and would like to give our customers a recipe (and tech support) for pointing k8up at our cloud storage platform. Since we offer a stock, standard OpenSSH interface, the SFTP remote is what we would look at ...
I'm happy to give a free account to vshn/k8up for testing, etc., but that's probably superfluous since it's not any different than any other SFTP login you already have ...
With #154 we have set the groundwork for e2e testing with KIND. What's left are the actual e2e tests themselves.
In https://github.com/vshn/espejo we set up e2e tests with bash, but it's considered experimental. We have chosen to go with bash for the following reasons:
(kubectl apply -f .., etc.)
In https://github.com/vshn/wrestic/blob/master/TESTCASES.md there are a bunch of test cases, ideal candidates for automation.
There is also a question of concern: Should K8up include e2e test cases where the whole stack is tested? Or should part of the tests be automated in wrestic, while the e2e tests for K8up really test the Operator features and not transparently test wrestic as well?
The current CRD generated is of version apiextensions.k8s.io/v1beta1, but some generated properties aren't valid for that version in K8s 1.18+. The generator currently used could do apiextensions.k8s.io/v1 if we upgrade it, but v1 is not available in older Kubernetes server versions (e.g. OpenShift 3.11), which means we would not be able to install K8up there.
In order to stay compatible with both OpenShift 3.11 and Kubernetes 1.18+ (Rancher, K3s etc.) I propose:
config/crd/apiextensions.k8s.io/v1beta1/ (see #152)
config/crd/apiextensions.k8s.io/v1/
v1 to stay up-to-date with Operator SDK (which will eventually also migrate to controller-gen 0.4+) and K8s API
v1 by default
The CustomResourceDefinition "prebackuppods.backup.appuio.ch" is invalid:
* spec.validation.openAPIV3Schema.properties[spec].properties[pod].properties[spec].properties[initContainers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property
* spec.validation.openAPIV3Schema.properties[spec].properties[pod].properties[spec].properties[containers].items.properties[ports].items.properties[protocol].default: Required value: this property is in x-kubernetes-list-map-keys, so it must have a default or be a required property
make: *** [Makefile:163: setup_e2e_test] Error 1
Warning: apiextensions.k8s.io/v1beta1 CustomResourceDefinition is deprecated in v1.16+, unavailable in v1.22+; use apiextensions.k8s.io/v1 CustomResourceDefinition
Add the restore job logic to the new implementation: https://github.com/vshn/k8up/tree/master/service/restore
The controller is already implemented for that. Now we need to implement an Executor that actually triggers the job.
With #114 merged the new implementation lives on the development branch.
As a k8up developer
I don't want to handle the prebackup pod readiness in the operator
So that we can get rid of the complicated asynchronous prebackup pod handling
The backup execution is currently much more complex than the other ones. The main reason is that backups are handled asynchronously via a goroutine, because the actual backup job needs to wait to be applied until the prebackup pods are ready. It can take quite a while until those pods are running, and blocking during that time could result in reconcile congestion when multiple backups start at the same time.
Given a backup with prebackup pods
When the operator triggers it
Then it won't do it asynchronously
As K8up developer
I want automated whitebox tests of K8up internals
So that I can contribute tested features and changes to ensure quality and avoid accidental breaking changes
Currently it's unclear if there's a distinction between integration tests and unit tests.
A potential distinction could be:
But in all honesty, it doesn't really make a difference. However, some IDEs may have difficulties debugging tests when envtest is running if those are only run via the Makefile.
Given a K8up PR
When I push code into a feature branch
Then GitHub should run automated tests and indicate to code authors and reviewers whether tests pass or fail
I have two PVCs ("data1" and "data2") and I was trying to make a copy of them with k8up; each of them has only one text file named "data.txt" in it.
I restored the snapshot to S3, and I got only one file in it, backup-default-data2-2020-01-04T16_27_03Z.tar.gz:
data
└── data2
└── data.txt
Make K8up compatible with the Operator Lifecycle Manager (OLM) and list the operator on OperatorHub.io.
refs APPU-1705
Add a lot more monitoring information to the operator to close the gap.
Operator:
Wrestic:
refs APPU-1058
Implement high-availability for the operator so that more than one operator instance can run per cluster.
refs APPU-1623 and APPU-1625
There's an s missing in the Acces*s* part. Unsure what implications this has, but it might be unpleasant 😄
hi
I use k8up v0.1.10 and wrestic v0.2.0.
The backup job throws an error when the backup-job pushes to the prometheus pushgateway:
I0813 23:02:09.739192 1 handler.go:44] wrestic/statsHandler/promStats "level"=0 "msg"="sending prometheus stats" "url"="https://pushgateway.example.com"
E0813 23:02:09.741696 1 backup.go:145] wrestic "msg"="prometheus send failed" "error"="unexpected status code 200 while pushing to https://pushgateway.example.com/metrics/job/restic_backup/instance/backup: "
In the pushgateway I see that the data was sent successfully...
I have defined the pushgateway url in the k8up BACKUP_PROMURL
env variable.
Any ideas why this happens?
Restic has capabilities to add tags to snapshots: https://restic.readthedocs.io/en/stable/040_backup.html#tags-for-backup
It would be great to define these tags within a Schedule object, so we can use them to keep the bucket a bit organized.
At the same time it would be awesome to define tags in an Archive object so that the archive job only takes backups with a given tag into consideration.
In the case of Lagoon I would use the environment type (production, staging or development) for this, so we only archive snapshots of production environments.
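A sketch of how tags might be declared, assuming hypothetical tags fields on the Schedule's backup and archive specs:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
  name: tagged-backups
spec:
  backup:
    schedule: '*/5 * * * *'
    tags:                  # hypothetical: passed to restic as --tag production
      - production
  archive:
    schedule: '0 0 1 * *'
    tags:                  # hypothetical: archive only snapshots carrying this tag
      - production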
In a recent workshop we had with APPUiO customers, the developers seemed irritated about having to use restic to list historic snapshots of their backups and/or do a manual restore. Later, as we mentioned the Archive object, we explained that by design we don't use restic for long-term storage of backups.
Also, the customers - who had specifically asked for a demo of backing up and restoring (Postgres) database data - seemed irritated about the fact that k8up simply backs up data from the file system volume instead of specifically "doing database backups".
Apparently, from the perspective of an application developer, this seems like an inconsistent tool chain and/or user experience. That makes it harder to "sell" the promise that k8up makes backups and restore real simple.
In practical terms, this could be by a k8up CLI (with an idiomatic, self-explanatory interface), and/or ...
Currently there are a few technical debts in K8up:
This issue will track the rewrite of the following aspects:
Migrate K8up to the Project Syn GitHub Organization (https://github.com/projectsyn/), adapt the code, docs and description accordingly and therefore make it an integral part of Project Syn.
K8up needs a home with a good community infrastructure. Project Syn has that and is building up what's needed to be as open to contributors as possible.
Move the images from [docker.io,quay.io]/vshn/[k8up,wrestic] to [docker.io,quay.io]/projectsyn/[k8up,wrestic]
Phase out [docker.io,quay.io]/vshn/[k8up,wrestic] later (date tbd)
Dear vshn team
In some part of the docs it says it points at a Prometheus endpoint.
However, in the examples it is the same URL as the minio server.
So my question is: what does it do now?
Cheers
The docs currently mention how to set up a backup schedule with restic.
We use kustomize to manage our k8s deployments & overlays
Is there any intent to publish an open source kustomization CRD for apiVersion: backup.appuio.ch/v1alpha1? It would be especially useful when having multiple S3 buckets, so that I could define a variable per OCP project with the correct bucket name in it.
Currently I need to overwrite the whole object in every project:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
name: backup-pods
spec:
backend:
s3:
bucket: XXX # This is now for every OCP project different
This could be simplified (with the help of a CRD) to a single base config:
apiVersion: backup.appuio.ch/v1alpha1
kind: Schedule
metadata:
name: backup-pods
spec:
backend:
s3:
bucket: $(BUCKET_NAME)
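Until something like that exists, kustomize vars might cover this; a sketch, assuming a ConfigMap per project holds the bucket name and a varReference configuration allows substitution inside the Schedule (the exact setup would need to be verified):
# kustomization.yaml (sketch)
resources:
  - schedule.yaml
vars:
  - name: BUCKET_NAME
    objref:
      apiVersion: v1
      kind: ConfigMap
      name: backup-settings
    fieldref:
      fieldpath: data.bucket
configurations:
  - varreference.yaml   # must allow-list spec/backend/s3/bucket for kind Schedule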
Implement a generic Pre-Backup Pod mechanism. See PR #10.
refs APPU-1530
Hi
When trying to back up RWO PVCs, K8up doesn't check on which node the pod that has said PVC mounted is running.
Since RWO volumes can only be mounted multiple times on the same node, K8up should check on which node the pod is running and create the backup job on the same node.
As of now, the backup job will never start, since it's not scheduled on the same node.
Is it possible to implement that check?
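For illustration, the generated backup Job's pod template would need a constraint like the one below so it lands on the node where the RWO PVC is mounted; this is a generic Kubernetes snippet K8up would have to fill in itself, not an existing option:
# Sketch: pin the backup pod to the node that already mounts the RWO volume.
spec:
  template:
    spec:
      nodeName: worker-node-1   # node where the pod using the PVC is running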
As a workaround we'll try to add an application aware backup command to the pod; basically "tar"-ing all of the files to stdout. But it would be much nicer if we could just use the PVC annotation.
Many thanks,
gi8lino