There are a couple of issues with the current backup/restore workflow:

Consider backing up application artifacts and configuration instead of database and blobstore (bosh-backup-and-restore issue, closed)

cloudfoundry commented on August 27, 2024

Consider backing up application artifacts and configuration instead of database and blobstore

from bosh-backup-and-restore.

Comments (3)

cf-gitbot commented on August 27, 2024

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/163732535

The labels on this github issue will be updated when the story is started.


glestaris commented on August 27, 2024

Hey Sergey,

I know we’ve talked about this GH issue on a number of occasions. Nevertheless, this issue warrants a public response for anyone else reading.

Pain point: “It is very complex and includes a lot of manual steps”

Backup is as simple as bbr backup -d CF-DEPLOYMENT-NAME. Restore is done with bbr restore -d CF-DEPLOYMENT-NAME. Presumably, the complexity you mentioned lives in re-creating the infrastructure rather than the restore process itself. Would you be able to elaborate on that?
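The two commands above are written in abbreviated form. As a rough sketch, the documented bbr CLI shape places a `deployment` subcommand and the BOSH connection flags before the `backup` or `restore` verb. The helper name `bbr_deployment_argv`, the host `bosh.example.com`, the client name, and the file paths below are illustrative assumptions, and the secret is assumed to be supplied separately (for example through the `BOSH_CLIENT_SECRET` environment variable). The function only builds the argument list and does not run anything.

```python
import shlex


def bbr_deployment_argv(action, deployment, target, username, ca_cert,
                        artifact_path=None):
    """Build (but do not run) an argv list for a bbr deployment command.

    All values passed in are placeholders chosen by the caller; this is a
    sketch of the command shape, not a wrapper shipped with bbr.
    """
    allowed = ("pre-backup-check", "backup", "restore")
    if action not in allowed:
        raise ValueError(f"unsupported action: {action!r}")

    argv = [
        "bbr", "deployment",
        "--target", target,
        "--username", username,
        "--ca-cert", ca_cert,
        "--deployment", deployment,
        action,
    ]
    if action == "restore":
        # Restore reads from an artifact directory produced by an earlier backup.
        if not artifact_path:
            raise ValueError("restore needs the path of a backup artifact")
        argv += ["--artifact-path", artifact_path]
    return argv


if __name__ == "__main__":
    # Hypothetical values; print the command line instead of executing it.
    print(shlex.join(bbr_deployment_argv(
        "backup", "cf", "bosh.example.com", "bbr_client",
        "/path/to/bosh-ca.crt")))
```

Printing the command first makes it easy to review in a pipeline before handing the list to `subprocess.run`.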

Pain point: “Backup has API downtime”

This is a design choice. We lock the API in order to maintain referential integrity between CCDB entries and blobstore objects. If the API is not locked and changes are made during backup (e.g., an app is being updated), the backup may not be restorable. Your suggested solution would have the same problem if an app is updated during backup.
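To make the referential-integrity argument concrete, here is a toy model of a two-step backup (copy the blobstore, then copy the database). It is not how bbr is implemented; the row layout, the `droplet_key` field, and the function names are hypothetical. It shows only that an app update landing between the two copies leaves the database copy pointing at a blob the backup never captured.

```python
def dangling_references(ccdb_rows, blob_keys):
    """Return droplet keys the database copy references but the blob copy lacks."""
    return sorted({row["droplet_key"] for row in ccdb_rows} - set(blob_keys))


def simulate_backup(api_locked):
    """Run a toy two-step backup and report any dangling references."""
    # Live state before the backup starts (hypothetical data).
    blobstore = {"droplet-v1"}
    ccdb = [{"app": "web", "droplet_key": "droplet-v1"}]

    # Step 1: copy the blobstore.
    blob_copy = set(blobstore)

    # With the API unlocked, a developer can update the app mid-backup:
    # a new droplet is uploaded and the database row now points at it.
    if not api_locked:
        blobstore.add("droplet-v2")
        ccdb[0]["droplet_key"] = "droplet-v2"

    # Step 2: copy the database.
    db_copy = [dict(row) for row in ccdb]

    return dangling_references(db_copy, blob_copy)


if __name__ == "__main__":
    print("locked:  ", simulate_backup(api_locked=True))
    print("unlocked:", simulate_backup(api_locked=False))
```

In the unlocked run the restored database would reference `droplet-v2`, which is missing from the blob copy, so the app could not be started from that backup.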

Pain point: “We can't restore in a different environment.”

This is a broad statement. It depends on what “different” means and what the recovery use case is. Restoring in a different environment is possible and supported.

The complication with doing so is service bindings to on-platform services. The IP addresses of service instances may change depending on the IP ranges of your service broker and how the broker provisions instances. The solution you propose would have the same issue unless you add a smart way to reprovision service instances and rebind them to applications.
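One way to gauge the service-binding problem before restoring into another environment is to check which bindings point at addresses outside the new environment's service network. The sketch below assumes binding credentials carry a literal IP in a `host` field and that the target network is known as a CIDR; both the data layout and the function name `stale_bindings` are assumptions for illustration, not part of bbr or the Cloud Controller API.

```python
import ipaddress


def stale_bindings(bindings, service_cidr):
    """Return the apps whose bound service host lies outside service_cidr.

    Assumes each binding is a dict with "app" and "host" keys, where "host"
    is an IP literal (hostnames would raise ValueError here).
    """
    network = ipaddress.ip_network(service_cidr)
    stale = []
    for binding in bindings:
        host = ipaddress.ip_address(binding["host"])
        if host not in network:
            stale.append(binding["app"])
    return stale


if __name__ == "__main__":
    # Hypothetical bindings restored from a backup of the old environment.
    restored = [
        {"app": "web", "host": "10.0.8.5"},
        {"app": "worker", "host": "10.0.9.7"},
    ]
    # Hypothetical service network of the new environment.
    print(stale_bindings(restored, "10.0.9.0/24"))
```

Apps reported here would need their service instances reprovisioned and rebound after the restore.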

Pain point: “We can't restore in a live foundation.”

You can, but you probably don’t want to. I assume this refers to the all-or-nothing nature of restore. Yes, if you restore an old backup to a live foundation, some changes will be reverted. That’s inherent to the nature of backup and restore. The older the backup, the harder it is to restore.
However, I do hear the problem with our all-or-nothing approach. We are considering app-level backups as a mitigation.

Pain point: “Because of the issues 1 and 2 we can't test our backup/restore procedure in a production environment. And that is a real problem for us - we did backup/restore test in infra environment, but we never tested it in prod. When production issue occurred, that required restore, we figured out that we can't use our production backups. (it was our own fault, not bbr issue, but if we were able to test the whole procedure in advance we could avoid that)”

I think we need to reassess this statement given my previous answers.

Pain point: “Backup/restore is very slow and backup files are huge (almost terabyte in our case ) Because of that we have to setup a dedicated concourse workers for backup and allocate a huge S3 bucket. Also we can't backup often - at most once a day, so our backups may already become outdated by the time we need to restore.”

Backup size and duration unfortunately depend on the foundation size and the blobstore technology. We recommend external blobstores (S3, Azure, GCS) for faster and smaller backups.
We recently introduced incremental backups for blobstores that are backed by S3. You get this feature for free if you are using the latest CF-D. We are also working on selective backups, which will soon be available for internal blobstores (next CAPI release) and are currently available for external blobstores (Azure, S3, GCS). With selective backups, you can choose to exclude droplets and packages from the backup artifact, which reduces the duration and size of the backup.
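The effect of incremental and selective backups on artifact size can be sketched with a small planning function. This is only an illustration of the two ideas, not bbr's algorithm: the key prefixes (`droplets/`, `packages/`), the use of an etag per blob, and the name `plan_blob_copy` are all assumptions.

```python
def plan_blob_copy(current, previous, exclude_prefixes=()):
    """Return the blob keys a backup run would need to copy.

    current / previous map blob key -> etag (content fingerprint).
    Incremental: skip blobs whose etag matches the previous backup.
    Selective:   skip blobs whose key starts with an excluded prefix.
    """
    prefixes = tuple(exclude_prefixes)
    to_copy = []
    for key, etag in sorted(current.items()):
        if key.startswith(prefixes):
            continue  # excluded from the artifact entirely
        if previous.get(key) == etag:
            continue  # unchanged since the last backup
        to_copy.append(key)
    return to_copy


if __name__ == "__main__":
    # Hypothetical blobstore contents and previous backup manifest.
    current = {
        "droplets/a": "1",
        "packages/b": "2",
        "buildpacks/c": "3",
        "resources/d": "4",
    }
    previous = {"buildpacks/c": "3", "resources/d": "old"}

    print(plan_blob_copy(current, previous))
    print(plan_blob_copy(current, previous, ("droplets/", "packages/")))
```

Excluding droplets and packages trades a smaller, faster backup for the need to restage or re-push apps after a restore.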

Pain point: “Restore is "all or nothing" process - we can't do partial restores, for example, restore only apps in a particular org if they got corrupted for whatever reason, or restore only PCF on a different bosh director.”

I hear that and, as I mentioned before, we are considering app-level backups as a mitigation.

“Because of those issues on PCF dojos Pivotal anchors usually recommend to repave the foundation and re-push all your apps. This works very well if you automate your deployments and force your developers to push all their apps via a single CI/CD pipeline.”

Most platform teams cannot guarantee that app devs on their platform will be able to repush. Additionally, we recently had a user who had an issue with one of their on-platform services. This was not a complete disaster, and all they had to do was recover their service bindings table. Backups can be useful in these scenarios to avoid alarming developers and asking them to repush all the workloads. We should not assume repushing workloads is cheap.

Having said that, repave instead of restore makes a lot of sense when there is a small set of workloads and repushing is easy.

“In our case we have automation in place, but, because of our internal processes, we can't easily re-push all applications.”

Exactly.

I’d be very keen to see what you come up with. The solution you propose includes ideas we’ve been pondering for a while. Please stay in touch.


terminatingcode commented on August 27, 2024

Closing due to inactivity
