Comments (3)

cf-gitbot commented on August 27, 2024

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/163732535

The labels on this github issue will be updated when the story is started.

from bosh-backup-and-restore.

glestaris commented on August 27, 2024

Hey Sergey,

I know we’ve talked about this GH issue on a number of occasions. Nevertheless, this issue warrants a public response for anyone else reading.

Pain point: “It is very complex and includes a lot of manual steps”

Backup is as simple as bbr backup -d CF-DEPLOYMENT-NAME. Restore is done with bbr restore -d CF-DEPLOYMENT-NAME. Presumably, the complexity you mentioned lives in re-creating the infrastructure rather than the restore process itself. Would you be able to elaborate on that?

Pain point: “Backup has API downtime”

This is a design choice. We lock the API in order to maintain referential integrity between CCDB entries and blobstore objects. If the API is not locked and changes are made during backup (e.g., an app is being updated), the backup may not be restorable. Your suggested solution would have the same problem if an app is updated during backup.
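To make the referential-integrity concern concrete, here is a toy sketch in Python. It is not bbr or CAPI code: the `ccdb` and `blobstore` dictionaries and all function names are hypothetical stand-ins, and the order in which the two stores are copied is an assumption chosen for illustration.

```python
# Toy model (hypothetical, not bbr/CAPI code):
#   ccdb      maps app name -> droplet key
#   blobstore maps droplet key -> droplet bytes

def dangling_references(ccdb_snapshot, blobstore_snapshot):
    """Return the apps whose droplet key is missing from the blobstore snapshot."""
    return sorted(
        app for app, key in ccdb_snapshot.items() if key not in blobstore_snapshot
    )

def push_update(ccdb, blobstore, app, new_key, data):
    """Simulate an app update: upload the new droplet, repoint the row, drop the old blob."""
    old_key = ccdb.get(app)
    blobstore[new_key] = data
    ccdb[app] = new_key
    if old_key is not None:
        blobstore.pop(old_key, None)

def unlocked_backup():
    """The API keeps accepting writes, so an update lands between the two copies."""
    ccdb = {"my-app": "droplet-v1"}
    blobstore = {"droplet-v1": b"old"}
    blob_copy = dict(blobstore)                                   # blobstore copied first
    push_update(ccdb, blobstore, "my-app", "droplet-v2", b"new")  # concurrent update
    db_copy = dict(ccdb)                                          # database copied afterwards
    return db_copy, blob_copy

def locked_backup():
    """The API is locked, so both copies are taken before any update is applied."""
    ccdb = {"my-app": "droplet-v1"}
    blobstore = {"droplet-v1": b"old"}
    blob_copy = dict(blobstore)
    db_copy = dict(ccdb)
    push_update(ccdb, blobstore, "my-app", "droplet-v2", b"new")  # happens after unlock
    return db_copy, blob_copy

if __name__ == "__main__":
    print("unlocked:", dangling_references(*unlocked_backup()))
    print("locked:  ", dangling_references(*locked_backup()))
```

In the unlocked case the database snapshot points at a droplet that the blobstore snapshot never captured, which is the kind of backup that may not be restorable.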

Pain point: “We can't restore in a different environment.”

This is a broad statement. It depends on what “different” means and what the recovery use case is. Restoring in a different environment is possible and supported.

The complication with doing so is service bindings to on-platform services. The IP addresses of service instances may change depending on the IP ranges of your service broker and how the broker provisions instances. The solution you propose would have the same issue unless you add a smart way to reprovision service instances and rebind them to applications.

Pain point: “We can't restore in a live foundation.”

You can, but you probably don’t want to. I assume this refers to the all-or-nothing nature of restore. Yes, if you restore an old backup to a live foundation, some changes will be reverted. That’s inherent to the nature of backup and restore. The older the backup, the harder the restore.
However, I do hear the problem with our all-or-nothing approach. We are considering app-level backups as a mitigation.

Pain point: “Because of the issues 1 and 2 we can't test our backup/restore procedure in a production environment. And that is a real problem for us - we did backup/restore test in infra environment, but we never tested it in prod. When production issue occurred, that required restore, we figured out that we can't use our production backups. (it was our own fault, not bbr issue, but if we were able to test the whole procedure in advance we could avoid that)”

I think we need to reassess this statement given my previous answers.

Pain point: “Backup/restore is very slow and backup files are huge (almost terabyte in our case ) Because of that we have to setup a dedicated concourse workers for backup and allocate a huge S3 bucket. Also we can't backup often - at most once a day, so our backups may already become outdated by the time we need to restore.”

Backup size and duration, unfortunately, depend on the foundation size and the blobstore technology. We recommend external blobstores (S3, Azure, GCS) for faster and smaller backups.
We recently introduced incremental backups for blobstores that are backed by S3. This is a feature you will get for free if you are using the latest CF-D. We are also working on selective backups, which will soon be available for internal blobstores (next CAPI release) and are currently available for external blobstores (Azure, S3, GCS). With selective backups, you can choose to exclude droplets and packages from the backup artifact, which will reduce the duration and size of the backup.
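As a rough illustration of why incremental and selective backups shrink the artifact, here is a minimal Python sketch. It assumes each blob is identified by a key and a content hash, and that droplets and packages live under `droplets/` and `packages/` prefixes. Those assumptions, and the function name `plan_incremental_backup`, are hypothetical and do not describe how bbr actually implements either feature.

```python
# Hypothetical planning step, not bbr's implementation.

def plan_incremental_backup(current_blobs, previous_backup, exclude_prefixes=()):
    """Decide which blobs must be copied and which can be reused.

    current_blobs:    dict of blob key -> content hash in the live blobstore
    previous_backup:  dict of blob key -> content hash in the last backup
    exclude_prefixes: key prefixes to leave out entirely (selective backup)
    """
    prefixes = tuple(exclude_prefixes)
    to_copy, reused = [], []
    for key, content_hash in sorted(current_blobs.items()):
        if key.startswith(prefixes):
            continue                      # selective: skip excluded blob types
        if previous_backup.get(key) == content_hash:
            reused.append(key)            # incremental: unchanged since last backup
        else:
            to_copy.append(key)           # new or changed blob
    return to_copy, reused

if __name__ == "__main__":
    current = {
        "droplets/a": "1",
        "packages/b": "2",
        "buildpacks/c": "3",
        "droplets/d": "4",
    }
    previous = {"droplets/a": "1", "buildpacks/c": "2"}

    print(plan_incremental_backup(current, previous))
    print(plan_incremental_backup(current, previous, ("droplets/", "packages/")))
```

Only the blobs in the first list would need to be transferred, so the cost of a backup tracks the amount of change rather than the total blobstore size.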

Pain point: “Restore is "all or nothing" process - we can't do partial restores, for example, restore only apps in a particular org if they got corrupted for whatever reason, or restore only PCF on a different bosh director.”

I hear that and as I mentioned before we are considering app-level backups as a mitigation.

“Because of those issues on PCF dojos Pivotal anchors usually recommend to repave the foundation and re-push all your apps. This works very well if you automate your deployments and force your developers to push all their apps via a single CI/CD pipeline.”

Most platform teams cannot guarantee that app devs on their platform will be able to repush. Additionally, we recently had a user who had an issue with one of their on-platform services. This was not a complete disaster, and all they had to do was recover their service bindings table. Backups can be useful in these scenarios to avoid alarming developers and asking them to repush all the workloads. We should not assume repushing workloads is cheap.

Having said that, repave instead of restore makes a lot of sense when there is a small set of workloads and repushing is easy.

“In our case we have automation in place, but, because of our internal processes, we can't easily re-push all applications.”

Exactly.

I’d be very keen to see what you came up with. The solution you propose includes ideas we’ve been pondering for a while. Please stay in touch.

terminatingcode commented on August 27, 2024

Closing due to inactivity
