Comments (7)

vedantthapa commented on August 23, 2024

Think it might be due to a change in the referenced cluster's name:

cluster:
  name: cpho-postgres14-cluster

Looks like these changes were not applied as the resource logs still point to the old name:

...
Spec:
  Cluster:
    Name:  cpho-postgres-cluster
Events:
  Type     Reason          Age                     From                   Message
  ----     ------          ----                    ----                   -------
  Warning  FindingCluster  4m32s (x1582 over 13h)  cloudnative-pg-backup  Unknown cluster cpho-postgres-cluster, will retry in 30 seconds

Flux is not watching these manifests, so a manual apply is required. I've applied the necessary changes.

In future, we could separate the Backup resource (which triggers an on-demand backup) from the ScheduledBackup and have Flux sync only the latter. What do you think?
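
A rough sketch of what that split could look like, assuming the CloudNativePG `postgresql.cnpg.io/v1` API. The `metadata.name` values and the schedule are illustrative assumptions, not taken from the repo; the namespace and cluster name are the ones shown in this thread.

```yaml
# Synced by Flux: recurring backups.
apiVersion: postgresql.cnpg.io/v1
kind: ScheduledBackup
metadata:
  name: cpho-postgres14-scheduled-backup   # hypothetical name
  namespace: server
spec:
  # CloudNativePG uses a six-field cron expression (seconds first);
  # this assumed value means "every day at 00:00:00".
  schedule: "0 0 0 * * *"
  backupOwnerReference: self
  cluster:
    name: cpho-postgres14-cluster
---
# Kept out of the Flux path and applied by hand: one on-demand backup.
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: cpho-postgres14-on-demand   # hypothetical name
  namespace: server
spec:
  cluster:
    name: cpho-postgres14-cluster
```

Keeping the Backup manifest out of the synced directory would avoid Flux re-creating an on-demand backup on every reconcile, while the ScheduledBackup stays tied to the current cluster name.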

from cpho-phase2.

Stephen-ONeil commented on August 23, 2024

Syncing ScheduledBackup sounds good, should keep it more reliable.

So setting/keeping the necessary service account role is a click-ops job right now? If it's not documented anywhere, could you add it as a comment or note somewhere, for future maintainers? We'll have to make that part of the bootstrap/IaC down the line.

vedantthapa commented on August 23, 2024

The ability to load from backups works as well.

> k get cluster -n server
NAME                                     AGE     INSTANCES   READY   STATUS                     PRIMARY
cpho-postgres14-cluster                  28d     3           3       Cluster in healthy state   cpho-postgres14-cluster-2
cpho-postgres14-cluster-object-storage   9m37s   3           2       Creating a new replica     cpho-postgres14-cluster-object-storage-1

Again, the workloadIdentityUser role was required for the backup resource's k8s service account. In general, service accounts are created with the resource's metadata.name.

@Stephen-ONeil could you please confirm that the data is consistent? I'm almost sure about this, but it'd be good to have a second set of eyes. Once done, we can take the recovered instance down.
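
For reference, a minimal sketch of how a recovery cluster like `cpho-postgres14-cluster-object-storage` can be bootstrapped from the object store in CloudNativePG. The bucket path and storage size are placeholders I've made up for illustration; the actual manifest may differ.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cpho-postgres14-cluster-object-storage
  namespace: server
spec:
  instances: 3
  storage:
    size: 10Gi   # assumed value
  bootstrap:
    recovery:
      # Must match an entry under externalClusters below.
      source: cpho-postgres14-cluster
  externalClusters:
    - name: cpho-postgres14-cluster
      barmanObjectStore:
        destinationPath: gs://example-backup-bucket/   # hypothetical bucket
        googleCredentials:
          # Use GKE Workload Identity instead of a key file.
          gkeEnvironment: true
```

Since the service account is created under the new cluster's `metadata.name`, it needs its own Workload Identity binding before it can read from the bucket.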

vedantthapa commented on August 23, 2024

The backups were failing with a walArchiveFailing error after applying the updates. I guess this was related: because the K8s service account name changed, the workloadIdentityUser role had to be re-applied to the cloud service account.
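
To illustrate why the rename matters, here is a partial sketch of the Kubernetes side of the Workload Identity link, assuming the annotation is set through the Cluster's `serviceAccountTemplate`. The Google service account email and project are placeholders, and the rest of the Cluster spec is omitted.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  # The operator names the Kubernetes service account after this value.
  name: cpho-postgres14-cluster
  namespace: server
spec:
  serviceAccountTemplate:
    metadata:
      annotations:
        # Hypothetical Google service account; substitute the real one.
        iam.gke.io/gcp-service-account: backup-sa@example-project.iam.gserviceaccount.com
  # The matching roles/iam.workloadIdentityUser binding on the Google side
  # names the member as:
  #   serviceAccount:PROJECT_ID.svc.id.goog[server/cpho-postgres14-cluster]
  # so renaming the cluster changes the member and the old binding no longer applies.
  # (remaining spec fields omitted)
```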

Stephen-ONeil commented on August 23, 2024

Spot checks look good, go ahead and kill the backups. No one's using or relying on the app being live right now, so I think it's also worth recovering into the in-use DB, for confidence that there won't be any gotchas there either.

vedantthapa commented on August 23, 2024

I can confirm that all recovery mechanisms seem to be working as expected. I've pushed some docs regarding the resources to #162 and added some comments at relevant places.

Re: retention policy, is there something specific you have in mind aside from:

retentionPolicy: "30d"

Here's some more info on what it implies.
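
For context, a partial sketch of where that setting lives in a CloudNativePG Cluster spec. As I understand it, the value is a recovery window, so "30d" should keep the base backups and WAL files needed to restore to any point in the last 30 days; the object store settings are omitted here.

```yaml
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cpho-postgres14-cluster
  namespace: server
spec:
  backup:
    # Recovery window; accepted units are d (days), w (weeks), m (months).
    retentionPolicy: "30d"
    # barmanObjectStore settings omitted
```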

Stephen-ONeil commented on August 23, 2024

Awesome! That's sufficient for retention policy, thanks 👍
