
Comments (2)

tomharrisonjr commented on July 21, 2024

@dmitry I agree with this comment. I implemented a home-grown solution at a previous company, not very elegant but following these basic ideas:

  • database would be a copy from production
  • all user emails replaced with <user.id>@example.com
  • all password hashes replaced with nil or some garbage
  • except for several internal accounts (admin, qa, and other internal users)
  • transaction amounts, phone, address, etc., as you said, would be replaced with dummy values
  • except for those linked to above internal accounts
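
A minimal sketch of the replacement rules above, working on plain hashes rather than a real database. Everything named here (`anonymize_user`, `INTERNAL_EMAILS`, the field names and dummy values) is made up for illustration and is not part of capistrano-db-tasks; transaction rows could presumably be handled with the same pattern.

```ruby
# Hypothetical sketch: all names and field layouts here are assumptions.
INTERNAL_EMAILS = ["admin@internal.test", "qa@internal.test"].freeze

# Returns an anonymized copy of a user row (a plain Hash).
# Internal accounts are returned unchanged so they stay usable after a load.
def anonymize_user(user, internal_emails: INTERNAL_EMAILS)
  return user.dup if internal_emails.include?(user[:email])

  user.merge(
    email: "#{user[:id]}@example.com", # <user.id>@example.com
    password_hash: nil,                # nobody can log in with a real password
    phone: "000-000-0000",             # dummy value
    address: "1 Dummy Street"          # dummy value
  )
end

users = [
  { id: 1, email: "admin@internal.test", password_hash: "abc", phone: "555-0100", address: "HQ" },
  { id: 2, email: "jane@customer.test", password_hash: "def", phone: "555-0199", address: "42 Real Rd" }
]

anonymized = users.map { |u| anonymize_user(u) }
```

In a real setup the same rules would more likely run as SQL updates against the separate database instance, with the internal accounts excluded in the WHERE clause.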

The approach would be cleaner with this gem, but these were some of the objectives:

  • I needed to dump the database on the (very secure) production environment
  • did this with a nightly cron task
  • this also acted as a backup
  • from this dump, loaded a separate database instance (also in production)
  • separate database is where the anonymizing tasks above were performed
  • then dump the anonymized database to a file, and move to a separate user account location
  • user account was accessible to allowed users via SSH keys within VPC
  • a symlink like latest_db_dump.sql.gz was created
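
The last two steps (move the anonymized dump to the shared location, then point a symlink at it) could look roughly like this. It is only a sketch: `publish_dump` and the dated `db_dump_<date>.sql.gz` naming are assumptions, and only the `latest_db_dump.sql.gz` name comes from the description above. The dump and anonymize steps themselves are not shown.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: copies an already-anonymized dump into a shared
# directory under a dated name and repoints the "latest" symlink at it.
def publish_dump(dump_path, shared_dir, date: Time.now.strftime("%Y%m%d"))
  FileUtils.mkdir_p(shared_dir)
  dated_name = "db_dump_#{date}.sql.gz"
  FileUtils.cp(dump_path, File.join(shared_dir, dated_name))

  link = File.join(shared_dir, "latest_db_dump.sql.gz")
  # A relative target keeps the link valid if the directory is mounted elsewhere.
  FileUtils.ln_sf(dated_name, link)
  link
end

# Demo in a throwaway directory, with a placeholder file standing in for the dump.
Dir.mktmpdir do |dir|
  dump = File.join(dir, "anonymized.sql.gz")
  File.write(dump, "placeholder")
  link = publish_dump(dump, File.join(dir, "shared"), date: "20240101")
  puts File.readlink(link)
end
```

Developers then only ever need to know the one stable filename, whichever night's dump it currently points to.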

This left me with:

  • a nightly production backup dump in a secure location not accessible to anyone other than admins
  • a nightly refresh of the anonymized db that was SSH-accessible to developers, qa, etc at a known endpoint/filename
  • finally, a task (like the ones in this gem) that could load the dump to replace non-production instances

Complicated, to be sure, but we were dealing with people's money, and all sorts of demanding security requirements.

Another consideration (just for completeness... :-)

When the production database gets big, it would be great (albeit hard) to take a sample of the database. Of course the problem is that you can't just take any old records -- you need a set of related records. I think if one was diligent about declaring Rails model relationships, it could be possible to follow a key entity or two (maybe User) and pull some sample of relationships (e.g. User has many transactions, and has a profile, and roles, and so on ... all depending on your schema).
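
To make the sampling idea concrete, here is a toy sketch using in-memory hashes instead of Rails models. The table names, the `user_id` foreign key, and `sample_related` are all assumptions for illustration; a real version would walk the declared associations and would also have to deal with deeper or many-to-many relationships.

```ruby
# Hypothetical in-memory "tables"; in a Rails app these would be models
# with declared associations (User has many transactions, has a profile, ...).
TABLES = {
  users:        [{ id: 1 }, { id: 2 }, { id: 3 }],
  profiles:     [{ id: 10, user_id: 1 }, { id: 11, user_id: 2 }, { id: 12, user_id: 3 }],
  transactions: [{ id: 100, user_id: 1 }, { id: 101, user_id: 1 }, { id: 102, user_id: 3 }]
}.freeze

# Keeps the chosen users, then only those child rows whose user_id
# points at one of them, so the sample stays referentially consistent.
def sample_related(tables, user_ids, children: %i[profiles transactions])
  sample = { users: tables[:users].select { |u| user_ids.include?(u[:id]) } }
  children.each do |name|
    sample[name] = tables[name].select { |row| user_ids.include?(row[:user_id]) }
  end
  sample
end

sample = sample_related(TABLES, [1])
```

Starting from the key entity and filtering children by its ids is what keeps the sample from containing orphaned rows.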

from capistrano-db-tasks.

dmitry commented on July 21, 2024

@tomharrisonjr thank you for the detailed explanation and your insights. I will try to split all those points into tasks when I get enough time :)

