Giter Club home page Giter Club logo

ukgovdatascience.dotfiles's Introduction

GitHub tag GitHub commits

Tools to prevent the accidental commit of sensitive data to github

Two things to note:

1) If you are working with data that you don't want to accidentally appear on github the best way to ensure against accidental commits is to ensure that the data are not stored within repository in the first place. However using the tools described here provide a second layer of protection to ensure that accidents don't happen. That said: employ common sense!

2) These are templates, and are best edited to suit your needs.

Contents

This repo contains:

  • A template .gitignore file which will prevent git from seeing the most commonly used data formats.
  • A git_template which, if used, will install a pre-commit and pre-push hook into all new repositories to prevent data in common formats from being commited or pushed by accident.

If used correctly, the templates here will:

  • Prevent git from seeing common data formats and any files containing 'OFFICIAL'.
  • If the user attempts to commit these files (for example by using git add -f, at the point of commit and push, the user will not be able to proceed without explicitly setting a VARIABLE=1 git commit, where VARIABLE refers to the particular check being made. Documentation for each of these is given when the error occurs, and does not need to be provided here.

This repo is based on https://github.com/MastodonC/dotfiles.

Usage

First clone the repository and navigate to it:

git clone https://github.com/ukgovdatascience/dotfiles
cd dotfiles

Template .gitignore

A template .gitignore file is included in the repository. Using this file will condition git to ignore the following file formats:

  • Any file containing the word OFFICIAL
  • Common text file formats: csv, txt
  • Excel file formats: xls, xlsx, xlsm, xlst
  • SPSS file formats: sav
  • SAS file formats: sas7bdat
  • Open office sheets files: ods
  • Google sheets files: gsheet
  • Some database formats: db, sqlite
  • Feather format: feather
  • Pickle objects: pkl, pickle
  • R data files: RData, Rds

Additional file formats copied from https://gist.github.com/octocat/9257657:

  • Compressed file formats
  • Common system files (e.g. thumbs.db)

There are a number of ways one might use the .gitignore file:

Copy into the root of a repository

Copy the file into the root of a git repository and edit as approproate.

Set as .gitignore_global

To use for ALL git repos, the template can be set globally. Note that this is likely to be very annoying in its current form, so should be edited as appropriate, otherwise trivial files are likely to be invisible to git. Even ignored files can be added with git add -f, however you need to realise that they are being ignored!

cp .gitignore ~/.gitignore_global
git config --global core.excludesfile ~/.gitignore_global

Note that if you set this as the .gitignore_global, you can exclude individual repositories by running:

git config --local --unset core.excludesfile

The above can be used with --global to stop using it as the global .gitignore.

More information about ignoring files can be found here: https://help.github.com/articles/ignoring-files/

Note: In default state using this as a default .gitignore is likely to be very annoying! You are advised to use the default then customise it in each directory as required.

pre-commit and pre-push hooks

These hooks will check, at the point of each commit and push, that the the code does not contain any of the following:

  • AWS keys (determined by regex)
  • Private SSH keys (determined by regex)
  • .pem files
  • Various data formats:
    • xls, xlsx, xlsm, xlst,
    • csv, txt,
    • sav,
    • db, sqlite,
    • feather, pkl, pickle,
    • ods, gsheet,
    • Rdata, Rds
    • ipython/jupyer notebooks (ipynb)

Install in all new repositories

** Note that this will overwrite any prior pre-commit and pre-push hooks you may have already installed**

cp -r git_template ~/.git_template

git config --global init.templatedir '~/.git_template'

Install into any pre-existing repositories

Having done the previous step, run the following to sync git hooks with the defaults in ~/.git_template:

$(git config --path --get init.templatedir)/update.sh

Install all pre-existing repositories in a directory

The following code will cycle through all directories in a folder (assuming they are all managed by git), and install the default hooks into each of them:

current=$(pwd); for i in $(ls .); do echo $i; cd $i ; $(git config --path --get init.templatedir)/update.sh; cd $current done

Fix grep for Mac OS X users

Mac OS X seems to have an old grep version by default. to fix this, the homebrew version is a solution. If you get the following message when you make a commit:

usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--color=when]
        [--context[=num]] [--directories=action] [--label] [--line-buffered]
        [--null] [pattern] [file ...]
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--color=when]
        [--context[=num]] [--directories=action] [--label] [--line-buffered]
        [--null] [pattern] [file ...]
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--color=when]
        [--context[=num]] [--directories=action] [--label] [--line-buffered]
        [--null] [pattern] [file ...]
usage: grep [-abcDEFGHhIiJLlmnOoqRSsUVvwxZ] [-A num] [-B num] [-C[num]]
        [-e pattern] [-f file] [--binary-files=value] [--color=when]
        [--context[=num]] [--directories=action] [--label] [--line-buffered]
        [--null] [pattern] [file ...]

you will need to update grep using:

brew install grep --with-default-names

this will replace grep. Make sure your PATH has /usr/local/bin preceding /usr/bin or /bin

ukgovdatascience.dotfiles's People

Contributors

elisehuard avatar ivyleavedtoadflax avatar mammykins avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.