soba: backup hosted git repositories

about

soba is a tool for backing up private and public git repositories hosted on the most popular hosting providers. It generates a git bundle that stores a backup of each repository as a single file.

As unchanged git repositories create identical bundle files, new bundles will only be stored if changes to the repository have been made. Changes are detected either by re-cloning each repository every time soba runs, or by comparing refs without cloning.

soba includes its own scheduler that triggers it to run every specified number of hours, or it can be run with other schedulers such as cron.

quick start

soba can run as a binary or using docker with the prebuilt image distributed with each release. For example, the following will create git bundles of all repositories in your GitHub user's account in the soba-backups directory:

$ mkdir soba-backups
$ docker run --rm -v ./soba-backups:/backups -e GITHUB_TOKEN=<token-here> -e GIT_BACKUP_DIR=/backups jonhadfield/soba:latest

latest updates

1.2.14 release 2024-03-15

  • Fix bug introduced in 1.2.13 where daemon exits on run error

1.2.13 release 2024-03-15

  • Improved error handling to catch and report provider errors
  • Return non-zero exit code for runs with failures
  • Remove pause after run if not daemonized

1.2.12 release 2024-03-13

  • Enable limiting GitHub repo backups to user owned

1.2.11 release 2024-03-10

  • Add support for Azure DevOps repositories

1.2.10 release 2024-03-04

  • Bugfix for notification error handling

See full changelog here.

supported OSes

Tested on Windows 10, MacOS, and Linux (amd64). Not tested, but should also work on builds for: Linux (386, arm386 and arm64), FreeBSD, NetBSD, and OpenBSD.

supported providers

  • Azure DevOps
  • BitBucket
  • Gitea
  • GitHub
  • GitLab

configuration

soba can be run from the command line or as a container. In both cases the only configuration required is an environment variable with the directory in which to create backups, plus additional variables to define credentials for each of the providers.

On Windows 10:

  • search for 'environment variables' and choose 'Edit environment variables for your account'
  • choose 'New...' under the top pane and enter the name/key and value for each of the settings

On Linux and MacOS you would set these using:

export GIT_BACKUP_DIR="/repo-backups/"

To set provider credentials see below.

run using the binary

Download the latest release here and then install:

install <soba binary> /usr/local/bin/soba

After setting GIT_BACKUP_DIR, set your provider token(s) as detailed here.

and then run:

soba

run with Docker

Using Docker enables you to run soba without anything else installed.

Docker requires you pass environment variables to the container using the '-e' option and that you mount your preferred backup directory. For example:

docker run --rm -t \
             -v <your backup dir>:/backup \
             -e GIT_BACKUP_DIR='/backup' \
             -e GITHUB_TOKEN='MYGITHUBTOKEN' \
             -e GITLAB_TOKEN='MYGITLABTOKEN' \
             ghcr.io/jonhadfield/soba

To hide credentials, you can instead use exported environment variables and specify using this syntax:

docker run --rm -t \
             -v <your backup dir>:/backup \
             -e GIT_BACKUP_DIR='/backup' \
             -e GITHUB_TOKEN=$GITHUB_TOKEN \
             -e GITLAB_TOKEN=$GITLAB_TOKEN \
             ghcr.io/jonhadfield/soba

run on Kubernetes

For instructions on how to run soba on Kubernetes, see here.

scheduling backups

Backups can be scheduled to run by setting an additional environment variable: GIT_BACKUP_INTERVAL. The value can be specified in hours (default) or minutes. For example, this will run the backup daily:

export GIT_BACKUP_INTERVAL=24h

and this will run the backup every 45 minutes:

export GIT_BACKUP_INTERVAL=45m

note:

  • if you don't specify the trailing 'm' or 'h' then hours are assumed.
  • the interval is added to the start of the last backup and not the time it finished, therefore ensure the interval is greater than the duration of a backup.
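
To make the interval rules above concrete, here is a minimal shell sketch of how a value such as 45m, 24h, or a bare 12 maps to minutes. The helper name interval_to_minutes is hypothetical, and the sketch only illustrates the documented behaviour; it is not soba's own parsing code.

```shell
#!/bin/sh
# Illustrative only: convert a GIT_BACKUP_INTERVAL-style value to minutes.
# interval_to_minutes is a hypothetical helper, not part of soba.
interval_to_minutes() {
  case $1 in
    *m) echo "${1%m}" ;;               # trailing 'm': value is already in minutes
    *h) echo "$(( ${1%h} * 60 ))" ;;   # trailing 'h': convert hours to minutes
    *)  echo "$(( $1 * 60 ))" ;;       # no suffix: hours are assumed
  esac
}

interval_to_minutes 45m   # prints 45
interval_to_minutes 24h   # prints 1440
interval_to_minutes 12    # prints 720
```

Since the interval counts from the start of the previous backup, a run that takes longer than the interval would leave no gap before the next one is due, which is why the interval should exceed the backup duration.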

rotating backups

A new bundle is created every time a change is detected in the repository. To keep only the x most recent, use the following provider specific environment variables:

  • GITEA_BACKUPS=x
  • GITHUB_BACKUPS=x
  • GITLAB_BACKUPS=x
  • BITBUCKET_BACKUPS=x
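
As a rough illustration of what keeping only the x most recent bundles means, the following sketch lists the bundles that fall outside that limit. It assumes a directory holding the bundles of a single repository, named with a sortable timestamp as in soba.20180708153107.bundle. The function name bundles_beyond_limit is hypothetical, and this is not how soba itself implements rotation.

```shell
#!/bin/sh
# Illustrative only: print the bundle files in a directory that fall outside
# the N most recent. Assumes one repository's bundles per directory and
# timestamped names, so a reverse lexical sort puts the newest first.
bundles_beyond_limit() {
  dir=$1
  keep=$2
  ls -1 "$dir" | grep '\.bundle$' | sort -r | tail -n +"$((keep + 1))"
}
```

For example, bundles_beyond_limit /repo-backups/myrepo 7 would print every bundle older than the seven newest, where /repo-backups/myrepo is a placeholder path.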

notifications

webhooks

(since release 1.2.8) To send a webhook on completion of a run: set the environment variable SOBA_WEBHOOK_URL with the url of the endpoint. For example:

$ export SOBA_WEBHOOK_URL=https://api.example.com/webhook

webhook payload

The payload is a JSON document containing details of the backup run. The default format (long) lists each repository and the success or failure of its backup. You can see an example here.

For a shorter format, with just stats on the success and failure counts, use the environment variable SOBA_WEBHOOK_FORMAT. For example:

$ export SOBA_WEBHOOK_FORMAT=short

You can see a sample here.

NOTE: The long format webhook will contain a list of your repos and, if there's an error, may contain other details including URLs. Please keep this in mind when sending to endpoints that may be insecure.

ntfy

(since release 1.2.10) ntfy is a popular service that enables push notifications for desktop and mobile apps. To send a message on completion of a run: set the environment variable SOBA_NTFY_URL with the url of the endpoint. For example:

$ export SOBA_NTFY_URL=https://ntfy.sh/example-topic

logging

persistence

Messages are written to stdout and can be persisted by directing to a file, e.g. soba > soba.log

logging to /var/log/soba

  1. create a user called soba: sudo adduser soba
  2. create a log directory: sudo mkdir /var/log/soba
  3. set user permissions: sudo chown soba /var/log/soba && sudo chmod 700 /var/log/soba
  4. switch to soba user: sudo su - soba
  5. run soba and direct output: soba > /var/log/soba/soba.log

rotation

Logrotate is a utility that comes with most Linux distributions and removes and/or compresses messages older than a certain number of hours or days. This example assumes you persist the log file to /var/log/soba/soba.log. Create a file in /etc/logrotate.d/soba with the following content:

/var/log/soba/soba.log {
  rotate 7      # remove backups older than seven days
  daily         # process log file each day
  compress      # compress the backup
  copytruncate  # don't delete the file after backup, but instead truncate
}

Each day, this copies the latest logs to a new file that is then compressed. The existing log file is then truncated. Any backups older than seven days are then removed.

keep running after reboot

In case the computer is rebooted or the process ends for another reason, you can ensure it automatically restarts with a simple script and cron job.

script

For example:

#!/bin/bash -e
export GIT_BACKUP_DIR=/backup-dir
export GITHUB_TOKEN=xxxxxxx   # avoid hard-coding if possible
export GITHUB_BACKUPS=7
export GIT_BACKUP_INTERVAL=12
export GITHUB_COMPARE=refs
/usr/local/bin/soba

cron job

ensure the user running soba has an entry in /etc/cron.allow.

run crontab -e

add the following (assuming you have a user called soba with a script to run it called backup in their home directory):

* * * * * /usr/bin/flock -n /tmp/soba.lockfile /home/soba/backup >> /var/log/soba/soba.log 2>&1

note: A useful tool for testing cron jobs is crontab guru.

setting provider credentials

On Linux and MacOS you can set environment variables manually before each time you run soba:

export NAME='VALUE'

or by defining in a startup file for your shell so they are automatically set and available when you need them. For example, if using the bash shell and running soba as your user, add the relevant export statements to the following file:

/home/<your-user-id>/.bashrc

and run:

source /home/<your-user-id>/.bashrc

Provider      Environment Variable(s)          Generating token
Azure DevOps  AZURE_DEVOPS_USERNAME            instructions
              AZURE_DEVOPS_PAT
              AZURE_DEVOPS_ORGS
BitBucket     BITBUCKET_USER                   instructions
              BITBUCKET_KEY
              BITBUCKET_SECRET
Gitea         GITEA_APIURL                     instructions
              GITEA_TOKEN
GitHub        GITHUB_TOKEN                     instructions
GitLab        GITLAB_TOKEN                     instructions
              GITLAB_PROJECT_MIN_ACCESS_LEVEL  instructions

additional options

Azure DevOps

Returning Organisations' repositories (available since soba 1.2.11)

An organisation must be specified using environment variable AZURE_DEVOPS_ORGS in order for soba to discover the projects and their repos. Note: Only a single organisation is currently supported.

Repo/Bundle comparison method

Environment variable: AZURE_DEVOPS_COMPARE

See explanation below

Value            Description
clone (default)  Clone the remote and compare latest bundle
refs             Compare refs without downloading (available since soba 1.1.4)

BitBucket

Repo/Bundle comparison method

Environment variable: BITBUCKET_COMPARE

See explanation below

Value            Description
clone (default)  Clone the remote and compare latest bundle
refs             Compare refs without downloading (available since soba 1.1.4)

Gitea

Gitea instructions

Official documentation

The value for GITEA_APIURL needs to be in the format: https://[domain]/api/v1, where domain is something like gitea.example.com.

GITEA_TOKEN is the secret you need to generate using the API (see official documentation above), or via the web GUI:

  • Login to Gitea
  • Select your user icon in the top right-hand corner and choose Settings from the dropdown
  • Select Applications
  • Enter a Token Name, e.g. "soba backups"
  • Select Public only or All depending on use-case
  • Expand the Select permissions menu
  • Select read:organization and read:repository.
  • Click on Generate Token and the value will appear at the top of the page

Returning Organisations' repositories

Repositories in Gitea organisations are not backed up by default. To back these up, specify a comma separated list of organisations in the environment variable: GITEA_ORGS. To include "all" organisations, set to *.
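
As a small example of the two settings described above, the following sets the API URL and a list of organisations. The domain and organisation names are placeholders, not real values.

```shell
#!/bin/sh
# Placeholder values: replace the domain and organisation names with your own.
export GITEA_APIURL="https://gitea.example.com/api/v1"
export GITEA_ORGS="org-one,org-two"   # comma separated list of organisations

# To include all organisations instead, set the value to an asterisk.
# Quoting it avoids any chance of the shell treating it as a file name pattern:
# export GITEA_ORGS='*'
```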

Gitea Repo/Bundle comparison method

Environment variable: GITEA_COMPARE

See explanation below

Value            Description
clone (default)  Clone the remote and compare latest bundle
refs             Compare refs without downloading (available since soba 1.1.4)

GitHub

Returning Organisations' repositories

Repositories in GitHub organisations are not backed up by default. To back these up, specify a comma separated list of organisations in the environment variable: GITHUB_ORGS.

Skipping User repository backups

By default, all of the user's repositories are backed up, even when organisations are specified. To skip user repositories, set the environment variable GITHUB_SKIP_USER_REPOS to true.

Limit user repo backups to those owned by the user

By default, all repositories a user is affiliated with, e.g. a collaborator on, are included for backup. To limit these to only those owned by the user, set environment variable: GITHUB_LIMIT_USER_OWNED to true.
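
The GitHub options above can be combined. As a sketch, the following would back up two organisations and restrict user repository backups to those the user owns. The organisation names are placeholders.

```shell
#!/bin/sh
# Placeholder organisation names: replace with your own.
export GITHUB_ORGS="org-one,org-two"      # also back up these organisations
export GITHUB_LIMIT_USER_OWNED=true       # skip repos the user only collaborates on
```

To back up the organisations only, GITHUB_SKIP_USER_REPOS=true could be set in place of GITHUB_LIMIT_USER_OWNED.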

GitHub Repo/Bundle comparison method

Environment variable: GITHUB_COMPARE

See explanation below

Value            Description
clone (default)  Clone the remote and compare latest bundle
refs             Compare refs without downloading (available since soba 1.1.4)

GitLab

filtering Projects by access level (available since soba 1.1.3)

The GitLab Projects returned for a user can be filtered by access level. By default, every Project the user has at least Reporter access to will be returned. The environment variable GITLAB_PROJECT_MIN_ACCESS_LEVEL can be set to override this, by specifying the number matching the desired access level shown here and here:

Access Level  Value
Guest         10
Reporter      20
Developer     30
Maintainer    40
Owner         50
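
For example, using the values in the table above, the following would limit the returned Projects to those where the user has at least Developer access:

```shell
#!/bin/sh
# 30 corresponds to Developer in the access level table.
export GITLAB_PROJECT_MIN_ACCESS_LEVEL=30
```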

GitLab Repo/Bundle comparison method

Environment variable: GITLAB_COMPARE

See explanation below

Value            Description
clone (default)  Clone the remote and compare latest bundle
refs             Compare refs without downloading (available since soba 1.1.4)

Comparing remote repository with local backup

By default, each repository will be cloned, bundled, and that bundle compared with the latest local bundle to check if it should be kept or discarded. When processing many large repositories, this can be a lengthy process.

Alternatively, you can now compare the Git refs of the latest local bundle with the remote repository without having to clone. This is carried out using native commands git bundle list-heads <bundle file> and git ls-remote <remote repository>.

This process is far quicker than cloning but should only be used if the following is understood: Comparing refs means comparing the tips of, and not the entire history of, the repository. This post on Stack Overflow goes into additional detail.
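
The refs comparison described above can be approximated by hand with the same two git commands. The following is a minimal sketch, not soba's implementation: the function names normalize_refs and refs_unchanged are hypothetical. It also assumes the remote advertises only refs that the bundle would contain, which may not hold for providers that expose extra refs.

```shell
#!/bin/sh
# Illustrative only: compare the refs in a bundle with those of a remote.

# Reduce either listing to sorted "sha ref" lines. git ls-remote separates
# fields with a tab and git bundle list-heads with a space, so awk is used to
# normalise both. Peeled tag entries (ending in ^{}) are dropped because only
# ls-remote reports them.
normalize_refs() {
  awk '{ print $1, $2 }' | grep -v '\^{}$' | sort
}

# Exit status 0 when the bundle and the remote list identical refs.
# An empty listing (for example after a git error) is treated as a mismatch.
refs_unchanged() {
  bundle_file=$1
  remote_url=$2
  bundle_refs=$(git bundle list-heads "$bundle_file" | normalize_refs)
  remote_refs=$(git ls-remote "$remote_url" | normalize_refs)
  [ -n "$bundle_refs" ] && [ "$bundle_refs" = "$remote_refs" ]
}
```

A call might look like refs_unchanged myrepo.20221004200732.bundle https://github.com/jonhadfield/soba.git, with a non-zero exit status suggesting a new bundle is needed.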

run on Synology NAS

The following was tested on DS916+

  1. Create a directory on your NAS for backing up Git repositories to
  2. Install Docker from the Synology Package Center
  3. Open Docker and select 'Image'
  4. Select 'Add' from the top menu and choose 'Add From URL'
  5. In 'Repository URL' enter 'jonhadfield/soba', leave other options as default and click 'Add'
  6. When it asks to 'Choose Tag' accept the default 'latest' by pressing 'Select'
  7. Select image 'jonhadfield/soba:latest' from the list and click 'Launch' from the top menu
  8. Set 'Container Name' to 'soba' and select 'Advanced Settings'
  9. Check 'Enable auto-restart'
  10. Under 'Volume' select 'Add folder' and choose the directory created in step 1. Set the 'Mount Path' to '/backup'
  11. Under 'Network' check 'Use the same network as Docker Host'
  12. Under 'Environment' click '+' to add the common configuration:
    • variable GIT_BACKUP_DIR Value /backup
    • variable GIT_BACKUP_INTERVAL Value (hours between backups)
  13. Also under 'Environment' click '+' to add the relevant provider specific configuration:
    • variable BITBUCKET_USER Value
    • variable BITBUCKET_KEY Value
    • variable BITBUCKET_SECRET Value
    • variable BITBUCKET_BACKUPS Value (Number of backups to keep for each repo)
    • variable GITEA_APIURL Value
    • variable GITEA_TOKEN Value
    • variable GITEA_ORGS Value
    • variable GITEA_BACKUPS Value
    • variable GITHUB_TOKEN Value
    • variable GITHUB_ORGS Value (Optional - comma separated list of organisations)
    • variable GITHUB_SKIP_USER_REPOS Value (Optional - defaults to false)
    • variable GITHUB_LIMIT_USER_OWNED Value (Optional - defaults to false)
    • variable GITHUB_BACKUPS Value (Number of backups to keep for each repo)
    • variable GITLAB_TOKEN Value
    • variable GITLAB_BACKUPS Value (Number of backups to keep for each repo)
    • variable GITLAB_PROJECT_MIN_ACCESS_LEVEL Value (Optional - scope of repos to backup)
  14. Click 'Apply'
  15. Leave settings as default and select 'Next'
  16. Check 'Run this container after the wizard is finished' and click 'Apply'

The container should launch in a few seconds. You can view progress by choosing 'Container' in the left-hand menu, select 'soba', choose 'details' and then click on 'Log'

restoring backups

A Git bundle is an archive of a Git repository. The simplest way to restore is to clone it like a remote repository.

git clone soba.20180708153107.bundle soba

soba's Issues

Access to Private Organization Repos?

This seems to work great, but I don't see any of the repos from the GitHub Organization I'm the admin of. Is there a way to tweak this to get those to pull in as well? Thanks!

Ability to specify GitHub user repos

I'm a part of a bunch of repos that I don't want to backup. Specifically, I'd like to only backup repos under my username. I guess an envvar for accept/deny lists for usernames would work.

Add concurrency

Enable repository backups to run in parallel, keeping in mind throttling of API/cloning and potential blocking and race-conditions.

Callback URL in BitBucket OAuth consumer

Hello,

thanks for this tool, very useful !
I had troubles to make it work for Bitbucket. I tried several settings, but "listing Bitbucket repositories" always returned nothing.
In the end, what worked is : adding a "Callback URL" when creating the workspace OAuth consumer in Bitbucket.
Setting a dummy URL did the trick (http://example.com/).
Unfortunately, there is no error when "Callback URL" is empty (no error in Bitbucket settings, no error when running soba), but : the list of repos is always an empty list.

May be possible to mention this setting in the doc ?

Also, when trying accessing Bitbucket end points from curl (https://bitbucket.org/site/oauth2/access_token), it seems there is normally an error returned by Bitbucket in this situation (400/{"error": "invalid_request", "error_description": "No callback uri defined for the OAuth client."}) : may be it is silently ignored by soba ?

Documentation

Hi, as a user I have a slow start because I already fail at the correct token for Gitea.

Could you expand the readme and check the current configurations, I think it's a bit inconsistent.

No Backup for Gitlab Repos that I am not the Creator

Hi

When I run the Soba (v 1.1.3 beta) on my Gitlab account, I only back up the repositories I created. If I am not the creator of a repository, it does not back it up. Even if I have the "Owner" role of the repository, it does not do that. (It is the same for any other role like maintainer, developer, or reporter.)

This happens in three cases:
1- I have a Group that I created. My team members created many projects in my group, and I have the Owner role. But I can not back them up.
2- I have a separate account responsible for backup tasks with read-only (guest /reporter) access to repos (security reasons). This user can not back up anything.
3- I want to back up a subset of my repositories, and I created an account with read-only access to a subset of my repositories (e.g., a group). This user can not back up anything.

How can I solve this issue?

Bug: Sync All Organisations on Gitea

As a user, I want to sync all organizations with all repositories I am a member of.

Bug:

  • When I set GITEA_ORGS='*', only public and restricted organizations are synced.

Using the Bitbucket provider

Great tool! I just wanted to leave a note for peeps who may use this for Bitbucket.

Needed to do two things to make this work after creating the OAuth consumer.

  1. Authorize the OAuth consumer to my account by visiting https://bitbucket.org/site/oauth2/authorize?client_id=<client_id>&response_type=code once - fill in your client_id from the OAuth consumer creation step.
  2. Set the BITBUCKET_USER env variable to x-token-auth (not x-auth-token) as described here.

Curious if your BITBUCKET_USER is set to your username because for some reason, I kept getting git 128 access errors with it.

Authentication errors just exit

Hello,
thanks for the notification webhook feature, works great !

I noticed an issue: when there is an authentication failure (e.g. GitHub token expired), there is no failure notification sent with webhook.
Also, it seems the GIT_BACKUP_INTERVAL schedule is ignored in that case, and soba just exits.
Which means when using automated restart (eg docker-compose with restart policy), it just loops on error, and hammers the provider with failed authentication attempts.

Is there a mean to handle this case as a "regular" error ? eg failure notification + next attempt according to schedule ?

Feature: Advanced Scheduler

As a User I want be able to make backups under 1 hour, like every 5 or 10 minutes.

DoD:

  • it should be possible to use the scheduler with hours or minutes.

Know status of scheduled backups

Hello, is there an easy way to know the status of scheduled backups ? Other than parsing the logs ? I would like to monitor that my backups did happen, and without errors.
Or would it be possible to add webhooks to be called after each scheduled backup, on success or on error?

Thanks again for this very useful tool.

GitLab Group projects do not get backed up

Hi! Does soba support backing up GitLab Groups?

I've just finished moving most of my projects into their appropriate Groups and it seems that soba does not pick up projects that are part of a Group. Am I missing something in terms of configuration?

Compare bundle with latest commit to reduce backup time

To address question from @drummingdemon:

I was wondering whether it is somehow
possible to fetch the latest commit from the previous bundle to speed up
the consequent soba runs (this way the git clone and bundle steps might be spared) - but as far as I can tell, the only way would be to restore the bundle back to a repo and then compare the latest commits...?

It seems you can get the SHA of the commit HEAD is pointing to from a remote repo without first cloning:

$ git ls-remote https://github.com/jonhadfield/soba.git|grep HEAD
a18fd2809de118625dc41760d22dbe47c1ac74fa	HEAD

And you can do the same for a bundle without unbundling:

$ git bundle list-heads myrepo.20221004200732.bundle
57c17ee3178cca6f4b7ed1b0f52589610be6dc2c refs/heads/master
57c17ee3178cca6f4b7ed1b0f52589610be6dc2c HEAD

From there I guess I could compare the two SHAs and make a decision on whether or not to re-clone, bundle, and compare.
I guess it could be a popular use-case, as a lot of devs won't be re-basing etc.

Is that the sort of thing you mean?

Handle provider API issues

If API is unavailable, can't be reached, or times out, should we crash out, report and continue, etc?

failure to list bitbucket repos

soba: 2018/07/16 13:54:22 main.go:113: [0.0.10-34938ba] 2018/07/14:18:26:39 UTC
soba: 2018/07/16 13:54:22 main.go:115: starting
soba: 2018/07/16 13:54:22 main.go:132: creating working directory:  /backup/.working
soba: 2018/07/16 13:54:22 main.go:161: backing up BitBucket repos
soba: 2018/07/16 13:54:22 bitbucket.go:31: listing BitBucket repositories
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x60743a]
goroutine 1 [running]:
github.com/jonhadfield/soba/githosts.bitbucketHost.describeRepos(0x69c57b, 0x9, 0x6a28da, 0x1d, 0xc420045900, 0xa1, 0xa10060ef82)
	/Users/hadfielj/Repositories/gocode/src/github.com/jonhadfield/soba/githosts/bitbucket.go:45 +0x22a
github.com/jonhadfield/soba/githosts.bitbucketHost.Backup(0x69c57b, 0x9, 0x6a28da, 0x1d, 0xc42001620f, 0x7)
	/Users/hadfielj/Repositories/gocode/src/github.com/jonhadfield/soba/githosts/bitbucket.go:73 +0x6a
github.com/jonhadfield/soba/githosts.Backup(0x69c66e, 0x9, 0xc42001620f, 0x7)
	/Users/hadfielj/Repositories/gocode/src/github.com/jonhadfield/soba/githosts/main.go:52 +0xd0
main.execProviderBackups()
	/Users/hadfielj/Repositories/gocode/src/github.com/jonhadfield/soba/main.go:162 +0x62f
main.main()
	/Users/hadfielj/Repositories/gocode/src/github.com/jonhadfield/soba/main.go:153 +0x52f 
