Comments (6)

newren avatar newren commented on July 18, 2024

Blob "protection" in BFG was such a misfeature, in my opinion (see

* Updates both index and working tree at end of rewrite
(With BFG and --no-blob-protection, these are still left out-of-date. This
is a doubly-whammy principle-of-least-astonishment violation: (1) users
are likely to accidentally commit the "staged" changes, re-introducing the
large blobs or removed passwords, (2) even if they don't commit the
changes the index holding them will prevent gc from shrinking the repo.
Fixing these two glaring problems not only makes --no-blob-protection
safe to recommend, it makes it safe to make it the default.)
* Fixes the "protection" defaults
(With BFG, it won't rewrite the tree for HEAD; it can't reasonably switch
to doing so because of the bugs mentioned above with updating the index
and working tree. However, this behavior comes with a surprise for users:
if HEAD is "protected" because users should manually update it first, why
isn't that also true of the other branches? In my opinion, there's no
user-facing distinction that makes sense for such a difference in
handling. "Protecting" HEAD can also be an error-prone requirement for
users -- why do they have to manually edit all files the same way
--replace-text is doing and why do they have to risk dirty diffs if they
get it slightly different (or a useless and ugly empty commit if they
manage to get it right)? Finally, a third reason this was in my opinion a
bad default was that it works really poorly in conjunction with other
types of history rewrites, e.g. --subdirectory-filter,
--to-subdirectory-filter, --convert-to-git-lfs, --path-rename, etc. For
all three of these reasons, and the fixes mentioned above to make it safe,
--no-blob-protection is made the default.)
) However...

You can use the contrib/filter-repo-demos/bfg-ish script as-is to achieve this effect; just pass it the --preserve-ref-tips option with HEAD or whichever branches you want to "preserve". If you are trying to do something special with filter-repo using custom callbacks and thus can't use bfg-ish, then you can look at the bfg-ish code (look for preserve_refs and the code surrounding it) and use it in your script that loads filter-repo as a module.

I'm worried that you want this merely because BFG advertised it, but if you really do have a good rationale, the above links should point out a couple routes you could take to achieve it. Knock yourself out.

from git-filter-repo.

schabluk avatar schabluk commented on July 18, 2024

Thank you for the tips.

--preserve-ref-tips

Can I use this option together with --strip-blobs-with-ids?
Like this:

./bfg-ish --strip-blobs-with-ids /path/to/purge.txt --preserve-ref-tips HEAD /path/to/repository

I need to selectively purge large data files to clean up the history, but I still need the latest version of each file, which is why I have to use the --preserve-ref-tips option.

newren avatar newren commented on July 18, 2024

--preserve-ref-tips

Can I use this option together with --strip-blobs-with-ids?

Yes, of course.

I need to selectively purge large data files to clean up the history, but I still need the latest version of each file, which is why I have to use the --preserve-ref-tips option.

This, however, doesn't make as much sense. Stripping by blob ids is different than stripping by filename. You have to go and figure out the blob ids you want stripped if you use it, and then it'll delete any file that has any of those blob ids. If a file was modified 18 times in history and you want all versions of it gone, then you have to figure out all 18 blob ids that it had. (And any other old version of another file that happened to match one of those 18 blob ids will be deleted too.) If you want to just delete the first 17 versions of that file that was modified 18 times (as your wording suggested), then you'd just list the first 17 blob ids it had. Deciding to list all 18 but countermanding with --preserve-ref-tips seems a little odd to me; are you just trying to remove that last version of that file from all commits older than HEAD and reintroduce it to HEAD?
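
To illustrate why stripping by blob id is content-based rather than name-based, here is a minimal sketch of how a blob id is derived. It assumes a repository using the SHA-1 object format; the function name git_blob_id and the file names are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the id Git assigns to a blob with this content (SHA-1 repos)."""
    # Git hashes a "blob <size>" header, a NUL byte, then the raw bytes;
    # the file name is not part of the hash.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical files: two different paths with identical bytes, plus one other.
files = {
    "data/big.bin": b"same bytes\n",
    "backup/copy-of-big.bin": b"same bytes\n",
    "notes.txt": b"different bytes\n",
}
ids = {path: git_blob_id(data) for path, data in files.items()}

# Both copies share one blob id, so listing that id would strip both paths.
print(ids["data/big.bin"] == ids["backup/copy-of-big.bin"])
# A file with different content gets a different id and is left alone.
print(ids["data/big.bin"] == ids["notes.txt"])
```

This is why each of the 18 versions of a file needs its own entry in the id list, and why an unrelated file with matching content is removed along with it.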

schabluk avatar schabluk commented on July 18, 2024

Hi,

are you just trying to remove that last version of that file from all commits older than HEAD and reintroduce it to HEAD?

I'm trying to keep only the last version of that file, and remove all commits older than HEAD.

xtutu avatar xtutu commented on July 18, 2024

Thank you for the tips.

--preserve-ref-tips

Can I use this option together with --strip-blobs-with-ids?
Like this:

./bfg-ish --strip-blobs-with-ids /path/to/purge.txt --preserve-ref-tips HEAD /path/to/repository

I need to selectively purge large data files to clean up the history, but I still need the latest version of each file, which is why I have to use the --preserve-ref-tips option.

@schabluk does this command work?

When I add --preserve-ref-tips HEAD, I get this output:

# xxxx@ space1 in /work/temp/split-test [16:13:00] C:1
$ python3.7 bfg-ish /work/temp/split-test/2 --strip-blobs-bigger-than 1K --preserve-ref-tips HEAD  
Failed to translate --preserve-ref-tips arguments into refs
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

If I remove --preserve-ref-tips HEAD, the command succeeds, but it does not keep the latest commit.

ssb22 avatar ssb22 commented on July 18, 2024

I had a similar problem: I have a Git repository containing some automatically generated MP3 audio files which are served on GitLab Pages, plus the scripts to generate them. The generated files need to be kept in the repo because that's how they get served from GitLab Pages, but old versions of them do not need to be kept, and I'd like to remove these to reduce the size of the repo (there's a size limit). But I don't want to remove the most recent versions of the generated files, and of course I don't want to remove the history of the scripts.

I could just say "delete all MP3 files from all history, then add a new commit to put the current versions back" but that's losing information about when the most recent version of each of these files was previously committed, plus it makes me go to the trouble of regenerating them (or restoring them from backup) after git-filter-repo has done its work. So I'd rather tell git-filter-repo to "please don't touch the most recent version of anything".

So I did it by getting the blob IDs and excluding the most recent blob. Demo:

#!/bin/bash
set -e

# Set up the test history:
rm -rf /tmp/test
mkdir /tmp/test
cd /tmp/test
git init
echo "This is some text, please keep history" > file1.txt
echo "This is a binary, please delete history" > file2.bin
git add file1.txt file2.bin
git commit -m "initial commit"

echo "Second version of binary to delete" > file2.bin
git commit -am "Change the binary, but not latest version (this commit will be deleted)"

echo "Second version of text file, should have history" > file1.txt
echo "Latest version of binary file, should NOT have history" > file2.bin
git commit -am "commit the second text and the latest binary"

echo "Third version of text file, should have history" > file1.txt
git commit -am "commit the third version: no change to binary"

# Find blobs of old *.bin versions:
git rev-list --objects HEAD |
    python3 -c 'had=set() ; import sys
for i in sys.stdin:
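    # split on the first space only, so paths containing spaces are kept
    try: blob, path = i.rstrip("\n").split(" ", 1)
    except ValueError: continue # ignore lines without a path (commits)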
    if path in had: # it is not the latest version
        if path.endswith(".bin"): print (blob) # to delete
    had.add(path)' > .git/blobs-to-delete

# Now ask git-filter-repo to delete our little list:
DO_NOT_DO_THIS=--force # FOR DEMO ONLY: skip is-backed-up check
git filter-repo $DO_NOT_DO_THIS --strip-blobs-with-ids .git/blobs-to-delete
rm .git/blobs-to-delete

# and look at the result
( echo Current files: ; ls
  echo History: ; git log -p) | less
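
The selection step in the script above depends on `git rev-list --objects HEAD` listing each object once, with newer versions of a path before older ones, so the first blob seen for a path is its latest version. Below is a minimal sketch of that step as a standalone function, run on made-up sample lines rather than a real repository. The name old_blob_ids and all the ids are hypothetical.

```python
def old_blob_ids(rev_list_lines, suffix=".bin"):
    """Return ids of every non-latest blob whose path ends with suffix.

    Assumes the lines are ordered newest-first, as the script above
    expects from `git rev-list --objects HEAD`.
    """
    seen = set()
    old = []
    for line in rev_list_lines:
        # Split on the first space only, so paths with spaces survive.
        blob, sep, path = line.rstrip("\n").partition(" ")
        if not sep or not path:
            continue  # commit (no path) or root tree (empty path)
        if path in seen:
            if path.endswith(suffix):
                old.append(blob)  # an older version: mark for deletion
        else:
            seen.add(path)  # first sighting is the latest version: keep it
    return old


# Made-up ids standing in for real rev-list output.
sample = [
    "c4c4c4",              # commits carry no path
    "c3c3c3",
    "7e7e7e ",             # root tree: empty path
    "a3a3a3 file1.txt",
    "b3b3b3 file2.bin",
    "a2a2a2 file1.txt",    # old text version: history is kept
    "b2b2b2 file2.bin",
    "b1b1b1 file2.bin",
    "d2d2d2 my song.bin",
    "d1d1d1 my song.bin",
]
print(old_blob_ids(sample))
```

Writing the returned ids one per line gives a file suitable for --strip-blobs-with-ids, as in the script.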
