Comments (6)
Blob "protection" in BFG was such a misfeature, in my opinion (see git-filter-repo/contrib/filter-repo-demos/bfg-ish, lines 32 to 56 in 8b7f39f).
You can use the contrib/filter-repo-demos/bfg-ish script as is to achieve this effect: just pass it the --preserve-ref-tips option with HEAD or whichever branches you want to "preserve". If you are trying to do something special with filter-repo using custom callbacks and thus can't use bfg-ish, then you can look at the bfg-ish code (look for preserve_refs and the code surrounding it) and use it in your script that loads filter-repo as a module.
I'm worried that you want this merely because BFG advertised it, but if you really do have a good rationale, the above links should point out a couple routes you could take to achieve it. Knock yourself out.
from git-filter-repo.
Thank you for the tips.
Regarding --preserve-ref-tips: can I use this option together with --strip-blobs-with-ids?
Like this:
./bfg-ish --strip-blobs-with-ids /path/to/purge.txt --preserve-ref-tips HEAD /path/to/repository
I need to selectively purge large data files to clean up the history, but I still need the latest version of each, which is why I have to use the --preserve-ref-tips option.
from git-filter-repo.
Regarding --preserve-ref-tips: can I use this option together with --strip-blobs-with-ids?
Yes, of course.
I need to selectively purge large data files to clean up the history, but I still need the latest version of each, which is why I have to use the --preserve-ref-tips option.
This, however, doesn't make as much sense. Stripping by blob ids is different than stripping by filename. You have to go and figure out the blob ids you want stripped if you use it, and then it'll delete any file that has any of those blob ids. If a file was modified 18 times in history and you want all versions of it gone, then you have to figure out all 18 blob ids that it had. (And any other old version of another file that happened to match one of those 18 blob ids will be deleted too.) If you want to just delete the first 17 versions of that file that was modified 18 times (as your wording suggested), then you'd just list the first 17 blob ids it had. Deciding to list all 18 but countermanding with --preserve-ref-tips seems a little odd to me; are you just trying to remove that last version of that file from all commits older than HEAD and reintroduce it to HEAD?
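As a rough illustration of "figuring out the blob ids" for one file, here is a minimal Python sketch. It assumes its input is the text printed by `git rev-list --objects <ref>`, where blobs appear as `<id> <path>`, and that versions reachable from newer commits are listed first. The function name `blob_ids_for_path`, the file name `big.dat`, and the shortened ids are hypothetical placeholders, not real hashes.

```python
def blob_ids_for_path(rev_list_lines, path):
    """Return every blob id listed for `path`, in the order git printed them."""
    ids = []
    for line in rev_list_lines:
        # Each line is "<id>" (a commit) or "<id> <path>" (a tree or blob).
        obj_id, _, obj_path = line.rstrip("\n").partition(" ")
        if obj_path == path:
            ids.append(obj_id)
    return ids


# Hypothetical listing for a file that was modified three times.
sample = [
    "c3",          # commit: no path
    "t3 ",         # root tree: empty path
    "b3 big.dat",  # newest version
    "t2 ",
    "b2 big.dat",
    "t1 ",
    "b1 big.dat",  # oldest version
]

all_versions = blob_ids_for_path(sample, "big.dat")
older_versions = all_versions[1:]  # everything except the newest blob
print("\n".join(older_versions))
```

Writing `older_versions` to a file, one id per line, would give a list suitable for --strip-blobs-with-ids that leaves the newest version alone, which is the "first 17 of 18" case described above.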
from git-filter-repo.
Hi,
are you just trying to remove that last version of that file from all commits older than HEAD and reintroduce it to HEAD?
I'm trying to keep only the last version of that file, and remove all commits older than HEAD.
from git-filter-repo.
Thank you for the tips.
Regarding --preserve-ref-tips: can I use this option together with --strip-blobs-with-ids?
Like this:
./bfg-ish --strip-blobs-with-ids /path/to/purge.txt --preserve-ref-tips HEAD /path/to/repository
I need to selectively purge large data files to clean up the history, but I still need the latest version of each, which is why I have to use the --preserve-ref-tips option.
@schabluk does this command work?
When I add --preserve-ref-tips HEAD, the output is:
# xxxx@ space1 in /work/temp/split-test [16:13:00] C:1
$ python3.7 bfg-ish /work/temp/split-test/2 --strip-blobs-bigger-than 1K --preserve-ref-tips HEAD
Failed to translate --preserve-ref-tips arguments into refs
fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
If I remove --preserve-ref-tips HEAD, it runs successfully, but it does not keep the latest commit.
from git-filter-repo.
I had a similar problem: I have a Git repository containing some automatically generated MP3 audio files which are served on GitLab Pages, plus the scripts to generate them. The generated files need to be kept in the repo because that's how they get served from GitLab Pages, but old versions of them do not need to be kept, and I'd like to remove these to reduce the size of the repo (there's a size limit). But I don't want to remove the most recent versions of the generated files, and of course I don't want to remove the history of the scripts.
I could just say "delete all MP3 files from all history, then add a new commit to put the current versions back" but that's losing information about when the most recent version of each of these files was previously committed, plus it makes me go to the trouble of regenerating them (or restoring them from backup) after git-filter-repo has done its work. So I'd rather tell git-filter-repo to "please don't touch the most recent version of anything".
So I did it by getting the blob IDs and excluding the most recent blob. Demo:
#!/bin/bash
set -e
# Set up the test history:
rm -rf /tmp/test
mkdir /tmp/test
cd /tmp/test
git init
echo "This is some text, please keep history" > file1.txt
echo "This is a binary, please delete history" > file2.bin
git add file1.txt file2.bin
git commit -m "initial commit"
echo "Second version of binary to delete" > file2.bin
git commit -am "Change the binary, but not latest version (this commit will be deleted)"
echo "Second version of text file, should have history" > file1.txt
echo "Latest version of binary file, should NOT have history" > file2.bin
git commit -am "commit the second text and the latest binary"
echo "Third version of text file, should have history" > file1.txt
git commit -am "commit the third version: no change to binary"
# Find blobs of old *.bin versions:
git rev-list --objects HEAD |
python3 -c 'had=set() ; import sys
for i in sys.stdin:
    try: blob, path = i.split()
    except ValueError: continue # ignore blob without path
    if path in had: # it is not the latest version
        if path.endswith(".bin"): print(blob) # to delete
    had.add(path)' > .git/blobs-to-delete
# Now ask git-filter-repo to delete our little list:
DO_NOT_DO_THIS=--force # FOR DEMO ONLY: skip is-backed-up check
git filter-repo $DO_NOT_DO_THIS --strip-blobs-with-ids .git/blobs-to-delete
rm .git/blobs-to-delete
# and look at the result
( echo Current files: ; ls
echo History: ; git log -p) | less
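The blob-selection step in the demo splits each line on whitespace, so any path containing a space fails to unpack and is silently skipped. Below is a sketch of the same idea as a standalone Python function that splits only on the first space and accepts several suffixes. The name `stale_blob_ids` and the sample ids and paths are hypothetical; like the demo, it assumes the first listing of a path in `git rev-list --objects` output is its newest version.

```python
def stale_blob_ids(rev_list_lines, suffixes):
    """Return ids of blobs that are not the newest listed version of their path.

    `suffixes` is a string or tuple of strings, as accepted by str.endswith.
    """
    seen_paths = set()
    stale = []
    for line in rev_list_lines:
        # Split on the first space only, so paths with spaces stay intact.
        obj_id, _, path = line.rstrip("\n").partition(" ")
        if not path:
            continue  # commits and the root tree carry no path
        if path in seen_paths and path.endswith(suffixes):
            stale.append(obj_id)  # an older version of a matching file
        seen_paths.add(path)
    return stale


# Hypothetical listing, newest objects first.
sample = [
    "c4",
    "t4 ",
    "a3 file1.txt",
    "b3 my song.mp3",
    "a2 file1.txt",
    "b2 my song.mp3",
    "b1 my song.mp3",
]

print("\n".join(stale_blob_ids(sample, (".mp3", ".bin"))))
```

In a real run the lines would come from the output of `git rev-list --objects HEAD`, and the printed ids would be written to the file passed to --strip-blobs-with-ids.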
from git-filter-repo.