I'm migrating a repository from svn to git.
In this last step, I want to remove tons of files that aren't needed from the history.
I'm trying the following command:
git filter-branch --prune-empty --index-filter \
  "for file in $(cat files); do git rm -rf --cached --ignore-unmatch ${file}; done" -f
But it says that the argument list is too long.
I could rewrite this like:
for file in $(cat files); do
  git filter-branch --prune-empty --index-filter \
    "git rm -rf --cached --ignore-unmatch ${file}" -f
done
But it will run filter-branch tons of times, and the history is long.. so, it would take too much time.
Is there a faster way to filter-branch removing lots of files?
Delete Files using git rm. The easiest way to delete a file in your Git repository is to execute the “git rm” command and to specify the file to be deleted. Note that by using the “git rm” command, the file will also be deleted from the filesystem.
I'd recommend using The BFG, a simpler, faster alternative to git-filter-branch specifically designed for removing unwanted files from Git history.
You mentioned in your comment that the problem files are generally big binaries, and The BFG has a specific option for handling this - you should carefully follow the BFG's usage instructions, but the core part is just this:
$ java -jar bfg.jar  --strip-blobs-bigger-than 10M  my-repo.git
Any files over 10MB in size (that aren't in your latest commit) will be removed from your Git repository's history. You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
The BFG is typically at least 10-720x faster than running git-filter-branch, and generally easier to use.
Full disclosure: I'm the author of the BFG Repo-Cleaner.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With