If you’ve worked on software projects long enough, you’ve likely encountered a Git repository that has grown far beyond its expected size. Over time, large files, binaries, and obsolete assets get committed, making cloning and pushing changes slow and cumbersome. In this blog post, we’ll walk through the complete Git cleanup process, with a special focus on how to permanently remove large files using BFG Repo-Cleaner. Whether you’re managing a personal project or working with a team, this guide will help you maintain a lean, efficient Git history.
Why Clean Your Git Repository?
Before diving into the how-to, it’s important to understand the why. Git tracks every change ever made in a repository. While this is powerful, it also means that every accidentally committed large file lives forever in your history, even if you deleted it later. The .git folder can balloon in size, impacting:
- Clone and pull speeds
- Backup storage
- CI/CD build times
- Team productivity
Cleaning your repository helps reclaim space and keep things running smoothly.
Step 1: Identify the Problem
First, check the size of your .git folder:
du -sh .git
If it’s several hundred megabytes or even gigabytes, that’s a red flag. Next, find the largest files in your working tree:
find . -type f -not -path './.git/*' -size +10M -exec du -h {} + | sort -hr
This lists every file over 10 MB outside the .git directory, largest first. Large binaries are where Git is least efficient: they rarely compress or delta well, so each committed version often adds close to its full size to the history.
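The find command only sees files that exist on disk right now. To find the largest blobs anywhere in Git history, including files deleted in later commits, run the following (numfmt is part of GNU coreutils; on macOS, Homebrew’s coreutils package provides it as gnumfmt):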
git rev-list --objects --all | \
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
sed -n 's/^blob //p' | \
sort -k2 -n | \
awk '$2 > 10485760' | \
numfmt --field=2 --to=iec | \
tail -n 20
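To see why the history scan matters, here is a minimal sketch in a throwaway repository (the file name dataset.bin is hypothetical): an 11 MiB file is committed and then deleted, yet the pipeline above still reports it. It assumes bash and GNU coreutils (head -c, numfmt), and sets a dummy commit identity so it runs on a machine without Git configured.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dummy identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

# Commit an 11 MiB file of random bytes (random data does not compress).
head -c 11534336 /dev/urandom > dataset.bin
git add dataset.bin
git commit -q -m "Add dataset"

# Delete it in the next commit: it is gone from the working tree...
git rm -q dataset.bin
git commit -q -m "Remove dataset"

# ...but the history scan still finds the blob.
report=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
  sed -n 's/^blob //p' |
  sort -k2 -n |
  awk '$2 > 10485760' |
  numfmt --field=2 --to=iec |
  tail -n 20)
echo "$report"
```

The deleted file still appears with its full size, which is exactly the kind of blob that BFG removes in the next steps.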
Step 2: Backup the Repository
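Before making any changes, make a mirror clone of the repository and copy it as a backup:
git clone --mirror https://your-repo-url.git your-repo.git
cp -r your-repo.git your-repo-backup.git
The mirror clone (your-repo.git) contains every branch and tag and is the copy you will clean in the following steps. The copy (your-repo-backup.git) ensures you can restore the original state if needed.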
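Before rewriting anything, it is worth checking that the backup is complete. The sketch below assumes a small local repository (origin-repo, hypothetical) standing in for your real remote, repeats the two backup commands, and then verifies the copy with git fsck and by comparing its refs against the mirror.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dummy identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# Hypothetical stand-in for the real remote: a tiny local repository
# with one commit and one tag.
git init -q origin-repo
git -C origin-repo commit -q --allow-empty -m "Initial commit"
git -C origin-repo tag v1.0

# The backup step: mirror-clone, then copy the mirror.
git clone -q --mirror "$work/origin-repo" your-repo.git
cp -r your-repo.git your-repo-backup.git

# Check the backup's object store for corruption...
git -C your-repo-backup.git fsck --full

# ...and confirm it holds exactly the same refs as the mirror.
diff <(git -C your-repo.git for-each-ref) \
     <(git -C your-repo-backup.git for-each-ref) && echo "backup refs match"
```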
Step 3: Install and Use BFG Repo-Cleaner
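BFG is a fast alternative to git filter-branch and is specifically designed for removing large files from Git history. By default, BFG protects your latest commit: it cleans older history but leaves the files in the commit at the tip of HEAD untouched. If the large file still exists in your latest commit, remove it in a normal commit and push that first, then take a fresh mirror clone.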
Download BFG and Install a Java Runtime
wget https://repo1.maven.org/maven2/com/madgag/bfg/1.15.0/bfg-1.15.0.jar
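sudo apt install -y openjdk-17-jre
The apt command assumes Debian or Ubuntu; on other systems, install a Java runtime with your platform’s package manager.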
Run BFG to Remove Files
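java -jar bfg-1.15.0.jar --delete-files your-large-file.ext your-repo.git
Note that --delete-files matches on file name (glob patterns such as '*.mp4' also work), not on the path inside the repository.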
You can also remove all blobs larger than a certain size:
java -jar bfg-1.15.0.jar --strip-blobs-bigger-than 100M your-repo.git
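Because BFG leaves the files in the latest commit on HEAD untouched, a large file that is still present at the tip has to be removed in an ordinary commit first. Here is a minimal sketch of that preparatory step in a throwaway repository, reusing the placeholder name your-large-file.ext; in practice you would do this in your normal working clone and push it before taking the mirror clone. It does not run BFG itself.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dummy identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q

# Simulate the problem: a large file that is still in the latest commit.
head -c 2097152 /dev/urandom > your-large-file.ext
echo "readme" > README
git add your-large-file.ext README
git commit -q -m "Add files"

# Stop tracking it in a normal commit. --cached keeps the local copy on
# disk, and the .gitignore entry stops it from being re-added by accident.
git rm -q --cached your-large-file.ext
echo "your-large-file.ext" >> .gitignore
git add .gitignore
git commit -q -m "Stop tracking your-large-file.ext"

# The tip is now clean, but the blob is still reachable from history.
git ls-tree --name-only HEAD
git rev-list --objects --all | grep your-large-file.ext
```

After this commit, BFG can purge the blob from older commits without needing to touch HEAD’s tree.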
Step 4: Final Cleanup and Garbage Collection
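BFG rewrites the affected commits, but the old blobs stay in the object database as unreachable objects that still take up space. Expire the reflog and run garbage collection to delete them: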
cd your-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
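This permanently deletes the unreachable objects and reduces the size of the repository on disk. Because a mirror clone is a bare repository, there is no separate .git directory here; the your-repo.git directory itself is what shrinks.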
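The reflog step matters because reflog entries can still point at pre-rewrite commits, and git gc treats anything they reference as reachable. This sketch demonstrates the mechanism without BFG, assuming a throwaway repository where a large file (big.bin, hypothetical) is committed on a temporary branch that is then deleted, leaving the blob reachable only through the HEAD reflog.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dummy identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"
git init -q
echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Commit a 2 MiB random file on a temporary branch, then delete the
# branch, much as a history rewrite abandons the old commits.
git checkout -q -b temp
head -c 2097152 /dev/urandom > big.bin
git add big.bin
git commit -q -m "Add big.bin"
blob=$(git rev-parse HEAD:big.bin)
git checkout -q -
git branch -q -D temp

# No branch points at the commit any more, but the HEAD reflog does.
git cat-file -e "$blob" && echo "before cleanup: blob present"

# Expire every reflog entry, then prune all unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$blob" 2>/dev/null; then
  echo "after cleanup: blob still present"
else
  echo "after cleanup: blob removed"
fi
```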
Step 5: Force Push the Cleaned Repo
Now, push the cleaned repository to your remote:
git push --force --all
git push --force --tags
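If your repo is hosted on GitHub or GitLab, be aware that the platform may keep unreachable objects (and cached views of old commits) for some time, so the full reduction in clone size might not be immediate. Because every rewritten commit gets a new hash, collaborators should make a fresh clone rather than pull: merging their old branches back in would reintroduce the removed files.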
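To see why --force is needed, this sketch uses a local bare repository (remote.git, hypothetical) in place of GitHub or GitLab. It pushes a commit, rewrites it with git commit --amend (standing in for BFG’s rewrite), and shows that a plain push is rejected as a non-fast-forward update while git push --force --all succeeds.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Dummy identity so the demo commits work without global Git config.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

work=$(mktemp -d)
cd "$work"

# A local bare repository plays the role of the hosted remote.
git init -q --bare remote.git
git clone -q remote.git work 2>/dev/null
cd work
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First"
git push -q origin HEAD

# Rewrite the already-pushed commit, as BFG does for affected commits.
echo "v1 (rewritten)" > notes.txt
git commit -q --amend -a -m "First (rewritten)"

# The remote tip is not an ancestor of the new commit, so a plain push
# is refused.
if git push -q origin HEAD 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# A force push replaces the remote branches with the rewritten history.
git push -q --force --all
```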
Step 6: Prevent Future Issues
Use .gitignore
Add common large file types that you never need to version to .gitignore:
*.zip
*.tar.gz
*.mp4
*.exe
*.iso
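You can confirm that the patterns match what you expect with git check-ignore, which reports the .gitignore line responsible for each ignored path. Here is a minimal sketch in a throwaway repository, using a subset of the patterns above and hypothetical file names.

```shell
#!/usr/bin/env bash
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
git init -q

# A subset of the .gitignore patterns from this step.
printf '%s\n' '*.zip' '*.tar.gz' '*.mp4' > .gitignore

# Hypothetical files: two that should be ignored, one that should not.
touch backup.zip demo.mp4 README.md

# -v prints "<source>:<line>:<pattern>" followed by the matching path.
git check-ignore -v backup.zip demo.mp4

# check-ignore exits non-zero when the path is not ignored.
if git check-ignore -q README.md; then
  echo "README.md ignored"
else
  echo "README.md not ignored"
fi
```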
Use Git LFS
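For large files that do need versioning, use Git LFS, which stores the file contents outside the regular Git history and commits small pointer files in their place: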
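git lfs install
git lfs track "*.model"
git add .gitattributes
git commit -m "Track *.model files with Git LFS"
Tracking applies to files you add or change from now on. Files already committed stay in history as regular blobs unless you rewrite it, for example with git lfs migrate import --include="*.model" --everything, which, like BFG, requires a force push.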
Conclusion
Keeping your Git repository clean isn’t just about saving space. It’s about ensuring long-term maintainability, faster CI/CD pipelines, and a better experience for everyone on your team. With tools like BFG Repo-Cleaner, the process is fast and, with a backup in place, low-risk.
By following the steps in this guide, you can ensure your repositories remain lean and optimized for performance. Happy coding!