June 3, 2025

Clean Up Git Repositories with BFG Repo-Cleaner – Complete Guide

If you’ve worked on software projects long enough, you’ve likely encountered a Git repository that has grown far beyond its expected size. Over time, large files, binaries, and obsolete assets get committed, making cloning and pushing changes slow and cumbersome. In this blog post, we’ll walk through the complete Git cleanup process, with a special focus on how to permanently remove large files using BFG Repo-Cleaner. Whether you’re managing a personal project or working with a team, this guide will help you maintain a lean, efficient Git history.


Why Clean Your Git Repository?

Before diving into the how-to, it’s important to understand the why. Git tracks every change ever made in a repository. While this is powerful, it also means that every accidentally committed large file lives forever in your history, even if you deleted it later. The .git folder can balloon in size, impacting:

  • Clone and pull speeds
  • Backup storage
  • CI/CD build times
  • Team productivity

Cleaning your repository helps reclaim space and keep things running smoothly.


Step 1: Identify the Problem

First, check the size of your .git folder:

du -sh .git

If it’s several hundred megabytes or even gigabytes, that’s a red flag. Next, find the largest files in your repository:

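find . -path ./.git -prune -o -type f -size +10M -exec du -h {} + | sort -hr

This lists every file over 10 MB in your working tree, skipping the .git directory itself. Git handles files this large poorly, especially binaries, because they compress and diff badly, so each changed version tends to add close to its full size to the history.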

To find large files in Git history:

git rev-list --objects --all | \
  git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | \
  sed -n 's/^blob //p' | \
  sort -k2 -n | \
  awk '$2 > 10485760' | \
  numfmt --field=2 --to=iec | \
  tail -n 20
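
The pipeline above relies on numfmt, which is not available everywhere. As a rough alternative, the same idea can be wrapped in a small function that prints raw byte counts instead. This is a minimal sketch: the function name list_large_blobs and its default values are assumptions made for illustration, not part of Git.

```shell
# list_large_blobs [MIN_BYTES] [COUNT]
# Prints "<size-in-bytes> <path>" for every blob reachable from any ref
# whose size is at least MIN_BYTES, largest first. Run inside the repository.
list_large_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk -v min="${1:-10485760}" '$1 == "blob" && $2 >= min { sub(/^blob /, ""); print }' |
    sort -k1,1nr |
    head -n "${2:-20}"
}
```

For example, list_large_blobs 52428800 10 would show the ten largest blobs of 50 MiB or more.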

Step 2: Backup the Repository

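Before making any changes, create a mirror clone to work in and keep an untouched copy of it:

git clone --mirror https://your-repo-url.git your-repo.git
cp -r your-repo.git your-repo-backup.git

The explicit your-repo.git argument makes sure the clone lands in the directory the second command copies. If the cleanup goes wrong, you can restore the original state from your-repo-backup.git.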
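
Before rewriting anything, it can be worth checking that the backup really contains the same refs as the working copy. The sketch below compares the two; same_refs is a hypothetical helper name, and it only compares ref names and object IDs, not the objects themselves.

```shell
# same_refs REPO_A REPO_B
# Succeeds (exit 0) when both repositories list exactly the same refs
# pointing at the same object IDs. The body runs in a subshell so the
# helper variables do not leak into the calling shell.
same_refs() (
  refs_a=$(git -C "$1" for-each-ref --format='%(objectname) %(refname)') || exit 2
  refs_b=$(git -C "$2" for-each-ref --format='%(objectname) %(refname)') || exit 2
  [ "$refs_a" = "$refs_b" ]
)
```

A typical call would be: same_refs your-repo.git your-repo-backup.git && echo "backup matches"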


Step 3: Install and Use BFG Repo-Cleaner

BFG is a fast alternative to git filter-branch and is specifically designed for removing large files from Git history.

Download BFG and install a Java runtime (the apt command assumes Debian or Ubuntu)

wget https://repo1.maven.org/maven2/com/madgag/bfg/1.15.0/bfg-1.15.0.jar
sudo apt install -y openjdk-17-jre

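Run BFG to Remove Files

Run BFG from the directory that contains your mirror clone and pass the repository as the last argument:

java -jar bfg-1.15.0.jar --delete-files your-large-file.ext your-repo.git

You can also remove all blobs larger than a certain size:

java -jar bfg-1.15.0.jar --strip-blobs-bigger-than 100M your-repo.git

By default, BFG leaves the contents of your latest commit (HEAD) untouched, so delete the unwanted files in a normal commit first if they are still present there.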
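
BFG protects the contents of the latest commit by default, and --delete-files matches on file name rather than full path. A quick pre-check like the one below can tell you whether a file is still present in HEAD before you run BFG. It is only a sketch: in_head_commit is a made-up helper name, and it does not handle file names that Git would quote (for example names containing non-ASCII characters).

```shell
# in_head_commit NAME
# Succeeds when a file whose base name is NAME exists anywhere in the
# tree of HEAD. Run from inside the repository (mirror clones work too).
in_head_commit() {
  git ls-tree -r --name-only HEAD |
    awk -F/ -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}
```

If in_head_commit your-large-file.ext succeeds, remove the file in an ordinary commit and push it before cleaning the history.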

Step 4: Final Cleanup and Garbage Collection

After BFG runs, clean out unreachable blobs:

cd your-repo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive

This expires the reflog entries that still point at the old commits and prunes the objects they referenced, which is the step that actually shrinks the repository on disk.
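
To see how much space the cleanup recovered, you can compare the packed size before and after garbage collection. The helper below is a small sketch around git count-objects; the name repo_pack_size is an assumption for illustration.

```shell
# repo_pack_size [REPO_DIR]
# Prints the on-disk size of packed objects in human-readable form,
# as a line starting with "size-pack:". Defaults to the current directory.
repo_pack_size() {
  git -C "${1:-.}" count-objects -vH | grep '^size-pack:'
}
```

Running repo_pack_size your-repo-backup.git and repo_pack_size your-repo.git side by side gives a before-and-after comparison.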


Step 5: Force Push the Cleaned Repo

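Now, push the cleaned repository to your remote. From a regular (non-mirror) clone, push branches and tags separately:

git push --force --all
git push --force --tags

From a mirror clone like the one created in Step 2, a plain git push is enough, because a mirror remote already pushes every ref, and Git may refuse to combine --all with it.

Every rewritten commit gets a new hash, so teammates should make a fresh clone rather than pull into an old one. If your repo is hosted on GitHub or GitLab, be aware that these platforms run garbage collection on their own schedule and can retain unreachable objects for a while, reportedly up to 30 days. The full reduction in clone size might not be immediate.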
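
To confirm that a file is really gone from every branch and tag, for example in a fresh clone made after the push, you can scan all reachable objects for its name. This is a sketch under the same assumptions as the rest of the guide; path_in_history is a hypothetical helper, and it matches on the last path component only.

```shell
# path_in_history NAME [REPO_DIR]
# Succeeds if any object reachable from any ref has a path whose last
# component is NAME; fails once the name no longer appears in history.
path_in_history() {
  git -C "${2:-.}" rev-list --objects --all |
    awk -v name="$1" '
      {
        path = $0
        sub(/^[0-9a-f]+ ?/, "", path)      # drop the leading object ID
        n = split(path, parts, "/")
        if (n > 0 && parts[n] == name) found = 1
      }
      END { exit !found }'
}
```

After a successful cleanup, path_in_history your-large-file.ext should fail in the cleaned repository and still succeed in the backup.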


Step 6: Prevent Future Issues

Use .gitignore

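Add common large file types to .gitignore. Keep in mind that .gitignore only affects untracked files; anything already committed stays tracked until you remove it: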

*.zip
*.tar.gz
*.mp4
*.exe
*.iso
*.model
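
Ignore rules only catch the file types you anticipated. As an extra safety net, a pre-commit hook can reject any staged file above a size limit. The function below is a minimal sketch of that check: the name check_staged_sizes and the 10 MiB default are arbitrary choices for this example, and file names that Git quotes in its output are not handled.

```shell
# check_staged_sizes [MAX_BYTES]
# Fails (exit 1) if any file staged for commit is larger than MAX_BYTES.
# The body runs in a subshell so its variables stay local.
check_staged_sizes() (
  max_bytes="${1:-10485760}"   # 10 MiB, an arbitrary threshold for this sketch
  offenders=$(
    git diff --cached --name-only --diff-filter=ACM |
      while IFS= read -r path; do
        # ":<path>" names the staged (index) version of the file
        size=$(git cat-file -s ":$path")
        if [ "$size" -gt "$max_bytes" ]; then
          echo "$path ($size bytes)"
        fi
      done
  )
  if [ -n "$offenders" ]; then
    echo "Refusing to commit files larger than $max_bytes bytes:" >&2
    echo "$offenders" >&2
    exit 1
  fi
)
```

To use it as a hook, put the function in an executable .git/hooks/pre-commit script and call check_staged_sizes as the last line.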

Use Git LFS

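For large files that do need versioning, Git LFS stores the file contents outside the regular history and commits only small pointer files:

git lfs install
git lfs track "*.model"
git add .gitattributes
git commit -m "Track model files with Git LFS"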
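
The git lfs track command works by writing a filter=lfs rule into .gitattributes, so you can check whether a given path would go through LFS by asking Git which filter applies to it. The helper name tracked_by_lfs below is an assumption for illustration; the check itself uses plain git check-attr and does not require the LFS client.

```shell
# tracked_by_lfs PATH
# Succeeds when the attributes in effect assign the "lfs" filter to PATH.
# Run from the top of the working tree; the file does not need to exist yet.
tracked_by_lfs() {
  [ "$(git check-attr filter -- "$1")" = "$1: filter: lfs" ]
}
```

For example, tracked_by_lfs weights.model should succeed after the commands above, while tracked_by_lfs README.md should fail.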

Conclusion

Keeping your Git repository clean isn’t just about saving space. It’s about ensuring long-term maintainability, faster CI/CD pipelines, and a better experience for everyone on your team. With tools like BFG Repo-Cleaner, the process is efficient, safe, and effective.

By following the steps in this guide, you can ensure your repositories remain lean and optimized for performance. Happy coding!

Amritpal

I’m the owner of “DevOpsTechy.online” and have been in the industry for 6+ years. What I’ve noticed particularly about the industry is that it reacts slowly to the rapidly changing world of technology. I’ve done my best to introduce new technology into the community with the hopes that more technology can be utilized to serve our customers. I’m going to educate and at times demonstrate that technology can help businesses innovate and thrive. Throwing in a little bit of fun and entertainment couldn’t hurt, right?
