How to delete old files in Git while keeping history?
I'm far from an expert Git user. I'm usually the only one working in a repository at a time, and know how to do the basics like committing snapshots. I still have to look up how to create and merge branches on the relatively rare occasions I need to use them.
We have some repositories that have grown large, past the limit of the free hosting service we are using. Early on, some large files that don't really need to be tracked (like .EXEs) were accidentally included in the files Git tracks. There are also a lot of old versions of some large files that we'll never go back to. It would be nice to delete both these kinds of files, but still keep the commit history.
Of course we could copy the Git repository, clean it out properly, make sure only the files we really need to be tracked are tracked, create a whole new repository and delete the old one (after archiving it on long-term media, of course). However, that loses the convenience of having the history easily accessible in one place.
Is there a way to fully delete files, with all their versions, from a Git repository while still retaining the history and old versions of all other files?
1 answer
Depending on some semantics, it is possible to do what you want, for a sufficiently motivated cohort of "we." The tricky part is that you're rewriting history, and everyone has to agree on that new history.
Specifically, when you say
It would be nice to delete both these kinds of files, but still keep the commit history.
I'm interpreting that as "otherwise keep the commit messages with the diffs of files not to be removed." If you want to get technical, there is no way to purge the files from history without rewriting that history.
Background
The commit SHA, the basic identifier of a specific commit, is dependent on the state of the files in the repository.[1] The implication here is that every single commit forward of your oldest one with an unwanted file will have a different SHA than you use now.
When you're the only consumer of the repository,[2] this isn't too much of a concern. Just make the change, git push --force, and move on.
If you are not the only user, everyone else will need to reset all of their local and remote branches to stem from your changed history instead of the common commit.
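To make the force-push and the collaborator reset concrete, here is a minimal sketch in a throwaway directory. The repository names (alice, bob, origin.git), the file names, and the branch name master are all made up for illustration; in a real setup, anyone with unpushed local work would need to rebase it onto the new history rather than simply resetting.

```shell
#!/bin/sh
set -eu

# Everything happens in a temporary directory; all names are hypothetical.
work=$(mktemp -d)
cd "$work"

# "alice" is the person who will rewrite history.
git init -q alice
cd alice
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "v1" > main.c
git add main.c
git commit -q -m "Initial import"
echo "v2" > main.c
echo "pretend this is huge" > app.exe
git add main.c app.exe
git commit -q -m "Edit source"
cd ..

# A shared bare repository, plus a second user "bob" who cloned the old history.
git clone -q --bare alice origin.git
git -C alice remote add origin "$work/origin.git"
git clone -q origin.git bob

old=$(git -C alice rev-parse HEAD)

# Alice drops the binary from the latest commit, which gives that commit a new SHA.
git -C alice rm -q app.exe
git -C alice commit -q --amend --no-edit
git -C alice push -q --force origin master

new=$(git -C alice rev-parse HEAD)
echo "old head: $old"
echo "new head: $new"

# Bob discards his copy of the old history and follows the rewritten branch.
git -C bob fetch -q origin
git -C bob reset -q --hard origin/master
```

After the reset, bob's master points at the same commit as alice's, and app.exe is gone from his working tree.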
Rewriting history
Interactive rebase
Recent commits[3] can be fixed with an interactive rebase, as a commenter mentioned.
I personally like to tag the current head (say git tag tmp/master), find the commit that made the change (say it's deadbeef), and interactive rebase against the next older one: git rebase --interactive deadbeef^.
You'll see a list of commits in your editor with pick next to each one. Change pick to edit next to the commit(s) you need to modify, save, and quit. At each pause, Git stops so you can set the files to the state you wish. Then git add them, commit (git commit --amend folds the fix into the paused commit), and git rebase --continue.
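The whole sequence can be tried safely in a scratch repository. The sketch below is only an illustration: the file names and commit messages are invented, and GIT_SEQUENCE_EDITOR with sed stands in for changing pick to edit by hand in your editor, so the script runs without prompting.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing real is touched; all names are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"

echo "source v1" > main.c
git add main.c
git commit -q -m "Add source"

# The commit we later regret: a large binary sneaks in next to a real change.
echo "source v2" > main.c
echo "pretend this is huge" > app.exe
git add main.c app.exe
git commit -q -m "Update source"
bad=$(git rev-parse HEAD)

echo "source v3" > main.c
git commit -q -am "More work"

# Safety net: tag the current head before rewriting anything.
git tag tmp/master

# Rebase onto the parent of the bad commit. The sed call rewrites the first
# todo line (the bad commit) from "pick" to "edit", as you would by hand.
GIT_SEQUENCE_EDITOR="sed -i.bak '1s/^pick/edit/'" \
    git rebase --interactive "$bad^"

# Git has now paused on the bad commit: remove the file and amend the commit.
git rm -q app.exe
git commit -q --amend --no-edit

# Replay the remaining commits on top of the fixed one.
# GIT_EDITOR=true only guards against an editor opening unexpectedly.
GIT_EDITOR=true git rebase --continue

# The rewritten branch no longer mentions app.exe at all.
git log --oneline master
```

Note that the old commits, including the binary, stay reachable through the tmp/master tag. Once you are happy with the result, git tag -d tmp/master removes that safety net.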
Bigger tools for older commits
Usually when someone grabs a big hammer to rewrite early history, it's because they put an important credential into Git history that needs to be excised in all its forms.
Then, you'll need to employ a tool like git-filter-repo or git-filter-branch to purge unwanted files from all of history.
Consider practicing on a clone first!! These tools have extreme destructive power.
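As a rough sketch of such a practice run, the script below builds a small stand-in repository, clones it, and purges one file from every commit of the clone. It uses git filter-branch only because that ships with Git itself; the repository layout and the file name app.exe are assumptions for illustration. With the separately installed git-filter-repo, the comparable invocation would be git filter-repo --invert-paths --path app.exe.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the real repository; contents are hypothetical.
git init -q original
cd original
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
echo "v1" > main.c
echo "big" > app.exe
git add main.c app.exe
git commit -q -m "Initial import"
echo "v2" > main.c
git commit -q -am "Edit source"
echo "bigger" > app.exe
git commit -q -am "Rebuild binary"
cd ..

# Practice on a clone, leaving the original untouched.
git clone -q original practice
cd practice

# Remove app.exe from the index of every commit on every ref.
# --ignore-unmatch keeps commits that never contained the file from failing,
# and --prune-empty drops commits that end up with no changes at all.
# The environment variable only silences filter-branch's startup warning.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --prune-empty \
    --index-filter 'git rm -q --cached --ignore-unmatch app.exe' -- --all

# Inspect the rewritten history.
git log --oneline master
```

In this example the "Rebuild binary" commit disappears entirely, because it touched nothing but the purged file. filter-branch keeps a backup of the old refs under refs/original/, so the clone does not shrink on disk until those backups are removed and the repository is garbage-collected.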