
Is it possible to download a single file from specified commits from remote?

Tags: git

I'm trying to check out a particular file from particular commits in a remote.

Please note that the commits are not in my local repo; they exist only in the remote repo.

  1. I do not want to download the raw file through a GitHub/Bitbucket-style web interface, because my remote is not hosted on such a platform.

  2. I do not want to do git fetch followed by git checkout because doing git fetch will download a bunch of other items which I don't want. I'm only interested in that particular file from the particular commit.

itthrill asked Oct 23 '25 04:10



1 Answer

Edit, from the comments:

"I need to check the same file from hundreds of commits spread over a dozen branches."

For this, you're going to need cooperation from the other repo's admin.

In Git, history is published by giving it a refname (branch, tag, whatever) and some sort of access via shared filesystem or hosted server.

The stuff that's not worth giving its own refname is either part of published history (that does have its own refname) or it's not.

If it is, Git will ensure you get a complete, internally-consistent pack that brings you up to date with the published history you asked for. Git's laser-focused on making that specific operation as fast and efficient as possible.

If it's not, then the hosting repo hasn't published it and (a) you ordinarily can't get it at all, and (b) you ordinarily don't even know what to ask for, namely its object id.

To find an object's id, you have to hunt through history examining snapshots, ... which means you have to have the snapshots ... see?

Git doesn't like paying overhead costs twice, and it's built to be a VCS. You're trying to use it like a shared filesystem. Filesystems are built to be efficient at serving single objects frequently and repeatedly to the same client. DVCSs are built to be efficient at serving multiple complete revisions, at relatively long intervals, once per client. This is engineering-tradeoff territory: you can't be superbly efficient at both, and the better you get at one or the other, the harder it is to re-tool and do the other thing.

All that said: if you can get the other repo admin to do some custom work for you, this won't be hard:

git rev-list --branches --objects -- path/to/file | git pack-objects pack

will pack up the history of all branches' versions of that file: the commits that introduce new versions, the trees that show where they go, and their contents, and put it in two files named pack-<hashcode>.{idx,pack}. Put that pack in any repo's objects/pack directory and there you are: you've got everything you need to deal with just that file.
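A minimal end-to-end sketch of that workflow, run against a throwaway repository so it is safe to try. The directory names (src, dst), the path dir/file.txt, and the two demo commits are made up for illustration; only the rev-list | pack-objects pipeline comes from the answer.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the remote ("admin side") repository: two versions of one
# file, plus an unrelated file that should not need to be shipped.
git init -q src
cd src
git config user.email demo@example.invalid
git config user.name demo
mkdir dir
echo one > dir/file.txt
echo noise > other.txt
git add .
git commit -q -m 'first version'
echo two > dir/file.txt
git commit -q -am 'second version'
tip=$(git rev-parse HEAD)

# Pack the commits, trees and blobs that rev-list reports for dir/file.txt.
# pack-objects prints the hash that names the pack-<hash>.{idx,pack} pair.
hash=$(git rev-list --branches --objects -- dir/file.txt | git pack-objects -q ../pack)
cd ..

# Receiving side: an empty repository that just gets the pack dropped in.
git init -q dst
cp "pack-$hash.pack" "pack-$hash.idx" dst/.git/objects/pack/

# No fetch happened, yet the file is readable at the packed commit.
content=$(git -C dst show "$tip:dir/file.txt")
echo "$content"
```

In real use the two halves run on different machines, and the pack-<hash>.pack and pack-<hash>.idx files are whatever the admin sends you.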

Such a sliced-up history is relatively difficult to work with, and the overhead of filling in the missing bits on demand is precisely what Git's built to avoid. But to work with exactly what you've got, you can use e.g. git verify-pack -v to show you the exact contents of a pack and git cat-file -p to print individual objects. The commits in that pack are the ones that introduce new versions; you refer to your file in one of those by appending :path/to/file to its commit id.

So, when you run the verify-pack to see what you've got, you'll get a dump of waaayyyy too much information about its content and structure. To make it useful for your purposes here, you can scrape just the commit ids out, and list those by date order, with

# this is the pack I made for testing 
git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk

Sub in log for rev-list to see the log messages, or show the blob you fetched with e.g. git show <commit-hash>:path/to/file. To show the blobs in time sequence you can run

git verify-pack -v .git/objects/pack/pack-8d3bb7bca6a4cdc086778ad55c79f45e672ae7e5.idx \
| awk '$2=="commit"{print $1}' \
| git rev-list --stdin --date-order --no-walk --pretty=%h:path/to/file \
| git cat-file --batch

which will dump the content in scannable form.
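If you would rather have each version as its own file than as one stream, the same commit list can drive a loop. This is only a sketch: dump_versions is a hypothetical helper, not a Git command, and the demo repository, the file name notes.txt, and the output directory are invented. For the demo the pack comes from git repack rather than from a sliced pack, but the commit extraction is the same.

```shell
#!/bin/sh
set -eu

# dump_versions IDX FILE OUTDIR  (hypothetical helper, not a Git command)
# Writes FILE as of each commit listed in pack index IDX to
# OUTDIR/<n>-<commit>, numbered in the order rev-list prints them.
dump_versions() {
    idx=$1 file=$2 outdir=$3
    mkdir -p "$outdir"
    n=0
    git verify-pack -v "$idx" \
    | awk '$2=="commit"{print $1}' \
    | git rev-list --stdin --date-order --no-walk \
    | while read -r commit; do
        n=$((n + 1))
        out=$outdir/$(printf '%04d' "$n")-$commit
        # A commit without the file (e.g. one that deleted it) is skipped.
        git show "$commit:$file" > "$out" 2>/dev/null || rm -f "$out"
    done
}

# Demo on a throwaway repository with two versions of notes.txt.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.email demo@example.invalid
git config user.name demo
echo one > notes.txt
git add notes.txt
git commit -q -m v1
echo two > notes.txt
git commit -q -am v2
git repack -a -d -q
idx=$(ls .git/objects/pack/pack-*.idx)

dump_versions "$idx" notes.txt "$work/out"
ls "$work/out"
```

With hundreds of commits, the numbered prefix keeps the files in rev-list order so they can be diffed in sequence.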

. . . actually, if an all-in-one dump of the history will do ya, and you just need the content and sequence to match, not so much the resulting commit ids, Git's fast-export might do the whole job for you. Have the admin run

git fast-export --branches -- path/to/file | zstd >my-stuff.zst

which might even be more compact than the pack files (since it doesn't have to preserve ids), and ship that to you.
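The answer stops at producing the stream; on the receiving end, one way to use it is to replay it with git fast-import into an empty repository. A rough sketch, with a throwaway source repository standing in for the admin's. The names src, dst, dir/file.txt and my-stuff.fi are assumptions for the demo, and the zstd step is left out so it runs without zstd installed; with compression the import would be zstd -dc my-stuff.zst | git fast-import.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)
cd "$work"

# Stand-in for the admin's repository.
git init -q src
git -C src config user.email demo@example.invalid
git -C src config user.name demo
mkdir src/dir
echo one > src/dir/file.txt
git -C src add dir/file.txt
git -C src commit -q -m 'first version'
echo two > src/dir/file.txt
git -C src commit -q -am 'second version'
branch=$(git -C src symbolic-ref --short HEAD)

# Admin side: export the history of the one path as a fast-import stream.
git -C src fast-export --branches -- dir/file.txt > my-stuff.fi

# Receiving side: replay the stream into an empty repository.
git init -q dst
git -C dst fast-import --quiet < my-stuff.fi

# The imported branch has new commit ids but the same content and order.
latest=$(git -C dst show "$branch:dir/file.txt")
versions=$(git -C dst rev-list --count "$branch")
echo "$versions versions, latest: $latest"
```

Because the commits are rebuilt on import, their ids will not match the ones in the original repository, which is the tradeoff the answer mentions.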

jthill answered Oct 25 '25 18:10



