Extract commits related to code changes from commit tree

I can currently traverse the commit tree of a GitHub repository using the pygit2 library, and I get every commit for every file change in the repository. That includes changes to text documents with the .rtf extension. How do I filter the commits so that I only get the ones related to code changes? I don't want the changes related to text documents.

Appreciate any help or pointers. Thanks.

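# Assumes repo is an open pygit2.Repository and outputFile is the output path.
last = repo[repo.head.target]
t0 = last
print(t0.id)

with open(outputFile, 'w') as f:
    for commit in repo.walk(last.id):
        # Skip the starting commit itself
        if t0.id == commit.id:
            continue

        print(commit.id)
        # Diff between the previously visited commit and this one
        out = repo.diff(t0, commit)
        f.write(out.patch)
        t0 = commit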

The output also includes the diffs for .rtf files, for example:

diff --git a/archived-output/NEW/action-core[best].rtf b/archived-output/NEW/action-core[best].rtf
deleted file mode 100644
index 56cdec6..0000000
--- a/archived-output/NEW/action-core[best].rtf
+++ /dev/null
@@ -1,8935 +0,0 @@
-{\rtf1\adeflang1025\ansi\ansicpg1252\uc1\adeff31507\deff0\stshfdbch31506\stshfloch31506\stshfhich31506\stshfbi31507\deflang1033\deflangfe1033\themelang1033\themelangfe0\themelangcs0{\fonttbl{\f0\fbidi \froman\fcharset0\fprq2{\*\panose 02020603050405020304}Times New Roman;}{\f1\fbidi \fswiss\fcharset0\fprq2{\*\panose 020b0604020202020204}Arial;}
-{\f2\fbidi \fmodern\fcharset0\fprq1{\*\panose 02070309020205020404}Courier New;}{\f3\fbidi \froman\fcharset2\fprq2{\*\panose 05050102010706020507}Symbol;}

Either I have to filter the commits from the tree or I have to filter the output. I was wondering whether I could remove the changes related to .rtf files by skipping the corresponding commits while walking through the tree.

Asked by Zack, May 07 '26 02:05


1 Answer

If that is possible, how do we get the list of modified files?

Ah, now you're asking the right questions! Git, of course, does not store a list of modified files in each commit. Rather, each commit represents the state of the entire repository at a certain point in time. In order to find the modified files, you need to compare the files contained in one commit with the previous commit.
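To make the idea concrete without touching a real repository, here is a minimal sketch that treats each commit as a plain dictionary mapping file paths to content identifiers and derives the added, modified, and deleted files by comparing two such snapshots. The function name compare_snapshots and the short identifiers are made up for illustration; this is not how pygit2 or Git implement it internally.

```python
def compare_snapshots(old, new):
    """Return {path: status} for two {path: content_id} snapshots.

    Status letters follow git's convention: A (added), D (deleted),
    M (modified). Unchanged paths are left out.
    """
    changes = {}
    for path in sorted(set(old) | set(new)):
        if path not in old:
            changes[path] = 'A'
        elif path not in new:
            changes[path] = 'D'
        elif old[path] != new[path]:
            changes[path] = 'M'
    return changes


# Hypothetical snapshots; the ids stand in for blob hashes.
parent = {
    'kubecluster.yaml': 'a1',
    'kubenode.yaml': 'e5',
    'fragments/configure-flannel.sh': 'b2',
}
commit = {
    'kubecluster.yaml': 'c3',
    'kubenode.yaml': 'e5',
    'fragments/write-flannel-config.sh': 'd4',
}

for path, status in compare_snapshots(parent, commit).items():
    print(status, ':', path)
```

The unchanged kubenode.yaml does not show up in the result, which is the point: the list of modified files only exists once two snapshots are compared.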

For each commit returned by repo.walk(), the tree attribute refers to the associated Tree object (which is itself a list of TreeEntry objects representing files and directories contained in that particular Tree).

A Tree object has a diff_to_tree() method that can be used to compare it against another Tree object. This returns a Diff object, which acts as an iterator over a list of Patch objects. Each Patch object refers to the changes in a single file between the two Trees that are being compared.

The Patch object is really the key to all this, because this is how we determine which files have been modified.
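Once you have the path of a changed file, deciding whether it counts as "code" can be as simple as looking at its extension. The sketch below is one possible way to do that; the helper name is_code_path and the list of document extensions are assumptions for illustration, so adjust them to whatever your repository actually contains.

```python
import os.path

# Hypothetical list of extensions to treat as documents rather than code.
DOCUMENT_EXTENSIONS = {'.rtf', '.doc', '.docx', '.pdf'}


def is_code_path(path, document_extensions=DOCUMENT_EXTENSIONS):
    """Return True unless the path ends in a known document extension."""
    extension = os.path.splitext(path)[1].lower()
    return extension not in document_extensions


paths = [
    'archived-output/NEW/action-core[best].rtf',
    'fragments/configure-flannel.sh',
    'kubecluster.yaml',
]
for path in paths:
    print(path, '->', 'code' if is_code_path(path) else 'document')
```

A deny-list like this keeps files without an extension (a Makefile, say) on the code side; an allow-list of code extensions would be the stricter alternative.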

The following code demonstrates this. For each commit, it will print a list of new, modified, or deleted files:

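import pygit2


repo = pygit2.Repository('.')

# repo.walk() yields the newest commit first, so on a linear history
# "cur" is the parent of "prev". Merges would need per-parent handling.
prev = None
for cur in repo.walk(repo.head.target):

    if prev is not None:
        print(prev.id)
        # Changes going from the older tree (cur) to the newer one (prev)
        diff = cur.tree.diff_to_tree(prev.tree)
        for patch in diff:
            # Older pygit2 releases exposed patch.status and
            # patch.new_file_path directly on the Patch object.
            delta = patch.delta
            line = '%s : %s' % (delta.status_char(), delta.new_file.path)
            if delta.new_file.path != delta.old_file.path:
                line += ' (was %s)' % delta.old_file.path
            print(line)

    prev = cur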

If we run this against a sample repository, we can look at the output for the first few commits:

c285a21e013892ee7601a53df16942cdcbd39fe6
D : fragments/configure-flannel.sh
A : fragments/flannel-config.service.yaml
A : fragments/write-flannel-config.sh
M : kubecluster.yaml
b06de8f2f366204aa1327491fff91574e68cd4ec
M : fragments/enable-services-master.sh
M : fragments/enable-services-minion.sh
c265ddedac7162c103672022633a574ea03edf6f
M : fragments/configure-flannel.sh
88a8bd0eefd45880451f4daffd47f0e592f5a62b
A : fragments/configure-docker-storage.sh
M : fragments/write-heat-params.yaml
M : kubenode.yaml

And compare that to the output of git log --oneline --name-status:

c285a21 configure flannel via systemd unit
D       fragments/configure-flannel.sh
A       fragments/flannel-config.service.yaml
A       fragments/write-flannel-config.sh
M       kubecluster.yaml
b06de8f call daemon-reload before starting services
M       fragments/enable-services-master.sh
M       fragments/enable-services-minion.sh
c265dde fix json syntax problem
M       fragments/configure-flannel.sh
88a8bd0 configure cinder volume for docker storage
A       fragments/configure-docker-storage.sh
M       fragments/write-heat-params.yaml
M       kubenode.yaml

...aaaand, that looks just about identical. Hopefully this is enough to get you started.
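To get back to the original goal of leaving .rtf changes out of the output file, one option is to write the patches one file at a time and skip the unwanted paths. The following is only a sketch: write_code_patches and SKIP_SUFFIXES are hypothetical names, and it assumes each patch exposes patch.delta.new_file.path and patch.text, as recent pygit2 releases do (older releases used patch.new_file_path instead). The stand-in objects below let it run without a repository.

```python
import io
from types import SimpleNamespace

# Hypothetical list of suffixes to leave out of the output.
SKIP_SUFFIXES = ('.rtf',)


def write_code_patches(diff, out, skip_suffixes=SKIP_SUFFIXES):
    """Write the text of every patch whose path is not skipped.

    "diff" is any iterable of patch-like objects; "out" is a writable
    text file. Returns the number of patches written.
    """
    written = 0
    for patch in diff:
        path = patch.delta.new_file.path
        if path.lower().endswith(skip_suffixes):
            continue
        out.write(patch.text)
        written += 1
    return written


def fake_patch(path, text):
    """Build a stand-in with the same attribute shape as a patch."""
    new_file = SimpleNamespace(path=path)
    return SimpleNamespace(delta=SimpleNamespace(new_file=new_file), text=text)


fake_diff = [
    fake_patch('archived-output/NEW/action-core[best].rtf', 'rtf diff\n'),
    fake_patch('kubenode.yaml', 'yaml diff\n'),
]
buffer = io.StringIO()
count = write_code_patches(fake_diff, buffer)
print(count, 'patch(es) written')
```

With a real repository you would pass the Diff returned by diff_to_tree() (or repo.diff()) in place of fake_diff and an open file in place of the StringIO buffer. A commit that only touches .rtf files then simply contributes nothing to the output.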

Answered by larsks, May 08 '26 14:05