
How to do proper Git compatible hex sha packing/compression in Go

Tags:

git

ruby

go

I am going through the book Building Git by James Coglan, where James walks you through implementing a basic version of Git in Ruby. I decided to make things more complicated for myself by doing my implementation in Go.

I've gotten to the part where I need to store compressed hashes of file contents into a tree to write to disk, but I am having trouble doing this kind of hex compression/packing that Git is looking for.

Here is the Ruby code I'm working from:

ENTRY_FORMAT = "A7Z*H40"
MODE = "100644"
FILE_NAME = "tree.rb"
SHA = "baae99010b237a699ff0aba02fd5310c18903b1b"
[MODE, FILE_NAME , SHA].pack(ENTRY_FORMAT)

According to the book, the Ruby pack method works like this:

The Array#pack method takes an array of various kinds of values and returns a string that represents those values. Exactly how each value gets represented in the string is determined by the format string we pass to pack.

The encoding for the MODE and FILE_NAME I think I am pretty good on. It's the last part that encodes the sha that I am struggling with.

• H40: this encodes a string of forty hexadecimal digits, entry.oid, by packing each pair of digits into a single byte

It's the "packing each pair of digits into a single byte" part that I can't get my head around. This is my current attempt:

mode := 100644
fileName := "tree.go"
sha:= "baae99010b237a699ff0aba02fd5310c18903b1b"
// slice of strings for constructing the packed sha
var eid []string

// iterate through each character in id
for i := 0; i < len(sha); i += 2 {
    // gathering them in pairs of two
    one, two := sha[i], sha[i+1]
    // compress two digits into one byte
    // using bitwise or?? addition?? bit shifting?? not sure.
    eid = append(eid, string(one|two))
}
// concat the new packed id with the mode and file name.
stringRep := fmt.Sprintf("%-7d", mode) + fileName + "\x00" + strings.Join(eid, "")

Go playground for above code

For some reason that I can't figure out, the string representation of a tree entry that this code produces isn't compatible with how Git stores trees on disk. I've tried shifting the bits before ORing them, and I've tried just adding the bytes together, but nothing seems to work. I basically need to replicate the behavior of the Ruby Array#pack method in a way that Git will accept.

Any guidance or advice is greatly appreciated. I'd be happy to explain more or post more code samples if necessary. Thank you so much for your time!

P.S. More context on the packing Git performs, from Building Git:

Git is storing the ID of each entry in a packed format, using twenty bytes for each one. Each hexadecimal digit represents a number from zero to fifteen, where ten is represented by a, eleven by b, and so on up to f for fifteen. In a forty-digit object ID, each digit stands for four bits of a 160-bit number. Instead of splitting those bits into forty chunks of four bits each, we can split it into twenty blocks of eight bits—and eight bits is one byte. So all that’s happening here is that the 160-bit object ID is being stored in binary as twenty bytes, rather than as forty characters standing for hexadecimal digits.

softpunk asked Oct 22 '25 04:10


1 Answer

The functions to convert between binary data and hexadecimal strings are in the standard library's encoding/hex package.

For example, the function that turns a hex string into a byte slice (where each byte holds two of the original hex digits) is hex.DecodeString, or hex.Decode if your input is a []byte instead of a string.
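As a rough sketch of how this could be applied to the tree entry from the question: the helper name packEntry is hypothetical, and the layout (mode padded to seven characters, NUL-terminated file name, then the 20 raw bytes) simply mirrors the "A7Z*H40" format string quoted above.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// packEntry is a hypothetical helper that mimics Ruby's
// [mode, name, sha].pack("A7Z*H40") for a single tree entry.
func packEntry(mode, name, sha string) ([]byte, error) {
	// H40: decode 40 hex digits into 20 raw bytes.
	oid, err := hex.DecodeString(sha)
	if err != nil {
		return nil, err
	}
	// A7: mode left-aligned and space-padded to 7 characters.
	// Z*: file name followed by a NUL byte.
	entry := []byte(fmt.Sprintf("%-7s%s\x00", mode, name))
	return append(entry, oid...), nil
}

func main() {
	entry, err := packEntry("100644", "tree.go", "baae99010b237a699ff0aba02fd5310c18903b1b")
	if err != nil {
		panic(err)
	}
	// 7 (mode) + 7 (name) + 1 (NUL) + 20 (oid) = 35 bytes.
	fmt.Printf("%d bytes: %q\n", len(entry), entry)
}
```

Working with []byte rather than string throughout keeps the raw object ID bytes from being reinterpreted as UTF-8 text along the way.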


If you want to re-implement this function yourself:

  • convert each character of the input string to its numerical value (0 to 15) rather than using its ASCII code,
  • treat each pair of values as a two-digit number in base 16, with the first digit as the high-order one: var newByte byte = 16*one + two
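The two steps above can be sketched by hand roughly as follows. The names hexVal and packHex are made up for illustration. The attempt in the question goes wrong at the first step: sha[i] yields the ASCII code of the character (for example 0x62 for 'b'), not the digit value 11, so ORing two of those never gives the intended byte.

```go
package main

import "fmt"

// hexVal converts one ASCII hex digit to its numeric value (0..15).
// The second result is false if c is not a hex digit.
func hexVal(c byte) (byte, bool) {
	switch {
	case c >= '0' && c <= '9':
		return c - '0', true
	case c >= 'a' && c <= 'f':
		return c - 'a' + 10, true
	case c >= 'A' && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// packHex packs each pair of hex digits into a single byte.
func packHex(s string) ([]byte, error) {
	if len(s)%2 != 0 {
		return nil, fmt.Errorf("odd-length hex string (%d digits)", len(s))
	}
	out := make([]byte, 0, len(s)/2)
	for i := 0; i < len(s); i += 2 {
		hi, ok1 := hexVal(s[i])
		lo, ok2 := hexVal(s[i+1])
		if !ok1 || !ok2 {
			return nil, fmt.Errorf("invalid hex digit in pair at index %d", i)
		}
		// hi<<4 | lo is the same as 16*hi + lo when both are below 16.
		out = append(out, hi<<4|lo)
	}
	return out, nil
}

func main() {
	packed, err := packHex("baae99010b237a699ff0aba02fd5310c18903b1b")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: % x\n", len(packed), packed)
}
```

For the first pair "ba", the digit values are 11 and 10, so the packed byte is 16*11 + 10 = 186, which is 0xba.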
LeGEC answered Oct 23 '25 18:10


