Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Creating URL ShortCode in C#

Tags:

c#

hash

md5

I am using this article to create a short code for a URL.

I've been working on this for a while and the pseudo code is just not making any sense to me. He states in "loop1" that I'm supposed to look from the first 4 bytes to the 4th 4 bytes, and then cast the bytes to an integer, then convert that to bits. I end up with 32 bits for each 4 bytes, but he's using 5 bytes in the "loop3" which isn't divisible by 32. I am not understanding what he's trying to say.

Then I noticed that he closes "loop2" at the bottom after you've written the short code to the database. That's not making any sense to me because I would be writing the same short code to the database over and over again.

Then I have "loop1" which is going to loop into infinity, again I'm not seeing why I would need to update the database to infinity.

I have tried to follow his example and ran it through the debugger line-by-line, but it's not making sense.

Here is the code I have so far, according to what I've been able to understand:

        private void button1_Click(object sender, EventArgs e)
    {
        string codeMap = "abcdefghijklmnopqrstuvwxyz012345"; // 32 bytes

        // Compute MD5 Hash
        MD5 md5 = MD5.Create();
        byte[] inputBytes = Encoding.ASCII.GetBytes(txtURL.Text);
        byte[] hash = md5.ComputeHash(inputBytes);

        // Loop from the first 4 bytes to the 4th 4 bytes
        byte[] FourBytes = new byte[4];
        for (int i = 0; i <= 3; i++)
        {
            FourBytes[i] = hash[i];
            //int CastedBytes = FourBytes[i];
            BitArray binary = new BitArray(FourBytes);
            int CastedBytes = 0;
            for(int ii = 0; i <=5; i++)
            {
                CastedBytes = CastedBytes + ii;
            }

        }

Can someone help me figure out what I'm doing wrong, so I can get this program working? I just need to convert URLs into short 6-digit unique codes.

Thanks.

like image 955
Slower is Faster Avatar asked Nov 18 '25 21:11

Slower is Faster


1 Answers

Your MD5 hash is 128 bits. The idea is to represent those 128 bits in 6 characters, ideally without losing any information.

The codeMap contains 32 characters

string codeMap = "abcdefghijklmnopqrstuvwxyz012345"

Note that 2^5 is also 32. The third loop is using 5 bits of the hash at a time, and converting those 5 bits to a character in the codeMap. For example, for the bit pattern

00001 00011 00100
  b     d     e

The algorithm uses 6 sets of 5 bits, so 30 bits in total. 2 bits are "wasted".

Note though that the 128 bit MD5 is being taken 4 bytes at a time, and those 4 bytes are converted to an integer. That is one approach to consuming the bits of the MD5, but certainly not the only one. It involves bit masking and bit shifting.

You may find it more straightforward to use a BitArray for the implementation. While this is probably slightly less efficient, it will not likely matter. If you go that path, initialize the BitArray with the bits of your MD5 hash, and then just take 5 bits at a time, converting them to a number in the range 0..31 to use as an index into codeMap.

This bit from the article is misleading

6 characters of short code can used to map 32^6 (1,073,741,824) URLs so it is unlikely to be used up in the near future

Due to the possibility of hash collisions, the system can manage far fewer than 1 billion URLs without a significant risk of the same short URL being assigned to two long URLs. See the Birthday Problem for more.

like image 191
Eric J. Avatar answered Nov 21 '25 12:11

Eric J.