I work with a home-made board with an ARM chip, programmed in C. My code has to execute fast, and I also have space constraints. I now have to write a "parser" for hexadecimal values: each hex value must be associated with a decimal number (1, 2, 3 or 4). For now, and I don't think it will change much in the future, I have 100 hex values to parse. The hex values are "random"; there is no particular pattern.
I started with a switch / case, like this:
switch (i)
{
    case 0xF3:
    case 0xF7:
    case 0x02:
        return 1;
        break;
    case 0x20:
    case 0x40:
    case 0xE0:
    case 0xC0:
        return 2;
        break;
    case 0x21:
    case 0x41:
    case 0x81:
    case 0x61:
    case 0xA1:
        return 3;
        break;
    case 0xBB:
    case 0xCC:
    case 0x63:
        return 4;
        break;
    default:
        return 0;
        break;
}
But I was thinking about a hashtable instead. Of course, it would be faster in the worst case, but it would take more space. Is a hashtable really worth it for 100 values?
Thanks for your answers; if you need more details, just ask!
Antoine
You can put the values in a 256-byte array for fast access:
static uint8_t const table[256] = { 2, 3, 1, 4, ... };
return table[i];
You are only using 100 of the 256 values, so there are "holes" of wasted space, but it can still compete with the switch.
But since you only need 4 distinct values, each one can be represented in 2 bits, so you can pack four values into one byte. Just store values 0-3 instead of 1-4:
#define PACK4(a, b, c, d) \
    ((((a) - 1) << 0) | (((b) - 1) << 2) | (((c) - 1) << 4) | (((d) - 1) << 6))
static uint8_t const table[64] = { PACK4(2, 3, 1, 4), PACK4(... };
int byteOffset = i / 4;
int bitOffset = (i % 4) * 2;
return ((table[byteOffset] >> bitOffset) & 0x03) + 1;
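Put together, the whole lookup can be wrapped into one small function. This is only a sketch: the table below is abbreviated, and the remaining 63 PACK4(...) entries would have to be filled in to match your actual mapping.

#include <stdint.h>

#define PACK4(a, b, c, d) \
    ((((a) - 1) << 0) | (((b) - 1) << 2) | (((c) - 1) << 4) | (((d) - 1) << 6))

/* 64 packed bytes cover all 256 possible inputs, four 2-bit entries per byte. */
static uint8_t const table[64] = {
    PACK4(2, 3, 1, 4), /* entries for inputs 0x00..0x03 (illustrative values) */
    /* ... remaining 63 PACK4(...) entries ... */
};

uint8_t lookup(uint8_t i)
{
    int byteOffset = i / 4;        /* which byte holds this input's entry */
    int bitOffset = (i % 4) * 2;   /* which 2-bit field within that byte  */
    return ((table[byteOffset] >> bitOffset) & 0x03) + 1;
}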
As you've noted, there are two ways of implementing this. In order to decide which method is preferable, you need to analyze each one under two aspects: memory consumption and execution performance. Note, however, that the answer to your question is platform-dependent.
In order to analyze the memory consumption, you need to compile both functions, check the disassembly and determine the amount of memory used by each one. Bear in mind that memory is used not only for variables, but also for code.
In order to analyze the execution performance, you basically need to run both functions a large number of times and measure the average duration of each one. Keep in mind that running time also depends on the caching heuristics of the underlying HW architecture, so the results will not necessarily be consistent if, for example, you test the first function immediately after the second, and then test the second function immediately after the first.
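For example, a rough benchmarking harness could look like the sketch below. This is only an assumption on my part: it uses clock() from <time.h>, which is convenient on a host PC; on the embedded ARM target you would typically read a hardware timer or cycle counter instead. func1 and func2 refer to the two implementations shown below.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

uint8_t func1(uint8_t i);   /* switch/case version (defined below)  */
uint8_t func2(uint8_t i);   /* lookup-table version (defined below) */

int main(void)
{
    volatile uint8_t sink = 0;        /* keeps the calls from being optimized away */
    const long iterations = 10000000L;
    long n;

    clock_t t0 = clock();
    for (n = 0; n < iterations; n++)
        sink ^= func1((uint8_t)n);    /* cycles through all 256 possible inputs */
    clock_t t1 = clock();
    for (n = 0; n < iterations; n++)
        sink ^= func2((uint8_t)n);
    clock_t t2 = clock();

    printf("func1: %.3f s, func2: %.3f s (sink=%u)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC,
           (unsigned)sink);
    return 0;
}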
Here is the memory consumption analysis on my platform (VS2013 compiler, x64):
Method 1:
uint8_t func1(uint8_t i)
{
    switch (i)
    {
    case 0x02:
    case 0xF3:
    case 0xF7:
        return 1;
    case 0x20:
    case 0x40:
    case 0xC0:
    case 0xE0:
        return 2;
    case 0x21:
    case 0x41:
    case 0x61:
    case 0x81:
    case 0xA1:
        return 3;
    case 0x63:
    case 0xBB:
    case 0xCC:
        return 4;
    default:
        return 0;
    }
}
Disassembled into 114 bytes of code:
00007FF778131050 mov byte ptr [rsp+8],cl
00007FF778131054 push rdi
00007FF778131055 sub rsp,10h
00007FF778131059 mov rdi,rsp
00007FF77813105C mov ecx,4
00007FF778131061 mov eax,0CCCCCCCCh
00007FF778131066 rep stos dword ptr [rdi]
00007FF778131068 movzx ecx,byte ptr [i]
00007FF77813106D movzx eax,byte ptr [i]
00007FF778131072 mov dword ptr [rsp],eax
00007FF778131075 mov eax,dword ptr [rsp]
00007FF778131078 sub eax,2
00007FF77813107B mov dword ptr [rsp],eax
00007FF77813107E cmp dword ptr [rsp],0F5h
00007FF778131085 ja $LN5+10h (07FF7781310B6h)
00007FF778131087 movsxd rax,dword ptr [rsp]
00007FF77813108B lea rcx,[__ImageBase (07FF778130000h)]
00007FF778131092 movzx eax,byte ptr [rcx+rax+10D4h]
00007FF77813109A mov eax,dword ptr [rcx+rax*4+10C0h]
00007FF7781310A1 add rax,rcx
00007FF7781310A4 jmp rax
00007FF7781310A6 mov al,1
00007FF7781310A8 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310AA mov al,2
00007FF7781310AC jmp $LN5+12h (07FF7781310B8h)
00007FF7781310AE mov al,3
00007FF7781310B0 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310B2 mov al,4
00007FF7781310B4 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310B6 xor al,al
00007FF7781310B8 add rsp,10h
00007FF7781310BC pop rdi
00007FF7781310BD ret
00007FF7781310BE xchg ax,ax
00007FF7781310C0 cmps byte ptr [rsi],byte ptr [rdi]
00007FF7781310C1 adc byte ptr [rax],al
Method 2:
uint8_t func2(uint8_t i)
{
    static const uint8_t hash_table[] =
    {
        /* 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, */
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00 - 0x0F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x10 - 0x1F */
        2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x20 - 0x2F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x30 - 0x3F */
        2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x40 - 0x4F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x50 - 0x5F */
        0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x60 - 0x6F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x70 - 0x7F */
        0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x80 - 0x8F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x90 - 0x9F */
        0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xA0 - 0xAF */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, /* 0xB0 - 0xBF */
        2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, /* 0xC0 - 0xCF */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xD0 - 0xDF */
        2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xE0 - 0xEF */
        0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xF0 - 0xFF */
    };
    return hash_table[i];
}
Disassembled into 23 bytes of code:
00007FF778131030 mov byte ptr [rsp+8],cl
00007FF778131034 push rdi
00007FF778131035 movzx eax,byte ptr [i]
00007FF77813103A lea rcx,[hash_table (07FF7781368C0h)]
00007FF778131041 movzx eax,byte ptr [rcx+rax]
00007FF778131045 pop rdi
00007FF778131046 ret
And of course, 256 bytes of data.
A couple of notes:
- You referred to the execution performance of the switch/case statement in the worst case. Now, although this is not dictated by the C-language standard and every compiler may handle switch/case statements differently, a switch/case statement typically consists of a single branch, hence the number of operations executed is the same for every case.
- I declared the hash_table variable as static. As a result, this array resides in the data section instead of on the stack, and is initialized as part of the executable image (i.e., hard-coded) instead of every time the function is called. Again, this is not dictated by the C-language standard, but most C compilers handle it the same way. I cannot argue that it will improve memory consumption, since that depends on the initial amount of memory allocated to each segment (data section and stack). But it will definitely improve execution performance, since the hash table is initialized once, when the executable image is loaded into memory, rather than on every call (see the sketch below).
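To make the last point concrete, here is a minimal sketch (the function names and the abbreviated table contents are only illustrative):

#include <stdint.h>

uint8_t lookup_static(uint8_t i)
{
    /* Resides in the data section; initialized once, when the image is loaded. */
    static const uint8_t table[256] = { 0, 0, 1, /* ... */ };
    return table[i];
}

uint8_t lookup_auto(uint8_t i)
{
    /* Automatic storage: typically the 256 initializer bytes are copied onto the stack on every call. */
    const uint8_t table[256] = { 0, 0, 1, /* ... */ };
    return table[i];
}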