I work with a home-made board with an ARM chip, programmed in C. My code has to execute fast, and I also have space constraints. I now have to write a "parser" for hexadecimal values: each hex value must be associated with a decimal number (1, 2, 3 or 4). For now, and I don't think it will change much in the future, I have 100 hex values to parse. The hex values are "random"; there is no particular pattern.
I started with a switch / case, like this:
switch (i)
{
    case 0xF3:
    case 0xF7:
    case 0x02:
        return 1;
        break;
    case 0x20:
    case 0x40:
    case 0xE0:
    case 0xC0:
        return 2;
        break;
    case 0x21:
    case 0x41:
    case 0x81:
    case 0x61:
    case 0xA1:
        return 3;
        break;
    case 0xBB:
    case 0xCC:
    case 0x63:
        return 4;
        break;
    default:
        return 0;
        break;
}
But I was thinking about a hashtable instead. Of course, it would be faster in the worst case, but it would take more space. Is a hashtable really worth it for 100 values?
Thanks for your answers; if you need more details, just ask!
Antoine
You can put the values in a 256-byte array for fast access:
static uint8_t const table[256] = { 2, 3, 1, 4, ... };
return table[i];
You are only using 100 of the 256 values, so there are "holes" of wasted space, but it can still compete with the switch.
But since you only need 4 distinct values, each one can be represented in 2 bits, so you can pack four values into one byte. Just store values 0-3 instead of 1-4:
#define PACK4(a, b, c, d) \
    ((((a) - 1) << 0) | (((b) - 1) << 2) | (((c) - 1) << 4) | (((d) - 1) << 6))
static uint8_t const table[64] = { PACK4(2, 3, 1, 4), PACK4(... };
int byteOffset = i / 4;
int bitOffset = (i % 4) * 2;
return ((table[byteOffset] >> bitOffset) & 0x03) + 1;
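Put together, the whole lookup can be wrapped into one small function. This is only a sketch: the table below is abbreviated, and the remaining 63 PACK4(...) entries would have to be filled in to match your actual mapping.

#include <stdint.h>

#define PACK4(a, b, c, d) \
    ((((a) - 1) << 0) | (((b) - 1) << 2) | (((c) - 1) << 4) | (((d) - 1) << 6))

/* 64 packed bytes cover all 256 possible inputs, four 2-bit entries per byte. */
static uint8_t const table[64] = {
    PACK4(2, 3, 1, 4), /* entries for inputs 0x00..0x03 (illustrative values) */
    /* ... remaining 63 PACK4(...) entries ... */
};

uint8_t lookup(uint8_t i)
{
    int byteOffset = i / 4;        /* which byte holds this input's entry */
    int bitOffset = (i % 4) * 2;   /* which 2-bit field within that byte  */
    return ((table[byteOffset] >> bitOffset) & 0x03) + 1;
}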
As you've noted, there are two ways of implementing this. In order to decide which method is preferable, you need to analyze each one under two aspects: memory consumption and execution performance. Note, however, that the answer to your question is platform-dependent.
In order to analyze the memory consumption, you need to compile both functions, check the disassembly and determine the amount of memory used by each one. Bear in mind that memory is used not only for variables, but also for code.
In order to analyze the execution performance, you basically need to run both functions a large number of times and measure the average duration of each one. Keep in mind that running time also depends on the caching heuristics of the underlying HW architecture, so the results will not necessarily be consistent if, for example, you test the first function immediately after the second, and then test the second function immediately after the first.
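For example, a rough benchmarking harness could look like the sketch below. This is only an assumption on my part: it uses clock() from <time.h>, which is convenient on a host PC; on the embedded ARM target you would typically read a hardware timer or cycle counter instead. func1 and func2 refer to the two implementations shown below.

#include <stdint.h>
#include <stdio.h>
#include <time.h>

uint8_t func1(uint8_t i);   /* switch/case version (defined below)  */
uint8_t func2(uint8_t i);   /* lookup-table version (defined below) */

int main(void)
{
    volatile uint8_t sink = 0;        /* keeps the calls from being optimized away */
    const long iterations = 10000000L;
    long n;

    clock_t t0 = clock();
    for (n = 0; n < iterations; n++)
        sink ^= func1((uint8_t)n);    /* cycles through all 256 possible inputs */
    clock_t t1 = clock();
    for (n = 0; n < iterations; n++)
        sink ^= func2((uint8_t)n);
    clock_t t2 = clock();

    printf("func1: %.3f s, func2: %.3f s (sink=%u)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC,
           (unsigned)sink);
    return 0;
}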
Here is the memory consumption analysis on my platform (VS2013 compiler, x64):
Method 1:
uint8_t func1(uint8_t i)
{
    switch (i)
    {
    case 0x02:
    case 0xF3:
    case 0xF7:
        return 1;
    case 0x20:
    case 0x40:
    case 0xC0:
    case 0xE0:
        return 2;
    case 0x21:
    case 0x41:
    case 0x61:
    case 0x81:
    case 0xA1:
        return 3;
    case 0x63:
    case 0xBB:
    case 0xCC:
        return 4;
    default:
        return 0;
    }
}
Disassembled into 114 bytes of code:
00007FF778131050 mov byte ptr [rsp+8],cl
00007FF778131054 push rdi
00007FF778131055 sub rsp,10h
00007FF778131059 mov rdi,rsp
00007FF77813105C mov ecx,4
00007FF778131061 mov eax,0CCCCCCCCh
00007FF778131066 rep stos dword ptr [rdi]
00007FF778131068 movzx ecx,byte ptr [i]
00007FF77813106D movzx eax,byte ptr [i]
00007FF778131072 mov dword ptr [rsp],eax
00007FF778131075 mov eax,dword ptr [rsp]
00007FF778131078 sub eax,2
00007FF77813107B mov dword ptr [rsp],eax
00007FF77813107E cmp dword ptr [rsp],0F5h
00007FF778131085 ja $LN5+10h (07FF7781310B6h)
00007FF778131087 movsxd rax,dword ptr [rsp]
00007FF77813108B lea rcx,[__ImageBase (07FF778130000h)]
00007FF778131092 movzx eax,byte ptr [rcx+rax+10D4h]
00007FF77813109A mov eax,dword ptr [rcx+rax*4+10C0h]
00007FF7781310A1 add rax,rcx
00007FF7781310A4 jmp rax
00007FF7781310A6 mov al,1
00007FF7781310A8 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310AA mov al,2
00007FF7781310AC jmp $LN5+12h (07FF7781310B8h)
00007FF7781310AE mov al,3
00007FF7781310B0 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310B2 mov al,4
00007FF7781310B4 jmp $LN5+12h (07FF7781310B8h)
00007FF7781310B6 xor al,al
00007FF7781310B8 add rsp,10h
00007FF7781310BC pop rdi
00007FF7781310BD ret
00007FF7781310BE xchg ax,ax
00007FF7781310C0 cmps byte ptr [rsi],byte ptr [rdi]
00007FF7781310C1 adc byte ptr [rax],al
Method 2:
uint8_t func2(uint8_t i)
{
    static const uint8_t hash_table[] =
    {
        /* 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9, 0xA, 0xB, 0xC, 0xD, 0xE, 0xF, */
        0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x00 - 0x0F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x10 - 0x1F */
        2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x20 - 0x2F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x30 - 0x3F */
        2, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x40 - 0x4F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x50 - 0x5F */
        0, 3, 0, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x60 - 0x6F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x70 - 0x7F */
        0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x80 - 0x8F */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0x90 - 0x9F */
        0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xA0 - 0xAF */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, 0, /* 0xB0 - 0xBF */
        2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0, /* 0xC0 - 0xCF */
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xD0 - 0xDF */
        2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xE0 - 0xEF */
        0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, /* 0xF0 - 0xFF */
    };
    return hash_table[i];
}
Disassembled into 23 bytes of code:
00007FF778131030 mov byte ptr [rsp+8],cl
00007FF778131034 push rdi
00007FF778131035 movzx eax,byte ptr [i]
00007FF77813103A lea rcx,[hash_table (07FF7781368C0h)]
00007FF778131041 movzx eax,byte ptr [rcx+rax]
00007FF778131045 pop rdi
00007FF778131046 ret
And of course, 256 bytes of data.
A couple of notes:
- You referred to the execution performance of the switch/case statement in the worst case. Now, although this is not dictated by the C-language standard and every compiler may handle switch/case statements differently, a switch/case statement typically consists of a single branch, hence the number of operations executed is the same for every case.
- I declared the hash_table variable as static. As a result, this array resides in the data section instead of on the stack, and is initialized as part of the executable image (i.e., hard-coded) instead of every time the function is called. Again, this is not dictated by the C-language standard, but most C compilers handle it the same way. I cannot argue that it will improve memory consumption, since that depends on the initial amount of memory allocated to each segment (data section and stack). But it will definitely improve execution performance, since the hash table is initialized once, when the executable image is loaded into memory, rather than on every call (see the sketch below).
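To make the last point concrete, here is a minimal sketch (the function names and the abbreviated table contents are only illustrative):

#include <stdint.h>

uint8_t lookup_static(uint8_t i)
{
    /* Resides in the data section; initialized once, when the image is loaded. */
    static const uint8_t table[256] = { 0, 0, 1, /* ... */ };
    return table[i];
}

uint8_t lookup_auto(uint8_t i)
{
    /* Automatic storage: typically the 256 initializer bytes are copied onto the stack on every call. */
    const uint8_t table[256] = { 0, 0, 1, /* ... */ };
    return table[i];
}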