I use Delphi 10.1 Berlin in Windows 10.
I have two records of different sizes. I wrote code to loop through two TList<T> of these records to test elapsed times. Looping through the list of the larger record runs much slower.
Can anyone explain the reason, and provide a solution to make the loop run faster?
type
  tTestRecord1 = record
    Field1: array[0..4] of Integer;
    Field2: array[0..4] of Extended;
    Field3: string;
  end;
  tTestRecord2 = record
    Field1: array[0..4999] of Integer;
    Field2: array[0..4999] of Extended;
    Field3: string;
  end;
procedure TForm1.Button1Click(Sender: TObject);
var
  _List: TList<tTestRecord1>;
  _Record: tTestRecord1;
  _Time: TTime;
  i: Integer;
begin
  _List := TList<tTestRecord1>.Create;
  for i := 0 to 4999 do
  begin
    _List.Add(_Record);
  end;
  _Time := Time;
  for i := 0 to 4999 do
  begin
    if _List[i].Field3 = 'abcde' then
    begin
      Break;
    end;
  end;
  Button1.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.000
  _List.Free;
end;
procedure TForm1.Button2Click(Sender: TObject);
var
  _List: TList<tTestRecord2>;
  _Record: tTestRecord2;
  _Time: TTime;
  i: Integer;
begin
  _List := TList<tTestRecord2>.Create;
  for i := 0 to 4999 do
  begin
    _List.Add(_Record);
  end;
  _Time := Time;
  for i := 0 to 4999 do
  begin
    if _List[i].Field3 = 'abcde' then
    begin
      Break;
    end;
  end;
  Button2.Caption := FormatDateTime('s.zzz', Time - _Time); // 0.045
  _List.Free;
end;
First of all, I want to consider the entire code, even the code that populates the list which I do realise you have not timed. Because the second record is larger in size more memory needs to be copied when you make an assignment of that record type. Further when you read from the list the larger record is less cache friendly than the smaller record which impacts performance. This latter effect is likely less significant than the former.
Related to this is that as you add items the list's internal array of records has to be resized. Sometimes the resizing leads to a reallocation that cannot be performed in-place. When that happens a new block of memory is allocated and the previous content is copied to this new block. That copy is clearly ore expensive for the larger record. You can mitigate this by allocating the array once up front if you know it's length. The list Capacity is the mechanism to use. Of course, not always will you know the length ahead of time. 
Your program does very little beyond memory allocation and memory access. Hence the performance of these memory operations dominates.
Now, your timing is only of the code that reads from the lists. So the memory copy performance difference on population is not part of the benchmarking that you performed. Your timing differences are mainly down to excessive memory copy when reading, as I will explain below.
Consider this code:
if _List[i].Field3 = 'abcde' then
Because _List[i] is a record, a value type, the entire record is copied to an implicit hidden local variable. The code is actually equivalent to:
var
  tmp: tTestRecord2;
...
tmp := _List[i]; // copy of entire record
if tmp.Field3 = 'abcde' then
There are a few ways to avoid this copy:
TList<T> to dynamic array TArray<T>. That simple change will allow the compiler to access individual fields directly without copying entire records.TList<T>.List to obtain access to the list object's underlying array holding the data. That would have the same effect as the previous item.Item 4 above is the simplest change you could make to see a large difference. You would replace
if _List[i].Field3 = 'abcde' then
with
if _List.List[i].Field3 = 'abcde' then
and that should yield a very significant change in performance.
Consider this program:
{$APPTYPE CONSOLE}
uses
  System.Diagnostics,
  System.Generics.Collections;
type
  tTestRecord2 = record
    Field1: array[0..4999] of Integer;
    Field2: array[0..4999] of Extended;
    Field3: string;
  end;
procedure Main;
const
  N = 100000;
var
  i: Integer;
  Stopwatch: TStopwatch;
  List: TList<tTestRecord2>;
  Rec: tTestRecord2;
begin
  List := TList<tTestRecord2>.Create;
  List.Capacity := N;
  for i := 0 to N-1 do
  begin
    List.Add(Rec);
  end;
  Stopwatch := TStopwatch.StartNew;
  for i := 0 to N-1 do
  begin
    if List[i].Field3 = 'abcde' then
    begin
      Break;
    end;
  end;
  Writeln(Stopwatch.ElapsedMilliseconds);
end;
begin
  Main;
  Readln;
end.
I had to compile it for 64 bit to avoid an out of memory condition. The output on my machine is around 700. Change List[i].Field3 to List.List[i].Field3 and the output is in single figures. The timing is rather crude, but I think this demonstrates the point.
The issue of the large record not being cache friendly remains. That is more complicated to deal with and would require a detailed analysis of how the real world code operated on its data.
As an aside, if you care about performance then you won't use Extended. Because it has size 10, not a power of two, memory access is frequently mis-aligned. Use Double or Real which is an alias to Double. 
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With