Performance issue with StreamReader and StreamWriter

I just found a performance problem in my application related to the creation of StreamWriter and StreamReader. I have been testing the performance of a very simple application; the tests were run locally on the same machine, several times.

The client application tries to create 4000 connections and sends a message every 200 ms. The server is an echo service that accepts connections and returns the input.
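For context, the client side looks roughly like this (a minimal sketch, not the actual repository code; the host, port, and message are placeholders):

// Minimal sketch of one client connection: connect, then send a line
// and read the echo every 200 ms. Host, port, and payload are assumptions.
using (var client = new TcpClient())
{
    await client.ConnectAsync("localhost", 8080);
    var stream = client.GetStream();
    using (var sr = new StreamReader(stream, Encoding.UTF8))
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        while (true)
        {
            await sw.WriteLineAsync("ping");
            await sw.FlushAsync();
            var echo = await sr.ReadLineAsync();
            await Task.Delay(200); // one message per 200 ms
        }
    }
}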

Using this code in the server to handle the socket connection and return data:

StreamReader sr = null;
StreamWriter sw = null;
try
{
    var stream = client.GetStream();
    // The reader and writer are created once per connection.
    sr = new StreamReader(stream, Encoding.UTF8);
    sw = new StreamWriter(stream, Encoding.UTF8);
    while (!cancel.IsCancellationRequested && client.Connected)
    {
        var msg = await sr.ReadLineAsync();

        if (msg == null)
            continue;

        _inMessages.Increment();
        _inBytes.IncrementBy(msg.Length);

        // Echo the line back to the client.
        await sw.WriteLineAsync(msg);
        await sw.FlushAsync();

        _outMessages.Increment();
        _outBytes.IncrementBy(msg.Length);
    }
}
catch (Exception aex)
{
    var ex = aex.GetBaseException();
    Console.WriteLine("Client error: " + ex.Message);
}
finally
{
    _connected.Decrement();
    if (sr != null)
        sr.Dispose();

    if (sw != null)
        sw.Dispose();
}

This version connects the 4000 clients very quickly and, using 28-30% of the CPU, processes around 14,000 messages per second consistently.

On the other hand, with this code:

StreamReader sr = null;
StreamWriter sw = null;
try
{
    var stream = client.GetStream();
    while (!cancel.IsCancellationRequested && client.Connected)
    {
        // moved: a new reader and writer are created on every iteration
        sr = new StreamReader(stream, Encoding.UTF8);
        sw = new StreamWriter(stream, Encoding.UTF8);

        var msg = await sr.ReadLineAsync();

        if (msg == null)
            continue;

        _inMessages.Increment();
        _inBytes.IncrementBy(msg.Length);

        await sw.WriteLineAsync(msg);
        await sw.FlushAsync();

        _outMessages.Increment();
        _outBytes.IncrementBy(msg.Length);
    }
}
catch (Exception aex)
{
    var ex = aex.GetBaseException();
    Console.WriteLine("Client error: " + ex.Message);
}
finally
{
    _connected.Decrement();
    if (sr != null)
        sr.Dispose();

    if (sw != null)
        sw.Dispose();
}

All 4000 clients still connect, but the last 500 take a while to do so. Using 30-32% of the CPU, the server processes only around 6,000 messages per second.

During both tests there was around 20-30% of the CPU still available, and plenty of RAM.

I understand that creating objects in a loop is not efficient, but this impact seems too big, and I would like to understand what is going on here. If, in the second snippet, I wrap sr and sw in using statements, it is even worse: only 1500 clients manage to connect, and the server processes only around 1,000 messages per second, probably because the StreamReader and StreamWriter are disposing (or trying to dispose) the underlying NetworkStream.
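For reference, the using-statement variant just described looks like this; with these constructors, disposing the reader or writer also disposes the underlying NetworkStream:

// The even-slower variant: per-iteration using blocks. Disposing a
// StreamReader/StreamWriter created with these constructors also closes
// the underlying NetworkStream, so the connection dies after one message.
while (!cancel.IsCancellationRequested && client.Connected)
{
    using (var sr = new StreamReader(stream, Encoding.UTF8))
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        var msg = await sr.ReadLineAsync();
        if (msg == null)
            continue;

        await sw.WriteLineAsync(msg);
        await sw.FlushAsync();
    }
}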

Does performance drop that much just because of the StreamReader and StreamWriter allocations, or is there something else going on with those particular classes?

The full code can be found here: https://github.com/vtortola/AynchronousTCPListener

In the real code, until I read the first bytes of the stream (the frame header), I don't know whether the payload is binary or text, so I cannot create the reader and writer beforehand. Basically, I read the header and then decide what to do with the message. What would be a better approach?
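For illustration, the pattern is roughly this (a hypothetical sketch; the single-byte header and the marker values are invented for the example):

// Hypothetical sketch: read the frame header first, then decide how to
// handle the payload. The one-byte header and its values are assumptions.
var header = new byte[1];
var read = await stream.ReadAsync(header, 0, 1);
if (read == 1)
{
    if (header[0] == 0x00) // assumed marker for text frames
    {
        // wrap the stream in a StreamReader/StreamWriter and read lines
    }
    else                   // assumed marker for binary frames
    {
        // keep reading raw bytes from the NetworkStream
    }
}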

UPDATE:

I have enabled two performance counters to monitor the number of threads in the server application. I let it run for 5 minutes, and these are the figures I got:

With the first code snippet (the fast one):

# of current logical Threads: 108
# of current physical Threads: 106

With the second code snippet (the slow one):

# of current logical Threads: 22
# of current physical Threads: 20
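These are standard CLR counters and can also be read in code (a sketch; the instance name is a placeholder for the server process):

// Sketch: reading the same CLR thread counters programmatically.
// "MyServerProcess" is a placeholder instance name.
var logical = new PerformanceCounter(
    ".NET CLR LocksAndThreads", "# of current logical Threads", "MyServerProcess");
var physical = new PerformanceCounter(
    ".NET CLR LocksAndThreads", "# of current physical Threads", "MyServerProcess");

Console.WriteLine("logical: " + logical.NextValue());
Console.WriteLine("physical: " + physical.NextValue());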

This explains the drop in performance, but why does that change have such a big impact on threading?

Also, memory usage in the first case is around 120 MB; in the second it is around 320 MB, roughly 2.7 times as much.

asked Dec 27 '25 by vtortola


1 Answer

There are a few problems with the second version, and the slower message processing is just the tip of the iceberg. Let's start.

Why is it slower?

You are right that creating objects should not be a problem, and it is not, but managing them in memory is. If you add the GC performance counters to Performance Monitor, you will observe a steep growth in garbage collections. Compare the two captures below:

First (correct) case:

[Performance Monitor screenshot: GC counters for the first (fast) case]

Second (wrong) case:

[Performance Monitor screenshot: GC counters for the second (slow) case]

The CPU time spent on garbage collection is much, much higher, leaving less precious CPU time for your code. Also, as you noted, memory usage in the second case is much higher than in the first. To understand this you need to know how the GC heap is built. Basically, there are three memory segments called generations (there is also the Large Object Heap, but it is not relevant here):

  • Gen0 (the smallest) is for short-lived objects that have not yet survived a GC collection
  • Gen1 (medium size) is for objects that have lived longer in your application and survived one GC collection (and thus were promoted from Gen0)
  • Gen2 (the biggest) is for long-lived objects - there is no time limit on objects living here, and (because of its size) Gen2 collections are the most expensive, so the GC performs them less often
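You can watch this promotion happen with GC.GetGeneration (a minimal sketch):

// Minimal sketch: a strongly-referenced object survives collections
// and is promoted from Gen0 to Gen1 to Gen2.
var survivor = new byte[1024];
Console.WriteLine(GC.GetGeneration(survivor)); // 0: freshly allocated
GC.Collect();                                  // force a full collection
Console.WriteLine(GC.GetGeneration(survivor)); // 1: survived once
GC.Collect();
Console.WriteLine(GC.GetGeneration(survivor)); // 2: survived twice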

Moving back to your application: when you create a StreamReader or StreamWriter in each loop iteration, you quickly deplete the free space in Gen0, forcing the GC to collect that segment. The reader and writer objects cannot be reclaimed immediately, as references to them may still be held by the asynchronous tasks, so they are promoted to Gen1, depleting it in turn and triggering Gen1 collections. Eventually they are either reclaimed by a Gen1 collection or promoted to Gen2. As I said previously, Gen2 grows in size when the objects stored in it are not reclaimed, which explains the higher memory usage in the second case. Thanks to the GC's reluctance to perform Gen2 collections, your network streams (which the readers and writers hold onto) are not closed quickly, which is what still allows your server to receive the clients' messages. And that brings us to the next point:
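The allocation pressure itself is easy to reproduce (a rough sketch; the iteration count is arbitrary and the readers here wrap a MemoryStream):

// Rough sketch: count Gen0 collections caused by allocating a reader
// per iteration (each StreamReader allocates its own internal buffers).
var before = GC.CollectionCount(0);
using (var ms = new MemoryStream())
{
    for (int i = 0; i < 1000000; i++)
    {
        var sr = new StreamReader(ms, Encoding.UTF8, true, 1024, true); // leaveOpen
    }
}
Console.WriteLine("Gen0 collections: " + (GC.CollectionCount(0) - before));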

Why is it wrong?

When you create your readers and writers, you are using constructors that force them to close the underlying stream when they are disposed. This means you have no control over when your clients' network streams get closed, which is why you probably observe many connection drops and retries on the client side. Better-suited constructors for this case are:

sr = new StreamReader(stream, Encoding.UTF8, true, 1024, true); // last argument: leaveOpen
sw = new StreamWriter(stream, Encoding.UTF8, 1024, true);       // last argument: leaveOpen

These leave the underlying connection open, leaving you in charge of disposing it (which you SHOULD do). To conclude: stay with the first version, but change the reader and writer constructors as above and add client.Dispose to your finally block :)
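Putting it together, the finally block of the first snippet becomes (a sketch; Close() is used here because TcpClient.Dispose() is not public on older .NET Framework versions):

finally
{
    _connected.Decrement();
    if (sr != null)
        sr.Dispose();  // no longer closes the NetworkStream (leaveOpen = true)

    if (sw != null)
        sw.Dispose();

    client.Close();    // you now dispose the connection explicitly, in one place
}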

answered Dec 30 '25 by Sebastian


