3 different messages are being sent to the same port at different rates:
Message   Size (bytes)   Sent every   Transmit rate
High      232            10 ms        100 Hz
Medium    148            20 ms        50 Hz
Low       20             60 ms        16.6 Hz
I can only process one message every ~ 6 ms.
Single threaded.  Blocking read.
A strange situation is occurring, and I don't have an explanation for it.
When I set my receive buffer to 4,799 bytes, all of my low speed messages get dropped.
I see maybe one or two get processed, and then nothing.  
When I set my receive buffer to 4,800 bytes (or higher!), it appears as though all of the low speed messages start getting processed. I see about 16 or 17 a second.
This has been observed consistently. The application sending the packets is always started before the receiving application. The receiving application always has a long delay after the sockets are created, and before it begins processing. So the buffer is always full when the processing starts, and it is not the same starting buffer each time a test occurs. This is because the socket is created after the sender is already sending out messages, so the receiver might start listening in the middle of a send cycle.
Why does increasing the receive buffer size by a single byte cause such a huge change in low speed message processing?
I built a table to better visualize the expected processing: 
  
As some of these messages get processed, more messages presumably get put on the queue instead of being dropped.
Nonetheless, I would expect a 4,799 byte buffer to behave the same way as 4,800 bytes.  
However that is not what I have observed.
I think the issue is related to the fact that the low speed message is sent at the same time as the other two messages, and it is always received after the high/medium speed messages. (This has been confirmed with Wireshark.)
For example, assuming the buffer was empty to begin with, it is clear that the low speed message would need to be queued longer than the other messages.
*1 message every 6ms is about 5 messages every 30ms.

This still doesn't explain the buffer size.
We are running VxWorks, and using their sockLib, which is an implementation of Berkeley sockets.  Here is a snippet of what our socket creation looks like:
SOCKET_BUFFER_SIZE is what I'm changing.  
struct sockaddr_in tSocketAddress;                          // Socket address
int     nSocketAddressSize = sizeof(struct sockaddr_in);    // Size of socket address structure
int     nSocketOption = 0;
// Already created
if (*ptParameters->m_pnIDReference != 0)
    return FALSE;
// Create UDP socket
if ((*ptParameters->m_pnIDReference = socket(AF_INET, SOCK_DGRAM, 0)) == ERROR)
{
    // Error
    CreateSocketMessage(ptParameters, "CreateSocket: Socket create failed with error.");
    // Not successful
    return FALSE;
}
// Valid local address
if (ptParameters->m_szLocalIPAddress != SOCKET_ADDRESS_NONE_STRING && ptParameters->m_usLocalPort != 0)
{
    // Set up the local parameters/port
    bzero((char*)&tSocketAddress, nSocketAddressSize);
    tSocketAddress.sin_len = (u_char)nSocketAddressSize;
    tSocketAddress.sin_family = AF_INET;
    tSocketAddress.sin_port = htons(ptParameters->m_usLocalPort);
    // Check for any address
    if (strcmp(ptParameters->m_szLocalIPAddress, SOCKET_ADDRESS_ANY_STRING) == 0)
        tSocketAddress.sin_addr.s_addr = htonl(INADDR_ANY);
    else
    {
        // Convert IP address for binding
        if ((tSocketAddress.sin_addr.s_addr = inet_addr(ptParameters->m_szLocalIPAddress)) == ERROR)
        {
            // Error
            CreateSocketMessage(ptParameters, "Unknown IP address.");
            // Cleanup socket
            close(*ptParameters->m_pnIDReference);
            *ptParameters->m_pnIDReference = ERROR;
            // Not successful
            return FALSE;
        }
    }
    // Bind the socket to the local address
    if (bind(*ptParameters->m_pnIDReference, (struct sockaddr *)&tSocketAddress, nSocketAddressSize) == ERROR)
    {
        // Error
        CreateSocketMessage(ptParameters, "Socket bind failed.");
        // Cleanup socket
        close(*ptParameters->m_pnIDReference);
        *ptParameters->m_pnIDReference = ERROR;
        // Not successful
        return FALSE;
    }
}
// Receive socket
if (ptParameters->m_eType == SOCKTYPE_RECEIVE || ptParameters->m_eType == SOCKTYPE_RECEIVE_AND_TRANSMIT)
{
    // Set the receive buffer size
    nSocketOption = SOCKET_BUFFER_SIZE;
    if (setsockopt(*ptParameters->m_pnIDReference, SOL_SOCKET, SO_RCVBUF, (char *)&nSocketOption, sizeof(nSocketOption)) == ERROR)
    {
        // Error
        CreateSocketMessage(ptParameters, "Socket buffer size set failed.");
        // Cleanup socket
        close(*ptParameters->m_pnIDReference);
        *ptParameters->m_pnIDReference = ERROR;
        // Not successful
        return FALSE;
    }
}
and the socket receive that's being called in an infinite loop:
*The buffer size is definitely large enough  
int SocketReceive(int nSocketIndex, char *pBuffer, int nBufferLength)
{
    int nBytesReceived = 0;
    char szError[256];
    // Invalid index or socket
    if (nSocketIndex < 0 || nSocketIndex >= SOCKET_COUNT || g_pnSocketIDs[nSocketIndex] == 0)
    {
        // Do not read g_pnSocketIDs[nSocketIndex] here: the index may be out of range
        sprintf(szError, "SocketReceive: Invalid socket index (%d) or socket not created", nSocketIndex);
        perror(szError);
        return -1;
    }
    // Invalid buffer length
    if (nBufferLength == 0)
    {
        perror("SocketReceive: zero buffer length");
        return 0;
    }
    // Receive data
    nBytesReceived = recv(g_pnSocketIDs[nSocketIndex], pBuffer, nBufferLength, 0);
    // Error in receiving
    if (nBytesReceived == ERROR)
    {
        // Create error string
        sprintf(szError, "SocketReceive: Data Receive Failure: <%d> ", errno);
        // Set error message
        perror(szError);
        // Return error
        return ERROR;
    }
    // Bytes received
    return nBytesReceived;
}
Any clues on why increasing the buffer size to 4,800 results in successful and consistent reading of low speed messages?
Network congestion is a common cause of UDP packet loss, since every communication network has a throughput limit. It is similar to a traffic jam: once a road carries more vehicles than it was built for, traffic slows or stops during peak hours.
The default send and receive buffer sizes for UDP sockets depend on the operating system and its configuration, so read the values actually in effect back with getsockopt() (SO_SNDBUF and SO_RCVBUF) rather than assuming them.
Every UDP socket has a “socket send buffer” that you put packets into. The kernel (Linux, in this description) takes packets from it and sends them out as quickly as it can, so if the network card is too slow, it may not be able to send packets as fast as you queue them.
The basic answer to why an SO_RCVBUF size of 4799 loses the low speed messages while a size of 4800 works is that the outcome depends on a combination of factors: the mixture of UDP packets coming in, the rate at which they arrive, the rate at which you process them, and the sizing of the mbuf and cluster pools in your VxWorks kernel. With the larger size, that combination allows just enough network stack throughput that the low speed messages are no longer discarded.
The SO_SNDBUF option description in the setsockopt() man page at http://www.vxdev.com/docs/vx55man/vxworks/ref/sockLib.html#setsockopt, mentioned in a comment above, has this to say about the size specified and its effect on mbuf usage:
The effect of setting the maximum size of buffers (for both SO_SNDBUF and SO_RCVBUF, described below) is not actually to allocate the mbufs from the mbuf pool. Instead, the effect is to set the high-water mark in the protocol data structure, which is used later to limit the amount of mbuf allocation.
UDP packets are discrete units. If you send 10 packets of size 232 that is not considered to be 2320 bytes of data in contiguous memory. Instead that is 10 memory buffers within the network stack because UDP is discrete packets while TCP is a continuous stream of bytes.
See "How do I tune the network buffering in VxWorks 5.4?" on the DDS community web site, which discusses the interdependence of the mixture of mbuf sizes and network clusters.
See "How do I resolve a problem with VxWorks buffers?" on the DDS community web site.
See the PDF of a 2004 slide presentation, "A New Tool to study Network Stack Exhaustion in VxWorks", which discusses using tools such as mBufShow and inetStatShow to see what is happening in the network stack.