We have a long-standing, well-established commercial product, and I've never seen this type of issue before. We use a client program to send data to a server. Because of firewalls in customer environments, we sometimes allow the end user to specify a range of outbound ports to bind. In this particular case, however, we're not doing that and are binding to port 0. From everything I've read, this tells the OS to pick a random port. What I can't find out is what that means to the kernel/OS: if I'm asking for a random port, how can it already be in use? Strictly speaking, only the unique combination of src ip/src port and dst ip/dst port makes a connection unique. I believe the same port can be reused when talking to another destination IP, but maybe that's not relevant here.
Also, this doesn't happen on all of the customer's systems, only some, so it may be load related. I'm told the systems are fairly busy.
Here is the code we're using. I left out some of the #ifdef code for Windows, and for brevity I left out what we do after the bind.
    _SocketCreateClient(Socket_pwtP sock, SocketInfoP sInfo )
    {
        int nRetries;                       /* number of times to try connect()  */
        unsigned short port;
        BOOL success = FALSE;
        BOOL gotaddr = FALSE;
        char buf[INET6_ADDRSTRLEN] = "";
        int connectsuccess = 1;
        int ipv6compat = 0;
    #ifdef SOCKET_SEND_TIMEOUT
        struct timeval time;
    #endif /* SOCKET_SEND_TIMEOUT */

        nRetries = sInfo->si_nRetries;
        sock->s_hostName = strdup(sInfo->si_hostName);
    #ifdef DEBUG_SOCKET
        LogWrite(LogF, LOG_WARNING, "Socket create client");
        LogWrite(LogF, LOG_WARNING, "Number of retries = %d", nRetries);
    #endif

        /* translate supplied host name to an internet address */
        ipv6compat = GetIPVer();
        if (ipv6compat == -1) /* ipv6 not supported */
            gotaddr = GetINAddr(sInfo->si_hostName, &sock->s_sAddr.sin_addr);
        else
            gotaddr = GetINAddr6(sInfo->si_hostName, &sock->s_sAddr6.sin6_addr);

        if (!gotaddr) {
            /* print this message only once */
            if (sInfo->si_logInfo && (sInfo->si_nRetries == 1)) {
                LogWrite(LogF, LOG_ERR,
                         "unable to resolve ip address for host '%s'", sInfo->si_hostName);
            }
            sock = _SocketDestroy(sock);
        }
        else {
            if (ipv6compat == 1) /* ipv6 supported */
            {
                /* try to print the address in sock->s_sAddr6.sin6_addr to make sure it's good.  from call above */
                LogWrite(LogF, LOG_DEBUG2, "Before call to inet_ntop");
                inet_ntop(AF_INET6, &sock->s_sAddr6.sin6_addr, buf, sizeof(buf));
                LogWrite(LogF, LOG_DEBUG2, "Value of sock->s_sAddr6.sin6_addr from GetINAddr6: %s", buf);
                LogWrite(LogF, LOG_DEBUG2, "Value of sock->s_sAddr6.sin6_scope_id from if_nametoindex: %d", sock->s_sAddr6.sin6_scope_id);
                LogWrite(LogF, LOG_DEBUG2, "Value of sock->s_type: %d", sock->s_type);
            }

            /* try to create the socket nRetries times */
            while (sock && sock->s_id == INVALID_SOCKET) {
                int socketsuccess = FALSE;

                /* create the actual socket */
                if (ipv6compat == -1) /* ipv6 not supported */
                    socketsuccess = sock->s_id = socket(AF_INET, sock->s_type, 0);
                else
                    socketsuccess = sock->s_id = socket(AF_INET6, sock->s_type, 0);

                if ((socketsuccess) == INVALID_SOCKET) {
                    GETLASTERROR;
                    LogWrite(LogF, LOG_ERR, "unable to create socket: Error %d: %s", errno,
                             strerror(errno));
                    sock = _SocketDestroy(sock);
                }
                else {
                    /* cycle through outbound port range for firewall support */
                    port = sInfo->si_startPortRange;
                    while (!success && port <= sInfo->si_endPortRange) {
                        int bindsuccess = 1;

                        /* bind to outbound port number */
                        if (ipv6compat == -1) /* ipv6 not supported */
                        {
                            sock->s_sourceAddr.sin_port = htons(port);
                            bindsuccess = bind(sock->s_id, (struct sockaddr *) &sock->s_sourceAddr,
                                               sizeof(sock->s_sourceAddr));
                        }
                        else {
                            sock->s_sourceAddr6.sin6_port = htons(port);
                            inet_ntop(AF_INET6, &sock->s_sourceAddr6.sin6_addr, buf, sizeof(buf));
                            LogWrite(LogF, LOG_DEBUG,
                                     "attempting bind to s_sourceAddr6 %s ", buf);
                            bindsuccess = bind(sock->s_id, (struct sockaddr *) &sock->s_sourceAddr6,
                                               sizeof(sock->s_sourceAddr6));
                        }

                        if (bindsuccess == -1) {
                            GETLASTERROR;
                            LogWrite(LogF, LOG_ERR,
                                     "unable to bind port %d to socket: Error %d: %s. Will attempt next port if protomgr port rules configured(EAV_PORTS).", port, errno, strerror(errno));
                            /* if port in use, try next port number */
                            port++;
                        }
                        else {
                            /* only log if outbound port was specified */
                            if (port != 0) {
                                if (sInfo->si_sourcehostName) {
                                    LogWrite(LogF, LOG_DEBUG,
                                             "bound outbound address %s:%d to socket",
                                             sInfo->si_sourcehostName, port);
                                }
                                else {
                                    LogWrite(LogF, LOG_DEBUG,
                                             "bound outbound port %d to socket", port);
                                }
                            }
                            success = TRUE;
                        }
                    }
                }
            }
        }
        return(sock);
    }
The errors we're seeing in our log file look like this. It's making 2 tries and both fail:
protomgr[628453] : ERROR: unable to bind port 0 to socket: Error 98: Address already in use. Will attempt next port if protomgr port rules configured(EAV_PORTS).
protomgr[628453] : ERROR: unable to bind port(s) to socket: Error 98: Address already in use. Consider increase the number of EAV_PORTS if this msg is from protomgr.
protomgr[628453] : ERROR: unable to bind port 0 to socket: Error 98: Address already in use. Will attempt next port if protomgr port rules configured(EAV_PORTS).
protomgr[628453] : ERROR: unable to bind port(s) to socket: Error 98: Address already in use. Consider increase the number of EAV_PORTS if this msg is from protomgr.
Binding to port 0 does not bind port 0 itself. The OS picks an unused port for you from its ephemeral (local) port range.
You can still pass a specific IP address to bind() (useful if the machine has several) while leaving the port choice to the OS. If you need to know which port was picked, call getsockname() after the bind has been performed.
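As a rough illustration of the getsockname() step, here is a minimal sketch. The helper name bind_any_port and the choice of the IPv4 loopback address are assumptions made for the example; they are not taken from the code in the question.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the code in the question): bind a TCP
 * socket to 127.0.0.1 with port 0 and report the port the kernel picked.
 * Returns the socket descriptor, or -1 on failure with errno set.
 */
int bind_any_port(unsigned short *chosen_port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);              /* 0 = let the kernel choose */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        int saved = errno;                 /* keep the bind/getsockname error */
        close(fd);
        errno = saved;
        return -1;
    }

    *chosen_port = ntohs(addr.sin_port);   /* the port actually assigned */
    return fd;
}
```

Logging the value returned through chosen_port after a successful port 0 bind would show which part of the local port range the kernel is handing out.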
When a specific port number is requested, the error "Address already in use" normally means another socket is already bound to that port. In that case you can use lsof to find the owning process and its process ID (PID), and stop that process if appropriate. This does not apply directly to a port 0 bind, where no specific port was requested.
So, it looks like this was caused by the system running out of available ports: it was configured so that only about 9000 ports were available.
This setting in /etc/sysctl.conf controls the available ports: net.ipv4.ip_local_port_range = 9000 65500
The first number is the starting port and the second is the maximum. This example was pulled from an unaltered SUSE Linux Enterprise Server 11.0. The customer who reported this problem had theirs configured so that only around 9000 ports were available in the range they defined, and all of them were in use on the system.
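To check how many ports a given range actually provides, something like the following sketch could be used. The function names are hypothetical. The /proc path is the usual Linux location for this sysctl, and the result is only an upper bound, since some ports in the range may be reserved or already in use.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the two numbers of ip_local_port_range
 * ("low high") and return how many ports the range contains,
 * or -1 if the text is malformed.
 */
long port_range_size(const char *text)
{
    long low, high;

    if (sscanf(text, "%ld %ld", &low, &high) != 2 || low > high)
        return -1;
    return high - low + 1;   /* both ends of the range are inclusive */
}

/*
 * Read the live setting on Linux. Returns -1 if the file is missing
 * (for example on a non-Linux system) or cannot be parsed.
 */
long local_port_range_size(void)
{
    char line[64];
    long size = -1;
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (f != NULL) {
        if (fgets(line, sizeof(line), f) != NULL)
            size = port_range_size(line);
        fclose(f);
    }
    return size;
}
```

For the example range above, port_range_size("9000 65500") gives 56501 ports. A range that leaves only about 9000 ports can be exhausted on a busy host, at which point a port 0 bind fails with "Address already in use".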
Hopefully, this helps someone else in the future.