Fast TCP Loopback Performance and Low Latency with Windows Server 2012 TCP Loopback Fast Path

TCP Loopback Fast Path is a new feature introduced in Windows Server 2012 and Windows 8. If you use the TCP loopback interface for inter-process communications (IPC), you may be interested in the improved performance, improved predictability, and reduced latency the TCP Loopback Fast Path can provide. This feature preserves TCP socket semantics and platform capabilities including the Windows Filtering Platform (WFP), and works on both non-virtualized and virtualized operating system instances.

This article will provide an overview of the TCP loopback fast path capability, and will show how to use it programmatically in Windows 8 and Windows Server 2012 using both the native Winsock API and the .NET Socket Classes.

Overview

The TCP loopback interface provides a simple local IPC mechanism for processes on the same operating system instance, and it can easily be switched to a remote IPC mechanism by simply changing the destination IP address. This makes it easy, for example, to run the client and server portions of a client-server application on the same machine, and then reconfigure the processes so that they run on separate machines.

Although other IPC mechanisms (e.g., shared memory or MPI) can provide higher performance and lower latency, the TCP loopback interface remains a popular choice because it is easy to configure and use.

The Internet Protocol (IP) specifications include a loopback network to allow the exchange of network traffic between endpoints residing on the same local host system. In IPv4, the loopback network is the address block 127.0.0.0/8 (most commonly addressed as 127.0.0.1); the reserved top-level domain name "localhost" (RFC 2606) may also be used to refer to it. IPv6 provides a single loopback unicast address, 0:0:0:0:0:0:0:1, which can be abbreviated as ::1 (RFC 3513). It is this loopback network that the loopback fast path feature acts upon.
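As a small illustration of the address ranges just described, the following C++ sketch classifies an address as loopback. The function names are hypothetical and not part of any Windows API; the IPv4 address is assumed to be in host byte order and the IPv6 address is assumed to be its 16 raw bytes in network order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// True if a host-byte-order IPv4 address lies in 127.0.0.0/8,
// i.e. its most significant octet is 127.
constexpr bool IsIpv4Loopback(std::uint32_t addrHostOrder) {
    return (addrHostOrder >> 24) == 127u;
}

// True if the 16 address bytes (network order) spell ::1,
// i.e. fifteen zero bytes followed by a single 1.
constexpr bool IsIpv6Loopback(const std::array<std::uint8_t, 16>& addr) {
    for (std::size_t i = 0; i < 15; ++i) {
        if (addr[i] != 0) {
            return false;
        }
    }
    return addr[15] == 1;
}
```

For example, 127.0.0.1 is 0x7F000001 in host byte order and is classified as loopback, while 10.0.0.1 (0x0A000001) is not.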

Performance

The performance of the default TCP loopback interface is quite good, and adequate for most applications. However, applications with more demanding throughput requirements, or applications in which low-latency IPC is important, may benefit from the use of this newly introduced capability.

The following table provides a comparison of TCP loopback latency, message rates, and jitter for Windows Server 2008R2 and Windows Server 2012 using IO completion ports and RIO sockets in conjunction with the loopback fast path capability. 

The test uses two single-threaded processes: one sending TCP messages, and the other echoing the messages back to the sender. Each outbound message contains a payload of 64 bytes, and includes a 64-bit origination timestamp from QueryPerformanceCounter. When the sender receives the message back from the echo process, it again calls QueryPerformanceCounter, and calculates the round-trip delay using the timestamp in the arriving message. Each round-trip time measurement is stored in a table which is used to generate a statistical summary at the end of the test run. The TCP Nagle algorithm is disabled (TCP_NODELAY socket option) on both the sender and echo process sockets.

On this particular machine, a desktop-class 6-core AMD 3.2 GHz processor, the timestamp returned from QueryPerformanceCounter has a resolution of 319 nanoseconds and is derived from the CPU timestamp counter (TSC) register. Because the sender must wait for a response before sending the next outbound message, the message send rate is constrained by the round-trip time, and is therefore equal to the reciprocal of the round-trip time. Because there is very little variation in the round-trip times in a particular test run, the arrival rate is uniform.
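The measurement code itself is not reproduced in this article. As a rough sketch of the statistical summary step only, the following C++ assumes the round-trip samples have already been converted to microseconds, and uses the standard deviation as the jitter measure (an assumption; the article does not name the statistic it used). RoundTripSummary and Summarize are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical summary of one test run.
struct RoundTripSummary {
    double meanUs;             // mean round-trip time in microseconds
    double stddevUs;           // spread of the samples (assumed jitter measure)
    double messagesPerSecond;  // reciprocal of the mean round-trip time
};

// Summarizes round-trip samples given in microseconds.
inline RoundTripSummary Summarize(const std::vector<double>& samplesUs) {
    assert(!samplesUs.empty());
    const double count = static_cast<double>(samplesUs.size());

    double sum = 0.0;
    for (double s : samplesUs) {
        sum += s;
    }
    const double mean = sum / count;

    double squaredDeviations = 0.0;
    for (double s : samplesUs) {
        squaredDeviations += (s - mean) * (s - mean);
    }
    const double stddev = std::sqrt(squaredDeviations / count);

    // With one message in flight at a time, the message rate is the
    // reciprocal of the round-trip time (1e6 microseconds per second).
    return RoundTripSummary{mean, stddev, 1000000.0 / mean};
}
```

With a mean round trip of 27 microseconds this gives roughly 37,000 messages per second, which is in line with the first row of the table.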

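| Windows Server Version | Socket Completion Style | Loopback Fast Path Enabled | Mean Latency in Microseconds (Round Trip) | Round Trip Messages Per Second | Relative Latency Reduction | Relative Jitter Reduction |
|------------------------|-------------------------|----------------------------|-------------------------------------------|--------------------------------|----------------------------|---------------------------|
| WS 2008R2              | IOCP wait               | No                         | 27                                        | 37,000                         | 0%                         | 0%                        |
| WS 2008R2              | IOCP poll               | No                         | 26                                        | 38,600                         | 4%                         | 12%                       |
| WS 2012                | IOCP wait               | No                         | 17                                        | 59,000                         | 37%                        | 45%                       |
| WS 2012                | IOCP wait               | Yes                        | 12                                        | 83,000                         | 55%                        | 45%                       |
| WS 2012                | RIO poll                | No                         | 9                                         | 111,000                        | 67%                        | 76%                       |
| WS 2012                | RIO poll                | Yes                        | 3                                         | 333,000                        | 89%                        | 90%                       |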

The table illustrates that Windows Server 2012 can achieve lower latency and higher message rates when using the TCP loopback interface than Windows Server 2008R2 can. In this particular test, just moving from Windows Server 2008R2 to Windows Server 2012 reduces the mean round-trip latency by about 37%. Adding loopback fast path and RIO sockets provides roughly a ninefold improvement in latency and message rate, and about a 90% reduction in jitter.
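The relative figures can be approximately reproduced from the mean latencies in the table, using the first row (27 microseconds) as the baseline. The following C++ sketch shows the arithmetic; the function names are hypothetical, and because the published means are rounded to whole microseconds, a recomputed percentage may differ from the table by about a point.

```cpp
// Percentage by which `value` is lower than `baseline`.
constexpr double RelativeReductionPercent(double baseline, double value) {
    return 100.0 * (baseline - value) / baseline;
}

// How many times smaller `value` is than `baseline`.
constexpr double ImprovementFactor(double baseline, double value) {
    return baseline / value;
}
```

For the last row, ImprovementFactor(27.0, 3.0) is 9, and RelativeReductionPercent(27.0, 3.0) is a little under 89, matching the ninefold and 89% figures. For the Windows Server 2012 IOCP row without fast path, RelativeReductionPercent(27.0, 17.0) is about 37.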

Jitter, in this context, refers to the variation in round-trip latency during a test run. The table above compares the relative reduction in jitter using the Windows Server 2008R2 IOCP measurements appearing in row 1 as the base case. The very low jitter which can be achieved with Windows Server 2012 even at very high message rates highlights the effectiveness of the Windows Server 2012 networking and kernel enhancements. In the bottom row, representing RIO polling with Loopback Fast Path, most of the latency samples were within ± 319 nanoseconds of the mean value.

How TCP Loopback Fast Path Works

The default behavior of the TCP loopback interface is to move local TCP traffic through most of the network stack, including AFD (which is essentially the kernel mode representation of a user mode TCP socket), as well as the layers corresponding to TCP and IP protocol layers.

Windows Server 2012 and Windows 8 introduce a new optional fast path, which takes a shorter path through the network stack as depicted below. This capability is activated on a per-socket basis by the SIO_LOOPBACK_FAST_PATH I/O Control Code (IOCTL). When activated, local TCP traffic bypasses some of the more computationally expensive processing in the lower stack layers resulting in higher throughput, lower latency and improved predictability.

 

 

Using the TCP Loopback Fast Path

The newly introduced Winsock I/O Control Code (IOCTL) SIO_LOOPBACK_FAST_PATH is used to enable the loopback fast path on a per-socket basis. It will work only with TCP (and not UDP, multicast, or RAW). The IOCTL is defined in the header file mstcpip.h.

It can be used with standard Winsock sockets and the newly introduced RIO (Registered I/O) sockets.

To enable the fast path: 

  1. Both ends of the TCP session must enable the capability by setting the SIO_LOOPBACK_FAST_PATH IOCTL.
  2. The process initiating the TCP connection must set the SIO_LOOPBACK_FAST_PATH IOCTL prior to establishing the TCP session.
  3. The target of the connection request must set the SIO_LOOPBACK_FAST_PATH IOCTL on the listen socket, that is, prior to accepting the connection.
  4. This capability requires Windows Server 2012 or Windows 8.

The following code excerpt demonstrates how this is done in native C++ code. The code sets the value of the SIO_LOOPBACK_FAST_PATH option to 1, thereby enabling the loopback fast path.

    // Requires <winsock2.h> and <mstcpip.h>; link with Ws2_32.lib.
    int OptionValue = 1;
    DWORD NumberOfBytesReturned = 0;

    int status =
        WSAIoctl(
            Socket,
            SIO_LOOPBACK_FAST_PATH,
            &OptionValue,
            sizeof(OptionValue),
            NULL,
            0,
            &NumberOfBytesReturned,
            NULL,
            NULL);

    if (SOCKET_ERROR == status) {
        // Winsock errors are retrieved with WSAGetLastError.
        int LastError = WSAGetLastError();

        if (WSAEOPNOTSUPP == LastError) {
            // This system is older than Windows 8 / Windows
            // Server 2012, and the call is not supported.
        }
        else {
            LogAMessageSomeWhere(
                "Loopback Fastpath WSAIoctl failed: ",
                LastError);
        }
    }
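The excerpt above sets the option on a single socket. To illustrate the ordering requirements in the numbered list, here is a minimal C++ sketch that sets the IOCTL on a listen socket before accepting and on a client socket before connecting, both within one process for brevity. EnableLoopbackFastPath and ConnectOverFastPath are hypothetical names, error handling is reduced to a single success flag, and the non-Windows stub exists only so the sketch compiles on other platforms.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#include <ws2tcpip.h>
#include <mstcpip.h>
#pragma comment(lib, "Ws2_32.lib")

// Hypothetical helper: true if the fast path was enabled on `s`.
static bool EnableLoopbackFastPath(SOCKET s) {
    int enable = 1;
    DWORD bytesReturned = 0;
    return WSAIoctl(s, SIO_LOOPBACK_FAST_PATH, &enable, sizeof(enable),
                    NULL, 0, &bytesReturned, NULL, NULL) != SOCKET_ERROR;
}

// Hypothetical demo: connects a client to a listener over 127.0.0.1
// with the fast path requested on both ends.
bool ConnectOverFastPath() {
    WSADATA wsa;
    if (WSAStartup(MAKEWORD(2, 2), &wsa) != 0) return false;

    SOCKET listener = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    SOCKET client = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    SOCKET accepted = INVALID_SOCKET;
    bool ok = listener != INVALID_SOCKET && client != INVALID_SOCKET;

    sockaddr_in addr = {};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;  // let the system choose a free port
    int addrLen = sizeof(addr);

    // Target side: set the IOCTL on the listen socket before accepting.
    ok = ok && EnableLoopbackFastPath(listener)
            && bind(listener, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0
            && listen(listener, 1) == 0
            && getsockname(listener, reinterpret_cast<sockaddr*>(&addr), &addrLen) == 0;

    // Initiating side: set the IOCTL before calling connect.
    ok = ok && EnableLoopbackFastPath(client)
            && connect(client, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;

    if (ok) {
        accepted = accept(listener, NULL, NULL);
        ok = accepted != INVALID_SOCKET;
    }

    if (accepted != INVALID_SOCKET) closesocket(accepted);
    if (client != INVALID_SOCKET) closesocket(client);
    if (listener != INVALID_SOCKET) closesocket(listener);
    WSACleanup();
    return ok;
}
#else
// Stub for non-Windows builds; the feature is Windows-only.
bool ConnectOverFastPath() { return false; }
#endif
```

In a real application the two ends would normally live in separate processes; the point of the sketch is only that each side issues the IOCTL before the connection exists.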

In managed code, this control code is not currently part of the .NET IOControlCode enumeration, but it can easily be specified by calling the System.Net.Sockets.Socket.IOControl method with a numerical parameter in place of the enumerated value. The numerical value of SIO_LOOPBACK_FAST_PATH defined in mstcpip.h is 0x98000010. The following C# sample demonstrates how to do this:

 
    // The value of SIO_LOOPBACK_FAST_PATH in mstcpip.h
    // is 0x98000010, which is the same as (-1744830448)
    // when interpreted as a signed 32-bit integer.
 
    const int SIO_LOOPBACK_FAST_PATH = (-1744830448);

    Socket S = 
        new Socket(
           AddressFamily.InterNetwork,
           SocketType.Stream,
           ProtocolType.Tcp);
 
    Byte[] OptionInValue = BitConverter.GetBytes(1);

    try {
        S.IOControl(
            SIO_LOOPBACK_FAST_PATH,
            OptionInValue,
            null);
    }
    catch (SystemException e) {

        // If the operating system version on this machine does
        // not support SIO_LOOPBACK_FAST_PATH (i.e. a version
        // prior to Windows 8 / Windows Server 2012), handle the exception.

        Console.WriteLine("Setting Loopback Fast Path Failed");

    }
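The negative constant in the C# sample can look arbitrary. The following C++ sketch shows where it comes from, on the understanding that the header builds the control code from the Winsock bit fields IOC_IN and IOC_VENDOR plus function number 16. The constants are reproduced here under hypothetical names so the sketch does not depend on the Windows headers, and AsSignedInt32 is likewise a hypothetical helper.

```cpp
#include <cstdint>

// Bit fields as defined in winsock2.h, reproduced under local names.
constexpr std::uint32_t kIocIn     = 0x80000000u;  // IOC_IN: input parameters
constexpr std::uint32_t kIocVendor = 0x18000000u;  // IOC_VENDOR

// Control code composed from the bit fields and function number 16.
constexpr std::uint32_t kSioLoopbackFastPath = kIocIn | kIocVendor | 16u;

// Interprets a 32-bit pattern as a two's-complement signed value,
// which is how a C# int holds the same bits.
constexpr std::int64_t AsSignedInt32(std::uint32_t bits) {
    return bits >= 0x80000000u
               ? static_cast<std::int64_t>(bits) - 4294967296LL
               : static_cast<std::int64_t>(bits);
}
```

Here kSioLoopbackFastPath evaluates to 0x98000010, and AsSignedInt32(kSioLoopbackFastPath) evaluates to -1744830448, the value used in the C# sample.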

Existing Applications and Loopback Fast Path

The default behavior of the TCP loopback interface is unchanged in Windows Server 2012, thereby preserving compatibility. Adding support for the loopback fast path enhancement to an existing or new application requires adding the code to set the SIO_LOOPBACK_FAST_PATH IOCTL in the manner described above.

Since SIO_LOOPBACK_FAST_PATH is a new feature added in Windows 8 and Windows Server 2012, it will not work on prior operating system versions. On those versions, the call to set the IOCTL will fail and set the last error value to WSAEOPNOTSUPP.
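An application that must also run on earlier operating system versions can treat WSAEOPNOTSUPP as a signal to continue with the standard loopback path rather than as a fatal error. The following C++ sketch shows one possible way to express that decision. FastPathOutcome and ClassifyFastPathResult are hypothetical names, and the two Winsock values are reproduced as local constants so the sketch compiles without the Windows headers.

```cpp
// Values as defined by Winsock, reproduced under local names.
constexpr int kSocketError   = -1;     // SOCKET_ERROR
constexpr int kWsaEOpNotSupp = 10045;  // WSAEOPNOTSUPP

enum class FastPathOutcome {
    Enabled,      // the IOCTL succeeded; fast path is requested on this socket
    Unsupported,  // older OS; keep using the standard loopback path
    Failed        // unexpected error; log it and decide how to proceed
};

// Maps the WSAIoctl return value and the last error to an outcome.
constexpr FastPathOutcome ClassifyFastPathResult(int ioctlStatus, int lastError) {
    return ioctlStatus != kSocketError   ? FastPathOutcome::Enabled
         : lastError == kWsaEOpNotSupp   ? FastPathOutcome::Unsupported
                                         : FastPathOutcome::Failed;
}
```

A caller would pass the status returned by WSAIoctl together with the result of WSAGetLastError, and simply proceed to connect or listen as usual when the outcome is Unsupported.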

 

Summary

The TCP Loopback Fast Path feature is new to Windows 8 and Windows Server 2012. It offers software developers an opportunity to reduce latency and improve performance for applications which use the TCP loopback interface for inter-process communication. It can be used in both native and managed code, and on virtualized or non-virtualized systems.

System performance is influenced by many factors, including workload characteristics, application design, measurement techniques, hardware capabilities, and processor speed, to name a few. Accordingly, the performance you achieve may be different from that described above.

Fortunately, in many cases, trying loopback fast path is easy, and doing so should provide you with the information you need to determine whether it is suitable for use in your environment.

 

Ed Briggs

Principal Program Manager (PACE)

Microsoft Windows Server and Cloud Division