I believe the PowerPC on Windows 7 is much faster on localhost loopback throughput because it can use NetDMA.
The Microsoft article NetDMA (Windows Drivers) defines NetDMA as :
The NetDMA interface provides a generic interface for memory-to-memory direct memory access (DMA) transfers. Although the interface is designed to copy packets that are received from high-performance network interface cards (NICs), you can also use the interface for other applications. There is no direct relationship between NetDMA and NDIS.
When using localhost loopback, it stands to reason that memory copy operations are the main factor in throughput, as frames are copied from the source application's memory, then between TCP layers, and finally to the memory of the target application.
NetDMA can have an impact because it allows network adapters to transfer data directly to your application, which may reduce the number of memory copies even for the trivial loopback adapter.
Enabling NetDMA can be done in two ways :
- Enter
netsh int tcp set global netdma=enabled
in a Command Prompt (cmd) that is run as Administrator, then reboot.
- In Regedit, go to
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
and create a new DWORD item named EnableTCPA
with the value 1, then reboot.
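Since the setting may not stick, it can help to read the registry value back after the reboot. The following is only a minimal sketch in Python: the function name read_enable_tcpa is my own, it uses the standard winreg module, and it only reports what is stored in the registry, not whether NetDMA is actually active.

```python
import sys


def read_enable_tcpa():
    """Return the EnableTCPA DWORD from the registry.

    Returns None if the value is absent, unreadable, or if this is not Windows.
    (Hypothetical helper name; it only reads the key described above.)
    """
    if sys.platform != "win32":
        return None  # the registry only exists on Windows
    import winreg  # Windows-only standard library module

    path = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
            value, _value_type = winreg.QueryValueEx(key, "EnableTCPA")
    except OSError:
        return None  # key or value missing, or access denied
    return value


if __name__ == "__main__":
    print("EnableTCPA =", read_enable_tcpa())
```

A result of 1 means the registry value is set; None means it is missing or could not be read.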
However, there are two prerequisites to enabling NetDMA :
- The Microsoft article Enabling NetDMA has this :
NetDMA must be enabled in the BIOS before performing this procedure. NetDMA support is often labeled IOAT support.
- The Microsoft article NetDMA (Windows Drivers) has this note :
The NetDMA interface is not supported in Windows 8 and later.
Putting these two requirements together, I can hazard the guess that NetDMA, being a BIOS function, was not implemented in UEFI, which is used with Windows 8/2012.
Microsoft therefore had to improve localhost loopback throughput in another way, especially for use in Hyper-V, and so created the TCP Loopback Fast Path in Windows 8/2012, defined as :
TCP Loopback Fast Path is a new feature introduced in Windows Server 2012 and Windows 8. If you use the TCP loopback interface for inter-process communications (IPC), you may be interested in the improved performance, improved predictability, and reduced latency the TCP Loopback Fast Path can provide. This feature preserves TCP socket semantics and platform capabilities including the Windows Filtering Platform (WFP), and works on both non-virtualized and virtualized operating system instances.
The TCP loopback interface provides a simple local IPC mechanism for processes on the same operating system instance, and it can easily be switched to a remote IPC mechanism by simply changing the destination IP address.
Unfortunately, the TCP Loopback Fast Path is not transparent: applications must issue a WSAIoctl call (with the SIO_LOOPBACK_FAST_PATH control code) on the sockets of both sender and receiver, so it is not backward-compatible with existing bandwidth-measuring applications such as PsPing and PCATTCP.
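To illustrate what opting in looks like, here is a minimal sketch in Python, whose socket.ioctl method wraps WSAIoctl on Windows and exposes socket.SIO_LOOPBACK_FAST_PATH (Python 3.6 and later). The helper names enable_loopback_fast_path and make_fast_path_pair are my own. I am assuming the option has to be set before connect on the client and on the listening socket on the server. On systems without the feature the helper simply returns False and the sockets work as ordinary loopback sockets.

```python
import socket


def enable_loopback_fast_path(sock):
    """Try to enable TCP Loopback Fast Path on a not-yet-connected TCP socket.

    Returns True if the ioctl was accepted, False if it is unavailable
    (non-Windows, Python older than 3.6, or Windows older than 8/2012).
    """
    control = getattr(socket, "SIO_LOOPBACK_FAST_PATH", None)
    if control is None or not hasattr(sock, "ioctl"):
        return False
    try:
        sock.ioctl(control, 1)  # 1 = enable
    except OSError:
        return False
    return True


def make_fast_path_pair():
    """Create a connected (client, server) socket pair over 127.0.0.1.

    Both ends request the fast path before connecting; the third return
    value says whether both requests were accepted.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_ok = enable_loopback_fast_path(listener)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client_ok = enable_loopback_fast_path(client)
    client.connect(listener.getsockname())

    conn, _addr = listener.accept()
    listener.close()
    return client, conn, server_ok and client_ok


if __name__ == "__main__":
    client, conn, fast = make_fast_path_pair()
    client.sendall(b"ping")
    print("received", conn.recv(4), "fast path enabled:", fast)
    client.close()
    conn.close()
```

This is also why existing tools cannot benefit: the request has to be made in the application's own code, on both ends of the connection.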
In my own tests on Windows 7, I have not fathomed all the mysteries surrounding NetDMA, but I have managed to briefly turn it on, with the immediate benefit of doubling my bandwidth as measured by PsPing. But as NetDMA did not survive a reboot on that computer, I do not recommend depending on it for throughput even on computers that theoretically support it.
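For anyone wanting to repeat this kind of before/after comparison without PsPing, a rough loopback throughput check can be sketched in Python as below. This is not how PsPing measures bandwidth; the function name, the 64 MB transfer size and the 64 KB chunk size are arbitrary choices of mine, and the figures it produces include Python overhead, so they are only meaningful when compared with each other on the same machine.

```python
import socket
import threading
import time


def measure_loopback_throughput(total_bytes=64 * 1024 * 1024, chunk_size=64 * 1024):
    """Push total_bytes through a 127.0.0.1 TCP connection.

    Returns (bytes_received, megabytes_per_second).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    received = [0]  # list so the receiver thread can update it

    def receiver():
        conn, _addr = listener.accept()
        with conn:
            while True:
                data = conn.recv(chunk_size)
                if not data:  # sender closed the connection
                    break
                received[0] += len(data)

    thread = threading.Thread(target=receiver)
    thread.start()

    payload = b"\0" * chunk_size
    sender = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sender.connect(("127.0.0.1", port))

    start = time.perf_counter()
    sent = 0
    while sent < total_bytes:
        size = min(chunk_size, total_bytes - sent)
        sender.sendall(payload[:size])
        sent += size
    sender.close()  # end of stream for the receiver
    thread.join()
    elapsed = time.perf_counter() - start

    listener.close()
    return received[0], received[0] / elapsed / 1e6


if __name__ == "__main__":
    count, rate = measure_loopback_throughput()
    print(f"{count} bytes received, about {rate:.0f} MB/s")
```

Running it once with NetDMA disabled and once with it enabled (after checking that the setting actually took effect) should show whether the difference is reproducible on a given computer.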