WAN Op = NoOp : Part two : The Glitch

This is the second part of a series of posts on the background and current viability of WAN Optimization. Read the first post here.

The glitch:

In the IBM mainframe-dominated days of the 1970s, before Local Area Networks and TCP/IP corporate networks, mainframes would fetch data from large standalone disks called Direct Access Storage Devices, or DASD. Communication between the mainframe and the DASD used a Fixed Block Architecture (FBA) that could access data in blocks of up to 64K bytes per transaction across an interface called a Channel, which was very similar to a high-speed bus. Each block access required a round trip, which we refer to as a turn, between the mainframe and the DASD. The bigger the file, the more turns were required. The Channel interface had relatively low latency, so the turn time per block access was not significant.

IBM enabled PCs to access mainframes across local area networks by creating software on DOS called Server Message Block (SMB). SMB, now commonly referred to as CIFS, retained the same restriction of 64K block reads and writes for compatibility with the legacy mainframe software. Later, when Microsoft and IBM jointly worked on the initial versions of OS/2, they adopted the same SMB approach to retain compatibility. Eventually, SMB/CIFS was supported on Windows as well, and it was ported to run over TCP/IP from its prior non-IP protocols. All along, the same restriction of one turn per 64K byte block access was quietly sitting in the software.

In addition, Microsoft defaulted its TCP/IP implementation to allow a maximum of only 64K bytes of data to be in transit (the TCP window size) between computers, which resulted in yet another turn between client and server for data transfers. Since memory in those days was tight and the TCP window required dedicated memory on both the client and the server for each connection, it was a logical choice. These were the days before the World Wide Web, so the major applications were terminal emulation software and file/print server access. Since the 64K block size restriction existed anyway for server access because of SMB, allocating more memory to the TCP window would have wasted it for most typical uses. It was possible to change this default setting, but for most users it was too complex to edit the Windows registry for such an obscure setting. Even if this was done, the other computer would require the same setting or both computers would fall back to the smaller window for all interactions. Since Microsoft's defaults were pervasive, the likely benefit of changing the registry was not really worth the time.
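To put a rough number on the window limit, here is a minimal sketch of the usual back-of-the-envelope rule that a sender with a fixed window can deliver at most one window of data per round trip. The function name and the sample round-trip times are assumptions for illustration, not measurements from any particular network.

```python
# Rough ceiling on throughput when only one TCP window can be in flight per round trip.
# The RTT values below are illustrative assumptions, not measurements.

WINDOW_BYTES = 64 * 1024  # the 64K byte default window described above


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound in bits per second: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


if __name__ == "__main__":
    for label, rtt_ms in (("LAN", 1), ("regional WAN", 20), ("long-haul WAN", 80)):
        mbps = max_throughput_bps(WINDOW_BYTES, rtt_ms / 1000) / 1e6
        print(f"{label:14s} RTT {rtt_ms:3d} ms -> at most {mbps:7.1f} Mbit/s")
```

Under these assumptions the ceiling falls in direct proportion to the round-trip time, regardless of how much bandwidth the link actually has.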

Between the window size and the SMB/CIFS issue, Microsoft's use of TCP was just not WAN friendly. Frankly, it was downright WAN hostile.

Few at the time complained, since there were other, more significant bottlenecks in the data path. The network and the servers themselves were typically local to the site and pretty slow, but that started to change. Memory became cheap, local area networks became fast and files got bigger. Servers migrated from being on premise to being remote for many branch offices. The distance introduced greater latencies and bandwidth bottlenecks. Each access to a file block required a turn between the client and the server. The same was true of each TCP window transfer used by non-SMB/CIFS applications such as web browsing. The larger the data access, the more turns, and the latency of each round trip added up. So although LANs and WANs were evolving, the benefits were not showing, because the block and window size had become the new bottleneck. Transfers were taking too long and bandwidth was being wasted because the pipe could not be filled.
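The arithmetic behind this is easy to sketch. The example below assumes a simplified model in which every 64K block costs one full round trip plus the time to push its bytes through the link; the file size, link speed and round-trip times are made-up values, and real transfers have other overheads that are ignored here.

```python
import math

# Simplified model: one turn (round trip) per 64K block, plus raw transmission time.
# File size, bandwidth and RTT values are illustrative assumptions.

BLOCK_BYTES = 64 * 1024


def turns(file_bytes: int, block_bytes: int = BLOCK_BYTES) -> int:
    """Number of round trips when each block needs its own turn."""
    return math.ceil(file_bytes / block_bytes)


def transfer_seconds(file_bytes: int, rtt_seconds: float, bandwidth_bps: float) -> float:
    """Turn latency plus the time to serialize the bytes onto the link."""
    return turns(file_bytes) * rtt_seconds + file_bytes * 8 / bandwidth_bps


if __name__ == "__main__":
    file_bytes = 10 * 1024 * 1024  # a 10 MiB file
    for label, rtt in (("LAN, 1 ms", 0.001), ("WAN, 80 ms", 0.080)):
        secs = transfer_seconds(file_bytes, rtt, 100e6)
        print(f"{label:11s}: {turns(file_bytes)} turns, about {secs:.1f} s")
```

In this model the same file over the same 100 Mbit/s link takes roughly one second on the LAN and well over ten seconds across the WAN, with nearly all of the difference coming from the turns rather than the bandwidth.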

WAN Optimization appliances jumped in and solved the problem by locally terminating the SMB/CIFS and TCP sessions at each location. They transformed the WAN-hostile protocol into a much more WAN-friendly one. This prevented the clients and servers from seeing the turn latency, since their turns ended at the local WAN Optimization appliance and did not have to traverse the high-latency WAN. The latency still existed across the WAN, but the appliances were able to fill the pipe better, since they were not restricted to a single block read or write between the peer appliances. Transfers took less time and the bandwidth was better used. There are other network applications beyond SMB/CIFS, such as MS Outlook's Messaging Application Programming Interface (MAPI), HTTP and FTP, for which WAN Optimization has achieved a similar reduction in turn time, but the key money application was SMB.

The glitch continued to exist in Microsoft servers through Windows XP. With Windows Vista, the TCP window default was increased dramatically and the new SMB 2.0 was included. SMB 2.0 allowed for multiple blocks per turn. In other words, Microsoft quietly just fixed the glitch. Immediately, a substantial part of the value WAN Optimization appliances provided by locally terminating SMB/CIFS and TCP evaporated. SMB/CIFS is still a block protocol and Microsoft has more to do here over time, but the best days of the SMB/CIFS glitch were behind it. As with SMB/CIFS, many of the other protocols that benefit from WAN Op local termination are also improving to be much more WAN friendly. Microsoft, with its 365 cloud-based solution, will endeavor to continue improving the WAN friendliness of its application suites.
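A small sketch shows why allowing multiple blocks per turn matters so much. The `blocks_per_turn` parameter is a hypothetical knob for illustration only; it does not correspond to any actual SMB 2.0 setting, and the value of 8 is simply picked to show the effect.

```python
import math

# Count turns when several 64K blocks may be requested per round trip.
# blocks_per_turn is a hypothetical parameter, not a real SMB 2.0 setting.


def turns_needed(file_bytes: int, block_bytes: int = 64 * 1024, blocks_per_turn: int = 1) -> int:
    """Round trips required to move a file, given how many blocks fit in one turn."""
    blocks = math.ceil(file_bytes / block_bytes)
    return math.ceil(blocks / blocks_per_turn)


if __name__ == "__main__":
    file_bytes = 10 * 1024 * 1024  # a 10 MiB file
    rtt = 0.080                    # assumed 80 ms WAN round trip
    for per_turn in (1, 8):
        n = turns_needed(file_bytes, blocks_per_turn=per_turn)
        print(f"{per_turn} block(s) per turn: {n} turns, {n * rtt:.1f} s of turn latency")
```

With one block per turn the 10 MiB file needs 160 round trips; with eight blocks per turn it needs 20, so the latency penalty shrinks by the same factor without any appliance in the path.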
