Friday, October 02, 2009

x86 Rises, Part 3: x86 Grows in Performance and Scalability

Several years ago I drafted a white paper I called "x86 Everywhere". I started it in the fall of 2004, let it sit, and updated it in April 2005. It remains unfinished, but with the release today of Intel's Nehalem processor, I took a look at it again. Here it is:

Three trends could allow what I call "x86 Everywhere" to happen.

The second trend is the prospect of several vendors offering scalable 64-bit x86 systems large enough to meet most customers' workloads.

The desktop megahertz wars between Intel and AMD in the late 1990s and early 2000s drove x86 performance up at a rate exceeding Moore's law. This directly benefited Intel x86 server performance, making x86 servers viable for larger workloads. At the same time, enterprise applications were being rearchitected as multi-tier, web-based applications, requiring the deployment of additional web and application servers. RISC still had an advantage over x86 in this environment, because running Microsoft Windows on x86 servers required the purchase of a client access license (CAL) for each discrete user. This was extremely expensive for emerging self-service, web-based ERP and CRM applications, and simply unworkable for B2C e-commerce applications.

Enter Linux. In the late 1990s, Linux became established as an entry server operating system which, unlike Microsoft Windows, did not require a CAL for each user. Linux quickly became the web server OS of choice. The result was a positive feedback loop. Application server ISVs aggressively ported their J2EE application servers to Linux and improved their clustering so that they would run well on clusters of low-cost entry x86 servers. ERP vendors quickly followed, porting their application tiers to Linux on x86. The dot-com bust and the worldwide recession of the early 2000s drove home the low purchase cost of the Linux/x86 architecture.

At the same time as the desktop megahertz war, the smaller x86 chip manufacturers each tried to establish their products in a niche. Via acquired Cyrix and focused on the “system on a chip” market for very low-cost desktops. Transmeta focused on very low-power chips for low-end laptops and embedded markets. AMD, long a player in the budget desktop market, decided to target the server market by designing an x86 architecture, called “Hammer”, that addressed the weaknesses of Intel's existing Xeon x86 server processor, primarily its lack of 64-bit memory addressing. The release of Hammer, branded as Opteron, forced Intel to follow suit with its own 64-bit x86 technology, long rumored under the codename “Yamhill” and branded as EM64T.

The emergence of a truly competitive x86 server processor marketplace is driving innovation in x86 processors, as AMD tries to stay one step ahead of Intel and Intel tries to leapfrog AMD. Dual-core processors, improved power management, virtualization technologies, and other improvements are announced on a regular basis.

After 64-bit x86 technology emerged in 2004, dual-core x86 processors followed in 2005. These two technologies have strong synergies: 64-bit addressing increases the size of the workload that can run on an x86 server, and dual-core processors increase the size of the server that can be built with x86 processors.

With dual-core 64-bit x86 processors now shipping, and four-core 64-bit x86 processors possible in two to three years, four- to eight-socket servers may provide the capacity required for most customers' workloads. Beyond that, workloads requiring large, single-system-image servers (HPTC, large data warehouses, etc.) may be relegated to a niche market. Ordinarily, such a niche market could still justify large, scalable RISC/UNIX systems. But the market for large, single-system-image servers is not limited to RISC/UNIX. For some time, the scalable x86 market has been targeted by several system vendors.

In the mid-1990s, Sequent, with its NUMA-Q system, was one of the first vendors of large, scalable x86 systems. Data General offered a very similar NUMA system during the same period. Both systems delivered very limited performance because of their architecture. Data General's system failed to gain significant market share and was end-of-lifed not long after EMC acquired Data General. Sequent targeted decision support and data warehouse workloads with NUMA-Q and had some success. After IBM acquired Sequent, IBM released a more advanced x86 NUMA system that offered greater node-to-node bandwidth and large L4 caches to better manage inter-node latencies. In 2005 IBM released its third generation of x86 NUMA systems.

In the late 1990s, Unisys built a large, scalable SMP x86 system using a technology it calls cellular multiprocessing (CMP), derived from its ClearPath mainframe systems. In fact, Unisys offers a version of its x86 CMP system that runs the ClearPath mainframe OS ported to the x86 architecture. Despite this mainframe heritage and the mainframe variant, sales of Unisys' x86 CMP systems have not been strong. These systems were limited by the poor scalability of Intel's x86 architecture, as well as by the x86's lack of 64-bit memory addressing. Unisys now offers a second-generation CMP design, with simpler eight-socket entry systems as well as large 32-socket systems.

Both IBM and Unisys offer 32-socket Intel Xeon systems, but both of these systems continue to be limited by the inherent lack of scalability in Intel's Xeon architecture.

AMD's Opteron changed the limits of x86 scalability. Opteron is the first scalable x86 processor architecture. By virtue of its high-performance, coherent HyperTransport MP interconnect, Opteron scales well in SMP designs. Because of its 64-bit memory addressing, it also scales in memory capacity, keeping addressable memory in balance with processor performance. Four- to eight-socket x86 servers are no longer crippled by saturated SMP buses or inadequate memory capacity. Intel has followed suit with 64-bit memory addressing for Xeon and a unique dual front side bus (FSB). But the dual FSB, while providing temporary relief for Xeon's saturated SMP bus, is actually designed for the soon-to-be-released dual-core Xeon processors, which will likely saturate the SMP buses once again. Xeon systems will need better SMP interconnects to scale efficiently to four sockets and above.

Over the next several years, x86 systems with eight or more sockets will become more prevalent. Newisys, a division of Sanmina-SCI and a major OEM manufacturer of AMD Opteron systems, is planning a 32-way Opteron chipset called Horus. Intel has promised that future Itanium and Xeon processors will support a common chipset, allowing a next-generation scalable Itanium server architecture to serve as a scalable Xeon platform as well. This means that traditional vendors of large, scalable Itanium systems, such as HP, SGI, and NEC, could enter the large scalable Xeon system market. Other possibilities include a higher-end AMD Opteron chip with more HyperTransport links, allowing more scalable glueless MP topologies similar to the Compaq Alpha EV7's architecture, or Intel introducing its own scalable glueless chip-to-chip interconnect. Notably, through an agreement with Compaq made before Compaq was acquired by HP, Intel has access to the design of the EV7 interconnect and now employs the engineers who developed it. Regardless, increased SMP scalability for x86 servers seems likely in the next few years.

Related Posts:

x86 Rises, Part 2: Decreasing Value of Big UNIX

x86 Rises, Part 1: The Background